---
language: id
pipeline_tag: token-classification  # closest available Hub task tag; the Hub has no dedicated dependency-parsing pipeline_tag
widget:
  # Example sentence this model can process
  - text: "Presiden Joko Widodo mengunjungi korban bencana alam di Palu."
license: mit
library_name: spacy
tags:
  - id
  - spacy
  - dependency-parsing
  - indonesian
  - gsd  # trained on the GSD corpus
model-index:  # helps further indexing (optional but recommended)
  - name: spacy-dep-parsing-id  # model/repo name
    results:
      - task:
          type: dependency-parsing
          name: Dependency Parsing
        dataset:
          type: ud-id-gsd  # the test dataset
          name: UD Indonesian GSD (Test Split)
          config: test
          split: test
          revision: main  # or a specific commit hash
        metrics:
          - type: dep_uas
            value: 0.8282  # UAS on the test set
            name: UAS (Unlabeled Attachment Score)
          - type: dep_las
            value: 0.7436  # LAS on the test set
            name: LAS (Labeled Attachment Score)
          - type: sents_f
            value: 0.9937  # Sentence F-score on the test set
            name: Sentence F-Score
---
# spaCy Dependency Parsing Model for Indonesian (UD-ID-GSD)
This repository contains a spaCy v3 model trained for **Dependency Parsing** on the Indonesian language. The model was trained using the configuration generated by `spacy init config` with default settings for the parser component.
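A training setup along these lines would match the description above. This is a sketch only: the file names, output directory, and the `--optimize` choice are assumptions, not values copied from this repository.

```shell
# Generate a default parser-only config for Indonesian
python -m spacy init config config.cfg --lang id --pipeline parser --optimize efficiency

# Train on the converted UD-ID-GSD corpus (paths are placeholders)
python -m spacy train config.cfg --output ./output \
  --paths.train ./corpus/train.spacy --paths.dev ./corpus/dev.spacy
```

The trained pipeline is then written to `./output/model-best`.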
## Dataset
The model was trained on the **Universal Dependencies Indonesian GSD (UD-ID-GSD)** dataset.
*(Reference: McDonald, R., Nivre, J., Quirmbach-Brundage, Y., Goldberg, Y., Das, D., Ganchev, K., Hall, K., Petrov, S., Zhang, H., Täckström, O., Bedini, C., Bertomeu Castelló, N., & Lee, J. (2013). Universal Dependency Annotation for Multilingual Parsing. In Proceedings of ACL 2013.)*
The dataset splits contained the following numbers of sentences:
* **Training set:** 4,477 sentences
* **Development (dev) set:** 559 sentences
* **Test set:** 557 sentences
* **Total:** 5,593 sentences
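The UD release distributes these splits as CoNLL-U files, which spaCy's `convert` command turns into its binary `.spacy` training format. The file names below follow the UD release naming convention and are assumptions, not files shipped in this repository:

```shell
# Convert each CoNLL-U split into spaCy's binary format under ./corpus
python -m spacy convert id_gsd-ud-train.conllu ./corpus --converter conllu
python -m spacy convert id_gsd-ud-dev.conllu   ./corpus --converter conllu
python -m spacy convert id_gsd-ud-test.conllu  ./corpus --converter conllu
```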
## Pipeline Components
This model's pipeline contains only the `parser` component (together with the token-to-vector embedding layer trained in the same run). It does **not** include a tagger, NER, or other components by default.
## How to Use
You can load this model by downloading the pipeline files from the Hugging Face Hub and pointing `spacy.load` at the local copy (this assumes the exported pipeline directory sits at the repository root):
```python
import spacy
from huggingface_hub import snapshot_download

model_id = "freksowibowo/spacy-dep-parsing-id"
try:
    # spacy.load cannot resolve a Hub repository ID directly, so download first
    model_path = snapshot_download(repo_id=model_id)
    nlp = spacy.load(model_path)
    print(f"Model '{model_id}' loaded successfully.")

    # Example usage
    text = "Gubernur Jawa Barat Ridwan Kamil meresmikan jembatan baru di Cirebon."
    doc = nlp(text)

    print("\nDependency Parse Results:")
    print(f"{'Token':<15} {'Relation':<10} {'Head':<15}")
    print("-" * 40)
    for token in doc:
        # token.pos_ is not shown: the pipeline has no tagger, so it stays unset
        print(f"{token.text:<15} {token.dep_:<10} {token.head.text:<15}")

    # You can also visualize the parse with displaCy (in Jupyter/IPython):
    # from spacy import displacy
    # displacy.render(doc, style="dep", jupyter=True, options={"distance": 100})
except OSError:
    print(f"Error: files at '{model_id}' could not be loaded as a spaCy pipeline.")
    print("Please check the repository ID and that the download completed.")
except Exception as e:
    print(f"An error occurred: {e}")
```
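If you only want to explore spaCy's dependency API itself (for example in unit tests, without downloading this model), a `Doc` can be constructed by hand with head indices and labels. The sentence and annotations below are illustrative assumptions, not output of this model:

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("id")
words = ["Budi", "membaca", "buku", "baru"]
# heads are token indices (a token whose head is itself is the root);
# deps are Universal Dependencies relation labels
doc = Doc(nlp.vocab, words=words, heads=[1, 1, 1, 2],
          deps=["nsubj", "ROOT", "obj", "amod"])

# Find the syntactic root and list its direct dependents
root = [t for t in doc if t.head == t][0]
print("root:", root.text)                            # membaca
print("children:", [t.text for t in root.children])  # ['Budi', 'buku']
```

The same `token.dep_`, `token.head`, and `token.children` attributes are what the loading example above iterates over once the trained parser has filled them in.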