spaCy Dependency Parsing Model for Indonesian (UD-ID-GSD)
This repository contains a spaCy v3 model trained for dependency parsing on Indonesian. The model was trained using the configuration generated by spacy init config with default settings for the parser component.
Dataset
The model was trained on the Universal Dependencies Indonesian GSD (UD-ID-GSD) dataset. (Reference: McDonald, R., Nivre, J., Quirmbach-Brundage, Y., Goldberg, Y., Das, D., Ganchev, K., Hall, K., Petrov, S., Zhang, H., Täckström, O., Bednářová, Z., Wang, S., & Lee, Y. (2013). Universal Dependency Annotation for Multilingual Parsing.)
The dataset splits contained the following numbers of documents:
- Total Sentences (approx.): 5,593
- Training Set: 4,477 documents
- Development (Dev) Set: 559 documents
- Test Set: 557 documents
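As a quick sanity check on the figures above, the following minimal sketch recomputes the total and the share of each split. The dictionary name and keys are illustrative only; the counts are the ones listed above.

```python
# Split sizes as listed in this model card.
splits = {"train": 4477, "dev": 559, "test": 557}

# The three splits should add up to the reported total.
total = sum(splits.values())

# Share of each split as a percentage, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}

print(f"total: {total}")
for name, pct in shares.items():
    print(f"{name:<5} {splits[name]:>5}  {pct:.1f}%")
```

The split sizes sum to exactly 5,593, which corresponds to roughly an 80/10/10 train/dev/test division.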
Pipeline Components
This model's pipeline contains only the parser component. It does not include a tagger, NER, or other components by default. The parser relies on internal token-to-vector embeddings trained during the process.
How to Use
You can load this model directly using spaCy after installing it:
import spacy

# spacy.load() accepts an installed package name or a local path. It does not
# download from the Hugging Face Hub by itself, so install the model first and
# replace this value if the installed name or path differs from the repository ID.
model_id = "freksowibowo/spacy-dep-parsing-id"

try:
    nlp = spacy.load(model_id)
    print(f"Model '{model_id}' loaded successfully.")

    # Example usage
    text = "Gubernur Jawa Barat Ridwan Kamil meresmikan jembatan baru di Cirebon."
    doc = nlp(text)

    print("\nDependency Parse Results:")
    print(f"{'Token':<15} {'Relation':<10} {'Head':<15} {'Head POS':<8}")
    print("-" * 50)
    for token in doc:
        # Note: pos_ stays empty here because the pipeline has no tagger.
        print(f"{token.text:<15} {token.dep_:<10} {token.head.text:<15} {token.head.pos_:<8}")

    # You can also visualize using displacy (if in Jupyter/IPython)
    # from spacy import displacy
    # displacy.render(doc, style="dep", jupyter=True, options={'distance': 100})

except OSError:
    print(f"Error: Model '{model_id}' not found.")
    print("Please ensure the model is installed and the name or path is correct.")
except Exception as e:
    print(f"An error occurred: {e}")
Evaluation results
- UAS (Unlabeled Attachment Score) on UD Indonesian GSD, test split (self-reported): 0.828
- LAS (Labeled Attachment Score) on UD Indonesian GSD, test split (self-reported): 0.744
- Sentence F-Score on UD Indonesian GSD, test split (self-reported): 0.994
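To make the two attachment scores concrete, here is a minimal pure-Python sketch of how UAS and LAS are commonly defined. The function name attachment_scores and the toy arcs are hypothetical and only for illustration; this is not the evaluation code used to produce the numbers above. It counts every token and does not reproduce any label filtering the actual evaluation may have applied.

```python
from typing import List, Tuple

# One arc per token: (index of the head token, dependency label).
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for two aligned lists of arcs."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same tokens")
    if not gold:
        return 0.0, 0.0
    # UAS: the predicted head matches the gold head.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head and the dependency label match.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return uas_hits / len(gold), las_hits / len(gold)


# Toy example with four tokens (made-up arcs, not model output).
gold = [(1, "nsubj"), (1, "root"), (3, "det"), (1, "obj")]
pred = [(1, "nsubj"), (1, "root"), (1, "det"), (1, "iobj")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

In the toy example, three of four heads are correct but only two of those also carry the correct label, so LAS is lower than UAS. The same relationship holds for the reported scores (0.744 versus 0.828), since LAS can never exceed UAS.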