
language: id
pipeline_tag: dependency-parsing
widget:
  - text: Presiden Joko Widodo mengunjungi korban bencana alam di Palu.
license: mit
library_name: spacy
tags:
  - id
  - spacy
  - dependency-parsing
  - indonesian
  - gsd
model-index:
  - name: spacy-dep-parsing-id
    results:
      - task:
          type: dependency-parsing
          name: Dependency Parsing
        dataset:
          type: ud-id-gsd
          name: UD Indonesian GSD (Test Split)
          config: test
          split: test
          revision: main
        metrics:
          - type: dep_uas
            value: 0.8282
            name: UAS (Unlabeled Attachment Score)
          - type: dep_las
            value: 0.7436
            name: LAS (Labeled Attachment Score)
          - type: sents_f
            value: 0.9937
            name: Sentence F-Score

spaCy Dependency Parsing Model for Indonesian (UD-ID-GSD)

This repository contains a spaCy v3 pipeline trained for dependency parsing of Indonesian. It was trained with the configuration generated by spacy init config, using the default settings for the parser component.
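The model-index metadata reports UAS and LAS on the test split. UAS is the fraction of scored tokens whose predicted head is correct; LAS additionally requires the predicted relation label to be correct. The following is a minimal pure-Python sketch of both scores. It assumes gold and predicted parses share the same tokenization and are given as parallel lists of (head_index, label) pairs. The function name attachment_scores and the toy data are hypothetical. The ignore_labels default loosely imitates spaCy's scorer, which as far as we know skips the p/punct labels, but this is a simplification rather than spaCy's exact computation.

```python
from typing import Iterable, Sequence, Tuple

# One (head_index, relation_label) pair per token, in token order.
Arc = Tuple[int, str]


def attachment_scores(
    gold: Sequence[Arc],
    pred: Sequence[Arc],
    ignore_labels: Iterable[str] = ("p", "punct"),
) -> Tuple[float, float]:
    """Return (UAS, LAS) for parses with identical tokenization.

    Tokens whose gold label is in `ignore_labels` are not scored.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of tokens")
    ignored = {label.lower() for label in ignore_labels}
    scored = head_ok = label_ok = 0
    for (g_head, g_label), (p_head, p_label) in zip(gold, pred):
        if g_label.lower() in ignored:
            continue  # e.g. punctuation is excluded from both scores
        scored += 1
        if g_head == p_head:
            head_ok += 1  # correct head counts towards UAS
            if g_label == p_label:
                label_ok += 1  # correct head and label count towards LAS
    if scored == 0:
        return 0.0, 0.0
    return head_ok / scored, label_ok / scored


# Toy parse of "Presiden mengunjungi korban bencana ." (labels are illustrative)
gold = [(1, "nsubj"), (1, "ROOT"), (1, "obj"), (2, "nmod"), (1, "punct")]
pred = [(1, "nsubj"), (1, "ROOT"), (1, "obl"), (1, "nmod"), (1, "punct")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

Here the punctuation token is skipped, three of the four remaining heads are correct, and two of those also carry the correct label, giving UAS 0.75 and LAS 0.50.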

Dataset

The model was trained on the Universal Dependencies Indonesian GSD (UD-ID-GSD) treebank. (Reference: McDonald, R., Nivre, J., Quirmbach-Brundage, Y., Goldberg, Y., Das, D., Ganchev, K., Hall, K., Petrov, S., Zhang, H., Täckström, O., Bedini, C., Bertomeu Castelló, N., & Lee, J. (2013). Universal Dependency Annotation for Multilingual Parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).)

The dataset splits used contained the following numbers of documents:

Training set: 4,477 documents
Development (dev) set: 559 documents
Test set: 557 documents
Total: 5,593 sentences
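The split sizes listed above can be recounted from the raw treebank files. In CoNLL-U, each sentence is a block of token lines, optionally preceded by # comment lines, and sentences are separated by a blank line. The following is a minimal pure-Python sketch. It assumes local copies of the treebank using the standard UD file names id_gsd-ud-{train,dev,test}.conllu. The directory layout and the helper count_conllu_sentences are hypothetical. The counts you get depend on the UD release you download. The three figures above sum to 5,593.

```python
from pathlib import Path


def count_conllu_sentences(text: str) -> int:
    """Count sentences in CoNLL-U text (blank-line separated blocks with token lines)."""
    count = 0
    has_tokens = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence block, if it had tokens
            if has_tokens:
                count += 1
            has_tokens = False
        elif not line.startswith("#"):
            has_tokens = True  # token line (IDs like 1, 1-2 or 8.1)
    if has_tokens:  # last sentence without a trailing blank line
        count += 1
    return count


# Hypothetical local paths following the standard UD file naming scheme
SPLITS = {
    "train": Path("UD_Indonesian-GSD/id_gsd-ud-train.conllu"),
    "dev": Path("UD_Indonesian-GSD/id_gsd-ud-dev.conllu"),
    "test": Path("UD_Indonesian-GSD/id_gsd-ud-test.conllu"),
}

if __name__ == "__main__":
    sizes = {
        name: count_conllu_sentences(path.read_text(encoding="utf-8"))
        for name, path in SPLITS.items()
        if path.exists()
    }
    print(sizes, "total:", sum(sizes.values()))
```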

Pipeline Components

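This model's pipeline contains only the parser component. It does not include a tagger, morphologizer, NER, or any other component. The parser uses its own internal token-to-vector (tok2vec) layer, which was trained together with it. As a result, parsed documents carry dependency arcs, relation labels, and sentence boundaries. Attributes set by other components, such as token.pos_, token.tag_, and doc.ents, remain empty.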
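A parser-only pipeline yields a Doc with arcs, labels, and sentence boundaries but no part-of-speech tags. The following sketch shows which attributes are populated. It builds such a Doc by hand with spaCy's Doc constructor, so it runs without downloading the model. The words, heads, and labels are illustrative and are not output of this model.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-built parse (illustrative, NOT produced by this model).
# Heads are absolute token indices; the root token points to itself.
words = ["Presiden", "mengunjungi", "korban", "."]
heads = [1, 1, 1, 1]
deps = ["nsubj", "ROOT", "obj", "punct"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

print(doc.has_annotation("DEP"))  # True: arcs and labels are set
print(doc.has_annotation("POS"))  # False: no tagger or morphologizer ran
for token in doc:
    # pos_ is an empty string because nothing assigned a POS tag
    print(token.text, token.dep_, token.head.text, repr(token.pos_))
# Sentence boundaries are derived from the dependency tree
print(len(list(doc.sents)))
```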

How to Use

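You can download the pipeline from the Hugging Face Hub and load it with spaCy. spacy.load() does not fetch repositories from the Hub by itself, so the example below first downloads a local snapshot with huggingface_hub. Install both with: pip install spacy huggingface_hub

import spacy
from huggingface_hub import snapshot_download

# Repository ID on the Hugging Face Hub
model_id = "freksowibowo/spacy-dep-parsing-id"
try:
    # Download (or reuse a cached copy of) the repository, then load the pipeline
    # from that local directory. This assumes the pipeline files
    # (config.cfg, meta.json, parser/, vocab/, ...) sit at the repository root.
    model_path = snapshot_download(repo_id=model_id)
    nlp = spacy.load(model_path)
    print(f"Model '{model_id}' loaded successfully.")
    print("Pipeline components:", nlp.pipe_names)

    # Example usage
    text = "Gubernur Jawa Barat Ridwan Kamil meresmikan jembatan baru di Cirebon."
    doc = nlp(text)

    print("\nDependency Parse Results:")
    # The pipeline has no tagger, so token.pos_ would be empty;
    # show the head's token index instead of its POS tag.
    print(f"{'Token':<15} {'Relation':<10} {'Head':<15} {'Head idx':<8}")
    print("-" * 51)
    for token in doc:
        print(f"{token.text:<15} {token.dep_:<10} {token.head.text:<15} {token.head.i:<8}")

    # In Jupyter/IPython you can also visualize the parse with displaCy:
    # from spacy import displacy
    # displacy.render(doc, style="dep", jupyter=True, options={"distance": 100})

except OSError as e:
    # spacy.load() raises OSError for an invalid pipeline directory;
    # download errors from huggingface_hub typically subclass OSError as well.
    print(f"Error: could not load model '{model_id}': {e}")
    print("Please ensure you have an internet connection and that the repository ID is correct.")
except Exception as e:
    print(f"An error occurred: {e}")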
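Outside a notebook, displaCy can return the dependency visualization as markup instead of displaying it inline. The following is a minimal sketch that saves the parse as a standalone HTML page. To keep it runnable without the model, a hand-built Doc stands in for doc = nlp(text). Its words, heads, and labels are illustrative, and the output file name parse.html is hypothetical.

```python
from pathlib import Path

from spacy import displacy
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Stand-in for `doc = nlp(text)`; replace with real model output in practice.
doc = Doc(
    Vocab(),
    words=["Presiden", "mengunjungi", "korban", "."],
    heads=[1, 1, 1, 1],
    deps=["nsubj", "ROOT", "obj", "punct"],
)

# With jupyter=False, render() returns the markup as a string;
# page=True wraps the SVG in a complete HTML document.
html = displacy.render(doc, style="dep", page=True, jupyter=False, options={"distance": 100})

# Hypothetical output file name; open it in any browser.
Path("parse.html").write_text(html, encoding="utf-8")
```

Alternatively, displacy.serve(doc, style="dep") starts a small local web server that shows the same visualization.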