---
language: id
pipeline_tag: dependency-parsing # Correct tag for this model's task
widget:
  # Example sentence this model can process
  - text: "Presiden Joko Widodo mengunjungi korban bencana alam di Palu."
license: mit # As per your input
library_name: spacy # Make sure there are no stray characters after 'spacy'
tags:
  - id
  - spacy
  - dependency-parsing
  - indonesian
  - gsd # Because it uses the GSD corpus
model-index: # This section helps with further indexing (optional but recommended)
  - name: spacy-dep-parsing-id # Your model/repo name
    results: # Main evaluation results go here
      - task:
          type: dependency-parsing
          name: Dependency Parsing
        dataset:
          type: ud-id-gsd # Refers to the test dataset
          name: UD Indonesian GSD (Test Split)
          config: test
          split: test
          revision: main # or a commit hash if pinned
        metrics:
          - type: dep_uas
            value: 0.8282 # UAS on the test set
            name: UAS (Unlabeled Attachment Score)
          - type: dep_las
            value: 0.7436 # LAS on the test set
            name: LAS (Labeled Attachment Score)
          - type: sents_f
            value: 0.9937 # Sentence F-score on the test set
            name: Sentence F-Score
---

# spaCy Dependency Parsing Model for Indonesian (UD-ID-GSD)

This repository contains a spaCy v3 model trained for **dependency parsing** of Indonesian. It was trained with the configuration generated by `spacy init config`, using the default settings for the parser component.
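
The model card metadata reports UAS and LAS on the test split. As a reminder of what the two scores measure, here is a minimal sketch in plain Python. The function name `attachment_scores` and the four-token example are hypothetical, and the sketch counts every token, whereas evaluation tools may differ in details such as whether punctuation is scored.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for two equal-length lists of (head_index, label) pairs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")

    n = len(gold)
    # UAS: fraction of tokens whose predicted head is correct.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS: fraction of tokens whose predicted head AND label are both correct.
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    return head_hits / n, full_hits / n


# Hypothetical four-token sentence: (head index, dependency label) per token.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (3, "punct")]

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=0.75 LAS=0.50
```

Because a correct label only counts when the head is also correct, LAS can never exceed UAS, which matches the reported 0.7436 versus 0.8282.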

## Dataset

The model was trained on the **Universal Dependencies Indonesian GSD (UD-ID-GSD)** dataset.
*(Reference: McDonald, R., Nivre, J., Quirmbach-Brundage, Y., Goldberg, Y., Das, D., Ganchev, K., Hall, K., Petrov, S., Zhang, H., Täckström, O., Bednářová, Z., Wang, S., & Lee, Y. (2013). Universal Dependency Annotation for Multilingual Parsing.)*

The dataset splits used contained the following number of documents:
*   **Total Sentences (approx.):** 5,593
*   **Training Set:** 4,477 documents
*   **Development (Dev) Set:** 559 documents
*   **Test Set:** 557 documents
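
As a quick sanity check on the figures above, the three splits can be summed and expressed as shares of the total. This is plain arithmetic on the numbers listed in this section; the variable names are illustrative only.

```python
# Split sizes as listed above.
splits = {"train": 4477, "dev": 559, "test": 557}

total = sum(splits.values())
shares = {name: count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name:<5} {count:>5}  {shares[name]:.1%}")
print(f"total {total:>5}")
```

The splits add up to the stated total of 5,593 and correspond to roughly an 80/10/10 division.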

## Pipeline Components

This model's pipeline contains only the `parser` component. It does **not** include a tagger, NER, or other components, so attributes such as `token.pos_` and `doc.ents` are left empty. The parser relies on its own token-to-vector embeddings, learned during training.

## How to Use

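`spacy.load()` accepts an installed package name or a local path, not a Hub repository ID, so the example below first downloads a local copy with `huggingface_hub` (install it with `pip install huggingface_hub`):

```python
import spacy
from huggingface_hub import snapshot_download

model_id = "freksowibowo/spacy-dep-parsing-id"

try:
    # Download a local copy of the repository from the Hugging Face Hub.
    # Assumption: the repository root is a spaCy pipeline directory
    # (config.cfg, meta.json, ...). If it ships a packaged wheel instead,
    # pip-install that wheel and load it by its package name.
    model_path = snapshot_download(repo_id=model_id)
    nlp = spacy.load(model_path)
    print(f"Model '{model_id}' loaded successfully.")

    # Example usage
    text = "Gubernur Jawa Barat Ridwan Kamil meresmikan jembatan baru di Cirebon."
    doc = nlp(text)

    print("\nDependency Parse Results:")
    # The pipeline has no tagger, so token.pos_ would be empty;
    # show the index of each token's head instead.
    print(f"{'Token':<15} {'Relation':<10} {'Head':<15} {'Head idx':<8}")
    print("-" * 50)
    for token in doc:
        print(f"{token.text:<15} {token.dep_:<10} {token.head.text:<15} {token.head.i:<8}")

    # You can also visualize using displacy (if in Jupyter/IPython)
    # from spacy import displacy
    # displacy.render(doc, style="dep", jupyter=True, options={'distance': 100})

except OSError:
    print(f"Error: could not load model '{model_id}'.")
    print("Please check your internet connection and that the repository ID is correct.")
except Exception as e:
    print(f"An error occurred: {e}")
```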