---
language:
- ka
- en
license: apache-2.0
tags:
- translation
- evaluation
- comet
- mt-evaluation
- georgian
metrics:
- kendall_tau
- spearman_correlation
- pearson_correlation
model-index:
- name: Georgian-COMET
  results:
  - task:
      type: translation-evaluation
      name: Machine Translation Evaluation
    dataset:
      name: Georgian MT Evaluation Dataset
      type: Darsala/georgian_metric_evaluation
    metrics:
    - type: pearson_correlation
      value: 0.876
      name: Pearson Correlation
    - type: spearman_correlation
      value: 0.773
      name: Spearman Correlation
    - type: kendall_tau
      value: 0.579
      name: Kendall's Tau
base_model: Unbabel/wmt22-comet-da
datasets:
- Darsala/georgian_metric_evaluation
---

# Georgian-COMET: Fine-tuned COMET for English-Georgian MT Evaluation

This is a [COMET](https://github.com/Unbabel/COMET) evaluation model fine-tuned specifically for English-Georgian machine translation evaluation. It takes a triplet (source sentence, translation, reference translation) and returns a score that reflects the quality of the translation relative to both the source and the reference.

## Model Description

Georgian-COMET is a fine-tuned version of [Unbabel/wmt22-comet-da](https://huggingface.co/Unbabel/wmt22-comet-da) that has been optimized for evaluating English-to-Georgian translations through knowledge distillation from Claude Sonnet 4. On a test set of 400 human-annotated translation pairs, it improves on the base model's correlation with human judgments by 0.009 (Pearson), 0.014 (Spearman), and 0.015 (Kendall).

### Key Improvements over Base Model

| Metric | Base COMET | Georgian-COMET | Improvement (absolute) |
|--------|------------|----------------|------------------------|
| Pearson | 0.867 | **0.876** | +0.009 |
| Spearman | 0.759 | **0.773** | +0.014 |
| Kendall | 0.564 | **0.579** | +0.015 |

## Paper

- **Base Model Paper**: [COMET-22: Unbabel-IST 2022 Submission for the Metrics Shared Task](https://aclanthology.org/2022.wmt-1.52) (Rei et al., WMT 2022)
- **This Model**: Paper coming soon

## Repository

[https://github.com/LukaDarsalia/nmt_metrics_research](https://github.com/LukaDarsalia/nmt_metrics_research)

## License

Apache-2.0

## Usage (unbabel-comet)

Using this model requires unbabel-comet to be installed:

```bash
pip install --upgrade pip  # ensures that pip is current 
pip install unbabel-comet
```

### Option 1: Direct Download from HuggingFace

```python
from comet import download_model, load_from_checkpoint

# Download the model checkpoint from the Hugging Face Hub
model_path = download_model("Darsala/georgian_comet")

# Load the model
model = load_from_checkpoint(model_path)

# Prepare your data: English source, Georgian MT output, Georgian reference
data = [
    {
        "src": "The cat sat on the mat.",
        "mt": "კატა ზის ხალიჩაზე.",
        "ref": "კატა იჯდა ხალიჩაზე."
    },
    {
        "src": "Schools and kindergartens were opened.",
        "mt": "სკოლები და საბავშვო ბაღები გაიხსნა.",
        "ref": "გაიხსნა სკოლები და საბავშვო ბაღები."
    }
]

# Get predictions (set gpus=0 to run on CPU)
model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output)
```

### Option 2: Using comet CLI

First download the model checkpoint:
```bash
wget https://huggingface.co/Darsala/georgian_comet/resolve/main/model.ckpt -O georgian_comet.ckpt
```

Then use it with comet CLI:
```bash
comet-score -s {source-inputs}.txt -t {translation-outputs}.txt -r {references}.txt --model georgian_comet.ckpt
```
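
The CLI reads three plain-text files in which line *i* of each file belongs to the same segment. If your data is held as a list of triplets, a helper along the following lines could produce those files. This is a minimal sketch: the function name `write_comet_inputs` and the output file names are hypothetical, and whitespace inside a segment is collapsed so that each segment stays on one line.

```python
from pathlib import Path


def write_comet_inputs(triplets, out_dir):
    """Write line-aligned src.txt, mt.txt and ref.txt files (hypothetical names).

    `triplets` is a list of dicts with "src", "mt" and "ref" keys.
    Runs of whitespace (including newlines) inside a segment are collapsed
    to single spaces so that every segment occupies exactly one line.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for key in ("src", "mt", "ref"):
        path = out_dir / f"{key}.txt"
        lines = [" ".join(t[key].split()) for t in triplets]
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        paths[key] = path
    return paths
```

With files written to a directory such as `comet_inputs`, the command above would become `comet-score -s comet_inputs/src.txt -t comet_inputs/mt.txt -r comet_inputs/ref.txt --model georgian_comet.ckpt`.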

### Option 3: Integration with Evaluation Pipeline

```python
from comet import load_from_checkpoint
import pandas as pd

# Load model
model = load_from_checkpoint("georgian_comet.ckpt")

# Load your evaluation data
df = pd.read_csv("your_evaluation_data.csv")

# Prepare data in COMET format
data = [
    {
        "src": row["sourceText"],
        "mt": row["targetText"],
        "ref": row["referenceText"]
    }
    for _, row in df.iterrows()
]

# Get scores
scores = model.predict(data, batch_size=16)
print(f"Average score: {sum(scores['scores']) / len(scores['scores']):.3f}")
```

## Intended Uses

This model is intended to be used for **English-Georgian MT evaluation**. 

Given a triplet with (source sentence in English, translation in Georgian, reference translation in Georgian), it outputs a single score between 0 and 1 where 1 represents a perfect translation.

### Primary Use Cases

1. **MT System Development**: Evaluate and compare different English-Georgian MT systems
2. **Quality Assurance**: Automated quality checks for Georgian translations
3. **Research**: Study MT evaluation for morphologically rich languages like Georgian
4. **Production Monitoring**: Track translation quality in production environments

### Out-of-Scope Use

- **Other Language Pairs**: This model is specifically fine-tuned for English-Georgian and may not perform well on other language pairs
- **Reference-Free Evaluation**: The model requires reference translations
- **Document-Level**: Optimized for sentence-level evaluation

## Training Details

### Training Data

- **Dataset**: 5,000 English-Georgian pairs from [corp.dict.ge](https://corp.dict.ge/)
- **MT Systems**: Translations from SMaLL-100, Google Translate, and Ucraft Translate
- **Scoring Method**: Knowledge distillation from Claude Sonnet 4 with added Gaussian noise (σ=3)
- **Details**: See [Darsala/georgian_metric_evaluation](https://huggingface.co/datasets/Darsala/georgian_metric_evaluation)
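
The noise step in the scoring method can be illustrated as follows. This is only a sketch, not the project's actual preprocessing code: the function name `add_label_noise`, the 0 to 100 score scale, the clipping to that range, and the fixed seed are all assumptions made for the example. Only the standard deviation of 3 comes from the description above.

```python
import random


def add_label_noise(scores, sigma=3.0, low=0.0, high=100.0, seed=0):
    """Perturb each score with zero-mean Gaussian noise and clip to [low, high].

    Assumptions (not confirmed by the model card): scores live on a
    0-100 scale and noisy values are clipped back into that range.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    noisy = []
    for score in scores:
        value = score + rng.gauss(0.0, sigma)
        noisy.append(min(high, max(low, value)))
    return noisy


# Hypothetical teacher scores for three translations
teacher_scores = [92.0, 55.0, 8.0]
training_targets = add_label_noise(teacher_scores)
```

Perturbing the targets this way keeps the regression head from fitting the teacher's exact values, which is the overfitting concern the training procedure mentions.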

### Training Configuration

```yaml
regression_metric:
  init_args:
    nr_frozen_epochs: 0.3
    keep_embeddings_frozen: True
    optimizer: AdamW
    encoder_learning_rate: 1.5e-05
    learning_rate: 1.5e-05
    loss: mse
    dropout: 0.1
    batch_size: 8
```

### Training Procedure

1. **Base Model**: Started from Unbabel/wmt22-comet-da checkpoint
2. **Knowledge Distillation**: Used Claude Sonnet 4 scores as training targets
3. **Robustness**: Added Gaussian noise to training scores to prevent overfitting
4. **Optimization**: 8 epochs with early stopping (patience=4) on validation Kendall's tau
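
The stopping rule in step 4 can be sketched in a library-agnostic way. The code below is an illustration of patience-based early stopping, not the training script itself: `run_epoch` is a hypothetical callback that trains for one epoch and returns the validation Kendall's tau.

```python
def train_with_early_stopping(run_epoch, max_epochs=8, patience=4):
    """Run up to `max_epochs`, stopping once the validation metric has not
    improved for `patience` consecutive epochs.

    `run_epoch(epoch)` is a hypothetical callback returning the validation
    Kendall's tau for that epoch (higher is better).
    Returns the index of the best epoch and its tau.
    """
    best_tau = float("-inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        tau = run_epoch(epoch)
        if tau > best_tau:
            best_tau, best_epoch = tau, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` epochs in a row
    return best_epoch, best_tau
```

With 8 epochs and a patience of 4, training can only stop early if the best validation score is reached in one of the first four epochs.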

## Evaluation Results

### Test Set Performance

Evaluated on 400 human-annotated English-Georgian translation pairs:

| Metric | Score | p-value |
|--------|-------|---------|
| Pearson | 0.876 | < 0.001 |
| Spearman | 0.773 | < 0.001 |
| Kendall | 0.579 | < 0.001 |
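
For readers who want to run the same kind of analysis on their own data, the three correlations can be computed between metric scores and human scores as sketched below. The implementation is dependency-free for illustration; the input lists are made-up examples, and the choice of the tau-b variant for Kendall (the default in `scipy.stats.kendalltau`) is an assumption, since the variant used for the table is not stated.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, computed over all pairs in O(n^2)."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: ignored by tau-b
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + only_x_tied) * (untied + only_y_tied))
    return (concordant - discordant) / denom


# Made-up example: metric scores vs. human scores for four segments
metric_scores = [0.91, 0.40, 0.75, 0.62]
human_scores = [95, 30, 60, 70]
print(pearson(metric_scores, human_scores))
print(spearman(metric_scores, human_scores))
print(kendall_tau_b(metric_scores, human_scores))
```

The quadratic Kendall loop is fine for a few hundred segments; for larger sets a library implementation would be the more practical choice.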

### Comparison with Other Metrics

| Metric | Pearson | Spearman | Kendall |
|--------|---------|----------|---------|
| **Georgian-COMET** | **0.876** | 0.773 | 0.579 |
| Base COMET | 0.867 | 0.759 | 0.564 |
| LLM-Reference-Based | 0.852 | **0.798** | **0.660** |
| CHRF++ | 0.739 | 0.690 | 0.498 |
| TER | 0.466 | 0.443 | 0.311 |
| BLEU | 0.413 | 0.497 | 0.344 |

## Languages Covered

While the underlying encoder (XLM-R) covers 100+ languages, this fine-tuned version is specifically optimized for:
- **Source Language**: English (en)
- **Target Language**: Georgian (ka)

For other language pairs, we recommend using the base [Unbabel/wmt22-comet-da](https://huggingface.co/Unbabel/wmt22-comet-da) model.

## Limitations

1. **Language Specific**: Optimized only for English→Georgian evaluation
2. **Domain**: Training data primarily from corp.dict.ge (general/literary domain)
3. **Reference Required**: Cannot perform reference-free evaluation
4. **Sentence Level**: Not optimized for document-level evaluation

## Citation

If you use this model, please cite:

```bibtex
@misc{georgian-comet-2025,
  title={Georgian-COMET: Fine-tuned COMET for English-Georgian MT Evaluation},
  author={Luka Darsalia and Ketevan Bakhturidze and Saba Sturua},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/Darsala/georgian_comet}
}

@inproceedings{rei-etal-2022-comet,
  title = "{COMET}-22: Unbabel-{IST} 2022 Submission for the Metrics Shared Task",
  author = "Rei, Ricardo  and
    C. de Souza, Jos{\'e} G.  and
    Alves, Duarte  and
    Zerva, Chrysoula  and
    Farinha, Ana C  and
    Glushkova, Taisiya  and
    Lavie, Alon  and
    Coheur, Luisa  and
    Martins, Andr{\'e} F. T.",
  booktitle = "Proceedings of the Seventh Conference on Machine Translation (WMT)",
  year = "2022",
  address = "Abu Dhabi, United Arab Emirates",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2022.wmt-1.52",
  pages = "578--585",
}
```

## Acknowledgments

- [Unbabel](https://unbabel.com/) team for the base COMET model
- [Anthropic](https://anthropic.com/) for Claude Sonnet 4 used in knowledge distillation
- [corp.dict.ge](https://corp.dict.ge/) for the Georgian-English corpus
- All contributors to the [nmt_metrics_research](https://github.com/LukaDarsalia/nmt_metrics_research) project