---
language:
- ka
- en
license: apache-2.0
tags:
- translation
- evaluation
- comet
- mt-evaluation
- georgian
metrics:
- kendall_tau
- spearman_correlation
- pearson_correlation
model-index:
- name: Georgian-COMET
results:
- task:
type: translation-evaluation
name: Machine Translation Evaluation
dataset:
name: Georgian MT Evaluation Dataset
type: Darsala/georgian_metric_evaluation
metrics:
- type: pearson_correlation
value: 0.876
name: Pearson Correlation
- type: spearman_correlation
value: 0.773
name: Spearman Correlation
- type: kendall_tau
value: 0.579
name: Kendall's Tau
base_model: Unbabel/wmt22-comet-da
datasets:
- Darsala/georgian_metric_evaluation
---
# Georgian-COMET: Fine-tuned COMET for English-Georgian MT Evaluation
This is a [COMET](https://github.com/Unbabel/COMET) evaluation model fine-tuned specifically for English-Georgian machine translation evaluation. It receives a triplet of (source sentence, translation, reference translation) and returns a score that reflects the quality of the translation against both the source and the reference.
## Model Description
Georgian-COMET is a fine-tuned version of [Unbabel/wmt22-comet-da](https://huggingface.co/Unbabel/wmt22-comet-da) that has been optimized for evaluating English-to-Georgian translations through knowledge distillation from Claude Sonnet 4. The model shows significant improvements over the base model when evaluating Georgian translations.
### Key Improvements over Base Model
| Metric | Base COMET | Georgian-COMET | Improvement |
|--------|------------|----------------|-------------|
| Pearson | 0.867 | **0.876** | +0.9% |
| Spearman | 0.759 | **0.773** | +1.4% |
| Kendall | 0.564 | **0.579** | +1.5% |
## Paper
- **Base Model Paper**: [COMET-22: Unbabel-IST 2022 Submission for the Metrics Shared Task](https://aclanthology.org/2022.wmt-1.52) (Rei et al., WMT 2022)
- **This Model**: Paper coming soon
## Repository
[https://github.com/LukaDarsalia/nmt_metrics_research](https://github.com/LukaDarsalia/nmt_metrics_research)
## License
Apache-2.0
## Usage (unbabel-comet)
Using this model requires unbabel-comet to be installed:
```bash
pip install --upgrade pip # ensures that pip is current
pip install unbabel-comet
```
### Option 1: Direct Download from HuggingFace
```python
from comet import download_model, load_from_checkpoint
# Download the model checkpoint
model_path = download_model("Darsala/georgian_comet")
# Load the model
model = load_from_checkpoint(model_path)
# Prepare your data
data = [
    {
        "src": "The cat sat on the mat.",
        "mt": "კატა ზის ხალიჩაზე.",
        "ref": "კატა იჯდა ხალიჩაზე."
    },
    {
        "src": "Schools and kindergartens were opened.",
        "mt": "სკოლები და საბავშვო ბაღები გაიხსნა.",
        "ref": "გაიხსნა სკოლები და საბავშვო ბაღები."
    }
]
# Get predictions
model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output)
```
### Option 2: Using comet CLI
First download the model checkpoint:
```bash
wget https://huggingface.co/Darsala/georgian_comet/resolve/main/model.ckpt -O georgian_comet.ckpt
```
Then use it with comet CLI:
```bash
comet-score -s {source-inputs}.txt -t {translation-outputs}.txt -r {references}.txt --model georgian_comet.ckpt
```
### Option 3: Integration with Evaluation Pipeline
```python
from comet import load_from_checkpoint
import pandas as pd
# Load model
model = load_from_checkpoint("georgian_comet.ckpt")
# Load your evaluation data
df = pd.read_csv("your_evaluation_data.csv")
# Prepare data in COMET format
data = [
    {
        "src": row["sourceText"],
        "mt": row["targetText"],
        "ref": row["referenceText"]
    }
    for _, row in df.iterrows()
]
# Get scores ("scores" holds segment-level scores, "system_score" their average)
scores = model.predict(data, batch_size=16)
print(f"Average score: {scores['system_score']:.3f}")
```
## Intended Uses
This model is intended to be used for **English-Georgian MT evaluation**.
Given a triplet (source sentence in English, translation in Georgian, reference translation in Georgian), it outputs a single score between 0 and 1, where 1 represents a perfect translation.
### Primary Use Cases
1. **MT System Development**: Evaluate and compare different English-Georgian MT systems
2. **Quality Assurance**: Automated quality checks for Georgian translations
3. **Research**: Study MT evaluation for morphologically rich languages like Georgian
4. **Production Monitoring**: Track translation quality in production environments
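The quality-assurance and monitoring use cases boil down to a threshold gate over segment-level scores. The sketch below is illustrative, not part of the model's API; the 0.8 threshold and the score values are assumptions to be tuned on your own data:

```python
def flag_low_quality(segments, scores, threshold=0.8):
    """Return the segments whose COMET score falls below the threshold."""
    return [seg for seg, score in zip(segments, scores) if score < threshold]

# Hypothetical segment IDs and segment-level scores,
# e.g. taken from model.predict(data)["scores"]
segments = ["seg-1", "seg-2", "seg-3"]
scores = [0.91, 0.62, 0.85]

print(flag_low_quality(segments, scores))  # flags only "seg-2"
```

Flagged segments can then be routed to human post-editing while the rest pass through automatically.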
### Out-of-Scope Use
- **Other Language Pairs**: This model is specifically fine-tuned for English-Georgian and may not perform well on other language pairs
- **Reference-Free Evaluation**: The model requires reference translations
- **Document-Level**: Optimized for sentence-level evaluation
## Training Details
### Training Data
- **Dataset**: 5,000 English-Georgian pairs from [corp.dict.ge](https://corp.dict.ge/)
- **MT Systems**: Translations from SMaLL-100, Google Translate, and Ucraft Translate
- **Scoring Method**: Knowledge distillation from Claude Sonnet 4 with added Gaussian noise (σ=3)
- **Details**: See [Darsala/georgian_metric_evaluation](https://huggingface.co/datasets/Darsala/georgian_metric_evaluation)
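The noise-injection step above can be sketched as follows. This is a minimal illustration rather than the project's actual training script, and the 0-100 score scale implied by σ=3 is an assumption:

```python
import random

def add_label_noise(scores, sigma=3.0, seed=42):
    """Perturb teacher scores with Gaussian noise (sigma=3, as in training)."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, sigma) for s in scores]

# Hypothetical Claude Sonnet 4 scores for three segments
teacher_scores = [87.0, 92.5, 61.0]
noisy_targets = add_label_noise(teacher_scores)
```

Jittering the teacher's labels this way keeps the student from memorizing the teacher's exact outputs, acting as a regularizer.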
### Training Configuration
```yaml
regression_metric:
  init_args:
    nr_frozen_epochs: 0.3
    keep_embeddings_frozen: True
    optimizer: AdamW
    encoder_learning_rate: 1.5e-05
    learning_rate: 1.5e-05
    loss: mse
    dropout: 0.1
    batch_size: 8
```
### Training Procedure
1. **Base Model**: Started from Unbabel/wmt22-comet-da checkpoint
2. **Knowledge Distillation**: Used Claude Sonnet 4 scores as training targets
3. **Robustness**: Added Gaussian noise to training scores to prevent overfitting
4. **Optimization**: 8 epochs with early stopping (patience=4) on validation Kendall's tau
## Evaluation Results
### Test Set Performance
Evaluated on 400 human-annotated English-Georgian translation pairs:
| Metric | Score | p-value |
|--------|-------|---------|
| Pearson | 0.876 | < 0.001 |
| Spearman | 0.773 | < 0.001 |
| Kendall | 0.579 | < 0.001 |
### Comparison with Other Metrics
| Metric | Pearson | Spearman | Kendall |
|--------|---------|----------|---------|
| **Georgian-COMET** | **0.876** | 0.773 | 0.579 |
| Base COMET | 0.867 | 0.759 | 0.564 |
| LLM-Reference-Based | 0.852 | **0.798** | **0.660** |
| CHRF++ | 0.739 | 0.690 | 0.498 |
| TER | 0.466 | 0.443 | 0.311 |
| BLEU | 0.413 | 0.497 | 0.344 |
## Languages Covered
While the base model (XLM-R) covers 100+ languages, this fine-tuned version is specifically optimized for:
- **Source Language**: English (en)
- **Target Language**: Georgian (ka)
For other language pairs, we recommend using the base [Unbabel/wmt22-comet-da](https://huggingface.co/Unbabel/wmt22-comet-da) model.
## Limitations
1. **Language Specific**: Optimized only for English→Georgian evaluation
2. **Domain**: Training data primarily from corp.dict.ge (general/literary domain)
3. **Reference Required**: Cannot perform reference-free evaluation
4. **Sentence Level**: Not optimized for document-level evaluation
## Citation
If you use this model, please cite:
```bibtex
@misc{georgian-comet-2025,
  title={Georgian-COMET: Fine-tuned COMET for English-Georgian MT Evaluation},
  author={Darsalia, Luka and Bakhturidze, Ketevan and Sturua, Saba},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/Darsala/georgian_comet}
}
@inproceedings{rei-etal-2022-comet,
title = "{COMET}-22: Unbabel-{IST} 2022 Submission for the Metrics Shared Task",
author = "Rei, Ricardo and
C. de Souza, Jos{\'e} G. and
Alves, Duarte and
Zerva, Chrysoula and
Farinha, Ana C and
Glushkova, Taisiya and
Lavie, Alon and
Coheur, Luisa and
Martins, Andr{\'e} F. T.",
booktitle = "Proceedings of the Seventh Conference on Machine Translation (WMT)",
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.wmt-1.52",
pages = "578--585",
}
```
## Acknowledgments
- [Unbabel](https://unbabel.com/) team for the base COMET model
- [Anthropic](https://anthropic.com/) for Claude Sonnet 4 used in knowledge distillation
- [corp.dict.ge](https://corp.dict.ge/) for the Georgian-English corpus
- All contributors to the [nmt_metrics_research](https://github.com/LukaDarsalia/nmt_metrics_research) project |