---
title: Siswati-English Linguistic Translation Tool
emoji: 🔬
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.33.2
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - translation
  - siswati
  - linguistics
  - african-languages
  - nlp
  - research
  - corpus-analysis
  - bantu-languages
  - m2m100
  - multilingual
---

# 🔬 Siswati-English Linguistic Translation Tool

An advanced AI-powered translation system with comprehensive linguistic analysis features, designed specifically for linguists, researchers, and language documentation projects working with Siswati and English.

## 🌟 Features

### πŸ”„ Translation Capabilities
- **Bidirectional Translation**: High-quality English ↔ Siswati translation
- **Advanced Model Architecture**: Built on M2M100 transformer models
- **Batch Processing**: Process multiple texts simultaneously for corpus analysis
- **Real-time Analysis**: Instant linguistic metrics and feature detection

### 📊 Linguistic Analysis
- **Morphological Complexity**: Word length, sentence structure analysis
- **Lexical Diversity**: Vocabulary richness measurements
- **Language-Specific Features**: Siswati agglutination, click consonants, tone markers
- **Translation Ratios**: Comparative analysis between source and target languages
- **Statistical Metrics**: Character count, word count, sentence segmentation

### 🔬 Research Tools
- **Translation History**: Track and analyze translation patterns over time
- **CSV Export**: Research-ready data export for statistical analysis
- **Corpus Management**: Batch processing for linguistic corpora
- **Performance Metrics**: Processing time and efficiency tracking

## 🗣️ About Siswati

**Siswati** (also known as **Swati** or **Swazi**) is a Bantu language spoken by approximately 2.3 million people, primarily in:
- 🇸🇿 **Eswatini** (Kingdom of Eswatini) - Official language
- 🇿🇦 **South Africa** - One of 12 official languages

### Linguistic Features
- **Language Family**: Niger-Congo → Bantu → Southeast Bantu
- **Script**: Latin alphabet
- **Characteristics**: Agglutinative morphology, click consonants, tonal
- **ISO Code**: ss (ISO 639-1), ssw (ISO 639-3)

## 🤖 Model Information

This tool uses state-of-the-art transformer models developed by the **Data Science for Social Impact Research Group**:

- **English → Siswati**: `dsfsi/en-ss-m2m100-combo`
- **Siswati → English**: `dsfsi/ss-en-m2m100-combo`

Both models are based on Meta's M2M100 architecture, fine-tuned specifically for Siswati-English translation pairs.
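
As a rough illustration, the two checkpoints might be loaded with the Hugging Face `transformers` library as sketched below. This assumes the models are available on the Hugging Face Hub under the IDs listed above, load with the standard M2M100 classes, and keep the M2M100 language codes `en` and `ss`; none of that is confirmed here. The names `MODEL_IDS`, `load_model`, and `translate` are hypothetical.

```python
from functools import lru_cache

# Hypothetical lookup table; the model IDs are the ones listed in this README.
MODEL_IDS = {
    ("en", "ss"): "dsfsi/en-ss-m2m100-combo",
    ("ss", "en"): "dsfsi/ss-en-m2m100-combo",
}


@lru_cache(maxsize=2)
def load_model(src_lang: str, tgt_lang: str):
    """Load tokenizer and model for one direction (downloads weights on first use)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    model_id = MODEL_IDS[(src_lang, tgt_lang)]
    tokenizer = M2M100Tokenizer.from_pretrained(model_id)
    model = M2M100ForConditionalGeneration.from_pretrained(model_id)
    return tokenizer, model


def translate(text: str, src_lang: str = "en", tgt_lang: str = "ss") -> str:
    """Translate one string; assumes the checkpoints use M2M100 language codes."""
    tokenizer, model = load_model(src_lang, tgt_lang)
    tokenizer.src_lang = src_lang
    encoded = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **encoded,
        # Force the decoder to start with the target-language token.
        forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

Caching `load_model` avoids reloading the weights on every call, which matters when both directions are used in one session.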

## 🎯 Use Cases

### For Linguists & Researchers
- **Language Documentation**: Analyze translation patterns and linguistic features
- **Corpus Studies**: Process large text collections with batch translation
- **Comparative Analysis**: Study morphological and syntactic differences
- **Quality Assessment**: Evaluate translation adequacy and fluency

### For Educators & Students
- **Language Learning**: Understand translation patterns and linguistic structures
- **Academic Research**: Export data for statistical analysis and publications
- **Computational Linguistics**: Study machine translation for low-resource languages

### For Community & Cultural Projects
- **Language Preservation**: Support Siswati language documentation efforts
- **Cultural Exchange**: Facilitate communication between English and Siswati speakers
- **Content Translation**: Assist in translating educational and cultural materials

## 🚀 Getting Started

1. **Single Translation**: Enter text and select translation direction
2. **Batch Processing**: Upload `.txt` files or paste multiple lines for corpus analysis
3. **Analysis Export**: Use the research tools to export translation data as CSV
4. **Linguistic Study**: Explore the real-time analysis features for detailed insights
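
For readers who want to reproduce the batch and export steps outside the interface, the following is a minimal sketch using only the Python standard library. It is not the app's actual implementation: `batch_translate`, `records_to_csv`, and the two-column CSV layout are assumptions made for illustration, and `translate_fn` stands in for whatever translation callable you use.

```python
import csv
import io


def batch_translate(raw_text: str, translate_fn) -> list[dict]:
    """Translate each non-empty line and collect one record per line."""
    records = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines in pasted text or uploaded .txt files
        records.append({"source": line, "translation": translate_fn(line)})
    return records


def records_to_csv(records: list[dict]) -> str:
    """Serialise the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["source", "translation"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Passing the translation function in as an argument keeps the batch logic testable without loading a model.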

## 📈 Linguistic Metrics Explained

### Text Complexity
- **Word Count**: Total number of words in the text
- **Character Count**: Total characters including spaces and punctuation
- **Sentence Count**: Number of sentences detected
- **Average Word Length**: Mean character length per word
- **Lexical Diversity**: Ratio of unique words to total words (vocabulary richness)
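
The metrics above could be computed roughly as follows. This is a sketch under stated assumptions, not the tool's own code: words are taken to be runs of word characters, sentences are taken to end at `.`, `!`, or `?`, and the function name `text_complexity` is hypothetical.

```python
import re


def text_complexity(text: str) -> dict:
    """Compute the basic text-complexity metrics for one string."""
    # Assumption: a word is a run of word characters, compared case-insensitively.
    words = re.findall(r"\w+", text.lower())
    # Assumption: a sentence ends at one or more of ".", "!" or "?".
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    word_count = len(words)
    return {
        "word_count": word_count,
        "char_count": len(text),  # includes spaces and punctuation
        "sentence_count": len(sentences),
        "avg_word_length": sum(len(w) for w in words) / word_count if word_count else 0.0,
        "lexical_diversity": len(set(words)) / word_count if word_count else 0.0,
    }
```

Lexical diversity computed this way is a plain type-token ratio, so it tends to fall as texts get longer; comparisons are most meaningful between texts of similar length.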

### Translation Analysis
- **Word Ratio**: Target word count / Source word count
- **Character Ratio**: Target character count / Source character count
- **Processing Time**: Time taken for model inference
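
A minimal sketch of these three measurements is shown below. It assumes whitespace-separated words and wall-clock timing around a single call; `translation_ratios` and `timed` are hypothetical names, and `translate_fn` is any callable that maps a source string to its translation.

```python
import time


def translation_ratios(source: str, target: str) -> dict:
    """Return target/source ratios for word and character counts."""
    src_words = len(source.split())
    tgt_words = len(target.split())
    return {
        "word_ratio": tgt_words / src_words if src_words else 0.0,
        "char_ratio": len(target) / len(source) if source else 0.0,
    }


def timed(translate_fn, text: str):
    """Return (translation, seconds elapsed) for one call to translate_fn."""
    start = time.perf_counter()
    result = translate_fn(text)
    return result, time.perf_counter() - start
```

Because Siswati is agglutinative, a word ratio below 1 for English to Siswati would not be surprising, but treat that as an expectation to check against your own data.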

### Siswati-Specific Features
- **Agglutination Detection**: Identification of potentially agglutinated words (>10 characters)
- **Click Consonants**: Count of clicks (c, q, x sounds)
- **Tone Markers**: Detection of acute (◌́) and grave (◌̀) accent marks
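
These heuristics can be sketched as follows. The code is an illustration under assumptions rather than the tool's implementation: clicks are approximated by counting the letters `c`, `q`, and `x`, tone markers by counting combining acute and grave accents after Unicode NFD normalisation, and `siswati_features` is a hypothetical name.

```python
import re
import unicodedata

CLICK_LETTERS = set("cqx")
TONE_MARKS = {"\u0301", "\u0300"}  # combining acute, combining grave


def siswati_features(text: str) -> dict:
    """Heuristic counts for long words, click letters, and tone marks."""
    # NFD splits precomposed letters such as "á" into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text.lower())
    tone_markers = sum(ch in TONE_MARKS for ch in decomposed)
    # Drop combining marks so they do not inflate word lengths.
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    words = re.findall(r"[^\W\d_]+", plain)  # runs of letters only
    return {
        "long_words": [w for w in words if len(w) > 10],
        "click_count": sum(ch in CLICK_LETTERS for ch in plain),
        "tone_markers": tone_markers,
    }
```

The length threshold and letter counts are surface heuristics: a long word is only potentially agglutinated, and the letters `c`, `q`, and `x` in loanwords or English text will also be counted.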

## 📚 Academic Usage

If you use this tool in your research, please cite the original models:

```bibtex
@misc{dsfsi-siswati-translation,
  title={Siswati-English Translation Models},
  author={Marivate, Vukosi and Lastrucci, Richard},
  year={2024},
  publisher={Data Science for Social Impact Research Group},
  url={https://github.com/dsfsi/}
}
```

## 🔗 Related Resources

- **Model Repositories**: [En-Ss Model](https://github.com/dsfsi/en-ss-m2m100-combo) | [Ss-En Model](https://github.com/dsfsi/ss-en-m2m100-combo)
- **Research Group**: [DSFSI](https://dsfsi.github.io/)
- **Feedback**: [Research Feedback Form](https://docs.google.com/forms/d/e/1FAIpQLSf7S36dyAUPx2egmXbFpnTBuzoRulhL5Elu-N1eoMhaO7v10w/viewform)

## 🀝 Contributing

We welcome contributions from the linguistic and NLP communities! Areas of interest:
- Improving translation quality
- Adding more linguistic analysis features
- Expanding to other African languages
- Enhancing the user interface for research workflows

## 📄 License

This project is licensed under the Apache 2.0 License. The underlying models may have their own licensing terms; please check the individual model repositories.

## 🌍 Supporting African Languages

This tool is part of a broader effort to support African language technology and computational linguistics research. By providing advanced NLP tools for Siswati, we aim to:

- Preserve and promote African languages in the digital age
- Support linguistic research and documentation
- Enable better communication across language barriers
- Contribute to the development of multilingual AI systems

---

**Built with ❤️ for the African NLP community**