---
title: Siswati-English Linguistic Translation Tool
emoji: 🔬
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.33.2
app_file: app.py
pinned: false
license: apache-2.0
tags:
- translation
- siswati
- linguistics
- african-languages
- nlp
- research
- corpus-analysis
- bantu-languages
- m2m100
- multilingual
---
# 🔬 Siswati-English Linguistic Translation Tool
A machine translation tool with built-in linguistic analysis, designed for linguists, researchers, and language documentation projects working with Siswati and English.
## Features
### Translation Capabilities
- **Bidirectional Translation**: High-quality English ↔ Siswati translation
- **Advanced Model Architecture**: Built on M2M100 transformer models
- **Batch Processing**: Process multiple texts simultaneously for corpus analysis
- **Real-time Analysis**: Instant linguistic metrics and feature detection
### Linguistic Analysis
- **Morphological Complexity**: Word length, sentence structure analysis
- **Lexical Diversity**: Vocabulary richness measurements
- **Language-Specific Features**: Siswati agglutination, click consonants, tone markers
- **Translation Ratios**: Comparative analysis between source and target languages
- **Statistical Metrics**: Character count, word count, sentence segmentation
### 🔬 Research Tools
- **Translation History**: Track and analyze translation patterns over time
- **CSV Export**: Research-ready data export for statistical analysis
- **Corpus Management**: Batch processing for linguistic corpora
- **Performance Metrics**: Processing time and efficiency tracking
## 🗣️ About Siswati
**Siswati** (also known as **Swati** or **Swazi**) is a Bantu language spoken by approximately 2.3 million people, primarily in:
- 🇸🇿 **Eswatini** (Kingdom of Eswatini) - Official language
- 🇿🇦 **South Africa** - One of 12 official languages
### Linguistic Features
- **Language Family**: Niger-Congo → Bantu → Southeast Bantu
- **Script**: Latin alphabet
- **Characteristics**: Agglutinative morphology, click consonants, tonal
- **ISO Code**: ss (ISO 639-1), ssw (ISO 639-3)
## 🤖 Model Information
This tool uses state-of-the-art transformer models developed by the **Data Science for Social Impact Research Group**:
- **English → Siswati**: `dsfsi/en-ss-m2m100-combo`
- **Siswati → English**: `dsfsi/ss-en-m2m100-combo`
Both models are based on Meta's M2M100 architecture, fine-tuned specifically for Siswati-English translation pairs.
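
As a rough illustration of how the two checkpoints could be called outside the app, here is a minimal sketch using the Hugging Face `transformers` M2M100 classes. It assumes the fine-tuned checkpoints keep the standard M2M100 language codes (`en`, `ss`); the `translate` helper and the `MODEL_IDS` mapping are hypothetical names for this example and are not taken from `app.py`.

```python
# Model IDs as listed above, keyed by (source, target) language code.
MODEL_IDS = {
    ("en", "ss"): "dsfsi/en-ss-m2m100-combo",
    ("ss", "en"): "dsfsi/ss-en-m2m100-combo",
}


def translate(text: str, src: str = "en", tgt: str = "ss") -> str:
    """Translate one string with the checkpoint for the given direction."""
    # Imported lazily so the mapping above is usable without transformers installed.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    model_id = MODEL_IDS[(src, tgt)]
    tokenizer = M2M100Tokenizer.from_pretrained(model_id)
    model = M2M100ForConditionalGeneration.from_pretrained(model_id)

    # Assumption: the checkpoints use the stock M2M100 language tokens.
    tokenizer.src_lang = src
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(
        **inputs, forced_bos_token_id=tokenizer.get_lang_id(tgt)
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

Calling `translate("Good morning", "en", "ss")` downloads the English → Siswati checkpoint on first use. A long-running app would normally load each model once and reuse it rather than reloading per call.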
## 🎯 Use Cases
### For Linguists & Researchers
- **Language Documentation**: Analyze translation patterns and linguistic features
- **Corpus Studies**: Process large text collections with batch translation
- **Comparative Analysis**: Study morphological and syntactic differences
- **Quality Assessment**: Evaluate translation adequacy and fluency
### For Educators & Students
- **Language Learning**: Understand translation patterns and linguistic structures
- **Academic Research**: Export data for statistical analysis and publications
- **Computational Linguistics**: Study machine translation for low-resource languages
### For Community & Cultural Projects
- **Language Preservation**: Support Siswati language documentation efforts
- **Cultural Exchange**: Facilitate communication between English and Siswati speakers
- **Content Translation**: Assist in translating educational and cultural materials
## Getting Started
1. **Single Translation**: Enter text and select translation direction
2. **Batch Processing**: Upload `.txt` files or paste multiple lines for corpus analysis
3. **Analysis Export**: Use the research tools to export translation data as CSV
4. **Linguistic Study**: Explore the real-time analysis features for detailed insights
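
The batch and export steps above can be approximated offline with the standard library. This is a sketch under assumptions: the column names (`source`, `translation`, `direction`) and the helper names are hypothetical and may not match the CSV the app actually produces.

```python
import csv
import io


def split_batch(raw: str) -> list[str]:
    """Split pasted or uploaded text into non-empty, trimmed lines."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


def to_csv(rows: list[dict]) -> str:
    """Serialize translation records to CSV text (hypothetical column layout)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["source", "translation", "direction"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Each line returned by `split_batch` would be translated separately, and the resulting records passed to `to_csv` for analysis in a spreadsheet or statistics package.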
## Linguistic Metrics Explained
### Text Complexity
- **Word Count**: Total number of words in the text
- **Character Count**: Total characters including spaces and punctuation
- **Sentence Count**: Number of sentences detected
- **Average Word Length**: Mean character length per word
- **Lexical Diversity**: Ratio of unique words to total words (vocabulary richness)
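
The metrics above can be reproduced with a few lines of Python. This is a minimal sketch, assuming words are runs of word characters and sentences end at `.`, `!`, or `?`; the app's own tokenization may differ, and `text_complexity` is a hypothetical name.

```python
import re


def text_complexity(text: str) -> dict:
    """Compute basic complexity metrics for a text."""
    # Lowercase so that "Word" and "word" count as the same type.
    words = re.findall(r"\w+", text.lower())
    # Split on sentence-final punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = len(words)
    return {
        "word_count": n,
        "char_count": len(text),  # includes spaces and punctuation
        "sentence_count": len(sentences),
        "avg_word_length": sum(len(w) for w in words) / n if n else 0.0,
        "lexical_diversity": len(set(words)) / n if n else 0.0,
    }
```

Lexical diversity here is a plain type-token ratio, so it tends to fall as texts get longer; compare texts of similar length when using it.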
### Translation Analysis
- **Word Ratio**: Target word count / Source word count
- **Character Ratio**: Target character count / Source character count
- **Processing Time**: Time taken for model inference
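
A sketch of how these ratios and the timing could be computed, assuming whitespace tokenization for word counts. The function names are hypothetical, and `translate_fn` stands in for whatever callable performs the model inference.

```python
import time


def translation_ratios(source: str, target: str) -> dict:
    """Target/source length ratios, guarding against empty input."""
    src_words = len(source.split())
    tgt_words = len(target.split())
    return {
        "word_ratio": tgt_words / src_words if src_words else 0.0,
        "char_ratio": len(target) / len(source) if source else 0.0,
    }


def timed(translate_fn, text: str):
    """Run translate_fn(text) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = translate_fn(text)
    return result, time.perf_counter() - start
```

Because Siswati is agglutinative, an English → Siswati word ratio below 1 is expected: one Siswati word often covers several English words.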
### Siswati-Specific Features
- **Agglutination Detection**: Identification of potentially agglutinated words (>10 characters)
- **Click Consonants**: Count of clicks (c, q, x sounds)
- **Tone Markers**: Detection of acute (◌́) and grave (◌̀) accent marks
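
These three heuristics are simple enough to sketch directly. The version below assumes the thresholds stated above (more than 10 characters; letters `c`, `q`, `x`; combining acute and grave accents) and uses a hypothetical function name. Note that the heuristics are surface-level: a long word is not necessarily agglutinated, and letter counts only approximate click sounds.

```python
import re
import unicodedata

CLICK_LETTERS = "cqx"
TONE_MARKS = {"\u0301", "\u0300"}  # combining acute, combining grave


def siswati_features(text: str) -> dict:
    """Count surface features used as rough proxies for Siswati structure."""
    # NFC keeps accented letters as single characters for word matching.
    words = re.findall(r"\w+", unicodedata.normalize("NFC", text))
    long_words = [w for w in words if len(w) > 10]

    lowered = text.lower()
    clicks = sum(lowered.count(letter) for letter in CLICK_LETTERS)

    # NFD splits accented letters into base letter plus combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    tone_marks = sum(1 for ch in decomposed if ch in TONE_MARKS)

    return {
        "long_words": long_words,
        "click_count": clicks,
        "tone_mark_count": tone_marks,
    }
```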
## Academic Usage
If you use this tool in your research, please cite the original models:
```bibtex
@misc{dsfsi-siswati-translation,
title={Siswati-English Translation Models},
author={Marivate, Vukosi and Lastrucci, Richard},
year={2024},
publisher={Data Science for Social Impact Research Group},
url={https://github.com/dsfsi/}
}
```
## Related Resources
- **Model Repositories**: [En-Ss Model](https://github.com/dsfsi/en-ss-m2m100-combo) | [Ss-En Model](https://github.com/dsfsi/ss-en-m2m100-combo)
- **Research Group**: [DSFSI](https://dsfsi.github.io/)
- **Feedback**: [Research Feedback Form](https://docs.google.com/forms/d/e/1FAIpQLSf7S36dyAUPx2egmXbFpnTBuzoRulhL5Elu-N1eoMhaO7v10w/viewform)
## 🤝 Contributing
We welcome contributions from the linguistic and NLP communities! Areas of interest:
- Improving translation quality
- Adding more linguistic analysis features
- Expanding to other African languages
- Enhancing the user interface for research workflows
## License
This project is licensed under the Apache 2.0 License. The underlying models may have their own licensing terms; check the individual model repositories before reuse.
## Supporting African Languages
This tool is part of a broader effort to support African language technology and computational linguistics research. By providing advanced NLP tools for Siswati, we aim to:
- Preserve and promote African languages in the digital age
- Support linguistic research and documentation
- Enable better communication across language barriers
- Contribute to the development of multilingual AI systems
---
**Built with ❤️ for the African NLP community**