---
base_model: Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2
datasets:
- Omartificial-Intelligence-Space/Arabic-stsb
language:
- ar
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:947818
- loss:SoftmaxLoss
- loss:CosineSimilarityLoss
widget:
- source_sentence: امرأة تكتب شيئاً
  sentences:
  - مراهق يتحدث إلى فتاة عبر كاميرا الإنترنت
  - امرأة تقطع البصل الأخضر.
  - مجموعة من كبار السن يتظاهرون حول طاولة الطعام.
- source_sentence: تتشكل النجوم في مناطق تكوين النجوم، والتي تنشأ نفسها من السحب الجزيئية.
  sentences:
  - لاعب كرة السلة على وشك تسجيل نقاط لفريقه.
  - المقال التالي مأخوذ من نسختي من "أطلس البطريق الجديد للتاريخ الوسطى"
  - قد يكون من الممكن أن يوجد نظام شمسي مثل نظامنا خارج المجرة
- source_sentence: تحت السماء الزرقاء مع الغيوم البيضاء، يصل طفل لمس مروحة طائرة واقفة
    على حقل من العشب.
  sentences:
  - امرأة تحمل كأساً
  - طفل يحاول لمس مروحة طائرة
  - اثنان من عازبين عن الشرب يستعدون للعشاء
- source_sentence: رجل في منتصف العمر يحلق لحيته في غرفة ذات جدران بيضاء والتي لا
    تبدو كحمام
  sentences:
  - فتى يخطط اسمه على مكتبه
  - رجل ينام
  - المرأة وحدها وهي نائمة في غرفة نومها
- source_sentence: الكلب البني مستلقي على جانبه على سجادة بيج، مع جسم أخضر في المقدمة.
  sentences:
  - شخص طويل القامة
  - المرأة تنظر من النافذة.
  - لقد مات الكلب
model-index:
- name: SentenceTransformer based on Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev
      type: sts-dev
    metrics:
    - type: pearson_cosine
      value: 0.8390853221830158
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8410008255002589
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8276538954353795
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.8360889200075982
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.8274021671008013
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8357887501417183
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.8154259766643255
      name: Pearson Dot
    - type: spearman_dot
      value: 0.81802827956939
      name: Spearman Dot
    - type: pearson_max
      value: 0.8390853221830158
      name: Pearson Max
    - type: spearman_max
      value: 0.8410008255002589
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test
      type: sts-test
    metrics:
    - type: pearson_cosine
      value: 0.8130046542366043
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8172511596569861
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8113865863454744
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.8164081961542164
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.810311097439534
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8157654465052717
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.7907732563794702
      name: Pearson Dot
    - type: spearman_dot
      value: 0.7886749863194292
      name: Spearman Dot
    - type: pearson_max
      value: 0.8130046542366043
      name: Pearson Max
    - type: spearman_max
      value: 0.8172511596569861
      name: Spearman Max
---
# SentenceTransformer based on Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2](https://huggingface.co/Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2) on the all-nli and [sts](https://huggingface.co/datasets/Omartificial-Intelligence-Space/arabic-stsb) datasets. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2](https://huggingface.co/Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Datasets:**
- all-nli
- [sts](https://huggingface.co/datasets/Omartificial-Intelligence-Space/arabic-stsb)
- **Language:** ar
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
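The pooling layer above has only `pooling_mode_mean_tokens` enabled, so the sentence vector is an average of the token vectors. The following is a minimal pure-Python sketch of that idea, not the library's implementation: the function name `mean_pool` and the toy 4-dimensional vectors are assumptions for illustration (the real model uses 768 dimensions), and padding is assumed to be excluded via an attention mask.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask value is 1; padded positions are ignored."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an input that is all padding
    return [s / count for s in summed]


# Toy input (hypothetical): three token vectors of width 4, the last one is padding.
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)  # [2.0, 3.0, 4.0, 5.0]
```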
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2-multi-tas-v2k")

# Run inference
sentences = [
    # "The brown dog is lying on its side on a beige rug, with a green object in the foreground."
    'الكلب البني مستلقي على جانبه على سجادة بيج، مع جسم أخضر في المقدمة.',
    # "The dog has died"
    'لقد مات الكلب',
    # "A tall person"
    'شخص طويل القامة',
]
embeddings = model.encode(sentences)  # NumPy array by default
print(embeddings.shape)
# (3, 768)

# Get the pairwise similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
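Since the model card lists cosine similarity as the similarity function, the 3×3 matrix returned above can be understood as pairwise cosine scores. The sketch below reproduces that computation on toy 2-dimensional vectors in plain Python; `cosine` and `similarity_matrix` are hypothetical helper names, and the vectors are made up rather than produced by the model.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings):
    """Pairwise cosine similarity; entry [i][j] compares embedding i with embedding j."""
    return [[cosine(u, v) for v in embeddings] for u in embeddings]


# Toy embeddings (hypothetical stand-ins for model.encode output).
toy = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
matrix = similarity_matrix(toy)

# For the first vector, find the most similar other vector.
best = max((j for j in range(len(toy)) if j != 0), key=lambda j: matrix[0][j])
print(best)  # 1
```

The diagonal is 1 because every vector is identical to itself, and the matrix is symmetric, so only the upper triangle carries new information.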
## Evaluation
### Metrics
#### Semantic Similarity
* Dataset: `sts-dev`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|:--------------------|:----------|
| pearson_cosine | 0.8391 |
| **spearman_cosine** | **0.841** |
| pearson_manhattan | 0.8277 |
| spearman_manhattan | 0.8361 |
| pearson_euclidean | 0.8274 |
| spearman_euclidean | 0.8358 |
| pearson_dot | 0.8154 |
| spearman_dot | 0.818 |
| pearson_max | 0.8391 |
| spearman_max | 0.841 |
#### Semantic Similarity
* Dataset: `sts-test`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| pearson_cosine | 0.813 |
| **spearman_cosine** | **0.8173** |
| pearson_manhattan | 0.8114 |
| spearman_manhattan | 0.8164 |
| pearson_euclidean | 0.8103 |
| spearman_euclidean | 0.8158 |
| pearson_dot | 0.7908 |
| spearman_dot | 0.7887 |
| pearson_max | 0.813 |
| spearman_max | 0.8173 |
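The `pearson_*` and `spearman_*` rows in the tables above are correlations between the model's similarity scores and the gold similarity labels. As a rough sketch of how such numbers arise (not the evaluator's own code), the block below computes both correlations in plain Python; the helper names and the four predicted/gold values are hypothetical. The `*_max` rows equal the cosine rows in both tables, which is consistent with them being the best value across the four similarity functions.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical cosine similarities from a model and the matching gold scores.
predicted = [0.91, 0.35, 0.62, 0.10]
gold = [1.0, 0.4, 0.76, 0.0]
print(round(spearman(predicted, gold), 4))  # 1.0, the two orderings agree exactly
```

Spearman only looks at the ordering of the pairs, so it is less sensitive than Pearson to how the raw scores are scaled.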
## Training Details
### Training Datasets
#### all-nli
* Dataset: all-nli
* Size: 942,069 training samples
* Columns: `premise`, `hypothesis`, and `label`
* Approximate statistics based on the first 1000 samples:

| | premise | hypothesis | label |
|:--------|:--------|:-----------|:------|
| type | string | string | int |

* Samples:

| premise | hypothesis | label |
|:--------|:-----------|:------|
| شخص على حصان يقفز فوق طائرة معطلة | شخص يقوم بتدريب حصانه للمنافسة | 1 |
| شخص على حصان يقفز فوق طائرة معطلة | شخص في مطعم، يطلب عجة. | 2 |
| شخص على حصان يقفز فوق طائرة معطلة | شخص في الهواء الطلق، على حصان. | 0 |

* Loss: [SoftmaxLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#softmaxloss)
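To make the softmax objective concrete: for each premise/hypothesis pair, the two sentence embeddings are combined into one feature vector, passed through a linear classifier over the label values (0, 1, 2 in the samples above), and scored with cross-entropy. The sketch below assumes the common concatenation of `u`, `v`, and `|u - v|`; the function name, toy embeddings, and zero-initialised weights are hypothetical, and this is an illustration rather than the library's code.

```python
import math


def softmax_loss(u, v, weights, bias, label):
    """Cross-entropy of a linear classifier applied to [u, v, |u - v|]."""
    features = u + v + [abs(a - b) for a, b in zip(u, v)]
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    # Numerically stable log-sum-exp for the softmax normaliser.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]


# Toy 2-dimensional embeddings (hypothetical) and an untrained 3-class classifier.
u = [0.2, 0.8]
v = [0.5, 0.1]
weights = [[0.0] * 6 for _ in range(3)]  # 3 labels x (2 + 2 + 2) features
bias = [0.0, 0.0, 0.0]
loss = softmax_loss(u, v, weights, bias, label=1)
print(round(loss, 4))  # 1.0986, i.e. log(3) for a uniform prediction
```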
#### sts
* Dataset: [sts](https://huggingface.co/datasets/Omartificial-Intelligence-Space/arabic-stsb) at [f5a6f89](https://huggingface.co/datasets/Omartificial-Intelligence-Space/arabic-stsb/tree/f5a6f89da460d307eff3acbbfcb62d0705cdbbb5)
* Size: 5,749 training samples
* Columns: `sentence1`, `sentence2`, and `score`
* Approximate statistics based on the first 1000 samples:

| | sentence1 | sentence2 | score |
|:--------|:----------|:----------|:------|
| type | string | string | float |

* Samples:

| sentence1 | sentence2 | score |
|:----------|:----------|:------|
| طائرة ستقلع | طائرة جوية ستقلع | 1.0 |
| رجل يعزف على ناي كبير | رجل يعزف على الناي. | 0.76 |
| رجل ينشر الجبن الممزق على البيتزا | رجل ينشر الجبن الممزق على بيتزا غير مطبوخة | 0.76 |

* Loss: [CosineSimilarityLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
```json
{
"loss_fct": "torch.nn.modules.loss.MSELoss"
}
```
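With `MSELoss` as `loss_fct`, the objective can be read as the mean squared difference between the cosine similarity of each sentence pair and its gold `score`. The following plain-Python sketch illustrates that reading; `cosine_mse_loss` and the toy vectors are hypothetical and stand in for real embeddings.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_mse_loss(pairs, scores):
    """Mean squared error between predicted cosine similarity and the gold score."""
    errors = [(cosine(u, v) - s) ** 2 for (u, v), s in zip(pairs, scores)]
    return sum(errors) / len(errors)


# Two toy pairs: identical vectors (cosine 1.0) and orthogonal vectors (cosine 0.0).
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
scores = [1.0, 0.5]
loss = cosine_mse_loss(pairs, scores)
print(loss)  # 0.125 = ((1.0 - 1.0)**2 + (0.0 - 0.5)**2) / 2
```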
### Evaluation Datasets
#### all-nli
* Dataset: all-nli
* Size: 1,000 evaluation samples
* Columns: `premise`, `hypothesis`, and `label`
* Approximate statistics based on the first 1000 samples:

| | premise | hypothesis | label |
|:--------|:--------|:-----------|:------|
| type | string | string | int |

* Samples:

| premise | hypothesis | label |
|:--------|:-----------|:------|
| امرأتان يتعانقان بينما يحملان طرود | الأخوات يعانقون بعضهم لوداعاً بينما يحملون حزمة بعد تناول الغداء | 1 |
| امرأتان يتعانقان بينما يحملان حزمة | إمرأتان يحملان حزمة | 0 |
| امرأتان يتعانقان بينما يحملان حزمة | الرجال يتشاجرون خارج مطعم | 2 |

* Loss: [SoftmaxLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#softmaxloss)
#### sts
* Dataset: [sts](https://huggingface.co/datasets/Omartificial-Intelligence-Space/arabic-stsb) at [f5a6f89](https://huggingface.co/datasets/Omartificial-Intelligence-Space/arabic-stsb/tree/f5a6f89da460d307eff3acbbfcb62d0705cdbbb5)
* Size: 1,500 evaluation samples
* Columns: `sentence1`, `sentence2`, and `score`
* Approximate statistics based on the first 1000 samples:

| | sentence1 | sentence2 | score |
|:--------|:----------|:----------|:------|
| type | string | string | float |

* Samples:

| sentence1 | sentence2 | score |
|:----------|:----------|:------|
| رجل يرتدي قبعة صلبة يرقص | رجل يرتدي قبعة صلبة يرقص. | 1.0 |
| طفل صغير يركب حصاناً. | طفل يركب حصاناً. | 0.95 |
| رجل يطعم فأراً لأفعى | الرجل يطعم الفأر للثعبان. | 1.0 |

* Loss: [CosineSimilarityLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
```json
{
"loss_fct": "torch.nn.modules.loss.MSELoss"
}
```
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
- `multi_dataset_batch_sampler`: round_robin
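The `round_robin` setting matters here because the two training sets differ greatly in size (942,069 all-nli samples versus 5,749 sts samples). As a rough illustration of round-robin scheduling, assuming batches are drawn from each dataset in turn and the schedule stops as soon as one dataset has no batch left, the sketch below uses a hypothetical `round_robin` generator and made-up batch labels; it is not the trainer's sampler.

```python
def round_robin(batches_per_dataset):
    """Yield (dataset_name, batch) in turn; stop when any dataset is exhausted."""
    iterators = {name: iter(batches) for name, batches in batches_per_dataset.items()}
    while True:
        for name, iterator in iterators.items():
            try:
                batch = next(iterator)
            except StopIteration:
                return  # one dataset ran out, so the schedule ends
            yield name, batch


# Toy batch labels (hypothetical): the larger dataset has more batches than the smaller one.
schedule = list(round_robin({
    "all-nli": ["n0", "n1", "n2", "n3"],
    "sts": ["s0", "s1"],
}))
print(schedule)
# [('all-nli', 'n0'), ('sts', 's0'), ('all-nli', 'n1'), ('sts', 's1'), ('all-nli', 'n2')]
```

Under this assumption, each dataset contributes roughly the same number of batches, and the smaller dataset determines how many batches are drawn in total, so part of the larger dataset may go unused in a single epoch.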
#### All Hyperparameters