SentenceTransformer based on google/embeddinggemma-300m

This is a sentence-transformers model finetuned from google/embeddinggemma-300m on 20,000 endocrinology question-answer pairs. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google/embeddinggemma-300m
  • Model Size: 303M parameters (F32)
  • Maximum Sequence Length: 2048 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)
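
The two Dense projections expand the pooled representation to 3072 dimensions and back down to 768, and the final Normalize() module means every embedding comes out unit-length, so cosine similarity reduces to a plain dot product. A minimal sketch to verify this (the example sentence is illustrative):

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("yasserrmd/endocrinology-gemma-300m-emb")
print(model)  # prints the module pipeline shown above

# Encode one sentence and confirm the embedding is L2-normalized
emb = model.encode(["Thyroid hormone regulates basal metabolic rate."])
print(emb.shape)               # (1, 768)
print(np.linalg.norm(emb[0]))  # ~1.0, thanks to the Normalize() module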

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("yasserrmd/endocrinology-gemma-300m-emb")
# Run inference
queries = [
    "How does in utero exposure to excess anti-M\u00fcllerian hormone (AMH) affect the GnRH neuronal morphology and electrical activity in offspring?\n",
]
documents = [
    'In utero exposure to excess AMH leads to protracted changes in GnRH neuronal morphology and electrical activity in offspring. PAMH female mice exhibit increased spine density on the soma and along the primary dendrite of GnRH neurons compared to controls during diestrus. This increased spine density is accompanied by a significant increase in the number of vesicular GABA transporter (vGaT) appositions onto GnRH cells. While there are no significant differences in the number of vesicular glutamate transporter 2 (vGluT2) appositions, it is important to note that GABA, although primarily recognized as an inhibitory neurotransmitter in the adult brain, is excitatory in adult GnRH neurons. This elevated hypothalamic excitatory apposition onto GnRH neurons in PAMH animals translates into increased neuronal activity.',
    'Prophylactic thyroidectomy is recommended as early as the age of five years in confirmed RET mutation carriers in MEN2A or FMTC families with normal (stimulated) plasma calcitonin levels. However, some clinicians may prefer to wait until the pentagastrin test results are abnormal before performing thyroidectomy. This is because the test for calcitonin levels may give false negative results, and medullary thyroid carcinoma has been encountered in children with normal calcitonin levels who underwent thyroidectomy after DNA diagnosis.',
    'The most common co-morbidities reported by patients with GHD are hypertension, arthritis, and diabetes mellitus. Additionally, 26% of patients had a history of fractures.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 768] [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.7737, 0.0678, 0.0061]])
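
Beyond pairwise scoring, the same encoders support simple semantic search over a passage collection. A hedged sketch (the corpus snippets and query below are illustrative placeholders, not drawn from the training data):

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("yasserrmd/endocrinology-gemma-300m-emb")

# Hypothetical endocrinology passages standing in for a real corpus
corpus = [
    "Metformin lowers hepatic glucose production and is first-line therapy for type 2 diabetes.",
    "Levothyroxine replacement is titrated against serum TSH in primary hypothyroidism.",
    "Pheochromocytoma is a catecholamine-secreting tumor of the adrenal medulla.",
]
corpus_embeddings = model.encode_document(corpus)

query_embedding = model.encode_query(["How is hypothyroidism treated?"])
scores = model.similarity(query_embedding, corpus_embeddings)[0]

# Rank passages by cosine similarity and print the best matches
top = torch.topk(scores, k=2)
for score, idx in zip(top.values, top.indices):
    print(f"{score:.4f}  {corpus[idx]}")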

Training Details

Training Dataset

Unnamed Dataset

  • Size: 20,000 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string; min 9, mean 21.14, max 54 tokens
    • sentence_1: string; min 15, mean 90.48, max 223 tokens
  • Samples:
    • sentence_0: What factors contribute to the development of hypoglycemia unawareness in individuals with diabetes?
      sentence_1: Hypoglycemia unawareness, also known as HAAF (hypoglycemia-associated autonomic failure), is a known complication of insulin therapy for type 1 and type 2 diabetes. Even a single episode of antecedent hypoglycemia can alter the neuroendocrine response during subsequent hypoglycemia. While the exact mechanism of HAAF is not fully understood, improved brain glucose transport is considered a major factor. In individuals with HAAF, brain glucose concentration is higher compared to controls. Chronic and recurrent hypoglycemia can enhance blood-brain glucose transport capacity, and increased expression of glucose transporters at the blood-brain barrier has been observed in animal models. HAAF is characterized by a lack of suppression of endogenous insulin secretion and failure of glucagon and catecholamine secretion during hypoglycemia. Decreased cortisol secretion is commonly present, but adrenal medullary effects predominate. Increased CRH secretion, acting via CRH receptor 1, may be invol...
    • sentence_0: How was the baby boy with the TRβ R243W mutation diagnosed with resistance to thyroid hormone (RTH) instead of neonatal Graves' disease (GD)?
      sentence_1: The baby boy was initially suspected of having neonatal GD due to his mother's condition. However, laboratory tests showed that his thyroid-stimulating hormone (TSH) levels were not suppressed, and he had high levels of free T4 (FT4) and free T3 (FT3) with no antibodies related to GD. Based on these findings, he was diagnosed with RTH instead of GD.
    • sentence_0: What are the risk factors for developing diabetic muscle infarction (DMI)?
      sentence_1: The risk factors for developing diabetic muscle infarction (DMI) include poorly controlled diabetes mellitus, particularly type 1 diabetes, and the presence of late complications such as nephropathy, retinopathy, and neuropathy. Other factors that may contribute to the development of DMI include hyperglycemia and long-standing diabetes.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
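
MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) pair as a positive and uses the other examples in the batch as negatives; with the batch size of 6 used here, each query is contrasted against its 1 positive and 5 in-batch negatives. A minimal instantiation sketch matching the parameters above:

from sentence_transformers import SentenceTransformer, util
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("google/embeddinggemma-300m")
# scale=20.0 and cosine similarity match the parameters listed above
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)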
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 6
  • per_device_eval_batch_size: 6
  • num_train_epochs: 1
  • multi_dataset_batch_sampler: round_robin
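
Taken together with the loss above, these non-default settings map onto the Sentence Transformers v3+ trainer API roughly as follows. A hedged reproduction sketch (the output_dir and the single placeholder pair are hypothetical; the actual run used the 20,000-sample dataset described earlier):

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import MultiDatasetBatchSamplers

model = SentenceTransformer("google/embeddinggemma-300m")

# Placeholder pair standing in for the 20,000 (sentence_0, sentence_1) samples
train_dataset = Dataset.from_dict({
    "sentence_0": ["What are the risk factors for developing diabetic muscle infarction (DMI)?"],
    "sentence_1": ["Poorly controlled diabetes mellitus and late complications such as nephropathy."],
})

loss = MultipleNegativesRankingLoss(model)  # scale=20.0, cos_sim by default

args = SentenceTransformerTrainingArguments(
    output_dir="endocrinology-gemma-300m-emb",  # hypothetical path
    per_device_train_batch_size=6,
    per_device_eval_batch_size=6,
    num_train_epochs=1,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()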

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 6
  • per_device_eval_batch_size: 6
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.1500 500 0.0224
0.2999 1000 0.0171
0.4499 1500 0.0158
0.5999 2000 0.0062
0.7499 2500 0.0095
0.8998 3000 0.0043

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.1.0
  • Transformers: 4.56.2
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.10.1
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1
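
To reproduce this environment, the same versions can be pinned at install time (package names are assumed to match their PyPI distributions; the cu128 suffix on PyTorch refers to the CUDA 12.8 wheel build):

pip install sentence-transformers==5.1.0 transformers==4.56.2 torch==2.8.0 accelerate==1.10.1 datasets==4.0.0 tokenizers==0.22.1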

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}