SentenceTransformer

This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
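
These properties can also be read from the loaded model at runtime. A minimal sketch using only the public Sentence Transformers API:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v0_9_15")

# Maximum sequence length in tokens (512 per the card)
print(model.get_max_seq_length())
# Output embedding dimensionality (768 per the card)
print(model.get_sentence_embedding_dimension())
# Similarity function used by model.similarity() ("cosine" per the card)
print(model.similarity_fn_name)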

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
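
For reference, an equivalent model can be assembled module by module with CLS-token pooling. This is a minimal sketch using the library's models API; in practice, loading the checkpoint by name as in the Usage section is the simpler route:

from sentence_transformers import SentenceTransformer, models

# Transformer module wrapping the underlying BertModel, truncating at 512 tokens
word_embedding_model = models.Transformer(
    "Detomo/cl-nagoya-sup-simcse-ja-nss-v0_9_15", max_seq_length=512
)
# CLS-token pooling over the 768-dimensional word embeddings,
# matching pooling_mode_cls_token=True in the architecture above
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(), pooling_mode="cls"
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])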

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v0_9_15")
# Run inference
sentences = [
    '科目:塗装。名称:PCa保護塗り(細幅物)。',
    '科目:塗装。名称:PCa面塗り(細幅物)。',
    '科目:塗装。名称:PCa面塗り(細幅物)。',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
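
Since semantic search is among the intended uses, the embeddings can also rank a corpus against a query. A minimal sketch with util.semantic_search; the corpus and query strings below are purely illustrative:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v0_9_15")

corpus = [
    "科目:塗装。名称:PCa保護塗り(細幅物)。",
    "科目:コンクリート。名称:免震基礎天端グラウト注入。",
]
query = "科目:塗装。名称:PCa面塗り(細幅物)。"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Cosine-similarity search returning the top-k closest corpus entries per query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], hit["score"])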

Training Details

Training Dataset

Unnamed Dataset

  • Size: 7,598 training samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:
    • sentence: string (min: 11 tokens, mean: 17.2 tokens, max: 29 tokens)
    • label: int, with the approximate class distribution:
    • 0: ~0.30%
    • 1: ~0.30%
    • 2: ~0.30%
    • 3: ~0.30%
    • 4: ~0.30%
    • 5: ~0.30%
    • 6: ~0.30%
    • 7: ~0.30%
    • 8: ~0.30%
    • 9: ~0.30%
    • 10: ~0.30%
    • 11: ~0.40%
    • 12: ~0.30%
    • 13: ~0.30%
    • 14: ~0.30%
    • 15: ~0.30%
    • 16: ~0.30%
    • 17: ~0.30%
    • 18: ~0.50%
    • 19: ~0.30%
    • 20: ~0.30%
    • 21: ~0.30%
    • 22: ~0.30%
    • 23: ~0.30%
    • 24: ~0.30%
    • 25: ~0.30%
    • 26: ~0.30%
    • 27: ~0.30%
    • 28: ~0.30%
    • 29: ~0.30%
    • 30: ~0.30%
    • 31: ~0.30%
    • 32: ~0.30%
    • 33: ~0.30%
    • 34: ~0.30%
    • 35: ~0.30%
    • 36: ~0.30%
    • 37: ~0.30%
    • 38: ~0.30%
    • 39: ~0.30%
    • 40: ~0.40%
    • 41: ~0.30%
    • 42: ~0.30%
    • 43: ~0.30%
    • 44: ~0.60%
    • 45: ~0.70%
    • 46: ~0.30%
    • 47: ~0.30%
    • 48: ~0.30%
    • 49: ~0.30%
    • 50: ~0.30%
    • 51: ~0.30%
    • 52: ~0.30%
    • 53: ~0.30%
    • 54: ~0.30%
    • 55: ~0.30%
    • 56: ~0.30%
    • 57: ~0.80%
    • 58: ~0.30%
    • 59: ~0.30%
    • 60: ~0.60%
    • 61: ~0.30%
    • 62: ~0.30%
    • 63: ~0.30%
    • 64: ~0.50%
    • 65: ~0.30%
    • 66: ~0.30%
    • 67: ~0.30%
    • 68: ~0.30%
    • 69: ~0.30%
    • 70: ~0.60%
    • 71: ~0.30%
    • 72: ~0.30%
    • 73: ~0.30%
    • 74: ~0.30%
    • 75: ~0.30%
    • 76: ~0.30%
    • 77: ~0.30%
    • 78: ~0.30%
    • 79: ~0.30%
    • 80: ~0.30%
    • 81: ~0.30%
    • 82: ~0.30%
    • 83: ~0.30%
    • 84: ~0.80%
    • 85: ~0.60%
    • 86: ~0.50%
    • 87: ~0.30%
    • 88: ~0.30%
    • 89: ~16.30%
    • 90: ~0.30%
    • 91: ~0.30%
    • 92: ~0.30%
    • 93: ~0.30%
    • 94: ~0.30%
    • 95: ~0.30%
    • 96: ~0.30%
    • 97: ~0.30%
    • 98: ~0.50%
    • 99: ~0.30%
    • 100: ~0.30%
    • 101: ~0.30%
    • 102: ~0.30%
    • 103: ~0.30%
    • 104: ~0.30%
    • 105: ~0.30%
    • 106: ~1.20%
    • 107: ~0.70%
    • 108: ~0.30%
    • 109: ~3.20%
    • 110: ~0.30%
    • 111: ~2.30%
    • 112: ~0.30%
    • 113: ~0.30%
    • 114: ~0.50%
    • 115: ~0.50%
    • 116: ~0.50%
    • 117: ~0.30%
    • 118: ~0.30%
    • 119: ~0.30%
    • 120: ~0.80%
    • 121: ~0.30%
    • 122: ~0.30%
    • 123: ~0.30%
    • 124: ~0.30%
    • 125: ~0.30%
    • 126: ~0.30%
    • 127: ~0.30%
    • 128: ~0.30%
    • 129: ~0.30%
    • 130: ~0.30%
    • 131: ~0.40%
    • 132: ~0.30%
    • 133: ~0.30%
    • 134: ~0.30%
    • 135: ~0.30%
    • 136: ~0.30%
    • 137: ~0.30%
    • 138: ~0.30%
    • 139: ~0.30%
    • 140: ~0.30%
    • 141: ~0.30%
    • 142: ~0.40%
    • 143: ~0.30%
    • 144: ~0.30%
    • 145: ~0.30%
    • 146: ~0.30%
    • 147: ~0.30%
    • 148: ~0.30%
    • 149: ~0.70%
    • 150: ~0.30%
    • 151: ~0.30%
    • 152: ~0.30%
    • 153: ~1.30%
    • 154: ~0.30%
    • 155: ~0.30%
    • 156: ~0.30%
    • 157: ~0.30%
    • 158: ~0.30%
    • 159: ~1.30%
    • 160: ~0.30%
    • 161: ~0.30%
    • 162: ~0.30%
    • 163: ~0.30%
    • 164: ~0.30%
    • 165: ~0.30%
    • 166: ~0.30%
    • 167: ~1.50%
    • 168: ~0.30%
    • 169: ~0.30%
    • 170: ~7.90%
    • 171: ~0.30%
    • 172: ~1.00%
    • 173: ~0.30%
    • 174: ~0.30%
    • 175: ~0.30%
    • 176: ~1.80%
    • 177: ~0.30%
    • 178: ~0.50%
    • 179: ~0.70%
    • 180: ~0.30%
    • 181: ~0.30%
    • 182: ~0.30%
    • 183: ~0.30%
    • 184: ~0.30%
    • 185: ~0.30%
    • 186: ~0.30%
    • 187: ~0.30%
    • 188: ~2.50%
  • Samples:
    • sentence: 科目:コンクリート。名称:免震基礎天端グラウト注入。 label: 0
    • sentence: 科目:コンクリート。名称:免震基礎天端グラウト注入。 label: 0
    • sentence: 科目:コンクリート。名称:免震基礎天端グラウト注入。 label: 0
  • Loss: sentence_transformer_lib.custom_batch_all_trip_loss.CustomBatchAllTripletLoss
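
CustomBatchAllTripletLoss is a project-specific class that is not part of Sentence Transformers; judging by its name, it appears to be a variant of the library's standard BatchAllTripletLoss, which consumes (sentence, label) rows exactly like the schema above. A minimal sketch under that assumption, using the standard library loss and toy data:

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import BatchAllTripletLoss

model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v0_9_15")

# Toy rows mirroring the (sentence, label) schema above
train_dataset = Dataset.from_dict({
    "sentence": [
        "科目:コンクリート。名称:免震基礎天端グラウト注入。",
        "科目:塗装。名称:PCa保護塗り(細幅物)。",
    ],
    "label": [0, 1],
})

# Standard batch-all triplet loss over in-batch labels
loss = BatchAllTripletLoss(model)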

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 512
  • per_device_eval_batch_size: 512
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • num_train_epochs: 250
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: group_by_label
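
A minimal sketch of a training run with the non-default hyperparameters above, assuming model, train_dataset, and loss as defined in the previous sketch (output_dir is a hypothetical path):

from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # hypothetical output path
    per_device_train_batch_size=512,
    per_device_eval_batch_size=512,
    learning_rate=1e-5,
    weight_decay=0.01,
    num_train_epochs=250,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.GROUP_BY_LABEL,  # batches group samples sharing a label
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()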

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 512
  • per_device_eval_batch_size: 512
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 250
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: group_by_label
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.6667 10 0.0662
1.3333 20 0.0
2.0 30 0.0
2.6667 40 0.0
3.3333 50 0.0
4.0 60 0.0
4.6667 70 0.0
5.3333 80 0.0
6.0 90 0.0
6.6667 100 0.0
7.3333 110 0.0
8.0 120 0.0
8.6667 130 0.0
9.3333 140 0.0
10.0 150 0.0
10.0 10 2.7711
20.0 20 1.2115
30.0 30 0.3753
40.0 40 0.1646
50.0 50 0.0876
60.0 60 0.0559
70.0 70 0.0344
80.0 80 0.0262
90.0 90 0.0194
100.0 100 0.0218
110.0 110 0.0214
120.0 120 0.014
130.0 130 0.0231
140.0 140 0.0132
150.0 150 0.0146
3.7576 100 0.0701
7.7576 200 0.0747
11.7576 300 0.0709
15.7576 400 0.0689
19.7576 500 0.0622
23.7576 600 0.0639
27.7576 700 0.063
31.7576 800 0.0605
35.7576 900 0.061
39.7576 1000 0.0602
43.7576 1100 0.0609
47.7576 1200 0.0596
51.7576 1300 0.0568
55.7576 1400 0.0593
59.7576 1500 0.058
63.7576 1600 0.0613
67.7576 1700 0.0515
71.7576 1800 0.0511
75.7576 1900 0.0538
79.7576 2000 0.0559
83.7576 2100 0.0482
87.7576 2200 0.0511
91.7576 2300 0.0553
95.7576 2400 0.0522
99.7576 2500 0.0534
103.7576 2600 0.0477
107.7576 2700 0.052
111.7576 2800 0.0518
115.7576 2900 0.047
119.7576 3000 0.0503
123.7576 3100 0.0494
127.7576 3200 0.0488
131.7576 3300 0.052
135.7576 3400 0.0459
139.7576 3500 0.0467
143.7576 3600 0.0493
147.7576 3700 0.0453
151.7576 3800 0.0457
155.7576 3900 0.0462
159.7576 4000 0.0451
163.7576 4100 0.0446
167.7576 4200 0.0438
171.7576 4300 0.0398
175.7576 4400 0.0414
179.7576 4500 0.045
183.7576 4600 0.0448
187.7576 4700 0.0426
191.7576 4800 0.0427
195.7576 4900 0.0434
199.7576 5000 0.039
203.7576 5100 0.0381
207.7576 5200 0.0434
211.7576 5300 0.041
215.7576 5400 0.0463
219.7576 5500 0.0386
223.7576 5600 0.0453
227.7576 5700 0.0412
231.7576 5800 0.0373
235.7576 5900 0.0393
239.7576 6000 0.0362
243.7576 6100 0.0363
247.7576 6200 0.0372

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 3.4.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CustomBatchAllTripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}