sentence_transformers_support (#2)

- Add support for Sentence Transformer (fd892c44ece765cd2eea98f34a84a52cda185d3c)
- Update README.md (2a36dcec5365886f4487325e059223f9b5c65e0b)

Files changed (5)

1_SpladePooling/config.json +5 -0
README.md +44 -4
config_sentence_transformers.json +14 -0
modules.json +14 -0
sentence_bert_config.json +4 -0

1_SpladePooling/config.json ADDED

	@@ -0,0 +1,5 @@

+{
+    "pooling_strategy": "max",
+    "activation_function": "relu",
+    "word_embedding_dimension": null
+}
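
The config above only names the two choices (`max` pooling, `relu` activation). As a rough illustration of what that combination usually means in SPLADE, the sketch below computes one weight per vocabulary entry as the maximum over tokens of `log(1 + relu(logit))`, skipping padded positions. This assumes the standard SPLADE formulation; the function name `splade_max_pool` and the toy logits are hypothetical and are not taken from the library.

```python
import math


def splade_max_pool(token_logits, attention_mask):
    """Pure-Python stand-in for max pooling with a relu activation.

    token_logits: one list of vocabulary logits per input token.
    attention_mask: 1 for real tokens, 0 for padding.
    """
    vocab_size = len(token_logits[0])
    pooled = [0.0] * vocab_size
    for logits, keep in zip(token_logits, attention_mask):
        if not keep:
            continue  # padded positions contribute nothing
        for j, logit in enumerate(logits):
            # relu clamps negatives to 0, log1p dampens large activations
            weight = math.log1p(max(logit, 0.0))
            if weight > pooled[j]:
                pooled[j] = weight
    return pooled


# Toy example: 3 token positions (the last one is padding), vocabulary of 4.
logits = [
    [2.0, -1.0, 0.0, 0.5],
    [0.5, -3.0, 1.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],  # padding, ignored because of the mask
]
mask = [1, 1, 0]
pooled = splade_max_pool(logits, mask)
active = sum(1 for w in pooled if w > 0.0)
print(active)
```

Negative logits end up as exact zeros, which is where the sparsity of the final vector comes from.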

README.md CHANGED

@@ -12,9 +12,12 @@ tags:
 - passage-retrieval
 - knowledge-distillation
 - document encoder
+- sparse-encoder
+- sparse
+- splade
 pretty_name:  Independent Implementation of SPLADE++ Model with some efficiency tweaks for Industry setting.
-library_name: transformers
-pipeline_tag: fill-mask
+library_name: sentence-transformers
+pipeline_tag: feature-extraction
 ---
 <center>
@@ -198,9 +201,46 @@ sparse_rep = expander.expand(
     ["The Manhattan Project and its atomic bomb helped bring an end to World War II. Its legacy of peaceful uses of atomic energy continues to have an impact on history and science."])
 ```
-## 6c. With HuggingFace
+ ## 6c. With Sentence Transformers
+First install the Sentence Transformers library:
+```bash
+pip install -U sentence-transformers
+```
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SparseEncoder
+# Download from the 🤗 Hub
+model = SparseEncoder("prithivida/Splade_PP_en_v2")
+# Run inference
+sentence = [
+    "The Manhattan Project and its atomic bomb helped bring an end to World War II. Its legacy of peaceful uses of atomic energy continues to have an impact on history and science."
+]
+embeddings = model.encode(sentence)
+print(embeddings.shape)
+# [1, 30522]
+decoded_sentence = model.decode(embeddings[0])
+print(f"Number of actual dimensions: {len(decoded_sentence)}")
+decoded_sentence_rounded = [(token, round(score, 2)) for token, score in decoded_sentence]
+print("SPLADE BOW rep:\n", decoded_sentence_rounded)
+# Number of actual dimensions: 103
+# SPLADE BOW rep:
+#  [('manhattan', 2.59), ('project', 2.1), ('atomic', 1.65), ('legacy', 1.62), ('bomb', 1.5), ('peaceful', 1.47), ('end', 1.42), ('helped', 1.37), ('wwii', 1.36), ('energy', 1.36), ('war', 1.29), ('1942', 1.29), ('bring', 1.21), ('impact', 1.14),
+#  ('help', 1.09), ('bombs', 1.05), ('ny', 0.93), ('scientist', 0.91), ('nuclear', 0.89), ('history', 0.87), ('projects', 0.87), ('mission', 0.83), ('stop', 0.77), ('wars', 0.76), ('peace', 0.76), ('ii', 0.76), ('affect', 0.76), ('power', 0.73),
+#  ('science', 0.72), ('bombing', 0.72), ('atom', 0.72), ('use', 0.7), ('did', 0.69), ('brought', 0.67), ('still', 0.66), ('purpose', 0.65), ('was', 0.65), ('effect', 0.59), ('scientists', 0.59), ('uses', 0.57), ('because', 0.53), ('historical', 0.48),
+#  ('experiment', 0.47), ('scientific', 0.47), ('safe', 0.46), ('w', 0.45), ('message', 0.44), ('##w', 0.42), ('ended', 0.41), ('hudson', 0.39), ('roosevelt', 0.38), ('were', 0.36), ('##nik', 0.35), ('continue', 0.34), ('hiroshima', 0.33), ('important', 0.33),
+#  ('benefit', 0.32), ('destruction', 0.31), ('used', 0.3), ('nazi', 0.3), ('destroyed', 0.29), ('story', 0.29), ('assisted', 0.27), ('close', 0.27), ('influenced', 0.25), ('world', 0.25), ('invented', 0.24), ('contribution', 0.24), ('military', 0.24), ('conflict', 0.22),
+#  ('1939', 0.22), ('success', 0.22), ('1940s', 0.21), ('nasa', 0.2), ('harry', 0.2), ('revolution', 0.2), ('today', 0.18), ('rescue', 0.17), ('radiation', 0.16), ('destiny', 0.16), ('last', 0.15), ('allies', 0.14), ('the', 0.14), ('created', 0.13), ('hess', 0.13), ('weapon', 0.13),
+#  ('started', 0.11), ('us', 0.1), ('secret', 0.1), ('campaign', 0.09), ('2', 0.08), ('cause', 0.08), ('and', 0.07), ('propaganda', 0.06), ('noah', 0.05), ('theory', 0.05), ('significance', 0.02), ('berlin', 0.01), ('fuel', 0.01), ('columbia', 0.01), ('strategy', 0.01), ('usage', 0.01), ('symbol', 0.0)]
+```
+## 6d. With HuggingFace
 **NOTEBOOK user? Login first**
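
The README example reports a `[1, 30522]` embedding but only 103 "actual dimensions". The sketch below illustrates that relationship on a toy scale: it keeps the nonzero entries of a vocabulary-sized vector and returns them as `(token, weight)` pairs sorted by weight. It is a pure-Python stand-in under stated assumptions, not the library's `decode` implementation; the helper `decode_sparse` and the six-word vocabulary are hypothetical, and the weights are borrowed from the README output.

```python
def decode_sparse(vector, id_to_token, top_k=None):
    """Return (token, weight) pairs for nonzero entries, heaviest first."""
    active = [(id_to_token[i], w) for i, w in enumerate(vector) if w > 0.0]
    active.sort(key=lambda pair: pair[1], reverse=True)
    return active if top_k is None else active[:top_k]


# Hypothetical 6-word vocabulary standing in for the 30522-entry one.
vocab = ["the", "manhattan", "project", "atomic", "bomb", "peace"]
vector = [0.14, 2.59, 2.1, 0.0, 1.5, 0.0]

decoded = decode_sparse(vector, vocab)
print(len(decoded))
# 4
print(decoded)
# [('manhattan', 2.59), ('project', 2.1), ('bomb', 1.5), ('the', 0.14)]
```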

config_sentence_transformers.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "model_type": "SparseEncoder",
+  "__version__": {
+    "sentence_transformers": "5.0.0",
+    "transformers": "4.50.3",
+    "pytorch": "2.6.0+cu124"
+  },
+  "prompts": {
+    "query": "",
+    "document": ""
+  },
+  "default_prompt_name": null,
+  "similarity_fn_name": "dot"
+}
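
`similarity_fn_name` is set to `dot`, so query and document vectors would be compared by their dot product. For sparse vectors only the tokens present on both sides contribute, which the following minimal sketch shows with dictionaries mapping tokens to weights. The helper `sparse_dot` and the query weights are hypothetical; the document weights are taken from the README output.

```python
def sparse_dot(query, document):
    """Dot product of two sparse vectors stored as {token: weight} dicts."""
    # Iterate over the smaller dict so the cost scales with the shorter side.
    if len(query) > len(document):
        query, document = document, query
    return sum(w * document[t] for t, w in query.items() if t in document)


query = {"manhattan": 2.0, "project": 1.5, "bomb": 0.5}       # made-up weights
doc = {"manhattan": 2.59, "project": 2.1, "atomic": 1.65}     # from the README output

score = sparse_dot(query, doc)
print(round(score, 2))
# 8.33
```

Tokens that appear on only one side ("bomb", "atomic") add nothing to the score.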

modules.json ADDED

	@@ -0,0 +1,14 @@

+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.sparse_encoder.models.MLMTransformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_SpladePooling",
+    "type": "sentence_transformers.sparse_encoder.models.SpladePooling"
+  }
+]
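
To make the module order explicit, the sketch below parses the same JSON and lists the modules by `idx`. It assumes that modules run in ascending `idx` order and that an empty `path` refers to the repository root; both are assumptions about how the loader reads this file, not something the diff states.

```python
import json

# Contents of modules.json as added in this PR.
MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "",
   "type": "sentence_transformers.sparse_encoder.models.MLMTransformer"},
  {"idx": 1, "name": "1", "path": "1_SpladePooling",
   "type": "sentence_transformers.sparse_encoder.models.SpladePooling"}
]
"""

modules = sorted(json.loads(MODULES_JSON), key=lambda m: m["idx"])

# Short class names in assumed execution order.
pipeline = [m["type"].rsplit(".", 1)[-1] for m in modules]
# Where each module's config is assumed to live.
locations = [m["path"] or "<repo root>" for m in modules]

for name, location in zip(pipeline, locations):
    print(f"{name}: {location}")
# MLMTransformer: <repo root>
# SpladePooling: 1_SpladePooling
```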

sentence_bert_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+    "max_seq_length": 512,
+    "do_lower_case": false
+}