prithivida commited on
Commit
145e8e8
verified
1 Parent(s): a008fa4

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +6 -6
README.md CHANGED
@@ -90,7 +90,7 @@ SPLADE models are a fine balance between retrieval effectiveness (quality) and r
90
 
91
  <details>
92
 
93
- While ONLY a empirical analysis on large sample make sense here is a spot checking - a qualitatively example to give you an idea. Our models achieve par competitive effectiveness with **~10% and ~100%, lesser tokens comparable SPLADE++ models including SoTA**.
94
  (We will show Quantitative results in the next section.)
95
 
96
  So, **by design "how to beat SoTA MRR?" was never our goal**, Instead "At what cost can we achieve an acceptable effectiveness i.e. MRR@10". Non-chalantly reducing lambda values (位Q,位D, see above table) will achieve a better MRR.
@@ -100,19 +100,19 @@ But Lower lambda values = Higher FLOPS = More tokens = Poorer efficiency. This i
100
 
101
  **Ours**
102
  ```python
103
- number of actual dimensions: 113
104
  SPLADE BOW rep:
105
- [('stress', 2.36), ('glass', 2.15), ('thermal', 2.06), ('pan', 1.83), ('glasses', 1.67), ('break', 1.47), ('crack', 1.47), ('heat', 1.45), ('warmth', 1.36), ('depression', 1.34), ('hotter', 1.23), ('hottest', 1.11), ('window', 1.11), ('hot', 1.1), ('area', 1.04), ('cause', 1.01), ('adjacent', 0.99), ('too', 0.94), ('created', 0.86), ('##pan', 0.84), ('phenomenon', 0.81), ('when', 0.78), ('temperature', 0.76), ('cracked', 0.75), ('factors', 0.74), ('windows', 0.72), ('create', 0.71), ('level', 0.7), ('formed', 0.61), ('stresses', 0.59), ('warm', 0.58), ('fracture', 0.57), ('adjoining', 0.56), ('areas', 0.56), ('nearby', 0.56), ('causes', 0.56), ('broken', 0.54), ('produced', 0.52), ('sash', 0.51), ('if', 0.51), ('breaks', 0.49), ('is', 0.49), ('effect', 0.45), ('heated', 0.44), ('process', 0.42), ('breaking', 0.42), ('one', 0.4), ('mirror', 0.39), ('factor', 0.38), ('shatter', 0.38), ('formation', 0.37), ('mathias', 0.37), ('damage', 0.36), ('cracking', 0.35), ('climate', 0.35), ('ceramic', 0.34), ('reaction', 0.34), ('steam', 0.33), ('reflection', 0.33), ('generated', 0.33), ('material', 0.32), ('burst', 0.31), ('fire', 0.31), ('neighboring', 0.3), ('explosion', 0.29), ('caused', 0.29), ('warmer', 0.29), ('because', 0.28), ('anxiety', 0.28), ('furnace', 0.28), ('tear', 0.27), ('induced', 0.27), ('fail', 0.26), ('are', 0.26), ('collapse', 0.26), ('##thermal', 0.26), ('and', 0.25), ('great', 0.25), ('get', 0.24), ('spark', 0.23), ('lens', 0.2), ('cooler', 0.19), ('determined', 0.19), ('leak', 0.19), ('disease', 0.19), ('emotion', 0.16), ('cork', 0.14), ('cooling', 0.14), ('heating', 0.13), ('governed', 0.13), ('optical', 0.12), ('surrounding', 0.12), ('warming', 0.12), ('convection', 0.11), ('regulated', 0.11), ('problem', 0.1), ('cool', 0.09), ('violence', 0.09), ('breaker', 0.09), ('image', 0.09), ('photo', 0.05), ('strike', 0.05), ('.', 0.04), ('shattering', 0.04), ('snap', 0.03), ('wilson', 0.03), ('weather', 0.02), ('eye', 0.02), ('produce', 0.01), ('crime', 0.01), ('humid', 0.0), ('impact', 0.0), ('earthquake', 0.0)]```
106
  ```
107
 
108
- **naver/splade-cocondenser-ensembledistil** (SoTA, ~10% more tokens + FLOPS = 1.85)
109
  ```python
110
  number of actual dimensions: 126
111
  SPLADE BOW rep:
112
  [('stress', 2.25), ('glass', 2.23), ('thermal', 2.18), ('glasses', 1.65), ('pan', 1.62), ('heat', 1.56), ('stressed', 1.42), ('crack', 1.31), ('break', 1.12), ('cracked', 1.1), ('hot', 0.93), ('created', 0.9), ('factors', 0.81), ('broken', 0.73), ('caused', 0.71), ('too', 0.71), ('damage', 0.69), ('if', 0.68), ('hotter', 0.65), ('governed', 0.61), ('heating', 0.59), ('temperature', 0.59), ('adjacent', 0.59), ('cause', 0.58), ('effect', 0.57), ('fracture', 0.56), ('bradford', 0.55), ('strain', 0.53), ('hammer', 0.51), ('brian', 0.48), ('error', 0.47), ('windows', 0.45), ('will', 0.45), ('reaction', 0.42), ('create', 0.42), ('windshield', 0.41), ('heated', 0.41), ('factor', 0.4), ('cracking', 0.39), ('failure', 0.38), ('mechanical', 0.38), ('when', 0.38), ('formed', 0.38), ('bolt', 0.38), ('mechanism', 0.37), ('warm', 0.37), ('areas', 0.36), ('area', 0.36), ('energy', 0.34), ('disorder', 0.33), ('barry', 0.33), ('shock', 0.32), ('determined', 0.32), ('gage', 0.32), ('sash', 0.31), ('theory', 0.31), ('level', 0.31), ('resistant', 0.31), ('brake', 0.3), ('window', 0.3), ('crash', 0.3), ('hazard', 0.29), ('##ink', 0.27), ('ceramic', 0.27), ('storm', 0.25), ('problem', 0.25), ('issue', 0.24), ('impact', 0.24), ('fridge', 0.24), ('injury', 0.23), ('ross', 0.22), ('causes', 0.22), ('affect', 0.21), ('pressure', 0.21), ('fatigue', 0.21), ('leak', 0.21), ('eye', 0.2), ('frank', 0.2), ('cool', 0.2), ('might', 0.19), ('gravity', 0.18), ('ray', 0.18), ('static', 0.18), ('collapse', 0.18), ('physics', 0.18), ('wave', 0.18), ('reflection', 0.17), ('parker', 0.17), ('strike', 0.17), ('hottest', 0.17), ('burst', 0.16), ('chance', 0.16), ('burn', 0.14), ('rubbing', 0.14), ('interference', 0.14), ('bailey', 0.13), ('vibration', 0.12), ('gilbert', 0.12), ('produced', 0.12), ('rock', 0.12), ('warmer', 0.11), ('get', 0.11), ('drink', 0.11), ('fireplace', 0.11), ('ruin', 0.1), ('brittle', 0.1), ('fragment', 0.1), ('stumble', 0.09), ('formation', 0.09), ('shatter', 0.08), ('great', 0.08), ('friction', 0.08), ('flash', 0.07), ('cracks', 0.07), ('levels', 0.07), ('smash', 0.04), ('fail', 0.04), ('fra', 0.04), ('##glass', 0.03), ('variables', 0.03), ('because', 0.02), ('knock', 0.02), ('sun', 0.02), ('crush', 0.01), ('##e', 0.01), ('anger', 0.01)]
113
  ```
114
 
115
- **naver/splade-v2-distil** (~100% more tokens + FLOPS = 3.82)
116
  ```python
117
  number of actual dimensions: 234
118
  SPLADE BOW rep:
@@ -135,7 +135,7 @@ But it is unclear how well these hyperparameters are transferable to other domai
135
  <details>
136
 
137
  **Note: Why Anserini not PISA?** *Anserini is a production ready lucene based library. Common industry search deployments use Solr or elastic which are lucene based, hence the performance can be comparable. PISA latency is irrelevant for industry as it is a a research only system.*
138
- The full [anserini evaluation log](https://huggingface.co/prithivida/Splade_PP_en_v1/blob/main/anserini_run.log) with encoding, indexing and querying details are here.
139
 
140
  - **BEIR ZST OOD performance**: Will be added to the end of page.
141
 
 
90
 
91
  <details>
92
 
93
+ While ONLY a empirical analysis on large sample make sense here is a spot checking - a qualitatively example to give you an idea. Our models achieve par competitive effectiveness with **~4% and ~48%, lesser tokens comparable SPLADE++ models including SoTA**.
94
  (We will show Quantitative results in the next section.)
95
 
96
  So, **by design "how to beat SoTA MRR?" was never our goal**, Instead "At what cost can we achieve an acceptable effectiveness i.e. MRR@10". Non-chalantly reducing lambda values (位Q,位D, see above table) will achieve a better MRR.
 
100
 
101
  **Ours**
102
  ```python
103
+ number of actual dimensions: 121
104
  SPLADE BOW rep:
105
+ [('stress', 2.42), ('thermal', 2.31), ('glass', 2.27), ('pan', 1.78), ('heat', 1.66), ('glasses', 1.58), ('crack', 1.42), ('anxiety', 1.36), ('break', 1.31), ('window', 0.91), ('heating', 0.84), ('hot', 0.82), ('adjacent', 0.82), ('hotter', 0.82), ('if', 0.75), ('cause', 0.7), ('caused', 0.7), ('create', 0.7), ('factors', 0.69), ('created', 0.68), ('cracks', 0.67), ('breaks', 0.67), ('area', 0.66), ('##glass', 0.66), ('cracked', 0.63), ('areas', 0.6), ('cracking', 0.59), ('windows', 0.58), ('effect', 0.56), ('causes', 0.56), ('ruin', 0.54), ('severe', 0.54), ('too', 0.53), ('flame', 0.5), ('collapse', 0.49), ('stresses', 0.49), ('or', 0.48), ('physics', 0.47), ('temperature', 0.46), ('get', 0.46), ('heated', 0.45), ('problem', 0.45), ('energy', 0.44), ('hottest', 0.42), ('phenomenon', 0.42), ('sweating', 0.41), ('insulation', 0.39), ('level', 0.39), ('warm', 0.39), ('governed', 0.38), ('formation', 0.37), ('failure', 0.35), ('frank', 0.34), ('cooling', 0.32), ('fracture', 0.31), ('because', 0.31), ('crystal', 0.31), ('determined', 0.31), ('boiler', 0.31), ('mechanical', 0.3), ('shatter', 0.29), ('friction', 0.29), ('levels', 0.29), ('cold', 0.29), ('will', 0.29), ('ceramics', 0.29), ('factor', 0.28), ('crash', 0.28), ('reaction', 0.28), ('fatigue', 0.28), ('hazard', 0.27), ('##e', 0.26), ('anger', 0.26), ('bubble', 0.25), ('process', 0.24), ('cleaning', 0.23), ('surrounding', 0.22), ('theory', 0.22), ('sash', 0.22), ('distraction', 0.21), ('adjoining', 0.19), ('environmental', 0.19), ('ross', 0.18), ('formed', 0.17), ('broken', 0.16), ('affect', 0.16), ('##pan', 0.15), ('graphic', 0.14), ('damage', 0.14), ('bubbles', 0.13), ('windshield', 0.13), ('temporal', 0.13), ('roof', 0.12), ('strain', 0.12), ('clear', 0.09), ('ceramic', 0.08), ('stressed', 0.08), ('##uation', 0.08), ('cool', 0.08), ('expand', 0.07), ('storm', 0.07), ('shock', 0.07), ('psychological', 0.06), ('breaking', 0.06), ('##es', 0.06), ('melting', 0.05), ('burst', 0.05), ('sensing', 0.04), ('heats', 0.04), ('error', 0.03), ('weather', 0.03), ('drink', 0.03), ('fire', 0.03), ('vibration', 0.02), ('induced', 0.02), ('warmer', 0.02), ('leak', 0.02), ('fog', 0.02), ('safety', 0.01), ('surface', 0.01), ('##thermal', 0.0)]
106
  ```
107
 
108
+ **naver/splade-cocondenser-ensembledistil** (SoTA, ~4% more tokens + FLOPS = 1.85)
109
  ```python
110
  number of actual dimensions: 126
111
  SPLADE BOW rep:
112
  [('stress', 2.25), ('glass', 2.23), ('thermal', 2.18), ('glasses', 1.65), ('pan', 1.62), ('heat', 1.56), ('stressed', 1.42), ('crack', 1.31), ('break', 1.12), ('cracked', 1.1), ('hot', 0.93), ('created', 0.9), ('factors', 0.81), ('broken', 0.73), ('caused', 0.71), ('too', 0.71), ('damage', 0.69), ('if', 0.68), ('hotter', 0.65), ('governed', 0.61), ('heating', 0.59), ('temperature', 0.59), ('adjacent', 0.59), ('cause', 0.58), ('effect', 0.57), ('fracture', 0.56), ('bradford', 0.55), ('strain', 0.53), ('hammer', 0.51), ('brian', 0.48), ('error', 0.47), ('windows', 0.45), ('will', 0.45), ('reaction', 0.42), ('create', 0.42), ('windshield', 0.41), ('heated', 0.41), ('factor', 0.4), ('cracking', 0.39), ('failure', 0.38), ('mechanical', 0.38), ('when', 0.38), ('formed', 0.38), ('bolt', 0.38), ('mechanism', 0.37), ('warm', 0.37), ('areas', 0.36), ('area', 0.36), ('energy', 0.34), ('disorder', 0.33), ('barry', 0.33), ('shock', 0.32), ('determined', 0.32), ('gage', 0.32), ('sash', 0.31), ('theory', 0.31), ('level', 0.31), ('resistant', 0.31), ('brake', 0.3), ('window', 0.3), ('crash', 0.3), ('hazard', 0.29), ('##ink', 0.27), ('ceramic', 0.27), ('storm', 0.25), ('problem', 0.25), ('issue', 0.24), ('impact', 0.24), ('fridge', 0.24), ('injury', 0.23), ('ross', 0.22), ('causes', 0.22), ('affect', 0.21), ('pressure', 0.21), ('fatigue', 0.21), ('leak', 0.21), ('eye', 0.2), ('frank', 0.2), ('cool', 0.2), ('might', 0.19), ('gravity', 0.18), ('ray', 0.18), ('static', 0.18), ('collapse', 0.18), ('physics', 0.18), ('wave', 0.18), ('reflection', 0.17), ('parker', 0.17), ('strike', 0.17), ('hottest', 0.17), ('burst', 0.16), ('chance', 0.16), ('burn', 0.14), ('rubbing', 0.14), ('interference', 0.14), ('bailey', 0.13), ('vibration', 0.12), ('gilbert', 0.12), ('produced', 0.12), ('rock', 0.12), ('warmer', 0.11), ('get', 0.11), ('drink', 0.11), ('fireplace', 0.11), ('ruin', 0.1), ('brittle', 0.1), ('fragment', 0.1), ('stumble', 0.09), ('formation', 0.09), ('shatter', 0.08), ('great', 0.08), ('friction', 0.08), ('flash', 0.07), ('cracks', 0.07), ('levels', 0.07), ('smash', 0.04), ('fail', 0.04), ('fra', 0.04), ('##glass', 0.03), ('variables', 0.03), ('because', 0.02), ('knock', 0.02), ('sun', 0.02), ('crush', 0.01), ('##e', 0.01), ('anger', 0.01)]
113
  ```
114
 
115
+ **naver/splade-v2-distil** (~48% more tokens + FLOPS = 3.82)
116
  ```python
117
  number of actual dimensions: 234
118
  SPLADE BOW rep:
 
135
  <details>
136
 
137
  **Note: Why Anserini not PISA?** *Anserini is a production ready lucene based library. Common industry search deployments use Solr or elastic which are lucene based, hence the performance can be comparable. PISA latency is irrelevant for industry as it is a a research only system.*
138
+ The full [anserini evaluation log will be updated soon]() with encoding, indexing and querying details are here.
139
 
140
  - **BEIR ZST OOD performance**: Will be added to the end of page.
141