NohTow committed
Commit 2f48436 · verified · 1 Parent(s): 7936b9a

Create README.md

Rounding of average + Python layout

Files changed (1): README.md (+10 -10)
README.md CHANGED
@@ -785,7 +785,7 @@ GTE-ModernColBERT has been trained with knowledge distillation on MS MARCO with
 However, as illustrated in the ModernBERT paper, ColBERT models can generalize to documents lengths way beyond their training length and GTE-ModernColBERT actually yields results way above SOTA in long-context embedding benchmarks, see [LongEmbed results](#longembed-benchmark).
 
 Simply change adapt the document length parameter to your needs when loading the model:
-```
+```python
 model = models.ColBERT(
 model_name_or_path=lightonai/GTE-ModernColBERT-v1,
 document_length=8192,
@@ -980,15 +980,15 @@ However, as illustrated in the ModernBERT paper, ColBERT models can generalize t
 | Model | Mean | LEMBNarrativeQARetrieval | LEMBNeedleRetrieval | LEMBPasskeyRetrieval | LEMBQMSumRetrieval | LEMBSummScreenFDRetrieval | LEMBWikimQARetrieval |
 |-----------------------------------------------|-----------|-------------------------|---------------------|----------------------|---------------------|---------------------------|----------------------|
 | GTE-ModernColBERT (with 32k document length) | **88.39** | **78.82** | **92.5** | 92 | **72.17** | 94.98 | **99.87** |
-| voyage-multilingual-2 | 79.17216667| 64.694 | 75.25 | **97** | 51.495 | **99.105** | 87.489 |
-| inf-retriever-v1 | 73.19366667| 60.702 | 61.5 | 78.75 | 55.072 | 97.387 | 85.751 |
-| snowflake-arctic-embed-l-v2,0 | 63.733 | 43.632 | 50.25 | 77.25 | 40.04 | 96.383 | 74.843 |
-| gte-multilingual-base | 62.11966667| 52.358 | 42.25 | 55.5 | 43.033 | 95.499 | 84.078 |
-| jasper_en_vision_language_v1 | 60.9325 | 37.928 | 55 | 62.25 | 41.186 | 97.206 | 72.025 |
-| bge-m3 | 58.72816667| 45.761 | 40.25 | 59 | 35.543 | 94.089 | 77.726 |
-| jina-embeddings-v3 | 55.66433333| 34.297 | 64 | 38 | 39.337 | 92.334 | 66.018 |
-| e5-base-4k | 54.50683333| 30.03 | 37.75 | 65.25 | 31.268 | 93.868 | 68.875 |
-| gte-Qwen2-7B-instruct | 47.24383333| 45.46 | 31 | 38.5 | 31.272 | 76.08 | 61.151 |
+| voyage-multilingual-2 | 79.17| 64.694 | 75.25 | **97** | 51.495 | **99.105** | 87.489 |
+| inf-retriever-v1 | 73.19 | 60.702 | 61.5 | 78.75 | 55.072 | 97.387 | 85.751 |
+| snowflake-arctic-embed-l-v2,0 | 63.73 | 43.632 | 50.25 | 77.25 | 40.04 | 96.383 | 74.843 |
+| gte-multilingual-base | 62.12| 52.358 | 42.25 | 55.5 | 43.033 | 95.499 | 84.078 |
+| jasper_en_vision_language_v1 | 60.93 | 37.928 | 55 | 62.25 | 41.186 | 97.206 | 72.025 |
+| bge-m3 | 58.73 | 45.761 | 40.25 | 59 | 35.543 | 94.089 | 77.726 |
+| jina-embeddings-v3 | 55.66| 34.297 | 64 | 38 | 39.337 | 92.334 | 66.018 |
+| e5-base-4k | 54.51| 30.03 | 37.75 | 65.25 | 31.268 | 93.868 | 68.875 |
+| gte-Qwen2-7B-instruct | 47.24| 45.46 | 31 | 38.5 | 31.272 | 76.08 | 61.151 |
 
 
 ModernBERT itself has only been trained on 8K context length, but it seems that GTE-ModernColBERT can generalize to even bigger context sizes, though it is not guaranteed so please perform your own benches!
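For context, a minimal sketch of the loading snippet edited above, assuming the README relies on PyLate's `models.ColBERT` (the import, the quotes around the model path, and the closing parenthesis fall outside the hunk shown, so they are inferred here); the `encode` call is a hypothetical usage illustration, not part of this commit:

```python
# Sketch only: assumes PyLate (`pip install pylate`); the hunk above shows just
# the constructor arguments, so the import and call structure are inferred.
from pylate import models

model = models.ColBERT(
    model_name_or_path="lightonai/GTE-ModernColBERT-v1",
    document_length=8192,  # adjust to your corpus; the LongEmbed table above used a 32k document length
)

# Hypothetical document-side encoding with the extended length.
doc_embeddings = model.encode(
    ["A long document that exceeds the default training length ..."],
    is_query=False,  # document-side (not query-side) encoding in PyLate
)
```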