NohTow committed
Commit 2f48436 · verified · 1 Parent(s): 7936b9a

Create README.md

Rounding of average + Python layout

Files changed (1): README.md (+10 -10)
README.md CHANGED
@@ -785,7 +785,7 @@ GTE-ModernColBERT has been trained with knowledge distillation on MS MARCO with
 However, as illustrated in the ModernBERT paper, ColBERT models can generalize to documents lengths way beyond their training length and GTE-ModernColBERT actually yields results way above SOTA in long-context embedding benchmarks, see [LongEmbed results](#longembed-benchmark).
 
 Simply change adapt the document length parameter to your needs when loading the model:
-```
+```python
 model = models.ColBERT(
 model_name_or_path=lightonai/GTE-ModernColBERT-v1,
 document_length=8192,
@@ -980,15 +980,15 @@ However, as illustrated in the ModernBERT paper, ColBERT models can generalize t
 | Model | Mean | LEMBNarrativeQARetrieval | LEMBNeedleRetrieval | LEMBPasskeyRetrieval | LEMBQMSumRetrieval | LEMBSummScreenFDRetrieval | LEMBWikimQARetrieval |
 |-----------------------------------------------|-----------|-------------------------|---------------------|----------------------|---------------------|---------------------------|----------------------|
 | GTE-ModernColBERT (with 32k document length) | **88.39** | **78.82** | **92.5** | 92 | **72.17** | 94.98 | **99.87** |
-| voyage-multilingual-2 | 79.17216667| 64.694 | 75.25 | **97** | 51.495 | **99.105** | 87.489 |
-| inf-retriever-v1 | 73.19366667| 60.702 | 61.5 | 78.75 | 55.072 | 97.387 | 85.751 |
-| snowflake-arctic-embed-l-v2,0 | 63.733 | 43.632 | 50.25 | 77.25 | 40.04 | 96.383 | 74.843 |
-| gte-multilingual-base | 62.11966667| 52.358 | 42.25 | 55.5 | 43.033 | 95.499 | 84.078 |
-| jasper_en_vision_language_v1 | 60.9325 | 37.928 | 55 | 62.25 | 41.186 | 97.206 | 72.025 |
-| bge-m3 | 58.72816667| 45.761 | 40.25 | 59 | 35.543 | 94.089 | 77.726 |
-| jina-embeddings-v3 | 55.66433333| 34.297 | 64 | 38 | 39.337 | 92.334 | 66.018 |
-| e5-base-4k | 54.50683333| 30.03 | 37.75 | 65.25 | 31.268 | 93.868 | 68.875 |
-| gte-Qwen2-7B-instruct | 47.24383333| 45.46 | 31 | 38.5 | 31.272 | 76.08 | 61.151 |
+| voyage-multilingual-2 | 79.17| 64.694 | 75.25 | **97** | 51.495 | **99.105** | 87.489 |
+| inf-retriever-v1 | 73.19 | 60.702 | 61.5 | 78.75 | 55.072 | 97.387 | 85.751 |
+| snowflake-arctic-embed-l-v2,0 | 63.73 | 43.632 | 50.25 | 77.25 | 40.04 | 96.383 | 74.843 |
+| gte-multilingual-base | 62.12| 52.358 | 42.25 | 55.5 | 43.033 | 95.499 | 84.078 |
+| jasper_en_vision_language_v1 | 60.93 | 37.928 | 55 | 62.25 | 41.186 | 97.206 | 72.025 |
+| bge-m3 | 58.73 | 45.761 | 40.25 | 59 | 35.543 | 94.089 | 77.726 |
+| jina-embeddings-v3 | 55.66| 34.297 | 64 | 38 | 39.337 | 92.334 | 66.018 |
+| e5-base-4k | 54.51| 30.03 | 37.75 | 65.25 | 31.268 | 93.868 | 68.875 |
+| gte-Qwen2-7B-instruct | 47.24| 45.46 | 31 | 38.5 | 31.272 | 76.08 | 61.151 |
 
 
 ModernBERT itself has only been trained on 8K context length, but it seems that GTE-ModernColBERT can generalize to even bigger context sizes, though it is not guaranteed so please perform your own benches!
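For context, a minimal sketch of the loading snippet edited above, assuming the README relies on PyLate's `models.ColBERT` (the import, the quotes around the model path, and the closing parenthesis fall outside the hunk shown, so they are inferred here); the `encode` call is a hypothetical usage illustration, not part of this commit:

```python
# Sketch only: assumes PyLate (`pip install pylate`); the hunk above shows just
# the constructor arguments, so the import and call structure are inferred.
from pylate import models

model = models.ColBERT(
    model_name_or_path="lightonai/GTE-ModernColBERT-v1",
    document_length=8192,  # adjust to your corpus; the LongEmbed table above used a 32k document length
)

# Hypothetical document-side encoding with the extended length.
doc_embeddings = model.encode(
    ["A long document that exceeds the default training length ..."],
    is_query=False,  # document-side (not query-side) encoding in PyLate
)
```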