---
library_name: model2vec
license: mit
model_name: tmpqsu1ee6a
tags:
- embeddings
- static-embeddings
datasets:
- HuggingFaceFW/fineweb-edu-llama3-annotations
language:
- en
base_model:
- minishlab/potion-base-8M
---

# potion-8m-edu-classifier Model Card

This [Model2Vec](https://github.com/MinishLab/model2vec) model is a fine-tuned version of [potion-base-8m](https://huggingface.co/minishlab/potion-base-8M).
It was trained to predict the educational value of text, serving the same purpose as the [fineweb-edu-classifier](https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier), which was used to filter web data for educational content.

It achieves the following performance on the evaluation split:

```
              precision    recall  f1-score   support

           0       0.70      0.42      0.52      5694
           1       0.75      0.86      0.80     26512
           2       0.55      0.51      0.53     10322
           3       0.54      0.45      0.49      3407
           4       0.59      0.30      0.40       807
           5       0.00      0.00      0.00         1

    accuracy                           0.69     46743
   macro avg       0.52      0.42      0.46     46743
weighted avg       0.68      0.69      0.68     46743
```

When thresholded into a binary classifier, it achieves a macro-averaged F1-score of `0.79`. The original fineweb-edu-classifier achieves `0.81` on the same dataset, but this model is orders of magnitude faster on CPU.

```
              precision    recall  f1-score   support

     not edu       0.96      0.98      0.97     42528
         edu       0.70      0.54      0.61      4215

    accuracy                           0.94     46743
   macro avg       0.83      0.76      0.79     46743
weighted avg       0.93      0.94      0.93     46743
```
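The binary labels above follow from the multi-class scores: the "edu" support of 4215 equals the combined support of scores 3, 4, and 5 (3407 + 807 + 1), so scores of 3 and above are treated as educational, matching the FineWeb-Edu filtering convention. A minimal sketch of that mapping (the function name is illustrative, not part of the model2vec API):

```python
# Map a 0-5 educational score to the binary edu / not-edu label.
# Scores >= 3 count as educational, consistent with the binary
# support above (3407 + 807 + 1 = 4215 "edu" examples).
def binarize(score: int) -> str:
    return "edu" if score >= 3 else "not edu"

print([binarize(s) for s in [0, 2, 3, 5]])
# ['not edu', 'not edu', 'edu', 'edu']
```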

## Installation

Install model2vec with the inference extra using pip:
```
pip install "model2vec[inference]"
```
(The quotes keep the `[inference]` extra from being interpreted as a glob in shells like zsh.)

## Usage
Load this model using the `from_pretrained` method:
```python
from model2vec.inference import StaticModelPipeline

# Load a pretrained Model2Vec model
model = StaticModelPipeline.from_pretrained("minishlab/potion-8m-edu-classifier")

# Predict labels for a batch of texts; returns one label per input
labels = model.predict(["Example sentence"])
```

## Library Authors

Model2Vec was developed by [Minish](https://github.com/MinishLab).

## Citation

Please cite the [Model2Vec repository](https://github.com/MinishLab/model2vec) if you use this model in your work.
```
@software{minishlab2024model2vec,
  author = {Stephan Tulkens and Thomas van Dongen},
  title = {Model2Vec: Turn any Sentence Transformer into a Small Fast Model},
  year = {2024},
  url = {https://github.com/MinishLab/model2vec},
}
```