---
library_name: model2vec
license: mit
model_name: tmpqsu1ee6a
tags:
- embeddings
- static-embeddings
datasets:
- HuggingFaceFW/fineweb-edu-llama3-annotations
language:
- en
base_model:
- minishlab/potion-base-8M
---
# potion-8m-edu-classifier Model Card
This [Model2Vec](https://github.com/MinishLab/model2vec) model is a fine-tuned version of [potion-base-8M](https://huggingface.co/minishlab/potion-base-8M).
It was trained to predict how educational a piece of text is on a 0 to 5 scale, analogous to how the [fineweb-edu-classifier](https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier) was used to filter educational content.
It achieves the following performance on the evaluation split:
```
              precision    recall  f1-score   support

           0       0.70      0.42      0.52      5694
           1       0.75      0.86      0.80     26512
           2       0.55      0.51      0.53     10322
           3       0.54      0.45      0.49      3407
           4       0.59      0.30      0.40       807
           5       0.00      0.00      0.00         1

    accuracy                           0.69     46743
   macro avg       0.52      0.42      0.46     46743
weighted avg       0.68      0.69      0.68     46743
```
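When the scores are thresholded into a binary label (the support counts are consistent with treating scores of 3 and above as educational), the model achieves a macro-averaged F1-score of `0.79`. The original classifier achieves `0.81` on the same dataset, but this classifier is orders of magnitude faster on CPU.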
```
              precision    recall  f1-score   support

     not edu       0.96      0.98      0.97     42528
         edu       0.70      0.54      0.61      4215

    accuracy                           0.94     46743
   macro avg       0.83      0.76      0.79     46743
weighted avg       0.93      0.94      0.93     46743
```
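The binary report can be related back to the six-class report with a little arithmetic. The sketch below assumes a cut-off of 3, which is inferred from the support counts (3407 + 807 + 1 = 4215) rather than stated in this card; `to_binary` and `EDU_THRESHOLD` are hypothetical names used only for illustration.

```python
# Hypothetical helper: collapse 0-5 educational scores into a binary label.
# The cut-off of 3 is an assumption inferred from the support counts in the
# reports above; it is not stated explicitly in this model card.
EDU_THRESHOLD = 3


def to_binary(scores, threshold=EDU_THRESHOLD):
    """Return "edu" for scores at or above the threshold, else "not edu"."""
    # int() accepts both integer labels and numeric strings such as "3".
    return ["edu" if int(s) >= threshold else "not edu" for s in scores]


# Per-class support copied from the six-class report.
support = {0: 5694, 1: 26512, 2: 10322, 3: 3407, 4: 807, 5: 1}

edu_support = sum(n for score, n in support.items() if score >= EDU_THRESHOLD)
not_edu_support = sum(n for score, n in support.items() if score < EDU_THRESHOLD)

# Macro-averaged F1 is the unweighted mean of the per-class F1 scores.
binary_f1 = {"not edu": 0.97, "edu": 0.61}
macro_f1 = sum(binary_f1.values()) / len(binary_f1)
```

Under this assumption the collapsed supports match the binary report (42528 and 4215), and the mean of the two per-class F1 scores rounds to the reported `0.79`.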
## Installation
Install model2vec with the inference extra using pip:
```
pip install "model2vec[inference]"
```
## Usage
Load this model using the `from_pretrained` method:
```python
from model2vec.inference import StaticModelPipeline
# Load the fine-tuned classifier pipeline from the Hugging Face Hub
model = StaticModelPipeline.from_pretrained("minishlab/potion-8m-edu-classifier")
# Predict one label per input text
labels = model.predict(["Example sentence"])
```
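Since the classifier is meant for filtering, a typical next step is to keep only the texts that score highly. The following is a minimal sketch rather than part of the library: `filter_educational` and `fake_predict` are hypothetical names, the threshold of 3 is an assumption, and the stand-in predictor exists only so the example runs without downloading the model.

```python
from typing import Callable, Iterable, List, Sequence


def filter_educational(
    texts: Sequence[str],
    predict: Callable[[List[str]], Iterable],
    threshold: int = 3,
) -> List[str]:
    """Keep the texts whose predicted score is at or above the threshold."""
    scores = predict(list(texts))
    # int() is used so that numeric-string labels are handled as well
    # (assumption: predicted labels are scores 0-5 or their string form).
    return [text for text, score in zip(texts, scores) if int(score) >= threshold]


# Stand-in predictor so the sketch runs without the real model.
def fake_predict(batch: List[str]) -> List[int]:
    return [4 if "photosynthesis" in text else 1 for text in batch]


docs = ["How photosynthesis converts light into sugar", "Buy cheap watches now"]
kept = filter_educational(docs, fake_predict)
```

With the real pipeline, `model.predict` would be passed in place of `fake_predict`, provided the predicted labels are numeric scores as assumed here.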
## Library Authors
Model2Vec was developed by [Minish](https://github.com/MinishLab).
## Citation
Please cite the [Model2Vec repository](https://github.com/MinishLab/model2vec) if you use this model in your work.
```
@software{minishlab2024model2vec,
author = {Stephan Tulkens and Thomas van Dongen},
title = {Model2Vec: Turn any Sentence Transformer into a Small Fast Model},
year = {2024},
url = {https://github.com/MinishLab/model2vec},
}
```