	---
	library_name: model2vec
	license: mit
	model_name: tmpqsu1ee6a
	tags:
	- embeddings
	- static-embeddings
	datasets:
	- HuggingFaceFW/fineweb-edu-llama3-annotations
	language:
	- en
	base_model:
	- minishlab/potion-base-8M
	---

	# potion-8m-edu-classifier Model Card

	This [Model2Vec](https://github.com/MinishLab/model2vec) model is a fine-tuned version of [potion-base-8M](https://huggingface.co/minishlab/potion-base-8M).
	It was trained to predict the educational value of a text on a 0 to 5 scale, analogous to how the [fineweb-edu-classifier](https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier) was used to filter educational content.

	It achieves the following performance on the evaluation split:

	```
	              precision    recall  f1-score   support

	           0       0.70      0.42      0.52      5694
	           1       0.75      0.86      0.80     26512
	           2       0.55      0.51      0.53     10322
	           3       0.54      0.45      0.49      3407
	           4       0.59      0.30      0.40       807
	           5       0.00      0.00      0.00         1

	    accuracy                           0.69     46743
	   macro avg       0.52      0.42      0.46     46743
	weighted avg       0.68      0.69      0.68     46743
	```
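As a rough illustration of how the two average rows in the report above relate to the per-class rows, the sketch below recomputes the macro and weighted F1-scores from the rounded values in the table. The variable names are illustrative only, and because the inputs are already rounded to two decimals, the results are approximate.

```python
# Per-class F1-scores and supports for classes 0-5, copied from the report above.
f1_scores = [0.52, 0.80, 0.53, 0.49, 0.40, 0.00]
supports = [5694, 26512, 10322, 3407, 807, 1]

# Macro average: plain mean over classes, so the single class-5 example
# pulls the score down as much as any large class would.
macro_f1 = sum(f1_scores) / len(f1_scores)

# Weighted average: each class contributes in proportion to its support.
weighted_f1 = sum(f * s for f, s in zip(f1_scores, supports)) / sum(supports)

print(f"macro F1:    {macro_f1:.2f}")     # 0.46
print(f"weighted F1: {weighted_f1:.2f}")  # 0.68
```

This is why the macro average is noticeably lower than the weighted average: the rare classes 4 and 5 score poorly but carry little weight in the support-weighted figure.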

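	When the scores are thresholded into a binary label (3 and above count as educational, which matches the class supports above: 3407 + 807 + 1 = 4215), the model achieves a macro-averaged F1-score of `0.79`. The original classifier achieves `0.81` on the same dataset, but this classifier is orders of magnitude faster on CPU.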

	```
	              precision    recall  f1-score   support

	     not edu       0.96      0.98      0.97     42528
	         edu       0.70      0.54      0.61      4215

	    accuracy                           0.94     46743
	   macro avg       0.83      0.76      0.79     46743
	weighted avg       0.93      0.94      0.93     46743
	```
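The binary thresholding can be sketched as follows. The cut-off of 3 is an assumption: the card does not state it, but it is the only split of the six classes that reproduces the supports in the binary report (42528 and 4215). The helper name `to_binary` is hypothetical and not part of Model2Vec.

```python
def to_binary(label, threshold=3):
    """Collapse a 0-5 educational score into a binary label.

    The threshold of 3 is an assumption inferred from the class supports.
    """
    return "edu" if int(label) >= threshold else "not edu"


# Supports per class, copied from the six-class report.
supports = {0: 5694, 1: 26512, 2: 10322, 3: 3407, 4: 807, 5: 1}

# Regroup the six classes into the two binary classes.
binary_supports = {"not edu": 0, "edu": 0}
for label, count in supports.items():
    binary_supports[to_binary(label)] += count

print(binary_supports)  # {'not edu': 42528, 'edu': 4215}

# The macro-averaged F1 is the plain mean of the two per-class F1-scores.
macro_f1 = (0.97 + 0.61) / 2
print(f"{macro_f1:.2f}")  # 0.79
```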

	## Installation

	Install model2vec with the `inference` extra using pip (the quotes keep shells such as zsh from interpreting the square brackets):
	```
	pip install "model2vec[inference]"
	```

	## Usage
	Load this model using the `from_pretrained` method:
	```python
	from model2vec.inference import StaticModelPipeline

	# Load the pretrained classifier pipeline from the Hugging Face Hub
	model = StaticModelPipeline.from_pretrained("minishlab/potion-8m-edu-classifier")

	# Predict one label per input text
	labels = model.predict(["Example sentence"])
	```
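One possible way to use the predictions for filtering, in the spirit of the fineweb-edu pipeline, is sketched below. The helper `filter_educational`, the stand-in `fake_predict`, and the threshold of 3 are all hypothetical and not part of Model2Vec. The sketch also assumes that the predicted labels can be converted with `int()`, which is not confirmed by this card.

```python
from typing import Callable, Iterable, List, Sequence


def filter_educational(
    texts: Sequence[str],
    predict: Callable[[List[str]], Iterable],
    threshold: int = 3,
) -> List[str]:
    """Keep only the texts whose predicted score is at least `threshold`.

    `predict` is any callable that maps a list of texts to one label per text.
    Labels are assumed to be convertible with int() (an assumption).
    """
    labels = predict(list(texts))
    return [text for text, label in zip(texts, labels) if int(label) >= threshold]


def fake_predict(texts: List[str]) -> List[int]:
    """Stand-in predictor so the sketch runs without downloading the model."""
    return [4 if "photosynthesis" in text.lower() else 0 for text in texts]


docs = [
    "Photosynthesis converts light into chemical energy.",
    "Buy cheap watches now!",
]
kept = filter_educational(docs, fake_predict)
print(kept)  # ['Photosynthesis converts light into chemical energy.']
```

With the real model loaded as shown above, `model.predict` could be passed in place of `fake_predict`, provided its labels satisfy the `int()` assumption.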

	## Library Authors

	Model2Vec was developed by [Minish](https://github.com/MinishLab).

	## Citation

	Please cite the [Model2Vec repository](https://github.com/MinishLab/model2vec) if you use this model in your work.
	```
	@software{minishlab2024model2vec,
	  author = {Stephan Tulkens and Thomas van Dongen},
	  title = {Model2Vec: Turn any Sentence Transformer into a Small Fast Model},
	  year = {2024},
	  url = {https://github.com/MinishLab/model2vec},
	}
	```