---
license: mit
pipeline_tag: text-classification
library_name: transformers
base_model: Alibaba-NLP/gte-Qwen2-1.5B-instruct
tags:
- math
- science
- academic
- reasoning
- verification
- weaver
- cross-encoder
- multi-domain
language:
- en
---

# Weaver Distilled for All Datasets (gte-Qwen2-1.5B-instruct)

A general-purpose distilled cross-encoder model based on gte-Qwen2-1.5B-instruct, trained to predict the correctness of reasoning responses across three domains: mathematics (MATH500), science (GPQA), and academic knowledge (MMLU-Pro). The verifier was trained on Weaver scores aggregated from 35 LM judges and reward models.

## Model Details

- **Base Model**: [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct) (1.5B parameters)
- **Architecture**: Cross-encoder with an MLP scoring head (1536 → 768 → 384 → 1); see the sketch after this list
- **Max Sequence Length**: 4096 tokens
- **Training Data**: Combined MATH500, GPQA, and MMLU-Pro with Weaver scores from 35 LM judges and reward models
- **Task**: Binary classification for answer-correctness prediction across domains
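
The MLP head maps the pooled cross-encoder embedding down to a single correctness logit. Below is a minimal sketch of that head: the layer dimensions come from the list above, but the module name, activation function, and pooling are illustrative assumptions, not the exact checkpoint layout.

```python
import torch.nn as nn

# Hypothetical sketch of the scoring head: three linear layers mapping the
# 1536-dim pooled embedding to a single logit. The activation choice is an
# assumption, not the exact checkpoint layout.
class ScoringHeadSketch(nn.Module):
    def __init__(self, hidden_size: int = 1536):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 768),
            nn.ReLU(),
            nn.Linear(768, 384),
            nn.ReLU(),
            nn.Linear(384, 1),  # single correctness logit
        )

    def forward(self, pooled_embedding):
        return self.mlp(pooled_embedding)
```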

## Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "hazyresearch/Weaver_Distilled_All_Datasets_gte-Qwen2-1.5B-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example usage - works across math, science, and academic domains
instruction = "What is the derivative of f(x) = 3x² + 2x - 1?"
response = "Using the power rule: f'(x) = 6x + 2. The derivative of 3x² is 6x, the derivative of 2x is 2, and the derivative of -1 is 0."

# Tokenize the (question, response) pair as a single cross-encoder input
inputs = tokenizer(
    instruction,
    response,
    truncation=True,
    max_length=4096,
    padding=True,
    return_tensors="pt",
)

# Get correctness score
with torch.no_grad():
    outputs = model(**inputs)
    score = torch.sigmoid(outputs.logits).item()

print(f"Correctness score: {score:.3f}")
print(f"Prediction: {'Correct' if score > 0.5 else 'Incorrect'}")
```
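
Because Weaver-style verification is typically used to rerank multiple candidate generations, you can also score several responses to the same question in one batch and keep the highest-scoring one. The snippet below is an illustrative continuation of the example above (the candidate strings are made up):

```python
# Score several candidate responses to one question in a single batch,
# reusing `tokenizer`, `model`, and `instruction` from the example above.
candidates = [
    "By the power rule, f'(x) = 6x + 2.",
    "f'(x) = 3x + 2.",   # incorrect
    "f'(x) = 6x^2 + 2.", # incorrect
]

batch = tokenizer(
    [instruction] * len(candidates),
    candidates,
    truncation=True,
    max_length=4096,
    padding=True,
    return_tensors="pt",
)

with torch.no_grad():
    scores = torch.sigmoid(model(**batch).logits).squeeze(-1)

best = scores.argmax().item()
print(f"Best candidate (score {scores[best].item():.3f}): {candidates[best]}")
```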

## Training Details

This model was trained using the [Weaver distillation pipeline](https://github.com/HazyResearch/scaling-verification) on a combined dataset spanning multiple reasoning domains. To train your own distilled models, see the [distillation README](https://github.com/HazyResearch/scaling-verification/blob/main/distillation/README.md).
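
At a high level, distillation fine-tunes the cross-encoder so that its logit matches Weaver's aggregated verifier score for each (question, response) pair. The following is a minimal sketch of one such training step, assuming soft Weaver scores in [0, 1] serve as targets for a binary cross-entropy loss; the data, loss choice, and loop structure are assumptions for illustration, not the pipeline's exact configuration:

```python
import torch
import torch.nn.functional as F

# One illustrative training step (hypothetical data, reusing `tokenizer`
# and `model` from the Quick Start). The logit for each (question, response)
# pair is pushed toward its aggregated Weaver score.
questions = ["What is 2 + 2?"]
responses = ["2 + 2 = 4."]
weaver_scores = torch.tensor([0.97])  # soft label in [0, 1]

batch = tokenizer(
    questions,
    responses,
    truncation=True,
    max_length=4096,
    padding=True,
    return_tensors="pt",
)

logits = model(**batch).logits.squeeze(-1)
loss = F.binary_cross_entropy_with_logits(logits, weaver_scores)
loss.backward()  # an optimizer step would follow in a real training loop
```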

## Citation

```bibtex
@misc{saadfalcon2025shrinkinggenerationverificationgapweak,
      title={Shrinking the Generation-Verification Gap with Weak Verifiers},
      author={Jon Saad-Falcon and E. Kelly Buchanan and Mayee F. Chen and Tzu-Heng Huang and Brendan McLaughlin and Tanvir Bhathal and Shang Zhu and Ben Athiwaratkun and Frederic Sala and Scott Linderman and Azalia Mirhoseini and Christopher Ré},
      year={2025},
      eprint={2506.18203},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2506.18203},
}
```