Extended the ICM paper to show cross-model capability transfer: we used Qwen3's mathematical reasoning to improve Gemma3 without any human supervision.
Key results:
Qwen3-0.6B: 63.2 → 66.0 on MATH-500 (+2.8 points, ~4% relative)
Gemma3-1B: 41.0 → 45.6 on MATH-500 (+4.6 points, ~11% relative)
The method extracts coherent reasoning patterns from one model via Internal Coherence Maximization (ICM), converts them into DPO training data, and uses that data to improve a completely different model architecture.
This goes beyond the original ICM paper, which only improved models using their own labels. We're showing that capabilities can transfer between arbitrary models; imagine extracting capabilities from strong models to improve your local ones. A sketch of the pipeline follows below.
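To make the transfer step concrete, here is a minimal sketch using Hugging Face TRL's DPOTrainer. The preference pair, base checkpoint, and hyperparameters below are illustrative assumptions for this sketch, not the exact configuration from our runs; see the blog post for the full methodology.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Target model: a different architecture from the source of the labels.
# (Checkpoint name and hyperparameters here are illustrative assumptions.)
base_model = "google/gemma-3-1b-it"
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# ICM labels solutions sampled from the source model (Qwen3) by mutual
# consistency; a coherent/incoherent pair for the same problem becomes
# one DPO preference example. This single pair is a toy illustration.
pairs = [
    {
        "prompt": "Solve: if 3x + 5 = 20, what is x?",
        "chosen": "3x = 20 - 5 = 15, so x = 15 / 3 = 5.",  # ICM-coherent
        "rejected": "3x = 20 + 5 = 25, so x = 25 / 3.",    # ICM-incoherent
    },
]
train_dataset = Dataset.from_list(pairs)

# Standard TRL DPO fine-tuning; no human labels anywhere in the loop.
args = DPOConfig(output_dir="gemma-3-1b-it-ICM-DPO", beta=0.1)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # tokenizer= in older TRL versions
)
trainer.train()
```

The key design point is that the preference signal comes entirely from the source model's internal coherence, so no annotation budget is needed to port the capability.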
Models available:
codelion/Qwen3-0.6B-ICM-DPO
codelion/gemma-3-1b-it-ICM-DPO
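Both are standard causal LM checkpoints, so they load with the usual transformers APIs. The prompt below is just an illustration, not from the MATH-500 eval set:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Either released checkpoint works here.
model_id = "codelion/gemma-3-1b-it-ICM-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt.
prompt = "Solve: what is the sum of the first 10 positive integers?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```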
Complete collection with code and datasets:
codelion/internal-coherence-maximization-687a1bd1c1f5f1d6f76e9b3b
Full methodology and results:
https://huggingface.co/blog/codelion/internal-coherence-maximization
Planning to extend this to code generation next. The approach could enable community-driven capability sharing between different model families without expensive annotation.