---
pipeline_tag: image-text-to-text
library_name: transformers
license: apache-2.0
---

[![arXiv](https://img.shields.io/badge/arXiv-2509.21268-b31b1b.svg)](https://arxiv.org/abs/2509.21268) [![Hugging Face Paper](https://img.shields.io/badge/HuggingFace-Paper-FFAE1A)](https://huggingface.co/papers/2509.21268) [![GitHub Code](https://img.shields.io/badge/GitHub-Code-keygen.svg?logo=github)](https://github.com/LengSicong/MMR1)

# MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

This repository introduces the **MMR1** family of multimodal reasoning models, presented in the paper "[MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources](https://huggingface.co/papers/2509.21268)". MMR1 addresses two critical limitations in the development of large multimodal reasoning models: the absence of open, large-scale, high-quality long chain-of-thought (CoT) data, and the instability of reinforcement learning (RL) algorithms during post-training.

## Key Contributions

- **Variance-Aware Sampling (VAS)**: A novel data selection strategy guided by a Variance Promotion Score (VPS). VAS combines outcome variance and trajectory diversity to promote reward variance, stabilize policy optimization, and improve convergence, especially in scenarios where Group Relative Policy Optimization (GRPO) is prone to gradient vanishing.
- **Large-scale Curated Resources**: Carefully curated datasets, including ~1.6M long-CoT cold-start examples and ~15k RL QA pairs, designed to ensure quality, difficulty, and diversity.
- **Open-source Codebase & Models**: A fully reproducible end-to-end training codebase and a family of open multimodal reasoning models at multiple scales (3B, 7B, 32B), establishing standardized baselines for the community.

## Methodology Overview

MMR1 introduces **Variance-Aware Sampling (VAS)** to mitigate the *gradient vanishing* problem in RL fine-tuning with GRPO: when all rollouts for a prompt receive identical rewards, the group-normalized advantages collapse to zero and the prompt contributes no learning signal. VAS balances exploration and coverage by combining a random sampler with a weighted sampler guided by the Variance Promotion Score (VPS), concentrating training on prompts that provide strong learning signals; VPS scores are periodically re-estimated so the sampling distribution adapts as the policy improves. A minimal sketch of the idea follows.
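The paper's exact VPS definition and sampling schedule live in the released codebase; the snippet below is only an illustrative sketch, assuming binary (0/1) outcome rewards and some pairwise trajectory-dissimilarity metric. The diversity weight `lambda_div`, the 50/50 random/weighted mix, and all function names are assumptions for illustration, not the authors' implementation.

```python
import random
from statistics import pvariance

def variance_promotion_score(rewards, pairwise_distances, lambda_div=0.5):
    """Illustrative VPS: outcome variance plus a trajectory-diversity bonus.

    rewards: 0/1 outcome rewards from a group of rollouts for one prompt.
    pairwise_distances: pairwise dissimilarities (in [0, 1]) between the
        sampled reasoning trajectories for that prompt.
    """
    outcome_var = pvariance(rewards)  # equals p * (1 - p) for binary rewards
    diversity = (sum(pairwise_distances) / len(pairwise_distances)
                 if pairwise_distances else 0.0)
    return outcome_var + lambda_div * diversity

def sample_batch(prompts, vps, batch_size, random_ratio=0.5, seed=None):
    """Mix a uniform sampler (exploration/coverage) with a VPS-weighted one."""
    rng = random.Random(seed)
    n_random = int(batch_size * random_ratio)
    batch = rng.sample(prompts, n_random)
    pool = [p for p in prompts if p not in set(batch)]
    weights = [max(vps[p], 1e-6) for p in pool]  # avoid all-zero weights
    # Weighted draw (with replacement, for brevity) toward high-VPS prompts.
    batch += rng.choices(pool, weights=weights, k=batch_size - n_random)
    return batch
```

Note that for binary rewards the outcome-variance term p(1 - p) peaks at 50% rollout accuracy, so a VPS-weighted sampler naturally up-weights prompts of intermediate difficulty, exactly the prompts where GRPO's group-relative advantages are non-zero and informative.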

## Open Resources

The project open-sources the following resources for the community:

- **[MMR1-SFT Dataset](https://huggingface.co/datasets/MMR1/MMR1-SFT)** (~1.6M): Supervised fine-tuning dataset with long chain-of-thought (CoT) cold-start trajectories.
- **[MMR1-RL Dataset](https://huggingface.co/datasets/MMR1/MMR1-RL)** (~15k): Reinforcement learning dataset of question-answer pairs.
- **[MMR1-3B-SFT](https://huggingface.co/MMR1/MMR1-3B-SFT)**, **[MMR1-7B-SFT](https://huggingface.co/MMR1/MMR1-7B-SFT)**, **[MMR1-32B-SFT](https://huggingface.co/MMR1/MMR1-32B-SFT)**: Checkpoints trained with MMR1-SFT.
- **[MMR1-3B-RL](https://huggingface.co/MMR1/MMR1-3B-RL)**, **[MMR1-7B-RL](https://huggingface.co/MMR1/MMR1-7B-RL)**, **[MMR1-32B-RL](https://huggingface.co/MMR1/MMR1-32B-RL)**: Checkpoints trained with MMR1-SFT followed by MMR1-RL.

These resources span diverse domains, including mathematics, science, charts/figures, document tables, and general understanding, and integrate existing public resources with newly curated data. A hedged inference sketch for the released checkpoints is given below.

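This card does not include an official loading snippet, so the following is a minimal inference sketch under stated assumptions: the checkpoints are Qwen-based vision-language models (per the license note) and load through the generic image-text-to-text auto classes of a recent `transformers` release; the image path and prompt are placeholders. See the GitHub repository for the authors' reference code.

```python
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "MMR1/MMR1-7B-RL"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Any local image; "problem.png" is a placeholder.
image = Image.open("problem.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Solve the problem in the figure step by step."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

# Long-CoT models benefit from a generous generation budget.
output_ids = model.generate(**inputs, max_new_tokens=2048)
answer = processor.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(answer)
```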
## Evaluation Results

MMR1 models were evaluated on a suite of mathematics-related multimodal reasoning benchmarks: MathVerse, MathVista, MathVision, LogicVista, and ChartQA.

- **MMR1-7B-RL** achieves an average score of **58.4**, setting a new state of the art among 7B-scale reasoning models.
- **MMR1-3B-RL** achieves a competitive **52.7**, demonstrating strong reasoning capability even at a smaller scale.

These results support the effectiveness of Variance-Aware Sampling (VAS) and the curated long-CoT training data. For detailed instructions on installation, training, and evaluation, please refer to the [GitHub repository](https://github.com/LengSicong/MMR1).

## Citation

If you find MMR1 useful for your research or applications, please cite using this BibTeX:

```bibtex
@misc{leng2025mmr1,
  title={MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources},
  author={Sicong Leng and Jing Wang and Jiaxi Li and Hao Zhang and Zhiqiang Hu and Boqiang Zhang and Yuming Jiang and Hang Zhang and Xin Li and Lidong Bing and Deli Zhao and Wei Lu and Yu Rong and Aixin Sun and Shijian Lu},
  year={2025},
  eprint={2509.21268},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.21268},
}
```

## License

This project is released under the Apache 2.0 license, as found in the LICENSE file. The service is a research preview intended for **non-commercial use only**, subject to the model license of Qwen, the Terms of Use of data generated by OpenAI and Gemini, and the privacy practices of ShareGPT. Please contact us if you find any potential violations.