---
license: mit
library_name: transformers
pipeline_tag: text-generation
---

This is the base Qwen2.5-Math-1.5B model used by HAPO. We change `rope_theta` from 10000 to 40000 and extend the context window to 16k. We also modify the `chat_template` for the system prompt and add .

# Citation

If you find our model, data, or evaluation code useful, please cite our paper:

```bib
@misc{liu2025uniformheterogeneoustailoringpolicy,
      title={From Uniform to Heterogeneous: Tailoring Policy Optimization to Every Token's Nature},
      author={Zheng Liu and Mengjie Liu and Siwei Wen and Mengzhang Cai and Bin Cui and Conghui He and Wentao Zhang},
      year={2025},
      eprint={2509.16591},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.16591},
}
```
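As a rough illustration of why raising `rope_theta` from 10000 to 40000 accompanies the longer 16k context window, the sketch below computes standard RoPE inverse frequencies, `theta^(-2i/d)`, and the period in tokens of the slowest-rotating dimension pair. It assumes the textbook RoPE formulation and a per-head dimension of 128 (`HEAD_DIM` is an illustrative assumption, not read from this model's config). The helper names `rope_inv_freq` and `longest_wavelength` are hypothetical and are not part of `transformers`.

```python
import math


def rope_inv_freq(theta: float, head_dim: int) -> list[float]:
    """Standard RoPE inverse frequencies: theta^(-2i/d) for i in [0, d/2)."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(theta: float, head_dim: int) -> float:
    """Period, in tokens, of the slowest-rotating dimension pair."""
    return 2 * math.pi / min(rope_inv_freq(theta, head_dim))


# Assumed per-head dimension, for illustration only.
HEAD_DIM = 128

for theta in (10_000, 40_000):
    wl = longest_wavelength(theta, HEAD_DIM)
    print(f"rope_theta={theta}: longest wavelength ~ {wl:,.0f} tokens")
```

Under these assumptions, a 4x larger `rope_theta` stretches the longest wavelength by a factor of roughly 4^(126/128), slightly less than 4. The low-frequency dimensions therefore rotate more slowly, so distant positions remain more distinguishable over a longer window.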