
James-WYang/ICR_M0_Llama-3-Base-8B-SFT-DPO_en_es_ru_de_fr
8B
All checkpoints for "Implicit Cross-Lingual Rewarding for Efficient Multilingual Preference Alignment", https://arxiv.org/abs/2503.04647