This summer, TRL leveled up for multimodal alignment:
- New VLM alignment methods (MPO, GRPO, GSPO)
- Extended RLOO & Online DPO for VLMs
- Native SFT support
- Ready-to-use training scripts
Read the blog: https://huggingface.co/blog/trl-vlm-alignment