DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Abstract
Diffusion Negative-aware FineTuning (DiffusionNFT) optimizes diffusion models directly on the forward process via flow matching, improving efficiency and performance compared to existing methods.
Online reinforcement learning (RL) has been central to post-training language models, but its extension to diffusion models remains challenging due to intractable likelihoods. Recent works discretize the reverse sampling process to enable GRPO-style training, yet they inherit fundamental drawbacks, including solver restrictions, forward-reverse inconsistency, and complicated integration with classifier-free guidance (CFG). We introduce Diffusion Negative-aware FineTuning (DiffusionNFT), a new online RL paradigm that optimizes diffusion models directly on the forward process via flow matching. DiffusionNFT contrasts positive and negative generations to define an implicit policy improvement direction, naturally incorporating reinforcement signals into the supervised learning objective. This formulation enables training with arbitrary black-box solvers, eliminates the need for likelihood estimation, and requires only clean images rather than sampling trajectories for policy optimization. DiffusionNFT is up to 25× more efficient than FlowGRPO in head-to-head comparisons, while being CFG-free. For instance, DiffusionNFT improves the GenEval score from 0.24 to 0.98 within 1k steps, whereas FlowGRPO achieves 0.95 with over 5k steps and the additional use of CFG. By leveraging multiple reward models, DiffusionNFT significantly boosts the performance of SD3.5-Medium on every benchmark tested.
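To make the forward-process idea concrete, below is a minimal, hypothetical PyTorch sketch of a reward-aware flow-matching update that consumes only clean images and scalar rewards (no sampling trajectories, no likelihood estimates). It is a simplification, not the paper's exact negative-aware objective: positives are pulled toward the standard flow-matching target, while negatives are merely regularized toward a frozen reference model; all names (`velocity_model`, `ref_model`, `rewards`) are placeholders.

```python
# Hedged sketch, not the exact DiffusionNFT objective: a forward-process,
# flow-matching-style update driven only by clean images and scalar rewards.
import torch

def forward_process_rl_loss(velocity_model, ref_model, images, rewards):
    b = images.shape[0]
    t = torch.rand(b, device=images.device).view(b, 1, 1, 1)  # forward-process time
    noise = torch.randn_like(images)

    # Rectified-flow forward process: x_t interpolates between data and noise.
    x_t = (1.0 - t) * images + t * noise
    data_target = noise - images                               # standard flow-matching target

    pred = velocity_model(x_t, t.flatten())
    with torch.no_grad():
        ref_pred = ref_model(x_t, t.flatten())                 # frozen reference prediction

    # Split the batch by reward sign relative to the batch-mean baseline.
    advantage = rewards - rewards.mean()
    is_pos = (advantage > 0).float().view(b, 1, 1, 1)

    pos_term = (pred - data_target) ** 2   # reinforce positive generations toward the data
    neg_term = (pred - ref_pred) ** 2      # keep negative generations close to the reference
    return (is_pos * pos_term + (1.0 - is_pos) * neg_term).mean()
```

Because the update only needs (image, reward) pairs, the generations themselves can be produced by any black-box sampler, which is the property the abstract emphasizes.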
Community
A new online diffusion RL method based on the forward process
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Coefficients-Preserving Sampling for Reinforcement Learning with Flow Matching (2025)
- Revisiting Diffusion Q-Learning: From Iterative Denoising to One-Step Action Generation (2025)
- Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance (2025)
- One-Step Flow Policy Mirror Descent (2025)
- Flow Matching Policy Gradients (2025)
- Inpainting-Guided Policy Optimization for Diffusion Large Language Models (2025)
- Pretrained Diffusion Models Are Inherently Skipped-Step Samplers (2025)