Wan2.2-Fun-Reward-LoRAs

Introduction

We explore the Reward Backpropagation technique [1][2] to optimize the videos generated by Wan2.2-Fun for better alignment with human preferences. We provide the following pre-trained models (i.e., LoRAs) along with the training script. You can use these LoRAs as a plug-in to enhance the corresponding base model, or train your own reward LoRA.

For more details, please refer to our GitHub repo.

| Name | Base Model | Reward Model | Hugging Face | Description |
|---|---|---|---|---|
| Wan2.2-Fun-A14B-InP-high-noise-HPS2.1.safetensors | Wan2.2-Fun-A14B-InP (high noise) | HPS v2.1 | 🤗Link | Official HPS v2.1 reward LoRA (rank=128 and network_alpha=64) for Wan2.2-Fun-A14B-InP (high noise). Trained with a batch size of 8 for 5,000 steps. |
| Wan2.2-Fun-A14B-InP-low-noise-HPS2.1.safetensors | Wan2.2-Fun-A14B-InP (low noise) | HPS v2.1 | 🤗Link | Official HPS v2.1 reward LoRA (rank=128 and network_alpha=64) for Wan2.2-Fun-A14B-InP (low noise). Trained with a batch size of 8 for 2,700 steps. |
| Wan2.2-Fun-A14B-InP-high-noise-MPS.safetensors | Wan2.2-Fun-A14B-InP (high noise) | MPS | 🤗Link | Official MPS reward LoRA (rank=128 and network_alpha=64) for Wan2.2-Fun-A14B-InP (high noise). Trained with a batch size of 8 for 5,000 steps. |
| Wan2.2-Fun-A14B-InP-low-noise-MPS.safetensors | Wan2.2-Fun-A14B-InP (low noise) | MPS | 🤗Link | Official MPS reward LoRA (rank=128 and network_alpha=64) for Wan2.2-Fun-A14B-InP (low noise). Trained with a batch size of 8 for 4,500 steps. |

We found that the MPS reward LoRA for the low-noise model converges significantly more slowly than the other LoRAs and may not deliver satisfactory results. For the low-noise model, we therefore recommend the HPSv2.1 reward LoRA.

Demo

Wan2.2-Fun-A14B-InP

For each prompt, the demo compares videos from three settings: the base Wan2.2-Fun-A14B-InP, Wan2.2-Fun-A14B-InP with the high + low HPSv2.1 Reward LoRA, and Wan2.2-Fun-A14B-InP with the high MPS + low HPSv2.1 Reward LoRA.
Prompt: A panda eats bamboo while a monkey swings from branch to branch

Expanded: In a lush green forest, a panda sits comfortably against a tree, leisurely munching on bamboo stalks. Nearby, a lively monkey swings energetically from branch to branch, its tail curling around the limbs. Sunlight filters through the canopy, casting dappled shadows on the forest floor.

Prompt: A dog runs through a field while a cat climbs a tree

Expanded: In a sunlit, expansive green field surrounded by tall trees, a playful golden retriever sprints energetically across the grass, its fur gleaming in the afternoon sun. Nearby, a nimble tabby cat gracefully climbs a sturdy tree, its claws gripping the bark effortlessly. The sky is clear blue with occasional birds flying.

Prompt: A penguin waddles on the ice, a camel treks by

Expanded: A small penguin waddles slowly across a vast, icy surface under a clear blue sky. The penguin's short, flipper-like wings sway at its sides as it moves. Nearby, a camel treks steadily, its long legs navigating the snowy terrain with ease. The camel's fur is thick, providing warmth in the cold environment.

Prompt: Pig with wings flying above a diamond mountain

Expanded: A whimsical pig, complete with delicate feathered wings, soars gracefully above a shimmering diamond mountain. The pig's pink skin glistens in the sunlight as it flaps its wings. The mountain below sparkles with countless facets, reflecting brilliant rays of light into the clear blue sky.

The above test prompts are from T2V-CompBench and were expanded into detailed prompts by Llama-3.3. Videos were generated with an HPSv2.1 Reward LoRA weight of 0.5 and an MPS Reward LoRA weight of 0.5.

Quick Start

In examples/wan2.2_fun/predict_t2v.py, set lora_path along with lora_weight for the low-noise reward LoRA, and lora_high_path along with lora_high_weight for the high-noise reward LoRA.
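
For illustration, a minimal sketch of those settings is below. The file paths are hypothetical placeholders (point them at the downloaded LoRAs), and the exact variable layout in the script may differ; the demo videos above use 0.5 for both weights.

```python
# Sketch of the reward-LoRA settings in examples/wan2.2_fun/predict_t2v.py.
# Paths below are hypothetical placeholders, not the script's defaults.

# Low-noise expert: reward LoRA path and strength
lora_path = "models/Lora/Wan2.2-Fun-A14B-InP-low-noise-HPS2.1.safetensors"
lora_weight = 0.50

# High-noise expert: reward LoRA path and strength
lora_high_path = "models/Lora/Wan2.2-Fun-A14B-InP-high-noise-HPS2.1.safetensors"
lora_high_weight = 0.50
```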

Training

Please refer to README_TRAIN_REWARD.md.
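
At a high level, reward backpropagation [1][2] generates a video with a differentiable sampling loop, scores the decoded frames with a frozen preference model (e.g., HPS v2.1 or MPS), and backpropagates the negative reward into the LoRA parameters only. The sketch below is a heavily simplified illustration under those assumptions; the names are placeholders, not the actual training code.

```python
import torch

def reward_lora_step(pipeline, reward_model, prompts: list[str],
                     optimizer: torch.optim.Optimizer) -> float:
    """One illustrative reward-backprop step (not the actual training code).

    `pipeline` is assumed to run a differentiable sampling loop and return
    decoded frames with gradients attached; `reward_model` is a frozen,
    differentiable preference model. Only LoRA parameters are in `optimizer`.
    """
    frames = pipeline(prompts)               # (batch, num_frames, C, H, W), grad-enabled
    rewards = reward_model(frames, prompts)  # higher score = more preferred
    loss = -rewards.mean()                   # gradient ascent on the reward
    optimizer.zero_grad()
    loss.backward()                          # gradients flow only into LoRA weights
    optimizer.step()
    return loss.item()
```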

Limitations

  1. We observe that, beyond a certain point in training, the reward keeps increasing while the quality of the generated videos stops improving. The model learns shortcuts to inflate the reward, e.g., by adding adversarial-patch-like artifacts in the background.
  2. There is still a lack of suitable preference models for video generation. Image preference models cannot evaluate preferences along the temporal dimension (such as dynamism and consistency). Furthermore, we find that using image preference models reduces the dynamism of generated videos. This can be mitigated by computing the reward on only the first frame of the decoded video (see the sketch below), but the effect still persists.
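
The first-frame mitigation in item 2 amounts to scoring a single decoded frame instead of the whole clip. A hypothetical sketch, with all names illustrative:

```python
import torch

def first_frame_reward_loss(frames: torch.Tensor, prompts: list[str],
                            reward_model) -> torch.Tensor:
    # frames: (batch, num_frames, C, H, W), decoded with gradients attached.
    # Scoring only frames[:, 0] keeps the image preference model from
    # directly penalizing motion in later frames, though some loss of
    # dynamism still persists.
    first_frame = frames[:, 0]
    return -reward_model(first_frame, prompts).mean()
```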

References

  1. Clark, Kevin, et al. "Directly fine-tuning diffusion models on differentiable rewards." In ICLR 2024.
  2. Prabhudesai, Mihir, et al. "Aligning text-to-image diffusion models with reward backpropagation." arXiv preprint arXiv:2310.03739 (2023).