DeepBeepMeep committed · Commit 60fd3bd · Parent: 7e8fb61

Added support for CausVid Lora and MoviiGen
Files changed:
- README.md +23 -2
- examples/i2v_input.JPG +0 -3
- i2v_inference.py +2 -14
- requirements.txt +1 -1
- tests/README.md +0 -6
- tests/test.sh +0 -113
- wan/text2video.py +25 -16
- wgp.py +16 -6
README.md
CHANGED
@@ -21,11 +21,15 @@ WanGP supports the Wan (and derived models), Hunyuan Video and LTV Video models
 
 
 ## 🔥 Latest News!!
-* May
+* May 20 2025: 👋 Wan 2.1GP v5.2 : Added support for Wan CausVid, a distilled Wan model that can generate nice looking videos in only 4 to 12 steps.
+The great thing is that Kijai (kudos to him!) has created a CausVid Lora that can be combined with any existing Wan t2v 14B model, like Wan Vace 14B.
+See the instructions below on how to use CausVid.\
+Also, as an experiment, I have added support for MoviiGen, the first model that claims to be capable of generating 1080p videos (if you have enough VRAM (20 GB...) and are ready to wait a long time...). Don't hesitate to share your impressions on the Discord server.
+* May 18 2025: 👋 Wan 2.1GP v5.1 : Bonus day, added LTX Video 13B Distilled: generate very high quality videos in less than one minute!
 * May 17 2025: 👋 Wan 2.1GP v5.0 : One App to Rule Them All !\
 Added support for the other great open source architectures:
 - Hunyuan Video : text 2 video (one of the best, if not the best, t2v), image 2 video and the recently released Hunyuan Custom (very good identity preservation when injecting a person into a video)
-- LTX Video 13B (released last week): very long video support and fast 720p generation. The WanGP version has been greatly optimized and VRAM requirements reduced by a factor of 4!
+- LTX Video 13B (released last week): very long video support and fast 720p generation. The WanGP version has been greatly optimized and LTX Video VRAM requirements reduced by a factor of 4!
 
 Also:
 - Added support for the best Control Video model, released 2 days ago: Vace 14B
@@ -268,6 +272,23 @@ python wgp.py --lora-preset mylorapreset.lset # where 'mylorapreset.lset' is a
 
 You will find prebuilt Loras on https://civitai.com/ or you will be able to build them with tools such as kohya or onetrainer.
 
+### CausVid Lora
+
+Wan CausVid is a distilled Wan model that can generate nice looking videos in only 4 to 12 steps. As a distilled model it also doesn't require CFG and is two times faster for the same number of steps.
+The great thing is that Kijai (kudos to him!) has created a CausVid Lora that can be combined with any existing Wan t2v 14B model, like Wan Vace 14B, to accelerate those models too. It may also work with Wan i2v models.
+
+Instructions:
+1) First download the Lora: https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_CausVid_14B_T2V_lora_rank32.safetensors
+2) Choose a Wan t2v model (for instance Wan 2.1 text2video 14B or Vace 14B)
+3) Turn on Advanced Mode by checking the corresponding checkbox
+4) In the Advanced Generation tab: set Guidance Scale = 1 and Shift Scale = 7
+5) In the Advanced Lora tab: select the CausVid Lora (click the Refresh button at the top if you don't see it) and enter 0.3 as the Lora multiplier
+6) Now select a 12-step generation and click Generate
+
+You can reduce the number of steps to as low as 4, but you will then need to progressively increase the Lora multiplier up to 1. Please note that the lower the number of steps, the lower the quality (especially the motion).
+
+You can combine the CausVid Lora with other Loras (just follow the instructions above).
+
 ### Macros (basic)
 In *Advanced Mode*, you can start prompt lines with a "!", for instance:\
 ```
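Why step 4 above sets Guidance Scale = 1: with classifier-free guidance the model is evaluated twice per step and combined as uncond + scale * (cond - uncond); at a scale of 1 this collapses to the conditional prediction alone, so a distilled model like CausVid can skip the unconditional pass entirely, which is where the "two times faster" comes from. A minimal illustrative sketch (not WanGP's actual code) of that collapse, mirroring the `if guide_scale == 1:` branch added to wan/text2video.py further down:

```python
# Illustrative sketch only (not WanGP code): classifier-free guidance collapses
# to a single conditional pass when the guidance scale is 1.
def cfg_noise_pred(model, latents, t, cond, uncond, guide_scale):
    if guide_scale == 1:
        # uncond + 1 * (cond - uncond) == cond: the unconditional pass is wasted work.
        return model(latents, t, cond)
    noise_cond = model(latents, t, cond)
    noise_uncond = model(latents, t, uncond)
    return noise_uncond + guide_scale * (noise_cond - noise_uncond)

# Toy check with a stand-in "model" that just returns its conditioning value.
toy_model = lambda latents, t, ctx: ctx
assert cfg_noise_pred(toy_model, 0.0, 0, cond=2.0, uncond=1.0, guide_scale=1) == 2.0
```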
examples/i2v_input.JPG
DELETED
i2v_inference.py
CHANGED
@@ -105,12 +105,6 @@ def load_i2v_model(model_filename, text_encoder_filename, is_720p):
         wan_model = wan.WanI2V(
             config=cfg,
             checkpoint_dir=DATA_DIR,
-            device_id=0,
-            rank=0,
-            t5_fsdp=False,
-            dit_fsdp=False,
-            use_usp=False,
-            i2v720p=True,
             model_filename=model_filename,
             text_encoder_filename=text_encoder_filename
         )
@@ -120,12 +114,6 @@ def load_i2v_model(model_filename, text_encoder_filename, is_720p):
         wan_model = wan.WanI2V(
             config=cfg,
             checkpoint_dir=DATA_DIR,
-            device_id=0,
-            rank=0,
-            t5_fsdp=False,
-            dit_fsdp=False,
-            use_usp=False,
-            i2v720p=False,
             model_filename=model_filename,
             text_encoder_filename=text_encoder_filename
         )
@@ -624,8 +612,8 @@ def main():
     # Actually run the i2v generation
     try:
         sample_frames = wan_model.generate(
-            user_prompt,
-            input_img,
+            input_prompt = user_prompt,
+            image_start = input_img,
             frame_num=frame_count,
             width=width,
             height=height,
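The call above now passes the prompt and the start image as keyword arguments (input_prompt=, image_start=). A small sketch with a hypothetical, simplified signature (the real WanI2V.generate takes many more parameters) showing why keyword arguments keep the call site robust when positional parameters are reordered upstream:

```python
# Hypothetical, simplified stand-in for WanI2V.generate; illustration only.
def generate(input_prompt, image_start, frame_num=81, width=832, height=480):
    return f"{frame_num} frames of '{input_prompt}' from {image_start} at {width}x{height}"

# Keyword arguments stay correct even if the leading parameter order changes upstream.
sample_frames = generate(
    input_prompt="a person walking on a beach at sunset",
    image_start="start_frame.jpg",
    frame_num=49,
    width=832,
    height=480,
)
```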
requirements.txt
CHANGED
@@ -17,7 +17,7 @@ gradio==5.23.0
 numpy>=1.23.5,<2
 einops
 moviepy==1.0.3
-mmgp==3.4.
+mmgp==3.4.6
 peft==0.14.0
 mutagen
 pydantic==2.10.6
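This pin matches the target_mmgp_version bump to "3.4.6" in wgp.py below. A sketch of how such a check can be written with importlib.metadata, which wgp.py imports; the exact check wgp.py performs may differ:

```python
# Sketch only: fail fast when the installed mmgp does not match the pinned version.
from importlib.metadata import PackageNotFoundError, version

target_mmgp_version = "3.4.6"
try:
    installed = version("mmgp")
except PackageNotFoundError:
    installed = None

if installed != target_mmgp_version:
    raise RuntimeError(
        f"mmgp {target_mmgp_version} is required (found {installed}); "
        "run: pip install -r requirements.txt"
    )
```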
tests/README.md
DELETED
@@ -1,6 +0,0 @@
-
-Put all your models (Wan2.1-T2V-1.3B, Wan2.1-T2V-14B, Wan2.1-I2V-14B-480P, Wan2.1-I2V-14B-720P) in a folder and specify the max GPU number you want to use.
-
-```bash
-bash ./test.sh <local model dir> <gpu number>
-```
tests/test.sh
DELETED
@@ -1,113 +0,0 @@
-#!/bin/bash
-
-
-if [ "$#" -eq 2 ]; then
-    MODEL_DIR=$(realpath "$1")
-    GPUS=$2
-else
-    echo "Usage: $0 <local model dir> <gpu number>"
-    exit 1
-fi
-
-SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
-REPO_ROOT="$(dirname "$SCRIPT_DIR")"
-cd "$REPO_ROOT" || exit 1
-
-PY_FILE=./generate.py
-
-
-function t2v_1_3B() {
-    T2V_1_3B_CKPT_DIR="$MODEL_DIR/Wan2.1-T2V-1.3B"
-
-    # 1-GPU Test
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_1_3B 1-GPU Test: "
-    python $PY_FILE --task t2v-1.3B --size 480*832 --ckpt_dir $T2V_1_3B_CKPT_DIR
-
-    # Multiple GPU Test
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_1_3B Multiple GPU Test: "
-    torchrun --nproc_per_node=$GPUS $PY_FILE --task t2v-1.3B --ckpt_dir $T2V_1_3B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS
-
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_1_3B Multiple GPU, prompt extend local_qwen: "
-    torchrun --nproc_per_node=$GPUS $PY_FILE --task t2v-1.3B --ckpt_dir $T2V_1_3B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS --use_prompt_extend --prompt_extend_model "Qwen/Qwen2.5-3B-Instruct" --prompt_extend_target_lang "en"
-
-    if [ -n "${DASH_API_KEY+x}" ]; then
-        echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_1_3B Multiple GPU, prompt extend dashscope: "
-        torchrun --nproc_per_node=$GPUS $PY_FILE --task t2v-1.3B --ckpt_dir $T2V_1_3B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS --use_prompt_extend --prompt_extend_method "dashscope"
-    else
-        echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> No DASH_API_KEY found, skip the dashscope extend test."
-    fi
-}
-
-function t2v_14B() {
-    T2V_14B_CKPT_DIR="$MODEL_DIR/Wan2.1-T2V-14B"
-
-    # 1-GPU Test
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_14B 1-GPU Test: "
-    python $PY_FILE --task t2v-14B --size 480*832 --ckpt_dir $T2V_14B_CKPT_DIR
-
-    # Multiple GPU Test
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_14B Multiple GPU Test: "
-    torchrun --nproc_per_node=$GPUS $PY_FILE --task t2v-14B --ckpt_dir $T2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS
-
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_14B Multiple GPU, prompt extend local_qwen: "
-    torchrun --nproc_per_node=$GPUS $PY_FILE --task t2v-14B --ckpt_dir $T2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS --use_prompt_extend --prompt_extend_model "Qwen/Qwen2.5-3B-Instruct" --prompt_extend_target_lang "en"
-}
-
-
-
-function t2i_14B() {
-    T2V_14B_CKPT_DIR="$MODEL_DIR/Wan2.1-T2V-14B"
-
-    # 1-GPU Test
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2i_14B 1-GPU Test: "
-    python $PY_FILE --task t2i-14B --size 480*832 --ckpt_dir $T2V_14B_CKPT_DIR
-
-    # Multiple GPU Test
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2i_14B Multiple GPU Test: "
-    torchrun --nproc_per_node=$GPUS $PY_FILE --task t2i-14B --ckpt_dir $T2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS
-
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2i_14B Multiple GPU, prompt extend local_qwen: "
-    torchrun --nproc_per_node=$GPUS $PY_FILE --task t2i-14B --ckpt_dir $T2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS --use_prompt_extend --prompt_extend_model "Qwen/Qwen2.5-3B-Instruct" --prompt_extend_target_lang "en"
-}
-
-
-function i2v_14B_480p() {
-    I2V_14B_CKPT_DIR="$MODEL_DIR/Wan2.1-I2V-14B-480P"
-
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i2v_14B 1-GPU Test: "
-    python $PY_FILE --task i2v-14B --size 832*480 --ckpt_dir $I2V_14B_CKPT_DIR
-
-    # Multiple GPU Test
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i2v_14B Multiple GPU Test: "
-    torchrun --nproc_per_node=$GPUS $PY_FILE --task i2v-14B --ckpt_dir $I2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS
-
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i2v_14B Multiple GPU, prompt extend local_qwen: "
-    torchrun --nproc_per_node=$GPUS $PY_FILE --task i2v-14B --ckpt_dir $I2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS --use_prompt_extend --prompt_extend_model "Qwen/Qwen2.5-VL-3B-Instruct" --prompt_extend_target_lang "en"
-
-    if [ -n "${DASH_API_KEY+x}" ]; then
-        echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i2v_14B Multiple GPU, prompt extend dashscope: "
-        torchrun --nproc_per_node=$GPUS $PY_FILE --task i2v-14B --ckpt_dir $I2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS --use_prompt_extend --prompt_extend_method "dashscope"
-    else
-        echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> No DASH_API_KEY found, skip the dashscope extend test."
-    fi
-}
-
-
-function i2v_14B_720p() {
-    I2V_14B_CKPT_DIR="$MODEL_DIR/Wan2.1-I2V-14B-720P"
-
-    # 1-GPU Test
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i2v_14B 1-GPU Test: "
-    python $PY_FILE --task i2v-14B --size 720*1280 --ckpt_dir $I2V_14B_CKPT_DIR
-
-    # Multiple GPU Test
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i2v_14B Multiple GPU Test: "
-    torchrun --nproc_per_node=$GPUS $PY_FILE --task i2v-14B --ckpt_dir $I2V_14B_CKPT_DIR --size 720*1280 --dit_fsdp --t5_fsdp --ulysses_size $GPUS
-}
-
-
-t2i_14B
-t2v_1_3B
-t2v_14B
-i2v_14B_480p
-i2v_14B_720p
wan/text2video.py
CHANGED
@@ -26,7 +26,7 @@ from .utils.fm_solvers import (FlowDPMSolverMultistepScheduler,
 from .utils.fm_solvers_unipc import FlowUniPCMultistepScheduler
 from wan.modules.posemb_layers import get_rotary_pos_embed
 from .utils.vace_preprocessor import VaceVideoProcessor
-
+from wan.utils.basic_flowmatch import FlowMatchScheduler
 
 def optimized_scale(positive_flat, negative_flat):
 
@@ -82,7 +82,9 @@ class WanT2V:
         from mmgp import offload
         # model_filename = "c:/temp/vace1.3/diffusion_pytorch_model.safetensors"
         # model_filename = "vace14B_quanto_bf16_int8.safetensors"
-        self.model = offload.fast_load_transformers_model(model_filename, modelClass=WanModel, do_quantize= quantizeTransformer, writable_tensors= False)
+        # model_filename = "c:/temp/movii/diffusion_pytorch_model-00001-of-00007.safetensors"
+        # config_filename= "c:/temp/movii/config.json"
+        self.model = offload.fast_load_transformers_model(model_filename, modelClass=WanModel, do_quantize= quantizeTransformer, writable_tensors= False) # , forcedConfigPath= config_filename)
         # offload.load_model_data(self.model, "e:/vace.safetensors")
         # offload.load_model_data(self.model, "c:/temp/Phantom-Wan-1.3B.pth")
         # self.model.to(torch.bfloat16)
@@ -90,8 +92,8 @@
         self.model.lock_layers_dtypes(torch.float32 if mixed_precision_transformer else dtype)
         # dtype = torch.bfloat16
         offload.change_dtype(self.model, dtype, True)
-        # offload.save_model(self.model, "wan2.
-        # offload.save_model(self.model, "
+        # offload.save_model(self.model, "wan2.1_moviigen_14B_mbf16.safetensors", config_file_path=config_filename)
+        # offload.save_model(self.model, "wan2.1_moviigen_14B_quanto_fp16_int8.safetensors", do_quantize= True, config_file_path=config_filename)
         self.model.eval().requires_grad_(False)
 
 
@@ -399,13 +401,14 @@
 
         # evaluation mode
 
-        if sample_solver == 'unipc':
-            sample_scheduler = FlowUniPCMultistepScheduler(
-                num_train_timesteps=self.num_train_timesteps,
-                shift=1,
-                use_dynamic_shifting=False)
-            sample_scheduler.set_timesteps(
-                sampling_steps, device=self.device, shift=shift)
+        if False:
+            sample_scheduler = FlowMatchScheduler(num_inference_steps=sampling_steps, shift=shift, sigma_min=0, extra_one_step=True)
+            timesteps = torch.tensor([1000, 934, 862, 756, 603, 410, 250, 140, 74, 0])[:sampling_steps].to(self.device)
+            sample_scheduler.timesteps = timesteps
+        elif sample_solver == 'unipc':
+            sample_scheduler = FlowUniPCMultistepScheduler( num_train_timesteps=self.num_train_timesteps, shift=1, use_dynamic_shifting=False)
+            sample_scheduler.set_timesteps( sampling_steps, device=self.device, shift=shift)
+
             timesteps = sample_scheduler.timesteps
         elif sample_solver == 'dpm++':
             sample_scheduler = FlowDPMSolverMultistepScheduler(
@@ -468,7 +471,11 @@
             timestep = torch.stack(timestep)
             kwargs["current_step"] = i
             kwargs["t"] = timestep
-            if joint_pass:
+            if guide_scale == 1:
+                noise_pred = self.model( [latent_model_input], x_id = 0, context = [context], **kwargs)[0]
+                if self._interrupt:
+                    return None
+            elif joint_pass:
                 if phantom:
                     pos_it, pos_i, neg = self.model(
                         [ torch.cat([latent_model_input[:,:-input_ref_images.shape[1]], input_ref_images], dim=1) ] * 2 +
@@ -509,7 +516,9 @@
             # del latent_model_input
 
             # CFG Zero *. Thanks to https://github.com/WeichenFan/CFG-Zero-star/
-            if phantom:
+            if guide_scale == 1:
+                pass
+            elif phantom:
                 guide_scale_img= 5.0
                 guide_scale_text= guide_scale #7.5
                 noise_pred = neg + guide_scale_img * (pos_i - neg) + guide_scale_text * (pos_it - pos_i)
@@ -528,13 +537,13 @@
                 noise_pred_uncond *= alpha
                 noise_pred = noise_pred_uncond + guide_scale * (noise_pred_text - noise_pred_uncond)
             noise_pred_uncond, noise_pred_cond, noise_pred_text, pos_it, pos_i, neg = None, None, None, None, None, None
-
+            scheduler_kwargs = {} if isinstance(sample_scheduler, FlowMatchScheduler) else {"generator": seed_g}
             temp_x0 = sample_scheduler.step(
                 noise_pred[:, :target_shape[1]].unsqueeze(0),
                 t,
                 latents.unsqueeze(0),
-                return_dict=False,
-                generator=seed_g)[0]
+                # return_dict=False,
+                **scheduler_kwargs)[0]
             latents = temp_x0.squeeze(0)
             del temp_x0
 
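The scheduler_kwargs line added above exists because the newly imported FlowMatchScheduler is assumed not to accept a generator argument in its step() method, unlike the diffusers-style schedulers already in use. A minimal sketch of that dispatch pattern (the classes and method bodies below are placeholders, not the real wan.utils.basic_flowmatch implementation):

```python
# Placeholder classes: only the step() signatures matter for the dispatch.
class DiffusersStyleScheduler:
    def step(self, model_output, timestep, sample, generator=None):
        return (sample - model_output,)  # stochastic samplers may draw from `generator`

class BasicFlowMatchScheduler:
    def step(self, model_output, timestep, sample):
        return (sample - model_output,)  # deterministic update, no generator parameter

def scheduler_step(scheduler, noise_pred, t, latents, seed_g):
    # Passing generator= to a step() that does not declare it raises a TypeError,
    # hence the isinstance() check before building the keyword arguments.
    kwargs = {} if isinstance(scheduler, BasicFlowMatchScheduler) else {"generator": seed_g}
    return scheduler.step(noise_pred, t, latents, **kwargs)[0]

print(scheduler_step(BasicFlowMatchScheduler(), 0.1, 0, 1.0, seed_g=None))
```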
wgp.py
CHANGED
@@ -42,7 +42,7 @@ global_queue_ref = []
 AUTOSAVE_FILENAME = "queue.zip"
 PROMPT_VARS_MAX = 10
 
-target_mmgp_version = "3.4.
+target_mmgp_version = "3.4.6"
 prompt_enhancer_image_caption_model, prompt_enhancer_image_caption_processor, prompt_enhancer_llm_model, prompt_enhancer_llm_tokenizer = None, None, None, None
 
 from importlib.metadata import version
@@ -1529,7 +1529,9 @@ for path in ["wan2.1_Vace_1.3B_preview_bf16.safetensors", "sky_reels2_diffusion
 wan_choices_t2v=["ckpts/wan2.1_text2video_1.3B_bf16.safetensors", "ckpts/wan2.1_text2video_14B_bf16.safetensors", "ckpts/wan2.1_text2video_14B_quanto_int8.safetensors", "ckpts/wan2.1_Vace_1.3B_mbf16.safetensors",
                  "ckpts/wan2.1_recammaster_1.3B_bf16.safetensors", "ckpts/sky_reels2_diffusion_forcing_1.3B_mbf16.safetensors", "ckpts/sky_reels2_diffusion_forcing_14B_bf16.safetensors",
                  "ckpts/sky_reels2_diffusion_forcing_14B_quanto_int8.safetensors", "ckpts/sky_reels2_diffusion_forcing_720p_14B_mbf16.safetensors","ckpts/sky_reels2_diffusion_forcing_720p_14B_quanto_mbf16_int8.safetensors",
-                 "ckpts/wan2_1_phantom_1.3B_mbf16.safetensors", "ckpts/wan2.1_Vace_14B_mbf16.safetensors", "ckpts/wan2.1_Vace_14B_quanto_mbf16_int8.safetensors"
+                 "ckpts/wan2_1_phantom_1.3B_mbf16.safetensors", "ckpts/wan2.1_Vace_14B_mbf16.safetensors", "ckpts/wan2.1_Vace_14B_quanto_mbf16_int8.safetensors",
+                 "ckpts/wan2.1_moviigen1.1_14B_mbf16.safetensors", "ckpts/wan2.1_moviigen1.1_14B_quanto_mbf16_int8.safetensors",
+                 ]
 wan_choices_i2v=["ckpts/wan2.1_image2video_480p_14B_mbf16.safetensors", "ckpts/wan2.1_image2video_480p_14B_quanto_mbf16_int8.safetensors", "ckpts/wan2.1_image2video_720p_14B_mbf16.safetensors",
                  "ckpts/wan2.1_image2video_720p_14B_quanto_mbf16_int8.safetensors", "ckpts/wan2.1_Fun_InP_1.3B_bf16.safetensors", "ckpts/wan2.1_Fun_InP_14B_bf16.safetensors",
                  "ckpts/wan2.1_Fun_InP_14B_quanto_int8.safetensors", "ckpts/wan2.1_FLF2V_720p_14B_bf16.safetensors", "ckpts/wan2.1_FLF2V_720p_14B_quanto_int8.safetensors",
@@ -1547,11 +1549,11 @@ def get_dependent_models(model_filename, quantization, dtype_policy ):
         return [get_model_filename("ltxv_13B", quantization, dtype_policy)]
     else:
         return []
-model_types = [ "t2v_1.3B", "t2v", "i2v", "i2v_720p", "flf2v_720p", "vace_1.3B","vace_14B", "phantom_1.3B", "fantasy", "fun_inp_1.3B", "fun_inp", "recam_1.3B", "sky_df_1.3B", "sky_df_14B", "sky_df_720p_14B", "ltxv_13B", "ltxv_13B_distilled", "hunyuan", "hunyuan_i2v", "hunyuan_custom"]
+model_types = [ "t2v_1.3B", "t2v", "i2v", "i2v_720p", "flf2v_720p", "vace_1.3B","vace_14B","moviigen", "phantom_1.3B", "fantasy", "fun_inp_1.3B", "fun_inp", "recam_1.3B", "sky_df_1.3B", "sky_df_14B", "sky_df_720p_14B", "ltxv_13B", "ltxv_13B_distilled", "hunyuan", "hunyuan_i2v", "hunyuan_custom"]
 model_signatures = {"t2v": "text2video_14B", "t2v_1.3B" : "text2video_1.3B", "fun_inp_1.3B" : "Fun_InP_1.3B", "fun_inp" : "Fun_InP_14B",
                     "i2v" : "image2video_480p", "i2v_720p" : "image2video_720p" , "vace_1.3B" : "Vace_1.3B", "vace_14B" : "Vace_14B","recam_1.3B": "recammaster_1.3B",
                     "flf2v_720p" : "FLF2V_720p", "sky_df_1.3B" : "sky_reels2_diffusion_forcing_1.3B", "sky_df_14B" : "sky_reels2_diffusion_forcing_14B",
-                    "sky_df_720p_14B" : "sky_reels2_diffusion_forcing_720p_14B",
+                    "sky_df_720p_14B" : "sky_reels2_diffusion_forcing_720p_14B", "moviigen" :"moviigen",
                     "phantom_1.3B" : "phantom_1.3B", "fantasy" : "fantasy", "ltxv_13B" : "ltxv_0.9.7_13B_dev", "ltxv_13B_distilled" : "ltxv_0.9.7_13B_distilled", "hunyuan" : "hunyuan_video_720", "hunyuan_i2v" : "hunyuan_video_i2v_720", "hunyuan_custom" : "hunyuan_video_custom" }
 
 
@@ -1616,6 +1618,9 @@ def get_model_name(model_filename, description_container = [""]):
         model_name = "Wan2.1 Fantasy Speaking 720p"
         model_name += " 14B" if "14B" in model_filename else " 1.3B"
         description = "The Fantasy Speaking model corresponds to the original Wan image 2 video model combined with the Fantasy Speaking extension to process an audio Input."
+    elif "movii" in model_filename:
+        model_name = "Wan2.1 MoviiGen 1080p 14B"
+        description = "MoviiGen 1.1, a cutting-edge video generation model that excels in cinematic aesthetics and visual quality. Use it to generate videos in 720p or 1080p in the 21:9 ratio."
     elif "ltxv_0.9.7_13B_dev" in model_filename:
         model_name = "LTX Video 0.9.7 13B"
         description = "LTX Video is a fast model that can be used to generate long videos (up to 260 frames).It is recommended to keep the number of steps to 30 or you will need to update the file 'ltxv_video/configs/ltxv-13b-0.9.7-dev.yaml'.The LTX Video model expects very long prompts, so don't hesitate to use the Prompt Enhancer."
@@ -4541,12 +4546,17 @@ def generate_video_tab(update_form = False, state_dict = None, ui_defaults = Non
                 label = "Max Resolution (as it maybe less depending on video width / height ratio)"
                 resolution = gr.Dropdown(
                     choices=[
+                        # 1080p
+                        ("1920x832 (21:9, 1080p)", "1920x832"),
+                        ("832x1920 (9:21, 1080p)", "832x1920"),
                         # 720p
                         ("1280x720 (16:9, 720p)", "1280x720"),
                         ("720x1280 (9:16, 720p)", "720x1280"),
                         ("1024x1024 (1:1, 720p)", "1024x024"),
-                        ("
+                        ("1280x544 (21:9, 720p)", "1280x544"),
+                        ("544x1280 (9:21, 720p)", "544x1280"),
                         ("1104x832 (4:3, 720p)", "1104x832"),
+                        ("832x1104 (3:4, 720p)", "832x1104"),
                         ("960x960 (1:1, 720p)", "960x960"),
                         # 480p
                         ("960x544 (16:9, 540p)", "960x544"),
@@ -5651,7 +5661,7 @@ def create_demo():
     theme = gr.themes.Soft(font=["Verdana"], primary_hue="sky", neutral_hue="slate", text_size="md")
 
     with gr.Blocks(css=css, theme=theme, title= "WanGP") as main:
-        gr.Markdown("<div align=center><H1>Wan<SUP>GP</SUP> v5.
+        gr.Markdown("<div align=center><H1>Wan<SUP>GP</SUP> v5.2 <FONT SIZE=4>by <I>DeepBeepMeep</I></FONT> <FONT SIZE=3>") # (<A HREF='https://github.com/deepbeepmeep/Wan2GP'>Updates</A>)</FONT SIZE=3></H1></div>")
         global model_list
 
         tab_state = gr.State({ "tab_no":0 })
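The MoviiGen support above boils down to two table entries: its checkpoints in wan_choices_t2v and a "moviigen" signature in model_signatures, which maps a model type to a substring of the checkpoint filename. A small sketch of that lookup (the helper name and matching details are assumptions, not code taken from wgp.py):

```python
# Reduced copy of the mapping idea from wgp.py: model type -> filename substring.
model_signatures = {
    "vace_14B": "Vace_14B",
    "moviigen": "moviigen",
    "t2v": "text2video_14B",
}

def get_model_type(model_filename: str) -> str:
    # Hypothetical helper: return the first type whose signature appears in the filename.
    for model_type, signature in model_signatures.items():
        if signature in model_filename:
            return model_type
    raise ValueError(f"Unknown model file: {model_filename}")

assert get_model_type("ckpts/wan2.1_moviigen1.1_14B_mbf16.safetensors") == "moviigen"
```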