DeepBeepMeep committed on
Commit 60fd3bd
1 Parent(s): 7e8fb61

Added support for CausVid Lora and MoviiGen

Files changed (8)
  1. README.md +23 -2
  2. examples/i2v_input.JPG +0 -3
  3. i2v_inference.py +2 -14
  4. requirements.txt +1 -1
  5. tests/README.md +0 -6
  6. tests/test.sh +0 -113
  7. wan/text2video.py +25 -16
  8. wgp.py +16 -6
README.md CHANGED
@@ -21,11 +21,15 @@ WanGP supports the Wan (and derived models), Hunyuan Video and LTX Video models
21
 
22
 
23
  ## 🔥 Latest News!!
24
- * May 18 2025: 👋 Wan 2.1GP v5.1 : Bonus Day, added LTX Video 13B Distilled: generate in less than one minute, very high quality Videos !\
25
  * May 17 2025: 👋 Wan 2.1GP v5.0 : One App to Rule Them All !\
26
  Added support for the other great open source architectures:
27
 - Hunyuan Video: text 2 video (one of the best, if not the best t2v), image 2 video and the recently released Hunyuan Custom (very good identity preservation when injecting a person into a video)
28
- - LTX Video 13B (released last week): very long video support and fast 720p generation. The WanGP version has been greatly optimized and reduced VRAM requirements by a factor of 4!
29
 
30
  Also:
31
 - Added support for the best Control Video Model, released 2 days ago: Vace 14B
@@ -268,6 +272,23 @@ python wgp.py --lora-preset mylorapreset.lset # where 'mylorapreset.lset' is a
268
 
269
  You will find prebuilt Loras on https://civitai.com/ or you will be able to build them with tools such as kohya or onetrainer.
270
271
  ### Macros (basic)
272
 In *Advanced Mode*, you can start prompt lines with a "!", for instance:\
273
  ```
 
21
 
22
 
23
  ## 🔥 Latest News!!
24
+ * May 20 2025: 👋 Wan 2.1GP v5.2 : Added support for Wan CausVid, a distilled Wan model that can generate nice-looking videos in only 4 to 12 steps.
25
+ The great thing is that Kijai (kudos to him!) has created a CausVid Lora that can be combined with any existing Wan t2v 14B model, such as Wan Vace 14B.
26
+ See instructions below on how to use CausVid.\
27
+ Also, as an experiment, I have added support for MoviiGen, the first model that claims to be capable of generating 1080p videos (provided you have enough VRAM (20GB...) and are ready to wait a long time...). Don't hesitate to share your impressions on the Discord server.
28
+ * May 18 2025: 👋 Wan 2.1GP v5.1 : Bonus Day, added LTX Video 13B Distilled: generate very high quality videos in less than one minute!
29
  * May 17 2025: 👋 Wan 2.1GP v5.0 : One App to Rule Them All !\
30
  Added support for the other great open source architectures:
31
 - Hunyuan Video: text 2 video (one of the best, if not the best t2v), image 2 video and the recently released Hunyuan Custom (very good identity preservation when injecting a person into a video)
32
+ - LTX Video 13B (released last week): very long video support and fast 720p generation. The WanGP version has been greatly optimized and reduced LTX Video VRAM requirements by a factor of 4!
33
 
34
  Also:
35
 - Added support for the best Control Video Model, released 2 days ago: Vace 14B
 
272
 
273
  You will find prebuilt Loras on https://civitai.com/ or you will be able to build them with tools such as kohya or onetrainer.
274
 
275
+ ### CausVid Lora
276
+
277
+ Wan CausVid is a distilled Wan model that can generate nice-looking videos in only 4 to 12 steps. Also, as a distilled model it doesn't require CFG and is twice as fast for the same number of steps.
278
+ The great thing is that Kijai (kudos to him!) has created a CausVid Lora that can be combined with any existing Wan t2v 14B model, such as Wan Vace 14B, to accelerate those models too. It may also work with Wan i2v models.
279
+
280
+ Instructions:
281
+ 1) First download the Lora: https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_CausVid_14B_T2V_lora_rank32.safetensors
282
+ 2) Choose a Wan t2v model (for instance Wan 2.1 text2video 14B or Vace 14B)
283
+ 3) Turn on the Advanced Mode by checking the corresponding checkbox
284
+ 4) In the Advanced Generation Tab: select Guidance Scale = 1, Shift Scale = 7
285
+ 5) In the Advanced Lora Tab: select the CausVid Lora (click the Refresh button at the top if you don't see it) and enter 0.3 as the Lora multiplier
286
+ 6) Now select a 12-step generation and click Generate
287
+
288
+ You can reduce the number of steps to as low as 4, but you will then need to progressively increase the Lora multiplier up to 1 (see the sketch below). Please note that the lower the number of steps, the lower the quality (especially the motion).
289
+
290
+ You can combine the CausVid Lora with other Loras (just follow the instructions above).
291
+
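Below is a small illustrative sketch (not part of WanGP) of the step-count / Lora-multiplier tradeoff described above. The only anchor points taken from these instructions are 12 steps with a 0.3 multiplier and 4 steps with a multiplier of 1; the linear interpolation in between is an assumption, so treat the printed values as starting points to tune rather than exact recommendations.

```python
# Illustrative only: linearly interpolate the CausVid Lora multiplier between
# the two documented anchor points (12 steps -> 0.3, 4 steps -> 1.0).
def causvid_lora_multiplier(steps: int) -> float:
    min_steps, max_steps = 4, 12
    steps = max(min_steps, min(max_steps, steps))
    # 1.0 at 4 steps, 0.3 at 12 steps, linear in between (assumption)
    return round(1.0 - 0.7 * (steps - min_steps) / (max_steps - min_steps), 2)

for s in (4, 6, 8, 10, 12):
    print(f"{s} steps -> Lora multiplier ~{causvid_lora_multiplier(s)}")
```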
292
  ### Macros (basic)
293
 In *Advanced Mode*, you can start prompt lines with a "!", for instance:\
294
  ```
examples/i2v_input.JPG DELETED

Git LFS Details

  • SHA256: 077e3d965090c9028c69c00931675f42e1acc815c6eb450ab291b3b72d211a8e
  • Pointer size: 131 Bytes
  • Size of remote file: 251 kB
i2v_inference.py CHANGED
@@ -105,12 +105,6 @@ def load_i2v_model(model_filename, text_encoder_filename, is_720p):
105
  wan_model = wan.WanI2V(
106
  config=cfg,
107
  checkpoint_dir=DATA_DIR,
108
- device_id=0,
109
- rank=0,
110
- t5_fsdp=False,
111
- dit_fsdp=False,
112
- use_usp=False,
113
- i2v720p=True,
114
  model_filename=model_filename,
115
  text_encoder_filename=text_encoder_filename
116
  )
@@ -120,12 +114,6 @@ def load_i2v_model(model_filename, text_encoder_filename, is_720p):
120
  wan_model = wan.WanI2V(
121
  config=cfg,
122
  checkpoint_dir=DATA_DIR,
123
- device_id=0,
124
- rank=0,
125
- t5_fsdp=False,
126
- dit_fsdp=False,
127
- use_usp=False,
128
- i2v720p=False,
129
  model_filename=model_filename,
130
  text_encoder_filename=text_encoder_filename
131
  )
@@ -624,8 +612,8 @@ def main():
624
  # Actually run the i2v generation
625
  try:
626
  sample_frames = wan_model.generate(
627
- user_prompt,
628
- input_img,
629
  frame_num=frame_count,
630
  width=width,
631
  height=height,
 
105
  wan_model = wan.WanI2V(
106
  config=cfg,
107
  checkpoint_dir=DATA_DIR,
108
  model_filename=model_filename,
109
  text_encoder_filename=text_encoder_filename
110
  )
 
114
  wan_model = wan.WanI2V(
115
  config=cfg,
116
  checkpoint_dir=DATA_DIR,
117
  model_filename=model_filename,
118
  text_encoder_filename=text_encoder_filename
119
  )
 
612
  # Actually run the i2v generation
613
  try:
614
  sample_frames = wan_model.generate(
615
+ input_prompt = user_prompt,
616
+ image_start = input_img,
617
  frame_num=frame_count,
618
  width=width,
619
  height=height,
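For clarity, a minimal sketch of the new call shape: the prompt and the start image are now passed as keyword arguments (input_prompt, image_start) instead of positionally. Only the argument names visible in this hunk come from the diff; the stub class and all values below are placeholders, not the real wan.WanI2V API.

```python
# Stub standing in for wan.WanI2V, purely to show the keyword-argument call shape.
class StubWanI2V:
    def generate(self, input_prompt, image_start, frame_num, width, height, **kwargs):
        print(f"i2v: {frame_num} frames at {width}x{height} from {image_start!r}")
        print(f"prompt: {input_prompt!r}")
        return None  # the real method returns the sampled video frames

wan_model = StubWanI2V()
sample_frames = wan_model.generate(
    input_prompt="a red fox running through snow",  # previously the 1st positional argument
    image_start="start_frame.jpg",                  # previously the 2nd positional argument
    frame_num=81,
    width=832,
    height=480,
)
```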
requirements.txt CHANGED
@@ -17,7 +17,7 @@ gradio==5.23.0
17
  numpy>=1.23.5,<2
18
  einops
19
  moviepy==1.0.3
20
- mmgp==3.4.5
21
  peft==0.14.0
22
  mutagen
23
  pydantic==2.10.6
 
17
  numpy>=1.23.5,<2
18
  einops
19
  moviepy==1.0.3
20
+ mmgp==3.4.6
21
  peft==0.14.0
22
  mutagen
23
  pydantic==2.10.6
tests/README.md DELETED
@@ -1,6 +0,0 @@
1
-
2
- Put all your models (Wan2.1-T2V-1.3B, Wan2.1-T2V-14B, Wan2.1-I2V-14B-480P, Wan2.1-I2V-14B-720P) in a folder and specify the max GPU number you want to use.
3
-
4
- ```bash
5
- bash ./test.sh <local model dir> <gpu number>
6
- ```
tests/test.sh DELETED
@@ -1,113 +0,0 @@
1
- #!/bin/bash
2
-
3
-
4
- if [ "$#" -eq 2 ]; then
5
- MODEL_DIR=$(realpath "$1")
6
- GPUS=$2
7
- else
8
- echo "Usage: $0 <local model dir> <gpu number>"
9
- exit 1
10
- fi
11
-
12
- SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
13
- REPO_ROOT="$(dirname "$SCRIPT_DIR")"
14
- cd "$REPO_ROOT" || exit 1
15
-
16
- PY_FILE=./generate.py
17
-
18
-
19
- function t2v_1_3B() {
20
- T2V_1_3B_CKPT_DIR="$MODEL_DIR/Wan2.1-T2V-1.3B"
21
-
22
- # 1-GPU Test
23
- echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_1_3B 1-GPU Test: "
24
- python $PY_FILE --task t2v-1.3B --size 480*832 --ckpt_dir $T2V_1_3B_CKPT_DIR
25
-
26
- # Multiple GPU Test
27
- echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_1_3B Multiple GPU Test: "
28
- torchrun --nproc_per_node=$GPUS $PY_FILE --task t2v-1.3B --ckpt_dir $T2V_1_3B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS
29
-
30
- echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_1_3B Multiple GPU, prompt extend local_qwen: "
31
- torchrun --nproc_per_node=$GPUS $PY_FILE --task t2v-1.3B --ckpt_dir $T2V_1_3B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS --use_prompt_extend --prompt_extend_model "Qwen/Qwen2.5-3B-Instruct" --prompt_extend_target_lang "en"
32
-
33
- if [ -n "${DASH_API_KEY+x}" ]; then
34
- echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_1_3B Multiple GPU, prompt extend dashscope: "
35
- torchrun --nproc_per_node=$GPUS $PY_FILE --task t2v-1.3B --ckpt_dir $T2V_1_3B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS --use_prompt_extend --prompt_extend_method "dashscope"
36
- else
37
- echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> No DASH_API_KEY found, skip the dashscope extend test."
38
- fi
39
- }
40
-
41
- function t2v_14B() {
42
- T2V_14B_CKPT_DIR="$MODEL_DIR/Wan2.1-T2V-14B"
43
-
44
- # 1-GPU Test
45
- echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_14B 1-GPU Test: "
46
- python $PY_FILE --task t2v-14B --size 480*832 --ckpt_dir $T2V_14B_CKPT_DIR
47
-
48
- # Multiple GPU Test
49
- echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_14B Multiple GPU Test: "
50
- torchrun --nproc_per_node=$GPUS $PY_FILE --task t2v-14B --ckpt_dir $T2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS
51
-
52
- echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_14B Multiple GPU, prompt extend local_qwen: "
53
- torchrun --nproc_per_node=$GPUS $PY_FILE --task t2v-14B --ckpt_dir $T2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS --use_prompt_extend --prompt_extend_model "Qwen/Qwen2.5-3B-Instruct" --prompt_extend_target_lang "en"
54
- }
55
-
56
-
57
-
58
- function t2i_14B() {
59
- T2V_14B_CKPT_DIR="$MODEL_DIR/Wan2.1-T2V-14B"
60
-
61
- # 1-GPU Test
62
- echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2i_14B 1-GPU Test: "
63
- python $PY_FILE --task t2i-14B --size 480*832 --ckpt_dir $T2V_14B_CKPT_DIR
64
-
65
- # Multiple GPU Test
66
- echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2i_14B Multiple GPU Test: "
67
- torchrun --nproc_per_node=$GPUS $PY_FILE --task t2i-14B --ckpt_dir $T2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS
68
-
69
- echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2i_14B Multiple GPU, prompt extend local_qwen: "
70
- torchrun --nproc_per_node=$GPUS $PY_FILE --task t2i-14B --ckpt_dir $T2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS --use_prompt_extend --prompt_extend_model "Qwen/Qwen2.5-3B-Instruct" --prompt_extend_target_lang "en"
71
- }
72
-
73
-
74
- function i2v_14B_480p() {
75
- I2V_14B_CKPT_DIR="$MODEL_DIR/Wan2.1-I2V-14B-480P"
76
-
77
- echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i2v_14B 1-GPU Test: "
78
- python $PY_FILE --task i2v-14B --size 832*480 --ckpt_dir $I2V_14B_CKPT_DIR
79
-
80
- # Multiple GPU Test
81
- echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i2v_14B Multiple GPU Test: "
82
- torchrun --nproc_per_node=$GPUS $PY_FILE --task i2v-14B --ckpt_dir $I2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS
83
-
84
- echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i2v_14B Multiple GPU, prompt extend local_qwen: "
85
- torchrun --nproc_per_node=$GPUS $PY_FILE --task i2v-14B --ckpt_dir $I2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS --use_prompt_extend --prompt_extend_model "Qwen/Qwen2.5-VL-3B-Instruct" --prompt_extend_target_lang "en"
86
-
87
- if [ -n "${DASH_API_KEY+x}" ]; then
88
- echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i2v_14B Multiple GPU, prompt extend dashscope: "
89
- torchrun --nproc_per_node=$GPUS $PY_FILE --task i2v-14B --ckpt_dir $I2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS --use_prompt_extend --prompt_extend_method "dashscope"
90
- else
91
- echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> No DASH_API_KEY found, skip the dashscope extend test."
92
- fi
93
- }
94
-
95
-
96
- function i2v_14B_720p() {
97
- I2V_14B_CKPT_DIR="$MODEL_DIR/Wan2.1-I2V-14B-720P"
98
-
99
- # 1-GPU Test
100
- echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i2v_14B 1-GPU Test: "
101
- python $PY_FILE --task i2v-14B --size 720*1280 --ckpt_dir $I2V_14B_CKPT_DIR
102
-
103
- # Multiple GPU Test
104
- echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i2v_14B Multiple GPU Test: "
105
- torchrun --nproc_per_node=$GPUS $PY_FILE --task i2v-14B --ckpt_dir $I2V_14B_CKPT_DIR --size 720*1280 --dit_fsdp --t5_fsdp --ulysses_size $GPUS
106
- }
107
-
108
-
109
- t2i_14B
110
- t2v_1_3B
111
- t2v_14B
112
- i2v_14B_480p
113
- i2v_14B_720p
wan/text2video.py CHANGED
@@ -26,7 +26,7 @@ from .utils.fm_solvers import (FlowDPMSolverMultistepScheduler,
26
  from .utils.fm_solvers_unipc import FlowUniPCMultistepScheduler
27
  from wan.modules.posemb_layers import get_rotary_pos_embed
28
  from .utils.vace_preprocessor import VaceVideoProcessor
29
-
30
 
31
  def optimized_scale(positive_flat, negative_flat):
32
 
@@ -82,7 +82,9 @@ class WanT2V:
82
  from mmgp import offload
83
  # model_filename = "c:/temp/vace1.3/diffusion_pytorch_model.safetensors"
84
  # model_filename = "vace14B_quanto_bf16_int8.safetensors"
85
- self.model = offload.fast_load_transformers_model(model_filename, modelClass=WanModel,do_quantize= quantizeTransformer, writable_tensors= False) # , forcedConfigPath= "c:/temp/vace1.3/config.json")
86
  # offload.load_model_data(self.model, "e:/vace.safetensors")
87
  # offload.load_model_data(self.model, "c:/temp/Phantom-Wan-1.3B.pth")
88
  # self.model.to(torch.bfloat16)
@@ -90,8 +92,8 @@ class WanT2V:
90
  self.model.lock_layers_dtypes(torch.float32 if mixed_precision_transformer else dtype)
91
  # dtype = torch.bfloat16
92
  offload.change_dtype(self.model, dtype, True)
93
- # offload.save_model(self.model, "wan2.1_Vace1.3B_mbf16.safetensors", config_file_path="c:/temp/vace1.3/config.json")
94
- # offload.save_model(self.model, "vace14B_quanto_fp16_int8.safetensors", do_quantize= True, config_file_path="c:/temp/vace/vace_config.json")
95
  self.model.eval().requires_grad_(False)
96
 
97
 
@@ -399,13 +401,14 @@ class WanT2V:
399
 
400
  # evaluation mode
401
 
402
- if sample_solver == 'unipc':
403
- sample_scheduler = FlowUniPCMultistepScheduler(
404
- num_train_timesteps=self.num_train_timesteps,
405
- shift=1,
406
- use_dynamic_shifting=False)
407
- sample_scheduler.set_timesteps(
408
- sampling_steps, device=self.device, shift=shift)
 
409
  timesteps = sample_scheduler.timesteps
410
  elif sample_solver == 'dpm++':
411
  sample_scheduler = FlowDPMSolverMultistepScheduler(
@@ -468,7 +471,11 @@ class WanT2V:
468
  timestep = torch.stack(timestep)
469
  kwargs["current_step"] = i
470
  kwargs["t"] = timestep
471
- if joint_pass:
472
  if phantom:
473
  pos_it, pos_i, neg = self.model(
474
  [ torch.cat([latent_model_input[:,:-input_ref_images.shape[1]], input_ref_images], dim=1) ] * 2 +
@@ -509,7 +516,9 @@ class WanT2V:
509
  # del latent_model_input
510
 
511
  # CFG Zero *. Thanks to https://github.com/WeichenFan/CFG-Zero-star/
512
- if phantom:
513
  guide_scale_img= 5.0
514
  guide_scale_text= guide_scale #7.5
515
  noise_pred = neg + guide_scale_img * (pos_i - neg) + guide_scale_text * (pos_it - pos_i)
@@ -528,13 +537,13 @@ class WanT2V:
528
  noise_pred_uncond *= alpha
529
  noise_pred = noise_pred_uncond + guide_scale * (noise_pred_text - noise_pred_uncond)
530
  noise_pred_uncond, noise_pred_cond, noise_pred_text, pos_it, pos_i, neg = None, None, None, None, None, None
531
-
532
  temp_x0 = sample_scheduler.step(
533
  noise_pred[:, :target_shape[1]].unsqueeze(0),
534
  t,
535
  latents.unsqueeze(0),
536
- return_dict=False,
537
- generator=seed_g)[0]
538
  latents = temp_x0.squeeze(0)
539
  del temp_x0
540
 
 
26
  from .utils.fm_solvers_unipc import FlowUniPCMultistepScheduler
27
  from wan.modules.posemb_layers import get_rotary_pos_embed
28
  from .utils.vace_preprocessor import VaceVideoProcessor
29
+ from wan.utils.basic_flowmatch import FlowMatchScheduler
30
 
31
  def optimized_scale(positive_flat, negative_flat):
32
 
 
82
  from mmgp import offload
83
  # model_filename = "c:/temp/vace1.3/diffusion_pytorch_model.safetensors"
84
  # model_filename = "vace14B_quanto_bf16_int8.safetensors"
85
+ # model_filename = "c:/temp/movii/diffusion_pytorch_model-00001-of-00007.safetensors"
86
+ # config_filename= "c:/temp/movii/config.json"
87
+ self.model = offload.fast_load_transformers_model(model_filename, modelClass=WanModel,do_quantize= quantizeTransformer, writable_tensors= False) # , forcedConfigPath= config_filename)
88
  # offload.load_model_data(self.model, "e:/vace.safetensors")
89
  # offload.load_model_data(self.model, "c:/temp/Phantom-Wan-1.3B.pth")
90
  # self.model.to(torch.bfloat16)
 
92
  self.model.lock_layers_dtypes(torch.float32 if mixed_precision_transformer else dtype)
93
  # dtype = torch.bfloat16
94
  offload.change_dtype(self.model, dtype, True)
95
+ # offload.save_model(self.model, "wan2.1_moviigen_14B_mbf16.safetensors", config_file_path=config_filename)
96
+ # offload.save_model(self.model, "wan2.1_moviigen_14B_quanto_fp16_int8.safetensors", do_quantize= True, config_file_path=config_filename)
97
  self.model.eval().requires_grad_(False)
98
 
99
 
 
401
 
402
  # evaluation mode
403
 
404
+ if False:
405
+ sample_scheduler = FlowMatchScheduler(num_inference_steps=sampling_steps, shift=shift, sigma_min=0, extra_one_step=True)
406
+ timesteps = torch.tensor([1000, 934, 862, 756, 603, 410, 250, 140, 74, 0])[:sampling_steps].to(self.device)
407
+ sample_scheduler.timesteps =timesteps
408
+ elif sample_solver == 'unipc':
409
+ sample_scheduler = FlowUniPCMultistepScheduler( num_train_timesteps=self.num_train_timesteps, shift=1, use_dynamic_shifting=False)
410
+ sample_scheduler.set_timesteps( sampling_steps, device=self.device, shift=shift)
411
+
412
  timesteps = sample_scheduler.timesteps
413
  elif sample_solver == 'dpm++':
414
  sample_scheduler = FlowDPMSolverMultistepScheduler(
 
471
  timestep = torch.stack(timestep)
472
  kwargs["current_step"] = i
473
  kwargs["t"] = timestep
474
+ if guide_scale == 1:
475
+ noise_pred = self.model( [latent_model_input], x_id = 0, context = [context], **kwargs)[0]
476
+ if self._interrupt:
477
+ return None
478
+ elif joint_pass:
479
  if phantom:
480
  pos_it, pos_i, neg = self.model(
481
  [ torch.cat([latent_model_input[:,:-input_ref_images.shape[1]], input_ref_images], dim=1) ] * 2 +
 
516
  # del latent_model_input
517
 
518
  # CFG Zero *. Thanks to https://github.com/WeichenFan/CFG-Zero-star/
519
+ if guide_scale == 1:
520
+ pass
521
+ elif phantom:
522
  guide_scale_img= 5.0
523
  guide_scale_text= guide_scale #7.5
524
  noise_pred = neg + guide_scale_img * (pos_i - neg) + guide_scale_text * (pos_it - pos_i)
 
537
  noise_pred_uncond *= alpha
538
  noise_pred = noise_pred_uncond + guide_scale * (noise_pred_text - noise_pred_uncond)
539
  noise_pred_uncond, noise_pred_cond, noise_pred_text, pos_it, pos_i, neg = None, None, None, None, None, None
540
+ scheduler_kwargs = {} if isinstance(sample_scheduler, FlowMatchScheduler) else {"generator": seed_g}
541
  temp_x0 = sample_scheduler.step(
542
  noise_pred[:, :target_shape[1]].unsqueeze(0),
543
  t,
544
  latents.unsqueeze(0),
545
+ # return_dict=False,
546
+ **scheduler_kwargs)[0]
547
  latents = temp_x0.squeeze(0)
548
  del temp_x0
549
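The new guide_scale == 1 branch above is what makes distilled runs (e.g. CausVid with Guidance Scale = 1) roughly twice as fast per step: only the conditional forward pass is executed and the CFG combination is skipped. The sketch below is a simplified stand-in for that control flow, not the actual WanT2V code; the toy model and tensors are placeholders.

```python
import torch

def denoise_step_sketch(model, latent, context, context_null, guide_scale):
    """Simplified control flow mirroring the change above (illustrative only)."""
    if guide_scale == 1:
        # single conditional pass, no CFG combine (CausVid / distilled case)
        return model(latent, context)
    # classic CFG: conditional + unconditional passes, then combine
    noise_pred_cond = model(latent, context)
    noise_pred_uncond = model(latent, context_null)
    return noise_pred_uncond + guide_scale * (noise_pred_cond - noise_pred_uncond)

# toy stand-ins so the sketch runs on its own
toy_model = lambda latent, ctx: 0.9 * latent + 0.1 * ctx
latent = torch.randn(4)
context, context_null = torch.randn(4), torch.zeros(4)
print(denoise_step_sketch(toy_model, latent, context, context_null, guide_scale=1))
print(denoise_step_sketch(toy_model, latent, context, context_null, guide_scale=5.0))
```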
 
wgp.py CHANGED
@@ -42,7 +42,7 @@ global_queue_ref = []
42
  AUTOSAVE_FILENAME = "queue.zip"
43
  PROMPT_VARS_MAX = 10
44
 
45
- target_mmgp_version = "3.4.5"
46
  prompt_enhancer_image_caption_model, prompt_enhancer_image_caption_processor, prompt_enhancer_llm_model, prompt_enhancer_llm_tokenizer = None, None, None, None
47
 
48
  from importlib.metadata import version
@@ -1529,7 +1529,9 @@ for path in ["wan2.1_Vace_1.3B_preview_bf16.safetensors", "sky_reels2_diffusion
1529
  wan_choices_t2v=["ckpts/wan2.1_text2video_1.3B_bf16.safetensors", "ckpts/wan2.1_text2video_14B_bf16.safetensors", "ckpts/wan2.1_text2video_14B_quanto_int8.safetensors", "ckpts/wan2.1_Vace_1.3B_mbf16.safetensors",
1530
  "ckpts/wan2.1_recammaster_1.3B_bf16.safetensors", "ckpts/sky_reels2_diffusion_forcing_1.3B_mbf16.safetensors", "ckpts/sky_reels2_diffusion_forcing_14B_bf16.safetensors",
1531
  "ckpts/sky_reels2_diffusion_forcing_14B_quanto_int8.safetensors", "ckpts/sky_reels2_diffusion_forcing_720p_14B_mbf16.safetensors","ckpts/sky_reels2_diffusion_forcing_720p_14B_quanto_mbf16_int8.safetensors",
1532
- "ckpts/wan2_1_phantom_1.3B_mbf16.safetensors", "ckpts/wan2.1_Vace_14B_mbf16.safetensors", "ckpts/wan2.1_Vace_14B_quanto_mbf16_int8.safetensors"]
1533
  wan_choices_i2v=["ckpts/wan2.1_image2video_480p_14B_mbf16.safetensors", "ckpts/wan2.1_image2video_480p_14B_quanto_mbf16_int8.safetensors", "ckpts/wan2.1_image2video_720p_14B_mbf16.safetensors",
1534
  "ckpts/wan2.1_image2video_720p_14B_quanto_mbf16_int8.safetensors", "ckpts/wan2.1_Fun_InP_1.3B_bf16.safetensors", "ckpts/wan2.1_Fun_InP_14B_bf16.safetensors",
1535
  "ckpts/wan2.1_Fun_InP_14B_quanto_int8.safetensors", "ckpts/wan2.1_FLF2V_720p_14B_bf16.safetensors", "ckpts/wan2.1_FLF2V_720p_14B_quanto_int8.safetensors",
@@ -1547,11 +1549,11 @@ def get_dependent_models(model_filename, quantization, dtype_policy ):
1547
  return [get_model_filename("ltxv_13B", quantization, dtype_policy)]
1548
  else:
1549
  return []
1550
- model_types = [ "t2v_1.3B", "t2v", "i2v", "i2v_720p", "flf2v_720p", "vace_1.3B","vace_14B", "phantom_1.3B", "fantasy", "fun_inp_1.3B", "fun_inp", "recam_1.3B", "sky_df_1.3B", "sky_df_14B", "sky_df_720p_14B", "ltxv_13B", "ltxv_13B_distilled", "hunyuan", "hunyuan_i2v", "hunyuan_custom"]
1551
  model_signatures = {"t2v": "text2video_14B", "t2v_1.3B" : "text2video_1.3B", "fun_inp_1.3B" : "Fun_InP_1.3B", "fun_inp" : "Fun_InP_14B",
1552
  "i2v" : "image2video_480p", "i2v_720p" : "image2video_720p" , "vace_1.3B" : "Vace_1.3B", "vace_14B" : "Vace_14B","recam_1.3B": "recammaster_1.3B",
1553
  "flf2v_720p" : "FLF2V_720p", "sky_df_1.3B" : "sky_reels2_diffusion_forcing_1.3B", "sky_df_14B" : "sky_reels2_diffusion_forcing_14B",
1554
- "sky_df_720p_14B" : "sky_reels2_diffusion_forcing_720p_14B",
1555
  "phantom_1.3B" : "phantom_1.3B", "fantasy" : "fantasy", "ltxv_13B" : "ltxv_0.9.7_13B_dev", "ltxv_13B_distilled" : "ltxv_0.9.7_13B_distilled", "hunyuan" : "hunyuan_video_720", "hunyuan_i2v" : "hunyuan_video_i2v_720", "hunyuan_custom" : "hunyuan_video_custom" }
1556
 
1557
 
@@ -1616,6 +1618,9 @@ def get_model_name(model_filename, description_container = [""]):
1616
  model_name = "Wan2.1 Fantasy Speaking 720p"
1617
  model_name += " 14B" if "14B" in model_filename else " 1.3B"
1618
  description = "The Fantasy Speaking model corresponds to the original Wan image 2 video model combined with the Fantasy Speaking extension to process an audio Input."
1619
  elif "ltxv_0.9.7_13B_dev" in model_filename:
1620
  model_name = "LTX Video 0.9.7 13B"
1621
  description = "LTX Video is a fast model that can be used to generate long videos (up to 260 frames).It is recommended to keep the number of steps to 30 or you will need to update the file 'ltxv_video/configs/ltxv-13b-0.9.7-dev.yaml'.The LTX Video model expects very long prompts, so don't hesitate to use the Prompt Enhancer."
@@ -4541,12 +4546,17 @@ def generate_video_tab(update_form = False, state_dict = None, ui_defaults = Non
4541
  label = "Max Resolution (as it maybe less depending on video width / height ratio)"
4542
  resolution = gr.Dropdown(
4543
  choices=[
4544
  # 720p
4545
  ("1280x720 (16:9, 720p)", "1280x720"),
4546
  ("720x1280 (9:16, 720p)", "720x1280"),
4547
  ("1024x1024 (1:1, 720p)", "1024x024"),
4548
- ("832x1104 (3:4, 720p)", "832x1104"),
 
4549
  ("1104x832 (4:3, 720p)", "1104x832"),
 
4550
  ("960x960 (1:1, 720p)", "960x960"),
4551
  # 480p
4552
  ("960x544 (16:9, 540p)", "960x544"),
@@ -5651,7 +5661,7 @@ def create_demo():
5651
  theme = gr.themes.Soft(font=["Verdana"], primary_hue="sky", neutral_hue="slate", text_size="md")
5652
 
5653
  with gr.Blocks(css=css, theme=theme, title= "WanGP") as main:
5654
- gr.Markdown("<div align=center><H1>Wan<SUP>GP</SUP> v5.1 <FONT SIZE=4>by <I>DeepBeepMeep</I></FONT> <FONT SIZE=3>") # (<A HREF='https://github.com/deepbeepmeep/Wan2GP'>Updates</A>)</FONT SIZE=3></H1></div>")
5655
  global model_list
5656
 
5657
  tab_state = gr.State({ "tab_no":0 })
 
42
  AUTOSAVE_FILENAME = "queue.zip"
43
  PROMPT_VARS_MAX = 10
44
 
45
+ target_mmgp_version = "3.4.6"
46
  prompt_enhancer_image_caption_model, prompt_enhancer_image_caption_processor, prompt_enhancer_llm_model, prompt_enhancer_llm_tokenizer = None, None, None, None
47
 
48
  from importlib.metadata import version
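requirements.txt and target_mmgp_version are bumped to 3.4.6 together. Only the constant and the importlib.metadata import are visible in this hunk, so the guard below is a hedged sketch of how such a check could look, not the actual wgp.py code.

```python
# Hypothetical version guard (assumption, not the real wgp.py implementation).
from importlib.metadata import PackageNotFoundError, version

target_mmgp_version = "3.4.6"

try:
    installed_mmgp = version("mmgp")
except PackageNotFoundError:
    installed_mmgp = None

if installed_mmgp != target_mmgp_version:
    print(f"mmgp {installed_mmgp} detected but {target_mmgp_version} is expected; "
          "please run: pip install -r requirements.txt")
```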
 
1529
  wan_choices_t2v=["ckpts/wan2.1_text2video_1.3B_bf16.safetensors", "ckpts/wan2.1_text2video_14B_bf16.safetensors", "ckpts/wan2.1_text2video_14B_quanto_int8.safetensors", "ckpts/wan2.1_Vace_1.3B_mbf16.safetensors",
1530
  "ckpts/wan2.1_recammaster_1.3B_bf16.safetensors", "ckpts/sky_reels2_diffusion_forcing_1.3B_mbf16.safetensors", "ckpts/sky_reels2_diffusion_forcing_14B_bf16.safetensors",
1531
  "ckpts/sky_reels2_diffusion_forcing_14B_quanto_int8.safetensors", "ckpts/sky_reels2_diffusion_forcing_720p_14B_mbf16.safetensors","ckpts/sky_reels2_diffusion_forcing_720p_14B_quanto_mbf16_int8.safetensors",
1532
+ "ckpts/wan2_1_phantom_1.3B_mbf16.safetensors", "ckpts/wan2.1_Vace_14B_mbf16.safetensors", "ckpts/wan2.1_Vace_14B_quanto_mbf16_int8.safetensors",
1533
+ "ckpts/wan2.1_moviigen1.1_14B_mbf16.safetensors", "ckpts/wan2.1_moviigen1.1_14B_quanto_mbf16_int8.safetensors",
1534
+ ]
1535
  wan_choices_i2v=["ckpts/wan2.1_image2video_480p_14B_mbf16.safetensors", "ckpts/wan2.1_image2video_480p_14B_quanto_mbf16_int8.safetensors", "ckpts/wan2.1_image2video_720p_14B_mbf16.safetensors",
1536
  "ckpts/wan2.1_image2video_720p_14B_quanto_mbf16_int8.safetensors", "ckpts/wan2.1_Fun_InP_1.3B_bf16.safetensors", "ckpts/wan2.1_Fun_InP_14B_bf16.safetensors",
1537
  "ckpts/wan2.1_Fun_InP_14B_quanto_int8.safetensors", "ckpts/wan2.1_FLF2V_720p_14B_bf16.safetensors", "ckpts/wan2.1_FLF2V_720p_14B_quanto_int8.safetensors",
 
1549
  return [get_model_filename("ltxv_13B", quantization, dtype_policy)]
1550
  else:
1551
  return []
1552
+ model_types = [ "t2v_1.3B", "t2v", "i2v", "i2v_720p", "flf2v_720p", "vace_1.3B","vace_14B","moviigen", "phantom_1.3B", "fantasy", "fun_inp_1.3B", "fun_inp", "recam_1.3B", "sky_df_1.3B", "sky_df_14B", "sky_df_720p_14B", "ltxv_13B", "ltxv_13B_distilled", "hunyuan", "hunyuan_i2v", "hunyuan_custom"]
1553
  model_signatures = {"t2v": "text2video_14B", "t2v_1.3B" : "text2video_1.3B", "fun_inp_1.3B" : "Fun_InP_1.3B", "fun_inp" : "Fun_InP_14B",
1554
  "i2v" : "image2video_480p", "i2v_720p" : "image2video_720p" , "vace_1.3B" : "Vace_1.3B", "vace_14B" : "Vace_14B","recam_1.3B": "recammaster_1.3B",
1555
  "flf2v_720p" : "FLF2V_720p", "sky_df_1.3B" : "sky_reels2_diffusion_forcing_1.3B", "sky_df_14B" : "sky_reels2_diffusion_forcing_14B",
1556
+ "sky_df_720p_14B" : "sky_reels2_diffusion_forcing_720p_14B", "moviigen" :"moviigen",
1557
  "phantom_1.3B" : "phantom_1.3B", "fantasy" : "fantasy", "ltxv_13B" : "ltxv_0.9.7_13B_dev", "ltxv_13B_distilled" : "ltxv_0.9.7_13B_distilled", "hunyuan" : "hunyuan_video_720", "hunyuan_i2v" : "hunyuan_video_i2v_720", "hunyuan_custom" : "hunyuan_video_custom" }
1558
 
1559
 
 
1618
  model_name = "Wan2.1 Fantasy Speaking 720p"
1619
  model_name += " 14B" if "14B" in model_filename else " 1.3B"
1620
  description = "The Fantasy Speaking model corresponds to the original Wan image 2 video model combined with the Fantasy Speaking extension to process an audio Input."
1621
+ elif "movii" in model_filename:
1622
+ model_name = "Wan2.1 MoviiGen 1080p 14B"
1623
+ description = "MoviiGen 1.1, a cutting-edge video generation model that excels in cinematic aesthetics and visual quality. Use it to generate videos in 720p or 1080p in the 21:9 ratio."
1624
  elif "ltxv_0.9.7_13B_dev" in model_filename:
1625
  model_name = "LTX Video 0.9.7 13B"
1626
  description = "LTX Video is a fast model that can be used to generate long videos (up to 260 frames).It is recommended to keep the number of steps to 30 or you will need to update the file 'ltxv_video/configs/ltxv-13b-0.9.7-dev.yaml'.The LTX Video model expects very long prompts, so don't hesitate to use the Prompt Enhancer."
 
4546
  label = "Max Resolution (as it maybe less depending on video width / height ratio)"
4547
  resolution = gr.Dropdown(
4548
  choices=[
4549
+ # 1080p
4550
+ ("1920x832 (21:9, 1080p)", "1920x832"),
4551
+ ("832x1920 (9:21, 1080p)", "832x1920"),
4552
  # 720p
4553
  ("1280x720 (16:9, 720p)", "1280x720"),
4554
  ("720x1280 (9:16, 720p)", "720x1280"),
4555
  ("1024x1024 (1:1, 720p)", "1024x024"),
4556
+ ("1280x544 (21:9, 720p)", "1280x544"),
4557
+ ("544x1280 (9:21, 720p)", "544x1280"),
4558
  ("1104x832 (4:3, 720p)", "1104x832"),
4559
+ ("832x1104 (3:4, 720p)", "832x1104"),
4560
  ("960x960 (1:1, 720p)", "960x960"),
4561
  # 480p
4562
  ("960x544 (16:9, 540p)", "960x544"),
 
5661
  theme = gr.themes.Soft(font=["Verdana"], primary_hue="sky", neutral_hue="slate", text_size="md")
5662
 
5663
  with gr.Blocks(css=css, theme=theme, title= "WanGP") as main:
5664
+ gr.Markdown("<div align=center><H1>Wan<SUP>GP</SUP> v5.2 <FONT SIZE=4>by <I>DeepBeepMeep</I></FONT> <FONT SIZE=3>") # (<A HREF='https://github.com/deepbeepmeep/Wan2GP'>Updates</A>)</FONT SIZE=3></H1></div>")
5665
  global model_list
5666
 
5667
  tab_state = gr.State({ "tab_no":0 })