DeepBeepMeep committed · Commit 60fd3bd · Parent: 7e8fb61

Added support for CausVid Lora and MoviiGen
Files changed:
- README.md +23 -2
- examples/i2v_input.JPG +0 -3
- i2v_inference.py +2 -14
- requirements.txt +1 -1
- tests/README.md +0 -6
- tests/test.sh +0 -113
- wan/text2video.py +25 -16
- wgp.py +16 -6
README.md
CHANGED
@@ -21,11 +21,15 @@ WanGP supports the Wan (and derived models), Hunyuan Video and LTV Video models
 
 
 ## 🔥 Latest News!!
-* May
+* May 20 2025: 👋 Wan 2.1GP v5.2 : Added support for Wan CausVid, a distilled Wan model that can generate nice looking videos in only 4 to 12 steps.
+The great thing is that Kijai (kudos to him!) has created a CausVid Lora that can be combined with any existing Wan t2v 14B model, like Wan Vace 14B.
+See the instructions below on how to use CausVid.\
+Also, as an experiment, I have added support for MoviiGen, the first model that claims to be capable of generating 1080p videos (if you have enough VRAM (20 GB...) and are ready to wait a long time...). Don't hesitate to share your impressions on the Discord server.
+* May 18 2025: 👋 Wan 2.1GP v5.1 : Bonus day, added LTX Video 13B Distilled: generate very high quality videos in less than one minute!
 * May 17 2025: 👋 Wan 2.1GP v5.0 : One App to Rule Them All !\
 Added support for the other great open source architectures:
 - Hunyuan Video : text 2 video (one of the best, if not the best, t2v), image 2 video and the recently released Hunyuan Custom (very good identity preservation when injecting a person into a video)
-- LTX Video 13B (released last week): very long video support and fast 720p generation. The WanGP version has been greatly optimized and VRAM requirements reduced by a factor of 4!
+- LTX Video 13B (released last week): very long video support and fast 720p generation. The WanGP version has been greatly optimized and LTX Video VRAM requirements reduced by a factor of 4!
 
 Also:
 - Added support for the best Control Video model, released 2 days ago: Vace 14B
@@ -268,6 +272,23 @@ python wgp.py --lora-preset mylorapreset.lset # where 'mylorapreset.lset' is a
 
 You will find prebuilt Loras on https://civitai.com/ or you will be able to build them with tools such as kohya or onetrainer.
 
+### CausVid Lora
+
+Wan CausVid is a distilled Wan model that can generate nice looking videos in only 4 to 12 steps. As a distilled model it also doesn't require CFG and is two times faster for the same number of steps.
+The great thing is that Kijai (kudos to him!) has created a CausVid Lora that can be combined with any existing Wan t2v 14B model, like Wan Vace 14B, to accelerate those models too. It may also work with Wan i2v models.
+
+Instructions:
+1) First download the Lora: https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_CausVid_14B_T2V_lora_rank32.safetensors
+2) Choose a Wan t2v model (for instance Wan 2.1 text2video 14B or Vace 14B)
+3) Turn on Advanced Mode by checking the corresponding checkbox
+4) In the Advanced Generation tab: set Guidance Scale = 1 and Shift Scale = 7
+5) In the Advanced Lora tab: select the CausVid Lora (click the Refresh button at the top if you don't see it) and enter 0.3 as the Lora multiplier
+6) Now select a 12-step generation and click Generate
+
+You can reduce the number of steps to as low as 4, but you will then need to progressively increase the Lora multiplier up to 1. Please note that the lower the number of steps, the lower the quality (especially the motion).
+
+You can combine the CausVid Lora with other Loras (just follow the instructions above).
+
 ### Macros (basic)
 In *Advanced Mode*, you can start prompt lines with a "!", for instance:\
 ```
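Why step 4 above sets Guidance Scale = 1: with classifier-free guidance the model is evaluated twice per step and combined as uncond + scale * (cond - uncond); at a scale of 1 this collapses to the conditional prediction alone, so a distilled model like CausVid can skip the unconditional pass entirely, which is where the "two times faster" comes from. A minimal illustrative sketch (not WanGP's actual code) of that collapse, mirroring the `if guide_scale == 1:` branch added to wan/text2video.py further down:

```python
# Illustrative sketch only (not WanGP code): classifier-free guidance collapses
# to a single conditional pass when the guidance scale is 1.
def cfg_noise_pred(model, latents, t, cond, uncond, guide_scale):
    if guide_scale == 1:
        # uncond + 1 * (cond - uncond) == cond: the unconditional pass is wasted work.
        return model(latents, t, cond)
    noise_cond = model(latents, t, cond)
    noise_uncond = model(latents, t, uncond)
    return noise_uncond + guide_scale * (noise_cond - noise_uncond)

# Toy check with a stand-in "model" that just returns its conditioning value.
toy_model = lambda latents, t, ctx: ctx
assert cfg_noise_pred(toy_model, 0.0, 0, cond=2.0, uncond=1.0, guide_scale=1) == 2.0
```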
examples/i2v_input.JPG
DELETED
i2v_inference.py
CHANGED
@@ -105,12 +105,6 @@ def load_i2v_model(model_filename, text_encoder_filename, is_720p):
         wan_model = wan.WanI2V(
             config=cfg,
             checkpoint_dir=DATA_DIR,
-            device_id=0,
-            rank=0,
-            t5_fsdp=False,
-            dit_fsdp=False,
-            use_usp=False,
-            i2v720p=True,
             model_filename=model_filename,
             text_encoder_filename=text_encoder_filename
         )
@@ -120,12 +114,6 @@ def load_i2v_model(model_filename, text_encoder_filename, is_720p):
         wan_model = wan.WanI2V(
             config=cfg,
             checkpoint_dir=DATA_DIR,
-            device_id=0,
-            rank=0,
-            t5_fsdp=False,
-            dit_fsdp=False,
-            use_usp=False,
-            i2v720p=False,
             model_filename=model_filename,
             text_encoder_filename=text_encoder_filename
         )
@@ -624,8 +612,8 @@ def main():
     # Actually run the i2v generation
     try:
         sample_frames = wan_model.generate(
-            user_prompt,
-            input_img,
+            input_prompt = user_prompt,
+            image_start = input_img,
             frame_num=frame_count,
             width=width,
             height=height,
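The call above now passes the prompt and the start image as keyword arguments (input_prompt=, image_start=). A small sketch with a hypothetical, simplified signature (the real WanI2V.generate takes many more parameters) showing why keyword arguments keep the call site robust when positional parameters are reordered upstream:

```python
# Hypothetical, simplified stand-in for WanI2V.generate; illustration only.
def generate(input_prompt, image_start, frame_num=81, width=832, height=480):
    return f"{frame_num} frames of '{input_prompt}' from {image_start} at {width}x{height}"

# Keyword arguments stay correct even if the leading parameter order changes upstream.
sample_frames = generate(
    input_prompt="a person walking on a beach at sunset",
    image_start="start_frame.jpg",
    frame_num=49,
    width=832,
    height=480,
)
```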
requirements.txt
CHANGED
@@ -17,7 +17,7 @@ gradio==5.23.0
 numpy>=1.23.5,<2
 einops
 moviepy==1.0.3
-mmgp==3.4.
+mmgp==3.4.6
 peft==0.14.0
 mutagen
 pydantic==2.10.6
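This pin matches the target_mmgp_version bump to "3.4.6" in wgp.py below. A sketch of how such a check can be written with importlib.metadata, which wgp.py imports; the exact check wgp.py performs may differ:

```python
# Sketch only: fail fast when the installed mmgp does not match the pinned version.
from importlib.metadata import PackageNotFoundError, version

target_mmgp_version = "3.4.6"
try:
    installed = version("mmgp")
except PackageNotFoundError:
    installed = None

if installed != target_mmgp_version:
    raise RuntimeError(
        f"mmgp {target_mmgp_version} is required (found {installed}); "
        "run: pip install -r requirements.txt"
    )
```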
tests/README.md
DELETED
@@ -1,6 +0,0 @@
-
-Put all your models (Wan2.1-T2V-1.3B, Wan2.1-T2V-14B, Wan2.1-I2V-14B-480P, Wan2.1-I2V-14B-720P) in a folder and specify the max GPU number you want to use.
-
-```bash
-bash ./test.sh <local model dir> <gpu number>
-```
tests/test.sh
DELETED
@@ -1,113 +0,0 @@
-#!/bin/bash
-
-
-if [ "$#" -eq 2 ]; then
-    MODEL_DIR=$(realpath "$1")
-    GPUS=$2
-else
-    echo "Usage: $0 <local model dir> <gpu number>"
-    exit 1
-fi
-
-SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
-REPO_ROOT="$(dirname "$SCRIPT_DIR")"
-cd "$REPO_ROOT" || exit 1
-
-PY_FILE=./generate.py
-
-
-function t2v_1_3B() {
-    T2V_1_3B_CKPT_DIR="$MODEL_DIR/Wan2.1-T2V-1.3B"
-
-    # 1-GPU Test
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_1_3B 1-GPU Test: "
-    python $PY_FILE --task t2v-1.3B --size 480*832 --ckpt_dir $T2V_1_3B_CKPT_DIR
-
-    # Multiple GPU Test
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_1_3B Multiple GPU Test: "
-    torchrun --nproc_per_node=$GPUS $PY_FILE --task t2v-1.3B --ckpt_dir $T2V_1_3B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS
-
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_1_3B Multiple GPU, prompt extend local_qwen: "
-    torchrun --nproc_per_node=$GPUS $PY_FILE --task t2v-1.3B --ckpt_dir $T2V_1_3B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS --use_prompt_extend --prompt_extend_model "Qwen/Qwen2.5-3B-Instruct" --prompt_extend_target_lang "en"
-
-    if [ -n "${DASH_API_KEY+x}" ]; then
-        echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_1_3B Multiple GPU, prompt extend dashscope: "
-        torchrun --nproc_per_node=$GPUS $PY_FILE --task t2v-1.3B --ckpt_dir $T2V_1_3B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS --use_prompt_extend --prompt_extend_method "dashscope"
-    else
-        echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> No DASH_API_KEY found, skip the dashscope extend test."
-    fi
-}
-
-function t2v_14B() {
-    T2V_14B_CKPT_DIR="$MODEL_DIR/Wan2.1-T2V-14B"
-
-    # 1-GPU Test
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_14B 1-GPU Test: "
-    python $PY_FILE --task t2v-14B --size 480*832 --ckpt_dir $T2V_14B_CKPT_DIR
-
-    # Multiple GPU Test
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_14B Multiple GPU Test: "
-    torchrun --nproc_per_node=$GPUS $PY_FILE --task t2v-14B --ckpt_dir $T2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS
-
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2v_14B Multiple GPU, prompt extend local_qwen: "
-    torchrun --nproc_per_node=$GPUS $PY_FILE --task t2v-14B --ckpt_dir $T2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS --use_prompt_extend --prompt_extend_model "Qwen/Qwen2.5-3B-Instruct" --prompt_extend_target_lang "en"
-}
-
-
-
-function t2i_14B() {
-    T2V_14B_CKPT_DIR="$MODEL_DIR/Wan2.1-T2V-14B"
-
-    # 1-GPU Test
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2i_14B 1-GPU Test: "
-    python $PY_FILE --task t2i-14B --size 480*832 --ckpt_dir $T2V_14B_CKPT_DIR
-
-    # Multiple GPU Test
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2i_14B Multiple GPU Test: "
-    torchrun --nproc_per_node=$GPUS $PY_FILE --task t2i-14B --ckpt_dir $T2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS
-
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> t2i_14B Multiple GPU, prompt extend local_qwen: "
-    torchrun --nproc_per_node=$GPUS $PY_FILE --task t2i-14B --ckpt_dir $T2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS --use_prompt_extend --prompt_extend_model "Qwen/Qwen2.5-3B-Instruct" --prompt_extend_target_lang "en"
-}
-
-
-function i2v_14B_480p() {
-    I2V_14B_CKPT_DIR="$MODEL_DIR/Wan2.1-I2V-14B-480P"
-
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i2v_14B 1-GPU Test: "
-    python $PY_FILE --task i2v-14B --size 832*480 --ckpt_dir $I2V_14B_CKPT_DIR
-
-    # Multiple GPU Test
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i2v_14B Multiple GPU Test: "
-    torchrun --nproc_per_node=$GPUS $PY_FILE --task i2v-14B --ckpt_dir $I2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS
-
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i2v_14B Multiple GPU, prompt extend local_qwen: "
-    torchrun --nproc_per_node=$GPUS $PY_FILE --task i2v-14B --ckpt_dir $I2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS --use_prompt_extend --prompt_extend_model "Qwen/Qwen2.5-VL-3B-Instruct" --prompt_extend_target_lang "en"
-
-    if [ -n "${DASH_API_KEY+x}" ]; then
-        echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i2v_14B Multiple GPU, prompt extend dashscope: "
-        torchrun --nproc_per_node=$GPUS $PY_FILE --task i2v-14B --ckpt_dir $I2V_14B_CKPT_DIR --size 832*480 --dit_fsdp --t5_fsdp --ulysses_size $GPUS --use_prompt_extend --prompt_extend_method "dashscope"
-    else
-        echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> No DASH_API_KEY found, skip the dashscope extend test."
-    fi
-}
-
-
-function i2v_14B_720p() {
-    I2V_14B_CKPT_DIR="$MODEL_DIR/Wan2.1-I2V-14B-720P"
-
-    # 1-GPU Test
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i2v_14B 1-GPU Test: "
-    python $PY_FILE --task i2v-14B --size 720*1280 --ckpt_dir $I2V_14B_CKPT_DIR
-
-    # Multiple GPU Test
-    echo -e "\n\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> i2v_14B Multiple GPU Test: "
-    torchrun --nproc_per_node=$GPUS $PY_FILE --task i2v-14B --ckpt_dir $I2V_14B_CKPT_DIR --size 720*1280 --dit_fsdp --t5_fsdp --ulysses_size $GPUS
-}
-
-
-t2i_14B
-t2v_1_3B
-t2v_14B
-i2v_14B_480p
-i2v_14B_720p
wan/text2video.py
CHANGED
@@ -26,7 +26,7 @@ from .utils.fm_solvers import (FlowDPMSolverMultistepScheduler,
 from .utils.fm_solvers_unipc import FlowUniPCMultistepScheduler
 from wan.modules.posemb_layers import get_rotary_pos_embed
 from .utils.vace_preprocessor import VaceVideoProcessor
-
+from wan.utils.basic_flowmatch import FlowMatchScheduler
 
 def optimized_scale(positive_flat, negative_flat):
 
@@ -82,7 +82,9 @@ class WanT2V:
         from mmgp import offload
         # model_filename = "c:/temp/vace1.3/diffusion_pytorch_model.safetensors"
         # model_filename = "vace14B_quanto_bf16_int8.safetensors"
-        self.model = offload.fast_load_transformers_model(model_filename, modelClass=WanModel, do_quantize= quantizeTransformer, writable_tensors= False)
+        # model_filename = "c:/temp/movii/diffusion_pytorch_model-00001-of-00007.safetensors"
+        # config_filename= "c:/temp/movii/config.json"
+        self.model = offload.fast_load_transformers_model(model_filename, modelClass=WanModel, do_quantize= quantizeTransformer, writable_tensors= False) # , forcedConfigPath= config_filename)
         # offload.load_model_data(self.model, "e:/vace.safetensors")
         # offload.load_model_data(self.model, "c:/temp/Phantom-Wan-1.3B.pth")
         # self.model.to(torch.bfloat16)
@@ -90,8 +92,8 @@
         self.model.lock_layers_dtypes(torch.float32 if mixed_precision_transformer else dtype)
         # dtype = torch.bfloat16
         offload.change_dtype(self.model, dtype, True)
-        # offload.save_model(self.model, "wan2.
-        # offload.save_model(self.model, "
+        # offload.save_model(self.model, "wan2.1_moviigen_14B_mbf16.safetensors", config_file_path=config_filename)
+        # offload.save_model(self.model, "wan2.1_moviigen_14B_quanto_fp16_int8.safetensors", do_quantize= True, config_file_path=config_filename)
         self.model.eval().requires_grad_(False)
 
 
@@ -399,13 +401,14 @@
 
         # evaluation mode
 
-        if sample_solver == 'unipc':
-            sample_scheduler = FlowUniPCMultistepScheduler(
-                num_train_timesteps=self.num_train_timesteps,
-                shift=1,
-                use_dynamic_shifting=False)
-            sample_scheduler.set_timesteps(
-                sampling_steps, device=self.device, shift=shift)
+        if False:
+            sample_scheduler = FlowMatchScheduler(num_inference_steps=sampling_steps, shift=shift, sigma_min=0, extra_one_step=True)
+            timesteps = torch.tensor([1000, 934, 862, 756, 603, 410, 250, 140, 74, 0])[:sampling_steps].to(self.device)
+            sample_scheduler.timesteps = timesteps
+        elif sample_solver == 'unipc':
+            sample_scheduler = FlowUniPCMultistepScheduler( num_train_timesteps=self.num_train_timesteps, shift=1, use_dynamic_shifting=False)
+            sample_scheduler.set_timesteps( sampling_steps, device=self.device, shift=shift)
+
             timesteps = sample_scheduler.timesteps
         elif sample_solver == 'dpm++':
             sample_scheduler = FlowDPMSolverMultistepScheduler(
@@ -468,7 +471,11 @@
             timestep = torch.stack(timestep)
             kwargs["current_step"] = i
             kwargs["t"] = timestep
-            if joint_pass:
+            if guide_scale == 1:
+                noise_pred = self.model( [latent_model_input], x_id = 0, context = [context], **kwargs)[0]
+                if self._interrupt:
+                    return None
+            elif joint_pass:
                 if phantom:
                     pos_it, pos_i, neg = self.model(
                         [ torch.cat([latent_model_input[:,:-input_ref_images.shape[1]], input_ref_images], dim=1) ] * 2 +
@@ -509,7 +516,9 @@
             # del latent_model_input
 
             # CFG Zero *. Thanks to https://github.com/WeichenFan/CFG-Zero-star/
-            if phantom:
+            if guide_scale == 1:
+                pass
+            elif phantom:
                 guide_scale_img= 5.0
                 guide_scale_text= guide_scale #7.5
                 noise_pred = neg + guide_scale_img * (pos_i - neg) + guide_scale_text * (pos_it - pos_i)
@@ -528,13 +537,13 @@
                 noise_pred_uncond *= alpha
                 noise_pred = noise_pred_uncond + guide_scale * (noise_pred_text - noise_pred_uncond)
             noise_pred_uncond, noise_pred_cond, noise_pred_text, pos_it, pos_i, neg = None, None, None, None, None, None
-
+            scheduler_kwargs = {} if isinstance(sample_scheduler, FlowMatchScheduler) else {"generator": seed_g}
             temp_x0 = sample_scheduler.step(
                 noise_pred[:, :target_shape[1]].unsqueeze(0),
                 t,
                 latents.unsqueeze(0),
-                return_dict=False,
-                generator=seed_g)[0]
+                # return_dict=False,
+                **scheduler_kwargs)[0]
             latents = temp_x0.squeeze(0)
             del temp_x0
 
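The scheduler_kwargs line added above exists because the newly imported FlowMatchScheduler is assumed not to accept a generator argument in its step() method, unlike the diffusers-style schedulers already in use. A minimal sketch of that dispatch pattern (the classes and method bodies below are placeholders, not the real wan.utils.basic_flowmatch implementation):

```python
# Placeholder classes: only the step() signatures matter for the dispatch.
class DiffusersStyleScheduler:
    def step(self, model_output, timestep, sample, generator=None):
        return (sample - model_output,)  # stochastic samplers may draw from `generator`

class BasicFlowMatchScheduler:
    def step(self, model_output, timestep, sample):
        return (sample - model_output,)  # deterministic update, no generator parameter

def scheduler_step(scheduler, noise_pred, t, latents, seed_g):
    # Passing generator= to a step() that does not declare it raises a TypeError,
    # hence the isinstance() check before building the keyword arguments.
    kwargs = {} if isinstance(scheduler, BasicFlowMatchScheduler) else {"generator": seed_g}
    return scheduler.step(noise_pred, t, latents, **kwargs)[0]

print(scheduler_step(BasicFlowMatchScheduler(), 0.1, 0, 1.0, seed_g=None))
```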
wgp.py
CHANGED
@@ -42,7 +42,7 @@ global_queue_ref = []
 AUTOSAVE_FILENAME = "queue.zip"
 PROMPT_VARS_MAX = 10
 
-target_mmgp_version = "3.4.
+target_mmgp_version = "3.4.6"
 prompt_enhancer_image_caption_model, prompt_enhancer_image_caption_processor, prompt_enhancer_llm_model, prompt_enhancer_llm_tokenizer = None, None, None, None
 
 from importlib.metadata import version
@@ -1529,7 +1529,9 @@ for path in ["wan2.1_Vace_1.3B_preview_bf16.safetensors", "sky_reels2_diffusion
 wan_choices_t2v=["ckpts/wan2.1_text2video_1.3B_bf16.safetensors", "ckpts/wan2.1_text2video_14B_bf16.safetensors", "ckpts/wan2.1_text2video_14B_quanto_int8.safetensors", "ckpts/wan2.1_Vace_1.3B_mbf16.safetensors",
                  "ckpts/wan2.1_recammaster_1.3B_bf16.safetensors", "ckpts/sky_reels2_diffusion_forcing_1.3B_mbf16.safetensors", "ckpts/sky_reels2_diffusion_forcing_14B_bf16.safetensors",
                  "ckpts/sky_reels2_diffusion_forcing_14B_quanto_int8.safetensors", "ckpts/sky_reels2_diffusion_forcing_720p_14B_mbf16.safetensors","ckpts/sky_reels2_diffusion_forcing_720p_14B_quanto_mbf16_int8.safetensors",
-                 "ckpts/wan2_1_phantom_1.3B_mbf16.safetensors", "ckpts/wan2.1_Vace_14B_mbf16.safetensors", "ckpts/wan2.1_Vace_14B_quanto_mbf16_int8.safetensors"
+                 "ckpts/wan2_1_phantom_1.3B_mbf16.safetensors", "ckpts/wan2.1_Vace_14B_mbf16.safetensors", "ckpts/wan2.1_Vace_14B_quanto_mbf16_int8.safetensors",
+                 "ckpts/wan2.1_moviigen1.1_14B_mbf16.safetensors", "ckpts/wan2.1_moviigen1.1_14B_quanto_mbf16_int8.safetensors",
+                 ]
 wan_choices_i2v=["ckpts/wan2.1_image2video_480p_14B_mbf16.safetensors", "ckpts/wan2.1_image2video_480p_14B_quanto_mbf16_int8.safetensors", "ckpts/wan2.1_image2video_720p_14B_mbf16.safetensors",
                  "ckpts/wan2.1_image2video_720p_14B_quanto_mbf16_int8.safetensors", "ckpts/wan2.1_Fun_InP_1.3B_bf16.safetensors", "ckpts/wan2.1_Fun_InP_14B_bf16.safetensors",
                  "ckpts/wan2.1_Fun_InP_14B_quanto_int8.safetensors", "ckpts/wan2.1_FLF2V_720p_14B_bf16.safetensors", "ckpts/wan2.1_FLF2V_720p_14B_quanto_int8.safetensors",
@@ -1547,11 +1549,11 @@ def get_dependent_models(model_filename, quantization, dtype_policy ):
         return [get_model_filename("ltxv_13B", quantization, dtype_policy)]
     else:
         return []
-model_types = [ "t2v_1.3B", "t2v", "i2v", "i2v_720p", "flf2v_720p", "vace_1.3B","vace_14B", "phantom_1.3B", "fantasy", "fun_inp_1.3B", "fun_inp", "recam_1.3B", "sky_df_1.3B", "sky_df_14B", "sky_df_720p_14B", "ltxv_13B", "ltxv_13B_distilled", "hunyuan", "hunyuan_i2v", "hunyuan_custom"]
+model_types = [ "t2v_1.3B", "t2v", "i2v", "i2v_720p", "flf2v_720p", "vace_1.3B","vace_14B","moviigen", "phantom_1.3B", "fantasy", "fun_inp_1.3B", "fun_inp", "recam_1.3B", "sky_df_1.3B", "sky_df_14B", "sky_df_720p_14B", "ltxv_13B", "ltxv_13B_distilled", "hunyuan", "hunyuan_i2v", "hunyuan_custom"]
 model_signatures = {"t2v": "text2video_14B", "t2v_1.3B" : "text2video_1.3B", "fun_inp_1.3B" : "Fun_InP_1.3B", "fun_inp" : "Fun_InP_14B",
                     "i2v" : "image2video_480p", "i2v_720p" : "image2video_720p" , "vace_1.3B" : "Vace_1.3B", "vace_14B" : "Vace_14B","recam_1.3B": "recammaster_1.3B",
                     "flf2v_720p" : "FLF2V_720p", "sky_df_1.3B" : "sky_reels2_diffusion_forcing_1.3B", "sky_df_14B" : "sky_reels2_diffusion_forcing_14B",
-                    "sky_df_720p_14B" : "sky_reels2_diffusion_forcing_720p_14B",
+                    "sky_df_720p_14B" : "sky_reels2_diffusion_forcing_720p_14B", "moviigen" :"moviigen",
                     "phantom_1.3B" : "phantom_1.3B", "fantasy" : "fantasy", "ltxv_13B" : "ltxv_0.9.7_13B_dev", "ltxv_13B_distilled" : "ltxv_0.9.7_13B_distilled", "hunyuan" : "hunyuan_video_720", "hunyuan_i2v" : "hunyuan_video_i2v_720", "hunyuan_custom" : "hunyuan_video_custom" }
 
 
@@ -1616,6 +1618,9 @@ def get_model_name(model_filename, description_container = [""]):
         model_name = "Wan2.1 Fantasy Speaking 720p"
         model_name += " 14B" if "14B" in model_filename else " 1.3B"
         description = "The Fantasy Speaking model corresponds to the original Wan image 2 video model combined with the Fantasy Speaking extension to process an audio Input."
+    elif "movii" in model_filename:
+        model_name = "Wan2.1 MoviiGen 1080p 14B"
+        description = "MoviiGen 1.1, a cutting-edge video generation model that excels in cinematic aesthetics and visual quality. Use it to generate videos in 720p or 1080p in the 21:9 ratio."
     elif "ltxv_0.9.7_13B_dev" in model_filename:
         model_name = "LTX Video 0.9.7 13B"
         description = "LTX Video is a fast model that can be used to generate long videos (up to 260 frames).It is recommended to keep the number of steps to 30 or you will need to update the file 'ltxv_video/configs/ltxv-13b-0.9.7-dev.yaml'.The LTX Video model expects very long prompts, so don't hesitate to use the Prompt Enhancer."
@@ -4541,12 +4546,17 @@ def generate_video_tab(update_form = False, state_dict = None, ui_defaults = Non
                 label = "Max Resolution (as it maybe less depending on video width / height ratio)"
                 resolution = gr.Dropdown(
                     choices=[
+                        # 1080p
+                        ("1920x832 (21:9, 1080p)", "1920x832"),
+                        ("832x1920 (9:21, 1080p)", "832x1920"),
                         # 720p
                         ("1280x720 (16:9, 720p)", "1280x720"),
                         ("720x1280 (9:16, 720p)", "720x1280"),
                         ("1024x1024 (1:1, 720p)", "1024x024"),
-                        ("
+                        ("1280x544 (21:9, 720p)", "1280x544"),
+                        ("544x1280 (9:21, 720p)", "544x1280"),
                         ("1104x832 (4:3, 720p)", "1104x832"),
+                        ("832x1104 (3:4, 720p)", "832x1104"),
                         ("960x960 (1:1, 720p)", "960x960"),
                         # 480p
                         ("960x544 (16:9, 540p)", "960x544"),
@@ -5651,7 +5661,7 @@ def create_demo():
     theme = gr.themes.Soft(font=["Verdana"], primary_hue="sky", neutral_hue="slate", text_size="md")
 
     with gr.Blocks(css=css, theme=theme, title= "WanGP") as main:
-        gr.Markdown("<div align=center><H1>Wan<SUP>GP</SUP> v5.
+        gr.Markdown("<div align=center><H1>Wan<SUP>GP</SUP> v5.2 <FONT SIZE=4>by <I>DeepBeepMeep</I></FONT> <FONT SIZE=3>") # (<A HREF='https://github.com/deepbeepmeep/Wan2GP'>Updates</A>)</FONT SIZE=3></H1></div>")
         global model_list
 
         tab_state = gr.State({ "tab_no":0 })
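The MoviiGen support above boils down to two table entries: its checkpoints in wan_choices_t2v and a "moviigen" signature in model_signatures, which maps a model type to a substring of the checkpoint filename. A small sketch of that lookup (the helper name and matching details are assumptions, not code taken from wgp.py):

```python
# Reduced copy of the mapping idea from wgp.py: model type -> filename substring.
model_signatures = {
    "vace_14B": "Vace_14B",
    "moviigen": "moviigen",
    "t2v": "text2video_14B",
}

def get_model_type(model_filename: str) -> str:
    # Hypothetical helper: return the first type whose signature appears in the filename.
    for model_type, signature in model_signatures.items():
        if signature in model_filename:
            return model_type
    raise ValueError(f"Unknown model file: {model_filename}")

assert get_model_type("ckpts/wan2.1_moviigen1.1_14B_mbf16.safetensors") == "moviigen"
```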