Yato Text-to-Video Generation

This repository contains the steps and scripts needed to generate videos with the Yato text-to-video model. The model applies LoRA (Low-Rank Adaptation) weights on top of the pre-trained Wan 2.1 components to create Noragami-style anime videos from textual prompts.

Prerequisites

Before proceeding, ensure that you have the following installed on your system:

• Ubuntu (or a compatible Linux distribution)
• Python 3.x
• pip (Python package manager)
• Git
• Git LFS (Git Large File Storage)
• FFmpeg
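
You can confirm each prerequisite is available from the command line:

python3 --version
pip --version
git --version
git lfs version
ffmpeg -version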

Installation

  1. Update and Install Dependencies

    sudo apt-get update && sudo apt-get install cbm git-lfs ffmpeg
    
  2. Clone the Repository

    git clone https://huggingface.co/svjack/Yato_wan_2_1_1_3_B_text2video_lora
    cd Yato_wan_2_1_1_3_B_text2video_lora
    
  3. Install Python Dependencies

    pip install torch torchvision
    pip install -r requirements.txt
    pip install ascii-magic matplotlib tensorboard huggingface_hub datasets
    pip install moviepy==1.0.3
    pip install sageattention==1.0.6
    
  4. Download Model Weights

    wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth
    wget https://huggingface.co/DeepBeepMeep/Wan2.1/resolve/main/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth
    wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/Wan2.1_VAE.pth
    wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_1.3B_bf16.safetensors
    wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_14B_bf16.safetensors
    
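Before moving on, you can run a quick sanity check to confirm that PyTorch sees your GPU and that the weights are in place (a minimal sketch; it assumes the files were downloaded into the repository root as in step 4, and note that the usage examples below only load the 1.3B diffusion checkpoint, so the 14B download is optional):

python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
ls -lh models_t5_umt5-xxl-enc-bf16.pth \
       models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth \
       Wan2.1_VAE.pth \
       wan2.1_t2v_1.3B_bf16.safetensors \
       wan2.1_t2v_14B_bf16.safetensors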

Usage

To generate a video, use the wan_generate_video.py script with the appropriate parameters. Below are examples of how to generate videos using the Yato model.

Burger

python wan_generate_video.py --fp8 --task t2v-1.3B --video_size 480 832 --video_length 81 --infer_steps 50 \
--save_path save --output_type both \
--dit wan2.1_t2v_1.3B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Yato_outputs/Yato_w1_3_lora-000010.safetensors \
--lora_multiplier 1.0 \
--prompt "In the style of Noragami , The video features a series of close-up shots of an animated character with black hair and blue eyes. The character is eating a burger."

Money

python wan_generate_video.py --fp8 --task t2v-1.3B --video_size 480 832 --video_length 81 --infer_steps 50 \
--save_path save --output_type both \
--dit wan2.1_t2v_1.3B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Yato_outputs/Yato_w1_3_lora-000010.safetensors \
--lora_multiplier 1.0 --seed 77 \
--prompt "In the style of Noragami , The video features a series of close-up shots of an animated character with black hair and blue eyes. The character carry money in a wallet at home"

Sun

python wan_generate_video.py --fp8 --task t2v-1.3B --video_size 480 832 --video_length 81 --infer_steps 50 \
--save_path save --output_type both \
--dit wan2.1_t2v_1.3B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Yato_outputs/Yato_w1_3_lora-000010.safetensors \
--lora_multiplier 1.0 \
--prompt "In the style of Noragami , The video features a series of close-up shots of an animated character with black hair and blue eyes. the character shade from the sun with an umbrella outdoor."

Sleep

python wan_generate_video.py --fp8 --task t2v-1.3B --video_size 480 832 --video_length 81 --infer_steps 50 \
--save_path save --output_type both \
--dit wan2.1_t2v_1.3B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Yato_outputs/Yato_w1_3_lora-000010.safetensors \
--lora_multiplier 1.0 --seed 57 \
--prompt "In the style of Noragami , The video features a series of close-up shots of an animated character with black hair and blue eyes. the character is sleep on the bed"

Parameters

  • --fp8: Enable FP8 precision (optional).
  • --task: Specify the task (e.g., t2v-1.3B).
  • --video_size: Set the resolution of the generated video (e.g., 480 832, as in the examples above).
  • --video_length: Define the length of the video in frames.
  • --infer_steps: Number of inference steps.
  • --save_path: Directory to save the generated video.
  • --output_type: Output type (e.g., both to save both the video and its individual frames).
  • --dit: Path to the diffusion model weights.
  • --vae: Path to the VAE model weights.
  • --t5: Path to the T5 model weights.
  • --attn_mode: Attention mode (e.g., torch).
  • --lora_weight: Path to the LoRA weights.
  • --lora_multiplier: Multiplier for LoRA weights.
  • --prompt: Textual prompt for video generation.

Output

The generated video and frames will be saved in the directory specified by --save_path.
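
For a quick preview, FFmpeg (installed in step 1) can convert a generated video into an animated GIF. This is an optional sketch; the input filename is a placeholder, since the actual name is assigned by the script inside the save directory:

# Downscale to 480px wide at 12 fps and write an animated GIF preview
ffmpeg -i save/output.mp4 -vf "fps=12,scale=480:-1" save/preview.gif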

Troubleshooting

• Ensure all dependencies are correctly installed.
• Verify that the model weights are downloaded and placed in the correct locations.
• Check for any missing Python packages and install them with pip.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

• Hugging Face for hosting the model weights.
• Wan-AI for providing the pre-trained models.
• DeepBeepMeep for contributing to the model weights.

Contact

For any questions or issues, please open an issue on the repository or contact the maintainer.

