Update README.md
LTX-Video is the first DiT-based video generation model capable of generating high-quality videos in real time.

# Models & Workflows

| Name | Notes | inference.py config | ComfyUI workflow (Recommended) |
|------|-------|---------------------|--------------------------------|
| ltxv-13b-0.9.8-dev | Highest quality, requires more VRAM | [ltxv-13b-0.9.8-dev.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-13b-0.9.8-dev.yaml) | [ltxv-13b-i2v-base.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/ltxv-13b-i2v-base.json) |
| [ltxv-13b-0.9.8-mix](https://app.ltx.studio/motion-workspace?videoModel=ltxv-13b) | Mixes ltxv-13b-dev and ltxv-13b-distilled in the same multi-scale rendering workflow for a balanced speed-quality trade-off | N/A | [ltxv-13b-i2v-mixed-multiscale.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/ltxv-13b-i2v-mixed-multiscale.json) |
| [ltxv-13b-0.9.8-distilled](https://app.ltx.studio/motion-workspace?videoModel=ltxv) | Faster, less VRAM usage, slight quality reduction compared to 13b. Ideal for rapid iterations | [ltxv-13b-0.9.8-distilled.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-13b-0.9.8-distilled.yaml) | [ltxv-13b-dist-i2v-base.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/13b-distilled/ltxv-13b-dist-i2v-base.json) |
| ltxv-2b-0.9.8-distilled | Smaller model, slight quality reduction compared to 13b-distilled. Ideal for light VRAM usage | [ltxv-2b-0.9.8-distilled.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-2b-0.9.8-distilled.yaml) | N/A |
| ltxv-13b-0.9.8-fp8 | Quantized version of ltxv-13b | [ltxv-13b-0.9.8-dev-fp8.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-13b-0.9.8-dev-fp8.yaml) | [ltxv-13b-i2v-base-fp8.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/ltxv-13b-i2v-base-fp8.json) |
| ltxv-13b-0.9.8-distilled-fp8 | Quantized version of ltxv-13b-distilled | [ltxv-13b-0.9.8-distilled-fp8.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-13b-0.9.8-distilled-fp8.yaml) | [ltxv-13b-dist-i2v-base-fp8.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/13b-distilled/ltxv-13b-dist-i2v-base-fp8.json) |
| ltxv-2b-0.9.8-distilled-fp8 | Quantized version of ltxv-2b-distilled | [ltxv-2b-0.9.8-distilled-fp8.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-2b-0.9.8-distilled-fp8.yaml) | N/A |
| ltxv-2b-0.9.6 | Good quality, lower VRAM requirement than ltxv-13b | [ltxv-2b-0.9.6-dev.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-2b-0.9.6-dev.yaml) | [ltxvideo-i2v.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/low_level/ltxvideo-i2v.json) |
| ltxv-2b-0.9.6-distilled | 15× faster, real-time capable, fewer steps needed, no STG/CFG required | [ltxv-2b-0.9.6-distilled.yaml](https://github.com/Lightricks/LTX-Video/blob/main/configs/ltxv-2b-0.9.6-distilled.yaml) | [ltxvideo-i2v-distilled.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/low_level/ltxvideo-i2v-distilled.json) |

## Model Details
- **Developed by:** Lightricks

You can use the model for purposes under the license:

- 13B version 0.9.7-ICLoRA Canny [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/LTX-Video-Open-Weights-License-0.X.txt)
- Temporal upscaler version 0.9.7 [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/LTX-Video-Open-Weights-License-0.X.txt)
- Spatial upscaler version 0.9.7 [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/LTX-Video-Open-Weights-License-0.X.txt)
- 13B version 0.9.8-dev [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/LTX-Video-Open-Weights-License-0.X.txt)
- 13B version 0.9.8-dev-fp8 [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/LTX-Video-Open-Weights-License-0.X.txt)
- 13B version 0.9.8-distilled [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/LTX-Video-Open-Weights-License-0.X.txt)
- 13B version 0.9.8-distilled-fp8 [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/LTX-Video-Open-Weights-License-0.X.txt)
- 2B version 0.9.8-distilled [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/LTX-Video-Open-Weights-License-0.X.txt)
- 2B version 0.9.8-distilled-fp8 [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/LTX-Video-Open-Weights-License-0.X.txt)
- 13B version 0.9.8-ICLoRA detailer [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/LTX-Video-Open-Weights-License-0.X.txt)
- Temporal upscaler version 0.9.8 [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/LTX-Video-Open-Weights-License-0.X.txt)
- Spatial upscaler version 0.9.8 [license](https://huggingface.co/Lightricks/LTX-Video/blob/main/LTX-Video-Open-Weights-License-0.X.txt)

### General tips:
* The model works at resolutions divisible by 32 and frame counts of the form 8k + 1 (e.g. 257), as the sketch below illustrates. If the requested resolution or number of frames does not satisfy these constraints, the input is padded with -1 and then cropped to the target resolution and number of frames.
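A minimal sketch (a hypothetical helper, not part of the repo) of rounding a requested size up to the nearest values the model accepts:

```python
def nearest_valid(height: int, width: int, num_frames: int):
    """Round up to the constraints above: sides divisible by 32, frames of the form 8k + 1."""
    valid_h = ((height - 1) // 32 + 1) * 32        # next multiple of 32
    valid_w = ((width - 1) // 32 + 1) * 32
    valid_f = ((num_frames - 2) // 8 + 1) * 8 + 1  # next value of the form 8k + 1
    return valid_h, valid_w, valid_f

print(nearest_valid(720, 1280, 250))  # -> (736, 1280, 257)
```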

To use our model, please follow the inference code in [inference.py](https://github.com/Lightricks/LTX-Video/blob/main/inference.py):

#### For image-to-video generation:

```bash
python inference.py --prompt "PROMPT" --input_image_path IMAGE_PATH --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.8-distilled.yaml
```

#### For video generation with multiple conditions:

You can now generate a video conditioned on a set of images and/or short video segments.

Simply provide a list of paths to the images or video segments you want to condition on, along with their target frame numbers in the generated video. You can also specify the conditioning strength for each item (default: 1.0).

```bash
python inference.py --prompt "PROMPT" --conditioning_media_paths IMAGE_OR_VIDEO_PATH_1 IMAGE_OR_VIDEO_PATH_2 --conditioning_start_frames TARGET_FRAME_1 TARGET_FRAME_2 --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config configs/ltxv-13b-0.9.8-distilled.yaml
```

### Diffusers 🧨

```python
import torch  # the excerpt below uses torch.bfloat16
from diffusers import LTXConditionPipeline, LTXLatentUpsamplePipeline
from diffusers.pipelines.ltx.pipeline_ltx_condition import LTXVideoCondition
from diffusers.utils import export_to_video, load_image, load_video

pipe = LTXConditionPipeline.from_pretrained("Lightricks/LTX-Video-0.9.8-dev", torch_dtype=torch.bfloat16)
pipe_upsample = LTXLatentUpsamplePipeline.from_pretrained("Lightricks/ltxv-spatial-upscaler-0.9.8", vae=pipe.vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")
pipe_upsample.to("cuda")
pipe.vae.enable_tiling()
```

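The upsampler enables a multi-scale flow: generate at a low resolution, upscale the latents, then run a short denoising pass at the higher resolution. A minimal sketch of that flow, following the diffusers LTX documentation; the prompt, file name, sizes, and step counts are illustrative assumptions, not values from this README:

```python
# Assumes `pipe` and `pipe_upsample` were created as in the excerpt above.
image = load_image("input.png")  # hypothetical input image
condition = LTXVideoCondition(image=image, frame_index=0)

prompt = "A cinematic aerial shot of a coastline at sunset"  # illustrative
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

# 1. Generate at a low resolution, keeping the result in latent space.
latents = pipe(
    conditions=[condition],
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=768,
    height=512,
    num_frames=121,          # 8k + 1, per the tips above
    num_inference_steps=30,
    generator=torch.Generator().manual_seed(0),
    output_type="latent",
).frames

# 2. Upscale the latents 2x with the spatial upsampler.
upscaled_latents = pipe_upsample(latents=latents, output_type="latent").frames

# 3. A short denoising pass at the higher resolution, then decode to frames.
video = pipe(
    conditions=[condition],
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=1536,
    height=1024,
    num_frames=121,
    denoise_strength=0.4,    # only lightly re-noise the upscaled latents
    num_inference_steps=10,
    latents=upscaled_latents,
    generator=torch.Generator().manual_seed(0),
    output_type="pil",
).frames[0]

export_to_video(video, "output.mp4", fps=24)
```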
For video-conditioned generation, the setup is identical; only the input loader changes:

```python
import torch
from diffusers import LTXConditionPipeline, LTXLatentUpsamplePipeline
from diffusers.pipelines.ltx.pipeline_ltx_condition import LTXVideoCondition
from diffusers.utils import export_to_video, load_video

pipe = LTXConditionPipeline.from_pretrained("Lightricks/LTX-Video-0.9.8-dev", torch_dtype=torch.bfloat16)
pipe_upsample = LTXLatentUpsamplePipeline.from_pretrained("Lightricks/ltxv-spatial-upscaler-0.9.8", vae=pipe.vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")
pipe_upsample.to("cuda")
pipe.vae.enable_tiling()
```
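As a hedged illustration of building the condition from an existing clip (the file name is hypothetical, and the 25-frame slice length is an assumption chosen to satisfy the 8k + 1 rule from the tips above):

```python
from diffusers.pipelines.ltx.pipeline_ltx_condition import LTXVideoCondition
from diffusers.utils import load_video

video = load_video("input.mp4")  # hypothetical clip; load_video returns a list of PIL frames
condition = LTXVideoCondition(video=video[:25], frame_index=0)  # anchor the first 25 frames at frame 0
# The condition is then passed to the pipeline as conditions=[condition],
# just as in the image-conditioned flow above.
```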