license: apache-2.0

Magi-1: Autoregressive Video Generation Are Scalable World Models

[Placeholder: add official image here]

This repository contains the Magi-1 model code, pre-trained weights, and inference code. More information is available on our project page.

1. Introduction

We present Magi-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, where each chunk is a fixed-length segment of consecutive frames. Magi-1 is trained to denoise per-chunk noise that increases monotonically over time, which enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, with high temporal consistency and scalability made possible by several algorithmic innovations and a dedicated infrastructure stack. Magi-1 further supports controllable generation via chunk-wise prompting, enabling smooth scene transitions, long-horizon synthesis, and fine-grained text-driven control. We believe Magi-1 offers a promising direction for unifying high-fidelity video generation with flexible instruction control and real-time deployment.

2. Model and Checkpoints

We provide pre-trained weights for Magi-1 in 24B and 4.5B sizes, along with the corresponding distilled (distill) and distilled plus FP8-quantized (distill+fp8_quant) variants. Links to the model weights are listed in the table below.

Model	Link	Recommended Machine
Magi-1-24B	Magi-1-24B	H100/H800 * 8
Magi-1-24B-distill	Magi-1-24B-distill	H100/H800 * 8
Magi-1-24B-distill+fp8_quant	Magi-1-24B-distill+fp8_quant	H100/H800 * 4 or RTX 4090 * 8
Magi-1-4.5B	Magi-1-4.5B (Coming Soon)	RTX 4090 * 1
Magi-1-4.5B-distill	Magi-1-4.5B-distill (Coming Soon)	RTX 4090 * 1
Magi-1-4.5B-distill+fp8_quant	Magi-1-4.5B-distill+fp8_quant (Coming Soon)	RTX 4090 * 1

3. How to run

3.1 Environment preparation

We provide two ways to run Magi-1, with the Docker environment being the recommended option.

Run with the Docker environment (Recommended)

docker pull sandai/magi:latest

docker run -it --gpus all --privileged --shm-size=32g --name magi --net=host --ipc=host --ulimit memlock=-1 --ulimit stack=6710886 sandai/magi:latest /bin/bash

Run with source code

# Create and activate a new environment
conda create -n magi python==3.10.12
conda activate magi
# Install pytorch
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
# Install other dependencies
pip install -r requirements.txt
# Install the attention kernels from the prebuilt flash_attn_3 wheel (CPython 3.10, Linux x86_64)
pip install --no-cache-dir "https://python-artifacts.oss-cn-shanghai.aliyuncs.com/flash_attn_3-3.0.0b2-cp310-cp310-linux_x86_64.whl" --no-deps
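
After installation, a quick sanity check can confirm that the environment matches the versions pinned above. The following is a minimal sketch and not part of the repository: the helper names `version_matches` and `report_torch` are hypothetical, and it assumes `python` resolves to the interpreter of the `magi` conda environment.

```shell
# Illustrative post-install check (hypothetical helpers, not shipped with Magi-1).

# Succeeds when the version string in $1 starts with the prefix in $2.
version_matches() {
    case "$1" in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Prints the installed PyTorch version and whether a CUDA device is usable.
report_torch() {
    python -c 'import torch; print(torch.__version__, torch.cuda.is_available())'
}

# Example, run inside the "magi" conda environment:
#   py_version="$(python -c 'import platform; print(platform.python_version())')"
#   version_matches "$py_version" "3.10." && report_torch
```

A prefix match is used rather than strict equality because PyTorch builds may append a local suffix to the version string.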

3.2 Inference command

# Run 24B Magi-1 model
bash example/24B/run.sh

# Run 4.5B Magi-1 model
bash example/4.5B/run.sh
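
When the launch is wrapped in another script, the choice between the two entry points can be made explicit. The sketch below maps a model size to the example script paths listed above; `run_script_for` is a hypothetical helper introduced for illustration, not something the repository provides.

```shell
# Map a model size ($1) to the example launch script named in this README.
# Hypothetical helper; only the two script paths come from the README.
run_script_for() {
    case "$1" in
        24B)  echo "example/24B/run.sh" ;;
        4.5B) echo "example/4.5B/run.sh" ;;
        *)    echo "unknown model size: $1" >&2; return 1 ;;
    esac
}

# Example:
#   bash "$(run_script_for 24B)"
```

Rejecting unknown sizes with a non-zero status keeps a typo from silently launching nothing.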

3.3 Useful configs

Config	Help
seed	Random seed used for video generation
video_size_h	Height of the video
video_size_w	Width of the video
num_frames	Controls the duration of generated video
fps	Frames per second; 4 video frames correspond to 1 latent_frame
cfg_number	Base models use cfg_number=2; distill and quant models use cfg_number=1
load	Directory containing a model checkpoint
t5_pretrained	Path to load pretrained T5 model
vae_pretrained	Path to load pretrained VAE model
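
The relationships among these settings can be worked through with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, the duration uses integer division, and rounding the latent frame count up for a partial group of four frames is an assumption rather than documented behavior.

```shell
# Illustrative helpers for reasoning about the config values (hypothetical names).

# Whole seconds of video for a given num_frames ($1) and fps ($2).
video_seconds() {
    echo $(( $1 / $2 ))
}

# Latent frames for num_frames ($1), using the 4:1 ratio from the table.
# Assumption: a partial group of video frames still needs one latent frame.
latent_frames() {
    echo $(( ($1 + 3) / 4 ))
}

# cfg_number for a model name ($1): 1 for distill/quant variants, 2 for base models.
cfg_number_for() {
    case "$1" in
        *distill*|*quant*) echo 1 ;;
        *)                 echo 2 ;;
    esac
}

video_seconds 96 24                 # prints 4
latent_frames 96                    # prints 24
cfg_number_for Magi-1-24B-distill   # prints 1
```

For example, with num_frames=96 and fps=24 the clip lasts 4 seconds and corresponds to 24 latent frames.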

4. Acknowledgements

5. Contact

If you find our code or model useful in your research, please cite our paper.

% TODO: add correct citation
@article{magi1,
  title={Magi-1: Autoregressive Video Generation Are Scalable World Models},
  author={Magi-1},
  journal={arXiv preprint arXiv:2504.06165},
  year={2025}
}

If you have any questions, please feel free to raise an issue.