<p align="center">
<img src="assets/logo.png" alt="Skyreels Logo" width="50%">
</p>
<h1 align="center">SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers</h1>
<div align='center'>
<a href='https://scholar.google.com/citations?user=6D_nzucAAAAJ&hl=en' target='_blank'>Di Qiu</a> 
<a href='https://scholar.google.com/citations?user=_43YnBcAAAAJ&hl=zh-CN' target='_blank'>Zhengcong Fei</a> 
<a href='' target='_blank'>Rui Wang</a> 
<a href='' target='_blank'>Jialin Bai</a> 
<a href='https://scholar.google.com/citations?user=Hv-vj2sAAAAJ&hl=en' target='_blank'>Changqian Yu</a> 
</div>
<div align='center'>
<a href='https://scholar.google.com.au/citations?user=ePIeVuUAAAAJ&hl=en' target='_blank'>Mingyuan Fan</a> 
<a href='https://scholar.google.com/citations?user=HukWSw4AAAAJ&hl=en' target='_blank'>Guibin Chen</a> 
<a href='https://scholar.google.com.tw/citations?user=RvAuMk0AAAAJ&hl=zh-CN' target='_blank'>Xiang Wen</a> 
</div>
<div align='center'>
<small><strong>Skywork AI</strong></small>
</div>
<br>
<div align="center">
<!-- <a href='LICENSE'><img src='https://img.shields.io/badge/license-MIT-yellow'></a> -->
<a href='https://arxiv.org/abs/2502.10841'><img src='https://img.shields.io/badge/arXiv-SkyReels A1-red'></a>
<a href='https://skyworkai.github.io/skyreels-a1.github.io/'><img src='https://img.shields.io/badge/Project-SkyReels A1-green'></a>
<a href='https://huggingface.co/Skywork/SkyReels-A1'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue'></a>
<a href='https://www.skyreels.ai/home?utm_campaign=github_A1'><img src='https://img.shields.io/badge/Playground-Spaces-yellow'></a>
<br>
</div>
<br>
<p align="center">
<img src="./assets/demo.gif" alt="showcase">
<br>
For more results, visit our <a href="https://skyworkai.github.io/skyreels-a1.github.io/"><strong>homepage</strong></a>
</p>
<p align="center">
Join our <a href="https://discord.gg/PwM6NYtccQ" target="_blank"><strong>Discord</strong></a>
</p>
This repo, named **SkyReels-A1**, contains the official PyTorch implementation of our paper [SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers](https://arxiv.org/abs/2502.10841).
## News!!
* Mar 4, 2025: We release the audio-driven portrait image animation pipeline.
* Feb 18, 2025: We release the inference code and model weights of SkyReels-A1. [Download](https://huggingface.co/Skywork/SkyReels-A1)
* Feb 18, 2025: We have made our technical report openly available. [Read](https://skyworkai.github.io/skyreels-a1.github.io/report.pdf)
* Feb 18, 2025: Our online demo of LipSync is now available on SkyReels! Try out [LipSync](https://www.skyreels.ai/home/tools/lip-sync?refer=navbar).
* Feb 18, 2025: We have open-sourced the I2V video generation model [SkyReels-V1](https://github.com/SkyworkAI/SkyReels-V1). This is the first and most advanced open-source human-centric video foundation model.
## TODO List
- [x] Checkpoints
- [x] Inference Code
- [x] Web Demo (Gradio)
- [x] Audio-driven Portrait Image Animation Pipeline
- [ ] Inference Code for Long Videos
- [ ] User-Level GPU Inference on RTX4090
- [ ] ComfyUI
## Getting Started
### 1. Clone the code and prepare the environment
First, clone the repository:
```bash
git clone https://github.com/SkyworkAI/SkyReels-A1.git
cd SkyReels-A1
# create env using conda
conda create -n skyreels-a1 python=3.10
conda activate skyreels-a1
```
Then, install the remaining dependencies:
```bash
pip install -r requirements.txt
```
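To sanity-check the environment, a quick script like the following can flag packages that failed to install. The package list here is illustrative, not an exhaustive copy of `requirements.txt`:

```python
import importlib.util

def missing_packages(names):
    """Return the subset of module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Illustrative subset of dependencies; adjust to match requirements.txt.
core_deps = ["torch", "diffusers", "transformers", "mediapipe"]
print(missing_packages(core_deps))  # an empty list means these imports resolve
```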
### 2. Download pretrained weights
You can download the pretrained weights from Hugging Face:
```bash
# !pip install -U "huggingface_hub[cli]"
huggingface-cli download Skywork/SkyReels-A1 --local-dir local_path --exclude "*.git*" "README.md" "docs"
```
The FLAME, mediapipe, and smirk models are located in the SkyReels-A1/extra_models folder.
The directory structure of the pretrained models is as follows:
```text
pretrained_models
├── FLAME
├── SkyReels-A1-5B
│   ├── pose_guider
│   ├── scheduler
│   ├── tokenizer
│   ├── siglip-so400m-patch14-384
│   ├── transformer
│   ├── vae
│   └── text_encoder
├── mediapipe
└── smirk
```
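Once the download finishes, a short check like this (with the expected sub-paths taken from the tree above) can confirm that nothing is missing:

```python
from pathlib import Path

# Sub-paths expected under pretrained_models, per the directory tree above.
EXPECTED = [
    "FLAME",
    "SkyReels-A1-5B/pose_guider",
    "SkyReels-A1-5B/scheduler",
    "SkyReels-A1-5B/tokenizer",
    "SkyReels-A1-5B/siglip-so400m-patch14-384",
    "SkyReels-A1-5B/transformer",
    "SkyReels-A1-5B/vae",
    "SkyReels-A1-5B/text_encoder",
    "mediapipe",
    "smirk",
]

def missing_entries(root):
    """Return the expected sub-paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]

if __name__ == "__main__":
    print(missing_entries("pretrained_models"))
```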
#### Download DiffPoseTalk assets and pretrained weights (for audio-driven animation)
- We use [DiffPoseTalk](https://github.com/DiffPoseTalk/DiffPoseTalk/tree/main) to generate FLAME coefficients from audio, which serve as the motion signals.
- Download the DiffPoseTalk code and follow its README to download the weights and related data.
- Then place them in the specified directory:
```bash
cp -r ${diffposetalk_root}/style pretrained_models/diffposetalk
cp ${diffposetalk_root}/experiments/DPT/head-SA-hubert-WM/checkpoints/iter_0110000.pt pretrained_models/diffposetalk
cp ${diffposetalk_root}/datasets/HDTF_TFHP/lmdb/stats_train.npz pretrained_models/diffposetalk
```
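The same staging step can be scripted in Python; `diffposetalk_root` here stands for wherever the DiffPoseTalk checkout lives, mirroring the shell commands above:

```python
import shutil
from pathlib import Path

def stage_diffposetalk(diffposetalk_root, dest="pretrained_models/diffposetalk"):
    """Copy the DiffPoseTalk assets needed by SkyReels-A1 into `dest`."""
    src = Path(diffposetalk_root)
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # Style embedding directory.
    shutil.copytree(src / "style", dest / "style", dirs_exist_ok=True)
    # Pretrained checkpoint and dataset statistics.
    shutil.copy(src / "experiments/DPT/head-SA-hubert-WM/checkpoints/iter_0110000.pt", dest)
    shutil.copy(src / "datasets/HDTF_TFHP/lmdb/stats_train.npz", dest)
```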
```text
pretrained_models
├── FLAME
├── SkyReels-A1-5B
├── mediapipe
├── diffposetalk
│   ├── style
│   ├── iter_0110000.pt
│   └── stats_train.npz
└── smirk
```
### 3. Inference
You can simply run the inference scripts as:
```bash
# portrait animation driven by a video
python inference.py

# portrait animation driven by audio
python inference_audio.py
```
If the script runs successfully, it will produce an output mp4 file containing the driving video, the input image or video, and the generated result.
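Since the exact output path is not fixed here, a small helper like this (an assumption, not part of the repo) can locate the most recently generated mp4 under a given directory:

```python
from pathlib import Path

def newest_mp4(directory="."):
    """Return the most recently modified .mp4 under `directory`, or None."""
    files = sorted(Path(directory).glob("**/*.mp4"), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None
```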
## Gradio Interface
We provide a [Gradio](https://huggingface.co/docs/hub/spaces-sdks-gradio) interface for a better experience; simply run:
```bash
python app.py
```
The graphical interactive interface is shown below:

## Metric Evaluation
We also provide scripts for automatically computing the metrics reported in the paper, including SimFace, FID, and the L1 distance between expression and motion coefficients.
All code can be found in the `eval` folder. After setting the video result path, run the following commands in sequence:
```bash
python arc_score.py
python expression_score.py
python pose_score.py
```
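Running the three scripts in sequence can also be automated; this sketch assumes the scripts are invoked from the `eval` folder after the result path has been set, and simply collects each script's exit code:

```python
import subprocess
import sys

def run_metrics(scripts):
    """Run each metric script in sequence and return {script: exit code}."""
    results = {}
    for script in scripts:
        proc = subprocess.run([sys.executable, script])
        results[script] = proc.returncode
    return results

# Metric scripts from the eval folder (run after setting the result path):
# run_metrics(["arc_score.py", "expression_score.py", "pose_score.py"])
```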
## Acknowledgements
We would like to thank the contributors of the [CogVideoX](https://github.com/THUDM/CogVideo), [finetrainers](https://github.com/a-r-r-o-w/finetrainers) and [DiffPoseTalk](https://github.com/DiffPoseTalk/DiffPoseTalk) repositories for their open research and contributions.
## Citation
If you find SkyReels-A1 useful for your research, please star this repo and cite our work using the following BibTeX:
```bibtex
@article{qiu2025skyreels,
  title={SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers},
  author={Qiu, Di and Fei, Zhengcong and Wang, Rui and Bai, Jialin and Yu, Changqian and Fan, Mingyuan and Chen, Guibin and Wen, Xiang},
  journal={arXiv preprint arXiv:2502.10841},
  year={2025}
}
```