multimodalart (HF Staff) committed
Commit f16936f · verified · 1 Parent(s): 28b2dac

Update README.md

Files changed (1): README.md (+14 -194)

README.md CHANGED
@@ -1,194 +1,14 @@
- <p align="center">
- <img src="assets/logo.png" alt="SkyReels Logo" width="50%">
- </p>
-
-
- <h1 align="center">SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers</h1>
- <div align='center'>
- <a href='https://scholar.google.com/citations?user=6D_nzucAAAAJ&hl=en' target='_blank'>Di Qiu</a>&emsp;
- <a href='https://scholar.google.com/citations?user=_43YnBcAAAAJ&hl=zh-CN' target='_blank'>Zhengcong Fei</a>&emsp;
- <a href='' target='_blank'>Rui Wang</a>&emsp;
- <a href='' target='_blank'>Jialin Bai</a>&emsp;
- <a href='https://scholar.google.com/citations?user=Hv-vj2sAAAAJ&hl=en' target='_blank'>Changqian Yu</a>&emsp;
- </div>
-
- <div align='center'>
- <a href='https://scholar.google.com.au/citations?user=ePIeVuUAAAAJ&hl=en' target='_blank'>Mingyuan Fan</a>&emsp;
- <a href='https://scholar.google.com/citations?user=HukWSw4AAAAJ&hl=en' target='_blank'>Guibin Chen</a>&emsp;
- <a href='https://scholar.google.com.tw/citations?user=RvAuMk0AAAAJ&hl=zh-CN' target='_blank'>Xiang Wen</a>&emsp;
- </div>
-
- <div align='center'>
- <small><strong>Skywork AI</strong></small>
- </div>
-
- <br>
-
- <div align="center">
- <!-- <a href='LICENSE'><img src='https://img.shields.io/badge/license-MIT-yellow'></a> -->
- <a href='https://arxiv.org/abs/2502.10841'><img src='https://img.shields.io/badge/arXiv-SkyReels A1-red'></a>
- <a href='https://skyworkai.github.io/skyreels-a1.github.io/'><img src='https://img.shields.io/badge/Project-SkyReels A1-green'></a>
- <a href='https://huggingface.co/Skywork/SkyReels-A1'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue'></a>
- <a href='https://www.skyreels.ai/home?utm_campaign=github_A1'><img src='https://img.shields.io/badge/Playground-Spaces-yellow'></a>
- <br>
- </div>
- <br>
-
-
- <p align="center">
- <img src="./assets/demo.gif" alt="showcase">
- <br>
- 🔥 For more results, visit our <a href="https://skyworkai.github.io/skyreels-a1.github.io/"><strong>homepage</strong></a> 🔥
- </p>
-
- <p align="center">
- 👋 Join our <a href="https://discord.gg/PwM6NYtccQ" target="_blank"><strong>Discord</strong></a>
- </p>
-
-
- This repo, named **SkyReels-A1**, contains the official PyTorch implementation of our paper [SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers](https://arxiv.org/abs/2502.10841).
-
-
- ## 🔥🔥🔥 News!!
- * Mar 4, 2025: 🔥 We release the audio-driven portrait image animation pipeline.
- * Feb 18, 2025: 👋 We release the inference code and model weights of SkyReels-A1. [Download](https://huggingface.co/Skywork/SkyReels-A1)
- * Feb 18, 2025: 🎉 We have released our technical report. [Read](https://skyworkai.github.io/skyreels-a1.github.io/report.pdf)
- * Feb 18, 2025: 🔥 Our online LipSync demo is now available on SkyReels! Try out [LipSync](https://www.skyreels.ai/home/tools/lip-sync?refer=navbar).
- * Feb 18, 2025: 🔥 We have open-sourced the I2V video generation model [SkyReels-V1](https://github.com/SkyworkAI/SkyReels-V1), the first and most advanced open-source human-centric video foundation model.
-
- ## 📑 TODO List
- - [x] Checkpoints
- - [x] Inference Code
- - [x] Web Demo (Gradio)
- - [x] Audio-driven Portrait Image Animation Pipeline
- - [ ] Inference Code for Long Videos
- - [ ] User-Level GPU Inference on RTX 4090
- - [ ] ComfyUI
-
-
- ## Getting Started 🏁
-
- ### 1. Clone the code and prepare the environment 🛠️
- First, clone the repository:
- ```bash
- git clone https://github.com/SkyworkAI/SkyReels-A1.git
- cd SkyReels-A1
-
- # create the environment with conda
- conda create -n skyreels-a1 python=3.10
- conda activate skyreels-a1
- ```
- Then, install the remaining dependencies:
- ```bash
- pip install -r requirements.txt
- ```
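-
- A quick way to confirm the environment is usable (a minimal sanity check, assuming a CUDA-capable machine; CPU-only setups will print `False`):
- ```python
- # Sanity check: confirm PyTorch imports and sees a GPU before running inference.
- import torch
-
- print(torch.__version__)
- print(torch.cuda.is_available())  # expect True on a CUDA-capable machine
- ```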
-
-
- ### 2. Download pretrained weights 📥
- You can download the pretrained weights from Hugging Face:
- ```bash
- # !pip install -U "huggingface_hub[cli]"
- huggingface-cli download Skywork/SkyReels-A1 --local-dir local_path --exclude "*.git*" "README.md" "docs"
- ```
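-
- If you prefer scripting the download, the same weights can be fetched with the `huggingface_hub` Python API (a sketch; the `local_dir` value is an assumption and should match wherever you keep the weights):
- ```python
- # Sketch: programmatic equivalent of the huggingface-cli call above.
- from huggingface_hub import snapshot_download
-
- snapshot_download(
-     repo_id="Skywork/SkyReels-A1",
-     local_dir="pretrained_models",  # assumption: adjust to your local path
-     ignore_patterns=["*.git*", "README.md", "docs"],
- )
- ```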
-
- The FLAME, mediapipe, and smirk models are located in the SkyReels-A1/extra_models folder.
-
- The directory structure of our SkyReels-A1 code is as follows:
- ```text
- pretrained_models
- ├── FLAME
- ├── SkyReels-A1-5B
- │   ├── pose_guider
- │   ├── scheduler
- │   ├── tokenizer
- │   ├── siglip-so400m-patch14-384
- │   ├── transformer
- │   ├── vae
- │   └── text_encoder
- ├── mediapipe
- └── smirk
- ```
-
- #### Download DiffposeTalk assets and pretrained weights (for audio-driven animation)
-
- - We use [DiffPoseTalk](https://github.com/DiffPoseTalk/DiffPoseTalk/tree/main) to generate FLAME coefficients from audio, which serve as the motion signals.
-
- - Download the DiffPoseTalk code and follow its README to download the weights and related data.
-
- - Then place them in the directory layout shown below:
-
- ```bash
- cp -r ${diffposetalk_root}/style pretrained_models/diffposetalk
- cp ${diffposetalk_root}/experiments/DPT/head-SA-hubert-WM/checkpoints/iter_0110000.pt pretrained_models/diffposetalk
- cp ${diffposetalk_root}/datasets/HDTF_TFHP/lmdb/stats_train.npz pretrained_models/diffposetalk
- ```
-
- ```text
- pretrained_models
- ├── FLAME
- ├── SkyReels-A1-5B
- ├── mediapipe
- ├── diffposetalk
- │   ├── style
- │   ├── iter_0110000.pt
- │   └── stats_train.npz
- └── smirk
- ```
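-
- Before running the audio-driven pipeline, it can be worth checking that the assets landed where the tree above expects them (a minimal sketch; the paths simply mirror that tree):
- ```python
- # Sketch: verify the diffposetalk assets are in place before audio-driven inference.
- from pathlib import Path
-
- root = Path("pretrained_models/diffposetalk")
- for name in ("style", "iter_0110000.pt", "stats_train.npz"):
-     assert (root / name).exists(), f"missing asset: {root / name}"
- print("diffposetalk assets look complete")
- ```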
-
-
- ### 3. Inference 🚀
- You can simply run the inference scripts as:
- ```bash
- python inference.py
-
- # audio-driven inference (audio to video)
- python inference_audio.py
- ```
-
- If the script runs successfully, you will get an output MP4 file containing the driving video, the input image or video, and the generated result.
-
-
- ## Gradio Interface 🤗
-
- We provide a [Gradio](https://huggingface.co/docs/hub/spaces-sdks-gradio) interface for a better experience; just run:
-
- ```bash
- python app.py
- ```
-
- The graphical interface is shown below:
-
- ![gradio](https://github.com/user-attachments/assets/ed56f08c-f31c-4fbe-ac1d-c4d4e87a8719)
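-
- For orientation, the overall shape of such an app is roughly the following (an illustrative sketch only, not the actual `app.py`; the function name and input set are assumptions):
- ```python
- # Sketch: rough shape of a Gradio portrait-animation demo (not the real app.py).
- import gradio as gr
-
- def animate(portrait_image, driving_video):
-     # placeholder: the real app would run the SkyReels-A1 pipeline here
-     raise NotImplementedError
-
- demo = gr.Interface(
-     fn=animate,
-     inputs=[gr.Image(type="filepath"), gr.Video()],
-     outputs=gr.Video(),
-     title="SkyReels-A1",
- )
- demo.launch()
- ```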
-
-
- ## Metric Evaluation 👓
-
- We also provide the scripts for automatically calculating the metrics reported in the paper, including SimFace, FID, and the L1 distance for expression and motion.
-
- All code can be found in the `eval` folder. After setting the video result path, run the following commands in sequence:
-
- ```bash
- python arc_score.py
- python expression_score.py
- python pose_score.py
- ```
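-
- For reference, the expression/motion L1 metric reduces to a mean absolute difference between predicted and ground-truth coefficient sequences (an illustrative sketch, not the `eval` code itself):
- ```python
- # Sketch: L1 distance between predicted and ground-truth coefficient sequences.
- import numpy as np
-
- def l1_distance(pred: np.ndarray, gt: np.ndarray) -> float:
-     # pred, gt: arrays of shape (num_frames, num_coeffs)
-     return float(np.mean(np.abs(pred - gt)))
- ```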
-
-
- ## Acknowledgements 💐
- We would like to thank the contributors of the [CogVideoX](https://github.com/THUDM/CogVideo), [finetrainers](https://github.com/a-r-r-o-w/finetrainers), and [DiffPoseTalk](https://github.com/DiffPoseTalk/DiffPoseTalk) repositories for their open research and contributions.
-
- ## Citation 💖
- If you find SkyReels-A1 useful for your research, please 🌟 this repo and cite our work using the following BibTeX:
- ```bibtex
- @article{qiu2025skyreels,
-   title={SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers},
-   author={Qiu, Di and Fei, Zhengcong and Wang, Rui and Bai, Jialin and Yu, Changqian and Fan, Mingyuan and Chen, Guibin and Wen, Xiang},
-   journal={arXiv preprint arXiv:2502.10841},
-   year={2025}
- }
- ```
 
+ ---
+ title: Skyreels Talking Head
+ emoji: 😻
+ colorFrom: yellow
+ colorTo: green
+ sdk: gradio
+ sdk_version: 5.20.0
+ app_file: app.py
+ pinned: false
+ license: mit
+ short_description: audio to talking face
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference