kiwhansong committed on
Commit b430f0e · verified · 1 Parent(s): 986f545

Update README.md

Files changed (1): README.md (+56, -2)
README.md:

---
license: mit
pipeline_tag: image-to-video
---

<h1 align="center">Diffusion Forcing Transformer</h1>
<p align="center">
  <p align="center">
    <a href="https://kiwhan.dev/">Kiwhan Song*<sup>1</sup></a>
    ·
    <a href="https://boyuan.space/">Boyuan Chen*<sup>1</sup></a>
    ·
    <a href="https://msimchowitz.github.io/">Max Simchowitz<sup>2</sup></a>
    ·
    <a href="https://yilundu.github.io/">Yilun Du<sup>3</sup></a>
    ·
    <a href="https://groups.csail.mit.edu/locomotion/russt.html">Russ Tedrake<sup>1</sup></a>
    ·
    <a href="https://www.vincentsitzmann.com/">Vincent Sitzmann<sup>1</sup></a>
    <br/>
    *Equal contribution <sup>1</sup>MIT <sup>2</sup>CMU <sup>3</sup>Harvard
  </p>
  <h3 align="center"><a href="https://arxiv.org/abs/2502.06764">Paper</a> | <a href="https://boyuan.space/history-guidance">Website</a> | <a href="https://huggingface.co/spaces/kiwhansong/diffusion-forcing-transformer">HuggingFace Demo</a> | <a href="https://github.com/kwsong0113/diffusion-forcing-transformer">GitHub Code</a></h3>
</p>

This is the official model hub for the paper [**_History-guided Video Diffusion_**](https://arxiv.org/abs/2502.06764). We introduce the **Diffusion Forcing Transformer (DFoT)**, a novel video diffusion model designed to generate videos conditioned on an arbitrary number of context frames. Additionally, we present **History Guidance (HG)**, a family of guidance methods uniquely enabled by DFoT. These methods significantly enhance video generation quality, temporal consistency, and motion dynamics, while also unlocking new capabilities such as compositional video generation and the stable rollout of extremely long videos.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6613663bcfbba5e761a69531/OcsBrHWZXQidH7YxGCMtS.png)
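The paper is the authoritative reference for History Guidance; as a rough intuition, its simplest variant resembles classifier-free guidance applied to history conditioning: the model's history-conditioned noise prediction is extrapolated away from an unconditional (history-dropped) prediction. Below is a minimal sketch of that combination step only, with hypothetical placeholder inputs (`eps_cond` and `eps_uncond` stand in for model outputs; this is not the actual implementation):

```python
def history_guided_eps(eps_cond, eps_uncond, w):
    """CFG-style extrapolation, sketching vanilla history guidance.

    eps_cond:   noise prediction conditioned on history frames (placeholder)
    eps_uncond: noise prediction with the history dropped (placeholder)
    w:          guidance weight; w = 1 recovers the conditional prediction
    """
    return [u + w * (c - u) for c, u in zip(eps_cond, eps_uncond)]

# Toy numbers in place of real model outputs
print(history_guided_eps([1.0, 2.0], [0.0, 1.0], 2.0))  # [2.0, 3.0]
```

The paper's other HG variants (e.g. guidance across multiple history subsets) go beyond this single extrapolation; see the paper and codebase for the real thing.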

## 🤗 Try generating videos with DFoT!

We provide an [_interactive_ demo](https://huggingface.co/spaces/kiwhansong/diffusion-forcing-transformer) on HuggingFace Spaces, where you can generate videos with DFoT and History Guidance. On the RealEstate10K dataset, you can generate:
- Any Number of Images → Short 2-second Video
- Single Image → Long 10-second Video
- Single Image → Endless Navigation Video (like the teaser above!)

Please check it out and have fun generating videos with DFoT!

## 🚀 Usage

All pretrained models can be automatically loaded from [our GitHub codebase](https://github.com/kwsong0113/diffusion-forcing-transformer). Please visit our repository for further instructions!
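For illustration only: files in Hub model repos are served under the standard `resolve` URL convention, which is what automatic loading resolves to under the hood. A hedged sketch (the repo id and filename below are hypothetical placeholders, not the names of actual DFoT checkpoints — let the codebase handle loading in practice):

```python
def hub_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Standard Hugging Face Hub download URL for a file in a model repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

# Placeholder repo id and filename, purely to show the URL shape
print(hub_file_url("kiwhansong/DFoT", "model.ckpt"))
# https://huggingface.co/kiwhansong/DFoT/resolve/main/model.ckpt
```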

## 📌 Citation

If our work is useful for your research, please consider citing our paper:

```bibtex
@misc{song2025historyguidedvideodiffusion,
  title={History-Guided Video Diffusion},
  author={Kiwhan Song and Boyuan Chen and Max Simchowitz and Yilun Du and Russ Tedrake and Vincent Sitzmann},
  year={2025},
  eprint={2502.06764},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2502.06764},
}
```