Add pipeline tag and library name (#1)
- Add pipeline tag and library name (89a0b5d6a1daf2d933427039550a1d569f89ce04)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md CHANGED

```diff
@@ -1,7 +1,260 @@
 ---
-license: apache-2.0
 base_model:
 - stabilityai/stable-diffusion-2-1-base
+license: apache-2.0
+pipeline_tag: text-to-3d
+library_name: diffusers
 paper:
-- arxiv.org/abs/2503.
+- arxiv.org/abs/2503.21694
 ---
```
# Project page

https://theericma.github.io/TriplaneTurbo/

# Github README

The GitHub README contains the following content:
<img src="assets/Showcase_v4.drawio.png" width="100%" align="center">
|
28 |
+
<div align="center">
|
29 |
+
<h1>Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data</h1>
|
30 |
+
<div>
|
31 |
+
<a href='https://scholar.google.com/citations?user=F15mLDYAAAAJ&hl=en' target='_blank'>Zhiyuan Ma</a> 
|
32 |
+
<a href='https://scholar.google.com/citations?user=R9PlnKgAAAAJ&hl=en' target='_blank'>Xinyue Liang</a> 
|
33 |
+
<a href='https://scholar.google.com/citations?user=A-U8zE8AAAAJ&hl=en' target='_blank'>Rongyuan Wu</a> 
|
34 |
+
<a href='https://scholar.google.com/citations?user=1rbNk5oAAAAJ&hl=zh-CN' target='_blank'>Xiangyu Zhu</a> 
|
35 |
+
<a href='https://scholar.google.com/citations?user=cuJ3QG8AAAAJ&hl=en' target='_blank'>Zhen Lei</a> 
|
36 |
+
<a href='https://scholar.google.com/citations?user=tAK5l1IAAAAJ&hl=en' target='_blank'>Lei Zhang</a>
|
37 |
+
</div>
|
38 |
+
|
39 |
+
<div>
|
40 |
+
<a href="https://arxiv.org/abs/2503.21694"><img src='https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv&logoColor=white' alt='arXiv'></a>
|
41 |
+
<a href='https://theericma.github.io/TriplaneTurbo/'><img src='https://img.shields.io/badge/Project_Page-Website-green?logo=googlechrome&logoColor=white' alt='Project Page'></a>
|
42 |
+
<a href='https://huggingface.co/spaces/ZhiyuanthePony/TriplaneTurbo'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Live_Demo-blue'></a>
|
43 |
+
<a href='https://theericma.github.io/TriplaneTurbo/static/pdf/main.pdf'><img src='https://img.shields.io/badge/Slides-Presentation-orange?logo=microsoftpowerpoint&logoColor=white' alt='Presentation Slides'></a>
|
44 |
+
</div>
|
45 |
+
|
46 |
+
|
47 |
+
---
|
48 |
+
|
49 |
+
</div>
<!-- Updates -->
## Updates

- **2025-04-01**: Presentation slides are now available for download.
- **2025-03-27**: The paper is now available on arXiv.
- **2025-03-03**: Gradio and Hugging Face demos are available.
- **2025-02-27**: TriplaneTurbo is accepted to CVPR 2025.

<!-- Features -->
## Features

- **Fast Inference**: Our code excels in inference efficiency, outputting a textured mesh in around 1 second.
- **Text Comprehension**: It demonstrates strong understanding of complex text prompts, ensuring generation that is faithful to the input.
- **3D-Data-Free Training**: The entire training process relies on no 3D datasets, making it more resource-friendly and adaptable.
## Start local inference in 3 minutes

If you only wish to set up the demo locally, run the following commands. For training and evaluation, use the environment setup instructions in the next section instead.

```sh
python -m venv venv
source venv/bin/activate
bash setup.sh
python gradio_app.py
```
75 |
+
|
76 |
+
## π οΈ Official Installation
|
77 |
+
|
78 |
+
Create a virtual environment:
|
79 |
+
```sh
|
80 |
+
conda create -n triplaneturbo python=3.10
|
81 |
+
conda activate triplaneturbo
|
82 |
+
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
|
83 |
+
```
|
84 |
+
(Optional, Recommended) Install xFormers for attention acceleration:
|
85 |
+
```sh
|
86 |
+
conda install xFormers -c xFormers
|
87 |
+
```
|
88 |
+
(Optional, Recommended) Install ninja to speed up the compilation of CUDA extensions
|
89 |
+
```sh
|
90 |
+
pip install ninja
|
91 |
+
```
|
92 |
+
Install major dependencies
|
93 |
+
```sh
|
94 |
+
pip install -r requirements.txt
|
95 |
+
```
|
96 |
+
Install iNGP
|
97 |
+
```sh
|
98 |
+
export PATH="/usr/local/cuda/bin:$PATH"
|
99 |
+
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
|
100 |
+
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
|
101 |
+
```
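To confirm the bindings built correctly, it helps to instantiate a small hash-grid encoding before moving on; this is a generic tiny-cuda-nn usage sketch, not a script shipped with the repo.

```python
# Generic smoke test for the tiny-cuda-nn PyTorch bindings.
import torch
import tinycudann as tcnn

encoding = tcnn.Encoding(
    n_input_dims=3,
    encoding_config={
        "otype": "HashGrid",      # the multiresolution hash grid used by iNGP
        "n_levels": 16,
        "n_features_per_level": 2,
        "log2_hashmap_size": 19,
        "base_resolution": 16,
        "per_level_scale": 2.0,
    },
)
x = torch.rand(8, 3, device="cuda")  # 8 random 3D points in [0, 1)^3
print(encoding(x).shape)             # (8, 32): 16 levels x 2 features each
```

If this runs without a compilation or CUDA error, the installation is fine.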
If you encounter errors while installing iNGP, check your gcc version first, then follow these steps to change the gcc version within your conda environment. After that, return to the project directory and reinstall iNGP and NerfAcc:
```sh
conda install -c conda-forge gxx=9.5.0
cd $CONDA_PREFIX/lib
ln -s /usr/lib/x86_64-linux-gnu/libcuda.so ./
cd <your project directory>
```
## Evaluation

If you only want to run the evaluation without training, follow these steps:

```sh
# Download the model from Hugging Face
huggingface-cli download --resume-download ZhiyuanthePony/TriplaneTurbo \
    --include "triplane_turbo_sd_v1.pth" \
    --local-dir ./pretrained \
    --local-dir-use-symlinks False

# Download evaluation assets
python scripts/prepare/download_eval_only.py

# Run evaluation script
bash scripts/eval/dreamfusion.sh --gpu 0,1 # You can use more GPUs (e.g. 0,1,2,3,4,5,6,7). For single-GPU usage, check the script for the required modifications.
```
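The checkpoint download can also be done from Python with `huggingface_hub` (a standard `hf_hub_download` call, not a repo-provided script):

```python
from huggingface_hub import hf_hub_download

# Downloads triplane_turbo_sd_v1.pth into ./pretrained and returns its local path.
ckpt_path = hf_hub_download(
    repo_id="ZhiyuanthePony/TriplaneTurbo",
    filename="triplane_turbo_sd_v1.pth",
    local_dir="./pretrained",
)
print(ckpt_path)
```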
Our evaluation metrics include:
- CLIP Similarity Score
- CLIP Recall@1

For detailed evaluation results, please refer to our paper.
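As a rough illustration of what the CLIP similarity metric computes (cosine similarity between the embeddings of rendered views and of the prompt), here is a generic sketch using the `transformers` CLIP API; the repo's own implementation lives under `evaluation/clipscore/` and may differ in model choice and averaging details.

```python
# Generic CLIP-similarity sketch, not the repo's exact evaluation code.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt = "a frog wearing a sweater"                      # example prompt
views = [Image.open(f"view_{i}.png") for i in range(4)]  # 4 rendered views of the mesh

inputs = processor(text=[prompt], images=views, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Normalize, then average the image-text cosine similarity over the views.
img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
print(f"CLIP similarity: {(img @ txt.T).mean().item():.4f}")
```

Recall@1 then asks whether, among all prompts in the library, the generating prompt is the nearest neighbor of the rendered views under this similarity.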
If you want to evaluate your own model, use the following script:
```sh
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python launch.py \
    --config <path_to_your_exp_config> \
    --export \
    system.exporter_type="multiprompt-mesh-exporter" \
    resume=<path_to_your_ckpt> \
    data.prompt_library="dreamfusion_415_prompt_library" \
    system.exporter.fmt=obj
```

After running the script, you will find the generated OBJ files in `outputs/<your_exp>/dreamfusion_415_prompt_library/save/<itXXXXX-export>`. Set this path as `<OBJ_DIR>`, and set `outputs/<your_exp>/dreamfusion_415_prompt_library/save/<itXXXXX-4views>` as `<VIEW_DIR>`. Then run:
```sh
SAVE_DIR=<VIEW_DIR>
python evaluation/mesh_visualize.py \
    <OBJ_DIR> \
    --save_dir $SAVE_DIR \
    --gpu 0,1,2,3,4,5,6,7

python evaluation/clipscore/compute.py \
    --result_dir $SAVE_DIR
```
The evaluation results will be displayed in your terminal once the computation is complete.
## Training Options

### 1. Download Required Pretrained Models and Datasets
Use the provided download script to get all necessary files:
```sh
python scripts/prepare/download_full.py
```

This will download:
- Stable Diffusion 2.1 Base
- Stable Diffusion 1.5
- MVDream 4-view checkpoint
- RichDreamer checkpoint
- Text prompt datasets (3DTopia and DALLE+Midjourney)
### 2. Training Options

#### Option 1: Train with 3DTopia Text Prompts
```sh
# Single GPU
CUDA_VISIBLE_DEVICES=0 python launch.py \
    --config configs/TriplaneTurbo_v0_acc-2.yaml \
    --train \
    data.prompt_library="3DTopia_prompt_library" \
    data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia" \
    data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia"
```

For multi-GPU training:
```sh
# 8 GPUs with 48GB+ memory each
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python launch.py \
    --config configs/TriplaneTurbo_v1_acc-2.yaml \
    --train \
    data.prompt_library="3DTopia_361k_prompt_library" \
    data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia" \
    data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia"
```
#### Option 2: Train with DALLE+Midjourney Text Prompts
Choose the appropriate command based on your GPU configuration:

```sh
# Single GPU
CUDA_VISIBLE_DEVICES=0 python launch.py \
    --config configs/TriplaneTurbo_v0_acc-2.yaml \
    --train \
    data.prompt_library="DALLE_Midjourney_prompt_library" \
    data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ" \
    data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ"
```

For multi-GPU training (higher performance):
```sh
# 8 GPUs with 48GB+ memory each
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python launch.py \
    --config configs/TriplaneTurbo_v1_acc-2.yaml \
    --train \
    data.prompt_library="DALLE_Midjourney_prompt_library" \
    data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ" \
    data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ"
```
### 3. Configuration Notes
- **Memory Requirements**:
  - v1 configuration: requires GPUs with 48GB+ memory
  - v0 configuration: works with GPUs that have less memory (46GB+), but with reduced performance

- **Acceleration Options**:
  - Use the `_acc-2.yaml` configs for gradient accumulation to reduce memory usage (see the sketch after this list)

- **Advanced Options**:
  - For the highest quality, use `configs/TriplaneTurbo_v1.yaml` with `system.parallel_guidance=true` (requires 98GB+ memory GPUs)
  - To disable certain guidance components, add `guidance.rd_weight=0 guidance.sd_weight=0` to the command
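For readers unfamiliar with the trick behind the `_acc-2` configs: gradient accumulation runs several smaller forward/backward passes and sums their gradients before each optimizer step, trading wall-clock time for peak memory. A generic PyTorch illustration of the pattern (not the repo's actual training loop):

```python
import torch
from torch import nn

# Toy model/optimizer so the pattern is runnable end to end.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
batches = [torch.randn(8, 4) for _ in range(4)]

accum_steps = 2  # the factor the _acc-2 configs apply
optimizer.zero_grad()
for i, x in enumerate(batches):
    loss = model(x).pow(2).mean() / accum_steps  # scale so summed grads match one large batch
    loss.backward()                              # grads accumulate in .grad across iterations
    if (i + 1) % accum_steps == 0:
        optimizer.step()                         # one update per accum_steps micro-batches
        optimizer.zero_grad()
```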
<!-- Citation -->
## Citation

If you find this work helpful, please consider citing our paper:
```
@inproceedings{ma2025progressive,
    title={Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data},
    author={Ma, Zhiyuan and Liang, Xinyue and Wu, Rongyuan and Zhu, Xiangyu and Lei, Zhen and Zhang, Lei},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2025}
}
```
<!-- Acknowledgement -->
## Acknowledgement
Our code is heavily based on the following works:
- [ThreeStudio](https://github.com/threestudio-project/threestudio): A clean and extensible codebase for 3D generation via score distillation.
- [MVDream](https://github.com/bytedance/MVDream): Used as one of our multi-view teachers.
- [RichDreamer](https://github.com/modelscope/RichDreamer): Serves as another multi-view teacher, for normal and depth supervision.
- [3DTopia](https://github.com/3DTopia/3DTopia): Its text caption dataset is used in our training and comparison.
- [DiffMC](https://github.com/SarahWeiii/diso): Our solution uses its differentiable marching cubes for mesh rasterization.
- [NeuS](https://github.com/Totoro97/NeuS): We implement its SDF-based volume rendering for dual rendering in our solution.