ZhiyuanthePony nielsr HF Staff committed on
Commit 9bfe7bc · verified
1 Parent(s): cec6e57

Add pipeline tag and library name (#1)


- Add pipeline tag and library name (89a0b5d6a1daf2d933427039550a1d569f89ce04)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +256 -3
README.md CHANGED
@@ -1,7 +1,260 @@
  ---
- license: apache-2.0
  base_model:
  - stabilityai/stable-diffusion-2-1-base
  paper:
- - arxiv.org/abs/2503.2169
- ---
  ---
  base_model:
  - stabilityai/stable-diffusion-2-1-base
+ license: apache-2.0
+ pipeline_tag: text-to-3d
+ library_name: diffusers
  paper:
+ - arxiv.org/abs/2503.21694
+ ---
+
+ <img src="assets/Showcase_v4.drawio.png" width="100%" align="center">
+ <div align="center">
+ <h1>Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data</h1>
+ <div>
+ <a href='https://scholar.google.com/citations?user=F15mLDYAAAAJ&hl=en' target='_blank'>Zhiyuan Ma</a>&emsp;
+ <a href='https://scholar.google.com/citations?user=R9PlnKgAAAAJ&hl=en' target='_blank'>Xinyue Liang</a>&emsp;
+ <a href='https://scholar.google.com/citations?user=A-U8zE8AAAAJ&hl=en' target='_blank'>Rongyuan Wu</a>&emsp;
+ <a href='https://scholar.google.com/citations?user=1rbNk5oAAAAJ&hl=zh-CN' target='_blank'>Xiangyu Zhu</a>&emsp;
+ <a href='https://scholar.google.com/citations?user=cuJ3QG8AAAAJ&hl=en' target='_blank'>Zhen Lei</a>&emsp;
+ <a href='https://scholar.google.com/citations?user=tAK5l1IAAAAJ&hl=en' target='_blank'>Lei Zhang</a>
+ </div>
+
+ <div>
+ <a href="https://arxiv.org/abs/2503.21694"><img src='https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv&logoColor=white' alt='arXiv'></a>
+ <a href='https://theericma.github.io/TriplaneTurbo/'><img src='https://img.shields.io/badge/Project_Page-Website-green?logo=googlechrome&logoColor=white' alt='Project Page'></a>
+ <a href='https://huggingface.co/spaces/ZhiyuanthePony/TriplaneTurbo'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Live_Demo-blue'></a>
+ <a href='https://theericma.github.io/TriplaneTurbo/static/pdf/main.pdf'><img src='https://img.shields.io/badge/Slides-Presentation-orange?logo=microsoftpowerpoint&logoColor=white' alt='Presentation Slides'></a>
+ </div>
+
+ ---
+
+ </div>
+
+ <!-- Updates -->
+ ## ⏩ Updates
+
+ - **2025-04-01**: Presentation slides are now available for download.
+ - **2025-03-27**: The paper is now available on arXiv.
+ - **2025-03-03**: Gradio and Hugging Face demos are available.
+ - **2025-02-27**: TriplaneTurbo is accepted to CVPR 2025.
+
+ <!-- Features -->
+ ## 🌟 Features
+ - **Fast Inference 🚀**: Our code excels in inference efficiency, producing a textured mesh in around 1 second.
+ - **Text Comprehension 🆙**: It demonstrates strong comprehension of complex text prompts, ensuring generation that accurately follows the input.
+ - **3D-Data-Free Training 🙅‍♂️**: The entire training process does not rely on any 3D datasets, making it more resource-friendly and adaptable.
+
+ ## 🤖 Start local inference in 3 minutes
+ If you only wish to run the demo locally, use the following commands for inference. For training and evaluation, follow the environment setup in the next section instead.
+
+ ```sh
+ python -m venv venv
+ source venv/bin/activate
+ bash setup.sh
+ python gradio_app.py
+ ```
+
+ ## 🛠️ Official Installation
+
+ Create a virtual environment:
+ ```sh
+ conda create -n triplaneturbo python=3.10
+ conda activate triplaneturbo
+ conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
+ ```
+ (Optional, Recommended) Install xFormers for attention acceleration:
+ ```sh
+ conda install xformers -c xformers
+ ```
+ (Optional, Recommended) Install ninja to speed up the compilation of CUDA extensions:
+ ```sh
+ pip install ninja
+ ```
+ Install the major dependencies:
+ ```sh
+ pip install -r requirements.txt
+ ```
+ Install iNGP (tiny-cuda-nn):
+ ```sh
+ export PATH="/usr/local/cuda/bin:$PATH"
+ export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
+ pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
+ ```
+ If you encounter errors while installing iNGP, check your gcc version first. Follow these steps to change the gcc version within your conda environment, then return to the project directory and reinstall iNGP and NerfAcc:
+ ```sh
+ conda install -c conda-forge gxx=9.5.0
+ cd $CONDA_PREFIX/lib
+ ln -s /usr/lib/x86_64-linux-gnu/libcuda.so ./
+ cd <your project directory>
+ ```
+
+ ## 📊 Evaluation
+
+ If you only want to run the evaluation without training, follow these steps:
+
+ ```sh
+ # Download the model from HuggingFace
+ huggingface-cli download --resume-download ZhiyuanthePony/TriplaneTurbo \
+     --include "triplane_turbo_sd_v1.pth" \
+     --local-dir ./pretrained \
+     --local-dir-use-symlinks False
+
+ # Download evaluation assets
+ python scripts/prepare/download_eval_only.py
+
+ # Run the evaluation script. You can use more GPUs (e.g. 0,1,2,3,4,5,6,7);
+ # for single-GPU usage, please check the script for the required modifications.
+ bash scripts/eval/dreamfusion.sh --gpu 0,1
+ ```
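+
+ If you prefer to fetch the checkpoint from Python instead of `huggingface-cli`, the following is a minimal sketch using `huggingface_hub` (just an alternative to the command above; the local directory is chosen to match the path the evaluation script expects):
+
+ ```python
+ # Minimal sketch: download the TriplaneTurbo checkpoint with huggingface_hub
+ # instead of huggingface-cli. Requires `pip install huggingface_hub`.
+ from huggingface_hub import hf_hub_download
+
+ ckpt_path = hf_hub_download(
+     repo_id="ZhiyuanthePony/TriplaneTurbo",
+     filename="triplane_turbo_sd_v1.pth",
+     local_dir="./pretrained",  # same location used by the evaluation script above
+ )
+ print(f"Checkpoint saved to: {ckpt_path}")
+ ```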
+
+ Our evaluation metrics include:
+ - CLIP Similarity Score
+ - CLIP Recall@1
+
+ For detailed evaluation results, please refer to our paper.
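+
+ For intuition, the CLIP Similarity Score is the cosine similarity between the CLIP embeddings of a rendered view and its text prompt, and CLIP Recall@1 checks whether the matching prompt is the nearest one among all evaluation prompts. The snippet below is only a minimal illustration of the similarity metric using an off-the-shelf CLIP model from `transformers`; the official implementation lives in `evaluation/clipscore/compute.py`, and the image path here is a hypothetical example.
+
+ ```python
+ # Illustrative sketch of the CLIP Similarity metric (not the official evaluation script).
+ # Assumes `pip install torch transformers pillow`; "render.png" is a hypothetical rendered view.
+ import torch
+ from PIL import Image
+ from transformers import CLIPModel, CLIPProcessor
+
+ model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
+ processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
+
+ image = Image.open("render.png")
+ prompt = "a DSLR photo of a corgi wearing a top hat"
+
+ inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # Normalize the projected embeddings and take their dot product (cosine similarity).
+ image_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
+ text_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
+ print("CLIP similarity:", (image_emb * text_emb).sum().item())
+ ```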
+
+ If you want to evaluate your own model, use the following script:
+ ```sh
+ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python launch.py \
+     --config <path_to_your_exp_config> \
+     --export \
+     system.exporter_type="multiprompt-mesh-exporter" \
+     resume=<path_to_your_ckpt> \
+     data.prompt_library="dreamfusion_415_prompt_library" \
+     system.exporter.fmt=obj
+ ```
+
+ After running the script, you will find the generated OBJ files in `outputs/<your_exp>/dreamfusion_415_prompt_library/save/<itXXXXX-export>`. Set this path as `<OBJ_DIR>`, and set `outputs/<your_exp>/dreamfusion_415_prompt_library/save/<itXXXXX-4views>` as `<VIEW_DIR>`. Then run:
+
+ ```sh
+ SAVE_DIR=<VIEW_DIR>
+ python evaluation/mesh_visualize.py \
+     <OBJ_DIR> \
+     --save_dir $SAVE_DIR \
+     --gpu 0,1,2,3,4,5,6,7
+
+ python evaluation/clipscore/compute.py \
+     --result_dir $SAVE_DIR
+ ```
+ The evaluation results will be displayed in your terminal once the computation is complete.
+
+ ## 🚀 Training Options
+
+ ### 1. Download Required Pretrained Models and Datasets
+ Use the provided download script to get all necessary files:
+ ```sh
+ python scripts/prepare/download_full.py
+ ```
+
+ This will download:
+ - Stable Diffusion 2.1 Base
+ - Stable Diffusion 1.5
+ - MVDream 4-view checkpoint
+ - RichDreamer checkpoint
+ - Text prompt datasets (3DTopia and DALLE+Midjourney)
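+
+ If you only need to re-fetch one of the assets above, the Hub-hosted backbones can also be downloaded individually. The following is a minimal sketch using `huggingface_hub` for the Stable Diffusion 2.1 base model (the repo ID comes from the metadata above; the destination directory is an assumption, and `scripts/prepare/download_full.py` remains the supported way to place every asset where the configs expect it):
+
+ ```python
+ # Minimal sketch (assumption): manually fetch one of the backbones listed above.
+ # Requires `pip install huggingface_hub`.
+ from huggingface_hub import snapshot_download
+
+ snapshot_download(
+     repo_id="stabilityai/stable-diffusion-2-1-base",
+     local_dir="./pretrained/stable-diffusion-2-1-base",  # assumed destination
+ )
+ ```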
+
+ ### 2. Training Options
+
+ #### Option 1: Train with 3DTopia Text Prompts
+ ```sh
+ # Single GPU
+ CUDA_VISIBLE_DEVICES=0 python launch.py \
+     --config configs/TriplaneTurbo_v0_acc-2.yaml \
+     --train \
+     data.prompt_library="3DTopia_prompt_library" \
+     data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia" \
+     data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia"
+ ```
+
+ For multi-GPU training:
+ ```sh
+ # 8 GPUs with 48GB+ memory each
+ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python launch.py \
+     --config configs/TriplaneTurbo_v1_acc-2.yaml \
+     --train \
+     data.prompt_library="3DTopia_361k_prompt_library" \
+     data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia" \
+     data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia"
+ ```
+
+ #### Option 2: Train with DALLE+Midjourney Text Prompts
+ Choose the appropriate command based on your GPU configuration:
+
+ ```sh
+ # Single GPU
+ CUDA_VISIBLE_DEVICES=0 python launch.py \
+     --config configs/TriplaneTurbo_v0_acc-2.yaml \
+     --train \
+     data.prompt_library="DALLE_Midjourney_prompt_library" \
+     data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ" \
+     data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ"
+ ```
+
+ For multi-GPU training (higher performance):
+ ```sh
+ # 8 GPUs with 48GB+ memory each
+ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python launch.py \
+     --config configs/TriplaneTurbo_v1_acc-2.yaml \
+     --train \
+     data.prompt_library="DALLE_Midjourney_prompt_library" \
+     data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ" \
+     data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ"
+ ```
+
+ ### 3. Configuration Notes
+ - **Memory Requirements**:
+   - v1 configuration: Requires GPUs with 48GB+ memory
+   - v0 configuration: Works with GPUs that have less memory (46GB+) but with reduced performance
+
+ - **Acceleration Options**:
+   - Use `_acc-2.yaml` configs for gradient accumulation to reduce memory usage
+
+ - **Advanced Options**:
+   - For highest quality, use `configs/TriplaneTurbo_v1.yaml` with `system.parallel_guidance=true` (requires 98GB+ memory GPUs)
+   - To disable certain guidance components: add `guidance.rd_weight=0 guidance.sd_weight=0` to the command
+
+
+ <!-- Citation -->
+ ## 📜 Citation
+
+ If you find this work helpful, please consider citing our paper:
+ ```bibtex
+ @inproceedings{ma2025progressive,
+     title={Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data},
+     author={Ma, Zhiyuan and Liang, Xinyue and Wu, Rongyuan and Zhu, Xiangyu and Lei, Zhen and Zhang, Lei},
+     booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+     year={2025}
+ }
+ ```
+
+
+ <!-- Acknowledgement -->
+ ## 🙏 Acknowledgement
+ Our code is heavily based on the following works:
+ - [ThreeStudio](https://github.com/threestudio-project/threestudio): A clean and extensible codebase for 3D generation via Score Distillation.
+ - [MVDream](https://github.com/bytedance/MVDream): Used as one of our multi-view teachers.
+ - [RichDreamer](https://github.com/bytedance/MVDream): Serves as another multi-view teacher for normal and depth supervision.
+ - [3DTopia](https://github.com/3DTopia/3DTopia): Its text caption dataset is applied in our training and comparison.
+ - [DiffMC](https://github.com/SarahWeiii/diso): Our solution uses its differentiable marching cubes for mesh rasterization.
+ - [NeuS](https://github.com/Totoro97/NeuS): We implement its SDF-based volume rendering for dual rendering in our solution.