lbourdois committed · commit c7d7b11 (verified) · 1 parent: 71278d3

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.

Files changed (1): README.md (+216 −202)

README.md CHANGED

---
license: cc-by-nc-4.0
library_name: transformers
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---

# SpatialLM-Qwen-0.5B

<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->

<div align="center">
<picture>
<source srcset="https://cdn-uploads.huggingface.co/production/uploads/63efbb1efc92a63ac81126d0/_dK14CT3do8rBG3QrHUjN.png" media="(prefers-color-scheme: dark)">
<img src="https://cdn-uploads.huggingface.co/production/uploads/63efbb1efc92a63ac81126d0/bAZyeIXOMVASHR6-xVlQU.png" width="60%" alt="SpatialLM"/>
</picture>
</div>
<hr style="margin-top: 0; margin-bottom: 8px;">
<div align="center" style="margin-top: 0; padding-top: 0; line-height: 1;">
<a href="https://manycore-research.github.io/SpatialLM" target="_blank" style="margin: 2px;"><img alt="Project"
src="https://img.shields.io/badge/🌐%20Website-SpatialLM-ffc107?color=42a5f5&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
<a href="https://github.com/manycore-research/SpatialLM" target="_blank" style="margin: 2px;"><img alt="GitHub"
src="https://img.shields.io/badge/GitHub-SpatialLM-24292e?logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/manycore-research/SpatialLM-Llama-1B" target="_blank" style="margin: 2px;"><img alt="Hugging Face"
src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-SpatialLM%201B-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
<a href="https://huggingface.co/datasets/manycore-research/SpatialLM-Testset" target="_blank" style="margin: 2px;"><img alt="Dataset"
src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-SpatialLM-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
</div>

## Introduction

SpatialLM is a 3D large language model designed to process 3D point cloud data and generate structured 3D scene understanding outputs. These outputs include architectural elements like walls, doors, windows, and oriented object bounding boxes with their semantic categories. Unlike previous methods that require specialized equipment for data collection, SpatialLM can handle point clouds from diverse sources such as monocular video sequences, RGBD images, and LiDAR sensors. This multimodal architecture effectively bridges the gap between unstructured 3D geometric data and structured 3D representations, offering high-level semantic understanding. It enhances spatial reasoning capabilities for applications in embodied robotics, autonomous navigation, and other complex 3D scene analysis tasks.

<div align="center">
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/63efbb1efc92a63ac81126d0/3bz_jNRCLD2L9uj11HPnP.mp4" poster="https://cdn-uploads.huggingface.co/production/uploads/63efbb1efc92a63ac81126d0/euo94dNx28qBNe51_oiB1.png"></video>
<p><i>SpatialLM reconstructs 3D layout from a monocular RGB video with MASt3R-SLAM. Results are aligned to the video with GT cameras for visualization.</i></p>
</div>

## SpatialLM Models

<div align="center">

| **Model**           | **Download**                                                                   |
| :-----------------: | ------------------------------------------------------------------------------ |
| SpatialLM-Llama-1B  | [🤗 HuggingFace](https://huggingface.co/manycore-research/SpatialLM-Llama-1B)   |
| SpatialLM-Qwen-0.5B | [🤗 HuggingFace](https://huggingface.co/manycore-research/SpatialLM-Qwen-0.5B)  |

</div>
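
The model card metadata lists `library_name: transformers`. The project's `inference.py` (see Usage below) is the documented entry point; purely as a rough sketch, and assuming the repository ships the custom point-cloud-aware model code needed for `trust_remote_code=True`, the checkpoint could also be pulled directly with the Transformers auto classes:

```python
# Rough sketch only: fetch the checkpoint via the Transformers auto classes.
# Assumes the repo provides custom model code (hence trust_remote_code=True);
# the project's inference.py remains the supported way to run the model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "manycore-research/SpatialLM-Qwen-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
print(model.config.model_type, f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M params")
```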

## Usage

### Installation

Tested with the following environment:

- Python 3.11
- PyTorch 2.4.1
- CUDA 12.4

```bash
# Clone the repository
git clone https://github.com/manycore-research/SpatialLM.git
cd SpatialLM

# Create a conda environment with CUDA 12.4
conda create -n spatiallm python=3.11
conda activate spatiallm
conda install -y nvidia/label/cuda-12.4.0::cuda-toolkit conda-forge::sparsehash

# Install dependencies with Poetry
pip install poetry && poetry config virtualenvs.create false --local
poetry install
poe install-torchsparse # Building the torchsparse wheel will take a while
```
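
After installation, you can roughly verify that your environment matches the tested versions above. This is just a convenience sketch; it only relies on the standard `torch` and `torchsparse` imports:

```python
# Quick sanity check of the installed environment against the tested versions.
import sys

import torch

print("Python:", sys.version.split()[0])            # tested with 3.11
print("PyTorch:", torch.__version__)                # tested with 2.4.1
print("CUDA (torch build):", torch.version.cuda)    # tested with 12.4
print("CUDA available:", torch.cuda.is_available())

try:
    import torchsparse
    print("torchsparse:", getattr(torchsparse, "__version__", "installed"))
except ImportError:
    print("torchsparse missing; run `poe install-torchsparse`")
```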

### Inference

In the current version of SpatialLM, input point clouds are assumed to be axis-aligned, with the z-axis as the up axis. This orientation is crucial for maintaining consistency in spatial understanding and scene interpretation across different datasets and applications.
Example preprocessed point clouds, reconstructed from RGB videos using [MASt3R-SLAM](https://github.com/rmurai0610/MASt3R-SLAM), are available in [SpatialLM-Testset](#spatiallm-testset).
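
If your own capture pipeline produces point clouds with a different up axis (e.g. y-up), rotate them into the z-up convention before running inference. A minimal sketch using Open3D, which is not a stated project dependency; the file name is hypothetical and the rotation assumes a y-up input:

```python
# Illustrative only: rotate a y-up point cloud into the z-up convention
# expected by SpatialLM. Adjust the rotation to match your capture setup.
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("scene_y_up.ply")  # hypothetical input file

# Rotate +90° about the x-axis so the former +y direction becomes +z.
R = o3d.geometry.get_rotation_matrix_from_xyz((np.pi / 2, 0.0, 0.0))
pcd.rotate(R, center=(0.0, 0.0, 0.0))

o3d.io.write_point_cloud("scene_z_up.ply", pcd)
```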

Download an example point cloud:

```bash
huggingface-cli download manycore-research/SpatialLM-Testset pcd/scene0000_00.ply --repo-type dataset --local-dir .
```
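
Alternatively, the same file can be fetched from Python with `huggingface_hub` (installed alongside `transformers`):

```python
# Download the example point cloud programmatically instead of via the CLI.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="manycore-research/SpatialLM-Testset",
    repo_type="dataset",
    filename="pcd/scene0000_00.ply",
    local_dir=".",
)
print("Point cloud saved to", local_path)
```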

Run inference:

```bash
python inference.py --point_cloud pcd/scene0000_00.ply --output scene0000_00.txt --model_path manycore-research/SpatialLM-Qwen-0.5B
```

### Visualization

Use `rerun` to visualize the point cloud and the predicted structured 3D layout output:

```bash
# Convert the predicted layout to Rerun format
python visualize.py --point_cloud pcd/scene0000_00.ply --layout scene0000_00.txt --save scene0000_00.rrd

# Visualize the point cloud and the predicted layout
rerun scene0000_00.rrd
```
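
If you want to inspect the raw point cloud from your own Python code, it can also be streamed to the Rerun viewer directly. A minimal sketch assuming the `rerun-sdk` and `open3d` packages; the predicted layout boxes are omitted, since `visualize.py` already handles their conversion:

```python
# Log just the point cloud to a Rerun viewer; layout conversion stays with visualize.py.
import numpy as np
import open3d as o3d
import rerun as rr

pcd = o3d.io.read_point_cloud("pcd/scene0000_00.ply")
xyz = np.asarray(pcd.points)
rgb = (np.asarray(pcd.colors) * 255).astype(np.uint8) if pcd.has_colors() else None

rr.init("spatiallm_pointcloud", spawn=True)  # opens a local Rerun viewer window
rr.log("world/points", rr.Points3D(xyz, colors=rgb, radii=0.01))
```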

### Evaluation

To evaluate the performance of SpatialLM, we provide the `eval.py` script, which reproduces the benchmark results on SpatialLM-Testset reported in the table under [Benchmark Results](#benchmark-results).

Download the testset:

```bash
huggingface-cli download manycore-research/SpatialLM-Testset --repo-type dataset --local-dir SpatialLM-Testset
```
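
The equivalent Python call mirrors the whole dataset repository with `huggingface_hub.snapshot_download`:

```python
# Mirror the full SpatialLM-Testset dataset repository to a local folder.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="manycore-research/SpatialLM-Testset",
    repo_type="dataset",
    local_dir="SpatialLM-Testset",
)
```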

Run evaluation:

```bash
# Run inference on the PLY point clouds in folder SpatialLM-Testset/pcd with SpatialLM-Qwen-0.5B model
python inference.py --point_cloud SpatialLM-Testset/pcd --output SpatialLM-Testset/pred --model_path manycore-research/SpatialLM-Qwen-0.5B

# Evaluate the predicted layouts
python eval.py --metadata SpatialLM-Testset/test.csv --gt_dir SpatialLM-Testset/layout --pred_dir SpatialLM-Testset/pred --label_mapping SpatialLM-Testset/benchmark_categories.tsv
```
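
For reference, the object metric is an F1 score at a 0.25 IoU threshold: predictions are matched to ground-truth boxes of the same category, and a match counts as a true positive when IoU ≥ 0.25. The sketch below only illustrates the idea for axis-aligned 3D boxes; the actual `eval.py` additionally handles oriented boxes, the 2D metric for thin objects, and the category mapping in `benchmark_categories.tsv`:

```python
# Illustrative F1 @ 0.25 IoU for a single category of axis-aligned 3D boxes.
# Boxes are (xmin, ymin, zmin, xmax, ymax, zmax); eval.py's exact matching may differ.
import numpy as np

def iou_3d(a: np.ndarray, b: np.ndarray) -> float:
    lo = np.maximum(a[:3], b[:3])
    hi = np.minimum(a[3:], b[3:])
    inter = float(np.prod(np.clip(hi - lo, 0.0, None)))
    vol_a = float(np.prod(a[3:] - a[:3]))
    vol_b = float(np.prod(b[3:] - b[:3]))
    return inter / (vol_a + vol_b - inter + 1e-9)

def f1_at_iou(pred: list[np.ndarray], gt: list[np.ndarray], thresh: float = 0.25) -> float:
    matched, tp = set(), 0
    for p in pred:  # greedily assign each prediction to its best unmatched GT box
        candidates = [(iou_3d(p, g), j) for j, g in enumerate(gt) if j not in matched]
        if candidates:
            best_iou, best_j = max(candidates)
            if best_iou >= thresh:
                matched.add(best_j)
                tp += 1
    precision = tp / max(len(pred), 1)
    recall = tp / max(len(gt), 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)
```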

## SpatialLM Testset

We provide a test set of 107 preprocessed point clouds, reconstructed from RGB videos using [MASt3R-SLAM](https://github.com/rmurai0610/MASt3R-SLAM). SpatialLM-Testset is considerably more challenging than prior datasets of clean RGBD scans due to the noise and occlusions in point clouds reconstructed from monocular RGB videos.

<div align="center">

| **Dataset**       | **Download**                                                                        |
| :---------------: | ----------------------------------------------------------------------------------- |
| SpatialLM-Testset | [🤗 Datasets](https://huggingface.co/datasets/manycore-research/SpatialLM-TestSet)   |

</div>

## Benchmark Results

Benchmark results on the challenging SpatialLM-Testset are reported in the following table:

<div align="center">

| **Category**                         | **SpatialLM-Llama-1B** | **SpatialLM-Qwen-0.5B** |
| ------------------------------------ | ---------------------- | ----------------------- |
| **Floorplan** (mean IoU)             |                        |                         |
| wall                                 | 78.62                  | 74.81                   |
|                                      |                        |                         |
| **Objects** (F1 @ 0.25 IoU, 3D)      |                        |                         |
| curtain                              | 27.35                  | 28.59                   |
| nightstand                           | 57.47                  | 54.39                   |
| chandelier                           | 38.92                  | 40.12                   |
| wardrobe                             | 23.33                  | 30.60                   |
| bed                                  | 95.24                  | 93.75                   |
| sofa                                 | 65.50                  | 66.15                   |
| chair                                | 21.26                  | 14.94                   |
| cabinet                              | 8.47                   | 8.44                    |
| dining table                         | 54.26                  | 56.10                   |
| plants                               | 20.68                  | 26.46                   |
| tv cabinet                           | 33.33                  | 10.26                   |
| coffee table                         | 50.00                  | 55.56                   |
| side table                           | 7.60                   | 2.17                    |
| air conditioner                      | 20.00                  | 13.04                   |
| dresser                              | 46.67                  | 23.53                   |
|                                      |                        |                         |
| **Thin Objects** (F1 @ 0.25 IoU, 2D) |                        |                         |
| painting                             | 50.04                  | 53.81                   |
| carpet                               | 31.76                  | 45.31                   |
| tv                                   | 67.31                  | 52.29                   |
| door                                 | 50.35                  | 42.15                   |
| window                               | 45.40                  | 45.90                   |

</div>

## License

SpatialLM-Llama-1B is derived from Llama3.2-1B-Instruct, which is licensed under the Llama3.2 license.
SpatialLM-Qwen-0.5B is derived from the Qwen-2.5 series, originally licensed under the Apache 2.0 License.

All models are built upon the SceneScript point cloud encoder, licensed under the CC-BY-NC-4.0 License. TorchSparse, utilized in this project, is licensed under the MIT License.

## Citation

If you find this work useful, please consider citing:

```bibtex
@misc{spatiallm,
  title        = {SpatialLM: Large Language Model for Spatial Understanding},
  author       = {ManyCore Research Team},
  howpublished = {\url{https://github.com/manycore-research/SpatialLM}},
  year         = {2025}
}
```

## Acknowledgements

We would like to thank the following projects that made this work possible:

[Llama3.2](https://github.com/meta-llama) | [Qwen2.5](https://github.com/QwenLM/Qwen2.5) | [Transformers](https://github.com/huggingface/transformers) | [SceneScript](https://github.com/facebookresearch/scenescript) | [TorchSparse](https://github.com/mit-han-lab/torchsparse)