saurabhati committed on
Commit 22b5634 · verified · 1 Parent(s): 561d594

Upload VMambaForImageClassification

Files changed (5)
  1. README.md +199 -0
  2. config.json +2035 -0
  3. configuration_vmamba.py +97 -0
  4. model.safetensors +3 -0
  5. modeling_vmamba.py +1220 -0
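
As a hedged illustration of how the files listed above relate to each other: `config.json` in this commit carries an `auto_map` whose values have the form `<module>.<class>`, and an `id2label` table keyed by stringified class indices. The sketch below parses a small excerpt of that config using only the standard library. `split_auto_map` and `label_for` are hypothetical helper names introduced here; they are not part of the commit or of transformers. The real resolution of `auto_map` happens inside transformers (typically when loading with `trust_remote_code=True`), which this sketch does not reproduce.

```python
import json

# Excerpt of the committed config.json (only the keys used in this sketch).
CONFIG_EXCERPT = """
{
  "auto_map": {
    "AutoConfig": "configuration_vmamba.VMambaConfig",
    "AutoModelForImageClassification": "modeling_vmamba.VMambaForImageClassification"
  },
  "id2label": {
    "0": "tench, Tinca tinca",
    "1": "goldfish, Carassius auratus"
  }
}
"""


def split_auto_map(auto_map):
    """Hypothetical helper: map each Auto class to (python_file, class_name)."""
    resolved = {}
    for auto_class, reference in auto_map.items():
        # "configuration_vmamba.VMambaConfig" -> ("configuration_vmamba", "VMambaConfig")
        module_name, class_name = reference.rsplit(".", 1)
        resolved[auto_class] = (module_name + ".py", class_name)
    return resolved


def label_for(config, class_index):
    """Hypothetical helper: JSON object keys are strings, so convert the index."""
    return config["id2label"][str(class_index)]


config = json.loads(CONFIG_EXCERPT)
resolved = split_auto_map(config["auto_map"])

for auto_class, (python_file, class_name) in resolved.items():
    print(f"{auto_class}: class {class_name} defined in {python_file}")
print(label_for(config, 1))
```

The file names produced this way match two of the five files in the commit (`configuration_vmamba.py` and `modeling_vmamba.py`), which is why the custom code ships alongside the weights.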
README.md ADDED
@@ -0,0 +1,199 @@
+ ---
+ library_name: transformers
+ tags: []
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
config.json ADDED
@@ -0,0 +1,2035 @@
+ {
+ "architectures": [
+ "VMambaForImageClassification"
+ ],
+ "auto_map": {
+ "AutoConfig": "configuration_vmamba.VMambaConfig",
+ "AutoModelForImageClassification": "modeling_vmamba.VMambaForImageClassification"
+ },
+ "depths": [
+ 2,
+ 2,
+ 20,
+ 2
+ ],
+ "dims": [
+ 96,
+ 192,
+ 384,
+ 768
+ ],
+ "drop_path_rate": 0.2,
+ "embed_dim": 96,
+ "id2label": {
+ "0": "tench, Tinca tinca",
+ "1": "goldfish, Carassius auratus",
+ "2": "great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias",
+ "3": "tiger shark, Galeocerdo cuvieri",
+ "4": "hammerhead, hammerhead shark",
+ "5": "electric ray, crampfish, numbfish, torpedo",
+ "6": "stingray",
+ "7": "cock",
+ "8": "hen",
+ "9": "ostrich, Struthio camelus",
+ "10": "brambling, Fringilla montifringilla",
+ "11": "goldfinch, Carduelis carduelis",
+ "12": "house finch, linnet, Carpodacus mexicanus",
+ "13": "junco, snowbird",
+ "14": "indigo bunting, indigo finch, indigo bird, Passerina cyanea",
+ "15": "robin, American robin, Turdus migratorius",
+ "16": "bulbul",
+ "17": "jay",
+ "18": "magpie",
+ "19": "chickadee",
+ "20": "water ouzel, dipper",
+ "21": "kite",
+ "22": "bald eagle, American eagle, Haliaeetus leucocephalus",
+ "23": "vulture",
+ "24": "great grey owl, great gray owl, Strix nebulosa",
+ "25": "European fire salamander, Salamandra salamandra",
+ "26": "common newt, Triturus vulgaris",
+ "27": "eft",
+ "28": "spotted salamander, Ambystoma maculatum",
+ "29": "axolotl, mud puppy, Ambystoma mexicanum",
+ "30": "bullfrog, Rana catesbeiana",
+ "31": "tree frog, tree-frog",
+ "32": "tailed frog, bell toad, ribbed toad, tailed toad, Ascaphus trui",
+ "33": "loggerhead, loggerhead turtle, Caretta caretta",
+ "34": "leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea",
+ "35": "mud turtle",
+ "36": "terrapin",
+ "37": "box turtle, box tortoise",
+ "38": "banded gecko",
+ "39": "common iguana, iguana, Iguana iguana",
+ "40": "American chameleon, anole, Anolis carolinensis",
+ "41": "whiptail, whiptail lizard",
+ "42": "agama",
+ "43": "frilled lizard, Chlamydosaurus kingi",
+ "44": "alligator lizard",
+ "45": "Gila monster, Heloderma suspectum",
+ "46": "green lizard, Lacerta viridis",
+ "47": "African chameleon, Chamaeleo chamaeleon",
+ "48": "Komodo dragon, Komodo lizard, dragon lizard, giant lizard, Varanus komodoensis",
+ "49": "African crocodile, Nile crocodile, Crocodylus niloticus",
+ "50": "American alligator, Alligator mississipiensis",
+ "51": "triceratops",
+ "52": "thunder snake, worm snake, Carphophis amoenus",
+ "53": "ringneck snake, ring-necked snake, ring snake",
+ "54": "hognose snake, puff adder, sand viper",
+ "55": "green snake, grass snake",
+ "56": "king snake, kingsnake",
+ "57": "garter snake, grass snake",
+ "58": "water snake",
+ "59": "vine snake",
+ "60": "night snake, Hypsiglena torquata",
+ "61": "boa constrictor, Constrictor constrictor",
+ "62": "rock python, rock snake, Python sebae",
+ "63": "Indian cobra, Naja naja",
+ "64": "green mamba",
+ "65": "sea snake",
+ "66": "horned viper, cerastes, sand viper, horned asp, Cerastes cornutus",
+ "67": "diamondback, diamondback rattlesnake, Crotalus adamanteus",
+ "68": "sidewinder, horned rattlesnake, Crotalus cerastes",
+ "69": "trilobite",
+ "70": "harvestman, daddy longlegs, Phalangium opilio",
+ "71": "scorpion",
+ "72": "black and gold garden spider, Argiope aurantia",
+ "73": "barn spider, Araneus cavaticus",
+ "74": "garden spider, Aranea diademata",
+ "75": "black widow, Latrodectus mactans",
+ "76": "tarantula",
+ "77": "wolf spider, hunting spider",
+ "78": "tick",
+ "79": "centipede",
+ "80": "black grouse",
+ "81": "ptarmigan",
+ "82": "ruffed grouse, partridge, Bonasa umbellus",
+ "83": "prairie chicken, prairie grouse, prairie fowl",
+ "84": "peacock",
+ "85": "quail",
+ "86": "partridge",
+ "87": "African grey, African gray, Psittacus erithacus",
+ "88": "macaw",
+ "89": "sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita",
+ "90": "lorikeet",
+ "91": "coucal",
+ "92": "bee eater",
+ "93": "hornbill",
+ "94": "hummingbird",
+ "95": "jacamar",
+ "96": "toucan",
+ "97": "drake",
+ "98": "red-breasted merganser, Mergus serrator",
+ "99": "goose",
+ "100": "black swan, Cygnus atratus",
+ "101": "tusker",
+ "102": "echidna, spiny anteater, anteater",
+ "103": "platypus, duckbill, duckbilled platypus, duck-billed platypus, Ornithorhynchus anatinus",
+ "104": "wallaby, brush kangaroo",
+ "105": "koala, koala bear, kangaroo bear, native bear, Phascolarctos cinereus",
+ "106": "wombat",
+ "107": "jellyfish",
+ "108": "sea anemone, anemone",
+ "109": "brain coral",
+ "110": "flatworm, platyhelminth",
+ "111": "nematode, nematode worm, roundworm",
+ "112": "conch",
+ "113": "snail",
+ "114": "slug",
+ "115": "sea slug, nudibranch",
+ "116": "chiton, coat-of-mail shell, sea cradle, polyplacophore",
+ "117": "chambered nautilus, pearly nautilus, nautilus",
+ "118": "Dungeness crab, Cancer magister",
+ "119": "rock crab, Cancer irroratus",
+ "120": "fiddler crab",
+ "121": "king crab, Alaska crab, Alaskan king crab, Alaska king crab, Paralithodes camtschatica",
+ "122": "American lobster, Northern lobster, Maine lobster, Homarus americanus",
+ "123": "spiny lobster, langouste, rock lobster, crawfish, crayfish, sea crawfish",
+ "124": "crayfish, crawfish, crawdad, crawdaddy",
+ "125": "hermit crab",
+ "126": "isopod",
+ "127": "white stork, Ciconia ciconia",
+ "128": "black stork, Ciconia nigra",
+ "129": "spoonbill",
+ "130": "flamingo",
+ "131": "little blue heron, Egretta caerulea",
+ "132": "American egret, great white heron, Egretta albus",
+ "133": "bittern",
+ "134": "crane",
+ "135": "limpkin, Aramus pictus",
+ "136": "European gallinule, Porphyrio porphyrio",
+ "137": "American coot, marsh hen, mud hen, water hen, Fulica americana",
+ "138": "bustard",
+ "139": "ruddy turnstone, Arenaria interpres",
+ "140": "red-backed sandpiper, dunlin, Erolia alpina",
+ "141": "redshank, Tringa totanus",
+ "142": "dowitcher",
+ "143": "oystercatcher, oyster catcher",
+ "144": "pelican",
+ "145": "king penguin, Aptenodytes patagonica",
+ "146": "albatross, mollymawk",
+ "147": "grey whale, gray whale, devilfish, Eschrichtius gibbosus, Eschrichtius robustus",
+ "148": "killer whale, killer, orca, grampus, sea wolf, Orcinus orca",
+ "149": "dugong, Dugong dugon",
+ "150": "sea lion",
+ "151": "Chihuahua",
+ "152": "Japanese spaniel",
+ "153": "Maltese dog, Maltese terrier, Maltese",
+ "154": "Pekinese, Pekingese, Peke",
+ "155": "Shih-Tzu",
+ "156": "Blenheim spaniel",
+ "157": "papillon",
+ "158": "toy terrier",
+ "159": "Rhodesian ridgeback",
+ "160": "Afghan hound, Afghan",
+ "161": "basset, basset hound",
+ "162": "beagle",
+ "163": "bloodhound, sleuthhound",
+ "164": "bluetick",
+ "165": "black-and-tan coonhound",
+ "166": "Walker hound, Walker foxhound",
+ "167": "English foxhound",
+ "168": "redbone",
+ "169": "borzoi, Russian wolfhound",
+ "170": "Irish wolfhound",
+ "171": "Italian greyhound",
+ "172": "whippet",
+ "173": "Ibizan hound, Ibizan Podenco",
+ "174": "Norwegian elkhound, elkhound",
+ "175": "otterhound, otter hound",
+ "176": "Saluki, gazelle hound",
+ "177": "Scottish deerhound, deerhound",
+ "178": "Weimaraner",
+ "179": "Staffordshire bullterrier, Staffordshire bull terrier",
+ "180": "American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier",
+ "181": "Bedlington terrier",
+ "182": "Border terrier",
+ "183": "Kerry blue terrier",
+ "184": "Irish terrier",
+ "185": "Norfolk terrier",
+ "186": "Norwich terrier",
+ "187": "Yorkshire terrier",
+ "188": "wire-haired fox terrier",
+ "189": "Lakeland terrier",
+ "190": "Sealyham terrier, Sealyham",
+ "191": "Airedale, Airedale terrier",
+ "192": "cairn, cairn terrier",
+ "193": "Australian terrier",
+ "194": "Dandie Dinmont, Dandie Dinmont terrier",
+ "195": "Boston bull, Boston terrier",
+ "196": "miniature schnauzer",
+ "197": "giant schnauzer",
+ "198": "standard schnauzer",
+ "199": "Scotch terrier, Scottish terrier, Scottie",
+ "200": "Tibetan terrier, chrysanthemum dog",
+ "201": "silky terrier, Sydney silky",
+ "202": "soft-coated wheaten terrier",
+ "203": "West Highland white terrier",
+ "204": "Lhasa, Lhasa apso",
+ "205": "flat-coated retriever",
+ "206": "curly-coated retriever",
+ "207": "golden retriever",
+ "208": "Labrador retriever",
+ "209": "Chesapeake Bay retriever",
+ "210": "German short-haired pointer",
+ "211": "vizsla, Hungarian pointer",
+ "212": "English setter",
+ "213": "Irish setter, red setter",
+ "214": "Gordon setter",
+ "215": "Brittany spaniel",
+ "216": "clumber, clumber spaniel",
+ "217": "English springer, English springer spaniel",
+ "218": "Welsh springer spaniel",
+ "219": "cocker spaniel, English cocker spaniel, cocker",
+ "220": "Sussex spaniel",
+ "221": "Irish water spaniel",
+ "222": "kuvasz",
+ "223": "schipperke",
+ "224": "groenendael",
+ "225": "malinois",
+ "226": "briard",
+ "227": "kelpie",
+ "228": "komondor",
+ "229": "Old English sheepdog, bobtail",
+ "230": "Shetland sheepdog, Shetland sheep dog, Shetland",
+ "231": "collie",
+ "232": "Border collie",
+ "233": "Bouvier des Flandres, Bouviers des Flandres",
+ "234": "Rottweiler",
+ "235": "German shepherd, German shepherd dog, German police dog, alsatian",
+ "236": "Doberman, Doberman pinscher",
+ "237": "miniature pinscher",
+ "238": "Greater Swiss Mountain dog",
+ "239": "Bernese mountain dog",
+ "240": "Appenzeller",
+ "241": "EntleBucher",
+ "242": "boxer",
+ "243": "bull mastiff",
+ "244": "Tibetan mastiff",
+ "245": "French bulldog",
+ "246": "Great Dane",
+ "247": "Saint Bernard, St Bernard",
+ "248": "Eskimo dog, husky",
+ "249": "malamute, malemute, Alaskan malamute",
+ "250": "Siberian husky",
+ "251": "dalmatian, coach dog, carriage dog",
+ "252": "affenpinscher, monkey pinscher, monkey dog",
+ "253": "basenji",
+ "254": "pug, pug-dog",
+ "255": "Leonberg",
+ "256": "Newfoundland, Newfoundland dog",
+ "257": "Great Pyrenees",
+ "258": "Samoyed, Samoyede",
+ "259": "Pomeranian",
+ "260": "chow, chow chow",
+ "261": "keeshond",
+ "262": "Brabancon griffon",
+ "263": "Pembroke, Pembroke Welsh corgi",
+ "264": "Cardigan, Cardigan Welsh corgi",
+ "265": "toy poodle",
+ "266": "miniature poodle",
+ "267": "standard poodle",
+ "268": "Mexican hairless",
+ "269": "timber wolf, grey wolf, gray wolf, Canis lupus",
+ "270": "white wolf, Arctic wolf, Canis lupus tundrarum",
+ "271": "red wolf, maned wolf, Canis rufus, Canis niger",
+ "272": "coyote, prairie wolf, brush wolf, Canis latrans",
+ "273": "dingo, warrigal, warragal, Canis dingo",
+ "274": "dhole, Cuon alpinus",
+ "275": "African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus",
+ "276": "hyena, hyaena",
+ "277": "red fox, Vulpes vulpes",
+ "278": "kit fox, Vulpes macrotis",
+ "279": "Arctic fox, white fox, Alopex lagopus",
+ "280": "grey fox, gray fox, Urocyon cinereoargenteus",
+ "281": "tabby, tabby cat",
+ "282": "tiger cat",
+ "283": "Persian cat",
+ "284": "Siamese cat, Siamese",
+ "285": "Egyptian cat",
+ "286": "cougar, puma, catamount, mountain lion, painter, panther, Felis concolor",
+ "287": "lynx, catamount",
+ "288": "leopard, Panthera pardus",
+ "289": "snow leopard, ounce, Panthera uncia",
+ "290": "jaguar, panther, Panthera onca, Felis onca",
+ "291": "lion, king of beasts, Panthera leo",
+ "292": "tiger, Panthera tigris",
+ "293": "cheetah, chetah, Acinonyx jubatus",
+ "294": "brown bear, bruin, Ursus arctos",
+ "295": "American black bear, black bear, Ursus americanus, Euarctos americanus",
+ "296": "ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus",
+ "297": "sloth bear, Melursus ursinus, Ursus ursinus",
+ "298": "mongoose",
+ "299": "meerkat, mierkat",
+ "300": "tiger beetle",
+ "301": "ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle",
+ "302": "ground beetle, carabid beetle",
+ "303": "long-horned beetle, longicorn, longicorn beetle",
+ "304": "leaf beetle, chrysomelid",
+ "305": "dung beetle",
+ "306": "rhinoceros beetle",
+ "307": "weevil",
+ "308": "fly",
+ "309": "bee",
+ "310": "ant, emmet, pismire",
+ "311": "grasshopper, hopper",
+ "312": "cricket",
+ "313": "walking stick, walkingstick, stick insect",
+ "314": "cockroach, roach",
+ "315": "mantis, mantid",
+ "316": "cicada, cicala",
+ "317": "leafhopper",
+ "318": "lacewing, lacewing fly",
+ "319": "dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk",
+ "320": "damselfly",
+ "321": "admiral",
+ "322": "ringlet, ringlet butterfly",
+ "323": "monarch, monarch butterfly, milkweed butterfly, Danaus plexippus",
+ "324": "cabbage butterfly",
+ "325": "sulphur butterfly, sulfur butterfly",
+ "326": "lycaenid, lycaenid butterfly",
+ "327": "starfish, sea star",
+ "328": "sea urchin",
+ "329": "sea cucumber, holothurian",
+ "330": "wood rabbit, cottontail, cottontail rabbit",
+ "331": "hare",
+ "332": "Angora, Angora rabbit",
+ "333": "hamster",
+ "334": "porcupine, hedgehog",
+ "335": "fox squirrel, eastern fox squirrel, Sciurus niger",
+ "336": "marmot",
+ "337": "beaver",
+ "338": "guinea pig, Cavia cobaya",
+ "339": "sorrel",
+ "340": "zebra",
+ "341": "hog, pig, grunter, squealer, Sus scrofa",
+ "342": "wild boar, boar, Sus scrofa",
+ "343": "warthog",
+ "344": "hippopotamus, hippo, river horse, Hippopotamus amphibius",
+ "345": "ox",
+ "346": "water buffalo, water ox, Asiatic buffalo, Bubalus bubalis",
+ "347": "bison",
+ "348": "ram, tup",
+ "349": "bighorn, bighorn sheep, cimarron, Rocky Mountain bighorn, Rocky Mountain sheep, Ovis canadensis",
+ "350": "ibex, Capra ibex",
+ "351": "hartebeest",
+ "352": "impala, Aepyceros melampus",
+ "353": "gazelle",
+ "354": "Arabian camel, dromedary, Camelus dromedarius",
+ "355": "llama",
+ "356": "weasel",
+ "357": "mink",
+ "358": "polecat, fitch, foulmart, foumart, Mustela putorius",
+ "359": "black-footed ferret, ferret, Mustela nigripes",
+ "360": "otter",
+ "361": "skunk, polecat, wood pussy",
+ "362": "badger",
+ "363": "armadillo",
+ "364": "three-toed sloth, ai, Bradypus tridactylus",
+ "365": "orangutan, orang, orangutang, Pongo pygmaeus",
+ "366": "gorilla, Gorilla gorilla",
+ "367": "chimpanzee, chimp, Pan troglodytes",
+ "368": "gibbon, Hylobates lar",
+ "369": "siamang, Hylobates syndactylus, Symphalangus syndactylus",
+ "370": "guenon, guenon monkey",
+ "371": "patas, hussar monkey, Erythrocebus patas",
+ "372": "baboon",
+ "373": "macaque",
+ "374": "langur",
+ "375": "colobus, colobus monkey",
+ "376": "proboscis monkey, Nasalis larvatus",
+ "377": "marmoset",
+ "378": "capuchin, ringtail, Cebus capucinus",
+ "379": "howler monkey, howler",
+ "380": "titi, titi monkey",
+ "381": "spider monkey, Ateles geoffroyi",
+ "382": "squirrel monkey, Saimiri sciureus",
+ "383": "Madagascar cat, ring-tailed lemur, Lemur catta",
+ "384": "indri, indris, Indri indri, Indri brevicaudatus",
+ "385": "Indian elephant, Elephas maximus",
+ "386": "African elephant, Loxodonta africana",
+ "387": "lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens",
+ "388": "giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca",
+ "389": "barracouta, snoek",
+ "390": "eel",
+ "391": "coho, cohoe, coho salmon, blue jack, silver salmon, Oncorhynchus kisutch",
+ "392": "rock beauty, Holocanthus tricolor",
+ "393": "anemone fish",
+ "394": "sturgeon",
+ "395": "gar, garfish, garpike, billfish, Lepisosteus osseus",
+ "396": "lionfish",
+ "397": "puffer, pufferfish, blowfish, globefish",
+ "398": "abacus",
+ "399": "abaya",
+ "400": "academic gown, academic robe, judge's robe",
+ "401": "accordion, piano accordion, squeeze box",
+ "402": "acoustic guitar",
+ "403": "aircraft carrier, carrier, flattop, attack aircraft carrier",
+ "404": "airliner",
+ "405": "airship, dirigible",
+ "406": "altar",
+ "407": "ambulance",
+ "408": "amphibian, amphibious vehicle",
+ "409": "analog clock",
+ "410": "apiary, bee house",
+ "411": "apron",
+ "412": "ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin",
+ "413": "assault rifle, assault gun",
+ "414": "backpack, back pack, knapsack, packsack, rucksack, haversack",
+ "415": "bakery, bakeshop, bakehouse",
+ "416": "balance beam, beam",
+ "417": "balloon",
+ "418": "ballpoint, ballpoint pen, ballpen, Biro",
+ "419": "Band Aid",
+ "420": "banjo",
+ "421": "bannister, banister, balustrade, balusters, handrail",
+ "422": "barbell",
+ "423": "barber chair",
+ "424": "barbershop",
+ "425": "barn",
+ "426": "barometer",
+ "427": "barrel, cask",
+ "428": "barrow, garden cart, lawn cart, wheelbarrow",
+ "429": "baseball",
+ "430": "basketball",
+ "431": "bassinet",
+ "432": "bassoon",
+ "433": "bathing cap, swimming cap",
+ "434": "bath towel",
+ "435": "bathtub, bathing tub, bath, tub",
+ "436": "beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon",
+ "437": "beacon, lighthouse, beacon light, pharos",
+ "438": "beaker",
+ "439": "bearskin, busby, shako",
+ "440": "beer bottle",
+ "441": "beer glass",
+ "442": "bell cote, bell cot",
+ "443": "bib",
+ "444": "bicycle-built-for-two, tandem bicycle, tandem",
+ "445": "bikini, two-piece",
+ "446": "binder, ring-binder",
+ "447": "binoculars, field glasses, opera glasses",
+ "448": "birdhouse",
+ "449": "boathouse",
+ "450": "bobsled, bobsleigh, bob",
+ "451": "bolo tie, bolo, bola tie, bola",
+ "452": "bonnet, poke bonnet",
+ "453": "bookcase",
+ "454": "bookshop, bookstore, bookstall",
+ "455": "bottlecap",
+ "456": "bow",
+ "457": "bow tie, bow-tie, bowtie",
+ "458": "brass, memorial tablet, plaque",
+ "459": "brassiere, bra, bandeau",
+ "460": "breakwater, groin, groyne, mole, bulwark, seawall, jetty",
+ "461": "breastplate, aegis, egis",
+ "462": "broom",
+ "463": "bucket, pail",
+ "464": "buckle",
+ "465": "bulletproof vest",
+ "466": "bullet train, bullet",
+ "467": "butcher shop, meat market",
+ "468": "cab, hack, taxi, taxicab",
+ "469": "caldron, cauldron",
+ "470": "candle, taper, wax light",
+ "471": "cannon",
+ "472": "canoe",
+ "473": "can opener, tin opener",
+ "474": "cardigan",
+ "475": "car mirror",
+ "476": "carousel, carrousel, merry-go-round, roundabout, whirligig",
+ "477": "carpenter's kit, tool kit",
+ "478": "carton",
+ "479": "car wheel",
+ "480": "cash machine, cash dispenser, automated teller machine, automatic teller machine, automated teller, automatic teller, ATM",
+ "481": "cassette",
+ "482": "cassette player",
+ "483": "castle",
+ "484": "catamaran",
+ "485": "CD player",
+ "486": "cello, violoncello",
+ "487": "cellular telephone, cellular phone, cellphone, cell, mobile phone",
+ "488": "chain",
+ "489": "chainlink fence",
+ "490": "chain mail, ring mail, mail, chain armor, chain armour, ring armor, ring armour",
+ "491": "chain saw, chainsaw",
+ "492": "chest",
+ "493": "chiffonier, commode",
+ "494": "chime, bell, gong",
+ "495": "china cabinet, china closet",
+ "496": "Christmas stocking",
+ "497": "church, church building",
+ "498": "cinema, movie theater, movie theatre, movie house, picture palace",
+ "499": "cleaver, meat cleaver, chopper",
524
+ "500": "cliff dwelling",
525
+ "501": "cloak",
526
+ "502": "clog, geta, patten, sabot",
527
+ "503": "cocktail shaker",
528
+ "504": "coffee mug",
529
+ "505": "coffeepot",
530
+ "506": "coil, spiral, volute, whorl, helix",
531
+ "507": "combination lock",
532
+ "508": "computer keyboard, keypad",
533
+ "509": "confectionery, confectionary, candy store",
534
+ "510": "container ship, containership, container vessel",
535
+ "511": "convertible",
536
+ "512": "corkscrew, bottle screw",
537
+ "513": "cornet, horn, trumpet, trump",
538
+ "514": "cowboy boot",
539
+ "515": "cowboy hat, ten-gallon hat",
540
+ "516": "cradle",
541
+ "517": "crane",
542
+ "518": "crash helmet",
543
+ "519": "crate",
544
+ "520": "crib, cot",
545
+ "521": "Crock Pot",
546
+ "522": "croquet ball",
547
+ "523": "crutch",
548
+ "524": "cuirass",
549
+ "525": "dam, dike, dyke",
550
+ "526": "desk",
551
+ "527": "desktop computer",
552
+ "528": "dial telephone, dial phone",
553
+ "529": "diaper, nappy, napkin",
554
+ "530": "digital clock",
555
+ "531": "digital watch",
556
+ "532": "dining table, board",
557
+ "533": "dishrag, dishcloth",
558
+ "534": "dishwasher, dish washer, dishwashing machine",
559
+ "535": "disk brake, disc brake",
560
+ "536": "dock, dockage, docking facility",
561
+ "537": "dogsled, dog sled, dog sleigh",
562
+ "538": "dome",
563
+ "539": "doormat, welcome mat",
564
+ "540": "drilling platform, offshore rig",
565
+ "541": "drum, membranophone, tympan",
566
+ "542": "drumstick",
567
+ "543": "dumbbell",
568
+ "544": "Dutch oven",
569
+ "545": "electric fan, blower",
570
+ "546": "electric guitar",
571
+ "547": "electric locomotive",
572
+ "548": "entertainment center",
573
+ "549": "envelope",
574
+ "550": "espresso maker",
575
+ "551": "face powder",
576
+ "552": "feather boa, boa",
577
+ "553": "file, file cabinet, filing cabinet",
578
+ "554": "fireboat",
579
+ "555": "fire engine, fire truck",
580
+ "556": "fire screen, fireguard",
581
+ "557": "flagpole, flagstaff",
582
+ "558": "flute, transverse flute",
583
+ "559": "folding chair",
584
+ "560": "football helmet",
585
+ "561": "forklift",
586
+ "562": "fountain",
587
+ "563": "fountain pen",
588
+ "564": "four-poster",
589
+ "565": "freight car",
590
+ "566": "French horn, horn",
591
+ "567": "frying pan, frypan, skillet",
592
+ "568": "fur coat",
593
+ "569": "garbage truck, dustcart",
594
+ "570": "gasmask, respirator, gas helmet",
595
+ "571": "gas pump, gasoline pump, petrol pump, island dispenser",
596
+ "572": "goblet",
597
+ "573": "go-kart",
598
+ "574": "golf ball",
599
+ "575": "golfcart, golf cart",
600
+ "576": "gondola",
601
+ "577": "gong, tam-tam",
602
+ "578": "gown",
603
+ "579": "grand piano, grand",
604
+ "580": "greenhouse, nursery, glasshouse",
605
+ "581": "grille, radiator grille",
606
+ "582": "grocery store, grocery, food market, market",
607
+ "583": "guillotine",
608
+ "584": "hair slide",
609
+ "585": "hair spray",
610
+ "586": "half track",
611
+ "587": "hammer",
612
+ "588": "hamper",
613
+ "589": "hand blower, blow dryer, blow drier, hair dryer, hair drier",
614
+ "590": "hand-held computer, hand-held microcomputer",
615
+ "591": "handkerchief, hankie, hanky, hankey",
616
+ "592": "hard disc, hard disk, fixed disk",
617
+ "593": "harmonica, mouth organ, harp, mouth harp",
618
+ "594": "harp",
619
+ "595": "harvester, reaper",
620
+ "596": "hatchet",
621
+ "597": "holster",
622
+ "598": "home theater, home theatre",
623
+ "599": "honeycomb",
624
+ "600": "hook, claw",
625
+ "601": "hoopskirt, crinoline",
626
+ "602": "horizontal bar, high bar",
627
+ "603": "horse cart, horse-cart",
628
+ "604": "hourglass",
629
+ "605": "iPod",
630
+ "606": "iron, smoothing iron",
631
+ "607": "jack-o'-lantern",
632
+ "608": "jean, blue jean, denim",
633
+ "609": "jeep, landrover",
634
+ "610": "jersey, T-shirt, tee shirt",
635
+ "611": "jigsaw puzzle",
636
+ "612": "jinrikisha, ricksha, rickshaw",
637
+ "613": "joystick",
638
+ "614": "kimono",
639
+ "615": "knee pad",
640
+ "616": "knot",
641
+ "617": "lab coat, laboratory coat",
642
+ "618": "ladle",
643
+ "619": "lampshade, lamp shade",
644
+ "620": "laptop, laptop computer",
645
+ "621": "lawn mower, mower",
646
+ "622": "lens cap, lens cover",
647
+ "623": "letter opener, paper knife, paperknife",
648
+ "624": "library",
649
+ "625": "lifeboat",
650
+ "626": "lighter, light, igniter, ignitor",
651
+ "627": "limousine, limo",
652
+ "628": "liner, ocean liner",
653
+ "629": "lipstick, lip rouge",
654
+ "630": "Loafer",
655
+ "631": "lotion",
656
+ "632": "loudspeaker, speaker, speaker unit, loudspeaker system, speaker system",
657
+ "633": "loupe, jeweler's loupe",
658
+ "634": "lumbermill, sawmill",
659
+ "635": "magnetic compass",
660
+ "636": "mailbag, postbag",
661
+ "637": "mailbox, letter box",
662
+ "638": "maillot",
663
+ "639": "maillot, tank suit",
664
+ "640": "manhole cover",
665
+ "641": "maraca",
666
+ "642": "marimba, xylophone",
667
+ "643": "mask",
668
+ "644": "matchstick",
669
+ "645": "maypole",
670
+ "646": "maze, labyrinth",
671
+ "647": "measuring cup",
672
+ "648": "medicine chest, medicine cabinet",
673
+ "649": "megalith, megalithic structure",
674
+ "650": "microphone, mike",
675
+ "651": "microwave, microwave oven",
676
+ "652": "military uniform",
677
+ "653": "milk can",
678
+ "654": "minibus",
679
+ "655": "miniskirt, mini",
680
+ "656": "minivan",
681
+ "657": "missile",
682
+ "658": "mitten",
683
+ "659": "mixing bowl",
684
+ "660": "mobile home, manufactured home",
685
+ "661": "Model T",
686
+ "662": "modem",
687
+ "663": "monastery",
688
+ "664": "monitor",
689
+ "665": "moped",
690
+ "666": "mortar",
691
+ "667": "mortarboard",
692
+ "668": "mosque",
693
+ "669": "mosquito net",
694
+ "670": "motor scooter, scooter",
695
+ "671": "mountain bike, all-terrain bike, off-roader",
696
+ "672": "mountain tent",
697
+ "673": "mouse, computer mouse",
698
+ "674": "mousetrap",
699
+ "675": "moving van",
700
+ "676": "muzzle",
701
+ "677": "nail",
702
+ "678": "neck brace",
703
+ "679": "necklace",
704
+ "680": "nipple",
705
+ "681": "notebook, notebook computer",
706
+ "682": "obelisk",
707
+ "683": "oboe, hautboy, hautbois",
708
+ "684": "ocarina, sweet potato",
709
+ "685": "odometer, hodometer, mileometer, milometer",
710
+ "686": "oil filter",
711
+ "687": "organ, pipe organ",
712
+ "688": "oscilloscope, scope, cathode-ray oscilloscope, CRO",
713
+ "689": "overskirt",
714
+ "690": "oxcart",
715
+ "691": "oxygen mask",
716
+ "692": "packet",
717
+ "693": "paddle, boat paddle",
718
+ "694": "paddlewheel, paddle wheel",
719
+ "695": "padlock",
720
+ "696": "paintbrush",
721
+ "697": "pajama, pyjama, pj's, jammies",
722
+ "698": "palace",
723
+ "699": "panpipe, pandean pipe, syrinx",
724
+ "700": "paper towel",
725
+ "701": "parachute, chute",
726
+ "702": "parallel bars, bars",
727
+ "703": "park bench",
728
+ "704": "parking meter",
729
+ "705": "passenger car, coach, carriage",
730
+ "706": "patio, terrace",
731
+ "707": "pay-phone, pay-station",
732
+ "708": "pedestal, plinth, footstall",
733
+ "709": "pencil box, pencil case",
734
+ "710": "pencil sharpener",
735
+ "711": "perfume, essence",
736
+ "712": "Petri dish",
737
+ "713": "photocopier",
738
+ "714": "pick, plectrum, plectron",
739
+ "715": "pickelhaube",
740
+ "716": "picket fence, paling",
741
+ "717": "pickup, pickup truck",
742
+ "718": "pier",
743
+ "719": "piggy bank, penny bank",
744
+ "720": "pill bottle",
745
+ "721": "pillow",
746
+ "722": "ping-pong ball",
747
+ "723": "pinwheel",
748
+ "724": "pirate, pirate ship",
749
+ "725": "pitcher, ewer",
750
+ "726": "plane, carpenter's plane, woodworking plane",
751
+ "727": "planetarium",
752
+ "728": "plastic bag",
753
+ "729": "plate rack",
754
+ "730": "plow, plough",
755
+ "731": "plunger, plumber's helper",
756
+ "732": "Polaroid camera, Polaroid Land camera",
757
+ "733": "pole",
758
+ "734": "police van, police wagon, paddy wagon, patrol wagon, wagon, black Maria",
759
+ "735": "poncho",
760
+ "736": "pool table, billiard table, snooker table",
761
+ "737": "pop bottle, soda bottle",
762
+ "738": "pot, flowerpot",
763
+ "739": "potter's wheel",
764
+ "740": "power drill",
765
+ "741": "prayer rug, prayer mat",
766
+ "742": "printer",
767
+ "743": "prison, prison house",
768
+ "744": "projectile, missile",
769
+ "745": "projector",
770
+ "746": "puck, hockey puck",
771
+ "747": "punching bag, punch bag, punching ball, punchball",
772
+ "748": "purse",
773
+ "749": "quill, quill pen",
774
+ "750": "quilt, comforter, comfort, puff",
775
+ "751": "racer, race car, racing car",
776
+ "752": "racket, racquet",
777
+ "753": "radiator",
778
+ "754": "radio, wireless",
779
+ "755": "radio telescope, radio reflector",
780
+ "756": "rain barrel",
781
+ "757": "recreational vehicle, RV, R.V.",
782
+ "758": "reel",
783
+ "759": "reflex camera",
784
+ "760": "refrigerator, icebox",
785
+ "761": "remote control, remote",
786
+ "762": "restaurant, eating house, eating place, eatery",
787
+ "763": "revolver, six-gun, six-shooter",
788
+ "764": "rifle",
789
+ "765": "rocking chair, rocker",
790
+ "766": "rotisserie",
791
+ "767": "rubber eraser, rubber, pencil eraser",
792
+ "768": "rugby ball",
793
+ "769": "rule, ruler",
794
+ "770": "running shoe",
795
+ "771": "safe",
796
+ "772": "safety pin",
797
+ "773": "saltshaker, salt shaker",
798
+ "774": "sandal",
799
+ "775": "sarong",
800
+ "776": "sax, saxophone",
801
+ "777": "scabbard",
802
+ "778": "scale, weighing machine",
803
+ "779": "school bus",
804
+ "780": "schooner",
805
+ "781": "scoreboard",
806
+ "782": "screen, CRT screen",
807
+ "783": "screw",
808
+ "784": "screwdriver",
809
+ "785": "seat belt, seatbelt",
810
+ "786": "sewing machine",
811
+ "787": "shield, buckler",
812
+ "788": "shoe shop, shoe-shop, shoe store",
813
+ "789": "shoji",
814
+ "790": "shopping basket",
815
+ "791": "shopping cart",
816
+ "792": "shovel",
817
+ "793": "shower cap",
818
+ "794": "shower curtain",
819
+ "795": "ski",
820
+ "796": "ski mask",
821
+ "797": "sleeping bag",
822
+ "798": "slide rule, slipstick",
823
+ "799": "sliding door",
824
+ "800": "slot, one-armed bandit",
825
+ "801": "snorkel",
826
+ "802": "snowmobile",
827
+ "803": "snowplow, snowplough",
828
+ "804": "soap dispenser",
829
+ "805": "soccer ball",
830
+ "806": "sock",
831
+ "807": "solar dish, solar collector, solar furnace",
832
+ "808": "sombrero",
833
+ "809": "soup bowl",
834
+ "810": "space bar",
835
+ "811": "space heater",
836
+ "812": "space shuttle",
837
+ "813": "spatula",
838
+ "814": "speedboat",
839
+ "815": "spider web, spider's web",
840
+ "816": "spindle",
841
+ "817": "sports car, sport car",
842
+ "818": "spotlight, spot",
843
+ "819": "stage",
844
+ "820": "steam locomotive",
845
+ "821": "steel arch bridge",
846
+ "822": "steel drum",
847
+ "823": "stethoscope",
848
+ "824": "stole",
849
+ "825": "stone wall",
850
+ "826": "stopwatch, stop watch",
851
+ "827": "stove",
852
+ "828": "strainer",
853
+ "829": "streetcar, tram, tramcar, trolley, trolley car",
854
+ "830": "stretcher",
855
+ "831": "studio couch, day bed",
856
+ "832": "stupa, tope",
857
+ "833": "submarine, pigboat, sub, U-boat",
858
+ "834": "suit, suit of clothes",
859
+ "835": "sundial",
860
+ "836": "sunglass",
861
+ "837": "sunglasses, dark glasses, shades",
862
+ "838": "sunscreen, sunblock, sun blocker",
863
+ "839": "suspension bridge",
864
+ "840": "swab, swob, mop",
865
+ "841": "sweatshirt",
866
+ "842": "swimming trunks, bathing trunks",
867
+ "843": "swing",
868
+ "844": "switch, electric switch, electrical switch",
869
+ "845": "syringe",
870
+ "846": "table lamp",
871
+ "847": "tank, army tank, armored combat vehicle, armoured combat vehicle",
872
+ "848": "tape player",
873
+ "849": "teapot",
874
+ "850": "teddy, teddy bear",
875
+ "851": "television, television system",
876
+ "852": "tennis ball",
877
+ "853": "thatch, thatched roof",
878
+ "854": "theater curtain, theatre curtain",
879
+ "855": "thimble",
880
+ "856": "thresher, thrasher, threshing machine",
881
+ "857": "throne",
882
+ "858": "tile roof",
883
+ "859": "toaster",
884
+ "860": "tobacco shop, tobacconist shop, tobacconist",
885
+ "861": "toilet seat",
886
+ "862": "torch",
887
+ "863": "totem pole",
888
+ "864": "tow truck, tow car, wrecker",
889
+ "865": "toyshop",
890
+ "866": "tractor",
891
+ "867": "trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi",
892
+ "868": "tray",
893
+ "869": "trench coat",
894
+ "870": "tricycle, trike, velocipede",
895
+ "871": "trimaran",
896
+ "872": "tripod",
897
+ "873": "triumphal arch",
898
+ "874": "trolleybus, trolley coach, trackless trolley",
899
+ "875": "trombone",
900
+ "876": "tub, vat",
901
+ "877": "turnstile",
902
+ "878": "typewriter keyboard",
903
+ "879": "umbrella",
904
+ "880": "unicycle, monocycle",
905
+ "881": "upright, upright piano",
906
+ "882": "vacuum, vacuum cleaner",
907
+ "883": "vase",
908
+ "884": "vault",
909
+ "885": "velvet",
910
+ "886": "vending machine",
911
+ "887": "vestment",
912
+ "888": "viaduct",
913
+ "889": "violin, fiddle",
914
+ "890": "volleyball",
915
+ "891": "waffle iron",
916
+ "892": "wall clock",
917
+ "893": "wallet, billfold, notecase, pocketbook",
918
+ "894": "wardrobe, closet, press",
919
+ "895": "warplane, military plane",
920
+ "896": "washbasin, handbasin, washbowl, lavabo, wash-hand basin",
921
+ "897": "washer, automatic washer, washing machine",
922
+ "898": "water bottle",
923
+ "899": "water jug",
924
+ "900": "water tower",
925
+ "901": "whiskey jug",
926
+ "902": "whistle",
927
+ "903": "wig",
928
+ "904": "window screen",
929
+ "905": "window shade",
930
+ "906": "Windsor tie",
931
+ "907": "wine bottle",
932
+ "908": "wing",
933
+ "909": "wok",
934
+ "910": "wooden spoon",
935
+ "911": "wool, woolen, woollen",
936
+ "912": "worm fence, snake fence, snake-rail fence, Virginia fence",
937
+ "913": "wreck",
938
+ "914": "yawl",
939
+ "915": "yurt",
940
+ "916": "web site, website, internet site, site",
941
+ "917": "comic book",
942
+ "918": "crossword puzzle, crossword",
943
+ "919": "street sign",
944
+ "920": "traffic light, traffic signal, stoplight",
945
+ "921": "book jacket, dust cover, dust jacket, dust wrapper",
946
+ "922": "menu",
947
+ "923": "plate",
948
+ "924": "guacamole",
949
+ "925": "consomme",
950
+ "926": "hot pot, hotpot",
951
+ "927": "trifle",
952
+ "928": "ice cream, icecream",
953
+ "929": "ice lolly, lolly, lollipop, popsicle",
954
+ "930": "French loaf",
955
+ "931": "bagel, beigel",
956
+ "932": "pretzel",
957
+ "933": "cheeseburger",
958
+ "934": "hotdog, hot dog, red hot",
959
+ "935": "mashed potato",
960
+ "936": "head cabbage",
961
+ "937": "broccoli",
962
+ "938": "cauliflower",
963
+ "939": "zucchini, courgette",
964
+ "940": "spaghetti squash",
965
+ "941": "acorn squash",
966
+ "942": "butternut squash",
967
+ "943": "cucumber, cuke",
968
+ "944": "artichoke, globe artichoke",
969
+ "945": "bell pepper",
970
+ "946": "cardoon",
971
+ "947": "mushroom",
972
+ "948": "Granny Smith",
973
+ "949": "strawberry",
974
+ "950": "orange",
975
+ "951": "lemon",
976
+ "952": "fig",
977
+ "953": "pineapple, ananas",
978
+ "954": "banana",
979
+ "955": "jackfruit, jak, jack",
980
+ "956": "custard apple",
981
+ "957": "pomegranate",
982
+ "958": "hay",
983
+ "959": "carbonara",
984
+ "960": "chocolate sauce, chocolate syrup",
985
+ "961": "dough",
986
+ "962": "meat loaf, meatloaf",
987
+ "963": "pizza, pizza pie",
988
+ "964": "potpie",
989
+ "965": "burrito",
990
+ "966": "red wine",
991
+ "967": "espresso",
992
+ "968": "cup",
993
+ "969": "eggnog",
994
+ "970": "alp",
995
+ "971": "bubble",
996
+ "972": "cliff, drop, drop-off",
997
+ "973": "coral reef",
998
+ "974": "geyser",
999
+ "975": "lakeside, lakeshore",
1000
+ "976": "promontory, headland, head, foreland",
1001
+ "977": "sandbar, sand bar",
1002
+ "978": "seashore, coast, seacoast, sea-coast",
1003
+ "979": "valley, vale",
1004
+ "980": "volcano",
1005
+ "981": "ballplayer, baseball player",
1006
+ "982": "groom, bridegroom",
1007
+ "983": "scuba diver",
1008
+ "984": "rapeseed",
1009
+ "985": "daisy",
1010
+ "986": "yellow lady's slipper, yellow lady-slipper, Cypripedium calceolus, Cypripedium parviflorum",
1011
+ "987": "corn",
1012
+ "988": "acorn",
1013
+ "989": "hip, rose hip, rosehip",
1014
+ "990": "buckeye, horse chestnut, conker",
1015
+ "991": "coral fungus",
1016
+ "992": "agaric",
1017
+ "993": "gyromitra",
1018
+ "994": "stinkhorn, carrion fungus",
1019
+ "995": "earthstar",
1020
+ "996": "hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa",
1021
+ "997": "bolete",
1022
+ "998": "ear, spike, capitulum",
1023
+ "999": "toilet tissue, toilet paper, bathroom tissue"
1024
+ },
1025
+ "label2id": {
1026
+ "Afghan hound, Afghan": 160,
1027
+ "African chameleon, Chamaeleo chamaeleon": 47,
1028
+ "African crocodile, Nile crocodile, Crocodylus niloticus": 49,
1029
+ "African elephant, Loxodonta africana": 386,
1030
+ "African grey, African gray, Psittacus erithacus": 87,
1031
+ "African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus": 275,
1032
+ "Airedale, Airedale terrier": 191,
1033
+ "American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier": 180,
1034
+ "American alligator, Alligator mississipiensis": 50,
1035
+ "American black bear, black bear, Ursus americanus, Euarctos americanus": 295,
1036
+ "American chameleon, anole, Anolis carolinensis": 40,
1037
+ "American coot, marsh hen, mud hen, water hen, Fulica americana": 137,
1038
+ "American egret, great white heron, Egretta albus": 132,
1039
+ "American lobster, Northern lobster, Maine lobster, Homarus americanus": 122,
1040
+ "Angora, Angora rabbit": 332,
1041
+ "Appenzeller": 240,
1042
+ "Arabian camel, dromedary, Camelus dromedarius": 354,
1043
+ "Arctic fox, white fox, Alopex lagopus": 279,
1044
+ "Australian terrier": 193,
1045
+ "Band Aid": 419,
1046
+ "Bedlington terrier": 181,
1047
+ "Bernese mountain dog": 239,
1048
+ "Blenheim spaniel": 156,
1049
+ "Border collie": 232,
1050
+ "Border terrier": 182,
1051
+ "Boston bull, Boston terrier": 195,
1052
+ "Bouvier des Flandres, Bouviers des Flandres": 233,
1053
+ "Brabancon griffon": 262,
1054
+ "Brittany spaniel": 215,
1055
+ "CD player": 485,
1056
+ "Cardigan, Cardigan Welsh corgi": 264,
1057
+ "Chesapeake Bay retriever": 209,
1058
+ "Chihuahua": 151,
1059
+ "Christmas stocking": 496,
1060
+ "Crock Pot": 521,
1061
+ "Dandie Dinmont, Dandie Dinmont terrier": 194,
1062
+ "Doberman, Doberman pinscher": 236,
1063
+ "Dungeness crab, Cancer magister": 118,
1064
+ "Dutch oven": 544,
1065
+ "Egyptian cat": 285,
1066
+ "English foxhound": 167,
1067
+ "English setter": 212,
1068
+ "English springer, English springer spaniel": 217,
1069
+ "EntleBucher": 241,
1070
+ "Eskimo dog, husky": 248,
1071
+ "European fire salamander, Salamandra salamandra": 25,
1072
+ "European gallinule, Porphyrio porphyrio": 136,
1073
+ "French bulldog": 245,
1074
+ "French horn, horn": 566,
1075
+ "French loaf": 930,
1076
+ "German shepherd, German shepherd dog, German police dog, alsatian": 235,
1077
+ "German short-haired pointer": 210,
1078
+ "Gila monster, Heloderma suspectum": 45,
1079
+ "Gordon setter": 214,
1080
+ "Granny Smith": 948,
1081
+ "Great Dane": 246,
1082
+ "Great Pyrenees": 257,
1083
+ "Greater Swiss Mountain dog": 238,
1084
+ "Ibizan hound, Ibizan Podenco": 173,
1085
+ "Indian cobra, Naja naja": 63,
1086
+ "Indian elephant, Elephas maximus": 385,
1087
+ "Irish setter, red setter": 213,
1088
+ "Irish terrier": 184,
1089
+ "Irish water spaniel": 221,
1090
+ "Irish wolfhound": 170,
1091
+ "Italian greyhound": 171,
1092
+ "Japanese spaniel": 152,
1093
+ "Kerry blue terrier": 183,
1094
+ "Komodo dragon, Komodo lizard, dragon lizard, giant lizard, Varanus komodoensis": 48,
1095
+ "Labrador retriever": 208,
1096
+ "Lakeland terrier": 189,
1097
+ "Leonberg": 255,
1098
+ "Lhasa, Lhasa apso": 204,
1099
+ "Loafer": 630,
1100
+ "Madagascar cat, ring-tailed lemur, Lemur catta": 383,
1101
+ "Maltese dog, Maltese terrier, Maltese": 153,
1102
+ "Mexican hairless": 268,
1103
+ "Model T": 661,
1104
+ "Newfoundland, Newfoundland dog": 256,
1105
+ "Norfolk terrier": 185,
1106
+ "Norwegian elkhound, elkhound": 174,
1107
+ "Norwich terrier": 186,
1108
+ "Old English sheepdog, bobtail": 229,
1109
+ "Pekinese, Pekingese, Peke": 154,
1110
+ "Pembroke, Pembroke Welsh corgi": 263,
1111
+ "Persian cat": 283,
1112
+ "Petri dish": 712,
1113
+ "Polaroid camera, Polaroid Land camera": 732,
1114
+ "Pomeranian": 259,
1115
+ "Rhodesian ridgeback": 159,
1116
+ "Rottweiler": 234,
1117
+ "Saint Bernard, St Bernard": 247,
1118
+ "Saluki, gazelle hound": 176,
1119
+ "Samoyed, Samoyede": 258,
1120
+ "Scotch terrier, Scottish terrier, Scottie": 199,
1121
+ "Scottish deerhound, deerhound": 177,
1122
+ "Sealyham terrier, Sealyham": 190,
1123
+ "Shetland sheepdog, Shetland sheep dog, Shetland": 230,
1124
+ "Shih-Tzu": 155,
1125
+ "Siamese cat, Siamese": 284,
1126
+ "Siberian husky": 250,
1127
+ "Staffordshire bullterrier, Staffordshire bull terrier": 179,
1128
+ "Sussex spaniel": 220,
1129
+ "Tibetan mastiff": 244,
1130
+ "Tibetan terrier, chrysanthemum dog": 200,
1131
+ "Walker hound, Walker foxhound": 166,
1132
+ "Weimaraner": 178,
1133
+ "Welsh springer spaniel": 218,
1134
+ "West Highland white terrier": 203,
1135
+ "Windsor tie": 906,
1136
+ "Yorkshire terrier": 187,
1137
+ "abacus": 398,
1138
+ "abaya": 399,
1139
+ "academic gown, academic robe, judge's robe": 400,
1140
+ "accordion, piano accordion, squeeze box": 401,
1141
+ "acorn": 988,
1142
+ "acorn squash": 941,
1143
+ "acoustic guitar": 402,
1144
+ "admiral": 321,
1145
+ "affenpinscher, monkey pinscher, monkey dog": 252,
1146
+ "agama": 42,
1147
+ "agaric": 992,
1148
+ "aircraft carrier, carrier, flattop, attack aircraft carrier": 403,
1149
+ "airliner": 404,
1150
+ "airship, dirigible": 405,
1151
+ "albatross, mollymawk": 146,
1152
+ "alligator lizard": 44,
1153
+ "alp": 970,
1154
+ "altar": 406,
1155
+ "ambulance": 407,
1156
+ "amphibian, amphibious vehicle": 408,
1157
+ "analog clock": 409,
1158
+ "anemone fish": 393,
1159
+ "ant, emmet, pismire": 310,
1160
+ "apiary, bee house": 410,
1161
+ "apron": 411,
1162
+ "armadillo": 363,
1163
+ "artichoke, globe artichoke": 944,
1164
+ "ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin": 412,
1165
+ "assault rifle, assault gun": 413,
1166
+ "axolotl, mud puppy, Ambystoma mexicanum": 29,
1167
+ "baboon": 372,
1168
+ "backpack, back pack, knapsack, packsack, rucksack, haversack": 414,
1169
+ "badger": 362,
1170
+ "bagel, beigel": 931,
1171
+ "bakery, bakeshop, bakehouse": 415,
1172
+ "balance beam, beam": 416,
1173
+ "bald eagle, American eagle, Haliaeetus leucocephalus": 22,
1174
+ "balloon": 417,
1175
+ "ballplayer, baseball player": 981,
1176
+ "ballpoint, ballpoint pen, ballpen, Biro": 418,
1177
+ "banana": 954,
1178
+ "banded gecko": 38,
1179
+ "banjo": 420,
1180
+ "bannister, banister, balustrade, balusters, handrail": 421,
1181
+ "barbell": 422,
1182
+ "barber chair": 423,
1183
+ "barbershop": 424,
1184
+ "barn": 425,
1185
+ "barn spider, Araneus cavaticus": 73,
1186
+ "barometer": 426,
1187
+ "barracouta, snoek": 389,
1188
+ "barrel, cask": 427,
1189
+ "barrow, garden cart, lawn cart, wheelbarrow": 428,
1190
+ "baseball": 429,
1191
+ "basenji": 253,
1192
+ "basketball": 430,
1193
+ "basset, basset hound": 161,
1194
+ "bassinet": 431,
1195
+ "bassoon": 432,
1196
+ "bath towel": 434,
1197
+ "bathing cap, swimming cap": 433,
1198
+ "bathtub, bathing tub, bath, tub": 435,
1199
+ "beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon": 436,
1200
+ "beacon, lighthouse, beacon light, pharos": 437,
1201
+ "beagle": 162,
1202
+ "beaker": 438,
1203
+ "bearskin, busby, shako": 439,
1204
+ "beaver": 337,
1205
+ "bee": 309,
1206
+ "bee eater": 92,
1207
+ "beer bottle": 440,
1208
+ "beer glass": 441,
1209
+ "bell cote, bell cot": 442,
1210
+ "bell pepper": 945,
1211
+ "bib": 443,
1212
+ "bicycle-built-for-two, tandem bicycle, tandem": 444,
1213
+ "bighorn, bighorn sheep, cimarron, Rocky Mountain bighorn, Rocky Mountain sheep, Ovis canadensis": 349,
1214
+ "bikini, two-piece": 445,
1215
+ "binder, ring-binder": 446,
1216
+ "binoculars, field glasses, opera glasses": 447,
1217
+ "birdhouse": 448,
1218
+ "bison": 347,
1219
+ "bittern": 133,
1220
+ "black and gold garden spider, Argiope aurantia": 72,
1221
+ "black grouse": 80,
1222
+ "black stork, Ciconia nigra": 128,
1223
+ "black swan, Cygnus atratus": 100,
1224
+ "black widow, Latrodectus mactans": 75,
1225
+ "black-and-tan coonhound": 165,
1226
+ "black-footed ferret, ferret, Mustela nigripes": 359,
1227
+ "bloodhound, sleuthhound": 163,
1228
+ "bluetick": 164,
1229
+ "boa constrictor, Constrictor constrictor": 61,
1230
+ "boathouse": 449,
1231
+ "bobsled, bobsleigh, bob": 450,
1232
+ "bolete": 997,
1233
+ "bolo tie, bolo, bola tie, bola": 451,
1234
+ "bonnet, poke bonnet": 452,
1235
+ "book jacket, dust cover, dust jacket, dust wrapper": 921,
1236
+ "bookcase": 453,
1237
+ "bookshop, bookstore, bookstall": 454,
1238
+ "borzoi, Russian wolfhound": 169,
1239
+ "bottlecap": 455,
1240
+ "bow": 456,
1241
+ "bow tie, bow-tie, bowtie": 457,
1242
+ "box turtle, box tortoise": 37,
1243
+ "boxer": 242,
1244
+ "brain coral": 109,
1245
+ "brambling, Fringilla montifringilla": 10,
1246
+ "brass, memorial tablet, plaque": 458,
1247
+ "brassiere, bra, bandeau": 459,
1248
+ "breakwater, groin, groyne, mole, bulwark, seawall, jetty": 460,
1249
+ "breastplate, aegis, egis": 461,
1250
+ "briard": 226,
1251
+ "broccoli": 937,
1252
+ "broom": 462,
1253
+ "brown bear, bruin, Ursus arctos": 294,
1254
+ "bubble": 971,
1255
+ "bucket, pail": 463,
1256
+ "buckeye, horse chestnut, conker": 990,
1257
+ "buckle": 464,
1258
+ "bulbul": 16,
1259
+ "bull mastiff": 243,
1260
+ "bullet train, bullet": 466,
1261
+ "bulletproof vest": 465,
1262
+ "bullfrog, Rana catesbeiana": 30,
1263
+ "burrito": 965,
1264
+ "bustard": 138,
1265
+ "butcher shop, meat market": 467,
1266
+ "butternut squash": 942,
1267
+ "cab, hack, taxi, taxicab": 468,
1268
+ "cabbage butterfly": 324,
1269
+ "cairn, cairn terrier": 192,
1270
+ "caldron, cauldron": 469,
1271
+ "can opener, tin opener": 473,
1272
+ "candle, taper, wax light": 470,
1273
+ "cannon": 471,
1274
+ "canoe": 472,
1275
+ "capuchin, ringtail, Cebus capucinus": 378,
1276
+ "car mirror": 475,
1277
+ "car wheel": 479,
1278
+ "carbonara": 959,
1279
+ "cardigan": 474,
1280
+ "cardoon": 946,
1281
+ "carousel, carrousel, merry-go-round, roundabout, whirligig": 476,
1282
+ "carpenter's kit, tool kit": 477,
1283
+ "carton": 478,
1284
+ "cash machine, cash dispenser, automated teller machine, automatic teller machine, automated teller, automatic teller, ATM": 480,
1285
+ "cassette": 481,
1286
+ "cassette player": 482,
1287
+ "castle": 483,
1288
+ "catamaran": 484,
1289
+ "cauliflower": 938,
1290
+ "cello, violoncello": 486,
1291
+ "cellular telephone, cellular phone, cellphone, cell, mobile phone": 487,
1292
+ "centipede": 79,
1293
+ "chain": 488,
1294
+ "chain mail, ring mail, mail, chain armor, chain armour, ring armor, ring armour": 490,
1295
+ "chain saw, chainsaw": 491,
1296
+ "chainlink fence": 489,
1297
+ "chambered nautilus, pearly nautilus, nautilus": 117,
1298
+ "cheeseburger": 933,
1299
+ "cheetah, chetah, Acinonyx jubatus": 293,
1300
+ "chest": 492,
1301
+ "chickadee": 19,
1302
+ "chiffonier, commode": 493,
1303
+ "chime, bell, gong": 494,
1304
+ "chimpanzee, chimp, Pan troglodytes": 367,
1305
+ "china cabinet, china closet": 495,
1306
+ "chiton, coat-of-mail shell, sea cradle, polyplacophore": 116,
1307
+ "chocolate sauce, chocolate syrup": 960,
1308
+ "chow, chow chow": 260,
1309
+ "church, church building": 497,
1310
+ "cicada, cicala": 316,
1311
+ "cinema, movie theater, movie theatre, movie house, picture palace": 498,
1312
+ "cleaver, meat cleaver, chopper": 499,
1313
+ "cliff dwelling": 500,
1314
+ "cliff, drop, drop-off": 972,
1315
+ "cloak": 501,
1316
+ "clog, geta, patten, sabot": 502,
1317
+ "clumber, clumber spaniel": 216,
1318
+ "cock": 7,
1319
+ "cocker spaniel, English cocker spaniel, cocker": 219,
1320
+ "cockroach, roach": 314,
1321
+ "cocktail shaker": 503,
1322
+ "coffee mug": 504,
1323
+ "coffeepot": 505,
1324
+ "coho, cohoe, coho salmon, blue jack, silver salmon, Oncorhynchus kisutch": 391,
1325
+ "coil, spiral, volute, whorl, helix": 506,
1326
+ "collie": 231,
1327
+ "colobus, colobus monkey": 375,
1328
+ "combination lock": 507,
1329
+ "comic book": 917,
1330
+ "common iguana, iguana, Iguana iguana": 39,
1331
+ "common newt, Triturus vulgaris": 26,
1332
+ "computer keyboard, keypad": 508,
1333
+ "conch": 112,
1334
+ "confectionery, confectionary, candy store": 509,
1335
+ "consomme": 925,
1336
+ "container ship, containership, container vessel": 510,
1337
+ "convertible": 511,
1338
+ "coral fungus": 991,
1339
+ "coral reef": 973,
1340
+ "corkscrew, bottle screw": 512,
1341
+ "corn": 987,
1342
+ "cornet, horn, trumpet, trump": 513,
1343
+ "coucal": 91,
1344
+ "cougar, puma, catamount, mountain lion, painter, panther, Felis concolor": 286,
1345
+ "cowboy boot": 514,
1346
+ "cowboy hat, ten-gallon hat": 515,
1347
+ "coyote, prairie wolf, brush wolf, Canis latrans": 272,
1348
+ "cradle": 516,
1349
+ "crane": 517,
1350
+ "crash helmet": 518,
1351
+ "crate": 519,
1352
+ "crayfish, crawfish, crawdad, crawdaddy": 124,
1353
+ "crib, cot": 520,
1354
+ "cricket": 312,
1355
+ "croquet ball": 522,
1356
+ "crossword puzzle, crossword": 918,
1357
+ "crutch": 523,
1358
+ "cucumber, cuke": 943,
1359
+ "cuirass": 524,
1360
+ "cup": 968,
1361
+ "curly-coated retriever": 206,
1362
+ "custard apple": 956,
1363
+ "daisy": 985,
1364
+ "dalmatian, coach dog, carriage dog": 251,
1365
+ "dam, dike, dyke": 525,
1366
+ "damselfly": 320,
1367
+ "desk": 526,
1368
+ "desktop computer": 527,
1369
+ "dhole, Cuon alpinus": 274,
1370
+ "dial telephone, dial phone": 528,
1371
+ "diamondback, diamondback rattlesnake, Crotalus adamanteus": 67,
1372
+ "diaper, nappy, napkin": 529,
1373
+ "digital clock": 530,
1374
+ "digital watch": 531,
1375
+ "dingo, warrigal, warragal, Canis dingo": 273,
1376
+ "dining table, board": 532,
1377
+ "dishrag, dishcloth": 533,
1378
+ "dishwasher, dish washer, dishwashing machine": 534,
1379
+ "disk brake, disc brake": 535,
1380
+ "dock, dockage, docking facility": 536,
1381
+ "dogsled, dog sled, dog sleigh": 537,
1382
+ "dome": 538,
1383
+ "doormat, welcome mat": 539,
1384
+ "dough": 961,
1385
+ "dowitcher": 142,
1386
+ "dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk": 319,
1387
+ "drake": 97,
1388
+ "drilling platform, offshore rig": 540,
1389
+ "drum, membranophone, tympan": 541,
1390
+ "drumstick": 542,
1391
+ "dugong, Dugong dugon": 149,
1392
+ "dumbbell": 543,
1393
+ "dung beetle": 305,
1394
+ "ear, spike, capitulum": 998,
1395
+ "earthstar": 995,
1396
+ "echidna, spiny anteater, anteater": 102,
1397
+ "eel": 390,
1398
+ "eft": 27,
1399
+ "eggnog": 969,
1400
+ "electric fan, blower": 545,
1401
+ "electric guitar": 546,
1402
+ "electric locomotive": 547,
1403
+ "electric ray, crampfish, numbfish, torpedo": 5,
1404
+ "entertainment center": 548,
1405
+ "envelope": 549,
1406
+ "espresso": 967,
1407
+ "espresso maker": 550,
1408
+ "face powder": 551,
1409
+ "feather boa, boa": 552,
1410
+ "fiddler crab": 120,
1411
+ "fig": 952,
1412
+ "file, file cabinet, filing cabinet": 553,
1413
+ "fire engine, fire truck": 555,
1414
+ "fire screen, fireguard": 556,
1415
+ "fireboat": 554,
1416
+ "flagpole, flagstaff": 557,
1417
+ "flamingo": 130,
1418
+ "flat-coated retriever": 205,
1419
+ "flatworm, platyhelminth": 110,
1420
+ "flute, transverse flute": 558,
1421
+ "fly": 308,
1422
+ "folding chair": 559,
1423
+ "football helmet": 560,
1424
+ "forklift": 561,
1425
+ "fountain": 562,
1426
+ "fountain pen": 563,
1427
+ "four-poster": 564,
1428
+ "fox squirrel, eastern fox squirrel, Sciurus niger": 335,
1429
+ "freight car": 565,
1430
+ "frilled lizard, Chlamydosaurus kingi": 43,
1431
+ "frying pan, frypan, skillet": 567,
1432
+ "fur coat": 568,
1433
+ "gar, garfish, garpike, billfish, Lepisosteus osseus": 395,
1434
+ "garbage truck, dustcart": 569,
1435
+ "garden spider, Aranea diademata": 74,
1436
+ "garter snake, grass snake": 57,
1437
+ "gas pump, gasoline pump, petrol pump, island dispenser": 571,
1438
+ "gasmask, respirator, gas helmet": 570,
1439
+ "gazelle": 353,
1440
+ "geyser": 974,
1441
+ "giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca": 388,
1442
+ "giant schnauzer": 197,
1443
+ "gibbon, Hylobates lar": 368,
1444
+ "go-kart": 573,
1445
+ "goblet": 572,
1446
+ "golden retriever": 207,
1447
+ "goldfinch, Carduelis carduelis": 11,
1448
+ "goldfish, Carassius auratus": 1,
1449
+ "golf ball": 574,
1450
+ "golfcart, golf cart": 575,
1451
+ "gondola": 576,
1452
+ "gong, tam-tam": 577,
1453
+ "goose": 99,
1454
+ "gorilla, Gorilla gorilla": 366,
1455
+ "gown": 578,
1456
+ "grand piano, grand": 579,
1457
+ "grasshopper, hopper": 311,
1458
+ "great grey owl, great gray owl, Strix nebulosa": 24,
1459
+ "great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias": 2,
1460
+ "green lizard, Lacerta viridis": 46,
1461
+ "green mamba": 64,
1462
+ "green snake, grass snake": 55,
1463
+ "greenhouse, nursery, glasshouse": 580,
1464
+ "grey fox, gray fox, Urocyon cinereoargenteus": 280,
1465
+ "grey whale, gray whale, devilfish, Eschrichtius gibbosus, Eschrichtius robustus": 147,
1466
+ "grille, radiator grille": 581,
1467
+ "grocery store, grocery, food market, market": 582,
1468
+ "groenendael": 224,
1469
+ "groom, bridegroom": 982,
1470
+ "ground beetle, carabid beetle": 302,
1471
+ "guacamole": 924,
1472
+ "guenon, guenon monkey": 370,
1473
+ "guillotine": 583,
1474
+ "guinea pig, Cavia cobaya": 338,
1475
+ "gyromitra": 993,
1476
+ "hair slide": 584,
1477
+ "hair spray": 585,
1478
+ "half track": 586,
1479
+ "hammer": 587,
1480
+ "hammerhead, hammerhead shark": 4,
1481
+ "hamper": 588,
1482
+ "hamster": 333,
1483
+ "hand blower, blow dryer, blow drier, hair dryer, hair drier": 589,
1484
+ "hand-held computer, hand-held microcomputer": 590,
1485
+ "handkerchief, hankie, hanky, hankey": 591,
1486
+ "hard disc, hard disk, fixed disk": 592,
1487
+ "hare": 331,
1488
+ "harmonica, mouth organ, harp, mouth harp": 593,
1489
+ "harp": 594,
1490
+ "hartebeest": 351,
1491
+ "harvester, reaper": 595,
1492
+ "harvestman, daddy longlegs, Phalangium opilio": 70,
1493
+ "hatchet": 596,
1494
+ "hay": 958,
1495
+ "head cabbage": 936,
1496
+ "hen": 8,
1497
+ "hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa": 996,
1498
+ "hermit crab": 125,
1499
+ "hip, rose hip, rosehip": 989,
1500
+ "hippopotamus, hippo, river horse, Hippopotamus amphibius": 344,
1501
+ "hog, pig, grunter, squealer, Sus scrofa": 341,
1502
+ "hognose snake, puff adder, sand viper": 54,
1503
+ "holster": 597,
1504
+ "home theater, home theatre": 598,
1505
+ "honeycomb": 599,
1506
+ "hook, claw": 600,
1507
+ "hoopskirt, crinoline": 601,
1508
+ "horizontal bar, high bar": 602,
1509
+ "hornbill": 93,
1510
+ "horned viper, cerastes, sand viper, horned asp, Cerastes cornutus": 66,
1511
+ "horse cart, horse-cart": 603,
1512
+ "hot pot, hotpot": 926,
1513
+ "hotdog, hot dog, red hot": 934,
1514
+ "hourglass": 604,
1515
+ "house finch, linnet, Carpodacus mexicanus": 12,
1516
+ "howler monkey, howler": 379,
1517
+ "hummingbird": 94,
1518
+ "hyena, hyaena": 276,
1519
+ "iPod": 605,
1520
+ "ibex, Capra ibex": 350,
1521
+ "ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus": 296,
1522
+ "ice cream, icecream": 928,
1523
+ "ice lolly, lolly, lollipop, popsicle": 929,
1524
+ "impala, Aepyceros melampus": 352,
1525
+ "indigo bunting, indigo finch, indigo bird, Passerina cyanea": 14,
1526
+ "indri, indris, Indri indri, Indri brevicaudatus": 384,
1527
+ "iron, smoothing iron": 606,
1528
+ "isopod": 126,
1529
+ "jacamar": 95,
1530
+ "jack-o'-lantern": 607,
1531
+ "jackfruit, jak, jack": 955,
1532
+ "jaguar, panther, Panthera onca, Felis onca": 290,
1533
+ "jay": 17,
1534
+ "jean, blue jean, denim": 608,
1535
+ "jeep, landrover": 609,
1536
+ "jellyfish": 107,
1537
+ "jersey, T-shirt, tee shirt": 610,
1538
+ "jigsaw puzzle": 611,
1539
+ "jinrikisha, ricksha, rickshaw": 612,
1540
+ "joystick": 613,
1541
+ "junco, snowbird": 13,
1542
+ "keeshond": 261,
1543
+ "kelpie": 227,
1544
+ "killer whale, killer, orca, grampus, sea wolf, Orcinus orca": 148,
1545
+ "kimono": 614,
1546
+ "king crab, Alaska crab, Alaskan king crab, Alaska king crab, Paralithodes camtschatica": 121,
1547
+ "king penguin, Aptenodytes patagonica": 145,
1548
+ "king snake, kingsnake": 56,
1549
+ "kit fox, Vulpes macrotis": 278,
1550
+ "kite": 21,
1551
+ "knee pad": 615,
1552
+ "knot": 616,
1553
+ "koala, koala bear, kangaroo bear, native bear, Phascolarctos cinereus": 105,
1554
+ "komondor": 228,
1555
+ "kuvasz": 222,
1556
+ "lab coat, laboratory coat": 617,
1557
+ "lacewing, lacewing fly": 318,
1558
+ "ladle": 618,
1559
+ "ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle": 301,
1560
+ "lakeside, lakeshore": 975,
1561
+ "lampshade, lamp shade": 619,
1562
+ "langur": 374,
1563
+ "laptop, laptop computer": 620,
1564
+ "lawn mower, mower": 621,
1565
+ "leaf beetle, chrysomelid": 304,
1566
+ "leafhopper": 317,
1567
+ "leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea": 34,
1568
+ "lemon": 951,
1569
+ "lens cap, lens cover": 622,
1570
+ "leopard, Panthera pardus": 288,
1571
+ "lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens": 387,
1572
+ "letter opener, paper knife, paperknife": 623,
1573
+ "library": 624,
1574
+ "lifeboat": 625,
1575
+ "lighter, light, igniter, ignitor": 626,
1576
+ "limousine, limo": 627,
1577
+ "limpkin, Aramus pictus": 135,
1578
+ "liner, ocean liner": 628,
1579
+ "lion, king of beasts, Panthera leo": 291,
1580
+ "lionfish": 396,
1581
+ "lipstick, lip rouge": 629,
1582
+ "little blue heron, Egretta caerulea": 131,
1583
+ "llama": 355,
1584
+ "loggerhead, loggerhead turtle, Caretta caretta": 33,
1585
+ "long-horned beetle, longicorn, longicorn beetle": 303,
1586
+ "lorikeet": 90,
1587
+ "lotion": 631,
1588
+ "loudspeaker, speaker, speaker unit, loudspeaker system, speaker system": 632,
1589
+ "loupe, jeweler's loupe": 633,
1590
+ "lumbermill, sawmill": 634,
1591
+ "lycaenid, lycaenid butterfly": 326,
1592
+ "lynx, catamount": 287,
1593
+ "macaque": 373,
1594
+ "macaw": 88,
1595
+ "magnetic compass": 635,
1596
+ "magpie": 18,
1597
+ "mailbag, postbag": 636,
1598
+ "mailbox, letter box": 637,
1599
+ "maillot": 638,
1600
+ "maillot, tank suit": 639,
1601
+ "malamute, malemute, Alaskan malamute": 249,
1602
+ "malinois": 225,
1603
+ "manhole cover": 640,
1604
+ "mantis, mantid": 315,
1605
+ "maraca": 641,
1606
+ "marimba, xylophone": 642,
1607
+ "marmoset": 377,
1608
+ "marmot": 336,
1609
+ "mashed potato": 935,
1610
+ "mask": 643,
1611
+ "matchstick": 644,
1612
+ "maypole": 645,
1613
+ "maze, labyrinth": 646,
1614
+ "measuring cup": 647,
1615
+ "meat loaf, meatloaf": 962,
1616
+ "medicine chest, medicine cabinet": 648,
1617
+ "meerkat, mierkat": 299,
1618
+ "megalith, megalithic structure": 649,
1619
+ "menu": 922,
1620
+ "microphone, mike": 650,
1621
+ "microwave, microwave oven": 651,
1622
+ "military uniform": 652,
1623
+ "milk can": 653,
1624
+ "miniature pinscher": 237,
1625
+ "miniature poodle": 266,
1626
+ "miniature schnauzer": 196,
1627
+ "minibus": 654,
1628
+ "miniskirt, mini": 655,
1629
+ "minivan": 656,
1630
+ "mink": 357,
1631
+ "missile": 657,
1632
+ "mitten": 658,
1633
+ "mixing bowl": 659,
1634
+ "mobile home, manufactured home": 660,
1635
+ "modem": 662,
1636
+ "monarch, monarch butterfly, milkweed butterfly, Danaus plexippus": 323,
1637
+ "monastery": 663,
1638
+ "mongoose": 298,
1639
+ "monitor": 664,
1640
+ "moped": 665,
1641
+ "mortar": 666,
1642
+ "mortarboard": 667,
1643
+ "mosque": 668,
1644
+ "mosquito net": 669,
1645
+ "motor scooter, scooter": 670,
1646
+ "mountain bike, all-terrain bike, off-roader": 671,
1647
+ "mountain tent": 672,
1648
+ "mouse, computer mouse": 673,
1649
+ "mousetrap": 674,
1650
+ "moving van": 675,
1651
+ "mud turtle": 35,
1652
+ "mushroom": 947,
1653
+ "muzzle": 676,
1654
+ "nail": 677,
1655
+ "neck brace": 678,
1656
+ "necklace": 679,
1657
+ "nematode, nematode worm, roundworm": 111,
1658
+ "night snake, Hypsiglena torquata": 60,
1659
+ "nipple": 680,
1660
+ "notebook, notebook computer": 681,
1661
+ "obelisk": 682,
1662
+ "oboe, hautboy, hautbois": 683,
1663
+ "ocarina, sweet potato": 684,
1664
+ "odometer, hodometer, mileometer, milometer": 685,
1665
+ "oil filter": 686,
1666
+ "orange": 950,
1667
+ "orangutan, orang, orangutang, Pongo pygmaeus": 365,
1668
+ "organ, pipe organ": 687,
1669
+ "oscilloscope, scope, cathode-ray oscilloscope, CRO": 688,
1670
+ "ostrich, Struthio camelus": 9,
1671
+ "otter": 360,
1672
+ "otterhound, otter hound": 175,
1673
+ "overskirt": 689,
1674
+ "ox": 345,
1675
+ "oxcart": 690,
1676
+ "oxygen mask": 691,
1677
+ "oystercatcher, oyster catcher": 143,
1678
+ "packet": 692,
1679
+ "paddle, boat paddle": 693,
1680
+ "paddlewheel, paddle wheel": 694,
1681
+ "padlock": 695,
1682
+ "paintbrush": 696,
1683
+ "pajama, pyjama, pj's, jammies": 697,
1684
+ "palace": 698,
1685
+ "panpipe, pandean pipe, syrinx": 699,
1686
+ "paper towel": 700,
1687
+ "papillon": 157,
1688
+ "parachute, chute": 701,
1689
+ "parallel bars, bars": 702,
1690
+ "park bench": 703,
1691
+ "parking meter": 704,
1692
+ "partridge": 86,
1693
+ "passenger car, coach, carriage": 705,
1694
+ "patas, hussar monkey, Erythrocebus patas": 371,
1695
+ "patio, terrace": 706,
1696
+ "pay-phone, pay-station": 707,
1697
+ "peacock": 84,
1698
+ "pedestal, plinth, footstall": 708,
1699
+ "pelican": 144,
1700
+ "pencil box, pencil case": 709,
1701
+ "pencil sharpener": 710,
1702
+ "perfume, essence": 711,
1703
+ "photocopier": 713,
1704
+ "pick, plectrum, plectron": 714,
1705
+ "pickelhaube": 715,
1706
+ "picket fence, paling": 716,
1707
+ "pickup, pickup truck": 717,
1708
+ "pier": 718,
1709
+ "piggy bank, penny bank": 719,
1710
+ "pill bottle": 720,
1711
+ "pillow": 721,
1712
+ "pineapple, ananas": 953,
1713
+ "ping-pong ball": 722,
1714
+ "pinwheel": 723,
1715
+ "pirate, pirate ship": 724,
1716
+ "pitcher, ewer": 725,
1717
+ "pizza, pizza pie": 963,
1718
+ "plane, carpenter's plane, woodworking plane": 726,
1719
+ "planetarium": 727,
1720
+ "plastic bag": 728,
1721
+ "plate": 923,
1722
+ "plate rack": 729,
1723
+ "platypus, duckbill, duckbilled platypus, duck-billed platypus, Ornithorhynchus anatinus": 103,
1724
+ "plow, plough": 730,
1725
+ "plunger, plumber's helper": 731,
1726
+ "pole": 733,
1727
+ "polecat, fitch, foulmart, foumart, Mustela putorius": 358,
1728
+ "police van, police wagon, paddy wagon, patrol wagon, wagon, black Maria": 734,
1729
+ "pomegranate": 957,
1730
+ "poncho": 735,
1731
+ "pool table, billiard table, snooker table": 736,
1732
+ "pop bottle, soda bottle": 737,
1733
+ "porcupine, hedgehog": 334,
1734
+ "pot, flowerpot": 738,
1735
+ "potpie": 964,
1736
+ "potter's wheel": 739,
1737
+ "power drill": 740,
1738
+ "prairie chicken, prairie grouse, prairie fowl": 83,
1739
+ "prayer rug, prayer mat": 741,
1740
+ "pretzel": 932,
1741
+ "printer": 742,
1742
+ "prison, prison house": 743,
1743
+ "proboscis monkey, Nasalis larvatus": 376,
1744
+ "projectile, missile": 744,
1745
+ "projector": 745,
1746
+ "promontory, headland, head, foreland": 976,
1747
+ "ptarmigan": 81,
1748
+ "puck, hockey puck": 746,
1749
+ "puffer, pufferfish, blowfish, globefish": 397,
1750
+ "pug, pug-dog": 254,
1751
+ "punching bag, punch bag, punching ball, punchball": 747,
1752
+ "purse": 748,
1753
+ "quail": 85,
1754
+ "quill, quill pen": 749,
1755
+ "quilt, comforter, comfort, puff": 750,
1756
+ "racer, race car, racing car": 751,
1757
+ "racket, racquet": 752,
1758
+ "radiator": 753,
1759
+ "radio telescope, radio reflector": 755,
1760
+ "radio, wireless": 754,
1761
+ "rain barrel": 756,
1762
+ "ram, tup": 348,
1763
+ "rapeseed": 984,
1764
+ "recreational vehicle, RV, R.V.": 757,
1765
+ "red fox, Vulpes vulpes": 277,
1766
+ "red wine": 966,
1767
+ "red wolf, maned wolf, Canis rufus, Canis niger": 271,
1768
+ "red-backed sandpiper, dunlin, Erolia alpina": 140,
1769
+ "red-breasted merganser, Mergus serrator": 98,
1770
+ "redbone": 168,
1771
+ "redshank, Tringa totanus": 141,
1772
+ "reel": 758,
1773
+ "reflex camera": 759,
1774
+ "refrigerator, icebox": 760,
1775
+ "remote control, remote": 761,
1776
+ "restaurant, eating house, eating place, eatery": 762,
1777
+ "revolver, six-gun, six-shooter": 763,
1778
+ "rhinoceros beetle": 306,
1779
+ "rifle": 764,
1780
+ "ringlet, ringlet butterfly": 322,
1781
+ "ringneck snake, ring-necked snake, ring snake": 53,
1782
+ "robin, American robin, Turdus migratorius": 15,
1783
+ "rock beauty, Holocanthus tricolor": 392,
1784
+ "rock crab, Cancer irroratus": 119,
1785
+ "rock python, rock snake, Python sebae": 62,
1786
+ "rocking chair, rocker": 765,
1787
+ "rotisserie": 766,
1788
+ "rubber eraser, rubber, pencil eraser": 767,
1789
+ "ruddy turnstone, Arenaria interpres": 139,
1790
+ "ruffed grouse, partridge, Bonasa umbellus": 82,
1791
+ "rugby ball": 768,
1792
+ "rule, ruler": 769,
1793
+ "running shoe": 770,
1794
+ "safe": 771,
1795
+ "safety pin": 772,
1796
+ "saltshaker, salt shaker": 773,
1797
+ "sandal": 774,
1798
+ "sandbar, sand bar": 977,
1799
+ "sarong": 775,
1800
+ "sax, saxophone": 776,
1801
+ "scabbard": 777,
1802
+ "scale, weighing machine": 778,
1803
+ "schipperke": 223,
1804
+ "school bus": 779,
1805
+ "schooner": 780,
1806
+ "scoreboard": 781,
1807
+ "scorpion": 71,
1808
+ "screen, CRT screen": 782,
1809
+ "screw": 783,
1810
+ "screwdriver": 784,
1811
+ "scuba diver": 983,
1812
+ "sea anemone, anemone": 108,
1813
+ "sea cucumber, holothurian": 329,
1814
+ "sea lion": 150,
1815
+ "sea slug, nudibranch": 115,
1816
+ "sea snake": 65,
1817
+ "sea urchin": 328,
1818
+ "seashore, coast, seacoast, sea-coast": 978,
1819
+ "seat belt, seatbelt": 785,
1820
+ "sewing machine": 786,
1821
+ "shield, buckler": 787,
1822
+ "shoe shop, shoe-shop, shoe store": 788,
1823
+ "shoji": 789,
1824
+ "shopping basket": 790,
1825
+ "shopping cart": 791,
1826
+ "shovel": 792,
1827
+ "shower cap": 793,
1828
+ "shower curtain": 794,
1829
+ "siamang, Hylobates syndactylus, Symphalangus syndactylus": 369,
1830
+ "sidewinder, horned rattlesnake, Crotalus cerastes": 68,
1831
+ "silky terrier, Sydney silky": 201,
1832
+ "ski": 795,
1833
+ "ski mask": 796,
1834
+ "skunk, polecat, wood pussy": 361,
1835
+ "sleeping bag": 797,
1836
+ "slide rule, slipstick": 798,
1837
+ "sliding door": 799,
1838
+ "slot, one-armed bandit": 800,
1839
+ "sloth bear, Melursus ursinus, Ursus ursinus": 297,
1840
+ "slug": 114,
1841
+ "snail": 113,
1842
+ "snorkel": 801,
1843
+ "snow leopard, ounce, Panthera uncia": 289,
1844
+ "snowmobile": 802,
1845
+ "snowplow, snowplough": 803,
1846
+ "soap dispenser": 804,
1847
+ "soccer ball": 805,
1848
+ "sock": 806,
1849
+ "soft-coated wheaten terrier": 202,
1850
+ "solar dish, solar collector, solar furnace": 807,
1851
+ "sombrero": 808,
1852
+ "sorrel": 339,
1853
+ "soup bowl": 809,
1854
+ "space bar": 810,
1855
+ "space heater": 811,
1856
+ "space shuttle": 812,
1857
+ "spaghetti squash": 940,
1858
+ "spatula": 813,
1859
+ "speedboat": 814,
1860
+ "spider monkey, Ateles geoffroyi": 381,
1861
+ "spider web, spider's web": 815,
1862
+ "spindle": 816,
1863
+ "spiny lobster, langouste, rock lobster, crawfish, crayfish, sea crawfish": 123,
1864
+ "spoonbill": 129,
1865
+ "sports car, sport car": 817,
1866
+ "spotlight, spot": 818,
1867
+ "spotted salamander, Ambystoma maculatum": 28,
1868
+ "squirrel monkey, Saimiri sciureus": 382,
1869
+ "stage": 819,
1870
+ "standard poodle": 267,
1871
+ "standard schnauzer": 198,
1872
+ "starfish, sea star": 327,
1873
+ "steam locomotive": 820,
1874
+ "steel arch bridge": 821,
1875
+ "steel drum": 822,
1876
+ "stethoscope": 823,
1877
+ "stingray": 6,
1878
+ "stinkhorn, carrion fungus": 994,
1879
+ "stole": 824,
1880
+ "stone wall": 825,
1881
+ "stopwatch, stop watch": 826,
1882
+ "stove": 827,
1883
+ "strainer": 828,
1884
+ "strawberry": 949,
1885
+ "street sign": 919,
1886
+ "streetcar, tram, tramcar, trolley, trolley car": 829,
1887
+ "stretcher": 830,
1888
+ "studio couch, day bed": 831,
1889
+ "stupa, tope": 832,
1890
+ "sturgeon": 394,
1891
+ "submarine, pigboat, sub, U-boat": 833,
1892
+ "suit, suit of clothes": 834,
1893
+ "sulphur butterfly, sulfur butterfly": 325,
1894
+ "sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita": 89,
1895
+ "sundial": 835,
1896
+ "sunglass": 836,
1897
+ "sunglasses, dark glasses, shades": 837,
1898
+ "sunscreen, sunblock, sun blocker": 838,
1899
+ "suspension bridge": 839,
1900
+ "swab, swob, mop": 840,
1901
+ "sweatshirt": 841,
1902
+ "swimming trunks, bathing trunks": 842,
1903
+ "swing": 843,
1904
+ "switch, electric switch, electrical switch": 844,
1905
+ "syringe": 845,
1906
+ "tabby, tabby cat": 281,
1907
+ "table lamp": 846,
1908
+ "tailed frog, bell toad, ribbed toad, tailed toad, Ascaphus trui": 32,
1909
+ "tank, army tank, armored combat vehicle, armoured combat vehicle": 847,
1910
+ "tape player": 848,
1911
+ "tarantula": 76,
1912
+ "teapot": 849,
1913
+ "teddy, teddy bear": 850,
1914
+ "television, television system": 851,
1915
+ "tench, Tinca tinca": 0,
1916
+ "tennis ball": 852,
1917
+ "terrapin": 36,
1918
+ "thatch, thatched roof": 853,
1919
+ "theater curtain, theatre curtain": 854,
1920
+ "thimble": 855,
1921
+ "three-toed sloth, ai, Bradypus tridactylus": 364,
1922
+ "thresher, thrasher, threshing machine": 856,
1923
+ "throne": 857,
1924
+ "thunder snake, worm snake, Carphophis amoenus": 52,
1925
+ "tick": 78,
1926
+ "tiger beetle": 300,
1927
+ "tiger cat": 282,
1928
+ "tiger shark, Galeocerdo cuvieri": 3,
1929
+ "tiger, Panthera tigris": 292,
1930
+ "tile roof": 858,
1931
+ "timber wolf, grey wolf, gray wolf, Canis lupus": 269,
1932
+ "titi, titi monkey": 380,
1933
+ "toaster": 859,
1934
+ "tobacco shop, tobacconist shop, tobacconist": 860,
1935
+ "toilet seat": 861,
1936
+ "toilet tissue, toilet paper, bathroom tissue": 999,
1937
+ "torch": 862,
1938
+ "totem pole": 863,
1939
+ "toucan": 96,
1940
+ "tow truck, tow car, wrecker": 864,
1941
+ "toy poodle": 265,
1942
+ "toy terrier": 158,
1943
+ "toyshop": 865,
1944
+ "tractor": 866,
1945
+ "traffic light, traffic signal, stoplight": 920,
1946
+ "trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi": 867,
1947
+ "tray": 868,
1948
+ "tree frog, tree-frog": 31,
1949
+ "trench coat": 869,
1950
+ "triceratops": 51,
1951
+ "tricycle, trike, velocipede": 870,
1952
+ "trifle": 927,
1953
+ "trilobite": 69,
1954
+ "trimaran": 871,
1955
+ "tripod": 872,
1956
+ "triumphal arch": 873,
1957
+ "trolleybus, trolley coach, trackless trolley": 874,
1958
+ "trombone": 875,
1959
+ "tub, vat": 876,
1960
+ "turnstile": 877,
1961
+ "tusker": 101,
1962
+ "typewriter keyboard": 878,
1963
+ "umbrella": 879,
1964
+ "unicycle, monocycle": 880,
1965
+ "upright, upright piano": 881,
1966
+ "vacuum, vacuum cleaner": 882,
1967
+ "valley, vale": 979,
1968
+ "vase": 883,
1969
+ "vault": 884,
1970
+ "velvet": 885,
1971
+ "vending machine": 886,
1972
+ "vestment": 887,
1973
+ "viaduct": 888,
1974
+ "vine snake": 59,
1975
+ "violin, fiddle": 889,
1976
+ "vizsla, Hungarian pointer": 211,
1977
+ "volcano": 980,
1978
+ "volleyball": 890,
1979
+ "vulture": 23,
1980
+ "waffle iron": 891,
1981
+ "walking stick, walkingstick, stick insect": 313,
1982
+ "wall clock": 892,
1983
+ "wallaby, brush kangaroo": 104,
1984
+ "wallet, billfold, notecase, pocketbook": 893,
1985
+ "wardrobe, closet, press": 894,
1986
+ "warplane, military plane": 895,
1987
+ "warthog": 343,
1988
+ "washbasin, handbasin, washbowl, lavabo, wash-hand basin": 896,
1989
+ "washer, automatic washer, washing machine": 897,
1990
+ "water bottle": 898,
1991
+ "water buffalo, water ox, Asiatic buffalo, Bubalus bubalis": 346,
1992
+ "water jug": 899,
1993
+ "water ouzel, dipper": 20,
1994
+ "water snake": 58,
1995
+ "water tower": 900,
1996
+ "weasel": 356,
1997
+ "web site, website, internet site, site": 916,
1998
+ "weevil": 307,
1999
+ "whippet": 172,
2000
+ "whiptail, whiptail lizard": 41,
2001
+ "whiskey jug": 901,
2002
+ "whistle": 902,
2003
+ "white stork, Ciconia ciconia": 127,
2004
+ "white wolf, Arctic wolf, Canis lupus tundrarum": 270,
2005
+ "wig": 903,
2006
+ "wild boar, boar, Sus scrofa": 342,
2007
+ "window screen": 904,
2008
+ "window shade": 905,
2009
+ "wine bottle": 907,
2010
+ "wing": 908,
2011
+ "wire-haired fox terrier": 188,
2012
+ "wok": 909,
2013
+ "wolf spider, hunting spider": 77,
2014
+ "wombat": 106,
2015
+ "wood rabbit, cottontail, cottontail rabbit": 330,
2016
+ "wooden spoon": 910,
2017
+ "wool, woolen, woollen": 911,
2018
+ "worm fence, snake fence, snake-rail fence, Virginia fence": 912,
2019
+ "wreck": 913,
2020
+ "yawl": 914,
2021
+ "yellow lady's slipper, yellow lady-slipper, Cypripedium calceolus, Cypripedium parviflorum": 986,
2022
+ "yurt": 915,
2023
+ "zebra": 340,
2024
+ "zucchini, courgette": 939
2025
+ },
2026
+ "max_length": 1024,
2027
+ "model_type": "vmamba",
2028
+ "num_channels": 3,
2029
+ "num_classes": 1000,
2030
+ "num_mel_bins": 128,
2031
+ "patch_size": 4,
2032
+ "torch_dtype": "float32",
2033
+ "transformers_version": "4.50.0.dev0",
2034
+ "use_checkpoint": false
2035
+ }
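
The label table that ends above maps each class name to an integer index. A minimal sketch of how such a table is typically inverted to decode a prediction follows. It assumes the table is the config's `label2id` field (the key name is not visible in this part of the diff); `invert_label_map` and `top_label` are hypothetical helpers that are not part of this commit. The four entries are copied from the diff.

```python
# Excerpt of the label -> index mapping shown in the diff above.
# Assumption: this table is stored under a `label2id`-style key.
label2id = {
    "tench, Tinca tinca": 0,
    "goldfish, Carassius auratus": 1,
    "zebra": 340,
    "toilet tissue, toilet paper, bathroom tissue": 999,
}


def invert_label_map(mapping):
    """Return an index -> label dict, failing loudly on duplicate indices."""
    inverted = {}
    for label, idx in mapping.items():
        if idx in inverted:
            raise ValueError(
                f"index {idx} is used by both {inverted[idx]!r} and {label!r}"
            )
        inverted[idx] = label
    return inverted


def top_label(scores, id2label):
    """Return the label of the highest score (hypothetical decoding helper)."""
    best = max(range(len(scores)), key=scores.__getitem__)
    # Fall back to a generic name when the index is missing from the excerpt.
    return id2label.get(best, f"LABEL_{best}")


id2label = invert_label_map(label2id)
```

With the full 1000-entry table, `top_label(logits, id2label)` would turn a list of class scores into a readable class name. The duplicate check matters because several ImageNet names are near-identical (for example the two `maillot` entries), so a copy-paste error in the indices is easy to miss.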
configuration_vmamba.py ADDED
@@ -0,0 +1,97 @@
+ # coding=utf-8
+ # @Author : Saurabhchand Bhati
+ # @Affiliation : Massachusetts Institute of Technology
+ """VMamba: Visual State Space Model configuration"""
+
+ from typing import Any, Dict
+
+ from transformers.configuration_utils import PretrainedConfig
+ from transformers.utils import logging
+
+
+ logger = logging.get_logger(__name__)
+
+ class VMambaConfig(PretrainedConfig):
+     r"""
+     This is the configuration class to store the configuration of a [`VMambaModel`]. It is used to instantiate a VMamba
+     model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
+     defaults will yield a similar configuration to that of the
+     [VMamba-T](https://github.com/MzeroMiko/VMamba/) architecture.
+
+     Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+     documentation from [`PretrainedConfig`] for more information.
+
+     Args:
+         num_channels (`int`, *optional*, defaults to 3):
+             Number of input channels.
+         patch_size (`int`, *optional*, defaults to 4):
+             The size (resolution) of each patch.
+         embed_dim (`int`, *optional*, defaults to 96):
+             Dimensionality of patch embedding.
+         depths (`list(int)`, *optional*, defaults to `[2, 2, 8, 2]`):
+             Depth of each layer in the VMamba encoder.
+         dims (`list(int)`, *optional*, defaults to `[96, 192, 384, 768]`):
+             Dimensionality of each layer in the VMamba encoder.
+         drop_path_rate (`float`, *optional*, defaults to 0.2):
+             Stochastic depth rate.
+         num_classes (`int`, *optional*, defaults to 1000):
+             Number of classes for classification.
+         max_length (`int`, *optional*, defaults to 1024):
+             Temporal dimension of the spectrograms.
+         num_mel_bins (`int`, *optional*, defaults to 128):
+             Frequency dimension of the spectrograms (number of Mel-frequency bins).
+         use_checkpoint (`bool`, *optional*, defaults to `False`):
+             Whether to use checkpointing to save memory.
+
+     Example:
+
+     ```python
+     >>> from transformers import VMambaConfig, VMambaModel
+
+     >>> # Initializing a VMamba tiny style configuration
+     >>> configuration = VMambaConfig()
+
+     >>> # Initializing a model (with random weights) from the VMamba tiny style configuration
+     >>> model = VMambaModel(configuration)
+
+     >>> # Accessing the model configuration
+     >>> configuration = model.config
+     ```"""
+
+     model_type = "vmamba"
+
+     def __init__(
+         self,
+         num_channels: int = 3,
+         patch_size: int = 4,
+         embed_dim: int = 96,
+         depths: list = [2, 2, 8, 2],
+         dims: list = [96, 192, 384, 768],
+         drop_path_rate: float = 0.2,
+         num_classes: int = 1000,
+         max_length: int = 1024,
+         num_mel_bins: int = 128,
+         use_checkpoint: bool = False,
+         **kwargs,
+     ):
+         super().__init__(**kwargs)
+
+         self.num_channels = num_channels
+         self.patch_size = patch_size
+         self.embed_dim = embed_dim
+         self.depths = depths
+         self.dims = dims
+         self.drop_path_rate = drop_path_rate
+         self.num_classes = num_classes
+         self.max_length = max_length
+         self.num_mel_bins = num_mel_bins
+         self.use_checkpoint = use_checkpoint
+
+     # Overwritten from the parent class: VMamba is not compatible with `generate`, but has a config parameter sharing the
+     # same name (`max_length`). Sharing the same name triggers checks regarding the config -> generation_config
+     # generative parameters deprecation cycle, overwriting this function prevents this from happening.
+     def _get_non_default_generation_parameters(self) -> Dict[str, Any]:
+         return {}
+
+
+ __all__ = ["VMambaConfig"]
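
The configuration above pairs one entry of `depths` with one entry of `dims` per encoder stage and uses list literals as default arguments. The sketch below is a hypothetical, dependency-free stand-in (`VMambaConfigSketch` is not part of this commit and does not subclass `PretrainedConfig`). It mirrors the documented defaults, gives each instance its own copy of the list defaults, and checks the one-entry-per-stage pairing. The length check is an assumption read off the docstring, not a rule stated by the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VMambaConfigSketch:
    """Hypothetical stand-in mirroring the defaults documented in VMambaConfig."""

    num_channels: int = 3
    patch_size: int = 4
    embed_dim: int = 96
    # default_factory gives every instance its own list, so mutating one
    # config cannot silently change the defaults seen by another.
    depths: List[int] = field(default_factory=lambda: [2, 2, 8, 2])
    dims: List[int] = field(default_factory=lambda: [96, 192, 384, 768])
    drop_path_rate: float = 0.2
    num_classes: int = 1000

    def __post_init__(self):
        # Assumption: each stage has exactly one depth and one width.
        if len(self.depths) != len(self.dims):
            raise ValueError(
                f"depths has {len(self.depths)} stages but dims has {len(self.dims)}"
            )

    @property
    def num_stages(self) -> int:
        return len(self.depths)


default_cfg = VMambaConfigSketch()
```

Here `default_cfg.num_stages` is 4 for the tiny-style defaults. A mismatched pair such as `VMambaConfigSketch(depths=[2, 2], dims=[96, 192, 384])` raises `ValueError` at construction time rather than failing later inside the model.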
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1fd6fb885ddfa33cb259d9c78c72565408d9735959ec14bc72a6e78819f7fa83
+ size 196105016
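
The three lines added for `model.safetensors` are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 digest of the real file, and its size in bytes. A minimal sketch of how a downloaded file could be checked against such a pointer follows. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the pointer text is copied from the diff.

```python
import hashlib

# Pointer contents copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:1fd6fb885ddfa33cb259d9c78c72565408d9735959ec14bc72a6e78819f7fa83
size 196105016
"""


def parse_lfs_pointer(text):
    """Split 'key value' lines into a dict with a typed size and a split oid."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer) -> bool:
    """Check size first (cheap), then the content hash."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
```

For a file of this size (about 196 MB) it would be more practical to hash in chunks from disk than to load the whole file into memory. The in-memory version is kept here only to keep the sketch short.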
modeling_vmamba.py ADDED
@@ -0,0 +1,1220 @@
+ # coding=utf-8
+ # @Author : Saurabhchand Bhati
+ # @Affiliation : Massachusetts Institute of Technology
+ # VMamba backbone is from https://github.com/MzeroMiko/VMamba/blob/main/vmamba.py
+ # VMambaLayer, VMambaModel, VMambaForImageClassification are implemented based on VMamba
+ # SS2Dv0, SS2Dv1, SS2S are merged into one class and initialization is limited to v05_noz,
+ # patch embedding is limited to v2 and downsample is limited to v3.
+
+ # MIT License
+
+ # Copyright (c) 2024 MzeroMiko, Saurabhchand Bhati
+
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
+ # of this software and associated documentation files (the "Software"), to deal
+ # in the Software without restriction, including without limitation the rights
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ # copies of the Software, and to permit persons to whom the Software is
+ # furnished to do so, subject to the following conditions:
+
+ # The above copyright notice and this permission notice shall be included in all
+ # copies or substantial portions of the Software.
+
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ # SOFTWARE.
+
+ """VMamba: Visual State Space Model"""
+
+ import math
+ import torch
+ import warnings
+ import torch.nn as nn
+ import torch.nn.functional as F
+ import torch.utils.checkpoint as checkpoint
+ from timm.models.layers import DropPath, trunc_normal_
+ from functools import partial
+ from typing import Optional, Callable, Any, Union
+ from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss
+ from transformers.modeling_outputs import ImageClassifierOutput
+
+ from transformers.utils import logging
+ from transformers.modeling_utils import PreTrainedModel
+
+ from .configuration_vmamba import VMambaConfig
+
+ logger = logging.get_logger(__name__)
+
+ # General docstring
+ _CONFIG_FOR_DOC = "VMambaConfig"
+
+ WITH_TRITON = True
+ # WITH_TRITON = False
+ try:
+     import triton
+     import triton.language as tl
+ except Exception:
+     # Any failure to import Triton (missing or unsupported) selects the PyTorch path.
+     WITH_TRITON = False
+     warnings.warn("Triton is not installed; falling back to the PyTorch implementation.")
+
+ # to make sure cached_property can be loaded for triton
+ if WITH_TRITON:
+     try:
+         from functools import cached_property
+     except ImportError:
+         warnings.warn("if you are using py37, add this line to functools.py: "
+             "cached_property = lambda func: property(lru_cache()(func))")
+
+ # torch implementation ========================================
+ def cross_scan_fwd(x: torch.Tensor, in_channel_first=True, out_channel_first=True, scans=0):
+     if in_channel_first:
+         B, C, H, W = x.shape
+         if scans == 0:
+             y = x.new_empty((B, 4, C, H * W))
+             y[:, 0, :, :] = x.flatten(2, 3)
+             y[:, 1, :, :] = x.transpose(dim0=2, dim1=3).flatten(2, 3)
+             y[:, 2:4, :, :] = torch.flip(y[:, 0:2, :, :], dims=[-1])
+         elif scans == 1:
+             y = x.view(B, 1, C, H * W).repeat(1, 4, 1, 1)
+         elif scans == 2:
+             y = x.view(B, 1, C, H * W).repeat(1, 2, 1, 1)
+             y = torch.cat([y, y.flip(dims=[-1])], dim=1)
+         elif scans == 3:
+             y = x.new_empty((B, 4, C, H * W))
+             y[:, 0, :, :] = x.flatten(2, 3)
+             y[:, 1, :, :] = torch.rot90(x, 1, dims=(2, 3)).flatten(2, 3)
+             y[:, 2, :, :] = torch.rot90(x, 2, dims=(2, 3)).flatten(2, 3)
+             y[:, 3, :, :] = torch.rot90(x, 3, dims=(2, 3)).flatten(2, 3)
+     else:
+         B, H, W, C = x.shape
+         if scans == 0:
+             y = x.new_empty((B, H * W, 4, C))
+             y[:, :, 0, :] = x.flatten(1, 2)
+             y[:, :, 1, :] = x.transpose(dim0=1, dim1=2).flatten(1, 2)
+             y[:, :, 2:4, :] = torch.flip(y[:, :, 0:2, :], dims=[1])
+         elif scans == 1:
+             y = x.view(B, H * W, 1, C).repeat(1, 1, 4, 1)
+         elif scans == 2:
+             y = x.view(B, H * W, 1, C).repeat(1, 1, 2, 1)
+             y = torch.cat([y, y.flip(dims=[1])], dim=2)
+         elif scans == 3:
+             y = x.new_empty((B, H * W, 4, C))
+             y[:, :, 0, :] = x.flatten(1, 2)
+             y[:, :, 1, :] = torch.rot90(x, 1, dims=(1, 2)).flatten(1, 2)
+             y[:, :, 2, :] = torch.rot90(x, 2, dims=(1, 2)).flatten(1, 2)
+             y[:, :, 3, :] = torch.rot90(x, 3, dims=(1, 2)).flatten(1, 2)
+
+     if in_channel_first and (not out_channel_first):
+         y = y.permute(0, 3, 1, 2).contiguous()
+     elif (not in_channel_first) and out_channel_first:
+         y = y.permute(0, 2, 3, 1).contiguous()
+
+     return y
+
+
+ def cross_merge_fwd(y: torch.Tensor, in_channel_first=True, out_channel_first=True, scans=0):
+     if out_channel_first:
+         B, K, D, H, W = y.shape
+         y = y.view(B, K, D, -1)
+         if scans == 0:
+             y = y[:, 0:2] + y[:, 2:4].flip(dims=[-1]).view(B, 2, D, -1)
+             y = y[:, 0] + y[:, 1].view(B, -1, W, H).transpose(dim0=2, dim1=3).contiguous().view(B, D, -1)
+         elif scans == 1:
+             y = y.sum(1)
+         elif scans == 2:
+             y = y[:, 0:2] + y[:, 2:4].flip(dims=[-1]).view(B, 2, D, -1)
+             y = y.sum(1)
+         elif scans == 3:
+             oy = y[:, 0, :, :].contiguous().view(B, D, -1)
+ oy = oy + torch.rot90(y.view(B, K, D, W, H)[:, 1, :, :, :], -1, dims=(2, 3)).flatten(2, 3)
134
+ oy = oy + torch.rot90(y.view(B, K, D, H, W)[:, 2, :, :, :], -2, dims=(2, 3)).flatten(2, 3)
135
+ oy = oy + torch.rot90(y.view(B, K, D, W, H)[:, 3, :, :, :], -3, dims=(2, 3)).flatten(2, 3)
136
+ y = oy
137
+ else:
138
+ B, H, W, K, D = y.shape
139
+ y = y.view(B, -1, K, D)
140
+ if scans == 0:
141
+ y = y[:, :, 0:2] + y[:, :, 2:4].flip(dims=[1]).view(B, -1, 2, D)
142
+ y = y[:, :, 0] + y[:, :, 1].view(B, W, H, -1).transpose(dim0=1, dim1=2).contiguous().view(B, -1, D)
143
+ elif scans == 1:
144
+ y = y.sum(2)
145
+ elif scans == 2:
146
+ y = y[:, :, 0:2] + y[:, :, 2:4].flip(dims=[1]).view(B, -1, 2, D)
147
+ y = y.sum(2)
148
+ elif scans == 3:
149
+ oy = y[:, :, 0, :].contiguous().view(B, -1, D)
150
+ oy = oy + torch.rot90(y.view(B, W, H, K, D)[:, :, :, 1, :], -1, dims=(1, 2)).flatten(1, 2)
151
+ oy = oy + torch.rot90(y.view(B, H, W, K, D)[:, :, :, 2, :], -2, dims=(1, 2)).flatten(1, 2)
152
+ oy = oy + torch.rot90(y.view(B, W, H, K, D)[:, :, :, 3, :], -3, dims=(1, 2)).flatten(1, 2)
153
+ y = oy
154
+
155
+ if in_channel_first and (not out_channel_first):
156
+ y = y.permute(0, 2, 1).contiguous()
157
+ elif (not in_channel_first) and out_channel_first:
158
+ y = y.permute(0, 2, 1).contiguous()
159
+
160
+ return y
161
+
162
+
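The scan and merge routines above can be sanity-checked in isolation. The following is a minimal sketch, assuming only the `scans == 0`, channel-first path; `cross_scan` and `cross_merge` are hypothetical stand-alone helpers written for illustration, not the functions defined in this file. Every pixel appears exactly once in each of the four routes (row-major, column-major, and the two reversed orders), so merging an unmodified scan output should return four times the flattened input.

```python
import torch


def cross_scan(x: torch.Tensor) -> torch.Tensor:
    # x: (B, C, H, W) -> y: (B, 4, C, H * W)
    B, C, H, W = x.shape
    y = x.new_empty((B, 4, C, H * W))
    y[:, 0] = x.flatten(2, 3)                     # row-major order
    y[:, 1] = x.transpose(2, 3).flatten(2, 3)     # column-major order
    y[:, 2:4] = torch.flip(y[:, 0:2], dims=[-1])  # both orders reversed
    return y


def cross_merge(y: torch.Tensor, H: int, W: int) -> torch.Tensor:
    # y: (B, 4, C, H * W) -> (B, C, H * W), all routes mapped back to row-major
    B, K, C, L = y.shape
    y = y[:, 0:2] + y[:, 2:4].flip(dims=[-1])     # undo the reversal
    col_major = y[:, 1].view(B, C, W, H).transpose(2, 3).contiguous().view(B, C, L)
    return y[:, 0] + col_major                    # undo the transpose


x = torch.arange(2 * 3 * 2 * 3, dtype=torch.float32).view(2, 3, 2, 3)
ys = cross_scan(x)
merged = cross_merge(ys, H=2, W=3)
```

For the first channel of the first sample, `[[0, 1, 2], [3, 4, 5]]`, the four routes visit the pixels in the orders `0 1 2 3 4 5`, `0 3 1 4 2 5`, `5 4 3 2 1 0`, and `5 2 4 1 3 0`.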
163
+ def cross_scan1b1_fwd(x: torch.Tensor, in_channel_first=True, out_channel_first=True, scans=0):
164
+ if in_channel_first:
165
+ B, _, C, H, W = x.shape
166
+ if scans == 0:
167
+ y = torch.stack([
168
+ x[:, 0].flatten(2, 3),
169
+ x[:, 1].transpose(dim0=2, dim1=3).flatten(2, 3),
170
+ torch.flip(x[:, 2].flatten(2, 3), dims=[-1]),
171
+ torch.flip(x[:, 3].transpose(dim0=2, dim1=3).flatten(2, 3), dims=[-1]),
172
+ ], dim=1)
173
+ elif scans == 1:
174
+ y = x.flatten(2, 3)
175
+ elif scans == 2:
176
+ y = torch.stack([
177
+ x[:, 0].flatten(2, 3),
178
+ x[:, 1].flatten(2, 3),
179
+ torch.flip(x[:, 2].flatten(2, 3), dims=[-1]),
180
+ torch.flip(x[:, 3].flatten(2, 3), dims=[-1]),
181
+ ], dim=1)
182
+ elif scans == 3:
183
+ y = torch.stack([
184
+ x[:, 0, :, :, :].flatten(2, 3),
185
+ torch.rot90(x[:, 1, :, :, :], 1, dims=(2, 3)).flatten(2, 3),
186
+ torch.rot90(x[:, 2, :, :, :], 2, dims=(2, 3)).flatten(2, 3),
187
+ torch.rot90(x[:, 3, :, :, :], 3, dims=(2, 3)).flatten(2, 3),
188
+ ], dim=1)
189
+
190
+ else:
191
+ B, H, W, _, C = x.shape
192
+ if scans == 0:
193
+ y = torch.stack([
194
+ x[:, :, :, 0].flatten(1, 2),
195
+ x[:, :, :, 1].transpose(dim0=1, dim1=2).flatten(1, 2),
196
+ torch.flip(x[:, :, :, 2].flatten(1, 2), dims=[1]),
197
+ torch.flip(x[:, :, :, 3].transpose(dim0=1, dim1=2).flatten(1, 2), dims=[1]),
198
+ ], dim=2)
199
+ elif scans == 1:
200
+ y = x.flatten(1, 2)
201
+ elif scans == 2:
202
+ y = torch.stack([
203
+ x[:, :, :, 0].flatten(1, 2),
203
+ x[:, :, :, 1].flatten(1, 2),
204
+ torch.flip(x[:, :, :, 2].flatten(1, 2), dims=[1]),
205
+ torch.flip(x[:, :, :, 3].flatten(1, 2), dims=[1]),
207
+ ], dim=2)
208
+ elif scans == 3:
209
+ y = torch.stack([
210
+ x[:, :, :, 0, :].flatten(1, 2),
211
+ torch.rot90(x[:, :, :, 1, :], 1, dims=(1, 2)).flatten(1, 2),
212
+ torch.rot90(x[:, :, :, 2, :], 2, dims=(1, 2)).flatten(1, 2),
213
+ torch.rot90(x[:, :, :, 3, :], 3, dims=(1, 2)).flatten(1, 2),
214
+ ], dim=2)
215
+
216
+ if in_channel_first and (not out_channel_first):
217
+ y = y.permute(0, 3, 1, 2).contiguous()
218
+ elif (not in_channel_first) and out_channel_first:
219
+ y = y.permute(0, 2, 3, 1).contiguous()
220
+
221
+ return y
222
+
223
+
224
+ def cross_merge1b1_fwd(y: torch.Tensor, in_channel_first=True, out_channel_first=True, scans=0):
225
+ if out_channel_first:
226
+ B, K, D, H, W = y.shape
227
+ y = y.view(B, K, D, -1)
228
+ if scans == 0:
229
+ y = torch.stack([
230
+ y[:, 0],
231
+ y[:, 1].view(B, -1, W, H).transpose(dim0=2, dim1=3).flatten(2, 3),
232
+ torch.flip(y[:, 2], dims=[-1]),
233
+ torch.flip(y[:, 3].view(B, -1, W, H).transpose(dim0=2, dim1=3).flatten(2, 3), dims=[-1]),
234
+ ], dim=1)
235
+ elif scans == 1:
236
+ y = y
237
+ elif scans == 2:
238
+ y = torch.stack([
239
+ y[:, 0],
240
+ y[:, 1],
241
+ torch.flip(y[:, 2], dims=[-1]),
242
+ torch.flip(y[:, 3], dims=[-1]),
243
+ ], dim=1)
244
+ elif scans == 3:
245
+ y = torch.stack([
246
+ y[:, 0, :, :].contiguous().view(B, D, -1),
247
+ torch.rot90(y.view(B, K, D, W, H)[:, 1, :, :, :], -1, dims=(2, 3)).flatten(2, 3),
248
+ torch.rot90(y.view(B, K, D, H, W)[:, 2, :, :, :], -2, dims=(2, 3)).flatten(2, 3),
249
+ torch.rot90(y.view(B, K, D, W, H)[:, 3, :, :, :], -3, dims=(2, 3)).flatten(2, 3),
250
+ ], dim=1)
251
+ else:
252
+ B, H, W, K, D = y.shape
253
+ y = y.view(B, -1, K, D)
254
+ if scans == 0:
255
+ y = torch.stack([
256
+ y[:, :, 0],
257
+ y[:, :, 1].view(B, W, H, -1).transpose(dim0=1, dim1=2).flatten(1, 2),
258
+ torch.flip(y[:, :, 2], dims=[1]),
259
+ torch.flip(y[:, :, 3].view(B, W, H, -1).transpose(dim0=1, dim1=2).flatten(1, 2), dims=[1]),
260
+ ], dim=2)
261
+ elif scans == 1:
262
+ y = y
263
+ elif scans == 2:
264
+ y = torch.stack([
265
+ y[:, :, 0],
266
+ y[:, :, 1],
267
+ torch.flip(y[:, :, 2], dims=[1]),
268
+ torch.flip(y[:, :, 3], dims=[1]),
269
+ ], dim=2)
270
+ elif scans == 3:
271
+ y = torch.stack([
272
+ y[:, :, 0, :].contiguous().view(B, -1, D),
273
+ torch.rot90(y.view(B, W, H, K, D)[:, :, :, 1, :], -1, dims=(1, 2)).flatten(1, 2),
274
+ torch.rot90(y.view(B, H, W, K, D)[:, :, :, 2, :], -2, dims=(1, 2)).flatten(1, 2),
275
+ torch.rot90(y.view(B, W, H, K, D)[:, :, :, 3, :], -3, dims=(1, 2)).flatten(1, 2),
276
+ ], dim=2)
277
+
278
+ if out_channel_first and (not in_channel_first):
279
+ y = y.permute(0, 3, 1, 2).contiguous()
280
+ elif (not out_channel_first) and in_channel_first:
281
+ y = y.permute(0, 2, 3, 1).contiguous()
282
+
283
+ return y
284
+
285
+
286
+ class CrossScanF(torch.autograd.Function):
287
+ @staticmethod
288
+ def forward(ctx, x: torch.Tensor, in_channel_first=True, out_channel_first=True, one_by_one=False, scans=0):
289
+ # x: (B, C, H, W) | (B, H, W, C) | (B, 4, C, H, W) | (B, H, W, 4, C)
290
+ # y: (B, 4, C, H * W) | (B, H * W, 4, C)
291
+ ctx.in_channel_first = in_channel_first
292
+ ctx.out_channel_first = out_channel_first
293
+ ctx.one_by_one = one_by_one
294
+ ctx.scans = scans
295
+
296
+ if one_by_one:
297
+ B, K, C, H, W = x.shape
298
+ if not in_channel_first:
299
+ B, H, W, K, C = x.shape
300
+ else:
301
+ B, C, H, W = x.shape
302
+ if not in_channel_first:
303
+ B, H, W, C = x.shape
304
+ ctx.shape = (B, C, H, W)
305
+
306
+ _fn = cross_scan1b1_fwd if one_by_one else cross_scan_fwd
307
+ y = _fn(x, in_channel_first, out_channel_first, scans)
308
+
309
+ return y
310
+
311
+ @staticmethod
312
+ def backward(ctx, ys: torch.Tensor):
313
+ # out: (b, k, d, l)
314
+ in_channel_first = ctx.in_channel_first
315
+ out_channel_first = ctx.out_channel_first
316
+ one_by_one = ctx.one_by_one
317
+ scans = ctx.scans
318
+ B, C, H, W = ctx.shape
319
+
320
+ ys = ys.view(B, -1, C, H, W) if out_channel_first else ys.view(B, H, W, -1, C)
321
+ _fn = cross_merge1b1_fwd if one_by_one else cross_merge_fwd
322
+ y = _fn(ys, in_channel_first, out_channel_first, scans)
323
+
324
+ if one_by_one:
325
+ y = y.view(B, 4, -1, H, W) if in_channel_first else y.view(B, H, W, 4, -1)
326
+ else:
327
+ y = y.view(B, -1, H, W) if in_channel_first else y.view(B, H, W, -1)
328
+
329
+ return y, None, None, None, None
330
+
331
+
332
+ class CrossMergeF(torch.autograd.Function):
333
+ @staticmethod
334
+ def forward(ctx, ys: torch.Tensor, in_channel_first=True, out_channel_first=True, one_by_one=False, scans=0):
335
+ # ys: (B, K, C, H, W) | (B, H, W, K, C)
336
+ # y: (B, C, H * W) | (B, H * W, C) | (B, 4, C, H * W) | (B, H * W, 4, C)
337
+ ctx.in_channel_first = in_channel_first
338
+ ctx.out_channel_first = out_channel_first
339
+ ctx.one_by_one = one_by_one
340
+ ctx.scans = scans
341
+
342
+ B, K, C, H, W = ys.shape
343
+ if not out_channel_first:
344
+ B, H, W, K, C = ys.shape
345
+ ctx.shape = (B, C, H, W)
346
+
347
+ _fn = cross_merge1b1_fwd if one_by_one else cross_merge_fwd
348
+ y = _fn(ys, in_channel_first, out_channel_first, scans)
349
+
350
+ return y
351
+
352
+ @staticmethod
353
+ def backward(ctx, x: torch.Tensor):
354
+ # B, D, L = x.shape
355
+ # out: (b, k, d, h, w)
356
+ in_channel_first = ctx.in_channel_first
357
+ out_channel_first = ctx.out_channel_first
358
+ one_by_one = ctx.one_by_one
359
+ scans = ctx.scans
360
+ B, C, H, W = ctx.shape
361
+
362
+ if not one_by_one:
363
+ if in_channel_first:
364
+ x = x.view(B, C, H, W)
365
+ else:
366
+ x = x.view(B, H, W, C)
367
+ else:
368
+ if in_channel_first:
369
+ x = x.view(B, 4, C, H, W)
370
+ else:
371
+ x = x.view(B, H, W, 4, C)
372
+
373
+ _fn = cross_scan1b1_fwd if one_by_one else cross_scan_fwd
374
+ x = _fn(x, in_channel_first, out_channel_first, scans)
375
+ x = x.view(B, 4, C, H, W) if out_channel_first else x.view(B, H, W, 4, C)
376
+
377
+ return x, None, None, None, None
378
+
379
+
380
+ # triton implementation ========================================
381
+
382
+ @triton.jit
383
+ def triton_cross_scan_flex(
384
+ x: tl.tensor, # (B, C, H, W) | (B, H, W, C) | (B, 4, C, H, W) | (B, H, W, 4, C)
385
+ y: tl.tensor, # (B, 4, C, H, W) | (B, H, W, 4, C)
386
+ x_layout: tl.constexpr,
387
+ y_layout: tl.constexpr,
388
+ operation: tl.constexpr,
389
+ onebyone: tl.constexpr,
390
+ scans: tl.constexpr,
391
+ BC: tl.constexpr,
392
+ BH: tl.constexpr,
393
+ BW: tl.constexpr,
394
+ DC: tl.constexpr,
395
+ DH: tl.constexpr,
396
+ DW: tl.constexpr,
397
+ NH: tl.constexpr,
398
+ NW: tl.constexpr,
399
+ ):
400
+ # x_layout = 0
401
+ # y_layout = 1 # 0 BCHW, 1 BHWC
402
+ # operation = 0 # 0 scan, 1 merge
403
+ # onebyone = 0 # 0 false, 1 true
404
+ # scans = 0 # 0 cross scan, 1 unidirectional, 2 bidirectional, 3 rotations (0/90/180/270 degrees)
405
+
406
+ i_hw, i_c, i_b = tl.program_id(0), tl.program_id(1), tl.program_id(2)
407
+ i_h, i_w = (i_hw // NW), (i_hw % NW)
408
+ _mask_h = (i_h * BH + tl.arange(0, BH)) < DH
409
+ _mask_w = (i_w * BW + tl.arange(0, BW)) < DW
410
+ _mask_hw = _mask_h[:, None] & _mask_w[None, :]
411
+ _for_C = min(DC - i_c * BC, BC)
412
+
413
+ pos_h = (i_h * BH + tl.arange(0, BH)[:, None])
414
+ pos_w = (i_w * BW + tl.arange(0, BW)[None, :])
415
+ neg_h = (DH - i_h * BH - 1 - tl.arange(0, BH)[:, None])
416
+ neg_w = (DW - i_w * BW - 1 - tl.arange(0, BW)[None, :])
417
+ if scans == 0:
418
+ # none; trans; flip; trans + flip;
419
+ HWRoute0 = pos_h * DW + pos_w
420
+ HWRoute1 = pos_w * DH + pos_h # trans
421
+ HWRoute2 = neg_h * DW + neg_w # flip
422
+ HWRoute3 = neg_w * DH + neg_h # trans + flip
423
+ elif scans == 1:
424
+ # none; none; none; none;
425
+ HWRoute0 = pos_h * DW + pos_w
426
+ HWRoute1 = HWRoute0
427
+ HWRoute2 = HWRoute0
428
+ HWRoute3 = HWRoute0
429
+ elif scans == 2:
430
+ # none; none; flip; flip;
431
+ HWRoute0 = pos_h * DW + pos_w
432
+ HWRoute1 = HWRoute0
433
+ HWRoute2 = neg_h * DW + neg_w # flip
434
+ HWRoute3 = HWRoute2
435
+ elif scans == 3:
436
+ # none; rot90; rot180==flip; rot270;
437
+ HWRoute0 = pos_h * DW + pos_w
438
+ HWRoute1 = neg_w * DH + pos_h
439
+ HWRoute2 = neg_h * DW + neg_w
440
+ HWRoute3 = pos_w * DH + neg_h
441
+
442
+ _tmp1 = DC * DH * DW
443
+
444
+ y_ptr_base = y + i_b * 4 * _tmp1 + (i_c * BC * DH * DW if y_layout == 0 else i_c * BC)
445
+ if y_layout == 0:
446
+ p_y1 = y_ptr_base + HWRoute0
447
+ p_y2 = y_ptr_base + _tmp1 + HWRoute1
448
+ p_y3 = y_ptr_base + 2 * _tmp1 + HWRoute2
449
+ p_y4 = y_ptr_base + 3 * _tmp1 + HWRoute3
450
+ else:
451
+ p_y1 = y_ptr_base + HWRoute0 * 4 * DC
452
+ p_y2 = y_ptr_base + DC + HWRoute1 * 4 * DC
453
+ p_y3 = y_ptr_base + 2 * DC + HWRoute2 * 4 * DC
454
+ p_y4 = y_ptr_base + 3 * DC + HWRoute3 * 4 * DC
455
+
456
+ if onebyone == 0:
457
+ x_ptr_base = x + i_b * _tmp1 + (i_c * BC * DH * DW if x_layout == 0 else i_c * BC)
458
+ if x_layout == 0:
459
+ p_x = x_ptr_base + HWRoute0
460
+ else:
461
+ p_x = x_ptr_base + HWRoute0 * DC
462
+
463
+ if operation == 0:
464
+ for idxc in range(_for_C):
465
+ _idx_x = idxc * DH * DW if x_layout == 0 else idxc
466
+ _idx_y = idxc * DH * DW if y_layout == 0 else idxc
467
+ _x = tl.load(p_x + _idx_x, mask=_mask_hw)
468
+ tl.store(p_y1 + _idx_y, _x, mask=_mask_hw)
469
+ tl.store(p_y2 + _idx_y, _x, mask=_mask_hw)
470
+ tl.store(p_y3 + _idx_y, _x, mask=_mask_hw)
471
+ tl.store(p_y4 + _idx_y, _x, mask=_mask_hw)
472
+ elif operation == 1:
473
+ for idxc in range(_for_C):
474
+ _idx_x = idxc * DH * DW if x_layout == 0 else idxc
475
+ _idx_y = idxc * DH * DW if y_layout == 0 else idxc
476
+ _y1 = tl.load(p_y1 + _idx_y, mask=_mask_hw)
477
+ _y2 = tl.load(p_y2 + _idx_y, mask=_mask_hw)
478
+ _y3 = tl.load(p_y3 + _idx_y, mask=_mask_hw)
479
+ _y4 = tl.load(p_y4 + _idx_y, mask=_mask_hw)
480
+ tl.store(p_x + _idx_x, _y1 + _y2 + _y3 + _y4, mask=_mask_hw)
481
+
482
+ else:
483
+ x_ptr_base = x + i_b * 4 * _tmp1 + (i_c * BC * DH * DW if x_layout == 0 else i_c * BC)
484
+ if x_layout == 0:
485
+ p_x1 = x_ptr_base + HWRoute0
486
+ p_x2 = p_x1 + _tmp1
487
+ p_x3 = p_x2 + _tmp1
488
+ p_x4 = p_x3 + _tmp1
489
+ else:
490
+ p_x1 = x_ptr_base + HWRoute0 * 4 * DC
491
+ p_x2 = p_x1 + DC
492
+ p_x3 = p_x2 + DC
493
+ p_x4 = p_x3 + DC
494
+
495
+ if operation == 0:
496
+ for idxc in range(_for_C):
497
+ _idx_x = idxc * DH * DW if x_layout == 0 else idxc
498
+ _idx_y = idxc * DH * DW if y_layout == 0 else idxc
499
+ tl.store(p_y1 + _idx_y, tl.load(p_x1 + _idx_x, mask=_mask_hw), mask=_mask_hw)
500
+ tl.store(p_y2 + _idx_y, tl.load(p_x2 + _idx_x, mask=_mask_hw), mask=_mask_hw)
501
+ tl.store(p_y3 + _idx_y, tl.load(p_x3 + _idx_x, mask=_mask_hw), mask=_mask_hw)
502
+ tl.store(p_y4 + _idx_y, tl.load(p_x4 + _idx_x, mask=_mask_hw), mask=_mask_hw)
503
+ else:
504
+ for idxc in range(_for_C):
505
+ _idx_x = idxc * DH * DW if x_layout == 0 else idxc
506
+ _idx_y = idxc * DH * DW if y_layout == 0 else idxc
507
+ tl.store(p_x1 + _idx_x, tl.load(p_y1 + _idx_y, mask=_mask_hw), mask=_mask_hw)
508
+ tl.store(p_x2 + _idx_x, tl.load(p_y2 + _idx_y, mask=_mask_hw), mask=_mask_hw)
509
+ tl.store(p_x3 + _idx_x, tl.load(p_y3 + _idx_y, mask=_mask_hw), mask=_mask_hw)
510
+ tl.store(p_x4 + _idx_x, tl.load(p_y4 + _idx_y, mask=_mask_hw), mask=_mask_hw)
511
+
512
+
513
+ class CrossScanTritonF(torch.autograd.Function):
514
+ @staticmethod
515
+ def forward(ctx, x: torch.Tensor, in_channel_first=True, out_channel_first=True, one_by_one=False, scans=0):
516
+ if one_by_one:
517
+ if in_channel_first:
518
+ B, _, C, H, W = x.shape
519
+ else:
520
+ B, H, W, _, C = x.shape
521
+ else:
522
+ if in_channel_first:
523
+ B, C, H, W = x.shape
524
+ else:
525
+ B, H, W, C = x.shape
526
+ B, C, H, W = int(B), int(C), int(H), int(W)
527
+ BC, BH, BW = 1, 32, 32
528
+ NH, NW, NC = triton.cdiv(H, BH), triton.cdiv(W, BW), triton.cdiv(C, BC)
529
+
530
+ ctx.in_channel_first = in_channel_first
531
+ ctx.out_channel_first = out_channel_first
532
+ ctx.one_by_one = one_by_one
533
+ ctx.scans = scans
534
+ ctx.shape = (B, C, H, W)
535
+ ctx.triton_shape = (BC, BH, BW, NC, NH, NW)
536
+
537
+ y = x.new_empty((B, 4, C, H * W)) if out_channel_first else x.new_empty((B, H * W, 4, C))
538
+ triton_cross_scan_flex[(NH * NW, NC, B)](
539
+ x.contiguous(), y,
540
+ (0 if in_channel_first else 1), (0 if out_channel_first else 1), 0, (0 if not one_by_one else 1), scans,
541
+ BC, BH, BW, C, H, W, NH, NW
542
+ )
543
+ return y
544
+
545
+ @staticmethod
546
+ def backward(ctx, y: torch.Tensor):
547
+ in_channel_first = ctx.in_channel_first
548
+ out_channel_first = ctx.out_channel_first
549
+ one_by_one = ctx.one_by_one
550
+ scans = ctx.scans
551
+ B, C, H, W = ctx.shape
552
+ BC, BH, BW, NC, NH, NW = ctx.triton_shape
553
+ if one_by_one:
554
+ x = y.new_empty((B, 4, C, H, W)) if in_channel_first else y.new_empty((B, H, W, 4, C))
555
+ else:
556
+ x = y.new_empty((B, C, H, W)) if in_channel_first else y.new_empty((B, H, W, C))
557
+
558
+ triton_cross_scan_flex[(NH * NW, NC, B)](
559
+ x, y.contiguous(),
560
+ (0 if in_channel_first else 1), (0 if out_channel_first else 1), 1, (0 if not one_by_one else 1), scans,
561
+ BC, BH, BW, C, H, W, NH, NW
562
+ )
563
+ return x, None, None, None, None
564
+
565
+
566
+ class CrossMergeTritonF(torch.autograd.Function):
567
+ @staticmethod
568
+ def forward(ctx, y: torch.Tensor, in_channel_first=True, out_channel_first=True, one_by_one=False, scans=0):
569
+ if out_channel_first:
570
+ B, _, C, H, W = y.shape
571
+ else:
572
+ B, H, W, _, C = y.shape
573
+ B, C, H, W = int(B), int(C), int(H), int(W)
574
+ BC, BH, BW = 1, 32, 32
575
+ NH, NW, NC = triton.cdiv(H, BH), triton.cdiv(W, BW), triton.cdiv(C, BC)
576
+ ctx.in_channel_first = in_channel_first
577
+ ctx.out_channel_first = out_channel_first
578
+ ctx.one_by_one = one_by_one
579
+ ctx.scans = scans
580
+ ctx.shape = (B, C, H, W)
581
+ ctx.triton_shape = (BC, BH, BW, NC, NH, NW)
582
+ if one_by_one:
583
+ x = y.new_empty((B, 4, C, H * W)) if in_channel_first else y.new_empty((B, H * W, 4, C))
584
+ else:
585
+ x = y.new_empty((B, C, H * W)) if in_channel_first else y.new_empty((B, H * W, C))
586
+ triton_cross_scan_flex[(NH * NW, NC, B)](
587
+ x, y.contiguous(),
588
+ (0 if in_channel_first else 1), (0 if out_channel_first else 1), 1, (0 if not one_by_one else 1), scans,
589
+ BC, BH, BW, C, H, W, NH, NW
590
+ )
591
+ return x
592
+
593
+ @staticmethod
594
+ def backward(ctx, x: torch.Tensor):
595
+ in_channel_first = ctx.in_channel_first
596
+ out_channel_first = ctx.out_channel_first
597
+ one_by_one = ctx.one_by_one
598
+ scans = ctx.scans
599
+ B, C, H, W = ctx.shape
600
+ BC, BH, BW, NC, NH, NW = ctx.triton_shape
601
+ y = x.new_empty((B, 4, C, H, W)) if out_channel_first else x.new_empty((B, H, W, 4, C))
602
+ triton_cross_scan_flex[(NH * NW, NC, B)](
603
+ x.contiguous(), y,
604
+ (0 if in_channel_first else 1), (0 if out_channel_first else 1), 0, (0 if not one_by_one else 1), scans,
605
+ BC, BH, BW, C, H, W, NH, NW
606
+ )
607
+ return y, None, None, None, None
608
+
609
+
610
+ # @torch.compile(options={"triton.cudagraphs": True}, fullgraph=True)
611
+ def cross_scan_fn(x: torch.Tensor, in_channel_first=True, out_channel_first=True, one_by_one=False, scans=0, force_torch=False):
612
+ # x: (B, C, H, W) | (B, H, W, C) | (B, 4, C, H, W) | (B, H, W, 4, C)
613
+ # y: (B, 4, C, L) | (B, L, 4, C)
614
+ # scans: 0: cross scan; 1: unidirectional; 2: bidirectional; 3: rotations (0/90/180/270 degrees)
615
+ CSF = CrossScanTritonF if WITH_TRITON and x.is_cuda and (not force_torch) else CrossScanF
616
+ if x.is_cuda:
617
+ with torch.cuda.device(x.device):
618
+ return CSF.apply(x, in_channel_first, out_channel_first, one_by_one, scans)
619
+ else:
620
+ return CrossScanF.apply(x, in_channel_first, out_channel_first, one_by_one, scans)
621
+
622
+
623
+ # @torch.compile(options={"triton.cudagraphs": True}, fullgraph=True)
624
+ def cross_merge_fn(y: torch.Tensor, in_channel_first=True, out_channel_first=True, one_by_one=False, scans=0, force_torch=False):
625
+ # y: (B, 4, C, L) | (B, L, 4, C)
626
+ # x: (B, C, H * W) | (B, H * W, C) | (B, 4, C, H * W) | (B, H * W, 4, C)
627
+ # scans: 0: cross scan; 1: unidirectional; 2: bidirectional; 3: rotations (0/90/180/270 degrees)
628
+ CMF = CrossMergeTritonF if WITH_TRITON and y.is_cuda and (not force_torch) else CrossMergeF
629
+ if y.is_cuda:
630
+ with torch.cuda.device(y.device):
631
+ return CMF.apply(y, in_channel_first, out_channel_first, one_by_one, scans)
632
+ else:
633
+ return CrossMergeF.apply(y, in_channel_first, out_channel_first, one_by_one, scans)
634
+
635
+
636
+ ##########################################################
637
+ # csms6s.py
638
+ ##########################################################
639
+
640
+ WITH_SELECTIVESCAN_MAMBA = True
641
+ try:
642
+ import selective_scan_cuda
643
+ except ImportError:
644
+ WITH_SELECTIVESCAN_MAMBA = False
645
+
646
+
647
+ def selective_scan_torch(
648
+ u: torch.Tensor, # (B, K * C, L)
649
+ delta: torch.Tensor, # (B, K * C, L)
650
+ A: torch.Tensor, # (K * C, N)
651
+ B: torch.Tensor, # (B, K, N, L)
652
+ C: torch.Tensor, # (B, K, N, L)
653
+ D: torch.Tensor = None, # (K * C)
654
+ delta_bias: torch.Tensor = None, # (K * C)
655
+ delta_softplus=True,
656
+ oflex=True,
657
+ *args,
658
+ **kwargs
659
+ ):
660
+ dtype_in = u.dtype
661
+ Batch, K, N, L = B.shape
662
+ KCdim = u.shape[1]
663
+ Cdim = int(KCdim / K)
664
+ assert u.shape == (Batch, KCdim, L)
665
+ assert delta.shape == (Batch, KCdim, L)
666
+ assert A.shape == (KCdim, N)
667
+ assert C.shape == B.shape
668
+
669
+ if delta_bias is not None:
670
+ delta = delta + delta_bias[..., None]
671
+ if delta_softplus:
672
+ delta = torch.nn.functional.softplus(delta)
673
+
674
+ u, delta, A, B, C = u.float(), delta.float(), A.float(), B.float(), C.float()
675
+ B = B.view(Batch, K, 1, N, L).repeat(1, 1, Cdim, 1, 1).view(Batch, KCdim, N, L)
676
+ C = C.view(Batch, K, 1, N, L).repeat(1, 1, Cdim, 1, 1).view(Batch, KCdim, N, L)
677
+ deltaA = torch.exp(torch.einsum('bdl,dn->bdln', delta, A))
678
+ deltaB_u = torch.einsum('bdl,bdnl,bdl->bdln', delta, B, u)
679
+
680
+ if True:
681
+ x = A.new_zeros((Batch, KCdim, N))
682
+ ys = []
683
+ for i in range(L):
684
+ x = deltaA[:, :, i, :] * x + deltaB_u[:, :, i, :]
685
+ y = torch.einsum('bdn,bdn->bd', x, C[:, :, :, i])
686
+ ys.append(y)
687
+ y = torch.stack(ys, dim=2) # (B, K * C, L)
688
+
689
+ out = y if D is None else y + u * D.unsqueeze(-1)
690
+ return out if oflex else out.to(dtype=dtype_in)
691
+
692
+
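The loop in `selective_scan_torch` evaluates the discretized state-space recurrence h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * u_t, with output y_t = <C_t, h_t> + D * u_t. The following is a minimal sketch of that recurrence for a single scan group, assuming `delta` has already had its bias and softplus applied; `tiny_selective_scan` is a hypothetical helper for illustration, not part of this file.

```python
import torch


def tiny_selective_scan(u, delta, A, B, C, D=None):
    # u, delta: (batch, d, L); A: (d, N); B, C: (batch, N, L); D: (d,)
    batch, d, L = u.shape
    h = u.new_zeros((batch, d, A.shape[1]))       # hidden state, (batch, d, N)
    ys = []
    for t in range(L):
        dt = delta[:, :, t, None]                 # (batch, d, 1)
        dA = torch.exp(dt * A)                    # discretized state transition
        dBu = dt * B[:, None, :, t] * u[:, :, t, None]
        h = dA * h + dBu                          # state update
        ys.append((h * C[:, None, :, t]).sum(-1)) # readout, (batch, d)
    y = torch.stack(ys, dim=2)                    # (batch, d, L)
    return y if D is None else y + u * D[:, None]


# With A = 0 (no decay), delta = 1 and B = C = 1, the scan is a running sum.
u = torch.tensor([[[1.0, 2.0, 3.0]]])
delta = torch.ones_like(u)
A = torch.zeros(1, 1)
B = torch.ones(1, 1, 3)
C = torch.ones(1, 1, 3)
y_plain = tiny_selective_scan(u, delta, A, B, C)
y_skip = tiny_selective_scan(u, delta, A, B, C, D=torch.ones(1))
```

The degenerate setting makes the recurrence easy to verify by hand: the output is the cumulative sum of `u`, and the `D` term adds the input back as a skip connection.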
693
+ class SelectiveScanCuda(torch.autograd.Function):
694
+ @staticmethod
695
+ @torch.cuda.amp.custom_fwd
696
+ def forward(ctx, u, delta, A, B, C, D=None, delta_bias=None, delta_softplus=False, oflex=True, backend=None):
697
+ ctx.delta_softplus = delta_softplus
698
+ # backend = "oflex" if WITH_SELECTIVESCAN_OFLEX and (backend is None) else backend
699
+ # backend = "core" if WITH_SELECTIVESCAN_CORE and (backend is None) else backend
700
+ backend = "mamba" if WITH_SELECTIVESCAN_MAMBA and (backend is None) else backend
701
+ ctx.backend = backend
702
+ if backend == "oflex":
703
+ out, x, *rest = selective_scan_cuda_oflex.fwd(u, delta, A, B, C, D, delta_bias, delta_softplus, 1, oflex)
704
+ elif backend == "mamba":
705
+ out, x, *rest = selective_scan_cuda.fwd(u, delta, A, B, C, D, None, delta_bias, delta_softplus)
706
+ ctx.save_for_backward(u, delta, A, B, C, D, delta_bias, x)
707
+ return out
708
+
709
+ @staticmethod
710
+ @torch.cuda.amp.custom_bwd
711
+ def backward(ctx, dout, *args):
712
+ u, delta, A, B, C, D, delta_bias, x = ctx.saved_tensors
713
+ backend = ctx.backend
714
+ if dout.stride(-1) != 1:
715
+ dout = dout.contiguous()
716
+ if backend == "oflex":
717
+ du, ddelta, dA, dB, dC, dD, ddelta_bias, *rest = selective_scan_cuda_oflex.bwd(
718
+ u, delta, A, B, C, D, delta_bias, dout, x, ctx.delta_softplus, 1
719
+ )
720
+ elif backend == "mamba":
721
+ du, ddelta, dA, dB, dC, dD, ddelta_bias, *rest = selective_scan_cuda.bwd(
722
+ u, delta, A, B, C, D, None, delta_bias, dout, x, None, None, ctx.delta_softplus,
723
+ False
724
+ )
725
+ return du, ddelta, dA, dB, dC, dD, ddelta_bias, None, None, None
726
+
727
+
728
+ def selective_scan_fn(
729
+ u: torch.Tensor, # (B, K * C, L)
730
+ delta: torch.Tensor, # (B, K * C, L)
731
+ A: torch.Tensor, # (K * C, N)
732
+ B: torch.Tensor, # (B, K, N, L)
733
+ C: torch.Tensor, # (B, K, N, L)
734
+ D: torch.Tensor = None, # (K * C)
735
+ delta_bias: torch.Tensor = None, # (K * C)
736
+ delta_softplus=True,
737
+ oflex=True,
738
+ backend=None,
739
+ ):
740
+ fn = selective_scan_torch if backend == "torch" or (not WITH_SELECTIVESCAN_MAMBA) else SelectiveScanCuda.apply
741
+ return fn(u, delta, A, B, C, D, delta_bias, delta_softplus, oflex, backend)
742
+
743
+ ##########################################################
744
+ ############## HuggingFace modeling file #################
745
+ ##########################################################
746
+
747
+ class VMambaLinear2d(nn.Linear):
748
+ def __init__(self, *args, groups=1, **kwargs):
749
+ nn.Linear.__init__(self, *args, **kwargs)
750
+ self.groups = groups
751
+
752
+ def forward(self, x: torch.Tensor):
753
+ if len(x.shape) == 4:
754
+ return F.conv2d(x, self.weight[:, :, None, None], self.bias, groups=self.groups)
755
+ elif len(x.shape) == 3:
756
+ return F.conv1d(x, self.weight[:, :, None], self.bias, groups=self.groups)
757
+
758
+ def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs):
759
+ self_state_dict = self.state_dict()
760
+ load_state_dict_keys = list(state_dict.keys())
761
+ if prefix + "weight" in load_state_dict_keys:
762
+ state_dict[prefix + "weight"] = state_dict[prefix + "weight"].view_as(self_state_dict["weight"])
763
+ return super()._load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)
764
+
765
+
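`VMambaLinear2d` stores an ordinary `nn.Linear` weight but applies it as a 1x1 convolution, so channel-first tensors never need to be permuted. The following is a minimal sketch of that equivalence for the `groups=1` case, using only stock PyTorch calls; the tensor sizes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
linear = nn.Linear(8, 16)
x = torch.randn(2, 8, 5, 7)  # channel-first input, (B, C, H, W)

# Apply the linear weight as a 1x1 convolution over the channel dimension.
y_conv = F.conv2d(x, linear.weight[:, :, None, None], linear.bias)

# Reference: move channels last, apply nn.Linear, move channels back.
y_lin = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
```

Both paths compute the same per-pixel affine map; the convolution form simply avoids the two permutes.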
766
+ class VMambaLayerNorm2d(nn.LayerNorm):
767
+ def __init__(self, *args, **kwargs):
768
+ nn.LayerNorm.__init__(self, *args, **kwargs)
769
+
770
+ def forward(self, x: torch.Tensor):
771
+ x = x.permute(0, 2, 3, 1)
772
+ x = nn.LayerNorm.forward(self, x)
773
+ x = x.permute(0, 3, 1, 2)
774
+ return x
775
+
776
+
777
+ class VMambaPatchEmbeddings(nn.Module):
778
+ """
779
+ This class turns an input image tensor of shape `(batch_size, num_channels, height, width)` into the initial
779
+ `hidden_states` (patch embeddings), a channel-first feature map with `embed_dim` channels, to be consumed by a state-space model.
781
+ """
782
+
783
+ def __init__(self, num_channels=3, patch_size=4, embed_dim=96):
784
+ super().__init__()
785
+
786
+ stride = patch_size // 2
787
+ kernel_size = stride + 1
788
+ padding = 1
789
+
790
+ self.projection = nn.Sequential(
791
+ nn.Conv2d(num_channels, embed_dim // 2, kernel_size=kernel_size, stride=stride, padding=padding),
792
+ VMambaLayerNorm2d(embed_dim // 2),
793
+ nn.GELU(),
794
+ nn.Conv2d(embed_dim // 2, embed_dim, kernel_size=kernel_size, stride=stride, padding=padding),
795
+ VMambaLayerNorm2d(embed_dim),
796
+ )
797
+
798
+ def forward(self, x: torch.Tensor) -> torch.Tensor:
799
+ x = self.projection(x)
800
+ return x
801
+
802
+
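With the default `patch_size=4`, the stem uses two 3x3 convolutions with stride 2 and padding 1, so the spatial resolution is halved twice. The following is a minimal sketch of the resulting shape, assuming a 224x224 input (the input size is an assumption for illustration) and omitting the layer norms, which do not change the shape.

```python
import torch
import torch.nn as nn

patch_size, embed_dim = 4, 96
stride = patch_size // 2   # 2
kernel_size = stride + 1   # 3

# Same convolution settings as the patch embedding stem, without the norms.
stem = nn.Sequential(
    nn.Conv2d(3, embed_dim // 2, kernel_size=kernel_size, stride=stride, padding=1),
    nn.GELU(),
    nn.Conv2d(embed_dim // 2, embed_dim, kernel_size=kernel_size, stride=stride, padding=1),
)

with torch.no_grad():
    out = stem(torch.zeros(1, 3, 224, 224))
out_shape = tuple(out.shape)  # 224 -> 112 -> 56 along each spatial axis
```

Each convolution maps a side length `n` to `floor((n + 2 - 3) / 2) + 1`, which gives 112 and then 56 here.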
803
+ class VMambaDowsample(nn.Module):
804
+ """
805
+ This class downsamples the input tensor with a stride-2 3x3 convolution, optionally followed by a layer normalization.
806
+ """
807
+ def __init__(self, dim, out_dim, use_norm=True):
808
+ super().__init__()
809
+ self.down = nn.Conv2d(dim, out_dim, kernel_size=3, stride=2, padding=1)
810
+ self.norm = VMambaLayerNorm2d(out_dim) if use_norm else nn.Identity()
811
+
812
+ def forward(self, x):
813
+ x = self.down(x)
814
+ x = self.norm(x)
815
+ return x
816
+
817
+
818
+ class VMambaMlp(nn.Module):
819
+ def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):
820
+ super().__init__()
821
+ out_features = out_features or in_features
822
+ hidden_features = hidden_features or in_features
823
+ self.fc1 = VMambaLinear2d(in_features, hidden_features)
824
+ self.act = act_layer()
825
+ self.fc2 = VMambaLinear2d(hidden_features, out_features)
826
+ self.drop = nn.Dropout(drop)
827
+
828
+ def forward(self, x):
829
+ x = self.fc1(x)
830
+ x = self.act(x)
831
+ x = self.drop(x)
832
+ x = self.fc2(x)
833
+ x = self.drop(x)
834
+ return x
835
+
836
+
837
+ class SS2D(nn.Module):
838
+ def __init__(
839
+ self,
840
+ # basic dims ===========
841
+ d_model=96,
842
+ d_state=16,
843
+ ssm_ratio=2.0,
844
+ dt_rank="auto",
845
+ act_layer=nn.SiLU,
846
+ # dwconv ===============
847
+ d_conv=3,
848
+ conv_bias=True,
849
+ # ======================
850
+ dropout=0.0,
851
+ bias=False,
852
+ # dt init ==============
853
+ dt_min=0.001,
854
+ dt_max=0.1,
855
+ dt_init="random",
856
+ dt_scale=1.0,
857
+ dt_init_floor=1e-4,
858
+ # forward_type="v05_noz" is always used
859
+ # ======================
860
+ **kwargs,
861
+ ):
862
+ super().__init__()
863
+ self.k_group = 4
864
+ self.d_model = int(d_model)
865
+ self.d_state = int(d_state)
866
+ self.d_inner = int(ssm_ratio * d_model)
867
+ self.dt_rank = int(math.ceil(self.d_model / 16) if dt_rank == "auto" else dt_rank)
868
+ self.forward_core = partial(self.forward_corev2, force_fp32=False, no_einsum=True)
869
+ self.with_dconv = d_conv > 1
870
+
871
+ # In projection
872
+ self.in_proj = VMambaLinear2d(self.d_model, self.d_inner, bias=bias)
873
+ self.act: nn.Module = act_layer()
874
+
875
+ # Convolution
876
+ if self.with_dconv:
877
+ self.conv2d = nn.Conv2d(
878
+ in_channels=self.d_inner,
879
+ out_channels=self.d_inner,
880
+ groups=self.d_inner,
881
+ bias=conv_bias,
882
+ kernel_size=d_conv,
883
+ padding=(d_conv - 1) // 2,
884
+ )
885
+
886
+ # x_proj and dt_proj
887
+ self.x_proj = VMambaLinear2d(self.d_inner, self.k_group * (self.dt_rank + self.d_state * 2), groups=self.k_group, bias=False)
888
+ self.dt_projs = VMambaLinear2d(self.dt_rank, self.k_group * self.d_inner, groups=self.k_group, bias=False)
889
+
890
+ # out projection
891
+ self.out_proj = VMambaLinear2d(self.d_inner, self.d_model, bias=bias)
892
+ self.dropout = nn.Dropout(dropout) if dropout > 0. else nn.Identity()
893
+
894
+ # Initialization
895
+ self.A_logs, self.Ds, self.dt_projs_weight, self.dt_projs_bias = self.init_dt_A_D(
896
+ self.d_state, self.dt_rank, self.d_inner, dt_scale, dt_init, dt_min, dt_max, dt_init_floor, k_group=self.k_group,
897
+ )
898
+ self.dt_projs.weight.data = self.dt_projs_weight.data.view(self.dt_projs.weight.shape)
899
+ # self.dt_projs.bias.data = self.dt_projs_bias.data.view(self.dt_projs.bias.shape)
900
+ del self.dt_projs_weight
901
+ # del self.dt_projs_bias
902
+ # Define out_norm directly with "LN2D"
903
+ self.out_norm = VMambaLayerNorm2d(self.d_inner)
904
+
905
+ @staticmethod
906
+ def dt_init(dt_rank, d_inner, dt_scale=1.0, dt_init="random", dt_min=0.001, dt_max=0.1, dt_init_floor=1e-4):
907
+ dt_proj = nn.Linear(dt_rank, d_inner, bias=True)
908
+
909
+ dt_init_std = dt_rank**-0.5 * dt_scale
910
+ if dt_init == "constant":
911
+ nn.init.constant_(dt_proj.weight, dt_init_std)
912
+ elif dt_init == "random":
913
+ nn.init.uniform_(dt_proj.weight, -dt_init_std, dt_init_std)
914
+ else:
915
+ raise NotImplementedError
916
+
917
+ dt = torch.exp(
918
+ torch.rand(d_inner) * (math.log(dt_max) - math.log(dt_min))
919
+ + math.log(dt_min)
920
+ ).clamp(min=dt_init_floor)
921
+
922
+ inv_dt = dt + torch.log(-torch.expm1(-dt))
923
+ with torch.no_grad():
924
+ dt_proj.bias.copy_(inv_dt)
925
+
926
+ return dt_proj
927
+
928
+ @staticmethod
929
+ def A_log_init(d_state, d_inner, copies=-1, device=None, merge=True):
930
+ A = torch.arange(1, d_state + 1, dtype=torch.float32, device=device).view(1, -1).repeat(d_inner, 1).contiguous()
931
+ A_log = torch.log(A)
932
+ if copies > 0:
933
+ A_log = A_log[None].repeat(copies, 1, 1).contiguous()
934
+ if merge:
935
+ A_log = A_log.flatten(0, 1)
936
+ A_log = nn.Parameter(A_log)
937
+ A_log._no_weight_decay = True
938
+ return A_log
939
+
940
+ @staticmethod
941
+ def D_init(d_inner, copies=-1, device=None, merge=True):
942
+ D = torch.ones(d_inner, device=device)
943
+ if copies > 0:
944
+ D = D[None].repeat(copies, 1).contiguous()
945
+ if merge:
946
+ D = D.flatten(0, 1)
947
+ D = nn.Parameter(D)
948
+ D._no_weight_decay = True
949
+ return D
950
+
951
+ @classmethod
952
+ def init_dt_A_D(cls, d_state, dt_rank, d_inner, dt_scale, dt_init, dt_min, dt_max, dt_init_floor, k_group=4):
953
+ dt_projs = [
954
+ cls.dt_init(dt_rank, d_inner, dt_scale, dt_init, dt_min, dt_max, dt_init_floor)
955
+ for _ in range(k_group)
956
+ ]
957
+ dt_projs_weight = nn.Parameter(torch.stack([t.weight for t in dt_projs], dim=0))
958
+ dt_projs_bias = nn.Parameter(torch.stack([t.bias for t in dt_projs], dim=0))
959
+ del dt_projs
960
+
961
+ A_logs = cls.A_log_init(d_state, d_inner, copies=k_group, merge=True)
962
+ Ds = cls.D_init(d_inner, copies=k_group, merge=True)
963
+ return A_logs, Ds, dt_projs_weight, dt_projs_bias
964
+
965
+ def forward_corev2(
966
+ self,
967
+ x: torch.Tensor,
968
+ force_fp32=False,
969
+ no_einsum=True,
970
+ ):
971
+ B, D, H, W = x.shape
972
+ N = self.d_state
973
+ L = H * W
974
+
975
+ xs = cross_scan_fn(x, in_channel_first=True, out_channel_first=True)
976
+ x_dbl = self.x_proj(xs.view(B, -1, L))
977
+ dts, Bs, Cs = torch.split(x_dbl.view(B, self.k_group, -1, L), [self.dt_rank, N, N], dim=2)
978
+ dts = dts.contiguous().view(B, -1, L)
979
+ dts = self.dt_projs(dts)
980
+
981
+ xs = xs.view(B, -1, L)
982
+ dts = dts.contiguous().view(B, -1, L)
983
+ As = -self.A_logs.to(torch.float32).exp()
984
+ Ds = self.Ds.to(torch.float32)
985
+ Bs = Bs.contiguous().view(B, self.k_group, N, L)
986
+ Cs = Cs.contiguous().view(B, self.k_group, N, L)
987
+ delta_bias = self.dt_projs_bias.view(-1).to(torch.float32)
988
+
989
+ ys = selective_scan_fn(
990
+ xs, dts, As, Bs, Cs, Ds, delta_bias, delta_softplus=True, backend="mamba"
991
+ ).view(B, self.k_group, -1, H, W)
992
+
993
+ y = cross_merge_fn(ys, in_channel_first=True, out_channel_first=True)
994
+ y = y.view(B, -1, H, W)
995
+ y = self.out_norm(y)
996
+ return y.to(x.dtype)
997
+
998
+ def forward(self, x: torch.Tensor):
999
+ x = self.in_proj(x)
1000
+ x = self.conv2d(x) if self.with_dconv else x
1001
+
1002
+ x = self.act(x)
1003
+ y = self.forward_corev2(x)
1004
+
1005
+ out = self.dropout(self.out_proj(y))
1006
+ return out
1007
+
1008
+
1009
+ class VSSBlock(nn.Module):
1010
+ def __init__(
1011
+ self,
1012
+ hidden_dim: int = 0,
1013
+ drop_path: float = 0,
1014
+ ssm_d_state: int = 1,
1015
+ ssm_ratio=1.0,
1016
+ ssm_dt_rank: Any = "auto",
1017
+ ssm_act_layer=nn.SiLU,
1018
+ ssm_conv: int = 3,
1019
+ ssm_conv_bias=False,
1020
+ ssm_drop_rate: float = 0,
1021
+ mlp_ratio=4.0,
1022
+ mlp_act_layer=nn.GELU,
1023
+ mlp_drop_rate: float = 0.0,
1024
+ use_checkpoint: bool = False,
1025
+ post_norm: bool = False,
1026
+ **kwargs,
1027
+ ):
1028
+ super().__init__()
1029
+ self.ssm_branch = ssm_ratio > 0
1030
+ self.mlp_branch = mlp_ratio > 0
1031
+ self.use_checkpoint = use_checkpoint
1032
+ self.post_norm = post_norm
1033
+
1034
+ if self.ssm_branch:
1035
+ self.norm = VMambaLayerNorm2d(hidden_dim)
1036
+ self.op = SS2D(
1037
+ d_model=hidden_dim,
1038
+ d_state=ssm_d_state,
1039
+ ssm_ratio=ssm_ratio,
1040
+ dt_rank=ssm_dt_rank,
1041
+ act_layer=ssm_act_layer,
1042
+ d_conv=ssm_conv,
1043
+ conv_bias=ssm_conv_bias,
1044
+ dropout=ssm_drop_rate,
1045
+ )
1046
+
1047
+ self.drop_path = DropPath(drop_path)
1048
+
1049
+ if self.mlp_branch:
1050
+ self.norm2 = VMambaLayerNorm2d(hidden_dim)
1051
+ mlp_hidden_dim = int(hidden_dim * mlp_ratio)
1052
+ self.mlp = VMambaMlp(in_features=hidden_dim, hidden_features=mlp_hidden_dim, act_layer=mlp_act_layer, drop=mlp_drop_rate)
1053
+
1054
+ def _forward(self, input: torch.Tensor):
1055
+ x = input
1056
+ if self.ssm_branch:
1057
+ if self.post_norm:
1058
+ x = x + self.drop_path(self.norm(self.op(x)))
1059
+ else:
1060
+ x = x + self.drop_path(self.op(self.norm(x)))
1061
+ if self.mlp_branch:
1062
+ if self.post_norm:
1063
+ x = x + self.drop_path(self.norm2(self.mlp(x)))
1064
+ else:
1065
+ x = x + self.drop_path(self.mlp(self.norm2(x)))
1066
+ return x
1067
+
1068
+ def forward(self, input: torch.Tensor):
1069
+ if self.use_checkpoint:
1070
+ return checkpoint.checkpoint(self._forward, input)
1071
+ else:
1072
+ return self._forward(input)
1073
+
1074
+ class VMambaLayer(nn.Module):
1075
+
1076
+ def __init__(
1077
+ self,
1078
+ input_dim,
1079
+ depth,
1080
+ drop_path=0.0,
1081
+ norm_layer=VMambaLayerNorm2d,
1082
+ downsample=nn.Identity(),
1083
+ use_checkpoint=False,
1084
+ **kwargs,
1085
+ ):
1086
+ super().__init__()
1087
+ self.input_dim = input_dim
1088
+ self.use_checkpoint = use_checkpoint
1089
+
1090
+ self.blocks = nn.ModuleList()
1091
+ for i in range(depth):
1092
+ self.blocks.append(
1093
+ VSSBlock(hidden_dim=input_dim,
1094
+ drop_path=drop_path[i] if isinstance(drop_path, list) else drop_path,
1095
+ norm_layer=norm_layer,use_checkpoint=use_checkpoint,**kwargs,
1096
+ )
1097
+ )
1098
+
1099
+ self.downsample = downsample
1100
+
1101
+ def forward(self, x):
1102
+ for block in self.blocks:
1103
+ x = block(x)
1104
+
1105
+ x = self.downsample(x)
1106
+ return x
1107
+
1108
+ class VMambaPreTrainedModel(PreTrainedModel):
1109
+ """
1110
+ An abstract class to handle weights initialization and
1111
+ a simple interface for downloading and loading pretrained models.
1112
+ """
1113
+
1114
+ config_class = VMambaConfig
1115
+ base_model_prefix = "vmamba"
1116
+ supports_gradient_checkpointing = False
1117
+
1118
+ def _init_weights(self, module: Union[nn.Linear, nn.Conv2d, nn.LayerNorm]) -> None:
1119
+ """Initialize the weights"""
1120
+ if isinstance(module, nn.Linear):
1121
+ trunc_normal_(module.weight, std=0.02)
1122
+ if isinstance(module, nn.Linear) and module.bias is not None:
1123
+ nn.init.constant_(module.bias, 0)
1124
+ elif isinstance(module, nn.LayerNorm):
1125
+ nn.init.constant_(module.bias, 0)
1126
+ nn.init.constant_(module.weight, 1.0)
1127
+
1128
+
1129
+ class VMambaModel(VMambaPreTrainedModel):
1130
+ def __init__(self, config):
1131
+ super().__init__(config)
1132
+ self.config = config
1133
+
1134
+ dims = config.dims
1135
+ if isinstance(dims, int):
1136
+ dims = [int(dims * 2**i_layer) for i_layer in range(len(config.depths))]
1137
+
1138
+ self.dims = dims
1139
+ self.patch_embeddings = VMambaPatchEmbeddings(patch_size=config.patch_size,
1140
+ embed_dim=dims[0])
1141
+
1142
+ self.num_layers = len(config.depths)
1143
+ dpr = [x.item() for x in torch.linspace(0, config.drop_path_rate, sum(config.depths))]
1144
+ self.num_features = dims[-1]
1145
+
1146
+ self.layers = nn.ModuleList()
1147
+ for i in range(self.num_layers):
1148
+ layer = VMambaLayer(
1149
+ input_dim=self.dims[i],
1150
+ depth=config.depths[i],
1151
+ drop_path=dpr[sum(config.depths[:i]):sum(config.depths[:i+1])],
1152
+ downsample=VMambaDowsample(self.dims[i], self.dims[i+1]) if i < self.num_layers - 1 else nn.Identity(),
1153
+ use_checkpoint=config.use_checkpoint,
1154
+ )
1155
+ self.layers.append(layer)
1156
+
1157
+ self.norm = VMambaLayerNorm2d(self.num_features)
1158
+ self.avgpool = nn.AdaptiveAvgPool2d(1)
1159
+
1160
+ def get_input_embeddings(self) -> VMambaPatchEmbeddings:
1161
+ return self.patch_embeddings
1162
+
1163
+ def forward(self, input_values: torch.Tensor):
1164
+ x = self.patch_embeddings(input_values)
1165
+ for layer in self.layers:
1166
+ x = layer(x)
1167
+ x = self.norm(x)
1168
+ x = self.avgpool(x).flatten(1)
1169
+ return x
1170
+
1171
+
1172
+ class VMambaForImageClassification(VMambaPreTrainedModel):
1173
+ def __init__(self, config):
1174
+ super().__init__(config)
1175
+
1176
+ self.num_classes = config.num_classes
1177
+ self.vmamba = VMambaModel(config)
1178
+ self.head = nn.Linear(self.vmamba.num_features, self.num_classes) if self.num_classes > 0 else nn.Identity()
1179
+
1180
+ # Initialize weights and apply final processing
1181
+ self.post_init()
1182
+
1183
+ def forward(
1184
+ self,
1185
+ pixel_values: Optional[torch.Tensor] = None,
1186
+ labels: Optional[torch.Tensor] = None,
1187
+ return_dict: Optional[bool] = None,
1188
+ ):
1189
+
1190
+ outputs = self.vmamba(
1191
+ pixel_values,
1192
+ )
1193
+
1194
+ logits = self.head(outputs)
1195
+
1196
+ loss = None
1197
+ if labels is not None:
1198
+ labels = labels.to(logits.device)
1199
+ if self.config.loss_type == "ce":
1200
+ loss_fct = CrossEntropyLoss()
1201
+ loss = loss_fct(logits.view(-1, self.num_classes), labels.view(-1))
1202
+ elif self.config.loss_type == "bce":
1203
+ loss_fct = BCEWithLogitsLoss()
1204
+ loss = loss_fct(logits, labels)
1205
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1206
+ if not return_dict:
1207
+ output = (logits,) + (outputs,)
1208
+ return ((loss,) + output) if loss is not None else output
1209
+
1210
+ return ImageClassifierOutput(
1211
+ loss=loss,
1212
+ logits=logits,
1213
+ hidden_states=outputs,
1214
+ )
1215
+
1216
+ __all__ = [
1217
+ "VMambaModel",
1218
+ "VMambaPreTrainedModel",
1219
+ "VMambaForImageClassification",
1220
+ ]
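
The bias initialization in `dt_init` is easy to misread. The scan is called with `delta_softplus=True`, so the stored bias has to be the inverse softplus of the desired step size, which is what `dt + torch.log(-torch.expm1(-dt))` computes. The following is a minimal sketch of that relationship in plain Python rather than PyTorch. The helper names `softplus`, `inverse_softplus`, and `sample_dt` are illustrative and do not appear in `modeling_vmamba.py`; the default values mirror the `dt_min`, `dt_max`, and `dt_init_floor` defaults shown above.

```python
import math
import random


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def inverse_softplus(y: float) -> float:
    """Return x such that softplus(x) == y, for y > 0.

    Scalar version of `dt + torch.log(-torch.expm1(-dt))` from `dt_init`:
    log(exp(y) - 1) rewritten as y + log(1 - exp(-y)).
    """
    return y + math.log(-math.expm1(-y))


def sample_dt(dt_min: float = 0.001, dt_max: float = 0.1, dt_init_floor: float = 1e-4) -> float:
    """Draw one step size log-uniformly from [dt_min, dt_max), then clamp from below."""
    u = random.random()
    dt = math.exp(u * (math.log(dt_max) - math.log(dt_min)) + math.log(dt_min))
    return max(dt, dt_init_floor)


random.seed(0)
dts = [sample_dt() for _ in range(8)]            # target step sizes
biases = [inverse_softplus(dt) for dt in dts]    # values that would be stored as the bias
recovered = [softplus(b) for b in biases]        # what the scan sees after softplus
```

Under these assumptions every entry of `recovered` matches the corresponding entry of `dts` up to floating-point error, and every bias is negative because the sampled step sizes stay below `log(2)`.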