---
library_name: transformers
license: mit
pipeline_tag: depth-estimation
arxiv: <2502.19204>
tags:
- distill-any-depth
- vision
---

# Distill Any Depth Small - Transformers Version

## Introduction

We present Distill-Any-Depth, a state-of-the-art (SOTA) monocular depth estimation model trained with our proposed knowledge distillation algorithms. It was introduced in the paper [Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator](http://arxiv.org/abs/2502.19204).

This model checkpoint is compatible with the transformers library.

[Online demo](https://huggingface.co/spaces/xingyang1/Distill-Any-Depth).

### How to use

Here is how to use this model to perform zero-shot depth estimation:

```python
from transformers import pipeline
from PIL import Image
import requests

# Load the depth-estimation pipeline
pipe = pipeline(task="depth-estimation", model="xingyang1/Distill-Any-Depth-Small-hf")

# Load an example image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Run inference; "depth" holds the predicted depth map as a PIL image
depth = pipe(image)["depth"]
```
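
To inspect the result, it can help to view the input and the depth map next to each other. The following is a minimal sketch under stated assumptions: `side_by_side` is a hypothetical helper (not part of transformers), and the two synthetic images stand in for `image` and `depth` from the snippet above so that the example runs without downloading the model.

```python
from PIL import Image


def side_by_side(image: Image.Image, depth: Image.Image) -> Image.Image:
    """Place an RGB image and its depth map on one canvas (hypothetical helper)."""
    # Match the depth map to the input image's mode and size before pasting.
    depth_rgb = depth.convert("RGB").resize(image.size)
    canvas = Image.new("RGB", (image.width * 2, image.height))
    canvas.paste(image.convert("RGB"), (0, 0))
    canvas.paste(depth_rgb, (image.width, 0))
    return canvas


# Synthetic stand-ins for the real input image and the pipeline's depth output.
image = Image.new("RGB", (64, 48), color=(200, 30, 30))
depth = Image.new("L", (64, 48), color=128)

combined = side_by_side(image, depth)
```

With the real pipeline output, `combined.save("comparison.png")` would write the comparison to disk; the file name is only an example.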

Alternatively, you can use the model and processor classes:

```python
from transformers import AutoImageProcessor, AutoModelForDepthEstimation
import torch
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("xingyang1/Distill-Any-Depth-Small-hf")
model = AutoModelForDepthEstimation.from_pretrained("xingyang1/Distill-Any-Depth-Small-hf")

# Prepare the image for the model
inputs = image_processor(images=image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Interpolate the prediction back to the original image size
post_processed_output = image_processor.post_process_depth_estimation(
    outputs,
    target_sizes=[(image.height, image.width)],
)

# Min-max normalize to [0, 255] and convert to an 8-bit image for visualization
predicted_depth = post_processed_output[0]["predicted_depth"]
depth = (predicted_depth - predicted_depth.min()) / (predicted_depth.max() - predicted_depth.min())
depth = depth.detach().cpu().numpy() * 255
depth = Image.fromarray(depth.astype("uint8"))
```
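
The min-max normalization above divides by `max - min`, which is zero when the predicted map is constant. The sketch below shows one way to guard against that case, assuming the prediction has already been converted to a NumPy array. `normalize_depth` and its `eps` parameter are hypothetical names, not part of transformers, and unlike the snippet above it rounds to the nearest integer instead of truncating.

```python
import numpy as np


def normalize_depth(depth: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Min-max scale a depth map to uint8 in [0, 255] (hypothetical helper)."""
    depth = np.asarray(depth, dtype=np.float32)
    d_min = float(depth.min())
    span = float(depth.max()) - d_min
    if span < eps:
        # Constant map: there is no contrast to stretch, so return zeros.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / span
    return (scaled * 255.0).round().astype(np.uint8)


# Synthetic stand-in for predicted_depth.detach().cpu().numpy()
fake_depth = np.array([[0.5, 1.0], [1.5, 2.5]], dtype=np.float32)
depth_u8 = normalize_depth(fake_depth)

# A constant map no longer triggers a division by zero.
flat_u8 = normalize_depth(np.ones((2, 2), dtype=np.float32))
```

The resulting array can be passed to `Image.fromarray` in the same way as in the snippet above.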

If you find this project useful, please consider citing:

```bibtex
@article{he2025distill,
  title   = {Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator},
  author  = {Xiankang He and Dongyan Guo and Hongji Li and Ruibo Li and Ying Cui and Chi Zhang},
  year    = {2025},
  journal = {arXiv preprint arXiv:2502.19204}
}
```

## Model Card Author

[Parteek Kamboj](https://huggingface.co/keetrap)