---
library_name: transformers
license: mit
pipeline_tag: depth-estimation
arxiv: 2502.19204
tags:
- distill-any-depth
- vision
---

# Distill Any Depth Small - Transformers Version

## Introduction

We present Distill-Any-Depth, a state-of-the-art monocular depth estimation model trained with our proposed knowledge-distillation algorithms. It was introduced in the paper [Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator](http://arxiv.org/abs/2502.19204).

This model checkpoint is compatible with the Hugging Face `transformers` library. An [online demo](https://huggingface.co/spaces/xingyang1/Distill-Any-Depth) is available.

### How to use

Here is how to use this model to perform zero-shot depth estimation:

```python
from transformers import pipeline
from PIL import Image
import requests

# load pipe
pipe = pipeline(task="depth-estimation", model="xingyang1/Distill-Any-Depth-Small-hf")

# load image
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# inference
depth = pipe(image)["depth"]
```

Alternatively, you can use the model and processor classes:

```python
from transformers import AutoImageProcessor, AutoModelForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("xingyang1/Distill-Any-Depth-Small-hf")
model = AutoModelForDepthEstimation.from_pretrained("xingyang1/Distill-Any-Depth-Small-hf")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# interpolate to original size
post_processed_output = image_processor.post_process_depth_estimation(
    outputs,
    target_sizes=[(image.height, image.width)],
)

# normalize the predicted depth to [0, 255] and convert it to an 8-bit grayscale image
predicted_depth = post_processed_output[0]["predicted_depth"]
depth = (predicted_depth - predicted_depth.min()) / (predicted_depth.max() - predicted_depth.min())
depth = depth.detach().cpu().numpy() * 255
depth = Image.fromarray(depth.astype("uint8"))
```

If you find this project useful, please consider citing:

```bibtex
@article{he2025distill,
  title   = {Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator},
  author  = {Xiankang He and Dongyan Guo and Hongji Li and Ruibo Li and Ying Cui and Chi Zhang},
  year    = {2025},
  journal = {arXiv preprint arXiv:2502.19204}
}
```

## Model Card Author

[Parteek Kamboj](https://huggingface.co/keetrap)
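
## GPU Inference

The usage examples above run on CPU by default. Below is a minimal sketch of GPU inference in half precision; it relies only on the generic `transformers` pipeline arguments `device` and `torch_dtype`, not on anything specific to this checkpoint, so adjust it to your hardware.

```python
import torch
import requests
from PIL import Image
from transformers import pipeline

# device=0 selects the first CUDA device (use "cpu" if no GPU is available);
# torch_dtype=torch.float16 loads the weights in half precision to reduce memory
pipe = pipeline(
    task="depth-estimation",
    model="xingyang1/Distill-Any-Depth-Small-hf",
    device=0,
    torch_dtype=torch.float16,
)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# the pipeline returns a PIL image under "depth"; save it for inspection
depth = pipe(image)["depth"]
depth.save("depth.png")
```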