arxiv:2510.03574

Efficient Test-Time Scaling for Small Vision-Language Models

Published on Oct 3
· Submitted by onurcan on Oct 6
Abstract

Two novel test-time scaling strategies, Test-Time Augmentation and Test-Time Adaptation, improve small vision-language models' performance without compromising computational efficiency.

AI-generated summary

Small Vision-Language Models (VLMs) provide a computationally efficient alternative to larger models, at the cost of weaker generalization abilities and downstream task performance. These shortcomings could be addressed by test-time scaling techniques, but existing methods are typically computationally demanding, contradicting the resource-efficient design goals of small models. To address these limitations, we propose two novel and efficient test-time scaling strategies that leverage the model-internal features rather than external supervision: (i) Test-Time Augmentation (TTAug), which generates multiple augmented inputs and aggregates outputs at the token level without parameter updates, and (ii) Test-Time Adaptation (TTAdapt), which adapts model parameters during inference using consensus-based pseudolabels from TTAug. Through extensive experiments across nine benchmarks, we demonstrate consistent performance improvements while maintaining computational efficiency suitable for resource-constrained environments. The generality of our approach is demonstrated both within models at different scales and across different VLMs without additional tuning.
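The token-level aggregation behind TTAug can be illustrated with a minimal sketch. The helper below is hypothetical (it is not the paper's implementation): it decodes each augmented view independently and then takes a per-position majority vote over the resulting token sequences, with no parameter updates. The real method aggregates at the token level inside the model; this simplified version votes over already-decoded tokens.

```python
from collections import Counter

def token_level_vote(candidate_outputs):
    """Aggregate several decoded token sequences by per-position majority
    vote (a simplified stand-in for token-level aggregation)."""
    max_len = max(len(seq) for seq in candidate_outputs)
    PAD = None
    # Pad shorter sequences with a sentinel so every position is votable.
    padded = [list(seq) + [PAD] * (max_len - len(seq)) for seq in candidate_outputs]
    voted = []
    for position in zip(*padded):
        # Vote only over real tokens at this position; at least one exists.
        token, _ = Counter(t for t in position if t is not PAD).most_common(1)[0]
        voted.append(token)
    return voted

def tt_augment(model_fn, inputs, augment_fns):
    """Run the model on each augmented view of the input and aggregate the
    outputs at the token level, without updating any parameters."""
    outputs = [model_fn(aug(inputs)) for aug in augment_fns]
    return token_level_vote(outputs)
```

Here `model_fn` and `augment_fns` are illustrative stand-ins; in practice they would be a VLM's generate call and image/prompt augmentations (crops, flips, prompt rephrasings).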

Community

Paper author and submitter:

We propose two efficient and effective methods for improving small multimodal language models at test time: TTAug (input augmentation + token-level aggregation) and TTAdapt (parameter adaptation via pseudolabels).

๐ŸŒ Project Page: https://monurcan.github.io/efficient_test_time_scaling
๐Ÿ’ป Code: https://github.com/monurcan/efficient_test_time_scaling
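The consensus step that feeds TTAdapt can be sketched as follows. This is a hypothetical illustration, not the paper's code: the consensus rule and threshold are assumptions. The idea is that TTAug's multiple candidate outputs are only trusted as a pseudolabel when they agree strongly enough.

```python
from collections import Counter

def consensus_pseudolabel(candidate_answers, threshold=0.5):
    """Derive a consensus pseudolabel from TTAug candidate outputs.
    Returns None when agreement is too weak to trust as a training
    signal (the 0.5 threshold here is illustrative)."""
    answer, count = Counter(candidate_answers).most_common(1)[0]
    if count / len(candidate_answers) >= threshold:
        return answer  # confident enough to adapt parameters toward it
    return None  # skip adaptation for this sample
```

In TTAdapt, an accepted pseudolabel would then supervise a lightweight parameter update (e.g., a few gradient steps on the pseudolabeled sample) during inference, keeping the overall cost compatible with small models.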

