Go with Your Gut: Scaling Confidence for Autoregressive Image Generation
Abstract
ScalingAR enhances next-token prediction in autoregressive image generation by using token entropy and adaptive scaling, improving model performance and efficiency.
Test-time scaling (TTS) has demonstrated remarkable success in enhancing large language models, yet its application to next-token prediction (NTP) autoregressive (AR) image generation remains largely uncharted. Existing TTS approaches for visual AR (VAR), which rely on frequent partial decoding and external reward models, are ill-suited for NTP-based image generation due to the inherent incompleteness of intermediate decoding results. To bridge this gap, we introduce ScalingAR, the first TTS framework specifically designed for NTP-based AR image generation that eliminates the need for early decoding or auxiliary rewards. ScalingAR leverages token entropy as a novel signal in visual token generation and operates at two complementary scaling levels: (i) Profile Level, which streams a calibrated confidence state by fusing intrinsic and conditional signals; and (ii) Policy Level, which utilizes this state to adaptively terminate low-confidence trajectories and dynamically schedule guidance for phase-appropriate conditioning strength. Experiments on both general and compositional benchmarks show that ScalingAR (1) improves base models by 12.5% on GenEval and 15.2% on TIIF-Bench, (2) efficiently reduces visual token consumption by 62.0% while outperforming baselines, and (3) successfully enhances robustness, mitigating performance drops by 26.0% in challenging scenarios.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- SoftCFG: Uncertainty-guided Stable Guidance for Visual Autoregressive Model (2025)
- NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale (2025)
- STAGE: Stable and Generalizable GRPO for Autoregressive Image Generation (2025)
- Growing Visual Generative Capacity for Pre-Trained MLLMs (2025)
- Group Critical-token Policy Optimization for Autoregressive Image Generation (2025)
- AR-GRPO: Training Autoregressive Image Generation Models via Reinforcement Learning (2025)
- AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper