Rethinking Thinking Tokens: LLMs as Improvement Operators
Abstract
Parallel-Distill-Refine (PDR) and Sequential Refinement (SR) use LLMs as improvement operators on their own drafts, achieving better accuracy at lower context length and latency than long chain-of-thought, with PDR delivering the largest gains on math tasks.
Reasoning training incentivizes LLMs to produce long chains of thought (long CoT), which, among other things, allows them to explore solution strategies with self-checking. This results in higher accuracy, but inflates context length, token/compute cost, and answer latency. We ask: Can current models leverage their metacognition to provide other combinations on this Pareto frontier, e.g., better accuracy with lower context length and/or latency? Abstractly, we view the model as an improvement operator on its own "thoughts", with a continuum of possible strategies. We identify an interesting inference family, Parallel-Distill-Refine (PDR), which performs the following: (i) generate diverse drafts in parallel; (ii) distill them into a bounded, textual workspace; and (iii) refine conditioned on this workspace, producing an output that seeds the next round. Importantly, context length (and hence compute cost) is controllable via the degree of parallelism and is no longer conflated with the total number of generated tokens. We report PDR instantiations of current models that give better accuracy than long CoT while incurring lower latency. Setting the degree of parallelism to 1 yields an interesting subcase, Sequential Refinement (SR), which iteratively improves a single candidate answer and outperforms long CoT. The success of such model orchestrations raises the question of whether further training could shift the Pareto frontier. To this end, we train an 8B thinking model with Reinforcement Learning (RL) to make it consistent with PDR as the inference method. On math tasks with verifiable answers, iterative pipelines surpass single-pass baselines at matched sequential budgets, with PDR delivering the largest gains (e.g., +11% on AIME 2024 and +9% on AIME 2025).
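To make the inference family concrete, here is a minimal Python sketch of one possible PDR loop, assuming a generic `generate` callable that wraps an LLM call; the prompt templates, the character-based workspace cap, and the sequential (rather than truly concurrent) drafting are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable, List

def parallel_distill_refine(
    problem: str,
    generate: Callable[[str], str],  # hypothetical LLM call: prompt -> completion
    parallelism: int = 4,            # number of drafts per round; 1 recovers SR
    rounds: int = 2,                 # number of PDR rounds
    workspace_limit: int = 2000,     # bound on the distilled workspace (chars here)
) -> str:
    """Sketch of the Parallel-Distill-Refine loop described in the abstract.

    Each round: (i) sample `parallelism` diverse drafts, (ii) distill them into a
    bounded textual workspace, (iii) refine conditioned on that workspace. The
    refined answer seeds the next round. With parallelism=1 this degenerates to
    Sequential Refinement (SR).
    """
    candidate = ""  # current best answer; empty before the first round
    for _ in range(rounds):
        # (i) Parallel drafting: a real system would issue these calls
        # concurrently; they are sequential here only for simplicity.
        draft_prompt = (
            f"Problem:\n{problem}\n\n"
            f"Current answer (may be empty):\n{candidate}\n\n"
            "Solve the problem."
        )
        drafts: List[str] = [generate(draft_prompt) for _ in range(parallelism)]

        # (ii) Distillation: compress the drafts into a bounded workspace, so
        # context length is controlled by `workspace_limit`, not by the total
        # number of generated tokens.
        distill_prompt = (
            f"Problem:\n{problem}\n\n"
            + "\n\n".join(f"Draft {i + 1}:\n{d}" for i, d in enumerate(drafts))
            + "\n\nSummarize the key insights, partial results, and errors across the drafts."
        )
        workspace = generate(distill_prompt)[:workspace_limit]

        # (iii) Refinement: produce an improved answer conditioned only on the
        # problem and the bounded workspace; it seeds the next round.
        refine_prompt = (
            f"Problem:\n{problem}\n\nWorkspace:\n{workspace}\n\n"
            "Using the workspace, write an improved, complete solution."
        )
        candidate = generate(refine_prompt)
    return candidate
```

Any completion or chat API could be plugged in via `generate` (e.g., `generate = lambda p: my_model.complete(p)`, a stand-in name); the point of the sketch is only that the refine step conditions on a bounded workspace rather than on all draft tokens.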
Community
Interesting paper
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Train Long, Think Short: Curriculum Learning for Efficient Reasoning (2025)
- Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors (2025)
- Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models (2025)
- Less is More Tokens: Efficient Math Reasoning via Difficulty-Aware Chain-of-Thought Distillation (2025)
- BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens (2025)
- RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems (2025)
- Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm (2025)