Submitted by fangwu97 118 DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Stanford NLP 3
Submitted by pbicho 62 SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights HUAWEI Computing Systems Lab 346 5
Submitted by taesiri 59 VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators · 11 authors 34 3
Submitted by ziniuli 41 Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation ByteDance Seed 2
Submitted by waleko 30 PIPer: On-Device Environment Setup via Online Reinforcement Learning JetBrains Research 6 2
Submitted by taesiri 26 Code2Video: A Code-centric Paradigm for Educational Video Generation Show Lab 374 4
Submitted by XinXuNLPer 15 BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses · 6 authors 3 2
Submitted by wenhu 15 EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing TIGER-Lab 3
Submitted by tianyue818 14 Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution OPPO-Personal-AI-Lab 5 2
Submitted by yuntian-deng 13 Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls · 8 authors 7 3
Submitted by Benyucong 11 QUASAR: Quantum Assembly Code Generation Using Tool-Augmented LLMs via Agentic RL · 8 authors 1 2
Submitted by xx18 8 On Predictability of Reinforcement Learning Dynamics for Large Language Models · 9 authors 13 2
Submitted by gaotang 8 Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum · 5 authors 7 2
Submitted by taesiri 6 GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness · 5 authors 2
Submitted by huu-ontocord 5 MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources Ontocord.AI 3
Submitted by soujanyaporia 5 Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned Deep Cognition and Language Research (DeCLaRe) Lab 4 2
Submitted by ejhwang 5 Infusing Theory of Mind into Socially Intelligent LLM Agents University of British Columbia 2 2
Submitted by zptu 3 BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs Tencent 2
Submitted by tianchez 3 VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs Om AI Lab 2
Submitted by hao-li 3 An Empirical Study of Testing Practices in Open Source AI Agent Frameworks and Agentic Applications · 6 authors 2
Submitted by mboss 2 ReSWD: ReSTIR'd, not shaken. Combining Reservoir Sampling and Sliced Wasserstein Distance for Variance Reduction Stability AI 8 2
Submitted by RubinSun 2 CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs · 10 authors 3 2
Submitted by Minjong 2 In-Place Feedback: A New Paradigm for Guiding LLMs in Multi-Turn Reasoning · 7 authors 1
Submitted by BestWishYsh 2 BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration · 9 authors 2
Submitted by yuemithucsd 2 TGPO: Temporal Grounded Policy Optimization for Signal Temporal Logic Tasks Massachusetts Institute of Technology 2
Submitted by nielsr 2 Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models · 9 authors 2
Submitted by saturnMars 2 Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures · 5 authors 2 2