papers-to-read - an esthor Collection

papers-to-read

updated 6 days ago

Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 272

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 260

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1 • 236

A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17 • 254

MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published Jul 4 • 153

GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21 • 131

Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

Paper • 2507.16784 • Published Jul 22 • 119

WebSailor: Navigating Super-human Reasoning for Web Agent

Paper • 2507.02592 • Published Jul 3 • 118

4KAgent: Agentic Any Image to 4K Super-Resolution

Paper • 2507.07105 • Published Jul 9 • 101

ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Paper • 2507.22827 • Published Jul 30 • 98

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2 • 236

DINOv3

Paper • 2508.10104 • Published Aug 13 • 261

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published Aug 8 • 181

VeriGUI: Verifiable Long-Chain GUI Dataset

Paper • 2508.04026 • Published Aug 6 • 157

On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published Aug 7 • 176

AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22 • 149

Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

Paper • 2509.08721 • Published 16 days ago • 617

A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Paper • 2508.18106 • Published Aug 25 • 340

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published 24 days ago • 206

A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published 16 days ago • 166

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

Paper • 2508.21148 • Published 29 days ago • 140

Why Language Models Hallucinate

Paper • 2509.04664 • Published 22 days ago • 178

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper • 2509.07980 • Published 17 days ago • 96

WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research

Paper • 2509.13312 • Published 10 days ago • 100

Scaling Agents via Continual Pre-training

Paper • 2509.13310 • Published 10 days ago • 103

WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents

Paper • 2509.06501 • Published 18 days ago • 77

Towards a Unified View of Large Language Model Post-Training

Paper • 2509.04419 • Published 22 days ago • 73

VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Paper • 2509.01055 • Published 25 days ago • 69

MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML

Paper • 2509.06806 • Published 18 days ago • 61

FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published 8 days ago • 100