Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning • arXiv:2412.11120 • Published Dec 15, 2024
LLM-Empowered State Representation for Reinforcement Learning • arXiv:2407.13237 • Published Jul 18, 2024
Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning • arXiv:2309.12696 • Published Sep 22, 2023
Model Predictive Task Sampling for Efficient and Robust Adaptation • arXiv:2501.11039 • Published Jan 19, 2025
Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments • arXiv:2504.19139 • Published Apr 27, 2025
Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? • arXiv:2507.04632 • Published Jul 7, 2025