HallOumi GRPO
HallOumi training data prepared for a GRPO trainer.
TEEN-D/grpo-oumi-anli-subset • Updated Apr 25 • 21.1k • 7
TEEN-D/grpo-oumi-c2d-d2c-subset • Updated Apr 24 • 14.4k • 9
TEEN-D/grpo-oumi-synthetic-claims • Updated Apr 24 • 19.2k • 8
TEEN-D/grpo-oumi-synthetic-document-claims • Updated Apr 24 • 8.4k • 14
Reinforcement Learning
A collection of various reinforcement learning environments with corresponding solutions, covering tabular methods up to deep learning approaches like …
TEEN-D/squiral_maze • Reinforcement Learning • Updated Mar 30
TEEN-D/Tabular_RL_For_Multi_Env • Reinforcement Learning • Updated Mar 30
TEEN-D/RxRovers_Roaming_for_Rapid_Relief • Reinforcement Learning • Updated Mar 30