Language - a neutrino12 Collection

updated 2 days ago

Snowflake/Arctic-Text2SQL-R1-7B

8B • Updated May 29 • 5.81k • 42
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 271
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 260
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Paper • 2506.16406 • Published Jun 19 • 126
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Paper • 2506.06395 • Published Jun 5 • 130
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 285
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 89
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

Paper • 2507.16784 • Published Jul 22 • 119
Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 294
∇NABLA: Neighborhood Adaptive Block-Level Attention

Paper • 2507.13546 • Published Jul 17 • 120
GR-3 Technical Report

Paper • 2507.15493 • Published Jul 21 • 45
MUR: Momentum Uncertainty guided Reasoning for Large Language Models

Paper • 2507.14958 • Published Jul 20 • 46
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization

Paper • 2507.15758 • Published Jul 21 • 34
Complex Logical Instruction Generation

Paper • 2508.09125 • Published 24 days ago • 38
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens

Paper • 2508.05305 • Published 30 days ago • 45
PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts

Paper • 2508.09848 • Published 24 days ago • 66
Train Long, Think Short: Curriculum Learning for Efficient Reasoning

Paper • 2508.08940 • Published 25 days ago • 25
Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment

Paper • 2508.07750 • Published 26 days ago • 19
Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal

Paper • 2508.05988 • Published 29 days ago • 19
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning

Paper • 2508.07101 • Published 27 days ago • 13
Compressing Chain-of-Thought in LLMs via Step Entropy

Paper • 2508.03346 • Published Aug 5 • 7
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning

Paper • 2508.09726 • Published 24 days ago • 13
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2 • 234
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published 29 days ago • 170
R-Zero: Self-Evolving Reasoning LLM from Zero Data

Paper • 2508.05004 • Published 30 days ago • 123
Story2Board: A Training-Free Approach for Expressive Storyboard Generation

Paper • 2508.09983 • Published 23 days ago • 67
Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models

Paper • 2508.02120 • Published Aug 4 • 19
Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following

Paper • 2508.02150 • Published Aug 4 • 36
Trainable Dynamic Mask Sparse Attention

Paper • 2508.02124 • Published Aug 4 • 16
Can Large Multimodal Models Actively Recognize Faulty Inputs? A Systematic Evaluation Framework of Their Input Scrutiny Ability

Paper • 2508.04017 • Published Aug 6 • 11
Deep Think with Confidence

Paper • 2508.15260 • Published 16 days ago • 81
PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing

Paper • 2508.11116 • Published 22 days ago • 22
Efficient Code Embeddings from Code Generation Models

Paper • 2508.21290 • Published 8 days ago • 14
Model-Task Alignment Drives Distinct RL Outcomes

Paper • 2508.21188 • Published 8 days ago • 8
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published 9 days ago • 85
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Paper • 2508.18756 • Published 11 days ago • 35
Hermes 4 Technical Report

Paper • 2508.18255 • Published 11 days ago • 34
StepWiser: Stepwise Generative Judges for Wiser Reasoning

Paper • 2508.19229 • Published 10 days ago • 19
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

Paper • 2508.18773 • Published 11 days ago • 14
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges

Paper • 2508.18076 • Published 11 days ago • 5
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles

Paper • 2508.16072 • Published 15 days ago • 3
Servant, Stalker, Predator: How An Honest, Helpful, And Harmless (3H) Agent Unlocks Adversarial Skills

Paper • 2508.19500 • Published 10 days ago • 1
CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning

Paper • 2508.15868 • Published 16 days ago • 3
Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts

Paper • 2508.10390 • Published 23 days ago • 1
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model

Paper • 2509.00676 • Published 6 days ago • 72
DCPO: Dynamic Clipping Policy Optimization

Paper • 2509.02333 • Published 4 days ago • 18