-
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Paper • 2511.16334 • Published • 93 -
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper • 2509.07980 • Published • 102 -
ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute
Paper • 2509.04475 • Published • 3 -
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper • 2512.01374 • Published • 104
Collections
Discover the best community collections!
Collections including paper arxiv:2511.06221
-
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Paper • 2511.06221 • Published • 133 -
Large Language Models for Scientific Idea Generation: A Creativity-Centered Survey
Paper • 2511.07448 • Published • 3 -
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
Paper • 2511.16043 • Published • 109
-
HuggingFaceTB/SmolLM3-3B
Text Generation • 3B • Updated • 61.3k • • 892 -
HuggingFaceFW/fineweb
Viewer • Updated • 52.5B • 208k • 2.64k -
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Paper • 2511.06221 • Published • 133 -
p-e-w/Llama-3.1-8B-Instruct-heretic
Text Generation • 8B • Updated • 170 • 9
-
Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning
Paper • 2510.20150 • Published • 5 -
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Paper • 2511.06221 • Published • 133 -
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Paper • 2508.10433 • Published • 144 -
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper • 2512.01374 • Published • 104
-
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Paper • 2504.20571 • Published • 98 -
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper • 2505.18129 • Published • 62 -
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Paper • 2503.16219 • Published • 52 -
Performance Trade-offs of Optimizing Small Language Models for E-Commerce
Paper • 2510.21970 • Published • 3
-
ARE: Scaling Up Agent Environments and Evaluations
Paper • 2509.17158 • Published • 36 -
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
Paper • 2510.08551 • Published • 33 -
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
Paper • 2510.04212 • Published • 25 -
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
Paper • 2510.12693 • Published • 28
-
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Paper • 2511.16334 • Published • 93 -
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper • 2509.07980 • Published • 102 -
ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute
Paper • 2509.04475 • Published • 3 -
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper • 2512.01374 • Published • 104
-
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Paper • 2511.06221 • Published • 133 -
Large Language Models for Scientific Idea Generation: A Creativity-Centered Survey
Paper • 2511.07448 • Published • 3 -
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
Paper • 2511.16043 • Published • 109
-
HuggingFaceTB/SmolLM3-3B
Text Generation • 3B • Updated • 61.3k • • 892 -
HuggingFaceFW/fineweb
Viewer • Updated • 52.5B • 208k • 2.64k -
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Paper • 2511.06221 • Published • 133 -
p-e-w/Llama-3.1-8B-Instruct-heretic
Text Generation • 8B • Updated • 170 • 9
-
Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning
Paper • 2510.20150 • Published • 5 -
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Paper • 2511.06221 • Published • 133 -
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Paper • 2508.10433 • Published • 144 -
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper • 2512.01374 • Published • 104
-
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Paper • 2504.20571 • Published • 98 -
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper • 2505.18129 • Published • 62 -
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Paper • 2503.16219 • Published • 52 -
Performance Trade-offs of Optimizing Small Language Models for E-Commerce
Paper • 2510.21970 • Published • 3
-
ARE: Scaling Up Agent Environments and Evaluations
Paper • 2509.17158 • Published • 36 -
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
Paper • 2510.08551 • Published • 33 -
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
Paper • 2510.04212 • Published • 25 -
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
Paper • 2510.12693 • Published • 28