Blog, Articles, and discussions

Arabic Leaderboards: Introducing Arabic Instruction Following, Updating AraGen, and More

By April 8, 2025 guest • 19

Community Articles

CU-1 for Autonomous UI Agent Systems: An Open Alternative to Proprietary Solutions

•

3 days ago

• 12

Code a simple RAG from scratch

•

Oct 29, 2024

• 212

When Does Reasoning Matter? Unpacking the Contribution of Reasoning to LLM Performance

and 1 other •

4 days ago

• 10

How I Trained Action Chunking Transformer (ACT) on SO-101: My Journey, Gotchas, and Lessons

•

4 days ago

• 9

Preserving Agency: Why AI Safety Needs Community, Not Corporate Control

•

5 days ago

• 9

Uncensor any LLM with abliteration

•

Jun 13, 2024

• 685

Small Language Models (SLM): A Comprehensive Overview

•

Feb 22

• 78

Gaia2 Leaderboard Update: New Models and New Observations

and 3 others •

1 day ago

• 6

From GRPO to DAPO and GSPO: What, Why, and How

•

Aug 9

• 35

RexBERT: Encoders for a brave new world of E-Commerce

and 1 other •

13 days ago

• 46

Nemotron-Personas-Japan: Synthesized Data for Sovereign AI

and 6 others •

10 days ago

• 25

Nemotron-Personas-Japan: A Synthetic Dataset for Sovereign AI (in Japanese)

and 6 others •

8 days ago

• 7

Cactus: High-Performance AI Inference on Any Smartphone

•

about 10 hours ago

• 5

Introduction to State Space Models (SSM)

•

Jul 19, 2024

• 176

Practical arXiv Tips: How to Get Your Paper More Attention (in Chinese)

•

Jul 8, 2024

• 14

Fine-Tuning Your First Large Language Model (LLM) with PyTorch and Hugging Face

•

Feb 11

• 72

How to Train an Antibody Developability Model

and 1 other •

16 days ago

• 14

CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models

By May 24, 2024 guest • 22

Introducing the Open Arabic LLM Leaderboard

By May 14, 2024 guest • 100

Introducing the Open Leaderboard for Hebrew LLMs!

By May 5, 2024 guest • 51

Bringing the Artificial Analysis LLM Performance Leaderboard to Hugging Face

By May 3, 2024 guest • 14

Improving Prompt Consistency with Structured Generations

By April 30, 2024 guest • 66

Introducing the Open Chain of Thought Leaderboard

By April 23, 2024 guest • 36

The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare

By April 19, 2024 guest • 182

Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs

By April 16, 2024 guest • 15

Introducing the Chatbot Guardrails Arena

By March 21, 2024 guest • 6

Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes?

By March 5, 2024 guest • 4

TTS Arena: Benchmarking Text-to-Speech Models in the Wild

By February 27, 2024 guest • 71

Introducing the Red-Teaming Resistance Leaderboard

By February 23, 2024 guest • 13

Introducing the Open Ko-LLM Leaderboard: Leading the Korean LLM Evaluation Ecosystem

By February 20, 2024 guest • 4

NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates

By February 2, 2024 guest • 4

Community Articles

There is no such thing as a tokenizer-free lunch

•

9 days ago

• 71

ModernVBERT: Towards Smaller Visual Document Retrievers

and 4 others •

about 20 hours ago

• 20

Model Quality: Hugging Face Is All You Need

•

8 days ago

• 20
