sparse-generative-ai

community

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

AppleSwing updated a Space 3 days ago

sparse-generative-ai/open-moe-llm-leaderboard

AppleSwing new activity 3 days ago

sparse-generative-ai/results:update

AppleSwing updated a dataset 3 days ago

sparse-generative-ai/results


AppleSwing

updated a Space 3 days ago

OPEN-MOE-LLM-LEADERBOARD

🔥

View and submit large language model evaluations

AppleSwing

in sparse-generative-ai/results 3 days ago

update

#4 opened 3 days ago by AppleSwing

updated a dataset 3 days ago

sparse-generative-ai/results

Updated 3 days ago • 191

AppleSwing

in sparse-generative-ai/results 3 days ago

update

#3 opened 3 days ago by AppleSwing

update

#2 opened 3 days ago by AppleSwing

in sparse-generative-ai/open-moe-llm-leaderboard 5 days ago

update

#34 opened 5 days ago by AppleSwing

update2

#35 opened 5 days ago by AppleSwing

in sparse-generative-ai/results 5 days ago

update

#1 opened 5 days ago by AppleSwing

zhiminy

posted an update about 1 month ago

Post

1933

Hey everyone,

We're thrilled to introduce our latest project: a hand-curated list of the best production-level generative AI open-source toolkits! 🚀

Check it out here: zhiminy/awesome-production-genai-search

pingnieuk

authored a paper 2 months ago

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

Paper • 2505.20139 • Published May 26 • 18

zhiminy

posted an update 3 months ago

Post

1908

# 🚀 SE Arena: Evaluating Foundation Models for Software Engineering

SE Arena is the first open-source platform for evaluating foundation models in real-world software engineering workflows.

## What makes it unique?

- RepoChat: Automatically injects repository context (issues, commits, PRs) into conversations for more realistic evaluations
- Multi-round interactions: Tests models through iterative workflows, not just single prompts
- Novel metrics: Includes a "model consistency score", which measures model determinism through self-play matches, and a "conversation efficiency index", which evaluates model performance while accounting for the number of interaction rounds required to reach a conclusion.
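The post does not give a formula for the model consistency score. As a rough illustration only, the sketch below assumes that each self-play match pits a model against itself on the same prompt and that the user's verdict is recorded as `"tie"`, `"left"`, or `"right"`; under that assumption, one plausible consistency score is the fraction of a model's self-play matches judged a tie. The `SelfPlayMatch` type and `consistency_scores` function are hypothetical names, not SE Arena's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class SelfPlayMatch:
    """Hypothetical record of one self-play match (a model answering against itself)."""
    model: str
    outcome: str  # assumed verdict: "tie", "left", or "right"


def consistency_scores(matches: list[SelfPlayMatch]) -> dict[str, float]:
    """Return, per model, the share of self-play matches judged a tie.

    This is an assumed definition for illustration; a higher tie rate is read
    as more consistent (deterministic) behaviour on repeated prompts.
    """
    totals: dict[str, int] = {}
    ties: dict[str, int] = {}
    for match in matches:
        totals[match.model] = totals.get(match.model, 0) + 1
        if match.outcome == "tie":
            ties[match.model] = ties.get(match.model, 0) + 1
    return {model: ties.get(model, 0) / n for model, n in totals.items()}


if __name__ == "__main__":
    history = [
        SelfPlayMatch("model-a", "tie"),
        SelfPlayMatch("model-a", "left"),
        SelfPlayMatch("model-b", "tie"),
    ]
    print(consistency_scores(history))
```

With the sample history above, `model-a` ties in one of its two self-play matches and `model-b` in its only one, so the sketch reports 0.5 and 1.0 respectively.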

Try it now: SE-Arena/Software-Engineering-Arena

## Why it matters

Traditional evaluation frameworks don't capture how developers actually use models in their daily work. SE Arena creates a testing environment that mirrors real engineering workflows, helping you choose the right model for your specific software development needs.

From debugging to requirement refinement, see which models truly excel at software engineering tasks!

pingnieuk

authored a paper 4 months ago

ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations

Paper • 2504.00824 • Published Apr 1 • 44

pingnieuk

authored a paper 6 months ago

ACECODER: Acing Coder RL via Automated Test-Case Synthesis

Paper • 2502.01718 • Published Feb 3 • 29

zhiminy

posted an update 8 months ago

Post

2273

Hey everyone,

We're thrilled to introduce our latest project: a hand-curated list of the best production-level machine-learning open-source toolkits! 🚀

Check it out here: zhiminy/Awesome-Production-Machine-Learning-Search

chivier

updated a Space 12 months ago

OPEN-MOE-LLM-LEADERBOARD

🔥

View and submit large language model evaluations

zhiminy

posted an update about 1 year ago

Post

814

Hey everyone,

We're thrilled to introduce our latest project: a hand-curated list of the best curated lists related to artificial intelligence! 🚀

Check it out here: https://github.com/zhimin-z/awesome-awesome-artificial-intelligence

zhiminy

posted an update about 1 year ago

Post

2005

Hey everyone!

Our team just dropped something cool! 🎉 We've published a new paper on arXiv that dives into foundation model leaderboards across different platforms. We analyzed the content, operational workflows, and common issues of these leaderboards, and from this we came up with two new concepts: Leaderboard Operations (LBOps) and leaderboard smells.

We also put together an awesome list with nearly 300 of the latest leaderboards, development tools, and publishing organizations. You can check it out here: https://github.com/SAILResearch/awesome-foundation-model-leaderboards

If you find it useful or interesting, give us a follow or drop a comment. We'd love to hear your thoughts and get your support! ✨

Link to the paper: https://arxiv.org/abs/2407.04065

lausannel

updated a dataset over 1 year ago

sparse-generative-ai/requests

Preview • Updated Apr 27, 2024 • 16


Team members 8