Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2504.13161

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 31
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 108
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 27
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 102

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Paper • 2504.13161 • Published 11 days ago • 87

Reading-Paper-List

BitNet b1.58 2B4T Technical Report

Paper • 2504.12285 • Published 12 days ago • 67
DataDecide: How to Predict Best Pretraining Data with Small Experiments

Paper • 2504.11393 • Published 14 days ago • 17
Efficient Process Reward Model Training via Active Learning

Paper • 2504.10559 • Published 15 days ago • 13
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Paper • 2504.13161 • Published 11 days ago • 87

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Paper • 2504.13161 • Published 11 days ago • 87

stuff i never have time to read

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Paper • 2504.13161 • Published 11 days ago • 87
Hebbian Learning based Orthogonal Projection for Continual Learning of Spiking Neural Networks

Paper • 2402.11984 • Published Feb 19, 2024
BlackGoose Rimer: Harnessing RWKV-7 as a Simple yet Superior Replacement for Transformers in Large-Scale Time Series Modeling

Paper • 2503.06121 • Published Mar 8 • 5
Timer: Transformers for Time Series Analysis at Scale

Paper • 2402.02368 • Published Feb 4, 2024

M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding

Paper • 2411.04952 • Published Nov 7, 2024 • 30
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models

Paper • 2411.05005 • Published Nov 7, 2024 • 13
M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models

Paper • 2411.04075 • Published Nov 6, 2024 • 17
Self-Consistency Preference Optimization

Paper • 2411.04109 • Published Nov 6, 2024 • 19

about 23 hours ago

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Paper • 2405.07526 • Published May 13, 2024 • 22
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Paper • 2405.15613 • Published May 24, 2024 • 18
A Touch, Vision, and Language Dataset for Multimodal Alignment

Paper • 2402.13232 • Published Feb 20, 2024 • 15
How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17, 2024 • 32

google/gemma-7b

Text Generation • Updated Jun 27, 2024 • 55.6k • 3.16k
Running

291

291

JARVIS

🔥

Voice Chat with JARVIS
Build error

98

98

DataMeasurementsTool

🤗
Scaling Laws of Synthetic Data for Language Models

Paper • 2503.19551 • Published Mar 25 • 1

interesting stuff

Chain-of-Verification Reduces Hallucination in Large Language Models

Paper • 2309.11495 • Published Sep 20, 2023 • 39
Adapting Large Language Models via Reading Comprehension

Paper • 2309.09530 • Published Sep 18, 2023 • 78
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

Paper • 2309.09400 • Published Sep 17, 2023 • 85
Language Modeling Is Compression

Paper • 2309.10668 • Published Sep 19, 2023 • 83

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs