Collections
Collections including paper arxiv:2509.06951

- Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
  Paper • 2508.09789 • Published • 5
- MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
  Paper • 2508.13186 • Published • 17
- ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
  Paper • 2508.04038 • Published • 1
- Prompt Orchestration Markup Language
  Paper • 2508.13948 • Published • 48

- A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
  Paper • 2507.01925 • Published • 37
- DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
  Paper • 2507.04447 • Published • 43
- A Survey on Vision-Language-Action Models for Autonomous Driving
  Paper • 2506.24044 • Published • 14
- EmbRACE-3K: Embodied Reasoning and Action in Complex Environments
  Paper • 2507.10548 • Published • 36

- Unified Vision-Language-Action Model
  Paper • 2506.19850 • Published • 27
- SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
  Paper • 2506.01844 • Published • 131
- 3D-VLA: A 3D Vision-Language-Action Generative World Model
  Paper • 2403.09631 • Published • 10
- QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
  Paper • 2312.14457 • Published • 1

- Personalize Anything for Free with Diffusion Transformer
  Paper • 2503.12590 • Published • 43
- R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
  Paper • 2503.12937 • Published • 30
- Exploring the Vulnerabilities of Federated Learning: A Deep Dive into Gradient Inversion Attacks
  Paper • 2503.11514 • Published • 18
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
  Paper • 2502.19328 • Published • 22

- Reconstruction Alignment Improves Unified Multimodal Models
  Paper • 2509.07295 • Published • 37
- F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions
  Paper • 2509.06951 • Published • 26
- UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward
  Paper • 2509.06818 • Published • 27
- UniVerse-1: Unified Audio-Video Generation via Stitching of Experts
  Paper • 2509.06155 • Published • 13

- Reinforcement Learning in Vision: A Survey
  Paper • 2508.08189 • Published • 27
- Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels
  Paper • 2508.17437 • Published • 35
- Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation
  Paper • 2509.00428 • Published • 17
- F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions
  Paper • 2509.06951 • Published • 26

- A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
  Paper • 2507.01925 • Published • 37
- Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
  Paper • 2507.16746 • Published • 34
- MolmoAct: Action Reasoning Models that can Reason in Space
  Paper • 2508.07917 • Published • 42
- Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
  Paper • 2508.20072 • Published • 29

- SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models
  Paper • 2506.04180 • Published • 33
- AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation
  Paper • 2506.10540 • Published • 37
- AutoMind: Adaptive Knowledgeable Agent for Automated Data Science
  Paper • 2506.10974 • Published • 18
- SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search
  Paper • 2507.15245 • Published • 11

- facebook/w2v-bert-2.0
  Feature Extraction • 0.6B • Updated • 207k • 180
- facebook/metaclip-h14-fullcc2.5b
  Zero-Shot Image Classification • 1.0B • Updated • 43.1k • 44
- openai/clip-vit-large-patch14
  Zero-Shot Image Classification • 0.4B • Updated • 8.36M • 1.86k
- Salesforce/blip-image-captioning-large
  Image-to-Text • 0.5B • Updated • 1.22M • 1.4k