Arnas Uselis

Gigglingface

upvoted 7 papers 2 months ago

Does Data Scaling Lead to Visual Compositional Generalization?

Paper • 2507.07102 • Published Jul 9 • 1

OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion

Paper • 2507.06165 • Published Jul 8 • 56

High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning

Paper • 2507.05920 • Published Jul 8 • 11

Is Diversity All You Need for Scalable Robotic Manipulation?

Paper • 2507.06219 • Published Jul 8 • 20

A Survey on Latent Reasoning

Paper • 2507.06203 • Published Jul 8 • 90

On the rankability of visual embeddings

Paper • 2507.03683 • Published Jul 4 • 15

AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance

Paper • 2506.03828 • Published Jun 4 • 13

upvoted 10 papers 4 months ago

ImgEdit: A Unified Image Editing Dataset and Benchmark

Paper • 2505.20275 • Published May 26 • 18

MLLMs are Deeply Affected by Modality Bias

Paper • 2505.18657 • Published May 24 • 5

Exploring the Latent Capacity of LLMs for One-Step Text Generation

Paper • 2505.21189 • Published May 27 • 61

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

Paper • 2505.18445 • Published May 24 • 64

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs

Paper • 2505.21327 • Published May 27 • 82

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Paper • 2505.21497 • Published May 27 • 107

G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning

Paper • 2505.13426 • Published May 19 • 13

SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding

Paper • 2505.17012 • Published May 22 • 12

GRIT: Teaching MLLMs to Think with Images

Paper • 2505.15879 • Published May 21 • 12

Diffusion Classifiers Understand Compositionality, but Conditions Apply

Paper • 2505.17955 • Published May 23 • 22

upvoted an article 4 months ago

Vision Language Models (Better, Faster, Stronger)

Article • May 12 • 524

upvoted a paper 5 months ago

Intermediate Layer Classifiers for OOD generalization

Paper • 2504.05461 • Published Apr 7 • 1

upvoted a paper 7 months ago

CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally

Paper • 2502.03566 • Published Feb 5 • 2