arXiv:2509.24285

SCI-Verifier: Scientific Verifier with Thinking

Published on Sep 29
· Submitted by Shenghe Zheng on Sep 30
AI-generated summary

A framework combining SCI-VerifyBench and SCI-Verifier addresses challenges in verifying LLM-generated scientific answers through cross-disciplinary benchmarks and reasoning-augmented verification.

Abstract

As large language models (LLMs) are increasingly applied to scientific reasoning, the complexity of answer formats and the diversity of equivalent expressions make answer verification a critical yet challenging task. Existing verification studies in scientific domains suffer from two major limitations: (a) the absence of systematic evaluation standards and insufficient disciplinary coverage, which hinders their comprehensive assessment; and (b) heavy reliance on cumbersome rule design or prompt engineering, which reduces their effectiveness in complex reasoning scenarios or limits their cross-disciplinary generalization. To address these challenges, we propose solutions at both the data and model levels. On the data side, we construct SCI-VerifyBench, a cross-disciplinary benchmark covering mathematics, physics, biology, chemistry, and general scientific QA. The benchmark is built from real LLM responses and enhanced with domain-specific equivalence transformations that generate challenging and realistic data. Model-based and expert annotations ensure both quality and diversity, enabling rigorous evaluation of verification ability. On the model side, we emphasize the importance of reasoning for verification and introduce SCI-Verifier, a unified reasoning-augmented verifier for scientific domains. Through post-training, SCI-Verifier demonstrates strong logical reasoning and equivalence judgment capabilities while maintaining concise and stable outputs. Together, SCI-VerifyBench and SCI-Verifier provide a principled framework for scientific verification, offering both systematic evaluation and practical pathways to enhance the reliability and applicability of LLMs in scientific domains.
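
To make the verification setup concrete, below is a minimal, hypothetical sketch of how a reasoning-augmented verifier could be asked to judge whether a model's answer is equivalent to a reference answer. This is not the paper's released implementation: the `call_llm` backend, the prompt wording, and the `verify_answer` helper are all illustrative assumptions.

```python
# Illustrative sketch only -- NOT the paper's released code.
# `call_llm` is a hypothetical stand-in for any chat-completion backend
# (an API client or a locally hosted model).

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; plug in your own backend here."""
    raise NotImplementedError("replace with your own LLM client")


def verify_answer(question: str, reference: str, candidate: str) -> bool:
    """Ask a reasoning-capable verifier whether `candidate` matches `reference`.

    The verifier is prompted to reason about domain-specific equivalences
    (algebraic identity, unit conversion, notation differences) before
    emitting a single machine-parsable verdict line.
    """
    prompt = (
        "You are a scientific answer verifier.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {candidate}\n"
        "Reason step by step about whether the candidate is scientifically "
        "equivalent to the reference, then finish with exactly one line: "
        "'Verdict: correct' or 'Verdict: incorrect'."
    )
    response = call_llm(prompt)

    # Parse only the final verdict line so the free-form reasoning above it
    # cannot confuse the downstream consumer.
    verdicts = [ln for ln in response.splitlines() if ln.lower().startswith("verdict:")]
    if not verdicts:
        return False  # treat unparsable output as a failed verification
    return verdicts[-1].split(":", 1)[1].strip().lower() == "correct"
```

The fixed "Verdict:" line is just one simple way to keep parsing deterministic; the abstract notes that SCI-Verifier itself is post-trained to keep its outputs concise and stable.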


Models citing this paper: 0
Datasets citing this paper: 1
Spaces citing this paper: 0
Collections including this paper: 2