arxiv:2506.12176

Fidelity Isn't Accuracy: When Linearly Decodable Functions Fail to Match the Ground Truth

Published on Jun 13, 2025

AI-generated summary

The linearity score quantifies the extent to which a neural network's behavior can be explained by a linear model, providing insights into model interpretability.

Abstract

Neural networks excel as function approximators, but their complexity often obscures the types of functions they learn, making it difficult to explain their behavior. To address this, the linearity score λ(f) is introduced, a simple and interpretable diagnostic that quantifies how well a regression network's output can be mimicked by a linear model. Defined as the R² between the network's predictions and those of a trained linear surrogate, λ(f) measures linear decodability: the extent to which the network's behavior aligns with a structurally simple model. This framework is evaluated on both synthetic and real-world datasets, using dataset-specific networks and surrogates. High λ(f) scores reliably indicate that the surrogate faithfully reproduces the network's outputs; however, they do not guarantee accuracy with respect to the ground truth. These results highlight the risk of using surrogate fidelity as a proxy for model understanding, especially in high-stakes regression tasks.
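
The following is a minimal sketch of how a linearity score of this kind could be computed: fit a linear surrogate to a trained regression network's predictions and report the R² between the two. The synthetic task, the MLP architecture, and the evaluation on training inputs are illustrative assumptions, not the paper's exact experimental setup.

```python
# Sketch of a linearity score lambda(f): R^2 between a regression network's
# predictions and those of a linear surrogate trained to mimic the network.
# All hyperparameters and the data-generating function are placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic regression task with a nonlinear ground truth (assumed).
X = rng.uniform(-2, 2, size=(2000, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 - 0.5 * X[:, 2]

# "Network" f: a small MLP regressor standing in for the paper's networks.
f = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
f.fit(X, y)
f_pred = f.predict(X)

# Linear surrogate g: trained on the network's outputs, not the labels.
g = LinearRegression().fit(X, f_pred)
g_pred = g.predict(X)

# lambda(f): R^2 between the network's and the surrogate's predictions.
linearity_score = r2_score(f_pred, g_pred)

# Accuracy against the ground truth, to illustrate that high surrogate
# fidelity (lambda) need not imply high accuracy.
print(f"lambda(f)            = {linearity_score:.3f}")
print(f"R^2(network, truth)  = {r2_score(y, f_pred):.3f}")
print(f"R^2(surrogate, truth)= {r2_score(y, g_pred):.3f}")
```

In this sketch the surrogate is fit and scored on the same inputs used to train the network; a held-out set could equally be used, and the fidelity/accuracy distinction is the same either way.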
