Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures
Abstract
A novel Hyperdimensional Probe method decodes information from LLM vector spaces using Vector Symbolic Architectures, providing interpretable insights into model states and failures.
Despite their capabilities, Large Language Models (LLMs) remain opaque, with limited understanding of their internal representations. Current interpretability methods, such as direct logit attribution (DLA) and sparse autoencoders (SAEs), offer only restricted insight, constrained by the model's output vocabulary or by features that lack clear names. This work introduces the Hyperdimensional Probe, a novel paradigm for decoding information from the LLM vector space. It combines ideas from symbolic representations and neural probing to project the model's residual stream onto interpretable concepts via Vector Symbolic Architectures (VSAs). The probe unites the strengths of SAEs and conventional probes while overcoming their key limitations. We validate the decoding paradigm on controlled input-completion tasks, probing the model's final state before next-token prediction on inputs spanning syntactic pattern recognition, key-value associations, and abstract inference. We further assess it in a question-answering setting, examining the model's state both before and after text generation. Our experiments show that the probe reliably extracts meaningful concepts across varied LLMs, embedding sizes, and input domains, and also helps identify LLM failures. Our work advances information decoding in the LLM vector space, enabling the extraction of more informative, interpretable, and structured features from neural representations.
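To make the VSA decoding idea concrete, below is a minimal conceptual sketch, not the paper's implementation. It assumes a MAP-style VSA with random bipolar hypervectors, where binding is the elementwise product (its own inverse), bundling is elementwise addition, and decoding is nearest-neighbor lookup by cosine similarity. The dimensionality `D`, the role/filler names, and the use of the bundled trace as a stand-in for the probe's learned projection of the residual stream are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8192  # hypervector dimensionality (illustrative choice)

def hv():
    """Random bipolar hypervector (MAP-style VSA)."""
    return rng.choice([-1.0, 1.0], size=D)

def bind(a, b):
    """Role-filler binding: elementwise product (self-inverse for bipolar vectors)."""
    return a * b

def bundle(*vs):
    """Superposition of several bound pairs: elementwise sum."""
    return np.sum(vs, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Codebooks of role and filler (concept) hypervectors -- illustrative concepts.
roles   = {"subject": hv(), "relation": hv(), "object": hv()}
fillers = {name: hv() for name in ["Paris", "France", "capital_of", "Rome", "Italy"]}

# Encode a key-value association as a bundled VSA trace.
trace = bundle(bind(roles["subject"],  fillers["Paris"]),
               bind(roles["relation"], fillers["capital_of"]),
               bind(roles["object"],   fillers["France"]))

# In the probing setting, a learned map would project the LLM residual stream
# into this VSA space; here the trace itself stands in for that projection.
def decode(trace_hv, role_hv, codebook):
    """Unbind a role and return the closest filler concept by cosine similarity."""
    query = bind(trace_hv, role_hv)
    return max(codebook, key=lambda name: cosine(query, codebook[name]))

print(decode(trace, roles["object"], fillers))  # -> 'France'
```

The high dimensionality is what makes this work: unbinding a bundled trace yields the correct filler plus near-orthogonal noise, so the nearest codebook entry recovers the intended concept with high probability.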
Community
This work combines symbolic representations and neural probing to introduce the Hyperdimensional Probe, a new paradigm for decoding the LLM vector space into human-interpretable features, consistently extracting meaningful concepts across language models and input types.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- AdaptiveK Sparse Autoencoders: Dynamic Sparsity Allocation for Interpretable LLM Representations (2025)
- How Multimodal LLMs Solve Image Tasks: A Lens on Visual Grounding, Task Reasoning, and Answer Decoding (2025)
- Let LLMs Speak Embedding Languages: Generative Text Embeddings via Iterative Contrastive Refinement (2025)
- Sparse Neurons Carry Strong Signals of Question Ambiguity in LLMs (2025)
- Compressing Large Language Models with PCA Without Performance Loss (2025)
- Prompt the Unseen: Evaluating Visual-Language Alignment Beyond Supervision (2025)
- Beyond Transcription: Mechanistic Interpretability in ASR (2025)