HRM Sudoku Extreme
A Hierarchical Reasoning Model (HRM) trained to solve extreme-difficulty Sudoku puzzles using hierarchical processing and adaptive computation.
Model Details
Model Description
This is a Hierarchical Reasoning Model checkpoint fine-tuned specifically for solving extreme-difficulty Sudoku puzzles. The model employs a two-level hierarchical architecture inspired by human cognition, with a high-level (H) module for abstract planning and a low-level (L) module for detailed computation. It uses Adaptive Computation Time (ACT) with Q-learning-based halting to dynamically allocate computational resources.
The model processes 9×9 Sudoku grids (81 tokens) and predicts the correct digit for each cell through hierarchical reasoning cycles.
- Developed by: Sapient Inc.
- Model type: Hierarchical Reasoning Model (HRM)
- Language(s): Symbolic reasoning (digits 0-9)
- License: Apache 2.0
- Original checkpoint: sapientinc/HRM-checkpoint-sudoku-extreme
Model Sources
- Repository: transformers
- Paper: Hierarchical Reasoning Model (https://arxiv.org/abs/2506.21734)
- Original Repository: HRM GitHub (https://github.com/sapientinc/HRM)
Uses
Direct Use
This model is designed for solving extreme-difficulty Sudoku puzzles. It can:
- Solve complex 9×9 Sudoku grids that require advanced reasoning techniques
- Process partial grids and predict missing digits
- Demonstrate hierarchical reasoning strategies for constraint satisfaction problems
Downstream Use
The model can be used as:
- A component in puzzle-solving applications
- A baseline for research in hierarchical reasoning and adaptive computation
- An example of applying neural networks to combinatorial optimization problems
Recommendations
Users should be aware that:
- The model is specialized for Sudoku and should not be used for general reasoning tasks
- Input must be properly formatted as 9×9 grids with digits 0-9 (0 for empty cells); see the encoding sketch after this list
- Inference time may vary due to the adaptive computation mechanism
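As a concrete illustration of the input format, the sketch below converts an 81-character puzzle string into the (1, 81) tensor layout the model consumes. The `encode_puzzle` helper is hypothetical, not part of the released checkpoint, and assumes the 0-for-empty, 1-9-for-digits convention stated in this card.

```python
import torch

def encode_puzzle(puzzle: str) -> torch.Tensor:
    """Convert an 81-character Sudoku string (row-major, '0' or '.' for
    empty cells) into a (1, 81) LongTensor of digits 0-9."""
    assert len(puzzle) == 81, "expected a flattened 9x9 grid"
    digits = [0 if ch == "." else int(ch) for ch in puzzle]
    return torch.tensor(digits, dtype=torch.long).unsqueeze(0)

# An arbitrary example grid (not an "extreme" puzzle), row-major order.
puzzle = (
    "530070000" "600195000" "098000060"
    "800060003" "400803001" "700020006"
    "060000280" "000419005" "000080079"
)
input_ids = encode_puzzle(puzzle)  # shape (1, 81)
```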
How to Get Started with the Model
```python
import torch
from transformers import HrmForCausalLM

# Load the model
model = HrmForCausalLM.from_pretrained("zbloss/HRM-sudoku-extreme")
model.eval()

# Prepare a Sudoku grid (9x9 = 81 tokens).
# 0 represents empty cells, 1-9 are the digits.
sudoku_grid = torch.randint(0, 10, (1, 81))  # Example random grid
puzzle_ids = torch.zeros(1, dtype=torch.long)

# Run inference
with torch.no_grad():
    outputs = model(input_ids=sudoku_grid, puzzle_identifiers=puzzle_ids)

# Get predictions: the highest-scoring token for each of the 81 cells
predictions = torch.argmax(outputs.logits, dim=-1)
print(f"Predicted solution: {predictions}")
```
Training Details
Training Data
The model was trained on a dataset of extreme-difficulty Sudoku puzzles. These puzzles require advanced solving techniques beyond basic constraint propagation.
Training Procedure
The model uses a hierarchical architecture with:
- High-level (H) module: 4 transformer layers for abstract planning
- Low-level (L) module: 4 transformer layers for detailed computation
- H-cycles: 2 high-level reasoning cycles
- L-cycles: 2 low-level computation cycles per H-cycle
- ACT mechanism: Q-learning-based adaptive halting with a maximum of 16 steps (the nested cycle schedule is sketched below)
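The nested update schedule implied by these settings can be sketched as follows. This is a schematic only: the module names `h_module` and `l_module` and their call signatures are illustrative assumptions, not the checkpoint's exact implementation.

```python
def hrm_segment(h_state, l_state, inputs, h_module, l_module,
                h_cycles=2, l_cycles=2):
    """One reasoning segment: h_cycles outer planning cycles, each
    driving l_cycles inner computation cycles (2 and 2 here)."""
    for _ in range(h_cycles):
        for _ in range(l_cycles):
            # L-level: fast, detailed computation conditioned on the
            # current high-level plan and the input tokens.
            l_state = l_module(l_state, h_state, inputs)
        # H-level: slow, abstract update driven by the L-level result.
        h_state = h_module(h_state, l_state)
    return h_state, l_state
```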
Training Hyperparameters
- Training regime: bfloat16 mixed precision
- Architecture: 4 H-layers, 4 L-layers, 8 attention heads
- Hidden size: 512
- Intermediate size: 1536
- Max position embeddings: 900
- Vocabulary size: 11 (digits 0-9 + padding)
Model Architecture
Technical Specifications
| Component | Value |
|---|---|
| Total Parameters | 27,275,778 (27.3M) |
| Model Size | 109.11 MB |
| Vocabulary Size | 11 |
| Hidden Size | 512 |
| Intermediate Size | 1536 |
| H-level Layers | 4 |
| L-level Layers | 4 |
| Attention Heads | 8 |
| H-cycles | 2 |
| L-cycles | 2 |
| Max Halting Steps | 16 |
| Position Encoding | RoPE (Rotary Position Embeddings) |
| Activation | SwiGLU |
Model Architecture and Objective
The Hierarchical Reasoning Model (HRM) features:
Two-level Hierarchical Processing:
- H-level (High-level): Performs slow, abstract planning and strategy formulation
- L-level (Low-level): Executes fast, detailed computations
Adaptive Computation Time (ACT):
- Q-learning-based halting mechanism
- Dynamically determines when sufficient computation has been performed
- Allows variable computational depth based on problem difficulty
Recurrent Carry State:
- Maintains H and L hidden states across reasoning cycles
- Enables iterative refinement of solutions (the ACT loop with carry is sketched below)
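Combining the two mechanisms above, the ACT outer loop can be sketched as follows. This is a simplified illustration: the `q_head` halting head, its two-value output, and the greedy halting rule are assumptions based on the description in this card, with the 16-step cap from the configuration.

```python
def act_forward(inputs, init_states, step_fn, q_head, max_steps=16):
    """Run reasoning segments until the learned Q-head prefers to halt.

    step_fn -- one hierarchical segment (e.g., hrm_segment above with
               the H and L modules bound), returning the updated carry.
    q_head  -- maps the H-state to (q_halt, q_continue) values.
    """
    h_state, l_state = init_states  # recurrent carry across segments
    for step in range(max_steps):
        h_state, l_state = step_fn(h_state, l_state, inputs)
        q_halt, q_continue = q_head(h_state)
        # Halt once the learned value of stopping exceeds continuing.
        if q_halt > q_continue:
            break
    return h_state, l_state, step + 1
```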
Positional Encoding:
- RoPE (Rotary Position Embeddings) for position-aware attention
- Supports up to 900 positions (30×30 grids); a minimal RoPE sketch follows
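For reference, rotary embeddings rotate pairs of query/key channels by position-dependent angles. The following is a minimal, self-contained sketch of the common "rotate-half" RoPE variant, not the checkpoint's exact code:

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (seq_len, dim)."""
    seq_len, dim = x.shape
    half = dim // 2
    # Per-channel-pair rotation frequencies, highest for the first pair.
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) channel pair by its position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```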
Compute Infrastructure
Software
- Framework: PyTorch with transformers library
- Precision: bfloat16
- Format: Safetensors
Citation
BibTeX:
```bibtex
@article{wang2025hierarchical,
  title={Hierarchical Reasoning Model},
  author={Wang, Guan and Li, Jin and Sun, Yuhao and Chen, Xing and Liu, Changling and Wu, Yue and Lu, Meng and Song, Sen and Yadkori, Yasin Abbasi},
  journal={arXiv preprint arXiv:2506.21734},
  year={2025}
}
```
APA:
Wang, G., Li, J., Sun, Y., Chen, X., Liu, C., Wu, Y., Lu, M., Song, S., & Yadkori, Y. A. (2025). Hierarchical Reasoning Model. arXiv preprint arXiv:2506.21734.
More Information
This checkpoint is a converted version of the original HRM checkpoint from sapientinc/HRM-checkpoint-sudoku-extreme, formatted for use with the HuggingFace transformers library.
For more details about the HRM architecture and training methodology, see:
- Paper: https://arxiv.org/abs/2506.21734
- Original Implementation: https://github.com/sapientinc/HRM
Model Card Contact
For questions or issues with this converted checkpoint, please open an issue in the transformers repository.