HRM Sudoku Extreme

A Hierarchical Reasoning Model (HRM) trained to solve extreme difficulty Sudoku puzzles using hierarchical processing and adaptive computation.

Model Details

Model Description

This is a Hierarchical Reasoning Model checkpoint fine-tuned specifically for solving extreme difficulty Sudoku puzzles. The model employs a two-level hierarchical architecture inspired by human cognition, with high-level (H) modules for abstract planning and low-level (L) modules for detailed computation. It uses Adaptive Computation Time (ACT) with Q-learning based halting to dynamically allocate computational resources.

The model processes 9×9 Sudoku grids (81 tokens) and predicts the correct digit for each cell through hierarchical reasoning cycles.
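
As a concrete illustration of this input format, a puzzle can be flattened into the 81-token sequence the model expects. The row-major ordering below is an assumption for illustration; 0 marks empty cells:

import torch

# A 9x9 puzzle as nested lists; 0 marks an empty cell, 1-9 are given digits.
puzzle = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]

grid = torch.tensor(puzzle, dtype=torch.long)
assert grid.shape == (9, 9) and grid.min() >= 0 and grid.max() <= 9

# Flatten row-major into a batch of one 81-token sequence.
input_ids = grid.flatten().unsqueeze(0)  # shape: (1, 81)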

  • Developed by: Sapient Inc.
  • Model type: Hierarchical Reasoning Model (HRM)
  • Language(s): Symbolic reasoning (digits 0-9)
  • License: Apache 2.0
  • Original checkpoint: sapientinc/HRM-checkpoint-sudoku-extreme

Model Sources

  • Repository: https://huggingface.co/zbloss/HRM-sudoku-extreme
  • Paper: https://arxiv.org/abs/2506.21734
  • Original checkpoint: https://huggingface.co/sapientinc/HRM-checkpoint-sudoku-extreme

Uses

Direct Use

This model is designed for solving extreme difficulty Sudoku puzzles. It can:

  • Solve complex 9×9 Sudoku grids that require advanced reasoning techniques
  • Process partial grids and predict missing digits
  • Demonstrate hierarchical reasoning strategies for constraint satisfaction problems

Downstream Use

The model can be used as:

  • A component in puzzle-solving applications
  • A baseline for research in hierarchical reasoning and adaptive computation
  • An example of applying neural networks to combinatorial optimization problems

Recommendations

Users should be aware that:

  • The model is specialized for Sudoku and should not be used for general reasoning tasks
  • Input must be properly formatted as 9×9 grids with digits 0-9 (0 for empty cells)
  • Inference time may vary due to the adaptive computation mechanism

How to Get Started with the Model

import torch
from transformers import HrmForCausalLM

# Load the model
model = HrmForCausalLM.from_pretrained("zbloss/HRM-sudoku-extreme")
model.eval()

# Prepare a Sudoku grid (9x9 = 81 tokens)
# 0 represents empty cells, 1-9 are the digits
sudoku_grid = torch.randint(0, 10, (1, 81))  # Random placeholder for shape demo; use a real flattened puzzle in practice
puzzle_ids = torch.zeros(1, dtype=torch.long)

# Run inference
with torch.no_grad():
    outputs = model(input_ids=sudoku_grid, puzzle_identifiers=puzzle_ids)

# Get predictions and reshape back into a 9x9 grid
predictions = torch.argmax(outputs.logits, dim=-1)
print(f"Predicted solution:\n{predictions.view(9, 9)}")

Training Details

Training Data

The model was trained on a dataset of extreme difficulty Sudoku puzzles. These puzzles require advanced solving techniques beyond basic constraint propagation.

Training Procedure

The model uses a hierarchical architecture with the following components (a schematic of the cycle interleaving follows the list):

  • High-level (H) module: 4 transformer layers for abstract planning
  • Low-level (L) module: 4 transformer layers for detailed computation
  • H-cycles: 2 high-level reasoning cycles
  • L-cycles: 2 low-level computation cycles per H-cycle
  • ACT mechanism: Q-learning based adaptive halting with max 16 steps
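
The sketch below shows schematically how these cycles interleave. The module internals are toy stand-ins (the real H and L modules are 4-layer transformer stacks), so this illustrates the control flow only, not the actual implementation:

import torch
import torch.nn as nn

class ToyModule(nn.Module):
    """Stand-in for an HRM module; the real ones are transformer stacks."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, state, context):
        return torch.tanh(self.proj(torch.cat([state, context], dim=-1)))

def hierarchical_step(z_h, z_l, h_module, l_module, h_cycles=2, l_cycles=2):
    for _ in range(h_cycles):
        for _ in range(l_cycles):
            z_l = l_module(z_l, z_h)  # fast, detailed computation under the current plan
        z_h = h_module(z_h, z_l)      # slow, abstract update of the plan
    return z_h, z_l

# One reasoning step over 81 cell states of width 512.
z_h = torch.zeros(1, 81, 512)
z_l = torch.zeros(1, 81, 512)
z_h, z_l = hierarchical_step(z_h, z_l, ToyModule(512), ToyModule(512))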

Training Hyperparameters

  • Training regime: bfloat16 mixed precision
  • Architecture: 4 H-layers, 4 L-layers, 8 attention heads
  • Hidden size: 512
  • Intermediate size: 1536
  • Max position embeddings: 900
  • Vocabulary size: 11 (digits 0-9 + padding)
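
For convenience, the same settings gathered into a plain Python dictionary (the key names here are illustrative assumptions, not the checkpoint's actual config keys):

hrm_config = {
    "vocab_size": 11,                 # digits 0-9 plus a padding token
    "hidden_size": 512,
    "intermediate_size": 1536,
    "num_attention_heads": 8,
    "num_h_layers": 4,                # high-level (planning) transformer layers
    "num_l_layers": 4,                # low-level (computation) transformer layers
    "h_cycles": 2,
    "l_cycles": 2,
    "halt_max_steps": 16,
    "max_position_embeddings": 900,
    "torch_dtype": "bfloat16",
}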

Model Architecture

Technical Specifications

  • Total Parameters: 27,275,778 (27.3M)
  • Model Size: 109.11 MB
  • Vocabulary Size: 11
  • Hidden Size: 512
  • Intermediate Size: 1536
  • H-level Layers: 4
  • L-level Layers: 4
  • Attention Heads: 8
  • H-cycles: 2
  • L-cycles: 2
  • Max Halting Steps: 16
  • Position Encoding: RoPE (Rotary Position Embeddings)
  • Activation: SwiGLU

Model Architecture and Objective

The Hierarchical Reasoning Model (HRM) features:

  1. Two-level Hierarchical Processing:

    • H-level (High-level): Performs slow, abstract planning and strategy formulation
    • L-level (Low-level): Executes fast, detailed computations
  2. Adaptive Computation Time (ACT), sketched as a loop after this list:

    • Q-learning based halting mechanism
    • Dynamically determines when sufficient computation has been performed
    • Allows variable computational depth based on problem difficulty
  3. Recurrent Carry State:

    • Maintains H and L hidden states across reasoning cycles
    • Enables iterative refinement of solutions
  4. Positional Encoding:

    • RoPE (Rotary Position Embeddings) for position-aware attention
    • Supports up to 900 positions (30×30 grids)
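
A minimal sketch of the outer ACT loop under these assumptions: `model.initial_carry` and `model.step` are hypothetical stand-ins for however the checkpoint exposes its recurrent carry and per-step outputs (logits plus halt/continue Q-values); the actual interface may differ:

import torch

def act_solve(model, input_ids, max_steps=16):
    # Hypothetical interface: initial_carry/step are illustrative names.
    carry = model.initial_carry(input_ids)
    logits = None
    for step in range(max_steps):
        logits, q_halt, q_continue, carry = model.step(input_ids, carry)
        # Q-learning based halting: stop once the learned Q-head values
        # halting above continuing; otherwise run up to the 16-step budget.
        if q_halt > q_continue:
            break
    return torch.argmax(logits, dim=-1), step + 1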

Compute Infrastructure

Software

  • Framework: PyTorch with the transformers library
  • Precision: bfloat16
  • Format: Safetensors

Citation

BibTeX:

@article{wang2025hierarchical,
  title={Hierarchical Reasoning Model},
  author={Wang, Guan and Li, Jin and Sun, Yuhao and Chen, Xing and Liu, Changling and Wu, Yue and Lu, Meng and Song, Sen and Yadkori, Yasin Abbasi},
  journal={arXiv preprint arXiv:2506.21734},
  year={2025}
}

APA:

Wang, G., Li, J., Sun, Y., Chen, X., Liu, C., Wu, Y., Lu, M., Song, S., & Yadkori, Y. A. (2025). Hierarchical Reasoning Model. arXiv preprint arXiv:2506.21734.

More Information

This checkpoint is a converted version of the original HRM checkpoint from sapientinc/HRM-checkpoint-sudoku-extreme, formatted for use with the HuggingFace transformers library.

For more details about the HRM architecture and training methodology, see the Hierarchical Reasoning Model paper cited above (arXiv:2506.21734).

Model Card Contact

For questions or issues with this converted checkpoint, please open an issue in the transformers repository.
