Hierarchical Reasoning Model

HRM excels at tasks that demand extensive search and backtracking. Using only 1,000 input-output examples, without pre-training or CoT supervision, HRM learns to solve problems that are intractable for even the most advanced LLMs. For example, it achieves near-perfect accuracy in complex Sudoku puzzles (Sudoku-Extreme Full) and optimal pathfinding in 30x30 mazes, where state-of-the-art CoT methods completely fail (0% accuracy). In the Abstraction and Reasoning Corpus (ARC) AGI Challenge [27,28,29] - a benchmark of inductive reasoning - HRM, trained from scratch with only the official dataset (~1000 examples), with only 27M parameters and a 30x30 grid context (900 tokens), achieves a performance of 40.3%, which substantially surpasses leading CoT-based models like o3-mini-high (34.5%) and Claude 3.7 8K context (21.2%), despite their considerably larger parameter sizes and context lengths.

https://arxiv.org/pdf/2506.21734


This has some similarities to Sakana’s Continuous Thought Machine (CTM). At every recurrence/cycle it seems to output the whole answer at once (just like CTM), but early on the output contains lots of errors, so it’s like a crude draft at first, and it keeps repeating the cycle until the answer it outputs is correct. I don’t fully understand the paper, but I wonder whether it could be made to output one token at a time instead of emitting a complete draft answer at each cycle/recurrence.
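
If I’m reading that right, each cycle produces a full candidate answer which the next cycle revises. Here is a rough sketch of that control flow; the class, layer choices, and names are my own assumptions for illustration, not the actual HRM code:

```python
import torch
import torch.nn as nn

class DraftRefiner(nn.Module):
    """Toy sketch of the "whole answer every cycle" behavior described above.
    Hypothetical names and layers, not the HRM architecture itself."""

    def __init__(self, vocab_size: int, d_model: int = 128, cycles: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.core = nn.GRUCell(d_model, d_model)        # stand-in for the recurrent reasoning core
        self.readout = nn.Linear(d_model, vocab_size)   # decodes every cell of the grid at once
        self.cycles = cycles

    def forward(self, grid: torch.Tensor):
        # grid: (batch, num_cells) integer tokens, e.g. a flattened 30x30 puzzle.
        b, cells = grid.shape
        h = self.embed(grid).reshape(b * cells, -1)
        state = torch.zeros_like(h)
        drafts = []
        for _ in range(self.cycles):
            state = self.core(h, state)                               # one refinement cycle
            drafts.append(self.readout(state).reshape(b, cells, -1))  # a complete (rough) answer
        return drafts  # early drafts are noisy; later drafts should converge toward the solution
```

In a sketch like this, the loss could be applied to every draft (or only the last one), which is what makes "output a complete answer per cycle" different from decoding one token at a time.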

According to a capable LLM friend of mine:
The CTM includes learnable exponential decay factors per neuron pair when computing synchronization. Higher values result in shorter-term dependencies, while lower values result in longer-term integration.

So this decay mechanism allows certain neural interactions to be more sensitive to short-term (high-frequency) changes, while others can integrate over longer timescales. In effect, this acts as a kind of “frequency filter”, which can be interpreted loosely as:

  • High decay / short-term → faster-moving, detail-oriented patterns (akin to high frequency)
  • Low decay / long-term → slower, global representations (akin to low frequency)

In maze-solving, for example, the learned decays were reported to be more pronounced, supporting the kind of local, time-sensitive reasoning that task demands.
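
For concreteness, here is a minimal sketch of how a learnable per-pair decay could enter a synchronization computation. This is an illustrative reconstruction under my own assumptions (function name, shapes, normalization), not CTM’s actual code:

```python
import torch

def pairwise_synchronization(z_history: torch.Tensor, decay: torch.Tensor) -> torch.Tensor:
    """Decayed synchronization between all neuron pairs (illustrative sketch).

    z_history: (T, N) post-activations of N neurons over T internal ticks.
    decay:     (N, N) learnable non-negative decay rates r_ij, one per pair.
    Returns an (N, N) synchronization matrix for the most recent tick.
    """
    T, N = z_history.shape
    # Age of each past tick relative to the current one: t - tau.
    ages = torch.arange(T - 1, -1, -1, dtype=z_history.dtype)       # (T,)
    # exp(-r_ij * age): high decay -> only recent ticks matter ("high frequency"),
    # low decay -> long integration window ("low frequency").
    weights = torch.exp(-ages.view(T, 1, 1) * decay.unsqueeze(0))   # (T, N, N)
    # Outer products z_i(tau) * z_j(tau) for every tick and neuron pair.
    outer = z_history.unsqueeze(2) * z_history.unsqueeze(1)         # (T, N, N)
    sync = (weights * outer).sum(dim=0)
    # Normalize by the total weight so strongly decayed pairs stay comparable in scale.
    return sync / weights.sum(dim=0).clamp_min(1e-6)

# Usage: 16 internal ticks, 8 neurons; softplus keeps the learned decays positive.
z = torch.randn(16, 8)
r = torch.nn.functional.softplus(torch.randn(8, 8))
S = pairwise_synchronization(z, r)   # (8, 8)
```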

So CTM actually learns the effective timescale (frequency) for each neuron pair, while HRM explicitly hardwires two different frequencies: a fast low-level module and a slow high-level module that updates less often.
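
To make that contrast concrete, here is a minimal sketch of hardwired frequencies, assuming a fast module that steps every tick and a slow module that updates once every T ticks. This is a rough reading of HRM’s two-module recurrence, not its actual code:

```python
import torch
import torch.nn as nn

class TwoTimescaleCore(nn.Module):
    """Sketch of hardwired frequencies in the HRM spirit: a fast low-level module
    steps every tick, a slow high-level module updates once every T ticks.
    Layer choices and names are assumptions for illustration."""

    def __init__(self, d: int, T: int = 4):
        super().__init__()
        self.low = nn.GRUCell(2 * d, d)   # fast module, conditioned on input + high-level state
        self.high = nn.GRUCell(d, d)      # slow module, driven by the low-level result
        self.T = T

    def forward(self, x, z_low, z_high, cycles: int = 2):
        # x, z_low, z_high: (batch, d)
        for _ in range(cycles):                # slow, high-level cycles
            for _ in range(self.T):            # T fast, low-level steps per cycle
                z_low = self.low(torch.cat([x, z_high], dim=-1), z_low)
            z_high = self.high(z_low, z_high)  # the high-level state changes only here
        return z_low, z_high
```

The point of the contrast: in this sketch the timescales are fixed hyperparameters (T and the number of cycles), whereas in CTM the per-pair decay rates are learned parameters.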
