Hierarchical Reasoning Model

HRM excels at tasks that demand extensive search and backtracking. Using only 1,000 input-output examples, without pre-training or CoT supervision, HRM learns to solve problems that are intractable for even the most advanced LLMs. For example, it achieves near-perfect accuracy in complex Sudoku puzzles (Sudoku-Extreme Full) and optimal pathfinding in 30x30 mazes, where state-of-the-art CoT methods completely fail (0% accuracy). In the Abstraction and Reasoning Corpus (ARC) AGI Challenge [27,28,29] - a benchmark of inductive reasoning - HRM, trained from scratch with only the official dataset (~1000 examples), with only 27M parameters and a 30x30 grid context (900 tokens), achieves a performance of 40.3%, which substantially surpasses leading CoT-based models like o3-mini-high (34.5%) and Claude 3.7 8K context (21.2%), despite their considerably larger parameter sizes and context lengths.

https://arxiv.org/pdf/2506.21734


This has some similarities to Sakana’s Continuous Thought Machine (CTM). At every recurrence/cycle it seems to output the whole answer at once (just like CTM), but early on the output contains lots of errors, so it’s like a crude draft at first, and it keeps repeating the cycle until the answer it outputs is correct. I don’t fully understand the paper, but I wonder whether it could be made to output one token at a time instead of emitting a complete draft answer at each cycle/recurrence.
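
If I’m reading that right, each cycle produces a full candidate answer which the next cycle revises. Here is a rough sketch of that control flow; the class, layer choices, and names are my own assumptions for illustration, not the actual HRM code:

```python
import torch
import torch.nn as nn

class DraftRefiner(nn.Module):
    """Toy sketch of the "whole answer every cycle" behavior described above.
    Hypothetical names and layers, not the HRM architecture itself."""

    def __init__(self, vocab_size: int, d_model: int = 128, cycles: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.core = nn.GRUCell(d_model, d_model)        # stand-in for the recurrent reasoning core
        self.readout = nn.Linear(d_model, vocab_size)   # decodes every cell of the grid at once
        self.cycles = cycles

    def forward(self, grid: torch.Tensor):
        # grid: (batch, num_cells) integer tokens, e.g. a flattened 30x30 puzzle.
        b, cells = grid.shape
        h = self.embed(grid).reshape(b * cells, -1)
        state = torch.zeros_like(h)
        drafts = []
        for _ in range(self.cycles):
            state = self.core(h, state)                               # one refinement cycle
            drafts.append(self.readout(state).reshape(b, cells, -1))  # a complete (rough) answer
        return drafts  # early drafts are noisy; later drafts should converge toward the solution
```

In a sketch like this, the loss could be applied to every draft (or only the last one), which is what makes "output a complete answer per cycle" different from decoding one token at a time.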

According to a capable LLM friend of mine:
The CTM includes learnable exponential decay factors per neuron pair when computing synchronization. Higher values result in shorter-term dependencies, while lower values result in longer-term integration.

So this decay mechanism allows certain neural interactions to be more sensitive to short-term (high-frequency) changes, while others can integrate over longer timescales. In effect, this acts as a kind of “frequency filter”, which can be interpreted loosely as:

  • High decay / short-term → faster-moving, detail-oriented patterns (akin to high frequency)
  • Low decay / long-term → slower, global representations (akin to low frequency)

In maze-solving, for example, the learned decays were reported to be more pronounced, supporting the kind of local, time-sensitive reasoning that task demands.
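
For concreteness, here is a minimal sketch of how a learnable per-pair decay could enter a synchronization computation. This is an illustrative reconstruction under my own assumptions (function name, shapes, normalization), not CTM’s actual code:

```python
import torch

def pairwise_synchronization(z_history: torch.Tensor, decay: torch.Tensor) -> torch.Tensor:
    """Decayed synchronization between all neuron pairs (illustrative sketch).

    z_history: (T, N) post-activations of N neurons over T internal ticks.
    decay:     (N, N) learnable non-negative decay rates r_ij, one per pair.
    Returns an (N, N) synchronization matrix for the most recent tick.
    """
    T, N = z_history.shape
    # Age of each past tick relative to the current one: t - tau.
    ages = torch.arange(T - 1, -1, -1, dtype=z_history.dtype)       # (T,)
    # exp(-r_ij * age): high decay -> only recent ticks matter ("high frequency"),
    # low decay -> long integration window ("low frequency").
    weights = torch.exp(-ages.view(T, 1, 1) * decay.unsqueeze(0))   # (T, N, N)
    # Outer products z_i(tau) * z_j(tau) for every tick and neuron pair.
    outer = z_history.unsqueeze(2) * z_history.unsqueeze(1)         # (T, N, N)
    sync = (weights * outer).sum(dim=0)
    # Normalize by the total weight so strongly decayed pairs stay comparable in scale.
    return sync / weights.sum(dim=0).clamp_min(1e-6)

# Usage: 16 internal ticks, 8 neurons; softplus keeps the learned decays positive.
z = torch.randn(16, 8)
r = torch.nn.functional.softplus(torch.randn(8, 8))
S = pairwise_synchronization(z, r)   # (8, 8)
```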

So CTM actually learns the effective timescale (frequency) for each neuron pair, while HRM explicitly hardwires two different frequencies: a fast low-level module and a slow high-level module that updates less often.
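
To make that contrast concrete, here is a minimal sketch of hardwired frequencies, assuming a fast module that steps every tick and a slow module that updates once every T ticks. This is a rough reading of HRM’s two-module recurrence, not its actual code:

```python
import torch
import torch.nn as nn

class TwoTimescaleCore(nn.Module):
    """Sketch of hardwired frequencies in the HRM spirit: a fast low-level module
    steps every tick, a slow high-level module updates once every T ticks.
    Layer choices and names are assumptions for illustration."""

    def __init__(self, d: int, T: int = 4):
        super().__init__()
        self.low = nn.GRUCell(2 * d, d)   # fast module, conditioned on input + high-level state
        self.high = nn.GRUCell(d, d)      # slow module, driven by the low-level result
        self.T = T

    def forward(self, x, z_low, z_high, cycles: int = 2):
        # x, z_low, z_high: (batch, d)
        for _ in range(cycles):                # slow, high-level cycles
            for _ in range(self.T):            # T fast, low-level steps per cycle
                z_low = self.low(torch.cat([x, z_high], dim=-1), z_low)
            z_high = self.high(z_low, z_high)  # the high-level state changes only here
        return z_low, z_high
```

The point of the contrast: in this sketch the timescales are fixed hyperparameters (T and the number of cycles), whereas in CTM the per-pair decay rates are learned parameters.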
