BDH (Baby Dragon Hatchling)

https://arxiv.org/pdf/2509.26507

This paper introduces BDH (Baby Dragon Hatchling). It’s basically an attempt to replace the Transformer bottleneck with a scale-free graph of “neuron particles.”

The technical gist:

  • No KV Cache: Instead of storing context in a massive memory buffer, it keeps context in synaptic states and uses local Hebbian rules to update them during inference.

  • Sparsity: It operates at ~5% activation. Everything is sparse and positive-only, which keeps it closer to SDRs (Sparse Distributed Representations) than traditional dense LLM vectors.

  • Graph Dynamics: It’s structured as a graph rather than a stack of layers, and it runs an “integrate-and-fire” cycle (Firing → Competition → Update → Transmission); see the sketch after this list.

  • Scaling: They managed to hit GPT-2 performance levels at 1B parameters. That’s the part that actually matters: it’s a biologically plausible model that doesn’t fall apart at scale.

  • Interpretability: Because of the sparsity and local rules, the authors claim “monosemanticity.” You can basically trace a concept to a specific physical path in the graph rather than a high-dimensional mystery.

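To make that cycle concrete, here’s a minimal NumPy sketch of one step as I understand it: firing with positive-only activations, top-k competition to enforce roughly 5% sparsity, a local Hebbian update of a fast synaptic state, and transmission over the graph. The names (`W`, `S`, `step`) and the rates are my own placeholders, not the paper’s actual dynamics or kernels.

```python
import numpy as np

# Hypothetical sizes and rates -- illustration only, not from the paper.
N_NEURONS = 1000          # neuron "particles" in the graph
SPARSITY = 0.05           # ~5% of neurons active per step
ETA, DECAY = 0.01, 0.99   # Hebbian learning rate and state decay

rng = np.random.default_rng(0)
W = rng.normal(0, 1 / np.sqrt(N_NEURONS), (N_NEURONS, N_NEURONS))  # static graph weights
S = np.zeros((N_NEURONS, N_NEURONS))                               # fast synaptic state (the KV-cache substitute)

def step(x, S):
    """One Firing -> Competition -> Update -> Transmission cycle."""
    # Firing: positive-only activations.
    a = np.maximum(0.0, x)

    # Competition: keep only the top ~5% of activations, zero the rest.
    k = int(SPARSITY * N_NEURONS)
    thresh = np.partition(a, -k)[-k]
    a = np.where(a >= thresh, a, 0.0)

    # Update: local Hebbian rule -- co-active neuron pairs strengthen their synapse.
    S = DECAY * S + ETA * np.outer(a, a)

    # Transmission: signal flows over the static weights plus the learned fast state.
    y = (W + S) @ a
    return y, S

x = rng.normal(size=N_NEURONS)
for _ in range(10):
    x, S = step(x, S)
```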
They’ve got a BDH-GPU implementation that maps these graph interactions into linear algebra kernels so it actually runs on current hardware.
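As far as I can tell, the mapping is similar in spirit to linear attention: the Hebbian write is a rank-1 outer product and the read-out is a matrix-vector product, so context lives in a fixed-size state matrix instead of a growing KV cache, and the whole thing vectorizes well on a GPU. A rough sketch of that reading (hypothetical function name and shapes, not the actual BDH-GPU code):

```python
import numpy as np

def linear_state_scan(K, V, Q, decay=0.99):
    """Process a sequence with a recurrent synaptic-state matrix S.

    K, V, Q: (seq_len, d) arrays standing in for the per-token signals
    that write to and read from the fast synaptic state.
    """
    seq_len, d = K.shape
    S = np.zeros((d, d))
    out = np.empty_like(Q)
    for t in range(seq_len):
        S = decay * S + np.outer(K[t], V[t])   # Hebbian write: rank-1 update
        out[t] = S.T @ Q[t]                    # read: a single matvec, no KV cache
    return out
```

The point is that the per-token state update and read-out are plain matrix operations, which is what lets the graph dynamics run on current hardware.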

The “Thermodynamic Limit” they mention actually prevents the local updates from diverging/exploding when you move past 1B parameters.

Podcast on this


Very interesting and promising development. I found it curious that you can just concatenate two of these models to produce a bigger one that combines the knowledge of both parts into a single model.
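If I read the merging trick right, it amounts to placing the two neuron graphs side by side: the neuron-to-neuron matrices combine block-diagonally and the embedding projections are stacked along the neuron axis. A toy sketch of that interpretation (hypothetical names and shapes, not the authors’ procedure):

```python
import numpy as np

def concat_models(W_a, W_b, E_a, E_b):
    """Combine two BDH-style models into one wider model.

    W_a, W_b: (n_a, n_a) and (n_b, n_b) neuron-to-neuron weights.
    E_a, E_b: (d, n_a) and (d, n_b) embedding-to-neuron projections
    (assuming both models share the same embedding width d).
    """
    n_a, n_b = W_a.shape[0], W_b.shape[0]
    # Block-diagonal graph: no edges between the two original populations.
    W = np.block([[W_a, np.zeros((n_a, n_b))],
                  [np.zeros((n_b, n_a)), W_b]])
    # Inputs fan out to both populations; the merged model has n_a + n_b neurons.
    E = np.concatenate([E_a, E_b], axis=1)
    return W, E
```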


They seem to add a limited attempt at sparsity and a beginning of continuous learning (after classic pre-training with backpropagation), but they still use point neurons.

Thanks for the post @MTIzNDU2Nzg5.
