SpikingBrain Technical Report: Spiking Brain-inspired Large Models

:brain: SpikingBrain: brain-inspired principles in large language models

Researchers from China have presented a technical report on SpikingBrain - a series of large language models inspired by how the brain works. They built two versions: one with 7B parameters and one with 76B. The key difference is the use of spiking neurons, which, like neurons in the brain, activate only when necessary.

In ordinary LLMs, a long context greatly slows down performance: attention time and memory grow quadratically with the length of the text. In SpikingBrain, the team applied a combination of linear attention, local attention windows, and Mixture-of-Experts together with spiking encoding. The result is a model that can handle million-token inputs without catastrophic growth in cost.
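For intuition on why linear attention avoids the quadratic blowup, here is a generic causal linear-attention recurrence in NumPy. This is an illustrative sketch of the general technique, not SpikingBrain's specific attention variant; the `elu(x)+1` feature map is one common choice from the linear-attention literature.

```python
import numpy as np

def linear_attention(Q, K, V):
    """O(n) causal linear attention: instead of the n-by-n attention
    matrix, keep a running d-by-d_v state that is updated per token."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x)+1, strictly positive
    n, d = Q.shape
    S = np.zeros((d, V.shape[1]))   # running sum of phi(k) v^T
    z = np.zeros(d)                 # running sum of phi(k), for normalization
    out = np.empty_like(V)
    for t in range(n):              # constant work per token, O(n) total
        q, k = phi(Q[t]), phi(K[t])
        S += np.outer(k, V[t])
        z += k
        out[t] = (q @ S) / (q @ z + 1e-9)
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (8, 4)
```

Because the state `S` has a fixed size regardless of how many tokens came before, the cost per generated token stays constant even at million-token contexts, which is where the time-to-first-token speedups come from.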

:bar_chart: In numbers, it looks like this:

- on 1 million tokens, SpikingBrain-7B generates the first token 26× faster than standard Qwen2.5-7B
- on 4 million tokens, the speedup extrapolates to over 100×
- the average sparsity of activations is about 69%: most channels simply don't "fire", saving computation
- the models were trained on MetaX C550, Chinese GPUs, making this the first stable training of a brain-inspired LLM on non-NVIDIA hardware.
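The sparsity figure above has a simple operational meaning: threshold the activations so that small values emit no spike at all, then count the silent channels. The sketch below is a toy illustration of that idea; the threshold, distribution, and binary spike values are assumptions for the example, not the report's actual encoding scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "activations": heavy-tailed, so most values are small and a few are large.
acts = rng.laplace(scale=0.5, size=(4, 1024))

# Spike-encoding sketch: channels below a threshold stay silent (0),
# channels above it emit a signed spike.
threshold = 1.0
spikes = np.where(np.abs(acts) >= threshold, np.sign(acts), 0.0)

# Sparsity = fraction of channels that did not fire.
sparsity = float((spikes == 0).mean())
print(f"sparsity: {sparsity:.0%}")
```

On event-driven hardware, every zero in `spikes` is computation that never happens, which is why the ~69% average sparsity translates into savings.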

In terms of benchmarks, SpikingBrain-7B reaches up to 90% of the quality of Qwen2.5-7B with far fewer resources, while the hybrid 76B version in some cases matches or surpasses Llama-2-70B and Mixtral-8×7B.

For now, these are still "spikes in software" - an imitation. The real benefits will appear on neuromorphic chips, which operate natively on event-driven logic. But even now it's clear that neuro-inspired ideas genuinely help stretch super-long contexts and speed up inference.

In essence, this is a step toward models that can be fed entire libraries or massive codebases and still answer quickly. And perhaps also a bridge to future systems where LLMs run not only as software, but on specialized brain-like hardware.

1 Like

Now what if someone injects HTM encoding into it…

1 Like

I appreciate what they're trying to do, but I think they've misunderstood the concept of sparse coding, because their system does not seem sparse enough. At 31% activity the representations are still too dense to be unioned together.

3 Likes