SpikingBrain Technical Report: Spiking Brain-inspired Large Models

:brain: SpikingBrain: brain-inspired principles in large language models

Researchers from China have presented a technical report on SpikingBrain - a series of large language models inspired by how the brain works. They built two versions: one with 7B parameters and one with 76B. The key difference is the use of spiking neurons, which, like neurons in the brain, activate only when necessary.

In ordinary LLMs, a long context greatly slows down performance: attention time and memory grow quadratically with the length of the text. In SpikingBrain, the team applied a combination of linear attention, local attention windows, and Mixture-of-Experts together with spiking encoding. The result is a model that can handle million-token inputs without catastrophic growth in cost.
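For intuition on why linear attention avoids the quadratic blowup, here is a generic causal linear-attention recurrence in NumPy. This is an illustrative sketch of the general technique, not SpikingBrain's specific attention variant; the `elu(x)+1` feature map is one common choice from the linear-attention literature.

```python
import numpy as np

def linear_attention(Q, K, V):
    """O(n) causal linear attention: instead of the n-by-n attention
    matrix, keep a running d-by-d_v state that is updated per token."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x)+1, strictly positive
    n, d = Q.shape
    S = np.zeros((d, V.shape[1]))   # running sum of phi(k) v^T
    z = np.zeros(d)                 # running sum of phi(k), for normalization
    out = np.empty_like(V)
    for t in range(n):              # constant work per token, O(n) total
        q, k = phi(Q[t]), phi(K[t])
        S += np.outer(k, V[t])
        z += k
        out[t] = (q @ S) / (q @ z + 1e-9)
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (8, 4)
```

Because the state `S` has a fixed size regardless of how many tokens came before, the cost per generated token stays constant even at million-token contexts, which is where the time-to-first-token speedups come from.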

:bar_chart: In numbers, it looks like this:

- on 1 million tokens, SpikingBrain-7B generates the first token 26× faster than standard Qwen2.5-7B
- on 4 million tokens, the speedup extrapolates to over 100×
- the average sparsity of activations is about 69%: most channels simply don't "fire", saving computation
- the models were trained on MetaX C550, Chinese GPUs, making this the first stable training of a brain-inspired LLM on non-NVIDIA hardware.
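The sparsity figure above has a simple operational meaning: threshold the activations so that small values emit no spike at all, then count the silent channels. The sketch below is a toy illustration of that idea; the threshold, distribution, and binary spike values are assumptions for the example, not the report's actual encoding scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "activations": heavy-tailed, so most values are small and a few are large.
acts = rng.laplace(scale=0.5, size=(4, 1024))

# Spike-encoding sketch: channels below a threshold stay silent (0),
# channels above it emit a signed spike.
threshold = 1.0
spikes = np.where(np.abs(acts) >= threshold, np.sign(acts), 0.0)

# Sparsity = fraction of channels that did not fire.
sparsity = float((spikes == 0).mean())
print(f"sparsity: {sparsity:.0%}")
```

On event-driven hardware, every zero in `spikes` is computation that never happens, which is why the ~69% average sparsity translates into savings.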

In terms of benchmarks, SpikingBrain-7B reaches up to 90% of the quality of Qwen2.5-7B with far fewer resources, while the hybrid 76B version in some cases matches or surpasses Llama-2-70B and Mixtral-8×7B.

For now, these are still "spikes in software" - an imitation. The real benefits will appear on neuromorphic chips, which operate natively on event-driven logic. But even now it's clear that neuro-inspired ideas genuinely help stretch super-long contexts and speed up inference.

In essence, this is a step toward models that can be fed entire libraries or massive codebases and still answer quickly. And perhaps also a bridge to future systems where LLMs run not only as software, but on specialized brain-like hardware.

1 Like

Now what if someone injects HTM encoding into it…

1 Like

I appreciate what they're trying to do, but I think they've misunderstood the concept of sparse coding, because their system does not seem sparse enough. At 31% activity the representations are still too dense to be unioned together.

3 Likes