Understanding HTM


After reading Mr Hawkins’s book (and watching his videos), I get the idea that the human brain is a memory system that is constantly making predictions. However, I am having trouble visualizing an implementation. Is there a tiny implementation out there that captures the essence of SDRs and how they are subsequently used, ideally with a minimal number of neurons?


Welcome to the community.

There is a post pitched just for your first visit:

It will suggest that you watch these videos, and I do think that is an excellent place to start.


Thanks for the link. I have watched the first few videos of the series and I still don’t get it :frowning: … In fact, I was watching this video and got thrown off when a single line is drawn from the input and that same single line is used to trace it through the different layers (https://www.youtube.com/watch?v=7tYxK8DZKYU, at 5:09 for example). I would expect one input line to branch out into several lines as it flows through the system.


Are you sure you watched the HTM School video series? https://numenta.org/htm-school/

You jumped right in the deep end here. If you don’t understand the HTM School series, this is going to be really hard to explain :sweat_smile:


I’ll give the videos another shot :slight_smile:


Okay, I am watching the SDR videos at HTM School. I would appreciate it very much if someone could help me understand this: how does sensory input get converted to an SDR? Not understanding this seems to limit my ability to appreciate the SDR visualizations.

Is the sensory input directly connected to the SDR layer? How is sparsity ensured? Could I see a diagram of how input signals from an eye (say, 10 receptors -> 10 nerves going into the brain) get mapped to an SDR?



The raw sensor data (whether it’s scalar floats or pixel colors) is converted to a binary array by an encoder. Depending on how the encoder is implemented, its output could already be an SDR suitable for use with HTM algorithms. However, a spatial pooler is often applied to the encoder output to generate SDRs that make more efficient use of the available bits.
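To make the encoder step concrete, here is a minimal sketch of a scalar encoder in Python, assuming the simple "contiguous block of active bits" scheme shown in HTM School. The function names `encode_scalar` and `overlap`, and the defaults (range 0 to 100, 40 bits, 5 active), are hypothetical and chosen for illustration; this is not the API of any Numenta library.

```python
def encode_scalar(value, min_val=0.0, max_val=100.0, n=40, w=5):
    """Encode a scalar as n bits containing one contiguous run of w active bits.

    Hypothetical illustration: nearby values place their runs close together,
    so they share active bits, and bit overlap reflects semantic similarity.
    """
    # Clip so out-of-range inputs map to the first or last position
    value = max(min_val, min(max_val, value))
    # Number of distinct starting positions for the run of active bits
    buckets = n - w + 1
    start = int(round((value - min_val) / (max_val - min_val) * (buckets - 1)))
    bits = [0] * n
    for i in range(start, start + w):
        bits[i] = 1
    return bits


def overlap(a, b):
    """Count the bit positions that are active in both binary arrays."""
    return sum(x & y for x, y in zip(a, b))


if __name__ == "__main__":
    a = encode_scalar(50.0)
    b = encode_scalar(55.0)
    c = encode_scalar(90.0)
    print("".join(map(str, a)))
    print(overlap(a, b), overlap(a, c))
```

With these defaults, 50 and 55 share four of their five active bits while 50 and 90 share none, so the overlap count carries the semantic similarity that later stages rely on. The output has 5 of 40 bits (12.5%) active, a reminder that an encoder's output need not be as sparse as the spatial pooler's.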


The “encoder” is taking the place of the sensory organ in a biological system. The encoders we’ve created are extremely simple compared to your cochlea and retina. Quick correction: encoders do not have to produce sparse arrays, but their output must have semantic meaning. Spatial Pooling will extract meaning from sparse or dense input. For example, if someone shines a flashlight in your face, you might receive denser representations (more input activations) into the cortex from the retinas. The cortex (SP) will normalize this to a stable rate of about 2%. These sparse activations still have meaning, and cortical columns use them to build models of reality, or reference frames. The sparsity is super important, because it allows them to store a lot of objects and compare and contrast them efficiently.
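To illustrate how an SP can turn inputs of very different density into a fixed-sparsity output, here is a deliberately tiny spatial pooler sketch in Python. It is not Numenta's implementation: the class name `TinySpatialPooler` and all of its parameters are hypothetical, and it omits boosting, stimulus thresholds and topology. It keeps only the core ideas: each column has a random potential pool of input bits with permanences, a column's overlap counts active inputs on connected synapses, global inhibition keeps a fixed fraction (2% here) of columns, and the winners reinforce synapses to active inputs.

```python
import random


class TinySpatialPooler:
    """Toy spatial pooler: global inhibition plus Hebbian permanence learning.

    Hypothetical, simplified illustration; not Numenta's implementation.
    """

    def __init__(self, input_size, num_columns=100, sparsity=0.02,
                 potential_pct=0.5, connected_perm=0.5,
                 perm_inc=0.05, perm_dec=0.02, seed=42):
        rng = random.Random(seed)
        # Fixed number of winning columns per input, e.g. 2% of all columns
        self.num_active = max(1, int(round(num_columns * sparsity)))
        self.connected_perm = connected_perm
        self.perm_inc = perm_inc
        self.perm_dec = perm_dec
        # Each column gets a random "potential pool" of input bits, each with
        # a random permanence near the connection threshold.
        self.pools = []
        for _ in range(num_columns):
            inputs = rng.sample(range(input_size), int(input_size * potential_pct))
            self.pools.append({i: rng.uniform(connected_perm - 0.1, connected_perm + 0.1)
                               for i in inputs})

    def compute(self, input_bits, learn=True):
        """Return the sorted indices of the active (winning) columns."""
        # Overlap = number of active input bits reached through connected synapses
        overlaps = [
            sum(1 for i, p in pool.items() if p >= self.connected_perm and input_bits[i])
            for pool in self.pools
        ]
        # Global inhibition: keep only the num_active columns with the highest
        # overlap (sorted() is stable, so ties go to the lower column index)
        ranked = sorted(range(len(self.pools)), key=lambda c: overlaps[c], reverse=True)
        active = sorted(ranked[:self.num_active])
        if learn:
            # Hebbian update for winners: strengthen synapses to active inputs,
            # weaken synapses to inactive ones, clamped to [0, 1]
            for c in active:
                pool = self.pools[c]
                for i in pool:
                    if input_bits[i]:
                        pool[i] = min(1.0, pool[i] + self.perm_inc)
                    else:
                        pool[i] = max(0.0, pool[i] - self.perm_dec)
        return active


if __name__ == "__main__":
    rng = random.Random(0)
    sp = TinySpatialPooler(input_size=100, num_columns=200, sparsity=0.02)
    sparse_input = [0] * 100
    for i in rng.sample(range(100), 5):
        sparse_input[i] = 1
    dense_input = [1 if rng.random() < 0.8 else 0 for _ in range(100)]
    print(len(sp.compute(sparse_input)), len(sp.compute(dense_input)))
```

Whether the input has 5 active bits or around 80, this sketch returns the same number of active columns (4 of 200, i.e. 2%), which is the normalization described in the answer. A real SP adds a stimulus threshold so that columns with negligible overlap stay inactive.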

The Spatial Pooling algorithm produces a stable sparsity, and you can change the target level by setting a parameter in our models. It could produce dense representations, but sparsity seems to be crucial to the whole system working. Input to the SP can be direct sensory encodings, but they must be binary arrays. They can also have topology, and the SP can be configured with topology and local inhibition.
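Local inhibition differs from global inhibition in that each column competes only with its topological neighbors. Below is a minimal Python sketch, assuming a 1-D row of columns whose overlap scores have already been computed. The function name `local_inhibition` and its parameters `radius`, `density` and `stimulus_threshold` are hypothetical, only loosely analogous to the inhibition radius and local area density settings in Numenta's spatial pooler. A column becomes active if fewer than `k` of its neighbors have a strictly higher overlap, where `k` comes from the target local density.

```python
def local_inhibition(overlaps, radius=2, density=0.2, stimulus_threshold=1):
    """Return indices of active columns in a 1-D row under local inhibition.

    Hypothetical illustration: each column competes only with columns within
    `radius` positions of it (the neighborhood is truncated at the edges).
    """
    active = []
    n = len(overlaps)
    for c, ov in enumerate(overlaps):
        # Columns with too little input never become active
        if ov < stimulus_threshold:
            continue
        lo, hi = max(0, c - radius), min(n, c + radius + 1)
        neighborhood = overlaps[lo:hi]
        # Number of winners allowed in this neighborhood
        k = max(1, int(round(density * len(neighborhood))))
        # Count neighbors that beat this column (c itself never beats itself)
        stronger = sum(1 for o in neighborhood if o > ov)
        if stronger < k:
            active.append(c)
    return active


if __name__ == "__main__":
    overlaps = [9, 8, 7, 1, 0, 2, 0, 3, 1, 0]
    print(local_inhibition(overlaps))
```

For these overlaps, global inhibition keeping two columns would pick columns 0 and 1, both in the strongly driven left region. Local inhibition instead picks columns 0 and 7, so the weakly driven right region still gets a representative and activity stays spread across the topology.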