The ANN equivalent of HTM


When talking about memory systems in ML, most people think first of RNNs, where we build a neural network with weight sharing over time. An RNN maintains a fixed-size vector of internal state, and we backpropagate some kind of error to make this state useful for performing some task.
HTM uses sparse representations and Hebbian learning, whereas RNNs use dense representations and gradient descent.
So RNNs are not really very similar to HTM.

What would be a good ANN equivalent to HTM?
Firstly, I think we need a very wide and big neural network with lots of parameters. The net is a function from the state (context) and the input to a new state. Instead of letting the system learn slowly and average over all samples, we use backpropagation to aggressively update the net on each new input, by making the gradient descent step size huge instead of very small. There’s no backpropagation through time: when we train the network, we just iteratively feed the last state together with the input into the network.

So the net has a huge, sparse parameter space in which to store memory, and each new input writes to a piece of memory somewhere in this space, at a location determined by the context.
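
To make this concrete, here is a minimal sketch of such a write, under several assumptions of mine: the "state" is reduced to just the previous symbol, the sparse code comes from a fixed hash rather than being learned, and the update is the delta rule on a single linear readout (gradient descent on squared error, not full backpropagation). The names `N_UNITS`, `K_ACTIVE`, `active_units`, `write` and `learn_online` are all hypothetical. With a step size of `1 / K_ACTIVE`, which is about as large as the step can get without overshooting, one update cancels the error on the current sample.

```python
import hashlib

N_UNITS = 4096    # width of the hypothetical layer (assumed)
K_ACTIVE = 16     # active units per (context, input) pair (assumed)
SYMBOLS = "ABC"

# One weight row per unit, one column per output symbol.
weights = [[0.0] * len(SYMBOLS) for _ in range(N_UNITS)]

def active_units(context, symbol):
    """Hash a (context, input) pair to a fixed sparse set of unit indices."""
    units, salt = set(), 0
    while len(units) < K_ACTIVE:
        digest = hashlib.sha256(f"{context}|{symbol}|{salt}".encode()).digest()
        units.add(int.from_bytes(digest[:4], "big") % N_UNITS)
        salt += 1
    return units

def readout(units):
    """Linear readout: sum the weight rows of the active units."""
    return [sum(weights[u][j] for u in units) for j in range(len(SYMBOLS))]

def write(units, target, lr=1.0 / K_ACTIVE):
    """One gradient step on squared error; lr = 1/K_ACTIVE cancels the error at once."""
    output = readout(units)
    for j, symbol in enumerate(SYMBOLS):
        error = output[j] - (1.0 if symbol == target else 0.0)
        for u in units:
            weights[u][j] -= lr * error

def predict(context, symbol):
    """Return the symbol with the largest readout for this (context, input) pair."""
    output = readout(active_units(context, symbol))
    return SYMBOLS[output.index(max(output))]

def learn_online(sequence):
    """Single pass, one update per input, no history of past states kept."""
    context = ""
    for symbol, target in zip(sequence, sequence[1:]):
        write(active_units(context, symbol), target)
        context = symbol  # assumed state update: remember the previous input

learn_online("ABCBABCB")
print(predict("A", "B"), predict("C", "B"))
```

The point of the toy sequence is that "B" is followed by different symbols depending on what came before it, so the context really does select where in the parameter space the memory is written.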


So you discard RNNs as a good candidate on the basis that they use backprop, and then ask for some other candidate with backprop?
I don’t really understand what you’re trying to match, or why.


I don’t discard anything.
RNNs just have different characteristics than HTM: they create a very dense representation that’s particularly good for a given task, they require many samples to learn this representation, and they’re not friendly to online learning, since you need to backpropagate the error through previous time steps (meaning you have to store the unfolded RNN, which grows with time).

The problem is not backpropagation per se; it’s that the memory you need to optimize an RNN grows linearly with time, which makes it unsuitable for online learning.
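
A toy illustration of that growth, assuming a scalar RNN `h_t = tanh(w * h_{t-1} + x_t)` with a squared-error loss on the final state (the functions `forward` and `bptt_grad` are made up for this sketch): the backward pass needs every earlier state, so the stored history is one entry longer than the input sequence.

```python
import math

def forward(w, xs):
    """Run h_t = tanh(w * h_{t-1} + x_t) and keep every state for the backward pass."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + x))
    return hs

def bptt_grad(w, xs, target):
    """Gradient of 0.5 * (h_T - target)^2 w.r.t. w via backpropagation through time."""
    hs = forward(w, xs)                  # len(xs) + 1 stored states
    dh = hs[-1] - target
    grad = 0.0
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)     # back through tanh
        grad += da * hs[t - 1]           # needs the stored state h_{t-1}
        dh = da * w                      # pass the error one step back in time
    return grad, len(hs)

for steps in (10, 100, 1000):
    _, stored = bptt_grad(0.5, [0.1] * steps, target=0.3)
    print(steps, "inputs ->", stored, "stored states")
```

Truncating the backward pass caps this cost, but then the gradient no longer sees anything older than the truncation window.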

HTM, on the other hand, is very good at memorizing lots of sequences using a very sparse representation and updating it online via Hebbian learning. Dense representations are highly optimized yet inflexible; sparse ones are not optimized but can change rapidly online.

I’m simply pointing out the differences between current memory systems in ML and HTM, and trying to invent something that more closely resembles HTM.


Well, I’m not familiar with RNN implementations, but I know of some of their achievements. They seem to be able to deal with problems that in HTM we’d kick into touch, awaiting clues on a hierarchy.

If you stumble upon any insight linking the two together, maybe that could be worth the effort.


Yes. And RNNs seem to be incapable of memorizing sequences rapidly in an online fashion, which HTM does very well.


It seems I do link to that one a lot, recently…


It’s a good question @matan_tsuberi, but I have not found a good answer in the years I’ve been the community manager. There is really no equivalent tech in the machine learning ANN world. (Remember that HTM is an ANN, if that helps. :wink:)


Matan, rhyolight is correct: HTM is an ANN. In most ANNs people use overly simplistic neurons that simply integrate their inputs and pass the sum through a transfer function. Almost no one uses lateral connectivity within an area (I am using area to mean level of hierarchy, following Bengio, to distinguish it from layers 1-6). TensorFlow is not friendly for this kind of connectivity. The EU Human Brain Project simulator is set up for this. I would look there first.


RNNs are ANNs with a feedback loop. LSTMs are RNNs with flip-flops that hold bits high, noting that the cell has toggled to a new state.

Connectionists working with ANNs try to model them more as transmission lines. They will use a hard memory device only as a last resort to make their model work.

Machine learning does not stack two or more deep ANNs together with feedback between them to make a super-stacked RNN. I think the brain does.

There are a couple of ways of making a memory device with an ANN. One way is to daisy-chain neurons in a loop, also known as a delay line, that feeds back into itself, like an RNN. But it is a pure transmission line and the data is not changed.
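
A delay line of this kind can be sketched in a few lines, assuming each neuron simply hands its value to the next one on every time step (the `tick` helper and the example pattern are made up for illustration):

```python
from collections import deque

def tick(ring):
    """Advance one time step: every neuron passes its value to the next one."""
    ring.rotate(1)   # the last neuron feeds back into the first
    return ring

ring = deque([1, 0, 0, 1, 0, 1, 0, 0])   # arbitrary example pattern
original = list(ring)
for _ in range(len(ring)):
    tick(ring)
print(list(ring) == original)   # the pattern comes back around unchanged
```

The loop holds the pattern only as long as it keeps circulating, and nothing in it is learned.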

The second way is to store things in “weight space”. An ANN converts input data to output data. It does this by adjusting the weights within the ANN. So an input value that looks like an address could be fed into an ANN that is trained to give an output data value. This is an ANN simulating RAM. Storing a value this way takes longer, because it has to be trained into the ANN.
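
As a rough sketch of an ANN simulating RAM, assume a single linear layer with one-hot address inputs, so the output for an address is just that address line's weight (`train_write`, `read` and the learning rate are illustrative choices, not anything from a real library). A write is a loop of small gradient steps rather than a single store:

```python
N_ADDRESSES = 4

# One weight per address line; with one-hot addresses the output is weights[address].
weights = [0.0] * N_ADDRESSES

def read(address):
    """Forward pass for a one-hot address: return the weight on that address line."""
    return weights[address]

def train_write(address, value, lr=0.1, tolerance=1e-3):
    """Repeat small gradient steps until the output is close enough; return the step count."""
    steps = 0
    while abs(read(address) - value) > tolerance:
        weights[address] -= lr * (read(address) - value)   # gradient of 0.5 * error^2
        steps += 1
    return steps

steps = train_write(2, 42.0)
print(round(read(2)), "stored after", steps, "training steps")
```

With a small step size the error shrinks by a constant factor per step, which is why the write takes many iterations instead of one.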

In one of my AGI models I use a very long “straight line” of many spiking neurons that drives video.

When a pulse signal jumps to the next neuron, it generates the next frame of video.

One pulse goes to the next neuron and another goes to an ANN that generates the video image, like addressing through video memory in sequence.

When a bot wakes up and looks around, it detects features of the world that activate SDR bits. These bits are routed to the “straight line” of spiking neurons that drives the video. It is like hitting a line in a spider web: many feature detections will cause a spike to develop at a given location. I also call this straight line of neurons the consciousness track.

The center of the universe for a connectionist is in between an encoder network and a generator network.

Encoder Decoder Network - Computerphile:

Some of the very first life forms on earth that used neurons had a detector or encoder network that detected food, and then a decoder or generating network to catch food.

Later on, decoders were used as a memory device.

Numenta turns attention to The Thalamus!
Intelligence vs Consciousness

What do you think of my answer? :slight_smile:


It’s honestly hard to make sense of it. I have never studied ANNs based upon the point neuron, so I’m not familiar with the dynamics of the systems you can create with them. My interests are purely biological (with the intent of reverse-engineering the biology with code).


The video makes sense with regard to the spatial navigation map I’m experimenting with, which needs a similar reduction in resolution. Otherwise non-signaling places to avoid become large areas with no directional information inside them. The concordant pair ratio (from the literature) is then no longer around 58%.

Conceptualizing a shock zone to avoid as a single entity, with vectors neatly pointing away from the center, requires representing it as a single place. Where the differing-resolution modules are stacked, the non-signaling centers of the higher-resolution maps would be filled in by the lower. There is then one active map with the ideal directions to go when stuck in the middle of a pond and needing to swim towards a landmark on shore, and a well-resolved shoreline where a nearby tree trunk to avoid bumping into is already around one place in size, the personal space needed to maneuver. Somatosensory maps further resolve the area that’s within reach.