Why do we need TM?

This is a bit of a tongue-in-cheek question …
If you think about it from the point of view of the higher layers, TM should not be needed.
Let me graph it … normally we imagine it this way:

[Encoder]=>[Spatial pooler]=>[Temporal Memory]=>[Temporal pooler]=>[Higher LVL]

but what if we just have :

[Encoder]=>[Spatial pooler]=>[Temporal pooler]=>[Higher LVL]

The only difference is that we lose TM’s predictive capability … otherwise everything else looks fine.

Why wouldn’t this scheme allow us to recognize objects? It is still “recording” the interaction as a single SDR at the TP output, right? And if two pooled recordings are similar, then the objects will be similar, won’t they?!

I grokked the answer :slight_smile:

Perception is bottom-up, Prediction is top-down !!

So on the way up we only use TM to learn the sequence, but don’t use it to predict.
On the way down we use it to predict and to influence the bottom layer’s TPooler.

The question is how you merge/update the lower TP (i.e. the order of actions to be executed), because you have to sync signals coming from 3 directions:

  1. Update from below
  2. Lateral sync between columns, voting
  3. Update from Prediction from the top

any ideas ?


What do you mean by “Temporal Memory” and “Temporal Pooler”?

What is the distinction between the two for you?

The temporal memory would be beneficial if the timing factor is there. For instance, recognizing objects in video streams. However, when dealing with discrete images, you can just use the spatial pooler + SDR classifier to recognize the objects in the images.


TM is the part that handles variable-order Markov sequences.

Temporal Pooler is the part that “converts” the temporal sequence into a spatial item (a single SDR), to be used in the next layer.

To rephrase and make sure I understand your TM and TP distinction correctly: temporal memory is the thing that remembers synaptic connections between cells of different columns as a sequence progresses, and the “pooler” is nothing more than an activation/prediction view of those columns.

In my mind, the TP as you describe it is simply a thin “peek” layer for the benefit of those of us who are trying to get an overall picture of which columns are falling into a potentially predictive state, which can then be overlaid with the spatial pooler. Functionally, it isn’t actually needed, except as a useful abstraction. One could, at the end of a turn, simply check whether a winning column had a predictive cell or not, on an individual column basis, rather than duplicate that data elsewhere.

Again, functionally, it doesn’t remove anything from the actual operation of an HTM system.

(If I’m not mistaken, which is not guaranteed…)

After reading Paul’s reply below, I realized that I hadn’t properly finished my thought: TM is needed if you’re trying to track sequential patterns. The “TP” is just a ‘view’.

On the other hand if you are just trying to recognize a pattern in a single instant, without regard for what came before or after, sure, we don’t need TM.


One reason, I think, is movement. If the agent itself is controlling the sensory stream through its own actions, then each sensor must be able to predict what it will sense next. Otherwise the agent won’t have enough information to decide what movements to make next in order to reach its goal. You could probably remove this predictive element if the movement is hard-coded elsewhere and your only goal is to passively recognize objects. But if your goal is to generate output which is primed for action selection, then I think temporal memory and prediction are essential.

Another way to think of TM being essential for movement, is to consider temporal unfolding of actions down a hierarchy. Imagine some high-level action is selected, like “grab the coffee cup”. This breaks down into lower-level actions like “reach out hand”, “grasp fingers”, etc. Those break down into even lower-level actions and so-on. Besides the timing aspect, which you mentioned, all of these must also happen in the proper sequence. TM is essential for that to occur.

TM is also necessary for an agent to recognize things where order matters (like when hearing the notes of a song, someone speaking, etc). For example, without TM, “Bob punched Tom” would be identical to “Tom punched Bob”. I know you mentioned “object recognition”, so you may see this example as not relevant to your question, though (I have a rather abstract definition of “object” in the context of HTM theory).
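To make the word-order point concrete, here is a minimal sketch of what happens if a sequence is pooled by a plain union with no sequence memory in front of it. The toy SDRs (small sets of active bit indices) and the names `WORDS` and `pool_by_union` are made up for illustration and are not taken from any HTM library.

```python
# Hypothetical toy SDRs: each word is a small set of active bit indices.
WORDS = {
    "Bob": frozenset({1, 5, 9}),
    "punched": frozenset({2, 6, 10}),
    "Tom": frozenset({3, 7, 11}),
}

def pool_by_union(sentence):
    """Pool a sequence into one SDR by plain union, ignoring order."""
    pooled = set()
    for word in sentence:
        pooled |= WORDS[word]
    return frozenset(pooled)

a = pool_by_union(["Bob", "punched", "Tom"])
b = pool_by_union(["Tom", "punched", "Bob"])
print(a == b)  # True: the two sentences pool to the same SDR
```

Both sentences end up as the same set of bits, so anything downstream of this kind of pooling cannot tell who punched whom.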


AFAIK, current HTM theory does not include the “movement”/RL aspect!
My understanding is that the execution of motor commands happens outside the cortical column … most probably in the basal ganglia.

The CC only takes Location+Sensory info as input, so it is a sequence, but from the point of view of the upper layer and TP it probably does not require TM! I am still trying to figure out a scenario that would need it.
You only need to aggregate the sequence in chunks … slow it down…
If you don’t pass TM data to TP you also skip the problem of bursting.

/Of course I’m looking at those as separate modules; they could just as well be intermingled. But I like them as modules because it is easier for implementation :wink: /

As for what you are saying about downstream replay and/or generating motor commands: there you do have to have TM. I can see that now.

So to repeat: on the way UP there is no need for TM, on the way DOWN it is needed.

I didn’t mention RL, but of course that will be necessary as the theory progresses. Movement is very much part of the theory (see many of the recent research meetings).

Of course, if you limit your use cases to object recognition (traditional “objects”, perhaps in photos, for example), and have the movement of the sensor as well as location/orientation information hard-coded, you could probably write something useful. I hope I don’t sound like I am putting the idea down – just that it limits your use cases by chopping out various components that are probably needed for other use cases.

Can you give me a case apart from object recognition that will need it?

I also think about it from a different perspective… The possible inputs to TP are:

  1. Whatever goes into TM i.e. Loc+Sense at t
  2. Whatever goes out of TM i.e. Prediction i.e. Burst OR Loc+Sense at t+1

So the only benefit of TM=>TP is being a step ahead, but the drawback is that you will get bursting columns.

Is there another scenario?

For that I posted this: TP should preserve sequence similarity!

which should also take care of ORDER.

An algebraic way to preserve order is to shift each SDR before the UNION, i.e.

  abc =  a >> 2 + b >> 4 + c >> 6 + ...
  cab =  c >> 2 + a >> 4 + b >> 6 + ...

Later you can compare/check containment using the shift amounts, i.e. a shift of 2 means position 1 in the sequence.

  a >> 2  in abc : True
  a >> 2  in cab : False
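Here is one way the shift-then-union idea could look in code. It is only a sketch under my own assumptions: SDRs are sets of active bit indices, `>>` is read as a rotation of the indices with wraparound, `+` is read as set union, and the width `N` and the helper names are hypothetical.

```python
N = 64  # hypothetical SDR width in bits

def shift(sdr, k, n=N):
    """Rotate every active bit index by k positions (wraps around at n)."""
    return frozenset((i + k) % n for i in sdr)

def encode_sequence(items, step=2):
    """Union of the items, each shifted by step * position (positions start at 1)."""
    pooled = set()
    for pos, sdr in enumerate(items, start=1):
        pooled |= shift(sdr, step * pos)
    return frozenset(pooled)

def contains(pooled, sdr, pos, step=2):
    """True if sdr, shifted for position pos, is fully contained in pooled."""
    return shift(sdr, step * pos) <= pooled

# Toy items, chosen by hand so their shifted copies do not collide.
a = frozenset({0, 17, 40})
b = frozenset({5, 23, 51})
c = frozenset({9, 30, 58})

abc = encode_sequence([a, b, c])
cab = encode_sequence([c, a, b])

print(contains(abc, a, pos=1))  # True
print(contains(cab, a, pos=1))  # False
print(contains(cab, a, pos=2))  # True
```

One caveat: shifted copies of different items can land on the same bits by chance, so the containment check can give false positives. That becomes more likely as the union fills up or when the SDRs are small or dense.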

It was you who mentioned object recognition in the OP, so I thought that was the gist of your question about whether or not TM is needed. If other use cases are fair game, then how would you go about anomaly detection in streaming data without TM?
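For context on why anomaly detection leans on TM: as I understand it, the usual raw anomaly score is the fraction of currently active columns that were not predicted at the previous step, so it needs some source of predictions. A minimal sketch, with a function name and set-based inputs of my own choosing (returning 0.0 for an empty input is also my choice here):

```python
def raw_anomaly_score(active_columns, predicted_columns):
    """Fraction of active columns that were NOT predicted at the previous step."""
    active = set(active_columns)
    if not active:
        return 0.0  # nothing active, so nothing to be surprised about
    hits = len(active & set(predicted_columns))
    return 1.0 - hits / len(active)

print(raw_anomaly_score({1, 2, 3, 4}, {1, 2, 3, 4}))  # 0.0, fully predicted
print(raw_anomaly_score({1, 2, 3, 4}, {1, 2}))        # 0.5, half predicted
print(raw_anomaly_score({1, 2, 3, 4}, set()))         # 1.0, nothing predicted
```

Without something that produces `predicted_columns`, the score is always 1.0 for any non-empty input.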

This is a very different idea for TP than I have explored myself. I suppose if your goal is to invent a new ML algorithm, then anything is possible (I think you did mention “current HTM theory” a couple of posts back :wink:)

Anyway, I’m interested to see what you come up with, so don’t let me discourage you!

Now that I think about it, another way TM is useful is as a Model/Recorder, where you can replay or test sequences.
In the RL part it can play the role of the model.

But then we don’t need TP in this case :slight_smile:

What I want to do is build the HTM pieces as modules and use/reuse them to solve different real problems.

For this to happen there have to be tasks that are doable with HTM.
So far we have seen Object recognition (which on a more abstract level can be thought of as proto-Concept creation, which is good because we can ground the hierarchy) and Anomaly detection (no idea what this is good for).

So far I have built Alpha versions of: Encoders, Spatial pooler, Segmented memory (memory-based Classifier, base class for TM; at its core is a Numenta-like SDR segmented neuron), TM, simple-TP (decaying-union) … the TP vote/sync algorithm is still missing.
Plus I have the code to do SDR-algebra.
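For readers wondering what a “decaying-union” pooler might look like, here is one possible reading. This is a guess for illustration, not the actual implementation: the class name, the multiplicative decay, and the threshold are all assumptions.

```python
class DecayingUnionPooler:
    """Toy pooler: recently seen bits stay in the output until they decay away."""

    def __init__(self, decay=0.5, threshold=0.2):
        self.decay = decay          # multiplicative decay applied each step
        self.threshold = threshold  # bits at or below this strength are dropped
        self.strength = {}          # bit index -> current strength

    def step(self, active_bits):
        # Fade everything seen so far, forgetting bits that became too weak.
        self.strength = {
            bit: s * self.decay
            for bit, s in self.strength.items()
            if s * self.decay > self.threshold
        }
        # Bits active right now are (re)set to full strength.
        for bit in active_bits:
            self.strength[bit] = 1.0
        return frozenset(self.strength)

tp = DecayingUnionPooler()
tp.step({1, 2})
tp.step({3, 4})
out = tp.step({5, 6})
print(sorted(out))  # [1, 2, 3, 4, 5, 6]
```

With these settings a bit survives two further steps after it was last active, so the output is a sliding union over roughly the last three inputs.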

What I want to do now is try different permutations of connecting them to solve real tasks.

I see, that makes sense. From that perspective, then I think I would answer your original question “Why do we need TM?” with “Why not?” The more tools that you have in your toolbox, chances are the more real problems you will be able to tackle.

Could you use a 1/2 timestep to merge the different types of signal ? A neuron can be depolarised by distal dendrites if it receives top down feedback or lateral input (3. or 2.), and if depolarised it will then fire faster when it receives a feed-forward signal (1.) than it would otherwise. So if you work out which cells are depolarised at timestep n, they can then fire at n+1/2 if they get a feed-forward signal, and use that to trigger inhibition of all the neighbouring neurons at timestep n+1. If nothing is depolarised, then n+1/2 is irrelevant and you just get normal feed-forward/lateral behaviour when they fire at n+1. Feed-forward processing would happen as usual, but will be modified by feedback inputs when they’re available.
Hope I interpreted the question properly - apologies if not !
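A rough sketch of the half-timestep idea, under some simplifying assumptions of mine: “neighbouring neurons” is taken to mean the other cells in the same column, a column with no depolarised cell fires all of its cells at n+1, and the function and parameter names are hypothetical.

```python
def half_step_activation(ff_columns, depolarised, cells_per_column=4):
    """Return the set of (column, cell) pairs that end up active.

    ff_columns:  columns receiving feed-forward input at this step
    depolarised: (column, cell) pairs depolarised at step n by
                 top-down feedback or lateral input
    """
    active = set()
    for col in ff_columns:
        early = {(c, i) for (c, i) in depolarised if c == col}
        if early:
            # n + 1/2: depolarised cells fire first and inhibit the rest
            # of their column, so only they become active.
            active |= early
        else:
            # n + 1: nothing was depolarised here, so the column shows
            # plain feed-forward behaviour (every cell fires).
            active |= {(col, i) for i in range(cells_per_column)}
    return active

result = half_step_activation({0, 1}, {(0, 2), (5, 1)}, cells_per_column=3)
print(sorted(result))  # [(0, 2), (1, 0), (1, 1), (1, 2)]
```

Cell (5, 1) was depolarised but its column got no feed-forward input, so it stays silent. In this sketch feedback only shapes the activity when it is present, and never drives it on its own.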