Predictive Processing vs Predictive Dendrites

I was going back over the paper “Locations in the Neocortex: A Theory of Sensorimotor Object Recognition Using Cortical Grid Cells”. There is confusion in that paper about what predictive processing implies.

The paper states: “Our model provides an alternate explanation for predictive processing in visual cortex.” It describes other research where perception involves “being able to predict what you will feel as you move your hand over the bottle or what you will see as you move your eyes over the bottle”, and suggests the paper provides a mechanism: “our model does propose how the brain represents the bottle in different sensory modalities in such a way that it will make predictions in response to movements. We think the idea of object-specific location representations is quite compatible with this view of perception.”

There is a significant difference, because HTM predictions are local to the dendrites of each cell. This means that the prediction is not available as an output, so there can be no processing of the prediction, i.e. this is not “predictive processing”. In predictive coding (another term for predictive processing; see Predictive coding - Wikipedia), “modelling predictions of lower-level sensory inputs via backward connections from relatively higher levels in a cortical hierarchy” requires that predictions be processed, i.e. communicated (not remaining local to a cell’s dendrites).
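To make the point concrete, here is a minimal sketch of a TM-style update step (my own simplification, not Numenta’s code; `CELLS_PER_COLUMN`, `cells_in` and `compute_depolarized` are all illustrative stand-ins). Note that only the active cells are returned; the depolarized (“predictive”) state is internal bookkeeping and never leaves the layer:

```python
# Minimal TM-style step (illustrative only). The key point: the function
# returns active cells; the predictive state is internal and is NOT an output.

CELLS_PER_COLUMN = 4  # tiny toy size

def cells_in(col):
    return {(col, i) for i in range(CELLS_PER_COLUMN)}

def tm_step(prev_predictive, active_columns, compute_depolarized):
    active_cells = set()
    for col in active_columns:
        predicted = prev_predictive & cells_in(col)
        if predicted:
            active_cells |= predicted      # prediction confirmed
        else:
            active_cells |= cells_in(col)  # mismatch: the column bursts
    # Depolarization is recomputed on distal dendrites; it never leaves the layer.
    next_predictive = compute_depolarized(active_cells)
    return active_cells, next_predictive   # only active_cells reach other layers

# With no prior prediction, every active column bursts:
active, _ = tm_step(set(), {0, 1}, compute_depolarized=lambda cells: set())
print(len(active))  # -> 8 (all cells in both columns)
```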

As we move up the layers of the HTM hierarchy, cell outputs will be increasingly delayed relative to the sensory input, i.e. it is reactive processing, not predictive processing. Assuming I’m right (I would prefer to be wrong!), it would be good to make this distinction more obvious.


To address this point, first let me abstract the circuit as an input layer and an output layer (SMI is more complex than that, but at a high level that is still a way to think about it). The output layer is more abstract than the input layer (i.e. the step up in hierarchy happens within a CC). When building a hierarchy, at a basic level, we take the output layer of a CC and send it to the input layer of another CC. Thus (internal) predictions occur at all levels of the hierarchy, not just at the bottom.
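A structural sketch of that wiring, under the simplification above (names and the stubbed string-based layer behavior are mine, purely illustrative):

```python
# Each CC's output layer feeds the next CC's input layer, so (internal)
# predictions happen at every level, not just at the bottom.

def make_cc(name):
    def cc(signal):
        active_input = f"{name}.input({signal})"  # context + local prediction
        return f"{name}.output({active_input})"   # more abstract representation
    return cc

def hierarchy(ccs, sensory_input):
    signal = sensory_input
    for cc in ccs:  # bottom level first
        signal = cc(signal)
    return signal

print(hierarchy([make_cc("CC1"), make_cc("CC2")], "touch"))
# -> CC2.output(CC2.input(CC1.output(CC1.input(touch))))
```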

Starting with that simplified view of the circuit, the next thing to remember is that the activity in a CC’s input layer is in the context of an object. In the case of a sequence, the input isn’t simply “G”, but something more specific like “33rd note of Beethoven’s 5th”. While this is not a prediction, it is something in a similar class – a “belief”. This belief is synchronized across hierarchical levels.

This is still not predictive processing, but it sets the stage. When the “belief” turns out to be wrong, the input layer in a CC begins to burst. What that means is that it is sending a dense signal of the input in all known contexts. This is actually an active prediction from the perspective of the output layer. The dense activity has a much stronger pull on the output layer (32 times as much impact in a typical HTM configuration).
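The “32 times” figure is just the typical cells-per-column count; a back-of-envelope check:

```python
# Back-of-envelope for the "32x pull" (typical HTM numbers, not exact):
cells_per_column = 32
predicted_activity = 1             # one cell per column fires when the belief holds
burst_activity = cells_per_column  # every cell in the column fires on a mismatch
print(burst_activity / predicted_activity)  # -> 32.0
```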

The output layer is shared between many CCs in the same hierarchical level (aka voting), so if lots of the CCs are bursting, the active predictions in the output layer settle onto an object that satisfies the most CCs (or adjusts to a new object to be collectively learned if none of the CCs agree).
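A toy sketch of the voting idea (my reading of the intent, not Numenta’s implementation): each CC contributes a set of candidate objects, and the shared output settles on whatever satisfies the most columns:

```python
# Toy "voting": the objects agreed on by the most cortical columns win.

from collections import Counter

def vote(candidate_sets):
    """candidate_sets: one set of candidate object IDs per CC."""
    tally = Counter()
    for candidates in candidate_sets:
        tally.update(candidates)
    best = max(tally.values())
    return {obj for obj, count in tally.items() if count == best}

# e.g. three CCs that have each narrowed to a few objects:
print(vote([{"cup", "can"}, {"cup", "bowl"}, {"cup"}]))  # -> {'cup'}
```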

So if the output layer is actively predicting multiple objects, and it is an input to the next level in the hierarchy, then that would cause the input layer of CCs in that next level to start bursting, repeating the process on up the hierarchy.

I am making some assumptions here about the algorithm in the output layer, but I think they are in alignment with its intent.

Predictions could seem local (partially depolarized cells can’t directly impact other cells), but you can basically see the active cells of TM (which can impact other cells) as the prediction. They are equal in terms of the amount of information they have.
This is trivial because the predictive (partially depolarized) cells are derived directly from the active cells.
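The derivation being referred to looks roughly like this (a sketch with made-up data and a hypothetical `THRESHOLD`; note it also shows what later replies push back on – you need access to the distal segments themselves to recompute the prediction from the active cells):

```python
# Predictive cells as a deterministic function of active cells, given
# the learned distal segments.

THRESHOLD = 2  # hypothetical segment activation threshold

def predictive_cells(active_cells, distal_segments):
    """distal_segments: {cell_id: list of sets of presynaptic cell_ids}."""
    depolarized = set()
    for cell, segments in distal_segments.items():
        for segment in segments:
            if len(segment & active_cells) >= THRESHOLD:
                depolarized.add(cell)  # enough of its presynaptic cells are active
                break
    return depolarized

segments = {"c9": [{"c1", "c2", "c3"}], "c7": [{"c4", "c5"}]}
print(predictive_cells({"c1", "c2"}, segments))  # -> {'c9'}
```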

Also,

In the paper “Locations in the Neocortex: A Theory of Sensorimotor Object Recognition Using Cortical Grid Cells”, the proposed model uses a union operation (multiple active cells in a GCM) to represent multiple possible predictions, which can be disambiguated by incorporating multiple CCs (i.e. communication).


Active cells are active relative to their inputs, so there is a delay between an input arriving and cells becoming active. This delay will accumulate up the hierarchy.

There is not a strict hierarchy but there are more abstract layers. Hopefully the problem is clearer.

Predictions are about future inputs. At the lower levels the belief is that the current input is part of a previously learnt input sequence (there are no concepts like note, count, or song). When the music stops on the 33rd note, you don’t know what the 34th will be. Instead you get bursting, to learn whatever random thing happens instead of the 34th note.

The dense bursting is not a prediction - it is a reaction to an unknown current input. It is still not about the future.

I would like to read more about the implementation of voting; please let me know if you’ve seen something. If there is an output layer, then this would no longer be in line with a repeating algorithm at the scale of a CC. My understanding was that the voting needs to be local to each CC (using information from many other CCs). Otherwise this is headed toward the idea of a homunculus – but the idea of prediction seems to have been abandoned, so why not!

As above, I don’t think this is a prediction; it is a mismatch with the current input. Perhaps it could be used as an anomaly indication.

Ideally, the architecture of how we get predictive processing out of reactive units would be explained in TBT, but I don’t remember seeing anything – do you?


The active dendrite has learnt the associations, i.e. the dendrite has the information. The active cells reflect the current input and historical context. They are a result of (locally) predicting the current input (or not, in the case of bursting).

These are not predictions but a way of deciding when the current output has discriminated a unique object, i.e. the current input is recognized and part of a unique historical sequence.

If the implementation is reactive rather than predictive, that does not imply it is “bad”. The intention of this thread is to clarify what predictive processing is and to make clear the difference between HTM and predictive processing. I hope people will be able to make better use of HTM (and contribute more) if they understand what it is (and what it is not). I was assuming it was predictive processing and I learnt (on another thread) that I was wrong. Maybe I will learn that I am wrong about being wrong…

Even if you define prediction as “predicting future inputs”, these are still predictions. It’s about where you draw the line, though.
The GCM activations derive predictive cells in minicolumns (which represent inputs predicted to be sensed in the future) when the agent makes a movement.
The active cells in GCMs are, in the same sense as active cells in TM, representing the prediction, and they have the same latent information that the predictive cells have.


Unfortunately this is wrong. There are not predictive cells; there are predictive dendrites. The “predictive state” of cells in TM leaves the cell inactive. It is only AFTER the present input is matched that the cells in a “predictive state” become active. I was also confused by this, but if you don’t see this, please read BAMI and take a look at Issues in BAMI.

I am well aware of that.

But I’ll say this again: the predictive cells are entirely and directly derived from the active cells, regardless of whether they are in TM or GCMs.
Then you can trivially settle on the conclusion that they essentially have the same information. And active cells can interact with other cells.
So I still think of HTM and TBT as predictive frameworks.

The information about what is next is in the active dendrites of the cells in a “predictive state” (they have learnt this, and that is where the information is). The particular combination of active cells that leads to an active dendrite is not available at the output. A higher layer takes the outputs of a lower layer and can learn to predict, but (again) those predictions will be in the active dendrites and not available as predictions within the system.

I’m not sure what you mean by “predictive framework”; there is of course prediction going on in the active dendrites. This thread is intended to clarify that HTM’s “predictive framework” is not the same thing as “predictive processing”, which requires anticipation (i.e. acting in the present based on predictions of the future).

If you thought the theory was sound because you thought predictive cells were active, then it wouldn’t have any problems even if they weren’t. Motor cells can learn from the active cells as if they were the predictive cells, because they have the same information.

No, they don’t have the information that is learnt by the distal dendrites. But if you want to imagine that the distal dendrites do not learn anything, then that is your decision. If you want to convince others, you’ll need to show how the information in the distal segments gets communicated without communication.

Another way to think of this is that active cells communicate the current input in the context of the input’s history.

A way to understand predictive processing is to think of it as a system that anticipates.

Also I should clarify I am not claiming the theory is not sound I’m just claiming it is not predictive processing.

Let’s say the predictive cells were actually active; then how would the motor cells know what input the predictive cells represent? They can’t without learning. If the motor cells could learn what input the predictive cells represent, and thus how to react, then they could do the same with the active cells.

This would not work. HTM does not work that way either. I was assuming it is a predictive processing framework. I did not decide it was a predictive processing framework based on a misunderstanding of the algorithm. I understood it was not predictive processing by better understanding the algorithm.

I assumed it was predictive processing based on reading On Intelligence.

I was just saying it hypothetically… for the sake of making a point.

How does it differ?

If you think of a temporal memory as a series of nodes, when any of its nodes are selected, that is technically “predicting” the entire series. Forgetting about the voting aspect or bursting (I think maybe that just obfuscated my point) when the output layer in a single CC recognizes a couple of nodes in the series activating in the input layer, it activates a representation for that series. This is an active prediction of the series. If you take the activity which represents the series, and the activity which represents the position in the series, those together encode not only the same information as the next node in the series, but also all future nodes. This is much more useful than simply transmitting what will happen a few milliseconds from now.
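A toy illustration of that claim (the data and names are made-up stand-ins): a (series, position) pair encodes every future element, not just the next one:

```python
# Knowing the series and the position within it "predicts" the rest of
# the series, not merely the next element.

SERIES = {"beethoven_5th": ["G", "G", "G", "Eb", "F", "F", "F", "D"]}

def future_elements(series_id, position):
    return SERIES[series_id][position + 1:]

print(future_elements("beethoven_5th", 2))  # -> ['Eb', 'F', 'F', 'F', 'D']
```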

A couple of other thoughts I had about this:

Timing. For streaming data, when a belief is correct, the amount of time between when a cell is predictive and when it is active is very short. There is probably not a lot to gain by sending the next element of information a few milliseconds sooner. More useful would be sending information about what will happen a few seconds or minutes in advance. This is what I see the output layer doing. Since its activity is more stable than the input layer’s, it represents a temporal correlation that encodes not only the current input, but also future inputs.

Division of labor. What does a CC within a hierarchy need to know? Of course, I need to know what will happen next (my prediction based on my belief), but what about the levels above and below me?

The CC below me needs to know my current belief. Since I am already predicting what will happen next at my own level of abstraction, the CC below me does not need to know what I am predicting, and doesn’t need to copy my predictions (nor likely can it, since it is modelling lower abstractions). It just needs to unfold my current belief using its own predictions at its own level of abstraction. I’ll tell it when to move to my next belief to unfold.

The CC above me needs to know if my current belief is panning out for me and matching reality. It doesn’t need me to tell it my predictions, it is making its own predictions at a higher level of abstraction. When my belief is wrong, it needs me to tell it that (and it needs me to provide some alternate possibilities based on my own memory). It uses this information to judge how well its own belief is matching reality, and to inform the next level up about that.

Learning. When predictions are going well, there is nothing to learn, so no reason to transmit them outside of where they are being locally managed. When they are not going well, then it is time to get the hierarchy involved to help figure out what has gone wrong, and update models accordingly.

The wrong belief could be at any level of the hierarchy (tripping on a newly formed crack in the sidewalk vs. a road being blocked off on my walk to the store, etc.). The lowest levels of the hierarchy (being where evidence from the world enters the picture) are always going to be the first to recognize when things are going wrong, and they have a mechanism (bursting) to transmit that quickly up the hierarchy to whatever level needs it.

I think this is actually a desired property for temporal abstractions (for example, the temporal difference between “left, right, left, right” and “walk forward”). But where you want to shortcut this temporal difference is when beliefs are not matching reality and you need to update your model. There is a mechanism for this – bursting. Being a much denser activity, it is shouting for the attention of the next higher level. If this “shouting” contradicts the beliefs of the next higher level, then it will also start bursting, and shouting up to the next level and so on, until the source of the problem is reached, and learning can occur to fix the faulty model.
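A toy sketch of that escalation, under heavy simplification (the class, names, and the “explains” test are all hypothetical):

```python
# Bursting escalates up the hierarchy until some level's model can
# explain the surprise; that level is where learning happens.

class Level:
    def __init__(self, name, explainable):
        self.name = name
        self.explainable = explainable  # surprises this level's model covers

    def explains(self, surprise):
        return surprise in self.explainable

def escalate(levels, surprise):
    for level in levels:                # bottom level first
        if level.explains(surprise):
            return f"{level.name} absorbs '{surprise}' and updates its model"
        # belief contradicted: keep bursting / shouting to the next level up
    return f"no level explains '{surprise}'; a new model is learned at the top"

levels = [Level("low", {"new crack in sidewalk"}), Level("high", {"road blocked"})]
print(escalate(levels, "road blocked"))  # -> "high absorbs 'road blocked' ..."
```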

There is a feedback loop. The input layer nudges the activity in the output layer (along with other sibling CCs), and as the output layer changes, it biases cells in the input layers, causing them to change context.

Haha, no, it doesn’t require anything that magical. There are simple mechanisms based on local rules that can do this. One example is self-reinforcing hex grids (borrowing from William Calvin’s book “The Cerebral Code”). I am in the process of drawing visualizations to explain this particular implementation of an “output layer”, and will be posting a thread about it soon.


Exactly what I was trying to say… and so much more!
Great explanation, @Paul_Lamb ! :clap:

BTW, I have to say that these latest threads you’ve been posting have really helped me with exploring the rough edges of my understanding. In this particular case, I’ve always viewed HTM as a predictive framework, but never explicitly tried to explain why. Maybe I will learn that I am wrong. :slight_smile:


If you redefine what predictive processing means, then you can be right! Another option would be to define a different algorithm, but then it would not be the topic of this thread, which is about HTM as currently defined.

I’m glad it is of some use, thanks.

Regarding your ideas, another thread would be good. I suspect it runs into a problem of representing the object vs. representing the historical context. I remember a video with Jeff and Marcus talking about this: if you include the context in the SDR (as per TM), then you lose the semantic content of the SDR.

There is a bit of a grey area with “as currently defined” (though I probably went pretty far beyond that in my last post there…) If defined as classic SP + TM only, then there is no hierarchy, so the Wikipedia definition of predictive coding wouldn’t really be applicable. The theory is still actively evolving, so maybe the question should be “how will HTM apply predictive coding when it begins to incorporate hierarchy?”

This is specifically what I’ve been working out in my own experiments. I decided to focus somewhere besides reference frames (since that is where Numenta is currently focused), and the “output layer” and hierarchy are a prime area for exploration. I have found a few different algorithms which preserve semantics (the one I’m working on the illustrations for does as well).


From what you guys are saying, there are two “frameworks”:

  1. The prediction is in active neurons
    1.1 If right --do nothing (or pat yourself on the back)
    1.2 If wrong --send the information back (with an angry note)

  2. The prediction is in the dendrites
    2.1 If right --make the neuron active (fireworks…)
    2.2 If wrong --do nothing (keep it to yourself…maybe next time…)

Some words about 2.1:
The HTM network is sparsely activated but not so sparsely connected, and this means that there are a LARGE number of neurons that, left unchecked, will activate regardless of the input. You can view the active neurons as “predictive ones” because they not only predicted but also got a confirmation of that prediction, and this is the only reason they successfully become active (they win the competition). The error, if any, does not go up the stream; it remains local. In the 1.1 case, by contrast, they do not wait to see if the prediction is true – they send the message up the stream (and down as an error-correction message).
So the difference seems to be the place where you deal with the error.
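A compact sketch of the contrast (my framing of the two lists above; toy code, not a model of either theory):

```python
# Where the error is dealt with, side by side:

def predictive_coding_step(prediction, actual):
    # Framework 1: the prediction lives in active neurons; only the
    # mismatch (the error) is transmitted up/down the hierarchy.
    return None if prediction == actual else ("error", actual)

def htm_column_step(predicted_cells, column_cells):
    # Framework 2: the prediction lives in dendrites; the layer outputs
    # confirmed activity (or a burst) and the error stays local.
    matched = predicted_cells & column_cells
    return matched if matched else column_cells  # burst on mismatch

print(htm_column_step({"a"}, {"a", "b"}))  # -> {'a'} (confirmed; no error sent)
```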
