Predictive Processing vs Predictive Dendrites

Predictive coding - Wikipedia

Predictive coding (also known as predictive processing ) is a theory of brain function in which the brain is constantly generating and updating a mental model of the environment. The model is used to generate predictions of sensory input that are compared to actual sensory input. This comparison results in prediction errors that are then used to update and revise the mental model.
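To make that loop concrete, here is a minimal toy sketch of the cycle the Wikipedia definition describes (my own illustration, not any specific published formulation): the model predicts the input, the prediction error is computed, and the error revises the model.

```python
# Toy predictive-coding loop: predict, compare, update.
# (Hypothetical illustration of the Wikipedia definition above,
# not a model from any particular paper.)

def predictive_coding_step(model_estimate, sensory_input, learning_rate=0.1):
    """One cycle: generate a prediction, compare to input, revise the model."""
    prediction = model_estimate              # simplest case: the model predicts its own state
    prediction_error = sensory_input - prediction
    model_estimate += learning_rate * prediction_error
    return model_estimate, prediction_error

estimate = 0.0
for observation in [1.0, 1.0, 1.0, 1.0]:
    estimate, error = predictive_coding_step(estimate, observation)
# the estimate climbs toward the repeated input as the errors shrink
```

The point of the sketch is only that "prediction error updates the model" is a closed loop, which is the part of the definition the thread is arguing about.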

I don’t get it. Why does the Temporal Memory not fit this description?

Also, at the end of that wiki page it says:

In sum, the neural evidence is still in its infancy.

2 Likes

I think the objection comes from this quote (Mark can correct me if I am wrong):

That quote is making an assumption about how a cortical hierarchy works, of course, so it may be circular reasoning to use this quote as evidence that TM is not predictive coding since it doesn’t implement hierarchy yet.

3 Likes

Connections between levels tend to go to multiple levels, so processing doesn’t happen one level at a time. Still, there’s a little delay between sensory input and recognition. There’s not a massive delay though.

Prediction isn’t always about a sequence. Part of HTM narrows down a list of possible objects (and in other parts, locations, displacements, etc.) That’s a set of predictions in a way, because it expects what it senses to be consistent with those.

Temporal memory is the same. It narrows down a list of possible sequences. When the input is entirely unexpected, it activates all cells in all active minicolumns. That’s the union of all possible sequence contexts. As a result, it predicts the union of everything next in known sequences. For example, let’s say it knows two sequences, ABCD and EBCF, and it gets input B unexpectedly. It predicts C in context of both of those sequences. If C turns out to be the next input, it’ll activate all the predicted neurons in each active minicolumn, so 2 neurons per minicolumn for 2 sequence contexts. Then it predicts D and F. If D or F is the next input, it narrows down the list of possible sequences to just one.
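The ABCD/EBCF walkthrough can be sketched as a tiny simulation (a toy illustration of the narrowing-down behaviour described above, not the real TM implementation): track which known sequences are still consistent with the inputs so far, and predict the union of whatever comes next in each.

```python
# Toy sketch of TM's narrowing-down behaviour (hypothetical illustration).
# Candidates map each still-consistent sequence to the next expected position.

KNOWN_SEQUENCES = ["ABCD", "EBCF"]

def step(candidates, observed):
    """Keep sequences whose next item matches; predict the union of what follows."""
    survivors = {}
    for seq, pos in candidates.items():
        if pos < len(seq) and seq[pos] == observed:
            survivors[seq] = pos + 1
    if not survivors:
        # Unexpected input: "burst" into every known context containing it,
        # analogous to activating all cells in the active minicolumns.
        survivors = {seq: seq.index(observed) + 1
                     for seq in KNOWN_SEQUENCES if observed in seq}
    predicted = {seq[pos] for seq, pos in survivors.items() if pos < len(seq)}
    return survivors, predicted

candidates, p1 = step({}, "B")          # unexpected B: both contexts active
candidates, p2 = step(candidates, "C")  # still two contexts
candidates, p3 = step(candidates, "D")  # narrows to ABCD alone
# p1 == {"C"}, p2 == {"D", "F"}, p3 == set()
```

After B it predicts {C} in both contexts; after C it predicts the union {D, F}; D then narrows the list to a single sequence, exactly as in the example above.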

I’m not saying that’s always a good thing. The point is, temporal memory has predictive states, but it also predicts possible sequences by activating a union of SDRs for each sequence which fits. Technically it’s still an implicit prediction because it’s the sequence-so-far not the whole sequence, but it’s not just implicit prediction in the layer for object disambiguation (aka the output layer).

Maybe it’s wrong to call a list of expected possibilities a set of predictions. I think some mental gymnastics are required to understand the brain though, because it doesn’t care how we define things. Something which looks like a prediction of the next sensory input, e.g. presaccadic remapping, might actually be something slightly different.

I think HTM papers are using the term predictive processing in a loose way, but I’m not sure what a better term would be. Maybe “predictive modelling” or “modelling as expectations”.

I think hsgo is saying the state of temporal memory is sufficient to make the predictions. Temporal memory makes those predictions, but another set of cells could too, with the right learning rules etc. You can do other processing as if you had the predictions, even if you don’t directly represent them with firing cells. You can’t treat the predictive states exactly as if they’re firing cells, but that mostly only matters when you get into implementation details like learning rules.

3 Likes

:point_up: I love this. Best quote of the day!

3 Likes

Considering how long it takes a neuron to generate a spike, the experimental data (on how fast we can detect things like class of an image) indicate there is prediction going on. This fits, to some extent, with the idea of depolarizing cells to accelerate the firing, but in an HTM neuron model there are not the sorts of delays you get in a neuron anyway.

I was trying to point out that I don’t think the way HTM works is in line with the technical meaning of “predictive processing”. In this thread some have assumed that they need to show HTM is predictive processing rather than understanding what it is actually doing.

This is the external observer’s interpretation. It is just matching (or not) learnt sequences.

This is where I think you are making a leap. It is not predicting the sequence, it is predicting T+1. The output reflects the “the input in the context of prior input.”

In the paper the object classification is done by another system that is reading the HTM outputs. I don’t think there was a general representation of an object in the HTM part of the experimental setup. This seems to be a major issue as the HTM SDR is not abstracting (or modelling) the sequence, it is instead representing the current input in the context of prior input.

This is not a justification for using technical terms in the wrong way (if that is the case). Obviously whatever is going on is not intuitive so it will appear confusing, but that does not mean that a theory is on the right path because it is confusing. Without clear definitions there is little chance other researchers will understand what is being said.

We are on the same page, that is a better way of presenting the issue.

This is obvious because the cells represent “the input in the context of prior input”. It should also be obvious that the information contained in the active dendrites needs to be learnt, i.e. the information is not just the input and context; it is learnt over a history of previous sequences, and that information is not available in the current output (it is in the active dendrites).

Perhaps this serves to make the point that if the next layer up can learn everything the layer below it knows then there is clearly no abstraction going on and therefore not much useful modelling.

I know this was your point, but just highlighting it – this can’t be the way it works. Some form of traditional hierarchy must also be part of the cortical algorithm (in addition to the “precision through voting” framework). Any framework which abandons it is going to have a really hard time explaining abstract concepts like “democracy”, etc.

I think the earlier point that was being made is that any element in a sequence could itself be a label for the whole sequence (even without a hierarchy). We often write an input in context as “D after C after B after A”, but it could also be written as “D in the sequence ABCDEFG”, because it is unique to the sequence. The same is true for the composites of an object (the representation is unique to the object), and I presume it will also be true for unfolding actions. Therefore, from an information perspective, knowing D in context holds all the information needed for any other population of cells to learn that G is going to happen at some point down the road (for example, maybe I get some food when I reach G, and can judge how good or bad it is that this context for D is active right now).

We should look for good terms to use for the aforementioned “encoding the same information” concept, as well as the “predictions are internal/passive until learning is required” one.

A label that changes on every input and is not invariant under noise is perhaps not a great label.

1 Like

Definitely. Hierarchy helps with that point (once HTM gets there), as we will be representing more stable “labels” of the activity. Hopefully the illustrations in the thread that I am working on will demonstrate this point a bit easier than the walls of text that I have been posting :wink:

I like to think of the temporal memory as representing the sequence-so-far. For example, it’d represent B in ABC same as the B in ABD. It’s less confusing to just say sequence though.

To make sure we’re talking about the same thing:
Neurons take like 2 milliseconds to respond to presynaptic neurons firing, whereas it takes like a hundred milliseconds or more to recognize something, sometimes many seconds. Why does that indicate there’s prediction going on?

Are you saying the delay comes from making predictions while viewing a static image for the 100+ ms it takes to recognize it? The delay could come from other things, like recurrence, slow receptors, and firing rate codes (10 Hz would take 100 ms for 2 spikes).

Excitatory response times for quick & slower receptors

AMPA receptors trigger EPSPs peaking very quickly, whereas NMDA receptors take like 20 ms to reach their peak (https://www.sciencedirect.com/science/article/pii/S0006349599769900) and metabotropic receptors produce responses lasting hundreds of milliseconds to several seconds (The Function of Metabotropic Glutamate Receptors in Thalamus and Cortex - PMC).
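The fast/slow contrast can be illustrated with the standard alpha-function idealisation of an EPSP time course (a textbook approximation, not a model taken from the papers cited above; the time constants are the rough figures quoted):

```python
import math

# Alpha-function EPSP idealisation: g(t) = (t / tau) * exp(1 - t / tau),
# normalised so it peaks at exactly 1.0 at t = tau.
# (Textbook approximation; time constants below are illustrative only.)

def epsp(t_ms, tau_ms):
    return (t_ms / tau_ms) * math.exp(1.0 - t_ms / tau_ms)

fast_peak_ms = 2.0    # AMPA-like: peaks within a couple of milliseconds
slow_peak_ms = 20.0   # NMDA-like: takes ~20 ms to reach its peak

# At t = 20 ms the fast response has almost fully decayed,
# while the slow one is only just reaching its peak.
print(epsp(20.0, fast_peak_ms))   # small residual, well under 1% of peak
print(epsp(20.0, slow_peak_ms))   # exactly at its peak of 1.0
```

So even before considering metabotropic receptors, the same spike train can produce responses on very different timescales depending on the receptor mix.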

I’m confused. Are you saying HTM doesn’t do predictive processing, it does, or neither? Is it just about the clarity of HTM papers? I agree predictive processing is a confusing term in the quotes you give from the locations paper. I’m trying to explain the way in which I think HTM is predictive processing (so what the paper could say to be clearer).

The paper mentions “predictive processing” twice. That’s not really a term it coins, and it doesn’t seem to mean something specific. I guess it’s more like a topic.

First, it says

Predictive sensorimotor processing also occurs in the hippocampal formation. … Another population of neurons selectively become active when an animal arrives at a location where a previously present object is missing (Tsao et al., 2013), indicating that the system is predictive.

It’s not saying there’s predictive firing in that case. It’s just saying there’s something related to prediction going on. It’s part of setting up context, leading to:

Thus, different areas of the brain that seemingly play different roles in cognition display hallmarks of two common computations: integration of information over sensorimotor sequences, and prediction of sensory stimuli after movements.

The other place it mentions predictive processing is about a specific phenomenon.

Our model provides an alternate explanation for predictive processing in visual cortex. Existing models of saccadic remapping suggest that it occurs by shifting attended parts of the image across visual cortex (Wurtz, 2008)

Presaccadic predictive remapping is a phenomenon where a neuron starts spiking early, right before a saccade brings a visual feature into the neuron’s receptive field. That description simplifies a complicated topic.

In our model, each patch of visual cortex computes the location of a patch of retina relative to the attended object, then uses this location to predict sensory input. As the eyes saccade over a static object, our model would not require any horizontal shifting of information within the visual cortex. … Implementing this extended model is a topic for future research.

It’s saying their model could do presaccadic predictive remapping without needing fibers connecting every part of a cortical region to every other part, which would take up a lot of space and use more calories. Not necessarily their current model, but an extended one.

I’m not sure it needs an extended model. The current model has no cells firing when predicting the next input, but it does represent features in reference frames. In egocentric regions, presaccadic predictive remapping may just be updating locations. They aren’t necessarily even being updated ahead of time, even if they’re updated right when the motor command happens. The motor command is right before the movement, but for purposes of subsequent motor commands, it can consider the location as already changed because the next motor command will have the same delay before the actual movement. It kind of has to do that to do a rapid sequence of saccades.
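The "treat the location as already changed" idea can be sketched in a few lines (my own toy illustration, not part of the model in the paper): each saccade command advances the location estimate immediately, so subsequent commands are planned against the predicted post-saccade location rather than the lagging physical eye position.

```python
# Toy efference-copy sketch (hypothetical illustration): advance the
# location estimate when the motor command is issued, not when the eye
# actually lands, so a rapid sequence of saccades can be planned.

location = 0                 # current gaze location along one axis (arbitrary units)
pending_moves = [3, -1, 2]   # saccade commands issued in quick succession

planned = []
for command in pending_moves:
    location += command      # treat the move as already done
    planned.append(location)
# planned == [3, 2, 4]: each command is planned relative to the
# predicted post-saccade location
```

That is all "predictive" firing needs to amount to here: an updated reference frame, not an explicit next-item prediction.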

Finally getting to the point, that’s a way to predictively fire right before the next input, without neurons firing to predict the next item in a sequence.

Right, I didn’t mention that for the sake of simplicity, but yes, the B alone represents a set of similar “objects” rather than a single object. You still need a hierarchy to capture their semantics and differentiate them at the time B is occurring.

“Even if each spike arriving at a neuron evoked a voltage blip big enough to push the neuron to its tipping point, there’d still be a delay of at least 10 milliseconds from the spike arriving at the gap to the new spike arriving at its destination—and longer still if the axon is slow or long or both.” Humphries, Mark. The Spike (pp. 147-148). Princeton University Press. Kindle Edition.

“In less than a second, most neurons do not have time to make a spike. And the few that do can send at most a handful. And even then, the last few of that handful will arrive at their target neurons after those neurons have already sent their own spikes. How then do we get spikes from the eye to the front of the cortex in under 150 milliseconds? We need another solution.” Humphries, Mark. The Spike (pp. 150-151). Princeton University Press. Kindle Edition.

I’m at a loss for the rest. If at this stage you don’t see the issue I don’t think another post will help. Perhaps reading on “predictive coding” or “predictive processing” in the context of other research would help.

1 Like

BTW, I should point out that Numenta is not blind to this topic. I have watched several videos where the concept of “active predictions” has been discussed. Where they are focused at the moment hasn’t required it as yet, but from a framework perspective, it likely will be needed (in particular I don’t really see any way around it for unfolding actions).

Just to give one random example of where this was a sticking point in a conversation was this interaction a couple years ago where Matt was picking Jeff’s brain to help him define a system diagram of the layers as they were understood at that time (this video is outdated, just using it to illustrate the point).

(sorry, if you don’t want to watch the whole thing, the sticking point in the conversation was at 4:00, leading to the reference to active predictions at 12:49)

2 Likes

By this logic there should be around 10 layers of neurons between the sensory organ and frontal cortex. And this property should constrain any valid theory of how the brain functions. So I took my personal hypothesis and counted the minimum number of layers:

  1. Sensory organ
  2. Thalamus
  3. Sensory cortex (layer 4 & 6)
  4. Sensory cortex (layer 2/3 & 5)
  5. Striatum
  6. Globus Pallidus
  7. Subthalamic nucleus
  8. Thalamus
  9. Frontal Cortex (layer 4 & 6)

Notice that the input only passes through the sensory cortex once, instead of propagating up a hierarchy of cortical areas. This is because the hierarchy of cortical areas is like an image pyramid which simply processes the same image at different scales, and each different scale of the image can be processed in parallel.

In 6 ms [1], sensory stimulus → trigeminal nucleus → thalamus → cortex.
[1] https://journals.physiology.org/doi/full/10.1152/jn.1999.81.3.1171
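As a back-of-the-envelope check of that layer count against the timing argument (my own arithmetic, using the ~10 ms minimum spike-to-spike delay quoted from Humphries and the 9-stage path listed above):

```python
# Rough latency budget for the 9-stage path listed above.
# Per-stage delay is Humphries' ~10 ms minimum spike-to-spike figure;
# the 150 ms budget is the eye-to-frontal-cortex time quoted earlier.

stages = 9                # sensory organ -> ... -> frontal cortex (layer 4 & 6)
per_stage_delay_ms = 10   # minimum delay per synaptic stage
total_ms = stages * per_stage_delay_ms
budget_ms = 150

print(total_ms, total_ms <= budget_ms)  # 90 ms, within the 150 ms budget
```

Under those (deliberately simplified) assumptions a single pass through the listed stages fits the budget, which is the point of not routing the input up a serial cortical hierarchy.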

It is frustrating to need to defend a neuroscientist’s book published in 2021 to an amateur throwing around references from a paper published in 1999. I guess the idea is that I should read the paper for you, read the book for you, summarise both, and politely draw attention to your misunderstanding. At least for the last part you will have to put up with sarcasm over politeness.

It could not be that you are wrong it must be the Chair in Computational Neuroscience who is wrong.

There are (of course) a wide variety of spiking behaviors, the 10 ms I referred to is in regards to spiking within the neocortex. Considering the response times of neurons in sensors and motor pathways is missing the point.

The paper describes spontaneous activity, which is not modelled in HTM. The suggestion of Humphries is that this is what allows for predictive processing.

Taking some of the fastest neuron response times (sensory pathways) then averaging them, then imagining that this is representative of all neurons, is not a valid approach to understanding.

If you believe the system is a reactive system and not using predictive processing that is fine by me. The point of this thread is to encourage you to avoid using the term “predictive processing” when that is not what you mean.

A great topic for a new thread.

That is not so much my concern. It is more about representing (in the papers) the actual algorithm as it is implemented. In predictive processing the central theme is anticipation, i.e. action in the present based on prediction of the future. This was the central theme of On Intelligence too. But what Numenta have built is not based on that principle; papers should describe what is currently implemented, or make it clear, for example, that “we speculate that in the future we will have a hypothesis of a mechanism that could implement predictive processing” :slight_smile:

1 Like

Of course, no disagreement here – definitions matter. As mentioned before, the term isn’t used extensively in the papers, and a cursory read of a few search results (without a deep dive into how the term is more broadly used) makes it very easy to misclassify.

Case in point: the quote that was posted earlier from Wikipedia, for example, actually beautifully summarizes the TM algorithm: “a theory of brain function in which the brain is constantly generating and updating a mental model of the environment. The model is used to generate predictions of sensory input that are compared to actual sensory input. This comparison results in prediction errors that are then used to update and revise the mental model.”

Devil is in the details…

2 Likes

Transmitting predictions outside of a layer is not needed until you start building a hierarchy. Numenta is not working on hierarchy yet. Keep in mind that these are still the early stages of defining the algorithms and implementing them.

1 Like

I know I’m an amateur, I just love neuroscience except lab work. I spent a year reading neuroscience papers like it was my job. If you don’t want me to share what I know, I’ll stop nerding out.

1 Like