Predictive Processing vs Predictive Dendrites

Of course, no disagreement here – definitions matter. As mentioned before, the term isn’t used extensively in the papers, and based on a cursory read of a few search results (without a deep dive into how the term is more broadly used) it is very easy to misclassify.

Case in point: the quote that was posted earlier from Wikipedia, for example, actually beautifully summarizes the TM algorithm: “a theory of brain function in which the brain is constantly generating and updating a mental model of the environment. The model is used to generate predictions of sensory input that are compared to actual sensory input. This comparison results in prediction errors that are then used to update and revise the mental model.”

Devil is in the details…

2 Likes

Transmitting predictions outside of a layer is not needed until you start building a hierarchy. Numenta is not working on hierarchy yet. Keep in mind that these are still the early stages of defining the algorithms and implementing them.

1 Like

I know I’m an amateur; I just love neuroscience, except lab work. I spent a year reading neuroscience papers like it was my job. If you don’t want me to share what I know, I’ll stop nerding out.

1 Like

This is why you and I do not write papers on the topic. When publishing a paper the author is meant to take the time to get the terminology correct.

This is not what HTM is doing. There is no theory I’m aware of that explains how this would work with HTM. The actual learning algorithms would have to be far clearer before any claim like that could be made. In general, there seems to be a lack of a clear understanding of what a model is and how it would be generated. A model is not the idea of recalling a sequence. For example, it could be statistical (like a DNN) or it could be causal, allowing for inferences. To begin with, it probably needs some sort of invariance, which HTM does not have.

Finally I am going to react to this :slight_smile: This is the chorus of the HTM community, but I don’t think it is correct. As I understand it, Numenta has implemented hierarchy; there is a networking infrastructure to support this. I think they have been trying to figure out hierarchy since they started using the term HTM. It is not that they have not got to it yet; it is that the efforts have not led to the kinds of success other layered ANNs have achieved. It is perhaps more interesting to ask why, rather than claim they have not tried.

Given that Jeff has been working on this for perhaps 40 years I don’t think it is reasonable to say they are in the early stages. Furthermore, Jeff has already claimed to have provided the framework for intelligence, so it seems we are quite late in the game in Jeff’s opinion.

That is not the point. You should also take seriously what others are pointing out. I gave you the reference of a neuroscience book from 2021 and you replied with a reference to a study from 1999. As if that neuroscientist, who wrote the book The Spike, is obviously misinformed.

1 Like

Numenta does have an official position on the question of “Hierarchy”, which I had forgotten about:

As before, definitions matter. HTM does make models in my understanding of the word. How would you define a “model”?

You may be correct, but I personally disagree with this interpretation. I believe the networking infrastructure, besides setting up NuPIC as an expandable framework from the get-go, is at the moment used for two purposes – practical use of HTM in real-world or toy applications (where knowing all the biological details is not necessary), and to provide building blocks to be used as placeholders for various aspects of a circuit for which a deep research dive has not yet occurred. This allows them to quickly hook up a new idea (glossing over the finer details) so they can tinker around with it.

They believe a single CC is far more capable than most view it, and the direction has (for as long as I have been observing, at least) been to fully understand a single CC by itself before leaning on hierarchy to solve a problem (Jeff views this as a cop-out to cover for not understanding something well enough – I couldn’t find an example offhand, but there are a number of videos of him taking this perspective). This is a preference, of course (you can’t boil the ocean, so you must start somewhere). There are other approaches that can be taken (I, for example, am starting with hierarchy, and am less focused on biological plausibility).

At the moment, they are obsessed with figuring out reference frames, believing it to be one of the keys to understanding the cortical circuit, and they are not focused at all on hierarchy yet. It is of course impossible to know someone’s motivations, but their research meetings are publicly available, so it is not difficult to watch them working and see where their heads are at.

This was a subjective observation on my part, sorry. My perspective is that there is frankly nothing yet to compare it with to estimate what a “normal” total timeline should be (nobody has achieved a working model of the neocortex). I say it is early stages because, even counting only the things we know (and not the things yet to be discovered that we don’t know), the list of TODOs is longer than the list of “dones” (reference frames, egocentric <> allocentric, timing, reinforcement, behavior, attention, consciousness, thinking, planning, decision making, etc., just to name a few).

That may be. To your point, I do recall several videos where Jeff has made comments indicating that he feels like they are getting close to understanding a single CC (but typically not long after those, a slew of new unknowns surfaces). I think that is human nature (the more you know about a topic, the more you understand what is yet unknown).

Anyway, I understood the point about presenting a framework to be that neuroscientists have been studying and accumulating a massive wealth of knowledge about the brain, but we have sort of been spinning our wheels on putting it all together without a theoretical framework to view it from some higher-level perspective.

That is one aspect of hierarchy they have put forward, yes (a means of handling input at different scales). They have also reiterated in other videos and talks that they are not throwing out the traditional form of hierarchy, either, only adding to it. I don’t think you can ever build an abstract concept like “democracy” with the scaling mechanism described in that video alone.

I haven’t commented on it because it’s irrelevant to what I’m saying. There’s very little intrinsic delay in firing. Delays come from other things, which I was trying to explain in an earlier reply. Firing certainly can have a lot of latency, and I tried to explain why neurons taking 2 ms to fire doesn’t contradict the ~100 ms it takes to recognize an object, since I thought you knew neurons can fire that quickly.

2 Likes

I am having trouble understanding why there is any trouble understanding how spike trains are compatible with HTM. The leading edge of a pulse train forms a signal, as described in HTM. I don’t think anyone ever suggested that there was anything else ever going on but spikes. The following part of a pulse train can contribute to the 40 or so synapse firings that integrate into an action potential in a given region of a dendrite.

Again - totally compatible with basic HTM theory.
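
For concreteness, here is a minimal sketch of the coincidence detection I am describing (my own Python illustration; the threshold and cell numbers are made up, and this is not the actual NuPIC code):

```python
# A dendritic segment generates a local spike (an NMDA-style dendritic
# action potential) when enough of its synapses see coincident input
# spikes, i.e. the leading edges of the pulse trains described above.
ACTIVATION_THRESHOLD = 15   # coincident active synapses needed (illustrative)

def segment_fires(segment_synapses: set[int], active_cells: set[int]) -> bool:
    """True if enough presynaptic cells on this segment are active."""
    return len(segment_synapses & active_cells) >= ACTIVATION_THRESHOLD

def cell_is_predictive(distal_segments: list[set[int]], active_cells: set[int]) -> bool:
    """A cell is depolarized ("predictive") if any distal segment fires."""
    return any(segment_fires(s, active_cells) for s in distal_segments)

# Example: a segment sampling presynaptic cells 1..20, with 16 of them active:
segment = set(range(1, 21))
active = set(range(1, 17))
print(segment_fires(segment, active))   # True: 16 >= 15
```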

I feel like I stated clearly earlier that the choice of the word “predictive” was not a perfect match with what is going on with cortical columns but it is more like prediction than most other words you can use to describe the basic collective action of neocortex.

And what is that? Is it a Markov chain? Is it some sort of Bayesian combination of priors and predictions?

As sensations come in, the cortex acts to recall some prior learned pattern – a form of pattern completion. These recalled patterns include learned sequences, but not at all like replaying a movie. At each moment in time, the current matchup of the constructed internal activation has a fuzzy cloud of possible futures. Anything within the learned fuzzy cloud of futures is “normal” and will not trigger as an anomaly.

One may quibble at the choice of the word “model,” but I am at a loss to suggest a better one – a squeak of the floor, a wrinkle in your sock, a bug on your skin, your sleeve rubbing on a tree branch, a raindrop on your head – any of these is detected as a deviation from “something” you are constructing in your head. Moment by moment this internal state is updated with reality, and new patterns are invoked (recalled) to match the ongoing sensations. A new cloud of possible futures is invoked at the same time.
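
A minimal sketch of that “cloud of futures” idea as I understand it (my own toy illustration, not Numenta’s code): learned transitions map a context to a *set* of acceptable next patterns, and only input falling outside the whole cloud flags as an anomaly.

```python
from collections import defaultdict

# Learned transitions: each context maps to a set (cloud) of futures.
transitions: dict[str, set[str]] = defaultdict(set)

def learn(context: str, next_pattern: str) -> None:
    transitions[context].add(next_pattern)   # grow the cloud of futures

def is_anomaly(context: str, actual: str) -> bool:
    """True only if the actual input falls outside the learned cloud."""
    return actual not in transitions[context]

# Any previously seen continuation is "normal"; only a novel one flags:
learn("walking-on-floor", "squeak")
learn("walking-on-floor", "silence")
print(is_anomaly("walking-on-floor", "squeak"))        # False: in the cloud
print(is_anomaly("walking-on-floor", "bug-on-skin"))   # True: deviation noticed
```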

The short-term cloud of possible sequences based on prior experience is sufficient to learn that something novel requires attention. It is hard to overstate what a valuable contribution this makes to survival.

A true prediction would quickly run into the impossible wall of combinatorial explosion.

The 100-step rule is valid from sensations to actions, but it imposes only an outside limit on processing steps. It has very little to say about how this time is distributed as processing in the various pathways through the brain.

2 Likes

This.

It’s not practical to represent predictions with firing neurons. Maybe it’d work for one timestep of temporal memory, say 50 milliseconds, but that’s kind of useless for perceiving the world. (How much happens in 50 ms? And if not 50 ms, how far ahead can you go before the combinatorial explosion while still being long enough to be useful? That’s probably not going to work out.) So you gotta use mechanisms related to prediction but not exactly the same, e.g. anomaly detection.
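
To put rough numbers on that combinatorial explosion (a toy calculation; the branching factor and timestep length are made-up assumptions, just to show the scaling):

```python
# Explicitly representing every possible future compounds per timestep.
branching_per_step = 10   # plausible next states per timestep (assumed)
step_ms = 50              # one temporal-memory timestep, as above

for steps in (1, 5, 10, 20):
    futures = branching_per_step ** steps
    print(f"{steps * step_ms:>5} ms ahead: {futures:.1e} possible futures")
```

Even one second ahead is already on the order of 10^20 distinct sequences, which is why representing only a one-step fuzzy cloud plus anomaly detection is far more tractable.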

Check out this HTM paper. You need to know about object disambiguation to fully understand what the current thinking is about, e.g. locations.

Keep track of a list of possible objects, and get rid of objects which don’t have each next sensed feature. The rest builds on that.

Paul Lamb was talking about this when he mentioned the output layer and voting.

https://www.frontiersin.org/articles/10.3389/fncir.2017.00081/full

Here are the key ideas. #5 is kinda the whole thing, everything before that is just context for what’s going on. #6 just adds voting.

  1. An object is a set of features at locations on the object.
  2. In this paper, we just get the location of the feature, from who knows where. Location is just extra information about what’s being sensed, so you could do this without that extra info; it just wouldn’t work well in practice.
  3. First let’s just consider a single cortical column. It “sees” a patch of the sensor, such as a fingertip.
  4. The fingertip touches random features, one at a time.
  5. For the first feature, it gets a list of objects with that feature @ location. Then for the next feature it senses (e.g. by moving the fingertip randomly), narrow the list down, leaving only the ones with that feature @ location. Repeat for each feature. Eventually it narrows the list down to one.
  6. You can touch multiple features at once. So there’s one cortical column for each sensory patch, e.g. fingertip. Whenever one column touches a feature (or multiple columns each touch a feature), they do what #5 explained. Then all the columns compare their lists, removing any objects not in all of those lists. They all end up with the same list. That’s voting (see the sketch after this list).
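
Here is a minimal sketch of steps 1–6 (my own illustration with made-up objects and plain Python sets; the paper itself uses HTM neurons and SDRs, not lookup tables):

```python
# 1. An object is a set of (location, feature) pairs.
OBJECTS = {
    "mug":   {(0, 0): "curved", (1, 0): "handle", (2, 0): "rim"},
    "bowl":  {(0, 0): "curved", (1, 0): "curved", (2, 0): "rim"},
    "plate": {(0, 0): "flat",   (1, 0): "flat",   (2, 0): "rim"},
}

def sense(candidates, location, feature):
    """Step 5: keep only candidates with this feature at this location."""
    return {name for name in candidates
            if OBJECTS[name].get(location) == feature}

def vote(column_lists):
    """Step 6: each column keeps only objects present in every list."""
    return set.intersection(*column_lists)

# A single column (steps 3-5), narrowing down one touch at a time:
possible = set(OBJECTS)
possible = sense(possible, (0, 0), "curved")   # -> {"mug", "bowl"}
possible = sense(possible, (1, 0), "handle")   # -> {"mug"}: disambiguated

# Two columns touching different spots at once, then voting (step 6):
col_1 = sense(set(OBJECTS), (0, 0), "curved")  # -> {"mug", "bowl"}
col_2 = sense(set(OBJECTS), (1, 0), "curved")  # -> {"bowl"}
print(vote([col_1, col_2]))                    # -> {"bowl"}
```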

The output layer (or whatever you want to call it) narrows down a list of possible objects. In that way, it’s a prediction about what you’ll see in the near future, using firing cells. So I guess that’s sorta predictive firing. There’s also anomaly detection going on, like when it rules out every object it had in its list of possibilities (the paper doesn’t touch on that topic).

Some newer ideas apply the output layer to other things besides objects, like locations. Maybe predictive firing is all over the place in cortex, just not predictive firing in terms of sequences. More like predictions about what’s going on in the world before you see what’s already out there.

This is a general misunderstanding. In some cases it is true, but in the neocortex there is an integration window that can be on the order of 20 ms. I guess it can be even longer, and there are inhibitory inputs that are much stronger. Also, there can be a relatively long delay for a neuron to recover after firing (the refractory period).

Also interesting to keep in mind that most neurons are firing very rarely and about 50% of the activity in the neocortex is generated by the same 10% of neurons. This seems like a major clue that a viable theory would need to explain.

This thread is about the term “predictive processing” and comparison with research where that term means something much more than active dendrites in HTM (where predictive is the right term).

I would avoid the use of the term sensation as this is associated with conscious experience which is delayed compared to sensory processing.

For all living organisms without nervous systems it was quite easily overstated!

With great difficulty! There are lots of different types of model. Perhaps we could make the case that HTM is modeling the sensor. I’m trying to pull together a presentation on models in machine learning at HLC. For the claims Jeff makes, the model would need to allow for inference based on modeling the environment. That would require generalization and a degree of invariance. I don’t see those topics even discussed in TBT, but maybe I missed that part.

People have been proposing frameworks since the 1950s at least. This is one of the problems with TBT: it can leave the impression it is the only hypothesis when there are many (arguably more sophisticated) frameworks in existence, e.g. ART or Leabra.

I was thinking about how to bring this thread to some sort of conclusion. I see a few different lines of thought:

  1. General agreement that “predictive processing” is a term that is not used with sufficient rigor in that paper.
  2. Support for something like “passive prediction” from @Bitking and @Casey, where processing is primarily (fast) reaction to the environment.
  3. Support for an “active prediction,” which would be more in line with predictive processing, from @Paul_Lamb, along with plans to publish a new algorithm implementing this.

I like the predictive processing story and I’m not sure which architectures get closest to it. I wonder if Dileep George’s recent work is more along these lines. From a recent paper: “making precise and falsifiable biological mappings need models that tackle the challenge of real world tasks” … “Efficient inference and generalization guided the representational choices in the original computational model.” … “The derived model suggests precise functional roles for the feed-forward, feedback, and lateral connections observed in different laminae and columns, assigns a computational role for the path through the thalamus, predicts the interactions between blobs and inter-blobs, and offers an algorithmic explanation for the innate inter-laminar connectivity between clonal neurons within a cortical column.” I particularly like that recurrence is an integral part of the model, rather than something that is being put off “for later.”

This is not to take away from Numenta’s work. Let a thousand flowers bloom … noticing that it is attention to the differences that allows for that appreciation.

P.S. In HLC we are debating using the term microcolumn instead of minicolumn to refer to the HTM minicolumn – distinguishing the single-layer HTM minicolumn from the multi-layered biological minicolumn. Terminology really does matter…

I looked at a few neuroscience papers which use that term, and they don’t use it with a precise meaning. Only one of them mentions predictive firing (maybe – I’m not sure it was talking about that), and all of them are mostly about prediction error. I agree papers should be precise about what they mean when they use the word prediction.

Off topic stuff

Yes, but if they get strong enough excitation immediately, they can fire quicker than that. You see that in the most extreme case, where you directly inject current into the cell, no synapses involved. When it comes to synapses, some receptors produce EPSPs that rise very quickly (much less than 20 ms), while others produce EPSPs which take longer to rise. Also, EPSPs can last longer than 20 ms, which just means the integration window is effectively longer sometimes. For metabotropic receptors it can be hundreds or thousands of milliseconds in duration.

In the source I referenced, neurons in cortex fired within 6 ms. From memory (I’ll find a source if you want), multiple layers fire around the same time during the initial response. I think it’s everything besides L1, L2, L5a, and L6b, and of course some cells in the other (sub)layers.

L5b is about 10% of neurons (Number and Laminar Distribution of Neurons in a Thalamocortical Projection Column of Rat Vibrissal Cortex | Cerebral Cortex | Oxford Academic) and fires a lot more than other layers. I don’t know if it’s those 10%.

This isn’t known as far as I’m aware. We don’t know whether minicolumns extend through all layers, or are separate in each layer, or continue between some layers but maybe not adjacent ones. (For example, L5a and L5b may have separate minicolumns: https://science.sciencemag.org/content/358/6363/610.full)

In case you missed it, this article describes how they think viewpoint-invariance happens. Their examples use fingers instead of eyes, but the same principles apply.

2 Likes

In this context I think they would need to use the term as per the paper they are referencing.

You might have more luck searching for the term “predictive coding”. It is widely used enough that it has a wiki page with “predictive processing” as a synonym.

Are you referring to the neurobiology? The minicolumn as per Mountcastle is the structural unit of the neocortex and extends across all layers.

1 Like

Frontiers published a group of articles on “predictive coding” in 2012.

About this Research Topic

The brain is constantly confronted with a wealth of sensory information that must be processed efficiently to facilitate appropriate reactions. One way of optimizing this processing effort is to predict incoming sensory information based on previous experience so that expected information is processed efficiently and resources can be allocated to novel or surprising information. Theoretical and computational studies led to the formulation of the predictive coding framework (Friston 2005, Hawkins and Blakeslee 2004, Mumford 1992, Rao and Ballard 1999). Predictive coding states that the brain continually generates models of the world based on context and information from memory to predict sensory input. In terms of brain processing, a predictive model is created in higher cortical areas and communicated through feedback connections to lower sensory areas. In contrast, feedforward connections process and project an error signal, i.e. the mismatch between the predicted information and the actual sensory input (Rao & Ballard, 1999). The predictive model is constantly updated according to this error signal.

Although central concepts of this framework reach back to early perception science (Helmholtz 1863), these ideas remain in conflict with mainstream models of cortical processing in which feedforward projections integrate essential information and feedback connections serve only modulatory purposes (i.e. gain control).

In recent years however, the concept of predictive coding has been validated by a number of brain imaging studies investigating predictive feedback and the processing of prediction errors (i.e. Alink et al. 2010, Bar 2007, DenOuden et al. 2010, Egner et al. 2010, Rauss et al. 2011, Smith and Muckli 2010, Summerfield et al. 2006, Todorovic et al. 2011). Predictive coding is considered a significant paradigm shift in neuroscience, affecting every level of cortical processing and warrants inclusion in a unifying theory of the brain (Friston 2010), even though empirical evidence remains relatively scarce.

This research topic will focus on the latest evidence for the core features in the predictive coding framework – the role of (1) predictive feedback and (2) forward projected prediction errors. The term ‘predictive coding framework’ is adopted here to accommodate different models. Theoretical contributions, reviews, and empirical contributions using neurophysiological and brain imaging methods are welcomed for this issue.

Source: https://www.frontiersin.org/research-topics/599/predictive-coding
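
To make the quoted loop concrete, here is a toy Rao & Ballard-style sketch (my own illustration, not code from any of the cited papers): feedback carries the top-down prediction, feedforward carries the mismatch, and both the higher-area representation and the model weights are updated from that error.

```python
import numpy as np

# Toy predictive-coding loop in the spirit of Rao & Ballard (1999).
# A higher area holds a latent representation r; feedback predicts the
# input as W @ r; feedforward carries the error; r and W are nudged to
# reduce that error. All sizes and rates are illustrative.
rng = np.random.default_rng(0)
n_input, n_latent = 16, 4
W = rng.normal(scale=0.1, size=(n_input, n_latent))  # feedback (generative) weights
x = rng.normal(size=n_input)                          # sensory input

r = np.zeros(n_latent)                                # higher-area state
lr_r, lr_W = 0.1, 0.01
for _ in range(200):
    prediction = W @ r              # feedback: top-down prediction
    error = x - prediction          # feedforward: prediction error
    r += lr_r * (W.T @ error)       # fast update of the representation
    W += lr_W * np.outer(error, r)  # slow learning of the model itself

print("remaining error:", np.linalg.norm(x - W @ r))
```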

1 Like

You’re on your own with this one, sorry.

But not at all, there are 6 of us discussing it in HLC :slight_smile:

1 Like