HTM vs. Bayesian inference (network), predictive coding

mirgee · March 31, 2018, 12:28pm

First of all, I am sorry for the overall incoherence of this post. These are some vaguely connected newbie questions I have about HTM.

Is it reasonable to see an HTM column learning the structure of its input as analogous to Bayesian estimation of hidden model parameters (or to performing non-parametric Bayesian inference)? Is temporal memory a way of estimating an unknown probability density function (functionally similar to, e.g., a Bayes filter without the Markov property, or just expectation maximization)? And, most pertinently, is temporal pooling a form of high-order sequence classification?
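For reference, a first-order (Markov) discrete Bayes filter, the baseline the question contrasts temporal memory with, alternates a predict step and an update step. This is only an illustrative sketch: the function name and the toy three-state cycle are assumptions, and HTM temporal memory does not compute these probabilities explicitly.

```python
def bayes_filter_step(belief, transition, likelihood):
    """One predict/update cycle of a first-order discrete Bayes filter.

    belief:     P(state_{t-1} | obs_{1:t-1}) as a list of n floats
    transition: transition[i][j] = P(state_t = j | state_{t-1} = i)
    likelihood: P(obs_t | state_t = j) for the symbol actually observed
    """
    n = len(belief)
    # Predict: push the previous belief through the (Markov) transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Update: weight each predicted state by how well it explains the observation.
    unnormalized = [p * l for p, l in zip(predicted, likelihood)]
    total = sum(unnormalized)
    # Normalize so the result is again a probability distribution.
    return [u / total for u in unnormalized]


# Hypothetical example: hidden states A, B, C cycling deterministically A -> B -> C -> A.
CYCLE = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
belief = bayes_filter_step([1 / 3, 1 / 3, 1 / 3], CYCLE, [0.8, 0.1, 0.1])  # noisy "looks like A"
belief = bayes_filter_step(belief, CYCLE, [0.1, 0.8, 0.1])                 # noisy "looks like B"
```

After two steps the belief concentrates on B. A higher-order variant would have to condition on more of the history, which HTM approaches differently, by encoding context in which cells within a column are active.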

The reason I am asking whether these analogies hold is that, at first sight (and please correct me if I’m wrong), HTM seems quite amenable to the idea of a hierarchical Bayesian inference model of the large-scale organization of the cortex: a region can learn high-order sequences, “label” (infer) them via TP, and feed the labels from L2/3 up the hierarchy to L4 of higher regions, which learn sequences of unions of the labels coming from lower areas and feed back contextual information (a prior or a conditional) to L1 of lower regions.
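As a rough illustration of the “label via TP” idea, temporal pooling can be caricatured as taking the union of the active cells over a sequence, and a partially seen sequence can then be classified by its overlap with the stored unions. This is a hypothetical sketch (function names and cell indices are made up), not Numenta’s union pooler.

```python
def pool(sequence):
    # Union of the active-cell sets over one sequence: a crude, stable "label".
    label = set()
    for active_cells in sequence:
        label |= active_cells
    return frozenset(label)


def classify(partial_sequence, labels):
    # Name of the stored label whose cells overlap most with what has been seen so far.
    seen = pool(partial_sequence)
    return max(labels, key=lambda name: len(seen & labels[name]))


# Hypothetical cell indices: the shared element "B" is represented by different
# cells ({5, 6} vs. {5, 7}) depending on context, as HTM does for high-order sequences.
labels = {
    "ABC": pool([{1, 2}, {5, 6}, {9, 10}]),
    "XBY": pool([{20, 21}, {5, 7}, {30, 31}]),
}
```

A higher region would then see the slowly changing label rather than the fast-changing per-step activity, and could learn sequences of such labels in turn.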

Another candidate for a hierarchical model is predictive coding (Rao & Ballard, Friston, …), which proposes that prediction errors are fed up the hierarchy and predictions (formed via prediction error accumulation in L5) are fed back down (via inhibition). It also maps nicely onto the physiological evidence.

And finally, there is classical backprop. There seems to be experimental evidence for both of these frameworks (BP and PC), so the best current working hypothesis is that the brain does a mixture of the two.

So, although I know that hierarchy is not the focus of Numenta’s research (even though this blog post suggests they have some ideas), hopefully it’s not a taboo topic. I would like to know what the community thinks about whether and how these hierarchical frameworks might apply to HTM.

Paul_Lamb · March 31, 2018, 2:08pm

I’ve been thinking of asking this for a while, and figured this was a good thread to ask it. What mechanism (from the perspective of cell states and synapses) would be involved in the transmission of “error”?

I interpret “error” as how far off a prediction was from the actual result. This is used to strengthen/create synaptic connections that match the actual result and to degrade those that do not. I’m having trouble visualizing how this could be used to form some kind of representation that could be transmitted to other levels of a hierarchy.
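A simplified sketch of the purely local learning described above, loosely modeled on HTM temporal memory: each dendritic segment adjusts the permanences of its own synapses depending on whether its cell fired or was wrongly predicted. The constants and the function name are hypothetical, and growing new synapses is omitted. Note that nothing here produces an error value that is sent anywhere else.

```python
PERM_INC = 0.10       # hypothetical reinforcement step
PERM_DEC = 0.05       # hypothetical decay for synapses to cells that were not active
PREDICTED_DEC = 0.02  # hypothetical punishment for a wrong prediction


def learn_segment(permanences, prev_active, became_active, was_predicted):
    """Adjust one dendritic segment's permanences in place and return them.

    permanences:   dict mapping presynaptic cell -> permanence in [0, 1]
    prev_active:   set of cells that were active on the previous time step
    became_active: whether this segment's cell actually fired now
    was_predicted: whether this segment put its cell into the predictive state
    """
    for cell, perm in permanences.items():
        if became_active:
            # Outcome matched: reinforce synapses that matched the context, decay the rest.
            delta = PERM_INC if cell in prev_active else -PERM_DEC
        elif was_predicted:
            # Wrong prediction: weaken the synapses that produced it.
            delta = -PREDICTED_DEC if cell in prev_active else 0.0
        else:
            delta = 0.0
        # Keep permanences within [0, 1].
        permanences[cell] = min(1.0, max(0.0, perm + delta))
    return permanences
```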

mirgee · March 31, 2018, 2:40pm

My understanding is that in the context of predictive coding, an “error” is the difference between the prediction (coming from a higher level in the hierarchy) and the actual input. This difference would be computed in the superficial pyramidal cells of the lower region via feedback inhibition, and then fed higher up so that the higher region could update its prediction to minimize the error. I am not sure if this answers your question, though.
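A minimal linear sketch of this loop, in the spirit of Rao & Ballard but heavily simplified (one level, fixed generative weights, no priors or precision weighting; names are assumptions): the higher level holds a state `r`, predicts the input as `W r`, receives the error `x - W r`, and nudges `r` to shrink the squared error.

```python
def predictive_coding_infer(x, W, steps=200, lr=0.1):
    """Settle a higher-level state r so that its top-down prediction W @ r matches x.

    x: lower-level input (list of m floats)
    W: generative weights, m x k (list of lists), held fixed here
    Returns (r, final_error).
    """
    m, k = len(W), len(W[0])
    r = [0.0] * k  # higher-level state ("expected cause"), starts uninformative

    def predict(state):
        # Top-down prediction of the lower-level input: W @ state.
        return [sum(W[i][j] * state[j] for j in range(k)) for i in range(m)]

    for _ in range(steps):
        # Prediction error, computed at the lower level and sent up.
        error = [x[i] - p for i, p in enumerate(predict(r))]
        # Gradient descent on 0.5 * ||error||^2, i.e. r += lr * W^T @ error.
        r = [r[j] + lr * sum(W[i][j] * error[i] for i in range(m)) for j in range(k)]

    final_error = [x[i] - p for i, p in enumerate(predict(r))]
    return r, final_error
```

With `W = [[1, 0], [0, 1], [1, 1]]` and `x = [1, 2, 3]`, the state settles near `r = [1, 2]` and the error sent up goes to roughly zero; an input the higher level cannot explain, such as `[1, 2, 0]`, leaves a nonzero residual.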

rhyolight · March 31, 2018, 2:43pm

I did a lot of studying about Bayesian inference models and probability theory recently. What Bayesian inference will not tell you about intelligence is HOW it works. You could create a probability-based prediction model for any dynamical system, and it might match the predictions of the system it is modeling, but it does not tell you how the system works.

We are interested in how intelligence works in biology, so we are not focusing on anything Bayesian. Of course there is probability at work within HTM and in your brain. Probability is essentially a part of every process. And Bayesian techniques are extremely powerful. But we don’t think they hold the answer to how intelligence works.

If you really want to pursue these ideas, you should read the 2009 paper from Jeff and Dileep (note that the editor of this paper is Friston). Numenta abandoned Bayesian models when Dileep left to found Vicarious after this paper was released (Vicarious continues the Bayesian work towards intelligence). If you read the paper, you can already see the diverging outlooks of Jeff and Dileep.

So, as I found out recently, this ground has been trodden before, almost 10 years ago. Jeff and the rest of Numenta are going in the biological direction. Others more interested in probability theory continue onward in the mathematical direction.

Paul_Lamb · March 31, 2018, 4:14pm

As I understand it, though, predictive states do not transmit information (they simply give a cell an advantage to activate sooner and inhibit its neighbors).

Rather than transmitting predictions, I have always visualized hierarchy as making predictions based on input not just from the same level, but also from the next level up.

If some cells are predicted by activity only in the current level in the hierarchy, while others are predicted by activity also from the higher level, the ones with both levels predicting them win and inhibit the others. The winners have their synapses adapted to better align with the activations from both levels.

Not sure if that is the same mechanism you are imagining though.
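The idea above could be sketched roughly as follows: among the cells that match the current input, those depolarized by both same-level (lateral) context and next-level (apical) feedback beat those predicted by only one source, and if nothing is predicted all candidates fire, similar to bursting in HTM temporal memory. All names here are hypothetical, and the synapse adaptation step for the winners is left out.

```python
def select_winners(candidates, lateral_predicted, apical_predicted):
    """Choose which candidate cells become active.

    candidates:        cells that match the current feedforward input
    lateral_predicted: cells depolarized by same-level context
    apical_predicted:  cells depolarized by feedback from the next level up
    """
    def support(cell):
        # 0, 1 or 2 sources of prediction for this cell (bools add as ints).
        return (cell in lateral_predicted) + (cell in apical_predicted)

    best = max(support(cell) for cell in candidates)
    if best == 0:
        # Nothing was predicted: every candidate fires (akin to HTM "bursting").
        return set(candidates)
    # Cells with the most support win and, implicitly, inhibit the rest.
    return {cell for cell in candidates if support(cell) == best}
```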

Ed_Pell · March 31, 2018, 4:19pm

The Friston paper is excellent. He also has a paper in 2017 continuing this work. https://arxiv.org/abs/1709.02323

mirgee, we agree feedback is sent down from N+1 to N. I would say the purpose is to drive the N layer neurons to change to minimize the error, as opposed to the N+1 neurons changing.

David Cox, when he was at Harvard, looked at networks that used error-correction coding as the feedforward channel. One thing he found was that simple tasks needed only a few layers to arrive at the complete solution, and the subsequent layers learned nothing. More complex tasks needed more layers.

dimitrispp · March 31, 2018, 5:33pm

I would say Friston’s take on hierarchical Bayesian inference is fully neurobiological; see e.g. https://www.ncbi.nlm.nih.gov/m/pubmed/23177956/ and, for tests of the theory against high-resolution data, e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5312791/
One basic neurobiological prediction of the theory is that computation is performed by minimizing errors across the hierarchy. These are computed locally by L2/3 pyramidal cells (see first paper above). To test if this is the case, one can analyse data from thin, laminar electrodes (see second paper).

mirgee · March 31, 2018, 9:55pm

Let me correct what I said. In PC, prediction error is the difference between the expected cause of sensory input, encoded in the activity of superficial pyramidal cells (maybe analogous to the object representation in HTM), and predictions of it originating from the deep L5 pyramidal cells. Friston explains how these errors are generated in some detail here, though I can’t say it makes much sense to me.

The mechanism you describe, i.e. modulatory feedback connections carrying higher-level predictions, is in accordance with both the Bayesian hypothesis (which would rather use the terms prior, bias, or maybe attentional factor) and predictive coding. The former relies mostly on modulation, the latter mostly on inhibition, and both seem to actually occur (see “Are Feedback Connections Excitatory or Inhibitory?” in here).
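To make the modulation vs. inhibition distinction concrete, here is a toy contrast under obviously simplified assumptions (vectors of numbers instead of neurons; function names made up): Bayesian-style feedback multiplies bottom-up evidence by a top-down prior, while predictive-coding-style feedback subtracts a top-down prediction and passes on only what is left.

```python
def modulatory_feedback(likelihood, prior):
    # Bayesian-style: the top-down prior multiplicatively reweights bottom-up evidence,
    # then the result is normalized into a posterior.
    posterior = [l * p for l, p in zip(likelihood, prior)]
    total = sum(posterior)
    return [v / total for v in posterior]


def inhibitory_feedback(signal, prediction):
    # Predictive-coding-style: the top-down prediction is subtracted,
    # leaving only the unexplained part (the prediction error).
    return [s - p for s, p in zip(signal, prediction)]
```

With ambiguous evidence `[0.5, 0.5]` and a prior `[0.9, 0.1]`, the modulatory version resolves the ambiguity toward the first interpretation, whereas subtracting a prediction of `[1.0, 0.0]` from a signal `[1.0, 0.2]` leaves only the unexplained `[0.0, 0.2]`.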

mirgee · March 31, 2018, 10:11pm

Thanks for the link. Friston is awesome.

OK, but what is N+1 doing with its input and at what point does it change its state?

Interesting. This is exactly what is needed, and it could be quite powerful in combination with Numenta’s finding about the capacity of a single column. I’ll have to do more research on when the brain does PC vs. backprop, and why.

rhyolight · April 2, 2018, 4:19pm

Just because it can replicate the functionality of a biological system does not mean it is “neurobiological”. These techniques model populations of neurons. The models might be great, but they can’t explain why they exist. If we understand the biological dynamics of neuron populations, it will bring us closer to implementing intelligent systems. We have to understand how it works; modeling is not enough.

dimitrispp · April 2, 2018, 5:13pm

I think we agree. IMHO, there are many ways to understand “how it works”. Focusing on detailed neural networks is one. Coming up with new approaches for testing/inferring principles of computations in multiple datasets is another. They have slightly different starting points, but, again, IMHO they are very close and can be combined: the common feature is the focus on neurobiology, for example trying to explain the different roles of different cortical layers in performing intelligent computations.

rhyolight · April 2, 2018, 5:45pm

Yes, I agree we can make more progress working together. I would be really happy to see some collaborations between different camps.
