First of all, I am sorry for the overall incoherence of this post. These are some vaguely connected newbie questions I have about HTM.
Is it reasonable to see an HTM column learning the structure of its input as analogous to Bayesian estimation of hidden model parameters (or to performing non-parametric Bayesian inference)? Is temporal memory a way of estimating an unknown probability density function (functionally similar to, e.g., a Bayes filter without the Markov property, or just expectation maximization)? And most pertinently, is temporal pooling a form of high-order sequence classification?
The reason I am asking whether these analogies hold is that, at first sight (and please correct me if I’m wrong), HTM seems quite amenable to the idea of a hierarchical Bayesian inference model of the large-scale organization of the cortex: a region can learn high-order sequences, “label” (infer) them via TP, and feed the labels from L2/3 up the hierarchy to L4 of higher regions, which learn sequences of unions of the labels coming from lower areas and feed back contextual information (a prior or a conditional) to L1 of lower regions.
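For concreteness, here is the wiring I have in mind as a toy sketch (all function names and the tuple “labels” are mine, purely illustrative; TP would of course produce something far richer than a tagged tuple):

```python
# Purely illustrative wiring of the hierarchy described above; the
# function names are mine, and the "label" is a stand-in for whatever
# temporal pooling would actually produce.

def lower_region(sensory_input, top_down_context):
    # L4 learns sequences of raw input; TP in L2/3 "labels" the sequence
    # in the context fed back from above.
    label = ("label", sensory_input, top_down_context)
    return label                       # sent up the hierarchy from L2/3

def higher_region(label_from_below):
    # L4 of the higher region learns sequences of unions of lower labels...
    prediction = ("context", label_from_below)
    return prediction                  # ...and feeds context back to L1 below

context = None                         # no feedback before the first step
for sensory_input in ["A", "B", "C"]:
    label = lower_region(sensory_input, context)
    context = higher_region(label)

print(context)
```

The point is only the direction of the two message streams: labels flow up, context flows back down, one step behind the input.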
Another candidate for a hierarchical model is predictive coding (Rao & Ballard, Friston, …), which proposes that errors are fed up and predictions (formed via prediction-error accumulation in L5) are fed back (via inhibition). It also maps nicely onto the physiological evidence.
And finally, there is classical backprop. There seems to be experimental evidence for both of these frameworks (BP and PC), so the best current working hypothesis is that the brain is doing a mixture of the two.
So, although I know that hierarchy is not the focus of Numenta’s research (although this blog post suggests they have some ideas), hopefully it’s not a taboo topic. I would like to know what the community thinks about whether and how these hierarchical frameworks may apply to HTM.
I’ve been thinking of asking this for a while, and figured this was a good thread to ask it. What mechanism (from the perspective of cell states and synapses) would be involved in the transmission of “error”?
I interpret “error” as how far off a prediction was from an actual result. This is used to strengthen/create synaptic connections which match the actual result and to degrade those which do not. I’m having trouble visualizing how this could be used to form some type of representation that could be transmitted to other levels of a hierarchy.
My understanding is that in the context of predictive coding, an “error” is the difference between the prediction (coming from a higher level in the hierarchy) and the actual input. This difference would be computed in the superficial pyramidal cells of the lower region via feedback inhibition, and then fed higher up, so that the higher region could update its prediction to minimize the error. I am not sure if this answers your question, though.
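To make that concrete, here is a minimal numerical sketch of one predictive-coding loop (purely my illustration; the variable names and the identity generative model are simplifying assumptions, not part of any published model). The error is computed locally as input minus prediction, and the higher level repeatedly updates its estimate to shrink it:

```python
import numpy as np

# Toy predictive-coding loop: the higher region sends a prediction down,
# the lower region's error units compute (input - prediction), and the
# higher region nudges its estimate to reduce that error.

rng = np.random.default_rng(0)
actual_input = rng.random(8)          # activity arriving at the lower region
higher_estimate = np.zeros(8)         # higher region's current cause estimate
lr = 0.5                              # update rate (arbitrary choice)

for _ in range(20):
    prediction = higher_estimate      # identity generative model, for simplicity
    error = actual_input - prediction # computed locally, fed up the hierarchy
    higher_estimate += lr * error     # higher level updates to minimize error

print(np.abs(actual_input - higher_estimate).max())  # error shrinks toward 0
```

With each pass the residual error halves, which is the sense in which “the higher region changes, not the lower one” in this scheme.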
I did a lot of studying about Bayesian inference models and probability theory recently. What Bayesian inference will not tell you about intelligence is HOW it works. You could create a probability-based prediction model for any dynamical system, and it might match the predictions of the system it is modeling, but it does not tell you how the system works.
We are interested in how intelligence works in biology, so we are not focusing on anything Bayesian. Of course there is probability at work within HTM and in your brain. Probability is essentially a part of every process. And Bayesian techniques are extremely powerful. But we don’t think they hold the answer to how intelligence works.
If you really want to pursue these ideas, you should read the 2009 paper from Jeff and Dileep (note the editor of this paper is Friston). Numenta abandoned Bayesian models when Dileep left to found Vicarious after this paper was released (they continue with the Bayesian work towards intelligence). If you read the paper, you can easily see the dichotomous tone between Jeff and Dileep already.
So, as I found out recently, this ground has been trodden before, almost 10 years ago. Jeff and the rest of Numenta are going in the biological direction. Others more interested in probability theory continue onward in the mathematical direction.
As I understand it, though, predictive states do not transmit information (they simply give a cell an advantage to activate sooner and inhibit its neighbors).
Rather than transmitting predictions, I have always visualized hierarchy as making predictions based on input not just from the same level, but also from the next level up.
If some cells are predicted by activity only in the current level in the hierarchy, while others are predicted by activity also from the higher level, the ones with both levels predicting them win and inhibit the others. The winners have their synapses adapted to better align with the activations from both levels.
Not sure if that is the same mechanism you are imagining though.
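For what it’s worth, the competition I’m describing can be sketched like this (a toy illustration with made-up arrays, not HTM’s actual activation rules):

```python
import numpy as np

# Cells predicted by both the current level and the level above
# out-compete cells predicted by the current level alone.

n_cells = 6
same_level_pred = np.array([1, 1, 0, 1, 0, 0])    # predictions within the level
higher_level_pred = np.array([1, 0, 0, 1, 0, 1])  # feedback predictions from above

score = same_level_pred + higher_level_pred       # 2 = predicted by both levels
winners = np.flatnonzero(score == score.max())    # doubly-predicted cells win
active = np.zeros(n_cells, dtype=int)
active[winners] = 1                               # winners inhibit the rest

print(active)  # -> [1 0 0 1 0 0]
```

Learning would then adapt the winners’ synapses toward the activity at both levels, which is the part this sketch leaves out.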
mirgee, we agree feedback is sent down from N+1 to N. I would say the purpose is to drive the N layer neurons to change to minimize the error, as opposed to the N+1 neurons changing.
David Cox, when he was at Harvard, looked at networks that used error-correcting coding as the feed-forward channel. One thing he found was that simple tasks needed only a few layers to arrive at the complete solution, and the subsequent layers learned nothing. More complex tasks needed more layers.
I would say Friston’s take on hierarchical Bayesian inference is fully neurobiological; see e.g. https://www.ncbi.nlm.nih.gov/m/pubmed/23177956/ and, for tests of the theory against high-resolution data, e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5312791/
One basic neurobiological prediction of the theory is that computation is performed by minimizing errors across the hierarchy. These are computed locally by L2/3 pyramidal cells (see first paper above). To test if this is the case, one can analyse data from thin, laminar electrodes (see second paper).
Let me correct what I said. In PC, prediction error is the difference between the expected cause of sensory input encoded in the activity of superficial pyramidal cells (maybe analogous to the object representation in HTM) and predictions of it originating from the deep L5 pyramidal cells. Friston explains how these errors are generated in some detail here; I can’t say it makes much sense to me.
The mechanism you describe, i.e. modulatory feedback connections carrying higher-level predictions, is in accordance with both the Bayesian hypothesis (which would rather use the terms prior, bias, or maybe attentional factor) and predictive coding. The former relies mostly on modulation, the latter mostly on inhibition, and both seem to actually occur (see “Are Feedback Connections Excitatory or Inhibitory?” in here).
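A quick numerical sketch of that distinction (my own toy example; the arrays are arbitrary): modulatory feedback multiplies the feedforward drive, while inhibitory, predictive-coding-style feedback subtracts the prediction from it.

```python
import numpy as np

# Two styles of top-down feedback acting on the same feedforward drive.

drive = np.array([0.2, 0.8, 0.5])      # feedforward drive to a population
feedback = np.array([1.0, 0.1, 0.6])   # top-down signal

# Modulatory (Bayesian-style): feedback scales activity that is already there.
modulated = drive * (1.0 + feedback)

# Inhibitory (PC-style): the prediction is subtracted; only the residual
# (the unexplained part of the drive) survives. Rates can't go negative.
inhibited = np.clip(drive - feedback, 0.0, None)

print(modulated)
print(inhibited)
```

Note the qualitative difference: modulation can never silence a driven cell, whereas subtractive inhibition zeroes out whatever the prediction fully explains.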
OK, but what is N+1 doing with its input and at what point does it change its state?
Interesting. This is exactly what is needed, and it can also be quite powerful in combination with Numenta’s finding about the capacity of a single column. I’ll have to do more research on when the brain does PC vs. backprop and why.
Just because it can replicate the functionality of a biological system does not mean it is “neurobiological”. These techniques are modeling populations of neurons. The models might be great, but they can’t explain why they exist. If we understand the biological dynamics of neuron populations, it will bring us closer to implementing intelligent systems. We have to understand how it works, modeling is not enough.
I think we agree. IMHO, there are many ways to understand “how it works”. Focusing on detailed neural networks is one. Coming up with new approaches for testing/inferring principles of computation in multiple datasets is another. They have slightly different starting points, but, again, IMHO they are very close and can be combined: the common feature is the focus on neurobiology, for example trying to explain the different roles of different cortical layers in performing intelligent computations.