I’m not sure where to send this feedback, so I’ll post it here. It would be easier to provide feedback if the PDF had page numbers and section numbering. I’m using the December 10, 2019 revision.
In the section Temporal Memory Algorithm Steps:
“If the cell is active due to lateral connections to other nearby cells we say it is in the ‘predictive state’ (Fig. 3).” is misleading: based on this sentence, it is tempting to imagine that cells in the “predictive state” are active.
“Figure 3 At any point in time, some cells in an HTM layer will be active due to feed-forward input (shown in light gray). Other cells that receive lateral input from active cells will be in a predictive state (shown in dark gray).” This figure is confusing because it shows an arrow to the Next Level, as if the predictive information were available to the next level.
“The cells activated by connections within the layer constitute a prediction of what is likely to happen next.” is misleading because the cells are not “activated” in the sense of being “active”; they are instead put into a predictive state and remain inactive.
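To make the distinction concrete, here is a minimal sketch of how I read the two states. The `Cell` class and `layer_output` function are names I made up for illustration; they are not taken from BAMI or any HTM library.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    # Hypothetical two-flag model of a cell, for illustration only.
    active: bool = False      # fired this time step
    predictive: bool = False  # depolarized by lateral input, has not fired


def layer_output(cells):
    """Return the indices of cells that other cells or levels can observe.

    Assumption: only active cells are visible; a predictive cell is not.
    """
    return [i for i, cell in enumerate(cells) if cell.active]


cells = [Cell(active=True), Cell(predictive=True), Cell()]
visible = layer_output(cells)
```

Under that reading, the predictive cell at index 1 never appears in `visible`, which is why the arrow to the Next Level in Figure 3 reads as confusing to me.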
I had assumed that prediction still had the same meaning as in On Intelligence: “Prediction means that the neurons involved in sensing your door become active in advance of them actually receiving sensory input.” To align with BAMI, this would need to be changed to something like: “Prediction means that the neurons involved in sensing your door have an internal predictive state in advance of them actually receiving sensory input.” This is a radical change because the predictive information is not available to other cells. HTM has shifted from being a predictive framework in On Intelligence to a reactive framework (from BAMI: “The resulting set of active cells is the representation of the input in the context of prior input.”).
Funnily enough, On Intelligence is providing an argument for why HTM as implemented is not going to lead to “true” machine intelligence. Jeff has said that he does not see any major differences in the framework described in On Intelligence and the current work. Shifting from a predictive to a reactive framework is a fundamental difference.
I would maybe characterize it as an “anomaly detection framework” rather than a purely reactive framework. It is very good at knowing when it is predicting things incorrectly and calling that out through bursting, which is a hyper-active (less sparse, more dense) output.
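As a rough illustration of the anomaly detection view, a raw anomaly score can be computed as the fraction of currently active columns that were not predicted on the previous step. This is a sketch under my own assumptions; the function name and the set-based representation are hypothetical.

```python
def raw_anomaly_score(active_columns, predicted_columns):
    """Fraction of active columns that were not predicted (these would burst).

    Both arguments are sets of column indices. Returns 0.0 when every active
    column was predicted and 1.0 when none were.
    """
    if not active_columns:
        return 0.0  # assumption: no input means nothing to be surprised by
    unpredicted = active_columns - predicted_columns
    return len(unpredicted) / len(active_columns)


score = raw_anomaly_score({1, 2, 3, 4}, {1, 2, 9})
```

Here two of the four active columns were unpredicted, so half of the input is reported as surprising.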
I can split this to a separate thread if you think it is too off topic, but are there some scenarios that you can think of in which the view of predictions from BAMI become a difficult problem on the road to “true” machine intelligence?
The main one that I am aware of (which you also mentioned on another thread) is this one:
One way that I have implemented this particular capability (which I call “Temporal Unfolding”) is by tweaking the TM algorithm so that motor cells which are predicted both distally and apically become active. Another way to implement it is that motor-related cells just don’t behave the same way as TM cells, and their predictions are “active predictions” (i.e. rather than going to predictive state from distal input, they activate). This would be even less stable, though. It would certainly require refinement to be added on top, as anyone who has tried this with HTM before knows (it may require a model of the cerebellum, for example), and it would require either a different learning algorithm or the separation of “training” and “inferring” stages (which is counter to continuous learning).
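A minimal sketch of the first variant, assuming cells are tracked as sets of indices. The function and parameter names are hypothetical and this is not the actual implementation, just the activation rule stated above.

```python
def next_active_cells(feedforward_active, distal_predicted,
                      apical_predicted, motor_cells):
    """Activation rule with the "Temporal Unfolding" tweak (illustrative).

    Ordinary cells activate only from feed-forward input. Motor cells also
    activate when they are predicted both distally and apically.
    """
    unfolded = motor_cells & distal_predicted & apical_predicted
    return feedforward_active | unfolded


active = next_active_cells(
    feedforward_active={1, 2},
    distal_predicted={3, 4, 5},
    apical_predicted={4, 5, 6},
    motor_cells={5, 7},
)
```

Cell 4 is predicted both ways but is not a motor cell, so it stays in the predictive state; only motor cell 5 is promoted to active.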
I feel like the use of the word “prediction” brings in certain expectations that run counter to the actual HTM mechanism.
I think that most people assume that if I see “ABC” it may make sense to predict “D”. People are not as comfortable with predicting that the next letter is any of the set “ADQ15”
The basic mechanism in HTM only remembers the sequences it has seen before and reports after the fact of seeing “ABCD” that “D” was one of the learned transitions after “ABC.” There may have been several sequences that started with “ABC” where any of those sequences will not burst so we know that we have seen this before.
Saying something like “the network is reporting (by not bursting) that this is a sequence that has been seen before” would be a more accurate description of the behavior. This does not fit well with the word “prediction.” I can see how this leads to your comments about reactive vs predictive. The priming is predictive, the burst/not burst is reactive. They are both parts of the same basic mechanism and are local to the column.
Perhaps anomaly detection can be used everywhere the word prediction is found.
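A toy model of this “matching” behavior, as a sketch only: it stores full prefixes in a dictionary rather than using columns, cells, and segments, so it captures the priming/bursting logic but nothing about how HTM actually represents context. All names are made up.

```python
from collections import defaultdict


class ToySequenceMemory:
    """Remembers which symbols have followed each previously seen prefix."""

    def __init__(self):
        self.transitions = defaultdict(set)

    def learn(self, sequence):
        # Record every (prefix -> next symbol) transition in the sequence.
        for i in range(1, len(sequence)):
            self.transitions[sequence[:i]].add(sequence[i])

    def primed(self, context):
        """Symbols primed (predictive state) after seeing `context`."""
        return set(self.transitions.get(context, set()))

    def bursts(self, context, symbol):
        """True if `symbol` after `context` was never seen (would burst)."""
        return symbol not in self.transitions.get(context, set())


memory = ToySequenceMemory()
for seq in ["ABCA", "ABCD", "ABCQ", "ABC1", "ABC5"]:
    memory.learn(seq)

primed_after_abc = memory.primed("ABC")
```

After “ABC” the whole set “ADQ15” is primed at once. The only thing reported outward is whether the symbol that actually arrives causes a burst, e.g. `memory.bursts("ABC", "D")` is false while `memory.bursts("ABC", "Z")` is true.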
The concept of a passive predictive state is taken from biology, though (the TM algorithm wasn’t implemented that way arbitrarily). Most of the synapses on distal dendrites, when experimentally activated, do not cause an action potential. If a number of them close together in space and time are activated, it still doesn’t cause an action potential, but it does have a big impact on the cell. The cell becomes depolarized, and primed to fire sooner than it normally would have when the anticipated input occurs.
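Here is a small sketch of why firing sooner matters, assuming (as the TM description does) that the first cells to fire in a column inhibit the rest. The latency values are arbitrary placeholders, not measured numbers, and the function name is hypothetical.

```python
def column_winners(cells, depolarized, base_latency=2, primed_latency=1):
    """Return the cells in one column that fire when feed-forward input arrives.

    Depolarized (predictive) cells fire after `primed_latency`, others after
    `base_latency`. The earliest cells fire and inhibit the later ones.
    """
    latency = {
        cell: primed_latency if cell in depolarized else base_latency
        for cell in cells
    }
    earliest = min(latency.values())
    return [cell for cell in cells if latency[cell] == earliest]


predicted_case = column_winners(["a", "b", "c"], depolarized={"b"})
unpredicted_case = column_winners(["a", "b", "c"], depolarized=set())
```

With one depolarized cell, only that cell fires; with none, every cell has the same latency and they all fire together, which is the burst.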
Can you point to papers that show the difference between distal and proximal dendrites? My assumption is that if a neuron is already close to firing it could occur because of events on either.
This is not to say that dendrites don’t allow for “passive” prediction. There is even evidence to support dendrites having segments that perform computations like logic gates. I’m sure there are many more behaviors. The further HTM strays from the biology the harder it will be to gain insights for the algorithm from biology - that is (hopefully) my point.
Perhaps it is “matching” a sequence rather than predicting. It is only predicting from the perspective of an observer noting the depolarization of the cell. The anomaly detection (or bursting) is the consequence of “not matching”.
From that paper “most of the patterns recognized by a neuron do not directly lead to an action potential” which means that some do.
It would also be the case that most proximal synapses do not cause an action potential. I’ve recently read that a typical number would be 175 spikes (not all proximal). The proximal dendrites also include more inhibitory synapses (which decrease the likelihood of a spike).
I don’t think there are such simple rules as that paper implies. If you are interested in the recent neuroscience results, described for a general audience, then I highly recommend the book The Spike by Mark Humphries. But it does not have a lot of good news for HTM.
I wouldn’t go so far as to say this is a death knell for HTM. We have to set the magnification somewhere, and Numenta set it at a particular level that is more granular than some, but less granular than others.
One could argue that not modelling to this level is missing important details. But if the magnification were placed there, then one could argue that not modelling molecular chemistry with high fidelity is missing important details. Then not modelling atoms, then not modelling quantum states. (incidentally, there have been a couple posts on the forum making the quantum state argument)
If we assume for a moment that the logic-gate aspect of synapses is not relevant at a macro level, then we’d have to explain what macro effect those details we glossed over should have. A couple of answers that come to mind would be that it further reinforces sparsity (essentially having a random dropout sort of effect) and improves distribution and overall capacity (by recruiting cells into encodings that wouldn’t otherwise have been recruited under simpler rules).
It may turn out that we missed an important feature, but hopefully that will be discovered later when we hit a roadblock on a particular capability that we are trying to model (such as feature extraction, for example)
Neither did I. Exaggerating someone’s claims is “strawmanning”. Why not read the book and then decide if it is good news for HTM?
I mentioned there are important discrepancies at the scale of the HTM algorithm! This gets ignored and you imply I am claiming there needs to be a quantum equivalence. This is not critical thinking.
A major role for theory is to provide justification for abstracting away details, for example ignoring quantum effects in a system at a larger scale. When there are experimental results that invalidate assumptions of a theory, the scientific approach is to revise the theory, not to ignore the experimental results and blunder on. I am not saying that is what Numenta is doing; I am saying that is what your approach here would imply. Please read the book and then we could discuss it. Even better, join HLC and discuss it there.
Nice paper to point out, thanks. From the abstract “proximal excitation lowers the threshold, but also substantially increases the gain of distally-driven responses” which implies to me that this goes both ways i.e. the action potential could just as well be triggered by the distal but the proximal have much more impact. It also seems to be making the case for processing in the dendrites, that would be more complicated than the HTM model, perhaps it fits with some of the work on distal segments (where a neuron is closer to a two layer ANN)
Are you aware of research that shows the proximal dendrites are connected to a different layer than the distal ones? I think HTM implies this (only proximal go to inputs and only distal go to neighboring cells).
That comment was supposed to be funny, but I guess I need to work on my sense of humor. You have also used humorous hyperbole, such as a comment ruining your day, etc.
Like I said, I think there are some reasonable justifications for not going down the dendrite logic gate rabbit hole. But then again I am not a neuroscientist, so I’ll leave it to others to provide a better answer.
I will put it down to my bad sense of humor not yours.
Yes, that seems fair. But the book is mainly about much more abstract properties of spikes. For example, around 75% of spikes that reach a synapse fail to make ANY impact on the receptive dendrite, i.e. the synapse fails. It is as high as 90% in some brain regions. The book proposes some ideas of why this might be a feature, not a bug. There are plenty of other issues like this, and I think it provides some good “filters” for a coherent theory. This is not to say that HTM is wrong, but the HTM theory needs to explain away many more dynamics than I had imagined prior to reading that book.
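To get a feel for what that failure rate would mean for a threshold-based dendritic segment, here is a back-of-the-envelope sketch. It assumes each synapse fails independently (a simplification of mine, not a claim from the book), and the segment size and threshold in the example are made-up values rather than HTM defaults.

```python
from math import comb


def segment_match_probability(active_synapses, threshold, failure_rate):
    """Probability that at least `threshold` of `active_synapses` transmit.

    Each synapse is assumed to fail independently with `failure_rate`,
    so the number of successes follows a binomial distribution.
    """
    p = 1.0 - failure_rate
    return sum(
        comb(active_synapses, k) * p**k * (1.0 - p) ** (active_synapses - k)
        for k in range(threshold, active_synapses + 1)
    )


reliable = segment_match_probability(20, 10, failure_rate=0.0)
unreliable = segment_match_probability(20, 10, failure_rate=0.75)
```

With perfectly reliable synapses the segment always matches; at a 75% failure rate the same segment matches far less often, so a theory built on fixed thresholds has to account for that somehow (lower thresholds, redundancy, or repeated spikes).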