Until I realized that CNNs are used to model the biology, I had imagined the algorithm was optimized purely for performance. There are hierarchical models that get good results on certain kinds of tests based on modelling the biological architecture, which is hard to believe coming from the HTM perspective.
Not from my perspective. It just re-iterates that hierarchy is useful for building abstractions, which I don’t think anyone with an HTM background should be disputing (especially given the H in HTM). Some may be underestimating what I think reference frames are going to bring to the table though – a deep hierarchy should not be necessary to do simple tasks like recognizing the same object in different poses, distortions, etc.
I meant, what mechanisms or properties cause higher levels of the hierarchy to recognize more complex objects?
Sorry that reply was long, and sorry this one is too. I’m interested in this topic because this part of HTM theory resolves an inconsistency I focused on for a while when I was reading a lot of neuroscience papers. There are a couple more related inconsistencies which I think still need solving.
I’ll try to more clearly explain why hierarchy in TBT is different from the classical hierarchy.
We shouldn’t think about the cortex exclusively in terms of loops either.
Thalamic nuclei bring new info to the cortex in a specific way. A couple pathways between cortical regions mimic that, and they only go to higher levels. That’s good reason to think in terms of strict feedforward in regards to new information arriving about the outside world.
This hierarchy must do something interesting because the cortico-thalamo-cortical pathway up the hierarchy originates from the cortical motor output (layer 5), which also has strong sensory responses directly triggered by the thalamus. That seems like it’d constantly create knee-jerk motor responses. There are ways to stop that, like different modes of neural activity, but why go through the trouble?
Jeff Hawkins described an explanation for that. Layer 5 represents displacements, which play a role in handling reference frames. That’s at the nexus of sensory and motor. Layer 5 handles two questions. How did I move in the object’s reference frame, and how do I move to another part of the object?
Those layer 5 cells project through the thalamus to other cortical regions in a strictly hierarchical manner (except modulatory synapses), so object recognition involves hierarchy.
For that reason, TBT doesn’t contradict the fact that lower levels of the hierarchy recognize simpler objects.
Another way TBT doesn’t contradict lower levels recognizing simpler objects is different properties of different regions. Higher levels often receive direct sensory input via thalamus, but that info is different, e.g. larger receptive fields. If I’m not making sense, the TBT book describes that better using text size as an example.
That’s a second sort of hierarchy different from object composition, and these two ideas probably need to be combined somehow. Displacements in reference frames seem completely different from sensory info, but the same thalamic relay cells which bring info from L5 up the hierarchy often also receive direct sensory input. The concept of displacements combined sensory and motor, and now I think that idea needs to be combined with sensory again, except a different kind of sensory info.
The actual inconsistency is confusing so I’ll describe a simpler version.
In TBT, higher levels receive sensory input with larger receptive fields. However, that’s often the case for different types of cortical columns in the same region. In at least one system, the two types of cortical columns are at different levels of the hierarchy if you look at the pathway up the hierarchy through thalamus. Even if they’re usually at the same level in that pathway, if sensory input is part of why higher levels of the hierarchy recognize more complex objects, perhaps the relationship between different types of cortical columns is similar to different hierarchical levels. That could enable some sort of bootstrapping, using sensory input suited for a higher level to bring layer 5 to the next hierarchical level.
Those three inconsistencies are why I asked what creates the hierarchy.
One of the ways I think about it is that an abstraction-forming algorithm is always searching for correlations, in order to form representations (i.e. abstractions) of those correlations. Another reason for combining activity between hierarchical levels (such as the level-skipping connections), besides providing a view of the input at different scales, is that there may be correlations between different levels of abstraction.
A (probably bad) example, just to hopefully explain the point, would be a composite object which consists of a line, a circle, and a dragon. If those objects exist at different levels of abstraction, forming a composite of them requires bringing those representations together somewhere to associate them. I don’t know if that helps any with understanding the inconsistencies you pointed out though.
I hadn’t thought about that. That kinda helps explain sensory input combining with other things in higher levels, but I’m not sure how raw sensory input would be integrated. Maybe if the sensory input is flow information it’d be able to combine with displacements.
I don’t think we can, because it is too complex. But I can’t imagine there being any strict hierarchy in the system. Heterarchy would arguably be a better term, but even then it lacks the notion of how all predictions are going to be accurate or erroneous (i.e. a loop).
My guess is that if you identify a hierarchy the boundary is either artificially defined or we can find evidence of recurrent connections. A pathway may go in one direction, but I’ll bet you’ll find another pathway going in the other direction too.
I have not made the claim that TBT has no hierarchy, I’ve said that it implies complete object models at lower levels of the hierarchy - which should allow for empirical testing.
Perhaps this is an opportunity to refine the questions you have; I’m not sure I have understood you. There are three:
- I would not think about things in terms of feed-forward with TBT, it is still a predictive paradigm. I’m not sure which inconsistency you are worried about here.
- You bring in regions, and TBT does not seem to hypothesize about regions apart from claiming that CCs in different regions vote to settle on the model. Object composition in TBT needs to happen inside the CC, not through the hierarchy; otherwise reference frames would need to be shared?
- Maybe this is still 2)?
Perhaps you can have a single sentence that explains each inconsistency in simple terms? I know that is much harder to do but it might help you clarify the inconsistency you intuit.
TBT does not seem to propose a solution for how two different CCs could know what other CCs are voting on in terms of a representation. I guess they are meant to learn the association, but that would seem to lose the advantages of modelling, e.g. being able to predict compositions that have not been experienced previously.
Another idea would be to focus on one of the inconsistencies you raise and try to present just that one in as simple a form as possible. I would vote for this.
It is a causal relationship. Something like, “My activity affects activity in this other layer, but so does the activity from elsewhere (I don’t have a monopoly on it). In the past when I have been observing Object X, I’ve also seen activity in that other layer that matches a particular pattern. At the moment, I am predicting either Object X or Object Y, and I see something like that pattern over there now, so let’s go with Object X.”
When a CC settles on an existing object that it knows, it will start to make predictions. If the predictions are wrong, it will start bursting (more cells active, giving it a louder voice). If the one CC’s voice isn’t loud enough, it will be forced to update its model. If enough CCs are screaming about wrong predictions, though, they would force the “output layer” to something new, and a new object could be learned collectively.
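As a rough illustration of the narrowing step described above ("I am predicting either Object X or Object Y, and I see something like that pattern over there now"), here is a minimal Python sketch. It is a toy under stated assumptions, not how voting is specified in TBT or implemented anywhere: patterns are reduced to sets of active bit indices, and `narrow_candidates`, `learned_patterns`, and `min_overlap` are hypothetical names made up for this example.

```python
def narrow_candidates(candidates, observed_pattern, learned_patterns, min_overlap=0.5):
    """Keep only the candidate objects whose remembered pattern in the other
    layer overlaps sufficiently with what is observed there right now.

    candidates: set of object names the column is currently predicting.
    observed_pattern: set of active bit indices seen in the other layer.
    learned_patterns: dict of object name -> set of bit indices seen in the
        past while observing that object (hypothetical storage format).
    """
    kept = set()
    for obj in candidates:
        remembered = learned_patterns.get(obj, set())
        if not remembered:
            continue  # nothing learned for this object, so it gets no support
        overlap = len(remembered & observed_pattern) / len(remembered)
        if overlap >= min_overlap:
            kept.add(obj)
    # If nothing matches, the other layer gives no help; keep every candidate.
    return kept or set(candidates)


learned = {"X": {1, 4, 7, 9}, "Y": {2, 3, 5, 8}}
result = narrow_candidates({"X", "Y"}, {1, 4, 9, 11}, learned)
print(result)
```

Note that the sketch only uses a learned association (a correlation) between the column's object and the pattern elsewhere; nothing in it requires the column to know what the other layer's pattern means.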
This is an important issue - it is probably not a causal relationship as far as the model is concerned. It is a correlation.
You described the HTM way of thinking about that, but I think voting is something different.
I think it has to make predictions to settle on an object. The devil is in the details and I think that is why Jeff does not have the details. If it was as simple as you’ve described then I’m pretty sure we would have had an implementation years ago.
As far as I know, there is no evidence of recurrence in the strictly feedforward pathways. Sherman and Guillery have written many papers on the CTC pathway up the hierarchy in many parts of the brain. I don’t know if recurrence has been completely ruled out at high levels of hierarchy, but some of the papers looked there, and there’s certainly none (maybe a few stray synapses are possible) in the first couple levels of rodent cortex.
There are many other pathways going other ways, but they have different synaptic properties.
Thanks, this was helpful. Only the first inconsistency I mentioned makes sense. L5, the cortex’s motor output, also always responds to sensory stimuli. The idea of displacements in reference frames resolves that inconsistency.
This seems like a good challenge. I guess we would both be happy if we find or don’t find recurrence. Can you be very specific about a particular pathway that you believe does not involve recurrence and then I can try to find a counterexample? I would be equally happy to not be able to find one as it would change my perspective.
No, I meant it has to settle on an object to make predictions. Predictions themselves do not cause an action potential. An unexpected input comes in and cells burst, putting other cells into a predictive state. Activity in the output layer also puts cells into a predictive state. Those which are predicted from both ends have the best chance of activating first and inhibiting their neighbors. When another input comes in, either only some of the predictions are correct, and a sparser set of cells goes into a predictive state, or none of the predictions are correct, and we get more bursting and different cells going into a predictive state.
Rinse and repeat, until the bursting stops (and we have settled onto an object). If the object is correct, it will begin making (sparse) correct predictions from that point on.
Or if the object is not something the CC has modeled before (or has a bad model of), it will continue dropping back into a bursting state, and learning the new object in the process. Once learned, it will then begin to make correct predictions.
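The settle-or-burst loop described above can be sketched in a few lines of Python. This is a deliberately crude toy: it assumes each learned object is just a set of (location, feature) pairs, and it ignores cells, minicolumns, learning, and the output layer entirely. The names `settle` and `models` are hypothetical.

```python
def settle(inputs, models):
    """Return (settled_object_or_None, burst_count).

    models: dict of object name -> set of (location, feature) pairs.
    inputs: sequence of (location, feature) sensations.
    """
    candidates = set()  # objects currently "predicted" by the column
    bursts = 0
    for sensation in inputs:
        # Candidates whose model contains this sensation predicted it correctly.
        matching = {name for name in candidates if sensation in models[name]}
        if matching:
            candidates = matching  # predictions get sparser
        else:
            # No prediction was correct: burst, then re-seed the candidates
            # with every known object that is consistent with this input.
            bursts += 1
            candidates = {name for name, m in models.items() if sensation in m}
    settled = next(iter(candidates)) if len(candidates) == 1 else None
    return settled, bursts


models = {
    "cup": {(0, "rim"), (1, "handle"), (2, "base")},
    "bowl": {(0, "rim"), (2, "base")},
}
known = settle([(0, "rim"), (2, "base"), (1, "handle")], models)
novel = settle([(0, "spout"), (1, "lid")], models)
print(known, novel)
```

With a known object the column bursts once on the first (unexpected) input and then narrows down to a single candidate. With an object it has no model for, it keeps dropping back into bursting, which is where learning would have to happen in a fuller version.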
I guess technically this is a “which came first, the chicken or the egg?” argument, though. There is an interaction between the populations of cells making incorrect predictions and bursting, eventually settling on an object representation, leading to correct predictions further stabilizing that object representation with a confident vote.
Sorry, I realized I just fell back into my interpretation of the theory again. I need to stop doing that.
The type of pathway I’m talking about involves what this paper calls class 1 responses. VPM and POm are two thalamic nuclei. VPM is mostly a primary thalamic nucleus (except two small parts called the tail and the head, but most studies seem to go for the middle part). POm is a higher order thalamic nucleus, meaning it doesn’t target the first level of the cortical hierarchy (S1, primary somatosensory cortex, in this case) in its layer 4. It relays activity driven by layer 5 up the hierarchy, although it also receives direct sensory input.
From the paper: “paralemniscal projection may instead provide modulatory inputs to S1”, which should require a loop because it has to modulate relative to some information correlating with the target of the modulation. If “the role of the paralemniscal projection is to provide modulatory inputs to barrel cortex” and if TBT is right about sensing through active inference, then there must be a loop (in this case it would be muscle control over the whiskers).
Perhaps you are imagining that a loop needs to be inside the skull, but there is no requirement for that; some will be, and others will require actuators and sensors along with a “natural” environment to complete the loop.
The SDR of a CC is a prediction if the current sequence has been learnt, right? Maybe you are focusing on the situation where the input is novel. I imagine higher layers predict further ahead (and/or broader), so they provide context which allows a lower CC to predict a (previously learnt) object from relatively little data.
This touches on an issue of what exactly is being modelled in HTM. The focus on active-inference at low levels of abstraction obscures the larger loops the system needs to be part of. That might allow for meaningful modelling rather than memorizing sequences.
I agree with you that this is diverging from the topic.
Thinking of V1, if we agree there is no difference between features and objects at that level of the hierarchy, then I wonder why the CC algorithm would be the same as higher layers. Is there any reason for V1 to use reference frames or would it make more sense to build reference frames at higher layers where features are assembled? What would be the advantage of a reference frame if feature detection can do the same thing? Given the small number of objects a CC can model, it seems the inputs need to be sufficiently abstract so that 100s of objects can cover a significant portion of the inputs, otherwise the CC risks forgetting more than it is predicting.
While on the theme of CC details. Is there a long tail distribution in the firing of neurons in the HTM CC as observed in the brain? My guess is that, over time, each neuron is as likely as any other neuron to fire in HTM.
All I’m claiming is there is a strict hierarchy in regards to driver pathways. By pathway, I mean a connection between specific cell groups, e.g. layer 4 excitatory cells or layer 5 thick tufted cells. One of the driver pathways goes through thalamus.
So a strict hierarchy of neurons being triggered to fire starting from the sensory input, but also modulation going all over the place.
To show what I mean with an example: in the temporal memory, the sensory input goes to the spatial pooler, which determines which neurons can fire. Exactly which ones fire depends on the context, i.e. modulatory input on distal dendrites.
It’s a bit muddier than that but the point is it’s right to think about some aspects of the cortical processing as strictly feedforward up the hierarchy.
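The temporal memory example above (feedforward input picks the columns, distal context picks the cells) can be sketched roughly as follows. This follows the usual description of the temporal memory activation rule, but it is only an illustration: the function name `activate_cells`, the `(column, cell)` tuple representation, and `cells_per_column=4` are assumptions made for this example, and learning and segment details are left out.

```python
def activate_cells(active_columns, predictive_cells, cells_per_column=4):
    """Pick which cells fire given driver input and modulatory context.

    active_columns: set of column indices chosen by the spatial pooler
        (the feedforward "driver" input).
    predictive_cells: set of (column, cell) pairs depolarized by distal
        (modulatory) input from the previous time step.
    Returns (active_cells, bursting_columns).
    """
    active = set()
    bursting = set()
    for col in sorted(active_columns):
        predicted_here = {(c, i) for (c, i) in predictive_cells if c == col}
        if predicted_here:
            # Context matched: only the predicted cells in this column fire.
            active |= predicted_here
        else:
            # No cell anticipated this input: every cell in the column fires.
            bursting.add(col)
            active |= {(col, i) for i in range(cells_per_column)}
    return active, bursting


# Cell (3, 2) is predictive, but column 3 gets no driver input, so it stays silent.
active, bursting = activate_cells({0, 2}, {(0, 1), (3, 2)})
print(sorted(active), bursting)
```

The point of the sketch is the asymmetry: modulatory input alone never makes a cell fire here, it only changes which cells fire once the driver input arrives.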
I forgot to mention, the higher order thalamic nuclei, meaning ones receiving input from layer 5 of the cortex, don’t just project up the hierarchy. They project all over the place, but the non-feedforward projections are different. They’re modulatory and target different layers, e.g. not layer 4.
So yeah, there absolutely are loops, but they’re modulatory and/or target distal dendrites.
All I’m claiming is we shouldn’t ignore strict hierarchy being part of what the cortex does, and part of the generic cortical circuit. It does a lot more too.
I’ll just comment on the neuroscience.
There is probably some sort of difference, or features=objects being combined into more complicated features=objects. Some cells are called simple cells. They respond to a line at a particular orientation at a particular location, with the line defined in terms of the line’s projection onto the retina. Other cells are called complex cells, which I guess means everything else. For example, they can respond to a line at a particular orientation in a bigger variety of locations on the retina.
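To make the simple/complex distinction concrete, here is a small Python sketch in the spirit of the classic textbook picture, where a complex cell pools over simple cells of the same orientation at different locations. It is a toy under assumptions, not a model of real V1 circuitry: the 3-pixel binary line detector, the `OFFSETS` table, and the function names are all made up for illustration.

```python
# Pixel offsets (row, col) that make up a 3-pixel line of each orientation.
OFFSETS = {
    "horizontal": [(0, -1), (0, 0), (0, 1)],
    "vertical": [(-1, 0), (0, 0), (1, 0)],
}


def simple_cell(image, row, col, orientation):
    """Return 1 only if a line of this orientation is centred at (row, col)."""
    for dr, dc in OFFSETS[orientation]:
        r, c = row + dr, col + dc
        inside = 0 <= r < len(image) and 0 <= c < len(image[0])
        if not inside or image[r][c] != 1:
            return 0
    return 1


def complex_cell(image, orientation):
    """Return 1 if the oriented line appears anywhere (max over simple cells)."""
    return max(
        simple_cell(image, r, c, orientation)
        for r in range(len(image))
        for c in range(len(image[0]))
    )


image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
print(simple_cell(image, 2, 3, "vertical"), simple_cell(image, 2, 1, "vertical"))
print(complex_cell(image, "vertical"), complex_cell(image, "horizontal"))
```

The simple cell only responds when the line sits at its own location, while the complex cell responds to the same orientation wherever it falls on the "retina".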
You could probably recreate that in HTM by replicating the natural variation found in the brain. For example, some cells might have higher firing thresholds than others. It could be useful to use an SDR with a variety of sparsities, maybe to quickly learn general things and then slowly learn more specific things, but probably not essential.
That may be more of a philosophical question. The active bits in the “output layer” of the CC (using the term from the Columns paper) are certainly not reality itself, but they are a result of what was actually sensed. So from a certain perspective, they are a “prediction”.
In our context, though, the term “prediction” is primarily used for something else – referring to pyramidal cells becoming depolarized in anticipation that they are about to become active in a near-future input, based on their temporal memory of the past. That is how I was using the term above.
There is also a third meaning (which I have seen Jeff bring up the potential of from time to time, but which hasn’t been included in the proposed architecture at the moment): “active predictions”, which are representations of near-future sensory input (like normal predictions) except the cells are actually firing rather than in a predictive state. I tend to think of this type of prediction in the context of the Three Visual Streams paper.
All I’m claiming is that your claim is an arbitrary isolation of the paths. If I limit the scope of connections being considered to connections between two layers, then I can say there is no hierarchy, just direct connections, but that would be disingenuous. Likewise, to ignore that the “hierarchy” is part of a loop seems an over-simplification.
Given that there are so many loops at so many scales, I don’t think we can reason effectively in terms of strict hierarchy from inputs. I created another thread on this.
I will take that as a no. If the algorithm is the “right” one then it should probably already have those distributions.
For me this is a foundational aspect to all of Jeff’s work since On Intelligence. The basic concept of active inference fits with predictive coding. I’m very surprised that you are interpreting TBT as a reactive framework; one of us is way off target here. I have not implemented HTM, so it is more likely me. Here is an example of my assumption:
“The output of a layer includes minicolumns in both active and predictive states.” (Hierarchical temporal memory, Wikipedia) “The third generation builds on the second generation and adds in a theory of sensorimotor inference in the neocortex.” See Predictive coding (Wikipedia) for a general idea of active inference.
The actual implementation is counter-intuitive when you first see it, but yes this is how HTM works (and I see no reason at the moment to change it in TBT).
Yes, this is true. The predictive states are cells which are depolarized. This is an internal state of the cell that is not readable by other neurons, as they are not transmitting anything. They have recognized the context and believe from past experience that they are about to fire, so they are primed and ready. This allows them (if they are correct in their prediction) to activate a little bit faster and inhibit their neighbors who share a similar receptive field.
The word “output” there is probably a bit ambiguous though. Predictive states are an output only from the perspective of a developer. Other neurons are blind to them, so it is not an “output” from a biological perspective.
Yeah, sorry about that. I’ll start my own thread to explore my interpretation of TBT. I’ve been unplugged for a while, but these recent discussions are starting to renew my interest in it.