I’d like to come back to this. I have a hazy memory of Jeff claiming that one reason we see only edge detection is that the experiments were conducted with anaesthetized animals. If you look at the diagram in the last Numenta paper with the hierarchy of coffee cups, you see a complex object at multiple levels. What you describe sounds closer to classical hierarchical feature recognition, which is not in line with TBT.
I know there is object composition in TBT but this is meant to be very different from hierarchical feature recognition.
TBT seems to imply that if the cortical column has the inputs that correlate with an object then it will learn the object, not just features. So I guess we should see correlations with objects much lower in the visual hierarchy than the traditional story presents (otherwise TBT is the traditional story). The classic example is a patch of skin learning a coffee cup, and I assume this applies to a patch of retina learning objects too, although obviously not all objects and only at the scale of the sensory patch the column “sees”.
I agree that this is the implication of TBT. There is a limit to the number of objects that a single CC can learn though (in the book, Jeff mentions “hundreds”). I would expect that an individual CC would have a better model of the objects that it encounters most frequently (since it has a lot more training on those), which implies that it would have a fuzzy/inaccurate representation for objects that it encounters less frequently.
In the case of a patch in V1 closest to input from the eyes, it is exposed to everything coming in (which contains far more objects than it alone can model), so that implies to me that it must expend a lot of its capacity modelling lower-level objects to pass on to other CCs in the stream (I’ll stop there though, since that is starting to fall back into my interpretation that I posted earlier…)
BTW, I should point out that this is still very different than the classical view of an image recognition hierarchy. It isn’t recognizing points in the first level, lines in the next level, planes and simple shapes in the next level, etc.
Low-level objects could easily include shapes, letters, maybe even frequently read words (recognition of them, not their abstract meaning), along with their reference frames (poses, scaling, transforms, etc.). And yes, it would include points and lines (the fact that we can talk about those things means they are objects modeled somewhere).
But the point is that you can quickly fill up a capacity of “hundreds” with such low-level objects, so I would predict that CCs closest to the sensors are using up most of their capacity on these types of objects.
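To make the capacity argument concrete, here is a toy sketch. It is not taken from TBT or any Numenta code: the function name, the made-up exposure counts, and the rule "keep the most frequently encountered objects" are all assumptions for illustration only.

```python
from collections import Counter

def objects_modeled(stream, capacity):
    """Hypothetical rule: a column keeps models only for the `capacity`
    objects it encounters most often in its input stream."""
    counts = Counter(stream)
    return [name for name, _ in counts.most_common(capacity)]

# Made-up exposure counts for a patch close to the sensor.
stream = (["edge"] * 50 + ["corner"] * 30 + ["letter E"] * 15
          + ["coffee cup"] * 4 + ["Jennifer Aniston"] * 1)

print(objects_modeled(stream, capacity=3))
```

With a small capacity, the frequent low-level objects crowd out the rare complex ones, which is the prediction being made here.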
Concur … even if for some reason we came up later with 2-3 different algorithms I think the correct approach is to assume 1 for starters, keeping in mind that this is the most complex problem “facing” humanity.
~100s of different algorithms are both theoretically and practically not feasible …
This is the point. I agree that the limited number of objects per CC means the lowest levels may have objects which are similar to features. Incidentally, do you know the visual field covered by a layer 1 CC?
We should be seeing some data that shows object recognition happens much earlier than expected. My impression is that the confidence in the traditional visual hierarchy is getting stronger, not weaker: there are deep CNNs that are matching many properties of biological systems (I don’t have the reference nearby, sorry).
There are a number of ML enthusiasts who would disagree with this point (Geoffrey Hinton probably most notably, such as in this video for example). The missing piece of the puzzle is reference frames. Geoffrey attempts to address this major gap with a concept he calls “capsules”. CNNs completely miss this point (as Geoffrey cleverly demonstrates near the beginning of that video)
Also I have seen videos where Jeff has stated (in the context of TBT) that they are not throwing out the traditional idea of hierarchy completely, but expanding on it (that there is more to it in a biological system). Rather than deep, narrow hierarchies like in current DNNs, the brain has relatively shallow, wide hierarchies, and the connections are shared across multiple levels to support scale (basically, even the fundamental wiring is completely different than in a DNN).
If you are interested, I’ve also posted on other threads (for example here) about the implication of modelling happening within a region as opposed to between regions, supporting unusual hierarchical connections like level skipping and recursion. Note that the concept of an “output layer” comes from the Columns paper, which you can arrive at by tracing through references from the Frameworks paper which introduced TBT. I’ve not seen Jeff indicate that they believe they were wrong about that – instead they’ve built on it.
In any case, given that TBT specifically predicts “hundreds” as the number of objects that a single CC can model, I don’t see any way around the use of hierarchical levels for building abstractions. Otherwise, what would be the point of CCs which receive input from other CCs and not directly from sensors? Maybe that is a question for Jeff, I’m just pointing it out. And I am probably agreeing with your original argument, if your interpretation of TBT is that it is devoid of all abstraction-building hierarchical connections (that’s just not how I interpret it).
I understand TBT has hierarchy, they show that in the last Numenta paper.
Did you take a look at the CC paper? “Thirdly, by the time Hubel and Wiesel received the Nobel Prize for physiology and medicine (with Roger Sperry) in 1981, the solidified columnar understanding of the cortex was already outpaced by new experimental findings.”
Of course, I definitely agree that CNNs are quite useful, and getting more so. Not being a neuroscientist myself, I can’t really comment on biological plausibility, or compare two competing interpretations of the biology either. I’m mainly in the HTM camp because I have a deep understanding of its current iteration, and have used it to solve problems for the company I work for that were otherwise quite difficult.
Being a programmer and not a neuroscientist, I’d take the perspective of Costa and Martin 2010, that a “canonical microcircuit” could replace the column as a basic unit of mesoscopic brain organization without assuming a vertical anatomical module.
Anatomy is less interesting to me than algorithms (for example, I have some old threads on the forum here where we talked about how the TM algorithm doesn’t require the physical minicolumns of neurons to be functionally relevant – it just requires multiple neurons with overlapping receptive fields that can mutually inhibit each other). But of course this is the General Neuroscience section of the forum, so I’ll leave the topic of anatomy for others to debate
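A minimal sketch of what I mean, in Python. This is a simplified version of the TM activation rule, not the actual implementation: the function name and arguments are made up for illustration, and a "group" here is just any set of cells that share feedforward input, with no assumption that they sit in a physical minicolumn.

```python
def activate_group(cells, predicted, input_active):
    """Return the cells that fire in one group of cells sharing a receptive field.

    cells:        identifiers of the cells in the group
    predicted:    cells that were in a predictive state beforehand
    input_active: whether the shared feedforward input is active
    """
    if not input_active:
        return set()
    winners = set(cells) & set(predicted)
    # Predicted cells fire and inhibit the rest of the group.
    # With no prediction nothing is inhibited, so every cell fires ("bursting").
    return winners if winners else set(cells)

group = {"a", "b", "c"}
print(activate_group(group, predicted={"b"}, input_active=True))
print(activate_group(group, predicted=set(), input_active=True))
```

Nothing in this rule refers to where the cells are located, only to shared input and mutual inhibition.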
Some mutations in our recent phylogeny have been more momentous than others, and all of these have been pleiotropic.
One such, that happened in the “modern” human lineage but not in that of Denisovan and Neanderthals (two since long lopped lineages of “archaic” humans), was a duplication type mutation.
The H of HTM gives temporal pooling.
Also, the fusion of different sensor modalities. This should incorporate an attention system to collect a sense of “now” into the base reference frame.
The hippocampus has time cells.
This time base reference frame also provides sequencing and motion processing.
Our model of the world must incorporate time in the representation. Our object memory will surely include sequence of some sort or another. I spend a fair amount of time thinking about the contents of consciousness and what kind of representation hardware is required to do this or that.
HTM is able to be a significant part of those requirements. There are some other very important chunks that have to be added to get a more accurate understanding of what the CC is doing. The thalamus jumps up as a key gating and attention forming component. Those subcortical things are kind of a big deal, much more than some people seem to realize.
The same way that time cells stamp your experience with a sense of now, the amygdala stamps the experience with emotional weights. Was something good or bad? Was it part of a positive sequence? This sense of good or bad is also coded in the object memory. I can’t see how cortex-only computation would be very useful without a significant capacity to acquire judgement as it learns.
This will require looking for the right thing. Something like, evidence of recognizing a warped letter “E” in a patch that is closest to input from the eyes (i.e. don’t look for evidence of recognizing Jennifer Aniston – the “hundreds” of objects recognized this close to the sensor are probably going to be more fundamental than that). A classical hierarchy doesn’t recognize warped letters in the first level.
That makes sense. In awake animals, there should be a process of recognition involving movement (reference frames) and integrating info over time. My point was just that we don’t need evidence of complex object recognition in V1, although that could be happening to any extent. Still, evidence of objects more complex than edges would be nice. Maybe it exists, but I don’t understand the so-called complex cells well.
If not complex objects, there’s at least evidence in V1 of more going on than recognizing something from a snapshot in time. For example modulation by head rotational movement (or something like that, I forget what it was exactly). Another is presaccadic predictive remapping. That’s the thing Jeff Hawkins mentions in the book where cells fire right before the eye movement brings the receptive field to the feature.
Side tangent: I’m not sure there are enough cells firing for it to be a whole SDR predictively firing in V1, since the effect is much more noticeable in the FEF region. Predictive firing isn’t part of TBT, so I imagine something else is actually going on which can look like predictive firing, e.g. object representation triggered by attention shifting to the saccadic target, or updating locations of sensory patches to keep up with the reference frame, or whatever. My point is that experimental evidence can be pretty vague, because the same experimental results can be statistically consistent with many possible mechanisms.
It’s pretty clear that higher levels of the cortical hierarchy recognize more complex objects, or at least tend to.
What’s less clear to me is why that is. What creates the hierarchy? Connections between regions creating a flow of information like the classical picture, or properties of the regions and their inputs? It’s probably neither, also both, and somewhere in between but also something else and a bad question. And what I said is pretty clear really isn’t clear, because the where pathway has a hierarchy too but it doesn’t recognize objects, and some regions don’t fit neatly in either pathway.
TBT’s object composition may well involve a hierarchy like the classical picture’s, but only in the sense that signals flow up the hierarchy. There is a strict hierarchy embedded among the crazy wiring, and that’s all there is when it comes to driver pathways between cortical regions - the ones which are akin to the sensory input relayed by the thalamus (synaptic properties, laminar targets, etc.)
There are two driver pathways up the hierarchy. One is direct from L2/3. The other is from L5 TT cells via the thalamus. Those are the same cells which act as the cortical motor output. It might seem strange for that to go through thalamus, whereas sensory info goes through thalamus to the first level (also to other levels but that doesn’t matter here). It’s not just a relay you can ignore, since it does things, like attention. But the thalamus also relays signals from subcortical motor structures, so signals from L5 TT don’t fit in either bucket of sensory or motor. (L5 TT has big sensory responses, so it’s not equivalent to motor info relayed from subcortical structures, which I doubt is purely motor anyway.)
That’s why I like the idea in HTM’s theory of L5 TT being a displacement signal. Reference frames combine sensory and motor but are more than either alone. You move through them, define them uniquely for each object, and use sensory info to help figure out where you are and which object’s reference frame you’re in.
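Here is a toy sketch of that loop: sense a feature, move by a displacement, and keep only the (object, location) hypotheses that still fit. It is my own simplification, not Numenta code. The objects, the integer coordinates and the function name are all made up for illustration.

```python
def recognize(objects, first_feature, moves):
    """Narrow down (object, location) hypotheses by moving and sensing.

    objects:       name -> {(x, y): feature}, each in its own reference frame
    first_feature: feature sensed before any movement
    moves:         list of ((dx, dy), feature sensed after that displacement)
    """
    # Every place where the first feature occurs on any object is a candidate.
    hypotheses = {(name, loc)
                  for name, model in objects.items()
                  for loc, feature in model.items()
                  if feature == first_feature}
    for (dx, dy), feature in moves:
        # Shift each candidate location by the displacement ...
        moved = {(name, (x + dx, y + dy)) for name, (x, y) in hypotheses}
        # ... and keep only candidates whose object has that feature there.
        hypotheses = {(name, loc) for name, loc in moved
                      if objects[name].get(loc) == feature}
    return hypotheses

objects = {
    "cup":  {(0, 0): "rim", (0, -2): "base", (1, -1): "handle"},
    "bowl": {(0, 0): "rim", (0, -1): "base"},
}
print(recognize(objects, "rim", []))
print(recognize(objects, "rim", [((0, -2), "base")]))
```

After the first sensation both objects are still candidates. The movement is what disambiguates them, which neither the sensory input nor the motor signal could do alone.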
For object composition, I think the idea is that moving attention through an object’s parts is how we treat objects. You see a feature, move to the next one, figure out the displacement in the object’s reference frame, and continue until you know enough to figure out what the object is. Each displacement is like a reference in programming, and eventually you shift your attention to the whole object, creating references from the features to the whole object. Except it’s not the whole object, just a wheel on a car, and they aren’t mere features, but whole objects like bolts (you might start with even simpler objects but it doesn’t matter). Then you repeat it, looking between the various objects which compose the car. Once you know what a wheel is, you can reuse it when you build up other objects through the same process of looking from part to part.
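To spell out the programming analogy, here is a rough sketch in which each displacement maps to a reference to another object model. This is only my reading of the idea. The class, the coordinates and the part counts are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectModel:
    name: str
    # displacement in this object's reference frame -> reference to a child model
    parts: dict = field(default_factory=dict)

    def add_part(self, displacement, child):
        self.parts[displacement] = child

def count(model, name):
    """Count how many times a model called `name` appears in the composition."""
    return int(model.name == name) + sum(count(child, name)
                                         for child in model.parts.values())

bolt = ObjectModel("bolt")

wheel = ObjectModel("wheel")
wheel.add_part((1, 0), bolt)
wheel.add_part((-1, 0), bolt)

car = ObjectModel("car")
for corner in [(0, 0), (4, 0), (0, 2), (4, 2)]:
    car.add_part(corner, wheel)      # the same wheel model, referenced four times

bicycle = ObjectModel("bicycle")
bicycle.add_part((0, 0), wheel)      # reused without relearning it
bicycle.add_part((3, 0), wheel)

print(count(car, "bolt"), count(bicycle, "bolt"))
```

The wheel is learned once and only referenced afterwards, so the car and the bicycle share one wheel model rather than each storing their own copy.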
That doesn’t necessarily require a strict hierarchy, but the wiring (L5 TT corticothalamocortical pathway) implies it involves one. I guess that makes sense if it needs to keep in mind multiple levels of object composition simultaneously.
Maybe it has something to do with a hierarchy of reference frames. Reference frames come in a big variety and can be far removed from sensory inputs, nothing like a map of the sensor even in the where pathway. So they might be learned based on the available info from the inputs, not computed by custom circuits for each one. I would think it needs to make use of reference frames in other regions in order to get more complex ones.
Neither am I but the neuroscientists do write review papers like the ones I pointed you to. They are much easier to grasp than figuring it out based on the references (that does require being a neuroscientist).
Comparing Mountcastle with Darwin seems a bit of a stretch after reading the history of CC. My impression is that the centrality of Mountcastle’s notion of CC in TBT is more contrarian than I expected after reading TBT.
I like the idea of a canonical microcircuit. I would like to see compelling evidence for it. The CC concept seems too restrictive and somewhat outdated by empirical results. It is a shame Jeff did not adopt the “canonical microcircuit” terminology.
It also seems clear there are algorithms operating at many scales. Some much larger. So it seems way too early to be claiming to have defined the framework for intelligence. Intelligence is going to have to include a lot more than object recognition and it is unclear if higher cognitive function will map to the same algorithms as object recognition.
I will guess that each scope of recurrent loop is associated with at least one algorithm. The way we think of “algorithm” as a Turing computable function is probably the wrong paradigm too. The recurrent activity never seems to stop so it may not be a linear algorithm that is looping but a dynamical system, more in line with other complex systems like the weather. We simulate the weather with a grid like structure but only because that is convenient, the placement of the grid is arbitrary.
Regarding CNNs dealing with distorted text, they are probably better than you think (e.g. https://www.mdpi.com/2079-9292/9/9/1522/pdf). My impression is that people are often criticizing the first generation of CNNs circa 2012, while the latest results are far more sophisticated and incorporate all sorts of features like attention (the diversity of algorithms has exploded).
Which problem did you solve with HTM - maybe you can point me to a thread. I would like to hear some good news
Hierarchy is perhaps our simplistic way of thinking about complex recurrent systems. If humans were smarter they would probably not think in terms of feed-forward and feed-back but loops.
Reference frames seem important but that is not the contribution of TBT. The concept of the brain using reference frames is widely known e.g. grid cell modules. TBT is specific about reference frames being in CC and consisting of cortical grid cell modules - the hypothesis seems to rest on that.
Yes, CNN is very good with distorted text – I wasn’t trying to dispute that. I was just giving an example of the type of evidence someone might look for in the brain where objects are being modeled without a traditional hierarchy right at the first level. CNN doesn’t do that (it requires hierarchical levels to recognize an object like distorted characters). From the above paper, for example:
Mostly in the realm of anomaly detection of streaming data off networks from small embedded systems. We’ve also done work in the NLP realm. I don’t post any of that here though, just the stuff I do on my personal time
Until I realized that CNNs are used in modelling the biology, I was imagining the algorithm was optimized only for performance. There are hierarchical models that get good results on certain kinds of tests based on modelling the biological architecture, which is hard to believe coming from the HTM perspective.
Not from my perspective. It just re-iterates that hierarchy is useful for building abstractions, which I don’t think anyone with an HTM background should be disputing (especially given the H in HTM). Some may be underestimating what I think reference frames are going to bring to the table though – a deep hierarchy should not be necessary to do simple tasks like recognizing the same object in different poses, distortions, etc.