Great talk Jeff!
There are some nice new slides in this deck like this one. Click the slides below to jump to that point in the talk.
Are these slides posted somewhere? Thanks for posting this very comprehensive update to HTM theory.
Hey @jhawkins do you mind posting your slides?
The slides are posted here: https://www.slideshare.net/numenta/mit-cbmm-numenta-15-12-2017
This is very interesting! It is great to see things falling into place so cleanly, with each of the (sub)layers having functions that are clearly defined and intuitive.
Many thanks
Very impressive, especially since the last Numenta talk about sensorimotor behaviour.
But what would be the rational answer to the point that
@jcua: many thanks for the very interesting slides from Jeff…
Hey Matt
It was so interesting to see Jeff Hawkins present his update on HTM theory at MIT on 15 Dec. 2017. Asking the question "have we missed half of what the neocortex does" is a total change in the HTM understanding of the brain's decision system. The change is combining processes for location (allocentric) and orientation (egocentric) - these two in combination are exactly what you and I were writing about when I presented my model, The Human Decision System, in September - here "location" is called "engaging a target (which has a location)" and "orientation" is called "achieving a goal (which you are oriented towards)"… these names are appropriate if you integrate the movement into the vocabulary… then location is about accuracy (validity) and orientation is about precision.

Based on this correspondence between THDS and HTM, this opens the doors for much more integration on the theoretical level (as Jeff said, there are greenfields in this…). One interesting thing is, as Jeff said, that the layers handle data on "different scales"… yes, in THDS these scales are based on data from different time periods… and this Jeff implements as sequential data (the present, in layer 4) and pooling (the future, in layer 3).

Finally, Jeff talks about the where and the what functions… in THDS there are, if you remember, the where, what, which, why, when, and how functions, in exactly that sequence, which minimizes entropy/uncertainty in the decision process. This comes forward indirectly when Jeff writes in the last slide "resolve uncertainty by associative linking…" - this is the end goal, "resolve uncertainty", which is minimizing entropy.
Why don't we go through this in detail, to explore the greenfields that are left? Because, without doubt, my guess is that the insight from THDS must have given a good spark to this new orientation and incredible improvement in understanding the neocortex - achieving a goal/orientation and engaging a target/location is something of a core concept?
Regards
Finn
As I see it, it would be appropriate to link this idea with "deep learning".
I have watched this talk a couple of times. It is very interesting and hopefully constitutes some important advances in understanding how the layers of the cortex work together. I realize that the whole theory is a work in progress, but a couple of issues kept coming to mind:
What about the evidence that V1 is tuned to recognize low-level features such as edges and lines? Is there any evidence that V1 can recognize something as complex as an entire object?
If evolution pushes in a direction that maximizes brain power while minimizing energy usage and the amount of cortical tissue needed (given space limitations in the cranium), doesn't it seem wasteful for the cortex to be learning the same object representations thousands of times in parallel? The whole point of the hierarchy view, apart from the evidence for it, is that hierarchy allows representations to be re-used in different contexts to save memory and to allow generalization from the specific to the more abstract.
How can a single cortical column model all of the objects in the world? That does not seem possible. Is each particular cortical column just modeling certain subsets of everything that is out there in the world? Relatedly, is the idea here that V1 learns to represent all simple, discrete physical objects in the world, and then as you go from region to region the brain learns more abstract concepts that arise from those objects? For example, is it possible that V1 learns simple, discrete objects like plates, cups, and forks, with later regions learning more complex combinations of those simple things such as "banquet", "dinner table", "family sitting down to eat at dinner", etc.?
Most of this is interesting and I accept it without any major concerns.
The "rethinking hierarchy" part seems off. The reading I have done suggests that parsing and transformation are fundamental properties of the observed tracts and maps of the brain. The localization of function is well documented and indicates that some sort of transformation in representation is being performed.
We can’t have it both ways - “the tissue is the same everywhere” and “This is a step in the parsing chain.”
What exactly does it mean to have all of the senses (touch, vision, sound, smell) learning a fairly high-level "cup" at the same time? How about a bigger cup? A different material? Does each area learn all types, shapes, sizes, and temperatures?
The traditional bidirectional parsing of properties, with decomposition into attributes and synthesis between those properties, makes sense to me and matches the literature I have read. It passes one of my strongest tests of a theory: it has predictive power.
Distributing individual objects all over the cortex requires reinterpreting a very large body of material in order to be successful.
These are great questions.
There is evidence for this. von der Heydt has shown that cells in V1 and V2 exhibit what they call “border ownership” properties. Cells are first characterized by their classic receptive fields by showing the animal visual gratings, but then the cells don’t respond as expected when the awake animal is viewing complex objects. The cells respond only when their receptive field aligns with a particular location (a unique location on a border) of a complex object. The cell responds uniquely to the object, as if it “knows” what the larger object is and where it is sensing on the object. This is predicted and expected if a column is recognizing complete objects as we suggest. There are other examples like this.
Many of the properties we predict will only be detectable with an awake animal, sensing a familiar object, and often only by recording from a sparsely activated population of cells. This kind of recording has been impossible until recently. If you don’t record under these conditions you will see the “classic” response properties.
In my talk I was careful to point out that we are not rejecting the traditional view of hierarchy. The hierarchy is still needed to learn representations of objects that contain other objects. Reuse of learned components is still happening. In fact, the upper layers of a column represent "features" independent of the "object" represented in the lower layers. So even in a single column, "features" can be part of multiple objects.
What is new is that every region is learning complete models of objects in the world, and this provides an explanation of many of the connections in the cortex that don’t make sense if you assume complete objects are only learned at the top of the hierarchy.
Evolution is about survival. Reducing energy consumption and reducing the size of the brain are just two factors, not the only ones. The ability to discover structure and patterns in the world that are hidden from other animals is important too. We should be careful not to second-guess evolution. Recall that every one of the tens of trillions of cells in your body contains a complete and identical copy of your genetic code (millions of bits). Making 30 trillion copies of the exact same thing isn't the most efficient way to store this information, but that's how it is.
I hope I didn’t say a single column models “all the objects in the world”. I believe I said a column models “complete objects in the world”. Every column learns complete objects that can be learned from its inputs. Some will learn small visual objects, some larger visual objects, some will learn tactile objects, some will learn conceptual objects that are not directly sensed. Any particular object in the world will be modeled by many different columns, often in different sensory modalities. But not all columns will learn all objects.
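As a rough illustration of what "a column models complete objects" could mean computationally, here is a toy sketch. It is a drastic simplification with made-up object names and coordinates, not the mechanism or code from the columns paper: a column stores each object as a set of feature-at-location pairs and recognizes an object by eliminating candidates as the sensor moves.

```python
# Toy sketch only: a drastic simplification, not the model from the
# "columns" paper. A column stores complete objects as feature-at-location
# pairs and recognizes one by elimination as the sensor moves.

# Hypothetical objects: each maps an object-centric location to the feature
# sensed there. Names and coordinates are invented for illustration.
objects = {
    "cup":    {(0, 0): "rim", (3, -2): "handle",      (0, -6): "flat_base"},
    "bowl":   {(0, 0): "rim", (3, -2): "curved_wall", (0, -6): "flat_base"},
    "bottle": {(0, 0): "cap", (3, -2): "smooth_side", (0, -6): "flat_base"},
}

def recognize(sensations):
    """Narrow the candidate set as the sensor moves and senses features."""
    candidates = set(objects)
    for location, feature in sensations:
        candidates = {name for name in candidates
                      if objects[name].get(location) == feature}
        print(f"sensed {feature!r} at {location}: candidates = {sorted(candidates)}")
    return candidates

# Sensing the base alone is ambiguous; sensing the handle settles it.
recognize([((0, -6), "flat_base"), ((3, -2), "handle")])
```

In this toy version, a single ambiguous sensation leaves several candidates, and each further movement-plus-sensation prunes the set, which is the flavor of recognizing an object from its own set of learned features at locations.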
As best I understand it now, simple objects are learned at all levels of the hierarchy, but complex objects cannot be learned at the lowest level. I can attend to a single letter and say what it is. This implies that I have a representation of that letter at the highest levels in the hierarchy. But normally, without attending to the details, when I read, V1 will be recognizing letters and entire words, and the next level will be recognizing phrases. V1 cannot recognize phrases.
When we deduced that all columns are learning complete objects, the first question I asked is, what is the capacity of a column? How many things can it learn? We did simulations that showed a small column can learn hundreds of complete objects. These simulations are in our "columns" paper that came out in October. The limit was the pooling operation in the "output" layer as described in the paper.

We subsequently learned several things which I believe will dramatically increase the capacity of a column. First is the fact that the sensorimotor process is divided into two steps (upper and lower layers). Another is that recognition of the object can occur without pooling. This is a subtle but important point. The location representation in L6b and L5b is unique to the object. So once you know the location you have in a sense identified the object. Pooling is necessary only if you want to assign a stable label to the object. This is why we often "know" something but can't remember the name of it.

Another factor in capacity is how large the column is. If you make a column larger and don't increase the number of active cells, then the capacity goes up dramatically. There is evidence that this is happening in upper levels of the hierarchy. One of our goals in 2018 is to empirically explore the capacity of a column given these new insights.
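A back-of-the-envelope way to see why a bigger column with a fixed number of active cells should help (a rough sketch only, not the capacity analysis from the paper, with illustrative numbers): count the sparse activity patterns available as the number of cells grows, and estimate how likely two random patterns are to be confusably similar.

```python
# Rough sketch (not the capacity analysis from the paper): count sparse
# activity patterns when the number of cells n grows while the number of
# active cells w stays fixed, and estimate the chance two random patterns
# overlap enough to be confused.

from math import comb

def num_codes(n: int, w: int) -> int:
    """Distinct patterns with exactly w active cells out of n."""
    return comb(n, w)

def p_overlap_at_least(n: int, w: int, theta: int) -> float:
    """Probability a random w-of-n pattern shares >= theta active cells
    with a fixed w-of-n pattern (a proxy for confusing two objects)."""
    matches = sum(comb(w, b) * comb(n - w, w - b) for b in range(theta, w + 1))
    return matches / comb(n, w)

w, theta = 40, 20            # fixed active cells, "confusion" threshold (illustrative)
for n in (400, 1600, 6400):  # growing column sizes (illustrative)
    print(f"n={n:5d}  codes={num_codes(n, w):.2e}  "
          f"P(confusion)={p_overlap_at_least(n, w, theta):.2e}")
```

With w held fixed, the number of available codes explodes and the chance of accidental overlap collapses as n grows, which is the intuition behind capacity going up dramatically with column size.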
Jeff
A closely related but alternative suggestion is a co-operative parsing that flows down from higher level maps. With this additional input, a more accurate recognition can be performed. This does not have to be a “where” signal but instead more input to a “what” signal.
The lack of higher-level input in an anesthetized animal leaves the sensory area to fend for itself, so only the classic receptive field response is seen.
Thanks so much for these helpful responses. I wrongly assumed that you were saying that all columns in V1 are trying to learn every object in the environment, which would seem to be impossible, especially given what you noted about the columns in the October paper having a capacity of only hundreds of objects. The potential for columns to have a much greater capacity than already shown sounds promising. Hierarchy is still mystifying, but your recent statements make it seem at least a little more understandable.
I look forward to the day when you have figured out enough to get a simple vision system working that could beat the best of the deep learning systems. Even if you showed excellent results on a simple dataset like MNIST, the AI people would stand up and take notice. It is only a matter of time as you put more and more of these pieces together.