Jeff leads Numenta Research Meeting - Sep 4, 2019

Pretty sure this is happening. I am not sure what the topic is, but it’s neuroscience, not machine learning.


We had a bit of an internet failure and lost 30 minutes of this stream, but back online now. I’ll have to patch this video together and repost it later.


That would be awesome, there is a really critical part missing! 🙂

Fixed! See new video:


I’d like to suggest an answer to the question: Why are there so many motion-sensitive cells? Or, to rephrase: Why are motion-sensitive cells so important in the neocortex?

I think the answer has two parts:

  1. The world is extremely dynamic. It’s made of things that “move”.
    Everything moves. Objects move through the world, and objects themselves move
    or “morph” into different shapes. Even rigid objects break. How objects behave
    (or deform) and move is probably more important to understanding them
    than the shape of those objects. That is, dynamic behavior is the most critical
    thing the neocortex needs to model about the world.

  2. Building a model of the world, or any object, requires sensing that object
    from different perspectives and when it’s in different states. Features of
    an object serve as points of reference that allow two subsequent observations
    to be compared, correlated, and brought together into one model.

Without the ability to track which part of one observation correlates to a different part of a subsequent observation, it’s not possible to build a model of the object. If not for motion-sensitive cells, the neocortex would have to do a lot of temporal and spatial correlation work to determine how a particular set of features in observation A correlates to a different set of features in observation B. Motion-sensitive cells that track moving features (which move either because the object moved or because the sensor moved) are a powerful shortcut to correlating data across different observations.

Tracking features lets the cortex learn faster.

  • Instead of trying to learn a model of the world by processing a very large number of independent static images (like we do in deep learning), the neocortex can easily know which features in the “field of view” likely belong together and are part of one object: If I watch a cat walk across the room, I see a whole bunch of features moving together (not exactly, but mostly), so I can quickly learn that the cat is a separate object from the rest of the room. If I only ever see pictures of cats, this job becomes much more difficult. Moreover, it would be nearly impossible to build an internal model of how a cat walks/moves without ever seeing a single cat moving. After seeing 10M pictures of cats (or of the same cat, but in scrambled order), I might be able to build a good model of the possible shapes of a cat, but not how it walks. On the other hand, I only need to see a novel animal walk once to build an internal model of how that animal walks.
  • By comparing the relative movement of different features, the neocortex can quickly understand how that object deforms (i.e. how the soft parts of the cat’s body change over time), and can quickly determine perspective (e.g. infer depth information across an object in the 2D visual field).
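To make the “features moving together” idea concrete, here is a minimal sketch in Python. It is only an illustration, not a claim about how the cortex does it: the feature names, the coordinates, the `tolerance` value, and the greedy grouping rule are all assumptions made up for this example. It groups tracked features whose displacement between two observations is nearly the same, so the cat’s features separate from the static room.

```python
from math import hypot

def displacement(track):
    """Return (dx, dy) between the first and last observed position of a feature."""
    (x0, y0), (x1, y1) = track[0], track[-1]
    return (x1 - x0, y1 - y0)

def group_by_common_motion(tracks, tolerance=1.0):
    """Greedily group feature ids whose displacements differ by less than `tolerance`.

    `tracks` maps a (hypothetical) feature id to a list of (x, y) positions.
    """
    groups = []  # list of (reference displacement, [feature ids])
    for fid, track in tracks.items():
        d = displacement(track)
        for ref, members in groups:
            if hypot(d[0] - ref[0], d[1] - ref[1]) < tolerance:
                members.append(fid)
                break
        else:
            # No existing group moves like this feature, so start a new one.
            groups.append((d, [fid]))
    return [members for _, members in groups]

# Made-up observations: three cat features move right together, the room stays put.
tracks = {
    "cat_ear": [(10, 5), (14, 5.2)],
    "cat_tail": [(2, 4), (6.1, 3.9)],
    "cat_paw": [(8, 0), (11.8, 0.3)],
    "sofa_corner": [(20, 3), (20, 3)],
    "lamp": [(30, 9), (30, 9)],
}
groups = group_by_common_motion(tracks)
print(groups)
```

The point of the sketch is that two observations linked by tracking are enough to split the scene into “cat” and “room”, whereas a pile of unrelated static images gives no such grouping signal for free.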

One little weirdness of deep learning is that every picture is assessed as a whole, with its attendant need to parse foreground and background and all of the myriad possible presentations of the same object, not even considering classes of objects.

One of the things we know about human vision is that it really only sees tiny fragments of an object at any one time, building these up into clusters of features. The visual scan plays an optical version of the old game of 20 questions to collect this sequence of features into something that stands for an object. This also offers some degree of scale invariance. While stereo vision and saccade length give some size and distance information, the relative spatial relation of the features gives object recognition.
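As a rough illustration of why relative spatial relations give some scale invariance, here is a small Python sketch. Everything in it is an assumption for the sake of the example: the feature names, the coordinates, the tolerance, and the choice to compare pairwise distances normalized by the largest one. It is a simplification and not a model of saccades or of cortex.

```python
from itertools import combinations
from math import dist, isclose

def relative_layout(features):
    """Pairwise distances between named features, divided by the largest distance."""
    pairs = {
        (a, b): dist(features[a], features[b])
        for a, b in combinations(sorted(features), 2)
    }
    largest = max(pairs.values())
    return {pair: d / largest for pair, d in pairs.items()}

def same_layout(f1, f2, tol=0.05):
    """True if both feature sets share names and normalized pairwise distances."""
    l1, l2 = relative_layout(f1), relative_layout(f2)
    return l1.keys() == l2.keys() and all(
        isclose(l1[k], l2[k], abs_tol=tol) for k in l1
    )

# Made-up feature positions: the same face seen close up and far away (half size, shifted).
near = {"left_eye": (0, 0), "right_eye": (4, 0), "mouth": (2, -3)}
far = {"left_eye": (10, 10), "right_eye": (12, 10), "mouth": (11, 8.5)}
# A different arrangement: the mouth sits much lower relative to the eyes.
other = {"left_eye": (0, 0), "right_eye": (4, 0), "mouth": (2, -8)}

print(same_layout(near, far))    # same relative arrangement at a different scale
print(same_layout(near, other))  # different relative arrangement
```

Because only ratios of distances are compared, the absolute size and position of the object drop out, which is the sense of scale invariance meant above.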

An interesting property of this mechanism is that we have a hard time recognizing things with a novel presentation or even novel lighting.

This “sequential basket of features” concept is a key characteristic of cortex based vision. It also gives a good framework for understanding all of the senses and the need for motion and sequence in recognition.

You may wish to take what I am putting down here and go re-read the previous post and see if it puts a different light on it.


This is an example of the basket of features leading to recognition. In this case, we have to scan the entire face before we can determine the emotional content. The scanning patterns are a lower-level program that follows a stereotyped pattern; when a set of features is collected, recognition takes place.

Happy, angry or neutral expressions? Eyes react just as fast

Original paper (Sorry - paywall)

No paywall:

Author’s page:

Looks like a fun lab.