# Numenta Research Meeting - July 27

Numenta’s latest research meeting for July 27, 2020.

@jhawkins describes his latest thinking about how the visual system detects different types of motion across the observer’s field of view and how the cortical mini-columns might be performing local path integration in order to separate self-motion from external motion. Excellent discussion with the Numenta crew ensues.


Excellent discussion as always. As I was listening in, I pulled out a notepad and began frantically scribbling. I felt like I’d seen this problem before (or at least one with striking similarity).

The mathematics to describe this situation has been around since Copernicus first gave us the heliocentric model of the solar system (and perhaps even earlier). It’s a relatively straightforward application of consistent coordinate transformations.

Of course the motion of planets in the solar system is a vastly simpler problem to work out than the motions of ourselves and the individual objects in our surroundings. But the essential nature of inferring the kinematic parameters of the coordinate transformations remains the same. Copernicus had centuries’ worth of geocentric observations of the planets from which to construct his model; our brains, by contrast, are constantly making the same inferences, in real time, from egocentric sensory inputs.

The problem we face (and solve spectacularly each and every day, on a moment-to-moment basis) is to rapidly estimate the relative position, orientation, and motion of ourselves and of numerous independent objects in our immediate vicinity. So, while I can probably write out the mathematics required to do this parameter estimation by solving a massive set of linear equations, I have less intuition about how the brain solves the same problem with networks of simple switching elements.
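To make the linear-algebra version concrete, here is a minimal sketch (with made-up landmark data; the variable names and the 2D setting are my own assumptions) of recovering an observer's rotation and translation from egocentric observations by least squares, using the standard SVD-based (Kabsch) rigid-alignment solution:

```python
import numpy as np

# Hypothetical setup: known allocentric landmark positions, and egocentric
# observations generated by an unknown self-rotation and self-translation.
rng = np.random.default_rng(0)
world = rng.normal(size=(5, 2))             # landmark positions (allocentric)

theta = 0.3                                 # true self-rotation (radians)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = np.array([1.0, -2.0])              # true self-translation
egocentric = world @ R_true.T + t_true      # what the senses report

# Least-squares estimate: center both point sets, then take the SVD of the
# cross-covariance matrix (the Kabsch algorithm).
wc = world - world.mean(axis=0)
ec = egocentric - egocentric.mean(axis=0)
U, _, Vt = np.linalg.svd(wc.T @ ec)
d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
R_est = Vt.T @ np.diag([1.0, d]) @ U.T
t_est = egocentric.mean(axis=0) - R_est @ world.mean(axis=0)

print(np.allclose(R_est, R_true))  # True
print(np.allclose(t_est, t_true))  # True
```

With noiseless observations the transform is recovered exactly; with noisy sensory data the same computation gives the least-squares best fit, which is the "massive set of linear equations" flavour of the problem in miniature.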

Assuming there is a way to frame the problem so that SDRs represent the sensor data (the egocentric state), and some way of processing those representations to arrive at another SDR representing the external, inferred (allocentric) state that is responsible for generating the observed data, it should be possible to produce results similar to the linear-algebra solution.

Still pondering…


This was a very interesting discussion.

My main thought is: what if you abandon all Cartesian prejudice? The question that immediately arises is how dimensions arise in the first place. I think we want to avoid any question of preprogramming as long as possible, and see how far we can get on the basis of sequence flow alone. What is interesting is that we all seem to come up with more or less the same dimensions with which to analyse our sensorium. But then, most of them seem to be based strongly on the characteristics of the signal: frequency and amplitude for hearing; the five basic tastes; on or off, hot or cold, and bodily location for touch. Though touch appears to gather some sequence-based refinement in the form of texture. Vision seems less direct, with biases coming from related signals such as eye position, other motor inputs, and simultaneous touch sensations.

There is also some evidence of variation in development. Is being tone deaf a dimensional development issue? Or one thinks of the undoubted musicality of Evelyn Glennie through touch alone. Does her ability go so far down into the wiring as a dimension?

Another influence may be the fact that most sensory input is not passive. One thinks of a baby learning to focus. Or, a little later on, switching between static and varying areas of the visual field by learning to track a moving object. What drives this development? One is tempted to argue for a predisposition to maximise information input. This might also explain some reactions in the other senses: revulsion for certain smells might be cultural, but I don’t think we ever teach anybody to hate the sound of nails scraping down a blackboard. Maybe we just don’t like things that swamp all other input from a sense. On the other hand, preprogramming the ability to track the movement of an inbound threat seems like an evolutionarily smart thing to do.

But on the whole it seems imaginable that dimensions arise purely out of the sensory flow. So one thinks about experimenting with a variety of inputs to see what dimensions arise. What mechanism might be used? What if a lot of dimensional mud is thrown at the wall in the form of multiple different samplings from the flow of SDRs? Dimensions that experience repeated sequences are reinforced, those that do not wither away. Dimensions that experience sequences that are proximate in that they move in synchrony might compete, with the strongest suppressing their neighbours.

Is anything like this remotely plausible?

A secondary thought arising from this is to wonder how fundamental the question of scale is. It only seems to arise for vision; it makes no obvious sense for the other senses. Then consider talking to a person blind from birth. What is remarkable is how few deficits, if any, they have as a person. But how could scale arise in their world? Things are in reach, or they are not. They are smaller or larger based on direct motor feedback. Velocity is only available through secondary cues: am I walking or running; can I feel wind on my face? In a car, the evidence is weak. One thinks of the philosophers examining an elephant. The only real indication is that other places take more or less time to get to. Thinking about movement around a familiar room, path integration is used, as well as re-anchoring when arriving at a main touch point, but I’m struggling to see why scale arises as something deep in the wiring. Recognition considerations are much the same for a large dog versus a small dog, or for a dog that is either close or far away. Scale only becomes interesting when I’m planning to reach the dog (or worrying about whether the dog can reach me), which seems like a higher-order activity.

I’ve been thinking a bit about sparse monologues. Instead of a few active bits among many in sparse representations, how about a recurrent neural network that takes in high-dimensional data such as video or sound and produces a low-bit-rate stream of symbols (or short vectors) as a running monologue of what it sees?
Further processing could be done by much smaller neural networks, decision trees (e.g. if-except-if decision trees), etc. It would also be much easier to set up memory systems for short sequences of symbols than for large, vector-to-vector associative memories.
If you evolved a recurrent neural network to produce such a monologue it would be very interesting to observe the language it created.
It is certainly a more realistic option for many people than large end-to-end vector-to-vector neural networks, which require a major budget.

Not quite! The idea is to generate, in the network’s own language, a description of the scene, as a human can do quite effectively if presented with an image to describe; a human with a ruler can do even better.
Perhaps the monologue would be 30 bytes per second.
This would allow storage of an entire lifetime’s experience for review.
It would also allow very large online tree algorithms to work continuously.
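As a quick sanity check on that bit rate, the lifetime-storage claim holds up easily; the 80-year lifespan below is my own assumption, not from the post:

```python
# Back-of-the-envelope: total storage for a 30 byte/s symbol stream
# accumulated over an entire (assumed 80-year) human lifetime.
rate = 30                                   # bytes per second (from the post)
seconds_per_year = 365.25 * 24 * 3600
lifetime_years = 80                         # assumed lifespan

total_bytes = rate * seconds_per_year * lifetime_years
print(total_bytes / 1e9)                    # ≈ 75.7 (GB)
```

Roughly 76 GB for a whole life, so reviewing or re-mining the entire stream is well within the reach of ordinary hardware.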

Symbolic systems have some well-known problems; however, the initial neural network can simply choose not to pass along problematic sequences.

I did mention if-except-if decision trees.
Given the context “orang”, the rule “if ‘g’ then ‘r’” might be stored in the tree.
However, it is an ‘except-if’ tree:
if ‘g’ then ‘r’, except if the prior character is ‘n’, then ‘e’. And you get “orange”. You can go back as far as you need with exceptions.
You can keep adding exceptions as new data arrives; you never have to delete anything.
They actually work really well, for reasons to do with implicit probabilities.
They obviously can’t work out complex grammar decisions, but they are quite effective for lifelong learning.
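For what it’s worth, here is a minimal sketch of my reading of such a tree: rules are keyed by context suffixes, the longest matching suffix wins, and a misprediction adds a longer-context exception rather than deleting the old rule. The class name and the training strings are illustrative only.

```python
# Sketch of an "if-except-if" next-character predictor (my own reading):
# the longest matching context suffix wins, and a wrong prediction is
# fixed by adding a longer exception, never by deleting an old rule.

class ExceptTree:
    def __init__(self):
        self.rules = {}                     # context suffix -> predicted next char

    def predict(self, context):
        # Longest matching suffix takes precedence (the "except if" part).
        for k in range(len(context), 0, -1):
            if context[-k:] in self.rules:
                return self.rules[context[-k:]]
        return None

    def learn(self, context, actual):
        # Add progressively longer suffixes until the prediction is right;
        # existing shorter rules are kept, only excepted.
        for k in range(1, len(context) + 1):
            suffix = context[-k:]
            if suffix not in self.rules:
                self.rules[suffix] = actual
            if self.rules[suffix] == actual:
                return

    def train(self, text):
        for i in range(1, len(text)):
            self.learn(text[:i], text[i])

t = ExceptTree()
t.train("gr")              # general rule: after 'g' comes 'r'
t.train("orange")          # adds the exception: after "ng" comes 'e'
print(t.predict("g"))      # r
print(t.predict("orang"))  # e
```

Training on “orange” never removes the “if ‘g’ then ‘r’” rule; it just stores the longer-context rule “if ‘ng’ then ‘e’”, which is the never-delete, lifelong-learning property described above.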