Saccades key to vision?

I’ve been increasingly inclined to believe that how the eyes move over an image has a lot to do with abstracting and recognising what is in an image. Convolutional ANNs have made progress with visual invariance by looking for overlap in small chunks of an image, rather than the whole image. I think another key to visual invariance is the information sent to the cortex about eye movement.

Saccades tend to follow contrast (edges?) and the movement of the eyes when following the edges can give a lot of invariant information about an object because eye movements are relative to the image. The image below shows lines representing edges and red dots representing saccadic points. The arrows represent the common eye direction over the edges. If you were to use convolution to find the overlap between A and B there would be a fairly low score at all levels of features. But to us they look similar because they both share the characteristics of corners. This could be due to the very similar eye movements that occur on both. The motor commands to the eyes go up, up, up, … then right, right, right, … Very similar, and when representing that as an SDR over time there would be a massive overlap.

The same kind of invariance can be used on C, even though the direction of eye movement is different. The similarity comes in the sequence of saccades. Instead of up, up, up, … right, right, right, … the sequence is down, down, down, … left, left, left, … the abstract representation being a, a, a, … b, b, b, … The key point is the absolute delta of the eye movements in the [x,y] directions between each saccade: [0,1], [0,1], [0,1] … [1,0], [1,0], [1,0].

image
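A minimal sketch of the delta idea above, assuming made-up fixation points for the two corners (nothing here comes from actual eye-tracking data):

```python
# Toy sketch: compare shapes by their saccade-movement deltas rather than by pixels.
# The fixation points below are invented to mirror the corner example above (x right, y up).

def deltas(fixations):
    """Per-step (dx, dy) eye movements between successive fixation points."""
    return [(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(fixations, fixations[1:])]

def canonical(ds):
    """Absolute deltas, so up,up,right and down,down,left map to the same codes."""
    return [(abs(dx), abs(dy)) for dx, dy in ds]

def overlap(a, b):
    """Fraction of positions where two delta sequences agree (stand-in for SDR overlap)."""
    return sum(1 for x, y in zip(a, b) if x == y) / max(len(a), len(b))

# Corner A: eyes move up three times, then right three times.
corner_a = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (2, 3), (3, 3)]
# Corner C: the same corner traced the other way (down, down, down, left, left, left).
corner_c = [(3, 3), (3, 2), (3, 1), (3, 0), (2, 0), (1, 0), (0, 0)]

print(canonical(deltas(corner_a)))  # [(0, 1), (0, 1), (0, 1), (1, 0), (1, 0), (1, 0)]
print(overlap(canonical(deltas(corner_a)), canonical(deltas(corner_c))))  # 1.0
```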

Given another shape like an oblong: although there is very little overlap between its two orientations, we see they are very similar. When your eyes move over the edges the movements are very similar, especially when you compare it to a rectangle.

Using saccading information could solve the problem of position, scale and rotation invariance. I’d like to read other people’s thoughts on this, and I’m curious if there’s any relevant research already out there on how saccades feed into the cortex.

2 Likes

I hate to break it to you, but saccades don’t work that way.
Check this out to get some idea of how real vision takes in the visual world.

Here is what you “see” when you look at this picture:

Here is what the V1 area is fed with saccades driving foveal vision:
[image: Biker left eye area - eye straight on]

7 Likes

Interesting idea, however I’m not sure such similarity is detected in the early steps of vision.

The lack of distinction could be for lack of a word… Draw two squares, one horizontal/vertical and a second one tilted 45°, then ask a kid to name the shapes. Chances are the second would be labelled “diamond” or something. I even know some grown-ups who would answer the same.

Also, more on that subject of tilting a few degrees and association: I think I remember that there is something like a linear time delay (relative to the rotation angle) for processing such “similarities”. Don’t quote me on this, though.

Interestingly, this could be a clue that we do need SMI (sensorimotor inference) over saccades to get a clear picture, because simply watching the animation while trying to keep my eye fixed won’t allow me to build back the representation.

What you see in the animation is only half of the information.
You have to combine that with the frontal eye field position coding.

2 Likes

That is also my point ^^
This is indeed a testament to the necessity of a location signal accompanying raw sensor data, or equivalently, to the incompleteness of raw sensor data on its own.

Although I must say, against my own statement as it stands:
Adding foveal distortion to that animation is a nice touch, for sure, but for the purposes of this statement it does not quite provide direct support for it. To come even vaguely close to an acceptable psychophysical experiment, one allowing us to reach this conclusion, I guess an in-protocol animation should be:

  • non-fovea distorted
  • spanning full-perceptive-field
  • center point decorated to anchor fixation (or forced with an A Clockwork Orange kind of device? :stuck_out_tongue: Oh, maybe in 2018 they would have nonintrusive dynamic display-on-contact-lens for this stuff. Never mind)
  • free of artifacts from image clamping boundary
1 Like

Here is the image dataset

[image set: Biker nose area (right nostril, tip, left nostril, bridge); right eye area (right margin, pupil, left margin, eye straight on); mouth area (upper left, upper right, right margin, lower lip, left margin); left eye area (right margin, right eyebrow, pupil straight on, left margin, left eyebrow, eye straight on); chin area (straight on, right side, left side)]

The original un-distorted picture is in the post above.

1 Like

A few layers into a CNN (deep neural network), outlines are detected. Following an outline is an “attention-based CNN”.

So an outline of an image can be produced with an edge-detector algorithm, and the outlines of patches of shade can be obtained with a segmentation algorithm.

Efficient Hierarchical Graph-Based Video Segmentation:

Then, if every outline could be turned into a chain code, all the data could be crunched a lot faster and might be able to compete with ANNs.

Chain code:
http://answers.opencv.org/question/88501/how-to-classify-chain-codes/

Have you thought about converting outlines to chain codes?

I did not get your point.

Outlines of edges can be found with an edge-detector algorithm, like the Sobel algorithm, and the outlines of patches of color with a special HOG algorithm or some other segmentation algorithm.

Efficient Hierarchical Graph-Based Video Segmentation:

Then convert the outline into a chain code, which is a string of values giving the angle to the next pixel (assuming the outline is only one pixel thick).
Chain code:
http://answers.opencv.org/question/88501/how-to-classify-chain-codes/

Here they only use eight directions; multiply each code by 45 to get the angle out of 360 degrees.

And then match them with DTW (dynamic time warping):

Pattern recognition:
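A rough sketch of the chain-code-plus-DTW idea, assuming toy one-pixel-thick outlines and a plain dynamic-programming DTW (not taken from the links above):

```python
# 8-direction chain code: each step to the next outline pixel is coded 0..7,
# i.e. multiples of 45 degrees (0 = east, 2 = north, 4 = west, 6 = south; x right, y up).
DIRS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
        (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(outline_pixels):
    """Ordered, one-pixel-thick outline -> list of direction codes."""
    return [DIRS[(x2 - x1, y2 - y1)]
            for (x1, y1), (x2, y2) in zip(outline_pixels, outline_pixels[1:])]

def angle_dist(a, b):
    """Distance between two codes on the 8-direction circle (0..4 steps of 45 degrees)."""
    d = abs(a - b) % 8
    return min(d, 8 - d)

def dtw(seq_a, seq_b):
    """Plain dynamic-time-warping distance between two chain codes."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = angle_dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = c + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Toy outlines: an L-shaped corner and the same corner shifted (identical chain codes).
corner  = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
shifted = [(5, 5), (5, 6), (5, 7), (6, 7), (7, 7)]
print(dtw(chain_code(corner), chain_code(shifted)))  # 0.0 - same shape, different position
```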

2 Likes

Ok then - can you do that with a biologically plausible mechanism?
You do realize that I am asking for examples of the chain code and not the well-known saccades?
I have been reading about V1 biology for years and don’t recall seeing anything about graphs and lists in any of that. Do you have a reference that explains how this applies to the cortex or retina?

The saccades are really fast and face recognition is achieved in as little as 100 ms. Even at the higher brain wave processing speeds this only leaves a few cycles of activity for each fixation.
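A back-of-envelope version of that timing budget, assuming gamma-band activity of roughly 40–80 Hz (the frequency range is my assumption, not from the post):

$$
T_\gamma = \frac{1}{f_\gamma} \approx 12.5\text{–}25\ \text{ms} \quad\text{for}\quad f_\gamma \approx 40\text{–}80\ \text{Hz}
\;\;\Rightarrow\;\;
\frac{100\ \text{ms}}{T_\gamma} \approx 4\text{–}8\ \text{cycles per fixation.}
$$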

I have been trying to work out how the grid forming layer could do something like the chain method referenced; so far I have a wastebasket full of crumpled up scratch paper and nothing to show for it.

It would be awesome to see a segmenting algorithm that does work in the cortex.

1 Like
Yes.

That is the point - what V1 gets IS warped as I am showing. The images ARE overlapped onto a single processing map in rapid succession.

The scene parsing task is taken in small chunks in rapid order.

The eye does refresh the scene (the memory of what is “out there”) by looking again. Multiple studies show that we are remarkably unable to tell if things change when they are outside the current foveal fixation area.

When I read an AI vision task that does not take these things into account I am sure that it is a non-biologically plausible approach. It may work but it is NOT how the brain is doing it.

1 Like

Maybe???
But I will explain with artificial neurons, in a deep NN, which is from my project on unsupervised generative neural networks, Un-GANs.

A GAN is two NNs in one. There is the detector NN (the front part), and then there is the generator NN that recreates whatever input data activated the detector (the back part). The recreated data can be an exact copy, or a highly compressed lossy recreation for memory storage, or be channeled to a motor.
When the detector NN detects something, one binary bit goes high and is stored in the SDR. The detector NN is also like the left hemisphere of the brain that deals with logic; the generator is like the right side of the brain that deals with art.

The Brain Made Simple:
http://brainmadesimple.com/left-and-right-hemispheres.html#

So the Un-GAN has many layers, very deep. The first layer detects the color of one pixel at a single location, and each layer has its own generator NN for recreating and storing the information.
So the generator NN will recreate that one pixel, easy-peasy and fast. But deeper layers of the Un-GAN will detect more complicated orientations with many pixels of many different colors.

Now, at a certain depth in the Un-GAN, a neuron will be set up to activate when the colors from two different locations are not the same. There could be an edge there, somewhere in between, and a chain of activations in a line will say yes.
Or the eye can move and trace the edge with just one small setup; there is no need to replicate the detector in neighboring neurons. The eye position will be needed too, to rebuild a map of the world with the generator NN setup :)

So when I see a chain code, I see a chain of differential neurons linked together.
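A toy reading of the “chain of differential neurons” idea, entirely my guess at what is meant, using a made-up grayscale patch:

```python
# Toy reading: each unit compares the brightness at two neighbouring locations and
# fires when they differ enough; a run of firing units along a line is read as
# "there is an edge here". The 3x4 grayscale patch below is invented.
patch = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]

def differential_unit(a, b, threshold=50):
    """Fires (True) when the two sampled intensities differ by more than the threshold."""
    return abs(a - b) > threshold

def edge_along_column(patch, col):
    """Chain of differential units down one column boundary; all must fire."""
    return all(differential_unit(row[col], row[col + 1]) for row in patch)

print(edge_along_column(patch, 1))  # True  - vertical edge between columns 1 and 2
print(edge_along_column(patch, 0))  # False - no edge between columns 0 and 1
```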

1 Like

So at best, only a very distant relationship to anything biological.

From what I have seen from fMRI studies, most of this is in a small number of maps, and most of the processing is local to a few connected maps - perhaps six in the visual cortex, and about half of that is frontal eye stuff directing gaze.
It seems that the amygdala recognizes faces at a very primitive level all by itself. There seem to be a dozen or so built-in shapes recognized there.

Orientation/edge detector cells in visual cortex:

1 Like

Sure - this stuff has been known since the 1950’s; I read about it in high school in the early 1970’s.
How do you get from there to the chain code outline algorithm with a biologically plausible implementation?

At the time of writing I was aware that saccades jump ‘laterally’ from high-level feature to high-level feature (eye, nose, mouth, etc.). However, I have found that when I look at something I do not recognize, I catch myself breaking the image down into smaller and smaller parts, leading the saccades to scan for smaller and smaller features. This led to the idea that perhaps when we were infants we learned the smallest features (edges) first, and that is where our saccades were focused. Then we began learning features that combine edges, and so on, and that is where our saccades were focused. Each level of features progresses over time until we recognize whole objects in a single glance (i.e. eye, nose, mouth, etc.). Of course it would be completely wasteful for us (in maturity) to have to saccade over edges, but at one point in time we had to, in order to start building up our feature hierarchy.

Of course the feature hierarchy is known for structuring absolute/concrete data (edges, shapes, objects, etc.), but I feel there must be other structures that build up over time that are abstract/relative. I feel as if the sensory-motor map could be key for an abstract feature hierarchy that works with the concrete hierarchy.

Anyway, this is all just speculation - I really don’t know enough about neuroscience to back any of these ideas up.

1 Like

Purely speculative on my part as well, but my thought is that this could be an effect of object scale (versus different classes of saccade behavior). When you are learning objects of smaller scale, the saccades are going to move the eyes over much smaller distances. This could be tested by looking at saccades over images of faces at a range of different scales – presumably the scale will correlate with the distances the eyes are moving.
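A rough sketch of that test, assuming hypothetical eye-tracking fixations for the same face shown at three scales (all numbers invented):

```python
# Rough sketch of the proposed test: does mean saccade amplitude track image scale?
# The fixation data below are made up; real data would come from an eye tracker.

def saccade_amplitudes(fixations):
    """Euclidean distances between successive fixation points (in pixels)."""
    return [((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
            for (x1, y1), (x2, y2) in zip(fixations, fixations[1:])]

def mean(xs):
    return sum(xs) / len(xs)

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical scans of the same face shown at three scales: (scale factor, fixations).
scans = [
    (1.0, [(100, 100), (140, 105), (120, 160)]),
    (2.0, [(200, 200), (280, 210), (240, 320)]),
    (4.0, [(400, 400), (560, 420), (480, 640)]),
]

scales = [s for s, _ in scans]
amplitudes = [mean(saccade_amplitudes(f)) for _, f in scans]
print(pearson(scales, amplitudes))  # close to 1.0 if amplitude tracks scale
```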

This then ties into the idea of objects within objects. Once an object is learned, it can become a feature of some other object. When a person is looking at something they don’t recognize (which would be most of the time for a very young person), they presumably would have fewer learned objects that can be used as features of larger-scale objects. When attention is on a larger-scale object whose features are other objects, the eyes would saccade from feature-object to feature-object. Then if attention were to shift to one of the feature-objects, the eyes would saccade over its features (smaller-scale movements).

This is probably exactly what you are saying. Just making sure I interpreted it correctly :slight_smile:

2 Likes