Saccades key to vision?

Now that I have explained how chain code relates to neural edge detection, I will move on.

My chain code is the angle to the next pixel in an outline. No matter how bumpy or lumpy the outline is, the values of a closed outline always sum to 360. This makes it easy to find the centroid position of any outline.
If polar regression is done on a chain-coded outline, the outline will form a perfect circle with its own radius and pixel count to identify it. Alternatively, a spline-smoothing algorithm can be applied iteratively to remove noise from the outline, or the number of iterations needed to make it a perfect circle can be used to describe it.
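
A minimal sketch of the centroid idea, assuming the outline is an ordered list of (x, y) pixel coordinates and that the centroid is simply the mean of those positions (the function names are hypothetical, not from the post):

```python
import math

def chain_code(outline):
    """Absolute chain code: the angle (degrees) from each outline pixel to the next."""
    angles = []
    for (x0, y0), (x1, y1) in zip(outline, outline[1:] + outline[:1]):
        angles.append(math.degrees(math.atan2(y1 - y0, x1 - x0)))
    return angles

def centroid(outline):
    """Centroid of the outline, taken as the mean of its pixel positions."""
    n = len(outline)
    return (sum(x for x, _ in outline) / n, sum(y for _, y in outline) / n)

# A small square outline traced pixel by pixel:
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
print(chain_code(square))   # runs of 0, 90, 180, -90 degrees around the square
print(centroid(square))     # (1.0, 1.0), the middle of the square
```

The 360-degree total mentioned above refers to the relative (turning-angle) form of the chain code, which is sketched further down the thread.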

Outlines of any object can be described with chain code, and a narrow focus can follow an outline around a silhouette. So a string of chain code could be placed into a database or temporal array, and a sliding-window algorithm could follow the outline.

Predictions could be made of what will occur ahead of the sliding window based on a Markov chain model, or an LSTM or RNN could be used to recreate what lies ahead. The only thing left is for conscious logic to decide which outline to follow at branch points.
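
As a hedged sketch of the Markov-chain part, assuming the outline has already been quantized into a small alphabet of chain-code symbols (e.g. 8 directions): count which symbol tends to follow each short window, then predict the most frequent one ahead of the sliding window. All names here are illustrative.

```python
from collections import Counter, defaultdict

def train_markov(chain_codes, order=2):
    """Count how often each symbol follows each length-`order` window of chain codes."""
    model = defaultdict(Counter)
    for i in range(len(chain_codes) - order):
        context = tuple(chain_codes[i:i + order])
        model[context][chain_codes[i + order]] += 1
    return model

def predict_ahead(model, window):
    """Return the most frequent symbol seen after this window, or None if unseen."""
    counts = model.get(tuple(window))
    return counts.most_common(1)[0][0] if counts else None

# Hypothetical 8-direction chain code of an edge with occasional corners:
codes = [0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 2, 2]
model = train_markov(codes, order=2)
print(predict_ahead(model, [0, 0]))   # a straight run of 0s is most often followed by another 0
```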

Computer vision uses a kernel mask to find edges and blobs. The brain uses neurons to compare the difference between two locations to find an edge; likewise, if neurons find the same value at two different locations, they have found a blob.
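
A minimal illustration of the kernel/difference idea, assuming a grayscale image held in a 2-D NumPy array: a simple [-1, +1] difference mask responds where two adjacent locations differ (an edge) and stays near zero where they agree (the inside of a blob).

```python
import numpy as np

def horizontal_difference(img):
    """Apply a [-1, +1] mask along each row: the response is large where two
    horizontally adjacent locations differ (an edge), and near zero where they
    are the same (the inside of a uniform blob)."""
    return img[:, 1:] - img[:, :-1]

img = np.zeros((4, 8))
img[:, 4:] = 255                        # dark left half, bright right half
edges = horizontal_difference(img)
print(np.argwhere(np.abs(edges) > 0))   # responses only at the dark/bright boundary
```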

I think a quad tree would be good for finding edges in an image: first start high up in the tree and then move on to higher detail, so that a lot of data can be generated.
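
A rough sketch of that coarse-to-fine search, assuming a square grayscale image whose side is a power of two; the contrast threshold and minimum cell size are made-up illustrative values, not tuned ones.

```python
import numpy as np

def edge_quadrants(img, threshold=20.0, min_size=2):
    """Coarse-to-fine quad-tree search: subdivide any quadrant whose intensity
    range is large (a likely edge) and return the small high-contrast cells.
    `img` is a square, power-of-two-sided 2-D array of grayscale values."""
    found = []

    def descend(y, x, size):
        cell = img[y:y + size, x:x + size]
        contrast = float(cell.max()) - float(cell.min())
        if contrast < threshold:
            return                      # uniform region: cheap to skip early
        if size <= min_size:
            found.append((y, x, size))  # fine-scale cell containing an edge
            return
        half = size // 2
        for dy in (0, half):            # recurse into the four child quadrants
            for dx in (0, half):
                descend(y + dy, x + dx, half)

    descend(0, 0, img.shape[0])
    return found

# A 16x16 image that is dark on the left and bright from column 5 onward:
img = np.zeros((16, 16)); img[:, 5:] = 255
print(edge_quadrants(img))   # only 2x2 cells straddling the dark/bright boundary are returned
```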

Quad tree:

Yeah it seems so, though I term ‘features’ what you term ‘objects’.

Essentially, saccading between relatively high-level features (e.g. from nose to mouth) and saccading between relatively low-level features (e.g. corners and lines) is the same thing, just at different scales.

It might be that absolute scale is not important. Imagine you're an infant and you have yet to recognize faces (or any objects/features at that level). You are at the stage of learning basic shapes. The lines and corners that make up the shapes may occupy a small space on the retina, or they could occupy a large space (or any scale in between), e.g. the square shape of a book, or a small square shape within a checker pattern; the saccades will jump between the features of the square (corner, line, corner, line, etc.). So although the scales are different, the features and saccadic movements are quite similar. If the square is rotated or scaled up/down the saccadic movement will remain similar, thereby lending to invariance. When you begin looking at a square and detect a set of features during saccadic movement (corner, line), the cortex begins making predictions about what feature is coming next (corner).

[image: dots arranged to suggest a square, with no connecting lines]

In the image above you saccaded over the dots and perhaps you recognized them as a square, even though there are no lines. Perhaps you recognized a square because you made similar saccading movements as you would when looking at a square with lines or fill. Each saccade satisfied a main feature: an edge.

[image]

The same could be said for the images above. Then this could lead into the weird and wonderful field of perception.

[image]

Yes, I was only distinguishing them for the purpose of clarifying the point. Essentially things are composed of other things which are composed of other things etc.

These are also the same thing in the long run. :man_shrugging:


Has anyone come across any studies of saccade movement patterns over simple geometric shapes? For example, do the eyes spend more time on corners, edges, curves, etc?


If you trace out an image outline with pinpoint focus, then the recorded eye movement becomes part of the image information. So if your eye jumps to the three points of a triangle, that is a large fraction of the information describing it. Just like a Hopfield network: give a Hopfield network a fraction of the information and it spits out the missing information.
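
For concreteness, a tiny Hopfield-style completion, assuming +/-1 unit states and a single stored pattern standing in for "the three points of a triangle"; given a cue with one corner missing, it fills that corner back in. The pattern and function names are illustrative.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian weights for a list of +/-1 patterns (zero diagonal)."""
    n = patterns[0].size
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)
    return W / len(patterns)

def recall(W, cue, steps=5):
    """Iteratively settle a noisy or partial +/-1 cue toward a stored pattern."""
    s = cue.copy()
    for _ in range(steps):
        s = np.where(W @ s >= 0, 1, -1)   # synchronous update, for brevity
    return s

# Store a 'triangle' pattern (3 active corner units on a 3x3 grid), then recall
# it from a cue containing only two of the corners.
triangle = np.array([1, -1, 1, -1, -1, -1, 1, -1, -1])
W = train_hopfield([triangle])
cue = np.array([1, -1, 1, -1, -1, -1, -1, -1, -1])   # third corner missing
print(recall(W, cue))                                 # the full triangle pattern
```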

That’s sort of what I’m getting at. Somehow when our attention locks onto a particular entity, we are able to assign a coordinate system to it almost instantaneously. In the case of simple geometric shapes, I’m trying to figure out what the eyes are drawn to which allows this to happen so quickly. For example, is it areas of contrast (which would imply saccades are drawn to edges initially rather than points)?

In the absence of a study on the subject, I’ll have to rely on theory. For example, it would make sense if for a triangle, that the eyes first move to the area of greatest contrast, then roughly follow that to a point and use the point as an initial reference, etc. I’m hoping to understand how the eyes actually solve the problem, though. I’m trying to model attention. Vision and geometric shapes seems to be a great place to get insights into how it works.

I should explain why I am interested in geometric shapes specifically. There are a few reasons. One is that they are 2D rather than 3D, so it is a simpler form of the coordinate system problem. Second, there are no internal features, so I can be more confident that the movements are related to the shape itself, versus attention shifting to a sub-feature. Third, there aren’t likely to be unrelated survival behaviors baked in. With a face, for example, it appears that significantly more time is spent on the eyes (probably to collect information that is only relevant to social behaviors). I’m guessing there are baked in behaviors for several other entities like that as well.


A naive approach I had thought of in the past is similar to the quad-tree demonstrated above. Imagine all the 'pixels' of the retina feeding into the cortex. The layer of pixels is 16x16. Parallel to that layer is the same retina feed, but into a layer of 8x8 pixels. Each pixel is on/off depending on the number of on pixels within its 'receptive field' of the pixels below it in the 16x16 layer. If over half the children are on, then the parent is on.
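
A minimal sketch of that pooling rule, assuming a binary retina array and 2x2 receptive fields (the post fixes the layer sizes, not the exact field shape, so the 2x2 block is my assumption):

```python
import numpy as np

def pool_layer(layer):
    """One level of the hierarchy: each parent covers a 2x2 block of children
    and turns on when more than half of them (3 or 4 of the 4) are on."""
    h, w = layer.shape
    blocks = layer.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))
    return (blocks > 2).astype(int)

def build_pyramid(retina):
    """Stack of layers from the full-resolution retina up to a coarse summary.
    Assumes a square retina whose side is a power of two."""
    levels = [retina]
    while levels[-1].shape[0] > 1:
        levels.append(pool_layer(levels[-1]))
    return levels

retina = np.zeros((16, 16), dtype=int)
retina[2:10, 3:12] = 1            # a filled rectangle on the retina
for level in build_pyramid(retina):
    print(level.shape)            # (16, 16), (8, 8), (4, 4), (2, 2), (1, 1)
```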

[image]

Putting it together into a 2D hierarchy (or quad-tree), you then have a representation of the image at various levels.
(The drawing below is not exact, but good enough for illustration.)
[image: visual_scales]

The purpose of this is to control the movements of saccades top-down in the hierarchy/tree. In the 2x2 there are two on pixels representing two areas of interest. Down at 4x4 the representation becomes clearer, but it serves to focus attention on the relevant objects/corners/edges. The control then feeds further down until you get to an exact edge or corner in the 32x32. The jump (say from corner to corner) can easily be done by feeding the target 'features' down from the 4x4 to the target corners in the 16x16.

As you can see above, the movement from one point to another is smaller as you go up the hierarchy. The saccades still occur on the 16x16, but the control works at all levels in a coordinated fashion.

This could also help with scale invariance. If features were detected at each level, then a feature seen close up or far away would be captured as the same.

But of course, pure theory.


I see the focus pointer with X and Y movement on an image, and Z for moving through recorded, reconstructed, or constructed images.
There is also a value for the size of the focus.
For searching, start out wide so that only the big features of an outline are found first, and then, if need be, go to a higher resolution for the finer outlines of greater detail.

After selecting the starting point of an outline, a pick point, I like to use "relative chain code". The first two pixels of an outline set the zero direction, the third gives the relative angle from that direction, and so on.
If the next pixel is to the right it is a negative value; if it is to the left it is a positive angle. Moving in a straight line is an angle of zero. A complete outline sums to around +360 or -360 degrees.
Scratches or stand-alone lines do not sum to +/-360, and neither do stand-alone single pixels.
The absolute pixel locations of an outline can be used for a descriptor table.
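
A hedged sketch of relative chain code along those lines, assuming the outline is an ordered list of (x, y) pixels; note that the left/right sign depends on which way the y axis points, so the sign convention below is one possible choice, and the function names are illustrative.

```python
import math

def relative_chain_code(outline, closed=True):
    """Relative chain code: the signed turn (degrees) at each step, with the
    first segment defining the zero direction."""
    pts = (outline + outline[:2]) if closed else outline
    turns = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(pts, pts[1:], pts[2:]):
        a_in = math.atan2(y1 - y0, x1 - x0)
        a_out = math.atan2(y2 - y1, x2 - x1)
        d = math.degrees(a_out - a_in)
        turns.append((d + 180) % 360 - 180)   # wrap into [-180, 180)
    return turns

def looks_closed(outline, tol=1.0):
    """A complete outline's turns sum to about +/-360; a scratch does not."""
    return abs(abs(sum(relative_chain_code(outline))) - 360.0) < tol

square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
scratch = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 2)]
print(looks_closed(square))                             # True: turns sum to +/-360
print(sum(relative_chain_code(scratch, closed=False)))  # far from +/-360, so not closed
```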

But back to relative chain code. First the outline sub-features are found: the curved and straight sections of the outline. Each section is described by the sub-sum of the chain code along the curve divided by its chord.

This is done for dimension reduction, to make the system generalize better. It adds more data to the descriptor table, but it can make a really fast lookup table for finding matching patterns of an outline, or for comparing the differences to other complete outlines.
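
One possible reading of the curve/chord descriptor, sketched below: the summed turning angle over a section divided by the length of the chord joining the section's endpoints. Both this interpretation and the function name are my assumptions.

```python
import math

def section_descriptor(section):
    """Summed turning angle over a section of the outline divided by the length
    of the chord joining its endpoints. Near zero for straight sections,
    larger for tighter curves."""
    total_turn = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(section, section[1:], section[2:]):
        a_in = math.atan2(y1 - y0, x1 - x0)
        a_out = math.atan2(y2 - y1, x2 - x1)
        d = math.degrees(a_out - a_in)
        total_turn += (d + 180) % 360 - 180   # wrap each turn into [-180, 180)
    (xa, ya), (xb, yb) = section[0], section[-1]
    chord = math.hypot(xb - xa, yb - ya)
    return total_turn / chord if chord else float('inf')

straight = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
curved = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 3), (0, 3)]
print(section_descriptor(straight))   # 0.0: no turning along a straight run
print(section_descriptor(curved))     # non-zero: the section bends
```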
Well, this is my hack on everything, for now. :)


@sebjwallace This is really good work. I could read and talk about this stuff all day.


What does it mean to "recognize a pattern"?
Who or what looks at this recognized pattern?
I get that you have some pattern of bits in an SDR, but how does that do anything useful? That is just as arbitrary and meaningless as the pattern of bits in the original image.

This is silly. The essential thrust of the HTM work is that the neocortex uses SDRs as its common internal representation, so that common algorithms can be applied to a range of inputs. Sensory inputs are converted into SDRs and eventually SDRs are converted into motor actions. Why do you need ‘meaning’?

HTM theory does not yet address how SDRs are combined. The SP does not serve this purpose.

Further, for image recognition, forget any image-segmentation techniques you may be familiar with, as none of them work with saccades.

Ignore saccades, they come later. For static image recognition, look at tachistoscope experiments. The eye does feature recognition very quickly, and combines features (lines, angles, etc) with location information to recognise objects within the duration of a single saccadic image.

My belief is that basic feature extraction is part of input encoding, and is hard-coded into the retina and pre-cortical components. I would expect the techniques (such as convolution) from AI/ML visual recognition to be useful, before handing over to HTM components.

HTM theory does not yet address how locations are combined with feature SDRs.

Have you considered that tachistoscopic recognition works on the whole shape, without any breakdown into components?

Since it covers non-foveal visual areas it does engage the areas normally used to detect motion in the peripheral fields. It’s fast and very low resolution.

Once you subtract this outlier, all human vision depends on collecting little postage-stamp-sized snippets (foveal acuity) and combining those into object recognition. Each snapshot lies one atop the other in V1. This is not some higher-order visual task; this is the norm for all visual perception.

This is radically different from the usual computer-graphics techniques I have been teaching for decades.


This is incorrect. We combine SDRs all the time. That’s why we have a MultiEncoder. The SP does exactly this.

You still have more papers to read:
