Moving the eye may serve to shift the data over the processing field.
In ANN we move the kernel over the data.
Not only eye movement. Each macrocolumn is also such a ‘kernel’ in its own right.
I’ll get back to this when I have time to look for refs.
First I’ll make clear that my use of ‘convolution’ above is only to follow the ANN story. Brains were a thing before we knew of mathematics…
So, what we know for sure is that there exist cells in V1 which react to some well-identified situations; I’ll describe some below. Between neurons and ANNs, the imitators are of course our modern notions of ‘convolution’ and ‘kernel’, not the other way around.
Those well-identified situations date back to Hubel & Wiesel’s studies. In particular, they identified cells which reacted quite characteristically to edges seen in the visual field. Each such cell fired vigorously at the perception of edges of a particular orientation, and was mostly silent at other orientations. They were named ‘simple cells’.
One of the earliest large-scale, scientific visualizations of their organization is, to my knowledge, this plate by Gary Blasdel:
This colors a large patch of primary visual cortex, each color representing sensitivity to a particular orientation for those edge detectors. This was from a monkey’s brain, but rest assured I have very similar stuff in mine.
Now, this is almost textbook data. You may find more recent imagery (and possibly papers) on the subject, googling for “functional maps of orientation preference”.
What one needs to consider when trying to interpret any of the images above is that, besides this orientation mapping, the V1 layout is largely retinotopic: two close regions on the cortical patch react to events occurring in two correspondingly close regions of the visual field.
You can thus infer from the beautiful patchworks above that each local area in the visual field is associated with a patch of nicely arranged cells, covering the whole set of possible orientations for an edge happening to appear in that area.
So when a CNN uses a fixed edge-detection function over a local area, with a different output for different orientations, and applies this function as a convolution kernel over the entire input as its first filtering step, it is in essence trying to simulate the output of those ‘simple cells’.
V1 simple cells do that in a massively parallel way across each local area of its surface (and hence, of the visual field). Note that the concept of a ‘local area’ of the visual field in brains is a lot fuzzier than what is used in CNNs, and arguably more continuous than discrete. Nevertheless, what’s alike is that both are local, and both perform similar local edge detection across the whole input.
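To make the analogy concrete, here’s a toy sketch (my own code, not a biological model, and the derivative-of-Gaussian kernel is just one convenient stand-in for an oriented edge detector): a small bank of oriented kernels, each responding most strongly to edges at one orientation, swept across the whole image the way a CNN’s first layer would.

```python
# Toy 'simple cell' bank: oriented edge kernels applied as a first
# convolutional filtering step. Illustrative only.
import numpy as np

def oriented_edge_kernel(theta, size=5):
    """Kernel that responds most strongly to an edge whose normal lies at angle theta."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    u = xs * np.cos(theta) + ys * np.sin(theta)   # coordinate along the edge normal
    g = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * (half / 1.5) ** 2))
    k = -u * g                                    # derivative-of-Gaussian profile
    return k - k.mean()                           # zero mean: silent on uniform light

def convolve2d(img, k):
    """Naive 'valid' 2-D sliding-window correlation (no SciPy dependency)."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

# A vertical edge: dark left half, bright right half.
img = np.zeros((16, 16))
img[:, 8:] = 1.0

responses = {}
for deg in (0, 45, 90, 135):
    k = oriented_edge_kernel(np.deg2rad(deg))
    responses[deg] = np.abs(convolve2d(img, k)).max()

# Like a simple cell, the kernel whose preferred orientation matches
# the stimulus (normal along x, i.e. 0 degrees) fires the most.
best = max(responses, key=responses.get)
print(best)  # → 0
```

Note that the zero-mean kernel stays silent on diffuse light, echoing Hubel’s remark below that cortical cells are “too choosy to pay attention to anything as crude as diffuse light.”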
What can be linked to @Bitking’s remark about the scanning nature of our visual perception is this: in modern CNNs which allow their convolution kernels to learn, what differentiates a ‘convolution’ layer from a classical one with respect to learning is that the same set of convolution cells in the model is fed the input repeatedly, once for each of the ‘local areas’ composing the full input picture. Allegedly, the structure of our visual world, and the fact that we constantly move our eyes over it, would have exposed each of our ‘natural kernels’ to statistically similar data. Once again, this part of a CNN model and what V1 does would match.
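For what it’s worth, the weight-sharing point can be sketched in a few lines of toy NumPy (not any particular framework’s API): the ‘slide one kernel over the image’ view and the ‘feed every local patch through the same small cell’ view are literally the same computation, and the shared kernel has far fewer parameters than a dense layer would.

```python
# Weight sharing: one small set of weights sees every local area.
import numpy as np

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))   # one shared 'cell', only 9 weights

# View A: slide the kernel over the image in one vectorized sweep.
patches = np.lib.stride_tricks.sliding_window_view(image, (3, 3))  # (6, 6, 3, 3)
conv_out = np.einsum('ijkl,kl->ij', patches, kernel)

# View B: present each local patch, one by one, to the SAME weights.
patch_out = np.zeros((6, 6))
for i in range(6):
    for j in range(6):
        patch_out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)

assert np.allclose(conv_out, patch_out)

# Parameter count: shared 3x3 kernel vs a dense layer mapping 8x8 -> 6x6.
print(kernel.size, 8 * 8 * 6 * 6)  # → 9 2304
```

Every patch ‘trains’ the same nine weights, which is the artificial counterpart of each natural kernel being exposed to statistically similar data as the eyes sweep the world.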
Now, there’s more to V1 than ‘simple cells’, for which we already have a few proposed models. There are complex cells, color blobs, cells concerned with stereoscopy, and all kinds of stuff. There is still a lot to be found. The very layout of this organization in cortical topology (in contrast to, say, simply studying CNN kernels with an ability to learn) is arguably interesting in itself.
Anyway. There are large holes in our knowledge even about V1, but it’s also one of the best-specified patches of cortex we have, straight from the lab.
As I expressed in the very first post… I’m far from the first person to be interested in V1, for precisely this reason. And… maybe I’m a train late here. Could be. I’m just willing to try to inject our new understanding of NMDA spikes, plus JH-style prediction, and possibly wave-interference ideas… lots of stuff, really, into those kinds of studies, and see what comes out of it.
As we all know, visual recognition works well without eye movements, so I would leave the eye movements aside from the discussion.
Actually - you can’t leave out eye movements.
If you do your vision fades to a dull grey field very quickly.
Putting these micro-saccades aside: part of effective scene recognition is examining the scene by moving the eyes. If you cripple vision by forcing fixation on a central dot, you dramatically reduce its effectiveness.
I don’t have papers handy on this here at work but I may be able to support this later.
If you do decide to leave out saccades, you’ll need to replace them with another form of sensor movement / perspective change.
Thank you, I hadn’t thought of it from this point of view. I agree, this can be considered an inspiration for CNNs, although the differences are significant.
For instance, the position of an edge inside a kernel is important, but we know nothing about different kinds of simple cells sharing the same edge angle. Also, we can find other patterns among the filters of the first layer of a CNN: curves, textures, etc. - again, not what we find in V1. Finally, to make a CNN work well you typically need many layers, which we don’t see in the cortex.
BTW, talking about the kernel analogy, does anybody know the size of the receptive field for simple cells in V1?
From one of the authorities in the field:
It varies by where you pick in the retina.
Well, all of H&W’s experiments, and most others, were made with eyes fixed by anesthesia.
Also, we can recognise a key object in a complex image in less than 15 ms - that’s just not enough time to make even one saccade.
We just do it much faster than any muscles could support. Unless we do it virtually, somewhere in the cortex.
You can recognize it maybe after the fact with a single glance, but your brain can’t learn it in the first place without movement. At least that’s the HTM perspective.
I think that you may find it interesting to actually read the H&W work. This first real response was to a MOVING edge of a microscope slide.
“Whereas many geniculate cells respond to diffuse white light, even if weakly, cortical cells, even those first-stage cells that resemble geniculate cells, give virtually no responses. One’s first intuition, that the best way to activate a visual cell is to activate all the receptors in the retina, was evidently seriously off the mark. Second, and still more ironic, it turned out that the cortical cells that did give on or off responses were in fact not cells at all but merely axons coming in from the lateral geniculate body. The cortical cells were not responding at all! They were much too choosy to pay attention to anything as crude as diffuse light.
This was the situation in 1958, when Torsten Wiesel and I made one of our first technically successful recordings from the cortex of a cat. The position of microelectrode tip, relative to the cortex, was unusually stable, so much so that we were able to listen in on one cell for a period of about nine hours. We tried everything short of standing on our heads to get it to fire. (It did fire spontaneously from time to time, as most cortical cells do, but we had a hard time convincing ourselves that our stimuli had caused any of that activity.)
After some hours we began to have a vague feeling that shining light in one particular part of the retina was evoking some response, so we tried concentrating our efforts there. To stimulate, we were using mostly white circular spots and black spots. For black spots, we would take a 1-by-2-inch glass microscope slide, onto which we had glued an opaque black dot, and shove it into a slot in the optical instrument Samuel Talbot had designed to project images on the retina. For white spots, we used a slide of the same size made of brass with a small hole drilled through it. (Research was cheaper in those days.) After about five hours of struggle, we suddenly had the impression that the glass with the dot was occasionally producing a response, but the response seemed to have little to do with the dot. Eventually we caught on: it was the sharp but faint shadow cast by the edge of the glass as we slid it into the slot that was doing the trick.”
As to the recognition of gross and fine details: not all of this is done in the cortex. There is considerable work showing that coarse detail is processed earlier in the stream, in sub-cortical structures (the lizard brain). This is where blind-sight is processed. Some of this trickles into the cortex through rather diffuse pathways such as the frontal eye fields. The fast-responding peripheral area is very low resolution; we have to pull the high-resolution fovea over to the targets selected by the lizard brain to learn more about them.
These are the papers that I was referring to before - it is not full high-resolution recognition without moving the eyes.
The convolution kernels in DNN recognition are dragged over the whole image whether it contains anything of interest or not. The lizard brain is more selective and does a fair job of picking salient points to focus the high-resolution mammalian brain analyses. This is the comparison I was trying to highlight.
This is also why you get these weird artifacts in images recreated with DNN scanning where there are features recreated/recognized in places where they don’t belong.
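The selective-scanning idea above can be caricatured in a few lines of toy code (my own sketch; nothing here models a real brain structure): a cheap, low-resolution contrast map plays the ‘lizard brain’ and picks fixation targets, so that any expensive high-resolution analysis only visits a couple of salient spots instead of being dragged over the whole image.

```python
# Saliency-first scanning vs. dense convolution: pick a few coarse
# fixation targets, then analyze only those at full resolution.
import numpy as np

rng = np.random.default_rng(1)
image = np.zeros((32, 32))
image[20:24, 8:12] = 1.0               # one small bright object
image += 0.05 * rng.normal(size=image.shape)

# Cheap 'lizard brain': local contrast on a coarse 8x8 grid
# (each coarse cell averages a 4x4 block of the image).
coarse = image.reshape(8, 4, 8, 4).mean(axis=(1, 3))
saliency = np.abs(coarse - coarse.mean())

# Fixate only the top-2 salient coarse cells instead of all 64.
top = np.argsort(saliency.ravel())[::-1][:2]
fixations = [(idx // 8, idx % 8) for idx in top]

# The bright object lies in coarse cell (5, 2), so it wins the first fixation.
print(fixations[0])  # → (5, 2)
```

A dense CNN would spend the same effort on all 64 locations regardless of content, which is exactly the contrast being drawn with the lizard brain’s selectivity.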
Thank you, but it’s about the receptive field of the ganglion cells. I checked the part about the simple cells in this book too, but didn’t find anything. I’m asking about the simple cells in V1.
I’ve never heard anybody mention it, and I once tried to look for it intentionally. I could definitely have missed the right source, but it seems kind of weird to me that this information isn’t broadly available.
Learning is a good point, I tend to agree with it. Still, we are talking about recognition here…
The size of a simple-cell receptive field depends on its position in the retina relative to the fovea, but even in a given part of the retina, we find some variation in size. The smallest fields, in and near the fovea, are about one-quarter degree by one-quarter degree in total size; for a cell of the type shown in diagrams a or b in the figure on this page, the center region has a width of as little as a few minutes of arc. This is the same as the diameters of the smallest receptive-field centers in retinal ganglion cells or geniculate cells. In the far retinal periphery, simple-cell receptive fields can be about 1 degree by 1 degree.
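As a quick aside on those numbers (my own arithmetic, just simple trigonometry): assuming a 57 cm viewing distance, which is a handy convention because 1 degree then subtends roughly 1 cm, the quoted field sizes translate to physical extents like this:

```python
# Convert degrees of visual angle to physical size at a given distance:
# size = 2 * d * tan(theta / 2).
import math

def visual_angle_to_size(theta_deg, distance_cm=57.0):
    """Physical extent (cm) subtended by theta_deg at distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(theta_deg) / 2)

print(round(visual_angle_to_size(0.25), 2))  # foveal field, → 0.25 cm
print(round(visual_angle_to_size(1.0), 2))   # peripheral field, → 0.99 cm
```

So the smallest foveal simple-cell fields cover roughly a 2.5 mm square at arm’s length, while peripheral ones cover about a centimeter.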
Again, you might enjoy actually reading the H&W work.
I haven’t read this book (yet), but I’m more or less familiar with their work through their lectures and other sources. The story about the moving edge of a slide is one of my favourites among serendipitous scientific findings.
I believed this was an alternative, supportive visual pathway, not a stage preceding V1…
Yes, scanning all parts one by one would be a terrible idea
Nevertheless, I don’t see how the old brain could really help with scanning only the essential parts of, let’s say, a car, beyond focusing on the most contrasting elements. I mean, we need hypotheses to check by scanning the most discriminating parts of the pattern, so this can only be done in the cortex which learned those patterns, not in the old brain with its innate structure. Plus, we would have the same question about spatial convolution, but for the old brain: to help focus on important details it has to recognize the whole pattern first.
Thank you, I scanned this chapter but missed this part. I will definitely read the book later.
That’s actually very interesting: the receptive fields for edges are described as squares.
What could be a reason for this?
I am not sure how to help you with this. I have been reading papers on the topic of vision for decades and have a pretty good idea of how much of this works. It is unfortunate that I can’t point to a single authoritative source for the things you are asking. Most of the papers I have been reading are written to a single point - often not the point I am researching - and the useful information is there as a side-effect.
I don’t think you would find it satisfying to have me point you to a foot-thick stack of papers to gather the same background I am working from, nor do I expect you to respect me as an authority. This exchange of points in “call and response” format quickly loses focus and becomes unreadable.
I will hit what I think are my main points here and you can take from them what you will.
The vision system is not the exclusive realm of the cortex. It integrates the posture system, the vestibular system, the early processing in the lizard brain (primarily through the amygdala), and several layers from V1 forward to a host of centers in the cortex. On the path to these cortical destinations, the deep cortex layers pass axons down to the old brain and back again in loops that let the lizard brain take repeated samples of what the cortex is doing.
Vision is an active process that does not work in a single exposure like a photograph. The lizard brain forces the eye to layer multiple images one on another as the saccades force the fovea to look at what the old brain chooses to be interested in. In the cortex, I think this ends up looking like a palimpsest; when we learn something, we are learning what the parts layered on top of each other look like.
There are known built-in archetypes like faces, concave, convex, horizon lines, shapes of secondary sexual characteristics, and various animal shapes. These are low-resolution shapes that drive the sub-cortical structures, which drive the frontal eye fields, which drive saccades for scanning. I have reason to believe that as the cortex becomes more capable, some of this catalog of shapes is extended, but I don’t have a single authority for that conjecture.
I am firmly convinced that we are born with our lizard brain mostly in control and this serves as the training guide to the cortex. In the “Three Visual Streams” paper I have seen the clearest defense for my intuition that for the most effective learning in a multi-map/layer system you need training to both push from the bottom and pull from the top. The lizard is at one end and the real world is on the other end. I am convinced that is why we have both feedforward and feedback paths throughout the brain.
In the beginning, the lizard brain drags the cortex through the motions. Some of these motions are learning to see, learning to move, learning to imitate noises and through that eventually to speak. Through speaking we learn to think and reason. As the cortex learns the world, the cortex eventually takes over. The cortex digests the world into a format the lizard brain can deal with, and the lizard brain makes decisions that are then implemented through the forebrain. It has been repeatedly demonstrated that we make decisions before we are aware that we have done so.
I call it my dumb boss/smart advisor theory.
You asked what the lizard part of the brain can see. If you think about this, you will have to answer the basic question: what DOES a lizard see when it looks at a car? Even if it knows nothing about cars, it sees something. A crocodile can chase prey over the shape of a car without knowing much about cars.
As long as we are here: I have posted my take on consciousness on this site - if you combine that with the global workspace theory, you will have a pretty good idea where I am going with my take on an AGI. In a nutshell: the senses feed into the parietal lobe and the lizard brain feeds into the frontal lobe. There are multiple broad pathways with successive data processing along the way, with autobiographical memory somewhere in the middle, and some rather direct connections between these two poles. If something is sensed in the world that meets up with a lizard-generated need, there is a global-workspace ignition and the related plan in the frontal lobe is enhanced into action. This plan allows for nested actions and constant feedback on the state of an ongoing global plan. If you want to talk about this, I can follow up in the “post if you have an AGI” thread.
Thank you, those are useful references and interesting thoughts.
I’m not familiar with the inner structure of the old brain; for me it’s a black box connected to the cortex. Do you think it uses, in general, the same principles for vision as the cortex? I mean, is it basically the previous version, re-specialized to work in tandem with the cortex, or does it work completely differently?
That would be interesting, thank you in advance. The global workspace theory and the forming of a world model are what I’m deeply interested in, but I haven’t had enough time to dig into them.
Yes - there are layers. There are also sizable nodes that run more like a Boltzmann/Hopfield network. Here be dragons in these poorly charted waters.