Project : Full-layer V1 using HTM insights

Thank you, but it’s about the receptive field of the ganglion cells. I checked the part about the simple cells in this book too, but didn’t find anything. I’m asking about the simple cells in V1.
I’ve never heard anybody mention it, and I once tried to look for it intentionally. I could definitely have missed the right source, but it seems kind of weird to me that this information isn’t broadly available.

Learning is a good point, I tend to agree with it. Still, we are talking about recognition here…

The size of a simple-cell receptive field depends on its position in the retina relative to the fovea, but even in a given part of the retina, we find some variation in size. The smallest fields, in and near the fovea, are about one-quarter degree by one-quarter degree in total size; for a cell of the type shown in diagrams a or b in the figure on this page, the center region has a width of as little as a few minutes of arc. This is the same as the diameters of the smallest receptive-field centers in retinal ganglion cells or geniculate cells. In the far retinal periphery, simple-cell receptive fields can be about 1 degree by 1 degree.

Again, you might enjoy actually reading the H&W work.

I didn’t read this book (yet), but I’m more or less familiar with their work through their lectures and other sources. The story about the moving edge of a slide is one of my favourites among semi-accidental scientific findings :slight_smile:

I believed this is an alternative, supportive visual pathway, not one preliminary to V1…

Yes, scanning all parts one by one would be a terrible idea :slight_smile:
Nevertheless, I don’t see how the old brain could really help with scanning only the essential parts of, let’s say, a car, beyond focusing on the most contrasting elements. I mean, we need some hypotheses in order to check them by scanning the most discriminating parts of the pattern, so this can be done only in the cortex which learned these patterns, not in the old brain with its innate structure. Plus, we would have the same question about spatial convolution, but for the old brain: to help focus on important details, it has to recognize the whole pattern first.

Thank you, I scanned this chapter but missed this part. I will definitely read the book later.

That’s actually very interesting: the receptive field for edges described as squares :confused:
What could be a reason for this?

I am not sure how to help you with this. I have been reading papers on the topic of vision for decades and have a pretty good idea of how much of this works. It is unfortunate that I can’t point to a single authoritative source for the things you are asking. Most of the papers I have been reading are written to a single point - often not the point I am researching - and the useful information is there as a side-effect.
I don’t think that you would find it satisfying to have me point you to a foot-thick stack of paper to gather the same background that I am working from, nor do I expect you to respect me as an authority. This exchange of points in “call and response” format quickly loses focus and becomes unreadable.

I will hit what I think are my main points here and you can take from them what you will.

The vision system is not the exclusive realm of the cortex. It integrates the posture system, the vestibular system, the early processing in the lizard brain (primarily through the amygdala), and several layers from V1 forward to a host of centers in the cortex. On the path to these cortical destinations, the deep cortex layers pass axons down to the old brain and back again in loops that let the lizard brain take repeated samples of what the cortex is doing.

Vision is an active process that does not work in a single exposure like a photograph. The lizard brain forces the eye to layer multiple images one on another as the saccades force the fovea to look at what the old brain chooses to be interested in. In the cortex, I think this ends up looking like a palimpsest; when we learn something we are learning what the parts layered on top of each other look like.

There are known built-in archetypes like faces, concave, convex, horizon lines, shapes of secondary sexual characteristics, and various animal shapes. These are low-resolution shapes that drive the sub-cortical structures to drive the frontal eye fields to generate saccades for scanning. I have reason to believe that as the cortex becomes more capable some of this catalog of shapes is extended, but I don’t have a single authority for that conjecture.

I am firmly convinced that we are born with our lizard brain mostly in control and this serves as the training guide to the cortex. In the “Three Visual Streams” paper I have seen the clearest defense for my intuition that for the most effective learning in a multi-map/layer system you need training to both push from the bottom and pull from the top. The lizard is at one end and the real world is on the other end. I am convinced that is why we have both feedforward and feedback paths throughout the brain.

In the beginning, the lizard brain drags the cortex through the motions. Some of these motions are learning to see, learning to move, learning to imitate noises and, through that, eventually to speak. Through speaking we learn to think and reason. As the cortex learns the world, it eventually takes over. The cortex digests the world into a format the lizard brain can deal with, and the lizard brain makes decisions that are then implemented through the forebrain. It has been repeatedly demonstrated that we make decisions before we are aware that we have done so.

I call it my dumb boss/smart advisor theory.

You asked what the lizard part of the brain can see. If you think about this, you will have to answer the basic question: what DOES a lizard see when it looks at a car? Even if it does not know anything about cars, it sees something. A crocodile can chase prey around the shape of a car without knowing much about cars.

As long as we are here: I have posted my take on consciousness on this site - if you combine that with the global workspace theory you will have a pretty good idea where I am going with my take on an AGI. In a nutshell - the senses feed into the parietal lobe and the lizard brain feeds into the frontal lobe. There are multiple broad pathways with successive data processing along the way, with autobiographical memory somewhere in the middle, and some rather direct connections between these two poles. If something is sensed in the world that meets up with a lizard-generated need, there is a global workspace ignition and the related plan in the frontal lobe is enhanced into action. This plan allows for nested actions and constant feedback on the state of an ongoing global plan. If you want to talk about this I can follow up on it in the “post if you have an AGI” thread.


Thank you, those are useful references and interesting thoughts.

I’m not familiar with the inner structure of the old brain; for me it’s a black box connected to the cortex. Do you think it uses, in general, the same principles for vision as the cortex? I mean, is it basically the previous version, re-specialized to work in tandem with the cortex, or does it work completely differently?

It would be interesting, thank you in advance. The global workspace theory and the forming of a world model are what I’m deeply interested in, but I didn’t have enough time to dig in.

Yes - there are layers. There are also sizable nodes that run more like a Boltzmann–Hopfield network. Here be dragons in these poorly charted waters.
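For intuition only, here is a minimal toy sketch of what a Hopfield-style attractor node does: store a pattern via a Hebbian outer product, then fall back to it from a corrupted input. This is a textbook toy in Python/NumPy, not a claim about the actual sub-cortical wiring; the 32-unit size and single stored pattern are arbitrary choices of mine.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian outer-product learning; patterns are rows of +/-1 values."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)  # no self-connections
    return W / n

def recall(W, state, steps=5):
    """Synchronous sign updates; settles into a stored attractor."""
    for _ in range(steps):
        state = np.sign(W @ state)
        state[state == 0] = 1  # break ties deterministically
    return state

# Store one random +/-1 pattern, then recover it from a corrupted copy.
rng = np.random.default_rng(0)
pattern = rng.choice([-1.0, 1.0], size=32)
W = train_hopfield(pattern[None, :])

noisy = pattern.copy()
noisy[:4] *= -1  # flip 4 of the 32 bits
restored = recall(W, noisy)
print(np.array_equal(restored, pattern))  # True: the attractor cleans it up
```

The point of the toy: the network completes a noisy input to the nearest stored pattern, which is the kind of settling dynamics attributed to those nodes.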


So, I can’t go to sleep for a few hours without you bringing the party over here, is what you are saying ?
17 messages, really ? :grin:

Nah… great conversation guys :slight_smile:
Now I have to decide whether I shall be nitpicky or not :dark_sunglasses:

Lemme see…

I think what @spin’s talking about with respect to eye movements is that ANNs used for vision seem perfectly capable of working without them for recognition. I’d agree with him that the chemistry of receptors adapting to the current illumination level is not necessarily something the camera of a T800 would have to replicate.
Yet, until we know otherwise, it is not necessarily something we can leave out either, in a model trying to get a clue about how it actually works.
Although, up to a point, we ourselves are perfectly able to massively parallel our way through image recognition without a saccade. A literate reader does not scan through letters and certainly does not saccade across every bar and loop of the calligraphy. Whole words are seemingly photographically recognized, per fixation.

@rhyolight, HTM 2016 ain’t much concerned about saccades either. What matters to the temporal thing of the model is that the input flows. You could as well fixate at a TV screen or through a train’s window. Now I don’t know of Numenta’s more recent SMI insights. I hope you guys tell us about them very soon.

I’m not necessarily aware of each and every paper on this as of 2018. I’m not sure we could say we know nothing of those, though.

I’m tempted to reply to that with two orthogonal yes-buts.
Yes, I think it is fair to say that the response of V1 simple cells is very specific and covers only a tiny subset of all possible mathematical functions we could dream of applying to a local area of an image, but…

  • But that’s also quite precisely the reason why they are worth studying: why did they wire to this particular choice of function? And how is it relevant to an AGI substrate that they do so?
  • But again, simple cells are not the only cells in V1. You’ll find many others concerned with other aspects. Still a subset of all possible functions, hence still worth exploring how and why, though.

They marked those out based on rough limits for their responses to a stimulus, before testing inside those rough outlines with specific stimuli.
If you look at the following video from the timepoint I linked to (tried to… or about 18 minutes in, if that did not work)… I believe the insert on the bottom-right is a video archive of their actual protocol back in the day.

I think you’d agree that a convolution kernel being applied to a local square does not prevent its sensitivity to any specific subset of shapes within that square either.
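To make that point concrete, here is a toy sketch in Python/NumPy: a single 3×3 kernel has a square support, yet it responds only to one orientation of edge within that square, much like a simple cell. The particular kernel (a Sobel-style vertical-edge filter) and the two stimuli are just illustrative assumptions of mine.

```python
import numpy as np

# A 3x3 "simple cell" kernel: square support, yet selective
# for vertical edges only (Sobel-style oriented filter).
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

def response(patch, k):
    """Dot product of kernel and patch = one simple-cell-like output."""
    return float(np.sum(patch * k))

# Two stimuli presented over the same square receptive field:
vertical_edge = np.array([[0, 0, 1],
                          [0, 0, 1],
                          [0, 0, 1]], dtype=float)  # dark -> light, left to right
horizontal_edge = np.array([[0, 0, 0],
                            [0, 0, 0],
                            [1, 1, 1]], dtype=float)  # dark -> light, top to bottom

print(response(vertical_edge, kernel))    # 4.0 : strong response
print(response(horizontal_edge, kernel))  # 0.0 : no response at all
```

Same square window, wildly different responses: the square is only the support, the selectivity lives in the weights.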

Today, for more detailed outlines, we look at the wiring. The relationship between V1 dendrites and LGN axons makes inferring a precise receptive field in visual space quite a complicated matter, though.

Anyway. I expect that the wiring range is quite standard throughout V1 topology itself. But the fact that V1 retinotopy is distorted (giving much more space to the fovea than to the periphery) means that a given wiring range translates to very different sizes in visual space according to eccentricity, as @Bitking pointed out.

Studying V1 could give us clues into that black box also. At least I hope it will: LGN and V1 form quite intricate loops. I’m always happy to call @Bitking to the rescue on many of those questions.
I don’t believe cortex and those older parts would work the same at all. Retina to thalamus to V1 looks like a very delicate clockwork, designed by ‘evolution’. And they provide carefully tuned filters, surely very concretely helpful for processing visual data (center-surround, etc). But once in V1 we’re dealing with our elusive universal learner… quite a different matter.


As for whether those structures have been tweaked since their inception prior to the development of a pallium… my guess is yes. Possibly heavily. As it happens, there are subtle differences in vision-related circuitry in the thalamus and midbrain from one species to the next, even among mammals.

To me, the obvious explanation for this is that optimizing visual processing at the level of specialized networks is quite doable/probable/common… but that optimizing the wonderfully weird cortical substrate as a whole, without breaking something, is a much harder proposition than most.

Had to be done a few times still, though… birds don’t have the 6-layered version of it that we mammals share, but they have something covering their older parts too… and they’re not all that dumb.

I’d expect the last change to the intrinsics of the cortical fabric itself happened for us between 120 and 320 Mya… So… not exactly your everyday mutation.

And if “Gaia” herself did not find a way out of it in more than one hundred million years, we’d better pay attention to the details of how to replicate it. Even to those insignificant-looking V1 edge detectors.

…as well as to the lizardy things coupled below, if only to appease @Bitking’s wrath.


This is true.


What I was saying is that we currently only have a roughly brushed road ahead for SMI. From what I gathered reading a recent post by Jeff himself, your research teams are currently busy dealing with the deeper layers in V1.

Which is quite exciting for me and I’m quite impatient to get a shot at studying those details.


A recent attempt at making sense of V1, using computational models which try to stay true to current biological knowledge.

direct pdf here:

hey @Bitking, pretty griddy, their L2/3 activation maps at times ^^


The integration of visual and target signals in V4 and IT during visual object search

Going a bit upstream from the V1 area.


This is interesting: top-down modulation is inhibitory here, whereas it is excitatory in the current top layer of SMI research. Interestingly, L2/3 has short-range inhibitory connections, implicating cell grids as you pointed out.


It seems that there is a lot going on in V1.
High-dimensional geometry of population responses in visual cortex


Modeling the Visual System


Thanks for the links, @Bitking :slight_smile:

On my side, not much to report, except… I’m just starting to implement, as tightly packed as possible, my foreseen 16b-indexed/4b-valued synaptic scheme; trying to keep the thing compatible with both HTM and the possibly diverse V1 designs (Leabra & stuff).
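In case a sketch helps show what I mean by “tightly packed”: one possible reading of such a scheme, in Python/NumPy. The layout here (a uint16 presynaptic index per synapse, plus a 4-bit permanence, two permanences packed per byte) is purely a hypothetical illustration of mine, not the actual design.

```python
import numpy as np

# Hypothetical packing (an illustration, not the real scheme):
# 16-bit presynaptic index + 4-bit permanence per synapse,
# with two 4-bit permanences squeezed into each byte.

def pack_synapses(indices, values):
    """indices: presynaptic ids (fit in uint16); values: ints in 0..15."""
    idx = np.asarray(indices, dtype=np.uint16)
    val = np.asarray(values, dtype=np.uint8)
    if val.size % 2:
        val = np.append(val, 0)  # pad to an even count for byte packing
    packed = (val[0::2] & 0xF) | ((val[1::2] & 0xF) << 4)
    return idx, packed.astype(np.uint8)

def unpack_values(packed, count):
    """Recover the 4-bit permanences from the packed byte array."""
    lo = packed & 0xF
    hi = (packed >> 4) & 0xF
    return np.stack([lo, hi], axis=1).ravel()[:count]

# Three synapses -> 3*2 bytes of indices + 2 bytes of permanences.
idx, packed = pack_synapses([7, 42, 1000], [3, 15, 9])
print(unpack_values(packed, 3))  # -> [ 3 15  9]
```

At roughly 2.5 bytes per synapse, the appeal for a large HTM-style synapse pool is obvious, assuming 16 permanence levels are enough.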

and leaving the door open for some Calvin :dark_sunglasses:


Note the inference about visual cortex responses.


Maybe because of the discrete distribution of color blobs in V1? And the utter reliance on edges.