Project: Full-layer V1 using HTM insights

I'm afraid it's impossible to do in isolation from V2, V4, etc. - it's an oversimplification to treat any of these zones as mostly separate modules, or as a strictly vertical stack of modules.

I agree, but it's possible to do it not by trying to exactly reproduce SOME of the details that are clear for now, but by looking for the key motifs.

Do we? As I see it, they reported some patterns in the behaviour of the neurons in V1, V2, etc. (which was no doubt super useful), but didn't provide a theory of what was really going on there. If they had, we would have an AI based on it, but unfortunately we don't have one yet.

That's why it has to be the first research target. Once it is found, everything else will become much easier to clarify. Without it, nothing else has practical meaning.

I strongly believe we will find out how it works by modelling different ideas before somebody in a lab pins down the final details. There is enough collected data to find insights, or at least to check the plausibility of your theoretical ideas. So you have everything you need :slight_smile:

You said you know what the V1 output is. Why not start from there and find a way to go higher in the hierarchy? We don't really need another edge detector; we need a biologically plausible alternative to CNNs.

I've got a great idea - @gmirey can pursue what he finds to be an interesting challenge and YOU can find the biologically plausible alternative to CNNs. Then everybody is doing what they want to do, WIN-WIN!

Again, my point is that it isn't productive to try to clarify anything by modeling the visual pathway only up to V1, because the output of this part is relatively simple and can be obtained in many different ways. So your working model won't prove anything but your coding skills.
On the other hand, the cognitive part of the visual pathway can perhaps be implemented in only a few biologically plausible ways, so your working models would be a good starting point to go in any direction from there. Plus, they would be useful by themselves.

I completely agree; it just looked like @gmirey didn't want merely an interesting challenge, but to use his model to understand some important things about the cortex. Sorry if I'm wrong; I'd really better do my own stuff 0:-)

1 Like

And you are right :slight_smile:

It is. The 'target'.

'Modelling'…'Checking'. Yup, precisely. I don't see a computer model as a despicable end-product. It is an integral part of our R&D (or even theorization) toolbelt now, for modelling and checking. As much as a pen and paper for drawing boxes, arrows, and/or equations.

You do realize that what I find the most interesting known output of all, as far as V1 goes, is the output of the 'learning' function itself, right?
aka the end-state of those cells and dendrites and synapses after exposure.

Assuming that you do… I don't quite understand what we're disagreeing over here.

  • If you think V1 formation is so complicated that it won't work in isolation, then we'll try to add parts of a hierarchy. I stated as much already. That endeavor could give us some evidence of this very requirement.
  • If you think V1 formation is so simple that any model would do, and thus we won't gain any insight from reaching it, then… well, at this point I don't think it will be that easy. But right; it is a possible concern. If that turns out to be the case, we can always turn the 'probing' part on its head and look for models which fail. Or strip ingredients one by one to get a clue about which are the necessary ones (see the toy sketch after this list)…
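Something like this loop, say. A minimal sketch only: `build_model`, `grows_edge_detectors`, and the ingredient names are hypothetical placeholders, not an actual API.

```python
# Hypothetical "strip ingredients one by one" probing loop.
# build_model() and grows_edge_detectors() stand in for whatever model
# constructor and success test we end up with.

ALL_INGREDIENTS = {"hebbian_growth", "lateral_inhibition", "homeostasis", "topology"}

def ablation_study(build_model, grows_edge_detectors, stimuli):
    """Remove one candidate ingredient at a time and record which removals
    prevent the model from self-forming V1-like edge detectors."""
    necessary = []
    for missing in sorted(ALL_INGREDIENTS):
        model = build_model(ALL_INGREDIENTS - {missing})
        model.learn_online(stimuli)           # unsupervised exposure
        if not grows_edge_detectors(model):   # did simple-cell tuning emerge?
            necessary.append(missing)
    return necessary  # ingredients whose removal breaks V1-like formation
```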

We'll learn 'something' either way.

I don't know how my coding skills are relevant to the discussion, since you did understand that I don't want to hardcode V1-like output (or didn't you? The purpose is not a clever edge detector for its own sake), but to let a model learn online and see if its cells tune themselves towards edge detection and the like.

Now, if your concern is that I can't model anything before having a well-defined model in the first place, I'll restate that 'let a model learn online' in the sentence above will more likely turn out to be an iterative 'let several, many, theorized models learn online, in turn', and see which of them succeeds. I may already have some insights for the first tries, granted… but I'm not putting too much confidence in them anyway, and all these models (against which to "check the plausibility of your theoretical ideas") could very well be dug out/refined/invented as we go.

'Invented'… Hacker-style :dark_sunglasses: since I'm no Einstein, sadly.

To conclude… I don't know how V1's decision to form 'simple' edge detection when exposed to visual stimuli is relevant to (A)GI. But I strongly bet that it is. Relevant. V1 is cortex. We both agreed on that, it seems. And I believe that by witnessing concretely 'how' V1 would be driven to come to that particular choice, we'd gain insight into precisely that:
"What stands as relevant info and/or coincidences to wire to, from an (A)GI's substrate point of view."
Quite the nut to crack, if you ask me.

Hey.
I know from others telling me in the past that I may sound cold while debating, but please understand, both of you: I'm not upset at all.

I don't have a precise understanding of why we're arguing, mind you… but it seems like we are. To my mind, at this point, either I was not able to explain something very fundamental about that project, or I'm still not aware of some very fundamental objection you could raise against it. I really don't know where to put the ball here. And I'd like to find out.

If there is still room for a very core misunderstanding about the goal of that project… like, I'd bet… the fact that "reaching a V1-similar self-formation, by online learning, requiring us to come up with biologically plausible learning rules which would be common to all cortex" is really the name of the game, then I'd rather keep discussing, and hopefully manage to expose this more clearly for anyone to read.

If, on the contrary, there exists such a fundamental objection to my approach… like, I don't know, say… "the same has been tried repeatedly and quickly reached a dead end", then I'd rather know about it, discussing it with you.

Regards,
Guillaume.

1 Like

I've been lurking here, and I know how hard it is to communicate complicated ideas over text like this. It always seems like arguing; usually it is not. In this case, all parties are being quite civil, polite, and engaging. You folks are good eggs.

5 Likes

Let's go back to the beginning of my intervention here: I believe any new effort in the domain can be useful and insightful, especially when we are talking about something like self-formation, which is cool in itself!
I just tried to share my hypothesis that starting with the hard part of the problem can, paradoxically, be an easier way to find universal motifs of cortical functioning. I could be wrong, or you could just be lucky and find the right angle starting from a more distant point.
So, good luck and keep us posted!

2 Likes

Actually, the idea that my endeavor sounds incomprehensible worries me.
I've thought of another way to explain it, maybe better suited as an introduction for readers with an ML background.

Multilayer perceptrons, as known for decades, already proved quite successful at, say, categorizing handwritten digits (e.g., trained against MNIST). Their internal layers were not showing any abstractions we'd consider very clever, though… Then people came up with the idea of adding convolutions to filter the input towards the start of the network. The performance of these more modern ANNs was much better, and the intermediate steps were driven towards more intelligible representations.
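To make the contrast concrete, here is a minimal sketch of the two kinds of networks, sized for MNIST's 28x28 grayscale digits. This is my own illustration in PyTorch with arbitrary layer widths, not anyone's published model.

```python
import torch.nn as nn

# Classic multilayer perceptron: the first layer sees the image as a
# flat 784-vector, with no built-in notion of locality.
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

# Convolutional variant: the first layer is a bank of small local filters
# swept across the image; this is where the more intelligible intermediate
# features (edges, strokes) tend to appear.
cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),
)
```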

That sounded pretty good, especially given that all those messy biological V1 edge detectors actually seem to be doing convolution themselves. So we feel quite legitimate using this as a starting ground and developing better models on top of it, for simulating or understanding whole 'visual pathways' and coming up with computer vision algorithms… or even to give birth to our global-scale theories of brains.

I was kinda para-dropped into the field at this state of affairs. My point is that simply being happy with those pre-made convolutions, to the point of feeling "oh, no… not another edge detector", seems like forgetting a little too soon that V1 is cortex, and that it had to decide (quite early in life) to form those kinds of convolutions by itself, only crunching on retina/LGN output (and unsupervised at that).

Well, I'm set on understanding precisely how.

I believe the retinotopic locality of the cortical patches processing different parts of the visual field helps with that a lot, so I've tried to understand topological relationships in some detail first… But there is probably more to it than that.
And both topology and any bit of that "more to it" we can understand here will likely prove relevant for understanding any other area of cortex.
Which in turn will likely prove relevant for global theories.

2 Likes

Aside from our previous discussion: could you direct me to any proof that our cortex does anything like spatial convolution? I regularly hear the claim that CNNs were inspired by our brain, but I have never encountered any related neurobiological evidence.

Moving the eye may serve to shift the data over the processing field.

In an ANN we move the kernel over the data.
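A tiny NumPy sketch of that equivalence, assuming a toy 1-D signal and a two-tap 'edge' kernel of my own choosing: sliding the kernel over the data gives exactly the same responses as holding the kernel in place and shifting the data past it, which is the eye-movement reading of the same computation.

```python
import numpy as np

signal = np.array([0., 0., 1., 1., 0., 0., 1., 0.])
kernel = np.array([1., -1.])  # toy 1-D "edge detector"

# CNN view: move the kernel over the data.
moving_kernel = np.array([signal[i:i + 2] @ kernel
                          for i in range(len(signal) - 1)])

# Saccade view: keep the kernel at a fixed position, shift the data under it.
moving_data = np.array([np.roll(signal, -i)[:2] @ kernel
                        for i in range(len(signal) - 1)])

assert np.allclose(moving_kernel, moving_data)  # identical responses
```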

1 Like

Not only eye movement. Each macrocolumn is also such a 'kernel' in its own right.
I'll get back to this when I have time to look for refs.

1 Like

First, I'll make clear that my use of 'convolution' above is only to follow the ANN story. Brains were a thing before we knew of mathematics…
So, what we know for sure is that there exist cells in V1 which react to some well-identified situations. I'll describe some below. Between neurons and ANNs, the ones trying to copy the others are of course our modern notions of 'convolution' and 'kernel'.

Those well-identified situations date back to the Hubel & Wiesel studies. In particular, they identified cells which reacted quite characteristically to edges seen in the visual field. Each such cell fired vigorously at the perception of an edge of a particular orientation, and was mostly silent at other orientations. They were named 'simple cells'.
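For intuition, here is a rough sketch of that tuning behaviour, using a Gabor patch as the receptive field. The Gabor is a standard idealization of simple-cell receptive fields; the parameters here are arbitrary, and none of this is H&W's actual data.

```python
import numpy as np

def gabor(theta, size=21, sigma=4.0, wavelength=8.0):
    """Oriented receptive field: a sinusoid under a Gaussian window."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)

def grating(theta, size=21, wavelength=8.0):
    """An oriented luminance grating used as the stimulus."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    return np.cos(2 * np.pi * (x * np.cos(theta) + y * np.sin(theta)) / wavelength)

cell = gabor(theta=0.0)  # a 'simple cell' with one preferred orientation
for deg in (0, 30, 60, 90):
    print(deg, round(float(np.sum(cell * grating(np.radians(deg)))), 1))
# The response is strongest at the preferred orientation (0 deg) and falls
# off as the grating rotates away from it - the H&W tuning signature.
```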
One of the earliest large-scale, scientific visualizations of their organization is, to my knowledge, this plate by Gary Blasdel:

(see http://hubel.med.harvard.edu/book/b30.htm)

This image colors a large patch of primary visual cortex, each color representing those edge detectors' sensitivity to a particular orientation. It was taken from a monkey's brain, but rest assured I have very similar stuff in mine.

Now, this is almost textbook data. You may find more recent imagery (and possibly papers) on the subject, googling for "functional maps of orientation preference".

What one needs to consider when trying to interpret any of the images above is that, besides this orientation business, the V1 layout is largely retinotopic: two nearby regions on the cortical patch react to events occurring in two nearby regions of the visual field.

You can thus infer from the beautiful patchworks above that each local area in the visual field is associated with a patch of nicely arranged cells, covering the whole set of possible orientations for an edge happening to appear in that area.

So when a CNN uses a fixed edge-detection function over a local area, with a different output for each orientation, and applies this function as a convolution kernel over the entire input as its first filtering step, it is in essence trying to simulate the output of those 'simple cells'.
V1 simple cells do that in a massively parallel way across each local area of its surface (and hence of the visual field). Note that the concept of a 'local area' of the visual field is a lot fuzzier in brains than in CNNs, and arguably more continuous than discrete. Nevertheless, what is alike is that both are local, and both perform similar local edge detection across the whole input.
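A minimal sketch of that first filtering step, assuming a toy bank of four fixed, hand-written 3x3 edge kernels (my placeholders for four simple-cell orientations) swept over a small image in pure NumPy:

```python
import numpy as np

def correlate2d(image, kernel):
    """Naive 'same'-size 2-D correlation; enough for the illustration."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

# Four oriented edge kernels standing in for four simple-cell orientations
# (each label is the orientation of the edge that kernel prefers).
bank = {
    "horizontal": np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]], float),
    "vertical":   np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], float),
    "45 deg":     np.array([[0, 1, 1], [-1, 0, 1], [-1, -1, 0]], float),
    "135 deg":    np.array([[1, 1, 0], [1, 0, -1], [0, -1, -1]], float),
}

image = np.zeros((8, 8))
image[:, 4:] = 1.0  # a vertical luminance edge
for name, k in bank.items():
    print(name, np.abs(correlate2d(image, k)).max())
# The 'vertical' channel responds most strongly to the vertical edge.
```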

What can be linked to @Bitking's remark about the scanning nature of our visual perception is that, in modern CNNs which allow their convolution kernels to learn, what differentiates a 'convolution' layer from a classical one with respect to learning is that the same set of convolution cells in the model is fed, in turn, the input of each and every 'local area' composing the full input picture. Allegedly, the structure of our visual world, and the fact that we constantly move our eyes over it, would have exposed each of our 'natural kernels' to statistically similar data. Once again, this part of a CNN model and what V1 does would match.
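A toy illustration of that weight sharing: during learning, one shared kernel receives an update from every local patch of every input, as if a single cell were shown all 'local areas' in turn. I use Oja's Hebbian rule here purely for illustration; no claim that cortex implements exactly this.

```python
import numpy as np

def train_shared_kernel(images, size=3, lr=0.01, seed=0):
    """One kernel, updated by all patches of all images (weight sharing)."""
    rng = np.random.default_rng(seed)
    kernel = rng.normal(scale=0.1, size=(size, size))
    for image in images:
        h, w = image.shape
        for i in range(h - size + 1):
            for j in range(w - size + 1):
                patch = image[i:i + size, j:j + size]
                y = np.sum(kernel * patch)               # the cell's response
                kernel += lr * y * (patch - y * kernel)  # Oja's rule on shared weights
    return kernel  # shaped by the statistics common to all local areas
```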

Now, there's more to V1 than 'simple cells', for which we already have a few proposed models. There are complex cells. Color blobs. Cells concerned with stereoscopy, and all kinds of stuff. There is still a lot to be found. The very layout of this organization in cortical topology (in contrast to, say, simply studying CNN kernels with an ability to learn) is arguably interesting in itself.

Anyway. There are large holes in our knowledge even about V1, but it's also one of the best-specified patches of cortex we have, straight from the lab.
As I expressed in the very first post… I'm far from the first person to be interested in V1, for precisely this reason. And… maybe I'm a train late here. Could be. I'm just willing to try to inject our new understanding of NMDA spikes, also JH-style prediction, also possibly wave-interference ideas… lots of stuff, really, into those kinds of studies, and see what comes out of it.

5 Likes

As we all know, visual recognition works well without eye movements, so I would leave eye movements out of the discussion.

Actually - you can't leave out eye movements.
If you do, your vision fades to a dull grey field very quickly.

Putting these micro-saccades aside: part of effective scene recognition is examining the scene by moving the eyes. If you force fixation on a central dot, you cripple vision and dramatically reduce its effectiveness.

I don't have papers handy here at work, but I may be able to support this later.

If you do decide to leave out saccades, you'll need to replace them with another form of sensor movement / perspective change.

1 Like

Thank you, I hadn't thought of it from this point of view. I agree, this can be considered an inspiration for CNNs, although the differences are significant.
For instance, the position of an edge inside a kernel matters, yet we know of nothing like different kinds of simple cells for the same edge angle. Also, we can find other patterns among the filters of the first layer of a CNN - curves, textures, etc. - again, not what we find in V1. Finally, to make a CNN work well, you typically need many layers, which is not what we see in the cortex.
BTW, talking about the kernel analogy: does anybody know the size of the receptive field of simple cells in V1?

From one of the authorities in the field:
http://hubel.med.harvard.edu/book/b10.htm

It varies depending on where in the retina you look.
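Purely as an illustration of the shape of that relationship (the constants below are made-up placeholders, not measured values; see the linked chapter for real numbers): receptive-field size grows roughly linearly with eccentricity, smallest at the fovea.

```python
def v1_rf_diameter_deg(eccentricity_deg, foveal_size=0.2, slope=0.1):
    """Toy linear model of RF size vs. eccentricity; constants are placeholders."""
    return foveal_size + slope * eccentricity_deg

for ecc in (0, 5, 10, 20):
    print(ecc, v1_rf_diameter_deg(ecc))  # small at the fovea, larger peripherally
```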

1 Like

Well, all of H&W's experiments, and most others, were made with the eyes fixed by anesthesia.
Also, we can recognise a key object in a complex image in less than 15ms - that's just not enough time to make even one saccade.