Project : Full-layer V1 using HTM insights


Although I understand your concern with the car-parts analogy, I think the quote above is precisely why there is hope of finding something interesting (unless I misunderstood what you were referring to, and you're speaking about subsequent processing). Anyway. How come the first step of cortical processing after the retina forms in roughly the same way whatever you look at? I don't have the answer, and the answer seems interesting in itself.
Don’t you think modelling V1 will give us clues into it ?

Yes. And no.
Maybe the western African coast was not the most interesting part of the story for volcanology. However, the observation that it does seem to match the Brazilian coast was not at all uninteresting for subsequent developments in the field.
To my eyes there are striking "what the hell is up with this mid-Atlantic ridge?" anomalies in V1 and its visual input.

I do not want to sound defensive about that question - and there is also the very real possibility that I run out of gas or just plainly 'fail' at it… However, reading you, I have the feeling that something very core to my approach to it was not quite clear. Either that, or there's something very obvious to your eyes which I'm being oblivious to. Maybe we can sort this out ^^.

So. Putting aside "me comparing myself to my betters" once again, please consider the following analogy as a hint to why I'm struggling to give your question any meaningful answer.
“Toronto, June 18th, 1902:
Imagine you successfully derived an equation of motion where the speed of light is held constant - what would be your next step? It’s unlikely this part will give you sufficient basis for going further.”


My point is that everything up to V1 is just preprocessing, or creating the primal sketch in Marr's terms. You can reach a similar result much faster using something like scikit-image. I don't see any mystery in the fact that the first step of cortical processing after the retina forms roughly the same whatever you look at. You could use your chemical receptors instead of the retina and still be able to see. To me, this means it's tough to find insights about the cognitive part of vision in the retina-to-V1 part, because you can completely change it without breaking the whole process.
I believe that if we want to get something useful, we should focus on the cognitive part of vision from the very beginning. You can always go back to retina/LGN/V1 modelling later if you need it. At least at that point you'll know which aspects of it are essential to you.
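For what it's worth, the "you can get this much faster with off-the-shelf tools" point can be made concrete: scikit-image ships ready-made oriented filters (e.g. `skimage.filters.gabor`). The sketch below builds the same kind of V1-like oriented filter in plain numpy, just to show that a hand-built Gabor filter responds selectively to edge orientation with no learning involved (all sizes and parameters are illustrative, not taken from either side of this discussion):

```python
import numpy as np

def gabor_kernel(theta, size=15, sigma=3.0, wavelength=6.0):
    # Oriented Gabor filter: Gaussian envelope times a sinusoidal carrier.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # coordinate rotated by theta
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    k = envelope * carrier
    return k - k.mean()                          # zero mean: flat regions give no response

def filter_energy(image, kernel):
    # Total squared response of a valid-mode 2D correlation (naive loops).
    kh, kw = kernel.shape
    ih, iw = image.shape
    energy = 0.0
    for i in range(ih - kh + 1):
        for j in range(iw - kw + 1):
            energy += np.sum(image[i:i + kh, j:j + kw] * kernel) ** 2
    return energy

# Synthetic stimulus: a vertical edge (dark left half, bright right half).
img = np.zeros((32, 32))
img[:, 16:] = 1.0

vertical = gabor_kernel(theta=0.0)          # carrier varies along x
horizontal = gabor_kernel(theta=np.pi / 2)  # carrier varies along y

# The vertically oriented filter responds far more strongly to a vertical edge.
print(filter_energy(img, vertical) > filter_energy(img, horizontal))  # True
```

Which is exactly the debated point: here the oriented filter is handed to the program, whereas in cortex something equivalent has to self-organize.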


First, one possible point of agreement: I'm not arguing about the retina here :wink:

Now, V1

Okay well, I do.
I don't know of a detailed explanation for it. Do you know of any?

I'm not arguing with that. To the contrary :slight_smile: But I don't take it as an explanation per se. I'm simply viewing it as further evidence of the same kind of surprising thing we need to get clues about.
There are also experiments in which retinal input was rewired to other parts of cortex, and those parts would form the same way… The "anomaly"/"surprise", to me, is the fact that cortex does that. Consistently. Whether in V1 or anywhere else. Anywhere we'd submit visual input.

Some of my initial questions leading to that topic:

Would you object to that ?
Or is this simply a misunderstanding, us debating this now ?

The very beginning… ain’t that V1 ?

(Note that if what you’re saying after all, is simply that I seem to spend too much time fiddling with retina concerns… then I’d agree with you. Quite entirely. But I’m cursed with that very peculiar need to spend lots of time on the details)


Playing the devil's advocate here: why is the processing in V1 NOT the start of vision processing?
If “cortex is cortex” why would the next map be different?


Nothing controversial with respect to the classical view: instead of the retina-linked dotty information, the brain needs to form an image representation linked to its model of the world at a meaningful level of detail, so it creates a primal sketch based first of all on object edges (plus a lot of other work, which we don't need if the input is a static mono image with a fixed receptive field).

It’s amazing, but perhaps not that surprising, considering the homogeneous structure of our cortex.

V1 doesn't work in isolation, so it's hard to answer your question without clarifying some details. Nevertheless, if we are talking about V1 with input only from the LGN, I believe the statement is wrong.

I guess V1 is not enough.

It's not about the level of detail, but more about perspective: find the principles first, then go to implementation details.


It is, because the brain needs the elements formed here (or a more abstract representation of them instead).

It depends on what you consider to be the map. I'm not making a direct analogy here, but let's compare it with an ANN (as a metaphor only). The mapping of each layer to the next is technically the same, but the real map is the connections with their weights and biases, and it's completely different for each layer.
When we talk about the homogeneous structure of the cortex, we are just saying that it can create any needed structure at the level of information processing. It doesn't mean it literally processes information the same way everywhere.
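The "same mapping form, different actual map" side of the metaphor can be shown in a few lines. This is only the ANN half of the comparison, with toy sizes and random stand-ins for "learned" parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    # Every layer applies the exact same mapping *form*: an affine map
    # followed by a fixed nonlinearity.
    return np.tanh(W @ x + b)

# Two structurally identical 3 -> 3 layers, but with different parameters.
W1, b1 = rng.normal(size=(3, 3)), rng.normal(size=3)
W2, b2 = rng.normal(size=(3, 3)), rng.normal(size=3)

x = rng.normal(size=3)
out1 = layer(x, W1, b1)   # first "map"
out2 = layer(x, W2, b2)   # same form, different weights

# Same shape, same formula -- yet two different transformations.
print(out1.shape == out2.shape, np.allclose(out1, out2))
```

Same homogeneous "rule" everywhere; what each layer actually computes is entirely in the learned connections.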


… and the connections to the sub-cortical structures; V1 has them in spades.

I am working through the “three visual streams” paper again and it is doing a lot with these connections to the thalamus.

Based on this reading I will go out on a limb and say that the V1 area is part of the vision stream, and that understanding this area is likely to yield important insights into its relationship with these sub-cortical structures.

The copious animal research data offers real possibilities for making the related computational functions understandable. From the foundational Hubel & Wiesel research we have a pretty good idea of what is being done - we need to work out how it's being done.


The "principles" of cortical function remain quite elusive. And in particular for your question of "how do you see a realization of invariant representation", I'd be surprised if anyone had a definite answer.
I’d love to find out. But I can’t claim I have cracked the thing even before I started.

I liked Marr's approach, but I'm turning it the other way around. The fact that, in his view, the brain forms a "primal sketch" and so on is possibly a good systematic explanation of visual function, but it does not readily give us an understanding of the cortical process itself.
"How" it forms that way in V1, as a clue to "how" any cortical patch forms the way it forms, and more generally processes information, is what I am after.

To this day, V1 has one of the best defined inputs and best defined outputs.
Nobody has a detailed model - I certainly don’t. And I don’t have electrodes and a lab.
I have the ability to read, and to experiment with computational models.

If you'd rather have me state this in a reassuring "model first" way, then we could maybe envision the whole endeavor in such well-defined terms, albeit in an iterative manner.

  • "I hypothesize that a matrix of cells driven by pure Hebbian learning and no inhibition, when fed with a signal akin to LGN output, will have formed cells akin to V1 edge detectors."
    ° Let's find out if it actually does.
    ° Oh, it does not. Okay, let's work again on the theory.
  • "I hypothesize that a vanilla HTM spatial pooler, when fed with a signal akin to LGN output, will have formed cells akin to V1 edge detectors."
    ° Let's find out if it actually does.
    ° Oh, it does not. Okay, let's work again on the theory.
  • Rinse and repeat…
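The first hypothesis in that loop can even be pre-tested on toy data. Below is a minimal sketch (my own illustration, not a claim about the actual project) of a single cell under Oja's rule - Hebbian learning with a built-in normalization term - fed noisy "LGN-like" patches dominated by a vertical edge. The cell's weights tune themselves toward the edge pattern; every stimulus and parameter here is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "LGN-like" input: 5x5 patches dominated by a vertical edge pattern.
edge = np.zeros((5, 5))
edge[:, :2], edge[:, 3:] = -1.0, 1.0
edge = edge.ravel() / np.linalg.norm(edge)   # unit-norm target pattern

def sample_patch():
    contrast = rng.normal()                  # random contrast, sign may flip
    return contrast * edge + 0.1 * rng.normal(size=25)

# Oja's rule: dw = eta * y * (x - y * w), with y = w . x
# (pure Hebbian growth plus an implicit weight-normalization term)
w = 0.1 * rng.normal(size=25)
eta = 0.05
for _ in range(5000):
    x = sample_patch()
    y = w @ x
    w += eta * y * (x - y * w)

# The cell's weights should now be tuned to the dominant edge pattern.
alignment = abs(w @ edge) / np.linalg.norm(w)
print(round(alignment, 2))   # close to 1.0
```

A real probe would use natural-image patches preprocessed the way the LGN preprocesses them, plus a whole matrix of cells with some competition; this only shows that the "cells tune themselves toward edge detection" part of the hypothesis is checkable in a few dozen lines.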

That, and adding indeed other parts to the model when realizing that ‘a V1 sim in isolation is maybe not sufficient to produce a V1-like result’ yes… that’s kinda part of the plan. At some point.


Depending on the actual distinctions you put behind the words of ‘structure’ and ‘process’ we may agree here about something obvious to my eyes also, which I don’t consider relevant to the discussion… or finally have identified the fundamental, implicit disagreement here.

Homogeneity means to me that there's a common frame for initial cell positions and common learning rules. Of course the "weights" of the ANN analogy differ from one area to the next. They're learnt from each area's specific input. But understanding the learning rules themselves… or, for that matter, taking any step in that direction, however tiny, seems like a pretty big deal in itself.


I'm afraid it's impossible to do in isolation from V2, V4, etc. - it's an oversimplification to treat any of these zones as mostly separate modules, or as a strongly vertical stack of modules.

I agree, but it's possible to do it not by trying to exactly reproduce SOME of the details which are clear for now, but by looking for the key motifs.

Do we? As I see it, they reported some patterns in the behaviour of neurons in V1, V2, etc. (which was no doubt super useful), but they didn't provide a theory of what was really going on there. If they had, we would have an AI based on it; unfortunately, we don't have one yet.


That's why it has to be the first research target. Once it is found, everything else will become much easier to clarify. Without it, nothing else has practical meaning.

I strongly believe we will find out how it works by modelling different ideas first; then somebody in a lab will pin down the final details. There is enough collected data to find insights, or at least to check the plausibility of your theoretical ideas. So you have everything you need :slight_smile:

You said you know what the V1 output is. Why not start from there and find a way to go higher in the hierarchy? We don't really need another edge detector; we need a biologically plausible alternative to CNNs.


I’ve got a great idea - @gmirey can pursue what he finds to be an interesting challenge and YOU can find the biologically plausible alternative to CNN. Then everybody is doing what they want to do, WIN-WIN!


Again, my point is that it isn't a productive idea to try to clarify anything by modelling the visual pathway up to V1, because the output of this part is relatively simple and can be obtained in many different ways. So your working model won't prove anything but your coding skills.
On the other hand, the cognitive part of the visual pathway can perhaps be implemented in only a few biologically plausible ways, and a working model there would be a good starting point to go in any direction. Plus, it would be useful in itself.


I completely agree; it just looked like @gmirey didn't want merely an interesting challenge, but to use his model to understand some important things about the cortex. Sorry if I'm wrong, I'd really better do my own stuff 0:-)


And you are right :slight_smile:

It is. The ‘target’.

‘Modelling’…‘Checking’. Yup, precisely. I don’t see a computer model as a despicable end-product. It is an integral part of our R&D (or even theorization) toolbelt now, for modelling and checking. As much as a pen and paper for drawing boxes, arrows, and/or equations.

You do realize that what I find the most interesting known output of all, as far as V1 goes, is the output of the 'learning' function itself, right?
aka the end state of those cells and dendrites and synapses after exposure.

Assuming that you do… I don’t quite understand over what we’re disagreeing here.

  • If you think V1 formation is so complicated that it won’t work in isolation, then we’ll try to add parts of a hierarchy. I stated as much already. That endeavor could give us some evidence of this very requirement.
  • If you think V1 formation is so simple that any model would do, and thus we won't gain any insight from reaching it, then… well, I don't think it will be that easy. But right; it is a possible concern. If that turns out to be the case, we can always turn the 'probing' part on its head and look for models which fail. Or strip ingredients one by one, to get a clue about which ones are necessary…

We’ll learn ‘something’ either way.

I don't know how my coding skills are relevant to the discussion, since you did understand that I don't want to hardcode V1-like output (or didn't you? the purpose is not a clever edge detector for its own sake), but to let a model learn online and see if its cells tune themselves towards edge detection and the like.

Now, if your concern is that I can't model anything before having a well-defined model in the first place, I'll restate that 'let a model learn online' in the sentence above will more likely turn out to be an iterative 'let several, many, theorized models, in turn, learn online'. And see which of them succeeds. I may already have some insights for the first tries, granted… but I'm not putting too much confidence in them anyway, and all these models (against which to "check the plausibility of your theoretical ideas") could very well be dug out/refined/invented as we go.

‘Invented’… Hacker-style :dark_sunglasses: since I’m no Einstein, sadly.

To conclude… I don't know how V1's decision to form 'simple' edge detectors when exposed to visual stimuli is relevant to (A)GI. But I strongly bet that it is. Relevant. V1 is cortex. We both agreed on that, it seems. And I believe that by witnessing concretely 'how' V1 would be driven to make that particular choice, we'd gain insight into precisely that:
“What stands as relevant info and/or coincidences to wire to, from an (A)GI’s substrate point of view”.
Quite the nut to crack if you ask me.


I know, from others telling me so in the past, that I may sound cold while debating; but please understand, both of you, that I'm not upset at all.

I don’t have a precise understanding of why we’re arguing, mind you… but it seems like we are. To my mind, at this point, either I was not able to explain something very fundamental about that project, or I’m still not aware of some very fundamental objection you could raise against it. I really don’t know where to put the ball here. And I’d like to find out.

If there is still room for a very core misunderstanding about the goal of that project… like, I'd bet, the fact that "reaching a V1-similar self-formation, by online learning, which requires us to come up with biologically plausible learning rules common to all of cortex" is really the name of the game, then I'd rather keep discussing, and hopefully manage to expose this more clearly for anyone to read.

If, to the contrary, there exists such a fundamental objection to my approach… like, I don't know, say… "the same has been tried repeatedly and quickly reached a dead end", then I'd rather learn about it by discussing with you.



I’ve been lurking here, and I know how hard it is to communicate complicated ideas over text like this. It always seems like arguing, usually it is not. In this case, all parties are being quite civil, polite, and engaging. You folks are good eggs.


Let's go back to the beginning of my intervention here: I believe any new effort in the domain can be useful and insightful. Especially when we are talking about something like self-formation, which is cool in itself!
I just tried to share my hypothesis that starting with the hard part of the problem can, paradoxically, be an easier way to find universal motifs of cortical functioning. I may be wrong, or you may just be lucky and find the right angle starting from a more distant point.
So, good luck and keep us posted!


Actually, the idea that my endeavor sounds incomprehensible worries me.
I've thought about another way to explain it, maybe more amenable as an introduction for readers with an ML background.

Multilayer perceptrons, as known for decades, have already proved quite successful at, say, categorizing handwritten digits (e.g. trained against MNIST). Their internal layers did not show any abstractions we'd consider very clever, though… then people came up with the idea of adding convolutions to filter the input towards the start of the network. The performance of these more modern ANNs was much better, and the intermediate steps were driven towards more intelligible representations.

That sounded pretty good, especially given that all those messy biological V1 edge detectors actually seem to be doing convolution themselves. So we feel quite legitimate using this as a starting ground and developing better models on top of it, for simulating or understanding whole 'visual pathways', coming up with computer vision algorithms… or even giving birth to our global-scale theories of brains.

I was kinda para-dropped into the field at this state of affairs. My point is that simply being happy with those pre-made convolutions, to the point of feeling "oh, no… not another edge detector", seems like forgetting a little too soon that V1 is cortex, and that it had to decide (quite early in life) to form those kinds of convolutions by itself, only crunching on retina/LGN output (and unsupervised at that).

Well, I’m settled on understanding precisely how.

I believe the retinotopic locality of the cortical patches processing different parts of the visual field helps with that a lot, so I've tried to understand topological relationships in some detail first… But there is probably more to it than that.
And, both topology and any bit of “more to it” we can understand here, will likely prove relevant for understanding any other area of cortex.
Which in turn will likely prove relevant for global theories.


Besides our previous discussion: could you point me to any evidence that our cortex performs anything like spatial convolution? I regularly hear the claim that CNNs were inspired by our brain, but I have never encountered any related neurobiological clues.