Project: Full-layer V1 using HTM insights

Hi everyone,

I'm starting this as a companion topic to my bewilderment over there:

As it turns out, this idea is anything but new (yeah, well). If one takes a look at Scholarpedia, for example, V1 is singled out as a "prime workbench" for such studies, although I still haven't managed to gather much information about those studies.

Not breaking my habit of posting too quickly: what it looks like at first glance, though, is that the people studying this seem for the most part well versed in algebra, Gaussian filters, and "quite standard" ANN models.

I’m not.

But my desire here is to take a more ad hoc approach: leverage HTM insights about the proximal/distal dendrite dichotomy, take topological constraints into account, maybe add a pinch of thalamic influence, put all of this into arrays of "layers", and see how it could self-organize towards those well-documented, expected responses. If it does not, add fierce topological inhibition heatmaps into the mix. Or wave phenomena. Or a deeper hierarchy. Whatever. In short, throw at it all the hacker-minded inspiration we may have, until it does start to organize like V1.

I may not be able to put godly amounts of time into it, but I can code some C++ at the very least, and possibly a bit of GPU shader code if need be.
Anybody interested?


Actually, I think I've been doing something like this for a bit, though it doesn't learn yet, and it only goes up to basic end-stop cells. (It's also a bit slow, since I decided to try getting multiple scales out, and that's still in progress.)

Feel free to check this code out: https://github.com/PyGPAI/PyGPNeural


Nice stuff @SimLeek :slight_smile:

I'm a very slow Python reader; would you care to try to explain your approach a little here?

I've read your post on Who is currently Building HTM Systems? as well as the GitHub readme; however, I can't say I'm confident I have a grasp on it all.
From what I understood, PyGPRetina simulates the output of retinal cells when presented with visual input (such as a video), and you then send that to a hardcoded V1 sim?

If that is the case, I'm very, very interested in the retina stuff.
Maybe I see a little less of a match for the hardcoded V1, although you may have developed insights here that I currently lack. To answer your concern on the other post, I'm quite inclined to believe that V1 functionality can indeed be faithfully engineered and hardcoded, but as you have certainly understood by now, my whole drive here is to see how it could self-organize from a signal such as your retinal output.

(You also seem able to write GPGPU code, while my experience with shaders is mostly restricted to graphics rendering. So, many thanks for saying hi here ^^)

Regards,
Guillaume

I believe this is as good a place as any to link to that MIT course about vision.
Available as videos, starting here:

I've currently only "attended" lectures 2, 3 and 4… a long way to go through the 13 (I believe) sessions, but I'll continue watching them attentively, of course. Awesome stuff so far.


That's pretty much it. As for how it does that, it's currently being modified. It used to output more of a color if the center color was brighter than its surroundings, or vice versa, and I could change the size of the surroundings. Now, though, I'm changing it so I get out multiple scales of images.

I'm not sure if the retina actually gives out multiple scales of images, or if that's done somewhere in V1, but eventually V1's equivalent of the entorhinal cortex needs that so it can recognize scale-invariant features. (Before the multi-scale changes: if you were to walk away from the screen, you might notice some of the end-stop colors going away. With multi-scale, they're still there, but in a larger image.)
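
For reference, here is a minimal sketch of one way to produce multiple scales of an image (a Gaussian pyramid). This is my own illustration, assuming OpenCV and NumPy as dependencies; it is not PyGPNeural's actual code.

import cv2  # assumed dependency; any blur-then-downsample routine would do
import numpy as np

def image_pyramid(frame: np.ndarray, levels: int = 4) -> list:
    """Return `levels` progressively blurred, half-resolution copies of `frame`."""
    scales = [frame]
    for _ in range(levels - 1):
        # pyrDown blurs before halving each dimension, which avoids aliasing.
        scales.append(cv2.pyrDown(scales[-1]))
    return scales

Each level could then be fed through the same center-surround code, giving receptive fields of several effective sizes for roughly a third more pixels than the original frame alone.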

Honestly though, I should probably give a few presentations on all the stuff I’ve done for that repository, once my work lets up.

Of course. If you need any gpgpu stuff, I might be able to lend you a hand. If not, I should at least be able to give you advice.

Thanks,
Josh


I see, so do you have, like… a bunch of RGCs per video pixel? With most type combinations (like ON/midget…) for each? Or something more random? A square neighborhood or hex? Fovea-like input with cones only, or cones and rods in some proportion?

I believe each RGC then produces a scalar value? Or would it fit somewhere into an all-or-nothing HTM scheme?

Hm, I believe it does not. It's quite straightforward, direct "sensor" stuff up to that point, coming from quite straightforward optics.

Would be great, but no rush if you’re onto something else

sounds sweet :wink:


Alright, I’m gonna try to answer all of those questions…

Pretty much. The shaders each run their code per pixel.

I combined the on-center and off-center cells' outputs to make things easier for myself. I set the entire image to 127, then added to that number for on-center and subtracted for off-center.

I initially thought the black-and-white ones were more like midget cells, and the color ones were parasol cells. Now, though, I'm starting to think almost all RGCs give some color output. Under that assumption, the code would pretty much only work as midget cells now. (I think this would end up being too much for the brain, as the bundle of nerves required to transport all of it would be bigger than the retina and would displace more brain cells.)

The smallest 'surround' checks a square perimeter, but that's because pixels are arranged that way on computers. The larger checks use a normal distribution to determine where their input comes from, with each normal centered on locations around a square.

So it's still not completely accurate to biology, but it's fast, and better than a plain square.
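
To make that concrete, here's a minimal sketch of the general idea (my own illustration in NumPy, with hypothetical names; not PyGPRetina's actual API): a combined ON/OFF center-surround response built around a 127 baseline, with the surround estimated by sampling the image at normally distributed offsets.

import numpy as np

def center_surround(gray, sigma=3.0, samples=16, rng=None):
    """Combined ON/OFF center-surround response around a 127 baseline (illustrative)."""
    rng = rng or np.random.default_rng()
    surround = np.zeros(gray.shape, dtype=np.float32)
    for _ in range(samples):
        # Pick a normally distributed offset and accumulate the shifted image.
        dy, dx = rng.normal(0.0, sigma, size=2).round().astype(int)
        surround += np.roll(gray, (dy, dx), axis=(0, 1)).astype(np.float32)
    surround /= samples
    # Positive where the center is brighter than its surround (ON), negative otherwise (OFF).
    response = gray.astype(np.float32) - surround
    return np.clip(127.0 + response, 0, 255).astype(np.uint8)

A shader version would do the same thing per pixel, just without the Python loop.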

At first I used black and white to mimic rods; however, rods are physical things, and webcams don't have them. Rods actually respond best to a different wavelength than any of the cones, so I think it might be better to invest in an infrared camera if I'm going to work on rods. It may not be the same wavelength, but it's more useful at night, it would involve a lot of the same considerations as working with biological rod cells, and it's in my budget.

Yes, it produces a scalar value. The V1 code is what's needed to translate that into something usable in HTM.
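
One way that translation could look, as an illustrative sketch only (not what the repository actually does): keep the strongest few percent of the scalar responses and emit a binary array, which is the kind of sparse input an HTM spatial pooler expects.

import numpy as np

def to_sparse_binary(responses, sparsity=0.02):
    """Binarize scalar responses by keeping only the strongest `sparsity` fraction."""
    flat = np.abs(responses).ravel()
    k = max(1, int(sparsity * flat.size))
    threshold = np.partition(flat, -k)[-k]  # value of the k-th strongest response
    return (np.abs(responses) >= threshold).astype(np.uint8)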

Well, now I'm disagreeing with myself. With the different sizes of RGCs, such as midget, parasol, bistratified cells, etc., it seems like the retina does produce multiple scales of images. Though those cells sometimes project to different areas of the brain depending on their size.

Thanks. I should be a bit more free around May, so maybe then.

Edit: Ah, this looks like it explains the different RGCs fairly well: Parallel Processing Strategies of the Primate Visual System - PMC


Hey @SimLeek, thank you for trying to answer in such detail :wink: Sorry for that heavy carpeting of questions at the start. I wanted to convey that I'm interested in "this kind" of detail.

From your answers, however, I feel I do not have the required expertise to address each of those points. We could, for example, discuss that parasol question and your multiple-scale proposition (I believe parasol cells differ in contrast sensitivity and firing sustainment duration, rather than having a direct influence on invariant representations). But ultimately I'm not currently confident enough in any of this myself to argue much about it.

What is already almost certain in my view, however, is that all those questions about what does what in the retina (and hopefully in the geniculate as well) are by now well documented somewhere, and that we'll be able to do a very biologically convincing simulation of it. That is exactly what you seem to have been working towards, and that's really great.

I believe that, for the purpose of a retina sim serving as input to a heavy V1 study, and if you're interested in this approach, I may suggest an orientation: you spoke of optimization, for example using the video layout directly because it was fast and convenient. I suspect that a fully layered V1 sim, together with the required bits of hierarchy, will be really computationally expensive (and that's even before assuming wave-like implementations). So I can safely bet that, were you interested in adding more complexity to the retina (going full hex grid, adding a stochastic perturbation to it, or varying density away from the fovea center) and generally allowing the retina shader quite expensive operations, I'd be fine with it, as it should have minimal impact on the overall clock.
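
To make the density-variation idea concrete, here is a small sketch (my own illustration, not part of either codebase) of a receptor lattice whose spacing grows with eccentricity and has a little stochastic jitter, as a crude stand-in for the fall-off of cone density away from the fovea.

import numpy as np

def foveated_lattice(n_rings=40, receptors_per_ring=64, fovea_radius=2.0,
                     growth=1.08, jitter=0.1, rng=None):
    """Return (N, 2) receptor coordinates on rings whose radius grows geometrically."""
    rng = rng or np.random.default_rng()
    points = []
    radius = fovea_radius
    for _ in range(n_rings):
        angles = np.linspace(0.0, 2.0 * np.pi, receptors_per_ring, endpoint=False)
        # Small angular jitter so the lattice is not perfectly regular.
        angles += rng.normal(0.0, jitter / radius, size=receptors_per_ring)
        points.append(np.stack([radius * np.cos(angles), radius * np.sin(angles)], axis=1))
        radius *= growth  # spacing (and receptive-field size) increases with eccentricity
    return np.concatenate(points, axis=0)

Each returned coordinate would be one receptor's sampling position in the input image; a hex lattice or a more retina-like mosaic would just be a different generator for the same list of points.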

Take care,
Guillaume

I guess I should also try to lay out a few ideas.

  1. Input: a high-quality visual source
  • Either high-res video showing almost no compression artifacts, or same-quality computer rendering.
    The full suite should ultimately allow either one.
  • A first try could accommodate RGB at 8 bits per channel, with provision for allowing/requiring HDR input later.
    We could make the case that the unmodeled iris would have brought incoming illumination into a manageable range, and hypothesize that temporary sun dazzling (or, conversely, the delay before low-luminance adaptation) is unlikely to be a key factor in V1 learning. Ultimately, though, it may be that the still-high range of human response cannot be conveyed well enough in 8 bits for our final purpose.
  • Provision for a short-term upgrade of the model to binocular vision.
    Stereoscopy is likely to play a great part in V1 development, and input from both eyes is merged in a very structured way in V1. We thus can't do a very biologically accurate mapping from retina to cortex without taking both eyes into account.
  2. First transform, sampling: input "texture" from coordinates in the receptor map
  • With a center-point parameter.
    We'll ultimately need to support and compute saccade positions, with the foveal point at different positions over time within the larger, fixed-scale input.
  • Having devised a receptor-map layout allowing density variation away from the center.
    However, there is a possibility that we'd already get interesting results modeling only a smaller part of the whole V1, around the fovea as my best bet. So the density variation in that case is not necessarily huge, and we could possibly have only cones to consider.
  3. First transform, operation: RGB to scalar excitation of each receptor
  • Trying to derive a typical wavelength spectrum from the RGB input, and transforming that into a response intensity for the wavelength-dependent sensors (see the sketch after this list).
    I believe you do some of that already, @SimLeek?
    Maybe there's no need to model rod specifics if we stay centered on the foveal region; the 3 cone types alone could be fine.
  4. Second transform: sampling the receptor map from coordinates in the RGC map, and filtering like RGCs
  • Taking into account the typical responses and extents of the different types of RGC.
    Your work seems well developed down that path, again, @SimLeek.
  5. Third transform: geniculate interactions?
  • Need to study this further.
  • Does it have a say in saccades?
  6. Mapping to V1
  • Need to precisely study the stereoscopic mapping to V1.
  • Need to precisely study what targets which layers.
  7. Cortical stuff
  • Need to precisely study pretty much everything at this point.
  • First try with V1 and V2.
    Hopefully with a connectivity model for each layer and a layer-wise clocked simulation.
    Provision for trying out cell-per-cell wave propagation at some point.
    Provision for higher visual areas; I believe it won't take long for them to stand out as required.
  • An SMI somewhere, driving the saccades for the eyes.
  • Using the HTM model for pyramidal neurons.
    Provision for allowing different excitation levels, with maybe more of an impact on local inhibition than on the cell's own transmission?
    Provision for allowing coincident apical and distal basal activation to make the cell fire on its own?
  • Hebbian learning fusing together concepts of fast, "daytime" memory with post-sleep synaptic reinforcement.
  • Non-Hebbian, slower organization following… a "drive" for each cell to be a pertinent data analyst.
    Leading to competition between cells in the context of fierce local inhibition imposed by other, strongly firing neighboring cells.
    Maybe operating on a day or week scale?
    Each almost-never-firing cell gets a shot at growing new segments.
    Each almost-never-firing or too-often-firing cell gets a shot at fully switching its synapse connectivity or even its whole segment topology??
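
To illustrate item 3, here is a minimal sketch of turning an RGB pixel into three scalar cone excitations. This is my own illustration: the matrix below is an approximate RGB-to-LMS conversion (Hunt-Pointer-Estevez-style coefficients) and the gamma handling is deliberately crude, so treat it as a placeholder rather than a vetted colorimetric pipeline.

import numpy as np

# Approximate linear-RGB -> LMS (long/medium/short-wavelength cone) matrix.
# The exact coefficients are an assumption for illustration; a real pipeline
# would start from properly linearized sRGB and a calibrated matrix.
RGB_TO_LMS = np.array([
    [0.3811, 0.5783, 0.0402],
    [0.1967, 0.7244, 0.0782],
    [0.0241, 0.1288, 0.8444],
])

def cone_excitation(rgb):
    """Map an (..., 3) array of 8-bit RGB values to scalar L, M, S excitations in [0, 1]."""
    linear = (np.asarray(rgb, dtype=np.float32) / 255.0) ** 2.2  # crude gamma linearization
    return linear @ RGB_TO_LMS.T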

I could maybe develop things further; this is coarsely brushed, but I feel I need to start laying things out.
Any comments are of course appreciated.

Please note that most of that stuff is subject to change as I (we?) learn more about the field, and most of the ideas about the cortical layout will have to evolve, get refined, or get linked with other brain systems until we see something happening.


It took me over three days to make it through them; it was a long, tiring video-marathon weekend. I'm sure glad I endured it. You will be too!

I ended up testing myself by adding what was shown in lesson 12 without surrounds, which still has useful signals:

'Spatially Specific Inhibition (below) is used to extract directional information from the three RGB colors.
'See page 11 of pdf in Notes folder, and online at: https://ocw.mit.edu/courses/brain-and-cognitive-sciences/9-04-sensory-systems-fall-2013/lecture-notes/MIT9_04F13_Vis12.pdf
'  also MIT OpenCourseWare, 12. Motion perception and pursuit eye movements https://www.youtube.com/watch?v=oPb9AWMN2fY&feature=youtu.be&t=1082
'Spikes are generated when light moves in a direction that does not inhibit next GanglionIn connection.
'Bipolar cells still function when connection is inhibited, spikes do not make it across junction to ganglion.
'
'Receptor  (0)      (1)      (2)      (3) RetRcp(E,N,C) for each Eye, 0=Left Eye, 1=Right Eye
'N=0 TO 3   |        |        |        |
'           |->      |->      |->      |-> To ON Bipolar circuit for
'OFF        |        |        |        |   GangOut(,,0,1) and GangOut(,,1,1)
'Bipolars  (0)      (1)      (2)      (3)
'           |        |        |        |____________________________________
'           |        |        |_______ | __________________________         |
'           |        |_______ | ______ | _________________         |        |
'           |_______ | ______ | ______ | ________         |        |        |
'           |        |        |        |         |        |        |        |
'RetInh-    |___(0)  |___(1)  |___(2)  |         |  (0)___|  (1)___|  (2)___| <- RetInh(E,N,C,1,0)
'(,N,,00) __|__   \__|__   \__|__   \__|__     __|__/   __|__/   __|__/   __|__  RetInh(E,N,C,LfRt,OffOn)
'         _____    _____    _____    _____     _____    _____    _____    _____
'           |        |        |        |         |        |        |        |
'           |________|________|________|         |________|________|________|
'           GangIn(E,N,C,0,0)          |         GangIn(E,N,C,1,0)          |
'                                      |                                    |
'                     LEFT Motion OFF (0)                 RIGHT Motion OFF (1)
'             <--------------         _|_           -------------->        _|_
'                               GangOut(E,C,0,0)                     GangOut(E,C,1,0)
'                               GangOut(E,C,LfRt,OffOn)              GangOut(E,C,LfRt,OffOn)
'
Private Sub CalcRGBSignals()
Dim E As Long
Dim C As Long
Dim OnOff As Long
Dim LfRt As Long
Dim N As Long
 For C = 0 To 2         'For each of 3 colors.
  For E = 0 To 1        'Left Eye=0, Right Eye=1
'From https://www.kenrico.com/media/bembook/28/28.htm
'Early Receptor Potential (ERP) generated by changes in photopigment molecules in photoreceptors due to light action.
'Usually causes positive R1 signal followed by negative R2 signal, followed after around 2 ms by a late receptor potential (LRP),
'which (combined with the remainder of the ERP) forms the main constituent of the a-wave,
'a corneo-negative waveform (see Figure 28.6). Both rods and cones contribute to the a-wave.
'Due to computer using RGB colors and food is always lit: critter only has cones, and does not need rods.
    For N = 0 To EyeFacetTo                         'For each Eye facet. EyeFacetTo is total-1, count starts at 0.
         RetRcpWas(E, N, C) = RetRcp(E, N, C)       'Retinal Receptor reading Was, before below change.
'Calculate amount of light now reaching photoreceptor.
         RetRcp(E, N, C) = RetRGB(E, N, C) / 255    'Fractional Retinal Receptor value from 0 to 1.
'Use the briefly stored Was value to detect intensity change since previous timestep.
         RetRcpCmp(E, N, C) = RetRcp(E, N, C) - RetRcpWas(E, N, C)   'Compare what value Was to new.
'Photoreceptor is a negative going (A-Wave) photomultiplier with an energy gain of some 10^5 times.
         RetAWave(E, N, C) = -RetRcpCmp(E, N, C) * (10 ^ 5)
'It is below assumed at least one full unit of the 10^5 is required to change the state of a bipolar cell.
'OFF bipolars are sign conserving. OFF cells respond to light decrease, ON respond to increase.
      If RetAWave(E, N, C) < 1 Then RetOnOff(E, N, C, 0) = 0 Else RetOnOff(E, N, C, 0) = 1
'ON bipolars are sign inverting. ON receptor activation leads to closing of channels, causing hyperpolarization.
      If RetAWave(E, N, C) > -1 Then RetOnOff(E, N, C, 1) = 0 Else RetOnOff(E, N, C, 1) = 1
    Next N
'Ganglion cells form fields by passing action potential spikes from multiple ON or OFF cells, in parallel.
    For OnOff = 0 To 1
      For LfRt = 0 To 1
              RetGangOut(E, C, LfRt, OnOff) = 0
        For N = 0 To EyeFacetTo
'Each ON and OFF cell here connects to two Ganglion cells, one for left and one for right.
          If RetOnOffInhbt(E, N, C, LfRt, OnOff) = 1 Then
             RetGangIn(E, N, C, LfRt, OnOff) = 0
          Else
             RetGangIn(E, N, C, LfRt, OnOff) = RetOnOff(E, N, C, OnOff)
          End If
'If any of the Retinal Ganglion Inputs are one then Ganglion Output is one.
          If RetGangIn(E, N, C, LfRt, OnOff) > 0 Then
             RetGangOut(E, C, LfRt, OnOff) = 1
          End If
        Next N
'Send action potentials to brain as Sensory to include in episodic memory.
              SnsVal(E, RetMotionN(C, LfRt, OnOff)) = RetGangOut(E, C, LfRt, OnOff)
      Next LfRt
'Inhibitory interneurons disable the ganglion input next to them, for one timestep duration.
'Now that it was used in the above step, its state is changed for the next time around.
      For N = 0 To EyeFacetTo
        If N < EyeFacetTo Then            'One less inhibitory interneuron than receptors.
             RetOnOffInhbt(E, N + 1, C, 0, OnOff) = RetOnOff(E, N, C, OnOff)
        End If
        If N > 0 Then
             RetOnOffInhbt(E, N - 1, C, 1, OnOff) = RetOnOff(E, N, C, OnOff)
        End If
      Next N
    Next OnOff
  Next E
 Next C
End Sub

It's in the newest Lab, but not yet needed. Signals are shown behind the eye, where the retina is located. As long as there are not too many visual sensors, it's a convenient place to show a readout of the retinal signals. It's then harder to say whether it's meant to be a compound eye or not.

I have been working towards a function-related development environment where readings of primary variables for various parts of the body show in their shape and color, and in the case of stomach contents also in size. There are then circular paws/feet/wheels/wings for the upper-level left/right and forward/reverse commands that can control any motor system, changing color according to speed. The right shape is then a circle, and getting fancy with articulated legs and such is all show and no go.

I'm always trying to simplify the model to basic parts and stay in 2D, so that the third axis is something that can later be added using routine math rules for going from 2D to 3D, something a mathematician with little knowledge of how a brain works could likely accomplish. This allows focusing on the 2D drawings shown in the video, where only one per direction is needed instead of a whole array of them at various rotations, plus the related code.

After considering how helpful it would be to human society to make the deepest, darkest mysteries of how our brain works easy for even a child to understand, and the invisible moving shock-zone (call it a "pit") arena environment that will fry all zombie bots, including those that fake it with behavior too repeatable to be the "real thing" and that will never really be capable of "loving" someone, we get Bawitdaba.

I'm not actually using direct video output. I'm updating the retina because it wasn't giving enough data to the V1. It should be more biologically accurate now, though more computationally intensive.

I won't be using a hex grid unless someone can give me a hex camera; however, I'll probably add proximal/distal excitations/inhibitions to the V1 end-stop output, which would lead to a hex grid in certain situations.

I do have the ability to put a lot of stochastic systems back into my retina simulation, and they've worked really well, so I most likely will. I have no problem with stochastic operations. I think that, with the way I have things set up, a rand() or norm(mean, variance) call should take less work than a sin() or cos() call. Actually, I don't think I've used a single sin() or cos() call yet, so the system should be really fast.

I have a dash cam on my car now, so I can do tests with that. That should satisfy the high-quality cam recommendation. I didn’t know about HDR though. I might want to look into that sometime.

While HDR and binocular vision are interesting, I'm going to wait until the other stuff is reasonably finished before I throw in the harder stuff. While there is some merit to developing against harder specifications if those are what you're ultimately developing for, that isn't really the case here. The retina code can and should work fine with a single camera, especially since most people don't have two cameras on their computers. And I'd argue that we can do a fairly biologically accurate mapping from retina to cortex with a single eye, because people don't completely stop understanding vision after losing an eye (which is what a stereoscopic point-cloud generator would do).

Do you mean like a lens transformation, like a barrel transform? I was thinking of just putting a lens on my camera.

Yeah, I believe I already do this.

Yup, already doing that. However, I'm trying to improve it with the multi-scale stuff. I'll tell you when I commit that.

You mean the LGN? I've looked into this. It seems like the LGN gets a much larger input from V1 than from the retina, so to me it seems like an advanced… predictive video smoothing/sharpening/etc. algorithm. Lesions in the LGN have produced schizophrenia in some cases, so it might be that the upper layers of V1 contribute to it.

While it’s really interesting, I’m currently able to get end-stop cells to show up decently on a standard webcam, so I don’t need any predictive filtering yet.

Actually, yeah, I should know which layers of the V1 are monocular and which are stereoscopic…

I'm not going to promise a V1 system that accurately simulates neurons, as that likely wouldn't be real-time. However, I'll try to make everything as open-source and easily modifiable as possible. I'll definitely need feedback on that 'easily modifiable' part, though.

I’m working on a gyroscopic camera for saccades though. Once I get my 3D printer on Sunday, I think I’ll try printing out the newest model and seeing if I can balance it better.

Oh, I definitely need to go through all that. I don’t think my model has Spatially Specific Inhibition yet.


I'm inclined to believe it ^^ although with two kids who by now have well-working V1, A1 and M1, and who are craving to make use of them, I doubt I'd be able to spend 24 hours a day of my weekend watching it.

With code and all, hey, you seem to have pondered that stuff for a while! This is really nice. I hadn't learned about that retinal inhibition mechanism at this point. Will it have to wait until lecture 12?

Seems wise, and a sensible approach for your top-down spatial framework.
I can't take that shortcut for studying V1 responses to edge orientation, though.

@SimLeek, I feel I'd need those presentations at some point to really understand where you're at and where you're heading :wink:

A few quick answers:

  • HDR is, in my mind, quite a long-term "possibly". Reading your answers, I feel like I went into too much detail about the retina part "down the line". Maybe I got carried away given the work you already presented. I believe what you have for the retina could already do fine for what I have in mind.
  • Binocular vision, however, is something with a possible impact on the complexity of the mapping, so I'll probably have it modeled early. It may also be the case (since it was originally recorded with stains instead of, I dunno, fiber analysis?) that the cortex has input from both eyes everywhere, and that the zebra pattern is emergent? In which case that very pattern width would give insights into connection lengths, inhibition strengths… (all speculative on my part; I should study that some more anyway).
  • Also, I understand your own requirements; fine with me. (And also your reservation that one eye seems enough to "see" in daily life.) For my part, I'd try to stick to modeling cases that look similar to the lab cases we have the most data to compare against.
  • Barrel transform: I wasn't thinking in terms of a transform on the continuum per se. In my mind it was more of a change of lattice. But yeah, now that you mention it, I believe this could be envisioned as a barrel transform. I hadn't even considered the possibility of a physical lens.
  • Geniculate: yes, I mean the LGN. I'm quite happy that you have already considered it.
  • Cortical stuff: I understand your project has a goal of its own and you want to process things in real time. I'd already be quite happy if I could use the retinal part; my model for V1 would diverge from there on, I believe. As for its openness and ease of use, that would be great, although at first I don't know to what extent I'd be able to understand, use, or plug into anything Pythonny, so my own feedback on that would be misleading.
  • Have fun with that printer :wink: I've never tried any of those.

V1 needs to detect 3D edges, so yes, not all three edge-orientation axes are then in the picture. In both cases, though, there are still edges at a given X,Y orientation that need to be included to calculate the Z component, which takes edges out of flatland, where they exist as a sudden change in light intensity around their rotational axis. An example is the two edges of the cyan color feeders, located at a given X,Y in space, which get further apart as a feeder gets closer.

Cortical columns also form a 2D surface. It's then possible to send waves that travel out from a center. Evidence I saw linked from another topic on this forum (not sure where) indicated that for an animal like ourselves, when the terrain goes up, as at the bottom of a cliff, the 2D map is tilted to that angle.

Starting from V1 adds the complexity of going from an egocentric 3D world view to an allocentric 2D map of the world at entirely the other end of the cortical sheet from where you're working. The purpose of my model is to give you a better idea of what needs to happen in between to provide coordinates for a map that can then be tilted to match the terrain. At that end it's inherently 2D, but where you're at right now it's inherently as 3D as it gets. You sure are ambitious!

I'm thankful there are others working on that part of the problem, where I wish I could be of more help, but as they say, "I can only go so far". At least you'd have something to connect to, where our reptilian brain meets motors and afterwards needs inhibition to be human, while still at least trying to control the motors on first impulse as a primitive lizard would. It's no wonder our thoughts are so filled with primal ones that have to be "kept to the imagination", or else there can be a workplace harassment lawsuit or something.

I think you saw it here:

Oops. @Gary_Gaulin, seeing your answer, I guess I misunderstood what you referred to as being two-dimensional. Having seen your in-code diagram of a retina slice, and since the top-down view requires the same "slices" of retina, I believe? I thought you were talking about that: the (polar?) 1D-ness of your receptor array, the 2D-ness of the retinal diagram, and/or the 2D-ness of the worldview.

In light of this, I was answering about the 3D-ness of my required corresponding retinal diagram. But now I take it you meant either the 3D-ness of the environment and perception, or even my strange insistence on binocular vision.

Well, as for the binocular obsession, maybe it could be simplified away. I don't know. In any case, my rationale behind it is best addressed by this:

As for the 3D-ness of the visual environment, well… it comes from either the current workflow devised by SimLeek, which is a physical camera, or my initial synopsis of using the output of a 3D rendering. I believe it's within reach of current 3D rendering techniques to give believable texture, illumination, and contrast properties at the "edges" of rendered objects. I also have some experience with this kind of thing, so, time considerations aside, it would even be within "my" reach.
I could go with totally abstract 2D drawings of squares and balls, as done for example in that paper, and give it a shot.
But I doubt it could work for my purposes, as in:

… And I hope the babies whose vision development was studied were still more familiar with the look of their carrycot, their mother's face, their toys, or the pace of the housecat than they were with the test screens.

Because here's the catch: I'm not really after 3D as a test case. I'm after 3D as training.

In fact, I'm not looking to directly compare what we sense in 3D to what V1 outputs, or to mess with edge transforms trying to see what matches where in the environment.
I'm looking to train a V1 model from as-common-as-possible input, so that if the V1 model then self-organizes, from its exposure to realistic visual stimuli, into a lab-testable edge detector (tested here with abstract, 2D edgy stuff), we'd know the whole model for that little patch of cortex is on the right track.
So that is maybe even more ambitious…
However, I'm not necessarily imagining this as a one-man attempt:

Regards,
Guillaume

@SimLeek, I am very interested in knowing which information from your simulated retina will be input directly into HTM. Basically, the retina provides magnocellular and parvocellular pathways for further processing at higher levels; which one do you put into HTM?

Does it? I don’t think it does. I believe the higher regions of the visual cortex work with depth better.

Here’s a paper investigating how depth emerges in the visual cortex: http://wyblelab.com/docs/jc/Finlayson_2017_NeuroImage_(1).pdf

Figure 3 shows that V1, and the areas around it, are mostly 2D.

Actually, the information would have to go through V1 processing too before it's suitable for HTM.

As for magnocellular and parvocellular, I believe I’m mostly focusing on parvocellular now, as the smaller cells are good for edge and feature detection rather than larger features of the whole image.


It seemed like you right away noticed that the "3D-ness of the environment and perception" that V1 senses is different from the downstream "2D-ness of the worldview", but I was not fully sure. In either case, I needed to explain more about the model I'm developing. To help everyone out, I went into additional detail about the V1-related things in your thread, instead of in mine for the other end of the cortical sheet.

I can add that, while trying different signaling rules, I have in the past formed what looked like ocular dominance columns, exactly two places wide, where each was in the opposite state of the other(s). I did not experiment much with it, but it was like a 2D-environment version of V1 (cortical signal only, no retinal input), where for a 3D environment a range of angles would exist in between the two opposite, extreme states. Signal-wise, it was at least a stable signal geometry for a blurry forest with way more connections than necessary, to be pruned down from. HTM has that type of sparsing process in it. I now wonder what kind of traveling waves might have been produced by throwing in some retinal signals, but I doubt I saved a copy. It is, though, something worth mentioning that I thought of as a possible clue for modeling V1 traveling waves. In that case you would look for rules to sort out the chaos going on in a newborn V1, and from what I recall that kind of signal jitter was included to force the network to settle into the most stable geometry, instead of whatever it would settle into right away and then stay that way.

The model I now have uses the rules for mapping and navigation, but there are other ways to use the rules than that. Changing the rules that each place uses in a given area of the brain may work for modeling the entire cortical sheet. I sense that in the best-case scenario there will be, much like the wheel example, a reinvention of HTM theory. Matt's new visual aid should have the same or very similar variables to work from, and be as useful as before, or more so.

The best way I know of to get a sense of the network behavior is to try everything possible, including a signal thrust/radiation pattern that favors pairing or other geometries, and see what happens. Starting with a V1 model for a 2D environment instead of 3D will greatly reduce the possibilities, while still containing edges of lines. In flatland, only one point of an edge line is seen, unless it lies exactly across the 2D plane, in which case it's like a wall of light at all points along it through that portion of its world. It's similar to a "slice" but has zero thickness. Two eyes with no intermediate 3D angles should only need the previously seen two-wide ocular dominance column structure. When the eyes see nothing, the network goes quiet. When something bright moves by, it (by signaling like an attractor) makes waves that travel at least along the length of each dominance column to V2; the time of arrival can be expected to influence what, at that point, ends up drawn out as a traveling wave where information from both eyes is combined.

Starting off with a stable pattern makes something like the surface of a pond, with what seem like canals feeding waves into it. It's the sort of information stream HTM cells were made for, where in this case the straw does not have to move.

I certainly could have been more specific. In this case the retinal detectors start the process: ON/OFF surround fields detect color-contrasting edge signals that V1 further extracts information from, so this might be a complicated one to find the perfect words to describe. I'll just agree that I could have done better.

The paper on depth emergence is new to me. I now need to know whether the information is being extracted from traveling waves moving across the surface, while the signals highlighted in a given area of the inflated brains in Figure 2 are the more powerful extracted signals that instead travel downward/inward far enough, and with enough energy use, to show up on fMRI. Your opinion?