Project : Full-layer V1 using HTM insights

V1 needs to detect 3D edges, so yes not all 3 edge orientation axes are then in the picture. In both cases though there are still edges at a given X,Y orientation that need to be included to calculate the Z component that takes edges out of flatland, where they exist as a sudden change in light intensity around its rotational axis. An example is the two edges of the cyan color feeders that become further apart as it gets closer, which are located at a given X,Y in space.

Cortical columns also form a 2D surface. It’s then possible to send waves that travel out from a center. Evidence I saw linked to from another (not sure where) topic in this forum indicated that for an animal like ourselves when terrain goes up as at the bottom of a cliff a 2D map is tilted to its angle.

Starting from V1 adds the complexity of going from an egocentric 3D world view to an allocentric 2D map of the world at entirely the other end of the cortical sheet you’re working from. The purpose of my model is to give you a better idea of what needs to happen in between to provide coordinates to map what can then be tilted to match terrain. At that end it’s inherently 2D, but where you’re at right now it’s inherently as 3D as it gets. You sure are ambitious!

I’m thankful there are others working on that part of the problem, where I wish I could be of more help but as they say “I can only go so far”. At least have something to connect to, where our reptilian brain meets motors, then afterwards needs inhibition to be human but still at least trying to control motors on first impulse as would a primitive lizard. It’s no wonder our thoughts are so filled with primal ones that have to be “kept to the imagination” or else there can be a workplace harassment lawsuit or something.

I think you saw it here:

Oops. @Gary_Gaulin, seeing your answer I guess I misunderstood what you referred to as being two dimensional. Having seen your in-code diagram of a retina slice, and by virtue of the top-down view requiring same “slices” of retina I believe ? I thought you were talking about that. The (polar?) 1Dness of your receptor, the 2Dness of the retinal diagram, and/or the 2Dness of the worldview.

In light of this I was answering about 3Dness of my required corresponding retinal diagram. But now I take it you meant either the 3Dness of the environment and perception, or even my strange sticking to binocular.

Well, as for the binocular obsession, maybe it could be simplified away. I don’t know. In any case, my rationale behind it is best addressed by this :

For the 3Dness of the visual environment, well… it comes from either the current workflow devised by SimLeek, which is a physical camera, or my initial synopsis as the output of a 3D rendering. I believe it’s within reach of current 3D rendering techniques to give believable texture, illumination, and contrast properties at the “edges” of rendered objects. I also have some experience in this kind of thing so, time considerations aside, it would even be within “my” reach.
I could go with totally-abstract 2D drawing of squares and balls as done for example in that paper, and give it a shot.
But I doubt it could work for my purposes, as in:

… And I hope babies whose vision was developmentally studied were still more familiar with the look of their carrycot, of their mother’s face, of their toys or of the pace of the housecat, than they were familiar with the test screens.

Because, here’s the catch : I’m not really after 3D as a test case. I’m after 3D as training.

In fact I’m not aiming at directly comparing what we sense in 3D to what V1 outputs, or at messing with edge transforms, trying to see which matches where in the environment.
I’m aiming at training a V1 model from as-common-as-possible input, so that if the V1 model then self-organizes, from its exposure to realistic visual stimuli, as a lab-testable edge detector (here from abstract, 2D edgy stuff), we’d know the whole model for that little patch of cortex is on the right track.
So that is maybe even more ambitious…
However, I’m not necessarily imagining this as a one-man attempt :


@SimLeek I am very interested in knowing which information from your simulated retina will be input directly onto HTM? Basically, retina provides magnocellular and parvocellular for further processing at higher levels? Which one do you put into HTM?

Does it? I don’t think it does. I believe the higher regions of the visual cortex work with depth better.

Here’s a paper investigating how depth emerges in the visual cortex:

Figure three shows that V1, and the areas around it, are mostly 2D.

Actually, the information would have to go through V1 processing too before it’s suitable for HTM.

As for magnocellular and parvocellular, I believe I’m mostly focusing on parvocellular now, as the smaller cells are good for edge and feature detection rather than larger features of the whole image.


It seemed like you right away noticed the “3Dness of the environment and perception” that V1 senses is different from the downstream “2Dness of the worldview” but I was not fully sure. In either case I needed to explain more about the model I’m developing. To help everyone out I went into additional detail about the V1 related things in your thread, instead of mine for the other end of the cortical sheet.

I can add that while trying different signaling rules I have in the past formed what looked like ocular dominance columns, exactly two places wide, where each was the opposite state of the other(s). I did not experiment much with it, but it was like a 2D environment version of V1 (cortical signal only, no retinal input), where for a 3D environment a range of angles would exist in between the two opposite states, the extremes. Signal wise it was at least a stable signal geometry for a blurry forest with way more connections than necessary to prune down to. HTM has that type of sparsing process in it. I now wonder what kind of traveling waves might have been produced by throwing in some retinal signals, but I doubt I saved a copy. It is though something worth mentioning that I thought of as a possible clue for modeling V1 traveling waves. In that case you would look for rules to sort out the chaos going on in a newborn V1, and from what I recall that kind of signal jitter was included to force the network to settle to the most stable geometry, instead of whatever it would right away settle to and then stay that way.

The model I now have uses the rules for mapping and navigation, but there are other ways to use the rules than that. Changing the rules that each place uses in a given area of the brain may work for modeling the entire cortical sheet. I sense that in the best case scenario there will be much like the wheel example a reinventing of HTM theory. Matt’s new visual aid should have the same or very similar variables to work from, and be as much or more useful than before.

The best way I know of to get a sense of the network behavior is to try everything possible, including signal thrust/radiation pattern to favor pairing or other geometry, and see what happens. Starting with a V1 model for a 2D environment instead of 3D will greatly reduce the possibilities, while still containing edges of lines. In flatland only one point of the edge line is seen, unless it lies exactly across the 2D plane, in which case it’s like a wall of light at all points along it through that portion of its world. It’s similar to a “slice” but has 0 thickness. Two eyes with no 3D intermediate angles should only need the previously seen 2-wide ocular dominance column structure. When the eyes see nothing the network goes quiet. When something bright moves by, the network (by signaling like an Attractor) makes waves that travel at least along the length of each dominance column to V2; time of arrival can be expected to influence what at that point ends up drawn out as a traveling wave where information from both eyes is combined.

Starting off with a stable pattern makes like the surface of a pond and what seem like canals feeding waves into one. It’s the sort of information stream HTM cells were made for, where in this case the straw does not have to move.

I certainly could have been more specific. In this case the retinal detectors start the process by ON/OFF surround fields detecting color contrasting edge signals that V1 further extracts information from, so this might become a complicated one to find the perfect words to describe. I’ll just agree that I could have done better.

The paper on depth emergence is new to me. I now need to know whether the information is being extracted from traveling waves moving across its surface, while signals highlighted in a given area of the inflated brains in Figure 2 are the more powerful extracted signals that instead go downward/inward for enough of a distance and energy use to show up on fMRI. Your opinion?

The David Marr Vision book; this has all been done before.
You owe it to yourselves to read it to keep from reinventing the wheel:


Yes, that’s the keeper. If it survives the test of time then that will have been a tremendous help understanding how a 2D map becomes applied to our 3D world. Thanks for adding the paper to this discussion too!

I’m not sure what to make of that one. It’s loaded with information but after scanning for what I in this case need to see accounted for by the word “wave” the only waves mentioned are in regards to sine wave examples. It’s from what I can see missing the kind of spatial reasoning I have been experimenting with and appears to assume a symbolic information processing system.

It may sound like I’m being overly demanding but if the virtual cortical sheet is not lighting up with wave action shown in more recent papers describing results of using new voltage sensitive agents and able to perform as expected in a two frame place avoidance test then it’s not a model that will impress modern neuroscience, therefore the search must go on.

That would be quite typical of me I believe. Jumping to an interpretation from a first photographic impression. Should have pondered about that some more, I’m sorry.
Yet I must say, I was quite impressed by the behavior of that lab rat you modeled, and so I’ve spent quite some time trying to decipher how you did it and what it was all about. In the end I’m not sure I’ve understood much of it. So, I’m still struggling to follow that approach. Please bear with my slow grasp on the matter. Adding to my confusion is that at times you can get quite metaphoric (eg, zombie bots); please consider that English is not my mother tongue :wink:

I’ll try to give my understanding another shot, and answer to your post later in the day (And also to your reply on your oscillatory topic). At the moment I’m starting to read Marr’s paper. Thank you again for that gem, @Bitking.

The oscillation has always been in the background of most neuroscience.
It is only very recently that this has been understood to be part of a distinct wave pattern.

While that is important, the older work where the oscillatory features were a background item is still fully relevant.

I would be wary of forcing wave behavior over focusing on the local functions. I personally think that much of the wave behavior comes from Thalamo-cortico-thalamic connections.


Been off for a while, finished reading David Marr’s Vision, watched several more from the MIT course, and looked at some papers.

That book, Vision, is where Marr develops on his proposition for 3 levels of analysis, which I had to punch somewhat before I was able to integrate it as an interesting and useful viewpoint, or more precisely, an interesting and useful method for expressing (and being aware of) different viewpoints when studying processes.

Carefully studying the visual system as he did, is precisely what I wish to avoid here. But his work on vision, derived from intriguing experiments in psychophysics, is definitely something to read. Most is concerned about how to solve several functional aspects of human vision, and his proposed framework for doing so, with lots related to the definition of a “primal sketch” and study of possible ways of sketching it, all the while solving for, eg., stereoscopy concerns, and amenable to higher level understandings such as adding an egocentric “depth” to the primal sketch (what he calls 2.5D) and inferring whole surfaces before switching to full-blown 3D semantics, which we maybe could call an allocentric representation here.

I’ve also finally dug out some of the work which was trying to explore V1 dynamics in the same way as I intended. Found those two (related) papers: “Receptive Field and Feature Map Formation in the Primary Visual Cortex via Hebbian Learning with Inhibitory Feedback” and “The Dynamics of Image Processing by Feature Maps in the Primary Visual Cortex”. Haven’t yet read every detail in those, but there are already a few things standing out :

  • What’s encouraging, is that the from-scratch formation of orientation selectivity in V1 cells seems quite doable, from simple visual stimuli, and in reach of << quite standard ANN models >>.
  • As a downside, however, the from-scratch formation of orientation selectivity in V1 cells seems quite doable, from simple visual stimuli, and in reach of << quite standard ANN models >>.

So… what to do ? maybe after I learn more about it all, there would be other well-known V1 features which are not captured by these, or shall I look at another approach and try to model concerns more related to SMI or something, like saccades ? or explore more of a hierarchy, I don’t know really.


On the subject of oscillation I recommend the book
Rhythms of the Brain by Gyorgy Buzsaki
Oxford Press 2006

How about a model like this for personal computers?

It would be nice to compare notes with the NEST group. At 2:40 in this video is a researcher modeling the visual cortex of a monkey:

My thoughts are to keep processing time to a minimum by using the usual 2D shock zone environment, which may be tilted to match a 3D terrain. This seems to be closest to how our brain works at the 2D cortical sheet network level. In either case we have many questions.

I would like to invite a guest neuroscientist or two to explain how their model works. We can all go from there. Your thoughts?

Since all “scientific theory” is tentative: whatever as a whole develops in the Numenta forum is still “HTM theory”. The guests would be working on supercomputer sized neuroscientific models that ultimately have to get into the finest of neural detail, which is not the same as HTM theory where there is the added challenge of modeling a whole cortical sheet inside our desktop sized computers. There is no competition that I know of to worry about.

Seems nice !

I’d definitely see the more exchange between such communities the better, but I’m Mr Nobody here. Since their project has been maintained for years, I believe Numenta is already aware it exists.

They claim to support a large number of neuron models, so I guess HTM is amenable to NEST, but NuPIC itself is Python-based so I don’t know what benefits such a port could bring (other than the obvious positive effect of idea exchange and goodwill synergies). But it seems like it is exactly what @kaikun referred to in the quote you posted, so if there are any volunteers for it already, this could be neat.

As for myself, I’ll try to have a look at some of these links, and the brain-to-robot stuff of the Neurobotics platform is exciting my curiosity, however I know I have great difficulty functioning fluently as a lib-user. And all those references to “Clouds”, “Multi-Device”, “Global”, “Customizable” keywords are for me a no-go : Even if I do realize it offers immense flexibility to a fair number of people, to my mind it indicates that this would turn any issue I could encounter while using the system into an OS-level-configuration issue, for which I have a deep, almost Pavlovian fear.
So, even though I still did not understand your code, Gary, I’m way more confident in my ability to follow your own bitfield-kind-of-reasoning than I’m confident in my ability to use anything like this ^^.


Here are some notes from what I learned just reading about the problem:

  • The main difference is that NEST works with spiking neural networks (SNNs), which makes everything that HTM simplifies to binary computations much more computationally complex.
  • Nevertheless HTM theory comes from neuroscience and the algorithms are designed in a way so they should work for SNNs but adaptation is non-trivial.
  • However in the past the SNNs have lacked scalability and essential features like sufficient plasticity-customization were missing. (This is a general problem in the scientific community and reason why they often switched away from SNNs in e.g. robotics, if they aren’t experimentalists)
  • On the other hand an implementation in NEST can also be compatible with neuromorphic hardware projects like SpiNNaker or BrainScaleS. This makes it really interesting: even though they add complexity, they are ultimately designed to run in parallel, which is hard to achieve for HTM on traditional computer/network architectures.

Kind regards

About SNN adaptation : what I got very succinctly from NEST framework presentation is a reference to the fact that they would not typically operate on weight-based models, but rely on much more topology-oriented connectivity lists. Topology is not what the canonical HTM library would do in default “global” mode but is still, in my view, one of the primary strong points of the HTM state of mind : caring first about a topology of dendritic tree and not bother too much about synaptic weight.

As for conversion of a spike “frequency” to a one-bit signal… After seeing one of their visuals, which looks a lot like an SDR, could this be for HTM simply a matter of tuning the simulation clock to capture each spike at a precise time t as an “on” bit ?
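
A minimal sketch of that clock-tuning idea, assuming spike times are available per cell as plain lists of timestamps. The function name `spikes_to_sdr`, the data layout and the sample values are hypothetical, made up for illustration; this is neither NEST's nor NuPIC's API.

```python
def spikes_to_sdr(spike_times, num_cells, t_start, dt):
    """Return a list of 0/1 bits for one simulation clock tick.

    Bit i is 1 if cell i fired at least once in [t_start, t_start + dt).
    spike_times maps a cell index to a list of spike timestamps (assumed layout).
    """
    bits = [0] * num_cells
    t_end = t_start + dt
    for cell, times in spike_times.items():
        if any(t_start <= t < t_end for t in times):
            bits[cell] = 1
    return bits


# Made-up spike trains (cell index -> spike times in ms), for illustration only.
spike_times = {0: [1.2, 7.9], 3: [4.5], 5: [12.0]}

# One 10 ms clock tick starting at t = 0: cells 0 and 3 fall inside, cell 5 does not.
sdr = spikes_to_sdr(spike_times, num_cells=8, t_start=0.0, dt=10.0)
print(sdr)
```

The tick length `dt` is the tuning knob here: too long and several spikes collapse into one bit, too short and most ticks come out empty.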

Another idea I had a few days ago to integrate a per-cell scalar information (such as spike frequency) as input to an HTM model was that it could avoid impacting the implementation of the excitatory pathways (ie, not driving higher excitation to postsynaptic pyramidal cells), but rather, to control the level, or extent, of surrounding inhibition.

[Edit] Oh sorry, @kaikun, I think I finally understand what is at stake here. Is it that SNNs have a progressive increase in depolarization level until they fire at threshold-crossing ? Yes this seems harder to reconcile with HTM indeed.

That’s how I do it. A bit staying on is the same thing as a neuron spiking as fast as it can. If it’s one spike per hundred time cycles there is a 1% duty cycle. This is useful for giving things priority. Whatever most often signals (such as hunger bit or other need) gets most acted upon.
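
A small sketch of the duty-cycle idea, assuming each signal is kept as a window of 0/1 bits, one per time cycle. The names `duty_cycle` and `most_urgent` and the sample signals are hypothetical; they are not taken from the model's actual code.

```python
def duty_cycle(bits):
    """Fraction of time cycles in which the bit was on (0.0 to 1.0)."""
    return sum(bits) / len(bits) if bits else 0.0


def most_urgent(signals):
    """Name of the signal with the highest duty cycle, i.e. the one to act on first."""
    return max(signals, key=lambda name: duty_cycle(signals[name]))


window = 100  # number of time cycles observed (assumed window length)
signals = {
    "hunger": [1] * window,               # bit stays on: spiking as fast as it can
    "thirst": [1] + [0] * (window - 1),   # one spike per hundred cycles
}

for name, bits in signals.items():
    print(name, f"{duty_cycle(bits):.0%}")
print("act on:", most_urgent(signals))
```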

I found a PDF version, and scanned through it a little:

That led me to this one that I now maybe half understand:

We really need a visual of these waves, or something.


I’ve started to read the decades old classic Hubel&Wiesel book, thanks to another link provided by @bitking. I’m almost done with the MIT course, although I’ll probably watch some of them again. In parallel, I’ve scanned through more recent publications.

From all this, a general scheme for the cortical layout of the computer model I intend to write is starting to emerge. Some of it is trying to match with HTM model of excitatory neuron, and some will try additional specifications, mostly related to the handling of topology.

  • I propose that the axonal arbors of excitatory cells are everywhere a fixed and pre-wired setup, that has to be precisely accounted for in the model. They will be allowed to target one or a set of given laminae in either the same cortical sheet as their soma - in which case their source cell position will constrain the extent to which they can transmit information ; or a distant cortical sheet (eg. axons from LGN to V1, or from V1 to V2) - in which case a topological transform function, however complex, shall be specifiable (V1 to V2 could map a fovea-to-periphery reversal, LGN to V1 should map alternating zebra patterns of ocular dominance, things like that).

  • One very important aspect of this axonal mapping is the amount of overlap, at any particular point of the receiving sheet, from other axons originating from other locations on same source topology. This will have a dramatic impact on both the computation capability of the receiving layer, and the model’s synaptic address footprint for each dendritic segment, see below.

  • In contrast to the fixation of axonal arbors for distinct populations of cells, each excitatory cell’s dendritic tree seems highly plastic and will be so in the model : It will be allowed any kind of growth or shrinkage, from a (specifiable) default-at-birth which I imagine randomly pre-initialized on the order of 500µm in diameter. In my view, this plasticity is such that I’ve come to believe that what we distinguish as cell types and layers based on the overall shape of the dendritic arbors is in fact mostly input driven (1).

  • For both biological accuracy and handling the layout size of the model, I propose to push the subdivision of the HTM model one step further : what HTM currently defines as segments, I will decompose into subsegments, each with a definite position (2) and tapping from laminar-specific inputs (The proximal part operates on a layout similar to that of distal segments, albeit from a single position fixed by the position of the soma itself). The position of the subsegment will determine which precise input cells it is allowed to retrieve information from, that is, the cells from the source sheet whose axonal arbors overlap the subsegment’s position.

  • The organization above has two effects : First, it is able to more precisely capture the fact, cited in Jeff’s paper, that as few as 8 coincident inputs can trigger an NMDA spike, provided they are within ~40µm of each other, which is comparable to the extent of my proposed subsegments. Second, by using the (precomputable) reverse-mapping of axon-source-sheet to axon-arbor-center-in-target-sheet, it reduces the task of sampling from source cells whose axonal arbors are ‘overlapping’ at the segment’s position to simply sampling an area of definite dimension (corresponding to the size of axonal arbors) around that corresponding center in the source sheet.

  • This also probably enables us to fix an upper limit on the address size of each synapse from a given source. With my current synopsis, I believe a total footprint of 16 bits per synapse (up to 12b or 13b address + 4b or 3b stochastic permanence value) is manageable. The remaining offset information needed to retrieve the actual source candidate is spread out over much coarser subsegment divisions, even coarser per-cell information, or static per-cell-population templates as well as axonal mapping templates.

  • As hinted by the paragraph on dendritic plasticity, a given cell may however decide to tap from several distinct sources - possibly on distinct laminae, but also each lamina (viewed as an axonal arbor target) may hold several axonal mappings (from different sources) together, as long as the sum of distinct synaptic possibilities for a subsegment does not exceed the 4096 or 8192 candidates of the 12b or 13b address schemes for synapses on each subsegment. According to my first-shot computations, a cell whose dendrites are tapping from a single lamina containing the axonal arbors of a single source could still have subsegments able to sense inputs from a lower area spanning a circle of about 2mm (12b) or 3mm (13b) in diameter over the cortical sheet, even if bijectively mapped to it, which sounds promising for the “integration over wider area” functionality of, say, a V2 relative to a V1.

  • I’m still working on how to convert (or carry on and work with) the fundamentally per-cell-scalar output of LGN, to the proposed binary scheme of HTM. I believe I’ll try some different options at this point.

  • In the case of L4 “simple cells” in V1, in my view, NMDA will probably be a requirement to overshoot the increased threshold from vast correlated inhibition, instead of being a mechanism for prediction as in TM. Such strong input-intensity-correlated inhibition is indeed supposed to play a major role in ensuring that cells respond preferentially to orientation, no matter how faint, and not to intensity itself when less well oriented. Some “complex cells” achieving motion direction selectivity, however, could rely on a more TM-like scheme.
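
To make the 16-bit synapse footprint and the 2mm / 3mm figures from the list above concrete, here is a rough sketch. Several things in it are assumptions for illustration only: the bit layout (address in the high bits, permanence in the low bits), one address candidate per microcolumn on a square grid at 30µm spacing, a circular sampled area, and all function names.

```python
import math

WORD_BITS = 16  # total footprint per synapse, as proposed above


def pack_synapse(address, permanence, addr_bits=12):
    """Pack an address and a small permanence value into one 16-bit word.

    Assumed layout: address in the high bits, permanence in the low bits.
    """
    perm_bits = WORD_BITS - addr_bits
    if not 0 <= address < (1 << addr_bits):
        raise ValueError("address does not fit in addr_bits")
    if not 0 <= permanence < (1 << perm_bits):
        raise ValueError("permanence does not fit in the remaining bits")
    return (address << perm_bits) | permanence


def unpack_synapse(word, addr_bits=12):
    """Inverse of pack_synapse: return (address, permanence)."""
    perm_bits = WORD_BITS - addr_bits
    return word >> perm_bits, word & ((1 << perm_bits) - 1)


def reach_diameter_mm(addr_bits, spacing_um=30.0):
    """Diameter of a circle holding 2**addr_bits candidates.

    Assumes one candidate per microcolumn on a square grid (spacing_um apart).
    """
    candidates = 1 << addr_bits
    area_um2 = candidates * spacing_um ** 2
    return 2.0 * math.sqrt(area_um2 / math.pi) / 1000.0


word = pack_synapse(address=2730, permanence=9, addr_bits=12)
print(hex(word), unpack_synapse(word, addr_bits=12))

for bits in (12, 13):
    print(f"{bits}b address: {1 << bits} candidates, "
          f"~{reach_diameter_mm(bits):.1f} mm diameter")
```

Under these assumptions the 12b scheme covers a circle a little over 2mm across and the 13b scheme a little over 3mm, in line with the first-shot figures quoted above.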

Now on a more general note, I’ll start with V1 cells around identified layer 4 and see how they manage to organize orientation selectivity by themselves, (that is, by input-driven progressive reorganization, aka. learning) also in a disposition hopefully resembling the biological one. If I do manage this, I’ll start to add layers one by one, and possibly try to quickly get to L5 with an almost direct mapping to superior colliculus and thus starting to mess with oculomotor stuff.

The simulation of V1 will be that of any primate of the Hominoidea branch (3), either extant, extinct or even imaginary. This is to ensure a V1 layout which will look closely like the human one (4), without necessarily being constrained to be all-alike on the matter of, eg, cortical size. Thus here, V1 could possibly be set as ~3cm across, which for a full scale simulation would bring the number of V1 microcolumns (30µm spaced) to a manageable number of around 1 million on one hemisphere (5).

(1) See the V1 oversizing and multi-lamination of L4 in the most commonly accepted lamination scheme, and eg, the competing Hässler views of what should be called L3 or L4. Also, there is evidence that spiny stellate cells on V1 L4 themselves start out as regular pyramidal cells and progressively take their distinctive form, most probably from the drive of their specific sensory input. I take all those as clues that these laminar concerns are probably developmental and simply dependent on their ambient signal contexts. What seem fixed to specific positions and laminations are the incoming axonal arbors, though, which I’ll try to reflect (and take advantage of) in the model.
(2) The position of a subsegment is readily identifiable as a microcolumn index, and/or an offset from the microcolumn-indexed position of the soma.
(3) tail-less monkeys and apes, of which we are a species
(4) although retina and V1 of the quite close-to-us macaque, is already very similar, they seem to have an additional lamination in their V1 “inbox”. This is believed specific to the sister Cercopithecoidea branch to which they belong.
(5) “Manageable” here is to be taken with some reserve : it is likely that for a current PC this is still vastly out of reach - and thus I’ll start with a parameter to specify a much smaller extent of the truly simulated area around the fovea - but not so much out of reach as to be unimaginable for the near future, or with many computation clusters.
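
As a quick check of the microcolumn count, and of what a smaller simulated extent would cost, here is a back-of-the-envelope sketch. It assumes a square patch of cortex with microcolumns on a regular 30µm grid; the function name and the 3mm example extent are hypothetical.

```python
def microcolumn_count(extent_mm, spacing_um=30.0):
    """Number of microcolumns in a square patch extent_mm on a side.

    Assumes a regular square grid with spacing_um between neighbours.
    """
    per_side = round(extent_mm * 1000.0 / spacing_um)
    return per_side * per_side


full_v1 = microcolumn_count(30.0)     # ~3cm across, the full-scale case
small_patch = microcolumn_count(3.0)  # hypothetical reduced extent around the fovea

print(full_v1, small_patch)
```

Shrinking the simulated side by a factor of ten cuts the count by a factor of one hundred, which is why an extent parameter is a cheap way to start small.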
