Topologies in the brain and how to model them

I’m quite invested in the matter of conceiving diverse topological approaches and evaluating their computational requirements. As scott replied in the post you linked to, taking topology into account does not have to be very costly. I’d bet your “Another reason we don’t use topology much in our examples is because we focused on scalar anomaly detection applications for several years” is closer to the mark than ‘performance’.

That, and until your simulations hit the scale of several dozen minicolumns on a side, there was not much point in considering topology. Many biological axonal arbors would indeed potentially innervate all minicolumns in that kind of range (some may not, and we could also model topological effects at smaller scales, but that already means departing from the high-level HTM abstractions).

My different takes range from very complicated to quite straightforward, with different performance-vs-realism profiles. I’ll try to make a post about them at some point, once I have something more concrete to show…

What I want to point out for now is that most of the complexity of the very intricate schemes I came up with stems from the additional, self-imposed requirement of shrinking the synaptic address size. If you relax this requirement, an HTM implementation with 16b worth of potential presynaptic cells is already ‘almost there’… and if you consider only binary activations, not so costly after all.

First, ‘Euclidean distance’ computations are probably overkill for starting to see interesting effects from topology. Second, distance computations of any kind need only happen at the limited times when a synapse gets newly connected (mostly when growing a new distal segment, for the TM part). Third, the reason I believe a 16b HTM is ‘almost there’ already is that 16b addressing on a 2D bitmap homes in on the required axonal ranges, given biologically realistic spacing between minicolumns, on a more or less thin subset of a layer:
Quite importantly, you (simply) need to divide the problem along the vertical axis (by vertical I mean the one perpendicular to the surface of the cortical sheet). Some activated axons would thus have to be repeated across a handful of these thin layers, but once you’ve done that, you already have all the building blocks for a first, simple, straightforward topological model.

After that, you may consider the position of distal segments relative to the soma, but that’s already adding bells and whistles. While in the bells-and-whistles realm, you can go quite nuts with it (as I like to do, almost as much as flooding @bitking with my crazy designs every other day) and consider more and more spatial divisions and so on, but… it does not have to be very complex to start with.

4 Likes

What would be the major difference between an HTM-with-topology model and highly distributed, localized HTM models? For the latter, I’m referring to N completely independent HTM models, each with global topology but accepting a selected area of the input (localized), working with each other as an ensemble of HTM models. I’d like to get your thoughts on this if possible.

In relation to ensembling HTM models, I’ve tried ensembling multiple HTM models using the HTM community’s MNIST example. It was a poor attempt to simulate combining SP states, as there’s no easy way to pause an SP and save its state. Anyway, I was a bit surprised that it improved the accuracy by up to 1%.

1 Like

Thanks for this discussion. It helps a lot to have a better understanding of HTM.

Could you explain more what you mean by “16b addressing on a 2D bitmap is homing on the required axonal ranges” ?

1 Like

Sure. Although maybe it should be made into another post.

I’ve gathered a handful of figures to build an intuition about the local connectivity requirements of any cortical-patch simulation that aims for realistic axonal and dendritic potentials.

Assuming 15 billion neurons total in the cortex [1], and a mean unfolded surface of 0.12m² per hemisphere [2], that is 62500 cells per mm² on average, with some areas maybe twice as dense [3].

Using a regular square lattice spaced at 40µm, we have exactly 25x25 (625) tiles covering one square millimeter, and with that value, about 100 cells per lattice position. This fits nicely with the concept of a developmental minicolumn [3].
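For those who want to check, here is a minimal back-of-the-envelope script (just my own sketch, nothing more) reproducing those figures:

```python
# Back-of-the-envelope check of the figures above
total_neurons = 15e9                # [1] ~15 billion cortical neurons
hemisphere_area_m2 = 0.12           # [2] unfolded surface per hemisphere
cells_per_mm2 = total_neurons / (2 * hemisphere_area_m2 * 1e6)   # 1 m² = 1e6 mm²
print(cells_per_mm2)                # -> 62500.0

minicol_spacing_um = 40
minicols_per_mm2 = (1000 // minicol_spacing_um) ** 2             # 25 x 25 lattice
print(minicols_per_mm2)             # -> 625
print(cells_per_mm2 / minicols_per_mm2)   # -> 100.0 cells per minicolumn position
```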

The extent of basal dendritic arbors seems to be a sphere roughly 0.5mm in diameter, quite consistently across cell types, although some segments may try to extend further if their local neighborhood is arbitrarily starved (cf. experiments with eye occlusion, etc.). The same figure holds for the 2D diameter of most apical tufts.
But we’re more concerned with the axonal side here: the extent of the widest axonal arbors seems to be a fair 3mm in diameter. I’m speaking of localized lateral projections within the same area… either intralayer or, say, from L4 to L2/3.
Long-range connections between different areas in the hierarchy can of course blow that 1.5mm radius limit away… but long-range connections aren’t modeled the same way anyway. More importantly, if you look at the spread of axonal arbors once they reach the distant area (e.g. some arbors from thalamus to V1), the 3mm diameter figure seems to appear again. So, once a long-range input bit gets sampled by some cells in an area, it can also be sampled within that same kind of potential radius.

So… in the simplest of the topological models I came up with, where you don’t take much care to precisely sort those “max-range” arbors from more concentrated ones, and each axonal arbor is represented by a single point, you’re facing a horizontal sampling range of “potential” afferents within a 3mm circle around each cell (that’s simply reversing the viewpoint, from axon-arbor ‘centers’ to synapses).

If you’re ready to stay abstract and clamp that 3mm figure somewhat, you get a blindingly fast computable relative offset within a 64x64-minicolumn region around your cell, corresponding to a 2.56x2.56mm square (64 × 40µm = 2.56mm).

That takes 12 bits. You have 4 bits left in your 16b envelope to still choose a particular afferent in that minicolumn (=> 16 distinct afferents per minicolumn). This number is not biologically realistic if it were meant to represent all afferent centers per minicolumn from the whole sheet, though… and it may not be sufficient to represent an HTM sim with deep minicolumns either. But if you’re ready to decompose that problem vertically, into however many ‘thin layers’ you need to accurately represent the overall sampling potential you require, then everything is set:

Each cell or segment can easily be localized in 3D. Sampling 1 thin layer of 16 afferents per such cell or segment (or two layers of 8 each, allowing more diversity in potential ‘coincidence detection’), you’ve reached the (first) interesting mark of addressing all potential axons to a synapse, in a somewhat biologically realistic manner, using 16 bits.
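To make that concrete, here’s a minimal sketch of how such a 16b relative address could be packed. The 6b/6b/4b split and the function names are purely illustrative assumptions on my part, not an existing implementation:

```python
def pack_address(dx, dy, afferent):
    """Pack a presynaptic address into 16 bits.

    dx, dy   : minicolumn offsets in [-32, 31] (6 bits each, a 64x64 window ~ 2.56x2.56mm)
    afferent : index in [0, 15] of the afferent within that minicolumn's thin layer (4 bits)
    """
    assert -32 <= dx <= 31 and -32 <= dy <= 31 and 0 <= afferent <= 15
    return ((dx + 32) << 10) | ((dy + 32) << 4) | afferent

def unpack_address(addr):
    dx = ((addr >> 10) & 0x3F) - 32
    dy = ((addr >> 4) & 0x3F) - 32
    return dx, dy, addr & 0x0F

addr = pack_address(-3, 17, 5)
assert 0 <= addr < 2**16
assert unpack_address(addr) == (-3, 17, 5)
```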

The only twist is that you’re sampling from 16 distinct 2D maps per minicolumnar position, and that each input cell (such as, to take an HTM example, the t-1 activity of all cells in all minicolumns) may write to several of these if the overall input spans more than 16 cells per minicolumn. Typically at least 2 for a 32-cells-per-minicolumn TM implementation (but probably more than that, distributed stochastically, to get a realistic spectrum of distal coincidence detectors).

[edit] As an additional bonus, that “input writing to one or several particular 2D map(s)” pass may straightforwardly solve the issue of sorting “potentially connectable” from “out-of-potential” inputs for the proximal synapses of HTM during the Spatial Pooling phase. It gets more realistic and straightforward with finer topological schemes than this one, but it’s getting there.



[1] “There are between 14 and 16 billion neurons in the cerebral cortex”
[2] “When unfolded in the human, each hemispheric cortex has a total surface area of about 0.12 square metres”
[3] “Minicolumns comprise perhaps 80–120 neurons, except in the primate primary visual cortex (V1), where there are typically more than twice the number.”

2 Likes

Are you talking about quantization? If so, that would speed things up all over, not just for topology. It is something the research team thinks about a lot.

You’d be right! We never needed it, so we never put much effort into performance optimization. We will likely need it in the next round of research, so we welcome your insights. :smiley:

2 Likes

The topology needs to be local, but far-reaching. The current “minicolumn neighborhood” calculations can create topological projections of minicolumns through a sensory space based on a simple distance metric. This works fine for us to simulate inhibition and minicolumn competitions, but we need another topology so that the cortical columns can vote together. These two topologies are not the same thing. (I’m guessing the latter might be a small world style topology, with some local and some far-reaching connections.)

1 Like

I’m not sure what you’re referring to here. My own drawing board does have quantized values for synaptic persistence (or strength)… is that what you mean? Could it be that we’re pursuing the same goal here? :slight_smile:

However, what I was saying about my bitcount requirement in the post above was purely about going smaller than 16b for the presynaptic addresses themselves. Which would explain why I find the 16b mark “relaxed” :stuck_out_tongue:

Having thought about really nasty stuff in the low range of bit counts… I’ve currently revised my position and am now aiming for 12b addresses, though I admit you already need to get quite fancy with topology to reach that (when considering large cortical sheets, that is… otherwise you don’t really need topology at all).
Dunno in the end whether performance will be on my side. Will report on that when I reach the experimental phase. Hopefully soon enough.

2 Likes

Yes, I think I’ve heard @subutai say he would like to get to 8 bits.

2 Likes

Wasn’t aware of that, but I salute that direction. Just go for it.

[edit] I won’t be the best judge of the mathematical impact of this from a pure machine-learning-theory point of view (which is usually sensitive to squeezing tiny weight deltas during learning).
But from a biological perspective, I’d be surprised if we could see or define more than 15 distinct buckets of synaptic strength.
And should we get that far (4b !!) in persistence compression, then to compensate for the decreased finesse during learning, we could maybe… go stochastic. Quite “simply”? (Again, no experimental support yet to back that up.)
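To illustrate what I mean by “going stochastic” (just a sketch of the general idea, with made-up names, and again with no experimental backing): with, say, 4b persistence values (16 buckets), a learning delta smaller than one bucket can be applied probabilistically, so that the expected update still equals the fine-grained delta:

```python
import random

def stochastic_update(persistence, delta, n_buckets=16):
    """Apply a (possibly fractional) delta to a quantized persistence value.

    persistence : current integer bucket in [0, n_buckets - 1]
    delta       : desired change in bucket units (may be fractional, e.g. 0.1)
    The integer part is applied directly; the fractional remainder is applied
    with matching probability, so the *expected* update equals the exact delta.
    """
    whole = int(delta // 1)
    if random.random() < delta % 1:
        whole += 1
    return min(n_buckets - 1, max(0, persistence + whole))

# e.g. a +0.1-bucket learning step: on average 1 call in 10 actually increments
p = 7
p = stochastic_update(p, 0.1)
```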

2 Likes

It is estimated that 4000–5000 glutamate (GLU) containing axons reach any given square millimeter of rat L1 (Rubio-Garrido et al., 2009) to selectively target apical dendritic tufts (Herkenham, 1986; Arbuthnott et al., 1990; Lu and Lin, 1993)

Just came across this number here. It may interest you when choosing your optimal number of bits :wink:

3 Likes

It does!

1 Like

Why Topology?

What HTM is already good at

HTM’s default recommended minicolumn count for the SP and TM is 2048.

With those figures, 2048 minicolumns cover 3.28mm² of cortical surface. That’s roughly a disc of 1mm radius, which could stand for the “macrocolumn” concept: macrocolumns are independent, and each one is concerned with sampling some limited subset of a sensory input modality (such as one whisker, or the tip of one finger). As such, using 16 bits, we can address a full, 32-cells-per-minicolumn-deep “mirror” of the same cell count, such as the typical potential for a TM at ‘t-1’, spanning roughly one large layer’s worth of cells, if we limit ourselves to that macrocolumn.
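(A quick sketch of that arithmetic, for anyone who wants to check; nothing official here:)

```python
# Quick check of the macrocolumn figures
minicolumns = 2048
spacing_mm = 0.040                       # 40 µm minicolumn lattice
area_mm2 = minicolumns * spacing_mm**2   # -> 3.2768 mm², roughly a disc of 1mm radius
cells = minicolumns * 32                 # 32 cells per minicolumn
assert cells == 2**16                    # 65536 cells: exactly addressable with 16 bits
```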

Why care about 16b address size?

HTM is concerned with replicating a somewhat realistic synaptic connectivity scheme, where each synapse on a distal dendritic segment connects into a huge sea of potential presynaptic cells. It is unrealistic to model this connectivity “densely”, as in more classical NNs, as a vector of “weights” (here, persistence values) to each and every potential presynaptic cell. A segment should hold about 40 connected synapses, so most of the “weights” across the 64K potentials of a 2048-minicolumn, 32-cells-per-minicolumn layer would be 0.
A sparse connectivity scheme, where such a segment is instead represented as a list of actually connected synapses, each with a presynaptic “address” + “persistence”, is far more practical. But then we must reserve some fixed memory space for those addresses and those persistence values.
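To illustrate that sparse scheme (a sketch under my own assumptions, using numpy purely for concreteness; this is not an existing HTM API), a segment becomes a small array of (address, persistence) records instead of a dense 64K-wide weight vector:

```python
import numpy as np

# One distal segment = ~40 (presynaptic address, persistence) records,
# instead of a dense vector of 64K persistence values, most of them 0.
synapse_dtype = np.dtype([
    ("address", np.uint16),     # presynaptic cell, 16b as discussed above
    ("persistence", np.uint8),  # quantized persistence, 8b (0..255)
])

segment = np.zeros(40, dtype=synapse_dtype)
segment[0] = (12345, 130)       # one connected synapse
print(segment.itemsize)         # -> 3 bytes per synapse
```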

With that issue explained, we can consider different bitcount values for address and persistence, and ask ourselves whether we should care at all. Today’s computers are beasts, after all, and in most scenarios we can set up a Python prototype without giving much thought to such gross things as the “number of bits” of our values.

Well, sorry, lads. Even with today’s computers, those bitcounts matter a lot when what we’re simulating is a brain… HTM takes to the letter the fact that neurons in a brain have several thousand synapses. I’ve often seen figures of 10 thousand on average put forward. To be conservative, let’s cut that in half for our purposes (5000 synapses per cell) and see how it goes if we were to simulate all cells in a column (like TM + those recent sensorimotor developments, spanning most layers).

bits per synapse (address + persistence), and the resulting synapse memory in a 2048-minicolumn area, 100 cells/minicol, 5000 synapses per cell:

  96 bits . . . . 11.4 GB  [A]
  64 bits . . . .  7.6 GB  [B]
  48 bits . . . .  5.7 GB  [C]
  32 bits . . . .  3.8 GB  [D]
  24 bits . . . .  2.9 GB  [E]
  20 bits . . . .  2.4 GB
  16 bits . . . .  1.9 GB
  12 bits . . . .  1.4 GB
   8 bits . . . .  0.9 GB

  • Those numbers matter because they are what needs to be stored on disk to save a trained model, and gigabytes start to be significant figures for hard drives.
  • Those numbers matter because they are what needs to be held in RAM during execution, and gigabytes are really significant figures for RAM, or for graphics memory in a GPU implementation.
  • Those numbers matter because memory accesses dominate today’s performance. Today’s computers can crunch billions of operations per second if their pipelines are correctly fed; if you carelessly access gigabytes of memory, performance can drop hundreds of times while waiting for those values to be transmitted to your CPU. To a point, GPU implementations are also sensitive to these considerations.

[A] a totally “careless” implementation, using 32b integers for addresses and 64b floating point for persistence.
[B] a standard, atopological implementation, using 32b integers for addresses and 32b floating point for persistence.
[C] a “macrocolumnar” implementation, limited to those 2048 minicolumns, using 16b integers for addresses and 32b floating point for persistence.
[D] a “macrocolumnar” implementation, limited to those 2048 minicolumns, using 16b integers for addresses and recent-GPU 16b “half-float” for persistence.
[E] a “macrocolumnar” implementation, limited to those 2048 minicolumns, using 16b integers for addresses and subutai’s proposal of 8b (0…255) quantized values for persistence.
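For reference, the figures in the table come from this straightforward computation (my own sketch, counting 1 GB as 1024³ bytes):

```python
# Reproducing the figures in the table above
minicolumns, cells_per_minicol, synapses_per_cell = 2048, 100, 5000
n_synapses = minicolumns * cells_per_minicol * synapses_per_cell  # ~1.02 billion synapses

for bits, label in [(96, "[A]"), (64, "[B]"), (48, "[C]"), (32, "[D]"), (24, "[E]"),
                    (20, ""), (16, ""), (12, ""), (8, "")]:
    gigabytes = n_synapses * bits / 8 / 1024**3
    print(f"{bits:3d} bits/syn : {gigabytes:4.1f} GB {label}")
# -> 11.4, 7.6, 5.7, 3.8, 2.9, 2.4, 1.9, 1.4 and ~0.95 GB respectively
```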

Why not end there?

The table above shows that we could easily stop here with a good-enough scheme. A 16b-address HTM implementation (simulating independent “macrocolumns”), using subutai’s proposed 8b persistence values, uses only 3 bytes per synapse and, as such, is already in the low range of potential overall data weights.

However, several reasons could bring us to the “topological” realm:

  • A need to address across several macrocolumns, such as a voting implementation using synaptic mechanisms, for example with Calvin-like grids, or modelling any kind of connection scheme within a local neighborhood ;
  • An interest in exploring more finesse towards realistic axonal extents, horizontally or vertically ;
  • A will to try multi-modality potential afferents (the TM algorithm is “single-modality” in that regard, assuming only boutons from well-defined afferents carrying ‘t-1’ info would be wired to the distal synapses of a layer) ;
  • A will to decrease synaptic memory weight yet again (I’ll try to present those “below-[E]” schemes at some point) ;
  • Your own drive to go there.
3 Likes