Topologies in the brain and how to model them

I’m quite invested in conceiving diverse topological approaches and evaluating their computational requirements. As Scott replied in the post you linked to, taking topology into account does not have to be very costly. I’d bet your “Another reason we don’t use topology much in our examples is because we focused on scalar anomaly detection applications for several years” is closer to the mark than ‘performance’.

That, and until your simulations reach the scale of several dozen minicolumns on a side, there was not much point in considering topology. Many biological axonal arbors could indeed potentially innervate all minicolumns within that kind of range (some may not, and we can also model topological effects at smaller scales, but that already means departing from high, HTM-level abstractions).

My handful of different takes range from very complicated to quite straightforward, with different performance-versus-realism profiles. I’ll try to write a post about them at some point, once I have something more concrete to show…

What I want to point out for now is that most of the complexity of the very intricate schemes I came up with stems from an additional, self-imposed requirement: decreasing the size (bit count) of synaptic addresses. If you relax this requirement, an HTM implementation with 16 bits’ worth of potential presynaptic cells is already ‘almost there’… and, if considering only binary activation, not so costly after all.

First, ‘Euclidean distance’ computations are probably overkill to start seeing interesting effects from topology. Second, such distance computations need only be performed at the limited times when a synapse needs to be newly connected (mostly when growing a new distal segment, for the TM part). Third, the reason I believe a 16b HTM is ‘almost there’ already is that 16b addressing on a 2D bitmap is homing in on the required axonal ranges, given biologically realistic spacing between minicolumns, on a more or less thin subset of a layer:
Quite importantly, you (simply) need to divide the problem along the vertical axis (by vertical I mean the one perpendicular to the surface of the cortical sheet). Some activated axons would thus have to be repeated across a handful of these thin layers, but once you’ve done that, you already have all the building blocks for a first, simple, straightforward topological model.

After that, you may consider the position of distal segments relative to the soma, but that’s already adding bells and whistles. Once in the bells-and-whistles realm, you can go quite nuts with it (as I like to do, almost as much as I like flooding @bitking with my crazy designs every other day) and consider more and more spatial divisions and the like, but… it does not have to be very complex to start with.


What would be the major difference between an HTM model with topology and highly distributed, localized HTM models? By the latter, I mean N completely independent HTM models, each with global topology but accepting only a selected (localized) area of the input, working together as an ensemble. I’d like to get your thoughts on this if possible.

Regarding ensembling, I’ve tried ensembling multiple HTM models using the HTM community’s MNIST example. It was a poor attempt at simulating combined SP states, as there’s no easy way to pause an SP and save its state. Anyway, I was a bit surprised that it improved accuracy by up to 1%.


Thanks for this discussion. It helps a lot to have a better understanding of HTM.

Could you explain more what you mean by “16b addressing on a 2D bitmap is homing on the required axonal ranges” ?


Sure. Although maybe it should be made into another post.

I’ve gathered a handful of figures to build an intuition about the local connectivity requirements of any cortical patch simulation that aims for realistic axonal and dendritic potentials.

Assuming 15 billion neurons in total in the cortex [1], and a mean unfolded surface of 0.12 m² for each hemisphere [2], this gives 62,500 cells per mm² on average, with some areas maybe twice as dense [3].

Using a regular square lattice with 40 µm spacing, we get exactly 25x25 (625) tiles covering one square millimeter and, with that value, about 100 cells per position. This seems to fit nicely with the concept of a developmental minicolumn [3].
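A quick sanity check of that arithmetic, as a small Python sketch. It assumes the midpoint (15 billion) of the 14 to 16 billion range in [1] and the 40 µm lattice spacing chosen above; nothing here is measured data.

```python
# Back-of-the-envelope cortical density, from the figures quoted in [1] and [2].
CORTEX_NEURONS = 15e9          # midpoint of 14-16 billion [1]
HEMISPHERE_AREA_MM2 = 0.12e6   # 0.12 m^2 per hemisphere = 120,000 mm^2 [2]
LATTICE_SPACING_UM = 40.0      # assumed square-lattice spacing between minicolumns

cells_per_mm2 = CORTEX_NEURONS / (2 * HEMISPHERE_AREA_MM2)
tiles_per_side = 1000.0 / LATTICE_SPACING_UM   # 1 mm = 1000 µm
tiles_per_mm2 = tiles_per_side ** 2
cells_per_tile = cells_per_mm2 / tiles_per_mm2

print(cells_per_mm2, tiles_per_mm2, cells_per_tile)  # 62500.0 625.0 100.0
```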

The extent of basal dendritic arbors seems to be a sphere roughly 0.5 mm in diameter, quite consistently across cell types, although some segments may try to extend further away if their local neighborhood is artificially starved (cf. experiments with eye occlusion, etc.). The same figure applies to the 2D diameter of most apical tufts.
But we’re more concerned with the axonal side here: the extent of the widest axonal arbors seems to be a fair 3 mm in diameter. I’m speaking here of localized lateral projections within the same area… either intralayer or, for example, from L4 to L2/3.
Long-range connections among different areas in the hierarchy can of course blow the 1.5 mm radius limit away… but long-range connections aren’t modeled the same way anyway. More importantly, when evaluating the spread of axonal arbors once they have reached the distant area (e.g. some arbors from the thalamus to V1), the 3 mm diameter figure seems to appear again. So, once a long-range input bit gets sampled by some cells in an area, it can also be sampled within that kind of potential radius.

So… in the simplest of the topological models I came up with, where you don’t take much care to precisely sort out those “max-range” arbors from more concentrated ones, and each of these axonal arbors is represented by a single point, you’re facing a horizontal sampling range of “potential” afferents in a 3 mm circle around the cell (that’s simply reversing the viewpoint, from axonal arbor ‘centers’ to synapses).

If you’re ready to stay in the abstract and clamp that 3 mm figure somewhat, you get a blindingly fast-to-compute relative offset within a 64x64-minicolumn region around your cell, corresponding to a 2.56x2.56 mm square.

That takes 12 bits. You have 4 bits left in your 16b envelope to choose a particular afferent at that minicolumn (=> 16 distinct afferents per minicolumn). This number is not biologically realistic if it is meant to represent all afferent centers per minicolumn over the whole sheet, though… and may not be sufficient to represent an HTM sim with deep minicolumns either. But if you’re ready to decompose the problem vertically, into as many ‘thin layers’ as you need to accurately represent the overall sampling potential you require, then everything is set:

Each cell or segment can easily be localized in 3D. By sampling 1 thin layer of 16 afferents per such cell or segment (or two of 8 each, allowing more diversity in potential ‘coincidence detection’), you’ve reached the (first) interesting mark of addressing all potential axons to a synapse, in a somewhat biologically realistic manner, using 16 bits.
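A minimal sketch of such a 16-bit address, assuming (hypothetically) 6 bits each for signed dx and dy offsets within the 64x64 window, 4 bits for the afferent slot, and wrap-around at sheet borders. The field order and the helper names `encode`, `decode` and `presynaptic_position` are illustrative, not taken from any existing implementation.

```python
WINDOW = 64        # 64x64 minicolumn window (2.56 mm at 40 µm spacing)
HALF = WINDOW // 2 # signed offsets range over -32..+31 (6 bits each)
AFFERENTS = 16     # 4 bits -> 16 afferent slots per minicolumn position

def encode(dx: int, dy: int, afferent: int) -> int:
    """Pack a relative offset and an afferent slot into 16 bits: [dx:6][dy:6][a:4]."""
    if not (-HALF <= dx < HALF and -HALF <= dy < HALF):
        raise ValueError("offset outside the 64x64 window")
    if not 0 <= afferent < AFFERENTS:
        raise ValueError("afferent slot must fit in 4 bits")
    return ((dx + HALF) << 10) | ((dy + HALF) << 4) | afferent

def decode(addr: int) -> tuple:
    """Unpack a 16-bit address into (dx, dy, afferent)."""
    dx = ((addr >> 10) & 0x3F) - HALF
    dy = ((addr >> 4) & 0x3F) - HALF
    return dx, dy, addr & 0xF

def presynaptic_position(x: int, y: int, addr: int, width: int, height: int) -> tuple:
    """Absolute (x, y, afferent) of the source, wrapping at sheet borders (assumed)."""
    dx, dy, a = decode(addr)
    return (x + dx) % width, (y + dy) % height, a
```

Because the offset is relative to the postsynaptic cell, the same 16 bits remain valid wherever the cell sits on an arbitrarily large sheet.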

The only twist is that you’re sampling from 16 distinct 2D maps per minicolumnar position, and that each input cell (to take an HTM example, the t-1 state of all cells in all minicolumns) may write to several of these if the overall input spans more than 16 cells per minicolumn. Typically that means at least 2 for a TM implementation with 32 cells per minicolumn (but probably more than this, distributed stochastically, to get a realistic spectrum of distal coincidence detectors).
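One way to picture that stochastic distribution is sketched below. This is a hypothetical scheme: `assign_thin_layers`, the shuffle-then-fill policy and the fixed default seed are all assumptions. Each cell of a minicolumn is guaranteed one slot in some thin layer; any extra thin layers are filled with random repeats.

```python
import math
import random

SLOTS = 16  # afferent slots per minicolumn position in one thin layer (4 bits)

def assign_thin_layers(cells_per_minicol, n_layers=None, rng=None):
    """Hypothetical distribution of one minicolumn's cells over thin-layer slots.

    Returns a list of thin layers, each a list of SLOTS cell indices.
    Every cell is written at least once; spare slots are filled with
    randomly repeated cells (a stand-in for stochastic axonal spread).
    """
    rng = rng or random.Random(0)
    min_layers = math.ceil(cells_per_minicol / SLOTS)
    if n_layers is None:
        n_layers = min_layers
    if n_layers < min_layers:
        raise ValueError(f"need at least {min_layers} thin layers")
    order = list(range(cells_per_minicol))
    rng.shuffle(order)  # coverage pass: each cell exactly once
    spare = n_layers * SLOTS - cells_per_minicol
    slots = order + [rng.randrange(cells_per_minicol) for _ in range(spare)]
    return [slots[i * SLOTS:(i + 1) * SLOTS] for i in range(n_layers)]
```

With 32 cells per minicolumn, `assign_thin_layers(32)` yields the 2-layer minimum; `assign_thin_layers(32, n_layers=4)` adds 32 random repeats on top.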

[edit] As an additional bonus, that “input writing to one or several particular 2D map(s)” pass may straightforwardly solve the issue of sorting “potentially connectable” from “out-of-potential” inputs for the proximal synapses of HTM during the Spatial Pooling phase. It gets more realistic and straightforward with finer topological schemes than this one, but it’s getting there.



[1] “There are between 14 and 16 billion neurons in the cerebral cortex”
[2] “When unfolded in the human, each hemispheric cortex has a total surface area of about 0.12 square metres”
[3] “Minicolumns comprise perhaps 80–120 neurons, except in the primate primary visual cortex (V1), where there are typically more than twice the number.”


Are you talking about quantization? If so, that would speed things up all over, not just for topology. It is something the research team thinks about a lot.

You’d be right! We never needed it, so we never put much effort into performance optimization. We will likely need it in the next round of research, so we welcome your insights. :smiley:


The topology needs to be local, but far-reaching. The current “minicolumn neighborhood” calculations can create topological projections of minicolumns through a sensory space based on a simple distance metric. This works fine for us to simulate inhibition and minicolumn competitions, but we need another topology so that the cortical columns can vote together. These two topologies are not the same thing. (I’m guessing the latter might be a small world style topology, with some local and some far-reaching connections.)


I’m not sure what you’re referring to here. My own drawing board does have quantized values for synaptic persistence (or strength)… is that what you mean? Could it be that we’re pursuing the same goal here? :slight_smile:

However, what I was saying about my bit-count requirement in the post above was purely about going smaller than 16b for the presynaptic addresses themselves. Which would explain why I find the 16b mark “relaxed” :stuck_out_tongue:

Having thought about really nasty stuff in the low range of bit counts… I have currently revised my position and am now aiming for 12b addresses, but I admit you need to get quite fancy with topology to reach that (when considering large cortical sheets, that is… otherwise you don’t really need topology at all).
Dunno in the end if performance will be on my side. I’ll report on that when I reach the experimental phase. Hopefully soon enough.


Yes, I think I’ve heard @subutai say he would like to get to 8 bits.


Wasn’t aware of that, but I salute that direction. Just go for it.

[edit] I won’t be the best judge of the mathematical impact of this from a purely machine-learning-theory point of view (which is usually sensitive to squeezing tiny weight deltas for learning).
But from a biological perspective, I’d be surprised if we could see or define more than 15 distinct buckets of synaptic strength.
And should we go that far (4b !!) in persistence compression, then to compensate for the decreased finesse during learning we could maybe… go stochastic. Quite “simply”? (again, no experimental support yet to back that up).


It is estimated that 4000–5000 glutamate (GLU) containing axons reach any given square millimeter of rat L1 (Rubio-Garrido et al., 2009) to selectively target apical dendritic tufts (Herkenham, 1986; Arbuthnott et al., 1990; Lu and Lin, 1993)

Just came across this number in here. It may interest you to choose your optimal number of bits :wink:


It does!


Why Topology?

What HTM is already good at

The default recommended count of minicolumns for the HTM SP and TM is 2048.

With those figures, we cover 3.28 mm² of cortical surface. That’s roughly a disc of 1 mm radius, which could stand for a “macrocolumn” concept where macrocolumns are independent and each is concerned with sampling some limited subset of a sensory input modality (such as one whisker, or the tip of one finger). As such, using 16 bits, we can address a full, 32-cells-per-minicolumn-deep “mirror” of similar cell count (2048 × 32 = 65,536 = 2^16), such as the typical potential pool for a TM at ‘t-1’, spanning roughly one large layer’s worth of cells, if we limit ourselves to that macrocolumn.
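The macrocolumn figures can be checked in a few lines; the 625 minicolumns per mm² comes from the 40 µm lattice discussed earlier in the thread, and the equal-area disc is just a convenient way to express a radius.

```python
import math

MINICOLUMNS = 2048
MINICOLS_PER_MM2 = 625   # 25x25 tiles per mm^2 on a 40 µm lattice
CELLS_PER_MINICOL = 32   # usual TM depth

area_mm2 = MINICOLUMNS / MINICOLS_PER_MM2   # 3.2768 mm^2
radius_mm = math.sqrt(area_mm2 / math.pi)   # radius of an equal-area disc
cells = MINICOLUMNS * CELLS_PER_MINICOL     # addressable cells in the macrocolumn
address_bits = math.ceil(math.log2(cells))  # bits needed per presynaptic address

print(f"{area_mm2:.2f} mm^2, r = {radius_mm:.2f} mm, {cells} cells, {address_bits} bits")
# prints: 3.28 mm^2, r = 1.02 mm, 65536 cells, 16 bits
```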

Why care about 16b address size?

HTM is concerned with replicating a somewhat realistic synaptic connectivity scheme, where each synapse in a distal dendritic segment is connected within a huge sea of potential presynaptic cells. It is unrealistic to model this connectivity “densely”, as in more classical NNs, as a vector of “weights” (here, persistence values) to each and every potential presynaptic cell. A segment should hold about 40 connected synapses, so most “weights” among the 64K potential ones in a 2048-minicolumn, 32-cells-per-minicolumn layer would be 0.
A sparse connectivity scheme, where such a segment is instead represented as a list of actually connected synapses, each with a presynaptic “address” + “persistence”, is far more practical. But then we must reserve some fixed memory space for those addresses and those persistence values.
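A sketch of such a sparse segment, assuming hypothetical 16b addresses and 8b quantized persistence values stored in parallel arrays (Python’s `array` module, typecodes `'H'` and `'B'`). The `Segment` class and its connection threshold of 128 are illustrative only.

```python
from array import array

class Segment:
    """Sparse distal segment: parallel arrays of 16b addresses and 8b persistences."""

    CONNECTED = 128  # hypothetical connection threshold on a 0..255 scale

    def __init__(self):
        self.addresses = array('H')    # unsigned 16-bit presynaptic addresses
        self.permanences = array('B')  # unsigned 8-bit quantized persistence

    def add_synapse(self, address: int, permanence: int) -> None:
        self.addresses.append(address)
        self.permanences.append(permanence)

    def active_connected(self, active_cells: set) -> int:
        """Count connected synapses whose presynaptic cell is currently active."""
        return sum(1 for a, p in zip(self.addresses, self.permanences)
                   if p >= self.CONNECTED and a in active_cells)

    def nbytes(self) -> int:
        """Memory used by the synapse payload (excluding Python object overhead)."""
        return (len(self.addresses) * self.addresses.itemsize
                + len(self.permanences) * self.permanences.itemsize)
```

Only the synapses that exist are stored, so a 40-synapse segment costs 40 × 3 bytes instead of one value for each of the 64K potential presynaptic cells.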

With that issue explained, we can consider different bit-count values for address and persistence, and ask ourselves whether we should care at all. Today’s computers are beasts, after all, and in a majority of scenarios we can usually set up a Python-y prototype without giving much thought to such gross things as the “number of bits” for our values.

Well, sorry, lads. Even with today’s computers, those bit counts matter a lot when what we’re simulating are brains… HTM takes literally the fact that neurons in a brain have several thousand synapses. I’ve often seen a figure of 10 thousand on average put forward. To stay conservative, let’s cut that in half for our purposes, and see how it goes if we were to simulate all cells in a column (like the TM + those recent sensorimotor developments, spanning most layers).

Total synaptic weight for a 2048-minicolumn area, 100 cells per minicolumn, 5000 synapses per cell (about 1.02 billion synapses), in GiB (2^30 bytes):

  bits per synapse          total synaptic     config
  (address + persistence)   weight
  96                        11.4 GiB           [A]
  64                         7.6 GiB           [B]
  48                         5.7 GiB           [C]
  32                         3.8 GiB           [D]
  24                         2.9 GiB           [E]
  20                         2.4 GiB
  16                         1.9 GiB
  12                         1.4 GiB
   8                         0.95 GiB

  • Those numbers matter because they are what needs to be stored on disk to save a trained model, and gigabytes start to be significant figures for hard drives.
  • Those numbers matter because they are what needs to be handled in RAM during execution, and gigabytes are really significant figures for RAM, or for graphics memory in a GPU implementation.
  • Those numbers matter because memory accesses heavily impact performance today. Today’s computers can crunch billions of operations per second if their pipelines are correctly fed. When gigabytes of memory are accessed carelessly, performance can drop by a factor of hundreds while the CPU waits for those values to arrive from memory. To a point, GPU implementations are also sensitive to these considerations.

[A] a totally “careless” implementation, using 32b integers for addresses and 64b floating point for persistence.
[B] a standard, atopological implementation, using 32b integers for addresses and 32b floating point for persistence.
[C] a “macrocolumnar” implementation, limited to those 2048 minicolumns, using 16b integers for addresses and 32b floating point for persistence.
[D] a “macrocolumnar” implementation, limited to those 2048 minicolumns, using 16b integers for addresses and recent-GPU 16b “half-float” for persistence.
[E] a “macrocolumnar” implementation, limited to those 2048 minicolumns, using 16b integers for addresses and subutai’s proposed 8b (0…255) quantized values for persistence.
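The sizes for configurations [A] to [E] can be regenerated with a short script; this sketch assumes the same 2048 × 100 × 5000 synapse count and reports binary gigabytes (2^30 bytes), which is what the figures correspond to.

```python
MINICOLUMNS = 2048
CELLS_PER_MINICOL = 100
SYNAPSES_PER_CELL = 5000

CONFIGS = {  # label: (address bits, persistence bits)
    "A": (32, 64),  # 32b int address + 64b float
    "B": (32, 32),  # 32b int address + 32b float
    "C": (16, 32),  # 16b address + 32b float
    "D": (16, 16),  # 16b address + 16b half-float
    "E": (16, 8),   # 16b address + 8b quantized
}

def synaptic_weight_gib(bits_per_synapse: int) -> float:
    """Total synapse storage in GiB for the whole 2048-minicolumn area."""
    synapses = MINICOLUMNS * CELLS_PER_MINICOL * SYNAPSES_PER_CELL
    return synapses * bits_per_synapse / 8 / 2**30

for label, (addr, perm) in CONFIGS.items():
    bits = addr + perm
    print(f"[{label}] {bits:3d} bits/synapse -> {synaptic_weight_gib(bits):5.1f} GiB")
```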

Why not end there?

The diagram above shows that we could easily stop at what we have as a good-enough scheme. 16b-address HTM implementations (simulating independent “macrocolumns”), using subutai’s proposed 8b-per-persistence value, use only 3 bytes per synapse and, as such, are already in the low range of potential overall data weights.

However, several reasons could bring us to the “topological” realm:

  • A need to address across several macrocolumns, such as a voting implementation using synaptic mechanisms, for example with Calvin-like grids, or modelling any kind of connection scheme within a local neighborhood;
  • An interest in exploring more finesse towards realistic axonal extents, horizontally or vertically;
  • A will to try multi-modality potential afferents (the TM algorithm is “single-modality” in that regard, assuming only boutons from well-defined afferents carrying ‘t-1’ info would be wired to the distal synapses of a layer);
  • A will to decrease synaptic weight yet again (I’ll try to present those “below-[E]” schemes at some point);
  • Your own drive there.

I’m in the process of (finally) actually coding a C++ version of the SP and TM algorithms, mostly with same capacities as the reference implementation (the original nupic.core).

The final intent is to try and expose actual performance differences between the configurations described in the post above… so I’m not using the reference libs on GitHub (nupic.core and/or htm.core) directly, but re-coding them with as much bit-magic as I will deploy in the topological versions afterwards, so that each can be compared on fair grounds.

Also, implementing an HTM from scratch is addressing a long overdue promise I made to myself.

@rhyolight :
This has exposed me to some implementation details I had left out up to this point, and I wanted to share my opinion about the way some “topological” stuff is implemented in the SP.
In fact, I just realized that the connection potential of a vanilla SP already takes some form of topology into account. That’s the first thing. Second, I understand the next big option towards vanilla HTM “topology” is the global_inhibition setting.

I take it that most people run HTM with global_inhib on, yet there are vast amounts of code with the sole purpose of addressing local inhib concerns.
I mean, the SP code is quite straightforward, and the majority of its complexity stems from that local inhib option, and possibly its relation to boosting. And… if that’s your intuition for “topology” and its performance, it is no wonder you consider it a 2050 feature.
My feelings regarding vanilla HTM local_inhib:

  • There is a significant number of distinct global passes computing “averages” or “maxes” of various kinds across a large number of columns, times the total number of columns, as soon as local inhib is chosen.
  • The various values and tweaks computed during these passes feel… quite ad-hoc. There are certainly reasons why it is the way it is, and I can’t pretend to know them all, but I bet that if people used this local inhib option more often with HTM, and truly developed a feel for which of these options matter there, most of that complexity could be cut down.
  • Inhibition radius is itself dynamic, requiring another global pass… and an integral part of the ‘ad-hoc’ feeling, imho. After cleaning up and retaining only what really matters, setting a fixed, small inhibition radius would drastically increase performance for these “average”-type passes (and… who would have thought? be much more distinctly topological than a large one).
  • And then you may get competitive topology, even taking localized inhibitory neighborhoods into account, which is something I didn’t talk much about in the rest of this thread.
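To illustrate why a fixed, small radius keeps local inhibition cheap, here is a simplified NumPy sketch (not the nupic.core or htm.core code): a column wins if fewer than `density * window` of its neighbors in a wrapped (2r+1)x(2r+1) window have a strictly larger overlap. The cost is one vectorized pass per offset, so it scales with r² and needs no global averaging pass. Tie-breaking, the stimulus threshold and dynamic radius updates are deliberately left out.

```python
import numpy as np

def local_inhibition(overlaps: np.ndarray, radius: int, density: float) -> np.ndarray:
    """Boolean mask of winning columns under fixed-radius local inhibition (sketch)."""
    bigger = np.zeros(overlaps.shape, dtype=np.int32)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx == 0 and dy == 0:
                continue
            # neighbor[y, x] == overlaps[y - dy, x - dx], with wrap-around
            neighbor = np.roll(overlaps, shift=(dy, dx), axis=(0, 1))
            bigger += neighbor > overlaps
    window = (2 * radius + 1) ** 2
    num_active = int(0.5 + density * window)
    return (bigger < num_active) & (overlaps > 0)
```

With the fixed radius of 6 and 2% density, each column competes against 168 neighbors and at most about 3 winners are expected per neighborhood.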

Cheers! :slight_smile:


Now that I’ve been able to compare with htm.core, here are some first results for the SP, with learning on.

Note: ‘reworked “vanilla HTM” SP’ below stands for my re-coding of the SpatialPooler. Some of the dimensional complexities have been reduced to only accept a static 2D cortical sheet*. The remaining performance increase comes from applying an extreme data-oriented coding philosophy, and some bitwise optimizations. All behavior (except for the new “no update radius” option, which fixes the radius at 6) should however be similar to vanilla HTM**.

In all cases, the input sheet is 64x32x4 (8192) and the output sheet is 64x32 (2048 columns). Potential radius is set to 12 with wrapping on (=> every column samples inputs from a 25x25 square around it in 2D, across all 4 input depths), with a 50% connection potential chance. Results were gathered over at least 1000 runs each time. Both implementations are single-core at this point.
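For readers who want to reproduce that setup, here is a sketch of the potential-pool selection it describes: a 25x25 wrapped square over all 4 input depths, with exactly half of the 2500 candidates kept. The flat index layout `(z * height + y) * width + x` and exact-count sampling are assumptions, not necessarily what htm.core or the reworked SP do.

```python
import random

def potential_pool(cx, cy, width=64, height=32, depth=4, radius=12,
                   pct=0.5, rng=None):
    """Sorted flat input indices forming the potential pool of column (cx, cy)."""
    if 2 * radius + 1 > min(width, height):
        raise ValueError("window would wrap onto itself and duplicate inputs")
    rng = rng or random.Random(0)
    candidates = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = (cx + dx) % width, (cy + dy) % height  # wrapping on
            for z in range(depth):
                candidates.append((z * height + y) * width + x)  # assumed layout
    return sorted(rng.sample(candidates, round(len(candidates) * pct)))
```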

  • htm.core SpatialPooler, float32 synapse permanences.
    – no boosting, global inhib . . . . . . . . . . . . . 2.0 ms/iter
    – no boosting, local inhib . . . . . . . . . . . . . . 33 ms/iter
    – boosting & global inhib. . . . . . . . . . . . . . . 2.2 ms/iter
    – boosting & local inhib. . . . . . . . . . . . . . . . 33 ms/iter

[edit] Updated result values for the actual published code, see post below (several fixes were made).
Also, “inhibition radius” being the most impactful parameter for performance with local inhib on, it was coerced to the empirically found htm.core value of ‘9’ for those runs:

  • reworked “vanilla HTM” SP - float32 synapse permanences. (reported times for whole loop of input choice + densification + spatial pooler ‘compute’ with learning on)
    – no boosting, global inhib . . . . . . . . . . . . . 0.37 ms/iter
    – no boosting, local inhib . . . . . . . . . . . . . . 2.7 ms/iter
    – boosting & global inhib. . . . . . . . . . . . . . . 0.40 ms/iter
    – boosting & local inhib. . . . . . . . . . . . . . . . 3.4 ms/iter
    – boosting & local inhib, no update radius . 3.2 ms/iter

I have displayed the type holding synapse permanence values, as I have also run tests with fixed-point 16b and 8b (subutai’s proposal). However, there was no significant impact on performance for this SP test (only decreased memory requirements). Perhaps this will matter more with the TM; I expect it to be much more obvious for full-sheet sims, and to really shine when reaching the 16b mark for overall synaptic weight.

(*) Static sheets, where a column is one of a 64x32 (=2048) area. The input, however, can be larger than 2048, by stacking up to 32 of those 64x32 sheets vertically.

(**) More data for judging the quality of the results (how well behaved the SP is) to be expected in another post.
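As an illustration of fixed-point permanences, a minimal sketch assuming values in [0, 1] scaled to `2**bits - 1` and saturating integer updates. The helper names are hypothetical and the scaling choice is an assumption, not necessarily what the reworked SP uses.

```python
def to_fixed(perm: float, bits: int) -> int:
    """Quantize a permanence in [0, 1] to an unsigned `bits`-wide integer."""
    top = (1 << bits) - 1
    return round(min(1.0, max(0.0, perm)) * top)

def to_float(q: int, bits: int) -> float:
    """Map a fixed-point permanence back to [0, 1]."""
    return q / ((1 << bits) - 1)

def saturating_add(q: int, delta: int, bits: int) -> int:
    """Integer permanence update, clamped to the representable range."""
    return min((1 << bits) - 1, max(0, q + delta))
```

For instance, a hypothetical increment of 0.05 becomes 13 steps in 8b (`to_fixed(0.05, 8)`) and 3277 steps in 16b, so learning updates become plain integer adds with clamping.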


Now publishing the source code for the above results:

This source was tested under MS Windows, compiled with MSVC 2017
It should also compile and run fine on Linux with gcc (if anyone would be kind enough to check that this is indeed the case, I’d be grateful!).
In any case, x64 is the target architecture of choice.

[edit] just pushed a new config based on a Gaussian filter for local inhibition, 5x faster than previous attempts.

This project is intended to become a playground to try more topological models at some point, with some of the ideas described in this thread.


Worked on some visualizations and tried to find a well-behaved SP, tweaking learning params, boost methods, and stuff.

Kinda hard.
Lots of things to tune, really.

The picture above is the output of the current HTMATCH build when compiling WinConsole.cpp as the main() provider.

Input for this run was an encoding of 64 distinct possible input codes (in an 8192-bit array), represented by the 4 top-left rectangles above, displaying blue shapes. This was chosen to make it easier to discern whether the SP outputs were consistent from one presentation of a particular code to the next.

Each display rectangle above is 64x32 and corresponds to the dimensions I chose for the vanilla “sheets”. Top left is the input; everything to the right shows the outputs of different SP configurations.
Output columns, from left to right:
(1) VanillaSP ; Global inhib ; No Boosting
(2) VanillaSP ; Bucket inhib ; No Boosting
(3) VanillaSP ; Local inhib ; No Boosting
(4) VanillaSP ; Global inhib ; With Boosting
(5) VanillaSP ; Bucket inhib ; With Boosting
(6) VanillaSP ; Local inhib ; With Boosting
(7) VanillaSP ; Local inhib ; With Boosting ; No update of inhib radius
(8) VanillaSP ; Local inhib playing with gaussian filters ; With Boosting

Local and Global inhib you already know. “Bucket” inhib simply divides the sheet into smaller patches (here 8 of 16x16 each) and applies a global-like pass, constrained within the limits of each patch.
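A NumPy sketch of that bucket mode, under the stated 64x32 sheet, 16x16 patches and 2% target density (5 winners per 256-column patch, 40 in total). It is a hypothetical reconstruction, not the HTMATCH code, and assumes sheet dimensions are multiples of the bucket size.

```python
import numpy as np

def bucket_inhibition(overlaps: np.ndarray, bucket: int = 16,
                      density: float = 0.02) -> np.ndarray:
    """Global-style top-k inhibition applied independently inside each patch."""
    h, w = overlaps.shape
    active = np.zeros(overlaps.shape, dtype=bool)
    k = max(1, round(density * bucket * bucket))  # winners per patch
    for y0 in range(0, h, bucket):
        for x0 in range(0, w, bucket):
            patch = overlaps[y0:y0 + bucket, x0:x0 + bucket].ravel()
            # highest overlaps first; stable sort prefers lower indices on ties
            winners = np.argsort(-patch.astype(np.float64), kind="stable")[:k]
            winners = winners[patch[winners] > 0]  # no winners without input
            ys, xs = np.unravel_index(winners, (bucket, bucket))
            active[y0 + ys, x0 + xs] = True
    return active
```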

The 18 output rows are 6 displays for each permanence type: 32b float synapse permanence (topmost 6), 16b fixed-point synapse permanence (middle 6), 8b fixed-point synapse permanence (lower 6).
The 6 displays show, from top to bottom:

  • a color-code of the raw activation levels, per position in the sheet (number of connected synapses to an active input; blue is less)
  • a color-code of the current boosting factor, per position in the sheet, if boosting is used for this configuration
  • a color-code of the activation levels after having applied boosting, per position in the sheet, if boosting is used for this configuration
  • a color-code of the “min” activation level required for considering a cell as active after “inhibition” (constant over the whole sheet for global inhib mode… obviously patchy for buckets, and more or less smooth for local inhib schemes…)
  • a display of the actual active cells after all this (these positions are the ones to be passed as ‘active’ to the TM, in the nominal workflow)
  • some space for visualization of various stats:
    • top red-orange bar is the “consistency” display. The more slots and the redder to the left, the more cells that encoded a previous occurrence of the same code are now missing. The more slots and the redder to the right, the more cells encoding a previous occurrence of the same code are new this round. Optimally you’d want that gauge to be thin… and to disappear, as with the leftmost non-boosted modes.
    • middle red-orange bar is the “active count” display. With the fixed 2% sparsity I asked of these various SPs, we’d expect about 41 active cells as a result. If the bar extends to the left, we had fewer than expected this round; to the right, more. Since we “ask” a Spatial Pooler to output a steady, user-chosen count, we’d optimally not want that bar to show.
    • still lower red-orange bar is the “outsider activations” display. The more slots and the redder to the left, the more under-active columns are present among the 2048 of a sheet. The more slots and the redder to the right, the more over-active columns are present among the 2048 of a sheet. One of the purposes of the SP is to try to keep average activation properly balanced among its columns.
    • lowest bar: the blue part is the mean of the current moving averages of activations. We see that the boosted local SPs showing the two extended blue bars to the left are quite consistently emitting fewer than expected; the orange part at the very bottom is the variance of those moving averages. The thinner the better.

There are some obvious trade-offs between all those configurations here…
A few further notes:

  • While the display is continuously running, some gauges are visibly very jagged.
  • Going over 10K or 20K runs, some gauges that had been stable until then started to show red.
  • Different learning speeds can give quite different results…
  • Different input schemes can give quite different results. I have also run these with a very basic scalar encoder (plain or hashed), as well as a noise emitter… the patterns are qualitatively different.
  • The boosting function for the runs above uses a… log, instead of nupic.core’s exponential.
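For reference, the exponential boost used by nupic’s spatial pooler for global inhibition computes each factor as exp((targetDensity - activeDutyCycle) * boostStrength). The log-based variant used for these runs is not described in the thread, so only the exponential form is sketched; the function name is hypothetical.

```python
import numpy as np

def exp_boost(active_duty_cycles: np.ndarray, target_density: float,
              boost_strength: float) -> np.ndarray:
    """Exponential boost factors: under-active columns get factors above 1,
    over-active columns get factors below 1, on-target columns get exactly 1."""
    return np.exp((target_density - active_duty_cycles) * boost_strength)
```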

Hi gmirley,

I know of a simple and fast way to implement local inhibition that runs as fast as global inhibition. The problem is that the code for local inhibition is slow. The solution is to use many small Spatial Poolers with global inhibition to simulate local inhibition: simply divide up the input space in a topological fashion and make an SP (w/ global inhib.) for each piece of the input space. To make this work well, you should have many overlapping input areas.

For example, if you wanted a Spatial Pooler with 2D local inhibition with 2,000 cells, you could have 20 Spatial Poolers (w/ global inhib.) with 100 cells each and arrange the SP’s in a 4x5 grid to cover the input space.
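A sketch of how the input windows for such a grid of SPs might be laid out, with a hypothetical `overlap` margin around each tile; the function name and the 100x80 input size in the usage note are made up for illustration. Each window would feed its own 100-cell SP with global inhibition.

```python
def tile_windows(in_w, in_h, grid_cols, grid_rows, overlap):
    """Input window (x0, y0, x1, y1), end-exclusive, for each SP in the grid.

    Each tile is widened by `overlap` input units on every side (clamped to
    the input borders), so neighboring SPs share part of their input.
    """
    windows = []
    for r in range(grid_rows):
        for c in range(grid_cols):
            x0, x1 = c * in_w // grid_cols, (c + 1) * in_w // grid_cols
            y0, y1 = r * in_h // grid_rows, (r + 1) * in_h // grid_rows
            windows.append((max(0, x0 - overlap), max(0, y0 - overlap),
                            min(in_w, x1 + overlap), min(in_h, y1 + overlap)))
    return windows
```

For a 100x80 input, `tile_windows(100, 80, 5, 4, 4)` gives the 20 windows of a 4x5 grid, each a 20x20 tile grown by 4 units on each side where the borders allow.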

HTH


Nice catch :wink:
I believe this solution is, in effect, captured by the “Bucket” modes in the current HTMATCH implementation.
