AKA 100 million brains machine
The problem this machine attempts to solve is scaling: the HTM tests/studies I’m aware of are currently limited to modelling relatively small “cortex patches” (aka macro columns) spanning a few thousand mini columns, while real mammalian brains may span over 100M such units.
As the title suggests, the unit of computation in this machine is the mini column, which I’ll refer to simply as a column.
I’ll try to be as terse as I can by using the following example architecture (numbers may vary, but multiples of 100 are easy to follow):
- Columns are arranged in a 10000x10000 2D matrix, aka the cortex.
- The “macro” columns (as in biology) are neither fixed nor delimited within the cortex; instead, any number of 100x100 windows at arbitrary positions can be addressed (opened) at any time and regarded as “macro columns”.
- There is a fixed underlying SDR representation for both inputs and outputs, which any (mini-)column may “see” as an input. It is e.g. 10000 bits in size, conveniently mapped as a 100x100 bit array. The 10000-bit size is suggested by nothing other than convenience and the intuition that it should be large enough to provide a rich representation (aka embedding) of every context/thing, and of any complex relationships between related things, that this machine might encounter, learn and think about in its lifetime, while staying within a reasonable processing capability of the underlying hardware.
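To make the conventions above concrete, here is a minimal sketch of how such an SDR could be represented and compared. The 2% sparsity figure and the set-of-indices encoding are my own assumptions for illustration, not part of the proposal:

```python
# Sketch of the assumed SDR conventions: a 10000-bit (100x100) binary
# vector, stored compactly as a set of active bit indices.
import random

SDR_SIDE = 100                   # 100x100 grid
SDR_SIZE = SDR_SIDE * SDR_SIDE   # 10000 bits
N_ACTIVE = 200                   # e.g. 2% active bits (an assumption)

def random_sdr(seed=None):
    """Return a random SDR as a frozenset of active bit indices."""
    rng = random.Random(seed)
    return frozenset(rng.sample(range(SDR_SIZE), N_ACTIVE))

def overlap(a, b):
    """Number of active bits two SDRs share - the usual similarity measure."""
    return len(a & b)

a = random_sdr(seed=1)
b = random_sdr(seed=2)
print(overlap(a, a))  # 200: an SDR fully overlaps itself
```

Two unrelated random SDRs at this sparsity overlap on only a handful of bits, which is what makes overlap a useful similarity measure.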
So,
Every column can see the full 100x100 input SDR, but it projects onto a single point (bit) of a corresponding 100x100 output SDR.
For example, the column at address 1573x2545 in the cortex projects onto bit 73x45 of the output SDR (it’s simply modulo 100 for both X and Y coordinates).
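The modulo projection is one line of code; a sketch, using the example coordinates above:

```python
# Map a column's cortex coordinates to its output-SDR bit, as described:
# simple modulo on both axes, so every 100x100 window of columns covers
# the full output SDR exactly once.
def output_bit(col_x, col_y, side=100):
    """Return the (x, y) output-SDR bit a column projects onto."""
    return col_x % side, col_y % side

print(output_bit(1573, 2545))  # (73, 45)
```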
Now it’s time to detail the concepts of address (already highlighted above) and window activation, but first let’s recap what this machine is meant to achieve:
Implementing a 100M-column model that doesn’t break the national grid, and hopefully not the bank.
The way to do that is to have only 1% of columns (1M) active at any time step, which should be spreadable over ~100 cores of ordinary machines.
How? By activating only 100 windows (of 100x100 columns each) within the 10Kx10K cortex.
Ok, but how? By projecting the input SDR onto 100 address points in the cortex, in a manner that preserves similarities.
For every address, the corresponding 100x100 window originating at that point is activated.
100 windows of 100x100 columns each means 1% active columns (1M out of 100M).
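The sparsity arithmetic can be checked directly. A sketch, assuming windows wrap around the cortex edges (my assumption; the original doesn’t specify edge handling):

```python
# Verify that 100 windows of 100x100 columns each, within a 10000x10000
# cortex, activate exactly 1% of all columns (when windows don't overlap).
CORTEX_SIDE = 10_000
WIN_SIDE = 100
N_WINDOWS = 100

def active_columns(window_origins, win_side=WIN_SIDE):
    """Set of column coordinates covered by the given windows.
    Overlapping windows are counted once, so activity can only fall
    below the 1% ceiling, never exceed it."""
    cols = set()
    for ox, oy in window_origins:
        for dx in range(win_side):
            for dy in range(win_side):
                cols.add(((ox + dx) % CORTEX_SIDE, (oy + dy) % CORTEX_SIDE))
    return cols

# 100 non-overlapping windows on a coarse 10x10 grid of origins
origins = [((i % 10) * 1000, (i // 10) * 1000) for i in range(N_WINDOWS)]
cols = active_columns(origins)
print(len(cols) / CORTEX_SIDE**2)  # 0.01, i.e. 1% of the 100M columns
```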
What does preserving similarities mean exactly? It means that for two input SDRs:
- if they are exactly the same, they will project onto exactly the same set of windows
- if they have a certain overlap (are similar), the corresponding sets of active windows should have a sufficient overlap too
The magic projecting/addressing tricks are hinted at in lots of places; Kanerva’s SDM, associative memories and random projection theory for dimensionality reduction are just a few of them.
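To make the addressing idea tangible, here is one crude toy scheme of my own; it is not the method hinted at above, just an existence proof of the two properties. Each active input bit is hashed to a grid-snapped window origin, so shared input bits always vote for shared windows:

```python
# Toy similarity-preserving addressing (an illustration, not the
# proposal): hash each active input bit to a window origin snapped to a
# 100x100 grid of candidate positions. Identical SDRs open identical
# window sets; SDRs sharing bits are guaranteed to share windows.
import hashlib

GRID = 100        # window origins snap to a 100x100 grid of positions
N_WINDOWS = 100

def bit_to_origin(bit):
    """Deterministically hash an input bit index to a window origin."""
    h = int(hashlib.sha256(str(bit).encode()).hexdigest(), 16)
    return ((h % GRID) * GRID, ((h // GRID) % GRID) * GRID)

def addresses(sdr, n=N_WINDOWS):
    """Window origins opened by an SDR (a set of active bit indices);
    at most n origins are kept (lowest first, for determinism)."""
    return set(sorted({bit_to_origin(b) for b in sdr})[:n])

sdr_a = frozenset(range(0, 200, 4))  # 50 active bits
sdr_b = frozenset(list(range(0, 160, 4)) + list(range(1000, 1040, 4)))
# sdr_a and sdr_b share 40 bits, so their window sets must overlap
```

A real design would do better than per-bit hashing (e.g. Kanerva-style hard locations, or a random projection followed by a top-k selection), but the overlap-preservation property is the same in spirit.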
The same architecture could be applied to any underlying processing/learning units instead of HTM’s minicolumns, scaling in width rather than in depth (which is what Deep Learning does).
The “minicolumns” could be shallow (a few layers) yet wide NNs, or even NEAT-evolved networks or Random Trees; any small unit able to learn spatio-temporal patterns should be amenable to this wide, sparse processing structure.
In the unlikely case this is in any way a novel idea, I’d like to call it Sparse Distributed Processing.
I will gladly deepen the topic: how to connect actual multimodal sensory encodings and output responses, in what ways this could be psychologically or biologically plausible, or how sequences of SDRs might be mapped into it in a manner that keeps the “threads” connected.