TL;DR: Here I propose
- a TM block which, instead of having a fixed depth (column size), starts with all columns shallow and grows each column (stacks new cells) as necessary,
- how the TM learning rules could be changed in order to accomplish the above,
- what advantages - in terms of learning rate and compute/memory resources - this new TM breed might have.
We know (at least most of us do) that a TM is organized as a matrix of columns × cells. One column's purpose is to predict one future SDR position (or bit), and it is made by stacking a fixed number of cells, the same depth for all columns. When a cell activates, the whole column activates.
The algorithm is that each cell tries to predict the following time step by "looking" for, and learning, patterns in the activations of all other (or only neighboring) cells at the previous time step.
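For reference, here is a minimal Python sketch of that fixed layout; the `Cell`/`Column` names are illustrative and the sizes (2048 × 32 is a commonly quoted HTM configuration) are just an example, not taken from any particular library:

```python
from dataclasses import dataclass, field

@dataclass
class Cell:
    segments: list = field(default_factory=list)  # dendritic segments holding synapses

@dataclass
class Column:
    cells: list  # every column gets the same fixed number of cells

N_COLUMNS, CELLS_PER_COLUMN = 2048, 32  # chosen up front, for the whole lifecycle
tm = [Column(cells=[Cell() for _ in range(CELLS_PER_COLUMN)])
      for _ in range(N_COLUMNS)]
```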
What is important to notice is that the architecture (and therefore the learning capacity) is fixed from the beginning - a certain column depth (# of cells/column) is assumed for the whole lifecycle of the TM.
A problem with the fixed TM size is that its "Maker" has to guess in advance what architecture is optimal for the given task - a too-shallow HTM can't learn complex relationships, while a too-large one wastes resources.
To go on, the Expansive TM would be sketched like this:
- first, for simplicity's sake, each cell is responsible for learning a single pattern - it has one segment that turns it active.
- start with a depth of one cell in each column. Cells mature very quickly, which means that once they have "locked in" to a pattern, their synapses freeze and non-permanent synapses are discarded. That brings in three benefits (see the sketch after this list):
  - the learning algorithm can skip mature segments (efficiency); normally one learning cell per column is sufficient
  - mature segments do not need to hold synaptic permanences, since all their synapses are permanent (memory savings)
  - mature segments reliably activate each time they see the pattern they locked into (consistent behavior)
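A minimal sketch of such a "lock in" step, assuming a hypothetical `Synapse`/`Segment` pair and an illustrative 0.5 permanence threshold:

```python
from dataclasses import dataclass, field

PERMANENCE_THRESHOLD = 0.5  # illustrative value

@dataclass
class Synapse:
    source: int        # index of the presynaptic (mature) segment
    permanence: float  # only needed while the segment is still learning

@dataclass
class Segment:
    synapses: list = field(default_factory=list)
    mature: bool = False

    def lock_in(self):
        # freeze the segment: discard non-permanent synapses and, since the
        # survivors are permanent forever, keep only their source indices
        self.synapses = [s.source for s in self.synapses
                         if s.permanence >= PERMANENCE_THRESHOLD]
        self.mature = True
```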
- there are two kinds of errors a segment/cell can make:
  - not activating when it should
  - activating when it shouldn't
  How does the GTM handle these (see the sketch below)?
  - in the first case, a new immature cell is stacked on top of the column; its purpose is to learn a new pattern that the cell(s) below it did not notice.
  - in the second case, an "inhibitory" segment is added to the cell - its purpose is to learn a particular exception pattern that is supposed to inhibit the cell's activation.
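A sketch of the two repair paths, reusing the illustrative names from the earlier snippets (minimal classes repeated here so the example is self-contained):

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    synapses: list = field(default_factory=list)
    mature: bool = False

@dataclass
class Cell:
    positive: Segment = field(default_factory=Segment)  # the one activation segment
    inhibitory: list = field(default_factory=list)      # learned exception rules

def handle_error(column_cells, missed, false_fired, offending_cell=None):
    if missed:
        # error 1: the column failed to activate when it should have ->
        # stack a fresh immature cell on top to learn the missed pattern
        column_cells.append(Cell())
    elif false_fired:
        # error 2: a cell activated when it shouldn't have ->
        # grow a new immature inhibitory segment to learn the exception
        offending_cell.inhibitory.append(Segment())
```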
- As you noticed, I almost lied above: a cell doesn't have a single segment, it has a single activation (positive) segment and zero or more inhibitory (negative) segments, yet only one of them can be in the immature stage - learning an inhibitory pattern. A new inhibitory segment is added whenever all preexisting inhibitory segments failed to inhibit the cell's (hence the column's) activation when they should have.
- as a bonus to speed: as long as a positive segment does not activate, its corresponding inhibitory segments can be skipped (there is no need to check whether they see an inhibitory pattern). So a cell can learn many exception rules without having to evaluate them at every time step. Further, for cells with multiple inhibitory segments, these are evaluated in the order of their maturation - and once one of them inhibits the cell, the algorithm can stop checking further exception rules for the same cell. A sketch of this short-circuit evaluation follows.
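Here is how that lazy evaluation could look, assuming the illustrative `Cell`/`Segment` structure from the earlier sketches and a deliberately naive match test:

```python
def segment_matches(segment, prev_active):
    # naive placeholder test: every synapse's source segment fired at t-1
    return bool(segment.synapses) and all(s in prev_active for s in segment.synapses)

def cell_fires(cell, prev_active):
    # cheap test first: while the positive segment is silent, none of the
    # inhibitory segments ever needs to be evaluated at this time step
    if not segment_matches(cell.positive, prev_active):
        return False
    # inhibitory segments are stored in maturation order (oldest first);
    # any() short-circuits, so we stop at the first segment that vetoes the cell
    return not any(segment_matches(inh, prev_active) for inh in cell.inhibitory)

def column_fires(cells, prev_active):
    # oldest cells first; once one fires, newer cells are never evaluated
    return any(cell_fires(c, prev_active) for c in cells)
```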
- The consistency rules (sketched below):
  - a learning segment can only get its input from mature segments. This way it is guaranteed that, whatever pattern the segment has learned, its future activations will not be hindered by its input cells "changing their mind" by learning new rules. That is why an input synapse should not point to a whole cell but to one specific segment within that cell (0, 1, 2…, where "0" is the positive segment).
  - mature segments are assigned a maturity value 1…N (where 1 is the oldest mature segment and N is the newest one), so segment i can only read its inputs (== attach synapses) from segments 1, 2, …, i-1.
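A sketch of those two rules; the addressing convention and the global counter are my own illustrative choices:

```python
# a synapse points at one specific segment, not at a whole cell; e.g. an
# address could be (column, cell, segment_index), with segment_index 0
# being the positive segment and 1.. the inhibitory ones

_next_maturity = 1  # global counter: 1 = oldest mature segment, N = newest

def on_segment_matured(segment):
    global _next_maturity
    segment.maturity = _next_maturity
    _next_maturity += 1

def eligible_sources(all_segments):
    # an immature (learning) segment may only attach synapses to segments
    # that are already mature - exactly those with maturity 1..i-1 relative
    # to the value i it will itself receive when it locks in
    return [s for s in all_segments if s.mature]
```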
- further optimizations - linking inhibitory segments: when a cell in a column misfires, before adding a new, immature inhibitory segment, the algorithm can check all preexisting inhibitory segments in its column, and if any one of them checks true, it simply symlinks it into the newer cell, which means "the already learned inhibitory rule applies in this particular case too". A sketch follows.
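One way this linking could look, reusing `Segment` and `segment_matches` from the sketches above:

```python
def add_inhibitory_rule(misfiring_cell, column_cells, prev_active):
    # before learning a brand-new exception, check whether an inhibitory rule
    # already learned elsewhere in this column matches the current input
    for other in column_cells:
        for inh in other.inhibitory:
            if inh.mature and segment_matches(inh, prev_active):
                # "symlink": share the existing segment object instead of
                # learning the same exception pattern all over again
                misfiring_cell.inhibitory.append(inh)
                return inh
    # no existing rule applies: grow a fresh immature inhibitory segment
    fresh = Segment()
    misfiring_cell.inhibitory.append(fresh)
    return fresh
```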
Ok, I hope it looks simple enough to raise your interest.
Opinions are welcome - e.g. what use cases the vanilla TM can solve that this one couldn't, how difficult it would be to implement, or whether you think the presumed performance improvements would really matter.
I personally like the following benefits:
- that a column can be as shallow as necessary. If a pattern bit is easy to predict, why allocate dozens of cells?
- and that a column may even have no learning segments/cells at all, as long as it makes correct predictions.
- once an older cell decides to activate (its positive pattern says "GO!" and its negative patterns do not oppose it), there is no need to evaluate newer cells from the same column.
- since, by the Pareto rule, 20% of the relevant patterns for each column would be encountered 80% of the time, the most relevant patterns would tend to be learned sooner, hence the advantage above (bypassing rare, complicated rules) is further magnified.