TM escalation by failure prediction


Here's a proposal for stacking several TM blocks at different time frames, in a manner similar to STREAMER (recently mentioned here) and Ogma's SPH, as a means to:

  1. deal with the limited learning capacity of any single TM block.
  2. have an upper TM block operate at a slower time frame than the block(s) below it.
  3. enable incremental learning by ascending the hierarchy.
  4. as a bonus, provide a means to “shield” a TM block from unfamiliar patterns.

(I will detail 1…4 above after I explain how it works.)

First, a short reminder of TM block architecture/function - it is organized as a matrix of columns × cells, one column per input SDR bit, each column stacking a fixed number of cells. Each cell learns to anticipate whether its corresponding input SDR bit will be 1 (turned ON) at the following (t+1) time step. For example's sake, take a 1000-column × 30-cell TM block.

Failure to correctly predict any column's state at (t+1) is what drives the learning mechanism, which produces changes in the corresponding cells' synapse connections via bursting (an unpredicted active column bursts, activating all of its cells).

An important notion to take away is the receptive field - the set of sources from which any given cell can connect its input synapses. The receptive field at any time t is the union of the input SDR and the cell activation states at the previous time step (t-1).

Before cutting to the chase, a (not too) short reminder of the general problems any ML model (not only TM) struggles to handle (or skip to the next message if already bored):

  • sample efficiency and catastrophic forgetting - skip these for now, since TM handles them pretty well.
  • handling out-of-training-dataset (aka real-world) cases. The obvious way TM avoids this issue is with continual learning, yet that in itself does not shield it from:
  • the trade-off between resource capacity (memory, compute) and problem complexity (in space and time) - which is commonly handled by scaling up: (use more hardware to) make a similar, bigger, (hopefully) faster model and begin its training from scratch.

What aggravates this is:

  • it is hard to know in advance what level of resources is needed to handle a certain problem decently (== economically).
  • any increment in memory (# of neurons/parameters/synapses/etc.) requires an excessive increase in the training compute needed to learn from a dataset.
  • any increase in problem complexity requires an excessive increase in training dataset size and training time.

Now, back to the TM block:

  • N bits of input SDR, one frame at a time - T0, T1, …, Tforever, Tforever+1, …
  • N columns, H cells per column, N × H cells in total.
  • a receptive field of N × (H+1) bits for any possible synapse within the block (N input bits plus N × H cell states).

And, instead of scaling it up later, let's begin by scaling it down (e.g. a 500×10 matrix instead of 1000×30), which has the benefit of roughly an order of magnitude fewer resources to learn with, and the downside of limited learning capacity.
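To put rough numbers on that claim, here is a minimal back-of-envelope sketch comparing the two block sizes, assuming worst-case full connectivity (every cell could synapse onto every source in its receptive field); the function and its accounting are illustrative assumptions, not a real HTM API:

```python
# Back-of-envelope resource budgets for a 1000x30 TM block vs the
# scaled-down 500x10 "baby". Worst-case full connectivity is assumed.

def tm_budget(n_cols, n_cells_per_col):
    """Return (total cells, receptive-field size, worst-case synapse count)."""
    cells = n_cols * n_cells_per_col
    # Receptive field = input SDR bits + all cell states at t-1: N x (H+1).
    receptive_field = n_cols * (n_cells_per_col + 1)
    # Worst case: every cell can connect to every source in the field.
    max_synapses = cells * receptive_field
    return cells, receptive_field, max_synapses

big = tm_budget(1000, 30)   # (30000, 31000, 930000000)
baby = tm_budget(500, 10)   # (5000, 5500, 27500000)
print(f"worst-case synapse ratio: {big[2] / baby[2]:.0f}x")  # ~34x
```

By this (admittedly crude) count, the downscaled block needs roughly 6× fewer cells and over 30× fewer potential synapses.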

Yet if you've read the Great Book of SDR-ology enough times, you will have learned that scaling down an SDR doesn't entirely ruin it; and if you also have (as most data scientists do) > 80% faith in Pareto's Principle, you will expect the downscaled TM to catch a significant share of the spatio-temporal patterns in the input SDR stream.

The good thing is a “baby TM” will quickly learn the most obvious, less complex, low-hanging patterns; the not-so-good thing is it will fail to learn the more complex ones.

Now that we have a TM which saturates relatively fast, what next, instead of a bigger TM (and a bigger, faster machine to handle it)?

Let's freeze the baby TM and add, at its side, a single baby watcher column, which:

  • doesn't necessarily have the same number of cells/segments/synapses
  • has the same receptive field: the baby TM's activation state + the input SDR at t-1
  • takes 1 as its input signal when the baby TM fails in its prediction, and 0 when it succeeds.

So the watcher column's purpose is to learn to anticipate when the baby TM will succeed or fail.
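A sketch of how the watcher's binary input signal could be derived each step: 1 when the baby TM's column-level prediction failed on the current frame, 0 otherwise. The column-set representation and the "any burst = failure" criterion are assumptions for illustration, not a specific HTM library's API:

```python
# Derive the watcher column's 0/1 input signal from the baby TM's
# column-level prediction vs the columns actually activated by the input.

def watcher_signal(predicted_cols: set, active_cols: set) -> int:
    """Return 1 if the baby TM failed on this frame, else 0.

    predicted_cols: columns the baby predicted would turn ON at t.
    active_cols:    columns actually ON at t (from the input SDR).
    Any active-but-unpredicted column means a burst, i.e. a failure;
    a softer criterion (e.g. an overlap ratio threshold) would also work.
    """
    return int(bool(active_cols - predicted_cols))

print(watcher_signal({3, 17, 42}, {3, 17, 42}))  # 0 - perfect prediction
print(watcher_signal({3, 17}, {3, 17, 42}))      # 1 - column 42 burst
```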

What is this information useful for?

  • when it predicts failure, it skips evaluating the baby TM entirely and escalates the current time step to a higher-level handler TM block (see below for what and when this handler sees).
  • which shields the baby TM from the effort and anguish of handling input SDRs it cannot handle.
  • we get a routing switch for handling increasing complexity - the baby sees mostly time frames it has learned to predict, while the handler sees the remaining frames.
  • very importantly, feeding a TM time frames it cannot anticipate usually messes up its future predictions, because feeding it “noise” changes its activation/prediction state. By skipping the processing of “the unknowns in the universe” it preserves its current anticipation for future knowns.
  • the top-level handler TM will learn to handle what the baby cannot. Just as important, it will not have to handle what the baby can. Which means we can double the learning capacity without doubling the compute.
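The routing described above can be sketched as a small control loop. The classes below are toy stand-ins (hypothetical, not any real HTM library) whose only purpose is to make the escalation logic concrete: familiar frames go to the frozen baby, predicted-unfamiliar frames bypass it entirely so its state is never corrupted:

```python
class BabyTM:
    """Frozen baby TM stand-in: it 'knows' the frames in `known`."""
    def __init__(self, known):
        self.known = set(known)
        self.state = None              # last activation state (toy: last frame)

    def compute(self, frame):
        self.state = frame             # state changes only on frames it sees
        return frame not in self.known # True = prediction failure

class Watcher:
    """Watcher column stand-in: predicts whether the baby will fail."""
    def __init__(self):
        self.failed_before = set()
    def predict_failure(self, frame):
        return frame in self.failed_before
    def learn(self, frame, failed):
        if failed:
            self.failed_before.add(frame)

class HandlerTM:
    """Upper-level handler stand-in: sees only escalated frames."""
    def __init__(self):
        self.seen = []
    def compute(self, frame, baby_state):
        self.seen.append(frame)

def run(stream, baby, watcher, handler):
    for frame in stream:
        if watcher.predict_failure(frame):
            # Escalate: the baby is skipped and its state stays intact.
            handler.compute(frame, baby.state)
        else:
            failed = baby.compute(frame)
            watcher.learn(frame, failed)   # watcher learns from its own misses
            if failed:
                handler.compute(frame, baby.state)  # late escalation on a miss

baby, watcher, handler = BabyTM(known={"A", "B"}), Watcher(), HandlerTM()
run(["A", "X", "B", "X", "A"], baby, watcher, handler)
print(handler.seen)  # ['X', 'X'] - only unfamiliar frames reach the handler
```

Note the "late escalation" branch: the first time an unfamiliar frame appears, the watcher hasn't learned it yet, so the baby does get disturbed once; from then on the watcher routes that frame around it.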

And the game can continue: if/whenever the second-level TM saturates, a new watcher column and a third-level handler can be added on top of it. IF needed.

The handler TM block has in its receptive field the input SDR, the baby's columns' last activation states, plus its own columns' last activation states.
So its complexity increases even without having more cells, but on the plus side it only learns and responds to a subset of the input SDR stream - those frames the baby TM cannot handle.
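In the same back-of-envelope spirit as before, the handler's enlarged receptive field is just the concatenation of those three sources; the flat bit-vector accounting below is an illustrative assumption:

```python
# Size of the handler's receptive field: input SDR bits + baby cell
# states + the handler's own cell states, all at t-1.

def handler_rf_size(n_cols, baby_cells_per_col, handler_cells_per_col):
    input_bits = n_cols
    baby_state_bits = n_cols * baby_cells_per_col
    handler_state_bits = n_cols * handler_cells_per_col
    return input_bits + baby_state_bits + handler_state_bits

# e.g. 500 columns, baby and handler both at 10 cells/column:
print(handler_rf_size(500, 10, 10))  # 10500, vs the baby's 500 * 11 = 5500
```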

That's about it for now… the “machine” can also grow wider, but I won't abuse your patience (maybe later).

Seems like a cool idea.