It strikes me as odd that temporal memory is set up to allow for one-shot learning (through the growth of new cell segments) but spatial poolers are not. Instead of growing new segments in response to novel spatial inputs, minicolumns have a single set of randomly-initialized connections that change comparatively slowly. Why not let the minicolumn synapses behave the way the cell synapses do, opening the door to one-shot spatial learning as well as one-shot temporal learning?
I’m guessing there is a good reason; could anyone explain it?
I think the SP needs to be less volatile, because you want it to activate the same columns when given the same encoding vectors. If SP columns aggressively connected to different sets of encoding bits the way TM cells do, it would destabilize the SP, at least for a while, causing the same inputs to activate totally different sets of columns at different times.
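To put rough numbers on "comparatively slowly", here is a toy numpy sketch contrasting the two update styles. Everything in it (PERM_INC, CONNECTED, INITIAL_PERM, the function names) is an illustrative placeholder, not the actual nupic/htm.core API; the point is only the difference in step size, incremental nudges versus wholesale one-shot wiring.

```python
import numpy as np

rng = np.random.default_rng(0)

INPUT_SIZE = 64
PERM_INC, PERM_DEC = 0.03, 0.015   # small Hebbian nudges (SP-style), made-up values
CONNECTED = 0.5                    # permanence threshold for a "connected" synapse
INITIAL_PERM = 0.6                 # made-up; real TMs may start new synapses lower

# One mini-column's proximal permanences, randomly initialised around the threshold.
permanences = rng.uniform(0.4, 0.6, size=INPUT_SIZE)

def sp_style_update(perms, active_bits):
    """Incremental Hebbian rule: nudge permanences toward the current input.
    Many repeated exposures are needed before the connected set changes much."""
    perms = perms.copy()
    perms[active_bits] += PERM_INC     # reinforce synapses to bits that were on
    perms[~active_bits] -= PERM_DEC    # weaken synapses to bits that were off
    return np.clip(perms, 0.0, 1.0)

def tm_style_one_shot(active_bits):
    """One-shot rule (analogue of TM segment growth): wire a brand-new segment
    directly to the currently active bits in a single step."""
    segment = np.zeros(INPUT_SIZE)
    segment[active_bits] = INITIAL_PERM
    return segment

# A single novel input pattern (8 active bits out of 64).
pattern = np.zeros(INPUT_SIZE, dtype=bool)
pattern[rng.choice(INPUT_SIZE, size=8, replace=False)] = True

after_one_sp_step = sp_style_update(permanences, pattern)
new_segment = tm_style_one_shot(pattern)

print("connected synapses before the SP step :", int(np.sum(permanences >= CONNECTED)))
print("connected synapses after one SP step  :", int(np.sum(after_one_sp_step >= CONNECTED)))
print("connected synapses on the new segment :", int(np.sum(new_segment >= CONNECTED)))
```

After one SP-style step the connected set barely moves, whereas the new segment is wired to the novel pattern immediately, which is exactly the volatility concern above.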
I did something similar not too long ago, applying vanilla HTM (specifically the Spatial Pooler) to the Omniglot dataset. Unfortunately, the results were horrible. This is perfect timing IMO; I might try to see whether this is possible.
I did something like that, but not exactly! My work was more oriented towards the non-functional mini-columns (those that are barely used or not heavily involved in the computation). I did not get much of a boost in performance, at least on the MNIST dataset, but the entropy was better!
Honestly, I did not pay much attention to network convergence speed, since my goal was to boost performance. However, here I'm talking about a spatial task; the story may be totally different when it comes to temporal tasks.
My interpretation of the spatial pooler is that it acts like a kind of filter bank that applies something akin to a convolution to the input data. Each mini-column is effectively looking for a sufficiently matched feature (extracting a coefficient for a learned basis function). The accuracy of the match determines the degree of activation, with the most active mini-columns being selected by the k-winners filter.
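Roughly what I mean, as a toy numpy sketch (the matrix shapes, sparsity, and k below are invented for illustration; this is not the real SP code, and it ignores boosting, potential pools, and so on):

```python
import numpy as np

rng = np.random.default_rng(42)

N_COLUMNS, INPUT_SIZE, K = 128, 256, 5   # toy sizes, not real SP parameters

# Each row is one mini-column's "filter": a binary mask of its connected synapses.
filters = (rng.random((N_COLUMNS, INPUT_SIZE)) < 0.1).astype(np.int32)

def k_winners(input_sdr, filters, k):
    """Filter-bank view of the SP: each column's overlap with the input acts as
    a crude coefficient, and the k best-matching columns become active."""
    overlaps = filters @ input_sdr            # one dot product per mini-column
    winners = np.argsort(overlaps)[-k:]       # indices of the k largest overlaps
    return winners, overlaps

input_sdr = (rng.random(INPUT_SIZE) < 0.05).astype(np.int32)
winners, overlaps = k_winners(input_sdr, filters, K)
print("active columns:", sorted(winners.tolist()))
print("their overlaps:", overlaps[winners])
```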
By this interpretation, it should be possible to reconstruct the salient parts of the input by a linear combination of the filters (basis functions) associated with the activated mini-columns, although it may be necessary to use duplication of filters (i.e. multiple mini-columns responding to similar filters) as a proxy for the original basis-function coefficient extracted from the data. The learned basis functions (filters) do not have to be orthogonal, and indeed they should not be, in order for this approach / interpretation to retain the semantics of the input.
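Reusing the toy setup from the sketch above (again purely illustrative; the `threshold` vote count is my own invention, not part of any SP implementation), the reconstruction step would look something like this:

```python
import numpy as np

rng = np.random.default_rng(42)
N_COLUMNS, INPUT_SIZE, K = 128, 256, 5

filters = (rng.random((N_COLUMNS, INPUT_SIZE)) < 0.1).astype(np.int32)
input_sdr = (rng.random(INPUT_SIZE) < 0.05).astype(np.int32)

overlaps = filters @ input_sdr
winners = np.argsort(overlaps)[-K:]          # the active mini-columns

# Reconstruction: sum the filters (basis functions) of the active columns and
# keep any input bit that at least `threshold` of them vote for. Several
# columns with similar filters voting together stand in for a larger
# coefficient on that basis function.
threshold = 2
votes = filters[winners].sum(axis=0)
reconstruction = (votes >= threshold).astype(np.int32)

print("input bits on        :", int(input_sdr.sum()))
print("reconstructed bits on:", int(reconstruction.sum()))
print("bits recovered       :", int(np.sum(reconstruction & input_sdr)))
```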
As for why the SP uses Hebbian learning rather than one-shot learning, I believe @sheiser1 is correct. You want the learned filters to be more stable and to accurately reflect the cumulative statistics of the input data. If you dedicated a mini-column to every novel pattern that appeared on the input stream, then you would likely run out of storage capacity very rapidly and also lose some of the semantics of the input by not effectively grouping similar input patterns under a common set of previously learned filters.
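Here's a quick toy simulation of that capacity argument (every number and threshold below is made up for illustration; neither strategy is the real SP algorithm): feed in a stream of noisy variants of a modest set of underlying features, then compare dedicating a fresh column to every novel pattern against reusing a column whose learned filter already matches.

```python
import numpy as np

rng = np.random.default_rng(7)

INPUT_SIZE, ACTIVE_BITS = 256, 10
N_COLUMNS = 256                 # total mini-columns available
N_PROTOTYPES, N_PATTERNS = 50, 2000
MATCH_THRESHOLD = 6             # overlap needed to treat an input as "already learned"

def sparse_sdr():
    sdr = np.zeros(INPUT_SIZE, dtype=np.int32)
    sdr[rng.choice(INPUT_SIZE, ACTIVE_BITS, replace=False)] = 1
    return sdr

# The input stream: noisy variants of a modest number of underlying features.
prototypes = [sparse_sdr() for _ in range(N_PROTOTYPES)]

def noisy(proto):
    sdr = proto.copy()
    on, off = np.flatnonzero(sdr), np.flatnonzero(sdr == 0)
    sdr[rng.choice(on, 2, replace=False)] = 0      # drop two true bits
    sdr[rng.choice(off, 2, replace=False)] = 1     # add two spurious bits
    return sdr

stream = [noisy(prototypes[rng.integers(N_PROTOTYPES)]) for _ in range(N_PATTERNS)]

# Strategy A: dedicate a fresh column to every input that isn't an exact repeat.
# With noise, nearly every pattern looks "novel", so the columns are gone fast.
distinct = {tuple(np.flatnonzero(p)) for p in stream}
one_shot_used = min(len(distinct), N_COLUMNS)

# Strategy B: Hebbian-style grouping; reuse a column whose filter already matches
# the input well enough, and only recruit a new column otherwise.
filters = []
for pattern in stream:
    if filters and max(int(f @ pattern) for f in filters) >= MATCH_THRESHOLD:
        continue                                   # grouped under an existing filter
    if len(filters) < N_COLUMNS:
        filters.append(pattern)

print("columns consumed by one-shot dedication:", one_shot_used)
print("columns consumed with Hebbian grouping :", len(filters))
```

The one-shot strategy exhausts every available column almost immediately, while grouping settles near the number of underlying features, which is the point about both capacity and preserving the input's semantics.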