Spatial Hierarchy: The visual cortex is a hierarchy that represents small features at the bottom (edges) and gradually builds up larger features to the top (whole objects).
Temporal Hierarchy: The auditory cortex represents small features at the bottom (tones) and gradually builds up larger features to the top (words, sentences).
By this logic, spatial pooling pools spatial features while temporal pooling pools temporal features. The receptive fields could be pictured like this:
The spatial receptive field pools features at one moment in time, while the temporal receptive field pools features over a duration of time. So there could be a temporal field at the bottom of the hierarchy that spans only milliseconds (and, say… spans 64 cells) and a field at the top that spans seconds (again spanning 64 cells). This means that a temporal pooling cell at the bottom only remains active for milliseconds (responding to small features), while cells at the top remain active for seconds (responding to larger features). I guess this is what Jeff meant when he said 'representations become more stable the higher up the hierarchy'?
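To make the idea concrete, here is a toy calculation (my own illustration, not HTM code) of how fixed-length temporal fields would scale with hierarchy level. The base duration and growth factor are assumptions I picked just to show milliseconds at the bottom turning into seconds at the top:

```python
# Toy illustration: a pooling cell's active duration, assuming its
# temporal window multiplies by a fixed factor at each hierarchy level.
# base_ms and growth are made-up numbers for illustration only.

def field_duration_ms(level, base_ms=10, growth=4):
    """Duration (ms) a pooling cell at `level` stays active."""
    return base_ms * growth ** level

for level in range(5):
    print(f"level {level}: cell active for ~{field_duration_ms(level)} ms")
```

With these numbers, level 0 cells stay active for ~10 ms while level 4 cells stay active for ~2560 ms, i.e. seconds - each level's representation is more stable in time than the one below it.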
My question is - does HTM temporal pooling do this (or something similar)? Or does each region learn sequences of arbitrary length (as opposed to the fixed-length sequences I've described above)?
This isn’t exactly an answer about hierarchy, but for a given region, I personally believe the ideal temporal pooling algorithm should be able to extend predictions further and further into the future the more often a sequence is encountered. The first time through a long sequence, the pooling layer would transition through a series of representations, but after a lot of training it should have a more stable representation throughout the whole sequence.
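Here is a deliberately tiny sketch of that intuition (my own toy, not the actual HTM algorithm): a pooler that keeps a cell active only as long as each input transition was predicted. On the first pass nothing is predicted, so the representation changes at every step; on the second pass every transition is predicted, so one representation persists and grows across the whole sequence:

```python
# Toy pooler: the pooled representation resets on surprise and
# extends on correctly predicted transitions. "Cells" are just the
# sequence elements themselves to keep the sketch minimal.

class ToyPooler:
    def __init__(self):
        self.learned = set()   # transitions (prev, cur) seen before

    def run(self, seq):
        """Feed one pass of `seq`; return the pooled representation
        (a frozenset of active cells) after each step."""
        pooled, prev, reps = set(), None, []
        for cur in seq:
            if (prev, cur) in self.learned:
                pooled.add(cur)            # predicted: stay stable, extend
            else:
                pooled = {cur}             # surprise: representation resets
                self.learned.add((prev, cur))
            reps.append(frozenset(pooled))
            prev = cur
        return reps

p = ToyPooler()
first = p.run("ABCD")    # four distinct, short-lived representations
second = p.run("ABCD")   # one representation grows to span the sequence
print(len(set(first)), sorted(second[-1]))
```

The point is only the qualitative behavior: with more exposure, the representation becomes stable over a longer stretch of the sequence, which is the property I'd want from a real temporal pooling layer.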
If I am correct about this, then it would mean that each region could learn sequences of arbitrary length.
I wasn’t sure before, but I think you’re right - sequences in each region are arbitrary in length and begin as a string of subsequences that gradually ‘join together’ to form a more stable representation. This can be understood intuitively. When you learn a sequence for the first time (say a song), you remember parts of it; it is kind of fuzzy. You may remember bits that repeat or that are similar to parts of other songs. As you listen to the song more, your memory goes from fuzzy to more specific.
I was worried that a low-level region would be able to learn very long sequences that should be represented higher in the hierarchy. However, given that low-level inputs tend to be very noisy, it would be unlikely that the exact same input sequence would repeat enough times. It is only when TP generalizes the noisy low-level sequences that representations can become more stable in the higher regions. So it almost self-organizes these levels of abstraction - in theory.
I am currently taking my time writing a post in the other thread about TP. I’m taking what you explained about TM and bursting and combining it with the competing TP cells that increase their activity when they correctly predict and decrease it when they incorrectly predict. That, along with lateral inhibition, gives a good measure of how similar TP cells are to the input sequence. The more I think about TM and bursting, the more things become possible (e.g. subsequence abstraction). Thanks for opening my eyes.
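As a rough sketch of what I mean by competing TP cells (again my own toy, not NuPIC code - cell names, boost and decay values are all made up): each cell carries an activity score that rises when its prediction matches the input and decays when it doesn’t, and lateral inhibition then keeps only the top-scoring cells active:

```python
# Toy competition between TP cells: activity rises on correct
# predictions, decays on incorrect ones; inhibition keeps the top-k.

def update_activity(activity, predicted_correctly, boost=1.0, decay=0.5):
    """Return new activity scores. `predicted_correctly` maps
    cell -> bool for the current input."""
    return {cell: a + boost if predicted_correctly[cell]
                  else max(0.0, a - decay)
            for cell, a in activity.items()}

def inhibit(activity, k=2):
    """Lateral inhibition: only the top-k most active cells stay on."""
    winners = sorted(activity, key=activity.get, reverse=True)[:k]
    return set(winners)

activity = {"c0": 0.0, "c1": 0.0, "c2": 0.0}
# c0 keeps predicting the input correctly; the others do not.
for _ in range(3):
    activity = update_activity(
        activity, {"c0": True, "c1": False, "c2": False})
print(inhibit(activity, k=1))   # {'c0'}
```

Cells whose predictions track the input sequence end up dominating through inhibition, which is the sense in which activity measures how similar a TP cell is to the sequence.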