I’ll mention that, yes, the representational capacity of HTM is astronomical due to the large, distributed sparse patterns inherent in SDR encodings and its local learning rules. Learning one thing doesn’t significantly alter previously learned information because the model’s adjustments are entirely local. This is a fundamentally different approach from globally training a DNN with a gradient-based procedure, where adjustments occur across the entire network. Catastrophic forgetting is a natural consequence of such global learning strategies.
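To illustrate the local-vs-global distinction, here’s a toy sketch (my own, not actual HTM or DNN code): a Hebbian-style local update touches only the weights attached to currently active inputs, while a dense gradient step moves every weight in the model, which is what puts previously learned information at risk.

```python
# Toy contrast (illustrative sketch only): local updates modify just the
# synapses involved in the current input; a global gradient step perturbs
# every weight, so knowledge stored anywhere in the model can drift.

def local_update(weights, active_inputs, lr=0.1):
    """Only synapses on currently active inputs change; the rest are untouched."""
    for i in active_inputs:
        weights[i] += lr
    return weights

def global_update(weights, gradient, lr=0.1):
    """Every weight moves by its gradient, regardless of the current input."""
    return [w - lr * g for w, g in zip(weights, gradient)]

w = [0.5] * 8
w_local = local_update(list(w), active_inputs=[2, 5])
w_global = global_update(list(w), gradient=[0.1] * 8)

changed_local = sum(a != b for a, b in zip(w, w_local))    # only 2 of 8 weights moved
changed_global = sum(a != b for a, b in zip(w, w_global))  # all 8 weights moved
```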
My 2 cents from messing around with temporal memory the past year is that the representational capacity of even a modestly sized HTM network (2048 columns, 32 cells per column) is so astronomically large that it’s nearly impossible to discern between semantically similar temporal sequences in its current form. Assuming 2% sparsity of active cells, the number of possible sets of active cells is 65,536 choose 1,311 (2048 columns × 32 cells = 65,536 cells; 2% of those ≈ 1,311), which is an unimaginably large number. Should each configuration represent a distinct context? No way… there’s no need for that much discernment. However, in the current form of TM, each configuration essentially does represent a distinct context with respect to which lateral dendritic segments are activated, which in turn determines the predicted cells for the next timestep. TM seriously lacks the ability to generalize across semantically similar contexts.
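To put a rough number on that capacity claim, here’s a quick back-of-envelope check (my sketch; the 1,311 figure is just 2% of 65,536 cells rounded to the nearest integer):

```python
# Back-of-envelope check of the combinatorial capacity quoted above:
# 2048 columns x 32 cells/column = 65,536 cells, with ~2% active per timestep.
import math

cells = 2048 * 32             # 65,536 total cells
active = round(0.02 * cells)  # 1,311 active cells at 2% sparsity

capacity = math.comb(cells, active)  # number of distinct sets of active cells
digits = len(str(capacity))          # on the order of 2,800 decimal digits

print(f"C({cells}, {active}) is a {digits}-digit number")
```

Even if only a vanishing fraction of those configurations ever occurs in practice, the space is so vast that treating each one as a distinct context gives far more discernment than any realistic stream of sequences needs.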