It’s like why we don’t just use FullyConnected/Dense layers in neural networks. It’s:
- using too much memory
- too slow - O(N^2)
Also, a 2D connection list isn’t enough; you’ll need a 3D one to make sequence learning work. Otherwise the algorithm can’t learn any concept of time and acts like a simple a->b mapping algorithm. A toy sketch of the difference follows below.
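Here is a minimal toy sketch (hypothetical names, not the actual TM data structures) of what I mean: with column-to-column links only, a set of columns can only ever predict the union of everything that followed it, while per-cell links let the same columns carry different temporal contexts.

```python
# "2D": transitions keyed by active columns only.
transitions_2d = {}     # frozenset(columns) -> set of possible next columns

def learn_2d(prev_cols, next_cols):
    transitions_2d.setdefault(frozenset(prev_cols), set()).update(next_cols)

# After learning B->A->C and D->A->E, the columns of A predict {C, E}:
# the context (B vs D) is lost, so this is just an a->b mapping.
learn_2d({"A"}, {"C"})
learn_2d({"A"}, {"E"})

# "3D": each column owns several cells, and which cell fires depends on the
# previous input, so "A after B" and "A after D" form different keys and
# can predict different successors.
transitions_3d = {}     # frozenset((column, cell)) -> set of next columns

def learn_3d(prev_cells, next_cols):
    transitions_3d.setdefault(frozenset(prev_cells), set()).update(next_cols)

learn_3d({("A", 0)}, {"C"})   # A in the context of B
learn_3d({("A", 7)}, {"E"})   # A in the context of D
```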
One reason is that in most applications the inputs are not all semantically dissimilar. For example, if the TM algorithm learned the transition A -> B', and later an input came in which shared 25% of the same bits as A, I would expect a weak prediction (depending on the configuration) whereby some (but not all) of the cells representing B' become predictive.
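As a hedged toy example (made-up sizes and an artificially low threshold, not real TM code), this is roughly the behaviour I would expect from a 25%-overlapping input:

```python
import random
random.seed(0)

A_bits = set(range(40))          # the 40 ON bits of pattern A
threshold = 3                    # assumed (artificially low) segment threshold

# Each cell representing B' learned one segment sampling 10 of A's bits.
segments_for_B = {cell: set(random.sample(sorted(A_bits), 10))
                  for cell in range(100, 140)}

def predictive_B_cells(input_bits):
    # A cell turns predictive only if enough of its synapses land on ON bits.
    return [c for c, seg in segments_for_B.items()
            if len(seg & input_bits) >= threshold]

len(predictive_B_cells(A_bits))           # 40: every B' cell predicts
len(predictive_B_cells(set(range(10))))   # fewer: a weak, partial prediction
```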
Your 3-step TM algorithm looks simple, which is good, but I honestly do not follow how it would satisfy the TM computational/programming requirements. Could you explain more?
My understanding is that the TM inputs are the active columns set up by the SP; the TM doesn’t directly care that much about SDRs.
I think you meant ON bits of the input space. Not all columns that contain a subset of the SDR bits in their receptive fields become active; if they did, the TM algorithm would be simpler. I guess I know how you got your 3-step algorithm.
I think the TM algorithm is not complicated if one compares it to other learning algorithms out there. It is straightforward; however, it is a bit hard to form a mental picture of it because the sequence learning is done in a distributed manner.
Implementation-wise, the SDR is just the concept of how HTM prefers to represent its inputs; inputs are encoded as SDRs using specific SDR encoders. The SP, I believe, is yet another concept, but implementation-wise it maps columns to input bits in the input space in groups called receptive fields. An input bit may or may not fall into a column’s receptive field. This also means that if a column becomes active, it is because it has “seen a set of bits” that overlaps the input (represented as an SDR). In practice this set is almost always a subset of the input’s bits, not a superset. So an active column is really carrying the information “I activate when I see this pattern within my receptive field”.
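To illustrate that point, here is a minimal sketch of column activation by overlap. The names and sizes are assumptions, and it ignores permanences, learning and boosting entirely:

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_columns = 1024, 256
receptive_field_size = 64       # assumed: each column samples 64 input bits
n_active_columns = 10           # assumed sparsity target (~4% of columns)

# Each column "sees" only a random subset of the input space.
receptive_fields = [rng.choice(n_inputs, receptive_field_size, replace=False)
                    for _ in range(n_columns)]

def active_columns(input_sdr):
    """input_sdr: boolean array of length n_inputs (the encoded input)."""
    # A column's overlap is how many ON bits fall inside its receptive field.
    overlaps = np.array([input_sdr[rf].sum() for rf in receptive_fields])
    # Only the best-overlapping columns activate, not every column that
    # happens to contain some subset of the input's bits.
    return np.argsort(overlaps)[-n_active_columns:]

# Usage: encode an input with ~2% ON bits and pick the winning columns.
input_sdr = np.zeros(n_inputs, dtype=bool)
input_sdr[rng.choice(n_inputs, 20, replace=False)] = True
winners = active_columns(input_sdr)
```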
Going back to your 3-step algo, how does sequence learning work in that simple algo?
@OhMyHTM This looks to me like @Jose_Cueto is trying to understand your points and asking you counter questions. I also don’t quite understand the point you are trying to make. You’re saying the input space doesn’t have to be sparse, so may not technically be an SDR, and I can see from Jose’s response he also understands that, too. But you are claiming (it seems to me) that the TM has too many steps, and could be simplified. Correct? If so, that’s big news.
Maybe this is the confusion. It’s not really two vectors, but a vector of vectors. From one neuron’s standpoint, it has X dendritic segments, each having a unique number of synapses. This is more complex than just two vectors.
My apologies for the confusion. I was trying to clarify and test my understanding of HTM from the implementation point of view. Most importantly, simplifying the TM implementation would, I believe, be a big code improvement.
@rhyolight
If you are confused, I can write a demo later. I think these two vectors can store the transition of the pattern and can be used for prediction.
What does each of those bit strings represent? It seems quite a feat to get the TM down to two vectors, since there are so many layers of data structures:
Each SP mini-column contains a list of neurons, each of which contains a list of dendrite segments, each of which contains a list of synapses to specific cells (each synapse with a continually updating permanence value). I’m curious how simple a data structure can be while maintaining all of this info.
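For reference, a direct (and deliberately naive) transcription of that nesting might look like the sketch below. The names are hypothetical, and real implementations flatten these lists into arrays for performance:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Synapse:
    presynaptic_cell: int   # index of the cell this synapse connects to
    permanence: float       # continually updated during learning

@dataclass
class Segment:
    synapses: List[Synapse] = field(default_factory=list)

@dataclass
class Cell:
    segments: List[Segment] = field(default_factory=list)

@dataclass
class MiniColumn:
    cells: List[Cell] = field(default_factory=list)

# e.g. 2048 mini-columns x 32 cells; segments and synapses grow with learning.
layer = [MiniColumn(cells=[Cell() for _ in range(32)]) for _ in range(2048)]
```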
How would the SP algorithm reach anything close to O(n^2)?
There is a limited number of dendrites with a limited number of synapses. The dendrites are evaluated once per exposure (unless something very different has been introduced into the theory). Training (actual training and boosting) is done once per dendrite per exposure.
This gives O(2n) -> O(n) since there is nothing but linear referencing of the dendrites.
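A rough sketch of that argument, with assumed thresholds: each segment is visited once per exposure to count its active connected synapses (and at most once more to adapt permanences), so the cost is linear in the total number of synapses.

```python
CONNECTED_PERM = 0.5        # assumed connection threshold
ACTIVATION_THRESHOLD = 13   # assumed segment activation threshold

def evaluate(segments, active_cells):
    """segments: iterable of lists of (presynaptic_cell, permanence) pairs.
    active_cells: set of currently active cell indices."""
    active_segments = []
    for seg in segments:                        # one pass over all segments
        n = sum(1 for cell, perm in seg
                if perm >= CONNECTED_PERM and cell in active_cells)
        if n >= ACTIVATION_THRESHOLD:
            active_segments.append(seg)
    return active_segments
```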