HTM is just a k-means algorithm (with a small mistake in its implementation)


I did extensive work on my own mathematical model of neurons, and it turned out that your HTM is just a special case of my model (with the L1 norm). There is a formal mathematical proof that my model is a special form of the k-means algorithm. I also show how my model relates to the free energy principle and slow feature analysis. It can also produce “multilayer convolutional neural networks” with translation invariance but without any weight sharing, using only Hebbian learning. I also believe that my model can explain the bursting phenomenon and minicolumns more naturally than the temporal memory algorithm does, though that may be just my opinion.
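For readers unfamiliar with the general connection between winner-take-all Hebbian learning and k-means, here is a minimal sketch of textbook online k-means written as a competitive update. It is not the model from the paper and not Numenta's Spatial Pooler. The function name `online_kmeans`, the Euclidean distance, and the 1/count step size are all assumptions chosen for illustration.

```python
import numpy as np

def online_kmeans(X, init):
    """Online k-means as winner-take-all learning (illustrative sketch only)."""
    centroids = np.array(init, dtype=float)  # copy, so the caller's init is untouched
    counts = np.zeros(len(centroids))        # number of inputs each unit has won
    for x in np.asarray(X, dtype=float):
        # Competition: the unit whose weight vector is closest to the input wins.
        winner = np.argmin(np.linalg.norm(centroids - x, axis=1))
        # Hebbian-like update: only the winner moves toward the input.
        # With step size 1/count, the winner's weights equal the running mean
        # of all inputs it has won so far.
        counts[winner] += 1
        centroids[winner] += (x - centroids[winner]) / counts[winner]
    return centroids

# Two small, well-separated groups of points, presented in interleaved order.
X = [[0.0, 0.0], [10.0, 10.0], [0.0, 2.0], [10.0, 12.0]]
centroids = online_kmeans(X, init=[[1.0, 1.0], [9.0, 9.0]])
print(centroids)
```

In this toy run each unit ends at the mean of the points it won, (0, 1) and (10, 11). The point is only that "the winner moves toward the input" is the same operation as the k-means centroid update, applied one sample at a time.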

I was wondering whether this work is worth anything.

The paper does not explain how to derive HTM from ECC, but I can show the derivation somewhere else if there is enough interest.

At the end of the day, the performance is not that great (and neither is that of HTM), so publishing it in a reputable venue is probably impossible. Some additional mechanisms are necessary. I thought I would just stick this research in a drawer, but the math and theory are good and might be of interest to the HTM folks, so I’m just posting it here.

I guess by HTM you mostly mean the Spatial Pooler? I didn’t get into the details, but I have heard that they actually run k-means in the implementation.

I have always had a problem with k-means and the like: they don’t distinguish between proximity and similarity of the clustered elements. In the real world, proximity in space-time is critical, and it’s only loosely correlated with similarity of content. In fact, novel (unexpected, thus informative) similarity anti-correlates with proximity. Both are multivariate, starting with space-time dimensions and sensory modalities, and then practically exploding with derived parameters. So bunching them all together in one “space” seems nonsensical to me.

My model can produce clusters that take space-time proximity into account and build invariant representations. So in this sense it’s actually superior to HTM. And yes, I’m referring mostly to the Spatial Pooler. To some degree I’m referring to the Temporal Pooler too, but it is just an over-engineered approach to expressing time dependencies that polychronization can address more elegantly (at least in my opinion).

Nonetheless, I actually have a formal proof relating HTM to k-means. I haven’t seen it anywhere in Numenta’s papers, so I thought it might be interesting to you.

Another formal analysis of HTM is [1601.06116] A Mathematical Formalization of Hierarchical Temporal Memory's Spatial Pooler. How does this compare with your work?

I have seen several papers like the one you mention, but I would say they are very technical and focus too much on the implementation. In such analyses it is hard to draw any simple and elegant connections between HTM and other fields of mathematics. My work sits at a much higher level. It proves convergence for an entire family of models (HTM being just one special case). It highlights elegant connections to Bayesian inference, the free energy principle, graphical models, slow feature analysis, reinforcement learning, group theory and geometric priors. The algorithm itself is elegant and simple, and can be written in a few lines of code (at least without optimizations). By contrast, all the formal analysis papers on HTM that I’ve seen feel a bit like “reading someone’s spaghetti code”, if you know what I mean. They lack the mathematical simplicity and elegance that you find in analyses of deep belief networks, predictive coding or most other machine learning algorithms. By being more general and starting from first principles, my work (hopefully you will agree) achieves a similar level of elegance. HTM can then be derived by filling in the missing, less important implementation details.