Hello fellow hackers, I would like to discuss the different kinds of distance measures we use in our HTM implementations.
So, to kick off: what distance metric do you use, for what, and why?
I guess the most straightforward one is the Hamming distance, which measures bit-vector similarity: it tells us how similar two bit vectors of the same length are.
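A minimal Swift sketch of the Hamming distance, assuming bits are stored as `Int` values 0 and 1. The function names `hammingDistance` and `hammingSimilarity` are my own, not from any HTM library:

```swift
// Hamming distance: the number of positions at which two equal-length
// bit vectors differ. Bits are stored as Int values (0 or 1).
func hammingDistance(_ a: [Int], _ b: [Int]) -> Int {
    precondition(a.count == b.count, "vectors must have the same length")
    var distance = 0
    for (x, y) in zip(a, b) where x != y {
        distance += 1
    }
    return distance
}

// Normalized similarity in [0, 1]: 1 means identical, 0 means every bit differs.
func hammingSimilarity(_ a: [Int], _ b: [Int]) -> Double {
    let distance = hammingDistance(a, b)   // also checks that lengths match
    return a.isEmpty ? 1.0 : 1.0 - Double(distance) / Double(a.count)
}

let x = [1, 0, 1, 1, 0, 0, 1, 0]
let y = [1, 1, 0, 1, 0, 0, 1, 0]
print(hammingDistance(x, y))   // 2
print(hammingSimilarity(x, y)) // 0.75
```

Note that the Hamming distance counts positions where both vectors are 0 as agreement, so two sparse vectors with no active bits in common can still score as quite similar.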
I started out using the Hamming distance and a ‘percentage threshold calculator’:
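```swift
// Counts the positions where both vectors have a 1 bit (shared "on" bits)
// and scales the count to a percentage, assuming 8-bit vectors.
func compareArrayOnBits(a: [Int], b: [Int]) -> Double {
    precondition(a.count == b.count, "vectors must have the same length")
    var counter: Double = 0
    // zip avoids the crash that 0...a.count - 1 causes on empty arrays
    for (x, y) in zip(a, b) where x == 1 && y == 1 {
        counter += 1
    }
    return (100.0 / 8.0) * counter // for percentage on an 8-bit vector
}
```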
Excuse the hardcoded magic numbers.
Anyway, both my percentage calculator and the Hamming distance are of little use here, since they only measure bit-wise likeness between vectors and disregard the dynamics of the vector values.
So it would be great to hear what kinds of distance calculations you guys have used, and for what, so we can rule out certain types and discuss the rest.
I believe you’ll find distance calculations in the CLAClassifier, the KNNClassifier and the new SDRClassifier (which replaces the CLAClassifier?). Yuwei (the author of the new SDRClassifier) should be able to fill in the details.
I assume you are talking about distance calculations used in HTM algorithms like spatial pooling and temporal memory; specifically, how neurons (via dendrite segments) match an input bit array or a previous state. The original specification of these algorithms uses overlap: the number of matching “on” bits.
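The overlap measure can be sketched in Swift as follows, assuming patterns are stored sparsely as sets of active-bit indices. The `segmentMatches` helper and its `threshold` parameter are hypothetical illustrations of a segment matching against active input, not a specific library API:

```swift
// Overlap: the number of "on" bits two binary patterns share.
// Patterns are represented sparsely as sets of active-bit indices.
func overlap(_ a: Set<Int>, _ b: Set<Int>) -> Int {
    return a.intersection(b).count
}

// Hypothetical helper: a dendrite segment "matches" when the overlap between
// its connected synapses and the currently active bits reaches a threshold.
// The threshold value is an illustrative assumption, not a specified constant.
func segmentMatches(connectedSynapses: Set<Int>, activeBits: Set<Int>, threshold: Int) -> Bool {
    return overlap(connectedSynapses, activeBits) >= threshold
}

let segment: Set<Int> = [3, 17, 42, 96, 150]
let input: Set<Int> = [3, 17, 42, 200, 311]
print(overlap(segment, input)) // 3
print(segmentMatches(connectedSynapses: segment, activeBits: input, threshold: 2)) // true
```

Unlike the Hamming distance, shared zeros contribute nothing here, which suits sparse representations where almost all bits are off.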
I actually think this is a theory question, not just an implementation question, so I suspect this topic is in the wrong category. Also note that I just posted on this topic in the thread Positive-only matching vs negative matching.
There aren’t any “distance” calculations in the actual algorithms per se. The only comparison you’ll see is in the SpatialPooler, which performs a “selection” process between the input field (the bits in the input vector) and the bits represented by the SpatialPooler’s columns. That comparison selects the active columns: those whose “connected” synapses attach to active input-field bits, whose number of such connections is above an activation threshold, and which are in the top 2% of all columns ranked by that number. Each SP column (bit) is potentially connected to only a subset (maybe up to 50%) of the input vector’s bits; of those, only the synapses above a connection threshold are considered; and then only the top 2% of columns are selected to become active.
So there really isn’t a direct “distance” calculation between the SP column vector and the input field’s vector…
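To make the selection process described above concrete, here is a minimal Swift sketch of spatial pooler column selection with global inhibition. It is not NuPIC's implementation: the `Column` type, the parameter names and their defaults (connection threshold 0.5, stimulus threshold 1, 2% active fraction) are assumptions for illustration, and learning (permanence updates, boosting) is left out:

```swift
// Each column holds permanences for its potential synapses (a subset of the
// input bits). All parameter values below are illustrative assumptions.
struct Column {
    var permanences: [Int: Double]   // input-bit index -> synapse permanence
}

func selectActiveColumns(columns: [Column],
                         activeInput: Set<Int>,
                         connectedThreshold: Double = 0.5,
                         stimulusThreshold: Int = 1,
                         activeFraction: Double = 0.02) -> [Int] {
    // 1. Overlap per column: count connected synapses onto active input bits.
    let overlaps: [Int] = columns.map { column in
        column.permanences.filter {
            $0.value >= connectedThreshold && activeInput.contains($0.key)
        }.count
    }
    // 2. Only columns at or above the stimulus threshold compete.
    let candidates = overlaps.indices.filter { overlaps[$0] >= stimulusThreshold }
    // 3. Keep the top `activeFraction` of all columns by overlap (at least one),
    //    breaking ties in favor of the lower column index.
    let k = max(1, Int((Double(columns.count) * activeFraction).rounded()))
    let ranked = candidates.sorted { lhs, rhs in
        overlaps[lhs] != overlaps[rhs] ? overlaps[lhs] > overlaps[rhs] : lhs < rhs
    }
    return Array(ranked.prefix(k)).sorted()
}

let columns = [
    Column(permanences: [0: 0.6, 1: 0.7, 2: 0.2]),
    Column(permanences: [1: 0.9, 3: 0.8, 4: 0.6]),
    Column(permanences: [0: 0.9, 1: 0.9, 3: 0.9]),
    Column(permanences: [2: 0.9, 4: 0.9]),
]
let active = selectActiveColumns(columns: columns, activeInput: [0, 1, 3], activeFraction: 0.5)
print(active) // [0, 2]
```

The only comparison involved is a count of shared active, connected bits followed by a ranking, which is why it behaves more like a selection than a distance.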