ANN: Scalar Matrix-Hashing encoder


I built a Scalar Matrix-Hashing encoder.
Please read the Readme.txt.

The interesting thing about it is that it should support arbitrarily large numbers.
Additionally, you may not need a Spatial Pooler if the hashing works correctly, i.e. you can link the encoder directly to the TM.

The only problem I see is how to guarantee there are no collisions between the different hash functions.
For this we would need to figure out some way to adjust the matrices based on collisions that happen over time, OR find a way to avoid them from the start!

My main reason for posting here is to hear your ideas on lowering collisions between the hash functions.
In the current implementation you can check collision statistics:


BTW, there is no similarity between nearby numbers, which may disqualify it as an encoder.


This is true… the first rule of encoding (from Encoding Data for HTM Systems) is:

  1. Semantically similar data should result in SDRs with overlapping active bits.

I’m not sure the results of this encoder will be processable by HTM systems without that property.


Locality-sensitive hashing (LSH) is the way to go:
If you regard LSH as feature (cue) detection, you could extend LSH with some unsupervised learning of those features and create a read-out layer…


That looks promising. If anyone tries this, please let us all know.


Yes, I figured it out after I posted it :( … seems I was too quick to post … it seemed too good to be true :wink:


Do you know of a hash function that increases collisions for similar numbers, for example?


OK, you have a list of numbers. In a recomputable way, randomly add and subtract those numbers and take the sign of the combination as a binary bit. Hey, did you just destroy all the information in the list? No, actually not: that bit tells you whether the sum of the added values was greater than the sum of the subtracted values. If you get another bit by using a different random combination, that second combination is almost exactly orthogonal to the first. If the list of numbers is sparse in some way, the compressive sensing crowd can show you can exactly recreate the list from not so many bits. Obviously, if you have two lists that are almost the same, they will tend to output the same hash bits. Simple as that, really. It isn’t rocket science.
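The random add/subtract idea above can be sketched in a few lines. This is an illustrative NumPy sketch, not the poster's actual code; the function name `lsh_bits` and the choice of a fixed seed for "recomputable" randomness are my own assumptions.

```python
import numpy as np

def lsh_bits(values, n_bits, seed=0):
    # Recomputable randomness: a fixed seed makes the +/- pattern repeatable.
    rng = np.random.default_rng(seed)
    # One random +/-1 pattern per hash bit.
    signs = rng.choice([-1.0, 1.0], size=(n_bits, len(values)))
    # Each bit is the sign of the random add/subtract combination.
    return (signs @ np.asarray(values, dtype=float) >= 0).astype(np.uint8)

a = [1.0, 2.0, 3.0, 4.0]
b = [1.0, 2.0, 3.1, 4.0]    # nearly the same list as a
c = [9.0, -5.0, 2.0, -7.0]  # a very different list

ha, hb, hc = (lsh_bits(v, 64) for v in (a, b, c))
# Similar lists agree on many more hash bits than dissimilar ones.
print("a vs b agree on", (ha == hb).sum(), "bits; a vs c on", (ha == hc).sum())
```

The agreement probability per bit depends on the angle between the two vectors, which is exactly the sign-random-projection flavor of LSH.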


Maybe this paper is a bit easier:

A simpler viewpoint is that if the input data is sparse, the full information content of that data can be captured with a moderate number of random correlations. It is never going to be as compressive as, say, JPEG, but it makes fewer assumptions about the data and is easy to do.

For LSH, if the data vector has 65536 (2^16) elements and you want 65536 hash bits, you have a slight problem: you would need to do n*n (65536*65536) (±) operations, which is kind of slow. You can reduce that to n*log2(n) (65536*16) (±) operations by doing (recomputable) random sign flipping of the input data followed by a Walsh-Hadamard transform.

There is an old joke that everything in AI is dot product calculations, and so it is with the above when you look into it.