Perceptual hashes for large SDR machines

I’ll start with the scale problem this idea could address.
Suppose one wants to build an abstract “SDR Machine” that roughly implements the following architecture:

  • there is a so-called “current input space” SDR. It could be a “state space” combining both inputs and predictions in a single SDR, or a couple of SDRs encoding sensory inputs and previous predictions of those inputs; what follows applies to either. I’ll name this the I-SDR
  • what the SDR machine does: it has a very large number of SDR decoders (which are basically minicolumns), and its generic work is to match the above I-SDR against each minicolumn’s own Learned-SDR (L-SDR) to see whether that particular minicolumn should be involved in further processing of the I-SDR.

So a big bottleneck for our SDR machine is this comparison of one I-SDR with potentially millions of L-SDRs. It could be a processing bottleneck or, more likely, a data bottleneck, depending on how the SDRs are encoded internally: either 256-byte bit arrays (for 2 kbit SDRs), vectors of ~40 integers representing the active bits, or maybe some other encoding.

The trouble is that it has to bring that whole pile of L-SDRs into CPU cache and compare each of them with the I-SDR at every time step.
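To make the bottleneck concrete, here is a minimal, hypothetical sketch of the brute-force matching step described above (the names, sizes and overlap threshold are assumptions for illustration, not anything from an existing library):

```python
import numpy as np

N_COLUMNS = 1_000_000   # number of minicolumns / learned L-SDRs
SDR_BITS = 2_048        # SDR width (2 kbit)
ACTIVE_BITS = 40        # ~2% sparsity

rng = np.random.default_rng(0)
# Each L-SDR stored as ~40 active-bit indices (the "vector of integers" encoding).
l_sdrs = rng.integers(0, SDR_BITS, size=(N_COLUMNS, ACTIVE_BITS), dtype=np.int32)

def match_brute_force(i_sdr_indices, threshold=10):
    """Return ids of minicolumns whose L-SDR overlaps the I-SDR by >= threshold bits."""
    i_dense = np.zeros(SDR_BITS, dtype=bool)
    i_dense[i_sdr_indices] = True
    # All N_COLUMNS x 40 indices get streamed through the cache at every time step.
    overlaps = i_dense[l_sdrs].sum(axis=1)
    return np.nonzero(overlaps >= threshold)[0]

i_sdr = rng.integers(0, SDR_BITS, size=ACTIVE_BITS)
matching_columns = match_brute_force(i_sdr)
```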


Now, perceptual hashing is a technique used in image search. There are several image hashing algorithms which all implement the same idea:

  • take an arbitrary image and transform it into a 64-bit number called a perceptual hash, or phash, using a lossy, compression-like algorithm.
  • this algorithm is designed in such a way that the resulting hashes have some very useful properties:
  1. hashing two unrelated images is very unlikely to produce similar hashes
  2. it is resistant to many kinds of changes applied to an image - masking areas of the image, rescaling on X, Y or both, blurring, adding noise, removing noise, shifting colors or luminance, lossy JPEG re-encoding. Regardless of these changes, the hash of the altered image stays very close to the original image’s hash, within a few bits of difference between the two 64-bit hashes
  3. Properties 1 and 2 above mean an image hash is a reliable identifier of any image, and thus allows indexing huge image databases (hundreds of millions or even billions of images) for quick search and retrieval of any image (or its variants resulting from processing), with a very good chance of finding good matches; comparing two hashes is just a cheap Hamming-distance check, as sketched below.
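A minimal sketch of that comparison, with made-up hash values purely for illustration:

```python
def hamming64(h1: int, h2: int) -> int:
    """Number of differing bits between two 64-bit hashes."""
    return bin((h1 ^ h2) & 0xFFFFFFFFFFFFFFFF).count("1")

original = 0xD1C3A596E4B2F078   # hypothetical phash of an image
resized  = 0xD1C3A596E4B2F07A   # hypothetical phash of a resized, slightly blurred copy

# A distance of only a few bits means "very likely the same picture, possibly edited".
print(hamming64(original, resized))   # -> 1
```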

The proposal/idea I want to advance here is that if someone devised a similar hashing scheme for SDRs, it could eliminate the above-mentioned bottleneck of the SDR machine: quickly finding a very sparse subset of matching Learned-SDRs for any given Input-SDR.

And what also improves is that this optimization would be quite insensitive to the size of the underlying SDRs. A “normal” SDR machine is dragged down by the size of its SDRs - comparing two 20 kbit SDRs is 10 times more expensive than comparing two 2 kbit ones - but once an SDR is hashed to a fixed 32- or 64-bit representation, the speed of search & retrieval is unaffected by the actual SDR size.
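One hypothetical way such an index could look (just a sketch of a standard LSH-style banding trick applied to 64-bit SDR hashes, not a worked-out design): split each hash into four 16-bit bands and bucket columns by band value. Hashes within a few bits of each other almost always agree on at least one band, so a query only touches a handful of buckets instead of scanning millions of L-SDRs.

```python
from collections import defaultdict

def bands(h: int):
    """Split a 64-bit hash into four (position, 16-bit value) bands."""
    return [(i, (h >> (16 * i)) & 0xFFFF) for i in range(4)]

index = defaultdict(set)   # (band_position, band_value) -> set of minicolumn ids

def add_lsdr(column_id: int, lsdr_hash: int):
    for key in bands(lsdr_hash):
        index[key].add(column_id)

def candidate_columns(isdr_hash: int):
    """Minicolumns whose hash shares at least one band with the query hash."""
    found = set()
    for key in bands(isdr_hash):
        found |= index[key]
    return found
```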


Thoughts, ideas? I have been thinking about this for a while; maybe I missed something too obvious about why it couldn’t work. Or maybe the SDR Machine above is totally irrelevant to how any future HTM processing is intended to work.

If anyone is interested I also have some implementation ideas for SDR perceptual hashes. As with images, where several different algorithms have been tested, there could be many ways to compute perceptual hashes for SDRs.

What is important is to keep the first two numbered properties above: matching SDRs should have a very low probability of producing mismatching hashes, and non-matching SDRs should have a low probability of producing matching hashes.


Regarding perceptual hashing, there’s some interesting work that uses a hash + pooling to output an SDR:

“A neural algorithm for a fundamental computing problem”



For “matching” maybe you can use dot product as a measure for similarity. That is how the “kernel trick” works in Kernel-SVM and Kernel-PCA… but dot product assumes you have fixed connections?

Similarly, in “Attention Is All You Need” they compare the “query” vector to the “key” vector by projecting each into a new space and then taking the dot product of the two projected vectors (in the new basis/space). (In that paper they use backprop to train the projection matrices, though. But it was very effective.)
Q·Kᵀ ~ similarity.
(Q·Wq)·(K·Wk)ᵀ ~ similarity in the new space.
(Each W has dimension m×n, projecting from m features into an n-dimensional space.)
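A tiny numpy sketch of that comparison (sizes and matrices here are arbitrary placeholders, and the projections are random rather than learned):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2048, 64                  # original feature size, projected size

q = rng.random(m)                # "query" vector (e.g. an I-SDR as floats)
k = rng.random(m)                # "key" vector (e.g. an L-SDR)

plain_similarity = q @ k         # raw dot product in the original space

Wq = rng.random((m, n))          # projection matrices; trained with backprop in the
Wk = rng.random((m, n))          # paper, just random placeholders here

projected_similarity = (q @ Wq) @ (k @ Wk)   # dot product in the new n-dim space
```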

I am having some difficulty trying to project that proposal back to the biology.

There are many NN theories that work well as statistical theory but are not found in the biology - it looks to me like your proposal fits in that category.

That does not make it a bad tool but it just seems like it is a crutch to get around hardware limitations.

As such, it misses many of the built-in features we get from the biological implementation; features that will have to be substituted by some other, equally artificial, implementation.

Just trying to keep it real! I’ll just retreat back over here to my biologically-based-implementation curmudgeon corner now - as you were.

The idea of reducing an image to a small object (i.e. a 64 bit number) sounds a lot like what encoders and spatial poolers already do, just arranged unconventionally. I can imagine a spatial pooler with only 64 minicolumns, static synapses and 50% sparsity as an HTM equivalent to the image hashing @cezar_t proposed.
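A rough sketch of what such a 64-minicolumn, fixed-synapse, 50%-sparsity pooler-as-hash could look like (all parameters here are illustrative assumptions):

```python
import numpy as np

SDR_BITS, HASH_BITS, ACTIVE_OUT = 2048, 64, 32   # 32 of 64 output bits active = 50%

rng = np.random.default_rng(42)
# Fixed (non-learning) potential synapses: one random boolean row per minicolumn.
synapses = rng.random((HASH_BITS, SDR_BITS)) < 0.1

def sdr_hash(sdr_indices) -> int:
    """Hash an SDR (list of active-bit indices) into a 64-bit int with 32 set bits."""
    dense = np.zeros(SDR_BITS, dtype=bool)
    dense[sdr_indices] = True
    overlaps = (synapses & dense).sum(axis=1)     # overlap score of each minicolumn
    winners = np.argsort(overlaps)[-ACTIVE_OUT:]  # top 32 of the 64 minicolumns
    h = 0
    for w in winners:
        h |= 1 << int(w)
    return h
```

Similar input SDRs share most of their active bits, so they tend to pick mostly the same winners and land on nearby 64-bit hashes.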

Hashes where you repeatedly fold in a few bits of context data at a time are quite interesting. You can associate more general responses with the earlier hashes and more specific responses with the later hashes.
If you train up a system that way you can just follow the chain of more and more specific responses until no further ones are found.
The last response found (the most specific) would presumably be the one to use, though you could also look at the earlier, more general responses.
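A minimal sketch of that lookup chain (the hash-mixing function and names are made up for illustration):

```python
responses = {}   # hash -> learned response, filled in during training

def fold(h: int, context_bits: int) -> int:
    """Cheap incremental hash: mix a few more context bits into the running hash."""
    return ((h * 1099511628211) ^ context_bits) & 0xFFFFFFFFFFFFFFFF

def most_specific_response(context_chunks, default=None):
    h, best = 0xCBF29CE484222325, default        # arbitrary 64-bit seed
    for chunk in context_chunks:
        h = fold(h, chunk)
        best = responses.get(h, best)            # keep the latest (most specific) hit
    return best
```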

Very true very true.
Note that a dot product with a weight vector can be interpreted as a neuron summing over its synapses… but… you can’t get there without a learning algorithm that isn’t backprop :sweat_smile:


When people think about context they want to start at some fixed point in the past and go forward.
If you think about it you should start at the current situation (now) and incrementally step backward in time to figure out what is happening and what to do next.
Relevance fades as you go backward. Therefore you should not examine a problem by starting at some fixed point in the past and working forward.
The most information rich way is to work backwards.

Hi, thanks everyone for hinting at papers about Locality-Sensitive Hashes (LSH) - FlyHash, BioHash, etc. - which seem more specific to neuronal processing than the perceptual hashes I was talking about initially; perceptual hashing is a similar approach aimed at a narrower set of problems.

@Bitking your objection is very sound; from a certain valid perspective this doesn’t look at all like what the brain actually does, and indeed what I was describing can be considered a crutch to deal with the differences between the brain’s hardware and the hardware we have available for experiments.
But… what if biology itself just bundled a lot of crutches together because it had no other means than to grow axons, dendrites and synapses?

Here’s an example: a neuron, or a minicolumn in NuPIC. Isn’t it great that it can connect with 10,000 other neurons and recognize maybe a couple hundred patterns in its inputs? Because that’s what can be done with 200 dendritic segments of 50 synapses each, of which 10-20 firing at a time are enough to trigger an activation.

But what does that mean? Let’s say each pattern represents an opinion about what the input “means”. One pattern of the input SDR means “hey, I see a cup!”, another “a mouse”, another “a star”, and so on. Each of them has the same result: the minicolumn flipping one output bit.

Well, this should be significant. How significant? If the minicolumn “recognizes” 100 patterns, there’s a 1% chance the input pattern is a cup, 1% that it is a circle, and so on. Which is 99% uncertainty.
If the minicolumn “recognizes” only 3 patterns, then uncertainty drops to 66%.

That doesn’t seem too good either: only a one-third drop in uncertainty in exchange for renouncing ~97% of the minicolumn’s capacity.

Yet how can we be certain what the query-SDR actually means? By recruiting more minicolumns, each with diverging opinions on what the input means.
If you have 20 minicolumns all saying together “there’s a 33% chance we’re looking at a mouse”, and we treat their opinions as independent, then the 66% likelihood of the pattern not being a mouse drops to 0.66^20 ≈ 0.025%. You’d better believe that beast is coming at you; jump on the table now!

But when dealing with minicolumns recognizing 100 patterns each, you would need many hundreds of them firing together to reach the same certainty.
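A quick sanity check of that arithmetic, under the same naive independence assumption:

```python
import math

p_wrong_small = 0.66      # minicolumn knowing 3 patterns: ~66% uncertainty
p_wrong_large = 0.99      # minicolumn knowing 100 patterns: 99% uncertainty

combined = p_wrong_small ** 20            # 20 agreeing minicolumns
print(f"{combined:.4%}")                  # ~0.025%: chance all 20 are wrong

# How many 100-pattern minicolumns would give the same residual uncertainty?
n = math.log(combined) / math.log(p_wrong_large)
print(round(n))                           # on the order of several hundred
```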

That is a bit too much data to deal with in a single SDR. 20 active bits - sparse, ok; hundreds - not so much. Consider that there are thousands of other minicolumns with contradicting opinions, a bunch saying it’s a xylophone and a couple of rogues even claiming the given SDR is Santa Claus. Too many patterns per minicolumn could end up in a sea of uncertainty or, worse, an epileptic seizure.

But then you may ask: why would a minicolumn be prepared to respond to hundreds of patterns, spreading its dendrites & axons accordingly, when dealing with more than a handful of patterns is already too much?

And one good reason might be that neurons, no matter how great synaptic plasticity might be, cannot make arbitrary connections anywhere in the brain, and in order to reach a useful number of recognizable patterns they have to be pre-wired with hundreds of potential ones. Once a new, unseen pattern appears it is somehow thrown at thousands of potential “solvers” until one, then 5, then 20 minicolumns fire at the same time and some central authority says “that’s it folks, we got the desired level of recognition, from now on we’ll call this weird creature a ferret” - the new SDR being the 20 minicolumns firing reliably together.

Then there is this… narrowness of human processing despite its huge potential bandwidth. A neuroscientist driving back home maybe has the reverse problem: how to inhibit all the parts of the brain that want to dance, to spot brain tumors on an MRI, to play chess or play with the kids, and narrow the activity down to only those parts needed to drive safely, without mistakes.

It’s a wild guess, but maybe that whole bundle of long, thick, slow myelinated axons occupying 50% of skull real estate is there only because that was the compromise biology was capable of negotiating between the narrow bandwidth needed to deal reliably with the “task at hand” and the capability to eventually deal with 999 other tasks: tasks it could never handle all at the same time, but which it eventually needs to handle, each in its own time.

What I’m aiming at: there’s a slight chance we don’t have to copy the inner workings of the brain in order to reach a similar AI, and a possibility that we can make significant steps even with the hardware at hand.

Think about this: a laptop can load a spatial pooler from SSD into RAM in the time a signal needs to travel along a distal axon from the hippocampus to somewhere in the neocortex.

Of course this isn’t enough for a fully developed brain,
but it could prove that the actual resources needed for an AGI are far more modest than a whopping “live” network of neurons on 1000 machines ready to fire in RAM, with a 1000 terabit/s switch clustering them.

The fact that a computer can kick my arse at Go, chess, and hundreds of other “narrow intelligence” tasks is, however, significant.

Maybe what is missing is a way to load/unload narrow “SDR processors” all sharing a common interface/representation of… “anything”, and all an AGI would have to do is identify which ones, out of the gazillions of “narrow models” available on SSD or in the cloud, need to be loaded into live RAM to deal with the current “task at hand”.
