The sparsifying machine

Many AI folks hypothesize that one of the brain's main functions is that of an associative memory.
In HTM terms such a device is able to memorize SDRs and later recall them even from an incomplete query SDR. This property makes it a kind of nearest neighbor search indexer.

An earlier attempt at implementing such a memory was a bit disappointing, because

  • it was relatively sluggish compared with ANN benchmarks, and its performance is severely impacted by the number of “ON” bits, or solidity.
  • virtually nobody seemed interested in it
  • the MNIST tests I made with it revealed a tendency to “saturate” when some bits occur very frequently in its input SDRs, which inherently degrades its recall ability.

Despite these issues it reached >95% accuracy on MNIST with 40/2000 bit SDRs - it doesn’t seem like much, yet it is competitive with SP+SDR Classifier at a similar SDR size and such low sparsity. And yes, the low 2% sparsity is needed to maintain acceptable performance.

Recently I stumbled upon @POv’s Triadic/Diadic memory paper, which uses a similar addressing scheme but a different, additive storage method that more closely resembles Kanerva’s sparse distributed memory concept.

While implementing it in Python/Numba I found its performance to be an order of magnitude better than my previous experience - which motivated me to refactor and “numbify” my own “SDR Indexer”, making it significantly faster too.

During that work another observation popped out: a slightly modified addressing scheme makes the indexer capable of indexing and storing arbitrarily wide SDRs with virtually no penalty.
Performance degrades with (at least) the square of the ON bit count - e.g. 20 ON bits resolve to 190 storage/search address locations (aka slots) while 40 bits expand to 780 address slots.

But performance-wise it doesn’t care whether it operates on SDRs with a solidity of 20 bits out of 500 available ones, or whether the 20 active bits are spread across a ridiculously large 100K-bit or 1M-bit SDR space.
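For the curious, the quadratic pair expansion above is easy to check in a few lines of Python (`pair_slots` is just an illustrative name, not part of the actual indexer):

```python
from math import comb

def pair_slots(n_on: int) -> int:
    """Number of (i, j) bit-pair address slots an SDR with n_on active bits expands to."""
    return comb(n_on, 2)  # n_on * (n_on - 1) / 2

# The figures from the text:
print(pair_slots(20))  # 190
print(pair_slots(40))  # 780
```

Note the slot count depends only on the number of ON bits, never on the total SDR width - which is exactly why huge SDR spaces cost nothing here.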

And this property hints at the title - what if instead of dreading extremely large SDRs we embrace them?

In what ways does a machine which transforms low-space, dense information flows (== short, denser SDRs) into a few higher-meaning bits resemble the brain?


The 20 questions game

We all remember it: one player thinks of a random secret and the other player must figure out what the secret is by asking 20 questions, which the first player must faithfully answer with yes or no.

It is that simple - anything we can think of can be recalled with 20 significant bits out of a potentially very large number of possible questions one could ask.

What is important to note is each of those questions must carry a high semantic power - they would ideally divide the space of remaining possibilities in half.
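As a quick sanity check on the arithmetic: 20 ideally halving questions distinguish about a million possibilities.

```python
# Each ideal yes/no question halves the remaining possibilities,
# so 20 questions single out one secret among 2**20 candidates.
print(2 ** 20)  # 1048576 - over a million distinct "secrets"
```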

And this hints at what a sparsifier machine should do: iteratively transform its “raw” data into higher-semantics data. Fewer bits to represent the same “thing”.

“a vehicle[1] having a pair[3] of wheels[2], which can be powered by pedals[4] and has a rider’s[51] seat[5] and a horn-shaped[62] steering device[6]” - all those details can be abstracted away by a single, higher-semantics bit: bicycle[777]

A “knowledge algorithm” should be able to do exactly that.

A few days ago, while discussing with @POv about implementing an MNIST classifier, I suggested a simple (almost) random projection encoder to transform digit images into 20/2000 SDRs which could be written into associative memory and later queried for classification.
The results were getting distorted by a similar local saturation effect, in which many SDRs within the learning dataset hit certain memory addresses too often.
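A minimal sketch of what such a random projection encoder could look like (the function names, the Gaussian projection, and the top-k selection are my illustrative choices, not the actual encoder discussed):

```python
import numpy as np

def make_projection(n_in=784, n_out=2000, seed=0):
    """Random projection matrix mapping a flattened 28x28 digit to 2000 scores."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_in, n_out))

def encode(image, proj, n_on=20):
    """Project the image and keep the indices of the top n_on scores -> a 20/2000 SDR."""
    scores = image.ravel() @ proj
    return np.sort(np.argsort(scores)[-n_on:])

# Example with a random "digit":
proj = make_projection()
sdr = encode(np.random.rand(28, 28), proj)
print(len(sdr))  # 20 active bit indices out of 2000
```

The top-k step guarantees a fixed SDR length, but does nothing to keep individual bits from being overused across the dataset - which is exactly the saturation problem described above.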

And Peter then made a rather annoying remark: “There has to be a better way”. “Sure,” I thought to myself, “how about a convolutional autoencoder, or something even better”.

But later, ruminating on the discovery that the SDR Indexer doesn’t care about SDR size, I wondered:

What happens if one takes the most offending bit pair in the input data and replaces its simultaneous occurrence with a third, totally new bit - expanding the 2000-bit-wide space to 2001 - and then uses the new 2001-bit representation to store the data in the same associative memory?

Think about it: bits 55 and 781 (out of 1 to 2000) fire together so often that they saturate the memory. Yes, sometimes 55 fires without 781, or the other way around, yet out of millions of locations the associative memory is hit hardest at address (55,781).

After we replace the simultaneous occurrence of (55,781) in the 2000-bit-wide input space with bit 2001 in a 2000+1-bit-wide space, we have:

  • sparsified the input data, since instead of 20/2000 bits the new data will be 20/2001 bits and occasionally 19/2001 bits.
  • found a higher-meaning bit called 2001, which is simpler than and can be easily decoded back to (55,781)
  • gained ~10% performance when storing/searching 19 bits instead of 20
  • and - cherry on top - solved the saturation problem at address (55,781).
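A toy sketch of the pair-replacement step described above (the pair counting and the new-bit bookkeeping are my own illustrative code, not the actual indexer):

```python
from collections import Counter
from itertools import combinations

def most_offending_pair(sdrs):
    """Count how often each bit pair co-occurs across SDRs, return the most frequent pair."""
    counts = Counter()
    for sdr in sdrs:
        counts.update(combinations(sorted(sdr), 2))
    return counts.most_common(1)[0][0]

def replace_pair(sdr, pair, new_bit):
    """If both bits of `pair` are active in `sdr`, swap them for one higher-meaning bit."""
    s = set(sdr)
    if pair[0] in s and pair[1] in s:
        s -= set(pair)
        s.add(new_bit)
    return sorted(s)

# Toy data: (55, 781) co-occurs most often, as in the example above.
sdrs = [[3, 55, 781, 900], [55, 781, 1200], [55, 902]]
pair = most_offending_pair(sdrs)                        # (55, 781)
sparser = [replace_pair(s, pair, 2001) for s in sdrs]   # new bit 2001 replaces the pair
print(sparser)  # [[3, 900, 2001], [1200, 2001], [55, 902]]
```

Note how SDRs containing the pair get one bit shorter, while SDRs with only one of the two bits are left untouched - exactly the 20/2001 vs 19/2001 split from the bullets above.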

Is this (what I believe) the real type of process that the Hippocampus performs? That is, it phase shifts the input to align the short term buffer into a temporally correlated/relevant pattern. The patterning can then create overlaps (simultaneous firing) which by default create a new concept instance, or access a prior instance that combines the two separate inputs. This may then occur within the short term memory buffer space.

Phase shifting of an input stream is a singularly vague concept…

The alternative thought is that spindle waves perform this overlapping (bit compression / new concept) by bridging the LTP gap between sequence points to create higher-level concepts after the event.


Good post, but:

  • nearest neighbour has always seemed to me more about noise immunity than a property in itself
  • I was under the impression the number of bits was a function of the count of neurons in a column – surely the brain can’t do million bit SDRs?
  • surely algorithm comes first, performance is just engineering? And why Python if you need speed?
  • Nearest Neighbor search is a well known problem in ML. Noise filtering, classification and regression are a few prominent applications. Most search engines for unstructured data (music, images) use a nearest neighbor search backbone.
  • Unless we’re certain all columns “see” the same version of any and all SDRs, we can’t tell whether each column is shown the whole-picture SDR or a localized small part of it, relevant to that specific column.
  • The Numba refactoring I mentioned already improved performance by almost an order of magnitude. I’m still working on and testing it; I will release the code in a couple of days.

Details on performance:
First, a glossary: “SDR size” is the available bit space, e.g. 1000, 2000 or 20000 bits. “SDR length” is the count of active (==1) bits in a given SDR, equal to the length of its sparse representation.

  • Speed is heavily dependent on the number of bit pairs an SDR expands to before storage or query. A 10-bit-length SDR expands to 10*9/2 = 45 address locations. A 20-bit one expands to 190 (= 20*19/2).
  • So speed is inversely proportional to the square of SDR length. Searching one 100-bit-length SDR is at least as expensive as searching 100 10-bit SDRs.
  • Speed is best quantified in bit pairs/second. Querying costs are about 4M bit pairs/sec, while writing is 10x faster.

One million SDRs, 20 bits long each, can be stored in ~5 seconds. Search is 10x slower. Doubling the number of ON bits drops speed roughly 4 times.
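These figures are self-consistent, as a quick back-of-the-envelope check shows (the rates are taken straight from the bullets above):

```python
def pairs(n_on):
    """Bit pairs an SDR with n_on active bits expands to."""
    return n_on * (n_on - 1) // 2

query_rate = 4_000_000          # bit pairs / second (from the text)
write_rate = 10 * query_rate    # writing is 10x faster

n_sdrs = 1_000_000
total_pairs = n_sdrs * pairs(20)        # 190M pair writes for 1M 20-bit SDRs
print(total_pairs / write_rate)         # 4.75 -> the "~5 seconds" above
print(pairs(40) / pairs(20))            # ~4.1 -> doubling ON bits drops speed ~4x
```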