The power of bit pair expansion

Oh, wait, I completely missed that part. Now looking back, I see the P*(P-1)/2 equation.
I thought the bit pairs were being created in a single loop iteration.

I was thinking it was kinda similar to outer product binding, but it’s actually identical to an outer product with a zeroed diagonal, lol.

Curious that this is exactly how you compute the weights for a Hopfield network, but you are using those weights as an SDR.

It’s “only” an SDR in the case where it is used directly as input to Numenta’s SDR Classifier.

In all other applications (associative memory, value map, id indexer) that expanded SDR represents a list of memory addresses where various “stuff” is stored.

PS: Even in the case of the SDR Classifier, the bit pair expansion can be viewed as an input layer of size N feeding a hidden layer of size N*(N-1)/2, where each node of the hidden layer connects to exactly 2 nodes in the input and computes AND on its input pair.

In this sense it is a very sparse, fixed (untrained) first hidden layer.
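
To make that layer concrete, here’s a minimal sketch (my own illustration, assuming 0-based bit indices, not code from this thread) of the expansion: the P active bits of an SDR of size N map to P*(P-1)/2 active bits in the N*(N-1)/2 pair space, which is exactly the upper triangle of the SDR’s outer product with itself, diagonal excluded; each pair bit is the AND of two input bits.

```python
import numpy as np
from itertools import combinations

def expand_bit_pairs(active, n):
    """active: sorted 0-based indices of the P active bits of an SDR of size n.
    Returns the indices of the P*(P-1)/2 active bits in the n*(n-1)/2 pair space."""
    def pair_address(i, j):
        # position of pair (i, j) with i < j in a flattened upper-triangular layout
        return i * n - i * (i + 1) // 2 + (j - i - 1)
    return np.array([pair_address(i, j) for i, j in combinations(active, 2)])

sdr = np.array([3, 17, 42, 99])      # P = 4 active bits out of n = 100
print(expand_bit_pairs(sdr, 100))    # 4*3/2 = 6 active pair bits
```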


I usually think of it as your hidden crossbar layer analogy; it makes it a little easier to grasp what it’s doing.

Thinking of it that way, it really resembles the granule cell layer in the cerebellum.

Right … if C can be computed from A and B, it carries no information and there’s no point in storing it.
But a memory is necessary if random SDRs {A,B,C} do carry non-computable information to be learnt, for example {sky, hascolor, blue}.
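
As a toy illustration of what learning such a non-computable triple means (a sketch under my own assumptions, not the reference TriadicMemory code), one can accumulate counters addressed by (a, b) bit pairs that vote for the bits of C; querying with {sky, hascolor} then recovers {blue}:

```python
import numpy as np
from collections import defaultdict

N, P = 1000, 10
rng = np.random.default_rng(0)

def random_sdr():
    # a random P-sparse SDR over N bits, given by its sorted active indices
    return np.sort(rng.choice(N, size=P, replace=False))

class ToyTripleStore:
    """Counters addressed by (a, b) bit pairs, each voting for the bits of C."""
    def __init__(self):
        self.counts = defaultdict(lambda: np.zeros(N, dtype=np.uint32))

    def store(self, a, b, c):
        for i in a:
            for j in b:
                self.counts[(i, j)][c] += 1   # one counter row per address

    def query(self, a, b, p=P):
        votes = np.zeros(N)
        for i in a:
            for j in b:
                votes += self.counts[(i, j)]
        return np.sort(np.argsort(votes)[-p:])   # the top-p bits form C

sky, hascolor, blue = random_sdr(), random_sdr(), random_sdr()
mem = ToyTripleStore()
mem.store(sky, hascolor, blue)
assert np.array_equal(mem.query(sky, hascolor), blue)
```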


I just don’t understand how we are treating SDRs here.

It sounds like you’re treating A and B as bags of features (MNIST digits) but C as a random symbol.
I thought the sparse machinery was mainly used to process random symbols with minimal overlap.

Yes, I would be interested too. My understanding, likewise, is that SDRs are unique identifiers represented as sparse bit patterns, with the specific features of a degree of noise immunity and the ability to test for multiple SDRs at once using a suitable mask. If so, one would be looking for algorithms (including sequence recognition) built on that data structure, most likely with a high degree of parallelism. If not, then what?


There are multiple 2D morphological transformation steps involved in mapping an MNIST digit to an SDR that can be processed by an associative memory. Those preprocessing steps should be fixed and deterministic, not requiring learning. Bit pair hashing with 2D topological constraints may prove useful, especially if combined with conventional image filters such as edge/gradient detection. As soon as you combine two such low-level SDRs into a higher-level concept, random SDRs will come into play. Several steps up in the hierarchy of associative memories, you’ll have entirely random SDRs representing abstract and stable concepts like “the digit 7”.
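
For flavor only, here is a tiny hypothetical example of such a fixed, deterministic preprocessing step (a made-up image_to_sdr, not the pipeline described above): a plain gradient filter whose p strongest responses become the active bits of an SDR.

```python
import numpy as np

def image_to_sdr(img, p=20):
    """img: 2D array, e.g. a 28x28 MNIST digit. Returns the indices of the p
    strongest edge responses as the active bits of a 784-bit sparse code."""
    gy, gx = np.gradient(img.astype(float))      # crude edge / gradient detection
    magnitude = np.hypot(gx, gy)
    active = np.argsort(magnitude.ravel())[-p:]  # keep the p strongest responses
    return np.sort(active)
```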

I thought the point was that if C is deterministic, the triadic memory learns the identity function and becomes “dead”,
but I think that’s an issue caused by the diagonal addresses; if we set them to zero, like you do in the bit pair expansion, there shouldn’t be a problem.

We are talking in parallel here about different problems.

  • Digit recognition is one of them;
  • learning sequential patterns of either “symbols” or “data” encoded/represented as SDRs is another.

Sure, somewhere there has to be a transition from sensory data to symbols, and the machinery should be comfortable with both.

Beware that large language models (transformers) do not work with “pure” symbols; they are based on vector embeddings that encode some sort of “resemblance” between symbols.

So certainly there is value in doing that.

The hard part is finding a way to encode “natural” sensory data as useful symbols.

If I got it right, that’s what you are trying to do with all those transformations, and your benchmark for how useful those symbols are is the SDR Classifier.

Yes, the encoding is important. I think it is grossly overlooked in ML.

Google is trying to catch up; there is still an active Kaggle contest on making a universal image encoder that doesn’t need further training. It produces a fixed-size embedding that is useful in every task and, more importantly, is consistent, which means there would be little incentive to train a new “visual cortex” for every ML problem.

Sounds good. All we need to do now is figure out the ‘combine’ algorithms…

Another hard part to add to the above is figuring out what sensory data is useful.

Someone gave an extreme example: imagine Pacman, or any game in which the main character is represented not by an animated image but by a single red pixel moving around. What kind of algorithm would figure out on its own that this pixel is the one that matters most?

I imagine a system that learns to split the input into its individual components and then learns to throw away what is not useful.

For games and real-world visual data, I think motion is a good clue for telling which components (objects) make up the input:

pixels that move together belong together.

Yes, that’s why Google’s contest has a small chance of succeeding. Generally, when a company with Google’s budget is shopping for $25k ideas on Kaggle, it means they have run out of them.

Was the Pacman example a real contest?

No, it was someone’s partial joke on Reddit. A more reasonable question is how we figure out (search for?) what matters and what doesn’t in sensory data.
How big does that pixel need to be in order to get noticed?
Movement is a great clue.
Synchronization between the agent’s actions and pixel motion is also very important.
If I wiggle some muscles in a certain rhythm, what parts of the sensory stream oscillate in sync with them?
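
One hedged way to make that question computational (my sketch, with made-up names): correlate the known rhythmic action signal against every pixel’s time series and keep the pixels that oscillate in sync with it.

```python
import numpy as np

def pixels_in_sync(frames, action, threshold=0.5):
    """frames: (T, H, W) video; action: (T,) rhythmic motor signal.
    Returns a boolean (H, W) mask of pixels that co-vary with the action."""
    t, h, w = frames.shape
    pix = frames.reshape(t, -1).astype(float)
    pix -= pix.mean(axis=0)                      # center each pixel's time series
    act = action.astype(float) - action.mean()
    denom = t * pix.std(axis=0) * act.std()
    corr = (act @ pix) / np.where(denom == 0, 1.0, denom)   # Pearson r per pixel
    return (np.abs(corr) > threshold).reshape(h, w)
```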

That’s why I thought content-neutral cycle sensitivity could be very useful.

I think we are getting carried off topic.

Anyway, I think I managed to implement a somewhat limited version of the triadic memory.

It uses bits instead of bytes for the synapses, so it needs 8x less RAM, but it can only go in the A ⊗ B => C direction, using a variation of the bit pair addressing.
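
Here is a rough sketch of how I read that description (my reconstruction, not the actual implementation): one packed bit-row per (a, b) address pair, OR-ing in the bits of C on store and counting votes on query, so each synapse costs one bit instead of a byte counter.

```python
import numpy as np

class BinaryPairMemory:
    """One-directional (A, B -> C) store with one bit per synapse.
    Sketch only; assumes n % 8 == 0 (n = 1000 uses about n*n*n/8 bits)."""
    def __init__(self, n=1000):
        self.n = n
        self.rows = np.zeros((n * n, n // 8), dtype=np.uint8)  # one bit-row per address

    def _addresses(self, a, b):
        # addresses formed from one active bit of A paired with one active bit of B
        return np.repeat(a, len(b)) * self.n + np.tile(b, len(a))

    def store(self, a, b, c):
        mask = np.packbits(np.bincount(c, minlength=self.n).astype(bool))
        self.rows[self._addresses(a, b)] |= mask      # OR C's bits into each row

    def query(self, a, b, p=10):
        rows = np.unpackbits(self.rows[self._addresses(a, b)], axis=1)
        votes = rows.sum(axis=0)                      # one vote count per output bit
        return np.sort(np.argsort(votes)[-p:])        # the top-p bits form C
```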

Do you guys recommend any test you have done, so I can compare performance?

Nothing specific; you can start feeding it random relations between random SDRs and see after how much input it starts coughing.
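
In case it helps, a sketch of such a capacity test (assuming a store(a, b, c) / query(a, b) interface like the one sketched above): store random triples, read them all back, and watch the fraction of C bits that fail to come back as the count grows.

```python
import numpy as np

def random_sdr(rng, n=1000, p=10):
    return np.sort(rng.choice(n, size=p, replace=False))

def capacity_test(mem, n_triples, n=1000, p=10, seed=42):
    """Store n_triples random (A, B) -> C relations, recall them all,
    and return the fraction of stored C bits that did not come back."""
    rng = np.random.default_rng(seed)
    triples = [(random_sdr(rng, n, p), random_sdr(rng, n, p), random_sdr(rng, n, p))
               for _ in range(n_triples)]
    for a, b, c in triples:
        mem.store(a, b, c)
    missing = sum(len(np.setdiff1d(c, mem.query(a, b))) for a, b, c in triples)
    return missing / (n_triples * p)
```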

Here are the results with N=1000 and P=10:
the failures stop being negligible at about 2.5M stored patterns.
Maybe I should post this in the triadic memory thread.
