Large Temporal Memory

Wrong wording … ALL possible combinations of 5% where 1 bit per column is selected, e.g.

     (col:1, row:1), (1,2), (2,1), (3,4)  <= 5%

possible SDRs

     (1,1), (2,1), (3,4)
     (1,2), (2,1), (3,4)
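
Just to make that enumeration concrete, here is a minimal Python sketch (the column/cell layout is hypothetical, not taken from any particular TM implementation): group the predicted cells per column and form every combination that picks exactly one cell per column.

    from itertools import product

    # Hypothetical predicted cells, grouped by column: column -> candidate rows
    predicted = {1: [1, 2], 2: [1], 3: [4]}

    # Every SDR that selects exactly one predicted cell per column
    candidates = [[(col, row) for row in rows] for col, rows in predicted.items()]
    for sdr in product(*candidates):
        print(sdr)
    # ((1, 1), (2, 1), (3, 4))
    # ((1, 2), (2, 1), (3, 4))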

It shouldn’t be that much … maybe we can pre-filter them to lower the number of combinations.

Maybe, but keep in mind you also need to store each SDR under every possible subset of its bits in order to match all possible queries.

Then - besides increasing storage by a couple of orders of magnitude - every query will return not one but hundreds or thousands of potential answers, and what you save in pennies you will spend sorting out which of those responses is “best”.

Just to put things in numbers, for a 20 bit SDR (I mean 20 ones out of e.g. 1000 bits, for 2% sparsity) you can subset it into:

  • 20 * 19 / 2 = 190 pairs
  • the same number of 18 bit subsets
  • 20 * 19 * 18 / 6 = 1140 triplets
  • the same number of 17 bit subsets
  • that’s already 2660 possibilities, and I’m tired of counting the further subsets of 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or 16 bits out of just 20

P.S. And I skipped the 20 possible 19 bit subsets.
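
For reference, a quick sanity check of those counts with Python’s math.comb (nothing here beyond standard combinatorics):

    from math import comb

    n = 20  # number of 1 bits in the SDR
    print(comb(n, 2), comb(n, 18))    # 190 190   - pairs and 18 bit subsets
    print(comb(n, 3), comb(n, 17))    # 1140 1140 - triplets and 17 bit subsets
    print(comb(n, 19))                # 20        - the skipped 19 bit subsets
    print(sum(comb(n, k) for k in range(2, 19)))   # 1048534 subsets of 2..18 bits in total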

Ok, let’s consider a possible optimisation: neurons don’t bother to “fire” unless there are at least 6-7 matching bits in the input. And any more than that is superfluous - if we get several 6 bit matches, there is no need to look further to consider that a match has happened.

Let’s store only 5 bit combinations - because the number of possibilities is much lower than for 6 bits, and being on silicon we can tune down the fleshy noise we find in ourselves.

Which means that for 5 bits out of an SDR with 20 ones there are 15504 possible combinations (== 20! / (20-5)! / 5!).

Still sounds bad, but what if we store only 10% ≈ 1550 of the possible combinations? Then with a “query burst” of 50 random 5 bit combinations from within the same SDR we’ll get a match with > 99% probability.
50 x 500 nanoseconds = 25 microseconds, or 40000 queries/second on a single machine - this might not be that bad IF the 500 ns/query assumption holds true.
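
A back-of-the-envelope check of those numbers (the 10% storage ratio and the 500 ns/query latency are the assumptions stated above, not measurements):

    from math import comb

    total_keys = comb(20, 5)        # 15504 possible 5 bit keys in a 20 bit SDR
    stored_fraction = 0.1           # assume only ~10% (about 1550) of them are stored
    burst = 50                      # random 5 bit samples per query burst

    p_hit = 1 - (1 - stored_fraction) ** burst
    print(total_keys, round(p_hit, 4))   # 15504 0.9948 -> >99% chance of at least one hit

    query_ns = 500                  # assumed latency of a single key lookup
    print(burst * query_ns / 1000, "us per burst,", 10**9 // (burst * query_ns), "bursts/s")
    # 25.0 us per burst, 40000 bursts/s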

There are a few interesting observations here:

  • “pure concepts” should be encoded in as few bits as feasible.
  • the performance isn’t affected by the actual SDR bitspace size, only by the number of 1 bits. Counterintuitively, this might lead to ways of compensating for a smaller count of 1 bits with wider SDRs.
    Put otherwise, we can forget the arbitrary 5% rule - how would <1% sparsity work with 8 kbit SDRs?
  • learning recurring patterns can happen gradually, storing a few hundred 5 bit keys at a time. The more “used” a pattern is, the greater the chances it will be retrieved with shorter query bursts. There is definitely the possibility to “emphasize” the most important/used “concepts” at the cost of extra storage for these particular parts.

the last one sounds a lot like natural learning


My assumption is that most of the predicted cells won’t be in the same columns.

Now that I think about it, you don’t have to allow multiple predicted cells in a column.

We only have to allow more bits … even 1% of 10_000 cells =~ up to 100 predicted columns, i.e. 100/1000 =~ up to 10%, which acts like a UNION of all possible predictions.

Now we can just match against normal SDRs … we can even use permanence to do a weighted overlap.
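
A rough sketch of that merge-and-match step (all names and the cell-to-column mapping are my assumptions, not code from an existing TM): collapse the predicted cells to their columns, then score stored SDRs by plain or permanence-weighted overlap.

    def predicted_columns(predicted_cells, cells_per_column):
        """Merge predicted cells 'by column': several cells in one column become a single column bit."""
        return {cell // cells_per_column for cell in predicted_cells}

    def overlap(query_cols, stored_cols, permanence=None):
        """Plain overlap, or a permanence-weighted overlap if a per-column weight dict is given."""
        common = query_cols & stored_cols
        if permanence is None:
            return len(common)
        return sum(permanence.get(col, 1.0) for col in common)

    # Toy usage: the union of predictions is matched against a small stored lexicon.
    pred_cells = {10_001, 10_030, 20_015, 55_123}                    # predicted cell indices
    union_sdr = predicted_columns(pred_cells, cells_per_column=32)   # -> {312, 313, 625, 1722}
    lexicon = {"car": {312, 625, 1722}, "elephant": {40, 941, 2050}}
    best = max(lexicon, key=lambda name: overlap(union_sdr, lexicon[name]))   # "car"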

YEP, that is more feasible … and it also aligns with my theory that classification/prediction is systematic mapping … i.e. upscaling into a higher dimensional map and then downscaling again.

What would be nice to have is a predictor that, when in doubt whether to output “A” vs “B”, doesn’t just overlap the A and B SDRs but produces a short stream of alternating A, B, A, B, A, B SDRs.
An “oscillation” between the most likely top 2-3-N choices.

That could be a trick of real spiking columns to avoid confusion.
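
A toy version of that oscillation idea (purely illustrative - the scores and the SDR objects are placeholders): instead of merging ambiguous candidates into one blurry union, cycle through the top-N for a few timesteps.

    from itertools import cycle, islice

    def oscillate(scored_candidates, top_n=2, steps=6):
        """Emit a short alternating stream of the top-N candidates instead of a single merged SDR."""
        ranked = sorted(scored_candidates, key=scored_candidates.get, reverse=True)[:top_n]
        return list(islice(cycle(ranked), steps))

    print(oscillate({"A": 0.9, "B": 0.8, "C": 0.1}))   # ['A', 'B', 'A', 'B', 'A', 'B']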

If I didn’t misinterpret your discussion, shouldn’t we allow multiple predicted cells in a column? For example, the concepts of ‘car’ and ‘porsche’ should have a lot of overlapping columns.

If the TM predicts more than one potential SDR pattern as the next element of the sequence and we try to match 5 bits out of 20 to a ‘concept’ (e.g. ‘car’, ‘porsche’, ‘tesla’), it would be troublesome if, for example, 2 of the 5 bits are chosen from the ‘car’ SDR, another bit from the ‘porsche’ SDR and the last 2 bits from the ‘tesla’ SDR. Those 5 bits (especially if they are not shared between the three predicted SDRs/concepts) could be matched to an SDR pattern that doesn’t belong to any of the three concepts it’s supposed to predict. For example, the 5 bits could be matched to ‘hyundai’ in the best case and to ‘elephant’ in the worst case, as that’s in an entirely different category.

Maybe, and that’s why such searching is potentially expensive - it needs to sample the database with many 5 bit candidates until some matches start to pop out.

Your example however might not be too realistic if the SDRs for various cars have an overlap - which is what we would expect when they represent/encode related meanings.

Then the opposite may happen - certain 5 bit selections might get saturated with thousands of responses, one for every car, and only one of these would represent the generic idea of a car. Then we’ll need to select the one which best matches our query pattern.

But that is quite unlikely: the stored pattern with the most matching bits quickly becomes the most likely to be retrieved first. For example, if our unknown SDR (which we try to find a “meaning” for) has 10 bits matching “Porsche” and 7 matching “Tesla”, the chance that an arbitrary 5 bit sample matches Porsche is over 10 times greater than for Tesla. It can still happen, but when we do 2, 3, … N queries the chances we’ll receive more Teslas than Porsches drop to nil very quickly, from 1/10 to 1/100, 1/1000, 1/10**N.
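
The exact odds behind that example, for anyone who wants to check them (10 matching bits for Porsche, 7 for Tesla, samples of 5 bits):

    from math import comb

    porsche_ways = comb(10, 5)   # 252 five-bit samples fall entirely inside Porsche's 10 matching bits
    tesla_ways   = comb(7, 5)    #  21 samples fall inside Tesla's 7 matching bits
    print(porsche_ways / tesla_ways)        # 12.0 -> "over 10 times greater"

    # Odds that Tesla out-hits Porsche shrink roughly as (1/12)**N with N independent samples
    for n in (1, 2, 3):
        print(n, (tesla_ways / porsche_ways) ** n)   # 0.083..., 0.0069..., 0.00057...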

Keep in mind that I’m talking here about how we could implement a would-like-to-have associative memory for SDRs.

Something along the lines of Kanerva’s Sparse Distributed Memory, but for large sparse vectors instead of dense ones.

I’ve made some tests with 2 point indexing (instead of 5); the results are interesting - it seems to work, but nodes in lower dimensions are more prone to saturation.


It doesn’t matter for the idea in the post: you use predicted bits > 2% of 1000.
So you merge them ‘by column’ if there are multiple cells predicted in a column.

Because predicted bits > 2% of 1000, the output SDR is sort of a UNION of the predictions.
You match it against the lexicon SDRs, another SDR memory or a Decoder, so it predicts car, tesla or porsche rather than elephant.
Which one you predict depends on your selection mechanism.
E.g. rank the top 5 and probabilistically pick one, OR just pick the top one.
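
One way to write that selection step (a sketch with placeholder scores; any real implementation would plug in its own overlap measure): rank the candidates, keep the top 5, then either take the best or sample in proportion to the scores.

    import random

    def select(scores, top_k=5, probabilistic=True):
        """scores: {candidate: overlap with the union SDR}. Rank, keep top_k, then pick one."""
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        if not probabilistic:
            return ranked[0][0]                                # greedy: just take the top one
        names, weights = zip(*ranked)
        return random.choices(names, weights=weights, k=1)[0]  # weighted probabilistic pick

    print(select({"car": 17, "tesla": 15, "porsche": 14, "elephant": 2}))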

Those will be fixed over time by a bit/cell punish-reward mechanism.

This means you have to change the TM algorithm,
OR you can train with a modified sequence:

X1 A # X2 B # X3 C

where

X1 = X >> 2
X2 = X >> 4
X3 = X >> 6
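
A sketch of building such a modified training sequence, under my assumption (not necessarily what the linked book means) that X >> n shifts the SDR’s active-bit indices by n positions:

    def shift(sdr, n, size=1000):
        """Assumed meaning of 'X >> n': move every active bit n positions to the right, wrapping at size."""
        return {(bit + n) % size for bit in sdr}

    def modified_sequence(X, A, B, C, sep):
        """Build the training sequence  X1 A # X2 B # X3 C  from shifted copies of X."""
        X1, X2, X3 = shift(X, 2), shift(X, 4), shift(X, 6)
        return [X1, A, sep, X2, B, sep, X3, C]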

scroll to the end
https://vsraptor.github.io/book/docs/ihtm/isdp.html

I think a better option is to figure out how to modify an FST to handle OVERLAP.

keyvi is a dynamic, updatable FST

this one (the Rust fst crate) is static:

Set operations

There is one last thing we need to talk about to wrap up basic querying: set operations. Some common set operations supported by the fst crate are union, intersection, difference and symmetric difference. All of these operations can work efficiently on any number of sets or maps.

This is particularly useful if you have multiple sets or maps on disk that you’d like to search. Since the fst crate encourages memory mapping them, it means we can search many sets nearly instantly.

Let’s take a look at an example that searches multiple FSTs and combines the search results into a single stream.
