Possible simplified implementation of a grid cell layer

I have an idea for a simplified implementation of a grid cell layer, maybe !!
You decide if I'm a bit too clever for the wrong reason :wink:

What is left is the easy part, i.e. how to ENCODE/DECODE this thing :slight_smile: to/from an SDR?

Let's start with some snippets from the paper.

To form a unique representation requires multiple grid cell modules with different scales or orientations (Figures 1C,D). For illustration purposes say we have 10 grid cell modules and each module can encode 25 possible locations via a bump of activity. These 10 bumps encode the current location of the animal. Notice, if the animal moves continuously in one direction the activity of individual modules will repeat due to the tiling, but the ensemble activity of 10 modules is unlikely to repeat due to the differences in scale and orientation between the modules. The representational capacity formed by such a code is large. In our example the number of unique locations that can be represented is 25^10 ≈ 10^14.

To summarize the above properties, a set of grid cell modules can unambiguously represent locations in an environment. These locations can be path integrated via movement, and environmental landmarks are used to correct path integration errors. By choosing random starting points within modules, unique location spaces can be defined for each environment. The space of all possible cell activations grows exponentially with the number of modules, thus the capacity for representing locations and environments is large.

So required property number one: “unambiguously represent locations”, i.e. many unique locations.

The idea is to use the MODULO operation as a grid module.

First, if we select 3 prime numbers, e.g. 5, 11, 13, … we get unique location-coords based on the remainders.

Second, the loc-ids are tiled, courtesy of the mod operation.

Third, if you return to a position you get the same unique ID.

Fourth, adding more primes increases the capacity.

Fifth, starting from a random value (position) gives us a different sequence of remainders as we move around, i.e. different maps.

In [515]: [(i,list(i % np.array([5,11,13]))) for i in range(50,60)]                                                                                                          
Out[515]: 
[(50, [0, 6, 11]),
 (51, [1, 7, 12]),
 (52, [2, 8, 0]),
 (53, [3, 9, 1]),
 (54, [4, 10, 2]),
 (55, [0, 0, 3]),
 (56, [1, 1, 4]),
 (57, [2, 2, 5]),
 (58, [3, 3, 6]),
 (59, [4, 4, 7])]
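
To illustrate the fifth point (the paper's "random starting points within modules"), here is a minimal numpy sketch; the per-module offsets are arbitrary values picked purely for illustration:

    import numpy as np

    primes = np.array([5, 11, 13])    # module widths
    offsets = np.array([2, 7, 4])     # hypothetical random starting points, one per module

    # the same positions yield a different remainder sequence (a different "map")
    # once each module starts from its own random offset
    for pos in range(50, 55):
        print(pos, list(pos % primes), list((pos + offsets) % primes))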

The first question that remains is how to ENCODE/DECODE the location-coords to/from an SDR?

This is a 1D representation of integers, i.e. if we use it to create a grid layer then, besides scaling it to N dims, we have to solve the problem of nudging it based on feedback from the Sense layer.

Here is how we do that :

1. decode the sense-SDR
2. use the remainders (location-coord) as input to the Chinese remainder theorem with pairwise coprime moduli (primes fit the bill) to recover the integer value

Why could this work? For a 1D integer, MOVE commands could be +X and -X steps, so we do not feed the command in directly but instead calculate:

Curr-1D-pos + Move = new-pos

and pass this value as input (see the sketch below).
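
A minimal sketch of the decode step via the Chinese remainder theorem, assuming the moduli 5, 11, 13 from above (the crt helper below is hypothetical, not something from an existing library):

    # recover the 1D position from its remainders (pairwise coprime moduli)
    def crt(remainders, moduli):
        M = 1
        for m in moduli:
            M *= m
        x = 0
        for r, m in zip(remainders, moduli):
            Mi = M // m
            x += r * Mi * pow(Mi, -1, m)   # pow(..., -1, m) = modular inverse (Python 3.8+)
        return x % M

    moduli = [5, 11, 13]
    pos = 56
    remainders = [pos % m for m in moduli]   # [1, 1, 4]
    assert crt(remainders, moduli) == pos    # unique within 5*11*13 = 715 positions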

The structure is :

        |=----------------------------->|
   [ L4-TM ] <=-=> [grid mod] <=-=> [ L6-TM ]

Let's say we start with 55:

   init pos = 55
   loc = (0,0,3)               /modulo op: 55 mod (5,11,13)/
   convert loc => SDR1
   pass SDR1 to L4 TM
   sense SDR2
   L4 TM : learn SDR1 => SDR2
   pass SDR2 to L6 TM
   move = +1
   pos = 56
   loc = (1,1,4)
   convert loc => SDR3
   predicted = predict(SDR2)   /TM of L6/
   merge = predicted * SDR3 => SDR4
   remainders = convert(SDR4)
   new_pos = CRT(remainders)
   nudge : pos = new_pos
   L6 TM : learn SDR2 => SDR4

Yes, you can create an encoder for a scalar value (integer or real) that is based on the modulo operation. Primes are not required for the module widths, but they do give the greatest number of unique representations for a given number of representational bits. (More details.)

One of the design constraints for my implementation was constant sparsity (all numbers needed to have the same number of on-bits). This allows the encoding to be used directly as an input SDR in an HTM network.
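
This isn't the poster's actual implementation, but a minimal sketch of that kind of constant-sparsity modulo encoder: one on-bit per module, concatenated, so every value has exactly len(widths) on-bits (the tiny widths are just for illustration):

    import numpy as np

    def encode(value, widths=(5, 11, 13)):
        # one one-hot section per module; sparsity is always len(widths) / sum(widths)
        sdr = np.zeros(sum(widths), dtype=np.uint8)
        offset = 0
        for w in widths:
            sdr[offset + value % w] = 1
            offset += w
        return sdr

    print(encode(56).nonzero()[0])   # [ 1  6 20]  i.e. remainders 1, 1, 4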


Thanks, maybe it could work… but it's very hard to match the correct nbits & sparsity.

E.g. for ~2000 bits / 2% sparsity, the closest is 33-34 on-bits (primes) with 1988-2127 nbits:

In [525]: sum(nprimes(start=2,count=34))
Out[525]: 2127

In [526]: sum(nprimes(2,33))                                                                                                                                                 
Out[526]: 1988

Also, a problem with this encoding is that almost all encoded values will have common bits because of the small primes like 2, 3, …, i.e. it won't be distributed enough.

I.e. overlap will be decided not by how close two values are, but by how close they are to the primes (which of the moduli divide their difference).
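
A quick way to see that (a sketch, using a handful of small primes): the overlap between two encoded values is simply the number of moduli on which they agree, which has nothing to do with their numeric distance:

    import numpy as np

    primes = np.array([2, 3, 5, 7, 11, 13])

    def overlap(a, b):
        # number of modules where a and b land on the same bit,
        # i.e. the number of primes dividing (a - b)
        return int(np.sum(a % primes == b % primes))

    print(overlap(100, 101))   # 0 -- adjacent values share nothing
    print(overlap(100, 130))   # 3 -- they differ by 30 = 2*3*5, so the 2, 3 and 5 modules agree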

Which may be OK if, instead of having the Move command change linearly, it somehow follows this new MOD-similarity pattern !! ?

For example, for 2, 3, 5 the product is 30, so if we pick step 30…

In [541]: [i % np.array([2,3,5]) for i in range(1,101,30)]                                                                                                                   
Out[541]: [array([1, 1, 1]), array([1, 1, 1]), array([1, 1, 1]), array([1, 1, 1])]

In [542]: [i % np.array([2,3,5]) for i in range(2,101,30)]                                                                                                                   
Out[542]: [array([0, 2, 2]), array([0, 2, 2]), array([0, 2, 2]), array([0, 2, 2])]

but this does not work for 33 primes … the step is too big.

As I mentioned, I was considering a specific set of design constraints. If you relax those constraints (e.g. consider non-prime module widths), you can probably tailor the number of bits vs. sparsity in a more fine-grained manner.

You are correct that, in terms of duty cycle, the bits for the smaller primes will be used far more frequently than the larger primes. If that's a concern, then you might want to start with larger values of primes (e.g. use 41×43×47×53 = 4391633, rather than 2×3×5×7×11×13×17×19 = 9699690). The closer the primes are in relative magnitude, the more similar the duty cycles of the individual bits will be.

You still need ~40 primes for 2000 nbits ;(

I'm currently looking into sparse similarity hashing, maybe there is a solution there !!

So, if we start with the number of bits that you want (nBits) and divide by the number of on bits (nOnBits = sparsity*nBits), we get an approximate starting point for the prime search:

nBits/(sparsity*nBits) = 1/sparsity

With 2% sparsity, we would begin looking for (2000*0.02 = 40) primes around (1/0.02 = 50).

The interesting thing (to me) is that while the number of primes needed is set by the number of active bits, the approximate magnitude of those primes is set entirely by the sparsity. (Lower sparsity implies larger prime values for module widths).

Alas, after playing with this for a bit, I think I see your point. Even if I start looking for 40 primes around 50, I can only pull the fifteen primes smaller than 50 and then try to fill the remaining modules with primes above 50. So, we still end up with primes from 2, 3, 5, … 131, 137, which give you 1988 total bits but with only 33 on bits (33/1988 = 1.66% sparsity). Adding the next prime gets you 34 on bits out of 2127 total bits (1.6% sparsity). Starting with larger primes only decreases the sparsity further.

Now that I look at it, the largest encoder you can create using only primes that will still produce 2% sparsity is the following:

# Primes: 28
# Bits:   1371
# Reps:   3444735206
Sparsity: 0.020423
Largest Prime: 107

The sparsity continues to drop as the number of primes increases.
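
For what it's worth, here is a small self-contained sketch that reproduces that search (everything in it is illustrative code, not from any particular library):

    def primes_from_2():
        """Simple (slow) prime generator, fine for a few hundred primes."""
        found = []
        n = 2
        while True:
            if all(n % p for p in found):
                found.append(n)
                yield n
            n += 1

    # keep adding primes while the implied sparsity (= nPrimes / sum(primes)) stays >= 2%
    total, count, largest = 0, 0, None
    for p in primes_from_2():
        if (count + 1) / (total + p) < 0.02:
            break
        total += p
        count += 1
        largest = p

    print(count, total, largest, count / total)
    # 28 1371 107 0.0204...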

I suppose it would help to know what your application is. Or are you just experimenting with different encoder schemes?


How does “sparse similarity hashing” work, @mraptor? The papers are too complex for me, currently, thanks.

I'm still looking … most of them, like you said, are hard to understand and rely on NNs to work, which kills the case for using them.

Second, from what I've read so far, they are used for large files. What I need is for short integer sequences.

This paper gives a quick overview of SPHFs (this, as you probably gathered, is for Dense not Sparse btw :slight_smile: ):

State of the Art in Similarity Preserving Hashing Functions
V. Gayoso Martínez, F. Hernández Álvarez, and L. Hernández Encinas
http://digital.csic.es/bitstream/10261/135120/1/Similarity_preserving_Hashing_functions.pdf

This is sparse and understandable :

https://openreview.net/pdf?id=BJNXJgVKg

split to N random-bit buckets
output_bit_i = sum(bucket_i_bits) mod 2
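
If I read that scheme right, a minimal sketch of it would be the following (make_parity_hasher and the sizes are made up for illustration; the bucket assignment is random but fixed, so hashing stays consistent):

    import numpy as np

    def make_parity_hasher(n_in, n_out, seed=0):
        rng = np.random.RandomState(seed)
        buckets = rng.randint(0, n_out, size=n_in)   # fixed random bucket per input bit

        def hash_bits(bits):
            # output bit i = parity of the input bits assigned to bucket i
            out = np.zeros(n_out, dtype=np.uint8)
            for i in range(n_out):
                out[i] = bits[buckets == i].sum() % 2
            return out

        return hash_bits

    h = make_parity_hasher(n_in=2000, n_out=256)
    x = (np.random.rand(2000) < 0.02).astype(np.uint8)   # a ~2% sparse input vector
    print(h(x).sum(), "on bits out of 256")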

I was thinking along the lines of a modification of your scheme, where instead of appending a prime number of bits you use the prime to split the bits into sections.

E.g. 2 splits the 2000 bits into two 1000-bit pieces, then the remainder tells us which one to choose…
3 splits it into 666, 667, 667 … and so on …

After that I'm stuck on how to pick a bit !!!
A random pick on every encoding won't work …

But now that I think about it, maybe you can pre-build a table of how to pick them, then encoding/decoding will be consistent !!

A lookup table where bits are picked randomly, excluding already-taken bits !!! I dunno … there is still the problem of overlap between neighboring values !!

It becomes almost like a hash function with memory :wink:
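
To make the lookup-table idea concrete, here is a rough sketch under my own assumptions (all names invented): it pre-builds, per prime, a fixed random bit choice for each remainder's section, which keeps encoding consistent, though it does not yet exclude bits already taken by other modules.

    import numpy as np

    def build_tables(primes, n_bits=2000, seed=0):
        # for each prime p: split n_bits into p sections and pre-pick one
        # fixed random bit per section (one per possible remainder)
        rng = np.random.RandomState(seed)
        tables = []
        for p in primes:
            edges = np.linspace(0, n_bits, p + 1).astype(int)   # section boundaries
            tables.append(np.array([rng.randint(edges[r], edges[r + 1]) for r in range(p)]))
        return tables

    def encode(value, primes, tables, n_bits=2000):
        sdr = np.zeros(n_bits, dtype=np.uint8)
        for p, table in zip(primes, tables):
            sdr[table[value % p]] = 1      # collisions between modules are not handled here
        return sdr

    primes = [2, 3, 5, 11, 13]
    tables = build_tables(primes)
    print(encode(56, primes, tables).nonzero()[0])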

SimHash is mentioned in one of those papers, it’s actually pretty easy. In case it’s helpful:
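
In the same spirit, a bare-bones SimHash sketch (hashing each feature with md5 purely for illustration): every feature votes +1/-1 on each output bit and the sign of the total decides that bit, so similar feature sets end up with hashes that are close in Hamming distance.

    import hashlib

    def simhash(features, n_bits=64):
        counts = [0] * n_bits
        for f in features:
            h = int(hashlib.md5(str(f).encode()).hexdigest(), 16)
            for i in range(n_bits):
                counts[i] += 1 if (h >> i) & 1 else -1
        # bit i is set if the votes for it are positive
        return sum(1 << i for i in range(n_bits) if counts[i] > 0)

    a = simhash([1, 2, 3, 4, 5])
    b = simhash([1, 2, 3, 4, 6])
    print(bin(a ^ b).count("1"), "differing bits out of 64")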


Possibly, if I may…
I’m not sure “encoding a grid cell” or “using a grid cell encoding” (as input to place cells or HTM SMI ideas) is that hard of a problem to begin with.
Realistically generating a grid cell module from simple primitive signals, so that grid-cell-like patterns would arise all over the place in a “thousand brains” scenario, pretty much is (imho).

And I won't intuitively assume there are so many opportunities for replacing a complex neural circuit with a simpler ALU circuit with equivalent functionality. Besides, modulo ain't a very cheap ALU instruction anyway.

That said, my math skills are limited. And I should take more time to ponder on such answers.
Yet, those were my two cents for tonight ^^’


Sorry, I didn't understand this sentence, can you clarify? Thanks.

I guess what I wanted to say there is, what eludes me with those grid cells is “how do they arise”.
Using the property that they spatially overlap in a module to find “where am I” seems like a relatively cheaply solved question already?
I mean it could be cheaper, sure, if your prime number and hash idea worked, but not by an enormous amount?


That's what I was wondering too…
The landmarks/features, i.e. Sense, have to come first.
According to the theory the CC receives motor commands, not locations.
So locations are probably random at first … SOM (Kohonen maps) comes to mind as a possible process to cluster them as a grid. I mean algorithmically; neuroscientifically I'm totally in the blue … no idea!

The grid mods can be totally random at first … it will work if there are more of them and they are uniformly random…

I have this thought that you are de facto GRIDDING the sensor space (not real space), which is limited, i.e. it is possible to create a grid if you know e.g. that the max number of sensor detectors is X; you probably don't need a grid module bigger than X/2 sensor-units.
Now you can randomly divide this space and let it work until it is adjusted by interactions…

I think the grid cells are already there, i.e. they don't grow on demand (I think I read that not many new neurons grow in general; it is mostly synapses that pop up and disappear).

If your application is only expected to encode short integer sequences, and you want 2% sparsity, then just use the 1371 bit (28 prime) encoding mentioned above. It is capable of 3.4 billion distinct integer encodings. Or do you have some other requirement for a 2000 bit wide encoder?

If the 2000 bit encoder is a hard requirement, then you could split the representation into N, (2000/N)-bit encoders that you can apply to the results of N integer valued hash functions F1(key), … FN(key). With this scheme you are not restricted to integer key values, but you will need to contrive the hash functions appropriately.
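
Just to make that concrete, here is a rough sketch under my own assumptions (salted md5 as the contrived hash functions F1..FN, and tiny module widths for readability; in practice each sub-encoder would be ~2000/N bits wide):

    import hashlib
    import numpy as np

    def sub_hash(key, salt):
        # one of the N integer-valued hash functions F_i(key)
        return int(hashlib.md5(f"{salt}:{key}".encode()).hexdigest(), 16)

    def encode(key, n_encoders=4, widths=(5, 11, 13)):
        # N independent modulo sub-encoders, each applied to its own hash of the key
        parts = []
        for i in range(n_encoders):
            v = sub_hash(key, i)
            sdr = np.zeros(sum(widths), dtype=np.uint8)
            offset = 0
            for w in widths:
                sdr[offset + v % w] = 1
                offset += w
            parts.append(sdr)
        return np.concatenate(parts)

    print(encode("some key").nonzero()[0])   # works for non-integer keys too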