Grid Cell Inspired Scalar Encoder

So, here is a quick and dirty implementation in JavaScript that highlights most of the properties that I mentioned in the previous post.

class IntegerEncoder {
  constructor(numPrimes) {
    this.numPrimes = numPrimes;
    // Generate N primes to serve as a basis set
    this.primes = [2];
    for (var i = 3; this.primes.length < numPrimes; ++i) {
      // i is prime if no previously found prime divides it
      if (this.primes.reduce((a, b) => a && (i % b !== 0), true)) this.primes.push(i);
    }
    // Total bits: one block of p bits per prime p (one active bit per block)
    this.numBits = this.primes.reduce((a, b) => a + b, 0);
    // Unique representations before the encoding repeats: product of the primes
    this.numReps = this.primes.reduce((a, b) => a * b, 1);
    this.sparsity = this.numPrimes / this.numBits;
    this.data = new Uint8Array(this.numBits);
    this.data.fill(0);
  }
  encode(num) {
    var N = parseInt(num);
    this.data.fill(0);
    // For each prime p, set the bit at offset (N mod p) within that prime's block
    var idx = 0;
    this.primes.forEach(function(p) {
      this.data[idx + N % p] = 1;
      idx += p;
    }, this);
  }
}

for (var n=1; n<10; ++n) {
  var S = new IntegerEncoder(n);
  console.log("primes:   " + S.primes);
  console.log("numBits:  " + S.numBits);
  console.log("numReps:  " + S.numReps);
  console.log("sparsity: " + S.sparsity);
  var N = Math.floor(S.numReps*Math.random()); // random integer in [0, numReps)
  S.encode(N);
  console.log(N, S.data.join(''));
}
1 Like

I had a similar idea after watching the talk, Circuitry and Mathematical Codes for Navigation in the Brain, given by Ila Fiete about grid cells. In her talk, she mentions a modulo-based number system (a residue number system), which is like an abstraction of how grid cells work. To explore the advantages she describes for arithmetic operations in such a system, I wrote some simple code that converts numbers between a fixed-base number system and a modulo-based one. Once you have the residue values, it is straightforward to make an SDR out of them. The problem I realized later is that in the encoding process it is desirable to have similar SDRs (high overlap) for semantically similar entities; with this approach, however, even an increment gives a completely different SDR. Or maybe I am missing something. This method may have other benefits that I have not foreseen yet, though.

Here is the code I wrote:

    from functools import reduce
    import numpy as np


    def gcd(a, b):
        """Return greatest common divisor using Euclid's Algorithm."""
        while b:
            a, b = b, a % b
        return a


    def lcm(a, b):
        """Return lowest common multiple."""
        return a * b // gcd(a, b)


    def lcmm(*args):
        """Return lcm of args."""
        return reduce(lcm, args)


    class ResidueNumberSystem:
        def __init__(self, *modulos):
            self.modulos = sorted(modulos)
            self.least_common_multiple = lcmm(*self.modulos)
            self.weights = self.find_weights()

        def find_weights(self):
            weights = []
            for i in range(len(self.modulos)):
                m = self.modulos[i]
                rest = self.modulos[:i] + self.modulos[i + 1:]
                M = lcmm(*rest)
                for j in range(1, self.least_common_multiple // M + 1):
                    if (M * j) % m == 1:
                        weights.append(M * j)
                        break
            return weights

        def encode(self, n):
            digits = [n % m for m in self.modulos]
            return digits

        def decode(self, digits):
            n = 0
            for d, w in zip(digits, self.weights):
                n += d * w
            return int(n) % self.least_common_multiple

        def to_sdr(self, digits):
            sdr = np.zeros(sum(self.modulos), dtype=np.uint8)
            offset = 0
            for i in range(len(self.modulos)):
                idx = digits[i] + offset
                sdr[idx] = 1
                offset += self.modulos[i]

            return sdr

        def tabulate(self):
            header_format = '{:8s}' + '{:4d}' * len(self.modulos)
            trow_format = '{:8d}' + '{:4d}' * len(self.modulos)
            print(header_format.format('n', *self.modulos))
            print('=' * (8 * (len(self.modulos) + 1)))
            for i in range(self.least_common_multiple):
                print(trow_format.format(i, *self.encode(i)))


    class RN:
        def __init__(self, rns, digits):
            self.digits = self.normalize(rns.modulos, digits)
            self.rns = rns

        @staticmethod
        def normalize(modulos, digits):
            num_modulos = len(modulos)
            num_digits = len(digits)
            digit_array = np.zeros(num_modulos, dtype=np.int8)
            digit_array[num_modulos - num_digits:] = digits
            for i in range(num_modulos):
                digit_array[i] = digit_array[i] % modulos[i]

            return digit_array

        def __str__(self):
            return '{} % {}'.format(str(tuple(self.digits)), str(tuple(self.rns.modulos)))

        def __repr__(self):
            return 'RN(ResidueNumberSystem({}), {})'.format(str(tuple(self.rns.modulos)), str(tuple(self.digits)))

        def __neg__(self):
            return RN(self.rns, -1 * self.digits)

        def __pos__(self):
            return self

        def __abs__(self):
            return self

        def __invert__(self):
            x = np.array(self.rns.modulos, dtype=np.int8) - self.digits
            return RN(self.rns, x)

        def __int__(self):
            n = self.rns.decode(self.digits)
            return n

        def __add__(self, other):
            if isinstance(other, int):
                other = np.full(len(self.digits), other)
            elif isinstance(other, (list, tuple)):
                other = np.array(other, dtype=np.uint8)
            elif isinstance(other, self.__class__):
                other = other.digits

            x = self.digits + other
            for i in range(len(x)):
                x[i] = x[i] % self.rns.modulos[i]

            return RN(self.rns, x)

        def __radd__(self, other):
            return self + other

        def __sub__(self, other):
            if isinstance(other, int):
                other = np.full(len(self.digits), other)
            elif isinstance(other, (list, tuple)):
                other = np.array(other, dtype=np.uint8)
            elif isinstance(other, self.__class__):
                other = other.digits

            x = self.digits - other
            for i in range(len(x)):
                x[i] = x[i] % self.rns.modulos[i]

            return RN(self.rns, x)

        def __rsub__(self, other):
            return -self + other

        def __mul__(self, other):
            if isinstance(other, (list, tuple)):
                other = np.array(other, dtype=np.uint8)
            elif isinstance(other, self.__class__):
                other = other.digits

            x = self.digits * other
            for i in range(len(x)):
                x[i] = x[i] % self.rns.modulos[i]

            return RN(self.rns, x)

        def __rmul__(self, other):
            return self * other

        def __pow__(self, n):
            # Exponentiate each digit modulo its modulus; pow with a modulus
            # avoids overflowing the small fixed-width digit array.
            x = [pow(int(d), n, m) for d, m in zip(self.digits, self.rns.modulos)]
            return RN(self.rns, x)

        def __eq__(self, other):
            return np.all(self.digits == other.digits)

        def __ne__(self, other):
            return np.any(self.digits != other.digits)

        def __iadd__(self, other):
            return self + other

        def __isub__(self, other):
            return self - other

        def __imul__(self, other):
            return self * other

        # In-place division (__itruediv__) is omitted here: division in a residue
        # number system requires modular inverses, which this sketch does not implement.

        def __ipow__(self, other):
            return self ** other


    if __name__ == '__main__':
        rns = ResidueNumberSystem(2, 3, 5)
        n = 10
        digits = rns.encode(n)
        sdr = rns.to_sdr(digits)
        print(n, digits, sdr)
        n = 11
        digits = rns.encode(n)
        sdr = rns.to_sdr(digits)
        print(n, digits, sdr)
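
To make the overlap problem I mentioned above concrete, here is a small usage sketch built on the `ResidueNumberSystem` class above (same moduli as the demo); consecutive integers end up sharing few or no active bits:

    # Sketch: count shared active bits between SDRs of consecutive integers.
    rns = ResidueNumberSystem(2, 3, 5)
    for a, b in [(10, 11), (11, 12)]:
        sdr_a = rns.to_sdr(rns.encode(a))
        sdr_b = rns.to_sdr(rns.encode(b))
        print(a, b, 'overlap =', int(np.sum(sdr_a & sdr_b)))  # 0 in both cases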
7 Likes

Thanks for the link to that talk. It was very informative. She did a good job of explaining a few things about grid cells that have been bugging me for the past few weeks. If what she says can be generalized, there is a whole field of numerical mathematics that could stand to benefit from these insights. I will definitely have to give it some serious thought.

You are correct about my encoding scheme. There is not much in the way of intrinsic semantic meaning in the bits other than perhaps the shared modulo values (i.e. whether a number is odd or even, divisible by 3, 5, etc.). However, there may be some other potential advantages, two of which I discovered while watching the talk you linked to: 1) having a robust estimate of what time it is encoded in an SDR (see ~41:30 into the video), and 2) being able to do simple arithmetic directly with SDRs (see the carry-free arithmetic example, ~45:30 into the video).
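
For what it's worth, here is a minimal Python sketch of point 2 (assumed moduli 2, 3, 5; not my JavaScript encoder above): addition is done digit-wise, with each residue wrapping around its own modulus and no carries propagating between positions.

    # Minimal sketch of carry-free addition on residue digits (assumed moduli 2, 3, 5).
    moduli = [2, 3, 5]

    def encode(n):
        return [n % m for m in moduli]

    a, b = 7, 9
    # Add digit-wise; each position wraps around its own modulus independently.
    summed = [(x + y) % m for x, y, m in zip(encode(a), encode(b), moduli)]
    print(summed == encode(a + b))  # True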

Well, some of my questions have now been answered, but like all good science, I am left with many more questions that I must now try to wrestle with. I will let you know if I come up with anything useful or interesting.

3 Likes

So, I've been trying to develop an intuition for the behavior of grid cells. My thoughts have clarified a bit since watching the video shared by @bdsaglam. In the talk, the presenter described the grid-like structure in the cells as arising from local inhibition fields around the currently active neurons in the cortical layer. The resulting field of overlapping inhibition spheres naturally gives rise to a triangular lattice pattern of cell activations. When the sensor moves, the pattern of active cells in the layer all shift together in roughly the same direction. I imagine this shift occurring in a manner similar to how a flock of birds or a school of fish moves en masse in response to subtle variations in the movement of its constituent parts. The tell-tale repetitive grid pattern observed in lab rats is a result of these cells periodically firing as the shifting pattern realigns with the original pattern, which occurs at regularly spaced intervals as the sensor is shifted (or rotated). Different grid cell modules will have different responses (in phase, period, and orientation), which then gives us multiple populations of cells that can be correlated to obtain unique location/orientation representations.

So, this is where I am at the moment. I'm currently trying to work up a visualization of this inhibition-generated pattern, but I'm also very interested in the necessary input requirements for shifting the grid cell representation. The presenter in the video seemed to think that the network was something like a self-organizing map (EDIT: actually it's a Hopfield network.) (see around here in the video), but instead of having a finite set of stable fixed-point attractor states, there could exist continuous manifolds of stable states with similar attraction strengths (a Lyapunov function with a flat valley in K dimensions). One could potentially move along these states like walking along the floor of a canyon rather than having to climb up out of a local minimum valley before descending into another. These manifolds would basically represent all of the known transitions from one state to another. (e.g. Teleporting from one location to another is currently not possible, so we don't have a convenient way to represent how such a transition would be able to properly update the internal representation.)

4 Likes

I found an awesome-looking new (April 2018) research paper, “Examining the contribution of grid cells to place cell formation within the context of place cell heterogeneity.”

I’m not sure how this influences your model, but it very much relates to this video:

@rhyolight's video was my primary inspiration for the scalar encoder that I described in the original post. My thinking on grid cells has moved on since then, but his visualizations contributed to the genesis of the idea.

I’ve not yet gotten to the point where I’m thinking about the relationship of grid cells to place cells. I’ll take a look at the linked paper later to see if it sparks any additional insights.

As a side note, the reason I chose prime numbers for the basis periods was because I knew there would not be any representation collisions in the range that I gave. I'm fairly certain that the effective range (without collision) is the least common multiple of the basis periods. For example:

periods: 2,4
numBits: 6
numReps: 4
sparsity: 2/6 ~ 0.33

periods: 2,3,4
numBits: 9
numReps: 12
sparsity: 3/9 ~ 0.33
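
A quick sanity check of that claim (Python sketch): the LCM of each set of periods matches the numReps values above.

    # Sketch: the collision-free range equals the LCM of the basis periods.
    from functools import reduce
    from math import gcd

    lcm = lambda a, b: a * b // gcd(a, b)
    print(reduce(lcm, [2, 4]))     # 4
    print(reduce(lcm, [2, 3, 4]))  # 12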

It turns out that these types of numbering systems have been well studied in the context of electrical and computer engineering. They are referred to as Residue Number Systems, which were formulated as an alternative to binary radix representations to address some of the issues related to the accuracy of floating-point operations with a finite number of bits. (See here for some reference material.) Now, in their scheme they are not using a fixed-sparsity representation, but rather binary encodings of the modulo digits. This allows them to use fewer bits for the final representation. For the purposes of HTM, however, any operations on the SDR representation (whether mathematical or just transitional) would need to result from a learned set of weights that bias the inputs to the grid cell representation in such a way as to cause it to shift in a specific direction and by the appropriate amount (modulo each of the basis periods).
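
To make that last point concrete, here is a minimal sketch (assumed periods 2, 3, 5; plain arithmetic, not a learned mechanism) of the kind of shift such weights would have to implement: moving the representation by +delta is just a circular shift of each period's segment by delta modulo that period, which is the same as encoding n + delta directly.

    # Sketch (assumed periods 2, 3, 5): shifting the representation by +delta
    # rotates each period's segment by delta % p, equivalent to encoding n + delta.
    import numpy as np

    periods = [2, 3, 5]

    def encode(n):
        sdr = np.zeros(sum(periods), dtype=np.uint8)
        offset = 0
        for p in periods:
            sdr[offset + n % p] = 1
            offset += p
        return sdr

    def shift(sdr, delta):
        out, offset = np.zeros_like(sdr), 0
        for p in periods:
            out[offset:offset + p] = np.roll(sdr[offset:offset + p], delta % p)
            offset += p
        return out

    print(np.array_equal(shift(encode(7), 4), encode(11)))  # True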

2 Likes

Eric, after reading your reply I thought of this paper. I’m not sure whether it makes sense in regards to how grid cells encode a unique address (for addressing place cell memory described in the other paper) but it seemed worth mentioning:

I found something else suggesting a scale that almost doubles:

The lengths of grids recorded from different dorsoventral locations in each rat show a tendency to cluster. In each rat the ratio of the shortest and second-shortest cluster is a fixed non-integer ratio approximately equal to 1.7

https://www.researchgate.net/profile/Caswell_Barry/publication/6345505_Experience-dependent_rescaling_of_entorhinal_grids/links/0912f50c613ade80bb000000.pdf

Since a scale change over 2.0 would cause severe ambiguities, it makes sense to stay a little bit below that amount.

I also have to wonder whether (as with an electronic voltmeter reading a high-resistance test point) the measuring device can load the circuit in a way that causes a less than perfect measurement, possibly changing the frequency of the circuit. I recall the earliest estimates being approximately 1.4, and now that probes have become more accurate the number is greater.

1 Like

Starting from the observation that raising animals in different environments shapes the early visual system (Hubel and Wiesel's work with kittens in impoverished environments), it is possible that the difference is due to some factor in the researchers' environment.

1 Like

Excellent thought, Mark! I did not think of that possibility. It makes sense that if an animal is born with ten or so grid modules, and the largest environment it ever experiences is the lab room it was raised in, then the scale might be smaller than for an animal that needs to map much larger areas using the same number of modules.

Anyhow, I compared prime numbers with the 1.7 estimate and binary:

Module=   1   2    3    4    5     6     7     8     9      10 
-----------------------------------------------------------------
Primes=   2   3    5    7    11    13    17    19    23     29 
*1.7  =   2   3.4  5.8  9.8  16.7  28.4  48.3  82.1  139.5  237.2
*2.0  =   2   4    8    16   32    64    128   256   512    1024

Even at 1.7 the result ended up close to a binary representation. Prime numbers did not work out very well.
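
For reference, the arithmetic behind the table is easy to reproduce (Python sketch); the 1.7 and 2.0 rows are just geometric progressions starting at 2.

    # Sketch reproducing the comparison rows above.
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    ratio_17 = [round(2 * 1.7 ** i, 1) for i in range(10)]
    ratio_20 = [2 * 2 ** i for i in range(10)]
    print('Primes=', primes)
    print('*1.7  =', ratio_17)
    print('*2.0  =', ratio_20)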

1 Like

The following would be true for finding patterns in noisy data like in a busy restaurant with music and many people talking at the same time:

In the earlier mentioned “Heterogeneity in hippocampal place coding” paper, the genetic variation of place cells very much resembles a digital RAM chip; powers of two would then be ideal. As few as 10 bits would be required.

It might be that sparse coding is best for complex sensory inputs, while for memory addressing it’s best to use the fewest bits possible.

2 Likes

Oh, right. I keep forgetting that I don't always need to maintain both sparsity and unique representations of the input. Using a spatial pooler after an encoder, like you said, would probably be better, especially a spatial pooler with multiple such encoders as inputs.

2 Likes

I was thinking of a 10 (grid modules) to 1024 (place cell memory) binary decoder for providing the 83 pA of injection current needed to access the 1024 place cell memory locations. As in a computer RAM with 16, 32, or 64 bits of data per address, there can be more than one place cell per address location. And in addition to the 10 bits (or so) of grid module location encoding, more bits can be used to select other properties, such as what the map is for, so that many maps can be stored in the same memory space.
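
To illustrate what I mean by a binary decoder, here is a hypothetical sketch (treating each of the 10 grid modules as contributing one address bit): n address bits select exactly one of 2^n locations.

    # Hypothetical sketch of an n-to-2^n decoder: 10 binary address bits
    # (one per grid module in this toy picture) select one of 1024 rows.
    import numpy as np

    def decode_address(bits):
        address = 0
        for b in bits:                      # interpret the bits as a binary number
            address = (address << 1) | int(b)
        one_hot = np.zeros(2 ** len(bits), dtype=np.uint8)
        one_hot[address] = 1                # exactly one location is selected
        return one_hot

    row = decode_address([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
    print(int(np.argmax(row)))  # 714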

The following video shows a 3 to 8 decoder being used to address 8x4 electronic ROM cells.

The reason for there being many grid cells per module (instead of only one) may have to do with planning routes through an environment ahead of time, as opposed to physically having to be there to activate the associated place cells. We would then be able to visualize places along a route we were mentally traveling through, or, like the author of a fictional book, places created in our imagination that do not actually exist.

This is my best guess for what might be happening. In the future some or all of it could turn out to be wrong, but this seems to be a possibility worth exploring.

I now need help answering a question related to spatial pooling.

The “Heterogeneity in hippocampal place coding” paper mentions that before exposure to a novel environment cells that go on to form place fields are more prone to “burst” in response to current injection. Is this the same as (or evidence of) HTM minicolumn cell “bursting”?

The subthreshold membrane potential of an active place cell (which represents its net input at the soma) varies in a hill-like fashion as a function of location, while that of a silent cell is essentially flat [13••, 63, 64]. On its own, this finding could support a model in which simple summation of spatially modulated input is followed by a thresholding processes, driving place cells to either fire in a spatially restricted manner or remain silent. In such a scheme, silent cells would receive either spatially homogenous or weak input. Yet, several observations challenge such a model. Each CA1 pyramidal neuron is thought to receive spatial input at thousands of synapses across its dendritic compartments [65]. Simple summation of these inputs would likely result in multiple membrane potential peaks across space rather than the unimodal hill typically observed, and silent cells would still be expected to have membrane potential peaks, albeit smaller ones that did not reach threshold [64]. Additionally, differences in the intrinsic biophysical properties of active and silent cells appear to predict the initial establishment of a place field. Before exposure to a novel environment, cells that go on to form place fields are more prone to burst in response to current injection and during exploration the firing thresholds of active cells are significantly lower than those of silent cells [63]. Perhaps the strongest argument against the simple summation hypothesis is that injection of uniform positive current can instantly convert silent cells into active place cells that are indistinguishable from typical place cells in their subthreshold membrane potential profiles [64] (Figure 3).

The second part, shown in bold text, is meant to help explain what I found to be similar to how digital RAM chips enable a given address location to store a memory. This seems to be something else in addition to spatial pooling, but of course my best guess could be incorrect, and I welcome any additional information to help make sense of this part of the system too.

I can only reply based on layer 5.

I've never seen bursting used in journal articles to mean minicolumn bursting. Usually, a cell requires stronger somatic injection to burst, or somatic injection alongside distal apical injection. Since a temporal memory minicolumn bursts when none of its cells receive sufficient predictive input (on the distal basal dendrite), a minicolumn bursts when its cells are less stimulated, which is the opposite of the extra stimulation needed for rapid-spiking bursting.
It probably depends on methods like anesthetic a lot, but some studies found L5 cells that burst repetitively with weaker somatic current injection; with stronger injection, they burst once and then switch to regular spiking. That's a potential way to argue that bursting cells are receiving less input, not more input, but it seems like a stretch to connect that to minicolumn bursting.

Another thing to consider is whether or not the hippocampus has minicolumns. If it doesn't, that doesn't mean it doesn't function the same way, with a predicted cell inhibiting some other cells, but it would be another thing to find evidence for.

One interesting thing is that bursting might be involved in synaptic plasticity, consistent with bursty cells learning to respond to the new locations.

I can link some articles about bursting in L5 if you want. Bursting might work similarly in all pyramidal cells, including in the hippocampus, although I don't know that for a fact.

Maybe those cells just have lower thresholds in general, whether because some cells in CA1 have lower thresholds than others or because some cell classes have lower thresholds than others. Do they specify the pyramidal cell class or layer? Indistinguishable might just mean below the statistical significance cutoff. I worry that which cells go on to form place fields isn't arbitrary (as would be required for selecting cells for memory formation or place field formation), because they say the cells that go on to form place fields were more likely to burst beforehand. Depending on how long beforehand, it might be unlikely that those cells were already chosen to form place fields, and more likely that they're just a different group with a different role or are simply more responsive because of random variations in neuron thresholds or burstiness.

One way around that is, maybe those bursty cells are the ones which haven't been assigned a place field yet, so they are more responsive for a functional reason. (Not that some cells being more responsive than others isn't useful. A distribution of SDR sparsities might allow better flexibility: initially learning things more coarsely with the less selective cells and, over time, learning things in more detail with the more selective cells.)

1 Like

It seems that way for CA1, but no layer is specified.

It probably depends on methods like anesthetic a lot,

I agree. This is from the paper that was referenced:

https://www.sciencedirect.com/science/article/pii/S0896627311001966

Pre-exploration Bursting Quantification

Immediately upon breaking into the neuron and achieving the whole-cell recording configuration, while the animal was anesthetized, we injected a series of depolarizing current steps. For each step, the current started at 0 nA, lasted for 300 ms, then returned to zero. The first depolarizing step was 0.1 or 0.2 nA and was increased in increments of 0.1 or 0.2 nA, respectively, for successive steps. The firing pattern of the first step that evoked ≥5 APs was used to determine the propensity to burst and is shown for each cell in Figure 5. The degree of bursting was defined as the fraction of all APs in the firing pattern that occurred in bursts of ≥2 APs with ISIs ≤10 ms. An exception was made for cell 1 which (1) fired some spontaneous APs, thus for that cell the first step value that evoked a consistent firing pattern was used, and (2) displayed a CS-like burst at the beginning of each step of 0.2 nA or more, thus the APs in that burst were counted as bursts even though the ISIs were > 10 ms.

but some studies found L5 cells that burst repetitively with weaker somatic current injection; with stronger injection, they burst once and then switch to regular spiking. That's a potential way to argue that bursting cells are receiving less input, not more input, but it seems like a stretch to connect that to minicolumn bursting.

My first thought was that the CA1 place cells burst at a lower current level due to not yet having stored a memory, or in other words having received less input over time. While exploring the novel environment, the cell then received enough of the right input to store one. In this case, being more prone to burst is very similar to what HTM theory describes.

Considering how new neurons seem to be added to CA1 as needed, it seems possible that the new place cells connect back to a (entorhinal cortex only) grid cell encoded address bus, as well as to signals that define shape, color, and other factors that make places unique. This way CA1 does not have to start off with billions or trillions of place cells, most of which would never be addressed in an animal's lifetime. One way to sense both the 1 (action potential) and 0 (no action potential) states is through the type of synapse used to form a permanent connection, and/or through the associated basket cell inhibiting the neuron when an out-of-place AP is sensed on connections that should all be 0 (no APs), or else it's the wrong address.

Does that make sense to you too?

There’s actually just one layer of pyramidal cells:

But it has sublayers:

(Kenji Mizuseki, Kamran Diba, Eva Pastalkova, and György Buzsáki, 2011)

It says that deep CA1 pyramidal cells fire more rapidly, burst more frequently, and more often have place fields than the superficial pyramidal cells.

It seems likely that the cells which were going to become the place cells for a novel environment burst more easily because they were deeper cells, where place cells are more common.

I didn’t read the entirety of any of the articles in this discussion so it might be worth making sure there isn’t a way around that.

Even if the article you’re drawing on doesn’t work for this, that still makes a lot of sense. Homeostatic mechanisms would probably lead to something like that effect. Not necessarily bursting more, but at least reinforcing synapses more, or having a lower threshold, or something like that. Since firing in the hippocampus is very sparse, cells without a place field would basically never fire, so homeostatic mechanisms might have a huge effect (unless they shouldn’t). That depends on how place fields are formed in the first place, though. They might be pre-defined based on random inputs from grid cells.

I’m not sure if neurogenesis happens in CA1 except with injury. I haven’t researched that, though. There’s a lot more neurogenesis in the dentate gyrus, at least, and more with exercise (maybe for more place fields with more exploration). I might be remembering wrong.

That mostly makes sense, and RAM is a good comparison, but it can't work exactly like RAM. A single bit can't make a difference, since there is noise, and multiple place cells respond because place fields overlap.
I'm not sure about this, but I think basket cells target cells densely (more than the roughly 50% that exact binary addressing would need). Lateral competition like in the spatial pooler could work: if there are some 1s which should be 0s for a cell to fire, other cells will outcompete it.
Although, there's maybe a ton of redundancy (thousands of grid cells rather than tens), so thinking of it as an exact binary number where every bit matters might be right.

1 Like

A direct correlation has been documented both in animals that hide their food in multiple hard-to-relocate locations and in the spatial mapping region of London taxi drivers:

http://www.pnas.org/content/97/8/4398

From what I have found in research papers over the years, this neurogenesis is something that must be accounted for in neuroscientific models.

I now think the difference is due to a biological RAM (it would still qualify as a RAM) having the advantage of not being burned out, as a digital RAM would be, by activation of more than one memory location at a time. Also, the biological circuit would need at least a small amount of pyramidal cell activation, as opposed to a digital circuit where the first address is all zeros, totally inactive.

Digital RAM requires exact powers of 2, while this biological circuit would require a little less. The range of 1.4 to 1.7 should work very well. When at a given place, it seems we are at the same time aware of the larger place it's in, such as the city or country, and of nearby places it's associated with. The way grids move when the head angle changes makes sense too.

Being as selective as a digital RAM would be a disadvantage for us. It would be like only being able to draw maps where only one place can be shown; that's not even a map. Normally boundary/border/barrier cells and place cells only become active when an animal is in the vicinity, which is something I have not yet accounted for in the model I'm working on: it currently maps everything in its environment, which works fine for a small arena, but very large areas would become unnecessarily overwhelming.

I can only find supporting evidence and feel close to another “Eureka!!” moment. In either case I have to thank Eric Collins for starting this very exciting thread.

2 Likes

Maybe it would be good to randomize the bits, i.e. randomly remap/permute the bits.
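
For example, a minimal sketch with a fixed random permutation applied after an encoder (the num_bits value and seed are arbitrary):

    # Sketch: remap the encoder's output through a fixed random permutation so
    # active bits are scattered instead of grouped by modulus.
    import numpy as np

    num_bits = 10                       # e.g. sum of periods 2 + 3 + 5
    rng = np.random.default_rng(42)     # fixed seed -> the same remapping every run
    permutation = rng.permutation(num_bits)

    def remap(sdr):
        return sdr[permutation]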

1 Like