CycleEncoder and VarCycleEncoder

Here's a short description of two experimental scalar encoders I'm currently playing with.

Their purpose is to produce an intermediate dense scalar vector of size N instead of an SDR (which is a bit vector of size N).
The reasons for that are:

  • Any dense representation can be easily converted to an SDR by selecting the top K values.
  • Two or more dense vectors can be added before thresholding the sum, so it is a convenient way of compounding several scalars into one SDR that represents all the corresponding values together.
  • A scalar value can be attached to an SDR or other dense vector by simple addition, much as position embeddings are added to token vectors in transformers. This allows compounding "what" with "where" in a single vector, and "where" can be multidimensional.
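A minimal sketch of the compounding idea from the list above, assuming two hypothetical dense encodings (`what`, `where`) that are just random vectors here, not outputs of the actual encoders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical dense encodings of size N (stand-ins for encoder outputs)
N, K = 16, 4
what = rng.random(N)    # e.g. encodes "what"
where = rng.random(N)   # e.g. encodes "where" (added like a position embedding)

# Compound by addition, then "sdr-ify" by keeping the indices of the top K values
combined = what + where
sdr = np.argsort(combined)[-K:]
```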

Now, how they work. The simplest is CycleEncoder; VarCycleEncoder is similar but with variable cycles.

CycleEncoder:

Imagine N analog clocks (N is the size of the vector), each with a single arm starting (= zero point) at a different random position. The period of all clocks is an instantiation parameter of the encoder; e.g. "real" clocks have a 12-hour period.

"Time" (the variable scalar value) is encoded by rotating all clocks in sync for the specified time. In practice the scalar value is added to each of the N entries of the random startup (zero) vector and the result is taken modulo the period.

The SDR is obtained by selecting the K clocks whose arm is closest to 12 o'clock.

One property of this encoder is that the encoding repeats every period (it is cyclic with a fixed period).
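The clock mechanics above can be sketched in a few lines (sizes, period, and the `zero` vector are illustrative assumptions, not the actual encoder code):

```python
import numpy as np

N, K, period = 8, 2, 12.0
rng = np.random.default_rng(0)
zero = rng.random(N)  # random startup positions, measured in turns (0..1)

def dense(x):
    # advance every clock by the scalar x, wrapping around the period
    return (zero + x / period) % 1.0

def sdr(x):
    # indices of the K clocks with the highest phase value,
    # i.e. just before wrapping: "closest to 12 o'clock"
    return np.argsort(dense(x))[-K:]
```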

VarCycleEncoder is similar to the above, except that, besides a different (random) startup position, each clock also has a random cycle period, uniformly spread within a configurable range.

This way the periodicity of the whole VarCycleEncoder becomes astronomically large.

4 Likes

So you are generating a bias term of the form:

B(t; ω, φ) = β cos(ωt + φ)

Where the angular frequency ω and phase φ can be global or local. β is the amplitude, of course (for boosting). This is very similar to a theta/gamma frequency biasing term that I have considered in the past.

Might I also suggest that you allow the phase term phi to be a tunable parameter. That would allow certain groups of neurons to sync up if they typically fire together in response to specific inputs.

The phase could also be tuned to encode temporal sequences by shifting which nodes are active at which phases. The ordering of phase values would correspond to the ordering of patterns, similar to what’s seen in place cells in the hippocampus.

2 Likes

I think the clock analogy is a bit confusing; it suggests there is some time involved here. It works for any scalar value.

And it doesn't use cosines, just modulo 1, to output values between 0 and 1.

Here's a code snippet:

import math
import numpy as np
import numba

# multi-cycle encoder. Same as CycleEncoder, but its period varies linearly
# between a min and a max value
def VarCycleEncoder(sdr_size, period = 1, sdr_len = 0, seed = None):
    me = Self()
    if sdr_len == 0:
        # Arbitrary initial sdr length. 
        sdr_len = sdr_size // 4

    periods = np.linspace(period, period * math.e, sdr_size)

    r = np.linspace(0,1, num = sdr_size, endpoint = False, dtype = np.float32)
    r += 0.5 / sdr_size

    if seed is not None:
        np.random.seed(seed)
    np.random.shuffle(r)
    np.random.shuffle(periods)

    @numba.njit
    def dense(x):
        return (r + x / periods) % 1

    @numba.njit
    def sdr(x, slen = sdr_len):
        # dense2sdr(v, k) returns the indices of the top k values in v.
        # That's the basic "sdr-ification" of the dense vector v.
        return dense2sdr(dense(x), slen)

    me.dense, me.sdr, me.periods, me.r = dense, sdr, periods, r
    return me

# Self is an empty class: 
class Self: pass

# dense2sdr(vector, k) just returns the indices of the highest k values in vector
1 Like

The simpler CycleEncoder uses the same period (different phases) for all “clocks”:

def CycleEncoder(sdr_size, period = 1.0, sdr_len = 0, seed = None):
    # sdrs repeat each period. 
    me = Self()
    if sdr_len == 0:
        sdr_len = sdr_size // 4
    r = np.linspace(0, 1, num = sdr_size, endpoint = False, dtype = np.float32)
    r += 0.5 / sdr_size

    if seed is not None:
        np.random.seed(seed) # different seeds => different encodings

    @numba.njit
    def dense(x):
        return (r + x / period) % 1

    @numba.njit
    def sdr(x, slen = sdr_len):
        return dense2sdr(dense(x), slen)

    me.dense, me.sdr = dense, sdr
    return me

PS: (r + x/p) % 1 is a cheaper way to get a "pulsating" value with a given phase (r) and period (p) than using trigonometric functions.

1 Like

That's not really pulsating/oscillating in the classical sense, more like wrapping around. But I think I see where you are going. Assuming that I'm properly understanding what that abused modulo operator is doing over the floats (over the integers, % 1 is always 0 in C-based languages), it seems like you are incrementing a floating point value by some small amount and then throwing away the integer portion. I suppose I could get the same behavior over the integers using something like this:

// result is an integer in [0,period-1]
[r,period](auto x) { return (r+x)%period; }

And if you really needed it to be a float, then it would just be:

// result is a float in [0,1]
[r,period](auto x) { return float((r+x)%period)/float(period-1); }

By closest to 12 o’clock, I’m assuming that you are looking for the K patterns with bits closest to the max or min bin? Depending on the size of K, it seems like that search might actually be the most expensive part of this calculation.

2 Likes

Yeah, "abused" is a good description 🙂 I was so amazed when I accidentally found that Python does this that I had to find a way to use it!
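For reference, Python's `%` is defined on floats and keeps the result in [0, divisor) with the sign of the divisor, unlike C's integer `%` or `fmod`:

```python
# Python's float modulo wraps into [0, 1) when the divisor is 1
print(3.75 % 1)    # 0.75
print(-0.25 % 1)   # 0.75  (wraps upward, unlike C's fmod which gives -0.25)
print(2.0 % 1)     # 0.0
```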

I guess you got the idea. "Wrapping" is a good term too; my English vocabulary is not the best.

And yes again to finding the top K elements in a list. What I do now is simply numpy's argsort. I'll stick with it for a while, since lists of fixed-length SDRs can be represented as rectangular 2D arrays in numpy, and the numba LLVM compiler does its performance tricks best on rectangular/cubic/etc. numpy arrays.

And the SDRs I plan to work with currently are relatively short.
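A side note on the top-K search: if it ever becomes the bottleneck, numpy's `argpartition` selects the same top-K set as a full `argsort` in roughly linear time (generic numpy advice, not part of the posted code):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.random(1000)
k = 20

top_sorted = set(np.argsort(v)[-k:])         # O(N log N), fully ordered
top_part = set(np.argpartition(v, -k)[-k:])  # ~O(N), unordered within the top k
```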

1 Like

@cezar_t It seems that you are trying to get this type of effect, but you select the top-k scalar values instead of having discrete values like I do. Your approach guarantees a constant k winners, but my approach has no such guarantee. In my bottom plot, this is reflected by the green plotted line of "total weight".

I labeled each periodic cell's parameters on the right side of the top plot.

I am doing two things that you are suggesting:

  1. vary the period, here i’m doing it linearly
  2. vary the offset or “phase”, here i’m doing it randomly

I'm making my bin_size a fraction of the total period, which should approximate the "closeness" to 12 o'clock. The size of the bin defines what we consider "close to 12".

2 Likes

While we are at it, here's the similarity plot for the VarCycleEncoder:
[similarity plot image]

from matplotlib import pyplot as plt
import numpy as np
import math
np.random.seed(0)

# multi-cycle encoder. Same as CycleEncoder, but its period varies linearly between a min and a max value

def dense2sdr(x, slen):
    return set(np.argsort(x)[-slen:])

def VarCycleEncoder(sdr_size, period=1, sdr_len=0, seed=None):
    me = Self()
    if sdr_len == 0:
        # Arbitrary initial sdr length.
        sdr_len = sdr_size // 4

    periods = np.linspace(period, period * math.e, sdr_size)

    r = np.linspace(0, 1, num=sdr_size, endpoint=False, dtype=np.float32)
    r += 0.5 / sdr_size

    if seed is not None:
        np.random.seed(seed)
    np.random.shuffle(r)
    np.random.shuffle(periods)

    def dense(x):
        return (r + x / periods) % 1

    def sdr(x, slen=sdr_len):
        return dense2sdr(dense(x), slen)

    me.dense, me.sdr, me.periods, me.r = dense, sdr, periods, r
    return me

class Self:
    pass


encoder = VarCycleEncoder(100)

representations = []
X = 100
N = 10
for i in range(X * N):
    representations.append(encoder.sdr(i / X))

similarities = np.zeros((X * N, X * N), dtype=np.int32)

for i, a in enumerate(representations):
    for j, b in enumerate(representations):
        similarities[i, j] = len(a & b)

plt.imshow(similarities)
plt.show()

2 Likes

From what you describe, your encoder follows a similar concept.

I also found that a limited range of periods (the interval 1 to math.e was just a hunch), with as many intermediate steps as the SDR size, works pretty well. On the modulo-97 problem it works as is (without needing to adjust the period).

I found the intermediate vector of scalars (instead of 0/1) useful, since it allows compounding a few scalars by adding all their vectors and thresholding the sum. This works surprisingly well on the CartPole problem, where the observation space is made up of four float values, each with its own period and different random phasing.
The CycleEncoder in that source accepts several scalars instead of the single one featured above, and works internally with integers instead of floats.

1 Like

That looks cool, thanks. I’ll have to run it myself.

PS: You can modify the size of that yellow (overlap) band by changing the period:

encoder = VarCycleEncoder(100, period = 10)
1 Like

This is from a spreadsheet hack approximating what I think you mean. Size = 10, top = 1.

Phasing effects may create some unintended outcomes in the top-n selection when varying amplitudes are involved and all are time-locked/synced.

Some inputs are never in the top n, which creates a low-n pattern.

High pattern output from top 1

If you reverse the perception of what is in sync, the model becomes a type of linear wave effect rather than static cyclical.

2 Likes

The CycleEncoder is much smoother; interestingly, though, the modulo-97 problem "dislikes" it.
[Figure_37: similarity plot]
The above is an SDR of size 100 with period 37.
And below, one with SDR size 300, same period:
[Figure_sdr300: similarity plot]

Thanks again @JarvisGoBrr, this visualizer is great!

1 Like

An interesting peculiarity here: if you pick a cycle period of 256 (a byte) instead of 1.0 (a float), you can scale values into the [0, 255] interval instead of (0...1).

Which would:

  • get rid of the relatively expensive modulo division, since the inherent integer overflow does exactly that
  • replace it with integer addition, which in itself is algorithmically cheaper/faster than division
  • reduce memory usage for the dense representation

This opens up potential for hardware acceleration on specialized low-power devices.
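A quick sketch of the wrap-on-overflow idea with numpy's `uint8` arithmetic (illustrative values only): the addition wraps modulo 256 with no explicit `%`:

```python
import numpy as np

r = np.array([10, 200, 250], dtype=np.uint8)  # startup phases scaled to [0, 255]
x = np.uint8(100)                             # scalar value scaled to [0, 255]

# uint8 addition overflows and wraps modulo 256 automatically
dense = r + x
print(dense)  # [110  44  94]
```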

1 Like

I think this is more or less what is going on in DL/LLM when they talk about quantization. They are effectively mapping floating point weights and activations onto 16-bit, or 8-bit integers to reduce the storage / bandwidth cost and try to push more mat-muls & adds through the system with each clock cycle. You can do a search for intrinsic instruction sets on each of the respective CPU and GPU platforms to see how they are trying to support these types of operations at the lowest levels.

Update: Some links for your reference.

2 Likes

I'm kinda curious to see how my Grid Cell Inspired Encoder would look in your visualizations. Since it explicitly encodes modulo remainders with respect to a set of chosen prime values, there should not be any repeating periods over which the encoded bits would align and skew the overlap distribution.
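A minimal sketch of that modulo-remainder property, using {2, 3, 5, 7, 11} as an assumed example set of primes: by the Chinese remainder theorem, the remainder tuples stay distinct for all integers up to the product of the primes:

```python
primes = [2, 3, 5, 7, 11]

def residues(x):
    # remainder of x with respect to each prime
    return tuple(x % p for p in primes)

# the tuples only start repeating after the product of the primes
product = 2 * 3 * 5 * 7 * 11  # 2310
```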

1 Like

There are several visualizations here and in related topics.

This is the only one where I used it: CycleEncoder and VarCycleEncoder - #8 by JarvisGoBrr, which is quite simple if you look into the code (most of it is the encoder itself).

1 Like

@CollinsEM If I understand your pseudocode correctly, this is your encoder scaled to the unit interval. This is with the primes: {2, 3, 5, 7, 11}.

2 Likes