The coding of longer sequences in HTM SDRs

Well, I think I’m starting to get where you are going. That sounds hard to do in an online learning system, where the network dynamics are more your enemy than your friend. Either we learn to tame those dynamics or tackling this will be very hard.

On the other hand, it sounds way easier to do in an offline mode, so I think a good start may be to not use LIF neurons directly, but first make a simplified model that actually works and then attempt converting it into a LIF model.

(I’m assuming you are using LIF neurons here, I may be wrong)

3 Likes

My experience is that it’s usually much harder to visualize synapses than activations. There are too many of them and they are too random.

Anyway, I spent this afternoon working on a clustering algorithm that analyzes the synapses and sorts the neurons into a square grid, clustered by similarity.
That way we can watch an activity bump moving around.
I can’t tell if it’s useful, but it looks cool anyway.

[Animated GIF: activity bump moving around the clustered neuron grid]

Here’s the algorithm:


class ClusterPreview2D:
    def __init__(self, matrix, iterations=500, neighbors=10, recursive_neighbors=3):
        w = self.w = matrix.weights
        self.size = round(w.shape[1] ** 0.5 + 0.5)
        self.grid = {}
        self.grid_locations = []
        self.neighbors = [[] for _ in range(w.shape[1])]
        self.remap_indexes = list(range(w.shape[1]))

        self.preview_image = np.zeros((self.size, self.size), dtype=np.float32)
        self.preview_image_flat = self.preview_image.reshape(self.size ** 2)
        
        self.im = plt.imshow(self.preview_image, vmin=0, vmax=1)
        plt.pause(0.001)

        i = 0
        for i in range(w.shape[1]):
            x = i % self.size
            y = i // self.size
            self.grid_locations.append((x, y))
            self.grid[(x, y)] = i

        for i in range(w.shape[1]):
            col = np.argsort(w[:, i])
            if neighbors:
                dot = w[col[-neighbors:], :].sum(axis=0)
                self.neighbors[i] = [x for x in np.argsort(dot)[-neighbors:] if not i == x]
            
            if recursive_neighbors:
                self.neighbors[i].extend(col[-recursive_neighbors:])
            
        for _ in range(iterations):
            self.iteration()
        
        self.update_remap_indexes()
    

    def iteration(self):
        for i in range(self.w.shape[1]):
            
            neighbors = self.neighbors[i]
            n = len(neighbors)
            location = self.grid_locations[i]

            avg_x = 0
            avg_y = 0

            for buddy_i in neighbors:
                bx, by = self.grid_locations[buddy_i]

                avg_x += bx
                avg_y += by
            
            target_location = (round((avg_x / n + location[0] * 2) / 3),
                               round((avg_y / n + location[1] * 2) / 3))
            
            if target_location not in self.grid:
                continue

            self._swap(location, target_location)
    
    def update_remap_indexes(self):
        for i, (x, y) in enumerate(self.grid_locations):
            self.remap_indexes[i] = x + y * self.size

    def _swap(self, p1, p2):
        i1 = self.grid[p1]
        i2 = self.grid[p2]

        self.grid[p1] = i2
        self.grid[p2] = i1

        self.grid_locations[i1] = p2
        self.grid_locations[i2] = p1
    
    def preview_activity(self, activity):
        self.preview_image_flat[:] = 0
        self.preview_image_flat[self.remap_indexes] = activity.values
        peak = self.preview_image_flat.max()
        if peak > 0:  # avoid dividing by zero when nothing is active
            self.preview_image_flat /= peak
        self.im.set_array(self.preview_image)
        plt.pause(0.01)

full code:



import numpy as np
import time
import matplotlib.pyplot as plt

np.random.seed(0)


class Synapses:
    def __init__(self, inputs, outputs, initial_sparsity=0.1):
        self.weights = (np.random.sample((inputs, outputs)) < initial_sparsity).astype(np.int8)

    def project(self, inputs, outputs, backwards=False):
        if backwards:
            inputs.values += self.weights[:, outputs.winners].sum(axis=1)
        else:
            outputs.values += self.weights[inputs.winners, :].sum(axis=0)

    def hebbian_update(self, inputs, outputs, factor=1):
        self.weights[inputs.winners[:, np.newaxis], outputs.winners] += factor


class Activation:
    def __init__(self, size):
        self.values = np.zeros(size, dtype=np.float32)
        self.boosts = np.zeros(size, dtype=np.float32)
        self.winners = np.zeros(0, dtype=np.int64)

    def one_hot(self, x):
        self.winners = np.array([x], dtype=np.int32)

    def kwta(self, k):
        self.winners = np.argsort(self.values + self.boosts)[-k:]

    def noise(self, f):
        self.values += np.random.sample(self.values.shape) * f

    def boost_update(self, decrease=1, recover=0.01):
        self.boosts *= recover
        self.boosts[self.winners] -= decrease

    def clear(self):
        self.values[:] = 0
        self.winners = np.zeros(0, dtype=np.int64)

class ClusterPreview2D:
    def __init__(self, matrix, iterations=500, neighbors=10, recursive_neighbors=3):
        w = self.w = matrix.weights
        self.size = round(w.shape[1] ** 0.5 + 0.5)
        self.grid = {}
        self.grid_locations = []
        self.neighbors = [[] for _ in range(w.shape[1])]
        self.remap_indexes = list(range(w.shape[1]))

        self.preview_image = np.zeros((self.size, self.size), dtype=np.float32)
        self.preview_image_flat = self.preview_image.reshape(self.size ** 2)
        
        self.im = plt.imshow(self.preview_image, vmin=0, vmax=1)
        plt.pause(0.001)

        i = 0
        for i in range(w.shape[1]):
            x = i % self.size
            y = i // self.size
            self.grid_locations.append((x, y))
            self.grid[(x, y)] = i

        for i in range(w.shape[1]):
            col = np.argsort(w[:, i])
            if neighbors:
                dot = w[col[-neighbors:], :].sum(axis=0)
                self.neighbors[i] = [x for x in np.argsort(dot)[-neighbors:] if not i == x]
            
            if recursive_neighbors:
                self.neighbors[i].extend(col[-recursive_neighbors:])
            
        for _ in range(iterations):
            self.iteration()
        
        self.update_remap_indexes()
    

    def iteration(self):
        for i in range(self.w.shape[1]):
            
            neighbors = self.neighbors[i]
            n = len(neighbors)
            location = self.grid_locations[i]

            avg_x = 0
            avg_y = 0

            for buddy_i in neighbors:
                bx, by = self.grid_locations[buddy_i]

                avg_x += bx
                avg_y += by
            
            target_location = (round((avg_x / n + location[0] * 2) / 3),
                               round((avg_y / n + location[1] * 2) / 3))
            
            if target_location not in self.grid:
                continue

            self._swap(location, target_location)
    
    def update_remap_indexes(self):
        for i, (x, y) in enumerate(self.grid_locations):
            self.remap_indexes[i] = x + y * self.size

    def _swap(self, p1, p2):
        i1 = self.grid[p1]
        i2 = self.grid[p2]

        self.grid[p1] = i2
        self.grid[p2] = i1

        self.grid_locations[i1] = p2
        self.grid_locations[i2] = p1
    
    def preview_activity(self, activity):
        self.preview_image_flat[:] = 0
        self.preview_image_flat[self.remap_indexes] = activity.values
        peak = self.preview_image_flat.max()
        if peak > 0:  # avoid dividing by zero when nothing is active
            self.preview_image_flat /= peak
        self.im.set_array(self.preview_image)
        plt.pause(0.01)



class SequencePreddictor:
    def __init__(self, n_state, n_input, k):
        self.n_state = n_state
        self.n_input = n_input
        self.k = k
        self.encoding_matrix = Synapses(n_input, n_state,  initial_sparsity=n_state / n_input)
        self.state_matrix = Synapses(n_state, n_state,  initial_sparsity=0.5)

        self.new_state = Activation(n_state)
        self.previous_state = Activation(n_state)
        self.previous_state_reconst = Activation(n_state)
        self.input = Activation(n_input)
        self.input_reconst = Activation(n_input)

    def step(self, input_index, train=False):
        self.previous_state, self.new_state = self.new_state, self.previous_state
        self.new_state.clear()
        self.state_matrix.project(self.previous_state, self.new_state,)

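        if input_index is None:
            # Free-running mode: feed the model's own decoded guess back in.
            # Note: new_state.winners is still empty at this point (clear()
            # above, kwta() below), so decode() sees an all-zero reconstruction.
            self.input.one_hot(self.decode())
        else:
            self.input.one_hot(input_index)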

        self.encoding_matrix.project(self.input, self.new_state)
        self.new_state.kwta(self.k)
        self.new_state.boost_update(10, 0.0001)

        if train:
            self.previous_state_reconst.clear()
            self.input_reconst.clear()
            self.state_matrix.project(self.previous_state_reconst, self.new_state, backwards=True)
            self.encoding_matrix.project(self.input_reconst, self.new_state, backwards=True)

            self.previous_state_reconst.kwta(self.k)
            self.input_reconst.kwta(1)

            # plus phase
            self.state_matrix.hebbian_update(self.previous_state, self.new_state, 1)
            self.encoding_matrix.hebbian_update(self.input, self.new_state, 1)

            # minus phase
            self.state_matrix.hebbian_update(self.previous_state_reconst, self.new_state, -1)
            self.encoding_matrix.hebbian_update(self.input_reconst, self.new_state, -1)

    def decode(self):
        self.input_reconst.clear()
        self.encoding_matrix.project(self.input_reconst, self.new_state, backwards=True)
        self.input_reconst.kwta(1)
        return self.input_reconst.winners[0]


input_data = '''Lorem ipsum dolor sit amet, consectetur adipiscing elit, 
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
 ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip 
 ex ea commodo consequat. Duis aute irure dolor in reprehenderit in 
 voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
  sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
   mollit anim id est laborum. 123456789abcdefghijk'''.replace('\n', '')


EPOCHS = 1

seq_pred = SequencePreddictor(1000, 256, k=10)

for i in range(EPOCHS):
    print('epoch', i)
    for ch in input_data:
        seq_pred.step(ord(ch), train=True)

preview = ClusterPreview2D(seq_pred.state_matrix)


while True:
    seq_pred.step(None)
    print(chr(seq_pred.decode()), end='', flush=True)
    preview.preview_activity(seq_pred.previous_state)
    time.sleep(0.3)


1 Like

Nice. But what kind of similarity are the neurons clustered by?

I’ll reference this reply to @cezar_t which I thought was a clear explanation too. And it nests two or three references to the structure I’m looking for in other threads too. These things get lost in the length of the thread(s):

So one way to think about this is to say I want to replicate the Brazilian paper, but do it for the clusters formed by language sequences.

For instance, take the nice clear sequence example @complyue posted for the sentence “The quick brown fox jumped over the old lazy dog” (pulled out of this simple network using the prompt “q”):

The “lines” for the words are easy to find in this case:

If we sorted them vertically for time, the y-axis would spell out the sentence.

But for just a single sentence there is no time gap between the words. If you think of the whole language though, there are going to be many more paths between the letters of the words than there are between letters across words. So, for instance, there will be far more paths over “t-h-e” and “q-u-i-c-k” than there will be between “…e-q…” which is the sequence between the words “the” and “quick”. So if we can find a way to cluster more tightly according to the number of paths through letters, we should be able to cluster the spikes for the words more tightly together, and distinguish the words by synchrony of spikes instead of just sequence of spikes.
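
To make the “more paths inside words than across them” point concrete, here is a minimal sketch that just counts letter transitions inside words and across word boundaries. It is plain counting, not a spiking network, and the toy corpus and the names `within` and `across` are only assumptions for illustration:

```python
from collections import Counter

# Toy corpus (an assumption for illustration; a real corpus would be far larger).
corpus = ("the quick brown fox jumps over the old lazy dog "
          "the quick red fox jumps over the lazy old dog")
words = corpus.split()

within = Counter()  # letter pairs inside a word, e.g. t-h, h-e
across = Counter()  # last letter of a word -> first letter of the next word

for word in words:
    within.update(zip(word, word[1:]))
for left, right in zip(words, words[1:]):
    across[(left[-1], right[0])] += 1

print("t-h inside words:", within[("t", "h")])
print("h-e inside words:", within[("h", "e")])
print("e-q across words:", across[("e", "q")])
```

Even on two sentences the pairs inside “the” are counted more often than the “e-q” boundary pair, and with a whole language the gap should grow, since “the” is followed by many different words.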

That might be enough to take in one go. But to sketch how I see this extending beyond just word identification, take the slight extension of this by adding a variation of this sentence:

lnet.learn_words_as_sequence(
'''
the quick brown fox jumps over the old lazy dog the quick red fox jumps over the lazy old dog
''',
    sp_width=(3, 20),  # width of spike train: [n_columns, n_cells]
    sp_thick=60,  # thickness of spike train
)

So the “corpus” now is two sentences, which are the same except “brown” is replaced by “red” in the second. If we “prompt” this with “q” as before, we get two sequences being traced through the network. They both start with “quick”, but then we get “brown” and “red” being traced at the same time:

And then after that it becomes a mess, because the sequence after “red” continues with “f-o-x” even as the sequence with “brown” is still spiking “-w-n…”

The thing to do would be to use the tighter clustering of “brown” and “red” to separate them from “fox” which comes after.

And then, if we could manage that, it introduces the interesting possibility of other sentences which are shared by “brown” and “red”, which continue with words other than “fox”, but which might be used to generalize “brown” and “red”. So we would have essentially projected out a “class” of words which include “brown” and “red”, and can be used to generalize the sequences of either by informing them with the other. Such clusters would be of the same type “learned” currently by large language models.
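
As a very crude symbolic sketch of what projecting out such a “class” could look like, the snippet below groups words by the (previous word, next word) frame they occur in. This is only word-level counting of shared contexts, not the spike-synchrony clustering proposed here, and the names `frames` and `classes` are made up for the example:

```python
from collections import defaultdict

sentences = [
    "the quick brown fox jumps over the old lazy dog",
    "the quick red fox jumps over the lazy old dog",
]

# Map each (previous word, next word) frame to the words seen inside it.
frames = defaultdict(set)
for sentence in sentences:
    words = sentence.split()
    for prev, word, nxt in zip(words, words[1:], words[2:]):
        frames[(prev, nxt)].add(word)

# A frame filled by more than one word suggests a candidate "class".
classes = {frame: sorted(members)
           for frame, members in frames.items() if len(members) > 1}
print(classes)
```

On these two sentences the only frame shared by more than one word is “quick _ fox”, which groups “brown” with “red”.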

I’ll stop there, to see if any of this makes sense to anyone.

2 Likes

I found this state machine as an example in a video, and I think I should share it because it has implications for how complex stochastic sequences could be encoded.


from random import choice
import time

transitions = {
    'h': 'e',
    'r': 'e',
    'e': 'rl_',
    'l': 'lo',
    'o': '_',
    't': 'he',
    '_': 'th',
}

state = 'h'
while True:
    print(state, end='', flush=True)
    state = choice(transitions[state])
    time.sleep(0.1)

output:

he_herelllllo_te_herello_helo_telo_thererellllo_telo_here_the_the_there_hererellllo_therelo_he_thello_hellllo_the_tellllo_he_he_helllllllo_hello_herererelllo_he_te_thererere_the_he_te_telllo_helo_he_he_the_hello_he_here_te_te_thelo_helllo_telo_thelo_helo_terelo_telo_herelllo_he_he_therelo_hello_
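
As a rough sketch of where such a table could come from: it can be estimated by recording which character follows which in some text. Here the first part of the output above is used as the text, and `learn_transitions` is a made-up helper name, not something from the video:

```python
from collections import defaultdict


def learn_transitions(text):
    """Map each character to the set of characters seen right after it."""
    table = defaultdict(set)
    for current, following in zip(text, text[1:]):
        table[current].add(following)
    return dict(table)


# Prefix of the babbled output above, used here as training text.
sample = "he_herelllllo_te_herello_helo_telo_thererellllo_"
learned = learn_transitions(sample)

for state in sorted(learned):
    print(state, "->", "".join(sorted(learned[state])))
```

On this sample it recovers the same successor sets as the hand-written table. It keeps no frequencies, which matches the original, where `choice` picks each listed successor with equal probability.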
2 Likes

The only state transitions coded are bigrams?

We can see it babbles in chains of bigram “words”.

But no one realistically tried to do that, that I know of. The thing to do was to generalize the states.

The first paper I remember seeing about generalizing the states from the data was by Ken Church.

This podcast talking to him might be interesting:

Conceptually, statistical part-of-speech tagging would group, or generalize over, its transitions database somehow. One generalization might be:

{h, r} : e

Then if “h” and “r”, separately, had other transitions, you could use that to generalize those to both. So if you had:

h : i
r : o

You could also generalize over {h, r} to produce:

h : o
r : i
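
A minimal sketch of that generalization step, assuming the crudest possible rule: two states are put in one class as soon as they share any successor, and each then inherits the other's transitions. Real taggers use statistics rather than this all-or-nothing rule, and `generalize` is just an illustrative name:

```python
from itertools import combinations

# Observed transitions: both "h" and "r" go to "e", plus one extra each.
transitions = {"h": {"e", "i"}, "r": {"e", "o"}}


def generalize(transitions):
    """Pool the successors of any two states that share a successor."""
    generalized = {state: set(nxt) for state, nxt in transitions.items()}
    for a, b in combinations(transitions, 2):
        if transitions[a] & transitions[b]:  # e.g. both go to "e"
            pooled = transitions[a] | transitions[b]
            generalized[a] |= pooled
            generalized[b] |= pooled
    return generalized


generalized = generalize(transitions)
# Transitions that were never observed but follow from the shared class.
new_only = {s: generalized[s] - transitions[s] for s in transitions}
print(new_only)  # "h" gains "o", "r" gains "i"
```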

For years the state of the art was to learn “hidden” states like that, with “Hidden Markov Models”. Markov, meaning only the prior state mattered (no “attention”.) And “hidden” meaning the state {h, r} was not explicitly stated, but “hidden” in the data.

Hidden Markov Models defined the state-of-the-art for 20 years or so.

But abstracting classes for correspondences like {h, r} met problems like the contradiction problems I always talk about. An early solution to that was to keep the class information distributed, as in (my early vector model, though that was a bit too early! Or) Yoshua Bengio’s “Neural Language Model”, circa 2003 or so. Say this one:

A Neural Probabilistic Language Model

Yoshua Bengio, Réjean Ducharme, Pascal Vincent, Christian Jauvin; 3(Feb):1137-1155, 2003.
https://jmlr.csail.mit.edu/papers/v3/bengio03a.html

None of that really got going until the computer game industry boot-strapped the hardware though. Then Bengio’s stuff took off, and things were happening.

As a good summary of the state-of-the-art pre-transformers, I used to recommend this talk from ACL 2014:
ACL 2014 Tutorial: New Directions in Vector Space Models of Meaning
Edward Grefenstette

As I recall they go into a lot of stuff about such “embedding” generalizations, in the first part of the three part talk.

Your toy demo does show how much is possible from a simple, “most common letter”, bigram model, though. And generalizing over such examples on-the-fly, instead of Hidden Markov Model style “classes” or even Bengio style state vectors, is what I am pushing for in this thread.

1 Like