The coding of longer sequences in HTM SDRs

I don’t understand why most activation immediately leaves the prompt states and reverberates in the rest of the network only.

Really, a word should act as a single “state” (because the cells along its paths should tie it together as a whole, rather than its letters acting individually). So prompting that state should result in the activation of that state looping repeatedly through itself, as a whole “state”, and back through the rest of the network. The rest of the network should be cycling activity too. But that cycle should be longer, and certainly not in and out of the same letters only. Currently the activity seems to stay tightly looping around single letters.

I’m thinking that limiting chains of activation to “paths” should calm that. There’s no reason for a letter state to loop dominantly to itself if synapse paths are locked to full paths, which go off through other letters.
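
As a toy illustration of what I mean (everything here is hypothetical, just letter cells and words stored as explicit cell “paths”, nothing to do with our actual code): if activation is only allowed to propagate along a stored full path, a prompted letter hands its activity on to the next letters of the words it belongs to, and nothing loops back onto the same letter.

import numpy as np

# hypothetical toy setup: one cell per letter, each word stored as a full "path" of cells
letters = {c: i for i, c in enumerate('abcdefghijklmnopqrstuvwxyz')}
paths = [[letters[c] for c in w] for w in ('cat', 'car', 'dog')]   # learned words

activation = np.zeros(len(letters))
activation[letters['c']] = 1.0            # prompt with the letter 'c'

# path-locked spreading: a cell may only pass activation to the next cell of a path
for step in range(3):
    new = np.zeros_like(activation)
    for path in paths:
        for a, b in zip(path[:-1], path[1:]):
            new[b] += activation[a]
    activation = new
    active = [c for c, i in letters.items() if activation[i] > 0]
    print('step', step + 1, 'active letters:', active)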

I don’t yet see that single-letter looping behaviour in the rest of the network moderating, though.

1 Like

Sure, I’ll add a “train-band-width” arg to implement this. What default/typical value would you suggest?

I feel the same, and suspect there may still be subtle bugs lurking. I’ll create some “unit-testing” NBs to verify results for the basic scenarios I can imagine; you’re welcome to suggest any such basic verifications off the top of your head.

It’s a very simple idea off the top of my head, definitely not well thought out. My rough goal is to suppress spikes along paths other than the prompted one, to help the network concentrate on the prompted path.

Please keep sharing your ideas about the prompting algorithm; your design decisions will take precedence over mine.

2 Likes

Of 100 cells in a column… 30? Once we get sensible behaviour we can just try some numbers and see the effect.

Yes, good idea.

Is it possible to ramp up the magnification and “desk check” the plots, cell by cell, for a few inputs? It would be interesting just to trace an exact path of activation.

Ha. Precedence I should not have :-b I just have a vision for what the signal should be. I don’t have any special sense of how to get that signal. The vision is that sequences of states which share beginning and end points should synchronize. We just have to design the network to tie those sequences together, and be sensitive to the fact of shared beginning and end points.

I understand the desire to directly address the activity we do not want. But the behaviour must emerge from the network. It must all tie back to this idea that groups of paths which share beginning and end points should have a tendency to share behaviour. But as for how to make the network express that shared behaviour, anyone else’s idea is probably as good as mine. So long as the idea addresses that goal of making the network express shared behaviour because of shared sequence beginning and end points.

3 Likes

This conversation sounds interesting. I don’t think I’ll catch up on all 100 replies so far, but I just want to leave a note that recurrent networks trained to reconstruct their own past state and the present input from their current state seem to naturally learn to encode sequences.

There doesn’t seem to be any more to it.

Here’s the simplest network I could make; surprisingly, it’s able to more or less learn sequences.



import numpy as np
import time
np.random.seed(0)

class Synapses:
    def __init__(self, inputs, outputs, initial_sparsity=0.1):
        # binary weight matrix, roughly initial_sparsity of the entries set to 1
        self.weights = (np.random.sample((inputs, outputs)) < initial_sparsity).astype(np.int8)

    def project(self, inputs, outputs, backwards=False):
        if backwards:
            # reconstruct the inputs from the output winners
            inputs.values += self.weights[:, outputs.winners].sum(axis=1)
        else:
            # feed the input winners forward into the outputs
            outputs.values += self.weights[inputs.winners, :].sum(axis=0)

    def hebbian_update(self, inputs, outputs, factor=1):
        # strengthen (factor > 0) or weaken (factor < 0) every winner-to-winner connection
        self.weights[inputs.winners[:, np.newaxis], outputs.winners] += factor

class Activation:
    def __init__(self, size):
        self.values = np.zeros(size, dtype=np.float32)  # accumulated input per cell
        self.boosts = np.zeros(size, dtype=np.float32)  # homeostatic boost per cell
        self.winners = np.zeros(0, dtype=np.int64)      # indices of the currently active cells

    def one_hot(self, x):
        # activate a single cell (used for the input layer)
        self.winners = np.array([x], dtype=np.int32)

    def kwta(self, k):
        # k-winners-take-all: keep the k cells with the highest (boosted) values
        self.winners = np.argsort(self.values + self.boosts)[-k:]

    def noise(self, f):
        # add a little uniform noise to break ties
        self.values += np.random.sample(self.values.shape) * f

    def boost_update(self, decrease=1, recover=0.01):
        # decay all boosts towards zero, then penalize the current winners
        self.boosts *= recover
        self.boosts[self.winners] -= decrease

    def clear(self):
        self.values[:] = 0
        self.winners = np.zeros(0, dtype=np.int64)

    

class SequencePreddictor:
    def __init__(self, n_state, n_input, k):
        self.n_state = n_state
        self.n_input = n_input
        self.k = k  # number of winning state cells per step
        # input -> state encoding synapses, and recurrent state -> state synapses
        self.encoding_matrix = Synapses(n_input, n_state, initial_sparsity=n_state / n_input)
        self.state_matrix = Synapses(n_state, n_state, initial_sparsity=0.5)

        self.new_state = Activation(n_state)
        self.previous_state = Activation(n_state)
        self.previous_state_reconst = Activation(n_state)
        self.input = Activation(n_input)
        self.input_reconst = Activation(n_input)

    def step(self, input_index, train=False):
        # shift state: last step's new state becomes the previous state
        self.previous_state, self.new_state = self.new_state, self.previous_state
        self.new_state.clear()
        # recurrent projection from the previous state
        self.state_matrix.project(self.previous_state, self.new_state)

        if input_index is None:
            # no external input given: use the decoded output as the input
            self.input.one_hot(self.decode())
        else:
            self.input.one_hot(input_index)

        # project the input, add noise to break ties, then pick the k winning state cells
        self.encoding_matrix.project(self.input, self.new_state)
        self.new_state.noise(2)
        self.new_state.kwta(self.k)
        # self.new_state.boost_update(10, 0.0001)

        if train:
            # reconstruct the previous state and the input back from the new state
            self.previous_state_reconst.clear()
            self.input_reconst.clear()
            self.state_matrix.project(self.previous_state_reconst, self.new_state, backwards=True)
            self.encoding_matrix.project(self.input_reconst, self.new_state, backwards=True)

            self.previous_state_reconst.kwta(self.k)
            self.input_reconst.kwta(1)

            # plus phase: strengthen connections to the actual previous state and input
            self.state_matrix.hebbian_update(self.previous_state, self.new_state, 1)
            self.encoding_matrix.hebbian_update(self.input, self.new_state, 1)

            # minus phase: weaken connections to the reconstructed versions
            self.state_matrix.hebbian_update(self.previous_state_reconst, self.new_state, -1)
            self.encoding_matrix.hebbian_update(self.input_reconst, self.new_state, -1)

    def decode(self):
        # reconstruct the most likely input character from the current state winners
        self.input_reconst.clear()
        self.encoding_matrix.project(self.input_reconst, self.new_state, backwards=True)
        self.input_reconst.kwta(1)
        return self.input_reconst.winners[0]


input_data = '''Lorem ipsum dolor sit amet, consectetur adipiscing elit, 
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
 ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip 
 ex ea commodo consequat. Duis aute irure dolor in reprehenderit in 
 voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
  sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
   mollit anim id est laborum. 123456789abcdefghijk'''.replace('\n', '')


EPOCHS = 1

seq_pred = SequencePreddictor(1000, 256, k=10)

# train: one pass over the text, one character per step
for i in range(EPOCHS):
    print('epoch', i)
    for ch in input_data:
        seq_pred.step(ord(ch), train=True)


# free-run: no external input, the network is fed its own predictions
while True:
    seq_pred.step(None)
    print(chr(seq_pred.decode()), end='', flush=True)
    time.sleep(0.01)
    

6 Likes

That’s very cool @JarvisGoBrr. I was just going to suggest to @complyue that we go dark on the forum for a bit because we’re cluttering it with posts, and no-one is making code suggestions, and then you make this nice one.

When did you code this? In what context? Why didn’t you proceed with it?

On one level, yes, maybe this is trivially true. So, say, this is what a standard recurrent NN does?

Historically with RNNs, as I understand it, the problem has been not “learning” the sequence, but learning how to generalize those sequences. RNNs learn a “memory” state which encodes the entire previous sequence. But there was no way to train a generalization over that.

Then LSTM hacked that by only carrying forward some of the previous sequence. And it was possible to learn some generalizations over that shortened history fragment. And then transformers basically made learning which bit of the prior sequence to “attend” to, the whole game. They didn’t even bother with any lossy forms of “recurrence” to keep trying to bring prior history forward.

So I guess the real question may not be learning observed sequences using recurrent nets, but how you generalize over them; whether you can “learn” predictive generalizations, as well as just learning to repeat observed sequences.

I’m suggesting that we not try to learn predictive generalizations, but instead use properties of a network of learned sequences to make predictive generalizations on the fly. That becomes a problem of how to tune spreading activation within a network of learned sequences, so that spreading activation reflects a predictive generalization.

And I conjecture that “tuning” should be to emphasize the extent to which sub-sequences within that network of learned sequences, share contexts.

Right now we’re trying to figure out how to use spreading activation from a “prompt”, to emphasize shared contexts among sub-sequences in the “learned” network. Currently the activation seems to fly off and get locked endlessly looping around single states!

But your code is suggestive of how a state can reinforce itself just purely from recurrent activation. Which might be relevant for our problem of reinforcing a prompt signal within an already learned network, as much as it is relevant for the problem of “learning” that network in the first place.

2 Likes

I am intrigued by the detail of this though. For those more familiar with the detail of RNNs, how does this compare?

An RNN, as I recall, predicts based on a current state and a memory state, and then updates the memory state. My ability to read your code doesn’t tell me with sufficient resolution whether, by “reconstruct its own past state”, you are feeding back to the immediately prior state, or to a memory state of the whole sequence.

It may come to much the same thing.

If, as we had come to in @complyue’s code, we represent the “path”, or prior sequence, as an SDR over a subset of a full state SDR, that subset might function as a “memory state” within the full state.

In a regular RNN, I think it would be a completely different state. But maybe mapping to a subset SDR within the full SDR might do the same thing.

But it’s interesting that you do feed back. The stage we had reached we were just doing feedforward. Initially along single cell paths, and then proposed to expand that to an SDR.

My hunch is that feedback in a regular RNN is something which would be there to address a hoped-for ability to learn generalizations. You feed back modifications of the memory state to adjust how well it predicted the next state. If you’re not trying to generalize, it shouldn’t be necessary? But you have this feedback?

Did you always find you needed feedback? The “reconstruct its own past state” part of your description above.

In the code you have a switch “backwards”. What’s that addressing?

2 Likes

Interesting code.
The output seemed chaotic, but after tinkering with various numbers of epochs and parameters I got it to repeatedly replicate the input_data with:

EPOCHS = 10
seq_pred = SequencePreddictor(20000, 128, k=7)

Which is quite cool.

PS: Some settings are even more interesting: it makes spelling mistakes and skips parts, but keeps saying stuff “similar” to input_data.

Mildly chaotic I might say.

4 Likes

Note @cezar_t, the chaos I’m expecting won’t be in the recall itself. It’s how attractors might form as activation spreads from a prompt over the recall network.

So, if @JarvisGoBrr has coded a recall network, the question which interests me now is: if I “prompt” that network by spiking a sequence of words over it, how will that activation spread? That’s the bit I’m expecting to have chaotic attractors.
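
To make that concrete, here is roughly how I’d try prompting Jarvis’s network, reusing seq_pred exactly as trained in his script (the prompt string and step count are just placeholders, and this assumes the endless free-run loop at the end of his script is skipped): drive it with the prompt without training, then let it run and watch where the activity goes.

# prompt the trained network with a short sequence, without learning
prompt = 'dolor sit '        # example prompt, any substring of the training text
for ch in prompt:
    seq_pred.step(ord(ch), train=False)

# then let activation spread on its own and watch the continuation
for _ in range(200):
    seq_pred.step(None)
    print(chr(seq_pred.decode()), end='', flush=True)
print()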

I’m particularly interested to know at this moment why Jarvis has feedback in his learning algorithm.

2 Likes

I doubt it would be able to recall anything without converging towards some stable state. Isn’t that the mark of (presence of) attractors?

Beware the initial state is pretty random.

2 Likes

Well, I never thought of it as feedback per se. In truth, it is more like HTM’s lateral connections.

Without it, each time you input ‘a’ its hidden state would be the same, regardless of what was in the past, making prediction impossible.

As for why I have the backwards switch, well, it’s for simplicity’s sake. It would be more biologically accurate to have distinct synapses for forward inference and past reconstruction, since synapses are unidirectional. But ironically, since I’m using Hinton’s forward-forward method for training it, doing a backwards pass is simpler.
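
To show just the mechanics with the classes above (a toy usage sketch; the synapses here are untrained, so the reconstruction is meaningless): the same weight matrix is read in both directions, forward from the input winners into the outputs, and backwards from the output winners to reconstruct the inputs.

# same Synapses / Activation classes as in the code above
syn = Synapses(inputs=256, outputs=1000, initial_sparsity=0.1)
inp = Activation(256)
out = Activation(1000)

# forward: drive the output layer from the active input cell
inp.one_hot(ord('a'))
syn.project(inp, out)                     # adds syn.weights[inp.winners, :] into out.values
out.kwta(10)

# backwards: ask which input cell best explains the output winners
recon = Activation(256)
syn.project(recon, out, backwards=True)   # adds syn.weights[:, out.winners] into recon.values
recon.kwta(1)
print(chr(recon.winners[0]))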

Well, it’s a kind of reservoir network in a way, so it kinda makes sense.

4 Likes

To whoever likes a good puzzle: please explain how a reservoir implemented using a simple (non-chaotic) oscillator can become a very good predictor (== modeller) of a chaotic system like the Lorenz attractor.
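
For reference, the usual non-oscillator recipe looks roughly like this (a minimal echo-state-style sketch, not the construction from the video; sizes and constants are just illustrative): a fixed random reservoir is driven by the Lorenz trajectory, only a linear readout is trained, and the trained system is then fed its own predictions.

import numpy as np

np.random.seed(0)

# generate a Lorenz trajectory by simple Euler integration
def lorenz(n_steps, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    xyz = np.empty((n_steps, 3))
    x, y, z = 1.0, 1.0, 1.0
    for i in range(n_steps):
        x, y, z = (x + dt * sigma * (y - x),
                   y + dt * (x * (rho - z) - y),
                   z + dt * (x * y - beta * z))
        xyz[i] = (x, y, z)
    return xyz

data = lorenz(6000)
data = (data - data.mean(axis=0)) / data.std(axis=0)   # normalize each coordinate

# fixed random reservoir (only the linear readout below is trained)
n_res = 400
W_in = np.random.rand(n_res, 3) - 0.5
W = np.random.rand(n_res, n_res) - 0.5
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))        # spectral radius below 1

def run_reservoir(inputs):
    s = np.zeros(n_res)
    states = np.empty((len(inputs), n_res))
    for t, u in enumerate(inputs):
        s = np.tanh(W_in @ u + W @ s)
        states[t] = s
    return states

train = data[:5000]
states = run_reservoir(train[:-1])

# ridge-regression readout trained to predict the next point of the trajectory
X, Y = states[200:], train[201:]                        # drop the warm-up period
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ Y).T

# free-run: feed the readout's prediction back in as the next input
s, u = states[-1], train[-1]
for _ in range(20):
    s = np.tanh(W_in @ u + W @ s)
    u = W_out @ s
    print(u)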

1 Like

Yes, I guess the recall process is itself a kind of spreading of activation over the network in response to a prompt. You’re right…

So perhaps a way to further refine a definition of what I’m seeking, in the context of what Jarvis has done, is to tweak the connectivity of the network so it is not uniquely tuned to recall sequences in response to a prompt state, but to… gather together different sequences which begin and end in the same way as a prompt sequence.

Instead of continuing a recorded sequence from a prompt state, it should continue all possible sequences from a prompt state at the same time.

That might equate to just “blurring” the sequence history. A sequence history will specify exactly the sequence to be recalled. But if you “blur” that history, the same state might have occurred multiple times in the sequence “learned”. So there might be multiple possible continuations.

I want all those possible continuations.

And I also want to filter them on the end state of the prompt sequence.

Expanding all possible continuations may be as simple as throwing away some of a memory state.
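
With Jarvis’s network, for example, “throwing away some of a memory state” could be as crude as the sketch below (the keep fraction is arbitrary, and seq_pred is assumed to be already trained as in his script): keep only a random subset of the state winners before each free-running step, so the remaining context is consistent with more than one learned continuation.

import numpy as np

# sketch only: "blur" the state of an already trained seq_pred by keeping a random
# subset of its winner cells, then let it free-run and decode the continuation
def blurred_continuation(seq_pred, keep_fraction=0.5, steps=100):
    out = []
    for _ in range(steps):
        w = seq_pred.new_state.winners
        keep = np.random.rand(len(w)) < keep_fraction
        seq_pred.new_state.winners = w[keep]   # throw away part of the "memory"
        seq_pred.step(None)
        out.append(chr(seq_pred.decode()))
    return ''.join(out)

# repeated calls from the same prompt state should now wander into different continuations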

How to filter that on a prompt end state might be trickier. Possibly the way to do it is to feed back activation from the end state to select from among those “blurred history” expansions.

In that case the feedback will be what selects the expansions I want.

Another way of saying the feedback selects them would be to say that oscillations select them.

I don’t know. That might be another way to see what I’m trying to do.

And it’s nice that Jarvis already has this experiment, coding and reproducing sequences, as a step towards that. Maybe that can short-circuit the sequence encoding problem which I was working on most intently with @complyue over the last few messages.

But I want to know why Jarvis has feedback in his sequence encoding mechanism already. The move to a feedforward SDR sequence encoding which I was coming to with Compl did not have feedback for the sequence encoding process as such (though I was coming to a more explicit coding of it for the “selection by end state” feedback).

2 Likes

I can’t watch the video on my phone, so just a hunch: since the thumbnail mentions pendulums, maybe it’s something related to Fourier transforms and combinations of several wave bases.

1 Like

When you mention that, the first thing that comes to my mind is word embeddings and the way they encode word meanings by the context of surrounding words. I can’t think of a way to learn such embeddings directly in an online system without requiring something akin to a circular or shifting buffer, which I find unlikely to exist in a real brain.

I think of it in terms of a state that evolves over time; this state is influenced by its own past (context) and its current input. So the analogy of branching cell paths is quite accurate: which branch gets selected depends on the current input.

2 Likes

But don’t you see that filtering the sequence expansions we are talking about on beginning and end state is exactly such an “embedding”?

1 Like

Yes, but the complicated bit seems to be exactly that: how do you cluster several sequences together like that? Specifically, how do you cluster both past and future into a state that can be evoked by a single input? That’s what bogs me down. I don’t think keeping a replay buffer several timesteps into the past is practical for recurrent or feedforward networks, not to mention the amount of compute needed, given that human brains seem to run at around 10 Hz afaik.

2 Likes

I don’t want to take 40 minutes to watch the whole video. But where does what you say happen?

Skimming through it I could only find them talking about how a pendulum can behave chaotically. So, it models its own chaos.

Did I interpret that wrong?

Certainly very simple systems can have very complex evolutions.

1 Like

Why do you need a replay buffer? The spreading of activation can happen repeatedly. So there need be no replay buffer, just replay.

And if the replay is filtered on the end state, by recursion of some kind, then it should be (replay of) the kind we want.

That’s looking at it using this metaphor of expansion of multiple paths.

Looking at it as a simple synchrony of more tightly connected clusters, might be simpler.

1 Like

Well, I kinda feel like if you “blur” the state like that, you’d corrupt the predictions, unless it’s done separately. So maybe we have multiple sequence memories, for several blur widths.

Maybe I’m thinking the wrong way here, but it sounds to me like we need to do backtracking in order to update state representations.

So we’d have two sequence memories, and the current state isn’t a selected path but somehow the compressed encoding of past states, current input and predicted future states?

2 Likes

The key question which interests me about your code right now is why you have recursion in your sequence coding program.

I’ve been thinking that for simple sequence coding, some kind of SDR should be enough.

Did you find you needed recursion?

What’s the… “backwards” flag doing?

1 Like