The coding of longer sequences in HTM SDRs

The backwards flag just swaps input and output and transposes the weight matrix, so it literally runs the synapses in reverse to get a reconstruction of the input from the output.
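A minimal sketch of that idea (the matrix shape and variable names here are hypothetical, not taken from the actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((8, 4))  # hypothetical synapse matrix: 8 input cells -> 4 output cells

x = (rng.random(8) > 0.5).astype(float)  # binary input SDR
y = x @ W                # forward pass: input -> output
x_reconst = y @ W.T      # "backwards" pass: same synapses, run in reverse
```

The key point is that no second weight matrix is needed; the transpose reuses the same synapses in the opposite direction.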

It's just the way I prefer to get a negative example for forward-forward training.

As for the need for recurrence: it can be thought of as a feed-forward network that takes its own output as input in the next timestep. It's literally just a way to keep information reverberating in the activations; without it I'd have to append the input to a list or something.
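A toy sketch of that loop, assuming a small dense network (all names, shapes, and the tanh nonlinearity are made up for illustration):

```python
import numpy as np

def step(state, inp, W_state, W_in):
    # one feed-forward timestep that also consumes the previous output
    return np.tanh(W_state @ state + W_in @ inp)

rng = np.random.default_rng(1)
W_state = 0.1 * rng.normal(size=(4, 4))  # recurrent weights (state -> state)
W_in = 0.1 * rng.normal(size=(4, 3))     # input weights

state = np.zeros(4)
for inp in rng.normal(size=(5, 3)):      # the state carries history across steps
    state = step(state, inp, W_state, W_in)
```

The network itself is purely feed-forward; the "memory" lives entirely in the fact that `state` is fed back in at the next step.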

2 Likes

Right. You could do that, though. So it's not absolutely necessary. There's nothing in the future states that says anything about the prior states. It's not like trying to generalize past states to "learn" better prediction, which is generally the case with RNNs.

Why do you need a negative example?

1 Like

For context: [2212.13345] The Forward-Forward Algorithm: Some Preliminary Investigations

This is a way to update weights in a Hebbian style but still get "backprop-like" results. It's a really good method for constructing weights with "attractor" properties. Basically you have a positive example (reality) and a negative example (self-generated). You increase the weights between cells active during the "reality" pass and decrease them during the "negative" pass. If the reality and negative passes are identical, the updates cancel out and no weight change happens.

So if you run the network inference once and increase the weights, then reconstruct the input from the activations and run it again while decreasing the weights, the network develops a tendency to produce activations that are able to reconstruct the input.

It's basically what restricted Boltzmann machines do.

This is what I'm doing in this part of the code (note the 1 vs -1 in the last argument):

    def hebbian_update(self, inputs, outputs, factor=1):
        self.weights[inputs.winners[:, np.newaxis], outputs.winners] += factor

# plus phase
self.state_matrix.hebbian_update(self.previous_state, self.new_state, 1)
self.encoding_matrix.hebbian_update(self.input, self.new_state, 1)

# minus phase
self.state_matrix.hebbian_update(self.previous_state_reconst, self.new_state, -1)
self.encoding_matrix.hebbian_update(self.input_reconst, self.new_state, -1)
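To see the cancellation property concretely, here is a toy stand-in (the `SDR` and `Matrix` classes below are minimal hypothetical mock-ups, not the real ones from the code above):

```python
import numpy as np

class SDR:
    # minimal hypothetical stand-in: just holds winner cell indices
    def __init__(self, winners):
        self.winners = np.asarray(winners)

class Matrix:
    def __init__(self, n_in, n_out):
        self.weights = np.zeros((n_in, n_out))

    def hebbian_update(self, inputs, outputs, factor=1):
        self.weights[inputs.winners[:, np.newaxis], outputs.winners] += factor

m = Matrix(10, 10)
reality = SDR([1, 2, 3])
state = SDR([4, 5])

# identical plus and minus passes cancel out
m.hebbian_update(reality, state, 1)
m.hebbian_update(reality, state, -1)
print(np.all(m.weights == 0))  # True

# a reconstruction that differs from reality leaves a net update
reconst = SDR([1, 2, 9])
m.hebbian_update(reality, state, 1)
m.hebbian_update(reconst, state, -1)
print(np.any(m.weights != 0))  # True
```

Only the cells where reality and reconstruction disagree (3 vs 9 here) end up with nonzero weight changes.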
4 Likes

Thanks for the ref.

But that means you’re learning. “Learning” being generalizing a prior state based on something it did or didn’t predict in the future state. That’s what I don’t get. If you are just coding a sequence, why do you need to “learn” anything? Can’t you just choose a bunch of cells to code the next state at each point, by connecting synapses from them?

3 Likes

Well, I believe "learning" is a bit of a strong word for something like updating weights.

In reality, if you look at what happens in the network's weights, it's really just modifying the receptive fields of cells so that they become more selective to a particular combination of previous state and input. So it's literally doing what you just said.

2 Likes

But I guess the point here is efficiency…

How sparse do you want activations to be?

If you really wanted, you could assign a single cell to each transition. At which point does that become waste?

Do you want random initial weights?

Do we really need minicolumns? Macrocolumns?

How do we get semantic meaning out of single-transition cells? Is it even possible?

So many questions and so few timesteps.

3 Likes

I think you have basically implemented Bayesian variational inference, or the free energy principle as some of you know it, for an auto-regressive problem.
And from this perspective, boosting plays an important role, namely as the regularizer, which tries to maximize the entropy over the variables in the system.
Mathematically speaking, the system is modelling p(x_{\le T})= \sum_{z_{\le T}} \prod_{t=1}^T p(x_t,z_t|z_{t-1}), where x_t is the input at time t and z_t is the internal state, approximating it using the surrogates over the latent variables, q_1(w_t|x_t) and q_2(z_t|z_{t-1},w_t), and the prior p(z_t), which assigns a high probability to a sparse state, i.e. boosting.
You can see the whole story by expanding the evidence lower bound, or free energy formula,
\mathbb{E}_{z_{\le T} \ \sim \ q(z_{\le T}|x_{\le T})}[\log \frac{p(x_{\le T}, z_{\le T})}{q(z_{\le T}|x_{\le T})}] = \mathbb{E}_{z_{\le T} \ \sim \ q(z_{\le T}|x_{\le T})}[\log p(x_{\le T}|z_{\le T})] - D_{KL}[q(z_{\le T}|x_{\le T}) \| p(z_{\le T})]
where p(x_{\le T}, z_{\le T})=\prod_{t=1}^T p(x_t,z_t|z_{t-1}) and q(z_{\le T}|x_{\le T})=\prod_{t=1}^{T} \sum_{w_t} q_2(z_t|z_{t-1},w_t)q_1(w_t|x_t).
The insight you could get from this equation is that self.new_state.noise(2), which might seem like just a hack at first glance, is responsible for sampling internal states, the necessity of which is implied by the expectation operator. Adding together the new states acquired from the previous state and the current input also makes sense if you assume the variables z_{t-1} and w_t are marginally independent.

5 Likes

Woah, I start panicking a little when I see big expressions with lots of parentheses. The truth is that the noise(2) line really was intended to be a hack: I was using a fancy duty-cycle-style boosting, but it wasn't enough to keep the cells from getting stuck, so I had to resort to extreme measures. Until I realized that what was actually making the cells get stuck was a backwards pass going to the wrong activation.

But the noise helped, so I decided to keep it.

4 Likes

Math notation I've rarely encountered intimidates me as well, I guess, because it's essentially a foreign language to those who are not familiar with it.

All the equation is saying is: "I want to predict the next input given all the previous inputs, by first generating an embedding w_t from the current input, combining it with the previous state z_{t-1} to generate the current state z_t, and then predicting the next input from it, somehow." That "somehow" is then explained in the expanded expression. Given what we want, the math tells us to "make the reconstructions as accurate as possible, but also make the internal states as ambiguous as possible at the same time".
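That narrative can be sketched as a simple loop; every function below is a hypothetical stand-in for the actual surrogates q_1, q_2 and the decoder, just to make the data flow concrete:

```python
import numpy as np

def q1(x):           # hypothetical surrogate: embedding w_t from the input x_t
    return np.tanh(x)

def q2(z_prev, w):   # hypothetical surrogate: combine z_{t-1} and w_t into z_t
    return np.tanh(z_prev + w)

def decode(z):       # hypothetical decoder: predict the next input from z_t
    return np.tanh(z)

rng = np.random.default_rng(3)
z = np.zeros(4)
for x in rng.normal(size=(5, 4)):
    w = q1(x)                  # w_t ~ q1(w_t | x_t)
    z = q2(z, w)               # z_t ~ q2(z_t | z_{t-1}, w_t)
    x_next_pred = decode(z)    # "predict the next input from it, somehow"
```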

I didn't express the forward-forward part explicitly here, but it would be interesting as well, and maybe the math would tell us something useful that might have been overlooked!

4 Likes

It's only one pendulum, with two motors carrying opposing propellers on the hanging end, which means it has a single natural frequency.
The input signal modulates the speed of the two motors in order to change force and direction. An accelerometer (or gyro?) measures angles, and these angles are fed into the output (learning) layer.

3 Likes

Now it’s added:

lnet.learn_words_as_sequence(
    '''
the quick brown fox jumps over the old lazy dog
''',
    sp_width=(3, 20),  # width of spike train: [n_columns, n_cells]
    sp_thick=60,  # thickness of spike train
)

About the thickness, there’s some struggle:

Though the two alternatives would act the same when sp_thick == sp_width[0] * sp_width[1].


With https://github.com/complyue/OscBrain/blob/main/debug/Trial-001.ipynb I get:

Unfortunately it still seems to fall into short, strong oscillations.


I don't quite understand the "desk check" you describe; please expand a bit to help me figure out your intention, and I'll set out to implement it.

I started https://github.com/complyue/OscBrain/blob/main/debug/Verif-001.ipynb for the case of single-sentence learning.

The paths form well for so short a sequence (experimented with "the quick brown fox jumps over the old lazy dog"), and when there is no ambiguity, a single letter ("q") prompts the exact (sub)sequence accurately:

(The HTM-style SDRs do form well: e.g. in the "o" row, each spike of the letter is a different SDR instance, easily seen from the shape of the dot clusters.)

But for a prompt invoking multiple possible sequences, you need to suppress the other signals very hard to get a clear/accurate/single (sub)sequence, or multiple (sub)sequences will all speak in parallel, like a superposition of all of them:

I guess both the path forming (i.e. learning) and the path following (i.e. spiking control in the sim) need better algorithm implementations. I'll catch up with what's already been said in this thread and carry on.

2 Likes

Currently my "blur"-based prompting algorithm sort of implements this: all other letters' signals are blurred (i.e. reset to rest level when prompt_blur=0.0, or proportionally scaled down with 0 < prompt_blur < 1.0), assuming the prompted letter has one or more of its subset SDRs fire, according to how previously prompted letters have driven the network dynamics, including path selections.

It also detects the situation where the prompted letter has no single cell to fire; in that case all cells in the letter's full SDR are forced to spike (i.e. to "burst" it, as @robf suggested), assuming a fresh network start.
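A rough sketch of how such a blur-plus-burst prompt step might look (the function name, the rest-level handling, and the burst value are all assumptions for illustration, not the actual code):

```python
import numpy as np

def prompt_letter(potentials, letter_cells, prompt_blur=0.0, rest_level=0.0):
    # blur every cell outside the prompted letter toward the rest level:
    # prompt_blur=0.0 resets them, 0 < prompt_blur < 1 scales them down
    out = potentials.copy()
    out[~letter_cells] = rest_level + prompt_blur * (out[~letter_cells] - rest_level)
    # fallback: if no cell of the letter is above rest, burst the full SDR
    if not np.any(out[letter_cells] > rest_level):
        out[letter_cells] = 1.0
    return out
```

With a fresh network (all potentials at rest) this degenerates to bursting the letter's full SDR, matching the behaviour described above.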

1 Like

Yes, it doesn't seem to have made any difference to the behaviour of immediately running off from the prompt and endlessly looping around other characters. We thought it might be a "brittleness" thing from chaining only one cell to another, but apparently not.

"Desk check" just means following the code step by step. One way to get this effect might be to show the active synapses from cell to cell on the raster plot. Then we could see from step to step exactly where the activation comes from and where it is looping. But my hunch is that this would require a totally different graphical capability!

It may not be worth even thinking about, because it would only be useful for very very simple test sequences. Anything more and the display would completely fill.

If it were possible to, say, hover over a cell and highlight both that cell and the cells it synapses with, that might be useful. Perhaps doable? No need for lines representing the synapses; just change the display colours of cells according to the synapse connections around a given cell.

But not with this plotting module I think.

Nice example with the different occurrences of letters showing different SDR activations according to the path that led to them.

Actually for the “logic” I’m seeking, I don’t mind that. That is in a way exactly the signal I’m seeking. It’s not something you want for single sequences. Because with single sequences we want to clearly emphasize the sequence. But we do want to have a signal which emphasizes common beginning and end points between different sequences, and that signal should be a degree of simultaneity. And even for a single sequence, the signal we want should be a degree of pulling sequence steps together.

One way to reduce the confusion of that might be to stretch the display so that prompt time steps are much larger than network update time steps. That should reduce the confusion between multiple paths which result from activation spreading over the network, and the path of the driving prompt.

We need that anyway, to simulate the kind of freedom for spike times to shift continuously which we would have with spiking hardware. The fact that we have time steps at all is just an artifact of simulating parallel update in serial.

2 Likes

Now, even with a "wide" spike train chained/threaded in learning the word sequence, I don't feel the algorithm in my current code can guarantee the forming of such "path clusters". @robf (and others interested here), please put more thought into this: I think we need either a better-crafted word-sequence learning algorithm, or, if such "clusters" are already formed by the current algorithm, a respective algorithm to identify them while driving the sim.

2 Likes

You are right, it's not an easy thing to code up with a plotting tool like Bokeh. That would need a dedicated GUI application.

But actually Bokeh may be good for the former:

I think Bokeh happens to have this capability.

Remember alpha, which controls the opacity of data points (rendered as dots, lines, or whatever)? Then even if the display is "completely filled", you'll still be able to visually identify clusters by colour intensity.

Let me know if you find a visual schema worth thinking about; I can try to implement it as plotting code.

2 Likes

Wow. Just goes to show it’s worth mentioning something. That’s completely the reverse of what I thought would be practicable.

Whether it’s worth the effort depends on how difficult it would be. My top priority is to stop this behaviour, and I’m thinking some kind of explicit feedback may be the way to do that. So finding the right kind of feedback might solve it without further examination.

And as a display priority, I think stretching out the time steps to simulate some continuity between prompt characters, might be a higher priority. As well as being something which we will need to do eventually either way.

But if there’s some switch somewhere in Bokeh which would turn synapse connections on… Sure! I’m guessing it will just show us activation zooming off and looping around characters.

But guessing is not the same as knowing!

The only schema I can think of initially would just be synapse lines between cells. So we could trace exactly where the activation travels and gets stuck.

So just lines showing synapses between cells.

1 Like

I tried looking at the repo, but I'm a bit clueless about what it is you guys are trying to achieve here.

Is this a bio-realistic LIF neural simulation? And are you trying to encode sequences as voltage oscillation patterns rather than SDRs?

Is it a short-term or long-term memory model?

How exactly is information about sequences represented and stored here?

1 Like

Sure, I'll try to implement this idea. But honestly, showing all synapses across all step gaps would exceed hardware limits (GPU RAM, as well as browser process RAM usage of ~10 GB), so it's only feasible to render the synapses induced by the spikes (which are sparse in nature). Luckily I think we are only interested in these sparse synapses too :slight_smile:
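The filtering step could be sketched like this, assuming a dense weight matrix and a hypothetical connection threshold (all names are illustrative):

```python
import numpy as np

def active_synapse_segments(weights, spiking_cells, threshold=0.5):
    # keep only synapses above a connection threshold whose presynaptic
    # cell actually spiked -- a sparse subset of the full matrix
    pre, post = np.nonzero(weights > threshold)
    keep = np.isin(pre, spiking_cells)
    return pre[keep], post[keep]

rng = np.random.default_rng(2)
W = rng.random((100, 100))      # dense toy weight matrix
spiking = np.array([3, 7, 42])  # the few cells spiking this step
pre, post = active_synapse_segments(W, spiking)
```

With only a handful of spiking cells per step, the segment count stays tiny compared to the full 100 x 100 synapse grid, which is what keeps the rendering feasible.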

3 Likes

For the record, I have exactly the same questions, despite writing all the code so far. I write code according to however much I understand of @robf's ideas, hoping to get further toward his vision/insights, which I find very inspiring and promising.

The burden is on @robf to explain everything better and better :wink:

2 Likes

You said it yourself Jarvis:

So we’re agreed word embeddings work. The only problem is how to extract them from a network of sequences:

I start from the position that such sequences are related in the network by sharing (multiple) beginning and end points.

That’s the shape of the network.

If that's the shape of the network, why shouldn't we be able to tune the network to reveal that shape? It's there. We just have to figure out how to project it out.

@DanML seemed to glimpse this, in an oblique way:

So I want to do “graph analysis”, to find “embeddings”, which embeddings are well known as effective representations of both meaning and structure for language.

The only difference is that I think the embeddings will contradict each other, and there will be an indefinite number of them. So I think they must be found ad hoc at run time.

And a likely way to do that is by clustering using oscillations. Something which is also known to reveal tightly connected sub-graphs which are “embedded” within a broader graph.

2 Likes