The coding of longer sequences in HTM SDRs

Good point. What happens if a node self-references SYNAP_FACTOR + 1 times?

1 Like

Then it'll keep firing, yes; that becomes concerning...

2 Likes

Is it also increased by compacted nodes with effis > 1?

1 Like

Yes. Effectively (ignoring the technical details of compaction) they are the same case, from a logical perspective.

2 Likes

Thanks @complyue !
I'm not saying we need to close off any of these 'edge cases' here and now, but it is good to get a common understanding of what could happen.
The data from the trivial set I'm running now suggests it's not a big deal at the moment.

2 Likes

How do SNN simulations deal with that? Is it viable to just elide self-excitations altogether?
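By "elide" I mean something like this (a minimal sketch, assuming a dense weight matrix W indexed [post, pre]; that layout is only for illustration, not necessarily how the synapses are actually stored):

    import numpy as np

    # Hypothetical dense synapse matrix: W[post, pre] = connection weight.
    W = np.random.rand(8, 8)

    # Eliding self-excitation = zeroing the diagonal,
    # so no cell can drive itself directly.
    np.fill_diagonal(W, 0.0)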

1 Like

:handshake: I'm a big fan of this school of thinking too!

1 Like

You can think of self-oscillation as a feature. Regulation happens in many ways in different networks.
We'd need @Bitking (or similar) to comment from a biological perspective.
I don't believe HTM allows self-connections. Anyone?

2 Likes

I think I spoke too soon.
We probably don't care which cell-cell pattern there is, only which col-col pattern. We may even only care at the col-group (aka letter) level.

# Column-level self-connections: rows whose source and target column coincide.
self_excit_df = excit_df[excit_df.from_column == excit_df.to_column]
self_excit_df['from_column'].value_counts()

(EDIT: the line below is wrong)
At the col-col level we have several dozen cols self-firing and >500 self-connections.

1 Like

I was just thinking the opposite: that we need to focus more on the cell-cell pattern. That should tame self-activation a lot, because each cell, or rather each subset of cells, embodies a path back through the sequence. So we would only be talking about self-activation for a path, by a path.

Actually I was starting to wonder how @complyue drives the letters. Do you drive entire columns, or only the cells corresponding to the current path?

For the first letter of a single word "prompt" the path might start at zero. But "no context" could be taken as a context too.

What's the SNAP_FACTOR?

1 Like

You basically want self-activation for a path. That's what is supposed to give us the oscillations. But it should only sum to something significant for large clusters of paths. That way it gives us a measure of the size of the cluster. And that size of cluster is what we're really looking for... large clusters which share beginning and end points. But the cluster must be filtered on entire paths, otherwise the letter representation immediately blows up from total connectivity.

1 Like

Probably just as well. Firing is voltage-driven at the cell level, which means that the previous analysis is wrong. What is needed is to count the "fan-in" of multiple source cells to one cell only.
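Something along these lines, assuming excit_df also carries per-cell endpoints in from_cell / to_cell columns (those column names are my assumption):

    # Fan-in per target cell: how many distinct source cells feed each cell.
    fan_in = excit_df.groupby('to_cell')['from_cell'].nunique()
    fan_in.sort_values(ascending=False).head(20)  # the most converged-upon cells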

The code takes pairs; there is currently no null, start-of-word, or start-of-sentence symbol (as used by other tokenizers).

It's a typo for SYNAP_FACTOR. This is a divisor applied to each unit of input current, to prevent one fire alone from causing another fire.
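Roughly, per incoming spike it amounts to this (a sketch with made-up variable names, not the actual code):

    SYNAP_FACTOR = 10  # example value only

    # Each presynaptic spike injects just a fraction of the firing threshold,
    # so one upstream fire alone can never push a cell past threshold; it takes
    # roughly SYNAP_FACTOR near-coincident inputs to trigger a spike.
    voltage += spike_current / SYNAP_FACTOR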

1 Like

This sounds more like graph analysis than a Spiking Neural Net. I don't know how that would be done.

1 Like

It might not be. Each path would be cell-to-cell, but we'd be looking at multiple paths to define a cluster. And even a path should probably be some form of SDR, not just a sequence of single cells.

Anyway, it would be the multiple paths of a cluster which specify the behaviour we'd be looking for. The multiple paths of a cluster would be the equivalent of current HTM "training" of a path. The multiplicity would be what identifies it as a repeated pattern.

It's just that each of them, individually, would be filtered on an entire sequence. The connection letter to letter would not really be between letters, but between paths of letters leading up to the current letter.

2 Likes

"Graph analysis" might be a good way to characterize this. I want to find clusters of paths in a graph which share beginning and end points.
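For instance, with networkx (only to illustrate the framing; the graph itself would come from the synapse table, and excit_edges / begin_cell / end_cell are assumed names):

    import networkx as nx

    # Directed graph built from (from_cell, to_cell) connection pairs.
    G = nx.DiGraph(excit_edges)

    # All simple paths between a shared begin and end point; a large bundle
    # of such paths is the "cluster" being looked for.
    paths = list(nx.all_simple_paths(G, source=begin_cell, target=end_cell, cutoff=10))
    print(len(paths))  # path multiplicity ~ cluster size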

1 Like

Each letter in the prompt is driven independently.

The two 0.5 densities there specify that half of a letter's columns are chosen randomly, then half of each chosen column's cells are chosen randomly, to spike:

    prompt_col_density=0.5,  # how many columns per letter to spike
    prompt_cel_density=0.5,  # how many cells per column to spike
    prompt_pace=10,  # time step distance between letter spikes
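In effect, per letter occurrence, something like this happens (a sketch with assumed names; letter_cols, CELLS_PER_COLUMN and spike() are not the actual identifiers):

    import numpy as np

    rng = np.random.default_rng()

    # Pick half of the columns allotted to this letter...
    cols = rng.choice(letter_cols, size=len(letter_cols) // 2, replace=False)
    for col in cols:
        # ...then half of the cells within each chosen column, and spike them.
        cells = rng.choice(CELLS_PER_COLUMN, size=CELLS_PER_COLUMN // 2, replace=False)
        spike(col, cells)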

I honestly have no clue why you think we have "path" info encoded by the synapses/connections established by the current algorithm, so I have no idea how to follow any path in driving the prompt.

1 Like

I see it as exactly analogous to the way it is done in HTM now. In HTM now, as I understand it, path is encoded by chaining only certain cells within a column. The subset of cells activated for a column encodes the path which led to its activation.

Am I understanding that wrong? How do you see it being done in HTM now?

HTM "trains" these paths as one way to define meaning. Meaning is equated to repetition. So only the repeated paths are recorded.

At this stage, when dealing with words, I'm basically proposing the same thing. The "word" will be defined as repeated paths through a sequence of letters. I'm just proposing that the repetition should be identified not by training a single cell, as with HTM, but by a method recognizing the multiplicity another way: simply as the sheer number of different paths through the columns of the same letters.

But at base, at this level, the representation of "path" as coded by only a subset of cells in a column is the same as for HTM.
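A toy illustration (cell indices invented): with, say, 8 cells per column, the same letter column can be active via different cell subsets, and only the subset carries the path:

    # Both represent the 'e' column being active; the differing cell
    # subsets distinguish the contexts that led there.
    e_after_th   = {0, 3, 5}  # 'e' as in "the"
    e_after_spik = {1, 4, 6}  # 'e' as in "spike"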

That might be a problem. Maybe for the initial letter in a sequence, where the "context" is null, a random selection might be appropriate. Though probably "null" should be coded the same way each time too. And half the columns and half their cells sounds like too much. I don't know. Maybe that's OK. There might be sufficient diversity among collections of even half the elements to distinguish a large number of paths.

But in terms of driving the network for the initial prompt, and sequential activations of the network, the selection of cells to spike at each occurrence of a letter should be specific to the path which has led up to that occurrence of that letter, not random. At least that's the way I'm understanding the problem at the moment.
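One cheap way to make the selection path-specific rather than random would be to key it on the path so far (a sketch; the hashing scheme is purely my assumption, not a proposal for the real encoding):

    import hashlib

    def cells_for(letter: str, path: str, cells_per_column: int, k: int) -> list[int]:
        """Pick up to k cells for letter, deterministically keyed by the path so far."""
        digest = hashlib.sha256(f"{path}|{letter}".encode()).digest()
        picked = []
        for byte in digest:  # derive distinct cell indices from the digest
            idx = byte % cells_per_column
            if idx not in picked:
                picked.append(idx)
            if len(picked) == k:
                break
        return picked  # may fall short of k in pathological cases

    cells_for("e", "spik", cells_per_column=16, k=4)  # same path -> same cells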

2 Likes

What you are discussing is so far from what the biology does that a detail such as self-connection of a neuron in the cortex is a meaningless question. (By the way: no.)

I have been following the discussion so far with about the same feelings I have when analyzing a deep network trained with backpropagation: a cool trick, but with no real relationship to the biology.

4 Likes

I think I understand how HTM does it just as you do. But we have intentionally avoided "HTM training" altogether, so I don't think we are getting the "same" thing HTM has.

My problem is exactly the lack of such an algorithm in my mind for the time being, i.e. how do we achieve that?

2 Likes

I'm saying the "same" in the sense that HTM trains repeated sequences by training a sequence of single cells, whereas what is proposed here would identify (not train) repeated sequences, also by repetition, but this time repetition of the same sequence using different paths through cells.

What's the "same" is the repetition, not the mechanism: training of single paths vs. clustering of multiple paths.

Note also that I'm saying they are the "same" at the word level. What's crucial, and interesting, is that the method I'm proposing here generalizes, and does something different above the word level, at the phrase level.

But as an initial task, I think replicating word level behaviour is a simple place to start.

I guess it is up to me to identify how to change the code to do that.

I don't think it should be hard. It looked to me from the code that you join synapses from cell to cell. So the forward activation should be cell-to-cell. And because it is cell-to-cell, it is specific to a given path.

Can you link here to the specific section of code where activation is passed forward?

I don't think anyone has proposed self-connection in a neuron. The self-connection I'm talking about is that of a path to itself. So it is more by way of a grand loop over many neurons, and many columns. Even self-connection of a letter to itself should be in the context of a path through those letters: a connection from a subset of cells in the columns representing a letter, to another subset of cells in the columns of that letter. Since a cell of a column represents a path leading to the letter represented by that column, a connection of a cell to itself would be kind of meaningless, representing a path which contains itself, like Russell's set which contains itself, which is equally an impossible concept.

I'm not saying that anything discussed here matches biology at the level of resolution of HTM (except maybe the representation of paths, so far?).

2 Likes