The coding of longer sequences in HTM SDRs

Good point. What happens if a node self-references SYNAP_FACTOR + 1 times?

1 Like

Then it'll keep firing, yes; that becomes concerning...

2 Likes

Is it also increased by compacted nodes with effis > 1?

1 Like

Yes. Effectively (ignoring the technical details of compaction) they are the same case, from a logical perspective.

2 Likes

Thanks @complyue !
I'm not saying we need to close off any of these 'edge cases' here and now, but it is good to get a common understanding of what could happen.
The data from the trivial set I'm running now suggests it's not a big deal at the moment.

2 Likes

How do SNN simulations deal with that? Is it viable to just elide self-excitations altogether?
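By "elide" I mean something like this (a minimal sketch, assuming a dense weight matrix W indexed [post, pre]; that layout is only for illustration, not necessarily how the synapses are actually stored):

    import numpy as np

    # Hypothetical dense synapse matrix: W[post, pre] = connection weight.
    W = np.random.rand(8, 8)

    # Eliding self-excitation = zeroing the diagonal,
    # so no cell can drive itself directly.
    np.fill_diagonal(W, 0.0)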

1 Like

:handshake: I'm a big fan of this school of thinking too!

1 Like

You can think of self-oscillation as a feature. Regulation happens in many ways in different networks.
We'd need @Bitking (or similar) to comment from a biological perspective.
I don't believe HTM allows self-connections. Anyone?

2 Likes

I think I spoke too soon.
We probably don't care which cell-cell pattern there is, only which col-col pattern. We may even only care at the col-group (aka letter) level.

# Column-level self-connections: rows whose source and target column coincide.
self_excit_df = excit_df[excit_df.from_column == excit_df.to_column]
self_excit_df['from_column'].value_counts()

(EDIT: the line below is wrong)
At the col-col level we have several dozen cols self-firing and >500 self-connections.

1 Like

I was just thinking the opposite: that we need to focus more on the cell-cell pattern. That should tame self-activation a lot, because each cell, or rather each subset of cells, embodies a path back through the sequence. So we would only be talking about self-activation for a path, by a path.

Actually I was starting to wonder how @complyue drives the letters. Do you drive entire columns, or only the cells corresponding to the current path?

For the first letter of a single word "prompt" the path might start at zero. But "no context" could be taken as a context too.

What's the SNAP_FACTOR?

1 Like

You basically want self-activation for a path. That's what is supposed to give us the oscillations. But it should only sum to something significant for large clusters of paths. That way it gives us a measure of the size of the cluster. And that size of cluster is what we're really looking for... large clusters which share beginning and end points. But the cluster must be filtered on entire paths, otherwise the letter representation immediately blows up from total connectivity.

1 Like

Probably just as well. Firing is voltage-driven at the cell level, which means that the previous analysis is wrong. What is needed is to count the "fan-in" of multiple source cells to one cell only.
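Something along these lines, assuming excit_df also carries per-cell endpoints in from_cell / to_cell columns (those column names are my assumption):

    # Fan-in per target cell: how many distinct source cells feed each cell.
    fan_in = excit_df.groupby('to_cell')['from_cell'].nunique()
    fan_in.sort_values(ascending=False).head(20)  # the most converged-upon cells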

The code takes pairs; there is currently no null, start-of-word, or start-of-sentence symbol (as used by other tokenizers).

It's a typo for SYNAP_FACTOR. This is a divisor applied to each unit of input current, to prevent one fire alone from causing another fire.
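Roughly, per incoming spike it amounts to this (a sketch with made-up variable names, not the actual code):

    SYNAP_FACTOR = 10  # example value only

    # Each presynaptic spike injects just a fraction of the firing threshold,
    # so one upstream fire alone can never push a cell past threshold; it takes
    # roughly SYNAP_FACTOR near-coincident inputs to trigger a spike.
    voltage += spike_current / SYNAP_FACTOR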

1 Like

This sounds more like graph analysis than a Spiking Neural Net. I don't know how that would be done.

1 Like

It might not be. Each path would be cell-to-cell, but we'd be looking at multiple paths to define a cluster. And even a path should probably be some form of SDR, not just a sequence of single cells.

Anyway, it would be the multiple paths of a cluster which specify the behaviour we'd be looking for. The multiple paths of a cluster would be the equivalent of current HTM "training" of a path. The multiplicity would be what identifies it as a repeated pattern.

It's just that each of them, individually, would be filtered on an entire sequence. The connection letter to letter would not really be between letters, but between paths of letters leading up to the current letter.

2 Likes

"Graph analysis" might be a good way to characterize this. I want to find clusters of paths in a graph which share beginning and end points.
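For instance, with networkx (only to illustrate the framing; the graph itself would come from the synapse table, and excit_edges / begin_cell / end_cell are assumed names):

    import networkx as nx

    # Directed graph built from (from_cell, to_cell) connection pairs.
    G = nx.DiGraph(excit_edges)

    # All simple paths between a shared begin and end point; a large bundle
    # of such paths is the "cluster" being looked for.
    paths = list(nx.all_simple_paths(G, source=begin_cell, target=end_cell, cutoff=10))
    print(len(paths))  # path multiplicity ~ cluster size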

1 Like

Each letter in the prompt is driven independently.

The two 0.5 densities there specify that half of a letter's columns are chosen randomly, then half of each chosen column's cells are chosen randomly, to spike:

    prompt_col_density=0.5,  # how many columns per letter to spike
    prompt_cel_density=0.5,  # how many cells per column to spike
    prompt_pace=10,  # time step distance between letter spikes
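In effect, per letter occurrence, something like this happens (a sketch with assumed names; letter_cols, CELLS_PER_COLUMN and spike() are not the actual identifiers):

    import numpy as np

    rng = np.random.default_rng()

    # Pick half of the columns allotted to this letter...
    cols = rng.choice(letter_cols, size=len(letter_cols) // 2, replace=False)
    for col in cols:
        # ...then half of the cells within each chosen column, and spike them.
        cells = rng.choice(CELLS_PER_COLUMN, size=CELLS_PER_COLUMN // 2, replace=False)
        spike(col, cells)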

I honestly have no clue why you think we have "path" info encoded by the synapses/connections established by the current algorithm, so I have no idea how to follow any path in driving the prompt.

1 Like

I see it as exactly analogous to the way it is done in HTM now. In HTM now, as I understand it, path is encoded by chaining only certain cells within a column. The subset of cells activated for a column encodes the path which led to its activation.

Am I understanding that wrong? How do you see it being done in HTM now?

HTM "trains" these paths as one way to define meaning. Meaning is equated to repetition. So only the repeated paths are recorded.

At this stage, when dealing with words, I'm basically proposing the same thing. The "word" will be defined as repeated paths through a sequence of letters. I'm just proposing that the repetition should be identified not by training a single cell, as with HTM, but by a method recognizing the multiplicity another way: simply as the sheer number of different paths through the columns of the same letters.

But at base, at this level, the representation of "path" as coded by only a subset of cells in a column is the same as for HTM.
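A toy illustration (cell indices invented): with, say, 8 cells per column, the same letter column can be active via different cell subsets, and only the subset carries the path:

    # Both represent the 'e' column being active; the differing cell
    # subsets distinguish the contexts that led there.
    e_after_th   = {0, 3, 5}  # 'e' as in "the"
    e_after_spik = {1, 4, 6}  # 'e' as in "spike"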

That might be a problem. Maybe for the initial letter in a sequence, where the "context" is null, a random selection might be appropriate. Though probably "null" should be coded the same way each time too. And half the columns and half their cells sounds like too much. I don't know. Maybe that's OK. There might be sufficient diversity among collections of even half the elements to distinguish a large number of paths.

But in terms of driving the network for the initial prompt, and sequential activations of the network, the selection of cells to spike at each occurrence of a letter should be specific to the path which has led up to that occurrence of that letter, not random. At least that's the way I'm understanding the problem at the moment.
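One cheap way to make the selection path-specific rather than random would be to key it on the path so far (a sketch; the hashing scheme is purely my assumption, not a proposal for the real encoding):

    import hashlib

    def cells_for(letter: str, path: str, cells_per_column: int, k: int) -> list[int]:
        """Pick up to k cells for letter, deterministically keyed by the path so far."""
        digest = hashlib.sha256(f"{path}|{letter}".encode()).digest()
        picked = []
        for byte in digest:  # derive distinct cell indices from the digest
            idx = byte % cells_per_column
            if idx not in picked:
                picked.append(idx)
            if len(picked) == k:
                break
        return picked  # may fall short of k in pathological cases

    cells_for("e", "spik", cells_per_column=16, k=4)  # same path -> same cells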

2 Likes

What you are discussing is so far from what the biology does that a detail such as self-connection of a neuron in the cortex is a meaningless question. (By the way: no.)

I have been following the discussion so far with about the same feelings I have when analyzing a deep network trained with backpropagation: a cool trick, but with no real relationship to the biology.

4 Likes

I think I understand how HTM does it just as you do. But we have intentionally avoided "HTM training" altogether, so I don't think we are getting the "same" thing HTM has.

My problem is exactly the lack of such an algorithm in my mind for the time being, i.e. how do we achieve that?

2 Likes

I'm saying the "same" in the sense that HTM trains repeated sequences by training a sequence of single cells, whereas what is proposed here would identify (not train) repeated sequences, also by repetition, but this time repetition of the same sequence using different paths through cells.

What's the "same" is the repetition, not the mechanism: training of single paths vs. clustering of multiple paths.

Note also that I'm saying they are the "same" at the word level. What's crucial, and interesting, is that the method I'm proposing here generalizes, and does something different above the word level, at the phrase level.

But as an initial task, I think replicating word level behaviour is a simple place to start.

I guess it is up to me to identify how to change the code to do that.

I don't think it should be hard. It looked to me from the code that you join synapses from cell to cell. So the forward activation should be cell-to-cell. And because it is cell-to-cell, it is specific to a given path.

Can you link here to the specific section of code where activation is passed forward?

I don't think anyone has proposed self-connection in a neuron. The self-connection I'm talking about is that of a path to itself. So it is more by way of a grand loop over many neurons, and many columns. Even self-connection of a letter to itself should be in the context of a path through those letters: a connection from a subset of cells in the columns representing a letter, to another subset of cells in the columns of that letter. Since a cell of a column represents a path leading to the letter represented by that column, a connection of a cell to itself would be kind of meaningless, representing a path which contains itself, like Russell's set which contains itself, which is equally an impossible concept.

I'm not saying that anything discussed here matches biology at the level of resolution of HTM (except maybe the representation of paths, so far?).

2 Likes