Thanks @complyue !
I'm not saying we need to close off any of these "edge cases" here and now - but it is good to get a common understanding of what could happen.
The data from the trivial set I'm running now suggests it's not a big deal at the moment.
You can think of self-oscillation as a feature. Regulation happens in many ways in different networks.
We'd need @Bitking (or similar) to comment from a biological perspective.
I don't believe HTM allows self-connections. Anyone?
I think I spoke too soon.
We probably don't care which cell-cell pattern there is - only which col-col pattern. We may even only care at the col-group (aka letter) level.
```python
# excit_df: the excitatory connections, with from_column / to_column fields
self_excit_df = excit_df[excit_df.from_column == excit_df.to_column]
self_excit_df['from_column'].value_counts()
```
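If we ever do want to look at the col-group (aka letter) level instead, a roll-up along these lines could give it. This is only a sketch: col_to_letter is an assumed {column_id: letter} mapping, not something in the current code.

```python
# Hypothetical roll-up of the per-column self-connection counts to the
# col-group (letter) level; col_to_letter is an assumed {column_id: letter} dict.
letter_counts = self_excit_df['from_column'].map(col_to_letter).value_counts()
```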
(EDIT this line below is wrong)
At the col-col level we have several dozen cols self-firing and >500 self-connections.
I was just thinking the opposite: that we need to focus more on the cell-cell pattern. That should tame self activation a lot, because each cell, or rather each subset of cells, embodies a path back through the sequence. So we would only be talking about self activation for a path, by a path.
Actually I was starting to wonder how @complyue drives the letters. Do you drive entire columns, or only the cells corresponding to the current path?
For the first letter of a single-word "prompt" the path might start at zero. But "no context" could be taken as a context too.
You basically want self activation for a path. That's what is supposed to give us the oscillations. But it should only sum to something significant for large clusters of paths. That way it gives us a measure of the size of the cluster. And that size of the cluster is what we're really looking for... large clusters which share beginning and end points. But the cluster must be filtered on entire paths, otherwise the letter representation immediately blows up from total connectivity.
Probably just as well. Firing is voltage driven at a cell level, which means the previous analysis is wrong. What is needed is to count the "fan-in" of multiple source cells to one cell only.
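Something like this might do that count, as a sketch against the same excit_df; the from_cell / to_cell field names are my guess, not necessarily what the dataframe actually uses.

```python
# Fan-in: how many distinct source cells feed each target cell.
# from_cell / to_cell are assumed field names.
fan_in = excit_df.groupby('to_cell')['from_cell'].nunique()
fan_in.sort_values(ascending=False).head(20)  # targets with the largest fan-in
```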
The code takes pairs; there is no null, start-of-word, or start-of-sentence symbol currently (as used by other tokenizers).
It's a typo for SYNAP_FACTOR. This is a divisor applied to each unit of input current, to prevent one fire causing another fire.
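Roughly the effect, as a standalone sketch (the value and threshold here are illustrative, not the actual code): each incoming spike contributes only a fraction of the firing threshold, so a single presynaptic fire cannot trigger a fire on its own.

```python
SYNAP_FACTOR = 5        # hypothetical value, used only for this sketch
FIRE_THRESHOLD = 1.0

def cell_fires(n_incoming_spikes: int) -> bool:
    # Each incoming spike adds 1/SYNAP_FACTOR of the threshold, so it takes
    # SYNAP_FACTOR simultaneous inputs to make this cell fire.
    current = n_incoming_spikes / SYNAP_FACTOR
    return current >= FIRE_THRESHOLD
```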
It might not be. Each path would be cell-to-cell, but we'd be looking at multiple paths to define a cluster. And even a path should probably be some form of SDR, not just a sequence of single cells.
Anyway, it would be the multiple paths of a cluster which specify the behaviour we'd be looking for. The multiple paths of a cluster would be the equivalent of current HTM "training" of a path. The multiplicity would be what identifies it as a repeated pattern.
It's just that each of them, individually, would be filtered on an entire sequence. The letter-to-letter connection would not really be between letters, but between paths of letters leading up to the current letter.
Each letter in the prompt is driven independently.
The two 0.5 densities there specify that half of a letter's columns are chosen randomly, then half of each chosen column's cells are chosen randomly, to spike.
```python
prompt_col_density=0.5,  # how many columns per letter to spike
prompt_cel_density=0.5,  # how many cells per column to spike
prompt_pace=10,          # time step distance between letter spikes
```
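Roughly, the selection works like this sketch (illustrative only, not the real driving code; the function name and arguments are made up for the example):

```python
import random

def drive_letter(letter_columns, cells_per_column,
                 prompt_col_density=0.5, prompt_cel_density=0.5):
    # Choose half the letter's columns at random, then half of each chosen
    # column's cells at random, and return the cells to spike.
    n_cols = max(1, int(len(letter_columns) * prompt_col_density))
    n_cells = max(1, int(cells_per_column * prompt_cel_density))
    spikes = {}
    for col in random.sample(letter_columns, n_cols):
        spikes[col] = random.sample(range(cells_per_column), n_cells)
    return spikes
```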
I totally have no clue why you think we have "path" info encoded by the synapses/connections established by the current algorithm, so no idea how to follow any path in driving the prompt.
I see it as exactly analogous to the way it is done in HTM now. In HTM now, as I understand it, a path is encoded by chaining only certain cells within a column. The subset of cells activated for a column encodes the path which led to its activation.
Am I understanding that wrong? How do you see it being done in HTM now?
HTM "trains" these paths as one way to define meaning. Meaning is equated to repetition. So only the repeated paths are recorded.
At this stage, when dealing with words, I'm basically proposing the same thing. The "word" will be defined as repeated paths through a sequence of letters. I'm just proposing the repetition should be identified not by training a single cell, as with HTM, but by a method that recognizes the multiplicity another way, simply as the sheer number of different paths through the columns of the same letters.
But at base, at this level, the representation of "path" as coded by only a subset of cells in a column is the same as for HTM.
That might be a problem. Maybe for the initial letter in a sequence, where the "context" is null, a random selection might be appropriate. Though probably "null" should be coded the same way each time too. And half the columns and half their cells sounds like too much. I don't know. Maybe that's OK. There might be sufficient diversity among collections of even half the elements to distinguish a large number of paths.
But in terms of driving the network for the initial prompt, and sequential activations of the network, the selection of cells to spike at each occurrence of a letter should be specific to a given path which has led up to that occurrence of that letter, not random. At least that's the way I'm understanding the problem at the moment.
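One way I could imagine making the selection path-specific, purely as an illustration (the function and the hashing trick are my own invention here, not a concrete proposal for the code):

```python
import hashlib

def path_cells(letter, context, cells_per_column, n_active=8):
    # Derive a repeatable subset of cells for `letter` from the path (context)
    # that led to it, so the same path always activates the same cells, and the
    # empty ("null") context is coded the same way every time as well.
    seed = hashlib.sha256(f"{context}|{letter}".encode()).digest()
    return sorted({seed[i] % cells_per_column for i in range(n_active)})

path_cells('t', 'promp', cells_per_column=32)  # same path -> same cells
path_cells('p', '', cells_per_column=32)       # null context, still deterministic
```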
What you are discussing is so far from what the biology does that any detail such as self-connection in a neuron in the cortex is a meaningless question. (By the way - no.)
I have been following the discussion so far with about the same feelings I have when analyzing a deep network trained with backpropagation: cool trick, but no real relationship to the biology.
I think I understand how HTM does it the same way you do. But we have intentionally avoided "HTM training" altogether, so I don't think we are getting the "same" thing HTM has.
The very problem for me is the lack of such an algorithm in my mind for the time being, i.e. how to achieve that?
I'm saying the "same" in the sense that HTM trains repeated sequences by training a sequence of single cells. Whereas what is proposed here would identify (not train) repeated sequences, also by repetition, but repetition of the same sequence using different paths through cells this time.
What's the "same" is the repetition; what differs is the training of single paths vs. clustering of multiple paths.
Note also that I'm saying they are the "same" at the word level. What's crucial, and interesting, is that the method I'm proposing here generalizes, and does something different above the word level, at the phrase level.
But as an initial task, I think replicating word level behaviour is a simple place to start.
I guess it is up to me to identify how to change the code to do that.
I don't think it should be hard. It looked to me from the code that you join synapses from cell to cell. So the forward activation should be cell-to-cell. And because it is cell-to-cell, it is specific to a given path.
Can you link here the specific section of code where activation is passed forward?
I don't think anyone has proposed self-connection in a neuron. The self-connection I'm talking about is that of a path to itself. So more by way of a grand loop over many neurons, and many columns. Even self-connection of a letter to itself should be in the context of a path through those letters, so a connection from a subset of cells in columns representing a letter, to another subset of cells in the columns of that letter. Since a cell of a column represents a path leading to the letter represented by that column, a connection of a cell to itself would be kind of meaningless, representing a path which contains itself, like Russell's set which contains itself, which is equally an impossible concept.
Not saying that anything discussed here matches biology at the level of resolution of HTM (except maybe the representation of paths, so far?)