The coding of longer sequences in HTM SDRs

2 Likes

Still very far from the distributed representation that is the coding in the cortex. You are essentially describing "grandmother cells."

The "subset of cells in columns representing a letter, to another subset of cells in the columns of that letter" bit? You have much finer parsing between regions of processing. I call it micro-parsing, but that is not a real, generally recognized term. (It should be!) At the region/map level you have a population code that represents features at that level of representation. The idea that there is letter and word within a given map/region does not match up to anything that I am used to seeing in the biology.

The problem in how this corresponds to your proposals is that the multi-region model with the required interconnection paths would be so large as to be intractable with current technology, so you are making toy models that do not do a very good job explaining the processing in the brain.

And you are not doing anything with the zoo of different types of cells working as an ensemble. The dynamics of different inhibitory cells are pretty hard for most spiking models to represent accurately. I don't recall seeing a spiking model that does a good job with chandelier cells, which are key to HTM temporal processing.

Likewise, the interplay between the cortex and thalamus with bursting is not something that I have ever seen with spiking simulators.

The example that springs to mind is using transistors to explain a digital computer; making a full-up digital computer out of discrete transistors may be out of reach for the casual experimenter, so you rig a handful up to make an analog computer. You may find that it does have interesting behaviors, such as calculating certain results, but it offers no real understanding of how a spreadsheet calculates an answer.

3 Likes

Yeah, I guess it's this bit:

    prompt_col_density=0.5,  # fraction of a letter's columns to spike
    prompt_cel_density=0.5,  # fraction of each column's cells to spike

If you're spiking a fixed number of columns and cells, that's probably going to squash any sequence code. I think it should probably only spike the cell synapsed to.
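
To make that concrete, here is a minimal sketch of the alternative I have in mind (the names `letter_cells`, `presynaptic` and `active_cells` are hypothetical stand-ins, not the notebook's actual structures): only cells of the prompted letter that receive a synapse from a currently active cell would spike, rather than a fixed 50% density.

    # Hypothetical sketch: spike only the prompted letter's cells that have an
    # active presynaptic partner, instead of a fixed fraction of columns/cells.
    def cells_to_spike(letter_cells, presynaptic, active_cells):
        """Cells of the letter with at least one currently active input synapse."""
        targeted = set()
        for cell in letter_cells:
            if active_cells & presynaptic.get(cell, set()):
                targeted.add(cell)
        return targeted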

Or am I interpreting it wrong?

Would that not work? What effect would that have?

2 Likes

The algorithm to identify such cells?


Prompting is separate code:

2 Likes

I think you might be misinterpreting that. The columns represent the letter. They're an SDR, not a grandmother cell. If the cells represent a path through letters, that is exactly the same as HTM to my understanding. Only less specific (trained) to a single cell in what I'm proposing.

The zoo of different types of cells working as an ensemble, I grant you. I'm thinking entirely in terms of sequence representation elements. Whatever they could be. The equivalent of the CLA in HTM. So perhaps one layer at this point. In particular I'm interested to hear about different inhibitory cells. The ones I'm thinking of at this stage inhibit fairly globally. I think you said there's a kind which has that property. What other kinds are there?

But the mechanism I'm thinking about might be on the level of the sequence representation mechanism in that paper you cited earlier. Here:

I think they conjectured it as a mechanism for where… part of the parietal cortex??

1 Like

Perhaps I'm misinterpreting your code. To me it looks like you spike half the columns and cells for the whole letter if any cell gets enough input. And I'm thinking only the cell which has the inbound synapses should spike.

Am I understanding that wrong?

I can't distinguish, with this, whether you spike only the subset of cells specific to the sequence so far. I'm suspecting not.

Gotta shut down now. Late. Will look more tomorrow.

1 Like

The implementation so far has two separate drivers of spiking:

  1. prompted spikes - half the columns and half the cells of each column, regardless of the synapses they have.

  2. input current from presynaptic cells induces the postsynaptic cell to fire, once its voltage exceeds the threshold.

An algorithm to combine the two sources of firing is still lacking.
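
One possible way to combine them, just as a sketch under assumptions (the names and the voltage-update style here are mine, not the notebook's actual API): treat the prompt as an extra input current, so both sources pass through the same threshold test.

    import numpy as np

    # Sketch only: combine prompt drive and synaptic drive as currents in one
    # integrate-and-fire style update. All names here are hypothetical.
    SPIKE_THRES = 1.0
    PROMPT_CURRENT = 0.6   # how strongly a prompt pushes its cells toward threshold

    def step(voltage, synaptic_input, prompt_mask):
        """voltage, synaptic_input: float arrays over cells; prompt_mask: bool array."""
        voltage = voltage + synaptic_input                  # source 2: presynaptic input
        voltage = voltage + PROMPT_CURRENT * prompt_mask    # source 1: prompt as extra current
        fired = voltage >= SPIKE_THRES                      # one threshold test for both
        voltage[fired] = 0.0                                # reset fired cells to rest
        return voltage, fired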

2 Likes

Another paper I point to on a regular basis:

It is a deep look at multiple regions of the brain working together.

Since it attempts to do a realistic description of the basic task of tracking a ball moving through the visual field, it is about as simple a model as you could construct that still engages the key mechanisms of the cortex and thalamus.

No letters or words, just a big fat ball.

Even with that, this is some tough sledding and beyond the comfort zone for many on this forum. Lots of TLDR action here.

If you do have the background and do the work, there is a wealth of understanding of how the brain does things. The answer to your "part of the parietal cortex??" question is here, if you want to do the work to understand it. I really don't want to come off as condescending, and I could see someone taking it that way, but a reasonably complete and correct answer to your question is complicated.

I really don't want to offer a simple answer that is a lie by omission. If that is an OK answer for you, then: all of the brain is involved.

1 Like

I think you said that well.

If I may add one other possible cause of confusion: the training set (Brown corpus) is used only to form synapses - it does not run the network (no steps taken).

Prompting uses an implicit lookup/mapping to find the column group (aka the letter) and chooses what to do next at runtime (as in 1 above).

2 Likes

How many cells self-reference? (from other cells in the same column)
So, redoing that analysis properly, a default run shows:

  • 4 cells with 3 self references
  • 25 cells with 2 self references
  • 565 cells with 1 self reference
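
For reference, a sketch of how such a count could be made, assuming a hypothetical list of (pre_cell, post_cell) synapse pairs and a column_of lookup (the actual notebook structures differ):

    from collections import Counter

    # Sketch: count, per cell, how many presynaptic cells sit in the same column
    # ("self references"), then histogram those counts. `synapses` and `column_of`
    # are hypothetical stand-ins for the notebook's real data structures.
    def self_reference_histogram(synapses, column_of):
        per_cell = Counter()
        for pre, post in synapses:
            if pre != post and column_of(pre) == column_of(post):
                per_cell[post] += 1
        return Counter(per_cell.values())   # e.g. {1: 565, 2: 25, 3: 4}
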
2 Likes

Did some tinkering myself, committed the NB:

Major mods:

N_CELLS_PER_COL=10,  # per mini-column capacity

instead of 100

lnet.learn_words(brown.words()) 

instead of

lnet.learn_words_as_sequence(brown.words())

    # global scaling factor, to accommodate a unit synaptic efficacy value of 1.0
    # roughly this specifies:
    #   how many presynaptic spikes are enough to trigger a postsynaptic spike,
    #   when each synapse has a unit efficacy value of 1.0
    SYNAP_FACTOR=300,

instead of 5

fire_dots_alpha=0.05,

instead of 0.01

'defcafe' is kind of a magic prompt in this case; 'quick' or 'brown' won't trigger oscillations at all.

1 Like

I'm totally on board with "all of the brain working together". I just don't think the way the whole brain works together to combine old elements to construct new meaning has been addressed as effectively as other aspects. That's why I liked the paper you referenced, because it did address that problem, and particularly the paper referenced within that, specifically looking at combinations of elements to construct new meaning.

So I think this combination of elements to construct new meaning is just a less addressed area, and basically the missing piece.

It need not invalidate other perspectives. I think embodiment, down to the detail, will be important to substantiate qualia, for instance. That's a big thing.

A lot depends on the detail you want to model. To place my ideas in the context of current AI, using the common "mastery of flight" analogy, I would compare deep learning to balloons - actually working, but in a completely different way to birds - transformers maybe as airships (someone described our current time as the "exciting Zeppelin phase of AI", recently?), and a truly biological model as the whole bird. And I see myself in that, I hope, in the same relation as the Wright Brothers: abstracting as much as is needed of the bird, but without feeling the need to be constrained by feathers and blood vessels, useful perspectives as they too may be.

In that I actually don't think my level of abstraction differs too much from Jeff Hawkins' original conception of HTM.

And I'm working very much at a level of detail similar to what I remember of the CLA (Cortical Learning Algorithm), and the representation of sequences as paths between cells in columns in particular.

1 Like

Can someone explain what this program is showing on that chart please?

1 Like

ā€œ2ā€ seems fine. To combine ā€œ2ā€ and ā€œ1ā€, you might just drive only the prompt cells appropriate to the prompt context.

But for driving prompts you're right, that does raise other issues.

Firstly it raises the issue of what maximum path length we want to encode.

The maximum path length coded might come down to the number of cells and columns we have. Those will impose a maximum coding depth for sequences. There will be a maximum length of sequence it will be possible to encode by selecting subsets of cells. Especially if the subset is 50% of column cells, as at present(?)
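
As a rough back-of-envelope illustration, taking as assumptions the defaults quoted elsewhere in the thread (100 cells per column, 50% of them active): the number of distinct 50% subsets of a single column is astronomically large, so the practical depth limit will come from overlap and overwriting between subsets rather than from running out of distinct codes.

    import math

    # Back-of-envelope capacity check, assuming the defaults quoted elsewhere in
    # the thread: 100 cells per column, 50% of them active per code.
    cells_per_col = 100
    active_per_col = 50

    distinct_subsets = math.comb(cells_per_col, active_per_col)
    print(f"distinct 50% subsets of one column: {float(distinct_subsets):.2e}")
    # ~1.01e+29 -- so the depth limit comes from overlap/interference between
    # subsets, not from the raw count of distinct subsets.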

And then we come to the problem of how to drive partially matching sequences. It's easy if the sequence is an exact match. You just drive the next cell that sequence synapses with. But if the sequence is only a partial match, at the extreme, when you're at the first letter of an isolated word prompt, what cells do we choose…

Ideally we might drive with a strength related to the fit between a full path and the partial path.

So, ideally, an isolated sequence like "quick" might be driven on the cells of each letter, with a strength proportional to the length of match of the preceding sequence. So maybe "q" gets a very weak drive on all its cells (because there is no match with any preceding sequence), "u" gets a slightly stronger drive on the cells of all sequences ending in "q", "i" a slightly stronger drive again on cells of all recorded (Brown Corpus) sequences ending in "q-u", etc.
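
A sketch of that graded drive, where cells_for_suffix and drive_cells are hypothetical helpers and the linear scaling rule is just one possible choice:

    # Sketch of graded prompt drive: each letter of an isolated prompt is driven on
    # the cells recorded (in the corpus) as following the preceding context, with a
    # strength that grows with the length of that matched context.
    def drive_prompt(word, cells_for_suffix, drive_cells, base=0.1, step_gain=0.2):
        for i, letter in enumerate(word):
            context = word[:i]                            # preceding letters, "" for the first
            targets = cells_for_suffix(letter, context)   # cells of `letter` seen after `context`
            strength = base + step_gain * len(context)    # longer match -> stronger drive
            drive_cells(targets, strength)

    # e.g. drive_prompt("quick", ...) drives "q" weakly on all its cells,
    # "u" a bit more strongly on cells recorded after "q", and so on.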

The same for feedforward synapse activations.

It also raises the issue of what to do when you have a long sequence with just one character different.

It also raises the issue of how to identify the cells for all recorded sequences ending in a shorter sub-sequence.

I guess it's done as a property of an SDR representing the path. Very specific paths, paths reaching back to the maximum distance capable of being encoded with an SDR of a given size, might have larger numbers of cells active (to give the depth of encoding necessary), while shorter paths might have smaller SDRs.

Perhaps the way to do it is to start with very small SDRs when coding a sequence, and gradually expand them. Then when we do the feedforward, the longer paths will be driven on a larger SDR, and so give a stronger, but more specific, signal.

That's just an initial idea about how we might address this. I'll put it out there for comment, and do some more thinking about it myself.

Yeah. So the way I've sketched a more nuanced "lookup" above is as an expanding path SDR, so that longer paths have more cells active, giving both a stronger and a more specific signal.

Re. the corpus being used "only to form synapses" and not to "run the network", I don't follow the objection there. Maybe you're saying the same thing about how to match a driving/feedforward signal from a shorter sequence. My solution, sketched above, is to start with sequences represented as smaller SDRs (instead of 50% of column cells, as at the moment?), and then to expand them as the recorded sequence grows.

What's the difference here? It looks like you stopped learning sequences??

I'm not sufficiently familiar with what SYNAP_FACTOR and fire_dots_alpha are doing at the moment to comment. My impression is that this is just to vary the strength of synapses?

Nice to get some oscillation though. Playing around with it to see what "tuning" gives us the kind of signal we want is a good way to get an idea of what we need.

2 Likes

It's a raster plot: a plot of cell firing, with time on the x-axis and cells (grouped into columns, roughly) on the y-axis.

In particular, this is a raster plot of the cell firings of a network that @complyue has coded up, which captures all the sequences of letters in a corpus of text (the Brown Corpus in this case).

He then drives the network by spiking cells in a prompt sequence, and sees how the activation spreads across synapses representing observed sequences of letters in the Brown Corpus.

To start interpreting it, look first at the extreme left. The very first cells to fire will be exactly those of the driving "prompt".

If you're talking about the last chart he posted in this thread, you can see on the extreme top left some spikes for "x". Then, lower down, a bit later, spikes for "d", "e", "f", "c", "a", "f", and "e", in sequence. That's what he's chosen to use to drive, or "prompt", the network, as a test. The other cell firings are the way those initial prompt spikes spread, and then repeated, or oscillated, as activation circulated around the sequences of letters observed in the Brown Corpus.

We want to tune the network so that sequences of letters which tend to share beginning and end points in the observed sequences of the Brown Corpus will synchronize, and give us nice vertical lines we can use to identify words, and later phrases.
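
For anyone who wants to reproduce the general idea of such a plot, here is a generic sketch with made-up spike data (not @complyue's actual plotting code):

    import numpy as np
    import matplotlib.pyplot as plt

    # Generic raster-plot sketch with random spike data (not the notebook's code).
    rng = np.random.default_rng(0)
    n_cells, n_steps = 200, 500
    spikes = rng.random((n_cells, n_steps)) < 0.02    # boolean spike matrix: cell x time

    cell_idx, time_idx = np.nonzero(spikes)
    plt.figure(figsize=(10, 4))
    plt.scatter(time_idx, cell_idx, s=4, marker='|')  # one tick per spike
    plt.xlabel("time step")
    plt.ylabel("cell (grouped into columns, roughly)")
    plt.title("raster plot of cell firing")
    plt.show()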

2 Likes

That's what it does now.

The questions are about how many of the letter-mapped columns/cells to fire, e.g. 'q' has 10 columns each with 100 cells by default. There is no analysis of what those cells link to - if anything. This also currently does not make any synapses - so you are assuming an existing network built by the separate corpus mapping functions (run before the network started).

The reason you know a 'q' group exists is because the whole alphabet (a-z lowercase) is hard-coded into the data structures in advance.
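
Roughly this kind of hard-coded mapping, as a sketch (the real data structures differ; the sizes are the defaults quoted above):

    import string

    # Sketch of the hard-coded letter -> column-group mapping described above.
    # The notebook's real structures differ; sizes are the quoted defaults.
    N_COLS_PER_LETTER = 10
    N_CELLS_PER_COL = 100

    letter_columns = {
        letter: range(i * N_COLS_PER_LETTER, (i + 1) * N_COLS_PER_LETTER)
        for i, letter in enumerate(string.ascii_lowercase)
    }
    # letter_columns['q'] -> columns 160..169, each holding N_CELLS_PER_COL cells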

This could be different but that is what exists now.

1 Like

If you look at the code you can see that the difference is that the new one does not run words together when making links (but the other one did). Both make sequences, but logically these are now shorter (word length not sentence/corpus length).

1 Like

Thought some more about this. Is the path code currently done by:

Actually I didn't recall it was done as one-cell connections only. I thought there was a place where you coded it as 50% of column cells connected to cells in the next letter.

Looking at it now, I don't think one cell is enough. I think the path encoding needs to be an SDR.

So when we encode the sequences from the corpus, let's try synapsing a random subset of cells from each letter to the next (or synapses from the cells synapsed in the last sequence step, to a random set in the next…). It needs to be a set. I was wrong to say just code the path as a connection from one cell to another. (That works for HTM because it codes a pattern on a dendrite of the cell?? We need a pattern too, but it can be among the cells?)

Anyway, if we code the sequence path using a subset of cells, or an SDR across the cells, the limited size of the subset will mean the code for long sequences will eventually wash out, because the random choice at each step will automatically start to overwrite remnant code from longer sequences. So the SDR will code (= limit the activation of?) a sequence of some length, but not infinite length.
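
A minimal sketch of that chained-subset encoding, where the subset size and the helpers are made up for illustration:

    import random

    # Sketch: encode a letter sequence as a chain of small random cell subsets,
    # synapsing each step's subset onto a freshly sampled subset in the next letter.
    # `cells_of(letter)` is a hypothetical lookup of all cells in that letter's columns.
    def encode_path(word, cells_of, synapses, subset_size=8):
        prev_subset = random.sample(list(cells_of(word[0])), subset_size)
        for letter in word[1:]:
            next_subset = random.sample(list(cells_of(letter)), subset_size)
            for pre in prev_subset:
                for post in next_subset:
                    synapses.add((pre, post))   # pre -> post connection
            prev_subset = next_subset
        # Over many encoded sequences the small random subsets increasingly overlap,
        # overwriting remnants of older/longer paths -- the wash-out described above.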

Then, how to represent shorter strings as subsets of that SDR…

As an algorithm, I would say when driving prompt states:

  1. "Burst" a prompt letter with no context. So, spike "all" its cells.
  2. Then, for the next letter of the prompt, spike only the cells of the next prompt letter which synapse from the first. Those will automatically be a superset of what the spiking cells would have been for longer paths (because the spiking cells of longer sequences would have been selected from that superset by the paths).
  3. Spike only those cells from the third prompt letter which synapse from the second (also automatically filtered to those which synapsed from the first).

I think that automatically implements coding partial matches of shorter sequences with longer sequences in terms of their path cells (though actually the reverse of what I initially suggested, because it codes shorter paths as larger SDRs).
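
A sketch of those three steps, where cells_of, postsynaptic_of and spike are assumed helpers rather than the notebook's actual functions:

    # Sketch of the prompting algorithm in steps 1-3 above: burst the first letter,
    # then at each later letter spike only its cells reached by synapses from the
    # previously spiked set.
    def prompt_word(word, cells_of, postsynaptic_of, spike):
        active = set(cells_of(word[0]))    # 1. burst: spike "all" cells of the first letter
        spike(active)
        for letter in word[1:]:
            # 2./3. only this letter's cells that synapse from the previous set
            active = set(cells_of(letter)) & postsynaptic_of(active)
            spike(active)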

I thought it was driving 50% of each randomly. Note by "driving" I'm talking about prompts. Not the way that propagates further into the network.

I can't figure out what problem you are pointing to here. I'm assuming a network built by corpus mapping functions run before the network started, yes. But when that's done there are synapses. And those can limit what fires…

I guess I am missing the problem you are pointing to.

With this, are you objecting to an arbitrary assignment of SDR code to each letter? Those could be built from lower level data; previously we discussed building them from some kind of neuron response to sound. But since the process of interest is that of building the next level of structure above at each stage, it doesn't matter much where you start, and taking it from a hard-coded letter representation is just a place to start.

If that's what you mean.

Oh, thanks for clarifying that for me. That's not such a big deal then.

2 Likes

@robf I realize that I missed one important point in the synapse/connection making algorithm, that the selected cells to connect should be in a single thread of spike-train (am I using this term right?). I think I was wrong in my previous implementation about that point, thus this change to fix it:

With this realization, I now get how the "path" info is indeed encoded. And based on the new understanding, I drafted a new "blur"-based prompting method, such that at any time step when there is a letter to prompt:

  • the voltages of all cells except those belonging to the prompt letter get scaled down by prompt_blur=0.8
  • when any cell belonging to the prompt letter is going to fire, leave all prompted cells as they are, i.e. as driven by the network's dynamics
  • when none of the cells belonging to the prompt letter is going to fire, force all of them to fire

Code location:


For "blur" to be safely performed by simple scaling, VOLT_REST is now hardcoded to 0.0 and SPIKE_THRES is hardcoded to 1.0.
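
In outline the rule reads something like this, as a paraphrased sketch (not the actual code at that location; the array names are assumptions):

    import numpy as np

    # Paraphrased sketch of the "blur" prompting rule described above.
    # `voltage` is a float array over all cells, `prompt_mask` a bool array marking
    # the prompt letter's cells. With VOLT_REST == 0.0, scaling a voltage also
    # scales its distance from rest, which is why the blur can be a bare multiply.
    VOLT_REST, SPIKE_THRES = 0.0, 1.0
    prompt_blur = 0.8

    def apply_blur_prompt(voltage, prompt_mask):
        voltage[~prompt_mask] *= prompt_blur        # damp every non-prompt cell
        if not np.any(voltage[prompt_mask] >= SPIKE_THRES):
            voltage[prompt_mask] = SPIKE_THRES      # no prompt cell about to fire: force them all
        # else: prompt cells about to fire are left to the network's own dynamics
        return voltage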

I've also updated the NBs and get sorta game-changing results; please pull the latest commits and run the new NBs to see and tinker.

2 Likes

Yes. "Single thread of spike-train" makes sense to me. And similar to the way it is done now in HTM. Except the "threads of spike-train" are not trained.

As I was sketching above, I now think that such a single-cell train of activation likely needs to be made an SDR "train": perhaps randomly selected for the first element in a sequence, but from then on synapsing only from that initially selected SDR group, and to a randomly selected group in the next element.

Using an SDR "train" rather than a single-cell "train" should automatically limit the length of sequence which is coded (though I suppose that would happen with a single-cell train too, which would eventually run out of unique cells to synapse to).

But I can imagine anything which helps move the code to be one of paths rather than isolated states should indeed make a "game-changing" difference :slight_smile:

I'm confused that it still seems to be states other than the driven states which have continuing oscillations, though. I would have thought the main difference of isolating paths would have been to restrict activation to only those states on the driven path, and other paths with similar beginning and end points.

Don't see the significance of doing this. It further emphasizes the prompt state activity?

1 Like