The coding of longer sequences in HTM SDRs

2 Likes

Still very far from the distributed representation that is the coding in the cortex. You are essentially describing "grandmother cells."

The "subset of cells in columns representing a letter, to another subset of cells in the columns of that letter" bit? You have much finer parsing between regions of processing. I call it micro-parsing, but that is not a real, generally recognized term. (It should be!) At the region/map level you have a population code that represents features at that level of representation. The idea that there is letter and word within a given map/region does not match up to anything that I am used to seeing in the biology.

The problem in how this corresponds to your proposals is that the multi-region model with the required interconnection paths would be so large as to be intractable with current technology, so you are making toy models that do not do a very good job explaining the processing in the brain.

And you are not doing anything with the zoo of different types of cells working as an ensemble. The dynamics of different inhibitory cells are pretty hard for most spiking models to represent accurately. I don't recall seeing a spiking model that does a good job with chandelier cells, which are key to HTM temporal processing.

Likewise, the interplay between the cortex and thalamus with bursting is not something that I have ever seen with spiking simulators.

The example that springs to mind is using transistors to explain a digital computer; making a full-up digital computer out of discrete transistors may be out of reach for the casual experimenter, so you rig a handful up to make an analog computer. You may find that it does have interesting behaviors, such as calculating certain results, but it offers no real understanding of how a spreadsheet calculates an answer.

3 Likes

Yeah, I guess it's this bit:

    prompt_col_density=0.5,  # fraction of a letter's columns to spike
    prompt_cel_density=0.5,  # fraction of each column's cells to spike

If you're spiking a fixed number of columns and cells, that's probably going to squash any sequence code. I think it should probably only spike the cell synapsed to.
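
To make that concrete, here is a minimal sketch of the alternative I have in mind (the names `letter_cells`, `presynaptic` and `active_cells` are hypothetical stand-ins, not the notebook's actual structures): only cells of the prompted letter that receive a synapse from a currently active cell would spike, rather than a fixed 50% density.

    # Hypothetical sketch: spike only the prompted letter's cells that have an
    # active presynaptic partner, instead of a fixed fraction of columns/cells.
    def cells_to_spike(letter_cells, presynaptic, active_cells):
        """Cells of the letter with at least one currently active input synapse."""
        targeted = set()
        for cell in letter_cells:
            if active_cells & presynaptic.get(cell, set()):
                targeted.add(cell)
        return targeted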

Or am I interpreting it wrong?

Would that not work? What effect would that have?

2 Likes

The algorithm to identify such cells?


Prompting is separate code:

2 Likes

I think you might be misinterpreting that. The columns represent the letter. They're an SDR, not a grandmother cell. If the cells represent a path through letters, that is exactly the same as HTM to my understanding. Only less specific (trained) to a single cell in what I'm proposing.

The zoo of different types of cells working as an ensemble, I grant you. I'm thinking entirely in terms of sequence representation elements. Whatever they could be. The equivalent of the CLA in HTM. So perhaps one layer at this point. In particular I'm interested to hear about different inhibitory cells. The ones I'm thinking of at this stage inhibit fairly globally. I think you said there's a kind which has that property. What other kinds are there?

But the mechanism I'm thinking about might be on the level of the sequence representation mechanism in that paper you cited earlier. Here:

I think they conjectured it as a mechanism for where… part of the parietal cortex??

1 Like

Perhaps I'm misinterpreting your code. To me it looks like you spike half the columns and cells for the whole letter if any cell gets enough input. And I'm thinking only the cell which has the inbound synapses should spike.

Am I understanding that wrong?

I can't distinguish, with this, whether you spike only the subset of cells specific to the sequence so far. I'm suspecting not.

Gotta shut down now. Late. Will look more tomorrow.

1 Like

The implementation so far has two separate drivers of spiking:

  1. prompted spikes - half the columns and half the cells of each column, regardless of the synapses they have.

  2. input current from presynaptic cells induces the postsynaptic cell to fire, once its voltage exceeds the threshold.

An algorithm to combine the two sources of firing is still lacking.
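
One possible way to combine them, just as a sketch under assumptions (the names and the voltage-update style here are mine, not the notebook's actual API): treat the prompt as an extra input current, so both sources pass through the same threshold test.

    import numpy as np

    # Sketch only: combine prompt drive and synaptic drive as currents in one
    # integrate-and-fire style update. All names here are hypothetical.
    SPIKE_THRES = 1.0
    PROMPT_CURRENT = 0.6   # how strongly a prompt pushes its cells toward threshold

    def step(voltage, synaptic_input, prompt_mask):
        """voltage, synaptic_input: float arrays over cells; prompt_mask: bool array."""
        voltage = voltage + synaptic_input                  # source 2: presynaptic input
        voltage = voltage + PROMPT_CURRENT * prompt_mask    # source 1: prompt as extra current
        fired = voltage >= SPIKE_THRES                      # one threshold test for both
        voltage[fired] = 0.0                                # reset fired cells to rest
        return voltage, fired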

2 Likes

Another paper I point to on a regular basis:

It is a deep look at multiple regions of the brain working together.

Since it attempts to do a realistic description of the basic task of tracking a ball moving through the visual field, it is about as simple a model as you could construct that still engages the key mechanisms of the cortex and thalamus.

No letters or words, just a big fat ball.

Even with that, this is some tough sledding and beyond the comfort zone for many on this forum. Lots of TLDR action here.

If you do have the background and do the work, there is a wealth of understanding of how the brain does things. The answer to your "part of the parietal cortex??" question is here, if you want to do the work to understand it. I really don't want to come off as condescending, and I could see someone taking it that way, but a reasonably complete and correct answer to your question is complicated.

I really don't want to offer a simple answer that is a lie by omission. If that is an OK answer for you, then: all of the brain is involved.

1 Like

I think you said that well.

If I may add one other possible cause of confusion: the training set (Brown corpus) is used only to form synapses - it does not run the network (no steps taken).

Prompting uses an implicit lookup/mapping to find the column group (aka the letter) and chooses what to do next at runtime (as in 1 above).

2 Likes

How many cells self-reference? (from other cells in the same column)
So, redoing that analysis properly, a default run shows:

  • 4 cells with 3 self references
  • 25 cells with 2 self references
  • 565 cells with 1 self reference
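
For reference, a sketch of how such a count could be made, assuming a hypothetical list of (pre_cell, post_cell) synapse pairs and a column_of lookup (the actual notebook structures differ):

    from collections import Counter

    # Sketch: count, per cell, how many presynaptic cells sit in the same column
    # ("self references"), then histogram those counts. `synapses` and `column_of`
    # are hypothetical stand-ins for the notebook's real data structures.
    def self_reference_histogram(synapses, column_of):
        per_cell = Counter()
        for pre, post in synapses:
            if pre != post and column_of(pre) == column_of(post):
                per_cell[post] += 1
        return Counter(per_cell.values())   # e.g. {1: 565, 2: 25, 3: 4}
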
2 Likes

Did some tinkering myself, committed the NB:

Major mods:

N_CELLS_PER_COL=10,  # per mini-column capacity

instead of 100

lnet.learn_words(brown.words()) 

instead of

lnet.learn_words_as_sequence(brown.words())

    # global scaling factor, to accommodate a unit synaptic efficacy value of 1.0
    # roughly this specifies:
    #   how many presynaptic spikes are enough to trigger a postsynaptic spike,
    #   when each synapse has a unit efficacy value of 1.0
    SYNAP_FACTOR=300,

instead of 5

fire_dots_alpha=0.05,

instead of 0.01

'defcafe' is kind of a magic prompt in this case; 'quick' or 'brown' won't trigger oscillations at all.

1 Like

I'm totally on board with "all of the brain working together". I just don't think the way the whole brain works together to combine old elements to construct new meaning has been addressed as effectively as other aspects. That's why I liked the paper you referenced, because it did address that problem, and particularly the paper referenced within that, specifically looking at combinations of elements to construct new meaning.

So I think this combination of elements to construct new meaning is just a less addressed area, and basically the missing piece.

It need not invalidate other perspectives. I think embodiment, down to the detail, will be important to substantiate qualia, for instance. That's a big thing.

A lot depends on the detail you want to model. To place my ideas in the context of current AI, using the common "mastery of flight" analogy, I would compare deep learning to balloons - actually working, but in a completely different way to birds - transformers maybe as airships (someone described our current time as the "exciting Zeppelin phase of AI", recently?), and a truly biological model as the whole bird. And I see myself in that, I hope, in the same relation as the Wright Brothers: abstracting as much as is needed of the bird, but without feeling the need to be constrained by feathers and blood vessels, useful perspectives as they too may be.

In that I actually don't think my level of abstraction differs too much from Jeff Hawkins' original conception of HTM.

And I'm working very much at a level of detail similar to what I remember of the CLA (Cortical Learning Algorithm), and the representation of sequences as paths between cells in columns in particular.

1 Like

Can someone explain what this program is showing on that chart please?

1 Like

ā€œ2ā€ seems fine. To combine ā€œ2ā€ and ā€œ1ā€, you might just drive only the prompt cells appropriate to the prompt context.

But for driving prompts you're right, that does raise other issues.

Firstly it raises the issue of what maximum path length we want to encode.

The maximum path length coded might come down to the number of cells and columns we have. Those will impose a maximum coding depth for sequences. There will be a maximum length of sequence it will be possible to encode by selecting subsets of cells. Especially if the subset is 50% of column cells, as at present(?)
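
As a rough back-of-envelope illustration, taking as assumptions the defaults quoted elsewhere in the thread (100 cells per column, 50% of them active): the number of distinct 50% subsets of a single column is astronomically large, so the practical depth limit will come from overlap and overwriting between subsets rather than from running out of distinct codes.

    import math

    # Back-of-envelope capacity check, assuming the defaults quoted elsewhere in
    # the thread: 100 cells per column, 50% of them active per code.
    cells_per_col = 100
    active_per_col = 50

    distinct_subsets = math.comb(cells_per_col, active_per_col)
    print(f"distinct 50% subsets of one column: {float(distinct_subsets):.2e}")
    # ~1.01e+29 -- so the depth limit comes from overlap/interference between
    # subsets, not from the raw count of distinct subsets.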

And then we come to the problem of how to drive partially matching sequences. It's easy if the sequence is an exact match. You just drive the next cell that sequence synapses with. But if the sequence is only a partial match, at the extreme, when you're at the first letter of an isolated word prompt, what cells do we choose…

Ideally we might drive with a strength related to the fit between a full path and the partial path.

So, ideally, an isolated sequence like "quick" might be driven on the cells of each letter, with a strength proportional to the length of match of the preceding sequence. So maybe "q" gets a very weak drive on all its cells (because there is no match with any preceding sequence), "u" gets a slightly stronger drive on the cells of all sequences ending in "q", "i" a slightly stronger drive again on cells of all recorded (Brown Corpus) sequences ending in "q-u", etc.
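
A sketch of that graded drive, where cells_for_suffix and drive_cells are hypothetical helpers and the linear scaling rule is just one possible choice:

    # Sketch of graded prompt drive: each letter of an isolated prompt is driven on
    # the cells recorded (in the corpus) as following the preceding context, with a
    # strength that grows with the length of that matched context.
    def drive_prompt(word, cells_for_suffix, drive_cells, base=0.1, step_gain=0.2):
        for i, letter in enumerate(word):
            context = word[:i]                            # preceding letters, "" for the first
            targets = cells_for_suffix(letter, context)   # cells of `letter` seen after `context`
            strength = base + step_gain * len(context)    # longer match -> stronger drive
            drive_cells(targets, strength)

    # e.g. drive_prompt("quick", ...) drives "q" weakly on all its cells,
    # "u" a bit more strongly on cells recorded after "q", and so on.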

The same for feedforward synapse activations.

It also raises the issue of what to do when you have a long sequence with just one character different.

It also raises the issue of how to identify the cells for all recorded sequences ending in a shorter sub-sequence.

I guess it's done as a property of an SDR representing the path. Very specific paths, paths reaching back to the maximum distance capable of being encoded with an SDR of a given size, might have larger numbers of cells active (to give the depth of encoding necessary), while shorter paths might have smaller SDRs.

Perhaps the way to do it is to start with very small SDRs when coding a sequence, and gradually expand them. Then when we do the feedforward, the longer paths will be driven on a larger SDR, and so give a stronger, but more specific, signal.

That's just an initial idea about how we might address this. I'll put it out there for comment, and do some more thinking about it myself.

Yeah. So the way I've sketched a more nuanced "lookup" above is as an expanding path SDR, so that longer paths have more cells active, giving both a stronger and a more specific signal.

Re. the corpus being used "only to form synapses" and not to "run the network", I don't follow the objection there. Maybe you're saying the same thing about how to match a driving/feedforward signal from a shorter sequence. My solution, sketched above, is to start with sequences represented as smaller SDRs (instead of 50% of column cells, as at the moment?), and then to expand them as the recorded sequence grows.

What's the difference here? It looks like you stopped learning sequences??

I'm not sufficiently familiar with what SYNAP_FACTOR and fire_dots_alpha are doing at the moment to comment. My impression is that this is just to vary the strength of synapses?

Nice to get some oscillation though. Playing around with it to see what "tuning" gives us the kind of signal we want is a good way to get an idea of what we need.

2 Likes

It's a raster plot: a plot of cell firing, with time on the x-axis and cells (grouped into columns, roughly) on the y-axis.

In particular, this is a raster plot of the cell firings of a network that @complyue has coded up, which captures all the sequences of letters in a corpus of text (the Brown Corpus in this case).

He then drives the network by spiking cells in a prompt sequence, and sees how the activation spreads across synapses representing observed sequences of letters in the Brown Corpus.

To start interpreting it, look first at the extreme left. The very first cells to fire will be exactly those of the driving "prompt".

If you're talking about the last chart he posted in this thread, you can see on the extreme top left some spikes for "x". Then, lower down, a bit later, spikes for "d", "e", "f", "c", "a", "f", and "e", in sequence. That's what he's chosen to use to drive, or "prompt", the network, as a test. The other cell firings are the way those initial prompt spikes spread, and then repeated, or oscillated, as activation circulated around the sequences of letters observed in the Brown Corpus.

We want to tune the network so that sequences of letters which tend to share beginning and end points in the observed sequences of the Brown Corpus will synchronize, and give us nice vertical lines we can use to identify words, and later phrases.
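
For anyone who wants to reproduce the general idea of such a plot, here is a generic sketch with made-up spike data (not @complyue's actual plotting code):

    import numpy as np
    import matplotlib.pyplot as plt

    # Generic raster-plot sketch with random spike data (not the notebook's code).
    rng = np.random.default_rng(0)
    n_cells, n_steps = 200, 500
    spikes = rng.random((n_cells, n_steps)) < 0.02    # boolean spike matrix: cell x time

    cell_idx, time_idx = np.nonzero(spikes)
    plt.figure(figsize=(10, 4))
    plt.scatter(time_idx, cell_idx, s=4, marker='|')  # one tick per spike
    plt.xlabel("time step")
    plt.ylabel("cell (grouped into columns, roughly)")
    plt.title("raster plot of cell firing")
    plt.show()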

2 Likes

That's what it does now.

The questions are about how many of the letter-mapped columns/cells to fire, e.g. 'q' has 10 columns each with 100 cells by default. There is no analysis of what those cells link to - if anything. This also currently does not make any synapses - so you are assuming an existing network built by the separate corpus mapping functions (run before the network started).

The reason you know a 'q' group exists is because the whole alphabet (a-z lowercase) is hard-coded into the data structures in advance.
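
Roughly this kind of hard-coded mapping, as a sketch (the real data structures differ; the sizes are the defaults quoted above):

    import string

    # Sketch of the hard-coded letter -> column-group mapping described above.
    # The notebook's real structures differ; sizes are the quoted defaults.
    N_COLS_PER_LETTER = 10
    N_CELLS_PER_COL = 100

    letter_columns = {
        letter: range(i * N_COLS_PER_LETTER, (i + 1) * N_COLS_PER_LETTER)
        for i, letter in enumerate(string.ascii_lowercase)
    }
    # letter_columns['q'] -> columns 160..169, each holding N_CELLS_PER_COL cells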

This could be different but that is what exists now.

1 Like

If you look at the code you can see that the difference is that the new one does not run words together when making links (but the other one did). Both make sequences, but logically these are now shorter (word length not sentence/corpus length).

1 Like

Thought some more about this. Is the path code currently done by:

Actually I didn't recall it was done as one-cell connections only. I thought there was a place where you coded it as 50% of column cells connected to cells in the next letter.

Looking at it now, I don't think one cell is enough. I think the path encoding needs to be an SDR.

So when we encode the sequences from the corpus, let's try synapsing a random subset of cells from each letter to the next (or synapses from the cells synapsed in the last sequence step, to a random set in the next…). It needs to be a set. I was wrong to say just code the path as a connection from one cell to another. (That works for HTM because it codes a pattern on a dendrite of the cell?? We need a pattern too, but it can be among the cells?)

Anyway, if we code the sequence path using a subset of cells, or an SDR across the cells, the limited size of the subset will mean the code for long sequences will eventually wash out, because the random choice at each step will automatically start to overwrite remnant code from longer sequences. So the SDR will code (= limit the activation of?) a sequence of some length, but not infinite length.
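
A minimal sketch of that chained-subset encoding, where the subset size and the helpers are made up for illustration:

    import random

    # Sketch: encode a letter sequence as a chain of small random cell subsets,
    # synapsing each step's subset onto a freshly sampled subset in the next letter.
    # `cells_of(letter)` is a hypothetical lookup of all cells in that letter's columns.
    def encode_path(word, cells_of, synapses, subset_size=8):
        prev_subset = random.sample(list(cells_of(word[0])), subset_size)
        for letter in word[1:]:
            next_subset = random.sample(list(cells_of(letter)), subset_size)
            for pre in prev_subset:
                for post in next_subset:
                    synapses.add((pre, post))   # pre -> post connection
            prev_subset = next_subset
        # Over many encoded sequences the small random subsets increasingly overlap,
        # overwriting remnants of older/longer paths -- the wash-out described above.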

Then, how to represent shorter strings as subsets of that SDR…

As an algorithm, I would say when driving prompt states:

  1. "Burst" a prompt letter with no context. So, spike "all" its cells.
  2. Then, for the next letter of the prompt, spike only the cells of the next prompt letter which synapse from the first. Those will automatically be a superset of what the spiking cells would have been for longer paths (because the spiking cells of longer sequences would have been selected from that superset by the paths).
  3. Spike only those cells from the third prompt letter which synapse from the second (also automatically filtered to those which synapsed from the first).

I think that automatically implements coding partial matches of shorter sequences with longer sequences in terms of their path cells (though actually the reverse of what I initially suggested, because it codes shorter paths as larger SDRs).
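
A sketch of those three steps, where cells_of, postsynaptic_of and spike are assumed helpers rather than the notebook's actual functions:

    # Sketch of the prompting algorithm in steps 1-3 above: burst the first letter,
    # then at each later letter spike only its cells reached by synapses from the
    # previously spiked set.
    def prompt_word(word, cells_of, postsynaptic_of, spike):
        active = set(cells_of(word[0]))    # 1. burst: spike "all" cells of the first letter
        spike(active)
        for letter in word[1:]:
            # 2./3. only this letter's cells that synapse from the previous set
            active = set(cells_of(letter)) & postsynaptic_of(active)
            spike(active)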

I thought it was driving 50% of each randomly. Note by "driving" I'm talking about prompts. Not the way that propagates further into the network.

I can't figure out what problem you are pointing to here. I'm assuming a network built by corpus mapping functions run before the network started, yes. But when that's done there are synapses. And those can limit what fires…

I guess I am missing the problem you are pointing to.

With this, are you objecting to an arbitrary assignment of SDR code to each letter? Those could be built from lower level data; previously we discussed building them from some kind of neuron response to sound. But since the process of interest is that of building the next level of structure above at each stage, it doesn't matter much where you start, and taking it from a hard-coded letter representation is just a place to start.

If that's what you mean.

Oh, thanks for clarifying that for me. That's not such a big deal then.

2 Likes

@robf I realize that I missed one important point in the synapse/connection making algorithm, that the selected cells to connect should be in a single thread of spike-train (am I using this term right?). I think I was wrong in my previous implementation about that point, thus this change to fix it:

With this realization, I now get how the "path" info is indeed encoded. And based on the new understanding, I drafted a new "blur"-based prompting method, such that at any time step when there is a letter to prompt:

  • the voltages of all cells except those belonging to the prompt letter get scaled down by prompt_blur=0.8
  • when any cell belonging to the prompt letter is going to fire, leave all prompted cells as they are, i.e. as driven by the network's dynamics
  • when none of the cells belonging to the prompt letter is going to fire, force all of them to fire

Code location:


For "blur" to be safely performed by simple scaling, VOLT_REST is now hardcoded to 0.0 and SPIKE_THRES is hardcoded to 1.0.
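
In outline the rule reads something like this, as a paraphrased sketch (not the actual code at that location; the array names are assumptions):

    import numpy as np

    # Paraphrased sketch of the "blur" prompting rule described above.
    # `voltage` is a float array over all cells, `prompt_mask` a bool array marking
    # the prompt letter's cells. With VOLT_REST == 0.0, scaling a voltage also
    # scales its distance from rest, which is why the blur can be a bare multiply.
    VOLT_REST, SPIKE_THRES = 0.0, 1.0
    prompt_blur = 0.8

    def apply_blur_prompt(voltage, prompt_mask):
        voltage[~prompt_mask] *= prompt_blur        # damp every non-prompt cell
        if not np.any(voltage[prompt_mask] >= SPIKE_THRES):
            voltage[prompt_mask] = SPIKE_THRES      # no prompt cell about to fire: force them all
        # else: prompt cells about to fire are left to the network's own dynamics
        return voltage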

I've also updated the NBs and get sorta game-changing results; please pull the latest commits and run the new NBs to see and tinker.

2 Likes

Yes. "Single thread of spike-train" makes sense to me. And similar to the way it is done now in HTM. Except the "threads of spike-train" are not trained.

As I was sketching above, I now think that such a single-cell train of activation likely needs to be made an SDR "train": perhaps randomly selected for the first element in a sequence, but from then on synapsing only from that initially selected SDR group, and to a randomly selected group in the next element.

Using an SDR "train" rather than a single-cell "train" should automatically limit the length of sequence which is coded (though I suppose that would happen with a single-cell train too, which would eventually run out of unique cells to synapse to).

But I can imagine anything which helps move the code to be one of paths rather than isolated states should indeed make a "game-changing" difference :slight_smile:

I'm confused that it still seems to be states other than the driven states which have continuing oscillations, though. I would have thought the main difference of isolating paths would have been to restrict activation to only those states on the driven path, and other paths with similar beginning and end points.

Don't see the significance of doing this. It further emphasizes the prompt state activity?

1 Like