Lexicon Induction From Temporal Structure



I have an idea about how to solve this first problem…

I started by looking at Numenta’s 2-layer HTM circuit, in which Numenta assigns and holds constant the state of the output layer’s Spatial Pooler. The purpose of the output layer Spatial Pooler is to recognise sequences of activity in the input layer and assign a stable representation to them.

How can I determine whether the output layer’s Spatial Pooler is forming representations of whole sequences? By training the circuit, measuring the output layer Spatial Pooler mini-column overlap between all of the elements in a sequence, and verifying that this overlap is significantly greater than the overlap between unrelated sequences. Hereafter I call this the sequence-overlap property.
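Here is a rough sketch of how the sequence-overlap property could be checked. It is not taken from any HTM library: the function names, the choice to normalise overlap by the smaller set, and the `margin` used to stand in for "significantly greater" are all my own assumptions for illustration. Each element of a sequence is the set of output layer mini-column indices that were active for it.

```python
from itertools import combinations


def overlap(a, b):
    """Fraction of shared active mini-columns, relative to the smaller set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def mean_within_overlap(sequence):
    """Average overlap over every pair of elements in one sequence."""
    pairs = list(combinations(sequence, 2))
    return sum(overlap(a, b) for a, b in pairs) / len(pairs)


def mean_between_overlap(seq_a, seq_b):
    """Average overlap between elements of two unrelated sequences."""
    total = sum(overlap(a, b) for a in seq_a for b in seq_b)
    return total / (len(seq_a) * len(seq_b))


def has_sequence_overlap_property(seq_a, seq_b, margin=0.2):
    """True if within-sequence overlap beats between-sequence overlap.

    `margin` is an arbitrary stand-in for "significantly greater".
    """
    within = min(mean_within_overlap(seq_a), mean_within_overlap(seq_b))
    between = mean_between_overlap(seq_a, seq_b)
    return within - between >= margin


# Toy output-layer activations (sets of active mini-column indices).
seq_a = [{0, 1, 2, 3}, {0, 1, 2, 4}, {0, 1, 3, 5}]
seq_b = [{10, 11, 12, 13}, {10, 11, 12, 14}, {10, 11, 13, 15}]
print(has_sequence_overlap_property(seq_a, seq_b))
```

In a real experiment the sets would come from the trained circuit, and a proper statistical test would be a better choice than a fixed margin.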

Then I asked how I could change the output layer Spatial Pooler such that it naturally has this sequence-overlap property. The best idea I have is to measure the sequence-overlap property and force it to be true by activating mini-columns when the overlap is less than some predetermined threshold. The output layer Spatial Pooler mini-columns can then become activated in two ways: from proximal input via the regular method, OR because the mini-column was previously used to represent an ongoing sequence and is now being selected to continue representing it.

HTMs are unsupervised algorithms. The sequences of input (encountered in the wild) are never clearly delineated; where one thing ends and the next begins is something the brain is supposed to figure out on its own. So I make the assumption that every two consecutive moments in time are part of the same sequence.

The output layer spatial pooler then:

  1. Measures the overlap between its current activations and the previous timestep’s activations.
  2. If that overlap is less than a predetermined threshold (such as 50% overlap), then mini-columns which were active on the previous timestep but are no longer active are re-activated so that they continue to represent this sequence. When choosing which ones to re-activate, mini-columns are selected on the basis of how well they recognise their current input.
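The two steps above can be sketched roughly as follows. This is only an illustration, not the proof of concept itself: `enforce_continuity`, `overlap_fraction`, and the `scores` dictionary (mini-column index to proximal overlap with the current input) are hypothetical names, and I am assuming overlap is measured as the fraction of the previous timestep's active mini-columns that are still active. Note that this simple version adds the forced mini-columns on top of the regular ones, so the number of active mini-columns grows when forcing happens.

```python
import math


def overlap_fraction(current, previous):
    """Fraction of the previously active mini-columns that are still active."""
    if not previous:
        return 1.0  # nothing to continue, so nothing needs forcing
    return len(current & previous) / len(previous)


def enforce_continuity(current, previous, scores, threshold=0.5):
    """Step 1: measure overlap. Step 2: force old mini-columns if too low.

    Returns (active mini-columns, mini-columns that were forced on).
    """
    if overlap_fraction(current, previous) >= threshold:
        return set(current), set()

    # How many previously active mini-columns must be carried over.
    needed = math.ceil(threshold * len(previous)) - len(current & previous)

    # Candidates were active last timestep but lost the regular competition.
    # Prefer the ones that recognise the current input best (ties by index).
    candidates = sorted(previous - current,
                        key=lambda col: (-scores.get(col, 0), col))
    forced = set(candidates[:needed])
    return set(current) | forced, forced


previous = {0, 1, 2, 3}
current = {3, 10, 11, 12}           # only 1 of 4 carried over: 25% overlap
scores = {0: 5, 1: 9, 2: 1}         # proximal overlap of the candidates
active, forced = enforce_continuity(current, previous, scores)
print(sorted(active), sorted(forced))
```

A variant that keeps sparsity constant could instead drop the weakest regular winners to make room for the forced mini-columns.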

I have a proof of concept for this mechanism, though it has not been tested on OP’s dataset:

Thank you for reading,


This switching mechanism seems like it would switch too much. For example, think of a cortical column processing some visual input. Both the proximal and distal inputs to the input layer of the circuit might change drastically while inspecting one object. The distal stimulation would change drastically as the eyes saccade around the object. The proximal stimulation might be completely different when looking at one part of the object vs another.


The purpose of this mechanism is to measure and control how much the output layer switches between objects.

Another way of looking at this mechanism is that it assigns output layer mini-columns to represent sequences of input when it detects that the current sequence is not being adequately represented.

Example with visual input:
Let’s say, for this example, that there is an eye looking at a coin, and attached to the eye is our 2-layer HTM circuit. First the coin is showing heads and then it flips over to show tails. When the coin flips from heads to tails the input layer will dramatically change because there is little semantic similarity between the images on the two sides of the coin.

Without this mechanism, the output layer would also dramatically change, resulting in two very different output layer mini-column (OLMC) representations of the coin, one for heads and another for tails. This can be written as Overlap(HEADS, TAILS) ~= 0, where HEADS and TAILS are the output layer mini-column activations for these two visual inputs.

This mechanism detects when Overlap(OLMC(Time), OLMC(Time - 1)) < THRESHOLD.
When the coin flips, this mechanism should detect the sudden change as a sudden decrease in output layer mini-column overlap between consecutive moments in time.

Having detected the sudden change, this mechanism attempts to fix it by forcing some output layer mini-columns to remain active from the previous time step. This ensures that Overlap(OLMC(Time), OLMC(Time - 1)) >= THRESHOLD.

When the coin flips from heads to tails, this mechanism will ensure that at least a THRESHOLD fraction of the HEADS mini-columns are still active while viewing the tails side of the coin, for at least one time step. These forced activations (sampled from HEADS) also learn about the tails side of the coin. Given enough flips of the coin, some of the output layer mini-columns should learn to respond to both heads and tails.
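To make the coin example concrete, here is a toy simulation. Everything in it is an assumption made for illustration and not a real Spatial Pooler: eight mini-columns, four pre-wired to a made-up heads input and four to a made-up tails input, a top-K competition, and a simple permanence increment for every active mini-column (forced or not). The coin flips on every time step.

```python
import math

HEADS_IN = frozenset(range(0, 10))    # input bits for the heads image
TAILS_IN = frozenset(range(10, 20))   # input bits for the tails image
N_COLS, K = 8, 4                      # mini-columns, regular winners per step
CONNECTED, INCREMENT, THRESHOLD = 0.5, 0.2, 0.5


def simulate(n_steps):
    """Alternate heads/tails views; return columns that respond to both."""
    # Columns 0-3 start fully connected to heads, 4-7 to tails.
    perm = [dict() for _ in range(N_COLS)]
    for col in range(N_COLS):
        for bit in (HEADS_IN if col < 4 else TAILS_IN):
            perm[col][bit] = 1.0

    def score(col, inp):
        # Number of active input bits on connected synapses.
        return sum(1 for bit in inp if perm[col].get(bit, 0.0) >= CONNECTED)

    previous = set()
    for t in range(n_steps):
        inp = HEADS_IN if t % 2 == 0 else TAILS_IN
        scores = {c: score(c, inp) for c in range(N_COLS)}

        # Regular activation: top-K columns with any proximal overlap.
        ranked = sorted(range(N_COLS), key=lambda c: (-scores[c], c))
        active = {c for c in ranked[:K] if scores[c] > 0}

        # Forced activation: keep enough of the previous columns alive.
        needed = math.ceil(THRESHOLD * len(previous)) - len(active & previous)
        if needed > 0:
            candidates = sorted(previous - active,
                                key=lambda c: (-scores[c], c))
            active |= set(candidates[:needed])

        # Every active column, forced or not, learns the current input.
        for c in active:
            for bit in inp:
                perm[c][bit] = min(1.0, perm[c].get(bit, 0.0) + INCREMENT)
        previous = active

    return [c for c in range(N_COLS)
            if score(c, HEADS_IN) > 0 and score(c, TAILS_IN) > 0]


print(simulate(2))   # after a single flip
print(simulate(8))   # after several flips
```

With these toy parameters no mini-column responds to both sides after a single flip, but after eight alternating views a few of them do, which is the behaviour described above. The numbers depend entirely on the made-up increment and threshold.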


This is an interesting line of thinking. I think you should follow through with your experiments and let us know what you find.


This is a similar strategy to the one that I am exploring, where the goal is to form an association between two consecutive inputs. However, the implementation I am working on is quite different. Instead of a mechanism for forcing previously active mini-columns to remain active, I am looking at forming proximal connections at the cell level using an algorithm similar to the one used in TM for forming distal connections. The input for this algorithm is the active cells of the inference/TM layer over two time steps (thus forming an association between two inputs).