New TP pseudocode?

Hi All,

So I’m really curious to get a clear overview of what the temporal pooling function does, specifically how it learns stable representations of sequences from temporal memory. I’ve been reading through the TP.py file from /src/nupic/research on github and understanding pieces of it, just wondering if there’s anything out there like those awesome BAMI papers, which overview the Spatial Pooling and Temporal Memory algorithms in detail. Thanks!

– Sam

1 Like

The HTM school videos have good explanations of the spatial pooler:

The temporal memory video is Coming Soon™

2 Likes

Are you saying you want to know how the old algorithm worked? The CLA White Paper is the original source of NuPIC pseudocode, but we've updated TP to be Temporal Memory (TM) now, which is described in BAMI. The TP algorithms in the CLA White Paper are out of date.

Thanks guys! I believe I'm up to date on the mechanisms of the Spatial Pooler and Temporal Memory as detailed in the BAMI papers and the NuPIC code. The reason I ask about the Temporal Pooler independently of that is that I saw it at work in this recent post showing animations of the HTM algorithms learning.

In several of the animations (post 10 out of 19, for instance) there looks to be another mechanism below the Temporal Memory that he calls the 'TP'. The 'TP' seems to create representations that remain stable across pattern sequences of the TM. So there is a set of cells in the TP that remains on over many time steps, as these cells have learned to represent that entire sequence. As he says, the sequence from the TM has basically been collapsed down into a single spatial encoding (a set of cells that recognize the entire sequence). This is really interesting to me because, as he says, these more stable (less frequently changing) patterns could be passed into another region to find larger patterns still (maybe a bit like moving up the cortical hierarchy).

Basically, if there is an established algorithm for these 'TP' cells to learn to recognize and represent entire sequences within the TM, I'm really curious to know how it works! Thanks again,

– Sam

They switched to calling it union pooler rather than temporal pooler.

The most recent implementation that I am aware of. (6 months old)
https://github.com/numenta/nupic.research/blob/86f6d2fe900c8f71becdc451156f16c4103a160c/htmresearch/algorithms/union_temporal_pooler.py

This is the only thing that comes close to pseudocode, but it's just a copy-paste of the implementation with comments. (1 year old)

1 Like

It doesn't seem there's any official pseudocode yet. Here are some ideas, though.

Here's a link to some details on @floybix's implementation, which gives some insight. He's got some other posts on TP also.

From a naive point of view, TP is fairly simple. A cell (or sparse distribution of cells) has a series of dendrite segments that recognize TM patterns for each step of a correctly predicted sequence. The slight hack is that a dendrite excitation causes the neuron to spike, instead of just a depolarization.

This recognition of a sequence is essentially a classification of a feature that then gets passed up to the next region.
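
To make that concrete, here is a minimal toy sketch of the idea (my own invented names and thresholds, not NuPIC code): the pooling cell keeps one segment per sequence step, and it spikes whenever any segment overlaps the current TM activity strongly enough.

# Toy sketch of a "sequence classifier" pooling cell (illustrative only).
# Each segment stores the TM cells that were active at one step of a learned
# sequence; the cell fires whenever any single segment is sufficiently matched.

class SequencePoolingCell(object):
  def __init__(self, activation_threshold=10):
    self.segments = []                       # one set of TM cell indices per sequence step
    self.activation_threshold = activation_threshold

  def learn_sequence(self, tm_activity_per_step):
    """Grow one segment per time step of a correctly predicted sequence."""
    for active_tm_cells in tm_activity_per_step:
      self.segments.append(set(active_tm_cells))

  def compute(self, active_tm_cells):
    """Spike (return True) if any segment recognizes the current TM pattern."""
    active_tm_cells = set(active_tm_cells)
    return any(len(segment & active_tm_cells) >= self.activation_threshold
               for segment in self.segments)

While the learned sequence replays in the TM, a different segment matches at every step, so the cell stays on for the whole sequence, which is the stable representation described above.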

These are only my general thoughts on implementation. Improvisation is fun until we get official pseudocode.

1 Like

Another way I have implemented something similar, for recognizing an object from its features, is that when a cell is predicted multiple times in a row, it becomes active. In my implementation the number of repeats can be configured, but perhaps a simpler implementation would be: if predicted while already in a predictive state, go active.
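
A rough sketch of that rule (the class and parameter names here are invented for illustration, not taken from my actual code): count how many consecutive time steps a cell has been predicted, and activate it once the count reaches a configurable threshold.

# Hypothetical sketch of the "predicted several times in a row -> active" rule.

class RepeatPredictionCell(object):
  def __init__(self, required_consecutive_predictions=3):
    self.required = required_consecutive_predictions
    self.consecutive_predictions = 0

  def compute(self, is_predicted):
    """Count consecutive predictions; go active once the count hits the threshold."""
    if is_predicted:
      self.consecutive_predictions += 1
    else:
      self.consecutive_predictions = 0
    return self.consecutive_predictions >= self.required

With required_consecutive_predictions=2 this reduces to the simpler "if predicted while already in a predictive state, go active" variant.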

There was a discussion on another thread that I can't find offhand, which talked about input from apical dendrites causing a cell to activate if it was in a predictive state from distal dendrites (this would be a similar process to what I have implemented, except it would require a new class of dendrite besides the proximal and distal ones). I haven't had a chance to explore this yet myself, as it starts getting into the concept of hierarchies.

1 Like

Thanks @sebjwallace, I appreciate the detail! I'm definitely much closer to understanding it; I just want to see if I'm interpreting your diagram here correctly (to see what I'm still missing). Here's my take as of now: in this diagram, the cell (one of however many in the 'union pooler') has 5 distal segments, each of which has connected to a subset of the active cells at one of the last 5 time steps. In other words, the first segment (with 'T' above it) connected to a subset of cells that were active at time 'T', whereas the second segment (with 'T+1' above it) connected to a subset of cells that were active at time 'T+1'. The same goes for the other three segments at the three later time steps. The cell containing these 5 segments becomes active if any of the segments becomes active (not just 'predictive', awaiting feed-forward input as in Temporal Memory). Am I correct so far?

In looking at the 'union pooler' code kindly provided by @sunguralikaan, I think I've found a piece that is key to understanding how it works:

if learn:
  # adapt permanence of connections from predicted active inputs to newly active cell
  # This step is the spatial pooler learning rule, applied only to the predictedActiveInput
  # Todo: should we also include unpredicted active input in this step?
  self._adaptSynapses(predictedActiveInput, activeCells, self.getSynPermActiveInc(), self.getSynPermInactiveDec())

  # Increase permanence of connections from predicted active inputs to cells in the union SDR
  # This is Hebbian learning applied to the current time step
  self._adaptSynapses(predictedActiveInput, self._unionSDR, self._synPermPredActiveInc, 0.0)

  # adapt permanence of connections from previously predicted inputs to newly active cells
  # This is a reinforcement learning rule that considers previous input to the current cell
  for i in xrange(self._historyLength):
    self._adaptSynapses(self._prePredictedActiveInput[:,i], activeCells, self._synPermPreviousPredActiveInc, 0.0)

This seems to be an extension of the SP, where 'adaptSynapses' increments the active synapses that contributed to the winning columns' overlap scores and decrements the inactive synapses (which didn't contribute). The piece I'm missing is what the union SDR is, exactly. Is this an SDR that is the union of all cells that have become active within the last few time steps? If so, it would be less sparse than an SDR representing a single input at a single time step.
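
To check my reading of 'adaptSynapses', here's a stripped-down sketch of what I think that rule boils down to (my own simplification, ignoring potential pools, trimming and bump-up logic, so not the actual NuPIC function):

import numpy as np

def adapt_synapses_sketch(input_vector, active_columns, permanences,
                          perm_active_inc, perm_inactive_dec):
  # For each winning column, raise permanences to input bits that are on
  # and lower permanences to bits that are off.
  # permanences is a (num_columns x num_inputs) float array.
  deltas = np.where(input_vector > 0, perm_active_inc, -perm_inactive_dec)
  for col in active_columns:
    permanences[col] += deltas
    np.clip(permanences[col], 0.0, 1.0, out=permanences[col])
  return permanences

Note that in two of the three calls above the decrement is 0.0, so only the synapses to predicted-active bits get reinforced and nothing is punished.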

What I’m really trying to figure out is:

  1. What data structures are used for learning in the 'union pooler'? From the code I pasted above it looks like 'predictedActiveInput', 'prePredictedActiveInput' and 'unionSDR' are involved, but how do they all contribute?
  2. How does each union pooler cell (like the one you show) choose which cells from the Temporal Memory to connect to? Is there one segment per time step as I describe above, or is each segment forming synapses from a subset of the union SDR?

From my understanding this union pooler function is the major aspect of existing HTM theory that I haven't wrapped my head around yet, and I'm itching to get there!

Thanks a ton @sebjwallace, @sunguralikaan, @Paul_Lamb and anyone with any further guidance!!

– Sam

You've got it. Each segment represents a separate step in the learned sequence. If the sequence were fed in again, the TP cell would be constantly active throughout. This cell is a classifier for that sequence.

Upon reflection, this method loosely corresponds to the biology of STDP. Neurons spontaneously grow and connect to any axon terminals they can find. If the neuron of that terminal spikes before the dendrite neuron does, then there is a temporal relationship between them. This could be 't' (in the diagram). If the dendrite neuron forms another dendrite connection to another neuron's axon terminal, and that neuron spikes before t, then we'd have t-1. This repeats until the dendrite neuron has dendrites for t-1, t-2, t-3, t-4, etc. (when I say dendrites I could also mean segments). The dendrite neuron will have gradually formed connections to the sequence in reverse order due to STDP.

There is a lot more detail that goes into it, which reveals some nice properties. At the basic level it's just about how a cell (or a group of cells) collapses temporal activity into a stable representation.
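
As a toy illustration of that reverse-order growth (this is purely bookkeeping to show the idea; the names are made up and real STDP is about spike timing, not sets of cells):

from collections import deque

class ReverseOrderLearner(object):
  def __init__(self, max_history=5):
    self.history = deque(maxlen=max_history)  # recent input activity, newest last
    self.segments = []                        # segments[0] ~ t, segments[1] ~ t-1, ...

  def observe(self, active_cells):
    """Record which input cells fired this time step."""
    self.history.append(set(active_cells))

  def on_spike(self):
    """Each time this cell spikes, grow a segment onto the cells that fired
    one step earlier than the oldest step already covered."""
    steps_back = len(self.segments) + 1
    if steps_back <= len(self.history):
      self.segments.append(set(self.history[-steps_back]))

After enough spikes the segments cover t, t-1, t-2, ... in reverse order, as described above.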

Just to note again - I don't think this is how NuPIC plans to implement TP. These are just my ideas.

Temporal pooling/union pooling might be implemented as an extension of the SP. You can read up on ionotropic and metabotropic receptors for the biological basis of the current union pooler implementation.

Not if you apply global inhibition on the union at each time step and excite only the top x% of columns based on overlap, just as SP does. Yes, they are columns, not cells, and learning happens on the proximal dendrites of the pooling layer; NuPIC just names them cells. My best guess is that, in terms of a vanilla HTM layer, the union pooling layer is a layer with a single neuron per column, so a column is effectively a cell. We are essentially doing a version of SP on the cells of the input layer. The inputs are cells, not columns, because we need the distinction between predicted (landmark) and bursting cells on the input.
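
A minimal sketch of the global inhibition / top-x% selection mentioned above (assumed behavior based on my reading, not the actual NuPIC code):

import numpy as np

def top_percent_columns(scores, sparsity=0.02):
  # Keep only the top `sparsity` fraction of columns by score; the scores
  # could be overlaps (for activeCells) or pooling activation (for the
  # unionSDR), and either way the output sparsity stays constant.
  num_winners = max(1, int(round(len(scores) * sparsity)))
  return np.argsort(scores)[-num_winners:]

So no matter how many past time steps feed into the union, the resulting SDR keeps the same sparsity as an ordinary SP output.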

predictedActiveInput is the set of successfully predicted cells in the input layer. unionSDR is the set of currently active union pooling columns. They change gradually in time, in a continuous manner, if implemented correctly. The current activation at time t in the pooling layer is the unionSDR, which holds the current learning columns.

There is no choosing, but there is a bias towards the predicted cells on the input layer. One underlying theory is that successfully predicted cells should have stronger/more frequent spikes compared to the cells activated by bursting on the input layer. That is why pooling columns are focused on predicted cells of the input. If I am not wrong, the current theory is based on predicted cells activating the metabotropic receptors on the receiving column, which causes a longer activation even if the input vanishes. Think of it as closing your eyes but still being able to visualize the last frame you saw. So the unionSDR is the union of the columns that are still active on the pooling layer because of previous cellular activations of the input layer. This unionSDR actually changes slowly and continuously because of the above mechanism: active columns have an activation value that increases while the input is present and decreases when it is not, until they become inactive. Every iteration, the top x% of activity is picked among these columns, as in normal SP.

On a low level, NuPIC has higher overlap weights for predicted input cells compared to bursting cells, and you can see that in the code. So the temporal pooling columns are activated longer when the input is predicted. This biases the learning towards predicted input cells. You can even include only the predicted cells and leave the bursting cells out of the input entirely; that is what I do currently. Predicted input cells are the key here.

1 Like

Thanks a TON @sebjwallace and @sunguralikaan!! I feel way better in my understanding now, and if you’ll indulge me once more I’ll recap how I now think the union pooler works (in NuPIC to begin with).

First, to establish what the data structures are. To stick with the terms used in the code, there are:

  1. 'activeInput' --> the set of cells currently active within the TM (some predicted, some bursting)
  2. ‘activeCells’ --> the set of active columns output from the SP
  3. ‘predictedActiveInput’ --> the set of active cells correctly predicted by the TM
  4. ‘unionSDR’ --> the set of active cells in the union pooler (set of active columns like output of SP)

So far so good? If so, I think I get the union pooler learning they've implemented (that 'if learn:' block from the compute function that I posted earlier). So here goes with that:

  1. Update the permanences of all activeCells (SP winning columns) to match the predictedActiveInput. Specifically, go through each activeCell's distal segment and increment all permanences connected to predictedActiveInput bits. This is just like spatial pooling, except each column's distal segment connects not to encoder bits but to cells from the TM that were correctly predicted.

  2. Update the permanences of all unionSDR’s active columns to match the predictedActiveInput. This does the same thing with all columns of the unionSDR as was just done in the prior step with all columns in ‘activeCells.’ Each column in the unionSDR has a distal segment with synapses connected to cells in the TM. Those synapses on those segments that were connected to correctly predicted cells (‘predictedActiveInput’) are incremented.

  3. Finally, go back a number of time steps into the past (‘historyLength’) and for each time step, update the permanences on the ‘activeCells’ from the SP to match the correctly predicted cells from that time step (‘prePredictedActiveInput’). This way each SP column is learning to respond to numerous correctly predicted inputs from prior steps, the same way it would normally learn to respond to numerous encoder activations. This allows these columns to become predictive in response to more patterns, enabling them to be more stable in their activity over entire sequences coming into TM.

It seems that in NuPIC the unionSDR cells act like SP columns, only connecting to correctly predicted cells from the TM instead of encoder bits. In that case I think each cell would have a single distal segment, with synapses adapted by spatial pooling onto these predicted cells. The active columns from the TM ('activeCells') are spatially pooled in the same way, but onto the correctly predicted cells from each of 'historyLength' steps back in time.

The implementation you describe, @sebjwallace, sounds quite different, as it's the unionSDR cells that connect to correctly predicted TM cells from multiple steps back in time, using a distal segment for each time step. This means you must have a parameter equivalent to 'historyLength' that sets how many segments can be made, right?

Thanks again,

– Sam

You seem to use the term SP for the pooling layer and TM for the input layer, from what I understand. Just to be certain that we are on the same page: there is no TM involved in the pooling layer. The pooling layer just has an SP that is learning on the active and predicted cells of the input layer. You could say the output of the TM of the input layer becomes the input for the SP of the pooling layer.

The unionSDR is updated by the activeCells (columns) of the pooling layer at each iteration. The update applies global inhibition to the currently active cells (columns) and a global decay to all the cells (columns); then the unionSDR becomes the most active x% of the columns. So the change in the unionSDR (columns) is guided by the activeCells (columns). It is kind of like biasing the pooling activation (unionSDR [columns]) according to the most recent activation (activeCells [columns]).
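
Putting that into a small sketch of one pooling-layer step (this is my paraphrase of the mechanism with invented names, no boosting or learning, and not the actual NuPIC code):

import numpy as np

def union_pooler_step(pooling_activation, overlaps_predicted_active,
                      active_columns, decay=0.1, union_sparsity=0.02):
  # 1. Global decay: every column's pooling activation fades a little.
  pooling_activation = np.maximum(pooling_activation - decay, 0.0)

  # 2. The currently winning columns (activeCells) add their predicted-active
  #    overlap to their own pooling activation.
  pooling_activation[active_columns] += overlaps_predicted_active[active_columns]

  # 3. The unionSDR is the top x% of columns by pooling activation, so it
  #    changes slowly and continuously as activations rise and decay.
  num_union = max(1, int(round(len(pooling_activation) * union_sparsity)))
  union_sdr = np.argsort(pooling_activation)[-num_union:]
  return pooling_activation, union_sdr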

These should be proximal segments of the cells (columns with single cells) of the pooling layer, though it may also be implemented with distal segments. The reason I say this is that columns (through proximal dendrites) learn competitively, while cells (through distal dendrites) are not in competition with each other in the same way columns are. In addition, we need global inhibition, which is done on columns in vanilla HTM, not cells. There is no distal segment adaptation in the union pooler implementation according to my interpretation, but I may be wrong on that. You might still be able to obtain a similar function through distal learning.

The first two points of yours are similar to what I understand from NuPIC. Though I have a slightly different implementation that kind of merges activeCells and unionSDR into the same thing, because as you can see the distinction is somewhat redundant.

This is not a requirement. This part is there to help with the speed of reinforcement learning, from what I can interpret. Pooling should work without this step, according to my implementation.

Maybe a Numenta employee can shed some light on this for those of us trying to decipher it :slight_smile:

1 Like

I was using SP to mean the pooling layer and TM to mean the input layer, and it's a good thing you made the distinction because I realize I was a bit off base. I thought that the 'activeCells' were the columns active in the input layer, but they're actually those active in the pooling layer, right? If this is the case I have to verify what the unionSDR is exactly, because I thought that's what it was. Is it the set of cells in the pooling layer that have been most active over the last 'historyLength' timesteps?

Effectively I’m trying to get the difference between these two lines of code:

1) self._adaptSynapses(predictedActiveInput, activeCells, self.getSynPermActiveInc(), self.getSynPermInactiveDec())
2) self._adaptSynapses(predictedActiveInput, self._unionSDR, self._synPermPredActiveInc, 0.0)

They're both doing SP learning on the 'predictedActiveInput' (the correctly predicted cells from the TM of the input layer), the only difference being that one is adapting the synapses of the 'activeCells' in the union pooler and the other is adapting those of the unionSDR.

Thanks again!

– Sam

I may well have some holes in my understanding too that I am not aware of, so I was kind of hesitant to provide a lot of information, but here it goes.

Yes that is what I understand from the code.

As I pointed out, these two lines are kind of redundant because the unionSDR is strongly correlated with activeCells. But I still think adapting the synapses of both lists of columns serves a purpose.

At the start of every pooling iteration, you apply the default SP with the predicted and active cells of the layer below as input. While the active inputs increase the overlaps as usual, the predicted inputs have a higher impact via a weight.

# Compute proximal dendrite overlaps with active and active-predicted inputs
overlapsActive = self._calculateOverlap(activeInput)
overlapsPredictedActive = self._calculateOverlap(predictedActiveInput)
totalOverlap = (overlapsActive * self._activeOverlapWeight +
                overlapsPredictedActive *
                self._predictedActiveOverlapWeight).astype(REAL_DTYPE)

The overlaps of all the columns are calculated.

if learn:
  boostFactors = numpy.zeros(self.getNumColumns(), dtype=REAL_DTYPE)
  self.getBoostFactors(boostFactors)
  boostedOverlaps = boostFactors * totalOverlap
else:
  boostedOverlaps = totalOverlap

Then comes the boosting part as in the default SP.

activeCells = self._inhibitColumns(boostedOverlaps)
self._activeCells = activeCells

Then comes the inhibition, again as in the default SP. In the end, the pooling layer activates the columns with the top x% overlap. Until this stage, the pooling layer does not consider any prior activation of the input layer, at least not directly. This activation is named activeCells (actually columns in SP terms).

# Decrement pooling activation of all cells
self._decayPoolingActivation()

The overlaps of active columns also increase the pooling activation of the columns, which is a separate variable. This pooling activation decays in time. So this step applies the decay to the pooling activation variable of all the columns.

# Update the poolingActivation of current active Union Temporal Pooler cells
self._addToPoolingActivation(activeCells, overlapsPredictedActive)

This is the part where the current overlaps of the currently active columns (activeCells) increase their own pooling activation in proportion to the strength of the overlap.

# update union SDR
self._getMostActiveCells()

This part finds the top x% of columns according to the pooling activation, not the current overlaps. The implementation calls this list of columns unionSDR. The function should be named something like getMostPooledCells to prevent confusion.

In short:
activeCells → columns with the highest overlaps at the current iteration.
unionSDR → columns with the highest pooling activation at the current iteration.
So these two are different lists that are calculated at every iteration. They share some columns because the pooling activations of columns are affected by overlaps. Implicitly, the unionSDR is updated by activeCells.

Now, back to those two lines. The implementation adapts the proximal dendrites of the columns in these two different lists to the same current input. The adaptation of the unionSDR seems obvious, as we are doing pooling, but I am not sure about the necessity of activeCells here. Hence it seems kind of redundant to me.

Still, adapting the synapses of activeCells (columns) helps bias the learning towards the recent activation. For example, if a column is in both lists then it learns a lot faster than the pooled columns that are only in the unionSDR, which may or may not have been activated recently.

I am afraid the best way to grasp the whole thing is to try to implement it, because it is not straightforward.

1 Like