Review and validate my own implementation of the temporal pooler

Changes

  • Connections have a permanence and are either connected or not (there is no separate notion of a synapse).
  • Columns have spatial connections.
  • Cells have temporal connections.
  • Cells don’t have segments.

Temporal Pooler

Phase 1: Activate cells within active columns.

  • For each active column check if there are predictive cells within the column.
  • If there are no predictive cells: then activate all cells within the column (burst). Pick a winner cell by choosing a random cell with the least number of temporal connections to other cells.
  • If there are predictive cells: then activate only those cells. Pick a winner cell by choosing the cell that is the most predictive, meaning the cell that best represents the transition from the previous active cells to the current predictive cells.
  • Form new temporal connections from a subset of the previous winner cells to the current winner cell or strengthen/weaken existing ones by the values temporal_connection_increment / temporal_connection_decrement.
  • For each cell that was predictive but didn’t become active, weaken the temporal connections that led to it becoming predictive by the value temporal_connection_predictive_decrement.
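The Phase 1 steps above could be sketched roughly as follows. This is only an illustration under several assumptions, not the poster's actual code: temporal connections are stored as a dict mapping (source_cell, target_cell) to a permanence, "most predictive" is taken to mean the highest score in a caller-supplied `scores` dict, and `initial_permanence`, `max_new`, and all function names are hypothetical.

```python
import random


def activate_column(column_cells, predictive_cells, scores, connection_counts, rng=random):
    """Return (active_cells, winner_cell) for one active column.

    scores: cell -> how strongly the cell was predicted (assumption: the
            number of connected temporal connections from previously
            active cells).
    connection_counts: cell -> number of temporal connections it already has.
    """
    predicted = [c for c in column_cells if c in predictive_cells]
    if predicted:
        # Only the predicted cells fire; the most strongly predicted one wins.
        winner = max(predicted, key=lambda c: scores.get(c, 0))
        return set(predicted), winner
    # Burst: every cell fires; the winner is a random least-connected cell.
    fewest = min(connection_counts.get(c, 0) for c in column_cells)
    candidates = [c for c in column_cells if connection_counts.get(c, 0) == fewest]
    return set(column_cells), rng.choice(candidates)


def update_connections(connections, winner, prev_active, prev_winners,
                       increment=0.05, decrement=0.008,
                       initial_permanence=0.1, max_new=2, rng=random):
    """Strengthen, weaken, and grow temporal connections ending at `winner`."""
    for (source, target), permanence in list(connections.items()):
        if target != winner:
            continue
        if source in prev_active:
            connections[(source, target)] = min(1.0, permanence + increment)
        else:
            connections[(source, target)] = max(0.0, permanence - decrement)
    # Grow new connections from a random subset of previous winner cells.
    unlinked = [s for s in sorted(prev_winners) if (s, winner) not in connections]
    for source in rng.sample(unlinked, min(max_new, len(unlinked))):
        connections[(source, winner)] = initial_permanence


def punish_false_predictions(connections, predictive_cells, active_cells,
                             prev_active, predictive_decrement):
    """Weaken the connections that caused a prediction which did not come true."""
    for (source, target), permanence in list(connections.items()):
        wrongly_predicted = target in predictive_cells and target not in active_cells
        if wrongly_predicted and source in prev_active:
            connections[(source, target)] = max(0.0, permanence - predictive_decrement)
```

The increment and decrement defaults are the values listed under Variables; the predictive decrement is left as a required argument because its value is one of the open questions.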

Phase 2: Choose cells to become predictive.

  • From all currently active cells check which temporal connections are connected. If there are more connected temporal connections leading to a cell than a predetermined activation threshold then make that cell predictive.
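A minimal sketch of Phase 2, assuming temporal connections are kept in a dict mapping (source_cell, target_cell) to a permanence. The function name and the example `activation_threshold` value are hypothetical, since the post leaves that parameter open. The comparison is strict ("more than") to follow the wording above.

```python
from collections import defaultdict


def compute_predictive_cells(active_cells, connections,
                             connection_threshold=0.1, activation_threshold=1):
    """Return the set of cells that should become predictive."""
    connected_inputs = defaultdict(int)
    for (source, target), permanence in connections.items():
        # Count only connected links that originate from an active cell.
        if source in active_cells and permanence >= connection_threshold:
            connected_inputs[target] += 1
    # Strictly more connected inputs than the threshold makes a cell predictive.
    return {cell for cell, count in connected_inputs.items()
            if count > activation_threshold}


connections = {
    ("a1", "b1"): 0.30,
    ("a2", "b1"): 0.20,
    ("a3", "b1"): 0.05,  # below the connection threshold, so not counted
    ("a1", "c1"): 0.50,
}
predictive = compute_predictive_cells({"a1", "a2", "a3"}, connections)
```

Here `b1` receives two connected inputs and becomes predictive, while `c1` receives only one and does not.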

Variables

  • number_of_columns 2000
  • number_of_cells_per_column ???
  • potential_percent 0.5
  • activation_threshold ???
  • spatial_connection_threshold 0.1
  • spatial_connection_increment 0.05
  • spatial_connection_decrement 0.008
  • temporal_connection_threshold 0.1
  • temporal_connection_increment 0.05
  • temporal_connection_decrement 0.008
  • temporal_connection_predictive_decrement ???
  • max_number_of_connections_per_cell ???

Questions

  1. Is the temporal pooler working properly?
  2. The number_of_cells_per_column should make up for cells not having segments. For example if a correct value in the vanilla TM is 128 cells per column and 128 segments per cell then for my implementation it should be 128*128 = 16,384 cells per column.
  3. What should be the value for activation_threshold? Meaning, how many connected temporal connections leading to a cell should be enough to make that cell become predictive?
  4. For predictive cells that didn’t become active, by how much should the temporal connections that led to them becoming predictive be weakened? In other words, what should be the value for temporal_connection_predictive_decrement?
  5. What should be the max_number_of_connections_per_cell ?

Your description looks correct to me, though I’d have to see the source code to comment more intelligently on it.

I’ve not thought about this particular idea before. Let me think it through a bit and I’ll comment later if I see a logical problem with this approach.

It depends on the use case, but this particular parameter is mainly for balancing capacity with noise tolerance. I usually start at around half of the number of minicolumns active in a timestep (which depends on your target sparsity level). In a typical setup of 1024 minicolumns at 2% sparsity (about 20 active minicolumns), I start with 10 for the activation threshold and adjust from there.
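The starting-point heuristic in the reply above amounts to simple arithmetic. A small sketch, where the function name is hypothetical and rounding the active count to a whole number is an assumption:

```python
def initial_activation_threshold(num_minicolumns, sparsity):
    """Half of the minicolumns expected to be active in one timestep."""
    active_minicolumns = round(num_minicolumns * sparsity)
    return active_minicolumns // 2


threshold = initial_activation_threshold(1024, 0.02)  # 20 active minicolumns -> 10
```

This only gives a starting value; the reply suggests tuning it from there.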

This also depends heavily on the use case. NuPIC defaults to 0 (meaning no decrement for incorrect predictions, so it essentially never forgets anything), but in any case, you definitely want to start really low and adjust up from there if you find that you start accumulating too many wrong predictions.

I’ve not implemented TM without segments myself, so I can’t really provide any expertise on this aspect. I can say that the max synapses per segment in NuPIC defaults to 128 (which I also typically use). This is another one that you will probably have to play around with (or perhaps others who have implemented TM without segments might have some insights).

A couple of other points (not a criticism of your implementation) to avoid confusion. I recommend a couple of terminology changes:

  1. I would not use the term “temporal pooler” in this context. The original algorithm a few years back included both temporal memory and pooling and was called “temporal pooling”, but the pooling aspect has since been separated out and we currently call the algorithm “temporal memory” (or TM). It is a minor point, but the term “temporal pooling” (or TP) is currently used to refer to an algorithm which pools activity over time, and is separate from the TM algorithm.

  2. I would use the term “minicolumn” instead of simply “column”, because currently there is a lot of focus on the “cortical column” (a collection of layers containing thousands of minicolumns), so the term “column” by itself has become ambiguous.

One major effect that this will have is that it will drastically increase sparsity when things are being correctly predicted. This would allow for unions of a lot more SDRs to be active at the same time (which I assume is the reason you are considering this strategy – since the drawback of eliminating segments is unions of too many SDRs leading to false positives).

Consider this paradigm:

time = 1; input = ‘A’;

The set of columns that represent ‘A’ burst and all cells within them are activated, but this leads to no other cells becoming predictive, even though there are temporal connections to cells within the columns that represent ‘B’.

time = 2; input = ‘B’;

The set of columns that represent ‘B’ burst and all cells within are activated. A winner cell is chosen for each active column.

Should those winner cells form new temporal connections from the columns that represent ‘A’, or should the existing connections (that didn’t have high enough permanences) be strengthened?

In my implementation, I do both (determined by a configurable threshold) – if at least that number of synapses are receiving input (whether or not they are at connected permanence), then I train that segment, otherwise I grow a new one. I’m not sure about NuPIC, but in BAMI they do not appear to consider that detail (they simply always pick the best match – even if it only had a single synapse active):

function bestMatchingSegment(column)
  bestMatchingSegment = None
  bestScore = -1
  for segment in segmentsForColumn(column, matchingSegments(t-1))
    if numActivePotentialSynapses(t-1, segment) > bestScore then
      bestMatchingSegment = segment
      bestScore = numActivePotentialSynapses(t-1, segment)

  return bestMatchingSegment
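
The threshold-based variant described in the reply above (train the best match only if enough of its synapses receive input, otherwise grow a new one) might look something like this. It is a sketch under assumptions: each segment is modelled as a set of presynaptic cells regardless of permanence, and `min_overlap` is a hypothetical name for the configurable threshold.

```python
def best_matching_segment(segments, prev_active_cells, min_overlap=1):
    """Return the segment with the most active potential synapses.

    Returns None when even the best segment has fewer than `min_overlap`
    active potential synapses, signalling that a new one should be grown.
    """
    best_segment = None
    best_score = -1
    for segment in segments:
        # Count potential synapses whose presynaptic cell was active at t-1.
        score = sum(1 for cell in segment if cell in prev_active_cells)
        if score > best_score:
            best_segment, best_score = segment, score
    if best_score < min_overlap:
        return None
    return best_segment


segments = [frozenset({"a", "b", "c"}), frozenset({"a", "x"})]
to_train = best_matching_segment(segments, {"a", "b"}, min_overlap=2)
to_grow = best_matching_segment(segments, {"a", "b"}, min_overlap=3)
```

With `min_overlap=2` the first segment is selected for training; with `min_overlap=3` nothing qualifies, so the caller would grow a new segment (or, in the segment-free variant, form new temporal connections).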