Union Pooler implementation not working as expected

I'm writing my own implementation of a union pooler/temporal pooling algorithm, which learns to represent the currently active sequence in a longer-term context. I'm adapting the code from this union pooler implementation in the nupic Python library.

I'm trying to write the union pooler to work with existing code from this repo: bitHTM. It strips out a lot of details, like local receptive fields, to simplify and speed up the code (I'm also using CuPy instead of NumPy so that it runs on a GPU, but you can replace cp with np and it's the same). The spatial pooling/boosting mechanism is implemented as a vectorised matrix operation from this repo, so I'm pretty sure the weight update rule is correct.
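
For context, the kind of vectorised step I mean looks roughly like this (not bitHTM's actual code, just a generic sketch, with NumPy standing in for CuPy and the sparsity value made up):

    import numpy as np

    def sp_step(x, weights, boost, sparsity=0.02):
        """Generic vectorised spatial-pooling step (sketch, not bitHTM's actual code)."""
        overlaps = boost * (weights @ x)                   # all column overlaps in one matrix product
        k = max(1, int(sparsity * overlaps.size))          # number of winning columns
        active = np.zeros(overlaps.size, dtype=bool)
        active[np.argpartition(overlaps, -k)[-k:]] = True  # k-winners-take-all
        return active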

Here’s my implementation. I just wanted to ask if anyone could potentially help me check that I’m implementing it right, because it seems to be broken.

So firstly, I wanted to create a bunch of synthetic data to test the model with various parameters. The data is 10 patterns, each a sequence of 10 time steps, where every step is a randomly initialised binary array of length 500. I pick a pattern at random, train the model on its 10 inputs one by one, then go again. I was worried that the inputs don't preserve any semantic information between bits; is that crucial for the union pooler to work? The SP/TM learns the patterns basically perfectly, predicting every column after enough iterations, and predicts the inputs between patterns very badly (usually 0 or fewer than 10 correct), as you'd expect: when pattern 1 ends it cannot predict the next input, because the next pattern could be 1, 2, 3, 4, etc.
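
For concreteness, the data generation looks roughly like this (a sketch with NumPy in place of CuPy; the number of active bits per input is arbitrary here, since my arrays are just randomly initialised):

    import numpy as np

    rng = np.random.default_rng(0)
    n_patterns, seq_len, input_size = 10, 10, 500
    n_active = 10  # active bits per input (arbitrary for this sketch)

    # 10 patterns, each a sequence of 10 random binary inputs of length 500
    patterns = np.zeros((n_patterns, seq_len, input_size), dtype=bool)
    for p in range(n_patterns):
        for t in range(seq_len):
            patterns[p, t, rng.choice(input_size, n_active, replace=False)] = True

    # training loop: pick a pattern at random, feed its 10 inputs one by one, repeat
    for _ in range(1000):
        p = rng.integers(n_patterns)
        for t in range(seq_len):
            x = patterns[p, t]  # fed through SP -> TM -> union pooler with learning on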

The problem is that my model eventually reaches just one stable representation that never changes between patterns. Ideally, the model should represent the currently active pattern: as soon as it sees the next one, it should rapidly update the entire context to represent it, and then not change much over the course of that pattern. In practice, when I look at the change in the union_SDR between steps, it doesn't show this behaviour at all. It seems to change almost randomly during a pattern, sometimes a little at the start, sometimes a lot mid-way through. However, when I plot the sum of the union SDRs for each type of pattern, they're all relatively distinct, meaning some bits have high activation for one specific pattern and zeros everywhere else.
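
The per-step change I'm looking at is computed roughly like this (a sketch; union_sdr_history is just a hypothetical name for the boolean union SDRs recorded after every step):

    import numpy as np

    def step_change(prev_sdr, curr_sdr):
        """Fraction of currently active union bits that were not active on the previous step."""
        prev_sdr, curr_sdr = np.asarray(prev_sdr, bool), np.asarray(curr_sdr, bool)
        n_active = curr_sdr.sum()
        return 0.0 if n_active == 0 else (curr_sdr & ~prev_sdr).sum() / n_active

    union_sdr_history = []  # hypothetical: the boolean union SDR recorded after every step
    # ideally small within a pattern, with one spike at each pattern boundary
    changes = [step_change(a, b) for a, b in zip(union_sdr_history, union_sdr_history[1:])]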

I understand that this is probably not the most rigorous analysis, but is there anything specific about the kind of patterns it can match, or any issue with my implementation that may be at fault? Is there a more up-to-date model or pseudocode I could use instead? I've tried a bunch of different learning rules, and removing each of the existing rules, but nothing seems to form stable yet diverse representations for each of the 10 patterns.

I would be extremely grateful for any ideas about what could be wrong, thank you

4 Likes

I’m not so sure as I’m not familiar with Union TP at all, but in the decay_pooling_activation function, you wrote:

self.pooling_activation = cp.exp(-0.1 * self.pooling_timer) * self.pooling_activation_init_level

I think it should be self.pooling_activation instead of self.pooling_activation_init_level.
Of course, in the reference code, it says

self._poolingActivation = self._decayFunction.decay(\
                          self._poolingActivationInitLevel, self._poolingTimer)

But it could just be a typo: the default value of the decayFunctionType argument is 'NoDecay', so it could've just been overlooked.
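
Just to make the two readings concrete, here is a minimal sketch (NumPy, with the 0.1 rate from your snippet; note that if the timer counts steps since activation and nothing else touches the value, the init-level form is equivalent to decaying the current value by a constant factor each step):

    import numpy as np

    timer = np.arange(6)     # steps since the unit was (re)activated
    init_level = 1.0         # activation level stored at (re)activation time

    # As written in your code (and in the reference UnionTemporalPooler):
    # recompute from the stored init level using the elapsed timer.
    as_written = np.exp(-0.1 * timer) * init_level

    # Equivalent step-wise form: decay the current value by a constant factor each step,
    # assuming the timer increments by 1 per step and nothing else modifies the value.
    stepwise = [init_level]
    for _ in range(len(timer) - 1):
        stepwise.append(np.exp(-0.1) * stepwise[-1])

    assert np.allclose(as_written, stepwise)
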
Again, I could be wrong. :fearful:

P.S. I'm the author of bitHTM and I didn't know there was anyone actually using it, as there were a ton of bad coding decisions in retrospect and the whole project was mainly just for practicing NumPy. :exploding_head:

2 Likes

WOW! Great to hear from the author themselves :smiley: that's so unexpected. I just wanted to thank you for writing that repo; you have honestly saved me so much time in understanding HTM, and maybe even borderline saved my bachelor's dissertation :sweat_smile:

I've tried variations with no decay and with exponential decay, as well as sigmoid or linear pooling; they all appear to have little effect, or actively make things worse. For future readers: I'm reading a paper about “sandwich pooling”, which I'm implementing now. Hopefully that can do what I want, or maybe there's a more fundamental error somewhere else in my code :man_shrugging:

4 Likes

Hi,

Here is a competing hypothesis about how to implement temporal stability:

Learning Invariance from Transformation Sequences
Peter Foldiak, 1991
Physiological Laboratory, University of Cambridge
https://www.researchgate.net/publication/215991433_Learning_Invariance_From_Transformation_Sequences

Foldiak proposes a new learning rule:
“A trace is a running average of the activation of the unit, which has the effect that activity at one moment will influence learning at a later moment. This temporal low-pass filtering of the activity embodies the assumption that the desired features are stable in the environment.”
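
In code, the trace rule is roughly this (a sketch; the trace constant and learning rate are made-up values, and the exact update in the paper may differ in detail):

    import numpy as np

    def trace_hebbian_step(w, x, y, y_trace, delta=0.2, lr=0.02):
        """One step of a Foldiak-style trace learning rule (sketch).

        w:       (n_units, n_inputs) weights
        x:       (n_inputs,) current binary input
        y:       (n_units,) current unit outputs
        y_trace: (n_units,) running average ("trace") of past outputs
        """
        # temporal low-pass filter of the activity: activity now keeps
        # influencing learning at later moments
        y_trace = (1.0 - delta) * y_trace + delta * y
        # Hebbian update gated by the trace instead of the instantaneous output
        w = w + lr * y_trace[:, None] * (x[None, :] - w)
        return w, y_trace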


Also, Kropff & Treves used a very similar mechanism in their model of grid cells, which I discuss in detail in a 20-minute recorded lecture:

3 Likes

What are the differences between Temporal Memory and the Union Pooler?
Do they have the same purpose with different mechanics, or are they meant to do different things?

1 Like

As far as I know, the Union (Temporal) Pooler is just a way of implementing the Temporal Pooler.
The main purpose of the Temporal Pooler is to identify and represent a sequence in a unique and stable manner. You can see how it can be utilized to name a song, etc.
The Temporal Memory, on the other hand, tries to predict the next input. Of course, as you probably know, it technically can represent a sequence in a unique way, but the representation changes every step.

3 Likes

Thanks. So if I understand this correctly, the output of the temporal pooler is:

  • an SDR specific to the input SDRs over the past N time steps,
  • which attempts to preserve similarity: if two input sequences are similar, then their corresponding TP-output SDRs are also similar.

This is it?

One question would be whether it is anticipatory.
That is, does it signal the presence of a learned temporal pattern before being fed the whole pattern?

e.g. if it learned “XYZAW” as a pattern and its associated (temporally pooled) SDR is “S”, would “S” begin to be visible just after “XY”? Assuming, of course, that no other patterns beginning with “XY” were learned.

PS One consequence would be that the output (if we compute it every time step) has a slower rate of change than the input.
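
Concretely, the check I have in mind is something like this (a sketch with hypothetical names):

    import numpy as np

    def overlap(a, b):
        """Number of bits active in both SDRs (boolean arrays)."""
        return int(np.count_nonzero(np.asarray(a, bool) & np.asarray(b, bool)))

    def anticipation_curve(s, pooled_per_step):
        """Overlap between the learned pooled SDR `s` for "XYZAW" and the pooler output
        recorded after feeding X, Y, Z, A, W one by one. If the pooler is anticipatory,
        the overlap should already be near its maximum after the second element."""
        return [overlap(s, p) for p in pooled_per_step]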

1 Like

Yes, and also for the future M time steps, given that N+M equals the length of the sequence and assuming the sequence won't be interrupted abruptly.

I probably don’t know well enough about TP to answer this, but I’d assume some property of similarity is generally desirable. I have no idea about how UTP handles this though.

Ideally, yes. But in practice it might not kick in instantly; it probably depends a lot on the implementation.

I think that’s actually one of the desirable features of TP. Note that Numenta used to put a lot of emphasis on stable activation in the higher layers when they were actively researching TP.

2 Likes

Thanks, now it makes more sense.

I think many ideas in Reservoir Computing accomplish similar goals with a lot less conceptual gymnastics. And since there's no learning, a reservoir's outputs tend to be inherently stable.

I found there's also a patent on the temporal/union pooler, which is both confusing (a lot of particular options for implementing it) and intriguing… what would a patent be useful for?

2 Likes

Is this patent idea implemented in

But it's not clear which applications it is suitable for.

1 Like

Most likely yes, it is an implementation of whatever is described in the patent. Checking that this is indeed the case is a bit beyond my processing capacity.

Regarding which applications it is suitable for, my feeling is that it would be easier to find or figure out applications for a temporal pooler than for a temporal memory.

My confusion was based on my mistaken assumption that “Temporal Pooler” was only an early name later replaced by “Temporal Memory” in Numenta's nomenclature, the two being basically the same thing.

PS Well, in a sense a TM's entire set of predictions/cells can be regarded as a temporal pooler. Its internal state depends on past input and is quite repeatable, especially after learning is disabled.
One problem, though, could be the expanded state size: a column width x column height representation.

1 Like

I understood that a key benefit of SDRs is noise immunity: two SDRs of (say) 20 bits that differ in one or two bits are probably the same SDR, plus noise.

So is ‘similarity’ in this context intended to convey the concept of ‘same but for noise’, or the concept of ‘semantically similar’ but not actually the same SDR?
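
For what it's worth, here is the 'same but for noise' case in numbers (a sketch; the SDR size, number of active bits and amount of noise are just illustrative):

    import numpy as np

    n, w = 2048, 20  # SDR size and number of active bits (illustrative)
    rng = np.random.default_rng(0)

    a = np.zeros(n, dtype=bool)
    a[rng.choice(n, w, replace=False)] = True

    # "same but for noise": move two of a's active bits to other positions
    b = a.copy()
    b[np.flatnonzero(b)[:2]] = False
    b[rng.choice(np.flatnonzero(~a), 2, replace=False)] = True

    # an unrelated random SDR
    c = np.zeros(n, dtype=bool)
    c[rng.choice(n, w, replace=False)] = True

    print((a & b).sum())  # 18 of 20 bits shared: treated as the same SDR plus noise
    print((a & c).sum())  # near zero by chance: a different SDR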

2 Likes

I skimmed the patent application and the linked python file, and yes they appear to be the same thing.

The main idea in that patent application is:

“After activation, the [cells] remains active for a period longer than a [single] time step.”

2 Likes

I think in order for TM to qualify as a TP, the size of the state vector (i.e. the representation) should be considered to be $\text{sequence length} \times \text{column width} \times \text{column height}$, if you ignore the implied (and likely) requirement of a fixed state size.
The stable-representation property of TP mostly requires the activation during a sequence to be stationary over consecutive time steps, or possibly to converge, depending on the realization of TP.
TM is not a realization of TP not because learning can cause drifts in the sequence representations, but because the overlap of the TM cells between two consecutive time steps is virtually zero.

P.S. It is true that Numenta used to call something very close to TM a TP, but there were varying degrees of difference, such as the receptive field over time being larger than one (a dependency on the past N states where $N \ge 1$), etc., if I remember correctly.

2 Likes

I guess both. A segment activates when 4-10 (configurable) inputs are active.

True, but if you apply a delay, i.e. output a 1 for whatever cells were active within the past N time steps, there you have it.
So yes, the state has to track the activations of the past N steps, but the output SDR can be only cols * depth in size.
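
A sketch of that delay trick (NumPy; N and the array shapes are whatever your TM uses):

    import numpy as np

    def delayed_union(active_cells_history, N):
        """OR together the TM's active-cell SDRs from the past N time steps.

        active_cells_history: list of boolean arrays of length n_columns * cells_per_column.
        The output SDR keeps that same cols * depth size, regardless of N."""
        recent = active_cells_history[-N:]
        return np.logical_or.reduce(recent) if recent else None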

1 Like

Very offtopic but how is that possibly patent-able :rofl:

2 Likes

Thanks for checking this code! I believe this idea does not improve prediction quality. Maybe it is useful for object classification?

Hey, how’s your temporal pooling implementation going? Were you able to figure out why the values/patterns didn’t change?

Have you tried to implement this sandwich pooler? I have read the paper and think it is a good approach. However, I suspect you have to invest some time finding suitable parameters for both spatial poolers so that everything runs stably. I find the idea really interesting. Unfortunately, I haven't found a reference implementation so far.