Related to some research I’m doing, I need to devise a way to get high-order (multistep) cellular activation predictions out of the TM algorithm. As I understand it, the TM algorithm only outputs cellular activation predictions for the very next time step. I know there’s a lot of existing support for getting multistep scalar or class predictions, but that’s not what I’m after here.
One thing I was thinking was feeding the predictions that come out of the TM algorithm right back into the TM’s input. This would be like assuming the first-order predictions were 100% correct and then seeing which cells would be put into a predictive state next. This could be repeated n times to get an n-step high-order cellular activation prediction.
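As a toy sketch of that feedback loop (not real NuPIC code; the first-order “memory” here is just a dict standing in for the TM’s learned transitions, and all names are mine):

```python
# Toy stand-in for the TM's learned transitions: each cell maps to the union
# of cell sets that have followed it. Feeding the prediction back in as if it
# were the actual next activation simulates an n-step rollout.

def learn(memory, prev_cells, next_cells):
    """Record that prev_cells was followed by next_cells."""
    for cell in prev_cells:
        memory.setdefault(cell, set()).update(next_cells)

def predict(memory, active_cells):
    """Union of everything any active cell has ever predicted."""
    predicted = set()
    for cell in active_cells:
        predicted |= memory.get(cell, set())
    return predicted

def rollout(memory, start_cells, n):
    """Treat each prediction as 100% correct and feed it back n times."""
    cells = set(start_cells)
    trajectory = []
    for _ in range(n):
        cells = predict(memory, cells)
        trajectory.append(cells)
    return trajectory
```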
Definitely an interesting problem. The issue with predictions from predictions is that since the TM makes multiple simultaneous predictions (it predicts unions of possible futures) your SDRs will get denser and denser with each timestep, resulting in less and less precise predictions. However, for any one level of temporal abstraction, maybe that’s exactly what should happen.
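A quick back-of-the-envelope sketch of that densification (the numbers are purely illustrative, not from any real TM run):

```python
# Rough numeric sketch: if each state can be followed by `branching`
# alternatives and an n-step prediction is the union of all of them, the
# number of predicted cells grows geometrically until it saturates the layer.

def union_density(branching, cells_per_state, layer_size, steps):
    """Upper-bound density of the unioned prediction at each step."""
    densities = []
    predicted = cells_per_state
    for _ in range(steps):
        predicted = min(predicted * branching, layer_size)
        densities.append(predicted / layer_size)
    return densities

# e.g. with 2-way branching, 40 cells per state, and a 2048-cell layer,
# the union doubles each step and is fully dense within 6 steps.
```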
In the long run the theory predicts we’ll do this by representing time more and more coarsely at larger scales up the hierarchy. Then at each level you can make predictions at that granularity that get less and less precise as you go multiple timesteps in the future, but you can still predict out to whatever duration you want if you pick a high enough region.
But of course the theory hasn’t yet fleshed out the method of temporal abstraction up the hierarchy (“temporal pooling”). That said, some recent threads on the forum have some promising work in this direction.
Aside: all of machine learning has this problem. It’s very trendy right now to predict video frames pixel by pixel, and as you extrapolate these predictions forward by chaining them from other predictions, the result gets blurrier and blurrier. There has been some great work that shows that if you predict in a high level abstract space and only pull out pixel predictions as a byproduct, you can get much sharper predictions that better respect the structure of how things actually move in the world. This is analogous in some ways to the HTM idea of predicting at a coarser temporal resolution.
Thanks for the response! You make an interesting point. Unions of possible futures would indeed be what I’d need for my purposes. I’d ideally like to capture all possible paths from a single point in time. I suppose it doesn’t hurt to try.
You could add non-binary weights and non-binary levels of cell activation. Then different predictions would have different levels of activation, so you could feed back not the whole union of predictions but only the most likely (most active) ones. This could solve the problem.
This can almost be done with the current artifacts of the TM algorithm, by scoring the number of distal synapses and their permanences to generate a non-binary SDR. It might require adding a global decay rate to use as a tie-breaker in cases where two paths are statistically equally likely to occur, though.
I’ve never tried this myself though, as I haven’t had a need for it, so purely speculation on my part
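In the same speculative spirit, the scoring idea could be sketched like this; `segments` is a hypothetical stand-in for real TM distal segment data (cell → list of (presynaptic cell, permanence) pairs), not an actual NuPIC structure:

```python
# Rank predictive cells by their distal support (active synapses weighted by
# permanence) and keep only the top k, instead of the full binary union.

def score_predictions(segments, active_cells, top_k):
    scores = {}
    for cell, synapses in segments.items():
        support = sum(perm for pre, perm in synapses if pre in active_cells)
        if support > 0:
            scores[cell] = support
    # A global decay (recency) term could be subtracted per cell here to
    # break exact ties between equally supported paths.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:top_k])
```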
I think this is a good idea in general, but it’s unlikely to solve the problem (although it may mitigate it). Taking the most likely predicted cells is just going to give you a sparser union that still contains multiple possible outcomes.
For my purposes, however, that union is exactly the desired result, which I can detail further if anybody is interested. I need all possible outcomes to be represented together in a union. Getting the cellular activation state of the single most likely high-order sequence, as determined by the TM algorithm, is I think a different question.
I have tried what was proposed above in one of the previous architectures I worked on. Almost every time, it did not converge onto an actual activation that had happened, as @jakebruce said. It contained bits and pieces from multiple activations, which was very hard to utilize in a meaningful way.
So I went even further, creating a circular loop of two layers (reciprocal connections, in some sense) in order to extract the dominant actual activation (one that had actually happened) from this sparse union containing bits and pieces of separate states. At the time, I believed this was how the basal ganglia resolved conflicting activations from a union of predictions (Cortex → Striatum → GPe/GPi → Thalamus → Cortex):
Let’s say you have layers A and B that each have only a spatial pooler. Layer B classifies Layer A by taking its columnar/neural activation as input; in turn, Layer A takes B’s columnar/neural activation as input. Assume we train this layer loop for some time, so that they both classify each other correctly. If you then activate some union in A and let the circular flow continue, it converges onto an actual state as it passes to B, comes back to A, goes back to B, and so on. This convergence is due to the merits of the SP algorithm. The method worked in terms of extracting the dominant real activation, but it solved the wrong problem for me, which helped me understand a more fundamental problem with the architecture.
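A very rough sketch of that cleanup loop, with a max-overlap lookup standing in for what the SP’s overlap competition actually does (all names here are illustrative):

```python
# Each layer "classifies" the other by snapping its input to the best-matching
# learned pattern. Iterating the loop pulls a union containing bits of several
# states toward the single dominant stored state.

def classify(pairs, pattern):
    """Return the output paired with the stored input that best overlaps."""
    _, best_out = max(pairs, key=lambda p: len(p[0] & pattern))
    return best_out

def cleanup_loop(a_to_b, b_to_a, union, iterations=3):
    state = set(union)
    for _ in range(iterations):
        state = classify(a_to_b, state)  # B classifies A's activation
        state = classify(b_to_a, state)  # A classifies B's activation
    return state
```

For example, with two learned A-states and their paired B-states, a union dominated by bits of one state snaps back to that state within one round trip.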
Interesting perspective, @sunguralikaan. Thanks for sharing your experience.
I wonder if a single activation is really always the “correct” answer for a high-order prediction. I think it depends on the question being asked. In situations where multiple simultaneous sequences are known, the system, from a given point in time, rightfully does not know which sequence is going to continue. Ideally, it only has some idea of the frequency of patterns, which you might interpret as their probability of occurring. So you can ask the system different things here. You could ask it, “What do you think is the most likely activation to occur at time step t+n, where n > 1?” Alternatively, you could ask it for everything it thinks could happen, which would best be represented as a union of activations that individually represent possible futures.
In both cases, I imagine the predictions are going to get less reliable the farther out you go. For the single activation, the prediction becomes less reliable because the number of possible answers blows up exponentially, leading to more uncertainty. For a union of possible futures, the SDR becomes denser the farther out you go because you’re unioning so many possible futures, and the resulting dense SDR eventually becomes uninformative.
In any case, if it’s still accessible, would you mind sharing your code that you used to tie cellular activation predictions back to the TM algorithm? I’d be very interested to see your approach to do this from an implementation standpoint.
This was 8 months ago, so the experimental code is no longer in the codebase. I assume you are looking for some sort of NuPIC extension, but I work on a separate HTM implementation with its own structural design, embedded in a game engine, so the implementation would not translate well.
On the other hand, I remember my approach pretty well, as it was pretty naive. Normally, a layer calls SP first and TM second in an ongoing loop. Whenever I wanted to feed predictive cells back into TM as activations, I would call a function named X instead of the default SP function prior to TM.
On top of this, you have to make sure TM Phase 1 (bursting columns, or activating neurons in predictive columns) is bypassed, so that when you call TM it does not try to activate cells from active columns, since we already did this by activating the predictive cells. Then you run the rest of the TM algorithm as usual. I would imagine the most recent Temporal Memory implementation in NuPIC makes this harder to pull off, as the TM phases are somewhat merged.
What you want to do, if the layer has predictive neurons at this point, is:
-> TM(layer) excluding TM-Phase1
-> Reset all active cells and columns.
-> activeCells = predictiveCells
-> Reset all predictive and matching cells.
-> Reset all active and matching segments, along with their activations caused by distal inputs (for the subsequent TM calculations)
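Those steps might look something like this in Python, against a minimal stand-in TM object (every attribute and method name here is illustrative, not the real NuPIC API):

```python
class ToyTM:
    """Just enough state to demonstrate the feedback recipe; not a real TM."""
    def __init__(self):
        self.active_cells = {1, 2}
        self.predictive_cells = {3, 4}
        self.matching_cells = {5}
        self.segment_activity = {"seg0": 2}

    def column_of(self, cell, cells_per_column=32):
        return cell // cells_per_column

    def reset_segment_activity(self):
        self.segment_activity.clear()

    def compute_predictions(self):
        # Placeholder for TM Phases 2+; a real TM would score distal
        # segments here and set the next round of predictive cells.
        pass

def feed_predictions_forward(tm):
    """One rollout step: promote predicted cells to active, skipping Phase 1."""
    predicted = set(tm.predictive_cells)
    tm.active_cells = predicted                        # predictions -> activations
    tm.active_columns = {tm.column_of(c) for c in predicted}
    tm.predictive_cells = set()                        # reset predictive state
    tm.matching_cells = set()                          # reset matching state
    tm.reset_segment_activity()                        # reset distal activations
    tm.compute_predictions()                           # run the rest of TM
    return tm.active_cells
```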
Thanks for the explanation. It’s certainly something I want to try in my research if I have the time.
Another thing I was thinking of was making use of the current NuPIC code base’s ability to produce scalar/class predictions of the data signal. Consider a single signal that takes on scalar values. I want to look at the probability distribution produced for the next time step’s scalar value. This predicted scalar value can be assumed to be 100% correct and used as a simulated next-time-step input into the model. The rest of the algorithm proceeds normally until you get a new set of predicted cells and, voilà, a high-order prediction of cellular activations.
In the case of branching sequences, ideally the probability distribution at the fork in the road will appear significantly bimodal, trimodal, etc. There’s no reason each path couldn’t be followed separately to get a sense of multiple possible futures.
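A hedged sketch of that branch-following idea; `predict_distribution` is a hypothetical stand-in for whatever yields the model’s likelihoods over next values, and the threshold is an assumption of mine:

```python
# Enumerate possible futures by picking the strong peaks of each predicted
# distribution (the lobes of a bimodal/trimodal fork) and recursing down
# each branch separately, rather than following only the single best value.

def strong_values(distribution, threshold):
    """Values whose probability clears the threshold (the fork's lobes)."""
    return [v for v, p in distribution.items() if p >= threshold]

def branch_rollout(predict_distribution, value, depth, threshold=0.2):
    """Return every path of length `depth` whose steps all clear the threshold."""
    if depth == 0:
        return [[value]]
    paths = []
    for nxt in strong_values(predict_distribution(value), threshold):
        for tail in branch_rollout(predict_distribution, nxt, depth - 1, threshold):
            paths.append([value] + tail)
    return paths
```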
I’m aware that the scalar value prediction mechanisms in NuPIC support high-order predictions, but that might be too limiting for my purposes. I need knowledge of all possible branching sequences…each member of each sequence, which sequence each belongs to, where they start, where they end, etc.