An open-source community research project on comparing HTM-RL to conventional RL


We have model serialization in NuPIC. It just doesn’t always work well.


I don’t want to have to serialize my model each step. Isn’t there a simpler way?


No, not each step. Just to go offline. To stop the program, for example, and start again with the trained model.


No, the RL algorithm should be able to predict multiple sequences from the same starting state. So you need to unfold a sequence, reset, and unfold again.


Yes, I think @Paul_Lamb has talked at length about this. This is temporal sequence pooling and is not implemented in NuPIC.


Perhaps this is the HTM-RL direction that is begging to be found.

A checkpoint and Monte-Carlo probe with various step directions.


I think what you could do in this case (to test out different possible next steps from a given step) is to add a new activation function which bypasses the SP. The idea would be:

  1. Save SDR of current active cells
  2. Disable learning
  3. Run your simulated next steps
  4. Execute a Reset
  5. Run the “activation function” with the saved SDR
  6. Repeat from step 3 for other simulations
  7. Enable learning
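
The steps above can be sketched roughly as follows. This is only a toy illustration and not NuPIC code: `ToySequenceMemory`, `activate_cells`, and `probe` are hypothetical names, and a tuple of past inputs stands in for the SDR of active cells. In NuPIC itself the "activation function" would still have to be written.

```python
class ToySequenceMemory:
    """Hypothetical stand-in for a temporal memory; NOT the NuPIC API."""

    def __init__(self):
        self.active_cells = ()   # stand-in for the SDR of active cells
        self.learning = True
        self.learned = set()     # contexts seen while learning was enabled

    def compute(self, symbol):
        # The new state encodes the input plus everything that preceded it.
        self.active_cells = self.active_cells + (symbol,)
        if self.learning:
            self.learned.add(self.active_cells)

    def reset(self):
        # A reset erases all state; it does not go back to an earlier state.
        self.active_cells = ()

    def activate_cells(self, saved):
        # The proposed "activation function": restore a saved state directly.
        self.active_cells = saved


def probe(tm, rollouts):
    """Try several candidate step sequences from the current state."""
    saved = tm.active_cells              # 1. save the active cells
    tm.learning = False                  # 2. disable learning
    results = {}
    try:
        for name, steps in rollouts.items():
            for symbol in steps:         # 3. run the simulated next steps
                tm.compute(symbol)
            results[name] = tm.active_cells
            tm.reset()                   # 4. execute a reset
            tm.activate_cells(saved)     # 5. restore the saved state
                                         # 6. loop repeats for the next rollout
    finally:
        tm.learning = True               # 7. enable learning again
    return results


tm = ToySequenceMemory()
for symbol in "ABC":
    tm.compute(symbol)
learned_before = set(tm.learned)
outcomes = probe(tm, {"left": ["L", "L"], "right": ["R"]})
```

After `probe` returns, the model is back in the "C after B after A" state and nothing was learned from the simulated steps, which is the whole point of the procedure.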


I thought you couldn’t just “reset” it.


You can, but all a reset does is disable all cell states, so the next input will always cause minicolumns to burst. In other words, it doesn’t reset to a previous state, but rather erases all state information.


Okay, and you can restore the state using “activate cells”?


Yes, the active cells represent both the input and the context of the input (i.e. not just “D” but “D after C after B after A”)


Great! So there’s no major technical problem implementing these types of algorithms using NuPIC?


No problems that I can see. You’d just need to be familiar enough with NuPIC source code to know where to add the necessary function.


Well, there’s the problem that we need to predict multiple values: state and reward (the state itself may consist of lots of values). But if I remember correctly, this can be solved via the Network API.


The problem is defined here:


(injecting my favorite topic again) I think the solution to this problem is the addition of a pooling layer with long-range distal connections (similar to the output layer in SMI). Each field (or small subsets of related fields) would be given their own region, and use the pooling layer to “vote” on the larger context (like the “fingers” in the SMI example voting on the object being sensed). I’m not sure if we want to explore temporal pooling in this project, or if we’d rather stick with more established HTM algorithms.


I see NuPIC as the canonical HTM implementation. We should adhere to it as closely as possible, apart from small technical things that are required to adapt it to our ends.


Not to derail this discussion further, but…
Why couldn’t we just get the predicted cells and compare their columns to the spatial encoded values for the buckets of each of our desired variables? The buckets that have the highest overlap are the most likely predicted values for the variable.
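
Roughly what I have in mind, as a minimal sketch. It assumes we have already recorded, for each bucket of a variable, the set of columns that its encoding activates; `best_bucket` and the example column sets are made up for illustration and are not part of NuPIC.

```python
def best_bucket(predicted_columns, bucket_sdrs):
    """Return the bucket whose columns overlap most with the predicted columns.

    predicted_columns: iterable of column indices that contain predictive cells.
    bucket_sdrs: dict mapping bucket id -> set of column indices (assumed to be
                 precomputed by running each bucket's encoding through the SP).
    """
    predicted = set(predicted_columns)
    overlaps = {
        bucket: len(predicted & set(columns))
        for bucket, columns in bucket_sdrs.items()
    }
    # Ties go to whichever bucket comes first in the dict.
    best = max(overlaps, key=overlaps.get)
    return best, overlaps


# Hypothetical column sets for three buckets of one variable.
bucket_sdrs = {
    0: {0, 1, 2, 3},
    1: {2, 3, 4, 5},
    2: {4, 5, 6, 7},
}
predicted = {3, 4, 5, 9}
best, overlaps = best_bucket(predicted, bucket_sdrs)
```

Here bucket 1 wins because three of its four columns are predicted. The same lookup would be repeated per variable (state fields, reward), each with its own bucket table.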


@Randy that is sort of what we call a “classifier”, right? Either way, NuPIC has techniques to turn predictive cells into predictions of the input type. The GitHub issue above defines how it could be done.