An open-source community research project on comparing HTM-RL to conventional RL


#42

We have model serialization in NuPIC. It just doesn’t always work well.


#43

A post was split to a new topic: OOP vs Functional


#44

I don’t want to have to serialize my model each step. Isn’t there a simpler way?


#45

No, not each step. Just to go offline. To stop the program, for example, and start again with the trained model.


#46

No, the RL algorithm should be able to predict multiple sequences from the same starting state. So you need to unfold a sequence, reset, and unfold again.


#47

Yes, I think @Paul_Lamb has talked at length about this. This is temporal sequence pooling and is not implemented in NuPIC.


#48

Perhaps this is the HTM-RL direction that is begging to be found.

A checkpoint and Monte-Carlo probe with various step directions.


#49

I think what you could do in this case (to test out different possible next steps from a given step) is to add a new activation function which bypasses the SP. The idea would be

  1. Save SDR of current active cells
  2. Disable learning
  3. Run your simulated next steps
  4. Execute a Reset
  5. Run the “activation function” with the saved SDR
  6. Repeat from step 3 for other simulations
  7. Enable learning
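The steps above can be sketched in plain Python. Note this is a toy stand-in, not NuPIC code: `ToyTM` only models the bookkeeping, and `activate_from_sdr` is the hypothetical "activation function" that bypasses the SP and does not exist in NuPIC.

```python
# Toy sketch of the checkpoint-and-probe loop described above.
# ToyTM is a stand-in for a temporal memory; activate_from_sdr is
# the hypothetical "activation function" and is NOT part of NuPIC.

class ToyTM:
    def __init__(self):
        self.active_cells = set()
        self.learning = True

    def compute(self, input_sdr):
        # Placeholder for real TM dynamics: just track the latest input.
        self.active_cells = set(input_sdr)

    def reset(self):
        # Like NuPIC's reset: erases all cell state (it does not
        # restore a previous state).
        self.active_cells = set()

    def activate_from_sdr(self, sdr):
        # Hypothetical: directly re-activate a saved set of cells.
        self.active_cells = set(sdr)

tm = ToyTM()
tm.compute({3, 7, 11})               # normal online stepping

saved = set(tm.active_cells)         # 1. save SDR of current active cells
tm.learning = False                  # 2. disable learning

for simulation in ([{4, 8}, {5, 9}], [{6, 10}]):
    for step in simulation:          # 3. run simulated next steps
        tm.compute(step)
    tm.reset()                       # 4. execute a reset
    tm.activate_from_sdr(saved)      # 5. restore the saved SDR
                                     # 6. loop repeats for other simulations
tm.learning = True                   # 7. re-enable learning
```

After the loop, the model is back in its checkpointed state and learning is re-enabled, so normal stepping can continue.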

#50


I thought you couldn’t just “reset” it.


#51

You can, but all a reset does is disable all cell states, so the next input will always cause minicolumns to burst. In other words, it doesn’t reset to a previous state, but rather erases all state information.


#52

Okay, and you can restore the state using “activate cells”?


#53

Yes, the active cells represent both the input and the context of the input (i.e. not just “D” but “D after C after B after A”)


#54

Great! So there’s no major technical problem implementing these types of algorithms using NuPIC?


#55

No problems that I can see. You’d just need to be familiar enough with NuPIC source code to know where to add the necessary function.


#56

Well, there’s the problem that we need to predict multiple values: state and reward (and the state itself may consist of lots of values). But if I remember correctly, this can be solved via the Network API.


#57

The problem is defined here:


#58

(injecting my favorite topic again) I think the solution to this problem is the addition of a pooling layer with long-range distal connections (similar to the output layer in SMI). Each field (or small subsets of related fields) would be given their own region, and use the pooling layer to “vote” on the larger context (like the “fingers” in the SMI example voting on the object being sensed). I’m not sure if we want to explore temporal pooling in this project, or if we’d rather stick with more established HTM algorithms.


#59

I see NuPIC as the canonical HTM implementation. We should adhere to it as closely as possible, apart from the small technical things required to adapt it to our ends.


#60

Not to derail this discussion further, but…
Why couldn’t we just get the predicted cells and compare their columns to the spatially encoded values for the buckets of each of our desired variables? The buckets with the highest overlap are the most likely predicted values for that variable.
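A minimal sketch of that overlap comparison, with made-up bucket SDRs (the column sets below are illustrative, not output from a real encoder):

```python
# Compare the columns of predicted cells against each bucket's
# encoded column set, and rank buckets by overlap.

bucket_sdrs = {            # bucket value -> encoded column indices (made up)
    10.0: {1, 2, 3, 4},
    20.0: {3, 4, 5, 6},
    30.0: {5, 6, 7, 8},
}

predicted_columns = {3, 4, 5, 9}   # columns containing predicted cells

overlaps = {value: len(cols & predicted_columns)
            for value, cols in bucket_sdrs.items()}

# The bucket whose encoding overlaps the predicted columns most is
# the most likely predicted value.
best_value = max(overlaps, key=overlaps.get)
```

Here bucket 20.0 wins, since three of its four columns appear among the predicted columns.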


#61

@Randy that is sort of what we call a “classifier”, right? Either way, NuPIC has techniques to turn predictive cells into predictions of the input type. The GitHub issue above describes how it could be done.