Self Play for HTM?

Ed_Pell · August 14, 2018, 11:36am

Self-play has been a big success for AlphaZero, which exceeded human-level play in a few days. Is there a role for self-play in HTM? Can an HTM be trained to play a game with defined rules and then be set to train by playing itself?

Paul_Lamb · August 14, 2018, 12:14pm

HTM is currently missing a reinforcement learning capability, so by itself it would not be able to formulate or execute game strategies. In its current state, HTM could learn game states and make predictions about what will happen in the future (those predictions could include predicted reward). You’d need to combine HTM with other AI technologies to create a game-playing system.

Jonathan_Mackenzie · August 14, 2018, 12:16pm

HTM models aren’t really trained in the traditional supervised sense. They simply observe input sequences and output predictions about those sequences. The parameter adjustments that do occur are driven by the input sequence itself, not by derived error values.

Paul_Lamb · August 14, 2018, 12:25pm

BTW, if you haven’t seen it yet, you might want to check out this project which explores HTM + RL for a game agent. It could be a good start if you are hoping to continue this type of research. You can also join our “Dad’s song” working group, which has some similar goals, if you want to bounce ideas and brainstorm.

rhyolight · August 14, 2018, 9:35pm

This is a great question, but it exposes some confusion about what HTM is and is not.

HTM is a theory of spatiotemporal memory storage, which includes learning, inference, and prediction.

Learning is the Hebbian part within synapses between neurons. It’s the whole “neurons that fire together wire together” thing. Patterns reinforce themselves over time as they are continually perceived. Learning also involves inherent properties of SDRs. The brain takes advantage of properties of SDRs to learn better and faster.
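As a rough illustration of the Hebbian idea described above, here is a minimal sketch. It is not the NuPIC API: the function name `hebbian_update`, the dictionary layout, and the increment/decrement values are assumptions chosen for illustration. Synapses connected to active input bits are strengthened and the rest decay slightly.

```python
def hebbian_update(permanences, active_inputs, inc=0.1, dec=0.02):
    """Return new permanences: reinforce synapses on active inputs, decay others.

    permanences: dict mapping input bit index -> permanence in [0, 1]
    active_inputs: set of input bit indices that are currently active
    inc / dec: illustrative learning rates (assumed values, not from any library)
    """
    updated = {}
    for input_bit, perm in permanences.items():
        if input_bit in active_inputs:
            perm = min(1.0, perm + inc)   # strengthen, clamp at 1
        else:
            perm = max(0.0, perm - dec)   # weaken, clamp at 0
        updated[input_bit] = perm
    return updated


# The same pattern {0, 2} is seen three times, so it reinforces itself.
perms = {0: 0.5, 1: 0.5, 2: 0.5}
for _ in range(3):
    perms = hebbian_update(perms, active_inputs={0, 2})
```

After a few repetitions the synapses on bits 0 and 2 are stronger than the one on bit 1, which is the "patterns reinforce themselves over time" behavior in miniature.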

Inference involves representation, because if you have a representation of something you perceive in reality, you have to be able to compare it to all the things you’ve perceived in the past to infer what that thing actually is and how it relates to every other object you’ve learned.

Prediction means knowing how objects in the world move, what causes have what effects, how objects interact over time. Prediction requires temporal sequence memory over spatial data. We think it also requires sparsity.

Now, when you think about RL, you think about a signal fed into the system: some type of score or feedback that the system uses to tune its actions over time. This could exist outside the current realm of HTM theory.

Ed_Pell · August 15, 2018, 11:03pm

rhyo, thanks for the response. There is a ton of stuff packed into your post. I get the learning part. I sort of get the inference part, but I am a bit short on learning associated labels (names). On the last point, prediction over time, I do not see that in HTM or in anyone else’s systems.

Bitking · August 16, 2018, 5:16pm

I was reading this piece on AI in games and they describe a state machine that is fairly orthodox.

My thought is that the HTM block could detect when play is deviating from some “normal” state and trigger a state change evaluation.

Along these same lines, every part of the block diagrams could be evaluated as a possible anomaly detection use case for HTM.
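A minimal sketch of that idea, assuming the usual raw anomaly score (the fraction of currently active columns that were not predicted). The function names and the 0.5 threshold are hypothetical, and the surrounding game state machine is left out.

```python
def anomaly_score(active_columns, predicted_columns):
    """Fraction of active columns that were not predicted (0 = expected, 1 = novel)."""
    if not active_columns:
        return 0.0
    overlap = len(active_columns & predicted_columns)
    return 1.0 - overlap / len(active_columns)


def should_reevaluate_state(active_columns, predicted_columns, threshold=0.5):
    """Signal the game's state machine when play deviates from 'normal'.

    threshold is an assumed tuning value, not something from the thread.
    """
    return anomaly_score(active_columns, predicted_columns) > threshold


# Three of four active columns were unexpected, so a state change is evaluated.
trigger = should_reevaluate_state({1, 2, 3, 4}, {1})
```

In this arrangement the HTM block only raises the flag; deciding which state to move to would still belong to the conventional state machine.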

rhyolight · August 16, 2018, 6:54pm

Yes, we are in the realm of active theory here. But much of what we’ve learned about temporal sequence memory prediction applies, so we have a lot to work with.

Gary_Gaulin · August 16, 2018, 10:44pm

How about card games that require being good at predicting what might come next, like blackjack and poker?

Bitking · August 16, 2018, 11:07pm

Um, as long as you subtract the set of cards that have already been seen, predicting the next card is just a random selection from the cards left in the set.

There is no valid method of predicting a well generated random number.
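To make the arithmetic concrete, here is a small sketch that assumes a perfectly shuffled standard 52-card deck, ignores suits, and uses a hypothetical helper name. Each remaining card is equally likely, but once seen cards are removed the ranks are no longer equally likely.

```python
from collections import Counter
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]


def next_rank_probabilities(seen_ranks, decks=1):
    """Probability of each rank being dealt next, given the ranks already seen."""
    remaining = Counter({rank: 4 * decks for rank in RANKS})
    remaining.subtract(seen_ranks)  # remove every card already dealt
    if any(count < 0 for count in remaining.values()):
        raise ValueError("seen_ranks contains an unknown rank or too many copies")
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("no cards left to deal")
    return {rank: Fraction(count, total) for rank, count in remaining.items()}


# Four cards have been dealt: one each of 10, J, Q, K.
probs = next_rank_probabilities(["10", "J", "Q", "K"])
ace_probability = probs["A"]    # 4 of the 48 remaining cards
king_probability = probs["K"]   # 3 of the 48 remaining cards
```

Nothing here predicts the specific next card; it only tracks how the odds per rank shift as cards leave the deck.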

Gary_Gaulin · August 16, 2018, 11:39pm

Keeping track of which cards are left, and how order changes from one shuffle to the next (depending on dealer’s less than random methodology) puts the odds in the player’s favor.

Here is an applicable example:

Bitking · August 17, 2018, 2:01am

So you are proposing a tool to sell to casino owners to spot anomalies?
I suppose that you could tie it to the table cameras and spot things automatically.

Gary_Gaulin · August 17, 2018, 2:19am

I was thinking more like a robot add-in, to use at casinos that have no rules against allowing bots to play.

Since I find card games and such boring, I have no interest in gambling. But a robot that is such a good player that it has to be banned from ever playing again sounds like a lot of fun!
