HTM vs GRU playing Rock, Paper, Scissors. (HTM outperforming by 7%)

marty1885 · November 4, 2018, 7:56am

I was inspired by this post by Matheus_Araujo about playing rock paper scissors against HTM. I thought to myself: why not let HTM play against an RNN, or even an LSTM network? And so it became a side project of mine.

Framing the problem

The idea is to have two agents, one implemented as an RNN and one with HTM, and let each of them predict what the other will play next, then act accordingly.
Ideally (if both agents learn and predict the opponent's pattern efficiently), neither agent should have a better chance of winning or losing than random guessing, since each is predicting the other's next move and updating its predictions gradually.
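For concreteness, here is a minimal Python sketch of the rules each agent plays by. It assumes moves are encoded as 0 = Rock, 1 = Paper, 2 = Scissors and scores a game the same way the analysis further down does (1 = RNN wins, -1 = HTM wins, 0 = draw). The function names are hypothetical, not taken from the original C++ code.

```python
# Moves are encoded as integers (an assumption): 0 = Rock, 1 = Paper, 2 = Scissors.
ROCK, PAPER, SCISSORS = 0, 1, 2
NAMES = ("Rock", "Paper", "Scissors")


def counter(move):
    """Return the move that beats `move` (Paper beats Rock, and so on)."""
    return (move + 1) % 3


def act(predicted_opponent_move):
    """Play whatever beats the opponent's predicted next move."""
    return counter(predicted_opponent_move)


def outcome(rnn_move, htm_move):
    """Score one game: 1 = RNN wins, -1 = HTM wins, 0 = draw."""
    if rnn_move == htm_move:
        return 0
    return 1 if rnn_move == counter(htm_move) else -1


# The RNN predicts Rock and answers with Paper; HTM predicts Paper and answers
# with Scissors, so HTM wins this round.
rnn_move = act(ROCK)
htm_move = act(PAPER)
print(NAMES[rnn_move], "vs", NAMES[htm_move], "->", outcome(rnn_move, htm_move))
```

Each agent only needs a prediction of the opponent; "acting accordingly" is then a fixed lookup.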

The agents

Both agents are trivial to implement. For the RNN agent, just connect an RNN/LSTM to a fully connected (FC) layer, train it every N steps, and ask it to predict the next move the HTM agent will make. The HTM agent is even simpler: encode what the RNN agent has played, send it to a TemporalMemory layer, and use the predictions it makes.
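A rough sketch of the HTM side's encode/decode step, in plain Python with no NuPIC calls. It assumes a category-style encoding where each move owns its own block of active bits (BITS_PER_MOVE = 16 is a made-up value) and that the SDR is fed to the TemporalMemory directly as active columns. Predicted columns are turned back into a move by majority vote. With the standard TM cell layout, predictive cells would first be mapped to columns by dividing the cell index by the number of cells per column.

```python
BITS_PER_MOVE = 16  # hypothetical encoder width per move
NUM_MOVES = 3       # 0 = Rock, 1 = Paper, 2 = Scissors


def encode(move):
    """Category-style encoding: indices of the active bits for `move`.

    Each move owns a non-overlapping block of BITS_PER_MOVE bits,
    so the three SDRs share no active bits.
    """
    start = move * BITS_PER_MOVE
    return list(range(start, start + BITS_PER_MOVE))


def decode(predicted_columns):
    """Turn the TM's predicted columns back into a move by majority vote.

    Each predicted column votes for the block it falls in; ties (including
    an empty prediction) go to the lowest move index.
    """
    votes = [0] * NUM_MOVES
    for col in predicted_columns:
        votes[col // BITS_PER_MOVE] += 1
    return max(range(NUM_MOVES), key=lambda m: votes[m])


print(encode(1)[:3], decode([0, 17, 18]))
```

Non-overlapping blocks keep the three moves perfectly distinguishable, which is all a three-symbol game needs.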

The results

So that's the setup. I wrote the game in C++ with tiny-dnn and NuPIC.core. After getting both agents working, tuning their hyperparameters and letting the two algorithms play against each other 100K times, HTM always ended up winning slightly more, beyond the margin of error.

Example:

RNN wins: 32375 times (32.375%)
HTM wins: 37755 times (37.755%)
Draws: 29870 (29.87%)
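A quick way to check the "beyond the margin of error" claim is a normal-approximation z-score for the gap between the two win rates. This sketch treats the 100K games as independent draws from one multinomial distribution, which is an assumption (the plateaus discussed later suggest consecutive games are far from independent), so read it as a rough sanity check rather than a rigorous test.

```python
import math

# Counts reported above for 100K games.
rnn_wins, htm_wins, draws = 32375, 37755, 29870
n = rnn_wins + htm_wins + draws

p_rnn = rnn_wins / n
p_htm = htm_wins / n

# Both win rates come from the same multinomial sample, so the variance of
# their difference is (p_htm + p_rnn - (p_htm - p_rnn)**2) / n.
diff = p_htm - p_rnn
se = math.sqrt((p_htm + p_rnn - diff ** 2) / n)
z = diff / se

print(f"n = {n}, difference = {diff:.4f}, standard error = {se:.4f}, z = {z:.1f}")
```

With the counts above the gap is 0.0538 and z comes out at roughly 20, far larger than sampling noise alone would explain. It says nothing about run-to-run variation from seeds or initial state.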

I have also tried tuning the hyperparameters further. There are cases where the RNN can beat HTM, but I always seem to find a way to make HTM win again.

Anyway, that's what I have discovered so far. Here is the source code.
It is very messy for now, but I'm planning on writing a blog post about this project and (possibly) making a GUI for it. I'll clean it up in the future.

Screenshot:

Matheus_Araujo · November 7, 2018, 9:15pm

Wonderful! I really loved it!

marty1885 · December 10, 2018, 9:40am

I've published a short post on my website about the experiment. This time I found that HTM is only as powerful as the LSTM if the LSTM is trained for longer.

sruefer · December 17, 2018, 1:14pm

That is an interesting result… are those win rates persistent over time? Or do they vary along the play time axis a lot? Would be interesting to see those percentage wins and draws plotted over time.

thanh-binh.to · December 17, 2018, 1:41pm

@sruefer It is stable in my experiments. Currently I reach 38.578% for HTM vs 31.93% for LSTM, but HTM does not beat the GRU-RNN (both are at 34%).

marty1885 · December 17, 2018, 4:38pm

Sounds like a good idea. Let's do it data science style!

I dumped 20000 results to a .npy file and loaded it in Python for analysis.

Basic analysis

First, let's see how the game progresses. This is a graph of the running average (window size of 100) of the game outcome (0 = draw, 1 = RNN wins, -1 = HTM wins). An average of 0 means both win equally often (or every game is a draw), > 0 means the RNN is winning more, and < 0 means HTM is winning more.

[Figure: running average of the game outcome (window = 100)]

It's interesting to see those plateaus around 0.2. They mean the RNN is consistently winning slightly more than HTM. But right after each plateau, HTM seems to grasp what the RNN is doing and starts to win. Then the RNN takes over again. This cycle seems to repeat itself. At around the 75000th game, HTM even figured out the RNN's pattern and won nearly 100% of the time.
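The running average itself is simple to compute. A minimal sketch, assuming the outcomes have already been loaded (for example with numpy.load on the dumped .npy file) into a sequence of 1 / -1 / 0 values; running_average is a hypothetical helper, not the code used for the plot.

```python
from collections import deque


def running_average(outcomes, window=100):
    """Mean of each full window of `window` consecutive outcomes.

    outcomes: sequence of 1 (RNN wins), -1 (HTM wins), 0 (draw).
    Returns len(outcomes) - window + 1 values (empty if there are fewer games).
    """
    buf = deque()
    total = 0  # integer running sum of the current window, so it stays exact
    averages = []
    for o in outcomes:
        buf.append(o)
        total += o
        if len(buf) > window:
            total -= buf.popleft()
        if len(buf) == window:
            averages.append(total / window)
    return averages


print(running_average([1, 1, -1, 0], window=2))
```

Here the call returns [1.0, 0.0, -0.5]. With NumPy, numpy.convolve(x, numpy.ones(100) / 100, mode="valid") computes the same moving average, up to floating-point rounding.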

Winning rate of RNN and HTM and draw rate

Now that we have a basic idea of what's going on (although it's really weird), let's dig deeper and see how both algorithms perform over time!
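The per-player rates below are windowed fractions of each outcome. A sketch of how they could be computed, assuming the same 100-game window as the running average (the window used for these plots isn't stated) and the same 1 / -1 / 0 encoding; rolling_rates is a hypothetical helper.

```python
def rolling_rates(outcomes, window=100):
    """Windowed RNN-win, HTM-win and draw rates.

    outcomes: sequence of 1 (RNN wins), -1 (HTM wins), 0 (draw).
    Returns three lists (rnn, htm, draw), each with one value per full window,
    i.e. len(outcomes) - window + 1 values (empty if there are fewer games).
    """

    def windowed_rate(indicator):
        # Prefix sums make each window's count a single subtraction.
        prefix = [0]
        for hit in indicator:
            prefix.append(prefix[-1] + hit)
        return [(prefix[i + window] - prefix[i]) / window
                for i in range(len(indicator) - window + 1)]

    rnn = windowed_rate([1 if o == 1 else 0 for o in outcomes])
    htm = windowed_rate([1 if o == -1 else 0 for o in outcomes])
    draw = windowed_rate([1 if o == 0 else 0 for o in outcomes])
    return rnn, htm, draw


rnn, htm, draw = rolling_rates([1, -1, 0, 1], window=2)
print(rnn, htm, draw)
```

At every position the three rates add up to 1, which is a handy sanity check on the plots.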

First, the RNN's winning rate:
[Figure: RNN winning rate]
Wow, so during the plateaus the RNN is actually winning 50% of the time, beating the 33.3% you'd expect by chance! But during the RNN's lows, its winning rate drops to ~30% on average, sometimes even lower.

Now HTM…
[Figure: HTM winning rate]
During the plateaus, HTM wins less than 30% of the time. But right after that, HTM fights back violently, sometimes even winning more than 50%!

[Figure: draw rate]
WHAT! The draw rate drops to 0 from time to time!
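The running average and the three rates are tied together. With outcomes scored as 1 / -1 / 0, the mean over any window is just the RNN's win rate minus HTM's win rate within that window, because draws contribute nothing:

```latex
\bar{o}
  = (+1)\,p_{\mathrm{RNN}} + (-1)\,p_{\mathrm{HTM}} + 0 \cdot p_{\mathrm{draw}}
  = p_{\mathrm{RNN}} - p_{\mathrm{HTM}},
\qquad
p_{\mathrm{RNN}} + p_{\mathrm{HTM}} + p_{\mathrm{draw}} = 1

% Plateau: average near 0.2 while the RNN wins about half the games
0.2 \approx 0.5 - p_{\mathrm{HTM}}
  \;\Rightarrow\; p_{\mathrm{HTM}} \approx 0.3,
\qquad
p_{\mathrm{draw}} \approx 1 - 0.5 - 0.3 = 0.2
```

So a plateau near 0.2 with the RNN at about 50% implies HTM at about 30%, which is consistent with the rates described above.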

Overall winning rate

This is a plot of the cumulative winning rate, calculated from the very first game. (The crazy spikes and plateaus disappear due to the sheer number of data points.) The graph shows that the overall winning rate is fairly stable and that both algorithms eventually play at the same level.
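The cumulative curve is the same idea as the windowed rates, except that each point averages over every game since the first one. A minimal sketch (cumulative_rate is a hypothetical helper):

```python
from itertools import accumulate


def cumulative_rate(outcomes, value):
    """Fraction of games so far whose outcome equals `value`.

    Element i is computed over games 0..i, i.e. from the very first game.
    """
    hits = accumulate(1 if o == value else 0 for o in outcomes)
    return [h / (i + 1) for i, h in enumerate(hits)]


print(cumulative_rate([1, -1, 1, 0], 1))
```

This also explains why the spikes vanish: after n games a single new game can move the cumulative rate by at most 1/n, so by game 10,000 one result shifts it by no more than 0.01%.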

Conclusion

This concludes the analysis. The data shows that the game does not progress the way I originally expected. I expected both the RNN and HTM to consistently win 33% of the time. In reality, the RNN and HTM are fighting over who can learn the opponent's pattern faster, which leads to interesting results. I'm speechless now… Truly amazed by what's happening. Any ideas?

rhyolight · December 17, 2018, 5:15pm

I’m curious… what happens when HTM vs HTM or RNN vs RNN? Is it the same back and forth?

marty1885 · December 18, 2018, 2:28am

For HTM vs HTM, it seems that one of the TMs is beating the other: overall it wins 50%, loses 25% and draws 25% of the time. (Both TMs are identical, but I had to change the initial state a bit to ensure some diversity so the TMs won't draw all the time.)
[Figure: winning rate, HTM vs HTM]
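Why identical TMs would draw forever comes down to symmetry: two identical deterministic learners that start in the same state and each learn from the other's moves stay in lockstep, so they always pick the same move. The toy below illustrates this with a hypothetical frequency-counting agent standing in for the TM (it is not HTM). Nudging one agent's initial "last move", much like the initial-state tweak described above, breaks the tie.

```python
ROCK, PAPER, SCISSORS = 0, 1, 2


def counter(move):
    """Return the move that beats `move`."""
    return (move + 1) % 3


class FrequencyAgent:
    """Hypothetical deterministic stand-in for a TM (not HTM itself).

    Counts which move the opponent played after each of its previous moves
    and plays whatever beats the most frequent follow-up.
    """

    def __init__(self, initial_opponent_move):
        self.counts = [[0] * 3 for _ in range(3)]
        self.last = initial_opponent_move

    def play(self):
        row = self.counts[self.last]
        predicted = max(range(3), key=lambda m: row[m])  # ties -> lowest index
        return counter(predicted)

    def observe(self, opponent_move):
        self.counts[self.last][opponent_move] += 1
        self.last = opponent_move


def count_draws(init_a, init_b, games=1000):
    """Play two agents against each other and count the draws."""
    a, b = FrequencyAgent(init_a), FrequencyAgent(init_b)
    draws = 0
    for _ in range(games):
        move_a, move_b = a.play(), b.play()
        draws += move_a == move_b
        a.observe(move_b)
        b.observe(move_a)
    return draws


print(count_draws(ROCK, ROCK))   # identical initial state: every game is a draw
print(count_draws(ROCK, PAPER))  # different initial state: symmetry is broken
```

The same argument applies to any deterministic learner, which is why some asymmetry (a different seed or initial state) is needed at all.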

And for RNN vs RNN.
Both RNNs somehow managed to win 41% of the time, and the graph is more balanced/violent.
[Figure: winning rate, RNN vs RNN]

sruefer · December 18, 2018, 2:19pm

Great to see that data… I have no explanation for that, it really baffles me.

What happens if you repeat with the same starting parameters / seeds? Do those clusters reproduce, or do they show up in a different place? I am not sure which scenario would freak me out more.

marty1885 · December 18, 2018, 2:53pm

I've changed the seed from the default 42 (I really have to give Numenta credit for using it as the seed) to 51 and set the initial htm_last_move to Paper. HTM seems to get better at playing the game somehow? But the plateaus are still present, and HTM's winning rate drops as low as 25% from time to time.
[Figure: HTM winning rate, seed 51, initial htm_last_move = Paper]

–Edit–
It seems that changing htm_last_move to Paper (the default is Rock) is what causes HTM to outperform. Setting it back to Rock leads to both algorithms winning 37% of the time.

–Edit 2–
Oops, I realize I had misunderstood you. So I repeated the experiment with the exact same parameters, and the clusters are at the same locations.
[Figure: HTM winning rate, rerun with identical parameters]
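Identical clusters on an identical rerun are what you'd expect if every source of randomness is seeded: the whole run is then a deterministic function of the seeds and the initial state. A trivial sketch of that property, where simulate is a hypothetical stand-in that draws random outcomes from one seeded generator rather than running the real agents:

```python
import random


def simulate(seed, games=10000):
    """Hypothetical stand-in for one full run: every random choice comes
    from a single generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.choice((1, -1, 0)) for _ in range(games)]


run_a = simulate(42)
run_b = simulate(42)
run_c = simulate(51)
print(run_a == run_b)  # True: same seed, same sequence of outcomes
print(run_a == run_c)  # expected to be False with a different seed
```

The real experiment only reproduces this exactly if nothing unseeded (such as wall-clock time or thread scheduling) feeds into either agent.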

thanh-binh.to · December 18, 2018, 3:38pm

@marty1885 I think both HTM players should have different seeds.

marty1885 · December 18, 2018, 3:41pm

Yes, in the HTM vs HTM experiment both HTM players have different seeds.

thanh-binh.to · December 18, 2018, 3:47pm

Thanks @marty1885
I am interested in what happens if the HTM player is first trained somehow and we then compare them in inference mode only. Is it better?

marty1885 · December 18, 2018, 4:08pm

I turned off learning in HTM after 20K games. (The RNN still learns after that point.)
[Figure: HTM winning rate, HTM learning turned off after 20K games]

Paul_Lamb · December 18, 2018, 4:28pm

He’s dead, Jim.
