I was inspired by this post by Matheus_Araujo on playing rock paper scissors against HTM. I thought to myself: why not let HTM play against an RNN, or even an LSTM network? And so it became a side project of mine.
Framing the problem
The idea is to have two agents, one implemented with an RNN and one with HTM, and let each predict what the other will play next, then act accordingly. Ideally (if both agents learn and predict the opponent’s pattern efficiently), neither agent should have a better chance of winning or losing than random guessing, since both are predicting the other’s next move and gradually updating their predictions.
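To make "act accordingly" concrete, here is a minimal Python sketch of the game logic. It is not taken from the actual C++ source: the integer move encoding and the function names `counter` and `outcome` are assumptions for illustration. The scoring convention (1 = RNN wins, -1 = HTM wins, 0 = draw) is the one used in the analysis later in this thread.

```python
# Hypothetical move encoding (an assumption, not from the original code).
ROCK, PAPER, SCISSORS = 0, 1, 2


def counter(predicted_move):
    """Return the move that beats the move we predict the opponent will play."""
    # With this encoding, each move is beaten by the next one (mod 3):
    # rock < paper < scissors < rock.
    return (predicted_move + 1) % 3


def outcome(rnn_move, htm_move):
    """Score one round: 1 if the RNN wins, -1 if HTM wins, 0 for a draw."""
    diff = (rnn_move - htm_move) % 3
    if diff == 0:
        return 0
    return 1 if diff == 1 else -1
```

Each agent would call something like `counter(predicted_opponent_move)` every round, and `outcome` produces the per-game score.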
The agents
Both agents are trivial to implement. For the RNN agent, just connect an RNN/LSTM to an FC layer and train it every N steps while asking it to predict the next move the HTM agent will make. The HTM agent is even more trivial: you only need to encode what the RNN agent has done, send it to a TemporalMemory layer, and use the predictions it makes.
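As a rough illustration of the encoding step on the HTM side, here is a Python sketch that maps each move to its own non-overlapping block of active bits. This is only one plausible category-style encoding and not necessarily what the actual code does. The function name `encode_move` and the `bits_per_move` value are assumptions, and no NuPIC.core API is used here.

```python
def encode_move(move, bits_per_move=8):
    """Encode a move (0 = rock, 1 = paper, 2 = scissors) as active bit indices.

    Each move owns a disjoint block of `bits_per_move` bits, so two different
    moves never share an active bit. Both the integer encoding and the block
    size are assumptions made for this sketch.
    """
    if move not in (0, 1, 2):
        raise ValueError("move must be 0, 1 or 2")
    start = move * bits_per_move
    return list(range(start, start + bits_per_move))
```

The resulting list of active indices is the kind of input a temporal memory would consume at each step. Its predictions would then be decoded back to a move by checking which block the predicted bits fall into.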
The results
So that’s the setup. I then wrote the game in C++ with tiny-dnn and NuPIC.core. After getting both agents working, tuning their hyperparameters, and letting the two algorithms play against each other for 100K games, HTM always ended up winning slightly more, beyond the margin of error.
I have also tried tuning the hyperparameters further. There are cases where the RNN can beat HTM, but I always seem to find a way to make HTM win again.
Anyway, that’s what I have discovered so far. Here is the source code.
It is very messy for now, but I’m planning on writing a blog post about this project and (possibly) making a GUI for it. I’ll clean it up in the future.
That is an interesting result… are those win rates persistent over time? Or do they vary along the play time axis a lot? Would be interesting to see those percentage wins and draws plotted over time.
Sounds like a good idea. Let’s do it data science style!
I dumped 20000 results to a .npy file and loaded it in Python for analysis.
Basic analysis
First let’s see how the game progresses. This is a graph of the running average (window size of 100) of the game outcome, where 0 = draw, 1 = RNN wins, and -1 = HTM wins. An average of 0 means both sides are winning equally often (or every game is a draw), > 0 means the RNN is winning more, and < 0 means HTM is winning more.
It’s interesting to see those plateaus around 0.2. They mean that the RNN is consistently winning slightly more than HTM. But right after each one, HTM seems to grasp what the RNN is doing and starts to win. Then the RNN takes over again. This cycle seems to repeat itself. At around the 75000th game, HTM even figured out the RNN’s pattern and won nearly 100% of the time.
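For anyone who wants to reproduce this kind of plot, a running average like this can be computed with NumPy roughly as follows. This is a sketch and not the exact analysis script. The function name `running_average` and the file name in the comment are placeholders.

```python
import numpy as np


def running_average(outcomes, window=100):
    """Mean of the last `window` outcomes (1 = RNN wins, -1 = HTM wins, 0 = draw).

    Returns len(outcomes) - window + 1 values: one per fully filled window.
    """
    outcomes = np.asarray(outcomes, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(outcomes, kernel, mode="valid")


# Usage with a dumped result file (the file name is a placeholder):
# outcomes = np.load("results.npy")
# avg = running_average(outcomes, window=100)
```

Note that this average equals (RNN win rate) minus (HTM win rate) within each window, so it cannot tell "both win equally often" apart from "everything is a draw".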
Winning rate of RNN and HTM and draw rate
Now we have a basic idea of what’s going on (although it’s really weird). Let’s dig deeper and see how both algorithms perform over time!
First the winning rate of RNN
Wow, so during the plateaus the RNN is actually winning 50% of the time, beating the 33.3% odds of random play! But during the RNN’s lows, its winning rate seems to drop to ~30% on average, sometimes even lower.
Now HTM…
During the plateaus, HTM wins less than 30% of the time. But right after that, HTM fights back violently, sometimes even winning more than 50%!
WHAT! The draw rate drops to 0 from time to time!
This is a plot of the winning rate calculated from the very first game. (Those crazy spikes and plateaus disappear due to the sheer number of data points.) The graph shows that the overall winning rate is fairly stable and both algorithms eventually play at the same level.
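The three per-window rates (RNN wins, HTM wins, draws) can be split out of the same outcome array. Here is a sketch, assuming the 1 / -1 / 0 convention from above and a window of 100 games; the window size for these particular plots is a guess, and `rolling_rates` is a made-up name.

```python
import numpy as np


def rolling_rates(outcomes, window=100):
    """Fraction of RNN wins, HTM wins and draws within each sliding window."""
    outcomes = np.asarray(outcomes)
    kernel = np.ones(window) / window

    def rate(mask):
        # Averaging a 0/1 mask over the window gives that event's frequency.
        return np.convolve(mask.astype(float), kernel, mode="valid")

    return {
        "rnn": rate(outcomes == 1),
        "htm": rate(outcomes == -1),
        "draw": rate(outcomes == 0),
    }
```

The three curves always add up to 1 at every position, which is a handy sanity check before plotting them.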
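The "from the very first game" rate is just a cumulative count divided by the number of games played so far. A minimal NumPy sketch follows; the function name `cumulative_rate` is an assumption and this is not the original script.

```python
import numpy as np


def cumulative_rate(outcomes, value):
    """Fraction of games with outcome == value, measured from game 1 up to each game."""
    outcomes = np.asarray(outcomes)
    hits = np.cumsum(outcomes == value)         # running count of matching games
    games = np.arange(1, len(outcomes) + 1)     # number of games played so far
    return hits / games


# Example: RNN win rate since the first game (1 = RNN wins).
# rnn_rate = cumulative_rate(outcomes, 1)
```

Because the denominator keeps growing, later streaks move the curve less and less, which is why the spikes and plateaus wash out in this view.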
Conclusion
This concludes the analysis. The data shows that the game is not progressing as I originally expected. I expected both the RNN and HTM to consistently win 33% of the time. In practice, however, the RNN and HTM fight over who can learn the opponent’s pattern faster, which leads to interesting results. I’m speechless now… Truly amazed by what’s happening. Any ideas?
For HTM vs HTM, it seems that one of the TMs is beating the other, overall winning 50%, losing 25%, and drawing 25% of the time. (Both TMs are identical, but I had to change the initial state a bit to ensure some diversity so the TMs won’t be drawing all the time.)
Great to see that data… I have no explanation for that, it really baffles me.
What happens if you repeat with the same starting parameters / seeds? Do those clusters reproduce? Or are they at a different place then? I am not sure which scenario would freak me out more.
I’ve changed the seed from the default 42 (I really have to give Numenta credit for using it as the seed) to 51 and set the initial htm_last_move to Paper. It seems HTM gets better at playing the game somehow? But the plateaus are still present, and HTM’s win rate still drops as low as 25% from time to time.
–Edit–
It seems that changing htm_last_move to Paper (the default is Rock) is the cause of HTM outperforming. Setting it back to Rock leads to both algorithms winning 37% of the time.
–Edit 2–
Oops, I found that I had misunderstood you. So I repeated the experiment with the exact same parameters, and the clusters are at the same locations.