Can someone help me figure out how to debug my tic-tac-toe attempt?

Version Info
I am using NUPIC version 0.5.7 with nupic.core version 0.4.13.

Overview:
I am trying to get NUPIC to identify tic-tac-toe boards that I pass to it. However, NUPIC does not seem to be able to converge on the rules of tic-tac-toe. I need help figuring out a way to debug this.

Details:
My plan is to send NUPIC data in pairs of rows, where the first row is the serialization of a game board and the second indicates the player that won the game. I am hoping that NUPIC can learn the basic set of rules for tic-tac-toe this way; however, that has not been the case. Everything seems to be working, but NUPIC never seems to converge, and its guesses about which player won are pretty much random. Over time it never gets any better.

A board can be serialized into a single row of values: a ‘reset’ column, a column for each of the nine positions on a tic-tac-toe board, and finally a column that indicates the ‘winner’ of the previous row. A board and its identification can then be expressed with a pair of rows. The first of the two rows has the reset bit set in the first column, the values of the board in the following nine columns, and zero in the winner column. The following row is all zeros except for the winner column, which indicates the winner: zero, player 1, or player 2.
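In code, the two-row format looks roughly like this (the helper name and exact column order are illustrative, not my actual script):

```python
def board_to_rows(board, winner):
    """Flatten a 3x3 board (0=empty, 1/2=players) into the two-row format:
    a board row (reset=1, nine cells row-major, winner=0) followed by a
    result row (reset=0, nine zeros, winner column set)."""
    cells = [board[r][c] for r in range(3) for c in range(3)]
    board_row = [1] + cells + [0]          # reset bit set, winner unknown
    result_row = [0] + [0] * 9 + [winner]  # only the winner column is set
    return board_row, result_row

b, r = board_to_rows([[1, 1, 2], [1, 2, 2], [2, 1, 0]], winner=2)
# b == [1, 1, 1, 2, 1, 2, 2, 2, 1, 0, 0]
# r == [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
```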

My experiment begins by creating a few hundred sample boards. These boards, with their results, are written to a .csv file. I swarm over this file to produce a model, and then feed the model more randomly generated boards and their results. I was hoping that after a certain amount of time, given a board, my model would always predict the correct value of the winner column in the row that comes next: the result row. However, what I am seeing seems to be more or less random. The model almost always predicts player 1, though not always, and it will sometimes make a correct assessment of a board where player 2 wins. But when I track results over time, it does not improve, and the guesses appear to be just about random.

I believe that learning is occurring, because when I explicitly disable it, there is no intelligence at all: the model just predicts zero, the default value, never player 1 or player 2.

Request for Help
I’ve tried to alter parameters somewhat, but it feels like I’m just guessing: changing the code and hoping I get a better result. I’m not really sure how to debug this, or why my model is making the guesses that it is. Does anyone here know a good way to figure out what is going on behind the scenes? I’d like to understand why the model sometimes guesses player 2 and gets it right. I’d also like to figure out what I am screwing up in either my training data or the way I’m creating the model, since this problem seems like it should be very simple for NUPIC to figure out.

Sample Data:
This is some of the data from my experiments. I’m hoping that someone can take a look, spot anything obviously wrong that I’m not seeing, and help me identify a good vector for continuing to investigate this issue.

Here is one of the boards my code generated. As you can see it should be a player 2 win.
[1, 1, 2]
[1, 2, 2]
[2, 1, 0]
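(A quick win check, an illustrative helper rather than my actual generator code, confirms the anti-diagonal gives player 2 the win:)

```python
def winner(board):
    """Return 1 or 2 if that player has three in a row, else 0."""
    lines = (
        [[(r, c) for c in range(3)] for r in range(3)]      # rows
        + [[(r, c) for r in range(3)] for c in range(3)]    # columns
        + [[(i, i) for i in range(3)],                      # main diagonal
           [(i, 2 - i) for i in range(3)]]                  # anti-diagonal
    )
    for line in lines:
        vals = {board[r][c] for r, c in line}
        if len(vals) == 1 and vals != {0}:
            return vals.pop()
    return 0

w = winner([[1, 1, 2], [1, 2, 2], [2, 1, 0]])  # → 2
```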

Here is the model result from when I serialized this board and fed it into my model:

ModelResult(	predictionNumber=142
	rawInput={'reset': 1.0, 'row_2_column_2': 0.0, 'row_2_column_1': 1.0, 'row_2_column_0': 2.0, 'row_0_column_1': 1.0, 'row_0_column_0': 1.0, 'row_0_column_2': 2.0, 'winner': 0.0, 'row_1_column_2': 2.0, 'row_1_column_0': 1.0, 'row_1_column_1': 2.0}
	sensorInput=SensorInput(	dataRow=(0.0,)
	dataDict={'reset': 1.0, 'row_2_column_2': 0.0, 'row_2_column_1': 1.0, 'row_2_column_0': 2.0, 'row_0_column_1': 1.0, 'row_0_column_0': 1.0, 'row_0_column_2': 2.0, 'winner': 0.0, 'row_1_column_2': 2.0, 'row_1_column_0': 1.0, 'row_1_column_1': 2.0}
	dataEncodings=[array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.], dtype=float32)]
	sequenceReset=0.0
	category=-1
)
	inferences={'multiStepPredictions': {1: {0.0: 0.012589512422763568, 1.0: 0.91143423257015377, 2.0: 0.075976255007092777}}, 'multiStepBucketLikelihoods': {1: {0: 0.012589512422763568, 254: 0.075976255007092777, 127: 0.91143423257015377}}, 'multiStepBestPredictions': {1: 1.0}, 'anomalyScore': None}
	metrics=None
	predictedFieldIdx=0
	predictedFieldName=winner
	classifierInput=ClassifierInput(	dataRow=0.0
	bucketIndex=0
)
)
This is the model result from the next row I fed into NUPIC, after the model was 91% confident that player 1 should have won. You can see that I set the winner value to 2.0:
ModelResult(	predictionNumber=143
	rawInput={'reset': 0.0, 'row_2_column_2': 0.0, 'row_2_column_1': 0.0, 'row_2_column_0': 0.0, 'row_0_column_1': 0.0, 'row_0_column_0': 0.0, 'row_0_column_2': 0.0, 'winner': 2.0, 'row_1_column_2': 0.0, 'row_1_column_0': 0.0, 'row_1_column_1': 0.0}
	sensorInput=SensorInput(	dataRow=(2.0,)
	dataDict={'reset': 0.0, 'row_2_column_2': 0.0, 'row_2_column_1': 0.0, 'row_2_column_0': 0.0, 'row_0_column_1': 0.0, 'row_0_column_0': 0.0, 'row_0_column_2': 0.0, 'winner': 2.0, 'row_1_column_2': 0.0, 'row_1_column_0': 0.0, 'row_1_column_1': 0.0}
	dataEncodings=[array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.], dtype=float32)]
	sequenceReset=0.0
	category=-1
)
	inferences={'multiStepPredictions': {1: {1.0: 0.0039215686274509803, 2.0: 0.99607843137254592}}, 'multiStepBucketLikelihoods': {1: {254: 0.99607843137254592, 127: 0.0039215686274509803}}, 'multiStepBestPredictions': {1: 2.0}, 'anomalyScore': None}
	metrics=None
	predictedFieldIdx=0
	predictedFieldName=winner
	classifierInput=ClassifierInput(	dataRow=2.0
	bucketIndex=254
)
)

Thanks for explaining your issue to us. I’ll try to help out…

This already seems wrong to me. Each row of data you send should be a snapshot in time, so the fact that you send the player row after the board state row is going to confuse it. I would suggest you encode the board state and the player in the same row.

Also, what exactly are you trying to predict? Who won? I think a better goal for NuPIC might be to predict the next board state, given previous board states and players info.

What if we abandoned the idea of passing in rows of scalar input and thought about a better way to encode the problem space into an SDR?

You could write an encoder that converts each cell on the board (empty, X, or O) to a binary representation. Concatenate them together, and also include a bit for the player. If you train it on hundreds (better thousands) of games, the model should predict the state of the board better than random. I think you’ll have more success this way than trying to shoehorn the semantics of a tic-tac-toe problem space into a bunch of generic scalar encoders.
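As a rough sketch of what such a hand-rolled encoder might look like (the bit widths and layout are arbitrary choices for illustration, not anything NuPIC prescribes):

```python
def encode_board(board, player, bits_per_state=7):
    """Concatenate a small block of bits per cell, where the cell's value
    (0=empty, 1=X, 2=O) selects which of three blocks is active, plus a
    block for the player (1 or 2). Returns a flat 0/1 list (the SDR)."""
    sdr = []
    for cell in (v for row in board for v in row):
        block = [0] * (3 * bits_per_state)
        block[cell * bits_per_state:(cell + 1) * bits_per_state] = \
            [1] * bits_per_state
        sdr.extend(block)
    player_block = [0] * (2 * bits_per_state)
    player_block[(player - 1) * bits_per_state:player * bits_per_state] = \
        [1] * bits_per_state
    return sdr + player_block

sdr = encode_board([[1, 1, 2], [1, 2, 2], [2, 1, 0]], player=1)
# 9 cells * 3 states * 7 bits + 2 players * 7 bits = 203 bits total,
# of which 10 blocks of 7 bits are active
```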

If you are generating the boards randomly, and generating the moves randomly… then it makes sense to me that you’re getting random predictions back. If there is no pattern in the data, there’s nothing to analyze. You want to look for a catalog or history of tic-tac-toe games played by humans.

Hey Matt thanks for the reply.

Ultimately, I would like NUPIC to actually play games, but for now I thought it best to bound the problem: only look at simple boards (no cat’s games or invalid boards) and determine who won. I thought it would be a simpler problem if I explicitly tried NOT to play yet. The reason I broke this up into two rows is that I thought NUPIC handles temporal streaming data. So in my data stream, the first data point describes a board and the second says who the winner is. If I were to include the winner in the initial data point, I’m not really sure what the model would be trying to predict.

I think I might not have been clear in my original message about how I break the data down. When I said that I encode the player, it is not whose turn it is; it is which player won the previous game. I think of it like NUPIC is a person: it sees a picture of a game board and then names the player that won that game, kind of like flash cards for a little kid. Let me know if this is still confusing. It might be that this is just a bad problem.

As for abandoning the idea of passing data in with scalar encoders, I’m fine with that. I started down this path after I watched the video tutorial about predicting the next value in a sine wave. If I were to write my own encoder, would I still hook it up to the OPF framework? Can you point me at a good jumping-off point for this? Is Scott’s video, where he goes over different encoder types, OPF, and the hot-gym data, the best place to start? I could watch that again.

Lastly, I am generating boards randomly, but I throw out boards that are invalid or are cat’s games. I only give NUPIC boards where either player 1 or player 2 clearly won. Since I’m not showing NUPIC how games progress, each series is very short. I do try to set the series reset bit every time I show a new board, but the documentation said that it is not respected. Is this still the case?

Thanks!

Hey @mgstrein.

Interesting problem, but (my two cents) not the right kind of problem for NuPIC. This is because NuPIC learns complex temporal data where variable-length temporal history is necessary for making predictions.

Your problem, on the other hand, is a simple A->B mapping where there should be no temporal dependency between B and anything that came before A. So it’s first-order, but even simpler than that: it’s first-order with isolated sequences that are limited to length 2.

You can solve this type of problem with NuPIC style sequence memory, but it would be overkill. A simpler type of network that can solve this would be the following:

A 3x3x3 array of cells encodes the board state (3x3 board positions, one of three states for each position: X, O, or empty), and projects to a 3x1 array of cells encoding the winner (X, O, or neither). Whenever a winner cell is activated, it forms a segment that has connections to all the cells that were active on the board. There are 3^9 = 19683 possible board states, so it won’t take too long to learn all the possible mappings. It would also require a lot more segments per winner cell than realistic neurons have, but that’s not a problem for a software implementation.
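A toy version of that mapping network might look like this in plain Python (exact-match segments; just a sketch of the idea, not NuPIC code):

```python
def active_cells(board):
    """Board as a 3x3x3 cell array: cell (row, col, state) is active
    when the board position (row, col) holds that state."""
    return frozenset((r, c, board[r][c]) for r in range(3) for c in range(3))

class WinnerCell:
    """One output cell; each 'segment' remembers one full board pattern
    this cell was paired with during learning."""
    def __init__(self):
        self.segments = set()

    def learn(self, board):
        self.segments.add(active_cells(board))

    def recognizes(self, board):
        return active_cells(board) in self.segments

# One winner cell per outcome: 0 = neither, 1 = X, 2 = O
winner_cells = {0: WinnerCell(), 1: WinnerCell(), 2: WinnerCell()}
board = [[1, 1, 2], [1, 2, 2], [2, 1, 0]]
winner_cells[2].learn(board)   # pair this board with "player 2 wins"
```

After one pairing, only the player-2 cell recognizes that board; repeating this over all 19683 states exhausts the mapping.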

But this is so simple as to be trivial. A more interesting problem, more suitable for NuPIC, would be to show sequences of board states, so the network can learn to predict player moves, as Matt suggested. Even that is not really a great fit for NuPIC, though, because the board state of tic-tac-toe is completely visible to both players, so there is no need to track any temporal history.


One of the main challenges when setting up a problem for HTM is understanding that input to the HTM needs to be some kind of stream, like a sensor stream. So, imagine the game’s state as an event that emits over time. Every time the board state changes, it’s a new row in the data stream. You might also encode the mark that was just made, the player that just moved as a part of the game state, or the player who has the advantage. Just some ideas.

I would also feed in cat’s games. I would feed in as many games as possible, but nothing randomly generated. These games should have been played by computers or humans playing to win. After the last move of each game, reset the model for a new sequence. Each game will be a sequence.
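In pseudo-OPF terms, the streaming setup could be sketched like this (`FakeModel` stands in for a real OPF model so the sketch runs on its own; `resetSequenceStates` mirrors the real OPF method of that name, and the field names are illustrative):

```python
class FakeModel:
    """Stand-in for an OPF model, so the sketch is self-contained."""
    def __init__(self):
        self.rows, self.resets = [], 0
    def run(self, row):
        self.rows.append(row)
    def resetSequenceStates(self):
        self.resets += 1

def feed_game(model, snapshots):
    """snapshots: one dict of encoded board fields per move, in game order."""
    for state in snapshots:
        model.run(state)            # each board change is one record
    model.resetSequenceStates()     # game over: start a new sequence

model = FakeModel()
feed_game(model, [{'move': 5, 'player': 1}, {'move': 1, 'player': 2}])
feed_game(model, [{'move': 9, 'player': 1}])
```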

If the board state is encoded well, this should teach the HTM some basic things about the game, like that X follows O and probably what some finishing moves are.

To learn more about encoders (aside from Scott’s talk, which is a good start), here are some other references:


@mgstrein You might benefit from reading through this old mailing list post about chess board encoding. I even created a skeleton ChessboardEncoder python class.


Thanks for all the help, guys. It sounds like my best bet at this point is to move on from this identification problem and change my framework to generate games rather than just boards. I’ll probably be back pretty soon for help with that.

By the way, is the reset bit still not used, or should I set that between games?

It is used. Where did you read that it didn’t work?

In the section, Raw Input vs Sensory Input:

sequenceReset: Control field for temporal patterns. This field has a value of 1 if an explicit temporal reset was specified for this record, 0 otherwise. Resets are currently not being used.

Also, it seemed like in my data that no matter what I did for reset the result object never had the reset bit set. (this could be a bug in my code, I never looked at it closely since I thought it did not work)

I see. That is referring to a SensorInput instance. I don’t think that is what you are using.

To reset during a swarm, you can use a reset flag in the input CSV. To reset when running an OPF model, you can call model.resetSequenceStates() directly.
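For the swarm-side CSV, a sketch of what the reset field looks like (if I remember the NuPIC file format correctly, the third header row carries special flags and 'R' marks the reset field; the other field names and values here are illustrative):

```python
import csv
import io

# Build a minimal NuPIC-style input CSV in memory: three header rows
# (field names, field types, special flags), then the data records.
buf = io.StringIO()
w = csv.writer(buf)
w.writerow(['reset', 'winner'])   # row 1: field names
w.writerow(['int', 'int'])        # row 2: field types
w.writerow(['R', ''])             # row 3: 'R' flags the reset field
w.writerow([1, 0])                # reset=1 starts a new sequence
w.writerow([0, 2])
csv_text = buf.getvalue()
```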