Pointer request: online classification of drawings

I am looking around for some relevant pointers to solve the following problem:

I have a stream of online data about a figure being drawn. This is (x, y) data of a pen position. I want to recognize figures, and sequences of figures, each of which has a distinct name.

This is all very similar to online handwriting recognition, in which I want to classify letters and their combination into words. In some cases, the word being written is known in advance, which may help in predicting the next letter being written.

It is also similar to melody recognition, in which I want to classify notes and their combination into bars and phrases.

There is a pointer to an audiostream example that may be useful here, but it leads nowhere. A Google search on nupic online handwriting recognition did not really yield any useful results.

Is it even possible with today’s implementation of nupic? As I understand it, the hierarchical part is not really implemented yet, and it seems to me I might need it for this.

I am not particularly looking for direct solutions, but I am kinda struggling to find the right information in the forest of what is available.

As an example: in this sequence I would like to recognize the “circle” in the middle:

Top chart is x and y coordinates, bottom part is the parametric plot (the drawing)

I fixed the link; it should point to https://github.com/htm-community/nupic-example-code/tree/master/audiostream

I would not suggest using NuPIC today to try and do handwriting recognition applications. You will need to learn a lot about topology and local inhibition and tweak a lot of spatial pooler parameters. Not sure you’d even have much success.

One thing you could try is using the CoordinateEncoder to encode x,y positions and feed them in over time, representing the drawing. But there is no way to get predictions from the CoordinateEncoder yet, so you can’t predict where the pen will be in the future, just how anomalous the current state of the drawing is.
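For what it’s worth, here is a minimal sketch of that idea, assuming NuPIC’s CoordinateEncoder with its (coordinate, radius) input format; the w/n values, the radius, and the `pen_stream` iterable are placeholders, not a tested setup:

```python
import numpy as np
from nupic.encoders.coordinate import CoordinateEncoder

# 21 active bits out of 1024; these values are arbitrary choices for this sketch
encoder = CoordinateEncoder(w=21, n=1024)

def encode_pen_position(x, y, radius=5):
    """Encode one (x, y) pen sample; radius controls how much nearby positions share bits."""
    coordinate = np.array([int(x), int(y)], dtype="int32")
    return encoder.encode((coordinate, radius))

# Feed the stream one SDR per time step into whatever model consumes it.
# There is no prediction of future pen positions, only an anomaly signal
# for the current input.
for x, y in pen_stream:            # pen_stream: hypothetical iterable of (x, y) samples
    sdr = encode_pen_position(x, y)
    # model.compute(sdr)           # placeholder for the downstream model
```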

I have some ideas on how to proceed with this. Lots of inspiration from HTM, but a somewhat different approach.

First, I’d like to point out that my problem is not exactly like online handwriting recognition. In my application the set of figures is simple and limited. The image being drawn is expected to resemble a predefined figure (a circle, a line), so there is no issue of varying writing styles. Furthermore, the position of the drawing on the canvas is significant.

My first epiphany was that a way to create an SDR from a figure is to draw it with a thick pen. For example, the last figure above would translate into the following SDR (flipped vertically):

circle SDR
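A rough sketch of the thick-pen idea, assuming the drawing arrives as (x, y) points on a known canvas; the grid size and pen radius below are arbitrary choices, not values from my setup:

```python
import numpy as np

def figure_to_sdr(points, grid_shape=(32, 32), canvas=(1.0, 1.0), pen_radius=1):
    """Rasterize (x, y) points onto a coarse grid with a thick pen; return a flat binary SDR."""
    rows, cols = grid_shape
    grid = np.zeros(grid_shape, dtype=bool)
    for x, y in points:
        c = min(int(x / canvas[0] * cols), cols - 1)
        r = min(int(y / canvas[1] * rows), rows - 1)
        # "thick pen": activate a square of cells around the sample point
        grid[max(r - pen_radius, 0):r + pen_radius + 1,
             max(c - pen_radius, 0):c + pen_radius + 1] = True
    return grid.flatten()
```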

Furthermore, my set of predefined figures would be something like this:

![middle circle](upload://oxuyOBQbhM0cncQlSoAWdKpB6Fp.png) | ![left circle](upload://x1TZhugDxCiXoTEgsB7EyJhZy4s.png) | line

So, inspired by HTM, I can just calculate the overlap between the sample and all the images in the predefined set; the best overlap score is the best match. The width of the “thick pen” is equivalent to the width of an SDR, and it makes the process robust to noise.
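Continuing the sketch above, the matching step is just a count of shared active cells; `masks` would hold the predefined figures rasterized with the same thick pen:

```python
import numpy as np

def classify(sample_sdr, masks):
    """masks: dict of figure name -> flat binary SDR. Returns the best match and all scores."""
    scores = {name: int(np.sum(sample_sdr & mask)) for name, mask in masks.items()}
    best = max(scores, key=scores.get)
    return best, scores
```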

The next step is the online part. The current idea is to calculate the overlap of the current-position SDR with every mask at each time step, and then aggregate (integrate) that overlap over time.

When the aggregated overlap reaches a certain threshold, the figure is “recognized” and all other aggregates are dropped to zero (inhibition).

Next, just like with handwriting, there is only a limited set of possible subsequent figures, which I know in advance. So when one figure is recognized, I can lower the overlap threshold for the predicted next figures.
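Putting the pieces above together, a sketch of the online loop could look like the following; the threshold values and the `next_figures` mapping are illustrative assumptions, not numbers from my experiments:

```python
import numpy as np

class OnlineFigureRecognizer(object):
    def __init__(self, masks, threshold=50, lowered_threshold=30, next_figures=None):
        self.masks = masks                      # figure name -> flat binary SDR (mask)
        self.base_threshold = threshold
        self.lowered_threshold = lowered_threshold
        self.next_figures = next_figures or {}  # figure name -> list of likely successors
        self.accumulators = {name: 0 for name in masks}
        self.thresholds = {name: threshold for name in masks}

    def step(self, position_sdr):
        """Feed the SDR of the current pen position; return a recognized figure name or None."""
        for name, mask in self.masks.items():
            self.accumulators[name] += int(np.sum(position_sdr & mask))
            if self.accumulators[name] >= self.thresholds[name]:
                self._recognize(name)
                return name
        return None

    def _recognize(self, name):
        # inhibition: drop all aggregates back to zero
        for key in self.accumulators:
            self.accumulators[key] = 0
        # prediction: reset thresholds, then make the expected successors easier to recognize
        self.thresholds = {key: self.base_threshold for key in self.masks}
        for successor in self.next_figures.get(name, []):
            self.thresholds[successor] = self.lowered_threshold
```

The per-step position SDR could come from the same `figure_to_sdr` helper applied to a single sample, e.g. `figure_to_sdr([(x, y)])`.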

Actually, there is not much to learn in the whole process, since the set of figures is fixed and the user is expected to draw as close to the predefined figure as possible. So I can just skip the generic spatial pooling and basically hardcode that bit.

Writing some code today to test these ideas; I’ll come back with the results. Probably no NuPIC eventually, but rather a somewhat HTM-inspired approach.

Some results:

This shows the overlap of the current point on the canvas with each of the possible figures, as a function of time. The last one is a clear winner here: there is a signal over a longer span of time, and I can clearly see when the figure started and ended.

The plan is to integrate over that graph, mark a figure as “recognized” once it passes a certain threshold, and predict the next figure in line by lowering that figure’s threshold.
