Swarming is not fun; we try to avoid it.
So I implemented something like we discussed (nothing complicated, as it turned out). Normalization has two extremes depending on window size: first, if we normalize the whole data set, we keep the whole shape; second, if the window size is 2, we get a simple binary classification ("is the current value smaller or bigger than the previous step?"). So the intermediate option between these two extremes is a kind of semi-classification with shape? (I'm not sure what that means.)
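For reference, here is a minimal sketch of what I mean by windowed normalization (the function name and the min-max scaling choice are just my assumptions, since I haven't posted my actual code):

```python
import numpy as np

def rolling_normalize(values, window):
    """Min-max normalize each point against its trailing `window` samples.

    window == len(values) reproduces whole-series normalization (full shape);
    window == 2 collapses to a binary "bigger or smaller than the previous step" signal.
    """
    values = np.asarray(values, dtype=float)
    out = np.full(len(values), 0.5)        # neutral filler before the window is full
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]
        lo, hi = chunk.min(), chunk.max()
        if hi > lo:
            out[i] = (values[i] - lo) / (hi - lo)
    return out
```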
Here are examples on 6k-length data, with a normalization window size of 30 steps:
Price [plot]
Volume [plot]
And with a normalization window size of 100 steps:
Price [plot]
Volume [plot]
While on the price graph we see that normalization cuts the trend movements out, on the volume graph we see something like an escalation of the volume fluctuations (the smaller the normalization window, the more pronounced this effect is). I need your thoughts on this; I'm sure that I'm missing something in my conclusions. Also, maybe the data needs more preprocessing.
Meanwhile, I ran a few swarmings on this data with different normalization window sizes, and again, no results. Swarming outputs a model which does not use all data fields (sometimes it ignores Volume, sometimes Price, or Time). No idea why this happens (yeah, swarming is not fun). Selecting parameters manually has also given no results so far.
My hunch is that the fastest and simplest path to better results is trying more preprocessing methods. The first thing I'd try (if you haven't already) would be Aggregation. For instance, take every five successive data points and sum them into one value. This will reduce your total number of data points by a factor of 5, but each data point will hold more information because it represents a larger chunk of time. What should hopefully happen is that the resulting plots over time will smooth out, with fewer noisy spikes to confuse the system.
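To make that concrete, a minimal sketch of the aggregation being suggested (summing each block of 5 successive points; the function name is just for illustration):

```python
import numpy as np

def aggregate(values, factor=5):
    """Sum every `factor` successive points into one value,
    shrinking the series by that factor (any trailing remainder is dropped)."""
    values = np.asarray(values, dtype=float)
    n = (len(values) // factor) * factor
    return values[:n].reshape(-1, factor).sum(axis=1)
```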
This has worked for me in my current project modeling human operator movements. The sampling rate was high (100 per second) and inherently noisy, since predictable human movements don't happen that fast. At this rate the HTM took a long time with all that data and gave mediocre results. But when I did aggregation, summing it down to 20, 10, 5, 2, and 1 point per second, the performance jumped, peaking at 2 per second. I wonder if an aggregation window-size parameter sweep like this may help you, since it's hard to know what sampling rate will make the signal clearest for the HTM. Of course it may not help, though it's worth trying I'd say.
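A sweep over aggregation factors could then look roughly like this (`price`, `volume`, and `run_htm_and_score` are placeholders for your own series and for however you feed them to the model and judge the results; they are not NuPIC APIs):

```python
# Hypothetical sweep: reuse the aggregate() helper above with several factors
# and compare how the model behaves on each aggregated series.
for factor in (1, 2, 5, 10, 20):
    agg_price = aggregate(price, factor)
    agg_volume = aggregate(volume, factor)
    score = run_htm_and_score(agg_price, agg_volume)  # placeholder for your own evaluation
    print(factor, score)
```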
Thanks for your thoughts. Yeah, aggregation is a better way, but I have very limited time frames: the model should find anomalies as soon as possible, because the events I'm looking for are very short in time (5-20 min). When I switched from 3-minute to 5-minute aggregation, the number of detected anomalies increased, but the quality of these anomalies is still poor (I can't even explain what exactly it found in most cases). But the model reacts better to a sharp drop in price and volume than to a price/volume increase. I will post some examples after a little tweaking of the model!
I didn’t have any useful results from the approach I described either. I had to take a step back and try to understand why I can see certain patterns visually but HTM cannot easily pick them up. Besides the scale variance problem, I think another reason may be related to path integration (the property whereby if I go diagonally 5 steps, I end up with the same location representation as forward 3 steps then right 4).
I think what may be happening is as my eyes scan over a few graphs which I perceive to contain patterns, in my mind I am forming an idealized concept that is something like a winding trail. Each new time I notice the pattern, I am automatically tuning out the short noisy spikes because they are random and not common between the different instances. As I follow each of the paths in my mind, the location representations that are common between the paths get more reinforced, because of the path integration property (doesn’t matter which zig-zaggy direction I went to reach a particular common location – it will still end up with the same representation).
This doesn’t help with your specific problem of how to do simple anomaly detection on the data with NuPIC, but perhaps it does hint at some interesting possibilities to explore as HTM research progresses in the future.
Are you using raw anomaly scores or calculating an AnomalyLikelihood?
I think I got your idea. You mean invariance not only to scale but also to length (time scale), and not being affected by inversion (if we see an inverted picture of a monkey, we still recognize the monkey). Something like that:
Just pure shape. Also, about ignoring noise: any shape or object has variations, and the further a representation is from the original shape, the less chance it is the shape we are looking for.
Red area (we've got an elephant here)


Yes, I'm using anomaly likelihood and anomaly log likelihood. Most likely the problem is in my encoders and model params.
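For reference, the usual NuPIC pattern (as far as I recall it; `result`, `value`, and `timestamp` come from your own model loop) looks roughly like this:

```python
from nupic.algorithms.anomaly_likelihood import AnomalyLikelihood

likelihood_helper = AnomalyLikelihood()

# inside the per-record loop, after running the model on one record:
raw_score = result.inferences["anomalyScore"]                   # raw TM anomaly score
likelihood = likelihood_helper.anomalyProbability(value, raw_score, timestamp)
log_likelihood = likelihood_helper.computeLogLikelihood(likelihood)

# e.g. flag an anomaly when log_likelihood exceeds a threshold such as 0.5
# (the threshold itself is a tuning choice, not a NuPIC default)
```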
No one knows exactly how to implement what @Paul_Lamb is talking about in the TM. I believe it requires a hierarchy of cortical columns spread across a broader sensory spectrum. The current NuPIC TM will learn these scaled patterns as two different patterns, but it can recognize a whole lot of patterns.
So I finally realized that normalizing the data the way I was doing it is really stupid. The results are highly dependent on the size of the normalization window, and the shape of the data gets deformed. I'm out of ideas, and the model does not work on raw data. On a 4k dataset it found 14 anomalies:
Also: is it possible to classify anomalies somehow? From the results it seems that the anomalies have similarities (like price+volume increase, price+volume decrease, price decrease + volume increase, and so on).
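One crude way to start would be labeling each detected anomaly by the direction of the recent price and volume change around it. This is just a heuristic sketch, not an HTM classifier, and `price`, `volume`, and `anomaly_indices` are placeholders for your own arrays:

```python
def label_anomaly(price, volume, i, lookback=5):
    """Crude label: direction of price and volume over the last `lookback` steps."""
    dp = price[i] - price[max(i - lookback, 0)]
    dv = volume[i] - volume[max(i - lookback, 0)]
    return ("price up" if dp > 0 else "price down") + " / " + \
           ("volume up" if dv > 0 else "volume down")

labels = [label_anomaly(price, volume, i) for i in anomaly_indices]
```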