Anomaly score/likelihood question

Hi everyone, I have been trying to implement HTM on time-series data with a recurring weekly pattern, and I'm wondering whether the anomaly scores I've gotten seem reasonable. Here are the plots I got using tm.anomaly to retrieve the anomaly score: https://imgur.com/a/1riezeQ.

In the first graph, since the anomaly scores spiked at the start of every week, I reduced the number of bits in the SDR representing dayOfWeek/timeOfDay/weekend. Doing that gave me the result shown in the second graph. Why does the anomaly likelihood still spike every weekend (where there are gaps in the input data)? Is this normal, considering I never reset the TM?

In the third graph in the link, the anomaly score still seems to spike at the start of every week even though I am not resetting the TM. Also, the anomaly likelihood barely spikes even when the prediction and the input are drastically different, and it gradually increases and then levels off. I would appreciate any guidance on whether these graphs show proper anomaly score/likelihood behavior, and if not, what might have gone wrong. Thanks in advance!

Hey @acdolphin246, welcome!

I think the weekday-to-weekend pattern takes longer to learn because it's expressed over a longer time period than the daily patterns are.

The anomaly score measures how surprised the model is by each input, while the anomaly likelihood represents the change in the distribution of anomaly scores. So if a recent sample of anomaly scores is distributed similarly to a longer-term sample, the anomaly likelihood will be low, even if the anomaly scores themselves are high.
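
To make that concrete, here's a toy sketch of the idea in Python. This is just my simplification, not htm.core's actual implementation (which keeps a rolling window, handles a probationary period, and so on):

import math

def anomaly_likelihood_sketch(scores, short_window=10, long_window=1000):
    # Toy illustration: model the long-run anomaly scores as a Gaussian,
    # then ask how unusual the recent average score is under it.
    # Assumes `scores` is a non-empty list of raw anomaly scores.
    history = scores[-long_window:]
    mean = sum(history) / len(history)
    var = sum((s - mean) ** 2 for s in history) / len(history)
    std = max(math.sqrt(var), 1e-6)

    recent_mean = sum(scores[-short_window:]) / min(len(scores), short_window)

    # Gaussian tail probability P(score >= recent_mean): small when recent
    # scores are unusually high relative to the long-run history.
    z = (recent_mean - mean) / std
    tail = 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))
    return 1.0 - tail  # near 1.0 only when the distribution shifts upward

So a long stretch of uniformly high anomaly scores eventually drags the long-run mean up, and the likelihood settles back down; only a change relative to recent history pushes it near 1.0.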

None of them look crazy to me, though I'd be suspicious of the first one because the anomaly scores seem to drop very quickly. Usually the anomaly scores are very high and volatile at first, since the system hasn't had time to learn much yet. One thing that would really help is seeing your model_params.

Thanks for getting back to me, @sheiser1! So would it be better if I ran the model on even more data? I am using the following model parameters:

default_parameters = {
    'enc': {
        "value":
            {'resolution': 0.01, 'activeBits': 21, 'size': 700, 'sparsity': 0.02, 'seed': 1},
        "time":
            {'timeOfDay': (30, 1), 'dayOfWeek': (30, 168), 'weekend': 13}
    },

    'predictor': {'sdrc_alpha': 0.035828933612157998},

    'sp': {'boostStrength': 0.0,
           'columnCount': 2048,
           'localAreaDensity': 0.01953125,
           'seed': 1,
           'potentialPct': 0.3,  # fraction of input bits each SP column can potentially connect to
           'synPermActiveInc': 0.003,
           'synPermConnected': 0.2,
           'synPermInactiveDec': 0.0005},

    'tm': {'activationThreshold': 13,
           'cellsPerColumn': 32,
           'initialPerm': 0.21,
           'seed': 1,
           'maxSegmentsPerCell': 128,
           'maxSynapsesPerSegment': 32,
           'minThreshold': 10,
           'newSynapseCount': 20,
           'permanenceDec': 0.25,
           'permanenceInc': 0.25},

    'anomaly': {
        'likelihood':
            {  # 'learningPeriod': int(math.floor(self.probationaryPeriod / 2.0)),
                # 'probationaryPeriod': self.probationaryPeriod - default_parameters["anomaly"]["likelihood"]["learningPeriod"],
                'learningPeriod': 500,
                'reestimationPeriod': 168
            }
    }
}
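
For context, those likelihood parameters feed into the AnomalyLikelihood class that ships with htm.core, roughly along these lines (sketched from the hotgym.py pattern rather than my exact code):

from htm.algorithms.anomaly_likelihood import AnomalyLikelihood

anomaly_history = AnomalyLikelihood(
    learningPeriod=default_parameters["anomaly"]["likelihood"]["learningPeriod"],
    reestimationPeriod=default_parameters["anomaly"]["likelihood"]["reestimationPeriod"])

# Then per record, after running the TM:
# likelihood = anomaly_history.anomalyProbability(value, tm.anomaly, timestamp)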

The code looks a bit familiar. Is this perhaps based on htm.core's hotgym.py example? I ran some Bitcoin prices through that framework a little while back and tried to see which parameters improved performance the most. It looks like you tweaked the TM's cellsPerColumn; have you considered messing around with columnCount in the spatial pooler?

I think @sheiser1 said it better than I could have; this is continuous learning, so don't worry too much about early anomaly detection while the model is still training up to good performance.

You reduced the number of bits in the encoder representing dayOfWeek, timeOfDay and weekend? If I understand this right, you kept the encoder size constant (700) and effectively increased the sparsity of all three encoded variables? I feel like that might be a slightly off direction, but that's just a hunch. Maybe boost the encoder size by a couple hundred bits instead; it shouldn't really impact your runtime (see the sketch below). Though the size of the SP and TM is arguably more important than encoder size.
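
For example, something along these lines, assuming you're building encoders the way htm.core's hotgym.py does (the numbers here are made up for illustration, not tuned values):

from htm.bindings.encoders import RDSE, RDSE_Parameters
from htm.encoders.date import DateEncoder, DateEncoderParameters

# Scalar value encoder: bump the total size, scaling activeBits roughly
# in proportion so the encoding stays sparse.
rdseParams = RDSE_Parameters()
rdseParams.size = 1000       # up from 700
rdseParams.activeBits = 30   # up from 21
rdseParams.resolution = 0.01
rdseParams.seed = 1
valueEncoder = RDSE(rdseParams)

# Date encoder: the *_width fields control how many bits each time
# attribute contributes to the encoding.
dateParams = DateEncoderParameters()
dateParams.timeOfDay_width = 30
dateParams.dayOfWeek_width = 30
dateParams.weekend_width = 13
dateEncoder = DateEncoder(dateParams)

# Total input width seen by the spatial pooler.
encodingWidth = valueEncoder.size + dateEncoder.size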

Thanks for your advice, @mcleverley! I used some parameters I found in another post on this forum, which is probably why the code looks familiar. My data is pretty similar to the hotgym.py data.

If I increase the encoder size, should I also increase the bits encoding the timestamp/value? And what is the relationship between SP/TM size and model performance?

I tried effectively running the model on more data points by changing the granularity from hourly data to 5-minute data, but the anomaly scores still don't seem to be improving. Generally, how long does it take for models to reach good performance?

I'm far from an expert, so anyone please correct me if I misstep here. This episode of HTM School will explain it much better than I can.

If you boost encoder size by 50%, I'd say increase each variable's encoder bits by roughly 20%, or maybe more. I'm really just pulling these numbers out of the air here, though.
Theoretically, a larger SP/TM will be able to capture more complex and temporally extended patterns, though the latter comes mostly from "depth", i.e. cells per column. In terms of performance boost, it's kind of like increasing the number of nodes per layer or the number of layers in a classic neural net.
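
To put rough numbers on that, scaling up a hotgym.py-style model might look something like this (the sizes are illustrative guesses, not recommendations):

from htm.bindings.algorithms import SpatialPooler, TemporalMemory

encodingWidth = 1400  # whatever your combined encoder output size is

sp = SpatialPooler(
    inputDimensions=(encodingWidth,),
    columnDimensions=(4096,),   # "width": more columns than the usual 2048
    potentialRadius=encodingWidth,
    potentialPct=0.3,
    globalInhibition=True,
    localAreaDensity=0.01953125,
    synPermInactiveDec=0.0005,
    synPermActiveInc=0.003,
    synPermConnected=0.2,
    boostStrength=0.0,
    wrapAround=True,
    seed=1)

tm = TemporalMemory(
    columnDimensions=(4096,),   # must match the SP
    cellsPerColumn=32,          # "depth": capacity for longer temporal context
    activationThreshold=13,
    initialPermanence=0.21,
    connectedPermanence=0.2,
    minThreshold=10,
    maxNewSynapseCount=20,
    permanenceIncrement=0.25,
    permanenceDecrement=0.25,
    predictedSegmentDecrement=0.0,
    maxSegmentsPerCell=128,
    maxSynapsesPerSegment=32,
    seed=1)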

I think more data points, and especially more granularity, are great if you're doing anomaly detection: it gives the model more opportunity to pick up on 'micropatterns' that play out in less than an hour.
As for how long it'll take to reach good performance... that's tough to say. It definitely depends on the data and how "readable" the patterns are, as well as your model size and parameters. It's sort of like asking how many cat pictures a convolutional net needs to recognize cats; it depends on the execution.

For reference, I think hotgym starts putting out good predictions after ~4 months of data or so, but power consumption is a very "readable" variable to encode. I talked to Intelletic the other week about their model, and they mentioned it needed ~6 months of Bitcoin price data to start putting out useful predictions (though their model isn't purely an HTM). How much data can you acquire in this case?

I currently have around 4 months of data, with one data point per 5-minute interval. However, some anomalies already occur near the start of the dataset, so I'm not sure whether that would affect the model's performance too.
