Int Assertion Error

@komal I had a chance to look at the sample data you sent me this morning. I have a couple of concerns.

Sub-second timing

I think I mentioned this before, and it’s why we abandoned date encoding for this data set, but I want to reiterate that long-scale time encodings like “day of week” and “time of day” will never work with intervals this small. In these cases it is better not to encode time at all. However, if you are not encoding time, you must ensure that rows of data arrive at a regular interval; it cannot be arbitrary. In your case, the startTime field looks not only irregular but actually out of order. The data appears to be sorted by endTime, and I’m not sure that’s the right thing to do.
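As a rough way to check the ordering and spacing described above, the sketch below sorts rows by startTime and tests whether consecutive gaps are equal. It assumes each row is a dict whose startTime and endTime are numeric milliseconds; the helper names are hypothetical and not part of any HTM library.

```python
from typing import Dict, List


def sort_by_start_time(rows: List[Dict]) -> List[Dict]:
    """Return rows ordered by startTime (the sample data appeared sorted by endTime)."""
    return sorted(rows, key=lambda r: r["startTime"])


def intervals_are_regular(rows: List[Dict], tolerance: float = 0.0) -> bool:
    """True if every gap between consecutive startTime values matches the first gap."""
    starts = [r["startTime"] for r in rows]
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    if not gaps:
        return True  # zero or one row: nothing to compare
    return all(abs(g - gaps[0]) <= tolerance for g in gaps)


# Hypothetical rows, ordered by endTime the way the sample data seemed to be.
rows = [
    {"startTime": 2.0, "endTime": 3.5},
    {"startTime": 0.0, "endTime": 4.0},
    {"startTime": 1.0, "endTime": 5.0},
]
ordered = sort_by_start_time(rows)
print([r["startTime"] for r in ordered], intervals_are_regular(ordered))
```

Sorting alone does not make real data regular; if the gaps still vary after sorting, the rows would need to be aggregated or resampled onto a fixed interval before being fed to a model without a time encoding.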

Data represents many different events

It looks like an input row of data consists of a flag and a packets value. From a brief inspection of the data, there are at least 6 flags, each of which could have any integer for packets, so it is hard to say how many different rows of data might arrive within a single millisecond.
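One quick way to gauge this is to count the distinct flags and how many rows share each startTime. The sketch assumes rows are dicts with flag, packets, and startTime keys (startTime in whole milliseconds); the flag values below are made up for illustration.

```python
from collections import Counter
from typing import Dict, List


def distinct_flags(rows: List[Dict]) -> List[str]:
    """Sorted list of the flag values seen in the data."""
    return sorted({r["flag"] for r in rows})


def rows_per_millisecond(rows: List[Dict]) -> Counter:
    """How many rows share each startTime value."""
    return Counter(r["startTime"] for r in rows)


# Hypothetical sample: several flags can land in the same millisecond.
sample = [
    {"startTime": 100, "flag": "A", "packets": 4},
    {"startTime": 100, "flag": "B", "packets": 1},
    {"startTime": 100, "flag": "C", "packets": 9},
    {"startTime": 101, "flag": "A", "packets": 2},
]
counts = rows_per_millisecond(sample)
print(distinct_flags(sample), max(counts.values()))
```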

If only one HTM model is trying to learn this, it would probably be confused. In my experience, a single HTM layer like this is better at learning the pattern of one thing over time than the patterns of many things over time. For example, in Grok for IT we break up a server’s metrics into separate models for each metric (CPU, NetworkIn/Out, Memory, etc.). Each model analyzes one stream of scalar data. We could have passed them all into one model using a string flag to identify the metric type, but performance is much better when each metric is analyzed by a different model.

You could do the same thing for your data. Break it up into separate input streams, one for each flag, and create one model per flag that is fed only the data for that flag. As you process the entire data set, pre-process each row, send it to the appropriate model, and keep track of each model’s anomaly score over time. I’ve done this type of thing with audio input to get an overall anomaly indication from many different models, so you might want to see that code here:
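The routing idea above can be sketched as follows. This is not the audio code mentioned in the post and not the NuPIC API: `PlaceholderModel` is a hypothetical stand-in that scores each value by its distance from a running mean, so that the per-flag bookkeeping can run on its own. In practice each placeholder would be replaced by a real HTM model. Taking the maximum of the latest scores is just one assumed way to combine them into an overall indication.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class PlaceholderModel:
    """Hypothetical stand-in for one HTM model (NOT the NuPIC API).
    Scores a value by its distance from the running mean, squashed into [0, 1)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def run(self, value: float) -> float:
        deviation = abs(value - self.mean) if self.count else 0.0
        score = deviation / (1.0 + deviation)
        # Incrementally update the running mean with the new value.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return score


class PerFlagRouter:
    """Creates one model per flag and sends each row only to its flag's model."""

    def __init__(self, model_factory: Callable[[], PlaceholderModel] = PlaceholderModel) -> None:
        self.model_factory = model_factory
        self.models: Dict[str, PlaceholderModel] = {}
        self.scores: Dict[str, List[float]] = defaultdict(list)

    def process(self, row: Dict) -> float:
        flag = row["flag"]
        if flag not in self.models:
            self.models[flag] = self.model_factory()  # lazily create a model per flag
        score = self.models[flag].run(float(row["packets"]))
        self.scores[flag].append(score)  # per-flag anomaly history over time
        return score

    def overall_anomaly(self) -> float:
        """One possible combination: the highest most-recent score across flags."""
        return max((s[-1] for s in self.scores.values() if s), default=0.0)


router = PerFlagRouter()
for row in [
    {"flag": "A", "packets": 10},
    {"flag": "B", "packets": 3},
    {"flag": "A", "packets": 10},
    {"flag": "A", "packets": 40},
]:
    router.process(row)
print(len(router.models), round(router.overall_anomaly(), 3))
```

Rows should be fed in time order, and each flag's stream should still meet the regular-interval requirement from the previous section, since each model sees only its own flag's data.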

@rhyolight Yes, I know that I can’t use the timestamp field of my data. That’s why I didn’t use that field when building the model.
Thank you for taking the time to give me all your support and guidance. It was nice interacting with you.
