Finding the predictability of a pattern of anomalies with HTM

Hi,

I’m trying to use NuPIC to detect possible cyber attacks. The basic idea is to collect many metrics, each measuring some “feature” of how users, computers and networks behave within the organization, and to send those metrics both to HTM models and to other algorithms (mostly statistical) that detect anomalies in the behaviour of single metrics. Each of these components reports the anomalies it detects to another service that looks at a specific time window (let’s say a day). That service sends an HTM model an array indicating which metrics showed anomalous behaviour during the day, and with what magnitude, and which metrics showed none. The goal is to detect whether the day’s pattern of metric anomalies is itself anomalous and unpredictable, or whether it is actually predictable. The catch is that the number of such metrics can be quite large: a few thousand for a small organization and potentially tens or hundreds of thousands for a medium one.

I have already built the first part, which handles anomaly detection in single metrics (similar to the gym example), and it works perfectly. For the second HTM part, I think I’ll have to define a map indicating which metrics are, or can be considered, “similar” to which other metrics. I would then write an encoder that maintains overlap between similar metrics, prevents overlap between metrics that are not “similar”, and represents the whole thing as an SDR.

I would very much like your feedback on this approach. If the approach is sound, an example of a similar encoder in Python would be very helpful and much appreciated.

Thanks a lot
Yuval.

I think it could work, but be sure to include time somehow in the 2nd stage encoding for the anomalies. If you do this in day batches, day of week may be important.
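
As a rough illustration of the day-of-week idea, here is a minimal sketch in plain Python. It is not NuPIC's own date encoder; the function names, the bit counts and the ring layout are all assumptions made for the example. Each weekday turns on a short run of bits on a ring, so neighbouring days share a few bits and Sunday wraps around to overlap Monday.

```python
import datetime


def encode_day_of_week(date, bits_per_day=3, width=5):
    """Cyclic day-of-week encoding (illustrative, not a NuPIC API).

    Each weekday activates `width` contiguous bits on a ring of
    7 * bits_per_day bits, so adjacent days share width - bits_per_day bits.
    """
    n = 7 * bits_per_day
    start = date.weekday() * bits_per_day  # Monday == 0
    sdr = [0] * n
    for i in range(width):
        sdr[(start + i) % n] = 1  # wrap so Sunday overlaps Monday
    return sdr


def overlap(a, b):
    """Number of bits that are active in both encodings."""
    return sum(x & y for x, y in zip(a, b))


monday = encode_day_of_week(datetime.date(2024, 1, 1))
tuesday = encode_day_of_week(datetime.date(2024, 1, 2))
thursday = encode_day_of_week(datetime.date(2024, 1, 4))
sunday = encode_day_of_week(datetime.date(2024, 1, 7))

print(overlap(monday, tuesday))   # adjacent days share bits
print(overlap(monday, thursday))  # distant days share none
print(overlap(sunday, monday))    # the week wraps around
```

The resulting bits could be concatenated with the day's anomaly bits to form the second-stage input.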

Hi, thanks a lot for the quick reply. I forgot to mention it, but time was certainly meant to be included. In fact, I intend to include the current date on two different calendars (e.g. the Gregorian date and the Jewish date). Do you have a good starting point for building an encoder that can encode this data (the current Gregorian date, the current Jewish date and the anomaly pattern for the day)?

Also, since the number of metrics (i.e. the number of possible anomalies) can be quite high (several tens of thousands), how big should the SDRs created by this encoder be?

Thanks,
Yuval.

This type of custom encoder would not be too hard to build. Here is an example from NuPIC. Also, I will be building out a date encoder soon for Building HTM Systems on my Twitch stream, starting Thursday. I still have some work to do converting the project into a proper web application before I get to that, but it will be the first thing I need to do to continue the project.
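
To make the "similarity map" idea from the original post concrete, here is one possible sketch in plain Python. It is separate from the NuPIC example mentioned above and does not use NuPIC's encoder base class. The class name, the parameters and the metric names are hypothetical. The assumption is that every metric is assigned to one group: each group owns a disjoint block of bits, every anomalous metric turns on its group's shared bits, and it adds a magnitude-proportional number of its own bits from the same block. Metrics in the same group therefore always overlap, and metrics in different groups never do.

```python
import random


class GroupedAnomalyEncoder:
    """Illustrative encoder for a day's anomaly pattern (not a NuPIC class)."""

    def __init__(self, metric_groups, bits_per_group=64, shared_bits=4, own_bits=12):
        # metric_groups: dict of metric name -> group name (the similarity map)
        self.metric_groups = metric_groups
        groups = sorted(set(metric_groups.values()))
        self.size = len(groups) * bits_per_group
        offsets = {g: i * bits_per_group for i, g in enumerate(groups)}
        # The first `shared_bits` of each block are common to the whole group.
        self.shared = {g: [offsets[g] + b for b in range(shared_bits)] for g in groups}
        self.own = {}
        for metric, group in metric_groups.items():
            rng = random.Random(metric)  # seeded by name, so stable across runs
            local = rng.sample(range(shared_bits, bits_per_group), own_bits)
            self.own[metric] = [offsets[group] + b for b in local]

    def encode(self, anomalies):
        # anomalies: dict of metric name -> magnitude in (0, 1]; absent means normal
        sdr = [0] * self.size
        for metric, magnitude in anomalies.items():
            if magnitude <= 0:
                continue
            magnitude = min(magnitude, 1.0)
            for b in self.shared[self.metric_groups[metric]]:
                sdr[b] = 1
            own = self.own[metric]
            k = max(1, int(round(magnitude * len(own))))
            for b in own[:k]:
                sdr[b] = 1
        return sdr


def overlap(a, b):
    """Number of bits that are active in both encodings."""
    return sum(x & y for x, y in zip(a, b))


# Hypothetical metrics: two authentication metrics and one network metric.
enc = GroupedAnomalyEncoder(
    {"ssh_logins": "auth", "vpn_logins": "auth", "dns_queries": "network"}
)
ssh = enc.encode({"ssh_logins": 1.0})
vpn = enc.encode({"vpn_logins": 1.0})
dns = enc.encode({"dns_queries": 1.0})
print(overlap(ssh, vpn) >= 4, overlap(ssh, dns) == 0)
```

Note that the output is a union over all anomalous metrics, so this sketch does not by itself keep the result sparse on a day when many metrics fire; the block sizes would need tuning for that.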

Thanks A LOT!
I’ll get started right away… Thanks.