Splitting events into temporal streams



Here is a very typical data problem. You have a complex system, and you have access to events occurring within it over time. You can monitor for certain events and get streams of information with associations to “things” that actually exist (conceptually or physically). Here is an example from a recent NuPIC question.

senddatehour     channelid  countryid  volume
14.5.2018 21:00  42344      100        2380.0
14.5.2018 22:00  42344      100        1372.0
14.5.2018 23:00  42344      100        761.0
15.5.2018 0:00   42344      100        410.0
15.5.2018 1:00   42344      100        229.0
15.5.2018 2:00   42344      100        204.0
15.5.2018 3:00   42344      100        285.0

Most folks looking to analyze this data with an HTM like NuPIC will try to encode each column in the table above as a different field with a different encoder. This makes sense in a way, but it won’t work. Why not? Because this data is not one continuous stream. It is a confluence of hundreds of smaller streams.

This data has around 200 different countryids, each with 4 channelids. Each combination of country/channel is a stream. All streams get dumped into the data shown above, which is why I called it a confluence.

Break Apart The Streams

In order to process these streams, you need to create an HTM model for each one. You can’t use one model to process the entire confluence. So you first have to write code to split the confluence up into streams, however you define them in your data set.
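
Here is a minimal sketch of that splitting step in plain Python, with no NuPIC calls involved. It assumes the rows arrive as whitespace-separated text in the layout shown above, and that a stream is defined by the (countryid, channelid) pair. The function name split_into_streams and the date format string are assumptions made for illustration, so adjust them to your own data.

```python
from collections import defaultdict
from datetime import datetime

# The first few rows from the table above, used as sample input.
SAMPLE = """\
senddatehour channelid countryid volume
14.5.2018 21:00 42344 100 2380.0
14.5.2018 22:00 42344 100 1372.0
15.5.2018 0:00 42344 100 410.0
"""


def split_into_streams(lines):
    """Group rows by (countryid, channelid) and sort each group by time.

    Assumes each data row has five whitespace-separated tokens:
    date, time, channelid, countryid, volume.
    """
    streams = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skips the four-token header row and blank lines
        date, time, channelid, countryid, volume = parts
        # Assumed format: day.month.year hour:minute
        timestamp = datetime.strptime(date + " " + time, "%d.%m.%Y %H:%M")
        streams[(countryid, channelid)].append((timestamp, float(volume)))
    for rows in streams.values():
        rows.sort()  # each stream must be in time order before it feeds a model
    return dict(streams)


streams = split_into_streams(SAMPLE.splitlines())
for key, rows in streams.items():
    print(key, len(rows), "rows")
```

Each value in the resulting dictionary is one stream, ready to be fed to its own model one row at a time.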

Run a few models first

It probably looks daunting to run hundreds of HTM models at one time, so I suggest you just don’t do it, at least not until you’ve run one or two and validated that HTM is actually giving you valuable anomaly detection results.
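
One way to choose those first one or two streams is sketched below. It assumes the streams are already held in a dictionary keyed by (countryid, channelid), and it picks the streams with the most rows, on the guess that they give a model the most to learn from. That selection rule, the name pick_trial_streams, and the placeholder data are all assumptions made for illustration.

```python
def pick_trial_streams(streams, count=2):
    """Return the keys of the `count` streams that have the most rows."""
    ranked = sorted(streams, key=lambda key: len(streams[key]), reverse=True)
    return ranked[:count]


# Placeholder data for illustration only: keys are (countryid, channelid),
# values are lists of volume readings.
example_streams = {
    ("A", "1"): [1.0, 2.0, 3.0],
    ("A", "2"): [1.0],
    ("B", "1"): [1.0, 2.0],
}
trial_keys = pick_trial_streams(example_streams)
print(trial_keys)
```

Run a model on just these streams, look at the anomaly results, and only then decide whether scaling up to every stream is worth it.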

Anomaly Detection algorithm takes more than 20 hours