Flink Anomaly Detection


I am relatively new to HTM.java. We are working on a project where we need to monitor server logs in real time, and we are using Apache Flink for streaming. If there is a sudden, unusual spike in the number of transactions or server errors, we need to raise an alert (an anomaly).

  1. How can we best achieve this using HTM.java? Can we implement the use case using HTM.java?
  2. How do we take historical information about the observed patterns into account,
    and will HTM compute the baseline accordingly?

Any sample code is much appreciated.



Hi @harshith99,

Welcome, and thanks for using HTM.java. I believe this should be an easy use case for HTM.java, but first let me preface my comments with a disclaimer… :slight_smile: I am one of the authors of HTM.java, but I have little to no experience actually putting HTMs to use, and I have no data science background from which to say authoritatively how best to use HTMs for anomaly detection in real-life applications. So instead I’ll list the people who can best speak to the different areas your use case touches.

  1. HTM.java sample code and usage - me and a few other developers.
  2. Details about configuration settings: @rhyolight (Matt Taylor, Flagbearer and Open Source Director for Numenta).
  3. Details about the application of HTMs to Anomaly Detection: @rhyolight, @mrcslws, lscheinkman, subutai
  4. Details about Flink with HTM.java: @EronWright - also have a look at this project: flink-htm.
  5. Meaning of Life: jhawkins :slight_smile:

HTM.java was created to be algorithmically equivalent to NuPIC. It differs only in some implementation details and yields exactly the same output as the NuPIC version it is aligned with. At the moment there are a couple of very minor NuPIC updates that I have not had time to port over, but they don’t have much of an impact on output quality.

That being said…

As far as I know, a time column and a data column indicating transaction errors, at a resolution of maybe 2 seconds (@rhyolight, can you confirm?), should be sufficient. Historical patterns would be handled by running actual data through the HTM for one or more days and then saving it to disk, so that you have an “acclimated” HTM. (There is no such thing as “training” an HTM; it simply learns gradually from new data as it is received.)
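To make the two-column input concrete, here is a minimal plain-Java sketch of turning raw error timestamps into (time, count) rows at a 2-second resolution. It is only an illustration: `ErrorBucketer`, `bucketCounts`, and `BUCKET_MS` are hypothetical names, not part of HTM.java or Flink, and in a real Flink job the same aggregation would more likely be done with a windowed operator before the rows reach the HTM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of HTM.java or Flink.
class ErrorBucketer {
    /** Bucket width in milliseconds (2 s, the resolution suggested above). */
    static final long BUCKET_MS = 2_000L;

    /**
     * Counts events per fixed-width bucket and returns rows of
     * {bucketStartMillis, count}, ordered by time. Empty buckets between
     * the first and last event are emitted with a count of 0, so the
     * resulting series has no gaps.
     */
    static List<long[]> bucketCounts(List<Long> eventTimesMillis) {
        TreeMap<Long, Long> counts = new TreeMap<>();
        for (long t : eventTimesMillis) {
            // Round each timestamp down to the start of its bucket.
            long start = Math.floorDiv(t, BUCKET_MS) * BUCKET_MS;
            counts.merge(start, 1L, Long::sum);
        }
        List<long[]> rows = new ArrayList<>();
        if (counts.isEmpty()) {
            return rows;
        }
        for (long s = counts.firstKey(); s <= counts.lastKey(); s += BUCKET_MS) {
            rows.add(new long[] { s, counts.getOrDefault(s, 0L) });
        }
        return rows;
    }

    public static void main(String[] args) {
        // Four error events, given as milliseconds since some start time.
        List<Long> errors = Arrays.asList(100L, 1_900L, 2_500L, 7_000L);
        for (long[] row : bucketCounts(errors)) {
            System.out.println(row[0] + "," + row[1]);
        }
    }
}
```

Each printed row is one record for the network: the bucket start time and the number of errors seen in that bucket. Emitting zero-count buckets is a design choice here, made so that quiet periods are visible to the HTM as data rather than as missing rows.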

A template for sample code showing how HTM.java is used for anomaly detection can be derived from this example and from the unit tests in the Network API package. In particular, you should refer to NetworkTest.java and LayerTest.java.

You will probably need to use the Anomaly Likelihood code (which is not built in to HTM.java’s Network API) rather than just the Anomaly code (which is built in), and that will require some custom setup. HTM.java’s Network API is very flexible and allows you to insert custom “nodes” into the network structure by implementing your own Observable and inserting it into the network setup. Inside the onNext() method of the Observable, you take the Inference object you are handed, feed it into your own Anomaly Likelihood instance, and then output that result as normal (the details depend on your implementation). You would, however, have to ask the Numenta devs mentioned above to arrive at the proper Anomaly Likelihood window configuration (the same config applies to both NuPIC and HTM.java).

Example of adding a custom function node:
and here
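To give a feel for what the likelihood step does with the raw anomaly scores, here is a rough, self-contained sketch. It is a simplified stand-in, not HTM.java’s or NuPIC’s Anomaly Likelihood implementation: it keeps a rolling window of raw scores, assumes they are roughly Gaussian, and reports how unusual the recent short-term average is. The class name `SimpleAnomalyLikelihood`, its methods, and the window sizes used below are all assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Simplified stand-in for illustration; NOT HTM.java's Anomaly Likelihood code.
class SimpleAnomalyLikelihood {
    private final int historySize; // long window used for mean and std dev
    private final int shortSize;   // short window used for the recent average
    private final Deque<Double> history = new ArrayDeque<>();

    SimpleAnomalyLikelihood(int historySize, int shortSize) {
        if (shortSize < 1 || shortSize > historySize) {
            throw new IllegalArgumentException("need 1 <= shortSize <= historySize");
        }
        this.historySize = historySize;
        this.shortSize = shortSize;
    }

    /** Feeds one raw anomaly score and returns a likelihood in [0, 1]. */
    double update(double rawScore) {
        history.addLast(rawScore);
        if (history.size() > historySize) {
            history.removeFirst();
        }
        if (history.size() < historySize) {
            return 0.5; // neutral value until the long window is full
        }
        double mean = 0.0;
        for (double s : history) {
            mean += s;
        }
        mean /= history.size();
        double var = 0.0;
        for (double s : history) {
            var += (s - mean) * (s - mean);
        }
        // Floor the std dev so a perfectly flat history cannot divide by zero.
        double std = Math.max(Math.sqrt(var / history.size()), 1e-3);

        // Average of the most recent shortSize scores.
        double recent = 0.0;
        Iterator<Double> it = history.descendingIterator();
        for (int i = 0; i < shortSize; i++) {
            recent += it.next();
        }
        recent /= shortSize;

        // Likelihood = 1 - P(a value at least this high under the Gaussian fit).
        return 1.0 - gaussianTail((recent - mean) / std);
    }

    /** Upper tail probability Q(z) of the standard normal distribution. */
    static double gaussianTail(double z) {
        return 0.5 * erfc(z / Math.sqrt(2.0));
    }

    /** Complementary error function (standard Chebyshev-fit approximation). */
    static double erfc(double x) {
        double z = Math.abs(x);
        double t = 1.0 / (1.0 + 0.5 * z);
        double r = t * Math.exp(-z * z - 1.26551223 + t * (1.00002368
                + t * (0.37409196 + t * (0.09678418 + t * (-0.18628806
                + t * (0.27886807 + t * (-1.13520398 + t * (1.48851587
                + t * (-0.82215223 + t * 0.17087277)))))))));
        return x >= 0 ? r : 2.0 - r;
    }
}
```

In a custom node you would call something like `update(...)` with the raw anomaly score taken from each Inference and raise an alert when the returned value crosses a threshold close to 1. The threshold and the two window sizes are application choices, and the proper values for the real Anomaly Likelihood code are exactly what the Numenta devs mentioned above can advise on.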

I should warn you (just in case you were misinterpreting this) that NuPIC and HTM.java are not end-user applications but libraries that you use from your own code, so there isn’t a “kit” that you can just plug in, so to speak. You will have to spend some time experimenting with the tests to see how to feed in your own data. Once you have the configs tweaked for your application and understand the library’s basic usage, you can formalize your code into your custom solution.

I would suggest using the NuPIC forum for general configuration questions about HTMs, and the tests in the Network and Persistence packages for experimenting with implementation details. Also, the Hot Gym example in HTM.java’s examples repo could be somewhat useful: https://github.com/numenta/htm.java-examples/tree/master/src/main/java/org/numenta/nupic/examples/napi/hotgym

I have been in a crunch lately with my day job at Cortical.io (a business partner of Numenta), so my current cycle takes me away from HTM.java matters. I’m coming up on the conclusion of that project and will be more attentive here once it has been released to the community. In the meantime I will help as much as I can (short of writing your actual application code, which I cannot do for you). :stuck_out_tongue: Unfortunately, some time will have to be spent getting to know HTMs and their application.

I hope this gets you started?