Anomaly detection in web server logs


#1

Hi All,
I’m new to HTM and NuPIC, and I’m trying to detect anomalous web requests in Nginx server logs, such as a DoS attack or an injection attack. I don’t know whether this is possible with HTM. I looked at the HotGym example, which does prediction and anomaly detection on a single value.
Can anyone help me with this?

Thanks and Regards,
Arjun.


#2

Hi, welcome to the forum.
Most of us are more than happy to help, but you’ll need to provide more information first. If you are comfortable with programming and have some experience with it, that’s awesome: just ask your questions and we should be able to answer them professionally (in an open-source community sense). If you aren’t, that’s OK too. There are services like Grok that do exactly what you want.


#3

Hi, Thanks for your reply.
First, I would describe myself as an intermediate programmer in Python.

These are the things I am trying:

  1. To find anomalous requests hitting my web server with the help of logs.
    For example: a bot trying to access a page that doesn’t exist on my server (which does not match the usual pattern).
  2. To detect a continuous stream of requests from the same IP within a short period of time.
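
For the second item, one possible preprocessing step is to turn the log into a per-IP request count per minute, which gives a single scalar stream similar to the HotGym data. The sketch below is only an illustration under some assumptions: it assumes Python 3 and the default Nginx "combined" log format, and the names `LOG_PATTERN`, `parse_line`, and `requests_per_ip_per_minute`, as well as the sample lines, are made up for this example.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: default Nginx "combined" log format. Adjust the pattern
# if your nginx.conf defines a custom log_format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record


def requests_per_ip_per_minute(lines):
    """Count requests per (ip, minute) bucket, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        minute = record["time"].replace(second=0, microsecond=0)
        counts[(record["ip"], minute)] += 1
    return counts


# Hypothetical sample lines, not taken from a real server.
sample = [
    '203.0.113.7 - - [10/Oct/2019:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "curl/7.58.0"',
    '203.0.113.7 - - [10/Oct/2019:13:55:40 +0000] "GET /about HTTP/1.1" 200 431 "-" "curl/7.58.0"',
    '198.51.100.2 - - [10/Oct/2019:13:56:02 +0000] "GET /wp-admin.php HTTP/1.1" 404 153 "-" "bot/1.0"',
    'this line is not a valid log entry',
]

counts = requests_per_ip_per_minute(sample)
for (ip, minute), n in sorted(counts.items()):
    print(ip, minute.isoformat(), n)
```

The counts for one IP, ordered by minute, could then be fed to a model as a scalar field, while the `path` and `status` fields are candidates for the first item.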

I checked this one too, which uses a similar kind of data to predict the next user action.

Some other questions I have:

  • How can I encode a URL into an SDR?
  • Is there a way to compute an anomaly score between two SDRs?

Thanks and Regards,
Arjun


#4

  • You might want to try the CategoryEncoder.
  • Depends on what you mean by two SDRs.
    You can’t simply pull out two arbitrary SDRs and calculate an anomaly score between them. But if one SDR is the input at timestep t and the other is the prediction HTM made for that timestep, then the anomaly score is defined as num_bits_not_predicted/num_all_bits_in_predicted_sdr.
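
To make both points concrete, here is a minimal sketch in plain Python 3. It does not use NuPIC: `encode_category` is a toy stand-in for the idea behind a category encoder (one non-overlapping block of bits per known URL, with block 0 reserved for unknown values), and `raw_anomaly_score` compares a predicted SDR with the actual input. Both function names and the example URLs are hypothetical. One assumption to flag: the sketch divides by the number of active bits in the actual input, which is my understanding of how NuPIC computes its raw anomaly score; if the predicted SDR has a different number of active bits, that denominator would give a different value.

```python
def encode_category(value, categories, w=3):
    """Toy category encoder: return the set of active bit indices.

    Each known category gets its own block of w bits; block 0 is
    reserved for values that are not in the category list.
    """
    index = categories.index(value) + 1 if value in categories else 0
    return set(range(index * w, (index + 1) * w))


def raw_anomaly_score(actual_bits, predicted_bits):
    """Fraction of active bits in the actual input that were not predicted."""
    if not actual_bits:
        return 0.0
    unpredicted = actual_bits - predicted_bits
    return len(unpredicted) / len(actual_bits)


# Hypothetical list of URLs that normally appear in the log.
urls = ["/index.html", "/about", "/login"]

predicted = encode_category("/about", urls)

# The request that arrives matches the prediction: nothing is unexpected.
score_same = raw_anomaly_score(encode_category("/about", urls), predicted)

# The request is for an unknown page: every active bit is unpredicted.
score_unknown = raw_anomaly_score(encode_category("/wp-admin.php", urls), predicted)

print(score_same, score_unknown)
```

In a real model the prediction comes from the temporal memory rather than from an encoder, so the predicted SDR can cover several likely next inputs at once; the score formula stays the same.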

Hopefully I answered your questions. Feel free to ask more if you have them.


#5

This is what I actually meant.
Thanks, Marty. Now let me try this and get back.


#6

Oops. Sorry for the confusion. It is num_bits_not_predicted/num_all_bits_in_predicted_sdr