HTM and DDoS Attack detection

Hi all, I am planning to develop a DDoS attack detection model from network traffic. First, is it possible to use HTM for this kind of problem, given that a DDoS attack can be detected from several properties of the network traffic?

Thank You

2 Likes

Hi ism1n! Welcome to the forum.

If you want a ready-to-deploy solution, get Grok. If not, you can find anomalous traffic using HTM (traffic not showing up, or too much traffic). The beauty of HTM is that it learns while doing inference, so you don’t need to update your model every time there is growth.

4 Likes

Hi Marty,

Thanks for the reply. I did check out Grok and it is a good product. I am currently a research student and am considering using the HTM framework in my research to model this problem.

Thanks again

2 Likes

Welcome @osm1n and very cool application! If you’d like any help thinking through how to formulate the problem for HTM, we’d be glad to help, and I for one am really curious how well your system performs at detection.

1 Like

What kind of dataset are you using for DDoS attack detection?
Thanks

1 Like

That would be great, and I would really appreciate it. The plan is to use a dataset and a stream of network traffic coming from connected devices for online detection of the attacks, then compare the performance of HTM against other unsupervised techniques used to detect these kinds of attacks.

Thank You

1 Like

I am currently looking at this dataset here

I have tried using the dataset with HTM Studio, but it seems it won’t work because it’s a multivariate dataset. I am not sure, though.

1 Like

Would you care to specify which file(s) in which folder(s) you’re referring to? I have my own homemade sort of HTM Studio for multivariate data; if you know when the real attack(s) occurred, I could run the data through it and evaluate the anomalies that way.

2 Likes

Awesome. Any of these files with the full set of features here

Afterwards this file with the 10 best final features

In the datasets, attack and normal traffic are mixed together, with attack traffic marked as 1 in the attack column.

Thanks

2 Likes

After staring at the logs for a bit, it looks like you need to transform the data a bit before applying HTM to it. Good luck!

2 Likes

Hey @marty1885, would you mind explaining this a bit? Like what is it about the data that calls for transformation(s)? And what kind of transformation(s) do you think would be best? Thanks, just curious for your intuitions on this.

1 Like

Sure!!

First, some know-how and domain knowledge (my co-worker and I have been working on almost the same problem for the past 6 months).

  1. This data looks like something generated from NetFlow.
  2. Since it is likely NetFlow, the subcategory columns are likely to be guessed information, so I should probably ignore them.
  3. Since we are looking for DDoS attacks, we should be looking at the total traffic going into an IP address, not how much a single client is sending to a server. (That can work, but it is very noisy.)
  4. pkSeqID looks weird. I started out guessing it is the TCP packet sequence number, but UDP shouldn’t have one, and the numbers get too large. I guess they are generic IDs. Ignoring them should be a good idea.
  5. Again, since we are looking for DDoS attacks, we should reconstruct the amount of traffic. To do so, you need the time at which each packet arrived. But that information is dropped from the reduced CSV, so you’ll need to extract it yourself.
  6. According to Matt, HTM works best when you are dealing with 5~6 variables. 10 is a bit too many.
  7. This is NetFlow, an event-based system. There is no way for it to produce records at regular intervals (which HTM relies on).

So, I’ll propose the following processing steps, with some hand-waving.

  1. Redo the data reduction step and add in the timestamp. We need that.
  2. Now sort all rows of data by time. Run a rolling window over time for each destination IP. Sum the traffic up; as for the other attributes, you can average them, sum them, or do something else. Your guess is as good as mine. This reconstructs the traffic and gives you a regular interval.
  3. If at this point you still have too many variables, perform PCA on everything besides the amount of traffic.
  4. If you are performing PCA, remember to normalize the data before PCA.
  5. Here you go, something HTM can accept.
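
For anyone following along, here is a rough sketch of steps 2 to 4. Treat it as one possible reading, not a tested pipeline: the record layout `(timestamp, dst_ip, n_bytes, duration)` and the names `bin_traffic` and `pca_reduce` are made up for illustration, `duration` stands in for "the other attributes", and fixed-width (non-overlapping) bins are used as a simpler stand-in for a true rolling window. PCA is done with a plain SVD in NumPy.

```python
from collections import defaultdict

import numpy as np


def bin_traffic(records, width):
    """Sum bytes and average duration per (destination IP, time bin).

    records: iterable of (timestamp_sec, dst_ip, n_bytes, duration) tuples
    (a hypothetical layout). Returns {dst_ip: [(bin_start, total_bytes,
    mean_duration), ...]} with one row for every bin between the first and
    last record of that IP, so empty bins appear as zeros.
    """
    sums = defaultdict(lambda: [0, 0.0, 0])  # bytes, duration sum, row count
    for ts, dst, n_bytes, dur in sorted(records, key=lambda r: r[0]):
        acc = sums[(dst, int(ts // width))]
        acc[0] += n_bytes
        acc[1] += dur
        acc[2] += 1

    bins_by_ip = defaultdict(list)
    for dst, b in sums:
        bins_by_ip[dst].append(b)

    series = {}
    for dst, bins in bins_by_ip.items():
        rows = []
        for b in range(min(bins), max(bins) + 1):
            total, dur_sum, count = sums.get((dst, b), (0, 0.0, 0))
            rows.append((b * width, total, dur_sum / count if count else 0.0))
        series[dst] = rows
    return series


def pca_reduce(features, n_components):
    """Z-score each column, then project onto the top principal components."""
    x = np.asarray(features, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    z = (x - x.mean(axis=0)) / std
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_components].T


records = [
    (0.2, "10.0.0.1", 100, 1.0),
    (0.7, "10.0.0.1", 300, 3.0),
    (2.5, "10.0.0.1", 50, 0.5),
    (1.1, "10.0.0.2", 10, 0.1),
]
series = bin_traffic(records, width=1.0)

# Toy matrix of "other attributes" (rows = time bins), reduced to 2 columns.
feats = [[1.0, 2.0, 3.0], [2.0, 4.1, 6.2], [3.0, 5.9, 9.1], [4.0, 8.0, 11.9]]
reduced = pca_reduce(feats, n_components=2)
```

The traffic total stays out of the PCA input, as suggested in step 3, and would be fed to HTM alongside the reduced columns.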

3 Likes

Thanks for the detailed explanation. I do have a question, though: what if the network traffic was captured in real time and a DDoS attack was then carried out using one of the hosts sending the traffic? Is it still possible to use HTM in this scenario?

Thanks again for your time guys.

2 Likes

Thanks for your reply. :smile: First, real time != regular intervals, so you still need to apply a rolling window. Secondly, yes you can, but you will have better luck finding DDoS attacks from the server side and then tracing back to the attacker. Monitoring everyone’s outbound traffic is simply noisy and unreliable in most cases. In fact, depending on the data, I might suggest you use the number of connections made as the primary detection factor instead of the inbound traffic.
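
A minimal sketch of the connection-count idea, assuming each captured connection is available as a `(timestamp, src_ip, dst_ip)` tuple. That layout and the function name `connection_counts` are hypothetical, and fixed-width intervals are used here in place of a rolling window. It counts connections per destination in consecutive intervals and fills quiet intervals with 0, so the output has a regular step even though the events do not.

```python
from collections import Counter


def connection_counts(events, width):
    """Count connections per destination in consecutive fixed-width intervals.

    events: iterable of (timestamp_sec, src_ip, dst_ip) connection records.
    Returns {dst_ip: [count_0, count_1, ...]} where interval 0 starts at the
    earliest timestamp in the capture and empty intervals are reported as 0.
    """
    events = list(events)
    if not events:
        return {}
    start = min(ts for ts, _, _ in events)
    counts = Counter()
    for ts, _src, dst in events:
        counts[(dst, int((ts - start) // width))] += 1
    n_bins = max(b for _, b in counts) + 1
    return {
        dst: [counts[(dst, b)] for b in range(n_bins)]
        for dst in sorted({d for d, _ in counts})
    }


events = [
    (10.0, "a", "srv"),
    (10.4, "b", "srv"),
    (13.2, "c", "srv"),
    (11.5, "a", "other"),
]
per_dst = connection_counts(events, width=1.0)
```

Each list in `per_dst` is a regularly sampled series for one server, which is the shape of input a rolling window would otherwise produce.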

3 Likes

Alright, I am taking your suggestions to heart :smile: and getting started with the data cleanup. Thanks.

3 Likes

When you have the data cleaned I’ll run it through my multivariate anomaly script if you’d like.

1 Like