HTM and DDoS Attack detection

osm1n · June 30, 2019, 1:22am

Hi all, I am planning to develop a DDoS attack detection model based on network traffic. First, is it possible to use HTM for this kind of problem, given that a DDoS attack can be detected using several properties of the network traffic?

Thank You

marty1885 · June 30, 2019, 3:53am

Hi osm1n! Welcome to the forum.

If you want a ready-to-deploy solution, get Grok. If not, you can find anomalous traffic using HTM (traffic not showing up, or too much traffic). The beauty of HTM is that it learns while performing inference, so you don’t need to update your model every time the traffic grows.

osm1n · June 30, 2019, 4:47pm

Hi Marty,

Thanks for the reply. I did check out Grok and it is a good product. I am currently a research student and am considering using the HTM framework in my research to model this problem.

Thanks again

sheiser1 · June 30, 2019, 6:24pm

Welcome @osm1n, and very cool application! If you’d like any help thinking through how to formulate the problem for HTM, we’d be glad to help, and I for one am really curious how well your system performs at detection.

zahir_hamroune · June 30, 2019, 6:36pm

What kind of dataset are you using for DDoS attack detection?
Thanks

osm1n · June 30, 2019, 10:33pm

That would be great, and I would really appreciate it. The plan is to use a dataset and a stream of network traffic coming from connected devices for online detection of the attacks, then compare the performance of HTM against other unsupervised techniques used to detect these kinds of attacks.

Thank You

osm1n · June 30, 2019, 10:34pm

I am currently looking at this dataset here

I have tried using the dataset with HTM Studio, but it seems it won’t work because it is a multivariate dataset. I am not sure, though.

sheiser1 · July 1, 2019, 3:11am

Would you care to specify which file(s) in which folder(s) you’re referring to? I have my own homemade sort of HTM Studio for multivariate data; if you know when the real attack(s) occurred, I could run the data through it and evaluate the anomalies that way.
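
One way such an evaluation could look, as a minimal sketch: given one anomaly score per time step and a 0/1 attack label per time step, threshold the scores and compute precision and recall. The function name `precision_recall`, the `threshold` default, and the score/label layout are assumptions for illustration, not part of any HTM tool mentioned in this thread.

```python
def precision_recall(scores, labels, threshold=0.5):
    """Compare thresholded anomaly scores against 0/1 attack labels.

    scores: one anomaly score per time step (hypothetical detector output)
    labels: 1 where an attack was really happening, 0 otherwise
    """
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    p, r = precision_recall([0.1, 0.9, 0.8, 0.2, 0.7], [0, 1, 0, 1, 1])
    print(f"precision={p:.2f} recall={r:.2f}")
```

The threshold is a free parameter here; sweeping it would show the trade-off between missed attacks and false alarms.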

osm1n · July 1, 2019, 3:26am

Awesome. Any of these files with the entire feature set here

Afterwards, this file with the 10 best final features

In the datasets, the attack and normal traffic are mixed together, with attack traffic marked as 1 in the attack column.

Thanks

marty1885 · July 1, 2019, 4:51am

After staring at the logs for a bit, it looks like you need to transform the data a bit before applying HTM to it. Good luck!

sheiser1 · July 1, 2019, 5:27am

Hey @marty1885, would you mind explaining this a bit? Like what is it about the data that calls for transformation(s)? And what kind of transformation(s) do you think would be best? Thanks, just curious for your intuitions on this.

marty1885 · July 1, 2019, 10:09am

Sure!!

First, a bit of know-how and domain knowledge; my co-worker and I have been working on almost the same problem for the past 6 months.

This data looks like something generated from NetFlow.
Since it is likely NetFlow, the subcategory columns are likely to be inferred information, so they should probably be ignored.
Since we are looking for DDoS attacks, we should be looking at the total traffic going into an IP address, not how much a client is sending to a server. (That can work, but it is very noisy.)
pkSeqID looks weird. I started out guessing it was the TCP packet sequence number, but UDP shouldn’t have one and the number is getting too large. I guess they are generic IDs. Ignoring them should be a good idea.
Again, since we are looking for DDoS attacks, we should reconstruct the amount of traffic. To do so, you need the time at which each packet arrived. But that information is dropped in the reduced CSV, so you’ll need to extract it yourself.
According to Matt, HTM works best when you are dealing with 5~6 variables. 10 is a bit too many.
This is NetFlow, which is an event-based system, so the records do not arrive at regular intervals (which HTM relies on).

So, I’ll propose the following processing steps, with some hand-waving.

Redo the data reduction step and add in the timestamp. We need that.
Now sort all rows of data by time. Run a rolling window over time for each destination IP. Sum the traffic up; as for the other attributes, you can average them, sum them, or something else. Your guess is as good as mine. This is to reconstruct the traffic and get a regular interval.
If at this point you still have too many variables, perform PCA on everything besides the amount of traffic.
If you are performing PCA, remember to normalize the data before PCA.
Here you go, something HTM can accept.
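
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout `(timestamp, dst_ip, n_bytes, duration)` is hypothetical and not the dataset's actual column set, non-overlapping fixed-width buckets stand in for the rolling window, and PCA is done with a plain NumPy SVD on standardized columns.

```python
from collections import defaultdict

import numpy as np


def bucket_flows(flows, interval=1.0):
    """Aggregate flow records into fixed-width time buckets per destination.

    flows: iterable of (timestamp, dst_ip, n_bytes, duration) tuples
           (hypothetical layout, chosen for illustration).
    Returns {dst_ip: [(bucket_start, total_bytes, mean_duration), ...]} with
    one entry per interval, empty intervals included, so each series is regular.
    """
    flows = sorted(flows, key=lambda f: f[0])  # sort rows by time
    if not flows:
        return {}
    t0 = flows[0][0]
    n_buckets = int((flows[-1][0] - t0) // interval) + 1

    # Per destination: one [byte_sum, duration_sum, count] cell per bucket.
    cells = defaultdict(lambda: [[0.0, 0.0, 0] for _ in range(n_buckets)])
    for ts, dst, n_bytes, duration in flows:
        cell = cells[dst][int((ts - t0) // interval)]
        cell[0] += n_bytes   # traffic volume is summed
        cell[1] += duration  # other attributes are averaged below
        cell[2] += 1

    return {
        dst: [
            (t0 + i * interval, total, (dur / n) if n else 0.0)
            for i, (total, dur, n) in enumerate(rows)
        ]
        for dst, rows in cells.items()
    }


def standardize_and_pca(X, n_components):
    """Z-score each column, then project onto the top principal components."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns would otherwise divide by zero
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


if __name__ == "__main__":
    flows = [(0.0, "10.0.0.1", 100, 0.5), (0.4, "10.0.0.1", 50, 1.5),
             (2.2, "10.0.0.1", 10, 1.0), (1.1, "10.0.0.2", 7, 2.0)]
    for dst, rows in bucket_flows(flows).items():
        print(dst, rows)
```

In this sketch the traffic total would be kept as its own input, and only the remaining aggregated attributes would be passed to `standardize_and_pca`, in line with the suggestion to run PCA on everything besides the amount of traffic.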

osm1n · July 1, 2019, 3:58pm

Thanks for the detailed explanation. I do have a question though: what if the network traffic were captured in real time and a DDoS attack were carried out using one of the hosts sending the traffic? Is it still possible to use HTM in this scenario?

Thanks again for your time guys.

marty1885 · July 1, 2019, 4:08pm

Thanks for your reply. First, real time != regular intervals, so you still need to apply a rolling window. Secondly, yes you can, but you will have better luck finding DDoS attacks from the server side and then tracing back to the attacker. Monitoring everyone’s outbound traffic is simply noisy and unreliable in most cases. In fact, depending on the data, I might suggest you use the number of connections made as the primary detection factor instead of the inbound traffic.
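
To illustrate the point that real-time events still need windowing, here is a minimal streaming sketch that turns a time-ordered stream of connection events into per-destination connection counts at regular intervals. The event layout `(timestamp, dst_ip)` and the name `connection_counts` are assumptions for illustration, and fixed non-overlapping windows are used in place of a true rolling window.

```python
def connection_counts(events, interval=1.0):
    """Yield (window_start, {dst_ip: count}) for each fixed-width window.

    events: iterable of (timestamp, dst_ip) pairs in time order
            (hypothetical layout, chosen for illustration).
    Windows without any events are still emitted with an empty dict, so the
    output has a regular interval even though the input is event based.
    """
    window_start = None
    counts = {}
    for ts, dst in events:
        if window_start is None:
            window_start = ts  # the first event anchors the first window
        # Close every window that ended before this event arrived.
        while ts >= window_start + interval:
            yield window_start, counts
            window_start += interval
            counts = {}
        counts[dst] = counts.get(dst, 0) + 1
    if window_start is not None:
        yield window_start, counts  # flush the last, possibly partial, window


if __name__ == "__main__":
    stream = [(0.0, "a"), (0.2, "a"), (0.5, "b"), (3.1, "a")]
    for start, per_dst in connection_counts(stream):
        print(start, per_dst)
```

Because it is a generator, it can consume a live feed; each emitted window could then be encoded and fed to the model one step at a time.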

osm1n · July 2, 2019, 5:30am

Alright, I am taking your suggestions to heart and getting started with the data cleanup. Thanks.

sheiser1 · July 2, 2019, 6:41am

When you have the data cleaned I’ll run it through my multivariate anomaly script if you’d like.
