Newbie questions

artifex · August 22, 2017, 11:06pm

Hi

Thanks. I guess that I’ll run it first and then plot the results on MS-Excel or something similar.

Just another question that is more philosophic really:
From what I understand, if I want to run the anomaly detection on different data, with different columns in the CSV and different “behavior”, I should run it through the swarm again to generate the correct parameters YAML file. To me, this seems a bit similar to what “conventional” machine learning techniques do: they run a batch process every once in a while to choose the best algorithm for the data, and then use that algorithm to process the data in near-real time. According to the criticism that Jeff Hawkins made in one of his talks and in his book, our brain doesn’t work like that. Does that mean the swarming process is a temporary thing that you hope to make unnecessary in future releases?

Tnx,
Yuval Khalifa.
Blog. http://www.artifex.co.il/en

rhyolight · August 23, 2017, 1:08am

@artifex I think you could have helped yourself here faster than we could have helped you. Just take the keywords of your question: “anomaly detection swarming” and search the forums. Most of these search results lead to the answer to your question:

jamrufin · August 23, 2017, 11:00am

I’ve just installed NuPIC and get the same blank screen in Matplotlib (I installed it via Anaconda). I then copied a 4-line code example from the Matplotlib website to check that it is working correctly.

^CTraceback (most recent call last):
  File "./run.py", line 151, in <module>
    runModel(GYM_NAME, plot=plot)
  File "./run.py", line 141, in runModel
    runIoThroughNupic(inputData, model, gymName, plot)
  File "./run.py", line 122, in runIoThroughNupic
    output.write(timestamp, consumption, prediction, anomalyScore)
  File "/home/noob/nupic/examples/opf/clients/hotgym/anomaly/one_gym/nupic_anomaly_output.py", line 298, in write
    self.highlightChart(anomalies, self._anomalyGraph)
  File "/home/noob/nupic/examples/opf/clients/hotgym/anomaly/one_gym/nupic_anomaly_output.py", line 251, in highlightChart
    color=highlight[2], alpha=highlight[3]
  File "/home/noob/anaconda2/lib/python2.7/site-packages/matplotlib/axes/_axes.py", line 916, in axvspan
    p = mpatches.Polygon(verts, **kwargs)
  File "/home/noob/anaconda2/lib/python2.7/site-packages/matplotlib/patches.py", line 950, in __init__
    Patch.__init__(self, **kwargs)
  File "/home/noob/anaconda2/lib/python2.7/site-packages/matplotlib/patches.py", line 141, in __init__
    self.update(kwargs)
KeyboardInterrupt

py test script

import matplotlib.pyplot as plt

plt.plot([1,2,3,4], [1,4,9,16], 'ro')
plt.axis([0, 6, 0, 20])
plt.show()

artifex · September 3, 2017, 7:19pm

Hi everyone,

First - I’d like to thank you all for patiently answering all my newbie questions so far. In the end I used Graphite and Grafana to plot the charts (simply because I know how to use them very well…) and the presentation went very well, so thanks.

But now, after I managed to do that, people at my office want a bit more so I thought of some use case that I’d like to show and I’d like your opinion on this as well as any links and docs that you think might be even remotely relevant.

Since I’m working on a system that should detect cyber-related attacks, I thought of two main scenarios:

1. Since many types of malware today use DGAs (Domain Generation Algorithms) to create and access URLs, I thought it might be possible to send the HTM lines of domain names (each line containing a single domain name) and have it find those that seem a bit “off”. For example: if, after sending many domains like “www.google.com”, “www.yahoo.com” and “www.numenta.org”, something like “dsadfasdsdfferettvvbbnggrtttrtdsss.biz.il” appeared, I’d like to be notified.
2. To find inputs with irregular characters in them (to detect XSS and SQLi, for example): after sending in texts like “David”, “Jeff”, “Sandra”, “Jenny”, if something like “or 234=234 --” appeared, I’d like it to be flagged as an anomaly. I thought of counting the occurrences of each character, feeding the HTM the frequency of each character (therefore requiring many fields), and then detecting an anomaly in each of those metrics. But after watching this https://www.youtube.com/watch?v=gYOwBlVuJDw I came to believe that it might not be the best idea. Right?
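
To make the character-frequency idea above concrete, here is a minimal sketch of the feature extraction step only. The alphabet, the "other" bucket and the function name `char_frequencies` are assumptions made for illustration; nothing here is part of NuPIC, and it does not settle whether these features are a good input for an HTM model.

```python
from collections import Counter
import string

# Hypothetical fixed alphabet, so every input maps to the same set of fields.
ALPHABET = string.ascii_lowercase + string.digits + " =-'\""


def char_frequencies(text):
    """Return the relative frequency of each ALPHABET character in text.

    Characters outside ALPHABET are pooled into an "other" field.
    """
    lowered = text.lower()
    counts = Counter(lowered)
    total = float(len(lowered)) or 1.0  # avoid dividing by zero on empty input
    freqs = {ch: counts.get(ch, 0) / total for ch in ALPHABET}
    known = sum(counts.get(ch, 0) for ch in ALPHABET)
    freqs["other"] = (len(lowered) - known) / total
    return freqs


if __name__ == "__main__":
    for sample in ["David", "Jenny", "or 234=234 --"]:
        f = char_frequencies(sample)
        # Show only the fields that are non-zero for this sample.
        print(sample, {k: round(v, 2) for k, v in f.items() if v > 0})
```

Each field would then have to be fed to the model as its own metric, which is the "many fields" cost mentioned above.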

Any ideas and data will be most kindly welcomed,

Tnx,
Yuval Khalifa
Blog. http://www.artifex.co.il/en

artifex · September 4, 2017, 3:39pm

Any thoughts anyone?

jimmyw · September 5, 2017, 2:31am

@artifex I find it hard to imagine HTM offering an advantage over existing solutions to those specific problems.

Scenario 1 would be covered by a mix of rule based heuristics and cloud reputation services, and I think HTM could introduce a lot of false positives without any gain. Scenario 2 is small enough in scope that regexes do a good enough job.

Perhaps you could explore combining time-based streams of system monitoring data in such a way that anomalies could indicate a cyber attack? Example input sources might be aggregations of the number of active visitors, volume of web requests, unique URLs hit per visitor, and backend compute activity (e.g. database IO).
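
As a rough sketch of what such an aggregation could look like, the snippet below rolls raw web request events up into one row per minute. The event layout `(timestamp, visitor_id, url)`, the one-minute bucket and the function name `aggregate_per_minute` are all assumptions for illustration; a real system would read these from its own logs.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_per_minute(events):
    """Roll (timestamp, visitor_id, url) events up into per-minute rows.

    Returns a list of (minute_start, request_count, unique_visitors,
    unique_urls) tuples, sorted by time.
    """
    buckets = defaultdict(lambda: {"requests": 0, "visitors": set(), "urls": set()})
    for ts, visitor, url in events:
        minute = ts.replace(second=0, microsecond=0)  # truncate to the minute
        bucket = buckets[minute]
        bucket["requests"] += 1
        bucket["visitors"].add(visitor)
        bucket["urls"].add(url)
    return [
        (m, buckets[m]["requests"], len(buckets[m]["visitors"]), len(buckets[m]["urls"]))
        for m in sorted(buckets)
    ]


if __name__ == "__main__":
    sample = [
        (datetime(2017, 9, 5, 2, 31, 5), "v1", "/a"),
        (datetime(2017, 9, 5, 2, 31, 40), "v2", "/a"),
        (datetime(2017, 9, 5, 2, 32, 1), "v1", "/b"),
    ]
    for row in aggregate_per_minute(sample):
        print(row)
```

Each output row is a regularly spaced, timestamped record, which is the shape of input a streaming anomaly model expects.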

artifex · September 5, 2017, 3:53am

Hi,

Thanks a lot for your quick reply. Maybe this was a bad choice of scenario… (-;

The system I’m currently building generates a lot of metric data that measures the behaviour of users, computers and the network (such as number of drops in the firewall for every IP, number of logons for each user, number of bytes transferred per protocol, UDP vs. TCP ratio, number of IDS alerts per alert type, etc.). Is it possible to use HTM to detect anomalies in the combination of these metrics (for example: usually when metric A is around 10, metric B is around 40 and metric C is around 20, but today it’s different)?

I hope that this example is better…

Tnx.

jimmyw · September 5, 2017, 6:48am

The concept of combining independent models is discussed in this paper.

The TLDR version is basically yes, you could do that in theory, but in practice it can be better to use a number of smaller models and combine the results.
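
A minimal sketch of the "several small models" approach, assuming each metric has its own model that emits an anomaly score between 0 and 1 at every timestep. Taking the maximum score and the 0.9 threshold are stand-in choices made for illustration, not the combination rule from the paper, and `combine_scores` is a hypothetical helper rather than a NuPIC function.

```python
def combine_scores(scores, threshold=0.9):
    """Combine per-metric anomaly scores for a single timestep.

    scores: dict mapping metric name -> anomaly score in [0, 1].
    Returns (overall_score, flagged_metrics), where overall_score is the
    maximum individual score (an assumed rule) and flagged_metrics lists
    the metrics at or above the threshold.
    """
    if not scores:
        return 0.0, []
    flagged = sorted(name for name, s in scores.items() if s >= threshold)
    return max(scores.values()), flagged


if __name__ == "__main__":
    # Hypothetical scores from three independent per-metric models.
    timestep = {"firewall_drops": 0.2, "logons": 0.95, "udp_tcp_ratio": 0.4}
    print(combine_scores(timestep))
```

Keeping the per-metric scores around also tells you which metric triggered the alert, something a single combined model does not expose directly.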

If you’re experimenting with HTM, perhaps you could try applying both approaches and see which works best?
