Welcome @mh01223, and cool sounding application! In general the more specifically you show us your data and tell us the nature of the anomalies the more we can guide. Given a valid framing of the problem it shouldn’t be too hard to get some initial experiments running with NuPIC.
I agree with Sam, show us the data early!
@sheiser1 Thanks for the responses, guys. The data consist of 56 telemetry channels spanning 5 years of operation of an educational satellite called FUNcube-1. A couple of initial issues I have with it are:
1) There are time steps missing. Telemetry is taken every 5 seconds year-round, but I only have roughly 1-1.5 million data points per year when I should have ~6 million. This is because ground stations are sometimes unavailable to process the data. If I ran the points sequentially anyway, would an HTM network be able to adjust to the sudden pattern changes, given that it flags anomalies using an anomaly likelihood rather than the raw anomaly score?
2) The data is unlabelled, so it is hard to judge an HTM's anomaly detection capabilities, although there are 4 large spikes (one in each of years 2, 3, 4 and 5) where the sensors max out, possibly due to solar events. The value of an HTM over a limits-based anomaly detection system here would be earlier detection and quicker triggering of the satellite's 'safe mode'.
I’m currently pushing my professor to find me better data, but until he does this is what I have to play with. The fact that it isn’t a commercial mission means the satellite isn’t monitored that closely. Assuming that, other than the spikes, the rest of the data is anomaly-free, the performance of any network would be quantified by how closely it can match the existing pattern. I can send you the data if you’d still like to take a look? I presume I can attach the CSV files here?
Yes, as long as you encode time in the input encoding along with any scalar values.
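To make that concrete, here is a minimal sketch of what the encoder section of a NuPIC OPF model-params dict might look like with time encoded alongside a scalar. The field names (`timestamp`, `panel_voltage`) and all parameter values are placeholders, not tuned settings for this data:

```python
# Hedged sketch: encoder section of a NuPIC model-params dict.
# Field names and parameter values are illustrative only.
encoders = {
    "timestamp_timeOfDay": {
        "fieldname": "timestamp",
        "name": "timestamp_timeOfDay",
        "type": "DateEncoder",
        # (width, radius in hours) -- captures time-of-day periodicity;
        # orbital-period structure may need its own treatment
        "timeOfDay": (21, 1),
    },
    # weekend/holiday date encodings likely irrelevant for a satellite
    "panel_voltage": {
        "fieldname": "panel_voltage",
        "name": "panel_voltage",
        "type": "RandomDistributedScalarEncoder",
        # resolution sets how close two readings must be to share bits;
        # a common rule of thumb is (max - min) / 130 for the value range
        "resolution": 0.05,
    },
}
```

Each of the 56 channels would get its own model, so you'd build one such params dict per metric with the resolution set from that channel's range.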
Unlabeled is fine…
What does the data represent? What is meaningful for extracting patterns? If you were the analyst assigned to this task, what would you do with the data to make it more meaningful?
The data mainly consists of voltages, currents and temperatures of the solar panels, batteries, antennas, bus interfaces, etc. An example of a dependence: the voltages on each solar panel can indicate the orientation of the satellite relative to the Sun, and thus will influence many of the temperatures around the system. I understand how numerical values can be encoded to have similar overlapping representations, but I’m not sure how I would create a single SDR that quantifies the co-dependence of all the telemetry values. It seems like an impossible task…
I don’t think you need to model the co-dependence specifically, just analyze the raw data streams.
That sounds good to me. The approach I use in this multivariate anomaly detection scenario is to analyze all the streams individually (one NuPIC model for each metric) and look for times when numerous metrics are anomalous at once. So with 56 metrics that’d make 56 models (monitored simultaneously). You could define a system anomaly as any time step when, say, 10 or more of the 56 are anomalous, and report which ones are causing it. I actually have code that does this, so if you’d like I could run your data through it and evaluate the detected anomalies by their proximity to the real ones.
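The "N of 56" rule above can be sketched in a few lines. This is a toy version, not the actual code mentioned: the likelihood threshold (0.9999 is a commonly used cutoff for anomaly likelihood) and the demo values are illustrative, and real runs would feed in each NuPIC model's anomaly likelihood stream:

```python
# Hedged sketch: flag a "system anomaly" at any time step where at least
# `min_metrics` of the per-metric anomaly likelihoods exceed `threshold`.
def system_anomalies(likelihoods, threshold=0.9999, min_metrics=10):
    """likelihoods: dict of metric name -> list of anomaly likelihoods,
    one entry per time step (all lists the same length).
    Returns {timestep: [metrics responsible]} for every flagged step."""
    n_steps = len(next(iter(likelihoods.values())))
    flagged = {}
    for t in range(n_steps):
        hot = [m for m, vals in likelihoods.items() if vals[t] >= threshold]
        if len(hot) >= min_metrics:
            flagged[t] = sorted(hot)
    return flagged

# Tiny demo with 3 fabricated metrics and min_metrics=2:
demo = {
    "bus_voltage":   [0.10, 0.99999, 0.20],
    "panel_current": [0.20, 0.99999, 0.30],
    "battery_temp":  [0.30, 0.50,    0.40],
}
print(system_anomalies(demo, threshold=0.9999, min_metrics=2))
# -> {1: ['bus_voltage', 'panel_current']}
```

Reporting the sorted list of offending metrics is what lets you say not just *when* the system went anomalous but *which* channels drove it.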
NuPIC can’t take in null data, and every 5 seconds sounds way faster than you need. What I’d do is aggregate this way down, to maybe one data point per hour. This will get rid of the missing values, smooth the signals and reduce the raw volume of data by a factor of 720.
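The aggregation step could be sketched like this (a plain-Python version with a placeholder window size; with pandas the equivalent would be roughly `df.resample("1h").mean()`):

```python
# Hedged sketch of the downsampling step: bucket (timestamp, value)
# samples into fixed windows and take the mean of each window.
from collections import defaultdict
from datetime import datetime, timedelta

def aggregate(samples, window=timedelta(hours=1)):
    """samples: iterable of (datetime, float) pairs.
    Returns a sorted list of (window_start, mean_value). Windows that
    received no samples simply drop out, which is how this also absorbs
    the missing time steps from unavailable ground stations."""
    step = window.total_seconds()
    origin = datetime(1970, 1, 1)  # fixed origin keeps bucketing tz-free
    buckets = defaultdict(list)
    for ts, val in samples:
        offset = (ts - origin).total_seconds()
        start = origin + timedelta(seconds=(offset // step) * step)
        buckets[start].append(val)
    return sorted((s, sum(v) / len(v)) for s, v in buckets.items())
```

Passing `window=timedelta(minutes=5)` instead would give the finer-grained aggregation discussed below, without changing anything else.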
If you’d like to share your data and give the time steps where the real anomalies occurred I could test this out and see how good the results are.
I see, that makes sense. Yes, if you could take a look that would be great. Can I attach data on this forum? If so how? I can’t see any attach option.
An issue with aggregating over larger time scales is that for an anomaly detection system to work in real time, we would need it to react as soon as possible; something like an hour would be too long. The anomalies generally last only a few minutes.
You mean that the anomalous events (those big spikes) only last a few minutes, right? Let’s try aggregating to a smaller window then, maybe every 5 minutes. What may be the case is that NuPIC finds anomalous activity before those spikes set in, since temporal dynamics often change subtly in the run-up to such big shifts. In that case it may not actually matter for early detection that the big spikes themselves are brief; experiments will tell.
I don’t think you can upload the data to the forum, though you can send it to me directly if you’d like; I’m at firstname.lastname@example.org. How big is your data file? I guess the aggregation could also help with these logistics by compressing the file lol.
Yes, that’s what I was hoping to see. I have them compressed in csv.tgz format if that’s ok? I’ll send them over now.
They’re a little under 50MB each
The links are asking me to sign into U of Surrey; would you know any way around that?
Ok, the files seem to be over the email attachment limit. The easiest way would probably be via Skype; it has a 300MB limit, so I can send them to you through the chat there if you have an account?
Can you just show us a few lines of the data? We just want to see the structure.
Sent Sam the full data via Skype.
This is an example of some of the data during normal operation.
This is an example of the data during an anomaly, likely due to a solar event. All the measurements seem to max out at the same values. A good test would be predicting these before they occur; i.e., perhaps there is a smaller stream of charged particles before the heavy stream hits.
Quick plot of the anomaly for the Total System Current: