Anomaly detection - HTMs on k/v pairs - setting up the publisher/headers

phil_d_cat · May 10, 2017, 2:46pm

If my data set looks like {timestamp, location, type_classifier0, type_classifier1, result} and I want to run anomaly predictions on the result column (int values), would I:

a - set this up as

PublisherSupplier.builder()
      .addHeader("timestamp, location, type_classifier0, type_classifier1, result")
      .addHeader("datetime, int, int, int, int")
      .addHeader("T,B")

OR -
b - more like:

PublisherSupplier.builder()
      .addHeader("hash(keys), result")
      .addHeader("int, int")
      .addHeader("B")
OR -

c - cache HTMs using columns 1-3 and pass in result values on some periodic schedule? I.e., save an HTM for each K/V pair, where K = some_hash_of(location, type_classifier0, type_classifier1)

(using HTM/Key)

PublisherSupplier.builder()
      .addHeader("result")
      .addHeader("int")
      .addHeader("B")

I'm likely still asking foolish questions, as I haven't had the aha! moment with this library yet.

cogmission · May 10, 2017, 2:57pm

@phil_d_cat,

The first…

You can pick and choose which field(s) you want to infer on by setting Key.INFERRED_FIELDS_MAP; an example is seen here, which points to this static utility method.

…but once again, I refer you to the tests, which vary the setup in countless ways and can give you a sense of how the whole thing fits together.

phil_d_cat · May 10, 2017, 4:40pm

How should I keep the various K/V pairs separated? Does the HTM ‘engine’ do this?
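
For option (c), one way to keep the per-key streams apart in application code is a map from composite key to model instance. The following is only a minimal sketch under stated assumptions, not HTM.Java's API: `KeyedModelRouter`, `keyOf`, and `RecordingModel` are hypothetical names, and `RecordingModel` is a stand-in for whatever per-key network you would actually build. Whether the library can do this separation for you is not something this sketch answers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: routes each record to a model dedicated to its composite key. */
class KeyedModelRouter<M> {
    private final Map<String, M> models = new LinkedHashMap<>();
    private final Function<String, M> modelFactory;

    KeyedModelRouter(Function<String, M> modelFactory) {
        this.modelFactory = modelFactory;
    }

    /** Builds the composite key; a delimited string avoids the collisions a numeric hash could produce. */
    static String keyOf(int location, int typeClassifier0, int typeClassifier1) {
        return location + "|" + typeClassifier0 + "|" + typeClassifier1;
    }

    /** Returns the model for this key, creating it on first use. */
    M modelFor(String key) {
        return models.computeIfAbsent(key, modelFactory);
    }

    /** Number of distinct keys seen so far. */
    int size() {
        return models.size();
    }
}

/** Stand-in for one per-key network (an assumption): it only records the values it receives. */
class RecordingModel {
    final List<Integer> seen = new ArrayList<>();

    void accept(int result) {
        seen.add(result);
    }
}

class KeyedModelRouterDemo {
    public static void main(String[] args) {
        KeyedModelRouter<RecordingModel> router =
                new KeyedModelRouter<>(key -> new RecordingModel());

        // Three records, two distinct (location, type_classifier0, type_classifier1) keys.
        router.modelFor(KeyedModelRouter.keyOf(1, 0, 2)).accept(10);
        router.modelFor(KeyedModelRouter.keyOf(1, 0, 3)).accept(99);
        router.modelFor(KeyedModelRouter.keyOf(1, 0, 2)).accept(11);

        System.out.println(router.size());                 // prints 2
        System.out.println(router.modelFor("1|0|2").seen); // prints [10, 11]
    }
}
```

In a real setup the factory passed to the constructor would build one network per key, each fed through its own single-column ("result") publisher as in option (c), so that no model ever sees values belonging to another key.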
