Anomaly detection - HTMs on k/v pairs - setting up the publisher/headers

If my data set looks like {timestamp, location, type_classifier0, type_classifier1, result} and I want to run anomaly predictions on the result column (int values), would I:

a - set this up as

PublisherSupplier.builder()
      .addHeader("timestamp, location, type_classifier0, type_classifier1, result")
      .addHeader("datetime, int, int, int, int")
      .addHeader("T,B")

- OR -

b - more like:

PublisherSupplier.builder()
      .addHeader("hash(keys), result")
      .addHeader("int, int")
      .addHeader("B")

- OR -

c - cache HTMs using columns 1-3 and pass in result values on some periodic schedule? I.e., save an HTM for each K/V pair, where K = some_hash_of(location, type_classifier0, type_classifier1)

(using one HTM per key)

PublisherSupplier.builder()
      .addHeader("result")
      .addHeader("int")
      .addHeader("B")
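
For illustration, the bookkeeping behind option c could be sketched roughly as below. This is only a sketch and not part of the HTM.java API: `PerKeyModels`, `keyOf`, and `modelFor` are hypothetical names, and the type parameter `M` stands in for whatever object would wrap one HTM network and its publisher. It also assumes the joined key values are used as a string key rather than a hash, so two different keys can never collide.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper: keeps one model instance per composite key so that
 * each (location, type_classifier0, type_classifier1) stream is handled separately.
 */
class PerKeyModels<M> {
    private final Map<String, M> models = new HashMap<>();
    private final Function<String, M> factory;

    /** The factory is called once per new key to create that key's model. */
    PerKeyModels(Function<String, M> factory) {
        this.factory = factory;
    }

    /** Builds the composite key from the three identifying columns. */
    static String keyOf(int location, int typeClassifier0, int typeClassifier1) {
        return location + ":" + typeClassifier0 + ":" + typeClassifier1;
    }

    /** Returns the model for this key, creating it on first use. */
    M modelFor(String key) {
        return models.computeIfAbsent(key, factory);
    }

    /** Number of distinct keys seen so far. */
    int size() {
        return models.size();
    }
}
```

With something like this, each incoming row would be routed by `keyOf(...)`, and only its result value would be sent to the model returned by `modelFor(key)`.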

I’m probably still asking foolish questions, as I haven’t had the aha! moment with this library yet.


@phil_d_cat,

The first… :slight_smile:

You can pick and choose which field(s) you want to infer on by setting Key.INFERRED_FIELDS_MAP; an example is seen here, which points to this static utility method.

…but once again, I refer you to the tests, which vary the setup in countless ways and can give you a sense of how the whole thing fits together :wink:


How should I keep the various K/V pairs separated? Does the HTM ‘engine’ do this?