Scaling 'hot gym' for multiple locations


#1

How would I go about scaling up an app like the ‘hot gym’ if the data was coming from multiple locations? Would I need to create a Network for each location, then store it and retrieve it again as each value is sent through?

I.e., assume the data is coming in as {timestamp, param0, param1, param2, result} and the values for param0-2 make up a “rich key”. Assume I only want to compute anomalies or make predictions on the values for result.

Further: in this topic you reference the INFERRED_FIELDS_MAP. Can you go into more detail about how to use this and how it would apply here?


#2

Maybe I’m not understanding you, but I don’t see how data coming in from multiple locations matters to the HTM? If you mean scaling in terms of processing speed, you are only going to get the speed you get, and multiple models aren’t going to help with that. Each model needs all of the data, because models in HTM-Land process sequences, and any missing element of a sequence would result in inaccurate inferences.

The INFERRED_FIELDS_MAP is simply the subset of fields (out of all the fields in your input) that you want to detect anomalies in or do prediction on.

It is only relevant if you are doing “Prediction” as opposed to “Anomaly Detection”, because its purpose is to help the “Classifier” assembly create classifiers for only the desired fields in the input, rather than for all fields, which is what it did before the INFERRED_FIELDS_MAP settings and code were added.
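
To make the idea concrete, here is a minimal Python sketch of "build classifiers only for the fields you care about". It is not the library's actual API: `inferred_fields`, `LastValueClassifier`, and `build_classifiers` are hypothetical names invented for illustration, and the field list is taken from the question above.

```python
class LastValueClassifier:
    """Hypothetical stand-in classifier: predicts the last value it learned."""

    def __init__(self):
        self.last = None

    def learn(self, value):
        self.last = value

    def predict(self):
        return self.last


# All fields present in each input record (from the question above).
INPUT_FIELDS = ["timestamp", "param0", "param1", "param2", "result"]

# Hypothetical analogue of the inferred-fields map: field name -> classifier type.
# Only "result" is listed, so only "result" gets a classifier.
inferred_fields = {"result": LastValueClassifier}


def build_classifiers(input_fields, inferred):
    """Create one classifier per inferred field, and none for the other fields."""
    unknown = set(inferred) - set(input_fields)
    if unknown:
        raise ValueError("inferred fields not in input: %s" % sorted(unknown))
    return {name: classifier_type() for name, classifier_type in inferred.items()}


classifiers = build_classifiers(INPUT_FIELDS, inferred_fields)
```

With this setup, `classifiers` holds a single entry for `result`; the three key fields and the timestamp are still part of the input, but nothing is predicted for them.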


#3

Maybe I’m not understanding you, but I don’t see how data coming in from multiple locations matters to the HTM?

I’m thinking of a system that would compute anomalies / predictions for multiple, independent locations, i.e. the values used in the computation for one site wouldn’t impact or influence those of other sites.


#4

You would have to test it and see how many instances of the same model you need to run, depending on the request handling speed. One advantage is that you can run the same model everywhere since HTMs can be serialized. You would have to guarantee serial querying though, since the HTM isn’t guaranteed to be concurrency safe.
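
A rough Python sketch of what serial querying plus serialization could look like. Everything here is an assumption for illustration: `ToyModel` is a hypothetical stand-in for an HTM model, and the JSON snapshot stands in for whatever serialization the library itself provides.

```python
import json
import threading


class ToyModel:
    """Hypothetical stand-in for an HTM model: its output depends on input order."""

    def __init__(self, seen=0, last=None):
        self.seen = seen
        self.last = last

    def compute(self, value):
        # The score is the absolute change from the previously seen value.
        change = 0.0 if self.last is None else abs(value - self.last)
        self.seen += 1
        self.last = value
        return change

    def to_json(self):
        return json.dumps({"seen": self.seen, "last": self.last})

    @classmethod
    def from_json(cls, blob):
        return cls(**json.loads(blob))


class SerialModel:
    """Wraps a model so that only one caller at a time can touch it."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def compute(self, value):
        with self._lock:  # queries are processed strictly one after another
            return self._model.compute(value)

    def seen(self):
        with self._lock:
            return self._model.seen

    def snapshot(self):
        with self._lock:  # never serialize in the middle of a compute() call
            return self._model.to_json()

    @classmethod
    def restore(cls, blob):
        return cls(ToyModel.from_json(blob))


primary = SerialModel(ToyModel())
primary.compute(1.0)
primary.compute(3.0)

# A second instance started from the same serialized state.
replica = SerialModel.restore(primary.snapshot())
```

The lock only guarantees one query at a time. It does not decide the order in which concurrent callers get in, so if sequence order matters, the records still need to be fed from a single ordered stream.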


#5

I would first try a model per location. If that doesn’t give you the results you want, perhaps a multi-field model. But we’ve found that model-per-data-source does pretty well for anomaly detection.
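
As a minimal sketch of the model-per-location idea in Python: one model is created lazily for each distinct (param0, param1, param2) key from the question, and each record is routed to its own model. `RunningStatsModel` is a hypothetical stand-in (a simple running z-score), not an HTM model and not a class from any HTM library; in practice you would plug in whatever creates your real model.

```python
import math


class RunningStatsModel:
    """Hypothetical stand-in for a per-location model (not an HTM)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def score(self, value):
        """Return |z-score| of value against the history so far, then learn it."""
        z = 0.0
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0:
                z = abs(value - self.mean) / std
        # Welford's online update of mean and variance.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return z


class ModelRegistry:
    """Keeps one independent model per location key."""

    def __init__(self, model_factory):
        self._factory = model_factory
        self._models = {}

    def model_for(self, key):
        if key not in self._models:
            self._models[key] = self._factory()
        return self._models[key]

    def process(self, record):
        # param0-2 form the "rich key"; only "result" is fed to the model.
        key = (record["param0"], record["param1"], record["param2"])
        return self.model_for(key).score(record["result"])

    def __len__(self):
        return len(self._models)


def make_record(site, result):
    """Build a record in the shape described in the question (timestamp omitted)."""
    return {"param0": site, "param1": "x", "param2": "y", "result": result}


registry = ModelRegistry(RunningStatsModel)
for a, b in [(10.0, 50.0), (10.1, 50.1), (9.9, 49.9), (10.0, 50.0)]:
    registry.process(make_record("site-a", a))
    registry.process(make_record("site-b", b))

score_a = registry.process(make_record("site-a", 50.0))  # unusual for site-a
score_b = registry.process(make_record("site-b", 50.0))  # normal for site-b
```

The same value of 50.0 gets a high score at site-a and a score near zero at site-b, because each site only ever sees its own history.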