Is my data being predicted correctly?

I don’t know, unfortunately; you’ll have to ask others on this site.


I use my template myself and had to update it to support multiple fields of data; you can now also get the updated version here. Keep in mind that even if you feed multiple fields of data to the HTM, it can still only predict 1 metric. The way to use it now is as follows:

./process_input.py input_file_name output_file_name last_record date_type predicted_field_name other_included_field_name, etc…

You use it similarly to the previous file, except that this time you only need to specify how many records (lines) the swarm should run over, or -1 to skip the swarm entirely. Since it supports multiple data fields, the field you want to predict must be the first field listed; after it you can add as many other fields from your input file as you like.
Example usage:

./process_input.py input-file.csv output-file.csv 3000 EU metric-to-predict additional-metric1 additional-metric2

or

./process_input.py input-file.csv output-file.csv -1 OFF metric-to-predict additional-metric3
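To make the argument order concrete, here is a minimal sketch of how those positional arguments could be read. This is not the actual code of process_input.py; the function name parse_args and the dictionary keys are hypothetical, and I'm assuming the only accepted date types are EU, US and OFF.

```python
def parse_args(argv):
    """Map the positional arguments (without the script name) to named settings.

    Hypothetical helper for illustration only; the real script may differ.
    """
    if len(argv) < 5:
        raise ValueError(
            "expected: input_file output_file last_record date_type "
            "predicted_field [other_fields...]"
        )

    input_file, output_file, last_record, date_type = argv[:4]
    fields = argv[4:]

    last_record = int(last_record)  # -1 means: do not run the swarm first
    if date_type not in ("EU", "US", "OFF"):  # assumed set of date types
        raise ValueError("date_type must be EU, US or OFF")

    return {
        "input_file": input_file,
        "output_file": output_file,
        "run_swarm": last_record != -1,
        "swarm_records": last_record,
        "date_type": date_type,
        "predicted_field": fields[0],  # the first field is the one predicted
        "other_fields": fields[1:],    # any further fields are extra inputs
    }
```

In the script itself you would call it as parse_args(sys.argv[1:]).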


This new template will also output a prediction quality analysis report, which works as follows:
Say that HTM is fed a set of values with some patterns that it learns. Over a certain stretch of time, the fed values are [5, 8, 2, 9, 4, 6, 7] and the predicted values at those same time points are [6, 5, 2, 10, 5, 6, 8]. HTM will rarely predict perfectly, but some sets of model_params.py parameters will lead to better prediction results than others. A quick way to judge how good the predicted results are compared to the actual values (instead of plotting everything in a spreadsheet) is to look at both the absolute difference and the relative difference.

First, the absolute difference between each fed value and its predicted value is calculated, and these differences are summed up at the end. On its own, though, that sum is unfair: it depends on the scale of the data. If the data consisted of only small values (or only very large values) and HTM predicted values that were off but also small, the summed absolute difference would be comparatively smaller than the summed absolute difference for larger values.

To make up for that, the largest and smallest values in the data set of the predicted metric are found and the largest difference (largest minus smallest) is calculated. Then each absolute difference between a fed and a predicted value is expressed as a percentage of that largest difference. Finally, a breakdown of percentages is displayed, i.e. what percentage of predicted values were off by 0-0.99% of the largest difference, what percentage were off by 1-25% of the largest difference, and so on…
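Here is a rough sketch of that analysis in Python, using the example values from above. It is not the template's actual code: the function name prediction_quality_report is made up, I'm assuming the largest difference is taken over the fed (actual) values, and the bucket edges beyond 1% and 25% (50, 75, 100) are my own guess at what "and so on" covers.

```python
def prediction_quality_report(actual, predicted,
                              bucket_edges=(1.0, 25.0, 50.0, 75.0, 100.0)):
    """Return (summed absolute difference, largest difference, bucket counts).

    Buckets are [0, 1), [1, 25), [25, 50), [50, 75), [75, 100), [100, inf),
    measured in percent of the largest difference. Edges past 25 are assumed.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal length")

    # Absolute difference per time point, and their sum.
    abs_diffs = [abs(a - p) for a, p in zip(actual, predicted)]
    total_abs_diff = sum(abs_diffs)

    # Largest difference: range of the fed values (assumption, see lead-in).
    largest_diff = max(actual) - min(actual)
    if largest_diff == 0:
        raise ValueError("all fed values are identical; cannot normalize")

    # Each absolute difference as a percentage of the largest difference.
    relative = [100.0 * d / largest_diff for d in abs_diffs]

    # Count how many predictions fall into each percentage bucket.
    counts = [0] * (len(bucket_edges) + 1)
    for r in relative:
        index = sum(1 for edge in bucket_edges if r >= edge)
        counts[index] += 1

    return total_abs_diff, largest_diff, counts


actual = [5, 8, 2, 9, 4, 6, 7]
predicted = [6, 5, 2, 10, 5, 6, 8]
total, largest, counts = prediction_quality_report(actual, predicted)
shares = [100.0 * c / len(actual) for c in counts]  # percent of predictions per bucket
```

For the example values the summed absolute difference is 7 and the largest difference is also 7 (9 minus 2). Two of the seven predictions land in the 0-0.99% bucket, four in the 1-25% bucket and one in the 25-50% bucket.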


Note that using dates (EU/US) instead of simple indexes (OFF) is treated by HTM as an additional metric, as if you were doing this:

./process_input.py input-file.csv output-file.csv 3000 metric-to-predict date-EU

By the way, since you intend to use multiple fields of data, I strongly suggest you read this post here