I normally evaluate my models with MAPE, RMSE, NLL and the like, but those seem somewhat artificial to me.
If you ask a human whether your model is predicting well, you will most probably get a higher score when the modeled signal follows the SHAPE of the original signal, even with big fluctuations, than when it has small fluctuations but the wrong shape.
Have you heard of such measures, or do you have some ideas?
Maybe you smooth the original signal first, or something like that!
As an example:
On the NY taxi stream I get MAPE: 7%, but on hot gym I get MAPE: 23%.
Maybe it is because hot gym has more abrupt changes, so prediction failures cause spikes to show up here and there.
If I zoom in on hot gym, it looks fine to me!
PS: Do you guys take the first 200-300 data points into account when calculating those measures? The algorithm is still adapting at the beginning.
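In case it helps the discussion, this is the kind of thing I mean: a sketch of MAPE that skips an initial warm-up window so the still-adapting phase does not dominate the score (the 300-point cutoff is just an assumption):

```python
def mape_after_warmup(actual, predicted, skip=300):
    """MAPE in percent, computed only on points after the warm-up period.

    The `skip` value is a guess at how long the algorithm needs to adapt;
    actual values of zero would need special handling.
    """
    pairs = list(zip(actual, predicted))[skip:]
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)
```

On hot gym this might explain part of the 23%: if the early, poorly predicted points are included, they inflate the average.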