Measuring: shape similarity!?


I normally test my code with MAPE, RMSE, NLL and the like … but those seem somewhat artificial to me.
If you ask a human whether your model is predicting well, you will most probably get a higher score if the modeled signal follows the SHAPE of the original signal, even with big fluctuations … compared to small fluctuations but the wrong shape.

Have you heard of such measures, OR maybe you have some ideas?
Maybe you smooth the original signal or something!
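To make the smoothing idea concrete, here is a rough sketch (not an established HTM metric, just one way to score "shape"): smooth both signals with a moving average, then take the Pearson correlation, which ignores amplitude errors and rewards matching shape. The function name and window size are made up for illustration.

```python
import numpy as np

def shape_score(actual, predicted, window=10):
    """Hypothetical shape-similarity score in [-1, 1]:
    Pearson correlation between moving-average-smoothed signals."""
    kernel = np.ones(window) / window
    a = np.convolve(np.asarray(actual, float), kernel, mode="valid")
    p = np.convolve(np.asarray(predicted, float), kernel, mode="valid")
    return float(np.corrcoef(a, p)[0, 1])

# A noisy prediction with the right shape scores higher than a
# clean prediction with the wrong shape.
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 400)
signal = np.sin(t)
noisy_right_shape = signal + 0.3 * rng.standard_normal(400)
wrong_shape = np.cos(t)

print(shape_score(signal, noisy_right_shape, window=20))  # close to 1
print(shape_score(signal, wrong_shape, window=20))        # close to 0
```

MAPE would punish the noisy-but-right-shape prediction heavily, while this kind of score matches the "zoomed out it looks fine" intuition.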

As an example:
On the NY taxi stream I get MAPE: 7%, but on hot-gym I get MAPE: 23%.
Maybe because hot-gym has more abrupt changes, failures in prediction cause spikes to show up here and there.
If I zoom in on hot-gym it looks fine to me!

PS> Do you guys take into account the first 200-300 data points when calculating those measures? (The algorithm is still adapting at the beginning.)

We typically use MAPE in NuPIC, but in our papers we have also used negative log likelihood. We don’t touch the original data but we use a rolling average of the error metric when we plot a continuous error. For reporting final accuracy we only use the last N records (N is anywhere from 1 thousand to 10 thousand). This paper contains some details:
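Not NuPIC code, but the bookkeeping described above could be sketched like this in numpy (the window size and N are placeholders; the paper is the authoritative source for the actual values):

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error over the full span."""
    a = np.asarray(actual, float)
    p = np.asarray(predicted, float)
    return float(np.mean(np.abs((a - p) / a)))

def rolling_error(actual, predicted, window=100):
    """Rolling average of the per-record absolute percentage error,
    for plotting a continuous error curve (window size assumed)."""
    a = np.asarray(actual, float)
    p = np.asarray(predicted, float)
    err = np.abs((a - p) / a)
    kernel = np.ones(window) / window
    return np.convolve(err, kernel, mode="valid")

def final_mape(actual, predicted, last_n=1000):
    """Final reported accuracy: only the last N records, so the
    initial adaptation period is excluded from the score."""
    return mape(actual[-last_n:], predicted[-last_n:])
```

Note that `final_mape` also answers the "first 200-300 data points" question: scoring only the last N records automatically drops the warm-up period.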

  1. Y. Cui, S. Ahmad, J. Hawkins, Continuous online sequence learning with an unsupervised neural network model. Neural Comput. 28, 2474–2504 (2016).