I used some of the Yahoo Webscope data in my thesis. My thesis is not publicly available at the moment because the work is pending publication. In short, in my findings, the Numenta algorithm for anomaly detection was successful for a lot of the data inside the Yahoo Webscope database. There are various datasets in that database however that the Numenta algorithm was not suited for. More concretely, datasets where patterns are not repeated are not suited for the HTM networks (under a standard scalar encoding scheme). There are a handful of datasets, for example, in the Yahoo Webscope dataset that feature a signal that is constantly increasing (or decreasing) for the snippet that is asked to be analyzed. Constantly increasing to new values each timestep means new SDRs each timestep which result in high prediction error throughout the snippet because no repeatable pattern is being captured by a standard scalar encoding scheme. HTM networks operate by predicting repeated patterns of neuron excitation. If you are not encoding any repeatable patterns, HTM networks will be useless.
That being said, when repeated patterns of neuron excitation were being encoded, the Numenta algorithm was very useful. There were still some failures and I used those failures to demonstrate the benefits of my thesis contributions. I’ve attached one of the figures from my thesis that reported on a dataset from the Yahoo Webscope database.