Again, this is one of my stupid engineering ideas. In the past, HTM used swarming for hyperparameter optimization, but I can't find any data on what the hyperparameter space actually looks like. Is it like a neural network's: non-continuous and full of local minima? Or is it smooth, so that we could in fact use gradient ascent for hyperparameter optimization?
So I decided to map out a simple 2D parameter space (boost strength and # of bits) to see the surface for myself. The setup has two free variables, the boost strength and the number of bits in the SP's output, and we measure the classification accuracy at each point to build a 2D map of the hyperparameter space.
(The classifier scores 40% accuracy when operating directly on the images without an SP, so it is possible to get worse performance with bad parameters.)
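A minimal sketch of the sweep looks something like this. The evaluate() function here is only a hypothetical placeholder for the real pipeline (encode the image, run the SP with the given boost strength and output size, train the classifier on the SDRs, report test accuracy); it returns a made-up smooth surface so the sketch runs on its own, and the parameter ranges are just examples.

```cpp
// sweep.cpp -- minimal sketch of the 2D sweep, not the exact code used for
// this post. evaluate() is a hypothetical stand-in for the SP + classifier
// pipeline; it just returns a made-up accuracy so the sketch runs by itself.
#include <cmath>
#include <cstdio>

double evaluate(double boostStrength, int numBits)
{
    // Placeholder objective: a 40% floor (the no-SP baseline) plus a bump
    // that peaks near boost strength 0.1 and grows with the number of bits.
    double bitsTerm  = 1.0 - std::exp(-numBits / 512.0);
    double boostTerm = std::exp(-std::pow((boostStrength - 0.1) / 0.5, 2.0));
    return 0.4 + 0.5 * bitsTerm * boostTerm;
}

int main()
{
    // One row per (boost, bits) pair: "boost bits accuracy", space separated.
    std::FILE *out = std::fopen("results.txt", "w");
    for (double boost = 0.0; boost <= 2.0; boost += 0.1)
        for (int bits = 64; bits <= 2048; bits += 64)
            std::fprintf(out, "%g %d %g\n", boost, bits, evaluate(boost, bits));
    std::fclose(out);
    return 0;
}
```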
The parameter space isn't exactly smooth, but (from experience) it looks like momentum-enabled gradient optimization could work. It also seems we get the best performance when boost strength ≈ 0.1, which coincidentally is 1/num_classes in this case. I'm not saying the relation is definite.
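To make "momentum-enabled gradient optimization" concrete, here is a rough sketch: estimate the gradient of accuracy by central finite differences and take momentum steps over (boost strength, # of bits). It reuses the same hypothetical evaluate() placeholder as the sweep sketch above, so it illustrates the idea rather than the real pipeline.

```cpp
// momentum_ascent.cpp -- sketch of momentum gradient ascent over (boost, bits).
// evaluate() is the same hypothetical placeholder as in the sweep sketch; the
// number of bits is treated as continuous and rounded when reporting.
#include <cmath>
#include <cstdio>

double evaluate(double boostStrength, double numBits)
{
    double bitsTerm  = 1.0 - std::exp(-numBits / 512.0);
    double boostTerm = std::exp(-std::pow((boostStrength - 0.1) / 0.5, 2.0));
    return 0.4 + 0.5 * bitsTerm * boostTerm;   // placeholder accuracy
}

int main()
{
    double boost = 1.0, bits = 256;              // arbitrary starting point
    double vBoost = 0.0, vBits = 0.0;            // momentum (velocity) terms
    const double mu = 0.9;                       // momentum coefficient
    const double lrBoost = 0.02, lrBits = 5000;  // learning rates per axis
    const double hBoost = 0.01, hBits = 8;       // finite-difference steps

    for (int step = 0; step < 50; ++step) {
        // Central finite-difference estimate of d(accuracy)/d(parameter).
        double gBoost = (evaluate(boost + hBoost, bits) -
                         evaluate(boost - hBoost, bits)) / (2 * hBoost);
        double gBits  = (evaluate(boost, bits + hBits) -
                         evaluate(boost, bits - hBits)) / (2 * hBits);

        // Momentum update, ascending because we maximize accuracy.
        vBoost = mu * vBoost + lrBoost * gBoost;
        vBits  = mu * vBits  + lrBits  * gBits;
        boost += vBoost;
        bits  += vBits;

        std::printf("step %2d: boost=%.3f bits=%.0f acc=%.3f\n",
                    step, boost, std::round(bits),
                    evaluate(boost, std::round(bits)));
    }
    return 0;
}
```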
Another view of the same plot, from a different angle:
If we look at a slice of the map at boost factor = 0, we can approximate the plot pretty well with a linear function. But I don't expect the trend to keep going like this indefinitely.
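For example, fitting that boost = 0 slice with a straight line takes only a few lines in ROOT. This assumes the slice has been saved as a two-column, whitespace-separated file (called slice.txt here) with the number of bits and the measured accuracy:

```cpp
// fit_slice.C -- ROOT macro sketch: fit the boost = 0 slice with a line.
// Assumes a two-column, whitespace-separated file "slice.txt": bits accuracy.
#include "TGraph.h"

void fit_slice()
{
    TGraph *g = new TGraph("slice.txt", "%lg %lg");
    g->SetTitle("Accuracy vs # of bits at boost = 0;# of bits;accuracy");
    g->Fit("pol1");   // pol1 = p0 + p1*x, prints the fitted intercept/slope
    g->Draw("AP");    // draw axes + points, with the fitted line overlaid
}
```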
We can see the general trend (instead of the details) by smoothing the plot a bit. Basically, the more bits, the better, but you should set the boost factor to 0.1.
This is the source code for the plot. Run it in ROOT and the plot will show up.
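(If you just want the general shape of such a macro rather than the attached file, a minimal sketch could look like the one below. It assumes the results sit in a file called results.txt with three whitespace-separated columns, boost strength, number of bits, and accuracy; adjust it to match the real data format. The commented-out lines at the end show the smoothing mentioned above, using TH2::Smooth on the interpolated histogram.)

```cpp
// plot_map.C -- ROOT macro sketch for drawing the hyperparameter surface.
// Not the exact macro attached to this post; assumes "results.txt" holds
// three whitespace-separated columns: boost  bits  accuracy.
#include <fstream>
#include "TGraph2D.h"
#include "TCanvas.h"
#include "TH2D.h"

void plot_map()
{
    TGraph2D *g = new TGraph2D();
    std::ifstream in("results.txt");
    double boost, bits, acc;
    for (int i = 0; in >> boost >> bits >> acc; ++i)
        g->SetPoint(i, boost, bits, acc);

    g->SetTitle("SP hyperparameter space;boost strength;# of bits;accuracy");
    new TCanvas("c", "hyperparameter space");
    g->Draw("surf1");   // Delaunay-interpolated surface plot

    // To look at the general trend instead of the details, smooth the
    // interpolated histogram and draw that instead:
    // TH2D *h = g->GetHistogram();
    // h->Smooth();
    // h->Draw("surf1");
}
```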
And the original data.
Let me know what other analysis I could do with this data. I have no idea.
Other findings
For some reason or another, Zen 2 CPUs perform extremely well for HTM. My Zen 2 machine, despite running at a lower RAM speed, is overall 40% faster than a Zen 1.