The grokking challenge?

cezar_t · May 5, 2023, 11:44am

I have played a bit around this problem and so far I had best results with the following setup:

Each 0 to 96 input integer is encoded as a 400 long dense vector with a VarCycleEncoder(*).
Since addition (modulo N) is commutative the dataset contains only pairs (X,Y) without the corresponding (Y,X) pair.
For the same reason, instead of having a 800 long dense input for a pair of values, I added the dense representation of Y and X to get a 400 long dense representation of Y and X together
the resulting vector (Y and X) is SDR-ified at different sparsification levels e.g. 50/400, 80/400, 100/400, 133/400, 200/400 to obtain different bit only input data sets for all possible (97*96/2=4656) pairs.
a random half of these pairs are used for training, half for testing
training was done on a sklearn MLP regressor with 97 outputs instead of a classifier. The reasons are is I figured out how to keep the regressor train indefinitely, long after a classifier would have stopped with 100% accuracy on training data. Probably MLP Classifier have similar settings too.
The hidden layers for these results were (400,200,200,200,200,200) . Various depths and widths might work, adding more depth has slight improvements.

The best result I got was 99.4% on the testing half after 200 iterations, which takes ~103 sec. on an old, 2 core/4 threads laptop.

What I found it interesting is:

encoding matters. A lot. Other encoders had much poorer performance.
“SDR-fication” matters again, a lot. Just feeding the MLP the overlapped dense representations had much poorer results within the same compute budget, than shifting highest values in the dense vector to 1 and lower ones to 0
Contrary to Numenta’s SDR theory, best results (in this case of MLP trained on SDR) were obtained with very low sparsity, 1/3 bits 1 and even half 1 half 0 bits yielded close to top results.

Now, sure it raises the question upon why using a “special” encoding instead of the “neutral” one-hot encoding. That’s a long discussion. In my opinion, searching for and finding out efficient encodings which might also be useful for different problems is a valid path for several reasons I would gladly discuss.

Topic		Replies	Views
HTM underfit/overfit Numenta Theory	8	999	January 16, 2017
Generalization benchmark? Numenta Theory	9	978	June 4, 2018
Learning high order sequences Tangential Theories	6	1237	July 10, 2016
I have begun the development of a new NEAT engine Engineering pyramidal , htm	12	248	April 8, 2025
Numenta Technology Demonstration: Sparse networks perform inference 50 times faster than dense networks, with competitive accuracy Machine Learning	6	1222	November 12, 2020

The grokking challenge?

Related topics