Generalize vs. Memorize

This paper looks at deep neural networks from the viewpoint that each layer is an associative memory unit that could, for example, be replaced by a k-NN unit.
https://arxiv.org/abs/1805.06822
It shows that generalization and memorization are not antagonistic.
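
To make that framing concrete, here is a toy sketch (not the paper's code; the class and field names are my own): take the activations a trained layer produced on its training set as memory keys, and answer new inputs by averaging the stored outputs of the k nearest keys.

```java
import java.util.Arrays;

// Toy k-NN "layer": given an input activation vector, find the k closest
// stored activations (keys) and average their stored outputs (values).
// Illustrative sketch of the memorization-as-lookup view only.
public class KnnLayer {
    private final float[][] keys;    // activations seen during training
    private final float[][] values;  // outputs the original layer produced for them
    private final int k;

    public KnnLayer(float[][] keys, float[][] values, int k) {
        this.keys = keys;
        this.values = values;
        this.k = k;
    }

    public float[] forward(float[] x) {
        // squared Euclidean distance to every stored key
        Integer[] idx = new Integer[keys.length];
        float[] dist = new float[keys.length];
        for (int i = 0; i < keys.length; i++) {
            idx[i] = i;
            float d = 0f;
            for (int j = 0; j < x.length; j++) {
                float diff = keys[i][j] - x[j];
                d += diff * diff;
            }
            dist[i] = d;
        }
        Arrays.sort(idx, (a, b) -> Float.compare(dist[a], dist[b]));
        // average the values of the k nearest keys
        float[] out = new float[values[0].length];
        for (int n = 0; n < k; n++) {
            float[] v = values[idx[n]];
            for (int j = 0; j < out.length; j++) out[j] += v[j] / k;
        }
        return out;
    }
}
```

With k = 1 this is pure memorization; a larger k averages over neighbours, which is roughly where the generalization question comes in.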

Great paper!

If you use evolution for training, it can exploit different aspects of associative memory (AM) behavior. At low storage-to-capacity ratios some types of AM have good error-correction capacity and act like dictionary learning. Near maximum storage capacity they do correct one-to-one vector mapping, with input misalignments resulting in noise in the recalled vector. Beyond capacity you get noisy outputs, but in high-dimensional space the angle between what would be the correct output and the noisy output is still statistically significant, even at heavy over-storage of 10 or 100 times too much. An evolutionary algorithm can decide how to handle all these different modes to best solve the problem at hand; I think back-propagation would lack such discernment.
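
As a toy illustration of the over-capacity mode (my own sketch, nothing from the paper): a plain linear associator built from outer products of random ±1 vectors, loaded at well under, at, and ten times past its dimension. Even at 10x over-storage the cosine between the recalled vector and the correct output stays far above the roughly 1/sqrt(d) you would get by chance.

```java
import java.util.Random;

// Toy linear associator: W is the sum of outer products y_i * x_i^T.
// Recall is y ≈ W x. As the number of stored pairs passes the dimension,
// recall gets noisy, but the angle to the correct output stays strongly
// biased toward the right answer. Illustrative sketch only.
public class LinearAssociatorDemo {
    public static void main(String[] args) {
        int dim = 256;
        Random rng = new Random(42);
        for (int pairs : new int[]{32, 256, 2560}) {   // under, at, 10x over capacity
            float[][] xs = randomSigns(pairs, dim, rng);
            float[][] ys = randomSigns(pairs, dim, rng);
            float[][] w = new float[dim][dim];
            for (int p = 0; p < pairs; p++)
                for (int i = 0; i < dim; i++)
                    for (int j = 0; j < dim; j++)
                        w[i][j] += ys[p][i] * xs[p][j];
            // recall the first stored pair and compare with the true output
            float[] recalled = new float[dim];
            for (int i = 0; i < dim; i++)
                for (int j = 0; j < dim; j++)
                    recalled[i] += w[i][j] * xs[0][j];
            System.out.printf("pairs=%d  cosine=%.3f%n", pairs, cosine(recalled, ys[0]));
        }
    }

    static float[][] randomSigns(int n, int dim, Random rng) {
        float[][] m = new float[n][dim];
        for (float[] row : m)
            for (int j = 0; j < dim; j++) row[j] = rng.nextBoolean() ? 1f : -1f;
        return m;
    }

    static float cosine(float[] a, float[] b) {
        float dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
        return (float) (dot / Math.sqrt(na * nb));
    }
}
```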

I guess you can also classify associative memory into two types: those that partition the entirety of the input space into decision regions, like k-NN, k-means and SOMs; and those that have unassigned regions in the input space, which don’t produce any particular output (just a zero vector or noise) for inputs falling in those regions (e.g. a standard neural network layer).
It is interesting that k-NN can even give an improvement in generalization. k-NN, though, is so heavily quantized that anything like BP or evolutionary algorithms would struggle very badly to train it. I wonder if there is a continuous (not quantized) associative memory that fully partitions the input space, in the hope that it generalizes better.
Otherwise, train with a continuous AM and, for recall/production, transfer that knowledge to a quantized AM like k-NN. But then you end up with something akin to a decision forest! How these things interweave and intertwine.

This guy has many useful ideas, some of which you could use to convert, say, k-NN to a more continuous form (less quantized) to make learning more tractable:
https://twitter.com/gabrielpeyre
Like Shepard interpolation.
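
Roughly what that would look like for an associative memory, as a generic sketch (not anyone's library code): Shepard, i.e. inverse-distance-weighted, interpolation, where every stored pair contributes with weight 1/dist^power, so the whole input space is covered and the mapping is continuous.

```java
// Generic sketch of Shepard (inverse-distance-weighted) interpolation as a
// continuous associative memory: every stored (key, value) pair contributes,
// weighted by 1/dist^power, so every point of input space gets an output
// and the mapping is smooth except exactly at the stored keys.
public class ShepardMemory {
    private final float[][] keys;
    private final float[][] values;
    private final double power;   // larger power -> behaves more like plain 1-NN

    public ShepardMemory(float[][] keys, float[][] values, double power) {
        this.keys = keys;
        this.values = values;
        this.power = power;
    }

    public float[] recall(float[] x) {
        float[] out = new float[values[0].length];
        double weightSum = 0.0;
        for (int i = 0; i < keys.length; i++) {
            double d2 = 1e-12;                       // avoid division by zero at a stored key
            for (int j = 0; j < x.length; j++) {
                double diff = keys[i][j] - x[j];
                d2 += diff * diff;
            }
            double w = Math.pow(d2, -power / 2.0);   // 1 / dist^power
            weightSum += w;
            for (int j = 0; j < out.length; j++) out[j] += w * values[i][j];
        }
        for (int j = 0; j < out.length; j++) out[j] /= weightSum;
        return out;
    }
}
```

Large power values push it toward hard nearest-neighbor behavior; small values blur all the stored examples together.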

I have an out-of-control backlog of things to try.
Anyway, what about the decision regions of Numenta's sequence/associative memory?

[SIGGRAPH 2018] Neural Best-Buddies: Sparse Cross-Domain Correspondence:

Thanks for the link. I’ll have a look at it.
Decision trees always produce a definite output for any point in input space. Only some types of associative memory have that property, like k-NN.

I had in mind a type of decision tree that always works off the full input data, rather than the usual type where the decision regions are split irreversibly into sub-regions at each branch. That could be wasteful, but there are many advantages. Anyway, such a tree is possible because repeated distinct random projections can provide infinitely many different views of the input data.
I now see a way of combining soft versions of such decision trees into decision forests arranged in layers, the same as current deep neural networks. I’m not sure whether you could train those by back-propagation; I will only try evolution. Anyway, it is very interesting: I get to combine a lot of ideas I’ve been entertaining into one system, with a far reduced code burden.
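
Here is a rough sketch of the kind of soft node I mean (the sigmoid gate and leaf contents are placeholder choices, not a settled design): each node projects the full input with its own fixed random direction and blends its two subtrees, instead of splitting the input space away irreversibly.

```java
import java.util.Random;

// Sketch of a "soft" decision tree that always works off the full input:
// every node has its own random projection of the whole input vector and
// blends its two subtrees with a sigmoid instead of making a hard split.
// A leaf just holds an output vector (the evolvable parameters).
public class SoftRandomProjectionTree {
    private final float[] projection;   // fixed random direction for this node
    private final float sharpness;      // how close to a hard split the blend is
    private final SoftRandomProjectionTree left, right;
    private final float[] leafOutput;   // non-null only for leaves

    // internal node
    public SoftRandomProjectionTree(int inputDim, float sharpness,
                                    SoftRandomProjectionTree left,
                                    SoftRandomProjectionTree right, Random rng) {
        this.projection = new float[inputDim];
        for (int i = 0; i < inputDim; i++) projection[i] = (float) rng.nextGaussian();
        this.sharpness = sharpness;
        this.left = left;
        this.right = right;
        this.leafOutput = null;
    }

    // leaf node
    public SoftRandomProjectionTree(float[] leafOutput) {
        this.projection = null;
        this.sharpness = 0f;
        this.left = null;
        this.right = null;
        this.leafOutput = leafOutput;
    }

    public float[] evaluate(float[] x) {
        if (leafOutput != null) return leafOutput;
        float dot = 0f;
        for (int i = 0; i < x.length; i++) dot += projection[i] * x[i];
        float gate = 1f / (1f + (float) Math.exp(-sharpness * dot));  // soft split
        float[] l = left.evaluate(x), r = right.evaluate(x);
        float[] out = new float[l.length];
        for (int i = 0; i < out.length; i++)
            out[i] = gate * r[i] + (1f - gate) * l[i];
        return out;
    }
}
```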

Also, modern compilers should be able to auto-vectorize the rate-limiting steps in such code, meaning I won’t have to write specialized (non-portable) SIMD assembly. The only question is how difficult the evolutionary algorithm will find it; it should be okay. A hybrid decision forest / deep neural network.

Sequence memory and associative memory.

Sequence memory can be a recorded sequence of SDR matrix activations, and/or a sliding-window algorithm in which pieces of past SDR matrix activations are used to build future SDR matrix activations. Past data is selected by piping it into the future, with the logic of an SDR bit representing how data is piped into the future SDR matrix. There are many types of piping scheme, from simple to complex plumbing structures, which could work much like a painter's algorithm.
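
One possible reading of that sliding-window idea, as a toy sketch (the window length and the way the window is keyed are placeholder choices): associate each window of recent SDRs with the SDR that followed it, then predict by looking up the current window.

```java
import java.util.*;

// Toy sliding-window sequence memory over SDRs (represented as arrays of
// active bit indices). Each window of the last W SDRs is keyed to the SDR
// that followed it; prediction is a lookup of the current window.
public class SlidingWindowSequenceMemory {
    private final int window;
    private final Map<String, int[]> nextSdr = new HashMap<>();
    private final Deque<int[]> recent = new ArrayDeque<>();

    public SlidingWindowSequenceMemory(int window) { this.window = window; }

    // feed the next SDR in the sequence; learn window -> next associations
    public void observe(int[] sdr) {
        if (recent.size() == window) {
            nextSdr.put(key(recent), sdr);
            recent.removeFirst();
        }
        recent.addLast(sdr);
    }

    // predict the SDR expected to follow the current window, or null if unseen
    public int[] predict() {
        return recent.size() == window ? nextSdr.get(key(recent)) : null;
    }

    private String key(Deque<int[]> sdrs) {
        StringBuilder sb = new StringBuilder();
        for (int[] sdr : sdrs) sb.append(Arrays.toString(sdr)).append('|');
        return sb.toString();
    }
}
```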

Associative memory is an arrangement of images or data into groupings by similarity. That could be used to build new patterns, the next-most-similar object letting a wandering mind build new pattern sequences.

Okay, here is the code I have for neural networks based on nearest neighbors with Shepard interpolation:
https://github.com/S6Regen/Wild_Irish_Rose
The code is very definitely alpha. Anyway, initial testing shows it to be quite evolvable and fast. I presume the Java HotSpot just-in-time compiler is auto-vectorizing it.
