Here is an article that contains rare information about the history of associative memory.
The Hopfield net cannot work well because, while the weighted sum can provide error correction (allowing attractor states), it is a rather poor form of error correction.
With a group of weighted sums there is a pull toward the actual weight vector direction of each weighted sum, which is really an inseparable superposition of many of the training examples, not a specific stored training target.
The exact geometric direction of the weights in a weighted sum is something of an Achilles heel. Another messy case to deal with.
And yet it ought to be dealt with, as with all the other messy cases (e.g. sparse versus dense inputs).
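For concreteness, here is a minimal toy of my own (made-up sizes, nothing from the article) showing the classic construction being critiqued: Hebbian outer-product storage and recall by an iterated, sign-thresholded weighted sum. Each unit's weight vector is a superposition of all the stored patterns, which is the crosstalk issue above.

```python
import numpy as np

# Classic Hopfield net: Hebbian outer-product storage, recall by an
# iterated sign-thresholded weighted sum. Each row of W mixes together
# every stored pattern.

rng = np.random.default_rng(0)
patterns = rng.choice([-1, 1], size=(5, 64))      # 5 bipolar patterns, 64 units

W = (patterns.T @ patterns) / 64.0                # Hebbian storage (superposition)
np.fill_diagonal(W, 0.0)                          # no self-connections

def recall(x, steps=10):
    for _ in range(steps):
        x = np.sign(W @ x)                        # weighted sum + threshold
        x[x == 0] = 1
    return x

noisy = patterns[0] * np.where(rng.random(64) < 0.1, -1, 1)   # flip ~10% of bits
print(np.mean(recall(noisy) == patterns[0]))                  # fraction of bits recovered
```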
Anyway:
https://thegradient.pub/dont-forget-about-associative-memories
I had to fix the link for some reason. It should work now.
I can’t find publicly available information about morphological associative memory, but I found this about alpha-beta associative memory:
https://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1870-90442016000200043#:~:text=Associative%20memory%20has%20been%20an,sense%2C%20and%20most%20frequent%20sense.
I’ll have to let it sink in a bit.
My first thought is that it is like some kind of intermediate-target AM.
A training (input, target) pair (a, b) is augmented with a unique intermediate target i to give (a, i, b).
Then you train 2 AMs: one (AM1) on (a, i) and the other (AM2) on (i, b).
Recall is AM2(AM1(x)).
If the vector i is sparse (e.g. only one non-zero element, set to 1) then you can error correct by re-sparsifying the result of AM1(x) so that only one element is 1 and the rest are set to 0.
That is intermediate error correction, and there are many other ways of doing it.
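A rough sketch of that recipe (a toy of my own, assuming least-squares AMs and one-hot intermediate codes, neither of which is taken from the paper):

```python
import numpy as np

# Toy intermediate-target AM: AM1 maps a -> i, AM2 maps i -> b, with a
# unique one-hot intermediate code i per training pair. Recall
# re-sparsifies AM1's output (winner-take-all) before passing it to AM2.

rng = np.random.default_rng(0)
n_pairs, dim_a, dim_b = 10, 32, 16

A = rng.standard_normal((n_pairs, dim_a))    # inputs a
B = rng.standard_normal((n_pairs, dim_b))    # targets b
I = np.eye(n_pairs)                          # unique one-hot intermediate targets i

# One simple way to train the two AMs: least squares
# (Hebbian outer-product storage would be another choice).
W1, *_ = np.linalg.lstsq(A, I, rcond=None)   # AM1: a -> i
W2, *_ = np.linalg.lstsq(I, B, rcond=None)   # AM2: i -> b

def recall(x):
    i_hat = x @ W1                           # noisy intermediate code
    i_clean = np.zeros_like(i_hat)
    i_clean[np.argmax(i_hat)] = 1.0          # re-sparsify: keep only the largest element
    return i_clean @ W2                      # AM2(AM1(x))

x_noisy = A[3] + 0.3 * rng.standard_normal(dim_a)
print(np.abs(recall(x_noisy) - B[3]).max())  # small if the intermediate code was corrected
```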
Somewhere along the line, an obsession with AMs doing error correction during recall entered the room.
That demand is seldom mentioned in relation to deep neural networks, where generalization is what is wanted.
I don't know why the two are held to such different standards.
It seems as though diffusion models have been fairly successful at overcoming the limitations faced by the original associative memory architectures. They do this by learning a sequence of loss functions that start out very broad (with the gradient descending towards the global mean of the target manifold), but then get increasingly detailed (descending towards local minima in the learned manifold) while adding little bits of noise (perturbations) along the way.
See this video from Welch Labs (via 3Blue1Brown) for a fairly approachable breakdown of diffusion models.
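To make the broad-to-fine idea concrete, here is a small toy of my own (not code from the video): noisy, annealed descent on a soft attractor energy over a few stored patterns. It starts by pulling toward their global mean and ends by settling into the nearest one, with the injected noise shrinking along the way.

```python
import numpy as np

# "Broad to fine, with noise" recall: annealed, noisy gradient descent on
# a soft attractor energy. At low beta the pull is toward the global mean
# of the stored patterns; at high beta it is toward the nearest pattern.

rng = np.random.default_rng(0)
patterns = rng.standard_normal((5, 8))       # a handful of stored patterns

def grad_energy(x, beta):
    d2 = np.sum((patterns - x) ** 2, axis=1)
    w = np.exp(-beta * (d2 - d2.min()))      # softmax weights (shifted for stability)
    w /= w.sum()
    return x - w @ patterns                  # pull toward the softmax-weighted pattern mean

x = rng.standard_normal(8)                   # random starting point
n_steps = 200
for step, beta in enumerate(np.geomspace(0.01, 10.0, n_steps)):
    noise_scale = 0.1 * (1.0 - step / n_steps)                # perturbations anneal away
    x = x - 0.1 * grad_energy(x, beta) + noise_scale * rng.standard_normal(8)

print("settled near pattern", np.argmin(np.sum((patterns - x) ** 2, axis=1)))
```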
I’ll look at the video. The low level aspects of neural networks are more than enough for me.
I would not be surprised if diffusion models had to be very over-parameterized to get variance reduction out of the weighted sums in the network.
Maybe I’m wrong about that.
At the moment I’m thinking about training 2 AMs on the same data.
And then suppressing the outputs when they disagree about the response to a particular input.
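Roughly like this (a toy sketch under my own assumptions: two least-squares AMs made different by random feature expansions, and a cosine-similarity test standing in for "agreement"):

```python
import numpy as np

# Two AMs trained on the same (a, b) pairs but with different random
# feature expansions, so their errors tend to differ. If their recalls
# disagree (low cosine similarity), the output is suppressed (None).

rng = np.random.default_rng(0)
n_pairs, dim_a, dim_b, dim_h = 20, 16, 8, 64

A = rng.standard_normal((n_pairs, dim_a))
B = rng.standard_normal((n_pairs, dim_b))

def make_am(seed):
    r = np.random.default_rng(seed)
    P = r.standard_normal((dim_a, dim_h))          # random feature projection
    H = np.tanh(A @ P)                             # expanded inputs
    W, *_ = np.linalg.lstsq(H, B, rcond=None)      # least-squares readout
    return lambda x: np.tanh(x @ P) @ W

am1, am2 = make_am(1), make_am(2)

def recall(x, threshold=0.9):
    y1, y2 = am1(x), am2(x)
    agreement = y1 @ y2 / (np.linalg.norm(y1) * np.linalg.norm(y2))
    return 0.5 * (y1 + y2) if agreement > threshold else None   # suppress on disagreement

print(recall(A[0] + 0.1 * rng.standard_normal(dim_a)))   # near-stored input: likely answered
print(recall(rng.standard_normal(dim_a)))                # unrelated input: likely suppressed
```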
That’s a reasonable approach. However, in order to avoid repeating each other’s “mistakes”, it might be advisable to store different data in each.
Also, I’d be careful with what you mean by “disagree”, because in the real world there may be multiple valid “answers” for the same “question”.
I personally like (approximate) nearest-neighbor search. There are plenty of libraries and databases, and it is a pretty decent approach to associative memory.
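A minimal illustration (plain brute-force search in NumPy; libraries such as FAISS or Annoy do the same thing approximately at much larger scale):

```python
import numpy as np

# Nearest-neighbor search as an associative memory: store (key, value)
# pairs and recall by returning the value whose key is closest to the
# query. A noisy key is "error corrected" by snapping to the nearest
# stored key.

rng = np.random.default_rng(0)
keys = rng.standard_normal((100, 32))      # stored cues
values = rng.standard_normal((100, 8))     # stored responses

def recall(query):
    d2 = np.sum((keys - query) ** 2, axis=1)   # squared distance to every key
    return values[np.argmin(d2)]               # response of the nearest key

noisy = keys[7] + 0.2 * rng.standard_normal(32)
print(np.array_equal(recall(noisy), values[7]))   # True if the noisy cue snapped back to key 7
```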
@CollinsEM nice video, thanks.