I seem to be on some kind of documentation spree.
https://archive.org/details/column-selection-in-re-lu-networks-as-hashing
These could be helpful:
https://archive.org/details/associative-memory-as-a-hashed-linear-readout
Or the picture version:
Generally speaking, though, I’ll leave the material for people to find for themselves.
This is the current “neural network research community” best effort:
https://arxiv.org/pdf/2604.21691
It kind of reminds me of the painting The Scream, which I nearly did when I read their paper.
To be fair to that paper, they are saying there must be some mechanistic view you can take of neural networks, since the networks are so simple, and then they expect to apply (formal) mathematics to that view.
They just seem to be stuck at some kind of impasse.
They probably cannot see past ReLU as a function.
It is a bit of a trick item: when you graph it, it looks 100% like a function. However, there is a gate (or switch) interpretation of its behavior, and there is a definite mental barrier to grasping that.
It is a coin with two faces.
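Here is a minimal sketch of the two faces in Python (just NumPy, nothing from the linked notes assumed):

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

# Face 1: ReLU as a function.
relu_as_function = np.maximum(x, 0.0)

# Face 2: ReLU as a gate. The sign of x decides a binary switch;
# the output is then just gate * x, a linear pass-through that is
# switched on or off.
gate = (x > 0.0).astype(x.dtype)  # 1 where open, 0 where closed
relu_as_gate = gate * x

assert np.array_equal(relu_as_function, relu_as_gate)
```

Same numbers either way; the difference is entirely in how you read it.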
Also, you would need to have investigated hash-based associative memory, and in particular locality-sensitive-hashing-based associative memory, to understand the mechanics and math going on.
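For anyone who hasn’t met it, here is a toy sketch of the idea, assuming the sign-of-random-projection hash family (one common choice of locality-sensitive hash):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, bits = 64, 16
projections = rng.standard_normal((bits, dim))  # random hyperplanes

def lsh(v):
    # Locality-sensitive hash: each bit is the side of one random
    # hyperplane. Nearby vectors fall on the same side of most
    # hyperplanes, so they tend to land in the same bucket.
    return tuple((projections @ v > 0).astype(int))

memory = {}  # bucket -> stored response: the associative memory

key = rng.standard_normal(dim)
memory[lsh(key)] = "recalled pattern"

noisy_key = key + 0.05 * rng.standard_normal(dim)  # corrupted cue
print(memory.get(lsh(noisy_key), "miss"))  # usually still recalls
```

The point is that recall is by content, not by exact address: a slightly damaged cue usually hashes to the same bucket.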
Almost no one has studied associative memory in years, not since the mid-1980s really.
They are stalled, yet they sense there is a simple mechanistic explanation to which they are oblivious.
I have a picture of ReLU as a gate system:
https://www.pinterest.com/pin/1057712662505592958
I had to put it on pinterest because archive.org didn’t like the science and wholesomeness.
Maybe this can help people understand the associative memory aspect of ReLU neural networks:
https://archive.org/details/a-note-on-column-selection-linear-layers-and-associative-memory
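In code form, the claim is that a ReLU layer followed by a linear layer is equivalent to selecting and summing columns of the second weight matrix, with the gate pattern acting as the address. A minimal sketch (my own illustration, not taken from the note):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(8)
W1 = rng.standard_normal((32, 8))  # first linear layer
W2 = rng.standard_normal((4, 32))  # second linear layer

# Standard view: a composed function.
h = np.maximum(W1 @ x, 0.0)
y = W2 @ h

# Column-selection view: the ReLU gate pattern picks out a subset of
# W2's columns; the output is a weighted sum of just those columns.
# The gate pattern is a hash/address of the input region, and the
# selected columns are the stored response for that region.
pre = W1 @ x
active = pre > 0.0
y_selected = sum(pre[i] * W2[:, i] for i in np.flatnonzero(active))

assert np.allclose(y, y_selected)
```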
You could make an argument that biological neurons can have x·gate behavior: the gate is zero until a certain threshold of input activation, then it switches on (1) and the output is approximately linear in x.
Anyway, I fired some Bloom filter ideas at ChatGPT, and this is an idea we arrived at:
“We propose using a Bloom-filter-like structure to track visited regions of input space and modulate plasticity in a conditionally routed linear network. This results in a progressively frozen tiling of the input space, reducing catastrophic forgetting without explicit replay or parameter regularization.”
I don’t really propose it, I just note it: you reduce plasticity on previously indexed routing choices that have already been learned, but allow more major changes on novel inputs.
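A minimal sketch of how that might look, where the ReLU sign pattern stands in for the routing choice and all the names and learning rates are hypothetical:

```python
import hashlib
import numpy as np

class BloomFilter:
    # Minimal Bloom filter: k hashes into a bit array. False positives
    # are possible, false negatives are not.
    def __init__(self, n_bits=1 << 16, n_hashes=4):
        self.bits = np.zeros(n_bits, dtype=bool)
        self.n_bits, self.n_hashes = n_bits, n_hashes

    def _indexes(self, item):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(item + bytes([i])).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx] = True

    def contains(self, item):
        return all(self.bits[idx] for idx in self._indexes(item))

visited = BloomFilter()

def plasticity(x, W1, base_lr=0.01, frozen_lr=0.0001):
    # The ReLU gate pattern is the routing choice for this input region.
    pattern = (W1 @ x > 0.0).tobytes()
    if visited.contains(pattern):
        return frozen_lr  # region already learned: damp further changes
    visited.add(pattern)
    return base_lr  # novel region: allow large updates
```

A Bloom filter false positive here just means an occasionally over-frozen region, which seems like the safe direction for the error to go.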
It might be more sensible to have a visitation counter for each weight in a neural network and reduce the plasticity as the count grows. But that would only work with neural networks that have very decided gating, such as ReLU-based networks. With other nets you would have to integrate some other measure rather than a count.
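Sketching the counter variant (again my own illustration, counting a weight as visited whenever its unit fires):

```python
import numpy as np

counts = None  # per-weight visitation counts, allocated on first use

def per_weight_lr(x, W1, base_lr=0.01):
    # Count how often each weight of W1 has actually participated in a
    # forward pass (i.e. its unit was gated on), and shrink that
    # weight's learning rate as its count grows.
    global counts
    if counts is None:
        counts = np.zeros(W1.shape, dtype=np.int64)
    active = W1 @ x > 0.0            # which ReLU units fired
    counts += active[:, None]        # incoming weights of active units
    return base_lr / (1.0 + counts)  # element-wise learning rates
```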