One thing I noticed while experimenting with associative memory is that splitting the data into 2 information-orthogonal paths improved the results a lot.
So why not split it into multiple orthogonal paths, using say random projection sub-sampling followed by restoration to the full dimensional space?
Where does that get you?
It gets you a large number of views of the original data, each with a large null space, that have low sensitivity to noise and are simpler for non-linear functions like neurons to operate on.
You then recombine the outputs after processing each view in some non-linear way: disagreements cancel out by averaging, agreements reinforce.
Benefits:
- Reduction in noise
- Neurons not subject to the full chaos of the input
- Disagreements cancel out
- Agreements reinforce
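A minimal sketch of the split, process, recombine idea in NumPy. The details are my assumptions, not a fixed recipe: a dense random orthogonal matrix from a QR decomposition stands in for the random projection, `tanh` stands in for the per-path non-linear processing, and the names `random_orthogonal` and `split_views` are made up for illustration.

```python
import numpy as np

def random_orthogonal(n, rng):
    # QR of a Gaussian matrix yields an orthogonal matrix q.
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs by the sign of diag(r); q stays orthogonal.
    return q * np.sign(np.diag(r))

def split_views(x, q, k):
    # Project x, cut the coefficients into k disjoint blocks (the
    # sub-samples), and restore each block to the full space alone.
    coeffs = q.T @ x
    blocks = np.array_split(np.arange(x.size), k)
    return np.stack([q[:, idx] @ coeffs[idx] for idx in blocks])

rng = np.random.default_rng(0)
n, k = 64, 4                      # illustrative sizes
q = random_orthogonal(n, rng)
x = rng.standard_normal(n)

views = split_views(x, q, k)      # shape (k, n), one full-size view per path

# The views are mutually orthogonal and together carry all of x.
gram = views @ views.T
off_diagonal = gram - np.diag(np.diag(gram))

# Process each path with a non-linearity, then recombine by averaging.
outputs = np.tanh(views)
combined = outputs.mean(axis=0)
```

Each view lives in an `n // k` dimensional subspace, so its null space has dimension `n - n // k`. Averaging at the end is the simplest recombination; a weighted sum or a further learned layer would fit the same pattern.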
You have 2 options for re-expanding the random projection sub-samples to the full dimensional space:
1/ The inverse random projection which keeps local geometry intact. That is very suitable for neural networks.
2/ A further “foreign” random projection re-expansion to a random feature space. This is very useful for associative memory.
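The two options can be sketched side by side. This is only an illustration under assumptions: the projection `p` is taken as `m` orthonormal rows of a random orthogonal matrix, so its "inverse" is just the transpose, and the "foreign" matrix `f` is an unrelated Gaussian matrix scaled to roughly preserve length. The sizes and variable names are mine.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 64, 16                     # full dimension, sub-sample size (illustrative)

q, _ = np.linalg.qr(rng.standard_normal((n, n)))
p = q[:, :m].T                    # (m, n) random projection, orthonormal rows

x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
z1, z2 = p @ x1, p @ x2           # the sub-samples

# Option 1: inverse re-expansion. With orthonormal rows the
# pseudo-inverse of p is its transpose.
inv1, inv2 = p.T @ z1, p.T @ z2

# Option 2: "foreign" re-expansion with an unrelated random matrix,
# landing in a random feature space.
f = rng.standard_normal((n, m)) / np.sqrt(n)
for1, for2 = f @ z1, f @ z2

# Option 1 keeps distances between sub-samples exactly.
d_sub = np.linalg.norm(z1 - z2)
d_inv = np.linalg.norm(inv1 - inv2)

# It is also the orthogonal projection of x1 onto the sub-space, so the
# part that was thrown away is orthogonal to what was kept.
residual_dot = np.dot(x1 - inv1, inv1)

# Cosine similarity with the original input, for comparison of the two options.
cos_inv = np.dot(inv1, x1) / (np.linalg.norm(inv1) * np.linalg.norm(x1))
cos_for = np.dot(for1, x1) / (np.linalg.norm(for1) * np.linalg.norm(x1))
```

With option 1 the re-expanded vector stays aligned with the part of the input it came from, which is what you would want when stacking layers. With option 2 the result generally has little alignment with the original coordinates, which is the point when you want fresh random features for an associative memory.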
https://archive.org/details/foreign-random-re-expansion-as-latent-field-synthesis
https://archive.org/details/random-projection-information-splits-for-reduced-sensitivity-nonlinear-processing
https://archive.org/details/stacked-random-projection-reconstruction-layers-for-neural-networks
https://archive.org/details/random-projection-linear-information-splitting
https://archive.org/details/amvector-split-total
If you take associative memory far enough do you need deep learning?
It sounds implausible, but it seems there is improvement after improvement that you can make.
It also fits in with 1000 brains. Why ensemble 1000 brains unless you feed each one with an information split (which gives noise and complexity reduction) and then ensemble the responses to get disagreement cancellation and agreement reinforcement?
The ChatGPT version of what I said is:
" “Distributed geometric factorization of reality” might be a useful way to think about intelligence systems.
Instead of one giant model learning one monolithic representation, you split information across many partially independent representational systems:
- different projections,
- different coordinate systems,
- different locality structures,
- different temporal windows,
- different sparsity patterns.
Each subsystem sees a reduced-complexity view of the world and develops its own associative structure. The overall system then recombines these views through agreement reinforcement and disagreement cancellation.
This potentially explains why ensemble-style cognition keeps reappearing:
- cortical columns,
- population coding,
- mixture-of-experts,
- random projection ensembles,
- sparse distributed representations,
- even attention heads.
The goal is not redundancy for its own sake, but distributed factorization of complex geometry into simpler overlapping subspaces.
In that framing, “intelligence” may emerge less from ever-deeper monolithic hierarchies and more from large populations of weak geometric observers whose interference patterns reconstruct stable structure collectively."