Power-of-Two-Based Permutation Logic

I’m just reading this at the moment:
Power-of-Two-Based Permutation Logic

Generally, though, I am looking more at papers about saddle points and so on in deep neural nets.


The basic idea is that you never actually end up trapped in a local minimum while training a large neural net. There is always some direction you can move the weights in to get an improvement, i.e. you can’t go wrong using some variation of the random hill climbing algorithm (for descending!!)
I guess it ends up that the structure of a deep neural net is stupid and the algorithm you train it with is stupid. However, if you throw enough floating point operations at it with a GPU cluster you can train it to do things in a few weeks that would take an equivalent biological system millions of years of evolution to be able to do.

When I try random hill climbing with small nets it doesn’t work too well (or at all). When I try it with large nets it seems to go far more smoothly. The scaling effects look to be counterintuitive. I guess that is the reason people abandoned the idea in the 1980s, when they could only experiment with small nets.
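A minimal sketch of the kind of random hill climbing I mean, in plain Python. The names `hill_climb` and `quadratic_loss`, the step size `sigma` and the step count are all my own placeholders, and the quadratic loss is only a stand-in for a real network loss, so this shows the accept-only-if-better loop and nothing about how it scales with net size.

```python
import random


def hill_climb(loss, weights, steps=2000, sigma=0.1, seed=0):
    """Random hill climbing (descending): keep a mutation only if it lowers the loss."""
    rng = random.Random(seed)
    best = list(weights)
    best_loss = loss(best)
    for _ in range(steps):
        # Perturb every weight with a little Gaussian noise.
        candidate = [w + rng.gauss(0.0, sigma) for w in best]
        candidate_loss = loss(candidate)
        if candidate_loss < best_loss:
            best, best_loss = candidate, candidate_loss
    return best, best_loss


def quadratic_loss(w):
    """Stand-in loss (an assumption): squared distance of every weight from 1.0."""
    return sum((wi - 1.0) ** 2 for wi in w)


start = [0.0] * 20
initial_loss = quadratic_loss(start)
final_weights, final_loss = hill_climb(quadratic_loss, start)
print(initial_loss, final_loss)
```

Because a candidate is only accepted when it is strictly better, the loss can never go up. Swapping `quadratic_loss` for the loss of an actual net is the part that would need real experimentation.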

The biological paper then is suggesting (sparse) locality sensitive hashing. Okay, I’ll think about it that way. Maybe I’ll investigate multiplexing and demultiplexing as well.

I suppose repeatedly demultiplexing from 4 bits to 15 (+ 1 null) states is very expansive and very sparsity-inducing. In some forms of machine learning you create millions of variations of the input data and then choose the variations that best explain the target. Of course the chosen items can be demultiplexed into new variations and used to further explain the target.
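Here is roughly how I picture the 4-to-15 (+ 1 null) demultiplexing, as a small Python sketch. I am assuming the all-zero input is the null state that activates no line; `demux4` and `demux_layer` are names I made up for illustration.

```python
def demux4(bits):
    """Map 4 input bits to 15 output lines, at most one of them active.

    Assumption: the all-zero input is the null state and turns on no line.
    """
    index = 0
    for b in bits:
        index = (index << 1) | (1 if b else 0)
    out = [0] * 15
    if index > 0:
        out[index - 1] = 1
    return out


def demux_layer(bits):
    """Demultiplex a bit vector in groups of 4 (zero-padded at the end)."""
    padded = list(bits) + [0] * (-len(bits) % 4)
    out = []
    for i in range(0, len(padded), 4):
        out.extend(demux4(padded[i:i + 4]))
    return out


expanded = demux_layer([1, 0, 1, 1, 0, 0, 0, 0])
print(len(expanded), sum(expanded))
```

Eight input bits become 30 lines with at most 2 of them active, which is where the expansion and the sparsity come from. Repeating the step would need some way of picking which output lines to regroup, and I have left that out.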

Anyway, you could have made something like that in the early 1970s: http://www.ti.com/lit/ds/symlink/74ac11138.pdf

So recurrence plus demultiplexing to create a reservoir, and then linear readout. Is that it?
Well, who can be sure yet.
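To make the reservoir-plus-linear-readout part concrete, here is a rough Python sketch. Note that I have swapped in an ordinary random tanh reservoir (echo state network style) rather than a demultiplexing one, and the readout is trained with a simple LMS (delta rule) update. The sizes, the weight scale, the learning rate and the toy task (recalling the previous input) are all assumptions of mine.

```python
import math
import random


def run_reservoir(inputs, n=20, scale=0.3, seed=0):
    """Drive a fixed random recurrent net with a scalar input; return all states."""
    rng = random.Random(seed)
    w_rec = [[rng.uniform(-scale, scale) for _ in range(n)] for _ in range(n)]
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    state = [0.0] * n
    states = []
    for u in inputs:
        state = [
            math.tanh(sum(w_rec[i][j] * state[j] for j in range(n)) + w_in[i] * u)
            for i in range(n)
        ]
        states.append(state)
    return states


def mse(weights, states, targets):
    """Mean squared error of the linear readout over the whole sequence."""
    errors = [
        (sum(w * x for w, x in zip(weights, s)) - t) ** 2
        for s, t in zip(states, targets)
    ]
    return sum(errors) / len(errors)


def train_readout(states, targets, lr=0.01, epochs=20):
    """Train only the linear readout with LMS; the reservoir is never touched."""
    weights = [0.0] * len(states[0])
    for _ in range(epochs):
        for s, t in zip(states, targets):
            err = sum(w * x for w, x in zip(weights, s)) - t
            weights = [w - lr * err * x for w, x in zip(weights, s)]
    return weights


rng = random.Random(1)
inputs = [rng.uniform(-1.0, 1.0) for _ in range(500)]
targets = [0.0] + inputs[:-1]  # toy task: recall the previous input

states = run_reservoir(inputs)
mse_before = mse([0.0] * 20, states, targets)
readout = train_readout(states, targets)
mse_after = mse(readout, states, targets)
print(mse_before, mse_after)
```

The only thing that learns here is the readout vector, which is what makes this kind of system so cheap to train.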


I’ll save you the trouble of having to look for the paper:


And just as an aside:

It kinda, sorta sounds as if Hawkins/Ahmad synapse building results in micro-correlation learning in the reservoir, making the reservoir less sparse over time and increasing the number of non-linearities that can be drawn on to make predictions. That is a dual learning system whose parts work together to create great richness while staying very simple to train, and with very obvious and biologically plausible learning mechanisms.
Are kinda and sorta valid scientific terms?

I’m kinda using your forum as a collecting point for related ideas, hope it’s not annoying.
I guess a simplification would be to use Hawkins/Ahmad type neurons as micro-correlators but not to feed the outputs back into the (random) reservoir. Then learn read-out projections from the whole kit and caboodle to the wanted result. So while the reservoir might only contain a cue for a bright light and a cue for a loud noise, a Hawkins/Ahmad neuron projecting into the reservoir could (unsupervised) learn any correlations in the environment between the two cues (simultaneous, or one following the other). You are enriching the reservoir with unsupervised correlation learning. Obviously if that neuron were to fire it would be a good idea to initiate fleeing behavior.
Such a myriad of possibilities though, AI is all about the structure.
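As a very crude first stab at coding the micro-correlator idea, here is a Python sketch of a single coincidence unit watching two cues. This is a plain Hebbian-style permanence counter that I made up for illustration, not the actual Hawkins/Ahmad neuron model; the names, the increment, the decrement and the threshold are all assumptions.

```python
def learn_coincidence(cue_a, cue_b, increment=0.1, decrement=0.02, threshold=0.5):
    """Grow a 'permanence' when both cues are active together, shrink it
    when only one is active. No target signal is used (unsupervised)."""
    permanence = 0.0
    for a, b in zip(cue_a, cue_b):
        if a and b:
            permanence = min(1.0, permanence + increment)
        elif a or b:
            permanence = max(0.0, permanence - decrement)
    return permanence >= threshold, permanence


def enrich(state, a, b, connected):
    """Append the correlator output to a reservoir state as one extra feature."""
    return list(state) + [1 if (connected and a and b) else 0]


# Bright light and loud noise that always occur together.
light = [1, 0, 1, 1, 0, 1, 1, 1]
noise = [1, 0, 1, 1, 0, 1, 1, 1]
connected, permanence = learn_coincidence(light, noise)

# Cues that never coincide should not form a connection.
unrelated, _ = learn_coincidence([1, 0, 1, 0], [0, 1, 0, 1])

enriched = enrich([0, 1, 0], 1, 1, connected)
print(connected, unrelated, enriched)
```

The extra feature is only appended for the readout to use and is not fed back into the reservoir, matching the simplification above. Handling "one followed the other" would need a delayed copy of one cue, which I have not included.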


I’ll leave it there for the moment and think about how best to code such ideas.