Lack of Biological Correlation: Are these NN Flaws still valid?

Read here and decide?

http://www.i-programmer.info/news/105-artificial-intelligence/7352-the-flaw-lurking-in-every-deep-neural-net.html

Does anyone who’s familiar with classical NN tech know if these limitations reflect the current state of NNs?

Yes, as far as I know adversarial examples have a certain universality to them. They have been attributed to linearity in NNs, and in very high-dimensional spaces the volume tends to concentrate near the surface, so it is relatively easy to find such examples. Moreover, adversarial examples have been shown to generalize: they are largely independent of architecture and training data.
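To make the linearity point concrete, here is a toy sketch of my own (not from the linked article or the paper): for a purely linear score w·x, a per-component nudge of size eps in the direction sign(w) shifts the score by eps·‖w‖₁, which grows with the input dimension, so a visually tiny perturbation can move the score a long way.

```python
import numpy as np

# Toy illustration of the linearity argument: for a linear score w.x, an
# L-infinity-bounded nudge eps*sign(w) shifts the score by eps*||w||_1,
# which grows roughly linearly with the input dimension.
rng = np.random.default_rng(0)
for dim in (10, 1_000, 100_000):
    w = rng.normal(size=dim)          # weights of a linear "classifier"
    x = rng.normal(size=dim)          # some input
    eps = 0.01                        # tiny per-component perturbation
    x_adv = x + eps * np.sign(w)      # gradient-sign style nudge
    print(dim, round(float(w @ x_adv - w @ x), 2))   # equals eps * ||w||_1
```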

Please see the paper below…


The current deep neural networks are superhuman. That is because they use non-local learning.
Humans and Hopfield nets, by contrast, are stuck with local learning at each neuron.
The Two Minute Papers channel on YouTube does an excellent job of showcasing the current state of the art with deep nets: https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg
To do non-local learning you really need multiple GPUs to allow experimentation within a reasonable time frame.
The alternative is local learning (which seems to me synonymous with attractor-state learning).
That can be much faster. There is also the idea that maybe you can elevate the Hopfield network one step above what is possible in the human brain by using decision trees combined with ID3 as a reworked type of neuron. Anyway, there is some relatively recent work that improves the learning algorithm for conventional Hopfield networks using a minimum probability flow training algorithm:
https://arxiv.org/abs/1411.4625
https://arxiv.org/abs/1204.2916
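For contrast with the non-local back-propagation case, here is a minimal sketch of the kind of local learning a classical Hopfield network does (plain one-shot Hebbian storage, my simplification, not the minimum-probability-flow training from the papers above): each weight is set using only the activity of the two neurons it connects.

```python
import numpy as np

# Minimal Hopfield sketch: Hebbian storage is "local" in that each weight
# w[i, j] is computed only from the activity of neurons i and j.
def train_hebbian(patterns):
    n = patterns.shape[1]
    w = np.zeros((n, n))
    for p in patterns:                   # each p is a vector of +1/-1
        w += np.outer(p, p)
    np.fill_diagonal(w, 0)
    return w / len(patterns)

def recall(w, state, steps=20):
    for _ in range(steps):               # synchronous updates, for brevity
        state = np.sign(w @ state)
        state[state == 0] = 1
    return state

rng = np.random.default_rng(1)
patterns = rng.choice([-1, 1], size=(3, 64))   # 3 patterns, 64 neurons
w = train_hebbian(patterns)
noisy = patterns[0].copy()
noisy[:8] *= -1                          # corrupt 8 of the 64 bits
print(np.array_equal(recall(w, noisy), patterns[0]))   # usually True
```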

Anyway, Numenta should get its head around random projections. The papers out there are horrifyingly mathematical, but the basics are simple to conceptualize. Each point in the input maps to a Gaussian-noise-like pattern in the output, which makes it an unbiased transform, unlike the Fourier transform, which maps a point to a sine or cosine wave and is biased, not least because it maps each point to a single frequency. Usually the random patterns are chosen orthogonal to one another, and then you can invert the random projection.
This gives you the ability to create a distributed representation of your input data.
I mention this because you can then use the dropout idea from deep neural networks.
RP your input data, randomly zero some of the elements, and then do another RP. The idea is that when you use that as input to a training algorithm, it will learn to respond to a region around the given input rather than just the exact input itself. That could help you create a learning algorithm with a better ability to generalize. However, the deep neural net researchers apparently don't understand how or why their own idea works, so there you go.
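Here is a minimal sketch of that RP-then-dropout idea under my own assumptions (an orthogonal random matrix so the first projection is exactly invertible by its transpose, and a 20% drop rate):

```python
import numpy as np

rng = np.random.default_rng(2)

def random_orthogonal(n):
    # QR of a Gaussian matrix gives a random orthogonal matrix,
    # so the projection is invertible by its transpose.
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q

n = 256
R1, R2 = random_orthogonal(n), random_orthogonal(n)

x = rng.normal(size=n)               # input vector
h = R1 @ x                           # first RP: a noise-like, distributed representation
mask = rng.random(n) > 0.2           # "dropout": zero ~20% of the components
h_dropped = h * mask
y = R2 @ h_dropped                   # second RP, ready to feed to a learner

# Because R1 is orthogonal, the undamaged projection inverts exactly:
print(np.allclose(R1.T @ h, x))      # True
# The dropped version reconstructs only approximately, i.e. a nearby point:
print(np.linalg.norm(R1.T @ h_dropped - x) / np.linalg.norm(x))
```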

So the layers in a deep neural network that has just been randomly initialized are effectively random projections. When an error signal is back-propagated, it ends up (via summation) as Gaussian noise, by the central limit theorem. Presumably that Gaussian noise is driving the search (there is a small demo after this paragraph).
The question is whether there is some implicit evolutionary algorithm or a dissipative algorithm operating. I would guess it is a dissipative algorithm, with the weight parameters relaxing into a local minimum. There are suggestions in the literature that such nets have a multitude of local optima that are still very good.
Anyway, the dropout idea should help with things like the Hopfield network as well, by helping carve out a response region for things to fall into. What use is a singularity if it doesn’t have a gravitational field to drag in nearby things?
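Here is the small demo mentioned above (a toy of my own construction): feed deliberately non-Gaussian "errors" back through a randomly initialized weight matrix and check that the summed result looks roughly Gaussian, as the central limit theorem suggests.

```python
import numpy as np

# Toy demo of the central-limit point: the error reaching a lower-layer unit
# is a sum of many weighted terms, delta_lower = W.T @ delta_upper. With a
# randomly initialized W, that sum is approximately Gaussian even when the
# upper-layer errors themselves are heavy-tailed and skewed.
rng = np.random.default_rng(3)
upper, samples = 1024, 5000

delta_upper = rng.exponential(size=(samples, upper)) - 1.0   # very non-Gaussian
W = rng.normal(scale=1 / np.sqrt(upper), size=(upper, 1))
delta_lower = delta_upper @ W        # one back-propagated error per sample

z = (delta_lower - delta_lower.mean()) / delta_lower.std()
# Skew ~0 and excess kurtosis ~0 indicate a roughly Gaussian shape.
print("skew:", float(np.mean(z ** 3)))
print("excess kurtosis:", float(np.mean(z ** 4) - 3))
```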

So the idea of how current deep neural nets learn is that of a Jostle net. The weights get knocked around by Gaussian noise, arising from the summation (an approximate random projection) of the back-propagated errors. When, by happenstance, the weights end up in a good configuration, they are less likely to be knocked around further. Thus the system gradually sheds disorder and settles down.
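Purely as an illustration of that jostle picture (a made-up toy of my own, not a claim about what back-propagation actually does): here the weights receive random knocks, knocks that worsen the fit are undone, so good configurations resist further disturbance and the system settles.

```python
import numpy as np

# Made-up "jostle" toy: weights get random knocks, and knocks that worsen
# the fit are undone, so the system gradually sheds disorder and settles
# near a good configuration.
rng = np.random.default_rng(4)
target = rng.normal(size=16)
w = rng.normal(size=16)

def error(w):
    return float(np.sum((w - target) ** 2))

for step in range(1, 20001):
    knock = rng.normal(scale=0.05, size=16)
    if error(w + knock) < error(w):      # good configurations resist being knocked out
        w += knock
    if step % 5000 == 0:
        print(step, round(error(w), 4))  # error shrinks as the jostling settles
```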

You’re talking about using random projections for encoding data into SDR format, yes?

You can use random projections to make sparse data, for sure (there is a small sketch below).
I dug out a non-assembly-language version I wrote from my files: https://drive.google.com/open?id=0BwsgMLjV0Bnhc0ZnN2ZqS3Rua2s
I may still have similar Java code somewhere.
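And here is the small sketch mentioned above of one way to go from a random projection to something SDR-like; the top-k thresholding step and the sizes are my own choices, not a Numenta encoder.

```python
import numpy as np

# Sketch: dense input -> random projection -> keep the top-k components as
# the "on" bits of a sparse binary vector (SDR-like). The top-k step and
# the dimensions are arbitrary choices for illustration.
rng = np.random.default_rng(5)
in_dim, out_dim, k = 64, 2048, 40        # ~2% sparsity

R = rng.normal(size=(out_dim, in_dim)) / np.sqrt(in_dim)

def to_sparse(x):
    y = R @ x
    sdr = np.zeros(out_dim, dtype=np.uint8)
    sdr[np.argsort(y)[-k:]] = 1          # indices of the k largest projections
    return sdr

a = rng.normal(size=in_dim)
b = a + 0.05 * rng.normal(size=in_dim)   # a slightly perturbed copy of a
c = rng.normal(size=in_dim)              # an unrelated input
print(int(to_sparse(a) @ to_sparse(b)))  # high overlap for similar inputs
print(int(to_sparse(a) @ to_sparse(c)))  # near-chance overlap (~ k*k/out_dim)
```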