OMG AI Winter


#1

Come now, you have to know that the AI winter is coming whether you like it or not.
Here is Gartner’s best guess on when:
https://www.forbes.com/sites/louiscolumbus/2017/08/15/gartners-hype-cycle-for-emerging-technologies-2017-adds-5g-and-deep-learning-for-first-time/#57ac6fdc5043


Does anybody else find "Sophia" concerning?
#2

Evolution in digital systems may be thousands or millions of times more efficient than in biological systems. It is certainly more efficient than point mutation in biological systems, which is mostly deleterious, and it is probably much better than crossover, which is a weak optimizer but allows nonlethal mixing of traits. The brain, in my view, is a deep neural network that has been “trained” over time by evolutionary processes, connected to a vast, repetitious memory system: the deep neural network being the logical CPU of the brain, and the neocortex the memory.
You can evolve, in a few weeks on a GPU cluster, something equivalent to the human visual system that took millions to billions of years to evolve in biological systems, depending on whether you include or exclude the time taken to evolve the extremely complex internal chemical processes of the cell, which were a far greater computational burden to create than intelligence.
In other words, you are simply going to see greater-than-human intelligence in less than 5 years.
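To make the mutate-and-select loop concrete, here is a toy sketch in Python. The bit-string genome, population size, and "OneMax" fitness function are all illustrative assumptions on my part, not a model of anything biological:

```python
import random

def evolve(fitness, genome_len=20, pop_size=50, generations=200, p_mut=0.05):
    """Toy genetic algorithm: bit-string genomes, point mutation,
    truncation selection (keep the best half each generation)."""
    pop = [[random.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)      # rank by fitness
        survivors = pop[: pop_size // 2]         # truncation selection
        # refill the population with point-mutated copies of the survivors
        children = [[1 - g if random.random() < p_mut else g for g in parent]
                    for parent in survivors]
        pop = survivors + children
    return max(pop, key=fitness)

# Toy objective ("OneMax"): maximize the number of 1-bits in the genome.
random.seed(0)
best = evolve(fitness=sum)
print(sum(best))
```

Even this crude loop reliably climbs to (or very near) the maximum fitness, which is the point: selection plus cheap digital mutation extracts structure fast.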


#3

I’m going to be a bit technical here, because claiming evolutionary computing can create intelligence is an absolute oversimplification. Evolution works in mysterious ways. The biggest problem faced by evolutionary algorithms is the problem of objectives: once intelligence gets involved, evolution is skewed. In fact, the famous Hinton has published a paper on this (https://pdfs.semanticscholar.org/5d6c/84e7cd46d0a520ad6784a0f7f6825ef83685.pdf).

So, in a sense, in order to evolve a brain in a computer system, you can’t simply write a program and use an objective (in this case, survival). In fact, Kenneth Stanley has a fantastic talk on this subject (https://www.youtube.com/watch?v=dXQPL9GooyI). He is the inventor of NEAT, HyperNEAT, and Picbreeder (based on CPPNs), and he knows that objective is a fleeting concept.

Also, one has to know that even though evolution has broad concepts of modularity (neocortical columns) and regularity (a lot of them), using a generative encoding (read Jeff Clune’s papers) to create one is not going to be an easy task, because evolution doesn’t move in a straight line.

For example, take C. elegans, a simple organism from very early in evolution. These organisms don’t have synapses (they mostly have gap junctions). They only have around 300 neurons, yet it’s going to be nearly impossible to create a generative encoding of their genome, given the extraordinary complexity of their neuronal systems (they have about 100 morphologically different neuronal types). In other words, roughly one out of every three neurons is morphologically different. The reason is that the genome doesn’t simply code for a neuron; it codes for components of neurons. Those components are thousands of different proteins that interact in thousands of different ways. Just the overexpression of one protein (say, a membrane pump) can change the membrane potential of a neuron; a microtubule overexpression may make a neuron elongated, and so on and so forth. The complexity of biology, plus the lack of an objective model (a loss function, if you are coming from deep learning), makes it hard to even attempt evolving such computationally expensive brains (one also needs to evolve a body with it).
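The direct-versus-generative distinction can be shown with a toy example. The little function-based generator below is my own illustrative assumption, not a real CPPN, but it captures the idea that a handful of "genes" pattern a much larger structure, the way a genome codes for components rather than for each neuron:

```python
import math

# Direct encoding: every connection weight is its own gene (64 genes for 8x8).
direct_genome = [[0.1 * i * j for j in range(8)] for i in range(8)]

# Generative encoding: a tiny "genome" of 3 parameters produces the whole
# 8x8 weight matrix through a reusable rule, giving regularity for free.
def generative_genome(params, n=8):
    a, b, c = params
    return [[math.sin(a * i) * math.cos(b * j) + c for j in range(n)]
            for i in range(n)]

weights = generative_genome((0.5, 0.7, 0.1))
print(len(weights) * len(weights[0]), "weights from", 3, "genes")
```

The catch, as the paragraph above says, is that finding a generative rule that unfolds into something as irregular as 100 distinct neuron types is a far harder search problem than this sketch suggests.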

Even if we keep getting a doubling of computing power, we may still need to wait 50 or more years to get to a place where we can get any meaningful data from evolutionary biology. Another problem is that any evolutionary algorithm may require a near-perfect simulated world to grow in (that may be solved sooner than the computational limitations, but we’re still far from that day).

I think one should not only take a bottom-up approach toward intelligence (like Henry Markram, who, by the way, I love for his fearlessness toward brain science) but must also understand how the brain works from the top down, as Marvin Minsky repeatedly claimed. For example, I’m a fan of Lakoff’s idea of embodiment.

Watch this talk: https://www.youtube.com/watch?v=WuUnMCq-ARQ

If we want to solve AI in our lifetime, and benefit from the tremendous potential that comes from it, we have to take an abstract approach and not a detailed approach (well, that’s my opinion; it may be wrong). DeepMind’s RL may not be very scalable (since it’s extremely computationally expensive), but it’s one of the best we have. Also, if perfected, Numenta’s approach will be a less computationally expensive approach to AI, and even though it may be perfected later than the more famous approaches, it may take off faster than most of them. For example, I believe that if perfected, HTM-style algorithms (e.g. www.sparsey.com) will also require many orders of magnitude less training data.


#4

I typed it very fast, so lots of typos. But I don’t have the patience to go back and fix them :slight_smile:


#5

Let me try to throw some numbers here:

Imagine E. coli, one of the best-known simple organisms out there. It has a genome size of 5×(10^6) base pairs, or around 5 MB if you save it on your disk. To clone a gene, scientists usually grow these organisms in a flask overnight to produce a large number of copies of the gene they want (it’s easier to use E. coli to make a gene than to synthesize it). The culture grows overnight to a density of around 3×(10^9) cells per mL (just multiply this number by the genome size to see how much hard drive you would need just to store the genomes from one overnight E. coli flask). They usually use 1.5 L flasks, and they usually start with just one E. coli. Now, if only one of these organisms has a mutation, it can completely outgrow the other ones. The parallelism of evolution is not a joke.
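Taking the figures above at face value, a quick back-of-the-envelope in Python:

```python
import math

genome_bytes = 5 * 10**6      # ~5 MB per E. coli genome (as stated above)
cells_per_ml = 3 * 10**9      # overnight culture density
flask_ml = 1500               # a 1.5 L flask

total_cells = cells_per_ml * flask_ml          # cells in the flask
total_bytes = total_cells * genome_bytes       # storage for every genome
doublings = math.log2(total_cells)             # starting from a single cell

print(total_cells)             # 4.5 trillion cells
print(total_bytes / 10**18)    # ~22.5 exabytes just to store the genomes
print(round(doublings))        # ~42 doublings in a single night
```

So one overnight flask already represents a population (and a mutation lottery) far beyond anything we can store, let alone simulate, on current hardware.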

Now, let’s talk about the brain.

Many times you hear that the brain has about 100 billion neurons, and that these neurons, depending on where they are, can have up to 30 thousand connections to other neurons. That is an astronomically big number.

But depending on the neuronal code, this number may get much bigger. What if the neuronal code is based on the rate (Hz) of spikes?

After all, colors, pain, and many other aspects of the brain have been shown to be tuned by the rate of spikes. Now if (I hope not) rate coding turns out to be important in the neuronal code and cognition, then, given that a neuron can fire fast or slow, and maybe in hundreds of different ways, you need to add many orders of magnitude to the number you usually see mentioned (hundreds of trillions of connections).
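In rough numbers (the 100 distinguishable firing-rate levels below are a pure assumption for illustration, not a measured value):

```python
import math

neurons = 10**11                  # ~100 billion neurons
max_connections = 3 * 10**4       # up to 30,000 connections each
synapses_upper_bound = neurons * max_connections
print(synapses_upper_bound)       # 3e15 connections as an upper bound

# Hypothetical rate code: if each neuron's firing rate carried, say, 100
# distinguishable levels, the number of joint rate states would be
# 100**neurons. That number is too large to print, so count its digits.
rate_levels = 100
digits = int(neurons * math.log10(rate_levels))
print(digits)                     # a number with 200 billion decimal digits
```

Even the connection count alone dwarfs the parameter counts of today's largest artificial networks, before any rate-code multiplier is applied.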

Man, that is a big number. The power of the brain is hugely underestimated by a lot of people. Just imagine how much data is coming into your brain every millisecond, just from your retina.

Also, you have to know that connections at this scale may create loops. Loops and feedbacks are extremely computationally expensive. Imagine a simple piece of code that takes data from node A (a hundred neurons) to node B (50 neurons), back to node A, and then again to node B, and so on. This is similar to RNNs. But this aspect of the brain has also been completely absent from calculations of its computational power. These loops are like for-loops in programming.
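A toy sketch of such a loop (the random weights and the simple threshold rule are my own assumptions; the point is only that every trip around the loop costs another full pass, like an unrolled RNN time step):

```python
import random

def step(src, weights):
    """Each target unit sums its weighted inputs and applies a threshold."""
    return [1 if sum(w * s for w, s in zip(row, src)) > 0 else 0
            for row in weights]

random.seed(0)
A = [random.randint(0, 1) for _ in range(100)]   # node A: 100 neurons
W_ab = [[random.uniform(-1, 1) for _ in range(100)] for _ in range(50)]
W_ba = [[random.uniform(-1, 1) for _ in range(50)] for _ in range(100)]

# A -> B -> A -> B ... : the cost grows linearly with the number of trips,
# exactly like a for-loop, and unlike a single feed-forward pass.
for t in range(10):
    B = step(A, W_ab)        # node B: 50 neurons
    A = step(B, W_ba)
print(len(A), len(B))
```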

Now, if you think this is unlikely: cliques (structures from algebraic topology) have been shown to exist in the cortical columns of rats (https://www.frontiersin.org/articles/10.3389/fncom.2017.00048/full). You have to know that these loops can be combined with the incoming stream, which can add yet more orders of magnitude to the complexity of the brain.


#6

If one bacterium in a flask of trillions outgrows all the others and takes over, that may be the result of only 1, 2, or 3 simultaneous mutations. And maybe it is another hour or 2 before any further positive mutations take hold. Very slow, considering the number of parallel systems.
With digital evolution you generally aim for about 20% successful mutations.
Also, when you gain some experience using evolutionary algorithms, you learn that they are very successful at extracting usable information out of the environment, even though each trial can pull out at most one bit of information (good/bad). Over time the extracted information really builds up. The key to success is formulating the problem in such a way that it can be solved in tiny steps. In higher dimensions with real-valued parameters that is often what happens, as there is nearly always some direction in which the cost can be reduced. It has been shown in some real-valued systems that all local minima are global minima; things that look like high-cost local minima are actually saddle points, at which, if you spend enough time looking around, you can find a cost-lowering direction to move in.
Biological systems face more discrete-valued parameters: an atom is or isn’t there, and there are only a finite number of alleles of a particular gene. The cost landscape becomes much choppier to navigate. Which is often fortunate, because otherwise we would all be dead from Ebola. Nevertheless, biological systems exist in a high enough parameter space that progress is nearly always possible given enough time.
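The 20% figure is close to the classical 1/5 success rule from evolution strategies. Here is a minimal (1+1)-ES sketch on a smooth toy cost (the sphere function; the adaptation window and step factors are my own choices, not canonical values):

```python
import random

def one_plus_one_es(cost, dim=50, iters=3000, sigma=1.0):
    """(1+1) evolution strategy with a 1/5 success rule: grow the mutation
    step when more than ~20% of recent trials succeed, shrink it otherwise."""
    x = [random.uniform(-5, 5) for _ in range(dim)]
    fx = cost(x)
    successes = 0
    for t in range(1, iters + 1):
        y = [xi + random.gauss(0, sigma) for xi in x]
        fy = cost(y)
        if fy < fx:                    # one bit per trial: better or not
            x, fx = y, fy
            successes += 1
        if t % 50 == 0:                # adapt the step size every 50 trials
            sigma *= 1.5 if successes / 50 > 0.2 else 0.6
            successes = 0
    return fx

random.seed(1)
final = one_plus_one_es(lambda v: sum(vi * vi for vi in v))
print(final)   # far below the random-start cost
```

Targeting a roughly 20% success rate keeps the step size matched to the local landscape, which is exactly the "tiny steps" regime described above.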


#7

“If one bacterium in a flask of trillions outgrows all the others and takes over, that may be the result of only 1, 2, or 3 simultaneous mutations. And maybe it is another hour or 2 before any further positive mutations take hold. Very slow, considering the number of parallel systems.”

Seriously? Do you think mutation acts like changing a number in a vector or matrix? If that’s what you think, you’re thinking in the domain of simple evolutionary algorithms, such as hill-climbing optimizers, which are in the realm of direct encoding.

I assume you’re very new to this. If you’re interested in broadening your view on the difficulty of this subject, you may want to read this book: http://baibook.epfl.ch

“Also, when you gain some experience using evolutionary algorithms, you learn that they are very successful at extracting usable information out of the environment, even though each trial can pull out at most one bit of information (good/bad). Over time the extracted information really builds up. The key to success is formulating the problem in such a way that it can be solved in tiny steps. In higher dimensions with real-valued parameters that is often what happens, as there is nearly always some direction in which the cost can be reduced. It has been shown in some real-valued systems that all local minima are global minima; things that look like high-cost local minima are actually saddle points, at which, if you spend enough time looking around, you can find a cost-lowering direction to move in.”

Global minima, local minima, saddle points, momentum… you’re thinking about function optimization. Saddle points may be good news to the gradient-descent lovers out there, and ease their fear of never reaching a global minimum, but they are mostly irrelevant to evolutionary algorithms, because, as far as I know, you’re going to be stuck with a fleeting objective.

You don’t want to turn an evolutionary algorithm’s search into pure optimization. IF YOU WANT AN OPTIMIZER, go with fully developed RL and temporal-difference algorithms (they do a much better job at this).

But hey, if you think you can contribute, by all means I wish you good luck. As the saying goes, never say never. I’m personally interested in evolutionary algorithms, and I know they can be very powerful, but I’m also aware of their limitations.


#8

@walkertalker I also like evolutionary algorithms very much, but we have to define a goal function in order to use them. It may be the reward of an RL system. For the scalar encoder, we successfully used mutation to create a randomly distributed scalar encoding. But I am thinking more about the suitability of artificial immunology, because the density and shape of the cells (here, T-cells) can be used for detecting new input patterns or anomalies.
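That T-cell idea can be sketched with a toy negative-selection algorithm from the artificial-immune-systems literature (the pattern length, detector count, and Hamming radius below are all illustrative assumptions):

```python
import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def train_detectors(self_patterns, n_detectors=200, length=16, radius=3):
    """Negative selection: keep only random detectors that do NOT match
    any normal ("self") pattern within the given Hamming radius."""
    detectors = []
    while len(detectors) < n_detectors:
        d = [random.randint(0, 1) for _ in range(length)]
        if all(hamming(d, s) > radius for s in self_patterns):
            detectors.append(d)   # survives, like a T-cell leaving the thymus
    return detectors

def is_anomaly(pattern, detectors, radius=3):
    # a pattern close to any surviving detector is flagged as non-self
    return any(hamming(pattern, d) <= radius for d in detectors)

random.seed(0)
normal = [[0] * 16, [0] * 8 + [1] * 8]       # the "self" set
detectors = train_detectors(normal)
print(is_anomaly(normal[0], detectors))      # False: self is never flagged
```

By construction, self patterns are never flagged; novel patterns are flagged with a probability that grows with the number of detectors.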


#9

The reward for RL can be mere survival. I mean, AlphaZero starts with only a rule book.
OpenAI has done some work on using evolution for RL too.

There are no local minima in sufficiently large deep neural networks, because even if you are blocked from descending the energy landscape in 99.9% of the basis directions, say at a saddle point, there are still 0.1% of directions in which you can proceed. And once you move, you are blocked in other, different ways, and free to move in a small number of others.
Surya Ganguli is the main person with insight into the matter.


Basically, problems require less sophisticated optimization algorithms as you increase the number of dimensions, but they take a lot of time, obviously. Also, the finer level of quantization possible in digital systems compared to biological ones makes them easier to optimize. There is probably some relation between quantization and the number of dimensions, where at some critical point there are no more trapping high-energy local minima in the system, similar to percolation theory.
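A crude toy model of why trapping minima vanish with dimension (assuming each direction at a random critical point independently curves up or down with equal probability, which is a big simplification of the random-matrix picture):

```python
import random

def fraction_trapped(dim, trials=10000):
    """Pretend each of `dim` directions at a random critical point
    independently curves up or down with equal probability; the point
    traps you only if EVERY direction curves up (a true local minimum)."""
    trapped = 0
    for _ in range(trials):
        if all(random.random() < 0.5 for _ in range(dim)):
            trapped += 1
    return trapped / trials

random.seed(0)
for dim in (1, 2, 10, 100):
    print(dim, fraction_trapped(dim))   # ~0.5, ~0.25, ~0.001, then 0
```

Under this assumption the chance of a true minimum falls as 2^-dim, so in high dimensions almost every blocking point is a saddle with some escape direction.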

It could also explain how complex large animals can exist despite their viral and bacterial foes having a shorter generation time and other advantages. How do you explain the existence of higher animals if they do not gain some advantage through having a larger biological state space to explore through evolution, and in some sense out-evolve the microbes?


#10

This suggests that our “evolutionary” algorithms are somewhat lacking.


#11

This algorithm is open ended in lower dimensions (say 1 to about 100):
https://forum.processing.org/two/discussion/23422/a-simple-way-to-optimize

If you have 2 or more individuals using it and interacting in some environment they can continue adapting to each other forever.

In higher dimensions there are some extra things you can do. The algorithm works very well on most problems, but I recently found a Fourier-transform problem where it didn’t.
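I haven’t reproduced the linked algorithm here, but a minimal sketch in the same spirit, with two individuals adapting to each other via heavy-tailed mutations (all constants are my own choices):

```python
import math
import random

def mutate(x, scale=0.5):
    # heavy-tailed (Cauchy) mutation: mostly small steps, occasional big jumps
    return [xi + scale * math.tan(math.pi * (random.random() - 0.5)) for xi in x]

def dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

random.seed(0)
chaser, runner = [0.0, 0.0], [1.0, 1.0]   # two individuals in a 2-D world
for _ in range(1000):
    c = mutate(chaser)
    if dist(c, runner) < dist(chaser, runner):   # chaser adapts toward runner
        chaser = c
    r = mutate(runner)
    if dist(chaser, r) > dist(chaser, runner):   # runner adapts away
        runner = r
print(dist(chaser, runner))
```

Because each side’s objective is defined by the other side’s current state, there is no fixed optimum to converge to, so the adaptation is open ended in the sense described above.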