Introducing: GPT-4-powered, scientific summaries - Consensus - Evidence-Based

2 Likes

Five years down the road, at the current pace of advance, we will be living in a different world.
Two Minute Papers has this video on GPT-4:
https://youtu.be/7VSWyghVZIg

When they actually realise and fix all the foundational errors in the understanding of artificial neural networks (information storage in weighted sums, ReLU as a literal switch, etc.), that should lead to further improvements and a reduction in model size.
For the moment, however, they are simply locked into those errors.

1 Like

The next step after not showing what's up your sleeve is to claim your trick is not a mere trick but an actual wonder: switching your business from research to priesthood.

2 Likes

What "errors" are you talking about?

1 Like

If you can get past some very heavy preconditioning and view ReLU as a literal switch rather than as a function, that opens the door to many improvements.
You can see how to directly incorporate computationally cheap dot-product algorithms like the FFT and the WHT, you can change ReLU into a 2-way switch, and so on.
In terms of understanding the weighted sum better, you can see how to reduce the effect of adversarial inputs by dealing with the case where the input vector correlates too closely with the weight vector, and you can understand why and how that is a problem. You can also see how a neuron and its forward-connected weights in the next layer project a pattern onto that layer, like Plato's shadows on a cave wall.
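To make the switch viewpoint concrete, here is a bare NumPy sketch (my own illustration, nothing more): relu(x) = max(x, 0) is numerically identical to a switch whose on/off state is decided by the sign of the weighted sum, and the weighted sum is largest exactly when the input lines up with the weight vector, which is where the adversarial problem comes from.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_as_function(x):
    return np.maximum(x, 0.0)

def relu_as_switch(x):
    # the switch state is decided by sign(x); when "on" the value passes through unchanged
    return np.where(x > 0.0, x, 0.0)

v = rng.normal(size=8)    # input vector
w = rng.normal(size=8)    # the neuron's weight vector
x = np.dot(w, v)          # the weighted sum feeding the ReLU

assert np.allclose(relu_as_function(x), relu_as_switch(x))

# The adversarial point: an input aligned with w gives the largest possible
# weighted sum for its length, so correlating the input too closely with the
# weight vector drives the neuron to an extreme response.
aligned = w / np.linalg.norm(w) * np.linalg.norm(v)
print(np.dot(w, v), np.dot(w, aligned))
```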
All this has been explained before on this forum.
However, the preconditioning is so intense that it is nearly impossible for people to get past the purely functional view of ReLU, for example.
It also seems that in human science facts don't speak for themselves; the social standing of the person presenting the information counts very highly.

1 Like

If you replace ReLU with a switch you say goodbye to backpropagation, and the follow-up question is then: what do you replace it with in order to train a 1B-parameter model?

2 Likes

What? That's not how it works, my friend.

Then you lose differentiability and can't backprop through it. In practice you usually use softer versions of the activation function, like GELU, SwiGLU and Softplus, for well-behaved gradients.
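For reference, a quick sketch of those smoother activations using their standard textbook formulas (SwiGLU is a gated variant built on SiLU, so it's left out to keep this short):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)     # kink at 0: the gradient jumps from 0 to 1

def softplus(x):
    return np.log1p(np.exp(x))    # smooth everywhere; its gradient is sigmoid(x)

def gelu(x):
    # the widely used tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

xs = np.linspace(-3, 3, 7)
print(np.round(relu(xs), 3))
print(np.round(softplus(xs), 3))
print(np.round(gelu(xs), 3))
```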

lmao :rofl: I never expected to see Plato's allegory of the cave used to describe NNs like that.

Adversarial attacks exist, and will continue to exist, because fundamentally they compromise a brittle system. Larger models are less susceptible to adversarial attacks, but with access to the full weights you can compromise any system.

The brain isn't resistant to this either. Dynamical systems in general depend on certain assumptions being met. You can't theoretically prove an equilibrium point at which those attacks won't work for any network, because that's impossible.

There isn’t any hierarchy here. Nobody even knows a scrap of information about me.

The only thing valued here is facts, which seem to be lacking in your response.

1 Like

Lol. Anyway, it's not my job to convert anyone to any perspective or to make anyone give up their dogma.
A ReLU neuron projects a pattern onto the following layer through the weights it connects to in that layer, the forward-connected weights if you like. That pattern has intensity x when x > 0 (x being the input to the ReLU). By 2-way switching I mean that a different pattern is projected when x <= 0, through an alternative set of forward-connected weights. With plain ReLU, nothing is projected when x <= 0.
Backpropagation still works as far as I have tested it, though I mainly evolve weights. One Google engineer said they had tried something like that, but maybe that is not true, since the paper I was pointed to didn't seem to contain the same concept, or if it did, they didn't try very hard to get it to work.
One key point about 2-way switching is that information can flow freely through the net under nearly all circumstances, rather than simply being cut off by ReLU whenever x <= 0.
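In code, the forward pass I'm describing looks roughly like this, as a bare NumPy sketch (with the assumption spelled out that the negative-side pattern is also projected with intensity x):

```python
import numpy as np

rng = np.random.default_rng(1)
width = 4

W_in  = rng.normal(size=(width, width))   # produces the pre-activations x
W_pos = rng.normal(size=(width, width))   # forward weights used when x_i > 0
W_neg = rng.normal(size=(width, width))   # alternative forward weights when x_i <= 0

def two_way_layer(v):
    x = W_in @ v                 # the weighted sums feeding the switches
    gate = x > 0.0
    # each x_i projects its pattern through one of the two forward weight sets
    return (x * gate) @ W_pos + (x * ~gate) @ W_neg

def relu_layer(v):
    x = W_in @ v
    return np.maximum(x, 0.0) @ W_pos   # plain ReLU: nothing projected when x_i <= 0

v = rng.normal(size=width)
print(two_way_layer(v))
print(relu_layer(v))
```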

1 Like

This can be accomplished with two ReLU nodes with opposite biases (one negative, the other positive).

More parameters, yes, but it avoids conditional processing, which is very hostile to the GPU workflow.

That doesn't sound quite right. I think you mean having two ReLU "functions", one with input x and one with input -x, where x is the value of one of the weighted sums in the net.

Each of those ReLUs then forward-connects to n weights in the next layer, where n is the width of the net.
Since there are two ReLUs for each weighted sum in the net, the total number of weights doubles, except perhaps in the final layer, depending on what you do.
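If I've read it right, that mirrored-ReLU pair reproduces the 2-way switch exactly once the ReLU(-x) unit gets its own forward weights. A small numerical check, under that assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
width = 4

x     = rng.normal(size=width)            # the pre-activations (weighted sums)
W_pos = rng.normal(size=(width, width))   # forward weights for the x > 0 side
W_neg = rng.normal(size=(width, width))   # forward weights for the x <= 0 side

def relu(z):
    return np.maximum(z, 0.0)

two_way  = (x * (x > 0)) @ W_pos + (x * (x <= 0)) @ W_neg
mirrored = relu(x) @ W_pos + relu(-x) @ (-W_neg)   # the second ReLU sees -x

assert np.allclose(two_way, mirrored)
print(two_way)
```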

1 Like

Obviously backprop would still "work", except the gradients you calculate would be so sharp and noisy (on top of the dying-neuron problem) that the network's overall performance would suffer heavily at scale.

1 Like

I don't see that at all, but then I'm no expert in backprop. When I evolve such nets I get perfect behaviour, with very smooth progress to a very low loss. In contrast, with ReLU the loss landscape is much rougher and progress only reaches a moderately low loss.
That is what you would expect from information-loss considerations, since a single ReLU blocks its input entirely about half the time.
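A rough back-of-the-envelope check of that blocking claim (just a sanity check, assuming roughly zero-mean, symmetric pre-activations):

```python
import numpy as np

# stand-in pre-activations: zero-mean and symmetric
x = np.random.default_rng(3).normal(size=100_000)

# fraction of units that plain ReLU zeros out, i.e. that project nothing forward
print((x <= 0).mean())   # ~0.5; a 2-way switch always projects something
```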

1 Like

Well, yeah, kind of. Instead of inputs w and -w they can have opposite biases.

The number of parameters theoretically doubles, but in practice the model might not need to learn (or evolve) a negative (inhibiting) node for every possible positive node.

1 Like

That would be interesting to see, if you can share it. Obviously the elephant in the room with evolving networks (or anything beyond the GPU's reach) is how big your elephant can get :disguised_face:

1 Like

Opposite biases would allow gaps (both ReLUs inactive) or overlaps (both ReLUs active), which would make the loss (cost) landscape rather rough again.
Switching exactly at zero is quite important for good results.
I'll provide some very simple code in a day or two. There is other code, but it involves dot-product shortcuts such as the FFT or WHT.
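Meanwhile, a tiny numerical illustration of the gap/overlap point (assuming the opposite-bias pair means ReLU(x + b) and ReLU(x - b) on the same weighted sum x, as opposed to the ReLU(x) / ReLU(-x) pair, which switches exactly at zero):

```python
def opposite_bias_pair(x, b):
    a = max(x + b, 0.0) > 0.0   # unit with bias +b
    c = max(x - b, 0.0) > 0.0   # unit with bias -b
    return a, c

def mirrored_pair(x):
    a = max(x, 0.0) > 0.0       # ReLU(x)
    c = max(-x, 0.0) > 0.0      # ReLU(-x): switches exactly at zero
    return a, c

for x in (-1.0, -0.25, 0.0, 0.25, 1.0):
    print(f"x={x:+.2f}  opposite biases: {opposite_bias_pair(x, 0.5)}  mirrored: {mirrored_pair(x)}")

# With biases +/-0.5, neither unit fires for x < -0.5 (a gap) and both fire
# for x > 0.5 (an overlap); the mirrored pair is complementary everywhere
# except exactly at zero.
```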

1 Like

OK, here's your code. Fortunately it wasn't too time-consuming.
Since it is online, I will fill in some comments explaining it during the day, as is convenient. You can edit the code, since you get a local copy.
https://editor.p5js.org/congchuatocmaydangyeu7/sketches/iZwTnULQ2

3 Likes

I did a blog post.
https://ai462qqq.blogspot.com/2023/03/2-siding-relu-via-forward-projections.html

1 Like