Why do members here think DL-based methods can't achieve AGI?

I suppose we have different ballparks of what a complex conversation implies. IMHO, models like LaMDA are usually decent enough to hold what I consider pretty complicated conversations. For some reason, everyone seems to hold GPT-3 to be king of the hill, ignoring the other models that exist :thinking:

Says who? I am not particularly holding a position here, but I would need actual citations for some of the energized arguments you’ve made. The aim is not to convince but to understand things through an unbiased lens.

I don’t really get the attitude of most people here. Is this some sort of war between the DL community and Numenta, the latter being the protagonist, the underdog who has persevered through difficulties and will free people from the dark forces of DL that have usurped the minds of scientists? :joy:

This is not directed particularly at @MaxLee but is a general assessment of the responses I read here. There seems to be a certain antagonism towards other methods (which, I repeat, is not held by everyone - some frequent posters especially stand out in their openness to new ideas).

There isn’t any major disagreement about the achievements of Numenta, merely that they have yet to top any leaderboards or post standout scores. Some papers have shown pretty decent results against baselines, which I am very happy to see, but the point that drags it down is that Numenta was founded about 20 years ago - and the rate at which DL has produced results is unmatched.

There is a certain hope that Numenta’s work was always a long-term investment, but really I suppose the GOFAI vibes ensure that results take priority over almost everything. It is perhaps not the most effective way to reach AGI, but it is a tried-and-tested methodology that has empowered other scientific fields for decades now.

1 Like

I dunno… if Geoffrey Hinton, a guy who’s lived in this field most of his life, thinks that backprop is flawed, I’ll weigh his opinion on the matter highly. As for why others who’ve seen it don’t get it yet, well… most humans love their niche. Our brains, biologically, want to use the least amount of calories to accomplish their goals (staying alive, making/keeping relationships, earning money, entertainment, etc.). They may be brilliant experts in their fields, and there’s nothing wrong with that. But we also have oddballs who aren’t satisfied and want to know everything about everything, even those seemingly impractical things that don’t serve any immediate purpose. Lucky for us, those folks exist and keep concatenating knowledge and experience to synthesize new ideas in art, mathematics, science, technology, etc., and keep us moving forward in our understanding of the universe. I’d politely say Hawkins is one of those oddballs. :slight_smile:

Without them saying specifically how much of their process required hyperparameter optimization, how many restarts (false starts, failed convergences, random experimentation), and other general activities that take place when trying to train a DL model, especially a massive transformer model, I feel fairly confident that these estimates are a lower bound on the energy requirements. No way they simply had a straight shot and trained the model without issues.

https://www.reddit.com/r/MachineLearning/comments/htxjoq/d_gpt3_175b_energy_usage_estimate/
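For context, here’s a rough back-of-the-envelope sketch of how a headline training-run figure can be estimated. Every constant below (tokens, utilization, power draw, datacenter overhead) is an illustrative assumption rather than a reported number, and it only covers a single clean run - which is exactly why any published figure reads as a lower bound once sweeps and restarts are added on top.

```python
# Back-of-envelope estimate for a single, clean GPT-3-scale training run.
# Hyperparameter sweeps, restarts, and failed runs are NOT included,
# which is why any figure like this is a lower bound.
# All constants below are illustrative assumptions.

params = 175e9                 # model parameters
tokens = 300e9                 # training tokens (order of magnitude)
flops = 6 * params * tokens    # ~6*N*D rule of thumb for transformer training

gpu_flops = 125e12 * 0.3       # V100 tensor-core peak, assumed ~30% utilization
gpu_hours = flops / gpu_flops / 3600

watts_per_gpu = 300            # assumed average draw per GPU
pue = 1.1                      # assumed datacenter overhead
energy_mwh = gpu_hours * watts_per_gpu * pue / 1e6

print(f"~{gpu_hours:,.0f} GPU-hours, ~{energy_mwh:,.0f} MWh for one clean run")
```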

I wouldn’t describe it as “antagonism” per se, but more a clear-eyed acceptance or understanding of what these approaches can or can’t do, with a mild disgust towards the over-hyping of DL by profiteering self-promoters whose only goal frequently seems to be fast unicorn funding raises before skipping out and leaving others holding the bag. I have a love for Bayesian, Markov, spiking, tree-based, boosting, genetic, symbolic, etc… and the whole family of algorithms under the umbrella of AI. But this topic is specifically asking why it’s believed that DL won’t result in AGI, and I’ve put forth my thoughts, quite loudly. :slight_smile:

Speaking of DL and Hinton… they both just languished for ~20-30 years before data and hardware caught up. Seeing what HTM has been able to do with FPGAs (limited in gates and overall clock speed) already quite impresses me, as have their early experiments in applying sparsity to deep learning. Even if Numenta were to evaporate today, those seeds aren’t lost with them. I’ve a suspicion we’re about to see them bloom regardless, and taking inspiration and modelling from the brain was the source of it all.

1 Like

Here’s something that I think highlights a key tension in this discussion:

When I mention to folks in the DL world what Numenta is working on, the overwhelming response is “But does it work? Show me the results!” The field is deeply pragmatic, and willing to go along with approaches even if they are not elegant or efficient, simply because they actually accomplish the task.

In order to take neuro-AI work as anything more than idle tinkering, then, a group like Numenta needs to show that the methods they develop are on a path towards performing comparably at the tasks that DL methods have made progress on (such as text generation). The experiments with sparse NNs on FPGAs are a small step in that direction, although I would caution that comparisons between methods that rely on hardware-specific optimizations (ones that simply don’t work on the actual hardware we have, i.e. GPUs) and ones that are generally applicable are looked on with great skepticism.

That being said, I look forward to the day that non-backprop NN approaches start working as well as backprop NNs do today. That will be a grand occasion.

1 Like

That still doesn’t guarantee that future methods would have similar energy requirements. I agree with the points raised in the Reddit thread - ultimately, I believe any expense would have been worth it.

I hope so too :slight_smile:

I suppose they sharded considerably for hyperparameter optimization and test runs anyway, so the usage wouldn’t be on par with the final 175B run.
Not that it matters much anyway…

Neuroscience is a long-term investment.


Possible-Probable, my black hen.
She lays eggs in the relative when.
She doesn’t lay eggs in the positive now
because she simply can’t postulate how.


If you define AGI as something that can replace a human in an industrial/factory setting, then GOFAI seems like a good approach. But if you define AGI as something that can act like a human, then you should study neuroscience.

3 Likes

Indeed. There is no guarantee, however - it could still turn out to be a failed investment. I find it hard to be optimistic about anything without strong, accurate results.

what.

@MaxLee A prime example of what I mentioned earlier.

The only model we have for something that acts human is, well, humans. And all our behaviors, unless you assume we’re some magical spirit antenna, come from our brains. So yeah, study neuroscience (or read and build up an awareness of the physical phenomena that happen within our biological neurons)… otherwise you’re building a system based on a whole host of potentially flawed assumptions.

I think that’s safe enough to say with a level of surety.

3 Likes

I think it’s safe to say that this thread was a waste of time for most involved - clearly, the cross-domain specialists I was looking forward to conversing with aren’t as open-minded as I thought.

I suppose everyone has a different mindset and different opinions about what the optimal path to AGI would be; but I would caution against sticking to a single route rather than recognizing ideas in their true light - otherwise one may end up pouring one’s entire life into something that turns out to be incorrect.

As a reminder, GOFAI communities still exist, fully convinced that they’re going to discover AGI tomorrow (I really wish I were joking) - clearly their optimism has weathered the decades and the decline of such methods. Whether they have been right or wrong is never a simple question, but the majority hold irrational ideas about how implementing some idea based on how they ‘feel’ would suddenly create consciousness.

Simply put, I really hope that Numenta’s work doesn’t end up in another hole like this. I would advise anyone reading this to keep an open mind about all the roads and learn as much as they can from all of the paths.

This is a pretty accurate description of the DL community - it is precisely this focus on strong results that has achieved so much in just a few decades.
One really wishes to jump in and trust the process, but if the history of AI has taught us anything, it is that such methods usually end up too ambitious and near-sighted to actually deliver in the long term.

Thanks to everyone for participating and taking the time out to share their thoughts :slight_smile:
Hope everyone here has great success with their endeavours!

2 Likes

This is true of both deep learning and neuroscience, but the two fields of study measure their results in very different ways. Scientists measure their productivity by the number of true facts they discover. Measured by this standard, Numenta and the whole field of neuroscience have achieved a lot in the past 20 years.

I know and that’s just sad. I too spent years in college studying GOFAI and deep learning and thinking like this.

But regardless, good luck with your studies!

This thread is titled “Why do members here think DL-based methods can’t achieve AGI?”

Naturally, that will prompt responses and opinions as to why we think DL-based methods can’t achieve AGI. There have been a few folks that took the time to try to respond, quite generously in some cases. To that end, this thread accomplished its surface-labeled goal.

I’m trying to emphasize this to anyone who reads this later… quite a few of us here actively use different ML/DL approaches in our work/study. We’re not anti-DL or anti-ML, but the scope of this topic was “Can it achieve AGI?”

Now, if the actual guerrilla goal of this topic was to proselytize for why DL-based methods might achieve AGI, it’s probably been a waste of time to that end. If you feel we’re missing something, you can always create a new topic with the label of “Here’s why I think DL will achieve AGI” and lay it out there.

Personally, at work I inherited a team and project from a couple of folks who were DL charlatans promising the impossible in order to try to get rich, and I have spent the past year doing everything to get us back to reality with a real, practical, working system. I also work on algorithm development, systems-level programming, architecture, platform, data science, and deployment problems, so that experience all weighs on and influences my opinion.

For a real business where the intent is to make working systems while keeping risks/costs low, there IS a drive towards the pragmatic requirements of a system when it gets to production. More often than not, DL is a risky proposal in production systems. It’s expensive to train (GPU instances often cost more than 4x a CPU instance), models are unstable in converging to their intended learning destination, explainability is always an issue, as is bias (very important when working on financial decision applications), and then there’s the sheer time required to train a DL-based system… More often than not, xgboost and its variations achieve just as good results, require far less compute to train, have higher levels of interpretability, etc. They’re cheaper while delivering “good enough” results.
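As a concrete illustration of that kind of “cheap, good enough” baseline, here’s a minimal sketch - the dataset and hyperparameters are made up for illustration, it just shows the shape of the workflow: CPU-only training in seconds, plus feature importances for a bit of interpretability.

```python
# Minimal gradient-boosted-tree baseline; dataset and parameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in for a tabular business dataset
X, y = make_classification(n_samples=10_000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Trains on CPU in seconds; no GPU instance required
model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)

print("test accuracy:", model.score(X_test, y_test))
print("top features:", model.feature_importances_.argsort()[::-1][:5])
```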

What I think you might get out of even just HTM’s spatial pooler is using it as a dropout-selection mechanism in your DL networks. None of us have to be religiously inclined to one approach or another. You have full freedom to pick and choose, just as many of us here do. Have fun with it and take care :slight_smile:

4 Likes

The thing I got from this thread was how little agreement/understanding there is of what constitutes AGI.

My take is that AGI has nothing to do with ‘being human’ and everything to do with science and engineering and performing intellectual tasks. Clearly there are those who don’t see things that way.

It might help if you set down your definition/concept of AGI. Or it might not.

1 Like

Hello MaxLee,

Thanks for your detailed response! I liked how you tried to give a holistic view combining the problems in DL with the ongoing efforts at Numenta.

I have been following Numenta’s work for quite a while now (although not very keenly in the last 6-8 months), but I have never come across this idea of HTM using Attention mechanisms or anything approximating it in their work so far.

Perhaps, I may have missed it. :confused: Could you please elaborate on this point a little more?
Many thanks! :slight_smile:

2 Likes

There’s a little bit of reading between the lines required, but for a moment, pretend you train a spatial pooler on your dataset. The resulting spatial pooler itself, with its sparse representations, already encodes which parts of your dataset are important.

The enforced sparsity in a spatial pooler IS attention, even if it isn’t advertised as that, by forcing minicolumns in a pool to learn efficient and sparse representations for a given input. How we choose to take advantage of that is up to us though. For example, take that SDR representation and map it to the neurons within your DL architecture so that it turns neurons on/off depending on which columns are active in your spatial pooler.
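To make that concrete, here’s a minimal sketch of the idea under a bunch of assumptions - the SDR is faked as a random set of active columns rather than coming from a real trained spatial pooler, and the layer is plain NumPy rather than an actual DL framework:

```python
import numpy as np

# Illustrative sketch (not from Numenta's codebase): treat a spatial pooler's
# active columns as a sparsity/"attention" mask over a dense layer, i.e. an
# SDR-driven alternative to random dropout.
# `sdr_active_columns` would come from your trained pooler; here it is faked.

layer_width = 512
rng = np.random.default_rng(42)
sdr_active_columns = rng.choice(layer_width, size=int(0.02 * layer_width),
                                replace=False)   # ~2% sparsity, assumed

mask = np.zeros(layer_width)
mask[sdr_active_columns] = 1.0

def sparse_forward(x, W, b):
    """Dense layer whose units are gated by the SDR instead of
    (or in addition to) a random dropout mask."""
    pre_activation = x @ W + b
    return np.maximum(pre_activation * mask, 0.0)   # ReLU on gated units only

# Example usage with random weights
x = rng.standard_normal(128)
W = rng.standard_normal((128, layer_width)) * 0.05
b = np.zeros(layer_width)
h = sparse_forward(x, W, b)   # only SDR-selected units can be non-zero
```

In practice you’d recompute the SDR per input from the trained pooler, so the gating follows the data rather than being random the way dropout is.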

Trawl the blog a bit :slight_smile:

4 Likes

I’m a bit late to the party, but here are my two cents…

There are many reasons (such as efficiency, resiliency, lack of continuous learning, flexibility, etc.) that make AGI unattainable for DL, but a nice one is in The Brain from Inside Out. The book, which is really fun to read, has an overwhelming number of examples of how AGI will require a complex and stable internal state that dominates the outputs (i.e., there is an “inside” dynamical state that is only slightly affected by the inputs). (Assuming that our brain does some sort of AGI :slight_smile: )

I think that this invalidates DL. My understanding is that DL models are mostly “state-less” outside-in systems (i.e., the output is mostly determined by the current inputs).
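A toy sketch of the contrast, purely illustrative - the shapes and coupling strengths are arbitrary assumptions, not a model of anything biological:

```python
import numpy as np

# (1) "outside-in": output is a pure function of the current input.
# (2) "inside-out": a strong recurrent internal state dominates, and the
#     input only nudges it.

rng = np.random.default_rng(1)
W_in = rng.standard_normal((100, 10)) * 0.05    # weak input coupling (assumed)
W_rec = rng.standard_normal((100, 100)) * 0.1   # strong internal dynamics (assumed)

def outside_in(x):
    # Stateless: the same input always produces the same output.
    return np.tanh(W_in @ x)

def inside_out(state, x):
    # Stateful: the next state depends mostly on the previous state,
    # so the same input can produce different outputs over time.
    return np.tanh(W_rec @ state + W_in @ x)

x = rng.standard_normal(10)
state = rng.standard_normal(100)
for _ in range(5):
    state = inside_out(state, x)   # keeps evolving even with a fixed input
```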

4 Likes

I think that reason also applies to general object recognition.

What I find most interesting about Buzsáki’s work is the ‘search for neural syntax’ theme. A hugely important question and not trivial to answer. In the end, it will come down to metaphor.

He has impressive knowledge of neuroscience, and certainly very nice ideas like that (although I am not particularly fond of some of them, like his interpretation of oscillations).

Why are you not fond of it? Apparently Science had no issue with it and neither does Brockett.

This sort of smacks of belief. Like “I believe in science.” Not talking about the journal, but those signs and memes that treat science as a religion; it is not. There’s no faith and belief going on here: you state a hypothesis and then you either prove or disprove it. If you can’t prove or disprove it, then at that point you can turn it over to religion.

1 Like

I agree. I’m just saying that it doesn’t look very elegant to me. It’s not a question of belief, just of aesthetics :slight_smile:

1 Like

I’m not sure any gradient-based methods would work for credit assignment, even if we had an explosion in chip technology to handle the trillions of calculations, and at a local level. Whatever the brain is doing, it’s not adjusting synapses the way deep learning tunes weights. The gradient calculations are too expensive and slow.
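For what it’s worth, here’s a minimal sketch of what a purely local update looks like, loosely in the spirit of Hebbian/HTM-style permanence changes. Everything here (sizes, threshold, learning rate) is an illustrative assumption, not a claim about the actual biology; the point is just that each synapse updates from the activity of the two neurons it connects, with no gradient propagated backwards through the network.

```python
import numpy as np

# Local, Hebbian-style update rule (illustrative only).

rng = np.random.default_rng(0)
pre = rng.random(64)                   # presynaptic activity
w = rng.random((32, 64)) * 0.1         # synaptic weights
post = (w @ pre > 1.0).astype(float)   # binary postsynaptic firing (assumed threshold)

# Each synapse changes using only the activity of its own pre/post pair;
# no global error signal is backpropagated through layers.
lr = 0.01
w += lr * np.outer(post, pre)          # strengthen co-active pre/post pairs
w -= lr * np.outer(post, 1 - pre)      # weaken synapses from inactive inputs
w = np.clip(w, 0.0, 1.0)               # keep weights bounded
```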

4 Likes