Why do members here think DL-based methods can't achieve AGI?

Very well said. I strongly agree with such an approach, which is why I am here myself :slight_smile: For me, it doesn’t matter whether GOFAI or Numenta gets the crown; I simply try to keep an open mind and learn everything I can to help bring about that progress myself.

Are you implying current algos can’t even make it out of a maze? :thinking:

I find the arrogance highly amusing. It’s true that DL took inspiration from early neuroscience, but it’s not really intertwined with biology in any way. It’s mostly working towards the holy blessing of the UAT (universal approximation theorem) and figuring out what mathematical system can achieve all the tasks together.

DL simply aims to solve AGI with a mathematical tool, developing the perfect architecture that approximates every function given enough data, rather than waiting for science to catch up. It works on single datasets, even on a few datasets, but it’s not working for all of them. However, progress is still being made as models are able to tackle more and more tasks. Whether that leads to AGI is debatable.
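For readers who haven’t met the UAT: it says that a network with a single hidden layer can approximate any continuous function on a bounded interval arbitrarily well, given enough hidden units. A minimal sketch of that idea, assuming a hand-built (not trained) one-hidden-layer ReLU network that linearly interpolates `sin` at evenly spaced knots; the function names and the choice of `sin` are illustrative, and real DL finds such weights by gradient descent rather than by construction.

```python
import math


def relu(z: float) -> float:
    return max(0.0, z)


def build_relu_net(f, a: float, b: float, hidden_units: int):
    """Hand-build a one-hidden-layer ReLU network that linearly interpolates f
    at hidden_units + 1 evenly spaced knots on [a, b]."""
    knots = [a + (b - a) * k / hidden_units for k in range(hidden_units + 1)]
    values = [f(x) for x in knots]
    slopes = [(values[k + 1] - values[k]) / (knots[k + 1] - knots[k])
              for k in range(hidden_units)]
    # Hidden unit k computes relu(x - knots[k]); its output weight is the change in slope there.
    weights = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, hidden_units)]
    bias = values[0]

    def net(x: float) -> float:
        return bias + sum(w * relu(x - t) for w, t in zip(weights, knots[:-1]))

    return net


def max_error(f, net, a: float, b: float, samples: int = 1000) -> float:
    """Largest absolute gap between f and net over evenly spaced sample points."""
    xs = (a + (b - a) * i / samples for i in range(samples + 1))
    return max(abs(f(x) - net(x)) for x in xs)


for units in (4, 16, 64):
    net = build_relu_net(math.sin, 0.0, 2 * math.pi, units)
    print(units, "hidden units -> max error", round(max_error(math.sin, net, 0.0, 2 * math.pi), 4))
```

Each extra hidden unit adds one more kink to the piecewise-linear fit, so the worst-case error shrinks as units are added. The construction only shows that suitable weights exist; it says nothing about how much data a training procedure needs to find them.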

DL has been able to model datasets very well, and I think it can model reasoning too.

This is all my personal opinion, but I think reasoning is simply a world model. Any time we think, “I shouldn’t drive the car since I could be involved in an accident”, I interpret it as the world model associating a car crash with, say, metallic pieces injuring the body (conditioned on information/media I have heard). That association in itself is not a problem, but my instincts don’t like it when the prediction involves pain, which creates negative feedback for “driving a car”.
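The loop described above (a world model predicts outcomes, and an innate signal scores them) can be sketched in a few lines. This is a toy illustration under stated assumptions: the action names, outcome probabilities, and "instinct" scores are all made up, and the world model is a hard-coded lookup table standing in for whatever a DL model would actually learn.

```python
# Hypothetical learned world model: action -> [(predicted outcome, probability)].
WORLD_MODEL = {
    "drive_car": [("arrive_on_time", 0.98), ("crash_with_injury", 0.02)],
    "stay_home": [("miss_appointment", 1.0)],
}

# Hypothetical innate "instinct" scores; outcomes involving pain score very low.
INSTINCT = {
    "arrive_on_time": 1.0,
    "crash_with_injury": -100.0,
    "miss_appointment": -0.5,
}


def feedback(action: str) -> float:
    """Expected instinct score of an action under the world model's predictions."""
    return sum(p * INSTINCT[outcome] for outcome, p in WORLD_MODEL[action])


def choose(actions):
    """Pick the action whose predicted outcomes the instincts like best."""
    return max(actions, key=feedback)


for action in WORLD_MODEL:
    print(action, round(feedback(action), 2))
print("chosen:", choose(WORLD_MODEL))
```

The numbers are arbitrary; the point is only that the negative signal attaches to "driving a car" through the model's prediction, not through any direct experience of a crash.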

I think even if we can achieve only a little reasoning capability through this naive modelling, that would itself be evidence that perhaps we can still achieve reasoning through more well-thought-out methods :slight_smile:


This showed up in the (e)mail this morning (highlights mine):

Dear fellow Trader,
We have a new market forecast to share with you.
You might want to see this forecast before the markets open tomorrow. Over the years, our team of A.I. specialists have released some incredible insights by using machine learning to pinpoint the best opportunities for successful trades.
It goes without saying that a trader doesn’t just consistently win. They do so with careful, methodical research and the right software. Artificial intelligence is a must-have in a trader’s arsenal of tools.

Interesting example!

To me the first step in achieving any AI goal (and certainly any AGI one) is the design of the test.

I’ve been driving a while and don’t really get scared just getting in the driver’s seat, but I can get very scared very fast if, say, another car in the next lane starts swerving, because I know it endangers me.

However, if that swerving car is on the other side of the highway, separated from me by a big metal divider, I’d be very concerned but much less scared.

If that same car zooms past me at 100+ MPH without swerving, I’d be concerned and think “what an a-hole”, but I wouldn’t be too scared, because that a-hole at least has control of their car.

But if the car zooming past me at 100+ was a police car I’d think “damn I wonder what he’s chasing”.

So here the AI task (as I see it) is to robustly recognize what true danger is.

As a human, I have much more nuance than a simple mapping like:
  • swerving = life in danger
  • speeder = a-hole

Understanding what I’m seeing is elemental (the car swerving or speeding), but so is the context around it (can they actually hit me, is it a cop or just some douche, etc).
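To make the contrast concrete, here is a toy sketch of a flat percept-to-label mapping next to one that also consults context. The `Observation` fields and the reaction labels are hypothetical, chosen only to mirror the highway scenarios above. The hand-written rules are exactly what a learning system would have to discover on its own, so this illustrates the task, not a solution to it.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    swerving: bool      # the other car is weaving
    speeding: bool      # the other car is well over the limit
    can_reach_me: bool  # False if, e.g., a divider separates us
    is_police: bool


def naive_reaction(obs: Observation) -> str:
    """Flat mapping from what is seen straight to a label, ignoring context."""
    if obs.swerving:
        return "danger"
    if obs.speeding:
        return "a-hole"
    return "normal"


def contextual_reaction(obs: Observation) -> str:
    """Same percepts, but the reaction also depends on the surrounding context."""
    if obs.swerving:
        return "scared" if obs.can_reach_me else "concerned"
    if obs.speeding:
        return "curious" if obs.is_police else "annoyed"
    return "normal"


scenarios = {
    "swerving, next lane": Observation(True, False, True, False),
    "swerving, behind divider": Observation(True, False, False, False),
    "speeding past": Observation(False, True, True, False),
    "police speeding past": Observation(False, True, True, True),
}
for name, obs in scenarios.items():
    print(f"{name}: naive={naive_reaction(obs)}, contextual={contextual_reaction(obs)}")
```

The naive mapping gives both swerving cases the same label, while the contextual one separates them; the hard part in practice is recognizing `can_reach_me` and `is_police` robustly from raw input.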

In my limited knowledge, I think of DL as being best equipped to tell an AI what it’s seeing, and of Numenta’s cortical-style learning as helpful for understanding the context.

That’s not to say that either approach couldn’t potentially do both of course.

IDK if this robust, real-time danger-recognition task is unsolved, but it’s potentially an example of one we could achieve by joining algorithmic forces!

I started programming with 48 kB of memory (and that included the video display memory), so I have seen over a millionfold increase in data and compute since then. What I have really learnt is that we are great at making some data manipulation tasks hugely inefficient and overly complex.

In AI development, this shows up as a tendency to go down rabbit holes that ultimately lead nowhere but make for a glittering journey that keeps us distracted along the way, helped by our brains’ brilliant evolutionary biasing and chemical-chasing behaviour. DL will carry on for a long while, as there are still many glittering prizes to be had.

I think it will evolve very much like superconductivity, where a single stochastic step transforms the world, AND it will not necessarily need someone with an IQ of 200 (self-bias here, as I’m waaaaay under 200, lol). The biology and the huge volume of research papers out there seem to already show us 99% of the theory; the problem is swimming in the data and joining up the right dots to make sense of it all.

My perception is that the underlying base of all AI is structurally a very similar pattern (TBT style / Mountcastle), and once that pattern is known, it will draw all known and new, unknown sensory inputs into one system and will scale out very, very easily on “current” hardware platforms, let alone on the new compute down the road, which seems to be heading towards another millionfold increase over the next 10 years.

Playing around with a text model that can learn a million words in real time (under a second), and then adjusting the model to see what changes and how, can create a whole new perspective on things. This is where and how the biggest lessons occur very quickly, shortening the discovery feedback loop. This does not need scale; it’s the basic, efficient pattern that matters. Keep (re)searching, someone will discover the key.

As for what happens after that point, I don’t think anyone really has a clue. Keeping an AI caged? lol.


Exactly! I’m simply saying that it doesn’t matter who finds the key in the end. I’m just asking why people here denounce DL-based methods, which many others hold in high regard, when they are yielding all these interesting experiments that may just be a distraction (no one knows) but, for now, work better than the alternatives.

In such a scenario, declaring which approach won’t yield any tangible results by pointing out flaws alone is like two middle-school kids bickering about their favourite candy. It would be more productive if people kept an open mind, attempted to understand both sides of the coin, and glimpsed what they are truly tossing it for.

The only part I would disagree with is the following:

That’s because we are already reaching the limits of Moore’s law (and quantum computing is years away). But the good news is that we don’t really need more compute, just to assemble all there is together in one place.

Currently, if we do achieve AGI, it could easily be run on a modern-day supercomputer anyway :man_shrugging:


Don’t confuse single-chip structure with overall system capability. Look at the direction of BlueField-4 in terms of how compute, memory, and interconnects are arranged, and at IBM’s approach to unified memory in large-scale systems. Both are very different from today and take “scale” to a different level.

Cerebras is also heading in a different direction, which makes Moore’s Law a bit problematic: how do you define “doubling” when we jump to single-wafer compute on a 12" wafer with 2.6 trillion transistors?

At the end of the day, compute will continue to scale, and that’s even before the complete change in mindset needed for quantum compute augmentation or the next evolution of FPGA-type architectures.

It’s still flat plane technology. At some point, someone will figure out how to make the circuits in 3D and we are off to the races again.

Moore’s law might die, but there are other more fundamental laws than Moore’s law:

  • Wright’s law: Experience curve effects - Wikipedia
    This is the observation that when you build more of a product, you get better at building it (and so overall production costs decrease).
  • Supply & Demand: increased demand for an item should drive the construction of new factories for it, which in turn will drive down the cost per unit.
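Wright’s law is usually stated quantitatively: each doubling of cumulative production cuts the unit cost by a roughly constant fraction, which makes cost a power law in cumulative output. A minimal sketch of that relationship; the 20% learning rate and the first-unit cost of 100 in the example are illustrative assumptions, not measured figures for any product.

```python
import math


def wright_unit_cost(first_unit_cost: float, cumulative_units: int, learning_rate: float) -> float:
    """Cost of the n-th unit under Wright's law.

    Each doubling of cumulative production multiplies unit cost by
    (1 - learning_rate), giving cost(n) = cost(1) * n ** log2(1 - learning_rate).
    """
    exponent = math.log2(1.0 - learning_rate)  # negative for 0 < learning_rate < 1
    return first_unit_cost * cumulative_units ** exponent


# Illustrative only: a 20% learning rate and a first unit costing 100.
for n in (1, 2, 4, 1024):
    print(n, round(wright_unit_cost(100.0, n, 0.20), 2))
```

With these numbers, the 1,024th unit comes after ten doublings and costs 0.8^10, roughly 11%, of the first.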

Even in the absence of any real technological innovation, these market forces will continue to act.

@bitking: That’s already happening! See “VNAND”. It’s resulting in a rapid decrease in flash storage prices.


Oh, mister, I like your arrogance even more! Are you trying to show off?

The way you use contradictory terms and group them under the same umbrella is amazing. On top of that, it appears you didn’t even realize the nature of mathematics! Harsh days await you, my friend.
If you don’t realize what symbol manipulation is, what language is, what math is, and when those symbols arise, you are gonna waste your life, like the professors who didn’t do their due diligence.
I am gonna give you a tip, though: symbols, words and terms don’t exist in the real world. Your brain creates them IN THE END. But your brain doesn’t initially use symbols to create symbols; it uses mental representations for them. BUT NOWHERE does it do “Math”. No symbols until it has a mental representation. None.
So if you want to do symbol manipulation to understand the generalization of the human brain, by all means, go first. But remember, all you do as a career is manual Neural Architecture Search.


I think part of my obsession with biological methods (and my disdain for deep learning) stems from one of the neuroscience papers that originally got me into the field:

  • Ahmad, S., and Hawkins, J. (2016). How do neurons operate on sparse distributed representations? A mathematical theory of sparsity, neurons and active dendrites
    archive link

This article describes the mathematical basis of all intelligence.

The equations in that paper are mind-blowing.
Like seriously, have you seen those factorials? They’re insane!
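The factorials come from binomial coefficients, which count how many ways a sparse pattern can overlap a dendritic segment. A minimal sketch, assuming the hypergeometric false-match formulation used in that line of work: a segment with `s` synapses responds if at least `theta` of them land on active cells, and the input is a random pattern with `a` of `n` cells active. The parameter values below are illustrative choices, not figures quoted from the paper.

```python
import math


def false_match_probability(n: int, a: int, s: int, theta: int) -> float:
    """Chance that a random pattern (a of n cells active) puts at least theta
    active cells onto a segment with s synapses."""
    # Patterns with exactly b active cells on the segment: choose b of the s
    # synapse cells and the remaining a - b active cells from the other n - s.
    matching = sum(
        math.comb(s, b) * math.comb(n - s, a - b)
        for b in range(theta, min(s, a) + 1)
    )
    # Divide by the total number of possible patterns, C(n, a).
    return matching / math.comb(n, a)


n, a, s = 2048, 40, 24  # illustrative sizes, not taken from the paper
print("possible patterns:", math.comb(n, a))
for theta in (4, 8, 12):
    print(theta, false_match_probability(n, a, s, theta))
```

Raising the threshold makes false matches vanishingly rare even though the segment samples only a small slice of the population; that kind of robustness to random overlap is what those factorials are used to quantify.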


Let’s follow this bit: what does a symbol look like as a mental representation?
I have been touting the hex-grid but I am open to other concepts.

In particular, the contents of activation of a single map/region could be considered one or more symbols.
Or the co-activation of multiple maps/regions.

Don’t forget the joys of serial processing of tokens as part of a parallel system for the higher-level operations. This is distinctly different from the 100-step rule for lower-level operations and is likely to involve different pathways/configurations; it often takes “some thinking” to arrive at a decision.

In light of these conceptual units, GOFAI concepts could be implemented using the cortex as a support vehicle.


I’m not implying it, I’m saying it outright as a matter of irrefutable fact.

Of course you can write a program to solve any fully defined highly constrained task but that’s leaving out the ‘G’ in AGI. There are thousands of individual problems a rat can solve, and you can write a program for any one of them, but a single program to solve them all? We have absolutely no idea.


The core principles of Moore’s Law expired about 10 years ago. I was buying a new computer every year from my first IBM PC (1985) until the i7 reached 3 GHz (2010), but now there is no point. That path is at an end; now we need parallelism.

One thing we can be sure of is that AGI (like all animal brains) will depend on massive parallelism. The MIPS rating of the brain is maybe 0.001 MIPS per neuron times 10^11 neurons (in the human brain), or about 10^14 IPS. We can do that much raw compute, but not the massive interconnection. Plenty of room down that path.
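Spelling out the unit conversion in that estimate (the per-neuron figure is a rough guess from the post, not a measured value):

```latex
0.001\,\tfrac{\text{MIPS}}{\text{neuron}} \times 10^{11}\,\text{neurons}
  = 10^{8}\,\text{MIPS}
  = 10^{8} \times 10^{6}\,\text{IPS}
  = 10^{14}\,\text{IPS}
```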

All in all, I’m fairly sympathetic to the original poster, inasmuch as I believe that Numenta is doing very good work and could be on a path to reverse engineering major algorithmic components of the neocortex. At the same time, I am anxious to see more in the direction of the implementation and validation of concepts the team develops. That could be in the form of pseudocode I could translate into concrete GPU routines, or simulations of the Numenta cortical columns learning a task, or comparisons on existing benchmarks. At least, that is the standard of evidence I’m used to from the world of DL.

What exactly are you referring to here? I know of no documented system with this capability (at least not for a notion of “learning” that would go beyond “storing in a bloom filter” etc.), so I assume this is either a hypothetical or referring to something I haven’t run into. Color me skeptical, if the latter. :wink:


ML is all about optimization math stuff (I know little about ML, so correct me if I’m wrong). It produces a different kind of insight than brains do, so it often seems like a strange, broken intelligence. It’s just different. To deal with locations or whatnot like rats can, it’ll need very very good optimization systems, at which point it’ll be able to do other things incredibly well.

Because I think ML is different from the brain, I don’t think Numenta’s theories should compete with that. Those theories are basically neuroscience, just more algorithm-oriented than most neuroscience. That’s probably nice for ML researchers looking to draw from neuroscience.

I have been experimenting with a feed-forward learning model which is based on what I think the hippocampus is doing in terms of dealing with language and interacting with the cortex, in particular the early stages and method of hierarchy and chain formation (see the HVC chains in the YouTube video “Building a state space for song learning”). I think it fits with resonant theory as to how structures are learnt.

It’s a bit (very much) of a work in progress and does not follow any existing formal language rules, so I’m a bit cautious about being the village idiot at the moment. It’s quite interesting seeing how different changes can impact the way the model evolves in real time. I think the method/rationale is closer to how the biology deals with language than to our cortex’s retrospective (over)thinking about how language is structured. It is based around the rule of 7 (+/- X) and short-term memory constraints, which determine the structure irrespective of content (language use and variability). That may explain why some languages have some parts in reverse order: parts of the sequence orientation do not matter in relation to how the cortex deals with them. I think the rule of 7 has a sort of second layer, which is what we class as working memory of sorts, but it is then mixed with activation decay (priming effects) in the cortex. But that’s a further leap from an already big conjecture around hierarchy formation from the ground up.

The thinking is also based around the polar opposites of memory savants and HM.

The element of having a smaller model that can work and learn fast, and give and show feedback in short timeframes, changes the perception a lot. This is what and where I think research should focus, rather than on trillion-parameter Wizard of Oz exhibitions that wait an eon between iterations to get a reply that a dog can’t have 3 legs because it will fall over.

It’s only running on a CPU, no GPU involved. Sparse.

I experimented with a few billion nodes (memory chains) on an 18-node cluster at home (heats the house but not good midsummer, lol) to research what works and what does not from a systems perspective, so that I don’t end up coding something that needs a complete rewrite or takes a day to load and save between experiments, and to eliminate scalability requirements that involve remortgaging the house. Raw compute, I think, is just a small fraction of the problem; sparsity makes for a huge amount of random memory access at scale.

I had a scenario at a company I worked for a few years back, where I was trying to explain how electricity demand changed in a particular way at different times of the year. Figuring it would take an hour of discussion and presentation, I sat back and had a think. What I did was animate the 8760 data points into a short, 5-second repeating clip, give a short description of what they were going to see, and then watch the reactions as I played the clip. 60 seconds later, job done, lots of ahhhh moments. That’s how I think very fast progress can be made.

I think of DL as just a data compression mechanism that allows chains to be compressed into instincts. Current DL methods seem to me like an oversized single cortical column at the moment, which cripples scalability.

That’s how I see it at the moment and still have a huuuuge amount to learn. This forum tends to be a paradox disaster for progress as I end up reading and thinking for hours, but a great way of some sense checking back to biological reality, lol.

If I can’t consciously access some representation that has the potential to refer to an organized group of sensations, that means I didn’t learn to make it, and that means it is definitely NOT a symbol.
Grid cells learn to path-integrate sensation at various levels. It is a prediction mechanism. It is not a symbol. Grids don’t refer to things; what they refer to changes. If you give a signal to an encoder population and you want its integral or its double integral, etc., you get grid cells (there is a DeepMind paper about it). It comes before any organization.
It is only a symbol if you (and we), the symbol manipulators, think it is a symbol. It is not like language. Math is literally the formal language of the universe. It is pure symbols. Grid cells aren’t.
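The path-integration idea mentioned above can be shown with a toy sketch: integrate a 1-D velocity signal into a position, then read that position out through several periodic "grid modules". Everything here is a simplifying assumption (one dimension, exact integration, arbitrary module spacings); it is not the DeepMind model, only an illustration that each module encodes position modulo its own spacing rather than pointing at a fixed thing.

```python
def path_integrate(velocities, dt=1.0, start=0.0):
    """Integrate a 1-D velocity signal into a position estimate."""
    position = start
    for v in velocities:
        position += v * dt  # simple Euler step
    return position


def grid_phases(position, spacings=(0.5, 0.7, 1.1)):
    """Phase in [0, 1) of each hypothetical grid module at a 1-D position."""
    return tuple((position % s) / s for s in spacings)


# Moving forward and then back again returns every module to the same phase.
print(grid_phases(path_integrate([1.0, 2.0, -3.0])))
print(grid_phases(path_integrate([0.25, 0.25])))
```

No single module identifies a location on its own; only the combination of phases across modules narrows it down, and each phase is driven entirely by accumulated movement.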

Do you have any specific environments in mind that a rat can solve but something like DeepMind’s PlayerOfGames or MuZero can’t? And since it’s irrefutable, I’d also appreciate evidence proving that algorithm X fails at it. Otherwise, the only environment I can think of for rats is navigating mazes, which is a pretty solved problem.

That’s an interesting observation. I would rather have said that DL doesn’t approach a cortical column at all, but it still has discrete modules, and increasing their number directly increases performance, which is very reminiscent of how TBT describes cortical columns.

I am not claiming that, say, attention heads are cortical columns, but rather that they seem to be very weak approximations of a cortical column (while vanilla NNs are even worse). I would point out, though, that with research we are slowly improving this approximation through various methods. Who knows what comes next? :wink:

In the meantime, I think you’d be hard-pressed to prove that NNs are not cortical columns anyway, since they’re both doing prediction, just learning by different methods.