Why do members here think DL-based methods can't achieve AGI?

To experience hunger or thirst etc is a sensory input. The cortex receives it, but we don’t care how.

My point is treating the cortex as purely passive provides no explanation for a range of ‘higher’ functions, such as intention, attention, prioritisation, planning and so on. It’s plausible to think of the cortex as the ‘jockey’ with only indirect access to sensory input and motor output, but it’s the jockey that knows the difference between a race and a training run, and plans accordingly.

1 Like

I don’t know exactly what you mean by the “height” of a function, but evolutionarily, most of the things you mention - intention, attention, prioritisation - must have been around long before the cortex. Primitive animals have these elementary routines. Placing them in the cortex doesn’t seem right. So Bitking is likely right: the cortex is probably more like a use-case catalogue, or a large associative memory mapping a huge number of previously encountered
(states + actions) => consequences

Well, I don’t know if an oracle that I can ask “what Y should I expect if I do X?” qualifies as passive; it’s informative nevertheless.
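That oracle framing can be made concrete. Here is a toy sketch (my own illustration; `CortexOracle` and the episode strings are hypothetical, not anyone’s actual model) of a purely passive associative memory mapping (state, action) pairs to consequences:

```python
# Toy sketch: the cortex as a passive associative memory / oracle.
# "CortexOracle" and the episode strings are hypothetical illustrations.

class CortexOracle:
    """Answers 'what Y should I expect if I do X?' from stored episodes."""

    def __init__(self):
        self._memory = {}  # (state, action) -> consequence

    def record(self, state, action, consequence):
        """Store one previously encountered (state + action) => consequence."""
        self._memory[(state, action)] = consequence

    def expect(self, state, action, default="unknown"):
        """Purely reactive lookup; the oracle never initiates anything."""
        return self._memory.get((state, action), default)

oracle = CortexOracle()
oracle.record("hand near flame", "reach", "pain")
print(oracle.expect("hand near flame", "reach"))     # -> pain
print(oracle.expect("hand near flame", "withdraw"))  # -> unknown
```

The point of the sketch: the store is informative when queried, but nothing in it acts on its own.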

2 Likes

What I get out of thinking of the relationship between cortex and subcortex is what functions need to be done by each.

If you think of the subcortex as an elaborate state machine then you can focus on the cortex being that large association cache.
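As a rough sketch of that division of labor (all states, stimuli, and cached notes below are hypothetical illustrations, not claims about actual anatomy), the subcortex can be modeled as an explicit state machine that consults the cortex as a passive association cache on each transition:

```python
# Hedged sketch: subcortex as an explicit state machine, cortex as a passive
# association cache it consults. All names here are hypothetical.

class AssociationCache:
    """'Cortex': answers lookups, initiates nothing."""
    def __init__(self, table):
        self._table = table

    def lookup(self, key):
        return self._table.get(key)

class Subcortex:
    """'Subcortex': owns the states and drives every transition."""
    TRANSITIONS = {
        ("idle", "hungry"): "foraging",
        ("foraging", "food_seen"): "eating",
        ("eating", "sated"): "idle",
    }

    def __init__(self, cortex):
        self.state = "idle"
        self.cortex = cortex

    def step(self, stimulus):
        # The state machine decides; the cache only annotates the new state.
        self.state = self.TRANSITIONS.get((self.state, stimulus), self.state)
        return self.state, self.cortex.lookup(self.state)

brain = Subcortex(AssociationCache({"foraging": "check the usual spots"}))
print(brain.step("hungry"))  # -> ('foraging', 'check the usual spots')
```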

Likewise, trying to cram “everything” into the column level blinds you from seeing what functions should properly be located at the map interaction level.

If you are going to be guided by biology I think it is important to understand the distribution of function before decoding what and how said parts are doing it.

1 Like

If you look at the chimp, you find that they have the ‘best’ (let’s delay arguing over what that means) cortex after ours. All their cortex does is as you say:

and now you ask, “What about all the other stuff?” Well, all that is left is language and consciousness (introspection, ToM, mental time travel, etc.).

I’ve been watching this conversation take place even as I’m working at my own personal and work projects around AI.

From the most practical perspective, DL will not result in AGI for two important reasons:

  1. Complex DL systems akin to GPT-3, which use transformer architectures, are wildly unpredictable in their training behavior, even more so than other DL architectures. This means more restarts, more hyperparameter tuning, more feature-engineering work, more of everything in terms of time, effort, and energy. By energy I mean that these large models consume gigawatt-hours of electricity between concept and a working demonstration model, one that can show somewhat impressive results compared to earlier work but is still a far cry from our own brain. All of that time, resource, and energy produces a wildly unpredictable and brittle system which still lacks explainability, yet is inevitably forced into a production setting, consequences be damned (despite the bias that these systems pick up).
  2. DL systems are still, at their most fundamental level, a series of fixed-weight matrices. They are frozen snapshots of arbitrary decision boundaries which require constant retraining (with all the inherent issues mentioned above). Said more simply, DL systems are not able to learn actively, and the unpredictable nature of their training/retraining means that even if you attempt to freeze higher-level weights, you can still end up with random catastrophic forgetting in your production pipelines.
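To make point 2 concrete, here is a minimal illustration (my own, not from any DL framework): inference in a deployed model is just fixed matrices applied to inputs, and nothing at inference time can move the weights when the input distribution drifts.

```python
# My own minimal illustration of point 2: inference in a deployed DL model is
# fixed matrices applied to inputs. The weights W below never change, no
# matter how far the inputs drift from what "training" saw.

def forward(weights, x):
    """One fixed-weight linear layer: y = W @ x, in pure Python."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

W = [[2.0, 0.0], [0.0, 3.0]]        # "trained" weights, now frozen

y_seen = forward(W, [1.0, 1.0])     # inputs like those seen in training
y_drift = forward(W, [10.0, 10.0])  # drifted inputs: same W, no adaptation

print(y_seen)   # -> [2.0, 3.0]
print(y_drift)  # -> [20.0, 30.0]; only retraining can move W
```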

Continuing to try to scale up DL to achieve ‘AGI’ will only scale up these fundamental problems. I’m not sure that TPUs or other ASICs will resolve any of this, because the architecture itself is simply flawed/wrong.

What DOES interest me, and what I believe will be important to the development of AGI, are attention mechanisms and ideas around them. Imagine multi-headed attention integrated into a group of cranial state machines that have learned over time to map certain input types to a desired or detested state, along with a thalamocortical structure able to influence the transition from one state (more likely, a concurrent series of states) to another, where those states shape the system’s overall action/behavior policy.

I assert that DL, in its current shape/form, even with ‘quorum of experts’ type architectures, does NOT have the ability to produce AGI.

Whatever does win, by virtue of using us and other living creatures as an example, must be more efficient in its energy usage, must have clearly delineated boundaries between the responsibilities of its subsystems, and must continuously integrate these competing regions (which likely all exert concurrent influence on the current series of decisions at any given timestep).

3 Likes

Agreed. The problem is if you don’t have the basic needs (food, water, warmth, social factors) as basic motivations what do you replace them with? The usable factors of boredom/exploration and pseudo-social approval seem like they are not enough. Is it necessary to run a body?

I think this is a pretty interesting view. I’d like to give you a response as someone who has had one foot in a community that believes generally intelligent artificial systems are likely to come from scaled-up DNNs, and the other foot in this community (which is deeply skeptical of that proposition).

On your first point, I think it is important to keep in mind that building complex and novel systems is often difficult, requires an upfront investment of research and capital, and is typically fragile (cf. engineering). If you’ve ever tried getting a dynamical system or a non-backprop-based network to accomplish a task, you’ll understand how challenging and fragile it often is. Both stand in contrast to the many instances within biology of robust yet emergent complexity, but whether we like it or not, we don’t know how to build such objects, and the constraints humanity is under for engineering intelligence are vastly different from the constraints evolution was under. Evolution can afford to tune its solutions over millions of years of trial and error; we are impatient, and more liable to find clever ways to expend electric power (which is readily available) on the task instead.

On your second point, this depiction strikes me as odd. First of all, there is growing popularity within DL of architectures (such as the transformer) that apply a dynamic set of weights to their inputs, in a way that depends entirely on the context of processing, timing/location, prior inputs, etc. If what you were intending to gesture at is that these systems don’t use online learning, I would agree, but respond that that kind of online learning removes any guarantees we could have had about runtime behavior, which is critical in a real-world deployment. So it seems like a valid trade-off to weigh in practice.
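To illustrate the first part of this (a toy sketch in toy dimensions, not any particular library’s implementation): in scaled dot-product attention, the mixing coefficients over the values are computed from the inputs themselves, so the effective weights change with context even though the learned parameter matrices stay fixed.

```python
# Toy sketch (not any library's implementation): scaled dot-product attention.
# The learned parameters stay fixed, but the mixing coefficients ("weights"
# over the values) are computed from the inputs, so they change with context.

import math

def softmax(xs):
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Mix the value vectors with input-dependent coefficients."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    coeffs = softmax(scores)  # these "weights" depend on the query itself
    dim = len(values[0])
    return [sum(c * v[i] for c, v in zip(coeffs, values)) for i in range(dim)]

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
# Same keys/values and parameters, different queries -> different mixing.
print(attention([1.0, 0.0], keys, values))  # leans toward the first value
print(attention([0.0, 1.0], keys, values))  # leans toward the second value
```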

More generally, I’m very sympathetic to the points you’re making. There is great inspiration to be found in cortical structures, and I expect we will continue to see the cross-talk between the biological and artificial neural network worlds bear fruit. But I feel that the bottom line for an “AGI implementation”, rather than biological plausibility, energy cost, theoretical properties, or even interpretability, will come down to (1) how long it takes to start automating economically valuable tasks and (2) the amortized hourly cost of doing those tasks with it.

3 Likes

The only way people will willingly allow any AGI of any architecture to be put in a position of decision making power is if it can clearly be interpreted and explain itself, otherwise we’ll be exactly where we are right now.

Very few governments are going to allow an AGI without an explainability layer. If you can’t ask “Why did you do this?”, it simply won’t be allowed to publicly exist.

On the other point, about transformers being deterministic in their behavior in production: data is rarely consistent over time. Models drift, concepts drift, a variable might simply change its behavior out from under you, or any number of other evolutions within a model’s input can occur. The advantage of online learning, then, if it can be pulled off, is the incremental evolution of a model over time so that it keeps fitting the data.
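As a toy sketch of that trade-off (my own illustration; the drifting stream and learning rate are made up), an online learner can track a drifting process that a frozen model cannot:

```python
# Toy sketch: an online learner tracks a drifted data-generating process,
# while a frozen model keeps its stale weight until someone retrains it.
# The stream and learning rate below are illustrative, not from any pipeline.

def online_fit(w, stream, lr=0.1):
    """Incremental SGD on a 1-d linear model: minimize (w*x - y)^2 per sample."""
    for x, y in stream:
        w -= lr * 2.0 * (w * x - y) * x
    return w

w_frozen = 2.0                      # trained when the world was y = 2x
drifted_stream = [(1.0, 5.0)] * 50  # the world has since drifted to y = 5x
w_online = online_fit(2.0, drifted_stream)

print(round(w_online, 2))  # -> 5.0, tracked the drift
print(w_frozen)            # -> 2.0, stale
```

The flip side, as noted above, is that the online weight keeps moving at runtime, which is exactly the loss of behavioral guarantees being weighed.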

In a production setting, the model and deterministic behavior are only half the problem; real world inputs are noisy and non-deterministic anyway. An AGI would need to be able to deal with that in the same way we do as humans.

If an AGI popped up today, but it required a dedicated power plant to operate, then it really would get down to cost vs. value, but again, regulation and explainability would come into play, as well as the externalized costs of societal impact and destabilization of skilled markets… it wouldn’t matter how cheaply it could get work done if society collapsed due to social unrest.

All said, as someone who works with and also researches the various branches and applications of AI: DNNs are neat, and useful in some cases, but still fundamentally flawed in their energy use, brittleness, and extremely heavy data requirements (not to mention hyperparameter tuning and restarting, which, despite overwrought promises, AutoML doesn’t fully solve). @Bitking, one area I plan to research this year is online agent evolution and policy creation, to see if/how a system might evolve its own state machine to navigate a given problem space. As I’m allowed, I’ll try to share what I can here.

2 Likes

In all honesty, I would love it if this were true. But I think it is not. There are algorithms today moving large sums of value with relatively little oversight or transparency (you can probably think of a few examples), and there is no knight in shining armor poised to stop this from continuing to grow in scope and damage. Is this incredibly risky? Hell yes it is!

Would it be lovely if we had some guarantee of transparency and steerability when it comes to AGI systems? Absolutely, I hope we get it. But development and deployment of advanced AI will almost certainly outpace the regulation thereof, as it has for most historic technologies.

3 Likes

Yes, I think we could reasonably expect that consciousness, the ‘inner voice’ and a few other things like rumination or certain kinds of creativity originate in the cortex, and perhaps not much else does.

So if we take the view that most of what the cortex does is passive/reactive, then we should look for a communications channel, an API/protocol if you like. There should be a means for midbrain to send a wide variety of sense input data to cortex and receive motor outputs by return.

That channel might look like SDR, or quite different. Finding that would be a very big deal.

1 Like

I disagree, although I’m not at all certain.

The cortex might generate commands (attention/behavior) for purposes of comprehension only. Those commands are sometimes constrained by decisions, but they don’t involve decisions; they just do whatever reduces ambiguity. We constantly try to comprehend the world (or our inner world) with attention and some behaviors, even without motives.

I think most attention is really about anchoring locations, not filtering information (based on L5 stuff I’ve talked about a lot). It ends up filtering info which doesn’t make sense with the spatial system, but that’s just a side effect (not identifying patterns). If the cortex is all about locations, that attention is a cortical thing.

Anchoring locations accounts for attention based on object identity, because each object has its own spatial system. Sensory info from other objects doesn’t fit that spatial system, so it’s not really recognized.

What about attention to part of the sensor? That might seem like a filter, making attention not so cortical.

I think attention isn’t really a filter. It’s easy to attend to part of the retina, but what about part of the cochlea? Good luck attending to one frequency while hearing a sound.

We can pay attention to a fingertip, but that’s not a filter either. It’s attention to part of the body, so it’s about egocentric location. It’s not attention to part of the sensory array; it’s attention to the body part, because we can’t perceive things somatotopically. Filtering that info wouldn’t work because it doesn’t exist. Try recognizing the raw sensory input on a fingertip, rather than the shape of the feature invariant to all the deformations of the skin. We can only figure out the raw sensory input by using abstract thought.

In the case of vision, we can do retinotopic attention, but that’s because we need a line of sight spatial system. There’s no such thing as line of sight for touch or hearing, whereas in vision, two things can occupy the same line-of-sight location, blocking one. That’d be confusing without a concept of line of sight, but luckily we have some visual regions for 2d. That spatial system just happens to be retinotopic, but we might learn it with generic mechanisms rather than deriving it from which parts of the retina activate which parts of the cortical sheet.

1 Like

That doesn’t mean much, any information processor is passive without information.

1 Like

I’m not saying that cortex does not do anything. I am addressing HOW it does things.

The subcortex sets up the inputs as a conductor. In the sensory areas the thalamus acts in concert to do gain control, inter-map routing, and process the surprise into an attention signal.

In the frontal area, the subcortex initiates action selection. The attention signal drives eye pointing and, in the ear, focus in the cocktail-party effect.

In the temporal region, the subcortex adds the reinforcement portion of learning.

In all of these cases, the cortex is doing important work but it is a sub-processor for the sub-cortical structures.

Being passive does not mean that it is not in control - just that the subcortex is the activation mechanism for the cortex to do what it does. To me, this is an important part of understanding where and how the HTM/TBT theory fits in the overall picture.

2 Likes

You have to look at chimps (Pan troglodytes) to elucidate what the cortex does ‘in general’. You could look at something lower with a cortex, but the chimp is closest to us: aside from cerebral torque, cortex size, and language, they are pretty much us, and far more accessible, even today, to scientific inquiry. There was a fascinating study comparing the intelligence of human and chimp infants. They were identical up to about 1½ years, and then the human babies left the chimps in the dust.

2 Likes

I recall hearing about this, in addition to the finding that chimps don’t have the same organs for speech that we do. I have to wonder if that physical impediment, not having the same larynx during the critical period of brain development, has a negative impact on how chimps’ brains form. On the other hand, with primates whom we’ve taught some form of sign language, or taught to use a symbolic vocabulary board, they do seem to express awareness, higher-level reasoning, the ability to lie, and mourning at the loss of cared-for relationships.

So with our primate cousins, and any conclusions we’ve historically reached about them, I suspect there is something inherently wrong with how we’ve been sampling/observing them that negatively affects their actual potential for expressive intelligence.

Just saying all this because I think it’s something we should keep as context in mind while discussing other creatures… our tools for observing, and their tools for expressing, all have an impact which might be different if not for randomness of evolutionary constraints on biology.

1 Like

Indeed! Behavioral studies taking place in “Primate Centers” have come under criticism, and the most recent studies have been careful to show that the chimp troops studied are untainted by prior human contact, and that the experiments are designed for minimal chimp interaction with humans.

The current hypothesis on speech is that we have cerebral torque and they don’t. There is, however, some work that looked at their larynx development as well. The whole premise that the ‘Planet of the Apes’ franchise was built around is not too far fetched.

2 Likes

FYI for those following this thread:

I have some minor quibbles with this theory as humans that have received a hemispherectomy can still speak.

1 Like

I guess I’m describing an exception to the rule. I agree that the generic-ness of cortex means a lot of things must originate conceptually from subcortical non-generic-ness.

Motives aren’t required for all behaviors. For example, TBT’s object layer could learn how to reduce ambiguity without RL. Volition must override that, but that sort of thing could be the background hum.

I’m convinced that the cortical motor output cells anchor locations to parts of objects.

Their apical dendrites reduce perceptual detection threshold, and they burst based on self-movement to correlate strongly with location. Those two things suggest they produce an anchoring signal.

For example, imagine digging around for a mug in a purse, to take a sip of coffee. Once you feel the handle, boom, you found it and know how to pick it up. You anchored locations to the mug.

If motor output isn’t an inherent part of the perceptual process, why would the cortical motor output cells anchor locations?

I think this exception might be a bridge to the mindset you advocate. Anchoring locations defines the spatial system, so it needs to anchor to the right information. The brain needs to produce its wide variety of spatial systems, so it could use subcortex for its non-generic-ness.

For example, anchoring to faces detected by subcortex could produce the fusiform face area. Getting things to actually work when implemented might require that. Even primary cortical regions have a lot to do with super complicated subcortical stuff. Like, check out S1 in this diagram:


(I think the “simplified” is hilarious, btw.)

This is an incredibly naive academic perspective. Work in any business involving the use of technology and you will see how wrong this is. Making money cares only about the organ grinder, not the organ. The financial crisis was all about a model that people knew was flawed, but the music continued longer than it should have, as just one example of many. Black-Scholes is used by many in blind faith, without any understanding, and in a portfolio its effects are near-on impossible to unpick; it is treated as a black box (I worked on a system at one bank to do just that, 25 years ago, with a massive portfolio).

In this instance, the cortex could not be completely passive, otherwise what we think of as consciousness is just a (literal) layer of complexity, like the curtain in The Wizard of Oz. Maybe the logic of addiction is just such an obvious conflict: the old/animal brain is still in charge, and we just pretend it is not, thinking our (perceived conscious) cortex far too clever to get addicted to something we “know” and understand is bad for us.

Cocktail-party effect aside, if you have tinnitus you can filter out the frequencies and reduce them one by one, which is what I do to a degree, as I can’t reduce the noise completely. The tones are high-pitched white noise at certain frequencies. There is also the case of the man who could identify many separate notes played on a piano all at the same time.

What about bats, or human echolocation? All senses are the same in relation to the cortex; they are just an activation stream (blended/mixed), so why think each is really any different? Does the degree of perceived vectors really change the processing?

Do you consciously ride a bike, or is this a non-cortex-dominated activity (re the video of the cat walking with no cortex)? Does this motion process just get a directional cue as to where to go, or rather just a “delta signal” from the cortex? The motor aim at the mug is the animal desire for food (the primary instruction/activation origin), and the cortex just “alters” the motion rather than controlling it as such. Otherwise you end up with an activation issue of many parallel activations cross-polluting, if you think it’s all cortex-sequence-based. This is where I think some aspect of TBT may be a bit wrong.

1 Like

Let’s not confuse any of hemispherectomy, callosectomy, decortication and decerebration with a brain that has the ability to learn speech (only humans) and has learned speech (again, only humans).