Why do members here think DL-based methods can't achieve AGI?

Alright, so a major point has been the lack of hierarchical structure in DL models. As I interpret the “hierarchical” part of HTM, the cortical columns are arranged something like this:

    [#] [#]
    |     |
  [#] [#] [#]
  |   |  |  |
[#] [#] [#] [#]  

to sketch it roughly: each column’s predictions are fed into a higher-up column, which somehow takes this information as a prior and generates more complicated predictions…?

Is this at least the correct picture of the basic structural arrangement of cortical columns? Apologies, as I don’t understand the complicated papers at all, so a simple explanation of HTM distilled down to a few core concepts would be very enlightening!

Incredible inference, this is not even wrong. Can you please explain your thought process?

I am finding this somewhat profound. Is this an original idea, or did you see it discussed this way somewhere? I think you are absolutely correct, BTW.

2 Likes

After monitoring various discussions on the forum and matching that up with what I knew about the cortex and HTM as an implementation of column processing, I realized that everything the cortex does is driven by external influences.
After mulling this over and looking at it from several viewpoints, I could not see any case where the cortex was initiating action.
For me, this adds emphasis to understanding the role of subcortical structures in partnership with the cortex. Since this realization, I have been a gadfly in directing attention to subcortical structures as a full and indispensable partner to cortical processing.

4 Likes

What about dreaming? When asleep, the motor functions are disengaged (possibly only during REM, I don’t remember exactly), but the brain hallucinates, and something has to set that off.

Spindle waves / fast ripple waves, from subcortical structures, forcing recall.

1 Like

Yes, a passive cortex covers most of what it does: tracking sensory input, spotting anomalies, formulating responses. It doesn’t cover attention, intention, or goals. The brain may passively become aware of ‘I’m hungry’, but having formed the intention of ‘eat pizza’, it will initiate and then focus on a complex series of actions to satisfy that goal.

Rats and crows can be observed to do something very similar.

2 Likes

Subcortical structures monitor blood glucose and water balance to add intentionality and initiate action.

The cortex does sense anomalies, but this is registered in the thalamus as a shift from tonic to bursting mode. There are connections to other subcortical structures that initiate the actions we normally think of as attention.

Your eyes are subconsciously driven to scan objects in stereotypical patterns that are then registered in the cortex. This is related to the mechanics of attention.

Experiencing hunger or thirst, etc., is a sensory input. The cortex receives it, but for this argument we don’t care how.

My point is that treating the cortex as purely passive provides no explanation for a range of ‘higher’ functions, such as intention, attention, prioritisation, planning, and so on. It’s plausible to think of the cortex as the ‘jockey’ with only indirect access to sensory input and motor output, but it’s the jockey that knows the difference between a race and a training run, and plans accordingly.

1 Like

I don’t know exactly what you mean by the “height” of a function, but evolutionarily, most of the things you mention - intention, attention, prioritisation - must have been around long before the cortex. Primitive animals have these elementary routines, so placing them in the cortex doesn’t seem right. Bitking is likely right: the cortex is probably more like a use-case catalogue, or a large associative memory mapping a huge number of previously encountered
(states + actions) => consequences

Well, I don’t know if an oracle that I can ask “what Y should I expect if I do X” qualifies as passive, but it’s informative nevertheless.
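
To make that “oracle” picture concrete, here is a minimal toy sketch of such an associative memory (my own illustration, not HTM code, and all the names are invented). It only stores what experience pushes into it, and answers “what Y should I expect if I do X” by lookup:

    from collections import defaultdict, Counter

    class AssociativeOracle:
        """Toy (state, action) => consequence memory; purely illustrative."""

        def __init__(self):
            # For each (state, action) pair, count the consequences seen so far.
            self.memory = defaultdict(Counter)

        def observe(self, state, action, consequence):
            # Passive: it only records what experience delivers to it.
            self.memory[(state, action)][consequence] += 1

        def expect(self, state, action):
            # "What Y should I expect if I do X?" -> most frequent outcome, if any.
            seen = self.memory[(state, action)]
            return seen.most_common(1)[0][0] if seen else None

    oracle = AssociativeOracle()
    oracle.observe("hungry", "eat pizza", "satiated")
    print(oracle.expect("hungry", "eat pizza"))  # -> satiated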

2 Likes

What I get out of thinking about the relationship between cortex and subcortex is a clearer sense of which functions need to be done by each.

If you think of the subcortex as an elaborate state machine, then you can focus on the cortex being that large association cache.
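
As a rough sketch of that division of labour (my own toy framing, not a claim about the real circuitry): the subcortical “state machine” owns the drives and initiates action, while the cortex is only consulted as a cache of learned associations:

    # Toy sketch: subcortex as a small state machine that initiates action,
    # cortex as a passive association cache it consults along the way.
    # All states, actions, and associations here are invented for illustration.

    CORTEX_CACHE = {                      # learned (state, action) -> expected outcome
        ("hungry", "seek_food"): "food_found",
        ("food_found", "eat"): "satiated",
    }

    DRIVES = {                            # subcortical drive logic (the "state machine")
        "hungry": "seek_food",
        "food_found": "eat",
    }

    state = "hungry"
    while state != "satiated":
        action = DRIVES[state]                            # subcortex initiates
        state = CORTEX_CACHE.get((state, action), state)  # cortex only predicts what follows
        print(action, "->", state)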

Likewise, trying to cram “everything” into the column level blinds you to what functions should properly be located at the map-interaction level.

If you are going to be guided by biology, I think it is important to understand the distribution of function before decoding what the various parts are doing and how.

1 Like

If you look at the chimp, you find that they have the ‘best’ (let’s delay arguing about what that means) cortex after ours. All their cortex does is as you say:

and now you ask "What about all the other stuff? Well, all that is left is language and consciousness (introspection, ToM, mental time travel, etc., etc., etc.).

I’ve been watching this conversation take place even as I work on my own personal and professional projects around AI.

From the most practical perspective, DL will not result in AGI for two important reasons:

  1. Complex DL systems akin to GPT-3, which use transformer architectures, are wildly unpredictable in their training activity, even more so than other DL architectures. This means more restarts, more hyperparameter tuning, more feature-engineering work, more everything in terms of time, effort, and energy… energy here referring to the fact that these large models consume enormous amounts of electrical energy to go from concept to a working demonstration model, one that shows somewhat impressive results compared to earlier work but is still a far cry from our own brain. All of that time, resource, and energy produces a wildly unpredictable and brittle system that still lacks explainability while inevitably being forced into a production setting, consequences be damned (despite the bias that these systems pick up).
  2. DL systems are still, at their most fundamental level, a series of fixed-weight matrices. They’re frozen snapshots of arbitrary decision boundaries that will require constant retraining (with all the inherent issues mentioned above). Said more simply, DL systems aren’t able to learn actively, and the unpredictable nature of their training/retraining means that even if you attempt to freeze higher-level weights, you can still end up with random catastrophic forgetting in your production pipelines (see the toy sketch after this list).
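
To illustrate the retraining/forgetting point in the simplest possible setting (a toy two-weight linear model, entirely my own construction and not representative of any real pipeline), naive retraining on new data wipes out what the frozen snapshot had captured about the old data:

    import numpy as np

    rng = np.random.default_rng(0)

    # Two "tasks": the same 2-feature linear mapping with different true weights.
    w_task_a = np.array([2.0, -1.0])
    w_task_b = np.array([-3.0, 0.5])

    def make_data(w_true, n=200):
        x = rng.normal(size=(n, 2))
        return x, x @ w_true

    def mse(w, x, y):
        return float(np.mean((x @ w - y) ** 2))

    # "Train" on task A (closed-form least squares), then freeze the snapshot.
    xa, ya = make_data(w_task_a)
    w = np.linalg.lstsq(xa, ya, rcond=None)[0]
    print("task A error after training on A:", round(mse(w, xa, ya), 4))

    # Naive retraining on task B with plain gradient steps...
    xb, yb = make_data(w_task_b)
    for _ in range(500):
        grad = 2 * xb.T @ (xb @ w - yb) / len(xb)
        w -= 0.1 * grad

    # ...and the new snapshot has "forgotten" task A entirely.
    print("task A error after retraining on B:", round(mse(w, xa, ya), 4))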

Continuing to try to scale up DL to achieve ‘AGI’ will only scale these fundamental problems… I’m not sure that TPUs or other ASICs will resolve any of this, because the architecture itself is simply flawed/wrong.

What DOES interest me, and what I believe will be important to the development of AGI, are attention mechanisms and related ideas such as multi-headed attention, integrated with a group of cranial state machines that have learned over time to value/map certain types to a given desired or detested state, along with a thalamocortical structure that can influence the transition from one state (or, more likely, a concurrent series of states) to another, where these states shape the general system’s action/behavior policy.

I assert that DL, in its current shape/form, even with ‘quorum of experts’ type architectures, does NOT have the ability to produce AGI.

Whatever does win, by virtue of using us and other living creatures as an example, must be more efficient with its energy usage, must have clearly delineated boundaries between the responsibilities of the system, and must have a system of continuous integration between these competing regions (which likely all have concurrent influence on the current series of decisions at any given timestep).

3 Likes

Agreed. The problem is: if you don’t have the basic needs (food, water, warmth, social factors) as basic motivations, what do you replace them with? The usable factors of boredom/exploration and pseudo-social approval seem like they are not enough. Is it necessary to run a body?

I think this is a pretty interesting view. I would like to give you a response as someone who has had one foot in a community that believes it is likely that generally intelligent artificial systems will come from scaled-up DNNs, and the other foot in this community (which is deeply skeptical of such a proposition).

On your first point, I think it is important to keep in mind that building complex and novel systems is often difficult, requires an upfront investment of research and capital, and is typically fragile (cf. engineering). If you’ve ever tried getting a dynamical system or a non-backprop-based network to accomplish a task, you’ll understand how challenging and fragile it often is. Both of these stand in contrast to the many instances within biology of robust yet emergent complexity, but whether we like it or not, we don’t know how to build such objects, and the constraints humanity is under for engineering intelligence are vastly different from the constraints evolution was under. Evolution can afford to tune its solutions over millions of years of trial and error, but we are impatient, and more liable to find clever ways to expend readily available electric power on the task instead.

On your second point, this depiction strikes me as odd. First of all, there is a growing popularity within DL of architectures (such as the transformer) that apply a dynamic set of weights to their inputs, in a way that depends entirely on the context of processing, timing/location, prior inputs, etc. If what you were intending to gesture at is that these systems don’t use online learning, I would agree, but respond that that kind of online learning removes any guarantees we could have had about runtime behavior, which is critical in a real-world deployment. So it seems like a valid trade-off to weigh in practice.
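
To spell out what “dynamic set of weights” means here, a minimal single-head attention sketch in numpy (my own illustration, not any particular library’s code): the projection matrices are fixed after training, but the weights that actually mix the values are recomputed from the current inputs every time:

    import numpy as np

    def single_head_attention(x, wq, wk, wv):
        # Learned projections (fixed after training).
        q, k, v = x @ wq, x @ wk, x @ wv
        # Mixing weights computed from the inputs themselves, per context.
        scores = q @ k.T / np.sqrt(k.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
        return weights @ v

    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 8))                          # 5 tokens, 8-dim embeddings
    wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
    print(single_head_attention(x, wq, wk, wv).shape)    # (5, 8)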

More generally, I’m very sympathetic to the points you’re making. There is great inspiration to be found in cortical structures, and I expect we will continue to see the cross-talk between the biological and artificial neural network worlds bear fruit. But I feel that the bottom line for an “AGI implementation”, rather than biological plausibility, or energy cost, or theoretical properties, or even interpretability, will come down to (1) how long it will take to start automating economically valuable tasks and (2) the amortized hourly cost to do those tasks with it.

3 Likes

The only way people will willingly allow any AGI of any architecture to be put in a position of decision-making power is if it can clearly be interpreted and explain itself; otherwise we’ll be exactly where we are right now.

Very few governments are going to allow an AGI without an explainability layer. If you can’t ask “Why did you do this?”, it simply won’t be allowed to publicly exist.

The other point about transformers, and the argument that they’re deterministic in their behavior in production, is that data is rarely consistent over time: models drift, concepts drift, a variable might simply change its behavior out from under you, or any number of other evolutions within a model’s input can occur. The advantage of online learning, then, if it can be pulled off, is the incremental evolution of a model over time so that it keeps fitting the data.
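
A toy sketch of that incremental evolution (a two-weight LMS-style learner I made up for illustration; it is not from any production system): each sample nudges the model a little, so when the underlying relationship drifts, the model follows it instead of needing a full retrain:

    import numpy as np

    rng = np.random.default_rng(0)
    w_est = np.zeros(2)                      # the online model
    w_true = np.array([1.0, -2.0])           # the "world", which will drift

    for t in range(2000):
        if t == 1000:
            w_true = np.array([3.0, 0.5])    # concept drift: the world changes
        x = rng.normal(size=2)
        y = x @ w_true + 0.1 * rng.normal()  # noisy, non-deterministic input
        err = x @ w_est - y
        w_est -= 0.05 * err * x              # one small incremental update per sample
        if t in (999, 1999):
            print(t, np.round(w_est, 2))     # tracks the drifted weights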

In a production setting, the model and deterministic behavior are only half the problem; real world inputs are noisy and non-deterministic anyway. An AGI would need to be able to deal with that in the same way we do as humans.

If an AGI popped up today but required a dedicated power plant to operate, then it really would come down to cost vs. value. But again, regulation and explainability would come into play, as well as the externalized costs of societal impact and the destabilization of skilled markets… it wouldn’t matter how cheaply it could get work done if society collapsed due to social unrest.

All said, speaking as someone who works with and also researches the various branches and applications of AI: DNNs are neat and useful in some cases, but still fundamentally flawed in their energy use, brittleness, and extremely heavy data requirements (not to mention hyperparameter tuning and restarting, which, despite overwrought promises, AutoML doesn’t fully solve). @Bitking, one area I plan to research this year is online agent evolution and policy creation, to see if/how a system might evolve its own state machine to navigate a given problem space. As I’m allowed, I’ll try to share what I can here.

2 Likes

In all honesty, I would love it if this were true. But I think it is not. There are algorithms today that are moving large sums of value with relatively little oversight or transparency (you can probably think of a few examples), and there is no knight in shining armor poised to stop this from continuing to grow in scope and damage. Is this incredibly risky? Hell yes it is!

Would it be lovely if we had some guarantee of transparency and steerability when it comes to AGI systems? Absolutely, I hope we get it. But development and deployment of advanced AI will almost certainly outpace the regulation thereof, as it has for most historic technologies.

3 Likes

Yes, I think we could reasonably expect that consciousness, the ‘inner voice’ and a few other things like rumination or certain kinds of creativity originate in the cortex, and perhaps not much else does.

So if we take the view that most of what the cortex does is passive/reactive, then we should look for a communications channel, an API/protocol if you like. There should be a means for the midbrain to send a wide variety of sense input data to the cortex and receive motor outputs in return.

That channel might look like SDRs, or something quite different. Finding it would be a very big deal.
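
Purely as a thought experiment (not a claim about the real protocol; the names and message format here are invented), an SDR-shaped version of that channel might look something like this:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class SDR:
        size: int
        active_bits: frozenset          # sparse set of "on" cells

    class CortexChannel:
        """Hypothetical protocol: midbrain pushes sense data up, reads motor output back."""

        def send_sensory(self, sdr: SDR) -> None:
            raise NotImplementedError   # deliver encoded sensory input to cortex

        def receive_motor(self) -> SDR:
            raise NotImplementedError   # collect the cortex's proposed motor output

    # Example message: a 2048-cell SDR with roughly 2% sparsity.
    touch = SDR(size=2048, active_bits=frozenset(range(0, 2048, 50)))
    print(len(touch.active_bits) / touch.size)   # ~0.02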

1 Like

I disagree, although I’m not at all certain.

The cortex might generate commands (attention / behavior) for purposes of comprehension only. Those commands are sometimes constrained by decisions, but they don’t involve decisions; they just do whatever reduces ambiguity. We constantly try to comprehend the world (or our inner world) with attention and some behaviors, even without motives.

I think most attention is really about anchoring locations, not filtering information (based on the L5 stuff I’ve talked about a lot). It ends up filtering out info which doesn’t make sense with the spatial system, but that’s just a side effect (not identifying patterns). If the cortex is all about locations, then that kind of attention is a cortical thing.

Anchoring locations accounts for attention based on object identity, because each object has its own spatial system. Sensory info from other objects doesn’t fit that spatial system, so it’s not really recognized.

What about attention to part of the sensor? That might seem like a filter, making attention not so cortical.

I think attention isn’t really a filter. It’s easy to attend to part of the retina, but what about part of the cochlea? Good luck attending to one frequency while hearing a sound.

We can pay attention to a fingertip, but that’s not a filter either. It’s attention to part of the body, so it’s about egocentric location. It’s not attention to part of the sensory array; it’s to the body part, because we can’t perceive things somatotopically. Filtering that info wouldn’t work because it doesn’t exist. Try recognizing the raw sensory input on a fingertip, rather than the shape of the feature invariant to all the deformations of the skin. We can only figure out the raw sensory input by using abstract thought.

In the case of vision, we can do retinotopic attention, but that’s because we need a line-of-sight spatial system. There’s no such thing as line of sight for touch or hearing, whereas in vision two things can occupy the same line-of-sight location, one blocking the other. That’d be confusing without a concept of line of sight, but luckily we have some visual regions for 2D. That spatial system just happens to be retinotopic, but we might learn it with generic mechanisms rather than deriving it from which parts of the retina activate which parts of the cortical sheet.

1 Like

That doesn’t mean much; any information processor is passive without information.

1 Like