Why do members here think DL-based methods can't achieve AGI?

Not a question - an assertion.
Without a driving input, the cortex does nothing.

I start with that when considering the common criticisms of DL and, tangentially, the HTM/TBT models with regard to AGI. That is the original thrust of this thread.

In most (all?) of these models, the “animal spirit” that is brought to the table by the subcortex is the secret sauce, the missing element: volition, desire, attention, common sense. The cortex adds so much more, but it is not the driver in control. Dumb boss, smart advisor.

1 Like

To maybe put it a different way… Freud may have had the insight to recognise that:

Dumb boss = ID = subcortex
Smart advisor = EGO = cortex

During sleep the smart advisor teaches the dumb boss.

The smart advisor is THE fast learner

Deep learning methods at the moment are the dumb boss.

2 Likes

I guess some people like dumb bosses. I fired mine a lifetime ago.

2 Likes

So the key question IMO is how to put that in the form of a testable hypothesis.

Start with a general lack of sensory input: quiet, nothing happening, not hungry, etc. Then the experimental subject initiates some behaviour: explore, make noise, demand attention, etc. How do you determine whether such unsolicited action is triggered by cortex or sub-cortex?

This should actually be accessible experimentally. There are a variety of methods to localize activity in the skull.
One of the earliest tests of volition is the Libet experiment:

The conclusions are not a slam-dunk:

Adding viewing tools that tie neural correlates to the basic Libet experiment should serve to further refine this experiment and localize decision making.

1 Like

If you mean something like the sensory input ordering the command, I agree, except I’m unsure about the semantics. The right semantics depend on the context and implications, which I don’t know.

Perceptual detection doesn’t necessarily originate in the thalamus.

L5tt cells burst to trigger perceptual detection [1]. Detection is highly impaired if their synapses are silenced in the thalamus or superior colliculus, and also impaired when they are silenced in the striatum.

Most L5tt cells can burst upon a stimulus, with no task or tangible reward [2]. Reward could be involved, but it doesn’t make a critical difference, so it would probably work without reward.

Now, if you’re just talking about motor commands, that’s much more iffy. They do burst like that even if they project to certain parts of the brainstem [2], but that study didn’t check their motor output [3].

I absolutely agree in spirit. I imagine it’d still do stuff with all its sensory inputs gone, like predict noise.

The evidence for this is scattered through many papers. It would be handy if this were the center of study in a paper, but alas, it is usually mentioned in passing when showering attention on the prefrontal cortex, and the papers that are all about the basal ganglia generally don’t pay much attention to what is going on in the prefrontal cortex.

Sigh.

I have dozens of papers related to this area of study, but these few should give you an idea of why I think that there are loops of control passing from cortical to subcortical structures and back again. A careful reading favors the subcortical structures as the initiators of these loops.

Human Volition - towards a neuroscience of will - Patrick Haggard

A key passage is:
In practice, preparatory brain activity may begin as early as researchers are able to look for it. For example, recent attempts to decode free choices using new algorithms suggest that neural preparation begins much earlier than was previously thought. Second, the preparatory activity of the preSMA must itself be caused. The brain’s circuits for voluntary action might consist of loops rather than linear chains that run back to an unspecified and uncaused cause (the ‘will’). Indeed, the input from the basal ganglia to the preSMA is thought to play a major part in the initiation of action. For example, patients with Parkinson’s disease, in whom the output from the basal ganglia to the preSMA is reduced, show less frequent and slower actions than healthy controls. Moreover, signals that predict a forthcoming voluntary response can be recorded some 2 s before movement onset from electrodes implanted in the basal ganglia — these signals thus precede the typical onset time of readiness potentials. The subcortical loop through the basal ganglia integrates a wide range of cortical signals to drive currently appropriate actions, whereas dopaminergic inputs from the substantia nigra to the striatum provide the possibility to modulate this drive according to patterns of reward. From this view, voluntary action is better characterized as a flexible and intelligent interaction with the animal’s current and historical context than as an uncaused initiation of action. The basal ganglia−preSMA circuit has a key role in this process.

Interactions among the medial prefrontal cortex, hippocampus and midline thalamus in emotional and cognitive processing in the rat

Neural Correlates for Apathy: Frontal-Prefrontal and Parietal Cortical-Subcortical Circuits
What happens when the subcortical command signals are not very strong?
I don’t care.

How Basal Ganglia Outputs Generate Behavior

Parallel basal ganglia circuits for voluntary and automatic behaviour to reach rewards

Goal-directed and habitual control in the basal ganglia
Note that there are multiple control pathways from the subcortex. This paper describes two control systems.
Progressive loss of the ascending dopaminergic projection in the basal ganglia is a fundamental pathological feature of Parkinson’s disease. Studies in animals and humans have identified spatially segregated functional territories in the basal ganglia for the control of goal-directed and habitual actions. In patients with Parkinson’s disease the loss of dopamine is predominantly in the posterior putamen, a region of the basal ganglia associated with the control of habitual behaviour. These patients may therefore be forced into a progressive reliance on the goal-directed mode of action control that is mediated by comparatively preserved processing in the rostromedial striatum.

Inhibitory Control of Prefrontal Cortex by the Claustrum

Why is this important? Subcortex regulates cortical activity in many ways.

More is less: a disinhibited prefrontal cortex impairs cognitive flexibility

I maintain that a big chunk of what the cortex does is explain the world in a way that makes sense to the dumb boss, and take the will of the dumb boss and elaborate that into more complex and useful behaviors.

4 Likes

I think I was wrong about the anchoring thing. It stemmed from a study measuring bursts but not really single spikes (calcium imaging). The location selectivity doesn’t actually require bursting.

So I no longer disagree that the cortex is passive.

2 Likes

It doesn’t matter what the original initiator is; the “control” adds influences accumulated along the way. And it modifies all prior influences / “initiators” with feedback; that’s value drift in conditioning. Go through enough loops and it’s likely to extinguish the original initiators altogether.

Subcortical primitives may start as a boss, but then “instrumental” values take over. In terms of evolution, it’s all instrumental anyway :).

3 Likes

So my Chinese Room has two books? The active and the passive one.
The single-book interpretation is naively wrong: the book contains written responses for whatever messages might arrive from the senses. Not even bugs are that simple.

In the dual book interpretation, the “active” book contains rules about both

  • how to update the “passive” book content based on its current content and recently arrived messages
  • how to generate output messages based on passive book’s recently updated content

Well… kind of.
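
Just to make the two-book reading concrete, here is a minimal Python sketch of how I picture it; the class and rule names are invented placeholders, not anyone’s actual model.

```python
# A minimal sketch, assuming a toy "room": the passive book is mutable state,
# the active book is the fixed pair of rules below. All names are hypothetical.

class ChineseRoom:
    def __init__(self):
        self.passive_book = {}  # accumulated content, rewritten over time

    def update_passive(self, message):
        # active-book rule 1: update the passive book from its current
        # content plus the newly arrived message
        self.passive_book[message] = self.passive_book.get(message, 0) + 1

    def respond(self, message):
        # active-book rule 2: generate an output message based on the
        # passive book's recently updated content
        self.update_passive(message)
        return f"reply #{self.passive_book[message]} to '{message}'"

room = ChineseRoom()
print(room.respond("ni hao"))  # reply #1 to 'ni hao'
print(room.respond("ni hao"))  # reply #2 to 'ni hao'
```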

2 Likes

Yes and no.

In traditional networks the nodes are all the same type of neuron.
If you look at networks like autoencoders, you see a “choke point” in the middle.
This is the place where the very different processing methods of the subcortex are located. To complete this proposed model I would extend the autoencoder to feed a substantial portion of the output back to the input for a partially closed loop. The senses complete the input portion, and motor drives complete the output. This configuration combines the strongest features of both types of processors.
Cortex plus subcortex
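
A minimal PyTorch sketch of that wiring, with layer sizes I made up purely for illustration (nothing here is a claim about how cortex or subcortex actually compute):

```python
import torch
import torch.nn as nn

class RecurrentAutoencoder(nn.Module):
    def __init__(self, sense_dim=64, motor_dim=16, bottleneck=8):
        super().__init__()
        # encoder half of "cortex": senses + fed-back motor output -> choke point
        self.encoder = nn.Sequential(
            nn.Linear(sense_dim + motor_dim, 32), nn.ReLU(),
            nn.Linear(32, bottleneck),   # the narrow "subcortex" choke point
        )
        # decoder half of "cortex": choke point -> motor drives
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 32), nn.ReLU(),
            nn.Linear(32, motor_dim),
        )

    def forward(self, senses, prev_motor):
        # senses complete the input; the previous motor output closes the loop
        x = torch.cat([senses, prev_motor], dim=-1)
        return self.decoder(self.encoder(x))

model = RecurrentAutoencoder()
motor = torch.zeros(1, 16)               # nothing fed back yet
for t in range(5):                       # partially closed loop over time
    senses = torch.randn(1, 64)          # stand-in for real sensory input
    motor = model(senses, motor)         # output feeds back on the next step
```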

While the cortex is very powerful, it could end up spending forever digging around in higher-dimensional space for the most trivial of problems. The “simple-minded” subcortex keeps the cortex grounded and focused on the most important task at hand.

If you don’t explicitly build the subcortex into your AI you will have to emulate the functions in some other way. Something will have to prioritize multiple competing goals, establish a primary goal, task switch if necessary, seed the machine state, start the processing, monitor progress, recognize when a solution is “good enough” and initiate action when the proposed solution meets some internally defined goal. Rinse and repeat.
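
A hedged sketch of that “rinse and repeat” loop in plain Python; every name here (priority, seed_state, propose, good_enough, act) is a hypothetical placeholder for whatever mechanism a real system would use:

```python
def subcortex_loop(goals, propose, good_enough, act):
    while goals:
        goal = max(goals, key=lambda g: g.priority)   # prioritize competing goals
        state = goal.seed_state()                     # seed the machine state
        solution = propose(goal, state)               # start the processing
        while not good_enough(goal, solution):        # monitor progress
            if max(goals, key=lambda g: g.priority) is not goal:
                break                                 # task switch if necessary
            solution = propose(goal, state)           # keep refining
        else:
            act(solution)                             # initiate action
            goals.remove(goal)                        # rinse and repeat
```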

Evolution has settled on this arrangement of cortex/subcortex and tuned it over the long haul. Subcortex worked pretty well without cortex. It works even better with cortex. Before you reject this arrangement out of hand it may make sense to understand how it works and what it is doing.

2 Likes

Not really.
There are still the books (cortex) and the agent that runs the rules of the room (subcortex).
For some very basic input messages, the agent responds directly without bothering with the books.

1 Like

Thanks, those are quite useful. Frankly, I feel your pain. Note how most of the papers address Parkinson’s disease. The Hindawi article was particularly provocative: I loved the rat fitted with an electronics package; it brought the Borg to mind.

The passive/active debate with the cortex is completely irrelevant for me. A few decades ago one of my grad students got interested in ‘computing in memory’ and there was a brief flurry of excitement in that regard. That is exactly what the cortex is: a computing memory. Just fascinating.

2 Likes

I think you are talking about supervised learning; it has nothing to do with autoencoders.
As for the rest, we’ve been through this a bunch of times, so it’s probably hopeless.

Regarding the recurrent autoencoder picture you have, I think there is a deep problem with that. It’s more of a hunch.
Take that middle layer, or “representational embedding” if you want; I’ll call it simply the “representation”. By definition it’s a set of features that can be used to reconstruct the input (within a desired accuracy).
But there is no rule stating an autoencoder’s representation vector has to be narrow. There are autoencoders that fan out their representation.
They are not popular, since in DL the larger the vectors on each layer, the more computing power is required for each dot-product step.
And there’s the implicit assumption that the shorter the representation vector, the more it embeds “higher-order features”.
Which aligns with the observation that the “consciousness layer” is apparently very narrow - the numbers vary, but studies say we cannot be aware of more than a few things at the same time. This “few” ranges from 1 to 3, 5, maybe 7, depending on whom we ask.

The problems with that model start when we want it to learn more and more things. The representation, optimized for certain input domain(s), becomes too narrow. It is not sufficient to add extra points in the middle representation layer, since each change affects all following decoder nodes. Then you have to retrain the whole model and cannot reuse previously known representations; “consciousness” (whatever that means) will not recognize the new representation. And another problem is that the representation vector size is nowhere near 7, 5 or 3.


That’s why my hunch is the autoencoder fans out into an extremely large, yet sparse, representation space.
How sparse? Forget 5% or 1%. It squeezes the shit out of it until 1, 3 or 5 feature points remain active, and THAT is what the consciousness layer sees. “Oh, a TV!” “That’s Tom!”

TL;DR: the autoencoder’s representation layer might be as large as the many columns in the brain, and the whole learning machine’s purpose is continuous sparsification, up to the point where the input can no longer be “explained” (i.e., acceptably reconstructed) by the few remaining active points.
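
For what it’s worth, here is a rough PyTorch sketch of that hunch: a fan-out encoder followed by an aggressive k-winners rule (sizes and the top-k step are my assumptions, not anything measured):

```python
import torch
import torch.nn as nn

class SparseFanOutAutoencoder(nn.Module):
    def __init__(self, in_dim=128, rep_dim=8192, k=3):
        super().__init__()
        self.encoder = nn.Linear(in_dim, rep_dim)   # fan OUT, not a bottleneck
        self.decoder = nn.Linear(rep_dim, in_dim)
        self.k = k

    def forward(self, x):
        pre = self.encoder(x)
        # squeeze until only k feature points stay active ("Oh, a TV!")
        top = torch.topk(pre, self.k, dim=-1)
        rep = torch.zeros_like(pre).scatter_(-1, top.indices, top.values)
        return self.decoder(rep), rep

model = SparseFanOutAutoencoder()
recon, rep = model(torch.randn(4, 128))
print((rep != 0).sum(dim=-1))   # ~k active points per sample, out of 8192
```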

1 Like

Does it actually fan out in a different temporal dimension, in a way which none of the current approaches use… after all, are autoencoders really just temporal compression and dilation mechanisms? They convert one temporal dimension into another and then back again?

The feedback loop would be heavily “what if I do this” type information, sort of a decaying predictive vector of all possible actions.

Well, quite possible. As long as we don’t have an actual implementation to see that it works, all plausible hypotheses are welcome.
Or should be welcome. The state of affairs nowadays is that when someone reports some important improvement in a certain narrow direction, everyone else rushes in: “We made a bigger transformer. One more % over SOTA! We nailed it!”

1 Like

I disagree. Just because Google Assistant or Siri isn’t capable of holding complex conversations doesn’t mean other models (mostly DL) cannot. They are not “brittle detection” models, but I suppose we can never really comment on what they are doing, given the pace of interpretability research.

To hold complex conversations, I think they’d have to understand the world. They’d need human-level general intelligence. At that point, they’ll be useful for a lot more than conversations.

2 Likes

Again, as someone who actually works on implementing solutions, the biggest detriment to me and others trying to apply these (very useful, though absolutely brittle) solutions in production is that so many maybe well-meaning, maybe hype-jacking, maybe profiteering people are misrepresenting what DL can do and the ease with which it can be accomplished, and making ill-formed blanket statements that all we need is “more data”, without stopping to consider all the potentially flawed and biased base assumptions that data brings.

The data might be utter garbage, the variables completely unrelated to each other (or just happenstance correlations), mislabeled (if labeled at all), or might shift concept or use randomly throughout the dataset (where some developer kept changing their mind about what a column was supposed to be doing, its categorical vs. numerical nature, range, interval, etc.). And that’s just the data aspect of it. Then there are the algorithms themselves, which again are just clever mathematical tricks that attempt to force certain “shapes” or “boundaries” onto the jumbled mess, where the algorithms and the parameters we set, by the nature of their numerical embodiment, have unintended effects on the output shape, such as creating clusters and divisions between groups that really shouldn’t be there; and yet we accept it because 80% recall is “good enough” for a certain application.

A production Deep Learning system is (oversimplified) just a numerical manipulation through a set of fixed-weight matrices which feed into functions. Our ability to get these systems to train, even with “clean” data, assumes that a real relationship exists between the input variables. Having to update weights through the brute force of backpropagation, though it sometimes works, isn’t guaranteed to find a working, or even a good, solution consistently. There’s a lot of randomness and non-deterministic behavior, so that even with the same architecture, same shapes, same data, even the same learning rate and other hyperparameters, you still might not consistently arrive at a working model, which means that you’ll still spend more time, energy, electricity, all of it, just to attempt to maybe get a working model.
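
To be concrete about “numerical manipulation through fixed-weight matrices”, here is an oversimplified NumPy sketch of what the deployed part of such a system amounts to (shapes and weights are made up; nothing is learned in this snippet):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((128, 64)), np.zeros(64)   # frozen after training
W2, b2 = rng.standard_normal((64, 10)), np.zeros(10)

def predict(x):
    h = np.maximum(x @ W1 + b1, 0.0)       # matrix multiply feeding into ReLU
    logits = h @ W2 + b2
    z = np.exp(logits - logits.max())      # numerically stable softmax
    return z / z.sum()

print(predict(rng.standard_normal(128)).round(3))   # ten "class probabilities"
```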

So it’s often fine when you can get it to work and make sure to buttress it with all the required constraints and expectations, but way too many folks and companies out there are making way too big of claims about the ability of DL to solve problems, much less lead to AGI. Often, the people who talk the loudest about it know the least about how to actually implement any of it. They’re just hucksters looking to make a profit off the hopes/dreams of the gullible and ignorant-but-well-meaning.

DL is applied calculus and it IS pretty neat when it works. But network-wide back-propagation is a terribly inefficient way to conduct learning which produces fascinating and still brittle results, and I’ll stick by that. Even those impressive massive models which have memorized troves of written data (GPT-3, for example) are still terribly brittle and temperamental beasts around which folks are working hard to place hand-written filters and limiters so that only semi-correct answers are allowed to fall out.

Simply repeating the flawed approach over and over again while scaling up to powerplant-dependent levels of electricity is not going to cut it. Instead we’ll need Numenta (and others) who are pushing the boundaries on biologically-mimicking systems with more efficient basic operations, a different approach to the math, corresponding ASICs, and a rethink of how we’re picking/choosing what connections to update and when.

Attention mechanisms help in DL, but if we take a step back, the entire HTM approach was already a multilayered, multi-headed attention mechanism long before the DL community ever considered attention heads.

4 Likes