Let's Watch the Marcus / Bengio AI Debate

Prereading for the debate:


Marcus proposes that connectionist networks can only represent some subset of the training set used to build the network. This position simplifies to “I can only recall a remembered state.”

In a post responding to the Numenta December 4 research meeting, I outlined a general method of building an object representation based on a collection of cortical maps or regions.

That post was already a very concentrated blast of material; it would take perhaps 10 times as much supporting background to provide a digestible path into the concept. So I stopped with what I had posted and hoped that someone would pick up on the concepts if there was any interest.

Alas - nada.

Reading the Marcus position drives home that this high-level view of cortical representation is not a mainstream view and that without this - the constructionist model is as impoverished as Marcus claims.

He does not think big enough. I see multiple object representations, as I have outlined in a rough sketch; in particular, two are in common use in the brain. The objects are composed of short fiber tracts, and the resulting objects are joined by the long fiber tracts.

The pair of high-dimensional representations most important to human speech are on either end of the arcuate fasciculus, which joins the high-dimensional grammar (Broca’s area) to the core of high-dimensional object representation (Wernicke’s area). This allows a modular construction of object particles to populate the structure of language templates.

When this is combined with the serial stream of consciousness as described in this post, you have a speech production system that is vastly more complicated than most recurrent networks but, at its core, works on the same basic concept.

As you may have noticed, you may not know how a sentence will end when you start it. You have some object(s) and relationship(s) you want to represent, and you fire up some sentence form to encapsulate that object. Speech production is initiated and you perceive the speech as it is produced. As the sentence rolls on, the relationship part of the object lurking in the object store in the temporal lobe is selected, and this goes back through the arcuate fasciculus to prime the next part of the sentence production. This serial process is interactive between these two object stores (grammar and object/relationship), each influencing the other to work cooperatively to form the sentence.

I have just described the production of external speech, but in the process that we call thinking this process is retained internally and allows modular manipulation of the stored object fragments to form novel relationships between them. You can perceive this internal speech as an experience, and both store and recall it as if it had been perceived from an external source.

An important feature of this system is the novel recombination of sub-features and related generalization.

This is not the symbolic relationship that Marcus describes. It is the functioning of a properly configured connectionist system.

Of the prereading provided, this paper from Bengio comes the closest to what I am proposing here:
The Consciousness Prior - Yoshua Bengio


I would definitely like to know more about your ideas on how to reconcile the connectionist and symbolic AI approaches.

However, I was a bit lost in your explanations of how a mental manifold composed of representations of features & locations could exist / work. Could you explain your idea with the example of the coffee cup that Jeff often uses? I think it will help to understand your more complex example of language (that involves serial conscious processing).

Are you using the term “constructionist” on purpose? Or was it supposed to be “connectionist”?

Even if your ideas are not easy to digest, it seems to me that you are suggesting that adding symbol manipulation to deep learning networks sounds like adding serial consciousness abilities to massively parallel unconscious abilities.

If yes, I have a similar intuition on my side (probably biased by reading too many of your posts :wink: ), but it is still very fuzzy in my mind and I am struggling to formalize it.

I haven’t read this paper yet. Thanks for the link!

It was supposed to be “connectionist”, but I am intrigued that “constructionist” does apply even though it was not intentional.

Today is a snow day here in Minnesota. I will be shoveling out two properties after work tonight, so I will not be able to “build a cup” using this system until tomorrow at the earliest.

And yes, you have hit on the core of my proposal: adding symbol manipulation to deep learning networks sounds like adding serial consciousness abilities to massively parallel unconscious abilities.

The global workspace is an encompassing framework but it does not directly address the contents and evolution of contents of the connected maps; I am making an attempt to fill in this missing part.

The brain is made of many massively interconnected maps or areas. I propose that there is a general overall organizing principle to both the contents and the evolution of these contents over time. This general system picks up at a higher level than HTM but incorporates the basic mechanism of the thousand brain or hex-grid systems at the lower level and the related cortical column computation at an even lower level.


Definitely watch this:


TIME UPDATE! They moved it up 30 minutes to 3:30 PST.

How to Vote in Twitch Chat

!vote <points> <name> <contest>
  • points: Everyone gets 10 points to award. Followers of my channel get 100 points (so follow me dammit)
  • names: yann, gary, and yoshua (names will change between debates)
  • contests: delivery, technical, science, & rebuttal


If you see Yoshua Bengio give an excellent technical rebuttal, you might award him points like this:

!vote 5 yoshua rebuttal
!vote 5 yoshua technical

This would give Yoshua a total of 10 points, split between two categories. If you run out of points, get more by following my Twitch channel.

If you decide to re-tally your points, use !vote clear to clear your points and start over. You can do this as many times as you wish during the debate.

The points awarded within contests are tallied into an overall debate score. You’ll see all this on the screen when I get started. I hope to see a lot of you there with me!
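For anyone curious how voting rules like these could be tallied, here is a minimal Python sketch. It is not the actual bot running in the stream: the `VoteTally` class, its method names, the `is_follower` flag, and the choice to reject a vote that would exceed the voter's budget are all assumptions made for illustration. Only the command format, the point budgets, the names, and the contests come from the rules above.

```python
from collections import defaultdict

NAMES = {"yann", "gary", "yoshua"}
CONTESTS = {"delivery", "technical", "science", "rebuttal"}


class VoteTally:
    """Hypothetical tally for `!vote <points> <name> <contest>` chat commands."""

    def __init__(self):
        # voter -> list of (points, name, contest) votes accepted so far
        self.votes = defaultdict(list)

    def handle(self, voter, message, is_follower=False):
        """Apply one chat message; return True if it was accepted."""
        parts = message.strip().lower().split()
        if not parts or parts[0] != "!vote":
            return False
        if parts[1:] == ["clear"]:
            # `!vote clear` discards everything this voter has awarded
            self.votes[voter].clear()
            return True
        if len(parts) != 4 or not parts[1].isdecimal():
            return False
        points, name, contest = int(parts[1]), parts[2], parts[3]
        if points == 0 or name not in NAMES or contest not in CONTESTS:
            return False
        budget = 100 if is_follower else 10
        spent = sum(p for p, _, _ in self.votes[voter])
        if spent + points > budget:
            # Assumption: an over-budget vote is rejected outright
            return False
        self.votes[voter].append((points, name, contest))
        return True

    def contest_scores(self):
        """Points per (name, contest) pair, summed over all voters."""
        scores = defaultdict(int)
        for cast in self.votes.values():
            for points, name, contest in cast:
                scores[(name, contest)] += points
        return dict(scores)

    def debate_scores(self):
        """Overall score per debater, summed across contests."""
        totals = defaultdict(int)
        for (name, _), points in self.contest_scores().items():
            totals[name] += points
        return dict(totals)
```

Feeding it the two example commands above (`!vote 5 yoshua rebuttal` and `!vote 5 yoshua technical`) from one non-follower uses up that viewer's 10 points, so a third vote is refused until they send `!vote clear`.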


If something happens during the debate and we’d like to take some time to discuss it, should we just make a note to discuss later? Or pause the debate and have a quick discussion?

  • Pause debate and discuss
  • Never pause live debate



I am live now with the Pre Show debate.



I’ve watched this, and Bengio sounded like he’s running out of ideas. I know it’s not a very helpful comment, but even for me, a noob in NS: wth is system 1?

World changing ideas don’t pop out on a schedule like Star Wars movies. The fact that someone has them at all is utterly wonderful.


Sounded like sexier classical AI on top of DL.

More and more epicycles. Still, geocentrism was utterly wrong.

Marcus spent a lot of time attacking a cartoon version of older connectionist work, dwelling on his early research in comparison to the then-current (1986) PDP books - living in the past much?

Stray thought - does every one of his classes have to sit through his epic battle with early connectionist theories 30 years ago?

It warms my heart to see Bengio mention some concepts near and dear to me - Boltzmann machines, gating, global workspace, and large-scale organization of sub-networks. I wish he had spent more time on these concepts. He hinted at how connectionist techniques do some of the same things as symbolic AI but did not make the case clearly. Since his slide deck is the same one he has used in earlier presentations, it would be nice if he worked up an elevator pitch for this “way forward” concept and added it to this presentation.


So, the former are objects, and the latter are more like groups, systems, or concepts?
Do you see any low-level differences in the way inputs from these two types of tracts are processed? My guess is that objects are composed primarily through lateral interactions: some form of connectivity clustering, by gradient from lateral inhibition in grids. And relatively non-local concepts would be composed primarily through vertical or centroid clustering: more direct Hebbian learning?


The dividing line between representations has been a subject of interest for me for years.
In the visual system you can see a progression from Gabor in V1 to some degree of abstraction in V2 on up to texture in V4.

Mapping with receptive field properties only goes so far as every map seems to multi-task.
There seems to be the same sort of progression in the auditory and somato-sensory cortex.

On the motor side they have worked backwards about two layers.
So of about 50 maps per side, I can account for perhaps 10 of them. The areas between seem to be so abstract that I have no real frame of reference to describe what they are doing. We get some hints from the EC/HC areas, but other than some intriguing connections to perceived spatial relationships, nobody really has a handle on what is going on there either.

Trying to force an arbitrary grouping that matches up to sensory-based grouping feels wrong to me. It’s kind of hard to express this but I will try - in the real world we break things down with integer groupings and relationships. I think the distribution between map levels is more of a log function. Since this is different from how the world works, it is basically incomprehensible. We have no vocabulary for or familiarity with how that kind of representation works - as such, the relationships don’t make any sense.

I have pointed to this paper before, but it is very relevant to this post: it describes the “where” of some of the processing and the general semantic contents of those maps, but not their detailed contents.


Yoshua Bengio apparently also sometimes allows his research meetings to be published. Here is a clip from a brainstorming session with some of his students at the Montreal Institute for Learning Algorithms. (I skipped the first uninteresting minute.) It’s somewhat dated, though: it is from about three years ago. Still, interesting.


I’d be so curious to hear their discussion on how all these learning challenges are addressed by HTM theory
