Yann LeCun on GI vs. current DL

I think you’re addressing something like the continuity of the similarity parameter there.

I don’t want to sweat it, anyway. The contrast I want to make is that in transformers the similarity measure is derivative. The more fundamental thing is prediction on sequences. Prediction on sequences defines what “similarity” means, for transformers.
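
To make that concrete, here’s a minimal numpy sketch (my own toy, with made-up shapes and random weights, not anyone’s actual model): the only “similarity” a transformer has is the dot product of learned query/key projections, and those projections only ever receive gradient from the sequence-prediction loss. So the similarity measure is derivative; prediction is the primitive.

```python
# Minimal sketch (numpy only): a transformer's "similarity" between positions
# is the dot product of learned query/key projections. Nothing here hard-codes
# what counts as similar; in a trained model W_q and W_k are fitted solely by
# backpropagating a next-token prediction loss.
import numpy as np

rng = np.random.default_rng(0)

seq_len, d_model = 6, 16
x = rng.normal(size=(seq_len, d_model))      # token embeddings for one sequence

# Illustrative random parameters; in a real model these are learned purely
# through the sequence-prediction objective.
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))

q, k = x @ W_q, x @ W_k
scores = q @ k.T / np.sqrt(d_model)          # pairwise "similarity" of positions
scores = np.where(np.tril(np.ones_like(scores)) == 1, scores, -np.inf)  # causal mask
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)     # each row: how much a position attends to its past

print(attn.round(2))
```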

It’s funny. Your interpretation is completely the reverse of mine. You’re saying the fundamental processes are more obvious in images. I’m saying superficial properties of images actually obscure the more fundamental processes.

I think that both language and wider sensory data will end up being structured in the same way. So I agree that ultimately “meaning” will come from the world. But I think language provides a simpler place to see that process.

I think language is a simpler place to see that process because it is something produced by the brain, for itself. So naturally it will have just those characteristics which the brain responds to, and as little as possible of anything else.

And what “learning” over those characteristics is telling us, I believe, is that the characteristics the brain responds to are predictions over sequences.

Having gained that insight you can dump language if you like. We can analyse other sensory structuring processes as predictions over sequences.

I’m guessing that will explain why our visual system, for instance, uses saccades. And as I recall, Jeff Hawkins pointed out, back in the day, that the proper perception of touch also requires you to move back and forth over a texture in order to sense it.

IIRC that’s why Jeff conceived his direction of research to be a “temporal” model.

So I think we’re coming back to that. We’re coming back to the “temporality” of (Hierarchical) Temporal Memory. And we need to start thinking again of meaning creation from sensory perception generally as a “temporal” process.

That can be a place HTM got things right.

But you’re saying, because images have this property of visual contrast edges, that somehow this is the fundamental system, and we need to seek it elsewhere too. Even though it was kind of a dead end for a long time.

Transformers have been great. But we’ll not learn from them? We’ll just dismiss them as suited to a niche data set, and go back to what we were doing before?

I just don’t think the abstraction-of-images-by-edges idea has generalized to other systems. If the continuing admonishment to keep a bland expression for my passport photo is any indicator, or Tesla’s agonizing transit of the asymptote of infinite visual road novelty, it hasn’t even worked completely for images either. That’s why transformers have been an advance over CNNs.

Once again, the reverse of my view. I see this ability of transformers to find what to attend to as a feature, not a bug. A feature which is more apparent for language, maybe, because of the “second-hand” nature of language as something made by the brain for itself. But that’s not niche. It’s just more revealing.

It’s because transformers are led by language to define similarity in terms of prediction that they provide a more flexible framework to address the meaning problem more generally. We can get away from visual “edge” contrast.

Visual edge contrast is OK. It was a step up from full-image supervised “learning”. But it’s not enough. CNNs have trouble with broader structure. Training to edges is barely an advance over the old supervised “learning” of images. It generalizes them a bit, which is why CNNs became popular. But the convolution window is small, CNNs generalize by combining these small parts in different ways, and it’s not obvious how to parameterize those higher (hierarchical) structural principles. Often for CNNs it’s just random? Or maybe a deformation (I think I saw something about deformations while glancing at LeCun’s lecture…)
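
Here’s what I mean by the window being small, as a sketch (my own illustration with a toy image, not anyone’s trained network): the “similarity” a CNN starts from is local intensity contrast inside a tiny window, and anything larger has to be built by stacking and recombining many such windows.

```python
# Minimal sketch (numpy only): a 3x3 Sobel-style kernel responds to local
# vertical intensity contrast and to nothing else. Larger structure has to be
# assembled out of many such small windows.
import numpy as np

def conv2d(img, kernel):
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark left half, bright right half -> one vertical edge.
img = np.zeros((8, 8))
img[:, 4:] = 1.0

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

edges = conv2d(img, sobel_x)
print(edges)   # strong response only along the intensity boundary
```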

By contrast, taking a step back to define similarity as derivative of prediction gives us more power. Most important of all, it allows us to start having concepts of novel meaning. Not just deformations. Flat-out novelty. Something can be novel and still meaningful, if it still predicts usefully.

Exactly! Right! We’re seeing this in exactly the opposite way!

You’re saying a prediction-based measure still meets the similarity problem? Yes, everything needs its foundation. For the prediction measure I think that will come down to nerve firings. They’re the ultimate resolution of perception.

I don’t think that’s hard to do. You’ll have to do it too.

What’s hard, is figuring out how nerve firings need to be combined to create higher (hierarchical) structure. If you’re saying the mechanism of meaningful structuring will be “edge contrast” all the way down, you need to explain how edge contrast will generalize beyond visual images.

In particular, if you want it to apply to language, the success of transformers suggests you’ll need something like prediction. So unless you want your structural principle to be ad-hoc to images only, it’ll need to incorporate prediction anyway.

Which will be the more fundamental? I can build edges from prediction. Can you build prediction from edges? Can you build novel meaning from edges?
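
For the first direction, here’s a minimal sketch of the claim (my own toy, not an established algorithm): scan a row of pixels as a sequence, as a saccade would, and predict each pixel from the one before it. The prediction error peaks exactly where the intensity edge is. The “edge” falls out of a prediction-based measure, rather than prediction being built out of edges.

```python
# Minimal sketch (numpy only): an intensity edge recovered as a spike in
# sequential prediction error, using the simplest predictor "next == current".
import numpy as np

row = np.array([0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9])  # dark-to-bright step

prediction = row[:-1]                     # predict each pixel as equal to the previous one
prediction_error = np.abs(row[1:] - prediction)

print(prediction_error)                   # peaks at the step: the "edge", from prediction alone
```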

I call light intensity an ad-hoc similarity measure for cognition, because it is appropriate to one special circumstance of cognition, which is images, but not more generally.

Dictionary: ad hoc

  1. For the specific purpose, case, or situation at hand and for no other.

Prediction over sequences I think is more fundamental (once again suggestive of saccades, and of the original “temporal” motivation for HTM).

You think visual edge contrast is more fundamental. Well, I hope the success of transformers can provide an argument against that. Transformers have simply proven more effective than CNNs.

And edge contrast just has no way to deal with novelty. How do you create new meaning with edge contrast? It’s only one step of abstraction above supervised learning from a fixed set of meaningful “objects”. That lame failure to generalize meaning beyond examples was what I stated here:

Prediction over sequences solves the problem that we can’t think of any definition of meaning except listing examples we consider meaningful. It opens the door to novel meaning, to meaning beyond just a big list of examples. Ha, “meaning” defined as the biggest list of examples you can find (cf. Tesla building simulations to create things to train to!!)
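
As a toy illustration of that (my own sketch, with a made-up corpus and a made-up word “blicket”, not a real model): define the “meaning” of a word by what it predicts comes next, then compare words by the overlap of those predictions. No list of meaningful examples is needed, and a novel word becomes meaningful as soon as it predicts usefully.

```python
# Minimal sketch (numpy only): similarity of words defined by the overlap of
# their next-word predictions, estimated from a tiny toy corpus.
import numpy as np
from collections import Counter, defaultdict

corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat chased the dog . the dog chased the cat . "
          "the blicket sat on the mat .").split()   # "blicket" is a made-up, novel word

next_counts = defaultdict(Counter)
for w, nxt in zip(corpus, corpus[1:]):
    next_counts[w][nxt] += 1

vocab = sorted(set(corpus))

def prediction_vector(word):
    counts = next_counts[word]
    v = np.array([counts[t] for t in vocab], dtype=float)
    return v / v.sum() if v.sum() else v

def similarity(a, b):
    va, vb = prediction_vector(a), prediction_vector(b)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-12))

print(similarity("cat", "dog"))       # high: they predict similar continuations
print(similarity("cat", "blicket"))   # the novel word already carries some meaning
print(similarity("cat", "on"))        # low: very different predictions
```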

And once again a more fundamental definition of meaning in terms of prediction is suggestive of saccades, and of the original “temporal” motivation for HTM.

Was that a reason why the operation can’t be dynamic?
