Yann LeCun on GI vs. current DL

I don’t know exactly what documents you have to create. Can you give any short examples?

I wasn’t being sarcastic when I said current transformer tech might be good at digging up obscure references. It’s similar to recognizing obscure breeds of dogs, the sort of task where AI vision already exceeds human performance. Or IBM Watson winning Jeopardy. For recognizing obscure references, we already can’t beat AI memory. If you have to collate a lot of obscure standards and regulations, especially ones unambiguously identified by unique token acronyms, it might put together a very useful combination.

Even in the case of obscure references, though, I would be very careful. Is this something you have to attach your own name to? Especially in fulfillment of regulations. Isn’t there a risk of being sued if some subtle invention deep in the text later turns out to invalidate an entire product release, or causes some equally costly mistake?

I’m interested to see examples where it works. Every example I’ve seen so far turns out to have something flat-out wrong. Like the example @bkaz produced above: sentence translation word-by-word equated to “single embedding”, in whatever sense he was using “single embedding”. Is that correct?

Maybe government regulation uses terms which are much less subject to ambiguity. And it might not require more abstract mappings like the one Bard seems to have conjured between an oversimplification of the translation task and an interpretation of the “single embedding” term.

Or maybe I’m wrong in my interpretation of the Bard text. I’d like to see the source texts. It takes time to disentangle these sorts of things. I don’t envy any human stuck with the job of full-time fact checker on transformer output.

(It reminds me of the task of “editing” machine translation output back in the day. You were given word-salad output by the system, without access to the original, and asked to “correct” it. Though “correcting” the output of poor human translators similarly begged for a complete rewrite from the source-language original, it has to be said. Once meaning is gone, meaning is gone.)

Where I see current tech excelling is in uses where fluency matters but factual correctness doesn’t. Or at least where it is not pushed far beyond very well established meanings.

I can see the entire foreign language teaching industry being taken over in short order, for instance. I’d love such a tutor myself. Language learning requires a willing partner more than a smart partner.

It may be the step up in robustness and natural response that voice assistants like Alexa and Google Home have lacked. Was Alexa recently cancelled because they’ve been unable to make money with it? It was too fragile. Current tech might do fine at applications like changing a TV channel, where errors are immediately apparent.

Computer code strikes me as another area where it might do very well. That’s a domain where the target meaning can be finitely learned. And if it’s wrong, the error is immediately apparent because the code doesn’t work.

For the rest, on this issue I become more aligned with LeCun. Transformers need better command of meaning. But unlike LeCun, I don’t think that is to be achieved just by more data. I think it is to be achieved first by better structure.


Your prediction simply selects inputs for a comparison operation, based on prior comparisons. I can just as well say that CNNs “predict” the next (adjacent) pixel. Again, the atomic comparison in self-attention is AND: a test for co-occurrence, before it is converted into weights during training. That’s because the symbols in language are meaningless per se, so self-attention has to degrade their resolution down to binary same | different.
This is literally the repetition you are trying to contrast it with. As opposed to the far more subtle inverse-similarity gradient computed by a trained CNN: a lot less like repetition. You keep getting it backwards. OK, there is the issue of using embeddings for input selection; I think that’s basically references. But it’s not specific to transformers.

And then the dot product of the weights is an obviously unprincipled similarity measure: it tries to account for both match and miss, mixing them together, so it can’t see how they behave separately. It’s just a hack, and I suspect a big part of the reason for using it is that it’s easy to do on GPUs.
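To make the objection concrete, here is a minimal numpy sketch of standard scaled dot-product attention; the toy vectors are invented for illustration, not anyone’s actual model:

```python
import numpy as np

def dot_product_attention(Q, K, V):
    """Standard scaled dot-product attention: the similarity between a
    query and a key is a single scalar, summing elementwise products."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # one number per query-key pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V

# The objection in code form: the dot product collapses per-dimension
# agreement ("match") and disagreement ("miss") into one sum, so the two
# can no longer be examined separately downstream.
q = np.array([1.0, -1.0, 0.0])
k = np.array([1.0,  1.0, 0.0])
print(q @ k)  # 0.0 -- the +1 match and the -1 miss cancel invisibly
print(dot_product_attention(np.stack([q]), np.stack([k, -k]), np.eye(2)))
```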

I don’t have much confidence in these responses, but the very process of formulating questions and getting instant related feedback is thought-provoking. And I don’t have much confidence when interacting with humans either, unless they are experts in some very specific domain. Same with transformers: they should get a lot better just with domain-specific fine-tuning, and then project-specific fine-tuning. It will just cost more. After such project-specific fine-tuning, I could have GPT asking me questions instead; that’s even better for focus :)


If CNNs predicted the next pixel based on shared context rather than repeated pattern, I would agree.

If you’re saying they are the same, what is the analogue, in the pixel case, of the {the - 1, a - 0.5} set I sketched in my example?

There could be such an analogue. You might cluster a range of tonal values on a particular contrast. Better yet, a range of contrasts. The tonal values within the set itself could be very diverse. And the set of contexts specifying the set could be very diverse. This is the “embedding” or “skip” idea: that a word is defined by what is around it, not by the word itself. The way “I like ___ cream” says “ice” without saying “ice”. One version of that might define something like an “edge”. But the “edge” wouldn’t be a convolution of the state. It would be based on a collection of properties external to the state.
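To make that concrete, here is a toy sketch of similarity-by-shared-context; the sentences and window size are invented purely for illustration:

```python
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Toy distributional sketch: each word is represented only by the
    words that surround it, never by the word itself."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        toks = sent.lower().split()
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if j != i:
                    vecs[w][toks[j]] += 1
    return vecs

def context_overlap(vecs, a, b):
    """Similarity = shared context, not shared form."""
    return sum((vecs[a] & vecs[b]).values())

sents = ["i like ice cream", "i like whipped cream", "ice is cold"]
v = context_vectors(sents)
# "ice" and "whipped" never co-occur, yet both fill "i like _ cream"
print(context_overlap(v, "ice", "whipped"))  # 3 shared context words
```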

It reminds me a bit of what I believe Google do when they “enhance” their low-light photography. Instead of trying to enhance one image, they take a bunch of images in sequence, and then stack them on top of each other, aligning on context to fill in the gaps.

Applied more fully, this stacking-on-context idea might allow a system to “stack” visual experience from waaay back along its sequence. Along the full sequence of visual experience of the organism over its lifetime.

But it needn’t be repeated experience. It could be new stackings, based on new collections of contexts, all the time.
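Something like this toy sketch, which is not Google’s actual pipeline, just the bare stacking-by-alignment idea, with invented frames:

```python
import numpy as np

def stack_by_alignment(frames, ref=0):
    """Minimal burst-stacking sketch: align each frame to a reference by
    the integer shift that maximizes correlation, then average. (Real
    pipelines use subpixel tile-wise alignment; this is the idea only.)"""
    reference = frames[ref]
    aligned = []
    for f in frames:
        # estimate translation via FFT cross-correlation
        corr = np.fft.ifft2(np.fft.fft2(reference) * np.conj(np.fft.fft2(f))).real
        dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
        aligned.append(np.roll(f, (dy, dx), axis=(0, 1)))
    return np.mean(aligned, axis=0)  # noise averages out, signal stacks

# noisy shifted copies of one image "fill in" each other once aligned
rng = np.random.default_rng(0)
img = rng.random((64, 64))
frames = [np.roll(img, s, axis=1) + rng.normal(0, 0.3, img.shape) for s in (0, 3, 5)]
print(stack_by_alignment(frames).shape)  # (64, 64)
```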


What do you mean by repeated pattern? The kernel? That’s the same as the context window in self-attention, just a lot smaller.

It doesn’t make sense for pixels because their similarity is already greyscale; you don’t have special treatment for exactly identical pixels. Instead there is a weight per position within a kernel. And you can combine results between kernels of different sizes, with a separate weight per size. These weights can be 0; that will be your skipping. Generally the gradient is computed for the whole kernel, but there is nothing to prevent preserving and re-comparing individual pixel-to-pixel differences, from which the gradient is formed.
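A minimal 1-D sketch of that scheme, with invented kernels and weights (the zeros at kernel positions are the skipping):

```python
import numpy as np

def multi_scale_response(patch_row, kernels, scale_weights):
    """Sketch of the scheme described above: one weight per position
    within each kernel, combined across kernel sizes with a separate
    weight per size. A zero position-weight is exactly "skipping"."""
    out = 0.0
    center = len(patch_row) // 2
    for k, sw in zip(kernels, scale_weights):
        half = len(k) // 2
        window = patch_row[center - half: center + half + 1]
        out += sw * float(np.dot(window, k))   # per-position weights
    return out

row = np.array([0.1, 0.4, 0.9, 0.4, 0.1])
kernels = [np.array([0.0, 1.0, 0.0]),            # 3-wide, zeros skip neighbours
           np.array([0.2, 0.0, 1.0, 0.0, 0.2])]  # 5-wide, zeros skip again
print(multi_scale_response(row, kernels, scale_weights=[0.7, 0.3]))
```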


Think of the Google low-light image enhancement idea. You take multiple images over time, and then combine them by aligning contexts.

The greyscale value of the element doesn’t matter. “Similarity” is specified by the alignment of contexts. A variety of greyscale values might be assigned a “weight” based on their occurrence in a context.

Their “similarity” is not a greyscale. It is the alignment of contexts.


I had specified key details in my request, such as that this was going to an aero customer and that it was for ground support equipment.
Sure, I can provide you with a sample OCM (Original Component Manufacturer) letter that you can use as a reference to create your own. Here’s a sample letter:

[Your Company Letterhead]

[Date]

[Name and Address of Recipient]

Dear Sir/Madam,

We, [Your Company Name], are the Original Component Manufacturer (OCM) for the [Name of the equipment]. We certify that the equipment has been designed, developed, and manufactured in accordance with our established procedures, processes, and quality management system.

The equipment has been manufactured using the highest quality materials and components, and it has been tested and inspected to ensure that it meets or exceeds all applicable specifications, standards, and regulatory requirements. We certify that the equipment is free from any defects in material or workmanship.

Please note that the equipment is not flightworthy, and it is intended for [insert intended use, e.g., ground support equipment, laboratory use, etc.] only.

We confirm that the equipment is genuine and that it has not been modified, altered, or refurbished in any way. We also confirm that we have no knowledge of any counterfeit or suspect parts used in the manufacture of the equipment.

Should you have any questions or require any further information, please do not hesitate to contact us.

Sincerely,

[Your Name]
[Your Title]
[Your Company Name]

After this probe, I know what regulations I am looking for and will confirm the details offered against said regulations. I have been trained in the DOT DG code and did not need to explore the motor freight regulations. Like any tool, you have to know how it works and what its limits are.
Mark
what is the regulation that governs shipping of lithium batteries?

The regulation that governs the shipping of lithium batteries is the International Air Transport Association (IATA) Dangerous Goods Regulations (DGR) and the International Maritime Dangerous Goods (IMDG) Code. These regulations provide guidelines and requirements for the safe transport of lithium batteries by air and sea.

The IATA DGR and IMDG Code classify lithium batteries as dangerous goods due to their potential to pose a risk of fire and explosion. As such, they must be properly packaged, labeled, and declared as dangerous goods when being transported.

Some of the key requirements for shipping lithium batteries include:

• Lithium batteries must be packaged in accordance with UN Packing Instruction 965 (for air transport) or Packing Instructions 965-970 (for sea transport).
• Packages containing lithium batteries must be marked and labeled with the appropriate hazard labels, including the Class 9 Miscellaneous Dangerous Goods label and the Lithium Battery mark.
• A Shipper's Declaration for Dangerous Goods must be completed and provided to the airline or shipping company, declaring the presence of lithium batteries in the shipment and providing information on the type and quantity of batteries being shipped.

Failure to comply with these regulations can result in serious safety risks and penalties. Therefore, it is important to ensure that lithium batteries are properly packaged and declared when being shipped.
Mark
what are the details of package marking?

Package marking is an important aspect of shipping that involves labeling packages with relevant information about the contents of the package. The following are some of the key details that are typically included in package marking:

Shipping address: This includes the name, street address, city, state, and ZIP code of the recipient.

Return address: This includes the name, street address, city, state, and ZIP code of the sender.

Package weight: The weight of the package is usually indicated in pounds or kilograms.

Handling instructions: This includes any special handling instructions for the package, such as "Fragile" or "This End Up".

Hazardous materials information: If the package contains hazardous materials, it must be properly labeled with the appropriate hazard labels and markings as required by applicable regulations.

Tracking information: The package should be marked with a unique tracking number to allow for tracking and delivery confirmation.

Customs information: If the package is being shipped internationally, it may need to be marked with customs information, such as a description of the contents, the value of the contents, and the country of origin.

The details of package marking may vary depending on the specific requirements of the shipping carrier and the nature of the contents being shipped. It is important to carefully review the shipping instructions and regulations to ensure that packages are properly marked and labeled for safe and efficient delivery.
Mark
how about the hazardous marking requirements for these batteries?

Lithium batteries are classified as dangerous goods due to their potential to pose a risk of fire and explosion. As such, they must be properly marked and labeled with the appropriate hazardous materials information when being shipped.

The following are some of the hazardous marking requirements for lithium batteries:

Class 9 label: Lithium batteries must be labeled with a Class 9 Miscellaneous Dangerous Goods label. This label indicates that the package contains hazardous materials that are not otherwise specified.

Lithium battery mark: The package must also be marked with the appropriate Lithium Battery Mark, which is a symbol consisting of a black rectangle with a white outline and the letters "Li" in the center.

Proper shipping name: The proper shipping name of the lithium batteries must be clearly indicated on the package, along with the UN number (UN 3480 for lithium ion batteries or UN 3090 for lithium metal batteries).

Cargo aircraft only label: If the package is being shipped by air, it must be labeled with a Cargo Aircraft Only label to indicate that it can only be transported on a cargo aircraft and not on a passenger aircraft.

State of Charge (SoC) Mark: If the package contains lithium ion batteries with a SoC greater than 30%, it must be marked with a "Caution - Forbidden for transport aboard passenger aircraft" label.

These hazardous marking requirements are intended to ensure the safe transportation of lithium batteries and to alert shippers, handlers, and carriers to the potential risks associated with these batteries. It is important to comply with these requirements to prevent accidents and ensure the safe delivery of the batteries.
Mark
is there a weight or wattage limit?

Yes, there are weight and wattage limits for lithium batteries that are shipped by air or sea. These limits are put in place to ensure the safe transport of lithium batteries and to minimize the risk of fire or explosion during transportation.

For air transport, the weight and wattage limits for lithium batteries are determined by the International Air Transport Association (IATA) Dangerous Goods Regulations. These regulations classify lithium batteries as either “small” or “large” batteries, depending on their watt-hour rating (for lithium ion batteries) or lithium content (for lithium metal batteries). The weight and quantity limits for lithium batteries vary depending on their classification and the mode of transport (passenger or cargo aircraft).

For sea transport, the weight and wattage limits for lithium batteries are determined by the International Maritime Dangerous Goods (IMDG) Code. The IMDG Code sets out specific rules for the transportation of lithium batteries by sea, including weight and wattage limits, packaging requirements, and labeling and marking requirements.

It is important to comply with these weight and wattage limits when shipping lithium batteries to ensure the safe transportation of these batteries and to avoid any penalties or fines for non-compliance. If you are shipping lithium batteries, it is recommended that you check the relevant regulations and guidelines to determine the appropriate weight and wattage limits for your shipment.
Mark
what are the limits?

The weight and wattage limits for lithium batteries that are shipped by air or sea depend on several factors, including the type of lithium battery, the mode of transport, and the quantity being shipped. Here are some general guidelines:

Air Transport:

For lithium ion batteries:

Small lithium ion batteries: These have a watt-hour rating of less than 100 Wh. There is no limit on the number of small lithium ion batteries that can be carried by a passenger or on a cargo aircraft, as long as they are installed in or with equipment.

Large lithium ion batteries: These have a watt-hour rating of 100 Wh or more, but less than 160 Wh. A maximum of two large lithium ion batteries are allowed to be carried by a passenger on a passenger aircraft as spare batteries. On a cargo aircraft, the quantity of large lithium ion batteries that can be shipped is subject to certain limits based on their watt-hour rating and packaging.

Lithium ion batteries exceeding 160 Wh are generally not allowed to be transported by air, except under certain circumstances.

For lithium metal batteries:

Small lithium metal batteries: These contain less than 1 g of lithium. There is no limit on the number of small lithium metal batteries that can be carried by a passenger or on a cargo aircraft, as long as they are installed in or with equipment.

Large lithium metal batteries: These contain 1 g or more of lithium, but less than 2 g of lithium. A maximum of two large lithium metal batteries are allowed to be carried by a passenger on a passenger aircraft as spare batteries. On a cargo aircraft, the quantity of large lithium metal batteries that can be shipped is subject to certain limits based on their lithium content and packaging.

Lithium metal batteries exceeding 2 g of lithium are generally not allowed to be transported by air, except under certain circumstances.

Sea Transport:

For lithium ion and lithium metal batteries:

The IMDG Code sets out specific weight and wattage limits for lithium batteries that are shipped by sea. The limits vary depending on the type and quantity of batteries being shipped, as well as the mode of transport and the packaging used.

It is important to note that these are general guidelines, and the weight and wattage limits for lithium batteries may vary depending on the specific regulations and guidelines that apply to your shipment. If you are shipping lithium batteries, it is recommended that you consult the relevant regulations and guidelines to determine the appropriate weight and wattage limits for your shipment.

So you’re looking up the regulations. You’re not asking the transformer to write anything in your name.

If you find it more productive than simply googling the regulations, that’s good.

Maybe if regulations always exist, there is less danger of it inventing any.

I don’t have access to Bing or Bard. But an app called Nova was foisted on me, which accesses something like this (GPT-3.5 “Turbo”?). I asked it for shipping regulations to Mars. And it said that there are none.

That’s pretty good. It has means for identifying a negative.

So I tried the novelty of the “substitution patterns as the source of creativity” idea again.

Once again it shot off its mouth about the infinity of language, Chomsky, etc.

I guess the concept of shipping to Mars is pretty clear, but exactly what is meant by substitution patterns growing without bound being the source of creativity is by no means clear, even to most humans!

I wonder where the border for identifying a negative is.

I tried asking it for mailing regulations for mail to be sent underwater. Like Mars, it identified that there are currently no specific USPS regulations for shipping underwater. Seems likely.

I don’t know. That’s interesting. When does it decide that it has no information about something?

So I asked it to name three books by Noam Chomsky on watercolour painting. It thought about that for a long time. I went through two cycles of “just one moment” messages: a “wait one moment” message, and a “wait just a bit longer” message.

Final answer, pretty good: “Chomsky is primarily known for his work in the field of linguistics and philosophy, and not for watercolor painting. He has written numerous influential books and articles on these topics, but as far as I’m aware, he has not written any books on watercolor painting.”

I wanted to ask it for Bertolt Brecht’s opinion about evolution. But I had run out of free chats.

That is good. I hear these chatbots are still inventing attributions, but it appeared to be identifying negatives for questions where the concepts might be thought to be fairly clear.

Maybe the risk of them inventing government regulations is no longer as great as I feared.

There must be a criterion for recognizing that something doesn’t exist, though. It certainly searched a long time for Chomsky’s watercolour books. I’d be very curious to know what that criterion is.


I did some searching on how the latest models implement some kind of fact checking. It seems one approach is an integrated classical NN search over the likes of Wikipedia to check for facts. That can either be used to filter queries to the LLM, or be co-trained with the LLM.

My impression is that it is much the sort of thing LeCun was suggesting in the head post here. And indeed, in this paper for instance, one of the authors is affiliated with FB AI Research:

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Patrick Lewis, Ethan Perez … FB AI Research and others.

“a dense vector index of Wikipedia, accessed with a pre-trained neural retriever.”

So not a power of LLMs at all. Just some hybridized curation of the LLM representations with more conventional NN information-retrieval techniques.
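Roughly, something like this toy sketch, where a hash-based `embed` stands in for the trained neural retriever (everything here is illustrative, not the paper’s code):

```python
import numpy as np

def embed(text, dim=64):
    """Stand-in for a trained dense retriever: hash words into a vector.
    (RAG uses a learned encoder; this toy just makes retrieval runnable.)"""
    v = np.zeros(dim)
    for w in text.lower().split():
        v[hash(w) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def retrieve(query, docs, k=2):
    """Dense retrieval: nearest documents by inner product in vector space."""
    index = np.stack([embed(d) for d in docs])   # the "dense vector index"
    scores = index @ embed(query)
    return [docs[i] for i in np.argsort(-scores)[:k]]

docs = ["Lithium batteries are class 9 dangerous goods.",
        "Chomsky is known for linguistics, not painting.",
        "Phase correlation estimates image shifts."]
# The retrieved passages are then fed to the LLM as grounding context,
# either filtering the query or co-trained with the generator (as in RAG).
print(retrieve("regulations for lithium batteries", docs)[0])
```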


I don’t know how Google does it, but the only way to define “context” is through some kind of similarity among elements. Basically pixel clustering / image segmentation, which is always done by similarity of the pixels. Alignment would be overlap between fuzzy clusters.
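For instance, a minimal sketch of overlap between fuzzy clusters, with made-up membership values:

```python
import numpy as np

# Fuzzy memberships: rows = pixels, columns = clusters; each row sums to 1.
# The values are invented purely to illustrate the overlap measure.
segment_a = np.array([[0.9, 0.1],
                      [0.6, 0.4],
                      [0.2, 0.8]])
segment_b = np.array([[0.8, 0.2],
                      [0.5, 0.5],
                      [0.1, 0.9]])

# Overlap between two fuzzy clusters: sum of elementwise minima of their
# membership vectors (a standard fuzzy-set intersection).
overlap = np.minimum(segment_a[:, 0], segment_b[:, 0]).sum()
print(overlap)  # 0.8 + 0.5 + 0.1 = 1.4
```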

That’s cool, but LeCun was talking about grounding in the real world through senses, not Wikipedia.


What appears to me to be your determination to disagree defeats my enthusiasm to explain, Boris. No-one else seems interested, so I don’t know how much it is worth labouring this. Unless someone else wants to chime in, I suggest we take any genuine wish you may have to understand offline.

It’s not hard to imagine a similarity measure over the contexts.

The point is this can then be used to assess individual greyscale values within the context.

If the broader context is an image of a face, you can align a sequence of images on a statistical consensus of pixels making up the face, and use that alignment to inform particular values within the image, which might have been obscured in any particular image in the sequence.

You “enhance” the image by grouping greyscale values for a particular pixel across a sequence. The individual pixel greyscale values within this grouping might vary wildly. Bright with noise in one image of the sequence, dark in another. But “similar” in that they occur in the same context.

It’s different to a similarity between greyscale values of the pixel itself.
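A toy illustration of the difference (all values invented):

```python
import numpy as np

# Greyscale values observed at the SAME aligned position (say, a point
# on the face) across a sequence of frames: bright with noise in one
# frame, dark in another.
values_at_aligned_position = np.array([0.95, 0.12, 0.40, 0.88, 0.15])

# Value-similarity would refuse to group these: they span the range.
print(values_at_aligned_position.std())       # large spread

# Context-similarity groups them anyway, because the alignment consensus
# says they are the same point; a robust statistic over the group then
# "enhances" the obscured value.
print(np.median(values_at_aligned_position))  # consensus estimate: 0.4
```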

You will no doubt make some obtuse objection, such as that the word “similarity” is the same in the two cases, or that the context repeats, so it is really repetition. Even though the grouping of pixel values within the context can be different each time. And even, in the broader case beyond Google, where you reach back across time, you could choose a different alignment grouping each time too.


So I interrogated Chat and Bard about hybrid CNN / ViT approaches.
Chat professed ignorance, Bard hallucinated a bunch, as usual:

"Sure, here are some papers that have explored the idea of directly computing the similarity between image patches in ViT:

  • “Vision Transformer with Local Self-Attention” by Yu et al. (2021)
  • “Spatially Aware Vision Transformer” by Wang et al. (2021)
  • “Patch-based Vision Transformer” by Zhang et al. (2021)

These papers all propose different approaches to computing the similarity between image patches directly. The results of these papers suggest that it is possible to improve the performance of ViT models by directly computing the similarity between image patches.
Here are some additional details about each of these papers:

  • “Vision Transformer with Local Self-Attention” by Yu et al. (2021) proposes a new ViT model that uses local self-attention instead of global self-attention. Local self-attention allows the model to attend to patches of the image that are close to each other in space. This can help the model to learn more about the spatial relationships between different parts of the image.
  • “Spatially Aware Vision Transformer” by Wang et al. (2021) proposes a new ViT model that uses a spatial attention mechanism to learn about the spatial relationships between different parts of the image. The spatial attention mechanism allows the model to attend to patches of the image that are close to each other in space. This can help the model to learn more about the spatial context of the image.
  • “Patch-based Vision Transformer” by Zhang et al. (2021) proposes a new ViT model that uses a patch-based approach to learning. The patch-based approach divides the image into a set of patches and then learns a separate representation for each patch. This can help the model to learn more about the local details of the image.

I hope this helps!"

These papers don’t really exist, but here is a similar brand-new one that looks interesting:
