Are billions of parameters (or more) the only solution?

I agree (although that does gloss over a lot of critical detail).

The point I make is that this “structure of knowledge” is what we know as mental models. All animals create mental models of parts of the real world from sensory input and evolved templates, but this is one thing we just don’t know how to replicate. Models can be projected in time, so the predator knows where the prey came from and where it is going, and the tennis player knows where the ball is going.

We build models of language, so we know how a sentence will end. Ditto for music. IMO this is the one big missing link, and until it is found there will never be AGI.

2 Likes

I guess one reason LLMs are so successful is that they actually end up learning, at least partially, the underlying knowledge structure within their training data. There are limitations, and possibly (hopefully?) the high cost of achieving their performance could be dramatically reduced once we begin to understand and discriminate between what makes them good and what is wasteful, so that eventually we learn what to engineer and incorporate into more efficient algorithms.

The James Webb telescope and the LHC are also wasteful, expensive pieces of scientific equipment. LLMs now, as with convolutional networks a few years ago, explore various means to approach and understand intelligence.

2 Likes

Why do you think that? Or to put it differently, what is the form or nature of this ‘structure’ that they learn? I’m sceptical, but if you can identify something concrete and measurable I’m happy to learn.

An enormous amount of time, effort and money went into inventing, refining, manufacturing and distributing incandescent light bulbs. Now we don’t use them, and the LEDs that replaced them share none of the core operating principles. LLMs work, but they are the incandescent bulb, useful until they aren’t.

[Actually, I don’t think analogies of that kind are helpful, but lots of people seem to want to use them, so there’s mine.]

3 Likes

Cyc already does this very well, although it’s not mainstream AI nowadays. One of its authors talked about it hypothetically being the right brain, which will need a left brain such as neural nets to do the lower-level pattern recognition that they are good at. Collaboration between these AIs might help achieve AGI.

I’m a bit skeptical about the plausibility of learning knowledge structures from raw inputs alone, such as language, because they do not necessarily formally represent knowledge. I think an AI can only learn higher-order knowledge structures if it actually does it in a way that involves movement, or if it meta-learns the brain.

2 Likes

I don’t think the light bulb / LED analogy is bad, especially if you add the following detail: before light bulbs were invented, the only light source we thought existed was our own bioluminescence. The discovery of incandescent light prompted us to investigate further and allowed us to understand the intricate nature of light, waves, particle physics etc., which eventually led to the development of lasers, LEDs and computers.


Transformers provide a material to hypothesize upon. OK, let’s say it’s just a search engine. Then why are search engines in general far less capable of sustaining a meaningful conversation? Can you make one? What makes this particular one provide a kind of feedback that other search engines, with more “training” (== indexed) data, cannot?

Another hint I get is that we (our minds) incorporate a similar mechanism. Despite the hype around the attention mechanism (as in “is all you need”), I think we produce similar chatter when we pay less attention to what we speak. I have encountered quite a few people entering that automatic chatter mode, some of them unfortunately unable to get out of it, despite the fact that, as with smaller models, it isn’t very informative.

Unlike LLMs, we have much smaller “context windows” and (I assume) less storage, learning and processing capacity for words alone, so we have to have some compensating mechanisms. Possibly the famous transformer attention is able to partially compensate for the lack of our extra mechanisms by handling a much larger context window (tracking the past N words) than we can.
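To make that “tracking of past N words” concrete, here is a minimal sketch of scaled dot-product self-attention (my own toy illustration in numpy, single head, no masking): every token gets a weighted mix of every other token in the window, so the whole window is visible at once, however long it is.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention over a window of tokens.

    Q, K, V: arrays of shape (n_tokens, d), one row per token in the window.
    Each output row is a weighted mix of all value rows, so the entire
    context window is visible to every position at once.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # pairwise relevance, (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the window
    return weights @ V                               # mix values by relevance

# Toy usage: a "context window" of 5 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
out = scaled_dot_product_attention(x, x, x)          # self-attention
print(out.shape)                                     # (5, 8)
```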

2 Likes

That’s probably why NLP (generative models in general) was always at the forefront of the hype - it shows evidence of generalization most clearly to the average layperson. LLMs being able to do any task, being able to adapt, was quite an impressive conclusion. The fact that you can use the same model for everything from writing essays to solving analogical reasoning is striking.

That is definitely the goal. Interestingly, the currently most efficient techniques aren’t incorporated at scale until after big tech releases their LLMs - so right now, there’s always a 2-3 year lag in everything. Pretty exciting for the next generation of LLMs, which will hopefully be even more compute- and data-efficient.

That’s more of a meme - Vaswani et al. had absolutely NO idea self-attention would become that successful. Though perhaps the title did play a part in the widespread adoption of transformers :wink:

The funniest thing I find about the whole debate is that I’ve asked thrice now for any sources/citations saying they are search engines despite me quoting several about how they learn circuits and algorithms. Motivated cognition is a killer of imagination and science…

1 Like

It’s funnier to ask for citations about this. It’s like asking for citations about whether 5 is a prime number or not. I’ve explained in simple math why NNs are fundamentally parametric search engines, and the theory behind them is mostly linear algebra and a bit of calculus - no need for citations.

I am in a bit of disagreement with the “indexed search engine” though. AFAIK, NNs don’t index their solutions nor their search spaces.

A little contrast here: is the spatial pooler a search engine? No, but it may be interpreted as such because its solutions may be mimicked by a search engine. Structurally and mechanically, though, no, at least for now. It wasn’t constructed based on a mathematical search algorithm.

2 Likes

OK, a bit (or byte?) of gas on the fire - this one is too recent to already be in “classical” search engines:
https://www.reddit.com/r/artificial/comments/10ddg8j/i_got_chatgpt_to_create_a_new_joke_i_would_never/

Notice also this comment:

  1. Know about homonyms
  2. Find a question that makes sense in sense A with a connection to sense B
  3. Write the answer with the writing of sense B

These machines are fantastic pattern learners. It is much harder to spot reuse when it happens at the pattern level.

I find it quite hard to classify that kind of subtlety as a search engine.

2 Likes

There is nothing wrong with that, per se. Search is how you do generalization: finding similarity among items, which can then be represented as a higher-order item / cluster in a compressed fashion. Increasing the lossless component of compression is what learning is all about. The problem is scaling that compression / generalization with the size of the data set.
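A toy numpy sketch of that idea (my own illustration, not from the post): represent a group of similar items by a shared higher-order item (the centroid) plus residuals. Reconstruction stays lossless, while the residuals are much cheaper to encode than the raw items.

```python
import numpy as np

# Similar items compress well when represented as a shared cluster centroid
# plus small residuals: that is generalization as compression, in miniature.
rng = np.random.default_rng(1)
prototype = rng.normal(size=16)
items = prototype + 0.01 * rng.normal(size=(100, 16))   # 100 similar items

centroid = items.mean(axis=0)      # the higher-order "cluster" item
residuals = items - centroid       # what remains after generalization

# Reconstruction is lossless: centroid + residual recovers each original item.
assert np.allclose(centroid + residuals, items)

# The residuals carry far less variance than the raw items, so they cost far
# fewer bits to store at a given precision -- that is the compression gain.
print(items.var(), residuals.var())
```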

4 Likes

Part of the issue here is that the phrase “search engine” is incredibly vague.
Definition: a search engine is an algorithm that takes a key and returns a list of related values.
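Taken literally, that definition is tiny. Here is a throwaway sketch of it (names invented purely for illustration):

```python
from collections import defaultdict

# The definition above, verbatim: something that takes a key and returns a
# list of related values.
class KeyValueSearchEngine:
    def __init__(self):
        self.index = defaultdict(list)

    def add(self, key, value):
        self.index[key].append(value)      # associate a value with a key

    def search(self, key):
        return self.index.get(key, [])     # return every value related to the key

engine = KeyValueSearchEngine()
engine.add("prime", 5)
engine.add("prime", 7)
print(engine.search("prime"))              # [5, 7]
```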

By this definition the hippocampus/cortex is a search engine.

What sets both NN’s and the brain apart from search engines is their ability to produce outputs that they weren’t trained on. The question everyone is really asking is not about search engines but rather “can NN’s generate outputs that are novel and interesting?”

What counts as “novel and interesting” is a very subjective and human issue.

It is also important for the issue of copyright: are the outputs truly an artistic work (with a new copyright) or are the outputs merely a mish-mash of its training material (meaning that it needs a license to use the training material)?

1 Like

Let me take subjectivity out of it. Search shouldn’t just return a list of related items; it should compute the relationship, which includes differences among items and from items to a template. These differences form vectors / gradients of change between items, which can be interactively projected to form novel items. Novel items are valuable if predictive, which can be verified by comparing them to items outside the set, in the corresponding direction, including in time. Which further increases compression.
Nothing subjective about that.
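A minimal numeric sketch of that, under my own toy assumptions: take the differences between observed items as a direction of change, project one step beyond the set to form a novel item, and score it against a held-out item.

```python
import numpy as np

# Differences between items form a gradient of change; projecting along it
# proposes a novel item, whose value is measured by its predictive accuracy.
values = np.array([1.0, 1.5, 2.1, 2.6])   # observed items (e.g. yearly measurements)

gradient = np.diff(values).mean()          # average vector of change between items
novel = values[-1] + gradient              # project one step beyond the set

held_out = 3.2                             # an item outside the set, in that direction
error = abs(novel - held_out)              # how predictive the projection was
print(novel, error)
```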

1 Like

I think the issue is that there are statements here that assert and generalize something, but use or cite a specialization or a special instance to prove the generalization.

For example: NNs are not search engines but are more than that, because:
Proof 1:

I have a simple counterpoint - RETRO is pretty much an explicit “search engine” as you call it, powered by LLMs. Yet it has none of the meta-learning, few-shot capabilities of other models. This idea that LLMs are just search engines is quite a laughable one - it’s like saying the brain is just a bag of electricity.

Proof 2: Cite paper 2
Proof 3: Cite paper X

It should be qualitatively proved first before showing evidence. A qualitative proof, if it exists, suffices for the generalization, but the presentation of evidence doesn’t help with generalization because the world is stochastic; even science prefers falsification.

Firstly, we should define what a search engine is and make the definition as general as possible, especially if the tone of the assertions is claiming to generalize (e.g. X is a search engine, Y is not a search engine).

The definition you gave of a search engine is not generalized because it mentions an implementation detail - a key. Let me show you why:

Fact: The definition above of a search engine accepts a web search engine such as Google because it asks for a search key (e.g. words) and returns results.
Fact: Google uses AI and ML algorithms to search for content from the internet.

Let’s say Google(Key) = Content is a function that takes a key and returns the content searched from the internet. If we expand this function it will likely be:

Google(Key) = SomeFunctionConvolution(ClassicSearchFunctions, AISearchFunctions, MLSearchFunctions, SomeSearchFunctionX, SomeSearchFunctionY)

If one gets the intuition here, one would probably see that the search engine definition above is extremely restrictive and should only be used for specialization purposes - a bad fit for comparison with NNs, HTMs, or internet search engines. I think it is safe to use it specifically for relational database search engines. The Google search engine is more than a classical search engine.

Hence, it’s more meaningful to use the term search engine for a system that searches for an item using a specific kind of computation process - a more generalized one. This computation is an algorithm that is either designed to search or presents an emergent searching behavior. The former is obvious for NNs: scientists/engineers invented gradient descent, a mathematical algorithm that searches for local minima. The latter is not as obvious, because the core intent can objectively be viewed not as searching alone but as something else, such as the survival of some species.
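For the former sense, here is a minimal sketch of gradient descent as a search over parameters (a toy 1-D loss of my own choosing, purely for illustration):

```python
# Gradient descent as a search procedure: it walks parameter space looking
# for a local minimum of the loss.
def loss(w):
    return (w - 3.0) ** 2        # toy loss with its minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)       # derivative of the loss

w = 0.0                          # starting point of the "search"
for _ in range(100):
    w -= 0.1 * grad(w)           # step against the gradient

print(w, loss(w))                # w converges near 3.0, the local (here global) minimum
```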

IOW, I don’t think it is productive to use such an S(Key) = Item definition of a search engine, because it leaves out the inherent computational details.

So when @david.pfx said (below), current DL really is inherently a search engine. Why? Because it is a system that searches for optimal parameters that can fit/model some dataset. However, it is engineered to predict or classify (what we see right away). If we restrict the search engine definition to S(Key) = Item, then of course DL is not a member of it; more importantly, even the internet search engines of today are not members either. Only a relational database search engine can perhaps be accepted under this definition. My best guess is that we are not talking about this naive type of search engine. Therefore it is more productive to use a generalized search engine definition.

FWIW my working hypothesis is:

  1. Current ML is not AI, it’s mostly just a fancy search engine. It searches a multidimensional space for matches on text/images. It takes a lot of parameters to define interesting regions in that space. Practical ‘AI’ adds an output layer of engineered code to do something useful with the search results.
1 Like

A search engine is a piece of software that takes some kind of input (text, picture, etc), searches a trove of documents obtained from the Web (or similar source) and returns those with various kinds of similarity.

The brain is not a search engine, and NNs do not on their own produce outputs they were not trained on. However, a common feature of all useful search engines is that they then present the retrieved documents with some degree of ranking and/or reformatting.

By this measure GPT uses text to find text, but also applies a gloss of formatting to generate what appears to be original output. This part is good old-fashioned software-engineered code layered on top.

Genuinely novel output would falsify this hypothesis, so it’s an important thing to define. Suggestions welcomed.

One place to start might be by introducing words known not to exist in the training corpus, e.g. non-existent people’s names, place names, sports like farnarkling, etc.

1 Like

If I may ask, what is the point of comparing sophisticated algorithms/systems to this kind of definition of a search engine? Isn’t this too naive a definition? It’s likely that only a relational database search engine can fit this definition.

For this type of search engine definition, I disagree with the statements that imply NNs are search engines. It’s too bad a comparison, IMO. For GPT, NNs search their params to predict the next word or words; therefore their search space is not the input space (e.g. keys, words, etc.) but the parameter search space. NNs don’t search their inputs or the like; they predict or classify them.

1 Like

As I understand it, GPT searches its corpus for words and phrases, and the context around them. Then it assembles them into text, according to what it finds.

The billions of parameters are equivalent to a multi-dimensional space, and the search is for words and phrases found along some dimension. If the target is “bought XXX at the supermarket”, then GPT will find that phrase in its documents (all written by humans) and find that XXX is most likely (say) bread, or fruit, or meat, and probably not cement or elephant or motor car. It has no model for what constitutes any of those things, just fragments of text showing how they have been used.
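Here is a toy rendering of that hypothesis (my own illustration of the claim, not a description of how GPT is actually implemented): scan human-written text for the pattern and rank the fillers by how often they occur.

```python
from collections import Counter
import re

# The "search" hypothesis in miniature: look up "bought ___ at the supermarket"
# in human-written documents and rank the fillers by frequency.
corpus = [
    "I bought bread at the supermarket yesterday.",
    "She bought fruit at the supermarket on Sunday.",
    "We bought bread at the supermarket again.",
    "He bought meat at the supermarket for dinner.",
]

fillers = Counter()
for doc in corpus:
    fillers.update(re.findall(r"bought (\w+) at the supermarket", doc))

print(fillers.most_common())   # [('bread', 2), ('fruit', 1), ('meat', 1)]
```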

I call it a search engine because of how it retrieves bits of text written by humans using parameters as the search index, and then uses what it finds to build its output. The more the parameters, the more it looks like a gigantic search engine.

It’s a hypothesis, open to falsification. That’s science.

2 Likes

Yeah, that “indexed” has to be taken with some relaxed semantics to describe NNs’ “searching” at work. I assume mathematical algorithms would imply reductionism, but an NN’s “index” is obviously a holistic thing.

1 Like

I would expand a bit: “mental models” are products of our “intelligence”; only with our (or animals’, at their levels) “innate intelligence” can we “build models”. This ability to design things comes from evolution, if you don’t believe in the existence of a world-creating god, or some “higher-order designer” not considered a god.

So there can be hierarchical “intelligence”, existing at different orders/levels?

Then creativity (including designing/inventing novel things) is itself an evolutionary process that happens in a brain, if you buy what William H. Calvin suggests in THE CEREBRAL CODE.

1 Like

We can perhaps use ‘search engine’ to metaphorically describe a basic deep learning ANN architecture, but even then, function approximator is a more apt description. It’s not proper to describe a math equation as a search engine. A search-engine-type AI would be something more like the Triadic Memory that was posted in this forum many months ago, but the term is definitely not fitting for a backprop-type AI that learns functions. The metaphor becomes even more stretched when describing current state-of-the-art DL architectures, especially ones containing recurrence and the other gimmicks they use to create a prediction.
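A toy contrast between the two terms (my own illustration, with made-up data): a lookup table can only return what was stored and has nothing to say about unseen inputs, while a fitted function interpolates.

```python
import numpy as np

# "Search engine" vs "function approximator", in miniature.
xs = np.linspace(0, np.pi, 20)
ys = np.sin(xs)

lookup = dict(zip(xs.round(3), ys))       # lookup table: key -> stored value
coeffs = np.polyfit(xs, ys, deg=5)        # approximator: a small polynomial fit

query = 1.2345                            # an input never seen during "training"
print(lookup.get(round(query, 3)))        # None -- nothing to retrieve
print(np.polyval(coeffs, query), np.sin(query))   # close to the true value
```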

3 Likes

My problem is there is little evidence of neural networks doing that; simpler NNs exploit properties of the tasks, and LLMs form circuits.

Thank you! And now can you kindly explain, after reading the Medium article, how GD has anything to do with a search engine, or how its finding a solution makes the network perform some sort of a “search”?

Because you are conflating backpropagation and neural networks. They are not the same!

With backpropagation, you find some set of parameters that minimize the loss. But the underlying algorithm learnt by the NN does NOT correspond to searching through anything. The NN can learn anything - it’s Turing complete, after all. But that does NOT mean it’s searching, until you have strong evidence to support that.
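A minimal numpy sketch of that distinction (my own toy example, not from the thread): gradient descent searches parameter space while fitting, but once fitted, the model’s forward pass on a new input is plain arithmetic with no retrieval of stored training pairs.

```python
import numpy as np

# Fitting searches for parameters; inference does not search for anything.
rng = np.random.default_rng(2)
x_train = rng.uniform(-1, 1, size=50)
y_train = 2.0 * x_train + 1.0                        # the underlying rule

w, b = 0.0, 0.0
for _ in range(500):                                  # gradient descent = the "search"
    pred = w * x_train + b
    w -= 0.1 * np.mean(2 * (pred - y_train) * x_train)
    b -= 0.1 * np.mean(2 * (pred - y_train))

x_new = 123.456                                       # far outside the training inputs
print(w * x_new + b)                                  # forward pass: just w*x + b, no lookup
```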

Exactly this. When you test LLMs for generalization, you can reasonably guarantee whether some specific pattern of tokens falls in the corpus or not. Thus, for tasks where you teach it something totally OOD (say, a koijaytty is a man and a tokopornif is a woman; make a sentence using both words and English), the LLM should be able to do this simple task if it learnt the basics of what we’re asking - it should first synthesize a meaningful sentence, parse my query to locate the words to swap in, swap them, and then continue along its way.

And that’s how I interpret when someone calls an LLM a “search engine”. It was a leading hypothesis in the recent past, but quickly fell out of favour with strong evidence suggesting otherwise.

So according to TBT/HTM theory, and predictive coding in general, it’s the opposite - our intelligence is derived somehow from the brain solving the sole objective of modelling our world. Due to our intellect, we can model our world extremely well - well enough to hold higher-level thoughts that model things, which we call ‘abstract’.

In that case, GPT-3’s world is the one-dimensional stream of tokens that it must predict. Thus it has, by that definition, some level of intelligence if it’s able to predict them reasonably well. It’s not perfect, which means it’s not AGI, or at least HLAI, yet.

Kinda agree - but I prefer not using the term search engine because that’s nothing close to what an ANN does. Function approximator is a much closer term IMO.

Definitely agree - we have some LMs that are designed to act as a search engine, and some that are supposed to be as far from that as possible (where we even try to prevent such shortcuts).

1 Like

I think when we explain a model we have to also specify which level we are explaining.

If I wear my layman hat, I will likely see GPT as a typical dumb search engine based on its behavior (that level of description). This is a very evident observation on social media, especially on LinkedIn.

But if I wear my researcher hat, I will see GPT both as a typical dumb search engine and as an NN with memory that fundamentally searches for the right params to model a sequence of words. There are two states here, learning and computing; the former is the part I associate with a sophisticated search engine. A disclaimer: I need to study the attention-scoring part more, as I have not dug deep into that area.

By search engine, I prefer to equate it with the definition of a search algorithm (engine because it’s mechanistic) rather than a typical dumb black box that does only indexed search, simply because ML algorithms are inherently mathematical and usually more sophisticated.

One of the understanding gaps today, I believe, when it comes to explaining AI systems, is that in offline systems the learning part is almost always treated as a preparation/engineering process, while in online systems learning and executing/computing go hand in hand in real time and are hard to separate. Thus, explaining the former can ignore the learning process, while the latter requires both processes to be explained.

2 Likes