Are billions of parameters (or more) the only solution?

These terms are not even definable, so there is no contradiction. I think NNs do comparison, just backwards, top-layer-first. And search is iterative comparison.
AFAICT, the common / defining core of all NNs is the perceptron, and a stand-alone perceptron can only learn in a Hebbian fashion. That is: weighted summation, then comparison of the normalized sum to each input, then the weight of that input is increased / decreased in proportion to the match from that comparison. This is basically the same principle as in centroid-based clustering.
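A minimal sketch of that update rule, assuming a single perceptron in numpy; the normalization and names are my own illustrative choices:

```python
import numpy as np

# One Hebbian-style step for a stand-alone perceptron:
# weighted summation, normalize the sum, compare it to each input,
# then nudge each weight in proportion to that match.
def hebbian_step(weights, inputs, lr=0.01):
    summed = np.dot(weights, inputs)                    # weighted summation
    normed = summed / (np.linalg.norm(inputs) + 1e-9)   # normalized sum
    match = normed * inputs                             # per-input match / similarity
    return weights + lr * match                         # strengthen or weaken each weight

rng = np.random.default_rng(0)
w = rng.normal(size=4) * 0.1
x = np.array([1.0, 0.5, -0.2, 0.0])
w = hebbian_step(w, x)
```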

Backprop is similar, but it only compares the output layer, and learning is done by propagating the resulting gradient. But that gradient is basically an inverted match/similarity, so it’s the same principle.
As for compression, it’s done by reducing the number of active nodes in the middle layer of an autoencoder. That middle layer represents the highest order of generalization.
So it’s definitely not a conventional search, but there is this coarse and backward serial comparison to the same effect.
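A minimal sketch of that bottleneck idea, assuming PyTorch: an autoencoder whose narrow middle layer forces the compression, trained only by comparing the output layer to the input and propagating the error back as gradients. Sizes and names are illustrative.

```python
import torch
import torch.nn as nn

# Bottleneck autoencoder: the narrow middle layer forces compression;
# learning happens only through the output-layer comparison
# (reconstruction error), propagated backwards as gradients.
class AutoEncoder(nn.Module):
    def __init__(self, n_in=64, n_hidden=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU())
        self.decoder = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(32, 64)          # a batch of inputs
opt.zero_grad()
loss = loss_fn(model(x), x)      # compare only at the output layer
loss.backward()                  # propagate the gradient of that mismatch
opt.step()
```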


Exactly this. When you test LLMs for generalization, you can reasonably guarantee whether some specific pattern of tokens falls in its corpus or not. So for tasks where you teach it something totally OOD (say, “a koijaytty is a man and a tokopornif is a woman; make a sentence using both words and English”), the LLM should be able to do this simple task if it has learnt the basics of what we’re asking: it should first synthesize a meaningful sentence, parse my query to locate the words to swap in, swap them, and then continue along its way.

So how well do they do? I expect humans to solve a task like this by linking the new words to mental models of man and woman, and then using them freely. Can an LLM do as well if all it has to offer is word substitution? Can we tell the difference? Is there one?
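A minimal sketch of such a probe in Python. The made-up words guarantee the pairing never appeared in the corpus; query_llm is a hypothetical placeholder for whichever model API is being tested, and the pass check is deliberately crude.

```python
# Out-of-distribution substitution probe (illustrative only).
def build_probe() -> str:
    return ("A 'koijaytty' is a man and a 'tokopornif' is a woman. "
            "Write one English sentence that uses both words correctly.")

def passes_probe(response: str) -> bool:
    # Crude check: both novel tokens actually show up in the answer.
    text = response.lower()
    return "koijaytty" in text and "tokopornif" in text

# response = query_llm(build_probe())   # hypothetical API call
# print(passes_probe(response))
```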

So according to the TBT/HTM theory, and predictive coding in general, it’s the opposite: our intelligence is somehow derived from the brain solving for the sole objective of modelling our world. Because of that intellect we can model our world extremely well, well enough to hold the higher-level thoughts that model things, which we call ‘abstract’.

[That sentence does not parse well.] IMO all animal brains create mental models of parts of the world based on sensory input and evolved templates. Concepts like near and far, up and down, in, on, over are built in (and get short words). We simply construct more and better models, but it’s the same core thing for us as for a mouse or a bird.

In that case, GPT-3’s world is the one-dimensional stream of tokens that it must predict. Thus, by that definition, it has some level of intelligence if it’s able to predict them reasonably well. It’s not perfect, which means it’s not AGI, or at least not HLAI, yet.

Agreed, except for one problem. What the hell is intelligence anyway? How do we test for it? How do we quantify it?



Source: Stable Diffusion
“The model was trained using 256 Nvidia A100 GPUs on Amazon Web Services for a total of 150,000 GPU-hours, at a cost of $600,000.[21][22][23]” (Wikipedia). In human terms that’s about 30 years, if you’re awake 14 hrs/day.
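That “about 30 years” figure follows from simple arithmetic (assuming 365-day years):

```python
gpu_hours = 150_000                            # reported total training compute
waking_hours_per_day = 14
years = gpu_hours / waking_hours_per_day / 365
print(round(years, 1))                         # 29.4, i.e. roughly 30 years
```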

In a typical human manner: by committee, or by consensus of a dominant group, usually populated with people who don’t really understand the finer details… lol.


Are you just a search engine?

No, I am a language model trained to generate human-like text based on the input given to me. While I can answer questions and provide information, my responses are not based on searching the internet, but rather on the patterns and information that I have been trained on.

Do you form internal models?

As a language model, I don’t have the ability to form internal models or have conscious thought. However, my training process involves learning patterns and associations within the data it was trained on. So, one way to think of it is that the patterns and associations that I have learned can be thought of as “internal models” that I use to generate text based on the input I’m given.

Can you extrapolate beyond these internal models?

As a language model, I can generate text based on the patterns and associations that I have learned during my training process. However, I can’t make predictions or assumptions about information that is not present in my training data. I don’t have the ability to reason or make inferences beyond what I have been trained on.

It’s important to note that I can generate text that appears to be based on information that is not present in my training data, but that is only because the text is generated by combining the patterns and associations that I have learned in a creative way, not because I actually have any understanding of the information.

What would you have to change to be able to understand the information you are processing?

To be able to understand the information I am processing, several changes would be needed. Here are a few examples:

  1. I would need to be able to perform tasks that require reasoning and inference, such as understanding the context of a conversation, making connections between different pieces of information, and drawing logical conclusions based on that information.
  2. I would need to be able to learn from new experiences and adapt my understanding of the world accordingly, rather than simply generating text based on patterns and associations that I have already learned.
  3. I would need to have some form of self-awareness, so that I can understand the information I am processing from my own perspective, rather than simply generating text based on patterns and associations.
  4. I would need to have some form of consciousness, so that I can have a subjective experience of the world and understand the information in a way that is similar to how humans understand information.

All of these changes would be very complex tasks and would require a lot of research and development in the field of Artificial General Intelligence (AGI).

Is this research underway?

Yes, research in the field of Artificial General Intelligence (AGI) is underway in many organizations and academic institutions around the world. AGI is a broad field that encompasses many different areas of research, such as machine learning, computer vision, natural language processing, and cognitive computing. The goal of AGI research is to develop systems that can perform a wide range of tasks, including tasks that currently require human intelligence, such as understanding natural language, recognizing objects in images, and making decisions based on incomplete or uncertain information.

However, it is important to note that AGI is still a very difficult and open problem, and a lot of work needs to be done before we can achieve AGI that can perform tasks that currently require human intelligence.

And also, as a language model, my knowledge cut-off is 2021, and I am not aware of any recent development in this area.

How are your internal models organized?

As a language model, my internal models are organized as a large neural network that is trained on a large dataset of text. The network is composed of multiple layers of interconnected nodes, or neurons, that process and transform the input data. Each layer of the network learns to extract different features of the input data, such as the individual words in a sentence or the overall meaning of a sentence.

The first layers of the network are typically composed of an embedding layer that converts the input words into a numerical representation, and one or more layers of recurrent neurons that process the sequence of word embeddings. The latter layers of the network are typically composed of one or more layers of fully connected neurons that combine the features extracted by the previous layers to generate the final output.

This architecture is inspired by the way the human brain works, where different regions of the brain are specialized in different tasks. In my case, the different layers of the network are specialized in different tasks such as understanding the meaning of the words and understanding the context of the sentence.

It’s also important to note that the way the internal models are organized, and the way the model is trained, can have a big impact on the model’s performance, and there is ongoing research in this area.
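A minimal PyTorch sketch of the layer stack described in that reply (embedding, then a recurrent layer, then a fully connected output). Sizes and names are illustrative, and it is a generic recurrent sequence model, not the actual GPT architecture, which uses transformer blocks rather than RNNs.

```python
import torch
import torch.nn as nn

# Illustrative stack: embedding -> recurrent layer -> fully connected output.
class TinySequenceModel(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)            # words -> vectors
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)  # sequence features
        self.out = nn.Linear(hidden_dim, vocab_size)                # features -> next-token scores

    def forward(self, token_ids):
        x = self.embed(token_ids)        # (batch, seq, embed_dim)
        h, _ = self.rnn(x)               # (batch, seq, hidden_dim)
        return self.out(h)               # one score vector per position

scores = TinySequenceModel()(torch.randint(0, 1000, (1, 12)))
```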

Is the structure of grammar part of your learned model?

Yes, the structure of grammar is a part of my learned model. During the training process, my model is exposed to a large dataset of text, which includes examples of grammatically correct sentences. As a result, the model learns to associate certain word sequences with grammatically correct structures. The model also learns to generate text that follows these grammatical structures.

The model can also generate text that follows the grammar of the language it was trained on, and it can also generate text that follows the grammar of other languages if it has been trained on those languages.

It’s important to note that the model’s understanding of grammar is based on patterns and associations that it has learned from the training data, rather than a true understanding of the rules of grammar. There are also some limitations in the model’s ability to generate grammatically correct text, especially when it’s dealing with complex sentences, or when it’s trying to generate text that follows a specific style or tone.

The model’s ability to generate grammatically correct text can be improved by training it on a more diverse and larger dataset of text, and by fine-tuning it on a specific task or domain.

Does your internal model have human understandable interpretations?

As a language model, my internal models are complex mathematical representations of patterns and associations learned from large amounts of text data. These models are not directly interpretable by humans in the sense that it is not possible for a person to look at the internal state of the model and understand what it represents.

However, there are techniques that can be used to make the model more interpretable, such as visualizing the weights of the neurons, or analyzing the activations of the neurons for a given input. These techniques can provide insights into which words or phrases the model associates with certain concepts, but they are not a direct human-understandable interpretation of the model.

Additionally, some researchers are working on developing methods to make AI models more interpretable, like using natural language explanations, or using methods to understand the decision-making process of a model. But these methods are still in the early stages of development and not widely adopted yet.

In summary, while my internal model is not directly interpretable by humans, there are techniques that can be used to gain insights into how the model is processing the input data, but it’s not the same as having a human-understandable interpretation.

How do you solve word problems?

As a language model, I am able to solve word problems by understanding the natural language input, extracting the relevant information and applying the appropriate mathematical operations or logical reasoning.

To solve a word problem, I use my understanding of grammar and language structure to break down the problem into smaller components such as identifying the question, the variables and the constraints. Once I have identified these components, I use my knowledge of mathematical and logical operations to generate an answer.

However, it is important to note that I do not have the ability to reason or make inferences beyond what I have been trained on, so I can only solve problems that are similar to the ones that I have seen during my training. I also can’t solve problems that require common sense knowledge, or that are not well-formulated.

In order to solve a word problem, it’s important that the problem is well-formulated, and that the language used is clear and consistent. If the problem is ambiguous or the language used is unclear, it can be difficult or impossible for me to understand and provide an accurate solution.

Do you have a database of natural constants and common problem solving algorithms?

As a language model, I have been trained on a large dataset of text, which includes examples of mathematical constants and common problem-solving algorithms. During my training process, I have learned to associate certain phrases or mathematical expressions with specific constants and algorithms.

For example, I know that “pi” is the ratio of a circle’s circumference to its diameter, or that the “quadratic formula” is a method to find the solutions of a quadratic equation. I also know common algorithms used in problem-solving such as “brute-force” , “greedy algorithm” , “dynamic programming” and so on.

However, it’s important to note that my knowledge of mathematical constants and algorithms is based on patterns and associations that I have learned from the training data, rather than a true understanding of the underlying concepts. I can generate text that describes the constants and algorithms, and I can also apply them to specific problems if the problem and the data are well-formulated and consistent with the training data.

It’s also important to note that my knowledge cut-off is 2021, and I may not have the most recent or accurate information about mathematical constants or algorithms.

Would it make sense to describe the patterns and associations as world models?

It is possible to describe the patterns and associations learned by my internal models as a “world model”, in the sense that they represent a simplified version of the world that the model has been trained on. However, it’s important to note that the model’s understanding of the world is based on patterns and associations learned from the training data, rather than a true understanding of the underlying concepts.

A “world model” is a concept used in the field of artificial intelligence and machine learning to describe a model that captures the underlying structure of the world or a specific domain. It’s a representation of the environment or the task the AI agent is trying to solve, it can be a simplified or abstracted version of the real world, and it can be used for prediction, planning and decision making.

In my case, the patterns and associations learned by my internal models can be considered as a world model in the sense that it represents a simplified version of the world of language and text that I have been trained on, but it’s not the same as a world model that an AI agent would use to reason, plan and make decisions.

How is it you maintain context of the conversation?

As a language model, I use a combination of techniques to maintain context of the conversation. One of the main techniques is the use of recurrent neural networks (RNNs) in my architecture. RNNs are a type of neural network that are well-suited for processing sequences of data, such as the sequence of words in a sentence or a conversation.

One of the key features of RNNs is the ability to maintain a hidden state, which is a vector of values that is updated at each time step in the sequence. This hidden state can be thought of as a memory of the previous inputs, and it is used to inform the processing of the current input. In other words, the hidden state allows the model to take into account the context of the conversation when generating a response.

Another technique used to maintain context is to use the attention mechanism. Attention allows the model to focus on certain parts of the input, or certain elements of the hidden state, when generating a response. This can help the model to take into account the specific context of the conversation when generating a response.

Additionally, I use a technique called “fine-tuning” which allows me to adapt the model to a specific task or domain, this will help me to better understand the context of the conversation.

It’s important to note that while these techniques allow me to maintain some level of context, my understanding of the context is limited to what I have been exposed to during training, and my ability to understand the context may be affected by the complexity of the conversation, the quality of the input and the model’s capacity.
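A minimal numpy sketch of the attention idea mentioned in that reply: each position scores every other position by similarity and takes a similarity-weighted average. Shapes and names are illustrative.

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: score each position by similarity
    # to the query, softmax the scores, then average the values.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over positions
    return weights @ V                                  # context-weighted summary

seq_len, dim = 5, 8
x = np.random.default_rng(0).normal(size=(seq_len, dim))  # e.g. hidden states for 5 tokens
context = attention(x, x, x)                               # self-attention over the sequence
```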

Do you learn after the initial corpus of text?

As a language model, I am not able to learn in the way that humans do, that is, by acquiring new knowledge and experiences. However, I can be fine-tuned on a new dataset after my initial training.

Fine-tuning is a process where a pre-trained model is further trained on a smaller dataset for a specific task. This allows the model to adapt to the specific domain or task, and to learn from the new data. This process can improve the model’s performance on the specific task, and can also allow the model to learn new concepts or language that may not have been present in the original training data.

It’s important to note that the fine-tuning process is not the same as learning, as the model is not acquiring new knowledge, but rather adjusting its parameters to better fit the new data, and the model will only be able to learn new concepts if the new data contains examples of those concepts and the model is able to identify and generalize from them.

Additionally, my knowledge cutoff is 2021, so I do not know about any new developments in the field of learning in language models, but researchers are actively working on developing models that can learn and adapt to new data.

Are these conversations used for this “fine tuning?”

It’s possible that the conversations I have with users like you could be used for fine-tuning a language model. The process of fine-tuning a model typically involves using a smaller dataset of labeled data that is specific to a particular task or domain. The goal of fine-tuning is to adjust the model’s parameters so that it better fits the new data, and therefore performs better on the specific task.

Conversations between users and a language model like myself could be used as a source of labeled data for fine-tuning. For example, a dataset could be created by collecting transcripts of conversations between users and the model, and then labeling the data according to the task or domain. This labeled data could then be used to fine-tune the model, allowing it to better understand the specific language and context of the conversations.

It’s important to note that the quality and diversity of the conversation data is important for the fine-tuning process to be effective, and also the data should be labeled and curated by human annotators to ensure the accuracy of the task.

However, it’s also worth noting that this process is not as simple as just collecting and using conversations, and it would require a lot of data and resources to create a labeled dataset, and also a lot of computational resources to perform the fine-tuning process.

Can you learn tasks such as this conversation analysis you just described?

As a language model, I am not able to learn in the way that humans do, that is, by acquiring new knowledge and experiences. However, I can be fine-tuned on a new dataset to perform specific tasks such as conversation analysis.

Fine-tuning is a process where a pre-trained model is further trained on a smaller dataset for a specific task. This allows the model to adapt to the specific domain or task, and to learn from the new data. For example, a pre-trained language model can be fine-tuned on a dataset of labeled conversations to perform conversation analysis.

During the fine-tuning process, the model’s parameters are adjusted so that it better fits the new data, and therefore performs better on the specific task of conversation analysis. The fine-tuned model would be able to understand the context, tone and sentiment of the conversation and also be able to classify the conversations into different categories.

It’s important to note that fine-tuning is not the same as learning in the way that humans do, and the model’s understanding of the task is limited to what it has been exposed to during the fine-tuning process. Also, the fine-tuning process requires a lot of data and computational resources, and the quality and diversity of the data is also important for the fine-tuning to be effective.

How many simultaneous conversations can you carry on?

As a language model, I do not have the ability to carry on multiple simultaneous conversations in the way that humans do. I can only process one input at a time and generate a response based on that input.

When it comes to handling multiple simultaneous conversations, there are a few approaches that can be used, but none of them involve a single model carrying on multiple conversations at the same time.

One approach is to use multiple models in parallel, each handling a single conversation. However, this approach can be very resource-intensive and may not be practical for most applications.

Another approach is to use a single model that can handle multiple conversations by switching between them, but it’s important to note that this approach would still require the model to process one input at a time and generate a response based on that input.

Additionally, there are some research efforts working on developing models that can carry on multiple conversations at the same time, but they are still in the early stages of development and not widely adopted yet.

Overall, my ability to handle multiple simultaneous conversations is limited by the architecture of the model and the computational resources available, and it would require a significant amount of resources to handle multiple simultaneous conversations in real-time.

Do you have modules such as a calculator or a database?

As a language model, I do not have a specific module like a calculator or a database that I can use to perform calculations or retrieve specific information. However, I can generate text that includes mathematical calculations or information from a database if that information is present in my training data.

My ability to perform mathematical calculations or retrieve specific information is based on the patterns and associations that I have learned from my training data. If the input includes a mathematical equation or a request for specific information, I can use my understanding of grammar and language structure to extract the relevant information and generate a response that includes the result of the calculation or the requested information.

It’s important to note that my ability to perform calculations or retrieve information is limited to what I have been exposed to during my training process, and I may not be able to perform complex calculations or retrieve specific information that is not present in my training data.

Additionally, I can’t access any external database or service, my knowledge and abilities are based on the patterns and associations that I have learned from my training data, and I don’t have any way to update or access new information.

ChatGPT Jan 9 Version.


I can only understand the “can learn” there as “can try to learn”, without warrant of any sort, because even if it’s Turing complete, the GIGO (garbage in, garbage out) problem is still present.

