Can Transformers generate a story backwards, from the conclusion?

Oh, I see now: you store the model on the hard drive and do database-style lookups. That makes more sense now.

So what you have is a massive database of substrings and patterns and you just build a lookup system that can query it.

It is starting to look more and more like a variant of the wavefunction collapse algorithm, built for scalability.

I thought of doing something like this, but instead of storing a big database of strings I’d store a big database of sparse weights and dynamically look up and swap the relevant connections in real time. But it was easier said than done; I couldn’t even get a recurrent network to learn first-order sequences.

3 Likes

wait people here think that nested hashmaps are going to outperform NNs :neutral_face:

time to block numenta emails now I guess

2 Likes

Run it through ChatGPT; it can probably convert most of that to Python, at least enough to explain the idea. It may also convert your idioms to ML terms.
In general, it’s now better at constructive conversation than any forum I know. But you still need to know how to talk to it: ask a dumb question and it will give you a dumb answer.

2 Likes

User

Compare n-grams to self-attention

ChatGPT

n-Grams and self-attention are two different mechanisms used for processing sequences in computational models, particularly in the field of natural language processing (NLP).

n-Grams:

  • Local Fixed-Size Context: n-grams consider a fixed-size window of n contiguous items (like words in a sentence). For example, in a bigram (2-gram) model, the probability of the next word is predicted based on the previous word only.
  • Markov Assumption: They generally operate under the Markov assumption, which posits that the future state of a sequence depends only on a finite history of previous states.
  • Data Sparsity and Generalization: They can suffer from data sparsity because the model might not have seen all possible n-grams in the training data, leading to poor generalization for unseen n-grams.
  • No Parameter Sharing: Each n-gram has its own parameters; there’s no parameter sharing across different positions in the text.

Self-Attention:

  • Flexible Context Size: Self-attention mechanisms, as used in models like Transformers, allow each item in a sequence to attend to all other items in the sequence, thereby considering global context.
  • Weighted Relevance: The model learns to assign different weights to different parts of the sequence, signifying how much attention to pay to each part when considering a particular element.
  • Efficient Handling of Long-Range Dependencies: Self-attention is particularly effective for modeling long-range dependencies because it can directly relate distant items in a sequence without stepping through intermediate items.
  • Parameter Efficiency: Self-attention uses parameter sharing extensively, as the same attention mechanism is applied to all positions in the sequence.

In essence, while n-grams look at fixed, local patterns in the data, self-attention mechanisms provide a more dynamic and context-aware representation. Self-attention can flexibly learn which parts of the sequence are important and how they interact with each other to contribute to the overall understanding, thus offering a more powerful and nuanced way to model sequential data compared to the relatively simplistic n-gram approach.
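To make the contrast concrete, here is a minimal bigram counting model; the toy corpus and function names are just illustrative, and real n-gram models add smoothing and backoff on top of this.

```python
from collections import defaultdict

# Toy corpus; a real model would be trained on a large tokenized dataset.
corpus = "he went back to fishing and he went back to sleep".split()

# Count bigrams: P(next | prev) is estimated from counts of (prev, next) pairs.
bigram_counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def predict_next(prev_word):
    """Most likely next word given only the previous word (the Markov assumption)."""
    followers = bigram_counts.get(prev_word)
    if not followers:
        return None  # data sparsity: an unseen context yields no prediction at all
    total = sum(followers.values())
    return max(followers, key=lambda w: followers[w] / total)

print(predict_next("went"))   # -> 'back'
print(predict_next("whale"))  # -> None: no generalization to unseen n-grams
```

Every context keeps its own counts, which is exactly the "no parameter sharing" and sparsity problem described above; self-attention instead reuses the same learned projections at every position.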

3 Likes

It makes more sense than you do.

2 Likes

ChatGPT is honestly becoming more useless as time passes.

The more OpenAI fine-tunes it for family-friendliness, the more self-restrained and incapable it becomes.

“I apologize, but as an AI language model, I do not have the capability to do this simple task that I was capable of yesterday.”

But yeah, we are in this forum to talk to people. It’s also kind of annoying when people answer with a “here’s a ChatGPT answer for ya”; more often than not it’s not very useful.

3 Likes

As someone who has done the same several times, I can assert that leaving a forum doesn’t mean exiling yourself.

If you find a new community that is more active and receptive, please come back and tell me; I’m also looking for a new place.

1 Like

Assume a story ends with “and he went back to fishing!” Give me a short story that leads to this ending.

ChatGPT

Once upon a time, in a small coastal village named Serenity Bay, there lived an old man named Samuel. Samuel had spent his entire life fishing in the crystal-clear waters that stretched out before his quaint little cottage. He was known far and wide as the best fisherman in the village, and his love for the sea was unmatched.

One sunny morning, as the golden rays of the sun danced upon the gentle waves, Samuel prepared his fishing boat for another day of adventure. He gathered his fishing gear, loaded a basket with a simple lunch, and set out onto the tranquil waters. His old wooden boat, weathered by years of use, creaked softly as it gently rocked on the ocean’s surface.

As he cast his net and patiently waited for a catch, Samuel couldn’t help but reminisce about his youth. He remembered the days when he was full of vigor and used to sail far and wide, exploring uncharted waters and reeling in exotic fish. But as time went by, he had settled into a slower pace of life, content to fish close to home and savor the simple pleasures of the sea.

On this particular day, the sun hung high in the sky, and the sea was unusually calm. Samuel gazed at the horizon, the deep blue stretching as far as the eye could see. Suddenly, his fishing rod bent, and the line pulled taut. With a grin, he began reeling it in, his weathered hands working skillfully. As the fish emerged from the water, it sparkled like a gem, its scales shimmering in the sunlight. It was the most beautiful fish Samuel had ever seen.

He carefully placed the magnificent catch in his basket, but as he did, he couldn’t help but feel a sense of wonder. The sea had always been his true love, his constant companion, and it had never ceased to amaze him. Yet, lately, he had taken it for granted.

With the stunning fish safely stored, Samuel made a decision. He would not let the years pass him by without fully embracing the beauty of the sea that had given him so much. He realized that the true joy of fishing was not just in the fish themselves but in the deep connection he felt to the ocean.

With newfound determination, Samuel hoisted the sails of his boat and set a course for the open sea. He would explore once more, savor every moment, and appreciate the world beneath the waves. The wind filled his sails, and the boat cut through the water, taking him on a journey of rediscovery.

As the sun began its descent on the horizon, casting a warm, orange glow over the sea, Samuel’s heart felt light and free. He had found his way back to the passion of his youth, and the world seemed to come alive around him. And so, as the story goes, with a heart full of gratitude and a renewed love for the sea, he went back to fishing!

It seems that GPT can generate a story from the conclusion.

2 Likes

I think he meant something like this:

fishing! to back went he sea, the for love renewed a and gratitude of full heart a with goes, story the as so, And him. around alive come to seemed world the and youth, his of passion the to back way his found had He free. and light felt heart Samuel’s sea, the over glow orange warm, a casting horizon, the on descent its began sun the As

1 Like

The brain can’t really do that either, so that’s not an important feature to have. I wonder if Bullbash’s AI has some sort of backprop-like mechanism somewhere in the system? Without it, I think it’s almost impossible. It’s like we’re trying to make GOFAI, or build the brain component-by-component from scratch, which would take an ungodly amount of trial and error. A skip-gram and n-gram based system, without something else that lets it mimic the attention mechanism of the Transformer, isn’t going to work, I think.

1 Like

I might be wrong here, but what attention seems to do is compute a “weight” for every skip-N bigram (arbitrary pair of tokens) within an input text, then use these values to “distort” the embedding of the newest token by adding to it the embeddings of the previous tokens, each magnified by its corresponding weight value.
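A tiny NumPy sketch of that reading, with made-up embeddings and a single head (real Transformers use learned query/key/value projections per layer and many heads, plus normalization):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # embedding size (arbitrary for the sketch)
tokens = rng.normal(size=(5, d))        # embeddings of the 5 tokens seen so far

# Learned projections in a real model; random placeholders here.
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

newest = tokens[-1]                     # the token whose embedding gets "distorted"
q = newest @ W_q                        # its query
K = tokens @ W_k                        # one key per token: each (newest, other) pair is a skip-N bigram
V = tokens @ W_v                        # the values that get mixed in

scores = K @ q / np.sqrt(d)             # one raw weight per token pair
weights = np.exp(scores - scores.max())
weights /= weights.sum()                # softmax: how much attention each token receives

# Residual-style update: the newest embedding plus the weighted values of all tokens.
updated = newest + weights @ V
print(weights.round(3))
```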

2 Likes

I guess I have to ask - is the Bullbash system able to answer questions and write stories like ChatGPT?
Not theoretically, but an actual working system that I can see running?

Can they read backwards?

1 Like

That is about what plain vanilla RNNs were doing years ago. LLMs do a much better job of making the whole thing coherent and working with the actual meaning of what is being generated.

Still, I think an RNN is the way to go; what probably makes them so poor at the job is that most designs try to store short-term memory in the activations rather than in the weights.

I believe we need to do the same as GPT models and go with the philosophy of “store absolutely everything, retrieve only what you need”.

The main challenge is how to learn what you need to retrieve.
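The most literal version of that philosophy I can picture is an episodic key–value store with nearest-neighbour retrieval; the sketch below uses random context vectors and an identity “query” function purely as placeholders, since learning that query mapping is exactly the open problem:

```python
import numpy as np

class EpisodicMemory:
    """Store every (context vector, next token) pair; retrieve the most similar ones."""
    def __init__(self):
        self.keys, self.values = [], []

    def store(self, context_vec, next_token):
        self.keys.append(context_vec / np.linalg.norm(context_vec))
        self.values.append(next_token)

    def retrieve(self, query_vec, k=3):
        q = query_vec / np.linalg.norm(query_vec)
        sims = np.array(self.keys) @ q          # cosine similarity to every stored key
        top = np.argsort(sims)[-k:][::-1]
        return [self.values[i] for i in top]

def make_query(context_vec):
    # The hard part: learning *what* to retrieve. Identity here; a real system would train this.
    return context_vec

rng = np.random.default_rng(1)
mem = EpisodicMemory()
for tok in ["fishing", "boat", "sea", "net"]:
    mem.store(rng.normal(size=16), tok)         # fake context vectors, just to exercise the store

print(mem.retrieve(make_query(rng.normal(size=16))))
```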

I’m pretty sure the solution is already somewhere among all the papers about the hippocampus and PFC-basal-ganglia loops, too bad I suck at reading those.

3 Likes

That’s interesting; I wonder how that could work.

I recall a paper on test-time training in which, when given a certain prompt,

  • it generated an embedding of the prompt
  • a vector database was queried for a few dozen relevant documents
  • then the main LLM was briefly fine-tuned on these documents
  • then it ran the actual inference on the prompt.

here it is

The paper above hints at one solution to that (no learning, just similarity search over a bunch of documents, which could be specialized by theme, a “personal thinking & talking history”, or whatever combination).
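From memory, the pipeline looked roughly like the pseudocode below; `embed`, `vector_db`, `finetune`, and `generate` are hypothetical stand-ins I made up, not a real API:

```python
# Hypothetical sketch of a test-time-training loop; none of these callables name a real library.

def answer(prompt, base_model, vector_db, embed, finetune, generate, k=32):
    """Retrieve relevant documents, briefly adapt the model on them, then answer."""
    query_vec = embed(prompt)                        # 1. embed the prompt
    docs = vector_db.search(query_vec, top_k=k)      # 2. fetch a few dozen relevant documents
    adapted = finetune(base_model, docs, steps=20)   # 3. short test-time fine-tune on those documents
    return generate(adapted, prompt)                 # 4. run the actual inference on the prompt

# The simpler variant mentioned above skips step 3 and just feeds the retrieved
# documents back into the prompt (plain retrieval-augmented generation).
```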

2 Likes

I always end up misunderstanding what people say…

My understanding is that the activation would be the forward-pass / carried value in the network, whereas the weights are the multipliers (attenuators/amplifiers). The weights are influenced by incremental “learning” from one to a few passes of input; however, all input is part of the learning, so you seem to imply continuous learning at all times for all inputs?

The weights would alternatively need a duality, or a relative decay offset, to embody an attention equivalent with a relative temporal decay? This network would then never learn, and would interact like H.M.: completely fresh after a couple of minutes of inactivity?

I’m curious, as it’s an interesting and different approach.

That was my work philosophy working with data (market/systems) starting in the ’90s… storage was always the issue, and now it’s bandwidth between processors.

2 Likes

If you “store everything” at the input stage (including the full temporal dimension), you don’t need to store the skip-grams separately, as they are (or should be) an inherent feature of how your network is structured.

I have not tried this yet because I have no way to verify the output… Can ChatGPT recall any of its training data word for word?

1 Like

The difficulty is here, in my opinion. We need to overcome the scaling issues, I think. One approach I’m currently investigating is related to triadic memory. This topic was suggested by a member of this community in one of my threads, and it made me discover an entire thread around it here: Triadic Memory — A Fundamental Algorithm for Cognitive Computing - #8 by cezar_t.

The problem I see with this model is the size it takes in memory, even before storing any values in it. It’s O(n^3) in the number of bits for the triadic memory, which is… bad… but O(n^2) for the dyadic one. The concept is interesting, though. It made me see a link with Bloom filters, because one can view the dyadic memory as a counting Bloom filter (CBF) where the hash functions are effectively defined by the way the Hamming distance determines the neighbors, and thus the locations of the counters to be updated.
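For intuition, here is a from-scratch toy dyadic counter memory over sparse binary vectors; the real Triadic/Dyadic Memory code differs in its details, but this shows where the O(n^2) counter array and the counting-Bloom-filter feel come from:

```python
import numpy as np

N, P = 64, 4                    # vector width and number of active bits (toy values)
rng = np.random.default_rng(2)

def random_sdr():
    v = np.zeros(N, dtype=np.uint8)
    v[rng.choice(N, size=P, replace=False)] = 1
    return v

class DyadicMemory:
    """O(N^2) counters: storing (x, y) increments C[i, j] for every active bit pair."""
    def __init__(self):
        self.C = np.zeros((N, N), dtype=np.uint16)   # allocated up front, even while empty

    def store(self, x, y):
        self.C += np.outer(x, y)                     # counter update, much like a counting Bloom filter

    def recall_y(self, x, p=P):
        votes = x @ self.C                           # sum the counter rows selected by x's active bits
        out = np.zeros(N, dtype=np.uint8)
        out[np.argsort(votes)[-p:]] = 1              # keep the p columns with the most votes
        return out

mem = DyadicMemory()
pairs = [(random_sdr(), random_sdr()) for _ in range(20)]
for x, y in pairs:
    mem.store(x, y)

x0, y0 = pairs[0]
print((mem.recall_y(x0) & y0).sum(), "of", P, "bits recovered")
```

Note the memory cost: the counter array exists at full size before a single pair is stored, which is exactly the complaint above.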

I also thought about embedding things in 2D or higher-dimensional spaces like you are suggesting, but the difficulty is that one somehow needs to perform geometric transformations that keep the relations between some points while transforming others, without breaking everything. What would be the objective function for selecting the right points and the right parameters for the transformation? I thought maybe of having a reference frame for each word in a dataset of sentences and updating it based on co-occurrence of words, but that would likely break all the already-established relations.

3 Likes

Nah, I was thinking you could have two types of synapses: ones that are permanent and learn normally with a learning rate, and ones that get zeroed out at every trial and only serve as a scratch pad.
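Something like this sketch is what I picture; the sizes, the Hebbian scratch-pad rule, and the learning rates are placeholder choices, not a worked-out design:

```python
import numpy as np

rng = np.random.default_rng(3)
d_in, d_out = 16, 16

W_slow = rng.normal(scale=0.1, size=(d_out, d_in))   # permanent synapses, trained with a small learning rate
W_fast = np.zeros((d_out, d_in))                     # scratch-pad synapses, wiped at the end of every trial

def forward(x):
    return np.tanh((W_slow + W_fast) @ x)            # both kinds of synapse contribute to the same connection

def scratch_update(x, y, rate=0.5):
    """Within a trial, stash short-term context in the fast weights (outer-product Hebbian rule)."""
    global W_fast
    W_fast += rate * np.outer(y, x)

def end_of_trial():
    global W_fast
    W_fast[:] = 0.0                                  # zero the scratch pad, keep the slow weights

x = rng.normal(size=d_in)
y = forward(x)
scratch_update(x, y)
print(np.allclose(forward(x), y))   # False: the scratch pad now biases the response within the trial
end_of_trial()
```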

I kind of want to make it so that we could train the network’s weights and just save the weights instead of the documents; we could then search a synapse database and “sum” the relevant weights dynamically to change what the LLM “remembers” at any given time.

I don’t think it’s necessary. What’s so special about 2D? Why not 3D or 1024D?

I don’t think storage is a problem; the storage format is. We need a model that remembers relevant info. How do you make it remember what has been stored? What symbols should you use as keys and queries? How do you transform retrieved symbols into a prediction? That’s what I think the challenge is, a challenge that GPT solves by backpropagating through everything. But we don’t like backprop here, so we have to figure out something else.

Here’s a rough architecture I’m thinking of.
We could have just a fairly large dyadic memory (a.k.a. SDM); everything the language model sees gets stored in there, in the same format it uses for inference. Once it gets a new token, it does a small pre-processing step to embed context into it and transforms it into a key. This key is like a fuzzy superposition of tokens it might want to remember, and it uses it to query the SDM; what it gets back is a fuzzy superposition of stored “stuff”. Then we transform the “stuff” into a prediction for the next token. Once we get the true next token, we need to find a way to train the system to get the “stuff” right, meaning we must find what to query to get the right next token.
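In code, the loop I have in mind would look roughly like this; the context-to-key step and the stuff-to-prediction step are just the crudest possible stubs (a union of recent token SDRs, and a nearest-SDR lookup), because those are exactly the unsolved parts:

```python
import numpy as np

N, P = 256, 8                           # SDR width and active-bit count (arbitrary toy values)
rng = np.random.default_rng(4)
token_sdrs = {}                         # fixed random SDR per token (stand-in for an embedding table)

def sdr(token):
    if token not in token_sdrs:
        v = np.zeros(N, dtype=np.uint8)
        v[rng.choice(N, size=P, replace=False)] = 1
        token_sdrs[token] = v
    return token_sdrs[token]

memory = np.zeros((N, N), dtype=np.uint32)      # dyadic memory: key bits x value bits, as counters

def make_key(context_tokens):
    # Pre-processing: fold recent context into one fuzzy key. Here just a union of
    # the last few token SDRs; this is the part that needs to be learned properly.
    key = np.zeros(N, dtype=np.uint8)
    for t in context_tokens[-3:]:
        key |= sdr(t)
    return key

def store(context_tokens, next_token):
    memory[:] = memory + np.outer(make_key(context_tokens), sdr(next_token))   # store everything it sees

def predict(context_tokens):
    votes = make_key(context_tokens) @ memory   # fuzzy superposition of stored "stuff"
    # Turning the "stuff" back into a token: nearest stored SDR by overlap (the other open problem).
    return max(token_sdrs, key=lambda t: int(votes @ token_sdrs[t]))

text = "the old man went back to fishing and the old man went back".split()
for i in range(1, len(text)):
    store(text[:i], text[i])
print(predict("the old man went".split()))      # most likely prints 'back'
```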

I have no idea if it would work; drawing diagrams is easier than actually doing the thing.

2 Likes