You can do it like a transformer, and “learn” which contexts to attend to with “attention”. Ten billion dollars of investment from Microsoft in OpenAI says clearly that it works quite well.
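In code terms, a minimal sketch of that first option is plain scaled dot-product attention (this leaves out the learned projections, multiple heads, positional encoding and everything else a real transformer has, so take it only as the bare mechanism):

```python
# Option 1 in miniature: each query "attends" over contexts with weights
# computed from learned representations (Q, K, V come from trained projections).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: similarity of the query to each
    context decides how much of each value gets mixed in."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # query-to-context similarity
    weights = softmax(scores, axis=-1)        # normalised attention weights
    return weights @ V                        # weighted mixture of values

# toy usage: one query attending over four context tokens, 8-dim embeddings
rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(attention(Q, K, V).shape)  # (1, 8)
```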
Or you can leave all the information in a network, and let the runtime context select what it wants to attend to.
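As a toy of the flavour of that second option (the data, the context index, and the scoring rule here are placeholders I've invented purely for illustration, not the actual mechanism, which would work over much richer networks of raw sequences):

```python
# Option 2 in miniature: keep the raw observations as a network and let the
# runtime context pick out what matters. Nothing is "learned" ahead of time;
# the grouping is formed fresh for each query.
from collections import defaultdict

# raw observations kept as-is: (left context, word, right context)
observations = [
    ("the", "dog", "barked"),
    ("the", "cat", "meowed"),
    ("the", "dog", "ran"),
    ("a", "cat", "ran"),
    ("the", "car", "ran"),
]

# index each word by the contexts it was actually seen in (the "network")
contexts_of = defaultdict(set)
for left, word, right in observations:
    contexts_of[word].update({(left, right), (left, None), (None, right)})

def select_for(prompt_left, prompt_right):
    """Group words at runtime by how many observed contexts they share
    with the prompt. A different prompt produces a different grouping,
    so the 'classes' are never fixed in advance."""
    scores = {}
    for word, ctxs in contexts_of.items():
        score = sum(1 for (l, r) in ctxs
                    if (l == prompt_left or l is None)
                    and (r == prompt_right or r is None))
        if score:
            scores[word] = score
    return sorted(scores, key=scores.get, reverse=True)

print(select_for("the", "ran"))  # ['dog', 'car', 'cat'] for this toy data
```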
To expand on that second solution…
I think a bigger problem than identifying the context to generalize over, bigger even than the entanglement of generalizations which Coecke et al. have focused on, is the fact that these generalizations actually act like expansions of the data, and there appears to be no limit to them. So any fixed set will always be incomplete, and the only way to capture them completely is to find the ones relevant to a given situation at run time. I went into this a lot in the “How Your Brain Organizes Information” thread. This post might be a good summary of (the first part of) that:
I gave a lot of examples, from the history of linguistics, maths, and philosophy, to support the idea that meaning can’t be completely abstracted. That in fact “learning” acts like an expansion of the data. But I thought the data that… @cezar_t presented was also good support - that doing more training over transformers acts much like just using more data:
So that’s the first option. “Learn” the context to “attend” to. And capture the entangled, quantum quality of generalizations by having a black box, where no one is sure what the structure is.
But I say it runs into the bigger problem: that generalizations expand. They are not only entangled, as Coecke et al. see, but they get forever bigger. The current solution to that seems to be to just make the model as big as possible. (Which also means that only the biggest entities can get involved at all, and the little guy is reduced to begging, or paying, for access to big company APIs.)
The real flaw with that is that no matter how big you make it, it will never be as “big” as human performance, and you’re forever chasing an asymptote:
The second option, the one that deals with this expansion, and the one I’m now focused on, is to leave all the information in a network, leave it “embodied” in a set of data essentially, and let the runtime context select what it wants to attend to. As I said earlier in this thread:
This explanation to @JarvisGoBrr might be a good summary: