Universal Encoder

Thank you for the reference slides. Great material.

Exactly my point, with reference to the OP.

Will go through the slides in more detail. Very interesting.

You might find this post interesting:

Yes, I guess my naive hope is that it will converge on the most useful / important features over time. But I'm not sure whether this would happen naturally or whether it is something that has to be explicitly engineered.

That is why there is a WHAT and a WHERE stream. The semantics of WHERE share a common framework and can be intercompared directly. Likewise - the semantics of WHAT share a common framework and can be intercompared. In both cases, the higher-order formats share enough properties that representations can mingle in the same space.


I understand, and this is possible, but it won't necessarily help in correcting the encoders themselves, since semantic meaning is learned from patterns in different data, and the system will map any SDR to a location regardless of its actual semantic validity with respect to the world (which is where encoders matter).

@jordan.kay I guess the possibility of correcting encoders seems plausible when a system has a lot of knowledge about the world and thus a lot of intuition with which it can validate new patterns. Not sure, though. But I certainly hope this is so.

I tend to agree, but I am not sure about this yet, or how it will play out for different data types. The different high-order representations can be mapped to one location, but their accuracy cannot be corrected that way. Even if the high-order representations contain the location information, the rest of the information is reduced from the encoder space.


For a well-known WHERE example - the entorhinal cortex successfully combines self-motion, vision, touch, body position, and the vestibular system all in the same area. For a WHAT example, see the Cortical IO products, which show how a map can fold together the semantic meaning from a significant fraction of the Wikipedia database. That information could be read or listened to and would form the same map.

Given that the data from those systems are semantically correct and valid, this combination is valid and helpful, but it could just as well combine faulty data using the same mechanisms.

Their encoder is interesting. I don't see how it isn't just a special case of text encoding, though. I have to read a bit more before I can come to a conclusion, but I am not inclined to call that a universal encoding mechanism from the information I have about it.

I have described the WHAT and WHERE streams as flexible and powerful methods to parse the physical world we find ourselves in. The end product should be compatible to the point that representations from various streams can mingle in the same neural structure. The addition of WHAT&WHERE sequences is sufficient to encode the WHEN of the world. I will allow that these systems are subject to errors (sometimes called illusions), but if the presentation is ambiguous, I don't know that any encoder could do any better with that data.

I am not sure how much more encoding you need.

If you are looking for a perfect encoder for all information you are doomed to fail - Gödel has proven that this is a fool’s errand.

I am not disregarding the WHAT&WHERE stream as a method to parse information about the world. What I am saying is that it isn't sufficient to be a universal encoder unless the WHAT encoding mechanism is made universal, and also that this in itself cannot be used by the HTM to rectify encoders.

I am looking forward to your example of WHAT semantic information that can't be put into words and then encoded in a system like the one Cortical IO makes.

The reason I go to the artificial text-based system instead of my much-preferred entorhinal grid system is that nobody has demonstrated that the WHAT stream works the same way as has been demonstrated for spatial information.

That’s not to say that it works this way or that way - it’s just that to the best of my knowledge - nobody has looked for it yet. For the reasons that I have elaborated above - I expect that it will be the same.

Out of curiosity - do you have a counter-proposal for a universal encoder?

Why would you want to put information into words? Lots of time and processing will be required even if we use that approach. But I like the idea of converting everything into an intermediate form; in fact, I suggested this earlier. Lots of semantics will be lost if we use words, unless we spend a lot of processing power to encode every little detail with them. I wouldn't want to train a system on video streams by converting (describing) those video streams into words, if that's what you meant.

My current stand is that I don’t think there can be a universal encoder. But we can try using genetic algorithms.

Perhaps this is off topic, but I've wondered this as well. Their encoding of words seems similar to a Latent Dirichlet Allocation. Would that perhaps be a good place to start when trying to develop a universal encoder, or something that could evolve into a universal encoder?
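To make that idea concrete, here is a rough sketch of what an LDA-based starting point might look like. It is only an illustration, not cortical.io's method; the tiny corpus, topic count, and bit-allocation scheme are arbitrary assumptions.

```python
# Minimal sketch: LDA word-topic weights -> sparse binary word vectors.
# NOT the cortical.io algorithm; corpus, n_components and the bit scheme
# are placeholder assumptions for illustration only.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "the cat sat on the mat",
    "dogs and cats are common pets",
    "stock prices fell on the market today",
    "the market rallied after the report",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)               # document-term counts
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# components_ has shape (n_topics, n_words); column j is word j's topic profile.
word_topic = lda.components_.T
word_topic /= word_topic.sum(axis=1, keepdims=True)  # normalize per word

def word_sdr(word, bits_per_topic=8, active_fraction=0.25):
    """Crude 'SDR': allocate a block of bits per topic and activate bits
    in proportion to the word's topic weights."""
    j = vectorizer.vocabulary_[word]
    weights = word_topic[j]
    sdr = np.zeros(bits_per_topic * len(weights), dtype=np.uint8)
    budget = int(len(sdr) * active_fraction)
    for t, w in enumerate(weights):
        n_active = int(round(w * budget))
        sdr[t * bits_per_topic : t * bits_per_topic + n_active] = 1
    return sdr

print(word_sdr("cat"))
print(word_sdr("market"))
```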


My first thought is that the system's capability would be reduced drastically if we converted everything into language or words; the amount of detail versus processing will be a trade-off up to a certain point, but after that we will lose some details. Secondly, it is easier to make encoders separately. Thirdly, I think using this would make the memory an associative, limited one, with perhaps less accuracy - good for classification but not for detailed predictions and inference.
I mean, language is good for communication, but we are hardly able to communicate our thoughts accurately using it. I think it's a long shot to expect that a system running solely on it will be of great use. It's certainly a good idea to implement and see, though. Perhaps it just might work. :crossed_fingers:

Also, thanks for the link to Latent Dirichlet Allocation - very interesting. The number of citations of the original paper is enormous!

I don’t seriously think that the Cortical IO system is how the brain does it. It is an abstraction of part of how the brain does things in much the same way that deep learning mimics portions of how the hierarchy of the brain works and enables point neurons to do more than a single layer is capable of. In both cases - mimicking even a small part of the brain’s function turns out to be very powerful. The Cortical IO “retina” captures the essence of the combinatorial semantics portion of the WHAT stream.

In the case of the WHAT stream - this is about as far as I have seen anyone go in implementing its essential features, and as limited as it is - the power of what it can do is impressive.

I have offered this model in the discussion because it is possibly the most intuitive portion of the WHAT stream and there is a working model with good explanatory papers to read.

The true WHAT encoding is far more subtle and I have shied away from bringing it to the discussion as I did not feel up to showing how the relatively complicated encoding fits the discussion at hand.

You seem genuinely interested in the limits of the Cortical IO system and why those limits keep it from being the general system that is employed in the human brain. Fair enough. I will bring out the big guns.

The WHAT parser is somewhat like the coding in the WHERE stream in that at the higher levels it is hard to relate what is sensed in the raw streams to the abstractions; they don’t look like anything we normally think of as what is presented to our senses. That said - much of what it does has been sussed out and this paper does a fair job of outlining the various mechanisms in the WHAT stream:

How neurons make meaning: brain mechanisms for embodied and abstract-symbolic semantics
https://www.sciencedirect.com/science/article/pii/S1364661313001228

For those that don’t feel like reading the whole paper here is the money shot:
“In this paper, four semantic mechanisms are proposed and spelt out at the level of neuronal circuits: referential semantics, which establishes links between symbols and the objects and actions they are used to speak about; combinatorial semantics, which enables the learning of symbolic meaning from context; emotional-affective semantics, which establishes links between signs and internal states of the body; and abstraction mechanisms for generalizing over a range of instances of semantic meaning. Referential, combinatorial, emotional-affective, and abstract semantics are complementary mechanisms, each necessary for processing meaning in mind and brain.”

Hey, just saying what I think. And we need to be specific when it comes to this. Also, I never assumed that it could be the way the brain does it. I am interested in how those limits keep it from qualifying as a proper universal encoder for different data in its raw form.

Is this about encoding? So the encoder will have multiple levels of processing, in which each level works on the output of the previous one?
If so, doesn't it make sense for the higher levels to be as close to what we actually sense?
If it is about the cortex's internal system, then it still makes sense for the higher-order patterns to be the closest to what we sense. (?)

Where in the paper do they mention common encoding for different data? In fact, I sense that they mention different types of semantic relationships even at the cortical level.

I am afraid you are going to have to spell out the mechanism in detail.

A fundamental problem with “the higher levels to be as close to what we actually sense?” is combinatorial explosions. There are a lot of ways to combine things.

I see this again and again in various AI experiments - they set up a small, promising experiment based on a "folk psychology" of how the brain works, capturing some aspect of what they think they are doing from introspection. The model usually works over some very limited domain, and the researchers are excited and publish a paper. As they try to scale it up they find that it's harder than they thought. Most quietly fade into obscurity. Grad students leave and the driving force behind it evaporates. Some doggedly press on, and I get to see the problems, and the attempts to get around the problems, that happen when you let a toy model encounter more of the real world.

It turns out the real world is a big complicated place. Who knew?

That said, other than a few fictional horror stories, there has never been a situation where a human encounters something and runs away screaming that whatever has been observed simply can't be parsed. The parsing may be inexact because there is no basis in fact for framing the observation, but humans have always been able to make some guess at what something is. Dads have always been able to sit down with the kids and offer some sort of explanation. It may be phlogiston or impetus, humors of the blood, or spontaneous generation of life. For some of the harder observations the explanation turned out to be "gods", but there is always some sort of framing possible.

I have been trying to understand the gap between what people do in their heads and the puny attempts to capture that in various AI experiments for many years. The most common root of these problems usually comes down to combinatorial explosions. There are a lot of ways to mix this thing with other things. Relationships get complicated.

You may have picked up on the fact that I consider parsing part of encoding: picking out the essential features of a thing and distributing its various parts here and there in higher-dimensional space. The encoding turns out to be a collection of features scattered throughout a collection of higher-dimensional manifolds.

A key feature is stable parsing to access these spaces on the fly. Fortunately - the learned data forms attractor basins to draw new observations to it.
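As a toy illustration of the attractor-basin idea (an illustration only, not a claim about the neural mechanism): store a few learned SDRs, and a noisy new observation falls into the basin of whichever stored pattern it overlaps most.

```python
# Toy illustration of attractor-like recall with SDRs (illustrative only):
# a noisy observation is "drawn toward" the stored pattern it overlaps most.
import random

N_BITS, N_ACTIVE = 2048, 40

def random_sdr():
    return set(random.sample(range(N_BITS), N_ACTIVE))

def noisy(sdr, flips=10):
    """Drop `flips` active bits and replace them with random ones."""
    kept = set(random.sample(sorted(sdr), len(sdr) - flips))
    return kept | set(random.sample(range(N_BITS), flips))

learned = {name: random_sdr() for name in ["coffee_cup", "stapler", "phone"]}

def settle(observation):
    """Return the learned pattern whose basin the observation falls into."""
    return max(learned, key=lambda name: len(learned[name] & observation))

obs = noisy(learned["stapler"])
print(settle(obs))   # -> "stapler" with overwhelming probability
```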

As you encounter more of the world, either through direct experience or indirectly through the encoded experience of others you form these higher dimensional manifolds as needed. You form them by learning. The parsing both forms the manifolds and assigns new perceptions to the existing manifold as is appropriate. Everything you learn is in terms of what you have learned before.

The tabula rasa is the innate behaviors that you were endowed with genetically; this is the grounding you build from. As you add dimensions you have something to frame with - scientific explanation wrestles puzzling bits of the world away from the gods.

By the time you get to these higher dimensions what is encoded does not look very much like the real world. The bits and pieces are scattered across ever expanding higher dimensional space.

Are all of these machine learning experiments? How many of these experiments are done using HTM-like methods - more importantly, using binary weights and higher-level SDR representations that combine patterns using spatial pooling? I don't think a combinatorial explosion will take place when the patterns up the hierarchy are sparser than the input patterns and are combined in a voting-type strategy. For example, sequences of patterns combine into 'higher-dimensional' sequences of sequences, and so on.
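To make the sparsity point concrete, here is a toy sketch (not NuPIC's spatial pooler - just an illustration with made-up parameters) in which the combination of several child SDRs is forced back down to a fixed number of active bits, so the combined representation stays the same size no matter how many patterns are folded in.

```python
# Toy sketch of why fixed sparsity keeps combined representations bounded.
# NOT NuPIC's spatial pooler; it just hashes a union of child SDRs down to
# a parent SDR with the same fixed number of active bits.
import hashlib
import random

N_BITS, N_ACTIVE = 2048, 40

def random_sdr():
    return frozenset(random.sample(range(N_BITS), N_ACTIVE))

def pool(child_sdrs):
    """Combine child SDRs into a parent SDR with fixed sparsity.
    Each active child bit deterministically 'votes' for one parent bit;
    the N_ACTIVE most-voted parent bits win."""
    votes = {}
    for idx, sdr in enumerate(child_sdrs):
        for bit in sdr:
            h = hashlib.sha256(f"{idx}:{bit}".encode()).digest()
            parent_bit = int.from_bytes(h[:4], "big") % N_BITS
            votes[parent_bit] = votes.get(parent_bit, 0) + 1
    winners = sorted(votes, key=votes.get, reverse=True)[:N_ACTIVE]
    return frozenset(winners)

a, b, c = random_sdr(), random_sdr(), random_sdr()
parent = pool([a, b, c])
print(len(parent))                               # at most N_ACTIVE bits
print(len(parent & pool([a, b, random_sdr()])))  # shares bits with a similar combination
```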

Please elaborate on this with respect to the following: are these actual higher-dimensional spaces, in which the encoded data is a multi-dimensional stream, or is this a way of saying that features are encoded such that combinations of some of those encoded features result in more features (such that the input stream is essentially a one- or two-dimensional stream at any given moment)? If so, then isn't that all there is to encoding?

What kind of manifold? What are its properties? Is it a tangible manifold in the form of connections?

With respect to encoders, there should be just one encoded input stream per input data per moment, right? I mean, you can do multiple encodings for multiple features or using different methods, but essentially, it's one input stream per encoding per moment. Please elaborate on what exactly these high-dimensional spaces and manifolds are, and in what exact way they can be represented.

With respect to cortical areas, the entire HTM structure, with multiple levels of hierarchy and many layers per level, will tend to have a saturation point for how much information it can learn while keeping the quality of its predictions the same, without forgetting previous information. If this is so, then the levels of abstraction of the data, the sparsity, and other related parameters are decided in advance, and there won't be any combinatorial explosion, so to speak.

And again, with reference to the symbolic semantics in the previous post, how does that relate to universal encoding in its true sense?

This thread has gone off topic and I’m not sure where to split it. @bitking @abshej please suggest a post to start a split and a new thread name for this discussion and I will clean it up.


I would ask @Bitking to suggest the topic since I am still querying how the method he described enables the creation of a universal encoder.
Though I would still suggest: Location mapping of semantic data to create universal encoder.

I've written a proof-of-concept variation on semantic folding to explore the possibility of a universal encoder (after a discussion with @jordan.kay on this thread). My implementation is intended to be a bit more general-purpose than the cortical.io implementation, though.

Their implementation, as I understand it, requires a complete data set ahead of time for pre-training (each snippet from Wikipedia is known ahead of time and given a specific coordinate on the semantic map). My implementation is instead trained on the fly, modifying its encodings of a particular input over time, using the concept of eligibility traces from RL to establish semantics from other nearby inputs.

The system assumes causality is an important element of establishing semantics (the more often two inputs are encountered near each other, the more overlap they will have in their encoding). So far I have only used it for generating word SDRs, though. The next step is to give it a non-language problem to solve, such as the hot gym data.
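For readers who want a concrete picture, here is a minimal sketch of how an eligibility-trace style encoder along these lines might look. It is a simplification with assumed parameters, not the actual implementation described above.

```python
# Minimal sketch of an eligibility-trace style encoder (a simplification,
# not the implementation described above). Inputs seen close together in a
# stream accumulate shared association strength and thus overlapping SDRs.
from collections import defaultdict

N_BITS, N_ACTIVE = 1024, 20
DECAY = 0.7          # assumed eligibility-trace decay per step

traces = defaultdict(float)                      # input -> current trace
assoc = defaultdict(lambda: defaultdict(float))  # input -> {other input: strength}

def observe(token):
    """Feed one input from the stream; strengthen associations with
    recently seen inputs in proportion to their eligibility traces."""
    for other, trace in traces.items():
        if other != token:
            assoc[token][other] += trace
            assoc[other][token] += trace
    for other in list(traces):
        traces[other] *= DECAY
    traces[token] = 1.0

def encode(token):
    """Derive an SDR: one bit per strongly associated neighbor (hashed),
    topped up with bits derived from the token itself."""
    neighbors = sorted(assoc[token], key=assoc[token].get, reverse=True)
    bits = {hash(("ctx", n)) % N_BITS for n in neighbors[:N_ACTIVE // 2]}
    i = 0
    while len(bits) < N_ACTIVE:
        bits.add(hash((token, i)) % N_BITS)
        i += 1
    return frozenset(bits)

for word in "the cat chased the mouse and the dog chased the cat".split():
    observe(word)

print(len(encode("cat") & encode("dog")))   # words seen in similar contexts overlap
```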
