Universal Encoder

I would ask @Bitking to suggest the topic since I am still querying how the method he described enables the creation of a universal encoder.
Though I would still suggest: Location mapping of semantic data to create universal encoder.

I’ve written a proof of concept variation on semantic folding to explore the possibility of a universal encoder (after a discussion with @jordan.kay on this thread). My implementation is intended to be a bit more general purpose than the cortical.io implementation, though.

Their implementation, as I understand it, requires a complete data set ahead of time for pre-training (each snippet from Wikipedia is known ahead of time and given a specific coordinate on the semantic map). My implementation is instead trained on the fly, modifying its encoding of a particular input over time, using the concept of eligibility traces from RL to establish semantics from other nearby inputs.

The system assumes causality is an important element of establishing semantics (the more often two inputs are encountered near to each other, the more overlap they will have in their encoding). So far I have only used it for generating word SDRs, though… next step is to give it a non-language problem to solve, such as the hot gym data.
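
To make the idea concrete, here is a minimal toy sketch of an on-the-fly encoder driven by eligibility traces. It is not the implementation described above (which has not been posted yet). The class name `TraceEncoder`, its parameters (`n_bits`, `n_active`, `decay`, `lr`) and the update rule are all assumptions chosen purely for illustration.

```python
import numpy as np


class TraceEncoder:
    """Toy online encoder: inputs seen close together in time drift toward shared bits."""

    def __init__(self, n_bits=256, n_active=16, decay=0.5, lr=0.2, seed=0):
        self.n_bits = n_bits
        self.n_active = n_active
        self.decay = decay        # how fast an input's eligibility fades per step
        self.lr = lr              # how strongly a recent neighbour pulls on the current input
        self.rng = np.random.default_rng(seed)
        self.weights = {}         # input -> float preference for every bit
        self.traces = {}          # input -> eligibility (1.0 when just seen)

    def _ensure(self, token):
        # Unseen inputs start with random preferences, i.e. a random SDR.
        if token not in self.weights:
            self.weights[token] = self.rng.random(self.n_bits)

    def encode(self, token):
        """Return the indices of the n_active most preferred bits."""
        self._ensure(token)
        top = np.argsort(self.weights[token])[-self.n_active:]
        return set(top.tolist())

    def reset(self):
        """Forget recent context, e.g. at a sentence boundary."""
        self.traces.clear()

    def observe(self, token):
        """Present one input and nudge its encoding toward recently seen inputs."""
        self._ensure(token)
        # Everything seen earlier becomes a little less eligible.
        for seen in self.traces:
            self.traces[seen] *= self.decay
        # Pull the current input's preferences toward the bits of recent neighbours.
        for seen, trace in self.traces.items():
            if seen == token:
                continue
            target = np.zeros(self.n_bits)
            target[list(self.encode(seen))] = 1.0
            self.weights[token] += self.lr * trace * (target - self.weights[token])
        self.traces[token] = 1.0
        return self.encode(token)


encoder = TraceEncoder()
for _ in range(30):
    for sentence in (["cat", "purrs"], ["sun", "shines"]):
        encoder.reset()
        for word in sentence:
            encoder.observe(word)

print(len(encoder.encode("cat") & encoder.encode("purrs")))  # overlap of neighbours
print(len(encoder.encode("cat") & encoder.encode("sun")))    # overlap of never-adjacent inputs
```

In this toy the pull is one-directional and unbounded, so with enough repetitions the second word ends up sharing nearly all of its bits with the first. A real implementation would presumably limit that so related inputs overlap only partially.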


Can you share it? I’d like to see it!

Yes, I’ll be posting it on the HTM community github soon. I’m in the process of cleaning up the initial implementation (I explored several variations of the process, so it is a bit of a mess right now). I tend to have ADD and bounce between several projects at once, so progress on any one of them tends to be slow (I’m sure that is a bit annoying at times).


If you guys think all this discussion is relevant to “Universal Encoder” I’m fine leaving it alone. Just trying to keep the forum clean. Readers, let me know what you think in a PM please.

I do wish you would read the “How neurons make meaning” paper I linked above while holding your questions in mind. Many of your answers are there.

I have tried to group your questions to capture what I think you are asking and try to answer that.

abshej: Are all these machine learning experiments?

Bitking: Mostly yes, these were single-purpose approaches that were thought to be extensible to general intelligence. Some examples that come to mind are various symbolic manipulation programs; the actors in these efforts were folks like Minsky and Simon. These early efforts were extended through to modern expert systems, with the current wunderkind being Watson. A special shout-out to Fifth-generation programming languages; Logic Programming was going to rule the world. Spoiler alert: It did not.

abshej: How many of these experiments are done using HTM like methods, more importantly, using binary weights and higher level SDR representations that combine patterns using spatial pooling?
Bitking: None. Keep in mind that without the interactions with the limbic system HTM is not going to do everything by itself either.

abshej: I don’t think combinatorial explosion will take place when the patterns up the hierarchy are more sparse than the input patterns and combined in a voting-type strategy.
Bitking: Tentatively agree - but the devil is always in the details.

abshej: For example, sequences of patterns combine into ‘higher dimensional’ sequences of sequences and so on.
Bitking: Tentatively agree - the secret sauce is how these patterns in different maps in the brain are able to work together. Again - I really wish you would look at the “How neurons make meaning” paper.

Bitking: The encoding turns out to be a collection of features scattered throughout a collection of higher dimensional manifolds.

abshej: Please elaborate on this with respect to the following: are these actual higher dimensional spaces wherein the encoded data is a multidimensional stream, or is this a way of saying that features are encoded such that combining some of those encoded features results in more features? (Such that the input stream is essentially a one- or two-dimensional stream at any given moment.)
abshej: With respect to encoders, there should be just one encoded input stream per input data per moment, right? I mean, you can do multiple encodings for multiple features or using different methods, but essentially, it’s one input stream per encoding per moment.

Bitking: All the streams come pounding in at the same time. Attention and the global workspace combine to focus on some very small part of these streams. A serial process parses these parallel streams, resonating with your stored memories. The learned part is the delta between what you have learned and the “surprise” in the sensed streams.

Bitking: As you encounter more of the world, either through direct experience or indirectly through the encoded experience of others you form these higher dimensional manifolds as needed.
Bitking: By the time you get to these higher dimensions what is encoded does not look very much like the real world. The bits and pieces are scattered across ever expanding higher dimensional space.

abshej: If so, then isn’t that all there is to encoding?
abshej: What kind of manifold?
abshej: What are its properties?
abshej: Is it a tangible manifold in the form of connections?
abshej: Please elaborate on what exactly are these high dimensional spaces and manifolds and in what exact way can they be represented?

Bitking: I suspect from your questions that you would find some investigation into manifolds, in general, to be useful. The mapping/connection between these different semantic parsings is the manifold. Wiki can explain it better than I can.

I like the visualization in this Quora post - I think that the Parallel coordinates and Arc-Diagram examples are very applicable to understanding neural map inter-connections.

abshej: And again, pertaining to the previous post with reference to the symbolic semantics, how does that pertain to universal encoding, in its true sense?

Bitking: Consider the following: You read about a car crash. You hear the same car crash. You see the same car crash. You experience the same car crash.
All engage different senses. All describe the same event and are quite possibly able to form the same detailed memories even though they are in completely different modalities. A universal encoder should be up to this task.

This doesn’t answer the question. So, it’s just an abstract high dimensional space.

Different parts of the encoded input are computed separately and then combined and they may be mapped separately but this doesn’t pertain to the encoding. They are encoded as a single stream of data, per timestep.

You don’t need symbolic attachments to these patterns to relate them, HTM works without them. And I suspect symbolic semantic relations are formed after the encoding is done.

Instead, I like to imagine an actual layer of the stream divided into parts, where each part is processed separately and later combined to form high-order sequences, ascending the hierarchy. Other abstractions introduce unnecessary complexity.

I have some questions for you so you can help me understand your point of view.

I hold it as an article of faith that everything you learn is in terms of what you have learned before.

Referring to your prior statement: Does a baby have the categories to parse these streams of data and form these higher order sequences? If not - what changes as the baby matures?

How are the various streams of data drawn to the attractor pools in each neural map so they are able to be combined and intercompared in recollection? I am not a believer in grandma-cells so I expect that these distributed patterns cover relatively large sections of neural tissue - this same tissue holds many other memories so there must be some way of choosing the best match of the memories already formed to make a continuous map of related semantic content. How does this encoder pick which cells just happen to line up with prior symbolic content?

How are these disparate pools of information scattered all over the brain unified into a memory? (the superposition catastrophe)

It is well established that if you do not learn language by a certain point you never will and that you will have significant cognitive deficits. One of the things a universal encoder must be able to encode is language.

There’s no doubt that syntax is what human levels of intelligence are mostly about — that without syntax we would be little cleverer than chimpanzees. The neurologist Oliver Sacks’s description of an eleven-year-old deaf boy, reared without sign language for his first ten years, shows what life is like without syntax:

Joseph saw, distinguished, categorized, used; he had no problems with perceptual categorization or generalization, but he could not, it seemed, go much beyond this, hold abstract ideas in mind, reflect, play, plan. He seemed completely literal — unable to juggle images or hypotheses or possibilities, unable to enter an imaginative or figurative realm… He seemed, like an animal, or an infant, to be stuck in the present, to be confined to literal and immediate perception, though made aware of this by a consciousness that no infant could have.

Similar cases also illustrate that any intrinsic aptitude for language must be developed by practice during early childhood. Joseph didn’t have the opportunity to observe syntax in operation during his critical years of early childhood: he couldn’t hear spoken language, nor was he ever exposed to the syntax of sign language.

This is true after a point in time while learning the basic patterns and if the low level patterns don’t change drastically. If they do, then the system will have to make new representations ascending all the way up the hierarchy to encapsulate the new low level patterns.

Studies show that in babies, the neocortex is not connected to the mid brain or the lower brain at all. The connections slowly begin to form starting from the right side of the limbic system to the right side of the neocortex. They function in a very unsophisticated manner and don’t have a ‘thinking’ brain so to say. Since growing babies are forming connections not just in the neocortex, but to the neocortex, they are allocating specific neocortical regions to specific tasks.

The encoders don’t work with symbolic content, they work with data differently.
I think the connections from the sensory modalities to the brain regions are genetically determined. The visual part of your brain is not the auditory part of some other person’s brain. The sense organs work despite the presence of neocortex. The encoding happens whether or not the neocortex receives it.

Language is learned by us using our primary senses, not a special encoder for language. The encoding of the data streams from the sensory modalities allows us to understand the semantics of language, coupled with the brain’s capability to relate different patterns together and treat the combined pattern as a high-order (in the hierarchy of abstractions of meanings) sequence/pattern.

So in your model of “encoding” do you draw a distinction between the stream of data projected on the cortical sheet and the activation pattern formed in each neural sheet?

As that pattern goes up the H of HTM is it deterministically the same or different activation pattern based on context?

Before you jump to a flippant answer consider some of the variations of word meaning based on where it is in a sentence. To eliminate any possibility of the surrounding words influencing the parsing - project each word in turn on a screen all by itself.

When asked to explain what you just read in this scenario, in your own words, are you accessing symbolic representation? If so - are you encoding symbols?

At what point do you have to admit the possibility that symbolic representation is part of the encoding?

In the model of encoding as it takes place in sensory modalities, once the data leaves the encoder, it doesn’t matter where it goes.
Encoding doesn’t take place in the neocortex. If you are talking about the mechanism to use SDRs to encode symbolic meanings in the neocortex as a learning function, then it isn’t the same as the discussion about “encoders” in OP.
But nevertheless, the stream of data projected to the cortex is not the same as the activation pattern of the neural tissue (due to inhibition and abstractly combined patterns up the hierarchy).

I might be, or I might be using patterns that get predicted due to associations and the inference might be narrowing down as I read more and more words, thus the inference at the end being a representation of the (symbolic) meaning you tried to convey. Either way, back to encoders, my eyes don’t encode a special meaning to any word that I read.

Not at such a high level of abstraction (such as language); no reason to admit so.

So I am starting to see why we are talking past each other.

I see the processing in the cortical sheet to “finish” parking the memory in wherever it ends up in the cortex as part of the encoding process.

From what I am taking from what you just said - when the senses project the activation onto the cortex the encoding is done.

Is this how you see it?

If I have this correct - why is the parsing processing in the hierarchy of the cortex not encoding?

I understand that.

Yes, the sensory modalities are responsible for the encoding. And the mapping can then be seen as just the combination of the patterns from different sensory modalities, including location modalities.

Because we are talking about encoded data of different types which are used together to form stable representations. The encoding is done when the semantics from each data type are put in the output stream of the encoder (sensory modality).

Having said this, I see no reason why encoders cannot be in the cortex; just that they should be referred to as being modalities of cells encoding data differently and sending it to neocortical layers. Which further makes the task of a universal encoder much more difficult.

The processing in the cortex allows the WHAT and WHERE streams to be sorted into the compatible formats, in my mind, finishing the encoding.

I write embedded systems for a living and do lots of processing after I read the sensors to massage the data into standard formats. As far as I am concerned - that is encoding.

If you stop at the shores of the cortex I agree with you that there is no way to make a universal encoder.

If you allow that what the cortex is doing is part of the encoding process then I think you will agree with most of my assertions. This would include the semantic encoding portion.

FWIW: the wiki entry includes the processing and storage as part of the encoding process.

Yes. But still noting that the combination of this WHAT and WHERE stream is part of the process of combining and representing, which is part of learning, and not encoding. I am sorry, I just cannot agree on your terminology. But we agree on the actual functioning. :slightly_smiling_face:

I was thinking about the fact that you need to have some kind of feedback mechanism between the encoder and the HTM / NN / Learning structure that improves the encoder over time.

If that’s true you still have to ask the question, “ok, so what should the encoder do in the first place, to get the whole thing rolling?”

I’m thinking the only answer that makes sense here is to start with individual vs population analysis to pull out basic structure. Such as Latent Dirichlet allocation or Principal component analysis or other methods I know nothing about or a combination of them all.

If the structure of the data is not totally random noise and is not completely homogeneous then you have ‘differentiation’ in the data. You can just start by defining some of the major structures, and get finer and finer details through the feedback loop of the HTM. Just like how the HTM itself learns basic structures then exports that knowledge up the hierarchy, and is told by higher level regions what things it might look for. It’s the same principle, just a wider gap.

The question really is, “How do you encode for the goal of the HTM structure?” well let the HTM structure inform the encoder on how to do that. The encoder should start with the most generic goal possible: pulling out what the major differences are in the data since it’s not perfect homogeneity or perfect random chaos.
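
As a rough illustration of the "start with the major differences in the data" step only, here is a small sketch that uses principal component analysis to bootstrap an encoder. The feedback from the HTM is not modelled. The function names `fit_pca` and `encode`, the random expansion matrix `proj`, and the top-k sparsification are assumptions made for this example, not something proposed in the thread.

```python
import numpy as np


def fit_pca(data, n_components):
    """Return (mean, components) for the strongest principal directions (rows = samples)."""
    mean = data.mean(axis=0)
    # SVD of the centred data: rows of vt are principal directions, strongest first.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components]


def encode(sample, mean, components, proj, n_active):
    """Project onto the components, expand to proj.shape[0] bits, keep the n_active strongest."""
    scores = components @ (sample - mean)   # where the sample sits along each major axis
    drive = proj @ scores                   # fixed random expansion into bit space
    sdr = np.zeros(proj.shape[0], dtype=bool)
    sdr[np.argsort(drive)[-n_active:]] = True
    return sdr


rng = np.random.default_rng(0)
centre = np.full(8, 3.0)
# Two well separated groups of samples: the "differentiation" in the data.
data = np.vstack([
    centre + 0.01 * rng.normal(size=(20, 8)),
    -centre + 0.01 * rng.normal(size=(20, 8)),
])
mean, components = fit_pca(data, n_components=2)
proj = rng.normal(size=(128, 2))

a1, a2, b1 = (encode(data[i], mean, components, proj, n_active=16) for i in (0, 1, 20))
print(np.count_nonzero(a1 & a2))  # overlap of two samples from the same group
print(np.count_nonzero(a1 & b1))  # overlap of samples from different groups
```

Samples from the same group should share most of their active bits and samples from different groups should share few or none. Finer distinctions are what the feedback loop described above would then have to add.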

I just wanted to write this post to conclude my intuition on the topic.

This sounds like an excellent analysis project.

I have been thinking about three different lines that may give you some things to think about.

I think that at the most basic level the older brain structures are the initiating force that drives behavior. A baby is mostly a bundle of uncoordinated instincts that drive actions before the cortex is trained to deal with the world.

As the encoding/learning progresses islands of learning form in the various maps and are refined as you continue to experience the world and are “surprised” by things that are different than the previously learned items.
Please look at the Calvin references at the bottom of this post. HTM neurons with Mexican hat connection distribution/sampling are perfect for forming the hex patterns described as a basic unit of organization:

The popular school of thought on distributed/deep learning usually starts from the intuitive approach that the meanings in each map form automatically and cascade up through the layers to build higher-level representations. This paper makes a compelling case for the opposite approach - the high-level representation is pushed down through the layers. While this paper focuses on the vision system I can see the concepts described being appropriate throughout the cortex. If you think of this with the lizard brain as the driver for the top-down trainer suddenly a whole bunch of things fall into place:
Deep Predictive Learning: A Comprehensive Model of Three Visual Streams
Randall C. O’Reilly, Dean R. Wyatte, and John Rohrlich

Guys, I just don’t get why this is such an issue. If you wish to improve the encoder then it has to refer back to the reality that generated the original input. Given that, why not start with the original input?
For instance, let’s say you are looking at the sunrise and the sun is at X1Y1 in the visual field. So (ignoring inverse mirroring etc.) the yellow light will land on the back of the _r_etina at the same relative place r(x1y1) where it will trigger those retinal cells which are sensitive to yellow light. In turn they send a signal to the _v_isual cortex where the signal is geo-spatially mapped onto V1 so it will arrive at those cells dedicated to processing yellow light at v(x1y1). Given this, then surely the system has the information that there is yellow at x1y1? Prediction in the next time step will then enable the accuracy of the input data to be improved. Thus the encoder [which I see as the combination of sense organ + pre-wired nerve fibre + those cells that receive the input] is able to verify the accuracy of the input by comparing the initial input with the predicted input received in the next time step (i.e. the sunlight which will be expected to be at X1Y1 in the visual field).
I’m at a loss to see how else you can correlate the SDRs to the reality they represent.
Or am I making some basic error here?

That would be awesome if you did not move and if your eyes did not move. As you go about doing whatever it is you are doing everything is changing all the time.

The tiger that might eat you can be anywhere in your visual field (in any orientation) and it is important that you recognise it from as far away as possible, even if it is part of a mixed scene with similarly colored items. Learning a fixed object in a fixed part of the visual field is simple but not very helpful for real critters.
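
A tiny sketch of why a position-locked comparison breaks down once anything moves. It assumes a 1-D "retina" whose bits simply mirror where the stimulus lands. The function `retinotopic_sdr` and its parameters are hypothetical and only serve to illustrate the point.

```python
import numpy as np


def retinotopic_sdr(position, width=3, size=64):
    """Bits directly mirror where the stimulus lands on a 1-D 'retina'."""
    sdr = np.zeros(size, dtype=bool)
    sdr[position:position + width] = True
    return sdr


def overlap(a, b):
    """Number of active bits two SDRs have in common."""
    return int(np.count_nonzero(a & b))


sun_now = retinotopic_sdr(10)
sun_predicted = retinotopic_sdr(10)      # prediction assumes nothing moved
sun_after_saccade = retinotopic_sdr(30)  # same object, but the eye moved

print(overlap(sun_now, sun_predicted))      # 3
print(overlap(sun_now, sun_after_saccade))  # 0
```

With a fixed gaze the predicted and actual inputs match bit for bit, but the same object after an eye movement shares no bits with its earlier encoding, so comparing raw positions says nothing about whether it is the same thing.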

This is true but the idea proposed is that the architecture and algorithmic processing of the encoder could be changed using the feedback from the HTM regions. The predictions will always rely on the encoding and after time all predictions will be accustomed to the new type of encoding. If you are saying that the prediction could be checked with the real data using supervised methods then that’s an interesting point.

I concur, you cannot do this since the SDRs will be a result of the encoded data stream and they cannot be used to predict the accuracy of the encoding.