# Can we deduce a Universal Encoder algorithm via the Fundamentals?

Before we talk about a possible design for a universal encoder I think it would be best if we are on the same page. I may not understand something fundamental about Encoders, so let’s take it slow…

Here’s what I understand about semantic SDR encoding:

1. An input bit needs to represent the same thing each time it is seen. (This requires sparsity, because you can’t fill up every possible combination of bits as in ASCII.) As a quick example of this concept: suppose the first bit represents the color blue of whatever object you’re encoding. If that bit is ever ‘on’ in the representation, the object described by the SDR must be at least partly blue.

2. Input bits that are spatially close to one another in the representation (such as the first and second indices) don’t necessarily share any semantic meaning. What actually matters is that two ‘similar’ representations need to have overlapping ‘on’ bits. ‘Similar’ things need to have ‘similar’ representations.

The second point may be where I’m having trouble in my understanding. ‘Similar’ in what way? In all ways? You can have things be similar in different ways and when viewed from different angles.

Take a simple number line as an example - we have a scalar encoder where ‘similar’ as in ‘close’ numbers share the same bits. However, if you want to understand numbers in the light of other mathematical concepts you may not consider two ‘close’ numbers ‘similar.’ For example, if you want to traverse the number line with multiplication instead of addition and subtraction then prime numbers may all be very ‘similar’ to each other, more so than numbers that are neighbors on the number line.
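To make the “close numbers share bits” notion concrete, here is a minimal sliding-window scalar encoder sketch. The function name and all parameters are my own illustration, in the spirit of (but not identical to) NuPIC’s ScalarEncoder:

```python
# Minimal scalar encoder sketch: a value maps to a contiguous run of
# w 'on' bits out of n_bits, so nearby values share most of their bits.

def encode_scalar(value, min_val=0, max_val=100, n_bits=120, w=21):
    """Return the set of 'on' bit indices for an integer value."""
    span = n_bits - w                      # range of possible start positions
    start = round((value - min_val) / (max_val - min_val) * span)
    return {start + i for i in range(w)}

a, b, c = encode_scalar(10), encode_scalar(11), encode_scalar(90)
print(len(a & b))  # close values overlap heavily
print(len(a & c))  # distant values share no bits
```

This bakes in exactly one notion of similarity: numeric closeness. Two primes, or two multiples of seven, share no more bits than any other pair at the same distance.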

Is it that the encoder needs to be specific to the application of your HTM model? That is my assumption. But a question I still have about that is: in what ways must it be tailor-made to best represent the semantic data to the HTM model? Different representations of an environment could be ‘close’ in many different ways. The simplest way might be ‘causally’ close; if I do an action (such as ‘+1’) I get something that looks a lot like the last thing I saw (such as the next number on the number line).

Is causality an adequate guide to making semantic representations? What are other guides we could use to make semantic representations?

Let’s put those two questions on hold for a second, and do a thought experiment (this should allow us to get to our conclusion):

Suppose causality was an adequate semantic guide. If so, we should be able to make a Universal Encoder algorithm. All we would have to do is make a ‘causal map’ (which raw, or dense, representations lead to which others) and encode causally linked representations as similar sparse representations. As we see more of the environment, our causal map will get filled out, and our SDRs will change over time. Thus, as we change the semantics of how the encoder represents the environment, we must allow the encoder to reach into the HTM model and move columnar connections around to compensate for its modifications.
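A minimal sketch of what such a ‘causal map’ could look like in practice. The function names and the successor-overlap similarity measure are illustrative assumptions, not part of HTM:

```python
from collections import defaultdict

# Accumulate a 'causal map' from an observed stream of raw states:
# which state follows which. Two states could then be deemed semantically
# similar if they lead to overlapping sets of successor states.

def build_causal_map(stream):
    causal_map = defaultdict(set)
    for prev, nxt in zip(stream, stream[1:]):
        causal_map[prev].add(nxt)
    return causal_map

def successor_similarity(causal_map, a, b):
    """Jaccard overlap of the successor sets of states a and b."""
    sa, sb = causal_map[a], causal_map[b]
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

stream = [0, 1, 2, 3, 2, 3, 4, 3, 2, 1, 0, 1, 2]
cmap = build_causal_map(stream)
print(successor_similarity(cmap, 1, 3))  # nonzero: both can lead to 2
```

An encoder could then assign overlapping ‘on’ bits to states in proportion to this kind of score, rather than to raw closeness.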

I believe this is possible (unless I don’t understand something fundamental to HTM theory) because evolution has created encoders. How did evolution create them? The better a creature was able to navigate its environment (measured by the number of its offspring that went on to reproduce), the better its encoder organ was at encoding SDRs for the neocortex to consume. In other words, the information about what is semantically similar (or simply the information about how best to encode the environment) reached back from the neocortex, through the algorithm of evolution, to create a highly tuned encoder. Thus it is my contention that if the encoder were appropriately intertwined with the HTM model (meaning it could get the appropriate information from the HTM model and could modify the HTM columnar connections to the input bits appropriately to preserve, at least in part, the model that’s forming), it could evolve in real time with the HTM model. It would be, at that point, a Universal SDR Encoder algorithm.

But in order to create it, we need to answer the questions asked above: Is causality an adequate guide to making semantic representations? What are other guides we could use to make semantic representations?

Also, how does this feel to everyone? My understanding might be incomplete, so I might be oversimplifying the problem, and my interpretation of how evolution has created encoder organs may be inaccurate. But there doesn’t seem to be a good explanation yet of how encoders should be created - it’s more of an art than a science at this point. If it were a science we could automate it, I’m sure.

You could look at locality sensitive hashing:
https://github.com/tdebatty/java-LSH

This is not always true. For example, with a scalar encoder you might have a bit that represents many different values (1-10 for example). The complete set of on bits might represent a discrete value, but one bit inside the bucket could potentially represent a part of many values because it plays a role in many different buckets.

True unless the input is topological.

I think this is the challenge of creating an encoder. The creator must decide in what ways semantics are encoded, what things are similar, and how similar they are. This is a non-trivial task.

Unless the encoder is very generic (like the ones we have in NuPIC), I think this is true.

My first reaction is that this does not seem to be happening in biology. It may be true that an HTM layer gives some kind of feedback into sensory organs to help with their representations over time (like a baby learning to see or hear), but I can’t see why the sensors need to mutate the layer’s state.

The Universal Encoder concept has always seemed impossible to me. It’s not like we have one overarching sense that can process any sensory input. All our senses have been tuned to be very specific encoding mechanisms over millions of years, each getting totally different input from the world. How can we create one encoder that can process sight, sound, and touch? Each of those sensory organs has developed very specific methods of encoding semantics, and each works very differently when interacting with the world.


@Sean_O_Connor I’m not sure this would help. When you hash data, you lose so much semantic information. After looking at the LSH stuff, it seems like you need to provide some logic to decide what indicates “similarity”, and that logic is what hashes the data in similar ways. It seems like you’d basically be writing the encoder logic within that LSH similarity function, and then you’d need to devise a way to convert the hash into binary without losing the semantics.

I don’t mean to make a universal encoder, but rather, very distinctly, a universal encoding algorithm. Mmm, maybe I should use different words to express this idea: a universal algorithm for producing encoders, one that works with the HTM structure to produce the most effective encoders in real time (rather than through iterations, like evolution).

Meaning an algorithm that, hopefully faster than evolution, could create all the separate, specific encoders we have. That is the goal: to automate, in a more efficient and direct way, evolution’s encoder-creation process, not to create one encoder to rule them all.

Thank you for your clarification on all my other thoughts in my post, your comments helped me understand where I wasn’t quite understanding the encoding principles as well as I should!


Isn’t this essentially just running swarming on a bunch of models and keeping the ones that work best as a form of evolutionary algorithm? Or are you trying to transform and transfer the connection data on one running model to another model that should work better?

We could look at how computer representations of various media evolved. In modern computers, numbers are used to represent things like text, sound, video, etc. Even in ASCII or Unicode, the numbers are highly organized, with one range representing one language, an ordered subrange of that representing capitals or lower cases, and another ordered subrange representing numbers. With video representations, initially it was a reel of pictures that one would see in a movie theater, then, much later, things like motion compensation were added. (Now that I think about it, I wonder how well a nupic model would do if structured with number encoders to receive data in the same format as a compressed video stream vs biologically inspired visual salience models.)

Anyway, if we follow biology, we’re bound to get working encoders eventually.

Do you mean an HTM system that would design encoders? Or do you mean some underlying algorithm behind the creation of encoders?

You’d want unsupervised representation learning, right? My intuition is that you’ll get pretty good encoders for any signal by doing a restricted Boltzmann machine or a denoising sparse autoencoder, something like that. Seems very similar to the stuff Hinton and others were doing with deep belief nets and so on, up until convnets took off.

Well, you could arrange for the hash to be 10,000 bits in length. That might preserve sufficient information for many tasks. The next step up would be smart hashing, where the hash algorithm is more sensitive to specific useful features in the input data. Those features could be given (SIFT), learnt in an unsupervised way, or learnt in a supervised way. Presumably supervised learning of the hash (via some kind of neural net, maybe) would give the best results.
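For what it’s worth, the unhashed-feature end of that spectrum can be sketched with random-hyperplane (SimHash-style) LSH: project the input against many random hyperplanes and keep the sign bit of each projection, so similar inputs disagree on only a few hash bits. All sizes here (64-dimensional input, 1,000-bit hash) are illustrative:

```python
import random

random.seed(42)
N_BITS, DIM = 1000, 64   # illustrative sizes; could be 10,000 bits just as well
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_BITS)]

def lsh_hash(x):
    # One hash bit per hyperplane: the sign of the projection onto it.
    return [1 if sum(p * v for p, v in zip(plane, x)) > 0 else 0
            for plane in planes]

def agreement(h1, h2):
    return sum(a == b for a, b in zip(h1, h2)) / len(h1)

x = [random.gauss(0, 1) for _ in range(DIM)]
y = [v + 0.05 * random.gauss(0, 1) for v in x]   # small perturbation of x
z = [random.gauss(0, 1) for _ in range(DIM)]

print(agreement(lsh_hash(x), lsh_hash(y)))  # high: similar inputs share bits
print(agreement(lsh_hash(x), lsh_hash(z)))  # near 0.5 for unrelated inputs
```

The catch is the one raised above: “similar” here means geometrically close in the input space, and any richer notion of similarity has to be built into the features fed to the hash.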

Thinking a little “out of the box” and deviating from your initial theory a bit, something that comes to my mind when I think about the problem of a universal encoder generation logic is that software could in theory be used to emulate how encoders (i.e. sensory organs) might change/improve in nature, using a system by which “genetics” can be modified and compete with the original and with other variations that have different modifications. There are three relevant theories/areas that might be considered:

1. Survival of the fittest
2. Selective breeding
3. Genetic modification

These three are also listed from slowest to fastest, and presumably the reason for that is the level of involvement from an intelligence (a breeder or genetic engineer in this case). If you think of a software’s source code as its “genetics”, one or more of these three mechanisms might be emulated to develop encoders. In the case of the latter two, the “intelligence” would presumably be an AI.

Survival of the fittest
This would be the easiest to emulate. A system would need to be designed in which “creatures” would rely on the encoder in some way to survive and compete with other creatures. The competition would need to be tailored such that an ideal encoder ensures the best survival rate for the creatures. The system would make random changes to the encoder’s source code and plug it into the creatures to compete. This would likely take an extremely long time (probably outside the limits of human time-frames), because you would be relying on what is likely to be fairly complex logic being generated purely by random chance. For complex pieces that cannot be evolved through a series of small enough steps, this could be equivalent to waiting on a monkey randomly typing on a keyboard to type out the Gettysburg Address. On the flip side, you do not require an intelligence to be involved – if you can wait long enough for it to just happen.

Selective Breeding
This would be similar to “survival of the fittest”, but with the introduction of an AI advisor that was intelligent enough to recognize patterns earlier than they would have been identified by the “survival of the fittest” strategy alone, and bias those genetics (even if they do not provide an immediate benefit to the creatures). The AI would presumably learn through watching numerous “survival of the fittest” competitions and over time learning to recognize patterns that lead to improved encoders. The AI would need the ability to select certain creatures and give them an advantage in the competition to ensure their genetics continue into subsequent generations. This would be something like spotting the monkey typing “Four score and seven bananasflkjd aoie ha…” and being able to preserve the first four words even though they, by themselves, are not the full Gettysburg Address.

Genetic Modification
In this case, you would need an even more intelligent AI, which has the ability to view the source code and make modifications to it. This would be equivalent to replacing the monkey with Abraham Lincoln (i.e. no longer relying on randomness). There was an interesting conversation about teaching an AI to code on this thread. I think many projects like that would need to happen first, to explore how to make an AI sophisticated enough to correlate source code with solutions to problems. This is obviously not something which could be done today, but certainly feasible as AIs become more sophisticated in the future.

@Paul_Lamb, your ‘Survival of the fittest’ notion is, I’m sure you know, nothing more than evolution itself, which, over millions of years, has given us the wonderful encoding organs we have today.

Rather than taking that approach, or simply speeding it up by having better gradient descent (which is essentially what your other two suggestions do) I would rather find a universal method for encoding data and deliberately produce the encoder right the first time. What I’m suggesting is almost different in kind rather than different in degree.

But in order to do that we need to determine and isolate what information is needed to produce the correct encoder. That information must be related to two domains:

1. The Environment domain (the entire set of possible representations of the environment).
2. How the HTM can act in the environment (that’s where my hunch comes from that we can use a causal map to inform the appropriate semantics).

What we want to do, instead of iterating over many HTM models to see which encoder works best (evolution), is move the encoding algorithm into the HTM model so that it can ‘learn’ in real time along with the HTM model, modifying how it encodes the data (iterating) as it goes.

Yes, my strategies were a deviation from your theory to be sure – just thought I would throw them out there for comparison. There is certainly a very different approach whether you make modifications to a running software “on the fly” with the intelligence directly integrated with the system being modified, versus utilizing an external “advisor” which is spinning up new modified instances.

This is the important point in my mind – i.e. the specific meaning of “modifying how it encodes the data”. In practice, this means modifying logic in some way (source code, byte code, neuron counts/ connections, etc – depending on architecture of the encoders that are being generated). Can this be done procedurally with relatively low intelligence requirements (an RL-like strategy seems like it could apply here), or does it require a somewhat sophisticated intelligence to work?

I’ll get back on topic now. Thinking this out a bit, it seems to me that since the columns in HTM learn to adapt to the input, it may not really be necessary to reach back and move columnar connections at all. As old, bad representations of inputs stop being encountered and better representations are produced, the columns will modify their connections to adapt to the new representations (this is the normal behavior of a Spatial Pooler with learning enabled).

I am of course assuming that there is a large enough input stream and that the encodings produced by the system will stabilize to some ideal configuration over time. Can you think of a particular requirement I am not considering, where the encoder generation procedure would have to modify columnar connections (other than that doing so would speed up the process)? The need to walk back through history would likely impose some sizeable resource requirements, so it would be better to avoid doing so if possible.

@Paul_Lamb, I think that would work fine, because yes, I do think you’re right that the SDR encoding would stabilize the more it sees.

My thinking was just that the HTM is developing models and switching SDR input meanings would confuse it, throw it into chaos. Thus modifying those connections where possible as we go might actually preserve parts of the model already created in the HTM structure and minimize the damage of changing semantic representations.

Just wanted to say that although I don’t feel I can contribute to it right now, I think this is a very interesting discussion and thank you for having it.


My sense is that the risk of damage from changing semantic representations should be minimal. Older representations would be devoid of semantic meaning. Or, to be more precise, the bits that become semantically similar in later representations would be randomized in earlier representations (i.e. higher noise). If the representations are sparse enough, this noise should have little effect. As the encodings become more stable, the bits that are semantically similar are essentially just becoming less random (i.e. lower noise). To be honest, HTM seems ideally suited to adapt to this type of encoding change.
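A quick back-of-the-envelope check of why that noise should stay small, assuming typical HTM-style sparsity (the sizes here are illustrative):

```python
import random

# Two independent random SDRs (40 'on' bits out of 2048) overlap almost
# nowhere, so bits that are still random in early encodings behave like
# low-level noise rather than false semantic matches.

random.seed(0)
n, w = 2048, 40
a = set(random.sample(range(n), w))
b = set(random.sample(range(n), w))

print(len(a & b))   # usually 0 or 1 bits of accidental overlap
print(w * w / n)    # expected overlap: ~0.78 bits
```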

The real problem to solve, of course, is the logic/procedure behind making iterative changes that actually can converge on an ideal policy. This part of the system is the least clear in my mind.

Allow me to give part of an example. I don’t think this is adequate, but it’s a mockup that should explain where I’m coming from:

Say your environment consists of a dense representation of two characters, such as 00 or 01 or 02 or 03… or 99.

A ‘universal encoder’ (not a universal algorithm for creating specific encoders, mind you) might watch the dense representation and produce an SDR over time by indexing specific symbols at specific positions. Let’s say it sees the numbers 00 to 99 in order. It would build up a lookup table and, by the end, have a way to encode the SDR.

| Input Character | Representation Index | SDR Index |
|-----------------|----------------------|-----------|
| '0'             | 0                    | 0         |
| '0'             | 1                    | 1         |
| '1'             | 1                    | 2         |
| '2'             | 1                    | 3         |
| '3'             | 1                    | 4         |
| '4'             | 1                    | 5         |
| '5'             | 1                    | 6         |
| '6'             | 1                    | 7         |
| '7'             | 1                    | 8         |
| '8'             | 1                    | 9         |
| '9'             | 1                    | 10        |
| '1'             | 0                    | 11        |
| '2'             | 0                    | 12        |
| '3'             | 0                    | 13        |
| '4'             | 0                    | 14        |
| '5'             | 0                    | 15        |
| '6'             | 0                    | 16        |
| '7'             | 0                    | 17        |
| '8'             | 0                    | 18        |
| '9'             | 0                    | 19        |

So the first time it sees anything (“00”), it will produce the two-bit representation 11. Not very sparse. But after it has seen the entire environment, the lookup table will be filled out, and “00” will produce the following SDR: 11000000000000000000. 06 would be 10000001000000000000, and 99 would be 00000000001000000001 (in the listing below, dots are zeros):

```
00 = 11..................
01 = 1.1.................
02 = 1..1................
03 = 1...1...............
...
10 = .1.........1........
11 = ..1........1........
12 = ...1.......1........
...
78 = .........1.......1..
79 = ..........1......1..
80 = .1................1.
...
```

Now, we’ve developed a process to universally create a sparse representation this way, but it’s not very semantic. Each representation shares only one bit with its related representations. We’ve essentially encoded the semantics of the dense representation itself, and that’s all, meaning we’ve encoded base-10 semantics only. That’s next to nothing.
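The lookup-table scheme from the example above can be sketched directly. The class and method names are my own; each (character, position) pair gets its own SDR index the first time it is seen:

```python
# Sketch of the lookup-table 'universal encoder' from the example above:
# each (character, position) pair is assigned the next free SDR bit index
# the first time it appears, and an input encodes to the set of indices
# for its (character, position) pairs.

class LookupEncoder:
    def __init__(self):
        self.table = {}   # (character, position) -> SDR bit index

    def encode(self, dense):
        bits = set()
        for pos, ch in enumerate(dense):
            if (ch, pos) not in self.table:
                self.table[(ch, pos)] = len(self.table)  # grow on demand
            bits.add(self.table[(ch, pos)])
        return bits

enc = LookupEncoder()
for i in range(100):               # show it '00' through '99' in order
    enc.encode(f"{i:02d}")

print(sorted(enc.encode("00")))    # [0, 1], as in the table above
print(sorted(enc.encode("06")))    # [0, 7]
print(sorted(enc.encode("99")))    # [10, 19]
```

After one pass over the environment the table holds exactly 20 entries, matching the table above, and every encoding is 2 of 20 bits.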

My hunch, my feeling, is that we could use some other metric to make it semantic, maybe by seeing which states can lead to which other states, that is to say, making a causal map to guide our semantics. But I don’t know; I’m hoping someone has better insights than I do on this topic. This example is a step towards a universal encoder, but really we need to abstract it one more step away from that: a universal encoding algorithm, one that creates specific encoders for any situation.

Anyway, that’s not even a proof of concept, it’s a brainstorm, a mockup, it’s an idea of an idea. But I hope that kind of explains where I’m coming from.

Yep, definitely. I’m just throwing some ideas around to help with the thought process.

The thing that keeps coming to my mind (probably because of the project I’m working on at the moment) is that concepts from RL might be applicable here to try to do this procedurally versus requiring a sophisticated intelligence. In particular, eligibility traces are good at associating cause/effect relationships. It seems to me that cause/effect is an important part of semantics. At a simplistic level, the idea is that things which have similar effects might have similar semantics. For example, you could theoretically use an eligibility trace to enforce similar connections and degrade dissimilar connections with some type of “prediction lookup” logic (i.e. look up all connections that could lead to a prediction of a particular state, and reinforce/degrade them a bit to be more similar to the most recent input which led to that state).

Yeah, that’s my hunch, but you know more about the actual machine learning science than I do. I just see it as a function that correlates the way things look (aspects of the dense spatial representation) with what can follow what (aspects of how the dense spatial representation changes in time, or causal map).

It’s almost HTM-lite as it has to begin that correlation process in a small way. And causality may not be what’s most important, it may just be one semantic guide among many.

As a simple example, if Foo leads to Baz, and then later a semantically dissimilar input Bar also leads to Baz, we would look up Foo and modify it to be slightly more like Bar. If we later encounter Foo and it again leads to Baz, we look up Bar and modify it to be slightly more like Foo. Rinse and repeat. Given enough iterations, Foo and Bar will eventually converge into a common encoding.
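A toy simulation of that rinse-and-repeat convergence. The bit-swap rule, names, and parameters here are my own illustration, not an established mechanism: each input holds a sparse bit-set encoding, and whenever a different input is seen leading to the same successor, the previously stored input’s encoding is nudged one bit toward the current one:

```python
import random

random.seed(1)
N, W = 64, 8   # illustrative SDR size and number of 'on' bits
codes = {name: set(random.sample(range(N), W)) for name in ("Foo", "Bar")}
led_to = {}    # successor state -> input last seen leading to it

def observe(inp, successor):
    prev = led_to.get(successor)
    if prev is not None and prev != inp:
        diff = codes[prev] - codes[inp]
        if diff:  # swap one non-shared bit of prev for one of inp's bits
            codes[prev].remove(random.choice(sorted(diff)))
            codes[prev].add(random.choice(sorted(codes[inp] - codes[prev])))
    led_to[successor] = inp

def overlap(a, b):
    return len(codes[a] & codes[b])

before = overlap("Foo", "Bar")
for _ in range(50):                # Foo and Bar alternately lead to Baz
    observe("Foo", "Baz")
    observe("Bar", "Baz")
print(before, "->", overlap("Foo", "Bar"))  # overlap grows each swap
```

Each swap increases the mutual overlap by exactly one bit, so the two encodings converge to a common code, just as described above.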

To give you an idea of how this strategy might work in a more complex environment. Say Foo and Bar sometimes both lead to Baz as in the above example, but also inputs Bat and Bar sometimes lead to Garply. But Foo does not ever lead to Garply, and Bat does not ever lead to Baz. In this case Foo would still only be influenced by Bar, but Bar would be influenced by both Foo and Bat. And of course Bat would be only influenced by Bar. This would lead to two populations of semantic bits which are distributed across the different inputs, such that Foo is semantically similar to Bar but not Bat, and Bat is semantically similar to Bar but not Foo.

The more variations to the inputs, the more complex this would become, so I am not sure how well this would scale. It would probably also work better to perform these modifications over multiple timesteps (versus only using an eligibility trace of one timestep).


That example helps me understand what you mean. Yes, that’s the kind of thing I have in mind. The encoder learns about the representation space the more it sees, and changes its encodings as it goes. Over time, though, I would expect it to converge on a fairly stable set of dense-representation-to-SDR mappings, because isn’t it essentially approximating the statistics of how the environment changes? Even in a complex environment, wouldn’t it eventually plateau and stabilize after the early chaos, considering how many changes are made to the mapping per representation seen?