Universal Encoder



@robf the connection matrix in the Shanahan paper covers 52 regions of the avian forebrain. It’s a Large-Scale Network model (that’s their own title). Each value in that matrix corresponds to how many axons project from one region to another - in other words, the size of the SDR passed along that particular channel. For each region, the matrix tells you how many sources of SDRs it processes, which ones, and how big each is, plus where its outputs go and how big they are. While useful, it tells us nothing about what information is being transmitted or what each region does.
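To make the matrix’s interpretation concrete, here is a toy sketch in Python - the region names and axon counts are invented, not Shanahan et al.’s actual data:

```python
# Toy connection matrix: conn[src][dst] = number of axons projecting
# from region `src` to region `dst`, i.e. the width of the SDR
# channel on that link. (Invented numbers for illustration only.)

regions = ["A", "B", "C"]
conn = [
    [0, 120, 30],   # A sends a 120-bit channel to B, a 30-bit one to C
    [80, 0, 0],     # B sends an 80-bit channel back to A
    [0, 60, 0],     # C sends a 60-bit channel to B
]

# Reading a column tells you each region's input sources and sizes.
for d, dst in enumerate(regions):
    inputs = {regions[s]: conn[s][d]
              for s in range(len(regions)) if conn[s][d]}
    print(dst, "receives", inputs)
# The matrix gives us sources and channel sizes, but says nothing
# about what the SDRs on those channels mean.
```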

HTM now models the connections between individual neurons in a single layer of a single region in this kind of network, and we’re currently figuring out how the different layers in one region interact. The outputs of a region of cortex come from L2/3, which does some Temporal Pooling we haven’t yet figured out, and from L5 and L6, which both depend on L2/3 and also do things we are only beginning to understand. So we’re some distance from plugging regions together in any kind of network.


@fergalbyrne et al…

My point was that, as in the experiment where the blind person learns to “see” with his tongue, the HTM should be able to process generic SDRs after building a structure that can translate raw data into those SDRs. If it’s all just pattern recognition repeated over and over using the same algorithm, then what we’re after is recreating the pattern pre-processing done by those pre-existing lower mid/fore-brain organs - thus eliminating the need for encoders.

The idea is to stop starting from scratch: to train HTM pre-processing regions that can be reused to handle input generically, so that we can gradually make those trained regions very refined and sharable - already pre-made - across many different applications.

Not saying we have the know-how today. Just saying that this should be an anticipated step in the evolution of HTM implementations?


@cogmission a trained encoder is an encoder just as much as a hand-coded one. It’s just a function which takes raw (i.e. non-SDR) inputs and produces SDRs. The cortical.io Retina is a trained encoder for words.

The only reason you’d want to train an encoder is if you can’t easily code it. The cortical.io encoder is an example of that. The midbrain nuclei are evolved examples of the same. The Geospatial Encoder is an example of the opposite. How would you go about training or evolving an encoder to give you the properties of Chetan’s design?

The blind person doesn’t see with his tongue. He feels what might be some object in his mouth. He gets an extremely vague sensation of what the camera is picking up, and his cortex works very hard, using this very sparse information, memory and feedback to clarify and amplify the perception. The same blind person uses his fingertips to “read” touch-encoded “letters” of Braille. In both cases he is using a pre-existing spatial touch encoder to provide SDRs of sufficient richness to the cortex.


I never disputed “what” the resulting processor would be called, nor do I have anything against the word “encoder”. What I’m interested in is the general nature of its abilities?

The only reason you’d want to train an encoder is if you can’t easily code it. The cortical.io encoder is an example of that. The midbrain nuclei are evolved examples of the same. The Geospatial Encoder is an example of the opposite. How would you go about training or evolving an encoder to give you the properties of Chetan’s design?

This also is known and undisputed. Wouldn’t you agree that something so generically capable would have to be trained and not coded? That at least was my assumption, which is why I suggested that we “train” something to handle this generic input?

The blind person doesn’t see with his tongue. He feels what might be some object in his mouth. He gets an extremely vague sensation of what the camera is picking up, and his cortex works very hard, using this very sparse information, memory and feedback to clarify and amplify the perception.

I remember the quote from On Intelligence that mentioned the subject was able to sense writing on the door? Since it’s all just sequences to the neocortex, I thought that the subject was actually able to form some kind of image? Albeit very limited in resolution, due to the lower density of receptors compared to the retina? But at the very top (the neocortex), wouldn’t the resulting “knowledge” be indistinguishable, since the ubiquity of pattern handling is what is being emphasized by these experiments?

You’re the scientist Fergal, not me - but I’m confused by the need to make distinctions about the inevitable processing at the level of the neocortex, when the generic nature of the pattern processing is the salient point of the experiment to begin with?


OK, @cogmission I’m not trying to get into a row with you. I simply don’t understand what you mean by a “generic encoder”.

Every encoder takes a non-SDR item and produces an SDR. It’s a function with a certain domain and a range in the space of SDRs. Words, GPS coordinates, categories, real numbers, etc are all domains, and we have one or more encoders for each. We do this because one of the key features of SDRs is that “nearby” items (in the domain) produce “nearby” (overlapping) SDRs, and “nearness” in the domain differs from one domain to another.
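The “nearby items produce overlapping SDRs” property can be sketched in a few lines. This is a minimal illustrative encoder, not NuPIC’s actual ScalarEncoder; all names and parameter values here are invented:

```python
# Minimal scalar encoder sketch: encode a number in [min_val, max_val]
# as n bits with w contiguous active bits, so nearby values share
# active bits and distant values share none.

def encode_scalar(value, min_val=0.0, max_val=100.0, n=64, w=8):
    """Return a list of n bits with w consecutive 1s whose position
    reflects where `value` sits in [min_val, max_val]."""
    if not (min_val <= value <= max_val):
        raise ValueError("value out of range")
    # leftmost position of the block of active bits
    start = int(round((value - min_val) / (max_val - min_val) * (n - w)))
    return [1 if start <= i < start + w else 0 for i in range(n)]

def overlap(a, b):
    """Number of bit positions active in both SDRs."""
    return sum(x & y for x, y in zip(a, b))

a, b, c = encode_scalar(10), encode_scalar(12), encode_scalar(90)
print(overlap(a, b))  # 7 of 8 bits shared: nearby inputs overlap heavily
print(overlap(a, c))  # 0: distant inputs share nothing
```

The point is that “nearness” here is nearness of real numbers; a word encoder or a GPS encoder has to build the same overlap property out of a completely different notion of domain distance.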

What is a “generic input”? Is it a “value of any type”? If so, how can you propose to train an encoder for every possible type?

Or do you mean training one encoder for each type? If so, you’ll have to generate training SDRs for enough examples to train your encoder, which will require you to code an encoder for that domain.

HTM and cortex are generic pattern-processors, once you give them SDRs. That’s not the issue. The issue is how to get the SDRs from non-SDR input.


Yes Fergal, neither am I (trying to get into a row) - but if I disagree with something someone is saying, how do I state that succinctly without sounding like I’m arguing? Isn’t saying something contrary just as valuable - and isn’t that statement the very definition of argument? Maybe we shouldn’t be so sensitive to contrary statements (as a culture I mean)? My intention is not to “come back” but to actually tell a person what I’m thinking? Which when it is aligned with what a person is saying - is ok? But when it is contrary, isn’t?

…or is this medium so likely to inflame that we simply should confine our expression to that which is agreeable? I’m not asking you this Fergal, I’m just wondering how to “disagree” with someone without always winding up in this meta-conversation?

This is my entire point. Human beings have 5 senses - and using just those senses, we process the entire gamut of sensory input. I’m just saying that we need something like this? Very low-level, sensory-receptive region(s) that will learn over time to segregate input into what we now get from the output of high-level encoders? That will eventually (as the pre-processing ascends the initial regions) produce the SDRs that we now use specific encoders to generate?


[No discussion of the discussion necessary, @cogmission]

No, we have 5 (ish) encoders, allowing us to process 5-ish types of sensory data. The actual number is much larger, but never mind, let’s pretend it’s 5. We don’t have the bat’s sonar (nor the dolphin’s, which is different), and we don’t have the hammerhead shark’s electromagnetic sensors (nor the electric eel’s, which is different). We don’t have the mouse’s UV sensor (nor the falcon’s, which is different), etc. You need at least one new encoder for each sense, and they co-evolve in nature.


Cool! THIS is why you’re one of my favorite earthlings! :wink: (lovin’ you brother)

You need at least one new encoder for each sense, and they co-evolve in nature.

I think we’re now saying the same thing. So is it fair to say that we should “evolve” these senses as HTM Regions? (let’s start with 5 and move on to the others later). Can’t we train lower HTM Regions in a Network to do precisely this?

At some point in the future, instead of writing encoders, shouldn’t we be writing these input translators, starting with our initial 5 and moving on toward the shark’s electromagnetic sensors? What I’m saying is, once we have all the sensors you mentioned above, then there’ll be no need for encoders, correct?

And the real point is that we don’t have to (as programmers) physically determine (predetermine) the nature of the input data and then select the appropriate encoder - we can just point the HTM at the problem and skip that entire step. This is my point.


No, we’re saying precisely the opposite of each other.

HTM needs SDRs. The things in the world we need to feed into the HTM are not SDRs. We need something to translate from one to the other. That’s an encoder. Each type of thing in the world needs a new encoder.

The eye-retina…LGN is an encoder for light landing in our eyes. The ear-cochlea…(@riccro stuff)…thalamus is an encoder for sound. Look at all the kit in the two chapters I linked to above.

Encoding is really hard. Luckily for mammals, reptiles already figured it out. We literally inherited all the encoders we have from them, and we have only tweaked a few of them.


So imagine this. Commander Data is sitting at the helm. He encounters a swarm of Tribbles bursting out of the air vents. He analyzes the shifting visual and auditory patterns and determines that one of the encoders he’s going to need is an rcrowderAuditoryCochlealEncoder ( :wink: ). He then asks the Tribbles to please cease all of their movement, as he now has to update his code to insert the appropriate encoder. He goes over to Engineering and asks Commander LaForge to please update his MultiEncoder with the new Encoder, which LaForge does, after which he reboots Data.

Data then returns to the bridge with his new updated Encoder and determines a solution to the ridiculous Tribble problem?

Each type of thing in the world needs a new encoder.

At some point won’t we have a way of avoiding the “configuration” step? We’ll just have them all on and be able to process whatever sensory data is available? Without pre-configuration? <— mostly this


Oh, also I’m not trying to trivialize the difficulty of it - I’m just identifying this as one of the steps in the eventual evolution of our code?


I think you’re suggesting, @cogmission, that we should aim for some general input senses like say vision and hearing, then we can make something like an animal, like us, that can interpret a lot of different stuff through those senses.

That’s fine, but consider that as very visual animals we have to do a lot of encoding of stuff in our world into visual forms that we can then interpret visually. For example music notation, time series graphs, maps… it may be more sensible to encode some kinds of data directly with specialised encoders rather than going indirectly through vision, say.



…and the reason I give is so we can begin to “evolve” HTMs instead of pre-programming them, taking advantage of the many years of processing implicit in their production - eventually winding up with configuration-less sensors?

I agree… Eventually, though, won’t we be able to create encoders far surpassing even those more esoteric and advanced ones Fergal mentioned - such that we arrive at a state where they’re just an optimization problem rather than a creational one? And then we can have these all “in-boarded” into HTM Networks and (assuming they’ll be the result of an assembly of sensory Regions) use them as a “repository” of Regions for any future HTM - it being understood that this is the component you stick at the bottom of your Intelligence project?

EDIT: I may have skipped over one of your main points… oops…

What I’m talking about is the thing that is looking at the map, not the thing that is creating the map itself?


Also, I hope both you @floybix and @fergalbyrne see that I understand the need (probably for a long time to come) for creating more specifically aimed intelligent systems for learning particular tasks. I think this is the point both of you are making: that the creation of “general universal encoders” is incredibly complex and maybe out of our grasp for years to come, and that what has the most utility in the short run is specifically tailored intelligences - which have the advantage of requiring only encoders we can currently create, and which are more immediately useful?


My short answer to this thread is that I believe the universal encoder for perceived structure/“meaning” in the world to be causality, as Jordan summarized it some weeks back. (Did Jordan not make the jump to this platform? I don’t see a handle.)

It may not be the only kind of cognitive processing we do. But I think it is the interesting one from the point of view of “intelligence”, and our experiential perception of the world.

I believe this happens in the telencephalon or cognates, which is thus the universal encoder for this perceptual structure, or “meaning”.

Other things like response to motion may happen elsewhere. They are not what I am seeking as “meaning”.

I think the structure produced by this common, though separately differentiated, causality-plotting structure (telencephalon, neocortex) will be of the same form as the structure observed in the connectivity matrix of Shanahan et al.’s analysis.

As Felix and Fergal have both pointed out, the structure that Shanahan et al. abstract is very coarse and high-level. Their analysis suggests its connectivity breaks the world down into functions. I don’t know if this breakdown can be interpreted in terms of causal relations at all - maybe it could back in evolutionary time, with the connections having since been reified by evolution. But lower down I think the connections will be causal, as they are posited to be in Jeff’s HTM.

Either way, we can extract the same, hierarchical structure from a causal network. This hierarchical structure is my candidate for the perceptual hierarchy HTM theory is looking for.

You can say the “true” hierarchical structure is elsewhere, yet to be found. But I’m suggesting it is there. Let’s test that hypothesis.

A quick way to do this would be to perform some of their “recursive modularity analysis” on connection matrices extracted from current implementations of the CLA. We could look at the hierarchical breakdown such a connectivity analysis produces and see if it corresponds in any way with meaningful groupings - or groupings it would suit us to identify as “meaningful” in order to perceptually structure our world and make better predictions.
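As a hedged sketch of what the core of such an analysis involves, here is Newman’s modularity Q computed directly, in pure Python, from a symmetric binary connection matrix. The matrix and groupings below are toy illustrations, not data from Shanahan et al. or from any real CLA implementation:

```python
# Newman's modularity: Q = (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j)
# A grouping that matches the graph's real cluster structure scores
# higher Q than an arbitrary grouping of the same nodes.

def modularity(adj, groups):
    """Modularity Q of a partition `groups` (lists of node indices)
    of the graph given by symmetric binary adjacency matrix `adj`."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)               # 2m = total degree = twice edge count
    label = {}                        # node -> community index
    for c, members in enumerate(groups):
        for node in members:
            label[node] = c
    q = 0.0
    for i in range(n):
        for j in range(n):
            if label[i] == label[j]:
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

# Two densely connected clusters {0,1,2} and {3,4,5} joined by one edge.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
good = modularity(adj, [[0, 1, 2], [3, 4, 5]])
bad = modularity(adj, [[0, 3], [1, 4], [2, 5]])
print(good > bad)  # True: the "meaningful" grouping scores higher
```

The recursive version would repeat this within each discovered module, yielding the kind of hierarchical breakdown described above.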

Note, these groupings would change more slowly over time, consistent with the current HTM conception of “temporal pooling”.


No @cogmission, we’re not saying that at all. There is no such thing as a “general universal encoder” for everything. It’s literally meaningless. Anything in the world only exists to an agent in the sense that some measurement takes place. And that can only be communicated into the agent by transforming the phenomenon that was being measured into a form which the agent can process. That’s encoding.

Now, you can make an encoder that just takes any kind of phenomenon at all and produces some output SDR, but it will be completely useless to the agent unless the transformation preserves something of use to the agent. In HTM terms, the SDR won’t adequately represent the space of possible measurements, and distances in the world space won’t be preserved as distances in the SDR space.
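This failure mode is easy to demonstrate. In the sketch below (all names and parameters invented for illustration), an encoder that maps inputs to arbitrary hash-like bit patterns produces legal-looking SDRs while destroying the domain’s distance metric, so neighbouring inputs no longer yield overlapping representations:

```python
import random

def arbitrary_encoder(value, n=64, w=8):
    """A legal SDR shape (w active bits of n), but the bits are picked
    pseudo-randomly from the value, so neighbouring values share bits
    only by chance. Useless to a downstream HTM."""
    rng = random.Random(value)        # seed the choice on the raw value
    return set(rng.sample(range(n), w))

def metric_encoder(value, n=64, w=8, lo=0, hi=100):
    """A contiguous block of w bits positioned by the value, so nearby
    values share most of their active bits."""
    start = int((value - lo) / (hi - lo) * (n - w))
    return set(range(start, start + w))

def overlap(a, b):
    return len(a & b)

# 10 and 11 are neighbours in the domain:
print(overlap(metric_encoder(10), metric_encoder(11)))        # large
print(overlap(arbitrary_encoder(10), arbitrary_encoder(11)))  # chance-level
```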

So, for example, the Superior Colliculus in mammals is called the optic tectum in reptiles and birds. This thing, which processes optical information, integrates it with sound and accelerometer data, and produces muscle commands directing the eyes to targets, has been around for about 250 million years with almost no changes. It took evolution from the Cambrian explosion, about 450 million years ago, until about 250 million years ago to figure out how to do this, and it’s stayed the same ever since. It’s very special - so special that you find it practically unchanged across 90% of living vertebrates. Anything which doesn’t use this exact encoder goes extinct within a few generations at most.

The same thing happened in 2012, when AlexNet won the ImageNet contest. That year, only a tiny number of entries used deep convolutional networks. But within a couple of years, all other designs were driven to extinction, and now people regularly just bang a pretrained DNN on their image data and get a suitable encoding for a myriad of other tasks.

Encoders are the taxes of intelligence. You need them, they’re a pain in the butt, but there’s no way around them. Thankfully, for every modality, it seems someone will figure out a good way to build an encoder, so just piggyback off that. But there’s no way to avoid needing a new encoder if you don’t already have one.


Thanks @fergalbyrne,

I can’t speak for others who want to continue this conversation, but I’m getting weary of this because I don’t think we’re seeing eye-to-eye. :wink: At least when you respond, you always offer a wealth of information, and I always learn something new!

Again, what I was saying was that eventually we could stop making high-level Encoders (let’s try that term) and instead make ones that are the analogues of our senses (the ones that took nature millions of years to provide, as you pointed out). Once we get Encoders at a low enough level - such as our senses - we won’t have to create higher-level ones, because the lower-level ones can process a broader spectrum of input and will have been optimized over millions of repetitions of HTM usage.

For instance, at some point when we have hierarchy, the SDRs which exist at a very high level could be said to be receiving input from “Encoders” below - which are just SDR outputs that have ascended many Regions of HTM Network. Assuming these higher-level SDRs represent concepts at a higher level of complexity than whatever entered the bottom of the Network, and that they are composite entities built upon the aggregate assembly (a union, maybe?) of low-level SDRs, then the lower-level Regions which produce those fundamental SDRs could be re-used across many different HTMs - much in the same way as specific Encoders are today?
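The “union” idea can be sketched concretely. This is purely illustrative - SDRs as sets of active bit indices with invented values, not how temporal pooling is actually specified in HTM theory:

```python
# A higher-level SDR formed as the union of several lower-level SDRs,
# with membership still recoverable by checking overlap against it.

def union(sdrs):
    """Union of SDRs represented as sets of active bit indices."""
    out = set()
    for s in sdrs:
        out |= s
    return out

def matches(sdr, u, threshold=0.8):
    """Does `u` contain at least `threshold` of `sdr`'s active bits?"""
    return len(sdr & u) >= threshold * len(sdr)

low_a = {1, 5, 9, 23, 40}
low_b = {2, 5, 17, 33, 41}
low_c = {7, 11, 23, 39, 57}
high = union([low_a, low_b, low_c])

print(matches(low_a, high))               # True: a member of the union
print(matches({3, 6, 20, 44, 60}, high))  # False: an unrelated SDR
```

Sparsity is what makes this work: with few active bits per SDR, an unrelated SDR is very unlikely to overlap the union by chance, though stacking too many members into one union eventually erodes that guarantee.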

Maybe I’m being dense, but I think I’m making a point about the re-use of highly trained subsections of HTM Network hierarchies - Regions that receive very raw input and produce specialized SDRs at the other end, which can be used much in the same way as our one-off Encoders are today.

And if this is true, then we also remove the need for a developer to “configure” the Encoders too - we can just turn them on? EDIT: By “them” I mean the lower level Regions acting as Encoders or pre-processors of input.

Ok this is my last attempt to communicate this :stuck_out_tongue: Because either I’m not making any sense or something is preventing the communication from being interpreted in a useful and mutual way?



OK, now I think I understand what you’re saying. You’re using “encoder” in a general sense to mean anything that transforms its inputs and produces an SDR output. I’m using Encoder as a way of getting something non-SDR into a HTM system. In some sense everything producing SDRs is an “encoder” in the sense you use, and indeed we are trying to build such “encoders” - that’s what HTM is all about. And you’re right, our belief is that a HTM region is a generic “encoder” in your sense: something that can handle any kind of inputs. I’d just prefer to use the word Encoder to mean a function which converts non-SDR information into SDRs, and not to use “encoder” in your sense after that.


What term should we use for “producer of SDRs” or “pre-processor of SDRs” in a generic sense - when we want to mean just that, without distinguishing between the various things that produce SDRs, such as HTM algorithms, Network components (i.e. Layers or Regions), or Encoders?


It’s easy. Encoders, Layers and Regions all produce SDRs, but only Encoders produce them from something that’s not an SDR to start with. This is in the nomenclature sense, not the implementation sense - so a SensorLayer or SensorRegion in some implementation is actually an Encoder, not a Layer or Region.
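One way to pin down that nomenclature is with type signatures. These are hypothetical Python types for illustration, not from any NuPIC release - the point is simply that only an Encoder crosses the non-SDR boundary, while Layers and Regions map SDRs to SDRs:

```python
from typing import Callable, FrozenSet

SDR = FrozenSet[int]                 # an SDR as a set of active bit indices

Encoder = Callable[[object], SDR]    # non-SDR input -> SDR
Layer   = Callable[[SDR], SDR]       # SDR -> SDR
Region  = Callable[[SDR], SDR]       # SDR -> SDR

def category_encoder(value: str) -> SDR:
    """Hypothetical example Encoder: a fixed SDR per category label."""
    table = {"cat": frozenset({1, 8, 12}), "dog": frozenset({3, 8, 20})}
    return table[value]

enc: Encoder = category_encoder
print(enc("cat"))   # an SDR for the category "cat"
```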