When I think of encoders, I think of sensory encoders. Those are the things that take temporal data (of whatever type) and convert them into a semantic representation of bits. These bits don’t need to be sparse, they just need to have semantic meaning. This is a difference between encoders and things like regions that also produce SDRs.
@rhyolight makes a valid point. Encoders don’t really produce SDRs (they aren’t necessarily distributed, though they are most often of an acceptable sparsity). So there is that…
Yes, they are distributed. Meaning can be spread across bits.
I say not, because Encoders produce output which has contiguous placement requirements (i.e. the location of the on bits is rigidly representational: if you flip an Encoder’s bit you get a vast difference in meaning).
Edit: Maybe there’s room to say there is spatial significance out of necessity as one of the requirements for Encodings, because you want overlap and contiguity between bits that represent concepts that are “near” each other… But I still think that they aren’t truly distributed, because they aren’t as noise resilient as true SDRs. They’re closer to an ASCII representation than to SP output, in my view?
No, the definition of a good Encoder is that flipping one bit corresponds to the least difference in meaning in the underlying metric.
I see, and does that mean their meaning is distributed?
Yes. I call the output of Encoders “pseudoSDRs” because they are SDRs but not necessarily as sparse as the usual. They MUST be distributed, and they MUST have the semantic bit interpretation (in the distributed sense), and they SHOULD be quite sparse.
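To make the overlap argument concrete, here is a minimal sketch of a sliding-window scalar encoder. It is an illustration only, not NuPIC's ScalarEncoder: the function names `encode_scalar` and `overlap`, and the parameter values (109 bits, 9 active), are assumptions chosen so that each integer from 0 to 100 gets its own bucket.

```python
def encode_scalar(value, min_val=0.0, max_val=100.0, n_bits=109, w=9):
    """Return the set of active bit indices for `value` (hypothetical helper).

    A run of `w` contiguous on bits slides across `n_bits` positions,
    so values that are close in the underlying metric share most bits.
    """
    if not min_val <= value <= max_val:
        raise ValueError("value out of range")
    n_buckets = n_bits - w + 1  # number of distinct window positions
    fraction = (value - min_val) / (max_val - min_val)
    bucket = int(round(fraction * (n_buckets - 1)))
    return set(range(bucket, bucket + w))


def overlap(a, b):
    """Number of on bits shared by two encodings."""
    return len(a & b)


base = encode_scalar(50)
for other in (51, 53, 58, 90):
    print(other, overlap(base, encode_scalar(other)))
```

In this sketch the overlap falls off gradually as values move apart and reaches zero once the windows no longer touch, which is the sense in which meaning is spread across the on bits rather than carried by any single one.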
Actually I don’t know if I accept that @fergalbyrne, this discussion is also related to the difference in opinion over Cortical.io SDRs too…? The rigid placement of on bits in an Encoding means that their meanings are precarious. If the bit you flip is not contiguous with one of the other on bits, you can potentially change the representation to mean something at the other end of the spectrum. This isn’t so for SP SDRs?
And actually it would seem you need to have columnar competition; inhibition and severe sparsity to overcome the effects of noise? When I think about it, I don’t agree at all with Encoder outputs being SDRs…?
Well, that’s true of cortical.io SDRs, but that’s more an artifact of the enormous dimensionality of word meanings. Our real word SDRs are far bigger than the 16k of cortical.io outputs, but there’s more than enough to consider them suitable encodings.
This is the essence of an SDR encoder, along with its three major properties: sub-sampling, classification, and union. NuPIC has many delicate, hand-crafted encoders for different types of input, but I have always longed for a universal encoder which is defined only by these four major properties (which are actually the performances under these four operations).
There’s a troubling feeling of contradiction between hand-crafted encoders and the reasons why SDRs were proposed. Somehow I feel the need to see solid proof (theoretical, mathematical or statistical, I don’t know) that the hand-crafted encoders in NuPIC do have all these four properties. All I have seen is that these encoders are designed by common sense; for example, the DateEncoder explicitly extracts features like holiday etc. from our daily experiences.
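As a rough illustration of what sub-sampling, classification, and union mean operationally, here is a sketch on random SDRs represented as sets of on-bit indices. The sizes (2048 bits, 40 active), the match threshold, and all function names are assumptions for the example; none of this is NuPIC code.

```python
import random

N = 2048  # total bits (assumed)
W = 40    # on bits per SDR, roughly 2% sparsity (assumed)


def random_sdr(rng, n=N, w=W):
    """A random SDR as a frozenset of on-bit indices."""
    return frozenset(rng.sample(range(n), w))


def subsample(sdr, k, rng):
    """Keep only k of the on bits."""
    return frozenset(rng.sample(sorted(sdr), k))


def matches(candidate, stored, theta):
    """Classification by overlap: match if at least theta bits are shared."""
    return len(candidate & stored) >= theta


def union(sdrs):
    """OR several SDRs together into one denser pattern."""
    out = set()
    for s in sdrs:
        out |= s
    return frozenset(out)


rng = random.Random(42)
patterns = [random_sdr(rng) for _ in range(10)]

# Sub-sampling: 15 of the 40 on bits still pick out the original pattern.
sample = subsample(patterns[0], 15, rng)
hits = [i for i, p in enumerate(patterns) if matches(sample, p, theta=15)]
print("patterns matched by the sub-sample:", hits)

# Union: every stored pattern is contained in the union, while an
# unrelated random SDR is overwhelmingly unlikely to be.
u = union(patterns)
stranger = random_sdr(rng)
print("member contained:", patterns[3] <= u)
print("stranger contained:", stranger <= u)
```

A hand-crafted encoder could in principle be scored against the same operations, which is one way to read the request for statistical evidence here.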
The universal encoder, or I might deviate a little bit, the universal encoder generator in my dream, is a mechanism which:
- is deduced from the four properties
- accepts some constraints or meta-information from the properties of the input (such as sub-fields, ranges etc.)
- is trained with datasets for best performance under the four operations
- generates an ideal SDR encoder for that type of input.
I understand that this very concept of a universal encoder generator might not be practical or useful, but I do have these doubts about the hand-crafted encoders in Nupic, just think of it as a metaphor.
@cogmission , thank you for bringing this topic up, it’s a leap of imagination. I would hesitate too long before asking for this much…
@fergalbyrne, I read all your posts in this thread and I basically agree with every point of yours from the practical angle (e.g. why train a geospatial encoder blindly and numerically when Chetan can design it elegantly with symbolic and structural thinking capacity and human creativity?). But I believe this topic is about the evolutionary direction of ideal encoders for an AI which is certainly generic. For me, this tendency originated from these two slides from What the Brain says about Machine Intelligence:
Perhaps you were thinking of the alternate pathway that connects from the optic nerve (after passing through the optic chiasm) directly to the hypothalamus, and is involved in the regulation of circadian rhythms.
There is also the concept of parallel pathways from the retina to the LGN, but these are separate pathways which originate in distinct retinal ganglion cells (M-cells and P-cells). Projections from M-cells terminate in layers 1 and 2 in the LGN, while P-cells terminate in layers 3-6 of the LGN (there is also a set of K-cells that terminate in between the layers of the LGN). From the LGN, the output from layers 1 & 2 (M-cell originated) projects to the dorsal half of the ventral half of layer 4 (layer 4Cα) of V1, while output from layers 3-6 (P-cell originated) projects to the ventral half of the ventral half of layer 4 (layer 4Cβ) of V1.
yes Yes YES! (reaches for a cigarette) LOL!
This is exactly my point. Eventually we would sit a synthetic intelligence on a bench next to an organic one, and they would be able to process the world around them identically without having to anticipate a priori what all the encountered stimuli would be.
Thank you for bringing this out. I was thinking the same, for a while.
I think we have two problems here. First, real-world data have little sparsity. To convert the stream of data to a suitable SDR sequence we need some (out-of-HTM) processing. This is why Encoders are used now. The other thing is that the HTM network doesn’t like “dense” data streams, so we can’t use them directly. In biology this operation takes place within the network. Are we missing something here?!
If we take auditory processing in the biological version, the cochlea acts as a Fourier Transform device, routing each small range of frequencies (in the data) directly to a specific part of the network. Loudness inference seems more complex. The same thing happens in vision, where pixel-like cells react to the three basic colors (light frequencies) and activate specific parts of the network. In both examples I’m not sure whether sparsity is achieved by higher-dimensional processing neurons (not binary) and/or by the temporal effect (the fade-out of the action potential over time) in the neurons. One thing is certain: SDRs are arbitrary. Once the HTM is started, the encoder/sensor should not be altered or changed.
I’m trying the mathematical approach side by side with direct code/test cycles. I will come back to you with whatever I get.
The process by which the retina and cochlea are influenced by evolution, particularly the brain’s evolution, must I think be the inspiration. We need to encode the workings of that feedback loop; then we’d have a universal encoder.
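The cochlea analogy can be sketched very crudely in code: take a short window of samples, compute a discrete Fourier transform, and switch on one bit per frequency band that carries energy. This is only a toy under stated assumptions (a plain DFT, eight equal-width bands, an arbitrary threshold of 0.1, hypothetical function names); a real cochlea is not a DFT and this is not an existing HTM encoder.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Normalized magnitudes of the first half of the DFT bins."""
    n = len(signal)
    mags = []
    for k in range(n // 2):
        acc = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc) / n)
    return mags


def encode_bands(signal, n_bands=8, threshold=0.1):
    """One bit per frequency band: 1 if any bin in the band exceeds threshold."""
    mags = dft_magnitudes(signal)
    band_size = len(mags) // n_bands
    bits = []
    for b in range(n_bands):
        band = mags[b * band_size:(b + 1) * band_size]
        bits.append(1 if max(band) > threshold else 0)
    return bits


n = 64
low = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]    # DFT bin 5
high = [math.sin(2 * math.pi * 21 * t / n) for t in range(n)]  # DFT bin 21
mixed = [a + b for a, b in zip(low, high)]

print(encode_bands(low))
print(encode_bands(high))
print(encode_bands(mixed))
```

Each pure tone lights up only the band containing its frequency, and the mixture lights up both, so position in the output carries the meaning. Whether the result is sparse depends entirely on how much of the spectrum the input occupies, which is the open question raised above.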
After reading through most of the arguments, I think the need for universal encoders in the sense that @cogmission suggested will also bring about the need for very large, efficient and hierarchical implementations of HTM. I agree with @fergalbyrne that we cannot have a universal encoder for all unique types of data. So what we can aim for right now is to transform any kind of data into a universal encoding format that will still somehow maintain the semantic information, and then we can have an HTM encoder for that format. For example, transforming ASCII streams and visual (image) streams into a single form of data type (kind of like computer bit representations) and then having a single encoder for that output. Computers store everything as ones and zeros and we have ways to assign values to particular semantic features. So this simple encoding is an example of what I am referring to.
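One caveat on the “maintain the semantic information” requirement can be checked directly: in a plain positional bit representation, the number of differing bits does not track closeness in value, so the common format would need more than the raw machine bits. The short sketch below (the helper name `hamming` is hypothetical) compares a few 8-bit patterns.

```python
def hamming(a, b, width=8):
    """Number of differing bits between two integers at the given width."""
    mask = (1 << width) - 1
    return bin((a ^ b) & mask).count("1")


# Adjacent values can differ in many bits, distant ones in few.
print("7 vs 8:", hamming(7, 8))
print("7 vs 3:", hamming(7, 3))

# The same holds for ASCII codes.
print("'a' vs 'b':", hamming(ord("a"), ord("b")))
print("'a' vs 'A':", hamming(ord("a"), ord("A")))
```

Here 7 and 8 differ in four bits while 7 and 3 differ in one, so any single encoder placed after such a format would have to recover the similarity structure from somewhere else.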
So the way we do math and literature is essentially by using our eyes and ears, and then we have representations for what we see and hear, and then processing takes place on those high order spaces. So we can encode everything similarly into a format. Does this make sense? But still a heavy emphasis on the fact that there must be separate encoders for separate kinds of sensory modalities.
So the point about an organic and artificial intelligence sitting side by side and perceiving the world similarly can only be possible if they share the same HTM architecture and the primary sensory encoders. And also possibly the same types of sensory modalities.
Thanks @fergalbyrne for the large amount of valuable information.
I think this pertains more to the functional advantages of the HTM algorithm than to the encoders. The person is not getting visual information and can’t see, but is creating associations and generating predictions from the input. Inferring the presence of objects from pressure or other data is not exactly ‘seeing’. The person might be able to avoid obstacles and guess that there is a circular object (like processors using IR and other sensors), but he/she isn’t actually ‘seeing’. To ‘see’, he/she will have to have an input stream coming from their optical modalities (eyes). The kind of perception that occurs is what is important here, not just the overall inference (detecting some object).
@utensil The problem here is that different data present different types of semantic information in different forms and it is difficult to encode all those multiple types using a single algorithm. One might argue that a modality that senses multiple types of data using different sensors and combines them into one algorithm can be made, but this cannot be done without having separate encoding algorithms for those different types and then combining them later.
This in turn means that the different aspects of the data that have to be inferred are encoded separately and combined into input streams to the cortex, which makes the coding of a universal encoder much more difficult. If one does encode all types of data using a single algorithm, then lots of semantic information will be lost.
I may be wrong, but I did mean this quite literally. The person in the experiment was actually “seeing”. The difference in input complexity between the tactile sensations of the tongue and the visual perception of the eyes, in my mind, expresses itself perhaps as a reduction in resolution, number of colors (if any), and discrete-ness of the visual image (lack of blur - precision of edges etc.).
The electrical signals (SDRs) probably aren’t as resolute or complex, but as long as there is the same processing algorithm producing the same or close to the same output, I could see how the downstream interpreting mechanisms would still produce “images”, in my opinion…
I understand what you are trying to say. The person can try to imagine an image from the tongue sensations, but again, those are not actual images as processed from the ocular input (only these qualify as vision). The details are important. I think the more accurate statement would be that the person can still infer a subset of the visual features from the tongue sensations.
Again, I disagree. This (in my mind) is the significance of the discovery that all neocortical processing centers use the same algorithm. This is why that discovery is so important. What matters is the downstream processing that takes the signals and uses them to produce olfactory, visual, taste, kinesthetic and other sense information - but the processing is (basically - and maybe not exactly, but close) the same.
Maybe we should ask Numenta’s research team to see how they interpret that experiment?