# “Prediction” from the first principles

I did skim the main papers a few times, but some things just turn me off, especially that neuron diagram.
But I will read them again, thanks!

You mentioned integer values several times.

I would like to mention that the binary value is actually theoretically the same thing. The binary presence/absence of a feature or fraction of a feature is actually equivalent in a distributed form. An integer arbitrarily collects states in a single place. There is nothing that says that this is the natural form of data. The HTM model follows the much more general case that the information is distributed over as much space as is required to contain it.

Another point that you mentioned is closely related to this - the coordinate. This is mixed with the feature representation so that the binary presence and place are coded in the same bit.

This takes some mental strength and flexibility to comprehend the difference between the more “normal” computer science view of the representation and this more general and powerful representation scheme. It is worth the effort.
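
To make the integer-versus-distributed point concrete, here is a minimal sketch of how a single integer can be spread over a block of binary cells so that value and position share the same bits. It is only an illustration: `encode_scalar`, `overlap`, and all parameter values are hypothetical names and numbers chosen for this example, not the API of any HTM library.

```python
def encode_scalar(value, min_value=0, max_value=255, n_bits=64, n_active=8):
    """Hypothetical scalar encoder: an integer becomes a run of active bits.

    The position of the run carries the value, so "what" and "where"
    are coded by the same bits instead of by one integer in one place.
    """
    if not min_value <= value <= max_value:
        raise ValueError("value out of range")
    n_buckets = n_bits - n_active + 1  # number of possible start positions
    bucket = (value - min_value) * (n_buckets - 1) // (max_value - min_value)
    return [1 if bucket <= i < bucket + n_active else 0 for i in range(n_bits)]


def overlap(a, b):
    """Count the bits that are active in both representations."""
    return sum(x & y for x, y in zip(a, b))


a = encode_scalar(64)
b = encode_scalar(70)   # close to 64
c = encode_scalar(200)  # far from 64

print(overlap(a, b), overlap(a, c))
```

Under these assumptions, nearby values share most of their active bits and distant values share none, which is the sense in which similarity is carried by the distributed form itself.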

> I would like to mention that the binary value is actually theoretically the same thing. The binary presence/absence of a feature or fraction of a feature is actually equivalent in a distributed form. An integer arbitrarily collects states in a single place.

Not arbitrarily. You are thinking about features, which are relatively high-level representations. If we have a general algorithm, it should follow the same principles on all levels, and the logic can only be made fully explicit on the lowest. Which means raw input: brightness, within a limit of resolution: pixel. Everything else (features) should be derived from these inputs, seamlessly. You are right that it actually starts from binary inputs, but multiple bits of brightness are located in a single place: pixel. That’s because “place” is externally defined, it is a macro-parameter relative to input, so its resolution lags that of an input. You could increase positional resolution to the point where content can have binary resolution without much overflow. But then almost all pixels will have 0 value, which is a waste of space.

> Another point that you mentioned is closely related to this - the coordinate. This is mixed with the feature representation so that the binary presence and place are coded in the same bit.

That only works if you know what type of feature that is. And if your neuron gets multiple types, then you don’t know which type triggered it. I lose information too, but only after evaluation, while this loss is indiscriminate.

This would be true if you used “ordinary” coding. Here is where the powerful insight of “sparse coding” comes into play. Please look at it with an open mind when you read the paper.

As you pointed out - everything eventually comes down to some level of quantization. No matter what system you build, you must deal with this. Integers do not isolate you from this truth. For that matter - neither does floating point.

Once you recognize this as one of the ground truths you can structure your problem and solution space around this constraint.

It turned me off for a long time. I hardly use it in HTM School. Start here:

These are original inputs, you didn’t encode anything yet.
Your system will be sitting and waiting for a meaningful input.
Positional resolution is an order lower than input resolution in every working system, biological or artificial.
Actually, it’s a few orders lower in biological ones.
My intuition is that it’s because input resolution is cheaper: micro-cost vs. coordinate macro-cost.

At least in the case of primary vision. Sparsity should increase on higher levels, but probably along with input disparity (resolution). Then we are talking about integer coordinate and float input, or something like that.
You see, SDR in HTM is implicit, represented by network topology, + binary presence | absence.
In my model, it is represented by explicit coordinates and values, which can be directly compared to form predictive vectors. Isn’t that more meaningful?

> As you pointed out - everything eventually comes down to some level of quantization. No matter what system you build, you must deal with this. Integers do not isolate you from this truth. For that matter - neither does floating point.

Yes, higher levels should have higher orders of quantization. That’s how it works in my model.

or this one

Perhaps you could help me understand what you mean by pixels that have 0 value in this picture.

> Perhaps you could help me understand what you mean by pixels that have 0 value in this picture.

Black pixels. If you take a pixel of brightness = 64 and increase its resolution by splitting it into 1024 subpixels, but keep sensitivity constant, then ~1/16 of the subpixels will have brightness ~1, and the other ~15/16 will be 0.
That would be 1st level SDR, and it would look pretty silly.
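
The arithmetic above can be checked with a small sketch. It assumes, purely for illustration, that each of the 64 brightness units lands on a distinct, randomly chosen subpixel; `split_pixel` and its `seed` parameter are hypothetical names for this example and are not part of HTM or of my model.

```python
import random


def split_pixel(brightness, n_subpixels, seed=0):
    """Spread unit quanta of brightness over subpixels, at most one per subpixel.

    Assumption for this sketch: quanta fall on distinct, randomly chosen
    subpixels, so each subpixel ends up binary (0 or 1).
    """
    if brightness > n_subpixels:
        raise ValueError("more brightness units than subpixels")
    rng = random.Random(seed)
    active = set(rng.sample(range(n_subpixels), brightness))
    return [1 if i in active else 0 for i in range(n_subpixels)]


subpixels = split_pixel(64, 1024)
density = sum(subpixels) / len(subpixels)

print(density)              # 0.0625, i.e. 1/16 of subpixels are active
print(subpixels.count(0))   # 960 subpixels carry nothing
```

So one 8-bit value in one place turns into 1024 binary cells, 960 of which are empty.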

That’s why HTM is not meant to start from raw senses, it needs low-level sensory “encoders”.
I understand that this is biologically / phylogenetically plausible, but there is no conceptual justification to have two separate mechanisms. The same principles should apply on all levels, with incremental encoding per level.

You see, HTM may become the best functional model of neocortex, but neocortex itself is a horrible piece of engineering. Considering that evolution works incrementally, it’s probably the worst possible implementation of effective GI. Hawkins and co. do recognize this, but they don’t have the confidence to work from the first principles. Because cortex is “tangible”, and we didn’t evolve to take abstract principles very seriously.

What do you call the first principles? And what is a justification that they are the first?

The definition of intelligence as a mechanism to maximize predictive power, via hierarchical pattern discovery: http://www.cognitivealgorithm.info. This is what I agree with Jeff Hawkins on.
Justification is introspective generalization.

All colors are potentially important. On the screen in front of me now, the text which carries the main information is composed of black pixels. What I was hoping to convey to you in the example pictures provided is that in the real world information is conveyed at all scales and all brightness levels. The most important feature is integrated and intermixed with the surroundings, where simple segmentation will not work very well. All the pixels are important.

Well, it’s a very controversial starting point.
Natural intelligence is first of all an adaptation mechanism; all its other properties are only derivatives.
What is more important from the practical point of view, natural intelligence is not hierarchical. Every pattern is determined based on all available context, which includes horizontal and top-down information.

Yes, but that’s not a starting point. Learning is hierarchical: higher scales, context, etc., are initially empty. That is something that you have to learn first. The starting point for a blank system is that input is an impact on a sensor, and higher brightness means greater impact. The fact that the real predictive value of this impact is relative must be learned.

> Well, it’s a very controversial starting point.
> Natural intelligence is first of all an adaptation mechanism; all its other properties are only derivatives.
> What is more important from the practical point of view, natural intelligence is not hierarchical. Every pattern is determined based on all available context, which includes horizontal and top-down information.

There are all kinds of adaptation; this is not specific to intelligence.

Re context and top-down: they don’t exist when you have just started learning. They are feedback from previous learning. Primary learning is bottom-up.

Primary learning is local. Sure, more complex and especially abstract patterns are convolutions of others, but it’s not about hierarchy, it’s all a directional network.

BTW, our intelligence is not a blank system at the beginning. It contains quite complicated hardwired primitives higher than V1.

It’s better to say the opposite: any behavior is a kind of intelligence, even in prebiological molecules. Human intelligence is only a super-elaborated version.

That is the bottom of the S-T scope hierarchy. Higher levels have greater scope of search and generalization.

> BTW, our intelligence is not a blank system at the beginning. It contains quite complicated hardwired primitives higher than V1.

That’s a product of evolutionary learning, and these primitives are very simple compared to what we end up learning. In any case, I was talking about intelligence in general, specific shortcuts are not essential to it.

Calling it adaptation or behaviour doesn’t say anything about how to design it.

There is some hierarchy in our brain, but it’s a hierarchy of abstraction; it’s not straightforward, and it’s an artificial classification, not an algorithmic one.
Basically, our brain is a collection of semi-independent interconnected modules, and for many of them you can’t define which is higher.
Because of the variety of the modules, it’s possible to determine patterns for emergent properties, which we call abstract. It’s obvious that evolutionarily they could have been created later, and morphologically they are typically described as higher regions. However, they are just other semi-independent modules in the network; you can describe them as placed aside, or at the center - it’s just a matter of convenience of thinking about it.

It’s true only if you exclude evolutionary development as a way to develop it.

On the other hand, an arbitrary choice of one of its properties as a fundamental description is not the best practical approach either.