“Prediction” from the first principles


#29

What Jeff said in On Intelligence about hierarchy is not exactly how we think about hierarchy anymore. We now think objects are being recognized at all levels, not just being composed at higher levels. This is different from the standard hierarchy model. There is a lot of discussion about this in Follow-up question to Podcast 1 with Jeff (location, orientation, and attention).


#30

@bkaz sorry, I don’t really want to read all your posts. Do you have an MNIST benchmark or something like that?


#31

I just moved this from #htm-theory:tangential-theories (it really isn’t) into #other-topics to avoid further confusion.

Then don’t read them.


#32

Since my spatial navigation model predicts the loss of the ability to generate cognitive maps, I searched Google Scholar for more information and found this:

Hippocampal lesions disrupt navigation based on cognitive maps but not heading vectors.

Pearce JM, Roberts AD, Good M.

Abstract

Animals can find a hidden goal in several ways. They might use a cognitive map that encodes information about the geometric relationship between the goal and two or more landmarks. Alternatively, they might use a heading vector that specifies the direction and distance of the goal from a single landmark. Rats with damage to the hippocampus have difficulty in finding a hidden goal. Here we determine which of the above strategies is affected by such damage. Rats were required to swim in a water maze to a submerged platform, which was always at the same distance and direction from a landmark. The platform and landmark remained in the same place for the four trials of each session, but they were moved to a new position at the start of a session. Rats with damage to the hippocampus found the platform more efficiently than did normal rats in the first trial of a session but, in contrast to normal rats, their performance did not improve during a session. Our results indicate that hippocampally damaged rats are able to navigate by means of heading vectors but not cognitive maps.

https://www.nature.com/articles/297681a0

There would still be the very useful heading vector that existed before feedback from the spatial mapping network was added. But, as in the paper, there would be no improvement in its navigational success rate.

I was beginning to worry about patient HM having possibly squashed my model. But it’s still OK!


#33

Sorry, these are not my concerns. I am designing an algorithm to do science, not a social media chatbot.


#34

You seem to be implying your work is more important than others’ work on this forum. That is not how we operate here. Please be nice.


#35

Now that I have some idea what it is you are trying to do, I can address your work more closely.

You have made it clear that you are not trying to build part of a larger system, but instead to develop a particular method of analyzing a stream of information and, through this method, to predict trends.

In your writing you devote a section (Comparison to Artificial and Biological Neural Networks) to comparing your work to several technologies that you have identified as somehow similar to what you are trying to do.

This cherry-picked list includes biological and artificial ANN and CNN networks, followed by some casual and arbitrary dismissals based on your personal criteria. These criteria include personal taste and comparisons of costs, in either resources or algorithmic complexity.

I cannot speak to the personal taste portion as that is entirely up to you as to what is acceptable.

I can speak to the computational costs: it is now commonplace to have access to a GPU on every computing platform; even the $5 Raspberry Pi has one. These platforms are all capable of running common graphics kernels at video speed.

Your choice of strawmen against which to judge the relative merits of your technique is odd: most are dimensional mapping algorithms, not meant to do temporal prediction. I am not surprised that they do not compare very well.

It is odd that you don’t include RNN and HTM technology, as these ARE meant to do temporal prediction. If you do get around to doing this comparison, I would avoid invoking computational complexity until you finish fleshing out your level 3 & 4 methods, as these are likely to end up with much the same complexity to do anything useful.

As to your level 1 & 2, you should be looking at other technologies that are a better match.

The basic operations you are doing are a weird mélange of arithmetic and logic that ends up retracing the steps of operations usually used for edge detection in graphics:

Some edge detection kernels and operators: Sobel, Prewitt, Laplacian, Canny, Gaussian blur, …
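For concreteness, here is a minimal sketch of the kind of kernel convolution these reduce to, using NumPy/SciPy on made-up toy data, applied spatially and then along time:

```python
import numpy as np
from scipy.signal import convolve2d

# Toy 8x8 grayscale frame with a vertical edge down the middle.
frame = np.zeros((8, 8))
frame[:, 4:] = 255.0

# Sobel kernel for horizontal gradients (responds to vertical edges).
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

edges = convolve2d(frame, sobel_x, mode="same", boundary="symm")
print(edges[4])  # strong response around the edge column, ~0 elsewhere

# The same machinery along time: two frames and a [-1, +1] difference
# "kernel" across the stack, i.e. the crudest temporal change detector.
frame_t1 = np.zeros((8, 8))
frame_t1[:, 5:] = 255.0                      # edge moved one pixel right
temporal_diff = frame_t1 - frame
print(np.nonzero(temporal_diff[4])[0])       # columns where brightness changed
```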

Your insight that these can be used on a temporal stream is not particularly novel:

Evaluation of Kernel Methods for Change Detection and Segmentation

https://www.researchgate.net/publication/222102464_Unsupervised_Change_Detection_by_Kernel_Clustering

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.1469&rep=rep1&type=pdf

I wish you luck in your continuing efforts but I don’t see anything of use to me at this time.

If you manage to come up with something in the higher levels, I will be happy to review your work.


#36

ECCV 2018 - Occlusions, Motion and Depth Boundaries:


#37

Bitking

This cherry-picked list includes biological and artificial ANN and CNN networks, followed by some casual and arbitrary dismissals based on your personal criteria. These criteria include personal taste and comparisons of costs, in either resources or algorithmic complexity.

I think by “taste” you mean my complaints about the lack of theoretical grounding. This is widely acknowledged in ML and there is nothing personal about it.

It’s the main reason I am designing the algorithm from scratch, even if that means repeating some work that’s already been done.

Because the hard part here is theoretical justification, which no one seems to care about.

I can speak to the computational costs: it is now commonplace to have access to a GPU on every computing platform; even the $5 Raspberry Pi has one. These platforms are all capable of running common graphics kernels at video speed.

As I mentioned in the intro, the costs I am concerned about are on higher levels | layers, because they increase exponentially with elevation, and I want to design a system that can add new levels with experience, indefinitely. On lower levels, my design is significantly more complex and expensive than anything I know of, not mathematically but logically, especially because the basic algorithm is strictly sequential, to keep it tractable.

I consider parallelization a separate problem.

And I tried to explain why it is so complex: I need to derive a lot of fine-grained parameters to predictively prune higher-level search, because that’s where the costs get out of control. It’s an upfront investment.
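As a toy illustration of this scaling argument (the numbers and the pruning scheme below are made up for illustration, not my actual algorithm):

```python
# Toy illustration of exponential cost growth across levels of search,
# and how pruning at each level keeps it in check.
# All numbers are made up for illustration.

BRANCHING = 8        # candidate comparisons spawned per retained pattern
LEVELS = 6

def total_cost(keep_fraction):
    """Cumulative number of comparisons over all levels,
    keeping only `keep_fraction` of patterns at each level."""
    patterns, cost = 1_000, 0                # initial lowest-level patterns
    for _ in range(LEVELS):
        comparisons = patterns * BRANCHING
        cost += comparisons
        patterns = int(comparisons * keep_fraction)  # prune before next level
    return cost

print(total_cost(1.0))    # no pruning: cost explodes roughly as BRANCHING**level
print(total_cost(0.125))  # aggressive pruning: cost stays roughly flat per level
```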

Your choice of strawmen against which to judge the relative merits of your technique is odd: most are dimensional mapping algorithms, not meant to do temporal prediction. I am not surprised that they do not compare very well.

It is odd that you don’t include RNN and HTM technology, as these ARE meant to do temporal prediction.

I addressed basic operations: weight matrix multiplication, integrate-and-fire, etc.

My main problems are the statistical nature of it all, the lack of one-to-one integer-level comparison, and distributed vs. localized parameters.

These apply to all NNs, regardless of architecture. And none of that is my personal taste, it’s the very logic of strictly incremental search.

I wish you luck in your continuing efforts but I don’t see anything of use to me at this time.

If you manage to come up with something in the higher levels, I will be happy to review your work.

Thanks for taking the time.


#38

The HTM model is both binary and not statistically oriented. Perhaps you should take the time to learn how it works before grouping it with all other neural networks. You may find a lot to like about this predictive model.

Many of us share your concerns, and that is why we are working with the HTM model. One of the oddities is that it is very good at predictions and so-so on the things that “ordinary” neural networks tend to do very well.

If you work through either the HTM School videos or the collection of foundational papers, you will see that they are not at all like the classic ANNs that you were comparing in your paper. The theoretical foundations are both mathematically secure and well elucidated.

They have evolved since the original introduction but the essential features remain the same. What has been added is the underpinning of your level 3-4 structure.

You may recognize that the Numenta group is working much the same way as you do - starting from a basic provable premise and extending it slowly. As they have worked with the basic model, they have been open to learning that their first ideas were not correct and to adjusting the way the model is used. I respect this openness to the possibility that the first intuitions were not correct.


#39

I know it’s binary / spiking, and the inputs are OR-ed within a dendrite, but aren’t they summed in the soma?
Being a “cortical” algorithm, how can it not be statistical, given the amount of noise in the brain?
And parameters are still widely distributed across the network, a column only represents one type of parameter?
I have a problem with binary inputs too: logically, it should be integer inputs with binary coordinates; the first step is digitization within a coordinate…


#40

I would just be wasting both of our time by repeating and explaining what is very well documented in the papers.

I will say that it is NOT a spiking model. The summing part is a collection of local features, and this is much the same thing you do with your local-area manipulation in your model. The breakthrough that sparse representation is sufficient to reduce the search space is informed by the brain but is solidly backed by mathematical theory in the “thousands of synapses” paper.
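A minimal sketch of that thresholded summing, as it is usually described in the HTM papers (names and numbers here are illustrative, not NuPIC code):

```python
# A dendritic segment just counts how many of its connected synapses land on
# currently-active cells; the cell becomes predictive if ANY segment crosses
# the threshold. Names and numbers are illustrative, not taken from NuPIC.

ACTIVATION_THRESHOLD = 3   # active synapses needed to activate a segment

def segment_active(synapse_targets, active_cells):
    """A segment is active if enough of its synapses connect to active cells."""
    overlap = sum(1 for cell in synapse_targets if cell in active_cells)
    return overlap >= ACTIVATION_THRESHOLD

def cell_predictive(segments, active_cells):
    """A cell is predictive if at least one of its segments is active (OR)."""
    return any(segment_active(seg, active_cells) for seg in segments)

active_cells = {2, 5, 7, 11, 13}
segments = [[1, 4, 6, 9],        # 0 matches -> inactive
            [2, 5, 7, 20]]       # 3 matches -> active
print(cell_predictive(segments, active_cells))  # True
```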

As for the representation issue: you yourself have stated that some data types must be reduced to be handled efficiently by your model (reduction to part 0). In the HTM system they follow the same path, with front-end “encoders” that bring everything into a compatible format.

I really do encourage you to take the time to learn the HTM system. At a bare minimum, your insight could provide valuable feedback on what parts you see as lacking, either in function or in explanation. You started with the On Intelligence book, as did many of us - you could at least see where this work has led, if for no other reason than to see where we have all gone wrong. :slight_smile:


#41

I did skim the main papers a few times, but some things just turn me off, especially that neuron diagram.
But I will read again, thanks!


#42

You mentioned integer values several times.

I would like to mention that the binary value is theoretically the same thing. The binary presence/absence of a feature, or a fraction of a feature, is equivalent in distributed form. An integer arbitrarily collects states in a single place. There is nothing that says that this is the natural form of data. The HTM model follows the much more general case, in which the information is distributed over as much space as is required to contain it.

Another point that you mentioned is closely related to this - the coordinate. This is mixed with the feature representation so that the binary presence and place are coded in the same bit.

It takes some mental strength and flexibility to grasp the difference between the more “normal” computer science view of representation and this more general and powerful representation scheme. It is worth the effort.
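A minimal sketch of what coding presence and place in the same bits can look like; this is a toy encoder in the spirit of the HTM scalar encoders, not the actual NuPIC API:

```python
# Toy scalar encoder: the value is represented by WHERE the active bits sit
# in the array, so each bit carries both presence and place.
# Parameters are made up for illustration.

N_BITS = 32      # total bits in the representation
ACTIVE = 4       # number of active bits (sparsity = 4/32)
MIN_V, MAX_V = 0, 100

def encode(value):
    """Map a scalar to a binary array with a contiguous window of 1s."""
    span = N_BITS - ACTIVE
    start = round((value - MIN_V) / (MAX_V - MIN_V) * span)
    bits = [0] * N_BITS
    for i in range(start, start + ACTIVE):
        bits[i] = 1
    return bits

a, b = encode(40), encode(45)
overlap = sum(x & y for x, y in zip(a, b))
print(overlap)  # nearby values share active bits; distant values share none
```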


#43

I would like to mention that the binary value is theoretically the same thing. The binary presence/absence of a feature, or a fraction of a feature, is equivalent in distributed form. An integer arbitrarily collects states in a single place.

Not arbitrarily. You are thinking about features, which are relatively high-level representations. If we have a general algorithm, it should follow the same principles on all levels, and the logic can only be made fully explicit on the lowest, which means raw input: brightness, within a limit of resolution: a pixel. Everything else (features) should be derived from these inputs, seamlessly. You are right that it actually starts from binary inputs, but multiple bits of brightness are located in a single place: a pixel. That’s because “place” is externally defined; it is a macro-parameter relative to the input, so its resolution lags that of the input. You could increase positional resolution to the point where content can have binary resolution without much overflow, but then almost all pixels would have 0 value, which is a waste of space.

Another point that you mentioned is closely related to this - the coordinate. This is mixed with the feature representation so that the binary presence and place are coded in the same bit.

That only works if you know what type of feature it is. And if your neuron gets multiple types, then you don’t know which type triggered it. I lose information too, but only after evaluation, whereas this loss is indiscriminate.


#44

This would be true if you used “ordinary” coding. Here is where the powerful insight of “sparse coding” comes into play. Please look at it with an open mind when you read the paper.

As you pointed out, everything eventually comes down to some level of quantization. No matter what system you build, you must deal with this. Integers do not isolate you from this truth; for that matter, neither does floating point.

Once you recognize this as one of the ground truths you can structure your problem and solution space around this constraint.


#45

It turned me off for a long time. I hardly use it in HTM School. Start here:


#46

These are original inputs; you haven’t encoded anything yet.
Your system will be sitting and waiting for a meaningful input.
Positional resolution is an order of magnitude lower than input resolution in every working system, biological or artificial.
Actually, it’s a few orders of magnitude lower in biological ones.
My intuition is that it’s because input resolution is cheaper: micro-cost vs. coordinate macro-cost.

At least in the case of primary vision. Sparsity should increase on higher levels, but probably along with input disparity (resolution). Then we are talking about integer coordinates and float inputs, or something like that.
You see, SDR in HTM is implicit, represented by network topology, + binary presence | absence.
In my model, it is represented by explicit coordinates and values, which can be directly compared to form predictive vectors. Isn’t that more meaningful?
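To make that contrast concrete, a toy sketch (made-up structures and numbers, not working code from either project):

```python
# Implicit, HTM-style: a binary SDR; similarity is just bit overlap.
sdr_a = {3, 17, 42, 101, 256}          # indices of active bits
sdr_b = {3, 42, 101, 300, 512}
overlap = len(sdr_a & sdr_b)           # 3: how similar, but not in what way

# Explicit coordinates and values: comparison yields a predictive vector.
pixel_a = {"x": 10, "y": 4, "brightness": 64}
pixel_b = {"x": 11, "y": 4, "brightness": 70}
prediction_vector = {
    "dx": pixel_b["x"] - pixel_a["x"],                              # displacement
    "dy": pixel_b["y"] - pixel_a["y"],
    "d_brightness": pixel_b["brightness"] - pixel_a["brightness"],  # gradient
}
print(overlap, prediction_vector)
```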

As you pointed out, everything eventually comes down to some level of quantization. No matter what system you build, you must deal with this. Integers do not isolate you from this truth; for that matter, neither does floating point.

Yes, higher levels should have higher orders of quantization. That’s how it works in my model :slight_smile:


#47


or this one

Perhaps you could help me understand what you mean by pixels that have 0 value in this picture.


#48

Perhaps you could help me understand what you mean by pixels that have 0 value in this picture.

Black pixels. If you take a pixel of brightness = 64 and increase its resolution by splitting it into 1024 subpixels, but keep sensitivity constant, then ~1/16th of the subpixels will have brightness ~1, and the other ~15/16ths will be 0.
That would be a 1st-level SDR, and it would look pretty silly.
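Spelling out the arithmetic (toy numbers from the example above):

```python
# Spelling out the arithmetic of the example above (toy numbers).
brightness = 64          # total count registered by the original pixel
subpixels = 1024         # same area split into 1024 subpixels, same sensitivity

mean_per_subpixel = brightness / subpixels      # 0.0625 = 1/16
lit = brightness                                # ~64 subpixels register a 1
dark = subpixels - lit                          # ~960 subpixels register a 0
sparsity = lit / subpixels

print(mean_per_subpixel, lit, dark, sparsity)   # 0.0625 64 960 0.0625
# i.e. ~1/16 of the subpixels carry a 1, ~15/16 carry a 0.
```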

That’s why HTM is not meant to start from raw senses; it needs low-level sensory “encoders”.
I understand that this is biologically / phylogenetically plausible, but there is no conceptual justification to have two separate mechanisms. The same principles should apply on all levels, with incremental encoding per level.

You see, HTM may become the best functional model of neocortex, but neocortex itself is a horrible piece of engineering. Considering that evolution works incrementally, it’s probably the worst possible implementation of effective GI. Hawkins and co. do recognize this, but they don’t have the confidence to work from the first principles. Because cortex is “tangible”, and we didn’t evolve to take abstract principles very seriously.