“Prediction” from first principles

I am familiar with orientation columns, but they only represent that one parameter: orientation.
My comparison derives multiple parameters at once and encapsulates them into patterns. That’s far more complex, general, and informative.

In my opinion it is the system that starts off only able to detect two angles of motion that becomes “pretty useless for deep analysis”. Having to make adjustments adds another step to the process. It makes more sense to me to start off with the brain’s center-surround receptive field organization.

Adding another step to inputs that actually deserve the costs is more intelligent than adding the same step to every bit of noise that comes across.

Are you sure they are similar? Because I haven’t seen any. Do you have a reference?

I’m going by the results of hundreds of great-sounding ideas I tried, including ones that parallel what you proposed, and they did not work as well as expected. From my experience, successfully reproducing what is found in the neuroscientific literature results in discovering a trick that can be modeled with a small amount of code. For example, two-frame place-avoidance behavior, and getting from place to place without bumping into anything unless still learning to walk/run/fly or startled:
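
For illustration only, here is a sketch of the kind of small-code trick I mean (the function name and threshold below are hypothetical, this is not the actual model): compare two consecutive frames and steer away from whatever is looming.

```python
import numpy as np

def avoid_step(prev_frame: np.ndarray, curr_frame: np.ndarray, threshold: float = 5.0) -> str:
    """Two-frame avoidance: steer away from the side where intensity is growing
    fastest between consecutive frames (a crude looming cue)."""
    diff = curr_frame.astype(float) - prev_frame.astype(float)  # frame-to-frame change
    mid = diff.shape[1] // 2
    left_loom = diff[:, :mid].clip(min=0).sum()    # growth on the left half
    right_loom = diff[:, mid:].clip(min=0).sum()   # growth on the right half
    if max(left_loom, right_loom) < threshold:
        return "forward"                           # nothing looming, keep going
    return "turn_right" if left_loom > right_loom else "turn_left"
```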

Neuroscience requires answering questions like: where would the brainwaves be represented?
How would you even model this kind of behavior without first modeling waves?

It’s fine that you didn’t get very deep into my intro, but then you are not in a position to judge what it is similar to. I think I made it pretty clear that I am not doing neuroscience, and that’s not because I don’t have a clue.

I don’t need global synchronization, via waves or otherwise, because my parameters are encapsulated into patterns rather than distributed across the whole system. This encapsulation means that they can be processed locally and asynchronously, in parallel with patterns of other levels. The brain can’t do that because there is no RAM within a neuron, so it must use dedicated physical connections for memory. That’s a huge handicap, and we don’t need to replicate it.
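
As a minimal sketch of what I mean by encapsulation (the field names below are hypothetical, not my actual parameter set): a pattern is a self-contained record, so comparing two patterns needs nothing outside those two records.

```python
from dataclasses import dataclass

@dataclass
class Pattern:
    """Hypothetical self-contained pattern: all derived parameters travel with it,
    so it can be compared locally and asynchronously, with no global state."""
    x: int          # coordinate of the pattern's origin
    span: int       # number of inputs covered
    intensity: int  # summed input value
    gradient: int   # summed difference between adjacent inputs
    match: int      # summed similarity to the previously compared pattern

def compare(a: Pattern, b: Pattern) -> int:
    # Purely local comparison: only the two records are needed, nothing network-wide.
    return min(a.intensity, b.intensity) - abs(a.gradient - b.gradient)
```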

That’s true if you are talking about instincts, but they are a small part of human motivation.
It’s not true about conditioning because there is a value drift, driven by cortical learning.
Which means that the cortex can and does swap its bosses all the time.
And it’s definitely untrue about value-free curiosity: that boss is an empty chair.

Why did you bring your non-neuroscientific model to a neuroscience forum?

Because my model offers a conceptually better way to achieve the same purpose.
I actually asked Matt if this is a good place for it, and he approved.

I am not attacking here so please don’t take this the wrong way.

You seem very hung up on curiosity as if it is somehow special. I take this to mean that you place this as some sort of different behavior from - oh say - seeking shelter or a food source.

In my own case I have a very solid “big picture” idea of how it all works as a system, and I am trying to learn how the parts I don’t understand operate to fill in this picture; mine goes from a helpless infant to a functioning adult. I have been working on this since the early 1980s and there are still many loose ends.

Do you have a complete working model/framework that you think can be elaborated into a working AGI, even if it is not documented in your writings? Said differently - does your partial model fit into a bigger picture, or is it just an interesting sub-problem to be solved in any way you can work up?
Does it account for saccades and assembling those snapshots into a mental model?
Does this model include being able to drive a body and generate speech?
Does it account for the sub-cortical coloring of perceptions from experience to form judgments?
Does it account for the known observations of various defects of the human brain and the effects they have on expressed behavior? This is important, as these defects form the fence around what a “broken” AGI would look like.

I see that these things are not random questions but instead - paths to understanding how an AGI will have to function to be compatible with human culture. I have said this before but I will raise it again: As a researcher in the AGI field, I spend considerable time thinking about the various mental defects and wonder if I would consider it a win to create a fully functional profoundly autistic AGI. Or a fully functional psychotic one.

What Jeff said in On Intelligence about hierarchy is not exactly how we think about hierarchy anymore. We think objects are being recognized at all levels, not just being composed at higher levels. This is different from the standard hierarchy model. Lots of discussion about this at Follow-up question to Podcast 1 with Jeff (location, orientation, and attention).


@bkaz sorry, I don’t really want to read all your posts. Do you have an MNIST benchmark or something like that?

I just moved this from #htm-theory:tangential-theories (it really isn’t) into #other-topics to avoid further confusion.

Then don’t read them.

Since my spatial navigation model predicts the loss of the ability to generate cognitive maps, I had to search Google Scholar for more information, and found this:

Hippocampal lesions disrupt navigation based on cognitive maps but not heading vectors.

Pearce JM, Roberts AD, Good M.

Abstract

Animals can find a hidden goal in several ways. They might use a cognitive map that encodes information about the geometric relationship between the goal and two or more landmarks. Alternatively, they might use a heading vector that specifies the direction and distance of the goal from a single landmark. Rats with damage to the hippocampus have difficulty in finding a hidden goal. Here we determine which of the above strategies is affected by such damage. Rats were required to swim in a water maze to a submerged platform, which was always at the same distance and direction from a landmark. The platform and landmark remained in the same place for the four trials of each session, but they were moved to a new position at the start of a session. Rats with damage to the hippocampus found the platform more efficiently than did normal rats in the first trial of a session but, in contrast to normal rats, their performance did not improve during a session. Our results indicate that hippocampally damaged rats are able to navigate by means of heading vectors but not cognitive maps.

Place navigation impaired in rats with hippocampal lesions | Nature

https://www.researchgate.net/profile/Amanda_Roberts5/publication/13469881_Hippocampal_lesions_disrupt_navigation_based_on_cognitive_maps_but_not_heading_vectors/links/5542120e0cf224a89a333746.pdf

There would still be the very useful heading vector that existed before adding feedback from the spatial mapping network. But, as in the paper, there would be no improvement in its navigational success rate.
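
As a worked illustration of the two strategies in the paper (my own simplification, not the model’s code): a heading vector needs only one landmark plus a stored offset, while a cognitive-map estimate needs geometric relations to several landmarks.

```python
import numpy as np

def heading_vector_goal(landmark_xy: np.ndarray, stored_offset_xy: np.ndarray) -> np.ndarray:
    """Heading-vector strategy: goal = single landmark + remembered direction/distance.
    Usable immediately after the landmark moves, but there is nothing to refine within a session."""
    return landmark_xy + stored_offset_xy

def cognitive_map_goal(landmarks_xy: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Crude stand-in for a cognitive map: goal estimated from relations to several
    landmarks; this is the strategy that hippocampal damage disrupts."""
    return (weights[:, None] * landmarks_xy).sum(axis=0) / weights.sum()
```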

I was beginning to worry about patient HM having possibly squashed my model. But it’s still OK!

Sorry, these are not my concerns. I am designing an algorithm to do science, not a social media chatbot.

You seem to be implying your work is more important than others’ work on this forum. That is not how we operate here. Please be nice.

Now that I have some idea what it is you are trying to do, I can address your work more closely.

You have made it clear that you are not trying to make a part of a larger system but instead to develop a particular method of analysis of a stream of information and, through this method, to predict trends.

In your writing you devote a section (Comparison to Artificial and Biological Neural Networks) comparing your work to several technologies that you have identified as somehow similar to what you are trying to do.

This cherry-picked list has biological and artificial ANN and CNN networks, and then some casual and arbitrary dismissals based on your personal criteria. These criteria include personal taste and the invocation of cost comparisons in either resources or algorithmic complexity.

I cannot speak to the personal taste portion as that is entirely up to you as to what is acceptable.

I can speak to the computational costs – it is now commonplace to have access to a GPU on every computing platform – even the $5 Raspberry Pi has one. These platforms are all capable of running common graphics kernels at video speed.

Your choice of strawmen against which to compare your technique and judge relative merits is odd – most are dimensional mapping algorithms; they are not meant to do temporal prediction. I am not surprised that they do not compare very well.

It is odd that you don’t include RNN and HTM technology, as these ARE meant to do temporal prediction. If you do get around to doing this comparison, I would avoid invoking computational complexity until you finish fleshing out your level 3 & 4 methods, as these are likely to end up with much the same complexity to do anything useful.

As to your level 1 & 2 – you should be looking more at other technologies that are a better match.

The basic operations you are doing are a weird mélange of arithmetic and logic that ends up retracing the steps of operations usually used for edge detection in graphics:

Some edge-detection kernels: Sobel, Prewitt, Laplacian, Canny, Gaussian blur, …

https://www.ics.uci.edu/~majumder/DIP/classes/EdgeDetect.pdf
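
For concreteness, a minimal Sobel sketch (standard kernels and a plain convolution; nothing here is specific to your algorithm):

```python
import numpy as np
from scipy.signal import convolve2d

# Standard 3x3 Sobel kernels for horizontal and vertical intensity gradients.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
SOBEL_Y = SOBEL_X.T

def sobel_edges(image: np.ndarray) -> np.ndarray:
    """Per-pixel gradient magnitude: the same compare-adjacent-inputs step,
    expressed as a convolution."""
    gx = convolve2d(image, SOBEL_X, mode="same", boundary="symm")
    gy = convolve2d(image, SOBEL_Y, mode="same", boundary="symm")
    return np.hypot(gx, gy)
```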

Your insight that these can be used on a temporal stream is not particularly novel:

Evaluation of Kernel Methods for Change Detection and Segmentation

https://www.researchgate.net/publication/222102464_Unsupervised_Change_Detection_by_Kernel_Clustering

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.1469&rep=rep1&type=pdf
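
The simplest temporal form of the same idea is a per-pixel change mask between consecutive frames (a toy version, not the actual algorithms in those papers):

```python
import numpy as np

def change_mask(prev_frame: np.ndarray, curr_frame: np.ndarray, threshold: int = 20) -> np.ndarray:
    """Per-pixel binary change mask: the kernel [-1, +1] applied along the time
    axis, i.e. the temporal analogue of a spatial edge detector."""
    temporal_gradient = np.abs(curr_frame.astype(int) - prev_frame.astype(int))
    return temporal_gradient > threshold
```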

I wish you luck in your continuing efforts but I don’t see anything of use to me at this time.

If you manage to come up with something in the higher levels, I will be happy to review your work.


ECCV 2018 - Occlusions, Motion and Depth Boundaries:

Bitking:

This cherry-picked list has biological and artificial ANN and CNN networks, and then some casual and arbitrary dismissals based on your personal criteria. These criteria include personal taste and the invocation of cost comparisons in either resources or algorithmic complexity.

I think by " taste" you mean my complaints about lack of theoretical grounding. This is widely acknowledged in ML and there is nothing personal about it.

It’s the main reason I am designing the algorithm from scratch, even if that means repeating some work that’s already been done.

Because the hard part here is theoretical justification, which no one seems to care about.

I can speak to the computational costs – it is now commonplace to have access to a GPU on every computing platform – even the $5 Raspberry Pi has one. These platforms are all capable of running common graphics kernels at video speed.

As I mentioned in the intro, the costs I am concerned about are on higher levels / layers, because they increase exponentially with elevation, and I want to design a system that can add new levels with experience, indefinitely. On lower levels, my design is significantly more complex and expensive than anything I know of, not mathematically but logically, especially because the basic algorithm is strictly sequential, to keep it tractable.

I consider parallelization a separate problem.

And I tried to explain why it is so complex: I need to derive a lot of fine-grained parameters to predictively prune higher-level search, because that’s where the costs get out of control. It’s an upfront investment.
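
To put rough numbers on that (purely illustrative, not my actual cost model): if every level elaborates combinations of the patterns below it, the candidate count grows exponentially with level, and cheap lower-level pruning is what keeps it bounded.

```python
def candidate_counts(inputs: int, branching: int, levels: int, keep_top: int | None = None) -> list[int]:
    """Illustrative only: number of candidate patterns per level.
    Without pruning, each survivor spawns `branching` higher-level candidates;
    with pruning, only the best `keep_top` survivors are elaborated further."""
    counts, n = [], inputs
    for _ in range(levels):
        n = (n if keep_top is None else min(n, keep_top)) * branching
        counts.append(n)
    return counts

print(candidate_counts(100, branching=4, levels=5))               # [400, 1600, 6400, 25600, 102400]
print(candidate_counts(100, branching=4, levels=5, keep_top=50))  # [200, 200, 200, 200, 200]
```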

Your choice of strawmen against which to compare your technique and judge relative merits is odd – most are dimensional mapping algorithms; they are not meant to do temporal prediction. I am not surprised that they do not compare very well.

It is odd that you don’t include RNN and HTM technology, as these ARE meant to do temporal prediction.

I addressed basic operations: weight matrix multiplication, integrate-and-fire, etc.

My main problems are the statistical nature of it all, the lack of one-to-one integer-level comparison, and distributed vs. localized parameters.

These apply to all NNs, regardless of architecture. And none of that is my personal taste; it’s the very logic of strictly incremental search.

I wish you luck in your continuing efforts but I don’t see anything of use to me at this time.

If you manage to come up with something in the higher levels, I will be happy to review your work.

Thanks for taking the time.


The HTM model is both binary and not statistically oriented. Perhaps you should take the time to learn how it works before grouping it with all other neural networks. You may find a lot to like about this predictive model.

Many of us share your concerns; that is why we are working with the HTM model. One of the oddities is that it is very good at predictions and so-so at the things that “ordinary” neural networks tend to do very well.

If you work through either the HTM School videos or the collection of foundation papers, you will see that they are not at all like the classic ANNs that you were comparing in your paper. The theoretical foundations are both mathematically secure and well elucidated.

They have evolved since the original introduction but the essential features remain the same. What has been added is the underpinning of your level 3-4 structure.

You may recognize that the Numenta group is working much the same way as you do - starting from a basic provable premise and extending it slowly. As they have worked with the basic model, they have been open to learning that their first ideas were not correct and to adjusting the way the model is used. I respect this openness to the possibility that the first intuitions were not correct.


I know it’s binary / spiking, and the inputs are OR-ed within a dendrite, but aren’t they summed in the soma?
Being a “cortical” algorithm, how can it not be statistical, given the amount of noise in the brain?
And parameters are still widely distributed across the network; a column only represents one type of parameter?
I have a problem with binary inputs too: logically, it should be integer inputs with binary coordinates; the first step is digitization within a coordinate…

I would just be wasting both our time by repeating and explaining what is very well documented in the papers.

I will say that it is NOT a spiking model. The summing part is a collection of local features, and this is much the same thing you do with your local-area manipulation in your model. The breakthrough that sparse representation is sufficient to reduce the search space is informed by the brain but is solidly backed by mathematical theory in the “thousands of synapses” paper.
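
A minimal sketch of the binary step being described, as a paraphrase of the HTM papers (this is not Numenta’s code): a dendritic segment is active when the overlap between its connected synapses and the current sparse input crosses a fixed threshold; there is no weighted sum and nothing statistical in that step.

```python
import numpy as np

def segment_active(connected_synapses: np.ndarray,
                   active_input_bits: np.ndarray,
                   activation_threshold: int = 10) -> bool:
    """Binary overlap test on boolean masks: count how many of a segment's connected
    synapses land on currently active input bits, then compare against a fixed threshold."""
    overlap = np.count_nonzero(connected_synapses & active_input_bits)
    return overlap >= activation_threshold
```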

As far as the representation issue goes - you yourself have stated that some data types must be reduced to be handled efficiently by your model (reduction to part 0). In the HTM system they follow the same path, with front-end “encoders” that bring everything into a compatible format.

I really do encourage you to take the time to learn the HTM system. At a bare minimum, your insight could provide valuable feedback on what parts you see as lacking, either in function or in explanation. You started with the On Intelligence book, as did many of us - you could at least see where this work has led, if for no other reason than to see where we have all gone wrong. :slight_smile:


I did skim the main papers a few times, but some things just turn me off, especially that neuron diagram.
But I will read again, thanks!


You mentioned integer values several times.

I would like to mention that the binary value is actually theoretically the same thing. The binary presence/absence of a feature or fraction of a feature is actually equivalent in a distributed form. An integer arbitrarily collects states in a single place. There is nothing that says that this is the natural form of data. The HTM model follows the much more general case that the information is distributed over as much space as is required to contain it.

Another point that you mentioned is closely related to this - the coordinate. This is mixed with the feature representation so that the binary presence and place are coded in the same bit.

This takes some mental strength and flexibility to comprehend the difference between the more “normal” computer science view of the representation and this more general and powerful representation scheme. It is worth the effort.
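
A toy sketch of that idea, in the spirit of the HTM scalar encoders (not the actual implementation): an integer is spread into a block of active bits, and which positions are active carries both the presence of the value and its magnitude.

```python
import numpy as np

def encode_scalar(value: float, min_val: float, max_val: float,
                  n_bits: int = 64, active_bits: int = 8) -> np.ndarray:
    """Toy scalar encoder: map `value` to a contiguous block of `active_bits` ones
    inside an `n_bits`-wide binary vector. Nearby values share bits."""
    span = n_bits - active_bits
    start = int(round(span * (value - min_val) / (max_val - min_val)))
    start = max(0, min(span, start))          # clamp out-of-range values
    sdr = np.zeros(n_bits, dtype=bool)
    sdr[start:start + active_bits] = True
    return sdr
```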


I would like to mention that the binary value is actually theoretically the same thing. The binary presence/absence of a feature or fraction of a feature is actually equivalent in a distributed form. An integer arbitrarily collects states in a single place.

Not arbitrarily. You are thinking about features, which are relatively high-level representations. If we have a general algorithm, it should follow the same principles on all levels, and the logic can only be made fully explicit on the lowest. That means raw input: brightness, within a limit of resolution: a pixel. Everything else (features) should be derived from these inputs, seamlessly. You are right that it actually starts from binary inputs, but multiple bits of brightness are located in a single place: a pixel. That’s because “place” is externally defined; it is a macro-parameter relative to the input, so its resolution lags that of the input. You could increase positional resolution to the point where content can have binary resolution without much overflow, but then almost all pixels would have a value of 0, which is a waste of space.
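
To put numbers on that last point (one way to read it, purely illustrative): an 8-bit pixel packs 256 brightness levels into one coordinate, while pushing the same content down to binary resolution per position leaves almost every position at 0.

```python
# Illustrative arithmetic for one pixel of 8-bit brightness.
integer_form_bits = 8                       # 256 brightness levels packed into one coordinate
one_hot_positions = 2 ** integer_form_bits  # binary-content form: one active position out of 256
zero_fraction = (one_hot_positions - 1) / one_hot_positions
print(one_hot_positions, f"{zero_fraction:.1%}")  # 256 positions, 99.6% of them zero
```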

Another point that you mentioned is closely related to this - the coordinate. This is mixed with the feature representation so that the binary presence and place are coded in the same bit.

That only works if you know what type of feature it is. And if your neuron gets multiple types, then you don’t know which type triggered it. I lose information too, but only after evaluation, while this loss is indiscriminate.
