After learning a bit about HTM, and with a general background in ML, it seems to me that a central difference is that ML uses mathematical optimization to optimize an objective function, while HTM's goal seems to be simply to be fairly good at predicting sequences.
Intuitively, it seems to me that there's an inherent trade-off between generalization and optimization (I'd be happy to hear about any attempts to formalize this concept), and this is possibly the reason why today's ML algorithms really excel at a particular task (weak AI). The attitude of ML today is to set up a model with a big parameter space and use gradient descent to optimize it to perfection. I fear that such approaches may be losing on the generality side.
HTM, on the other hand, seems to be less accurate than current ML algorithms but is possibly far more general.
Do you think this is a fair reading of the state of affairs?
Another thought is about how this applies to AI safety and alignment, where the question is how to design an objective function that aligns with our desires even when the AI becomes superintelligent and can explore a much bigger parameter space in its optimization.
First of all, due to instrumental convergence, I think it's fair to say that as the AI becomes better at optimizing a certain objective function, it also becomes more general. Given the trade-off discussed above, the AGI will then not be able to optimize the function down to arbitrarily small error, thus possibly solving the AI alignment problem (we don't have to worry about it going out of control because it finds a more optimal, weird state of the world which we didn't think about).
HTM is a technology; it does not have a goal. The goal of Numenta is to understand how intelligence works and implement it in non-biological systems. We make all our research available, including open source code.
I’ve never thought of it that way. When you say “optimization” you’re talking about tweaking input parameters over time, right? We never do that. You have to find the right params for a data set, and the system will learn over time. “Learning” does not involve parameter tuning in HTM. But I think you understand that based on the rest of your post.
I don't know what you mean by "less accurate". HTM needs temporal data. Most ML benchmark data sets are for spatial classification, so the only level playing field we have is time series analysis, and we created the Numenta Anomaly Benchmark for comparisons. That's the only benchmark we've really cared about, and honestly, if someone came along and beat us at it, I'm not sure we'd dedicate our resources to improving our score. There are bigger fish to fry right now in understanding grid cells in the context of HTM. Almost all of Numenta's resources are focused there, and it's bearing fruit (new papers coming). But I digress.
I really don’t think this will happen with HTM. To retune an HTM’s parameters is to erase everything it has learned.
Thanks for the quick reply! Glad to see that the forum is active.
I don't think we're quite on the same page with regards to terminology.
Of course you do. Tweaking the params doesn't have to mean doing backpropagation with real-valued weights. You tweak the permanence values of synapses in response to new samples over time, right? Every learning system has some state that changes over time to make the output better in some way. The process of exploring this space of states and picking the best one is called learning (in reality we settle for an approximation of the best state to stay computationally tractable).
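To make the analogy concrete, here is a toy sketch of what I mean, assuming nothing about NuPIC's actual API (all names and constants are hypothetical): permanences of synapses that match the active input are incremented, the rest decremented, and a synapse counts as "connected" once its permanence crosses a threshold. That is a state change driven by data, i.e. learning, even though no gradient is involved.

```python
# Toy illustration of HTM-style learning: the "parameters" (synaptic
# permanences) change in response to each sample, not via gradient descent.
# Constants and function names are hypothetical, not NuPIC's API.

CONNECTED_THRESHOLD = 0.5
PERM_INC = 0.05
PERM_DEC = 0.02

def learn(permanences, active_inputs):
    """Update permanences for one segment given the set of active input bits."""
    for i in range(len(permanences)):
        if i in active_inputs:
            permanences[i] = min(1.0, permanences[i] + PERM_INC)
        else:
            permanences[i] = max(0.0, permanences[i] - PERM_DEC)
    return permanences

def connected(permanences):
    """A synapse is 'connected' once its permanence crosses the threshold."""
    return {i for i, p in enumerate(permanences) if p >= CONNECTED_THRESHOLD}

perms = [0.48, 0.48, 0.48]
for _ in range(3):          # three presentations of the same pattern
    learn(perms, {0, 1})    # bits 0 and 1 active, bit 2 inactive
print(connected(perms))     # bits 0 and 1 become connected, bit 2 does not
```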
The trade-off I was referring to is between the ability of the system to be good at many different tasks (generality) vs. how good it is at each task (optimality). I feel that it's impossible for a system to be both maximally general and maximally optimal, i.e. arbitrarily good at every task; that's the trade-off. In other words, you have to allow some slack in the accuracy of the system for it to be able to generalize to totally different problems.
And I fear that the ML community may have fallen into this trap. They optimize the hell out of their models to squeeze out every last bit of accuracy. They get models that are very good at a particular task but fail to generalize to other tasks. Trying to train a previously trained model on a different task completely destroys its accuracy on the old task, because we're optimizing it too hard.
Does that make sense?
I mean that HTM may need more data and compute in order to reach the same level of predictive accuracy as other ML models (e.g. RNNs).
Also, I understand that Numenta is focused on understanding the neocortex, but don’t lose sight of why that’s important and useful: we want to build intelligent machines and reach AGI.
The things I said about AI alignment and safety are not directed specifically at HTM; rather, they consider AI alignment more broadly in light of the trade-off discussed.
I meant that the better the AGI gets, the more possible states of the world it can consider. You can think of an AGI as trying to find the state of the world that maximizes a certain objective function. A stupid AGI might only consider a small subset of states, while a superintelligent AGI would be able to effectively scan a much larger chunk of the possibility space and find a state of the world with a very high score.
My point is that it may be impossible to create an AGI that can find a state of the world that is arbitrarily good, because it would have to be both maximally general and maximally optimal, thus relieving us of the dangers that people like Nick Bostrom talk about.
Hope this clears up some of my less coherent mumblings.
(Would love to hear Jeff Hawkins' perspective, but he's probably too busy for me…)
I think I understand and agree with everything else you said except:
HTM actually needs less data than most systems. For example, you don't need to train it over a huge dataset. It can start producing valuable anomaly indications after seeing only a couple thousand scalar values (with time encodings). Each input is maybe 100 bytes, so if you are talking about raw data, we can start learning very quickly compared to other methods (300 KB of raw SDR input). See our paper Unsupervised Real-Time Anomaly Detection for Streaming Data for examples.
I am curious how you define intelligence as a quantity or property. This is implied as necessary in order to have more of it, as in "super". Are you saying that "deep learning" does not have enough of this property now?
Is there some property that just having more of will result in AGI?
Or could it be that we only have parts of a fully functioning system and the missing parts are preventing the mythical AGI? If this is the case then how do we know if the current technology is enough or not?
I agree, and I want to highlight it. When you talk about the model parameters of an ML algorithm, it is hard to compare them to the model parameters of an HTM system. Our model parameters are things like cell counts, initial connection strengths, topology, distal reach, proximal reach, etc. They are all pretty much directly associated with the underlying biological theory of HTM. Once you pick these, they are built into a virtual structure that is hard to change as it runs (some params can be easily changed, but we do not do this as part of our "learning"; our learning happens entirely as synaptic permanence changes).
So generally with HTM you create model params up front and they don’t change. What changes, what learns over time, are the synapses between the neurons. All knowledge is contextually stored within these weights.
Maybe you can describe what you mean when you say “parameter optimization”?
In the AI alignment community, a standard definition is that an AGI is just a function that takes an objective function f and finds its argmax over a space of states of the world: argmax_{u ∈ U} f(u) (where U is the set of possible states of the world). The larger the space it can take into account, the more intelligent the AI is.
(You can also say it finds the optimal action, over a space of actions, to maximize the expected value of f in the future; that's probably a better definition.)
In fact, we could create a superintelligent AGI right now with just a couple of lines of code that brute-force the entire space and find the true maximum. But in reality that's practically impossible (the space is huge and very irregular), so we must find an algorithm that gives the closest approximation with the least amount of resources.
(In fact, a deep enough deep learning model could be a superintelligent AGI, as could one with a single huge hidden layer, or a Turing machine scanning all possibilities.)
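To be concrete about the "couple of lines of code" claim, here is a brute-force argmax sketch; the objective function and state space are invented for illustration:

```python
# The "couple of lines of code" superintelligence: scan every state and
# return the best one. Correct in principle, hopeless in practice once
# the state space explodes. The objective here is made up for illustration.

def argmax_agi(f, states):
    """Return the state in `states` that maximizes f, by exhaustive scan."""
    best = None
    for s in states:
        if best is None or f(s) > f(best):
            best = s
    return best

objective = lambda s: -(s - 3) ** 2      # toy "state of the world" score
print(argmax_agi(objective, range(10)))  # finds the true maximum, s = 3
```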
So is that how brains do it? This is the best working model of a general intelligence at the moment and somehow I don’t see this definition as fitting very well. There is more to this.
Brains work when they don’t know very much at all and don’t have a huge search space. Brains work in very intelligent ways long before they are “trained up” with real world data. I could go on but I think you get the idea.
My point is just this: since nobody has successfully built one and validated the approach, any claims that an AGI works this way or that way are just unproven conjecture.
I may have used this term twice to mean two different things.
With regard to learning algorithms, by parameter optimization I mean changing the state of the model to make its output better. In ANNs this means changing weights via backpropagation; in HTM it means changing permanence values. What you're talking about are what's called hyperparameters, which are parameters in the space of different models (i.e. changing the model itself).
With regard to AI alignment, I mean optimizing the state of the external world to maximize a certain objective function.
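The first distinction might be sketched like this (everything here is invented for illustration, not any framework's actual API): hyperparameters pick a point in the space of models and stay fixed, while parameters are the state that learning itself updates.

```python
import random

# Hyperparameters: chosen up front, never touched by learning.
hyperparams = {"learning_rate": 0.1, "n_inputs": 2}

random.seed(0)
# Parameters: the state that learning changes over time.
weights = [random.uniform(-1, 1) for _ in range(hyperparams["n_inputs"])]

def train_step(weights, x, target):
    """One delta-rule step: nudge the weights toward lower prediction error."""
    pred = sum(w * xi for w, xi in zip(weights, x))
    err = target - pred
    lr = hyperparams["learning_rate"]
    return [w + lr * err * xi for w, xi in zip(weights, x)]

# Learning rewrites `weights`; `hyperparams` never changes.
for _ in range(200):
    weights = train_step(weights, [1.0, 2.0], 5.0)
```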
Brains are a specific implementation of a particularly good approximation of this argmax business. They are very good at exploring a huge chunk of the possibility space and picking the state that (approximately) maximizes their survival, pleasure, well-being (what have you…).
The trick is that you don't have to go through and measure each and every possibility to get a good approximation. You can use various methods to discard large parts of the search space, and that's what brains do: they have very good rules of thumb and heuristics (learned over time) to achieve this great approximation.
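As a toy illustration of discarding search space, here is a branch-and-bound knapsack sketch (the items, capacity, and bound are invented for illustration): an optimistic upper bound lets the search skip whole subtrees that provably can't beat the best solution found so far, so it reaches the true optimum without visiting every node.

```python
# Toy branch-and-bound knapsack: instead of scoring all 2^n subsets,
# prune any branch whose optimistic upper bound can't beat the best
# solution found so far. Items and capacity are invented for illustration.

def knapsack(values, weights, capacity):
    best = 0        # best total value found so far
    explored = 0    # how many search nodes we actually visited

    def search(i, value, room):
        nonlocal best, explored
        explored += 1
        best = max(best, value)
        if i == len(values):
            return
        # Heuristic cut: even taking every remaining item can't win.
        if value + sum(values[i:]) <= best:
            return
        if weights[i] <= room:                    # branch: take item i
            search(i + 1, value + values[i], room - weights[i])
        search(i + 1, value, room)                # branch: skip item i

    search(0, 0, capacity)
    return best, explored

value, nodes = knapsack([10, 7, 4, 3], [5, 4, 3, 2], 7)
print(value, nodes)  # optimal value, reached while visiting fewer nodes than the full tree
```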
Given your answer, would it be fair to say that the real search for an AGI is learning what these simplifications and approximations are, in order to make a practical AGI?
Your general method of having a function that has the right answer to every problem in a large search space is about the same as saying, "I will just build a Watson database that has the answers to all the questions I can put to it." In principle it would work, but I don't see that as really building an AGI. I have a problem calling a really clever version of Eliza an AGI.
From an AI alignment perspective, we don't really care how the function is constructed; the problems are all the same. The fact that humans don't find meaning in anything other than what they perceive to be meaningful, and that a table of answers does not seem to fit the bill, is not a reason to rule it out as "not intelligent".
Maybe the table of answers doesn't consider us to be truly intelligent? It only thinks of other tables as really intelligent.
Having a magic table may not provide much insight to us humans, but I consider it to be intelligent all the same.
The fact is that once you understand how the algorithm works under the hood, you don't consider it to be intelligent anymore…
People once thought that playing chess requires intelligence; now computers brute-force their way to victory, so it no longer seems to require "true" intelligence. I don't accept the "if it's just following a simple set of rules then it must not be intelligent" stance.