Neural Architecture Search for HTMs?

I agree. But what if nature doesn’t consciously calculate these gradients (which I would assume it doesn’t)? What if this method of calculation is just our (human) way of formalizing a subset of a bigger, largely irreducible computation?

Hence I thought of a scientifically guided exploratory search.

In the “billions of parameters” post, a member added that gradient-descent-based algorithms can basically search for any algorithm. This is wrong, for the simple reason that gradient search is contained in its static world: it cannot make the quick, massive parameter updates we would need in order to model the parameters of a changing world.
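To make the point concrete, here is a toy sketch (entirely my own; the 1-D loss, learning rate, and `target` variable are invented for illustration) of how each gradient step is a small local move on a surface that is assumed to stay fixed:

```python
# Toy illustration: gradient descent lives in a "static world" in the sense
# that every update is a small, local step on a loss surface assumed fixed.
def grad(w, target):
    return 2.0 * (w - target)       # gradient of the toy loss (w - target)^2

w, lr = 0.0, 0.1
target = 1.0                        # the world the loss currently encodes

for _ in range(50):
    w -= lr * grad(w, target)       # many small, local updates
print(round(w, 3))                  # ~1.0

# If the world shifts abruptly, the optimizer cannot make one large,
# structured re-parameterisation; it can only crawl toward the new optimum
# through the same kind of small steps.
target = 10.0
steps = 0
while abs(w - target) > 1e-3:
    w -= lr * grad(w, target)
    steps += 1
print(steps)                        # dozens of further incremental steps
```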

2 Likes

Not sufficient, either as a goal (1) or as a method (2):

  1. “Better prediction” alone doesn’t specify a measurable parameter to optimize for. There could be several competing parameters, e.g. accuracy vs. sample efficiency (see the sketch below).
  2. Genetic algorithms (or the like) optimize for a single criterion. In evolutionary terms, the more specialised an organism is for a particular niche, the less likely it is to be a good survivor across all environments.

And finally, genetic optimizations optimize only within the given architectural (or conceptual) framework. They do not address (or have only a very small chance of addressing) more fundamental issues.
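To illustrate the competing-criteria point (1), here is a hypothetical sketch; the candidate numbers and the `Candidate` class are mine, and Pareto dominance is just one standard way of handling two objectives that cannot be collapsed into a single score without choosing a weighting:

```python
# Sketch of the accuracy-vs-sample-efficiency trade-off: with two competing
# criteria there is no single scalar to maximise, only a Pareto front.
from dataclasses import dataclass

@dataclass
class Candidate:
    accuracy: float          # higher is better
    samples_to_learn: int    # lower is better (sample efficiency)

def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both criteria and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.samples_to_learn <= b.samples_to_learn
    better = a.accuracy > b.accuracy or a.samples_to_learn < b.samples_to_learn
    return no_worse and better

population = [Candidate(0.92, 5000), Candidate(0.85, 800), Candidate(0.80, 6000)]

# Keep only non-dominated candidates: the Pareto front of the trade-off.
front = [c for c in population if not any(dominates(o, c) for o in population)]
print(front)   # the 0.80/6000 candidate drops out; the other two are incomparable
```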

1 Like

Actually, that’s not really true. Genetic algorithms, just like those in nature, are inherently parallel. They can optimise multiple parameters for survival against a single criterion, or a single parameter against multiple criteria, or both. Survival can be a complex and moving target, and this is where adaptation really works.

But yes, the issue is time. A GA optimising parameters is like an organism optimising its choice of genes from available alleles. This is adaptation, and it can be done usefully in quite a modest number of generations, say 10-100 or so. But as soon as the available genes won’t hack it and you need mutation to make some new ones, that takes serious time: 1000 generations and up.

I’ve written GAs to do both, and the difference is striking. Adaptation is time for a cup of coffee, but mutation is come back next week, or next month.
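For anyone who hasn’t written one, here is a minimal toy GA (mine, not the ones mentioned above) that shows the two mechanisms being contrasted: crossover only reshuffles alleles the population already has, while mutation is the rarer process that creates new ones:

```python
# Toy GA: selection + crossover is "adaptation" (recombining existing
# alleles); mutation is the much slower source of genuinely new ones.
import random

GENOME_LEN = 16
POP_SIZE = 30

def fitness(genome):                  # toy criterion: count of 1-bits
    return sum(genome)

def crossover(a, b):                  # adaptation: recombine existing alleles
    cut = random.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]

def mutate(genome, rate=0.01):        # mutation: occasionally flip a bit
    return [1 - g if random.random() < rate else g for g in genome]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]

for generation in range(100):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                               # selection
    population = [mutate(crossover(random.choice(parents), random.choice(parents)))
                  for _ in range(POP_SIZE)]

print(max(fitness(g) for g in population))                  # close to GENOME_LEN
```

With a random initial population every allele is usually already present somewhere, so selection and crossover do almost all of the work within tens of generations; if a needed allele were missing, only the low mutation rate could supply it, which is roughly the situation behind the 1000-generations-and-up figure above.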

2 Likes

Yeah, I guess natural selection is the one true general algorithm, though there are lots of nuances. For one, nature did not come up with one single solution; each solution has to account for some level of specialization for its particular environment(s).

Back to HTM in particular: given its conceptual constraints (SDR size, cells per column, segments per cell, synapses per segment, etc.), a search will end up concluding that, surprise-surprise, the more synapses it has, the more complex an input space it will be able to model.

A second obvious observation is that when we optimize for sample efficiency, the fewer parameters (synapses) there are, the faster it learns (both sample- and compute-efficient).

This contradicts the previous observation. Remaining strapped to the current HTM concept, we can optimize only within this space.

Increasing both the number of parameters (i.e. a more complex problem-space mapping) and sample efficiency would demand a change more fundamental than an HTM parameter search.
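A hypothetical sketch of that constrained search (the parameter names, value grids, and scoring proxies below are mine, not a real HTM benchmark): because both proxies are driven by the same total synapse count, any ranking that rewards capacity punishes sample efficiency, and vice versa:

```python
# Hypothetical grid search over HTM-ish knobs. Both scores are stand-ins:
# the capacity proxy grows with total synapse count, the sample-efficiency
# proxy shrinks with it.
from itertools import product

grid = {
    "cells_per_column":     [8, 16, 32],
    "segments_per_cell":    [16, 64, 128],
    "synapses_per_segment": [16, 32, 64],
}

configs = [dict(zip(grid, values)) for values in product(*grid.values())]

def total_synapses(cfg):
    return cfg["cells_per_column"] * cfg["segments_per_cell"] * cfg["synapses_per_segment"]

def capacity_proxy(cfg):             # "how complex an input space it can model"
    return total_synapses(cfg)

def sample_efficiency_proxy(cfg):    # "how quickly it learns"
    return 1.0 / total_synapses(cfg)

print(max(configs, key=capacity_proxy))            # the largest network wins on capacity
print(max(configs, key=sample_efficiency_proxy))   # the smallest network wins on efficiency
```

No configuration in the grid wins on both criteria, which is the contradiction described above: escaping it requires changing the concept, not just the parameters.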

1 Like

I see what you mean.

I wasn’t thinking about the params to optimize; instead, I was thinking about the goal that we perceive. Of course there are many params that could be optimized, but at the end of the day I want my model to predict better from an observer’s perspective.

–edit—
I looked back and saw that you were actually asking specifically about “optimization”; sorry I missed that. So the answer is: I don’t know which parameter to optimize.

1 Like

By experience/example, yes, but this is extremely hard to prove in nature’s case.

1 Like

Nature does not use a temporal-columnar arrangement of neurons everywhere. And there is little evidence that columnar cortex is “all you need” to spark intelligence, especially when, despite its volume, only a minority of the brain’s neurons populate it.

1 Like