What is a "hyperparameter" in HTM vs Deep Learning?


#1

This is not true, in practice.


HTM and Deep learning
#2

It is true once you have a model created. There is no way to tune parameters once the model is created, which is different from DL.


#3

But, in ML, hyperparameter tuning doesn’t necessarily imply that the hyperparameters are tuned or optimized after the model has been created or trained. In NuPIC, there’s a form of hyper-parameter tuning, so the general sentence “No hyper-parameter tuning or other human intervention” is incorrect.


#4

@nbro - I don’t want this to turn into a long stream of back & forth discussions. I think that it is fair to say that if a deep network fails to converge it is fairly common to tweak parameters or even the whole model and continue. The previously learned connections are not necessarily invalid.

With HTM this is generally not necessary or even possible.

Yes, you can fiddle with some HTM parameters before training starts, but you cannot once learning commences. The configuration of the learned bits depends on them not moving in relation to the input stream; a bit position must match the related feature or it loses the meaning of that feature.

You may quibble over the detail that HTM has parameters, but I agree with what @rhyolight stated: once the training starts, the model configuration is frozen.


#5

At an Intel AI conference last year, I saw the main AI guy at Intel say the way they “learned” was to update hyperparameters, create a new model, and throw the old one out. I think this is a really common tactic in today’s DL systems. The point is that they cannot adjust to changing input streams without updating the model params. HTM does not have that restriction. It is a part of the whole “online learning” thing.


#6

What you said is valid, but it doesn’t contradict what I said. In the current HTM implementation (i.e. NuPIC), there’s a form of hyper-parameter tuning. NuPIC has hyper-parameters, which are either manually set up (i.e. human intervention) or are searched, e.g. using evolutionary algorithms (i.e. hyper-parameter optimization).
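To illustrate what searching hyper-parameters with an evolutionary approach can look like, here is a minimal sketch. It is not NuPIC's own search code: the `score` function, the parameter names `threshold` and `increment`, and the optimum values are all hypothetical stand-ins for "build a model with these settings and measure its error".

```python
import random


def score(params):
    """Hypothetical error measure (lower is better).

    Stands in for building a model with these hyper-parameters and
    evaluating it. The optimum (0.3, 0.05) is invented for illustration.
    """
    return (params["threshold"] - 0.3) ** 2 + (params["increment"] - 0.05) ** 2


def mutate(params, rng, scale=0.05):
    """Return a copy of params with Gaussian noise added to every value."""
    return {name: value + rng.gauss(0.0, scale) for name, value in params.items()}


def evolve(initial, generations=200, children=8, seed=0):
    """Simple elitist search: keep the parent unless a child scores better."""
    rng = random.Random(seed)
    parent, parent_score = initial, score(initial)
    for _ in range(generations):
        # Create several mutated children from the current parent.
        offspring = [mutate(parent, rng) for _ in range(children)]
        best_child = min(offspring, key=score)
        # Selection: the best child replaces the parent only if it improves.
        if score(best_child) < parent_score:
            parent, parent_score = best_child, score(best_child)
    return parent, parent_score


start = {"threshold": 0.9, "increment": 0.5}
best, best_score = evolve(start)
print(best, best_score)
```

Whether the values are set by hand or found by a loop like this, they are fixed before the model runs, which is what makes them hyper-parameters.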

Actually, the fact that the current HTM implementation (i.e. NuPIC) contains a high number of different hyper-parameters upsets me a little bit. Of course, this is due to the fact that HTM is a more complex model than many other ML models.


#7

@nbro I think you are missing the point, which is online learning. With regard to “hyperparameter tuning”, HTM and DL are completely different.

All models have initial parameters that must be tuned to process digital input. For ANY model, someone or something must create these initial parameters. This includes things like network architecture, size, connectivity, permanence thresholds, input space dimensions & encoding, etc. Once these parameters are established for a data source, then we talk about “hyperparameters”.

At this point in HTM, we don’t tweak any more params. There is no such thing as hyperparameters. At this point with DL, you run on hundreds of millions of data points in production, tweak the model to tune it to be better, then run on hundreds of millions more data points, tweak the model, repeat.

The point is that HTM learns new patterns online without hyperparameter tuning. For DL to learn new patterns it was not designed to detect, it must be changed and all learning is thrown away.


#8

I didn’t miss this at all. Online learning is not what I am talking about. You started talking about it, whereas I just wanted to talk about hyper-parameter optimization.

DL is not just one model. There are many different models. There are several different ways to optimize hyper-parameters.

These “initial parameters” are hyper-parameters.

Actually, no. Hyper-parameters, in ML, are all parameters which are not found directly via the training process (e.g., the number of neurons in a layer of a feed-forward neural network). Weights of the connections are not hyper-parameters, because they are “found” during the training process.
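To make the distinction concrete, here is a small sketch of my own (a toy example, not taken from any library). `learning_rate` and `epochs` are chosen before training and are never changed by it, so they are hyper-parameters. The weight `w` is found by the training loop, so it is an ordinary parameter.

```python
def train(xs, ys, learning_rate=0.1, epochs=200):
    """Fit y = w * x by gradient descent on the mean squared error.

    learning_rate and epochs are hyper-parameters: set before training.
    w is a parameter: it is found by the training process itself.
    """
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of mean((w * x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w = train(xs, ys, learning_rate=0.05, epochs=500)
print(w)
```

Changing `learning_rate` or `epochs` changes how `w` is found, but nothing inside `train` ever updates them.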

The fact that you have not tried to tweak these parameters doesn’t imply that they are not hyper-parameters. They are fixed hyper-parameters (so far).

Again, there are several methods to perform hyper-parameter optimization. See the Wikipedia article I linked you to above. Also, it’s not true that hyper-parameter optimization is usually performed in production: it’s usually performed during training.

Maybe we would obtain better results, in a few cases (at least), if we tweaked them (maybe according to changes of input)?! Has anyone ever investigated this?

Again, I really think you make a big mistake by talking about DL as if it was only one model or approach. Anyway, it would be better to refer to ML (instead of DL), as there are so many useful models in ML that are not necessarily “deep” (e.g. SVMs).


#9

Don’t forget that HTM is biologically constrained, which means that it’s not allowed to do anything the brain doesn’t do. So this is off the cards, unless the brain is able to tweak its chemistry/topology in response to different inputs (my understanding is it doesn’t).


#10

But every brain is (slightly) different from all other brains. I don’t see why tweaking these hyper-parameters would be against nature (i.e. it would be like finding the best brain to perform a certain task). Anyway, we can still optimize these hyper-parameters using constraints.


#11

@nbro From the Wikipedia article you linked:

If you follow this definition, then you must say that every HTM parameter is a hyperparameter. Pay close attention to the word training above, and remember that HTM does not train. It learns continuously. There is no training phase, and therefore no separation of normal parameters vs hyperparameters.

What params in HTM are you thinking about tweaking? We’ve tweaked the params a ton to try to get them to a state that actually works. If you change even one param too much, it no longer resembles a cortical process. I would argue that most of the parameter tuning has been done already.


#12

Training or continuous online learning: it doesn’t matter. The point of that sentence is that those parameters are set before the algorithms are executed (they e.g. define the structure of the architecture of the model, the learning mechanism, etc.).

Honestly, I’ve not yet experimented enough with NuPIC to do more than raise the question/suggestion.

Anyway, I didn’t really want to start a “war”. I just wanted to point out that, according to the knowledge of a master’s student in AI, current HTM implementations, of course, have hyper-parameters, and, of course, hyper-parameters can be optimized.


#13

For the sake of the HTM world, I think you can ignore the “hyperparameters” label altogether and we can just get back to calling them model parameters. :man_shrugging:

Whatever you call them, we set them before algorithms are executed, and they are never tweaked manually in real time.


#14

That being said, there is no reason not to try this. I think in NuPIC you can change some of the model params directly on the sp / tm instances, like permanences. This allows you to change permanence thresholds during runtime, but we never do this. Feel free to try it out!


#15

Obviously not a neuroscientist myself, but I can imagine the possibility of chemicals which affect various properties on the fly. Certainly from a software perspective I can imagine this being a useful capability (for example, when something particularly anomalous is detected, I might want to boost up the learning rate to quickly acquire new knowledge, then dial it back again when things go back to normal to promote stability, error tolerance, and generalization).
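A rough sketch of that idea, using a toy learner rather than HTM or NuPIC: the class `AdaptiveLearner`, its rates, and the `surprise_threshold` are all made up for illustration. The learner tracks a running estimate of a scalar stream and switches to a boosted learning rate only while the prediction error is large.

```python
class AdaptiveLearner:
    """Toy online learner that tracks a running estimate of a scalar stream."""

    def __init__(self, base_rate=0.05, boosted_rate=0.5, surprise_threshold=1.0):
        self.base_rate = base_rate            # rate used when input is expected
        self.boosted_rate = boosted_rate      # rate used when input is anomalous
        self.surprise_threshold = surprise_threshold
        self.estimate = 0.0

    def step(self, value):
        """Update the estimate with one value and return the rate that was used."""
        error = value - self.estimate
        # Treat a large prediction error as an anomaly and learn faster.
        anomalous = abs(error) > self.surprise_threshold
        rate = self.boosted_rate if anomalous else self.base_rate
        self.estimate += rate * error
        return rate


learner = AdaptiveLearner()
stream = [0.0, 0.1, 0.0, 5.0, 5.0, 5.0, 5.0]
rates = [learner.step(v) for v in stream]
print(rates)
```

On this stream the rate jumps when the input shifts to 5.0 and falls back to the base rate once the estimate has caught up enough that the error drops below the threshold.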


#16

I have seen several lines that point to dynamic modifications of the learning rate.
The amygdala certainly releases learning promoters when a particularly significant event happens.

The link to the RAS/RAC works to gate “more” of anomalous input, which indirectly serves to increase vigilance and learning. More on this: I think that the cortex is always weakly “resonating” with the senses, and if the cortex can find a loose match nothing special happens. If there is some degree of surprise, the RAS detects the mis-match between the input stream and the cortex as “surprise” that needs attention. The RAC opens up the firehose of sensation, flooding the cortex with a strong stream of sensation which is learned with our online/one-shot learning until enlightenment is achieved. (The cortex, resplendent in its newfound learning, resonates with the input stream!) This is how I have come to view the “spotlight of attention” that was proposed by Francis Crick.

There are some other mechanisms but these are the ones that I feel the most certain of.