While I was learning about “learning how to learn”, a.k.a. meta-learning, and pondering search algorithms for finding good optimizers (hence my previous post about billions of params; thanks to the responders), I stumbled on the following article:
Then I thought: would it be worthwhile or useful to do such a search for HTM algorithms as well? This would be a search outside the gradient-descent search-algorithm space.
I’m thinking out of the box here, what are your thoughts?
I’m curious if there has been any “qualitative-gradient” schema to guide an optimization/search algorithm?
Numeric approaches mandate a quantitative “loss/fitness function” that is differentiable w.r.t. the model parameters, so a numeric gradient can be calculated to push the parameters toward a smaller loss (or larger fitness).
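Just to make that contrast concrete, here is a minimal sketch of the quantitative setup; the quadratic loss is a stand-in, nothing HTM-specific:

```python
import numpy as np

def loss(params):
    # Stand-in quantitative loss: any differentiable function of the params.
    return np.sum((params - 3.0) ** 2)

def numeric_gradient(f, params, eps=1e-6):
    # Finite-difference estimate of the gradient w.r.t. each parameter.
    grad = np.zeros_like(params)
    for i in range(len(params)):
        step = np.zeros_like(params)
        step[i] = eps
        grad[i] = (f(params + step) - f(params - step)) / (2 * eps)
    return grad

params = np.zeros(4)
for _ in range(100):
    params -= 0.1 * numeric_gradient(loss, params)  # move toward smaller loss
```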
I don’t think an architectural search can be that “quantitative”, or can it be?
If we’re left with only qualitative options, what device can work in place of the “numeric gradient” to point out the direction for model evolution?
An idea emerges to answer my own question, though I don’t think it is perfectly scientific: in Darwinian processes as described in the book THE CEREBRAL CODE, it seems that survival vs. juvenile mortality can be such a ratchet, roughly functioning like a numeric gradient, provided massive trial-and-error can be afforded.
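A toy sketch of that ratchet, with a made-up fitness and mutation purely for illustration: variants below the survival threshold die young, survivors reproduce with variation, and the population drifts in a direction even though no gradient is ever computed:

```python
import random

def fitness(genome):
    # Hypothetical stand-in for "how well this variant does in its niche".
    return -sum((g - 0.7) ** 2 for g in genome)

def mutate(genome, sigma=0.05):
    # Offspring are copies of a survivor with small random variation.
    return [g + random.gauss(0, sigma) for g in genome]

population = [[random.random() for _ in range(8)] for _ in range(50)]
for generation in range(200):
    # Juvenile mortality: a variant must beat the current median to survive.
    threshold = sorted(fitness(g) for g in population)[len(population) // 2]
    survivors = [g for g in population if fitness(g) >= threshold]
    # Survivors reproduce to refill the population for the next round.
    population = [mutate(random.choice(survivors)) for _ in range(50)]
```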
Survival might be a good overall optimization criterion, but for an evolutionary search within a brain-like machine, what are these inner survival criteria, and which are the competing parts?
Superiority should mean the performance of the algorithm relative to vanilla HTM.
I think the degree of likeness should be scoped via params, so it becomes a variable left up to the engineer. For now, I can imagine evolving HTM structurally and varying its params to find other HTM algorithms.
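Something like this minimal sketch is what I have in mind; the parameter names and ranges are made up for illustration, not actual settings from any HTM implementation:

```python
import random

# Hypothetical search space scoped by the engineer; names/ranges are illustrative only.
SEARCH_SPACE = {
    "columns":            [1024, 2048, 4096],
    "cells_per_column":   [8, 16, 32],
    "activation_density": (0.01, 0.05),   # continuous range
    "permanence_inc":     (0.01, 0.1),
    "topology":           ["single_region", "two_level_hierarchy"],
}

def random_genome():
    # Sample one candidate HTM variant from the scoped space.
    genome = {}
    for name, choices in SEARCH_SPACE.items():
        if isinstance(choices, tuple):          # continuous range
            genome[name] = random.uniform(*choices)
        else:                                   # discrete structural choice
            genome[name] = random.choice(choices)
    return genome

def mutate(genome):
    # Perturb a single gene, staying inside the engineer-defined scope.
    child = dict(genome)
    name = random.choice(list(SEARCH_SPACE))
    choices = SEARCH_SPACE[name]
    child[name] = (random.uniform(*choices) if isinstance(choices, tuple)
                   else random.choice(choices))
    return child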
Sorry, this isn’t what I asked. What is the performance criterion you wish to optimize for? Stronger? Faster? Better eyesight? And if so, what exactly makes eyesight good?
As I understand it, the book THE CEREBRAL CODE would say that patterns compete to occupy cortical columns, and diversity seems to be the winning criterion after survival.
That sounds right to me: when multiple, diverse sources can agree on a conclusion, it’s more believable than judgements from isomorphic sources.
But then it becomes very strange for “the existence” of mathematical rules and physical laws, which appear universally true regardless of the number or authority of the people who believe them.
Yeah, those book quotes sound promising; I found a very good one on another page about the book:
Intelligence is what you use when you don’t know what to do - Jean Piaget
Which reflects quite well what AI/ML, HTM included, fails to address.
Yeah, this could be useful, but I think it is not sufficient.
To be a bit more clear (but still vague, sorry): I think the whole DL edifice of approaching-trillion-parameter models is a temple built to worship the Minimum Loss God, where we burn countless FLOPs in the hope He will reward us with an AGI saviour.
Oh, don’t worry, we have the Large Nanometer Minimizer to maximize FLOPs per gigawatt-hour. There are minor issues with it too, e.g. keeping it out of China.
Ironically, China may have the cheapest power supply, from the Three Gorges Dam plus the many smaller hydroelectric power stations built there. That’ll be economically controversial.
I think (almost tend to believe or bet on) that algorithms running on an HTM-architectured memristor based system (ideally stacks of such info-processing hardware—without any inbuilt emulation of our somatic and basic social needs, since any ambition to achieve such emulation is a sign of serious inEPTitude) might be the best we uniquely EAVASIVE simians can expect to (cause to) non-biologically evolve.
My own aeimcinternetional.org (currently the Heady Office of the Fairyland registered ÆIMC Internetional Ptd. Lty. — short for ÆPT insights marketed conspicuously on the ‘Internet’) is currently accessible on the WWW via this frivolously paid for (fancy-looking and facetiously fun for me to use also in order to keep me at least somewhat incognito, since I’m a cocky and provocative coward) URL.
Interestingly, relevant research, if not some of the founding research, took place in the 80s/90s, and in some of the papers cited they mostly used genetic algorithms to search for better programs that learn how to learn. No gradient-descent-based algorithms were used initially. A disclaimer: I haven’t finished reading these papers, and I find them very computer-science oriented rather than machine-learning/numerical, which is quite interesting to me; after all, everything can be represented by a program.
I stumbled on Schmidhuber while listening to Lex Fridman’s podcast, and man, he is a legend in these meta-learning studies. If we listen only to mainstream “meta learning” we will learn shallowly for sure; there is so much more in it.
Here’s another probably unexpected thing: symbolic AI showed empirically that some meta-learning is necessary (Lenat’s work). Because of today’s mainstream ML, this history is kinda lost, IMO.
I don’t know. Maybe the search would be more of an exploratory one rather than an optimization. Then, at some point, base classes of HTMs would be determined and their capabilities characterized.
Perhaps the search could be as simple as exploring HTM’s hyperparameter space. The criterion for HTMs worth studying should be whether they can be considered a base class of HTM by structure.
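For the exploratory (rather than optimizing) flavour, even plain random sampling plus grouping by structure might do; `build_and_score` below is a hypothetical placeholder for however a candidate HTM would actually be instantiated and evaluated:

```python
import random

def sample_params():
    # Illustrative hyperparameters; real ones would come from the HTM implementation.
    return {
        "columns": random.choice([1024, 2048, 4096]),
        "cells_per_column": random.choice([8, 16, 32]),
        "boost_strength": random.uniform(0.0, 3.0),
    }

def build_and_score(params):
    # Placeholder: build the candidate HTM, run it on a benchmark stream,
    # and return (structural_signature, some performance score).
    signature = (params["columns"], params["cells_per_column"])
    return signature, random.random()

# Exploration: group candidates by structural signature rather than chasing a
# single optimum, so candidate "base classes" of HTM emerge from the sweep.
classes = {}
for _ in range(100):
    params = sample_params()
    signature, score = build_and_score(params)
    classes.setdefault(signature, []).append((score, params))
```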