Hi Tom,
Well, I kind of do know, you know. Though you have to tread a fine line
between getting too stuck in orthodoxy and trying to see something new.
But I can see very clearly where I think they are at. I’m just trying to
massage a little relevance for HTM.
And for that it doesn’t hurt to go over the basics again.
That Andrew Ng talk brought some things into greater clarity for me. I had
seen what DL was doing with hierarchies of features. It didn’t seem that
new to me. But Ng gave it some philosophical context and made me realize
that, in that field, there had indeed been something of a revolution in the
last 15 years.
The ideas I didn’t think were new were not new to me because they were the
state-of-the-art in Grammatical Induction circa 20 years ago. All the
feature discovery stuff. That was what computational linguistics was doing
through the '90s. They weren’t doing it with distributed representations,
but in terms of feature discovery, it was the same.
In NNs, breaking contrastive features down like that must have been new.
Hence the Deep Learning revolution. (Old since the '20s(?) in linguistics,
viz. phoneme discovery procedures in the field. Some more history on that:
it stopped in the '50s when Chomsky demonstrated features could not be
learned, but started again when data became cheap… Always topping out at
about 80% accuracy.)
Equally, computational linguistics seems to have benefited from some reverse
cross-pollination around 2003, when Yoshua Bengio (first?) decided to make
his features distributed (inventing Neural Language Models).
So there was some swap of ideas around 2000. Great.
'Nyways, hearing Ng talk about what a revolution (motivated by biology?)
these feature discovery procedures were for NNs presented it to me in
stark contrast to the Grammatical Induction feature-discovery techniques
of the '90s, which puts it into sharper contrast with my model, which is a
cure for the problems GI ran into in the '90s.
Now with this thread I’m trying to get back to basics and find common
ground in DL so I can show how my solution is relevant to DL, just as it is
relevant to GI. And since everyone in HTM is currently being seduced by DL
(Matt L, @Felix? Spatial Pooler, anyone?), it might save HTM some time, as
people in this community slowly drift into feature discovery procedures by
smuggling DL assumptions into the Spatial Pooler.
HTM needs to clearly understand what is good in those traditions (actually
summarizing contrasts), but also what is bad (bottom-up, fixed features). By
keeping close to the biology, HTM has so far avoided the theoretical
assumptions which are crippling both DL and GI (the assumptions being: for
GI, that grammar can be learned; for DL, that features can be learned. Ah,
and by learned, I mean that these features are global and stable; not
chaotic, in a word).
Oh, and on that subject: yes, I love Ripple Pond Networks, if by that you
mean Reservoir Computing, because Reservoir Computing does not assume the
reservoir cannot be chaotic.
-Rob
P.S. I’m going to try to edit your bottom quote to be only what you wrote.
See if it works. (Oh, so it just strips off the whole thing. Kind of the opposite of the “kitchen sink” version that gets sent to email. Ed.)