Hi Tom,
Well, I kind of do know, you know. Though you have to tread a fine line
between getting too stuck in orthodoxy and trying to see something new.
But I can see very clearly where I think they are at. I’m just trying to
massage a little relevance for HTM.
And for that it doesn’t hurt to go over the basics again.
That Andrew Ng talk brought some things into greater clarity for me. I had
seen what DL was doing with hierarchies of features. It didn’t seem that
new to me. But Ng gave it some philosophical context and made me realize
that, in that field, there had indeed been something of a revolution in the
last 15 years.
The ideas I didn’t think were new were not new to me because they were the
state-of-the-art in Grammatical Induction circa 20 years ago. All the
feature discovery stuff. That was what computational linguistics was doing
through the '90s. They weren’t doing it with distributed representations,
but in terms of feature discovery, it was the same.
In NNs, breaking contrastive features down like that must have been new.
Hence the Deep Learning revolution. (Old since the '20s(?) in linguistics,
viz. phoneme discovery procedures in the field. Some more history on that:
it stopped in the '50s when Chomsky demonstrated features could not be
learned, but started again when data became cheap… Always topping out at
about 80% accuracy.)
Equally, computational linguistics seems to have benefited from some reverse
cross-pollination around 2003, when Yoshua Bengio (first?) decided to make
his features distributed (inventing Neural Language Models).
So there was some swap of ideas around 2000. Great.
'Nyways, hearing Ng talk about what a revolution (motivated by biology?)
these feature discovery procedures were for NNs presented it to me in
stark contrast to the Grammatical Induction feature-discovery techniques
of the '90s, which puts it into sharper contrast with my model, which is a
cure for the problems GI ran into in the '90s.
Now with this thread I’m trying to get back to basics and find common
ground in DL so I can show how my solution is relevant to DL, just as it is
relevant to GI. And since everyone in HTM is currently being seduced by DL
(Matt L, @Felix? Spatial Pooler, anyone?), it might save HTM some time, as
people in this community slowly drift into feature discovery procedures by
smuggling DL assumptions into the Spatial Pooler.
HTM needs to clearly understand what is good in those traditions (actually
summarizing contrasts), but also what is bad (bottom-up, fixed features). By
keeping close to the biology, HTM has so far avoided the theoretical
assumptions which are crippling both DL and GI (the assumptions being: for
GI, that grammar can be learned; for DL, that features can be learned. Ah,
and by learned, I mean that these features are global and stable; not
chaotic, in a word).
Oh, and on that subject: yes, I love Ripple Pond Networks, if by that you
mean Reservoir Computing, because Reservoir Computing does not assume the
reservoir cannot be chaotic.
-Rob
P.S. I’m going to try to edit your bottom quote to be only what you wrote.
See if it works. (Oh, so it just strips off the whole thing. Kind of the opposite of the “kitchen sink” version that gets sent to email. Ed.)