Application of HTM in today’s ML frameworks

You’re saying the problem is with how DL works, not the hardware nor the frameworks, right?
I’ve kinda run into that problem when I made DeepHTM.
I just solved it by updating the parameters every 10 steps instead of updating them every time.
You can think of it as using a minibatch size of 10.
It was a temporary solution and might not scale well.
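A minimal sketch of the “update every 10 steps” trick described above, assuming plain SGD; the function names (`accumulate_sgd`, `grad_fn`) are illustrative, not from DeepHTM or any library:

```python
def accumulate_sgd(params, grad_fn, samples, lr=0.01, accum_steps=10):
    """Online SGD, but apply the averaged gradient only every `accum_steps`
    samples -- equivalent to training with a minibatch of that size."""
    accum = [0.0] * len(params)
    for i, sample in enumerate(samples, start=1):
        g = grad_fn(params, sample)                     # per-sample gradient
        accum = [a + gi for a, gi in zip(accum, g)]
        if i % accum_steps == 0:                        # flush the accumulator
            params = [p - lr * a / accum_steps for p, a in zip(params, accum)]
            accum = [0.0] * len(params)
    return params
```

Compared with updating on every sample, this averages out per-sample noise at the cost of reacting ten samples late, which is why it may not scale to fast-changing streams.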
But there are many techniques such as layer normalization that can help with online learning.
I’ve tried them, and it kinda worked.
And I used pure SGD, but optimizers like RMSProp or even momentum could help as well.
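For reference, a hedged sketch of classical momentum on top of plain SGD; the velocity term smooths noisy per-sample gradients, which is one reason momentum can help in an online, sample-at-a-time setting (names are illustrative):

```python
def sgd_momentum_step(params, grads, velocity, lr=0.01, beta=0.9):
    """One classical momentum update: v <- beta*v + g;  p <- p - lr*v."""
    velocity = [beta * v + g for v, g in zip(velocity, grads)]
    params = [p - lr * v for p, v in zip(params, velocity)]
    return params, velocity
```

RMSProp works similarly but rescales each gradient by a running average of its squared magnitude instead of accumulating a velocity.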
Correct me if I’m wrong, but wouldn’t much of deep RL not work if online learning weren’t possible?
Also, shouldn’t an HTM system, or any other online learning system implemented with the ML frameworks, work just fine if the problem is with how DL works, not with the frameworks?


This is not how we’re doing it. We’re not calculating firing rates or anything. We just use the SP to make the connections sparse, but we keep them floats. Stay tuned; I’ll be working on this all week. Will have code as soon as possible.


True, but over a long period of time, boosting would have about the same effect on firing rate as batch normalization has on the values of neurons. I mixed up my terminology.

I’m planning on working on my code every day after work this week. It’s all in Python and TensorFlow. Could I help out?


I’m just amazed by your insight.
That’s such a great way of viewing them!

How can SP form sparse connections if the inputs are dense?
Doesn’t SP form sparse connections because of its sparse inputs?
Wouldn’t the connections turn out to be at least as dense as the inputs?

Sorry @SimLeek and @hsgo I should clarify, I was talking only about the spatial pooler in the context of deep learning frameworks as a layer. Not the spatial pooler by itself, apologies.

Yes, there are plenty of SGD methods; you can find a useful list with enough information on the Wikipedia page for Stochastic Gradient Descent. Here is a link so you can read up on it. They have plenty of examples. What I don’t think it mentions is that most of these methods will benefit from batch normalization. But again, that’s not the problem. You are adding time to every inference run.

If i is the inference run time, and b is the time for whatever backpropagation and learning method you use, then i &lt; i + b always holds. Even if you optimize i, it will still be less than any version of i + b. There is no getting around it.

It depends on how they are doing it, but if online learning were impossible, then yes, it wouldn’t work. But it’s not impossible, so it does work.

@SimLeek I should also mention that I have not experimented with any HTM or Spatial Pooler concepts on Phis or GPUs, only CPUs. I was assuming you were talking about HTM concepts within the context of HTM Integrated Deep Learning Systems. My bad. My comments were on my experience with DL on Phis and GPUs.


Heh. Seems like there’s confusion all around.

I was talking about HTM Integrated Deep Learning Systems, though. But I was mostly referring to boosting, with a few mentions of the actual spatial pooler, since that was brought up.

I want to take things one step at a time. Boosting is useful for my image recognition: I can get more sparsity while still covering the whole input over time, which limits the calculations I run on the CPU while still eventually getting the full image, as well as important updates.


Please review the mechanism. Numenta uses a k-winners-take-all competition to select the winners for a given area; this reduces the local population activation and produces the sparsification.


Ah, I see there was a misunderstanding at the time. I’m sorry.
Was he saying the SP makes sparse connections to the layer after?
I thought it was from the layer before (the input layer for the SP layer).
If not, I’m even more confused. :confused:
EDIT: Wouldn’t the connections turn out to be at least as dense as the layer after anyway? I don’t think I get it.

Please be patient. My job is to explain it all to you, but I have to understand it first. :slight_smile:


The funny thing is that DL is actually moving toward more “binary” activation networks, and it seems to have started working lately.


I’m trying to at least. Note the: “not exactly sure what I’m doing” comment. I tried to approximate sparsity enforcement with small convolutions, but I have another idea I want to try that’s more like some competitive attractor networks. Think: gravity bringing things together while electromagnetism (or dark energy) keeps them apart.

I don’t think I can do a global k-winners activation with purely local algorithms, but I should be able to enforce a local max.
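One way to sketch that local-max idea: instead of a global k-winners, keep only the maximum activation inside each non-overlapping window and zero the rest. This is purely illustrative; the window size and function name are assumptions, not anyone’s actual implementation:

```python
import numpy as np

def local_max_sparsify(x, window=4):
    """Keep only the max activation in each non-overlapping window of `x`,
    zeroing everything else -- a purely local sparsity enforcement."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for start in range(0, len(x), window):
        block = x[start:start + window]
        idx = start + int(np.argmax(block))
        out[idx] = x[idx]                    # the local winner keeps its value
    return out
```

Because each window acts independently, overall sparsity is fixed at one winner per window rather than exactly k winners globally, which matches the “enforce a max, not global k-winners” framing.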


We are working on a complete paper with code (weeks not months), you’re going to like it. I would love to see what sparse activations look like running when compared to dense activations. This is going to be a fun year. :nerd_face:


Looking forward to it! I’ll try to get those sparse activations working right then.


There is a lot of confusion in this thread, so I am working hard to run these new models and understand them before we release this paper. I’ll have a video to support the paper coming soon that will further explain the model setup in the paper. So stay tuned!

In the meantime, keep in mind these 3 things we can use from Spatial Pooling to enforce sparsity in a neural network:

  • potential pools to enforce weight sparsity
  • k-winners to represent a global minicolumn competition (inhibition)
  • boosting to enforce homeostasis in the layer (must compute active duty cycles for this)
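A minimal sketch tying those three mechanisms together, to make the list concrete. This is not Numenta’s actual implementation; the shapes, constants, and the exponential boost formula here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_cols, k = 16, 8, 2
pool = rng.random((n_cols, n_in)) < 0.5      # potential pool: fixed weight mask
weights = rng.random((n_cols, n_in)) * pool  # only pooled weights are nonzero
duty = np.zeros(n_cols)                      # active duty cycles
boost_strength, alpha = 1.5, 0.01
target = k / n_cols                          # target activation frequency

def sp_step(x):
    """One SP-like step: masked overlap, boosting, then k-winners."""
    global duty
    overlap = (weights * pool) @ x
    boost = np.exp(boost_strength * (target - duty))   # boosting (homeostasis)
    boosted = overlap * boost
    winners = np.argsort(boosted)[-k:]                 # k-winners (inhibition)
    active = np.zeros(n_cols)
    active[winners] = 1.0
    duty = (1 - alpha) * duty + alpha * active         # update duty cycles
    return active
```

Columns that win rarely see their duty cycle fall below the target, so their boost grows until they win, which is the homeostasis the third bullet describes.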

@rhyolight, what do you mean by noise tolerance? What kind of noise? Is this paper about image classification?

Sparsity seems to help quite a bit with additive noise. Random or structured. Yes, images. I’ll show some examples once the paper is out.


Cool, I just submitted a paper on NN noise tolerance to IJCNN, so I’d be curious to look at your results. Where did you submit it to?


The paper is out: