Going through a deep learning class on Udemy, I learned about Keras and thought it'd be great for testing, since it comes with prebuilt neural network components for training and general usage. However, I'm wondering what would be a good model for training. I'm planning on keeping the model running continuously if I can, so I can see how it reacts to different things in real life, and there'd be an output function to look around.
For input, from what I can tell, the retina already takes in information somewhat pyramidally, so I'll just pass it all in that way.
For convolution + max pooling, I'll select for simple features like lines or colors at first, then take the max pool and use that to shrink the result to about 1/e^(1/2) of the input, which will give some translation/skew/rotation invariance. Later on, I can also pass that shrunken pyramid, minus the top, as the input to a similar network with a different convolution.
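A minimal sketch of what one pyramid level could look like in Keras; the input shape, filter count, and kernel size are placeholder choices, and a standard 2x2 max pool halves each side rather than hitting exactly 1/e^(1/2):

```python
# One pyramid level: a small convolution for simple features (lines/colors)
# followed by max pooling to shrink the map. Sizes here are placeholders.
from tensorflow.keras import Input, Model, layers

inputs = Input(shape=(128, 128, 3))          # one level of the image pyramid
features = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
pooled = layers.MaxPooling2D(pool_size=2)(features)  # halves each side; resize afterwards for ~1/e^(1/2)
level = Model(inputs, pooled)
```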
For the spatial pooler, I'll use a dense network first, and I'm planning on passing in only the smallest part of the pyramid. That way large, simple features can be input alongside small, detailed features.
Then I plan on passing that into an RNN, which I'll later replace with a temporal pooler.
I think I can modify my image pyramid function for this pretty readily, and it should be both simple and expandable.
What do you guys think? Should I use this model? Some other existing model? Something closer to the neocortex and surrounding systems?
How are you turning a dense network into an SP? They work very differently. At the very least, training a NN needs an error function and a gradient (and thus a "gold standard" value to compare against). But none of that is needed in an SP.
Edit: You might want to use an RBM (restricted Boltzmann machine) to replace the SP.
Thanks! I would've probably tried auto-labeling the data, or hacking a dense layer into something like an RBM. Now that I know of that, though, it looks like it'll be much easier to work with, and it looks like there are a few implementations for Keras!
It seems like some people think dense layers are generally more useful for learning, though, which could be why RBMs aren't included by default in deep learning libraries, so I might still want to try using dense layers with auto-labeling. Though, I get the feeling that using deep learning in real time with continual learning hasn't been explored as much as batch training.
Edit: for auto-labeling, I could take the consensus of multiple untrained nets by average pooling their outputs, then use that for training. Still seems hacky though.
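A hypothetical sketch of that consensus idea, just to make it concrete (the models and shapes are assumptions, not anything from the thread):

```python
# Hypothetical sketch: average-pool the outputs of several untrained nets
# and use the consensus as a pseudo-label for training.
import numpy as np

def pseudo_labels(models, batch):
    # each model maps the batch to class probabilities
    outputs = np.stack([m.predict(batch) for m in models])  # (n_models, batch, classes)
    return outputs.mean(axis=0)                             # consensus by averaging
```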
There are 3 concepts from SP that I think you need to apply:
potential pools
minicolumn competition
boosting
For the potential pools, you can set them up front and never change them, just like we do. For the competition, use a k-winners function for activation instead of ReLU. But then you have to perform boosting as often as possible, which means you need very small batches and need to run the boosting logic between as many samples as possible (ideally continuously).
Boosting may have less of an effect once the cell population has reached some homeostasis, so you might be able to get away with only boosting for some period of time, or only where it seems necessary (boosting is going to be expensive in all the DL frameworks I've seen).
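A rough PyTorch sketch of the k-winners + boosting part (this is not Numenta's implementation; the boost formula and duty-cycle update are simplified assumptions). The potential pools could be handled separately as a fixed binary mask on the preceding linear layer's weights:

```python
# Sketch of a k-winners activation with duty-cycle boosting for a flat
# layer of n units. Boost formula and update rule are simplified guesses.
import torch

class KWinnersBoost(torch.nn.Module):
    def __init__(self, n, k, boost_strength=1.0, duty_alpha=0.01):
        super().__init__()
        self.k = k
        self.boost_strength = boost_strength
        self.duty_alpha = duty_alpha
        self.register_buffer("duty_cycle", torch.zeros(n))

    def forward(self, x):
        # boost units that have been active less often than the target density
        target = self.k / x.shape[1]
        boost = torch.exp(self.boost_strength * (target - self.duty_cycle))
        boosted = x * boost
        # keep only the top-k boosted activations per sample
        topk = boosted.topk(self.k, dim=1).indices
        mask = torch.zeros_like(x).scatter_(1, topk, 1.0)
        # update duty cycles continuously (the per-sample "boosting logic")
        self.duty_cycle.mul_(1 - self.duty_alpha).add_(
            self.duty_alpha * mask.mean(dim=0))
        return x * mask
```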
From my experience working with DL:
There are two ways to get pain-free unsupervised learning: an RBM and an autoencoder. An RBM works like an SP, but instead of learning with simple rules and trying to maintain a constant density, it learns by solving equations to reach a minimal-error state. An autoencoder, on the other hand, is simply a neural network that tries to reconstruct its input.
Anyway, you might want something better than Keras for your task. Maybe PyTorch. Keras is simply not flexible enough.
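For the autoencoder option, here's a minimal PyTorch sketch (layer sizes are arbitrary); the point is that the error is just the reconstruction error, so no labels are needed:

```python
# Minimal autoencoder: trains by reconstructing its own input,
# so no "gold standard" labels are required. Sizes are arbitrary.
import torch
from torch import nn

class AutoEncoder(nn.Module):
    def __init__(self, n_in=784, n_hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(batch):                    # batch: (batch_size, n_in), values in [0, 1]
    optimizer.zero_grad()
    loss = loss_fn(model(batch), batch)   # error is the reconstruction error
    loss.backward()
    optimizer.step()
    return loss.item()
```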
We are doing our experiments in PyTorch. The main incompatibility I see so far is that we have to run very small batches of input so we can run the boosting logic continuously.
I'm using TensorFlow to implement array and other operations, and I'm planning on putting that into Keras.
PyTorch and TensorFlow seem lower-level than Keras, and Keras can use both of them as backends. I'd be using Keras for integration tests with a fully built model in TensorFlow or PyTorch. I could use TensorFlow or PyTorch entirely, though, but then I don't think I could use Keras.
Ok, I'm going to switch to PyTorch for all of my personal code.
Looking at PyTorch vs. TensorFlow+Keras, PyTorch is easier to debug than TensorFlow, while TensorFlow supports more devices and better backwards compatibility. I'm not a company that "needs" deprecated code, I've been running everything on my laptop, and I've been having trouble debugging TensorFlow. I'm also writing non-standard algorithms, which have much better support in PyTorch. Also, fast.ai wrote a long blog post about switching to a PyTorch backend, including details like many competition winners using PyTorch.
I discussed an idea here where the HTM stuff is divorced from the DL stuff. To train the DL side, you still need some sort of error measurement in order to compute your gradients (unless you're just trying to do regression from an already-trained network).
What are you intending this combined system to be used for? If it's classification, you'd want to add a softmax on top of it all during training. You'd also have to consider at what point you would update your HTM components: at the end of each forward pass (when you get your y^ value)? At the end of each batch?
As a simple test, make your batch size one, then update your pools at that point, dependent on the error level of your y^ value.
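A hypothetical sketch of that test setup in PyTorch: a batch size of one, cross-entropy (softmax) on top for classification, and a placeholder HTM update hook called after each forward pass when the error is high. The model, threshold, and hook here are all assumptions for illustration:

```python
# Hypothetical sketch: batch size of one, softmax classifier on top, and an
# HTM-side update called after each forward pass, gated on the error level.
import torch
from torch import nn

# Placeholder classifier; in practice this would be the conv/pooler stack.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.CrossEntropyLoss()            # applies log-softmax internally
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
error_threshold = 1.0                      # arbitrary cutoff for "high error"

def htm_update(x, error):
    """Placeholder for the HTM-side update (SP permanences, boosting, etc.)."""
    pass

def train_step(x, y):                      # x: (1, 1, 28, 28), y: (1,) -- batch size one
    logits = model(x)                      # forward pass -> y^ (class scores)
    loss = loss_fn(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if loss.item() > error_threshold:      # only update pools when the error is high
        htm_update(x, loss.item())
    return loss.item()
```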
Slightly unrelated, but perhaps worth mentioning if you feel like mixing DL and HTM: I think Mark Brown has floated the idea before of using anomaly detection as a means of guiding DL training. That is something worth looking into as well.
I'm thinking of finishing up a test robot that interacts with the world. In the starting case, I'll take that to mean moving its eye (its entire body) towards the most unlearned/unpredictable object in the scene, then learning that object. This may be close to Mark Brown's anomaly training idea.
In this case, I'll want to update the HTM components after each input image, or at least after a small group of them. So I'm not sure whether that will end up being per forward pass or per batch. I'll start with a batch size of one though.
This gave me an idea, which I just created a new thread about here. The way I see it, the inverse of the problem you’re looking to solve is the self-driving car problem.