I don’t know the definition of online decoding, but yes, a CNN is a traditional supervised learning algorithm that requires a large labeled training set. It also has no real relation to biological neural networks.
Another way to think of the basic concept is to imagine one person using words to describe something they saw to someone else who wasn’t there. This scene from “Someone Like You” comes to mind:
For anyone interested in doing this, you can use the /expressions/similar_terms API, with a body like:
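For reference, the request body is a cortical.io “expression”. A sketch of what I mean (the positions array here is shortened and purely illustrative — you would pass the actual on-bit positions of your SDR):

```json
{
  "positions": [3, 17, 26, 58, 121, 305]
}
```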
I did this for the ET image and SDR that @Jose_Cueto posted.
The original concepts found by Clarifai were:
- wildlife: .972
- portrait: .970
- nature: .969
- animal: .968
- staring: .952
- one: .947
- eye: .946
- face: .945
- looking: .927
- isolated: .918
Running the SDR through the above similar terms API suggests:
- animal *
- looking *
- eye *
- pet
- cats
- wild *
- face *
- animals *
- eyes *
- cat
(I added a * by the ones which match the original classification.) An extraneous “cat” idea seems to have been introduced here, but given the rather non-specific original classification, it seems to have done an OK job in this case.
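The starred matching above can be reproduced mechanically. A quick sketch (crude prefix matching to fold plurals and “wild”/“wildlife” — purely illustrative, not part of the API):

```python
original = ["wildlife", "portrait", "nature", "animal", "staring",
            "one", "eye", "face", "looking", "isolated"]
suggested = ["animal", "looking", "eye", "pet", "cats", "wild",
             "face", "animals", "eyes", "cat"]

def related(term, concepts):
    # Crude match: exact, or one term is a prefix of the other
    # (catches plural forms like "eyes"/"eye" and "wild"/"wildlife")
    return any(term == c or term.startswith(c) or c.startswith(term)
               for c in concepts)

matches = [t for t in suggested if related(t, original)]
# matches: animal, looking, eye, wild, face, animals, eyes
```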
How can you tell that your algorithm is doing a good job? One way to validate it is reconstructing the original image from the SDR.
Here’s an alternative (and much simpler) approach: just use a sparse autoencoder, trained on a large dataset.
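For anyone who wants to play with that idea, here is a minimal sparse autoencoder sketch in plain NumPy — toy random data, hand-rolled backprop, and an L1 penalty on the hidden code. All sizes and hyperparameters are illustrative, not tuned; the point is just that reconstruction error drops as it trains:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data: 200 sparse binary "images" of 64 pixels (~10% on-bits)
X = (rng.random((200, 64)) < 0.1).astype(float)

n_in, n_hid = 64, 32
W1 = rng.normal(0, 0.1, (n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.1, (n_hid, n_in)); b2 = np.zeros(n_in)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    H = sigmoid(X @ W1 + b1)   # hidden (sparse) code
    Y = sigmoid(H @ W2 + b2)   # reconstruction
    return H, Y

lr, l1 = 0.5, 1e-4             # learning rate, sparsity (L1) penalty weight

_, Y0 = forward(X)
err0 = np.mean((Y0 - X) ** 2)  # reconstruction error before training

for _ in range(300):
    H, Y = forward(X)
    # Backprop of MSE + L1 sparsity penalty on H
    dY = (Y - X) * Y * (1 - Y) / len(X)
    dW2, db2 = H.T @ dY, dY.sum(0)
    dH = dY @ W2.T + l1 * np.sign(H) / len(X)
    dZ1 = dH * H * (1 - H)
    dW1, db1 = X.T @ dZ1, dZ1.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

_, Y1 = forward(X)
err1 = np.mean((Y1 - X) ** 2)  # reconstruction error after training
```

On real data you would of course use a proper framework and a tuned sparsity target, but the structure is the same: encode, decode, and score the SDR by how well the decoder can rebuild the input.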
Sure, there are no doubt many other better ways to accomplish this. I put this together after a conversation we had on Hackers’ Hangout, where I mentioned one really quick and dirty way to connect a lot of traditional AI algorithms with HTM is by simply leveraging word SDRs.
Is there anywhere that suggests that the brain reconstructs entire images like this?
Considering that the perception of an image is a constructive process involving serial scanning of portions of an image, there really is no place where “an image” ever exists in the brain.
As far as I know, we perceive and recall sequences of tokens that let us construct and reconstruct these properties of an image; as fast as we can consider some aspect of an image, that part is reconstructed in recall.
These tokens are initially formed by perception and learning of novel features. From that point these features become available for perceiving and recalling things that contain that feature.
I am excluding the images that are directly and unconsciously processed by sub-cortical structures.
The encoding (feature extraction) done by convolutional layers is inspired by the visual cortex.
Convolution is inspired by the foveal scanning that extracts these streams of tokens. I understand that the biological mechanisms have been emulated to make a process better suited to digital hardware, but it is an inexact copy of those mechanisms.
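To make the feature-extraction analogy concrete, here is a toy 2D convolution in NumPy: one hand-written “edge detector” kernel slid over a small synthetic image (all values illustrative), responding only where a vertical boundary appears — a rough analogue of one receptive field scanning a scene:

```python
import numpy as np

def conv2d(img, kernel):
    # Slide the kernel over every valid position, like a single CNN
    # feature detector (no padding, stride 1)
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Synthetic image: dark on the left, bright on the right
img = np.zeros((5, 6))
img[:, 3:] = 1.0

# Vertical-edge kernel: responds where brightness jumps left-to-right
kernel = np.array([[-1.0, 1.0]])
response = conv2d(img, kernel)
# response is 1.0 exactly along the boundary column, 0.0 elsewhere
```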
There are bunches of good ways to do image processing - I am just pointing out that reconstruction of an image does not seem to be biologically plausible.
The usual focus of HTM researchers is to make something that works in biologically inspired ways to help model and understand the wetware.
Sorry, forgot to answer your question. Typically the best way is to compile a good set of realistic test data and perform whatever prediction or anomaly detection task you are planning, to see how it does.
Perhaps. I don’t think we know enough about how a brain reconstructs an image while trying to visualize something. In this case we use reconstruction as a way to test the quality of an SDR.
In any case, using a convolutional network to encode an image seems to be way more biologically plausible than some of the methods used by cortical.io.
I think that neither method is suitable for anomaly detection. Feature extraction strips away most of the information relevant to detecting an anomaly. For example, neither Clarifai’s classifier nor an autoencoder will produce any relevant features for an image of people walking upside down or cars driving the wrong way.
In fact, thinking more about this, it seems that the self-supervised learning approach advocated by LeCun would be way more effective: https://youtu.be/7I0Qt7GALVk?t=2472
I see this as a huge unsolved problem for HTM systems.
HTM is great at saying “I have seen this thing before.” It can even say “I have seen this sequence before.”
I really don’t see how HTM will be able to match up some cue, like a perceived printed or spoken word, to a paired sound or image or stream of tokens that make up an image.
The current excitement at Numenta regarding coding in tuples of grid nodes — (object) (displacement) (object) — does not really solve this problem yet. That gets us to a representation of the relations between perceived features. The work to use hierarchy to resolve that into a higher-level representation is unfinished.
I have high hopes that I can bridge this problem using hex-grid coding and tuples of tokens and sequences of tokens, but at this point I do not have a functioning system. I do think the enabling technology will be the dumb boss/smart adviser model, but I have not worked out how these systems interact to the point where they function together organically. The few toy attempts produced very limited, stereotyped behavior.
Depends on the use case. If you are looking for anomalies within a single image, no, this is not useful. For detecting weird physics like upside-down people, it is also not useful. If you are detecting an increase in the frequency of cars passing in front of a camera at 8:00 AM on a Saturday compared to typical Saturdays, then it may be useful.
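Not HTM, but the “more cars than a typical Saturday” case can be illustrated with a plain statistical baseline (all counts made up; a z-score is just a stand-in for whatever anomaly score the streaming model would produce):

```python
from statistics import mean, stdev

# Hypothetical car counts at 8:00 AM on past Saturdays (assumed data)
history = [22, 18, 25, 20, 19, 23, 21, 24]
today = 47  # today's count

mu, sigma = mean(history), stdev(history)
z = (today - mu) / sigma          # how many std-devs from typical
anomalous = abs(z) > 3.0          # flag counts far outside the norm
```

The point of using HTM instead of something like this would be learning the temporal context (hour of day, day of week, trends) automatically rather than hand-partitioning the history.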
inspired by visual cortex != biologically plausible
CNNs fundamentally don’t work the way the brain does, and that’s easy to demonstrate.
Can you please explain how your image2sdr method would be useful for this task? I just don’t see how the sdr for an image with 20 cars would be different from the sdr from a similar image with 50 cars, given that the tags from Clarifai would probably be identical in both cases.
Please do yourself a favor and read the article I linked to — unless you are a neuroscientist and actually know what you’re talking about.
Classic HTM is designed for streaming data, not static data. So naturally you wouldn’t pass it an image with 50 cars… you would pass it a series of images over time, in which 50 cars passed by a camera.
Of course, it depends on the use case. This wouldn’t be useful for anomaly detection of vehicle frequency either if you were to try monitoring a busy highway where every frame always had cars in it.
And again, this was a demonstration of one easy way of linking HTM with classic AI algorithms using word SDRs. One could imagine a slightly different system which uses classic image AI to locate things in video frames, and then has both subjects and positions to do some streaming HTM magic with. Or an audio AI for identifying animal calls hooked up with HTM to detect population anomalies in a particular habitat, track migrations, etc.
Also, just to be clear, image2sdr is not meant to be biologically plausible. It is just another tool that some people may find useful for a few AI / HTM related cases. There are probably many other tools that could be applied to many of the same cases.
I believe I know what I’m talking about, but I scanned the blog post you linked to and it’s full of stretches. We can discuss exact claims from it if you want, but you could start from the opinions of people who know DL in deep detail, like Geoffrey Hinton and Andrej Karpathy.