Introducing new project: image2sdr

I have uploaded a new project on GitHub called image2sdr. It is a simple NodeJS service which you can post an image file to, and it returns an SDR.
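
For a quick test, a client call can look something like the Node 18+ snippet below. The port, endpoint path, and form field name are placeholders, so adjust them to match the actual server code; the {"sdr": [...]} response shape is what the service returns.

```javascript
// Minimal client sketch (Node 18+). Port, path, and form field name are placeholders.
const { readFile } = require('node:fs/promises');

async function postImage(imagePath) {
  const form = new FormData(); // FormData, Blob, and fetch are globals in Node 18+
  form.append('image', new Blob([await readFile(imagePath)]), 'photo.jpg');

  const res = await fetch('http://localhost:3000/image2sdr', { method: 'POST', body: form });
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  return (await res.json()).sdr; // the service responds with {"sdr": [ ...bit indices... ]}
}

postImage('./et.jpg').then((sdr) => console.log(`${sdr.length} active bits`));
```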

Behind the scenes, here is what it does (a rough code sketch follows the list):

  1. Classify the image via Clarifai and retrieve the top 10 concepts with their confidence levels
  2. Parse the concepts found and retrieve word SDRs for them via cortical.io
  3. Stack the word SDRs, scoring each bit by its concept's confidence level
  4. Generate a standard bit-array SDR with the desired sparsity, containing the bits with the top scores (using a random tie breaker)
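
The flow, roughly, looks like this. It is an illustrative sketch rather than the project's actual source: classifyImage and getWordSdr stand in for the Clarifai and cortical.io API calls, the default sizes are example values, and the exact signature of the merge routine may differ.

```javascript
// Illustrative sketch of the steps above -- not the project's actual source.
// classifyImage() and getWordSdr() stand in for the Clarifai and cortical.io calls.
async function imageToSdr(imageBuffer, { sdrSize = 16384, sparsity = 0.02 } = {}) {
  // 1. Top concepts with their confidence levels, e.g. [{ name: 'animal', value: 0.97 }, ...]
  const concepts = await classifyImage(imageBuffer);

  // 2. Word SDR (array of active bit indices) for each single-word concept
  const weightedSdrs = [];
  for (const concept of concepts) {
    if (concept.name.includes(' ')) continue; // multi-word concepts are skipped (see notes)
    weightedSdrs.push({ positions: await getWordSdr(concept.name), weight: concept.value });
  }

  // 3 + 4. Score the bits by confidence and keep the top scorers at the target sparsity
  return mergeWeightedSdrs(weightedSdrs, sdrSize, sparsity);
}
```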

This project is in response to a conversation we had during the most recent Hackers’ Hangout.

A couple of notes:

  1. You will need to copy config_example.yml into a file named config.yml and enter your own Cortical.io and Clarifai API keys. You can get free API keys from the respective websites. (A hypothetical layout is sketched after this list.)
  2. The service does not currently support proxy settings. If running behind a proxy, you'll need to add the necessary code.
  3. The word SDRs from cortical.io do not have a fixed sparsity (common words are denser than less common ones), so you might need to tweak the sparsity and sdr_size parameters (or refactor if there is a better way).
  4. Multi-word concepts from Clarifai are skipped (such as “hard hat”). To enable them, special consideration will need to be given to negative concepts (such as “no people”).
  5. This is an open-source HTM community project, so feel free to make updates and evolve the idea.
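
For reference, config.yml might look something like the sketch below. The key names and values here are hypothetical; keep the actual structure from config_example.yml.

```yaml
# Hypothetical layout only -- copy config_example.yml and keep its real key names.
clarifai:
  api_key: YOUR_CLARIFAI_API_KEY
cortical:
  api_key: YOUR_CORTICAL_IO_API_KEY
sdr_size: 16384   # length of the output bit array (example value)
sparsity: 0.02    # fraction of bits kept "on" (example value)
```
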
19 Likes

Great job, Paul. I just ran it locally and it is working great. There are some fun things to do with this codebase. It was fun to print out all the Clarifai terms for each image I uploaded. It would also be interesting to send each SDR back to Cortical IO to see which single term best matches the bag of words.

3 Likes

I am interested in the encoder algorithm.
Could you please:

  1. explain your algorithm, Paul?
  2. show us your input image and the output SDR, @rhyolight?
1 Like

@Paul_Lamb Awesome!!!

@thanh-binh.to
Here’s an example SDR output that I’ve generated, and I think I’ve done it correctly. :crossed_fingers:

Input:
[image: ET]

Output:

{"sdr":[125,163,212,309,377,380,392,406,428,436,619,663,768,811,812,900,905,939,940,1044,1089,1122,1148,1212,1297,1298,1317,1332,1361,1438,1515,1598,1768,1773,1788,1802,1831,1896,1931,1945,2020,2031,2060,2111,2118,2137,2154,2158,2159,2176,2281,2285,2289,2309,2397,2407,2408,2412,2413,2493,2497,2516,2522,2534,2535,2536,2537,2538,2539,2540,2651,2663,2664,2665,2666,2678,2750,2751,2782,2791,2792,2793,2794,2795,2806,2864,2912,2913,2916,2922,2923,3045,3046,3172,3175,3180,3189,3190,3191,3193,3225,3286,3300,3316,3328,3395,3420,3424,3443,3444,3456,3552,3560,3567,3569,3570,3571,3576,3684,3697,3702,3772,3825,3827,3828,4086,4186,4215,4319,4372,4378,4421,4452,4460,4461,4544,4572,4578,4672,4696,4704,4717,4721,4757,4769,4796,4800,4801,4831,4846,4860,4864,4920,4931,4956,4960,5088,5091,5114,5172,5174,5183,5214,5215,5216,5265,5304,5365,5432,5471,5472,5487,5567,5571,5599,5607,5685,5686,5810,5820,5821,5948,5978,5981,6010,6063,6176,6336,6508,6610,6777,6849,6981,7233,7427,7494,7515,7617,7917,8151,8304,8360,8385,8465,8486,8487,8516,8775,8776,8784,8795,8798,8834,8875,8894,8899,8904,8906,8965,9011,9034,9105,9159,9179,9225,9322,9350,9360,9447,9491,9530,9575,9706,9734,9737,9831,9935,10084,10090,10117,10212,10221,10222,10299,10337,10484,10490,10586,10605,10606,10728,10755,10833,10842,10844,10864,10988,11095,11112,11113,11240,11241,11256,11504,11505,11628,11629,11637,11638,11692,11826,11827,11879,11883,11886,11900,11941,11980,12278,12331,12529,12596,12664,12795,12916,12923,12997,13125,13310,13419,13550,13685,13691,13994,14022,14031,14333,14403,14454,14536,14546,14606,14659,14661,14786,14787,14855,14858,14914,14915,14916,15042,15049,15094,15243,15684,15696,15739,15907,15940,15943,15946,16009,16077,16100,16201,16259,16267]}
2 Likes

@Jose_Cueto how about the image and the SDR size? Can you display the SDR output as an image?
Currently I am working on different concepts, based on extracting image features and converting these features into SDRs.

2 Likes

Sure, the basic process is:

  1. Retrieve the concepts found in the image and their level of certainty. For example, ET: .99, creepy: .88, ugly: .75.
  2. Retrieve the word SDRs for each of those concepts (ET, creepy, and ugly in this example).
  3. Loop through the word SDRs and, for each “on” bit, increase the score for that bit position by the level of certainty of the associated concept.
  4. After adding up the scores across all of the SDRs, select the bits with the highest scores, up to the desired sparsity, using a random tiebreaker.

I called the core algorithm mergeWeightedSdrs. By “weighted SDR” I mean a word SDR and its level of certainty.
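
Here is a minimal sketch of that merge (illustrative only, not the exact code from the repo); each weighted SDR is represented as a { positions, weight } object:

```javascript
// Sketch of the weighted merge described above (not the repo's exact implementation).
function mergeWeightedSdrs(weightedSdrs, sdrSize, sparsity) {
  // Score each bit position by the summed certainty of the word SDRs that contain it
  const scores = new Float64Array(sdrSize);
  for (const { positions, weight } of weightedSdrs) {
    for (const bit of positions) scores[bit] += weight;
  }

  // Shuffle the bit indices so the stable sort below breaks ties at random
  const indices = Array.from(scores.keys());
  for (let i = indices.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }
  indices.sort((a, b) => scores[b] - scores[a]); // highest score first; ties stay shuffled

  // Keep just enough of the top-scoring bits to reach the target sparsity
  const activeCount = Math.round(sdrSize * sparsity);
  return indices.slice(0, activeCount).sort((a, b) => a - b);
}
```

For example, merging {3, 7} at weight .99 with {7, 9} at weight .88 scores bit 7 at 1.87, bit 3 at .99, and bit 9 at .88, so a two-bit output would be [3, 7].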

No, it is a one-way encoding: it produces SDRs from images, but does not go back the other direction.

2 Likes

BTW, if you are referring to the algorithms for images-to-concepts and words-to-SDRs, those are performed outside of the above merging algorithm (in this case, Clarifai is used for concept identification, and cortical.io is used for word SDRs).

Clarifai uses a convolutional neural network algorithm.

Cortical.io uses an algorithm called semantic folding.

This project merges the two to provide a way of generating SDRs from images for use in the HTM algorithms.

1 Like

@Paul_Lamb thanks for your explanation. I do not know about Clarifai, but as far as I know CNNs do not support online learning and decoding. Am I right here?

1 Like

I don’t know the definition of online decoding, but yes, a CNN is a traditional supervised learning algorithm which requires a large labeled training set. It also has no relation to biological neural networks.

1 Like

Another way to think of the basic concept is to imagine one person using words to describe something they saw to someone else who wasn’t there. This scene from “Someone Like You” comes to mind:

[image: scene from “Someone Like You”]

2 Likes

For anyone interested in doing this, you can use the /expressions/similar_terms API, with a body like:

{"positions": [125, 163, 212, ...]}
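
In Node 18+, the call can be sketched roughly as below. The base URL, retina name, and api-key header follow the classic cortical.io Retina REST API and may differ from the current API version, so double-check the docs before relying on them:

```javascript
// Sketch of the similar-terms lookup (Node 18+); endpoint details may need adjusting.
async function similarTerms(positions, apiKey) {
  const url = 'http://api.cortical.io/rest/expressions/similar_terms?retina_name=en_associative';
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'api-key': apiKey, 'Content-Type': 'application/json' },
    body: JSON.stringify({ positions }),
  });
  if (!res.ok) throw new Error(`cortical.io request failed: ${res.status}`);
  const terms = await res.json(); // expected: an array of objects with a "term" field
  return terms.map((t) => t.term);
}
```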

I did this for the ET image and SDR that @Jose_Cueto posted.

The original concepts found by Clarifai were:

wildlife: .972
portrait: .970
nature: .969
animal: .968
staring: .952
one: .947
eye: .946
face: .945
looking: .927
isolated: .918

Running the SDR through the above similar terms API suggests:

animal  *
looking  *
eye  *
pet
cats
wild  *
face  *
animals  *
eyes  *
cat

(I added a * by the ones which match the original classification.) An extraneous “cat” idea seems to have been introduced here, but given the rather non-specific original classification, it seems to have done an OK job in this case.

4 Likes

I just want to make sure that @sheiser1 sees this, since he was the one asking about it at the hangout.

3 Likes

A fantastic tool. Big ups @Paul_Lamb!

3 Likes

How can you tell that your algorithm is doing a good job? One way to validate it is to reconstruct the original image from the SDR.

Here’s an alternative (and much simpler) approach: just use a sparse autoencoder, trained on a large dataset.

1 Like

Sure, there are no doubt many better ways to accomplish this. I put this together after a conversation we had on Hackers’ Hangout, where I mentioned that one really quick-and-dirty way to connect a lot of traditional AI algorithms with HTM is simply to leverage word SDRs.

1 Like

Is there anywhere that suggests that the brain reconstructs entire images like this?

Considering that the perception of an image is a constructive process involving serial scanning of portions of an image, there really is no place where “an image” ever exists in the brain at any point.

As far as I know, we perceive and recall sequences of tokens that let us construct and reconstruct the properties of an image; as quickly as we can consider some aspect of an image, that part is reconstructed in recall.

These tokens are initially formed by perception and learning of novel features. From that point, these features become available for perceiving and recalling things that contain them.

I am excluding the images that are directly and unconsciously processed by sub-cortical structures.

2 Likes

The encoding (feature extraction) done by convolutional layers is inspired by the visual cortex [1].

[1] Deep Convolutional Neural Networks as Models of the Visual System: Q&A | Grace W. Lindsay

Convolution is inspired by the foveal scanning that extracts these streams of tokens. I understand that the biological mechanisms have been emulated to make a process that is better suited to digital hardware, but it is an inexact copy of those mechanisms.

There are bunches of good ways to do image processing; I am just pointing out that reconstruction of an image does not seem to be biologically plausible.

The usual focus of HTM researchers is to make something that works in biologically inspired ways to help model and understand the wetware.

3 Likes

Sorry, I forgot to answer your question. Typically the best way is to compile a good set of realistic test data and perform whatever prediction or anomaly detection task you are planning, to see how it does.

Perhaps. I don’t think we know enough about how a brain reconstructs an image while trying to visualize something. In this case we use reconstruction as a way to test the quality of an SDR.

In any case, using a convolutional network to encode an image seems to be way more biologically plausible than some of the methods used by cortical.io.

2 Likes