Introducing new project: image2sdr

I think that neither method is suitable for anomaly detection. Feature extraction strips away most of the information relevant to detecting an anomaly. For example, neither Clarifai’s classifier nor an autoencoder will produce any relevant features for an image of people walking upside down or cars driving the wrong way.

In fact, thinking more about this, it seems that the self-supervised learning approach advocated by LeCun would be far more effective: https://youtu.be/7I0Qt7GALVk?t=2472

I see this as a huge unsolved problem for HTM systems.

HTM is great at saying “I have seen this thing before.” It can even say “I have seen this sequence before.”
I really don’t see how HTM will be able to match up some cue, like a perceived printed or spoken word, with a paired sound, image, or stream of tokens that makes up an image.

The current excitement in Numenta regarding coding in tuples of grid nodes (object) (displacement) (object) does not really solve this problem yet. That gets us to a representation of the relations between perceived features. The work to use hierarchy to resolve that into a higher-level representation is unfinished.

I have high hopes that I can bridge this problem using hex-grid coding and tuples and sequences of tokens, but at this point I do not have a functioning system. I do think that the enabling technology will be the dumb boss/smart adviser model, but I have not worked out how these systems interact to the point where they function together organically. The few toy attempts produced very limited, stereotyped behavior.

It depends on the use case. If you are looking for anomalies within a single image, no, this is not useful. For detecting weird physics like upside-down people, it is also not useful. For detecting an increase in the frequency of cars passing in front of a camera at 8:00 AM on a Saturday compared to typical Saturdays, it might be useful.


“Inspired by the visual cortex” != “biologically plausible”.
CNNs fundamentally do not work the way the brain does; this is easy to demonstrate.


Can you please explain how your image2sdr method would be useful for this task? I just don’t see how the SDR for an image with 20 cars would differ from the SDR for a similar image with 50 cars, given that the tags from Clarifai would probably be identical in both cases.

Please do yourself a favor and read the article I linked to, unless you are a neuroscientist and actually know what you’re talking about.

Classic HTM is designed for streaming data, not static data. So naturally you wouldn’t pass it an image with 50 cars… you would pass it a series of images over time, in which 50 cars passed by a camera.

Of course, it depends on the use case. This wouldn’t be useful for anomaly detection of vehicle frequency either if you were to try monitoring a busy highway where every frame always had cars in it.

And again, this was a demonstration of one easy way of linking HTM with classic AI algorithms using word SDRs. One could imagine a slightly different system which uses classic image AI to locate things in video frames, and then has both subjects and positions to do some streaming HTM magic with. Or an audio AI for identifying animal calls hooked up with HTM to detect population anomalies in a particular habitat, track migrations, etc.

Also, just to be clear, image2sdr is not meant to be biologically plausible. It is just another tool that some people may find useful for a few AI / HTM related cases. There are probably many other tools that could be applied to many of the same cases.


I believe I know what I’m talking about, but I scanned the blog post you linked to and it’s full of stretches. We can discuss specific claims from it if you want, but you could start from the opinions on this topic of people who know deep learning in great detail, like Geoffrey Hinton and Andrej Karpathy.


Can we use this for images that are on my desktop, using Cortical.io and Clarifai? If yes, can you explain how?

If you are comfortable with coding in Javascript, then all of the logic is in api/image/index.js and could be fairly easily copied and pasted into another NodeJS application, or ported to another language.

If you are not comfortable with Javascript, then you will need to install NodeJS on your desktop to run the service locally. Once NodeJS is installed, clone the repository and make a copy of config_example.yml named “config.yml”. Edit this file to plug in your API keys, then create a folder called “uploads”. At this point, you can run “npm start”. You will now be able to access the service via http://localhost:80/. Post an image file to the service, and it will return a response in JSON format.
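If it helps, here is a minimal sketch of a NodeJS client for the locally running service. It assumes the `form-data` and `node-fetch` packages are installed, and the upload field name “image” is an assumption on my part; check api/image/index.js for what the service actually expects.

```javascript
// Minimal client sketch for the locally running image2sdr service.
// Assumptions (not from the project docs): the upload field is named "image"
// and the service listens on port 80 -- check api/image/index.js for the
// field name your copy actually expects.
const fs = require('fs');
const FormData = require('form-data'); // npm install form-data
const fetch = require('node-fetch');   // npm install node-fetch@2

async function imageToSdr(imagePath) {
  const form = new FormData();
  form.append('image', fs.createReadStream(imagePath)); // field name is an assumption
  const res = await fetch('http://localhost:80/', { method: 'POST', body: form });
  return res.json(); // JSON response containing the SDR positions
}

imageToSdr('portrait.jpg').then(sdr => console.log(sdr));
```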

I am using Python and I have a problem. I converted the video to images, one image per second, and it is difficult to upload this bulk of images and get the concepts. I have around 3000 images. How can I do this?

Unfortunately, processing multiple images at once is still on the TODO list.
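Until then, one workaround is to loop over the frame files and post them to the service one at a time, writing the responses to disk. Here is a rough NodeJS sketch, reusing the hypothetical imageToSdr() helper from the earlier sketch and assuming the frames sit in a ./frames folder as .jpg files:

```javascript
// Sketch: convert a folder of frame images to SDRs, one request at a time.
// Assumes the imageToSdr() helper from the previous sketch is in scope and
// that the frames live in ./frames with a .jpg extension.
const fs = require('fs');
const path = require('path');

async function folderToSdrs(dir) {
  const frames = fs.readdirSync(dir).filter(f => f.endsWith('.jpg')).sort();
  const results = [];
  for (const frame of frames) {
    const sdr = await imageToSdr(path.join(dir, frame)); // sequential, ~3000 calls
    results.push({ frame, sdr });
  }
  fs.writeFileSync('sdrs.json', JSON.stringify(results, null, 2));
}

folderToSdrs('./frames');
```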

Output for a “portrait” image –

[
  {
    "positions": [

      11,
      98,
      125,
      131,
      269,
      271,
      277,
      309,
      319,
      385,
      386,
      399,
      401,
      402,
      403,
      406,
      436,
      513,
      530,
      569,
      570,
      620,
      638,
      656,
      663,
      698,
      730,
      791,
      900,
      905,
      1007,
      1089,
      1129,
      1169,
      1212,
      1237,
      1292,
      1298,
      1350,
      1360,
      1361,
      1478,
      1489,
      1541,
      1542,
      1549,
      1553,
      1615,
      1626,
      1681,
      1683,
      1740,
      1746,
      1802,
      1809,
      1881,
      1931,
      1945,
      2060,
      2065,
      2118,
      2132,
      2137,
      2321,
      2498,
      2522,
      2572,
      2573,
      2576,
      2701,
      2702,
      2829,
      2830,
      2958,
      2959,
      3088,
      3193,
      3209,
      3210,
      3216,
      3225,
      3272,
      3273,
      3338,
      3339,
      3342,
      3343,
      3344,
      3399,
      3401,
      3469,
      3470,
      3597,
      3656,
      3657,
      3658,
      3772,
      3805,
      4156,
      4206,
      4410,
      4457,
      4544,
      4578,
      4624,
      4687,
      4688,
      4695,
      4696,
      4717,
      4757,
      4796,
      4800,
      4801,
      4846,
      4860,
      4931,
      4956,
      4969,
      4970,
      4972,
      4978,
      5051,
      5075,
      5106,
      5190,
      5205,
      5206,
      5265,
      5393,
      5423,
      5442,
      5459,
      5461,
      5487,
      5571,
      5606,
      5607,
      5700,
      5810,
      5845,
      5978,
      6119,
      6226,
      6233,
      6336,
      6354,
      6508,
      6610,
      6731,
      6863,
      6980,
      6981,
      7002,
      7107,
      7233,
      7251,
      7369,
      7381,
      7387,
      7397,
      7427,
      7494,
      7515,
      7617,
      7666,
      7667,
      7792,
      7921,
      8023,
      8151,
      8162,
      8176,
      8303,
      8304,
      8305,
      8326,
      8401,
      8459,
      8555,
      8661,
      8784,
      9322,
      9447,
      9450,
      9470,
      9530,
      9575,
      9583,
      9706,
      9714,
      9830,
      9831,
      9837,
      9838,
      9839,
      9841,
      9842,
      9956,
      9957,
      9958,
      9964,
      9965,
      10083,
      10084,
      10085,
      10086,
      10087,
      10090,
      10091,
      10092,
      10093,
      10094,
      10095,
      10096,
      10212,
      10215,
      10216,
      10218,
      10219,
      10220,
      10221,
      10222,
      10223,
      10224,
      10225,
      10340,
      10341,
      10344,
      10345,
      10346,
      10347,
      10348,
      10349,
      10350,
      10351,
      10352,
      10470,
      10471,
      10472,
      10473,
      10474,
      10475,
      10476,
      10478,
      10479,
      10480,
      10599,
      10601,
      10602,
      10603,
      10604,
      10605,
      10606,
      10607,
      10729,
      10730,
      10731,
      11125,
      11211,
      11216,
      11217,
      11308,
      11343,
      11344,
      11346,
      11347,
      11474,
      11475,
      11601,
      11602,
      11681,
      11729,
      11847,
      11980,
      12198,
      12331,
      12332,
      12337,
      12496,
      12589,
      12590,
      12625,
      12795,
      12844,
      12850,
      12971,
      12972,
      12973,
      12975,
      12998,
      13310,
      13330,
      13487,
      13550,
      13612,
      13613,
      13614,
      13685,
      13760,
      13809,
      13835,
      13872,
      13877,
      13937,
      14001,
      14127,
      14252,
      14262,
      14265,
      14268,
      14403,
      14475,
      14487,
      14661,
      14858,
      14859,
      15179,
      15204,
      15682,
      15810,
      15938,
      15943,
      15946,
      15947,
      16077,
      16267,
      16268
    ]
  }
]

If this is the SDR, what do I have to do after this?

Whatever you’d typically use SDRs for. Anomaly detection in a stream of video frames, for example.

I have converted the video into frames (images), one per second. I want to detect the advertisements in the video, i.e. anomaly detection. If the above is an SDR, what do I do after this?

Before you expect magic: what feature(s) tell you that a frame is part of a commercial?
Is there anything about your video that reliably tells you that this is a commercial?
HTM will tell you that the scene has changed, but are there also scene changes within the program itself?

I am new to this. My task is to show the change when an advertisement occurs; I want to flag a complete change in the background as an anomaly.
a) I have converted the video to images, one per second.
b) I got the concepts in the images based on confidence level.
c) I want to know whether the coordinates I posted above are an SDR or not, and what to do next after this.

Yes, those are the positions of the “1” bits in an SDR representing the concepts which were discovered by Clarifai in the image. The size of the bit array, by default, is 16384 (this is the SDR size used by Cortical.io’s “retina”).
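To make that concrete, here is a tiny sketch of how those positions relate to the full bit array (the 16384 width comes from the reply above; the truncated positions list is just the first few values from the earlier post):

```javascript
// The service response lists the indices of the "1" bits in a 16384-bit SDR.
const SDR_SIZE = 16384;                           // Cortical.io retina width
const positions = [11, 98, 125, 131 /* ... */];   // first few values from the post above

// Dense form: a bit array with ones at the listed positions.
const bits = new Uint8Array(SDR_SIZE);
for (const p of positions) bits[p] = 1;

// Sparsity: fraction of active bits.
console.log(`${positions.length} of ${SDR_SIZE} bits are active`);
```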

HTM may not be the right tool for this particular problem. There are a few potential issues which come to mind:

  1. Unless the system were trained on the same episode of the program without advertisements, it would be more likely to flag the TV show as anomalous, since advertisements are repetitive and predictable. It seems unlikely to me that the content of new episodes in a program will share many temporal patterns with previous episodes. On the other hand, this could be used to your advantage, where you train the system only on advertisements, and have a low anomaly score trigger your “commercials” logic.

  2. The classifications made by Clarifai are not very granular, so it is unlikely to detect a simple background change. Clarifai classifies objects that it recognizes. That said, concepts in a background might be picked up if there were things like clouds or trees that Clarifai could get a strong hit on.

  3. The repeating-inputs problem will not allow you to sample every frame and pass it to the TM directly. You’d probably need to add some custom code to control when to progress each timestep, and/or implement a function for encoding specific timing.

One approach which comes to mind is to start with simple SDR operations (see the sketch below). If the SDR for one frame has low overlap with the SDR from the previous frame, then a scene has likely changed. This wouldn’t distinguish between scene changes within a program and a switch to a commercial break, though. You could train the system on lots of commercials ahead of time, then run it with learning disabled. When a scene change is detected, do a reset, then use anomaly detection to see whether the new scene is one of the advertisements it has been trained to recognize.
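Here is a rough sketch of that first step, comparing consecutive frame SDRs by overlap. The 0.2 threshold is purely illustrative, not a recommended value, and frameSdrs is assumed to be an array of position arrays like the one posted above, one per frame:

```javascript
// Sketch: flag a likely scene change when consecutive frame SDRs share few
// active positions. frameSdrs is an array of position arrays (one per frame).
function overlap(a, b) {
  const setB = new Set(b);
  return a.filter(p => setB.has(p)).length; // count of shared active positions
}

function detectSceneChanges(frameSdrs, threshold = 0.2) {
  const changes = [];
  for (let i = 1; i < frameSdrs.length; i++) {
    const prev = frameSdrs[i - 1];
    const curr = frameSdrs[i];
    const denom = Math.min(prev.length, curr.length);
    if (denom > 0 && overlap(prev, curr) / denom < threshold) changes.push(i);
  }
  return changes; // frame indices where a scene change likely occurred
}
```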


@Paul_Lamb, to help my understanding, it would be very kind of you to share some images and the corresponding SDRs generated by your algorithm.
How accurate is the classification when using your image2sdr on MNIST?

It isn’t for MNIST. It classifies an image using a Clarifai model which outputs labels in the form of English words or short phrases (for example, their “General” model). Those English words or phrases are then converted into SDRs with encoded semantics using a Cortical.io retina (for example, their general English retina).
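Schematically, the pipeline looks something like the sketch below. classifyWithClarifai() and fingerprintWithCortical() are hypothetical placeholders for the two API calls (not actual functions from the project), and the union merge in step 3 is my assumption about how the word fingerprints are combined; see api/image/index.js for the real logic.

```javascript
// Hypothetical end-to-end sketch of the pipeline described above.
// classifyWithClarifai() and fingerprintWithCortical() stand in for the real
// Clarifai and Cortical.io API calls; the union merge is an assumption.
async function imageToSdrPipeline(imagePath) {
  // 1. Classify the image -> English labels, e.g. ["portrait", "people", "adult"].
  const labels = await classifyWithClarifai(imagePath);

  // 2. Fingerprint each label with a Cortical.io retina -> word SDR positions.
  const fingerprints = await Promise.all(labels.map(label => fingerprintWithCortical(label)));

  // 3. Merge the word SDRs into one image SDR (a simple union of positions here).
  const merged = new Set();
  for (const fp of fingerprints) fp.forEach(p => merged.add(p));
  return [...merged].sort((a, b) => a - b);
}
```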