Well, it’s been a while. I’ve been pretty busy, but whenever I felt like I had enough spare time, I’ve continued working on something inspired by this forum:
This uses OpenCV and TensorFlow to translate RGB camera input into a sparse set of points, in as close to real time as possible. It uses n-dimensional algorithms, so it should be modifiable to work on any spatial input.
Currently, it doesn’t handle rotation correctly, so circles get exploded into seemingly random points instead of one or two off-center circles. That might not make much sense before explaining how it works, though.
How it works:
To gain scale invariance, the input is split into several images, ranging from very zoomed in to the entire view. This is the top row of images in the top window.
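A minimal sketch of that zoom stack, assuming a NumPy image array (the function name and the crop-then-downsample approach here are my own illustration, not the repo’s actual code):

```python
import numpy as np

def zoom_stack(frame, levels=4):
    """Crop progressively tighter centered windows and downsample each
    to a common size, giving one image per zoom level."""
    h, w = frame.shape[:2]
    out_h, out_w = h // 2 ** (levels - 1), w // 2 ** (levels - 1)
    stack = []
    for i in range(levels):
        scale = 2 ** i  # level 0 = the entire view, last level = most zoomed in
        ch, cw = h // scale, w // scale
        y0, x0 = (h - ch) // 2, (w - cw) // 2
        crop = frame[y0:y0 + ch, x0:x0 + cw]
        # Nearest-neighbor downsample by striding so all levels share a size.
        step_y, step_x = max(ch // out_h, 1), max(cw // out_w, 1)
        stack.append(crop[::step_y, ::step_x][:out_h, :out_w])
    return stack
```

Because every level ends up the same size, the later comparison and detector stages can run unchanged on each one.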
The input pixel colors are compared to their surroundings: red to red, green to green, blue to blue. At least, that’s the intent. I don’t think it does exactly that, because the output would look like a mostly white edge detector if it did, so I probably added some of the original image back in. This one still needs some work.
The colors are compared again, but this time it’s red to green, blue to yellow, green to red, and yellow to blue.
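A rough sketch of the opponent-color step, assuming a float RGB image. The channel pairings follow the text; the “surround” here is just a box-blur mean, which may differ from what the repo actually does:

```python
import numpy as np

def box_blur(img, k=3):
    """Mean over a k x k neighborhood, computed channel-wise with edge padding."""
    pad = k // 2
    p = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode='edge')
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def opponent_channels(img):
    """Center-surround comparison with opponent color pairings:
    red vs. green surround, blue vs. yellow surround, and the reverses."""
    surround = box_blur(img)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    sr, sg, sb = surround[..., 0], surround[..., 1], surround[..., 2]
    yellow = (r + g) / 2
    s_yellow = (sr + sg) / 2
    return np.stack([
        r - sg,        # red center vs. green surround
        g - sr,        # green center vs. red surround
        b - s_yellow,  # blue center vs. yellow surround
        yellow - sb,   # yellow center vs. blue surround
    ], axis=-1)
```

The same-color step described above is the same idea with each channel compared to its own blurred surround instead of the opposing one.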
A set of orientation detectors is applied to the previous set. These activate a pixel’s color more strongly if there’s a 3x3 stripe of white-black-white in the right orientation. There are three orientation detectors in this case. It seems to lock to red, green, or blue instead of going in between, so that needs some work.
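The stripe detectors can be sketched as 3x3 correlation kernels with positive flanks and a negative center line. This is plain NumPy rather than TensorFlow, and the three orientations (vertical, horizontal, one diagonal) are my guess, not confirmed from the repo:

```python
import numpy as np

# Three 3x3 "white-black-white" stripe kernels: positive flanks around a
# negative center line, one kernel per orientation (orientations assumed).
KERNELS = [
    np.array([[1, -2, 1],
              [1, -2, 1],
              [1, -2, 1]], float),   # vertical stripe
    np.array([[ 1,  1,  1],
              [-2, -2, -2],
              [ 1,  1,  1]], float),  # horizontal stripe
    np.array([[-2,  1,  1],
              [ 1, -2,  1],
              [ 1,  1, -2]], float),  # diagonal stripe
]

def orient_responses(img):
    """Correlate a single-channel image with each stripe kernel,
    producing one output channel per orientation."""
    h, w = img.shape
    p = np.pad(img, 1, mode='edge')
    out = np.zeros((h, w, len(KERNELS)))
    for c, k in enumerate(KERNELS):
        for dy in range(3):
            for dx in range(3):
                out[..., c] += k[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out
```

A dark vertical line on a white background gives a strong response in the vertical channel and near-zero responses in the others, which is the locking-to-one-channel behavior described above.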
A set of line-end detectors is applied to the previous set. These should activate pixel colors more when at the end of an oriented line. It doesn’t seem to do it quite right, but it still gets a sparse, stable set of points. Then, to get individual points, I use max pooling and select only the pixels that equal the max-pooling value.
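The max-pool point selection can be sketched like this, for a 2D single-channel activation map (the pool size is a free parameter here, not taken from the repo):

```python
import numpy as np

def select_peaks(act, pool=4):
    """Keep only pixels that equal the max of their pool x pool cell,
    yielding at most one point per cell (ties keep the first index)."""
    h, w = act.shape
    points = []
    for y0 in range(0, h, pool):
        for x0 in range(0, w, pool):
            cell = act[y0:y0 + pool, x0:x0 + pool]
            m = cell.max()
            if m <= 0:
                continue  # skip cells with no activation
            dy, dx = np.unravel_index(np.argmax(cell), cell.shape)
            points.append((y0 + dy, x0 + dx, m))
    return points
```

This is what turns the dense activation maps into the sparse set of individual points used by the pairwise-difference step below.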
In the second window, where it looks like an almost random cloud of points, I use the differences in position between each pair of activated pixels to set the locations:
for pixel a:
    for pixel b:
        c = new_pixel()
        c.position = b.position - a.position + center_position
The output is bigger because pixels can be left of, right of, above, or below each other, which can result in negative values, so I resize it.
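Filled out as runnable NumPy, that loop looks something like this (the output-image bookkeeping, with the recentering to handle negative differences, is my own sketch of the idea):

```python
import numpy as np

def relative_position_map(points, out_size):
    """For every ordered pair of active pixels (a, b), accumulate a vote
    at (b - a) + center.  Differences can be negative, so the output is
    sized to hold the full +/- range around its center."""
    out = np.zeros((out_size, out_size))
    cy = cx = out_size // 2
    for ay, ax in points:
        for by, bx in points:
            y, x = by - ay + cy, bx - ax + cx
            if 0 <= y < out_size and 0 <= x < out_size:
                out[y, x] += 1
    return out
```

Since only differences between point positions are used, shifting every input point by the same amount leaves the output unchanged, which is where the spatial invariance comes from.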
It does result in a spatially invariant, sparse representation of the input, but adding in the orientation would make it much better. Orientation still seems to lock to whichever vectors I input, though, so I need to find a way to allow more in-between orientations before I can use that. I could definitely use some help there.
Anyway, the code is available at https://github.com/SimLeek/pySILEnT
I’ll have a lot more free time a little into December, so please tell me anything you’d need to use this as a library. I’ll probably only be able to finish one or two of these before I have to work on other stuff, but here’s what I’m thinking so far:
- A callback with direct translation to NuPIC SDRs (1)
- Inputting individual images (1)
- Translating a different aspect of vision to SDRs, like the difference between current and previous frames
- Easy size selection
- Improved orientation detection
- Limiting pixels chosen for the space-invariance loop by distance & activation strength
- Rudimentary real-time sound-to-SDR conversion, with a streaming audio input library like sounddevice
- Creating GPU-optimized sparse tensor ops and starting to optimize the spatial/temporal poolers
It’s not installable via pip or conda yet, but you can pull and run it after installing TensorFlow, OpenCV, and my cvpubsubs library.