Help please. Im trying to implement htm in a system which could correctly drive a car in a very simple 3d game. Input data will be encoded images of front view with attached correct turn label(left or right), then it will go into spatial pooling(so we get sdr based on active columns), after that we implement context in each sdr,(temporal memory starts working) so at the end we have: for any given input there are active columns and predicted columns, where active columns are just sdr of input and predicted columns(based on predicted cells) are just predicted sdrās, am I correct? So unfortunately I cannot figure out how implement that output predicted sdrsā¦Could you give me some hints? Is it possible in general htm utilization for this task?
This will undoubtedly be badly written and wrong, but,ā¦
So you will have your raw image data, lets say like MNIST it is a 28 by 28 bit image (already reshaped to 2D), where each bit is represented as a value from 0-255. This is essentially a greyscale image. Typically in the examples Numenta have, is they floor all values to either be 0 or 1 ie on and off, in this case only bits which have a value of 255 will be 1, and all other bits are 0.
With this new 28 * 28 array of values you can then pass this to an alternative encoder which may do something to it to produce an SDR for your spatial pooler.
Or you can just pass this raw, but now [0,1] 2D array data to the spatial pooler as it is. There are I believe some constraints in that if you pass a 28 by 28 bit array to the SP, then the SP has to have a similar / expected topology.
Using the typical python algorithm version of the SP you can simply call compute, passing in i believe 3 variables, these are the SDR or input, a boolean on whether the SP will perform learning or not, and finally an output variable to hold the active columns.
i.e sp.compute(input, True, output)
The output is now a list? of columns (28 * 28) which are either active or not, essentially a new SDR with a sparse representation of active bits. This is created based upon the input etc and the params applied to the SP, like sparsity etc.
Over time this output should stabilise i believe in such that semantic information of the input will remain and you will eventually get similar SDR outputs (active columns) with similar inputs. Think like in MNIST with different variants of the letter ā3ā, each input may be drawn differently but hopefully after training they all will share a high degree of overlap with their outputs (active columns).
You can essentially pass this SP output to the temporal pooler, and if the sequence of inputs is essentially sequential then it āperhapsā might be able to learn what one particular input will follow the next.
But I found this difficult when apply it to the MNIST and a simple scanning saccade approach.
Anyway best of luck.
Vision is not NuPICās strongpoint. There is a lot about this already to read.
Thank you very much for your answer! Also I would like to ask what you think if we convert each grayscale pixel of input to bit array of that pixel i.e. each grayscale pixel is 8 bits ones and zeros, so we would have 2d array of bits which is good for htmā¦would it work? I didnāt find any implementations of this yetā¦but doing so we would not loose any context of imageā¦And also to make that input more sparse we can just add some known number of zero bits to each input image. What do you think? Also I would kindly ask you to advise some links to pure python htm implementations if you know any of course. Any way thank you , would be interesting your opinion.
Given some 3*3 pixel gray scale input: [124,129,35]
[129,253,100]
[127,98,78],
so this would be its bits array:[01111100,10000001,00100011]
[10000001,11111101,01100100]
[01111111,01100010,01001110]
Then we can if we want to add some rows fo zero bits just for sparsity i.e.
if we add only each second row than the idea of image are not broken,:
So as final input to the spatial pooler we would have:
[01111100,10000001,00100011]
[00000000,00000000,00000000]
[10000001,11111101,01100100]
[00000000,00000000,00000000]
[01111111,01100010,01001110]
[00000000,00000000,00000000]
Do you think it would work? Or mb you know someone try this before?
Hi,
So I tried something similar but honestly that particular implementation was wrong so it would be hard to draw to many conclusions. In my example each single ābitā of the 28 by 28 bit array that was used to represent the pixel was represented as a 255 length {0,1} bit array, where i tried to maintain some semantic meaning between values by representing the specific index value i.e ā124ā as a window / range of active or on bits ā1ā while the remaining bits of the 255 array are set to ā0ā.
The difficulty is that the size of this is huge, where is did a single 28 bit length slice so (28 * 255) or 7140 bits.
This slice basically scanned from top to bottom as a simple saccade in a temporal implementation for MNIST classification.
Didnāt work though
Regarding the actual bit array method, as long as there is semantic similarity it might work, but technically i donāt think a binary representation of a value 0-255 maintains that similarity. You can see this in the two arrays for 124, and 129, which should bee similar (if you used say my window approach) but not here with the binary representation.
Hi, thanks a lot anyway for your info, Iāll keep trying making some experiments. Your idea with representing pixel as window range is also interesting.
I am not sure how you expect the HTM to work with pattern recognition.
What are you looking at to to signal this character (say ā3ā for example) vs some other?
I would expect that the system may be excellent in saying -āyes, I have seen this shapeā but not in saying what shape it is.