Suppose I’m taking as input an array of bits from the retina, and each cell somehow knows its X closest cells. What’s an algorithm I could implement to learn directional movement? That is, one cell turns off and another turns on. These cells don’t all change at the same time, and for the ones that do, it’s not obvious which nearby active cell they changed into (or whether the change was just noise), so they would need to use their neighbors’ change information somehow. Would spatial pooling be useful here? If so, how? It seems like I’d need a cell for every possible change for each cell.
What I have so far doesn’t work. I’m storing a map of coinciding neighbour activations for each cell change. When a cell turns off, it pairs itself with every cell that just turned on, creating prev/next candidate pairs; it then collects every other off/on pair among its neighbours and increments a score for each coincidence. After X steps I calculate the best-coinciding pair for each neighbour, and the result is very sloppy. Some pairs look like they’re kind of working, but overall it’s really not getting it. Some cells even move in the opposite direction to their neighbours, and I don’t understand how that could be the most common movement they detect -__-
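Roughly what I’m doing, sketched in Python (the names are placeholders, not my actual code):

```python
from collections import defaultdict

# score[(c, cand)][(other_off, other_on)] counts how often the candidate pair
# "c turned off -> cand turned on" coincides with another off->on pair among
# c's neighbours.
score = defaultdict(lambda: defaultdict(int))

def step(prev_active, curr_active, neighbors):
    turned_off = prev_active - curr_active
    turned_on = curr_active - prev_active
    for c in turned_off:
        nbrs = set(neighbors[c])
        # candidate "destinations" for c: neighbours that just turned on
        for cand in nbrs & turned_on:
            # every other off->on pair among c's neighbours votes for (c, cand)
            for other_off in (nbrs & turned_off) - {c}:
                for other_on in set(neighbors[other_off]) & turned_on:
                    score[(c, cand)][(other_off, other_on)] += 1

def best_pair(c):
    # after many steps: the destination with the strongest neighbour support
    options = {cand: sum(votes.values())
               for (cc, cand), votes in score.items() if cc == c}
    return max(options, key=options.get) if options else None
```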
I have a small RF (receptive field) watching a 2-hour video to test all this, btw.
EDIT: Sorry if I’ve explained this poorly. I’m happy to clarify
What do you mean by an “algorithm to learn directional movement”?
afaik movement (of the eye/camera) presupposes some mechanical device.
If you mean “algorithm to recognize directional movement” then there are quite a few already out under the computer vision umbrella, e.g. optical flow algorithms.
Why does such a low-level algorithm need to be implemented in HTM?
So in one timestep we have bits that are on and bits that have just turned off. In a simulation of V1 at least, some of those bits turned off because they “moved” somewhere else, possibly to one of the on bits. I don’t necessarily need to know precisely whether one bit turned into another, but I need to detect the overall motion, perhaps with a bunch of changes voting for the most likely motion being witnessed.
I haven’t looked too deeply into optical flow, but doesn’t that require knowing the positions of the pixels? I’m trying to solve this without that: each cell just has some neighbors with unknown relationships to each other.
As far as its relationship to HTM goes, it’s too convoluted to get into, and I’d only want to share it if I knew it worked or was useful. I’m just exploring a path atm.
Direction is associated with movement. If the input is visual, slow movement makes small changes between two consecutive “frames” or patterns.
Let’s say one of these seemingly arbitrary patterns is pattern A - we have no idea it represents an image. But assume the pattern before it was B and the one following is C,
and the animal registers a B=>A=>C type of movement.
Another time it registers a C=>A=>B type of movement. The latter reads as: the first pattern is C, then there is a “=>” movement and C becomes A, then another “=>” and A becomes B.
If every time we encounter pattern A it is preceded by either B or C and followed by the other, we have defined two directions - left and right. If no other patterns except those two ever border A, then we have two directions of movement on a single axis.
Now, if you can get to A (or move away from A) without passing through either C or B, then it must be another direction of movement, e.g. D and E for above and below A. And patterns between C and D are, guess what, a bit left and a bit up - a diagonal movement.
For all the above to make sense to any animal, it has to have some control over the movement - e.g. a muscle that moves the eye left and right - so it can associate the muscle control with sensory changes in the retina.
If you want to transfer that to “pixels”, I think the slight-movement rule applies. One learning system might track pixel on/off correlations according to movement (which is the motor output signal).
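A toy sketch of the bookkeeping I have in mind (all names made up, just to illustrate the idea):

```python
from collections import defaultdict

# Record which patterns can immediately precede or follow each pattern.
# Each distinct one-step neighbour of a pattern in this transition graph
# corresponds to a distinct direction of movement towards/away from it.
neighbours_of = defaultdict(set)

def observe_sequence(patterns):
    for prev, nxt in zip(patterns, patterns[1:]):
        neighbours_of[prev].add(nxt)
        neighbours_of[nxt].add(prev)

observe_sequence(["B", "A", "C"])   # e.g. the eye sweeping one way
observe_sequence(["C", "A", "B"])   # the same sweep, opposite direction
observe_sequence(["D", "A", "E"])   # a sweep along another axis

# A now has four one-step neighbours -> four directions, i.e. two axes.
print(sorted(neighbours_of["A"]))   # ['B', 'C', 'D', 'E']
```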
So your idea seems useful for going from patterns to other patterns, but is there a way to learn that one particular cell of the previous pattern has “become” a cell in the next? If I have 20 cells changing into 20 other cells, I want a 1-to-1 mapping from each cell in the previous pattern to a cell in the next pattern. It doesn’t need to be perfect, but I really want to group these 1-to-1 pairs. Ultimately
I want something that can translate every single cell in a receptive field left, for example.
Thanks to @bitking for sending me down this rabbit hole last week: Gray codes, also known as reflected binary codes (RBC). A 3-bit example is shown below.
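(In case the figure doesn’t come through, here is one way to generate the sequence - an illustrative Python snippet, not from any particular library:)

```python
def gray_codes(n):
    """Reflected binary codes: prefix 0 to the previous list, then
    prefix 1 to the same list in reverse order."""
    codes = [""]
    for _ in range(n):
        codes = ["0" + c for c in codes] + ["1" + c for c in reversed(codes)]
    return codes

print(gray_codes(3))
# ['000', '001', '011', '010', '110', '111', '101', '100']
# Adjacent codes differ in exactly one bit, including the wrap-around
# from '100' back to '000', so the pattern is cyclic.
```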
The pattern can be extended by adding a fourth bit (set to 1) and then repeating the previous patterns in reverse. For any complete set of (2^n) n-bit codes, the pattern is cyclic.
These codes work really well for encoding both linear and angular positions. As you transition from one value to the next, only one bit is allowed to flip at a time. Thus, adjacent positions retain a great deal of “semantic similarity”.
For 2D position encoding, you can have orthogonal axes that each encode one position coordinate (x, y). Then you would have at most 2 bits changing simultaneously, if your agent somehow managed to move diagonally through the corner of one “cell” into another.
Now, these patterns are not technically SDRs, since they do not maintain any fixed sparsity. In fact they are a dense encoding which utilizes all possible bit combinations. However, it may be possible to turn this into an SDR by creating an encoding that flips one bit on whenever another bit flips off. Probably the easiest way to accomplish this is with a 50% sparsity encoder that consists of an n-bit Gray code concatenated with its complement.
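A quick sanity check of that idea (an illustrative sketch, repeating the generator from the snippet above so it runs standalone):

```python
def gray_codes(n):  # same generator as above
    codes = [""]
    for _ in range(n):
        codes = ["0" + c for c in codes] + ["1" + c for c in reversed(codes)]
    return codes

def gray_sdr(n):
    # n-bit Gray code concatenated with its bitwise complement:
    # fixed 50% sparsity, and every transition flips exactly one bit on
    # and one bit off.
    flip = str.maketrans("01", "10")
    return [c + c.translate(flip) for c in gray_codes(n)]

sdrs = gray_sdr(3)
for a, b in zip(sdrs, sdrs[1:] + sdrs[:1]):   # include the cyclic wrap-around
    assert a.count("1") == len(a) // 2                                  # 50% sparsity
    assert sum(x == "0" and y == "1" for x, y in zip(a, b)) == 1        # one bit turns on
    assert sum(x == "1" and y == "0" for x, y in zip(a, b)) == 1        # one bit turns off
```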
What about taking two of the images (current and last) and XORing them (fast GPU matrix math) to create a third array representing the change between the two: a 1 flags a changed pixel (a candidate for direction), and a 0 means the two frames agree - either 00 (nothing there) or 11 (present in both).
The third bit array can then be overlaid on one of the other images to determine pixel direction.
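Something along these lines (a rough numpy sketch; shapes and names are made up):

```python
import numpy as np

# Two consecutive binary frames from the receptive field (tiny toy example:
# a vertical bar shifting one pixel to the right).
prev_frame = np.array([[0, 1, 0],
                       [0, 1, 0],
                       [0, 1, 0]], dtype=np.uint8)
curr_frame = np.array([[0, 0, 1],
                       [0, 0, 1],
                       [0, 0, 1]], dtype=np.uint8)

change = np.bitwise_xor(prev_frame, curr_frame)  # 1 = pixel changed, 0 = 00 or 11
moved_from = change & prev_frame                 # was on, now off
moved_to   = change & curr_frame                 # was off, now on
# Overlaying `change` on either frame separates the "from" and "to" pixels,
# which is the raw material for deciding a direction.
```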
I don’t understand how this solves my problem. I’m trying to group changes of the same direction together; assigning a number to each direction change doesn’t help me with that, does it?
Hi @Jossos, you should use the reply button (icon) under the message you are replying to - e.g. I’m not sure whether your question is about my message or about other messages posted here.
So I’m assuming you’re addressing me…
So your idea seems useful going from patterns to other patterns, but is there a way to learn that one particular cell of the previous pattern has “become” a cell in the next?
Well, this is a tricky question, because a “pattern” could mean something akin to an image (of e.g. a bike), but it could also represent a concept/symbol/name (of the same bike). What you’re asking for is akin to: “If ‘Mike’ moved north then turned 45 degrees, where will his left nostril be placed in the second position relative to the first?”
Because if you work at the symbolic level - with an SDR (or binary vector) representing Mike as a whole, unique “thing” - you can’t represent all his … cells, bones, muscles or hairs in the few dozen, or even few thousand, “1” bits used to encode the symbol. Especially if that SDR is also supposed to encode Mike’s position, or even more completely, his position and pose.
So one goal could be to have some sort of “auto encoder” which translates from a raw image to a more abstract representation of (what, where, pose, …, speed?) and back from the abstract representation to a raw… image-ination (pardon my grammar).
If this is what you’re aiming at, it seems like quite a difficult challenge, because understanding a scene and “manipulating” it in imagination seems to be a basic ingredient of intelligence. Even animals have good abilities for predicting how things in the world behave.
Don’t worry - there’s a reply icon at the bottom of each message, and there’s one blue button at the bottom which adds a reply to the whole discussion.
So, to summarize: I think that integrating the ability to move either the attention spot (aka fovea) or the “eye” itself would allow the “animal” to learn a three-way transformation pattern - (state, movement → new state), also (old state, new state → movement), and perhaps (new state, old state → reverse movement) - and once it can make reliable predictions on these transformations, we can assume that “direction” and “movement” are equivalent.
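In table form, that three-way pattern might look something like this rough sketch (all names invented, not real states or anyone’s actual code):

```python
# Three lookup tables built from the same observations: once any two of
# (old state, movement, new state) reliably predict the third, "movement"
# and "direction" can be treated as the same thing.
forward  = {}  # (old_state, movement)  -> new_state
infer    = {}  # (old_state, new_state) -> movement
backward = {}  # (new_state, old_state) -> reverse movement

REVERSE = {"left": "right", "right": "left", "up": "down", "down": "up"}

def learn(old, move, new):
    forward[(old, move)] = new
    infer[(old, new)] = move
    backward[(new, old)] = REVERSE[move]

learn("C", "right", "A")
learn("A", "right", "B")
assert forward[("C", "right")] == "A"   # predict the next state
assert infer[("A", "B")] == "right"     # recover the movement from two states
assert backward[("A", "C")] == "left"   # recover the reverse movement
```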
As far as matching parts of an image goes, one possible approach is to have multiple receptive field sizes at different levels or stages of processing, with the “no change” signal from a bigger receptive field fed back to the lower level. The “edges” that get feedback that there is no change at the higher level, but that see a change at the lower level, would indicate movement of a relatively larger object.
An “object” is a filled-in space in the processing map at that level, composed of Calvin Tiles, as I have described many times in this forum. These roughly correspond to “grid cells.”
In the brain the spatial scaling in the various grid fields of the Entorhinal Cortex is about 1:1.14.
The “new” vs. “old” that @cezar_t mentions can be the alpha (10 Hz) basic processing rate in cortex. The relation between the fields can be both spatial (edge) and temporal (movement) pooling.