Let's say we have a 5x5 grid in which 2 adjacent cells are “XX”.
|X|X|O|O|O|
|O|O|O|O|O|
|O|O|O|O|O|
|O|O|O|O|O|
|O|O|O|O|O|
The question is: now that we have the grid, how do we encode the figure XX in an invariant way, regardless of the sense-movement sequence and the origin (i.e. where we start on the grid)?
For example, if XX is in the top-left corner, we can detect it with 2 sense-moves (Right+X, Right+X),
or the movements can be all over the place and need 5 or 10 steps, but in all cases we need the same invariant representation.
We can also start from a different place on the grid.
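To make the setup concrete, here is a minimal sketch of what I mean, not a claimed answer. It assumes the agent senses the cell it starts on and every cell it moves onto, and that it tracks its own displacement from wherever it started. The names sense_walk and canonical are made up for illustration. Shifting the sensed X offsets so the smallest row and column become 0 gives the same result for two different walks from two different origins. This only handles translation, not rotation or reflection.

```python
# Hypothetical helpers for illustration; the grid is the 5x5 example above.
GRID = [
    "XXOOO",
    "OOOOO",
    "OOOOO",
    "OOOOO",
    "OOOOO",
]

# Each move is a (row delta, column delta) pair.
MOVES = {"R": (0, 1), "L": (0, -1), "U": (-1, 0), "D": (1, 0)}


def sense_walk(grid, start, moves):
    """Walk from `start`, sensing the start cell and each cell moved onto.

    Returns the set of offsets, relative to the start cell, at which an X
    was sensed. The agent never uses absolute grid coordinates in the result.
    """
    r, c = start
    off_r, off_c = 0, 0
    seen_x = set()
    if grid[r][c] == "X":
        seen_x.add((off_r, off_c))
    for m in moves:
        dr, dc = MOVES[m]
        r, c = r + dr, c + dc
        off_r, off_c = off_r + dr, off_c + dc
        if grid[r][c] == "X":
            seen_x.add((off_r, off_c))
    return seen_x


def canonical(offsets):
    """Shift offsets so the smallest row and smallest column become 0."""
    if not offsets:
        return frozenset()
    min_r = min(r for r, _ in offsets)
    min_c = min(c for _, c in offsets)
    return frozenset((r - min_r, c - min_c) for r, c in offsets)


# Walk A: start on the top-left corner, one move to the right.
walk_a = sense_walk(GRID, (0, 0), "R")
# Walk B: start in the middle of the grid and wander to the figure in 5 moves.
walk_b = sense_walk(GRID, (2, 3), "UULLL")

print(walk_a, walk_b)                        # raw offsets depend on the origin
print(canonical(walk_a) == canonical(walk_b))
```

The raw offsets differ between the two walks because each is measured from its own starting cell, while the shifted sets are equal. Whether something like this can be learned instead of hand-coded is the part I am asking about.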
Pooling will not help, I think.
Position encoding also does not work, because the origin can be anywhere in the grid.
Is the solution some feature of the grid? The grid is said to be origin-less.