Reference Frame: Invariant representation, origin, grid

Let's say we have a 5x5 grid and 2 adjacent cells are marked “XX”.


The question is: now that we have the grid, how do we encode the figure XX in an invariant way, regardless of the sense-movement sequence and the origin (i.e. where we start on the grid)?

For example, if XX is in the top-left corner, we can detect it with 2 sense-moves (Right+X, Right+X),
or the movements can be all over the place and need 5 or 10 steps, but in all cases we need the same invariant representation.

We can also start from a different place on the grid.

Pooling will not help, I think.

Position encoding also does not work, because the origin can be anywhere in the grid.
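
To make the problem concrete, here is a minimal sketch of why a plain position encoding depends on the origin. The function name `encode_positions` and the (row, column) convention are assumptions made for illustration only, not part of any particular library.

```python
def encode_positions(cells, origin):
    """Encode grid cells as (row, col) coordinates relative to a chosen origin."""
    r0, c0 = origin
    return sorted((r - r0, c - c0) for r, c in cells)

# The figure "XX" in the top-left corner of a 5x5 grid, as (row, col) pairs.
figure = [(0, 0), (0, 1)]

# Same figure, two different origins (starting cells).
code_a = encode_positions(figure, origin=(0, 0))
code_b = encode_positions(figure, origin=(2, 3))

print(code_a)  # [(0, 0), (0, 1)]
print(code_b)  # [(-2, -3), (-2, -2)]
```

The figure is identical in both cases, yet the two codes differ, so a code built directly on positions cannot serve as the invariant representation.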

Is the solution some feature of the grid? The grid is said to be origin-less.


What you need is the ability to decompose the image into simpler pieces. For example, a filter which detects that X is present in the domain (and is a meaningful feature for the agent to know about). You would then have a separate filter that encodes the location and/or orientation of that feature.

In this way, the agent would have access to invariant features that encode the essential nature of the object (and its presence in the environment), processed separately from transient features such as the position and orientation of the feature with respect to the agent and/or other features in the environment.
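
One possible way to sketch this decomposition for the XX example is shown below. It is only an illustration under stated assumptions: the names `explore` and `decompose` are hypothetical, the agent is assumed to track its own movements (path integration), and the "what" part is made invariant to translation only, not to rotation.

```python
# Movement deltas as (row, col); an assumed convention for this sketch.
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

def explore(grid, start, moves):
    """Walk the grid from `start`; return sensed X cells as offsets from the start."""
    r, c = start
    off_r, off_c = 0, 0  # accumulated movement since the start
    seen = set()
    if grid[r][c] == "X":
        seen.add((0, 0))
    for m in moves:
        dr, dc = MOVES[m]
        r, c = r + dr, c + dc
        off_r, off_c = off_r + dr, off_c + dc
        if grid[r][c] == "X":
            seen.add((off_r, off_c))
    return seen

def decompose(seen):
    """Split sensed offsets into an invariant 'what' and a transient 'where'."""
    if not seen:
        raise ValueError("no feature was sensed")
    anchor = (min(r for r, _ in seen), min(c for _, c in seen))
    # 'what': the shape relative to its own top-left corner.
    what = frozenset((r - anchor[0], c - anchor[1]) for r, c in seen)
    # 'where': the position of that corner relative to the agent's start.
    return what, anchor

# 5x5 grid with "XX" in the top-left corner.
grid = [list("XX..."), list("....."), list("....."), list("....."), list(".....")]

# Two different starting cells and two different movement sequences.
what_a, where_a = decompose(explore(grid, start=(0, 0), moves=["R"]))
what_b, where_b = decompose(explore(grid, start=(2, 2), moves=["U", "U", "L", "L", "R"]))

print(what_a == what_b)  # True
print(where_a, where_b)  # (0, 0) (-2, -2)
```

Both explorations yield the same "what" even though the start cell and the path differ, while "where" carries the part that depends on the agent. Handling orientation as well would need an additional normalization step that this sketch does not attempt.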