Any ideas for an algorithm that learns directions?

As far as matching parts of an image, one possible approach is to have multiple receptive-field sizes at different levels or stages of processing. If there is no change in a “bigger” receptive field, that feeds back to the lower level. The “edges” that see a change at the lower level but get feedback of no change at the higher level would indicate movement of a relatively large object.
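Here is a minimal sketch of that feedback check, assuming grayscale frames as NumPy arrays. The block size `k` (how much “bigger” the higher field is), the change threshold, and the function names are all illustrative assumptions, not part of the idea itself:

```python
import numpy as np

def block_reduce(img, k):
    """Average-pool an image into k x k blocks: one "bigger" receptive field."""
    h, w = (img.shape[0] // k) * k, (img.shape[1] // k) * k
    return img[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def large_object_motion(prev, curr, k=8, thresh=0.05):
    """Mark pixels that change at the lower level while the higher level sees none.

    prev, curr: grayscale frames as float arrays in [0, 1].
    Returns a boolean map: the "edges" of a relatively large moving object.
    """
    fine_change = np.abs(curr - prev) > thresh
    coarse_change = np.abs(block_reduce(curr, k) - block_reduce(prev, k)) > thresh
    # "Feedback": broadcast each coarse verdict back down over its k x k block.
    coarse_up = np.repeat(np.repeat(coarse_change, k, axis=0), k, axis=1)
    h, w = coarse_up.shape
    return fine_change[:h, :w] & ~coarse_up
```

On two frames of a slowly shifting scene, this should light up mostly at the borders of regions large enough that the coarse field still averages over them without changing.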

An “object” is a filled-in space in the processing map at that level, composed of Calvin Tiles, as I have described many times in this forum. These roughly correspond to “grid cells.”

In the brain, the spatial scaling between successive grid fields of the Entorhinal Cortex is about 1:1.4.
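To make that geometric progression concrete, a toy calculation (the 30 cm starting spacing is purely an assumed value for illustration; only the ratio comes from the statement above):

```python
# Successive grid-field spacings under a fixed ~1:1.4 scale ratio.
base_cm = 30.0   # assumed smallest spacing, for illustration only
ratio = 1.4
spacings = [round(base_cm * ratio ** i, 1) for i in range(5)]
print(spacings)  # [30.0, 42.0, 58.8, 82.3, 115.2]
```

Note that with this ratio each field covers roughly twice the area of the one below it, since 1.4² ≈ 2.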

Please see this page for more details:
Number encoder based off of entorhinal grid cells - #2 by Bitking

The “new” vs. “old” comparison that @cezar_t mentions could run at the Alpha (10 Hz) basic processing rate in cortex. The relation between the fields can then be both spatial (edge) pooling and temporal (movement) pooling.
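A toy sketch of that “old” vs. “new” comparison at an assumed 10 Hz cycle; the names and threshold are illustrative only:

```python
import numpy as np

ALPHA_HZ = 10                 # assumed basic cortical processing rate
CYCLE_MS = 1000 // ALPHA_HZ   # ~100 ms per "old" vs. "new" comparison

def alpha_pool(frames, thresh=0.05):
    """Compare each ~100 ms frame against the previous one.

    frames: iterable of grayscale arrays sampled at the alpha rate.
    Yields (change_map, moved): where change occurred this cycle (spatial /
    edge pooling) and whether anything moved at all (temporal / movement
    pooling).
    """
    old = None
    for new in frames:
        if old is not None:
            change_map = np.abs(new - old) > thresh
            yield change_map, bool(change_map.any())
        old = new
```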