I made a model of grid cells using a modified spatial pooler. You can read about it here: Video Lecture of Kropff & Treves, 2008
This model uses much simpler inputs than yours: instead of vision & head direction, it has only “location-cell” inputs.
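
To give a rough idea of what I mean by “location-cell” inputs, here is a minimal sketch. It is not the code from my model: the function names, the Gaussian receptive fields, the square arena, and every parameter value (`grid_size`, `arena`, `sigma`, `num_active`) are assumptions made up for illustration. Each hypothetical cell fires most strongly when the agent is near that cell's preferred location, and the most active cells are kept as a sparse set of indices of the kind a spatial pooler could take as input.

```python
import math


def location_cell_inputs(x, y, grid_size=5, arena=1.0, sigma=0.15):
    """Activations of grid_size * grid_size hypothetical location cells.

    Assumption: each cell has a Gaussian receptive field centered on a
    regular lattice covering a square arena with side length `arena`.
    """
    step = arena / (grid_size - 1)
    activations = []
    for row in range(grid_size):
        for col in range(grid_size):
            cx, cy = col * step, row * step
            dist_sq = (x - cx) ** 2 + (y - cy) ** 2
            activations.append(math.exp(-dist_sq / (2 * sigma ** 2)))
    return activations


def to_sparse_bits(activations, num_active=4):
    """Keep the num_active most active cells, returned as sorted indices.

    Assumption: the downstream pooler wants a sparse binary input.
    """
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)
    return sorted(ranked[:num_active])


if __name__ == "__main__":
    acts = location_cell_inputs(0.3, 0.7)
    print(to_sparse_bits(acts))
```

Compared with vision & head direction, an input like this already tells the model where it is, so the pooler only has to learn how to tile that space.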