Thoughts on hierarchies, object recognition, and dreams

I’ve been thinking about HTM object recognition, point clouds, some things I remember from the sensory-motor inference YouTube video, deep learning object recognition, voxel octrees, and so on, and I’ve come up with a couple of interesting thoughts.

First, I think several small HTM layers with exactly the same configuration should be tiled over any large spatial input, all changing in the same way when they learn. However, the data shouldn’t carry any indication of where it is in relation to other data. It only needs to be normalized (turning it into a point cloud should work). After all, when you look at an object, it looks the same whether it’s in the corner of your eye or not, even though the density of receptors differs there. It also still looks like the same object if you’re disoriented and can’t tell where you are.
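
A minimal sketch of the tiling idea, assuming a toy stand-in for an HTM layer (a fixed random binary projection plus k-winners-take-all, not the real spatial pooler). The names `SharedLayer` and `tile` are hypothetical. Every tile uses the same connection table and is never told its position, so the same pattern produces the same output wherever it appears. Learning is not modeled, but because the table is shared, any update to it would apply to every tile at once.

```python
import random


class SharedLayer:
    """Hypothetical toy layer: fixed random connections + k-winners-take-all.
    Not the HTM spatial pooler; it only stands in for 'one small layer'."""

    def __init__(self, patch_size, n_columns=16, k=3, seed=0):
        rng = random.Random(seed)
        # One connection table, shared by every tile that uses this layer.
        self.connections = [
            [rng.random() < 0.5 for _ in range(patch_size)]
            for _ in range(n_columns)
        ]
        self.k = k

    def compute(self, patch):
        """Return the sorted indices of the k columns with the highest overlap."""
        overlaps = [
            sum(1 for bit, conn in zip(patch, cols) if bit and conn)
            for cols in self.connections
        ]
        # Ties are broken by column index so the result is deterministic.
        ranked = sorted(range(len(overlaps)), key=lambda c: (-overlaps[c], c))
        return sorted(ranked[: self.k])


def tile(layer, bits, patch_size):
    """Apply the same layer to each non-overlapping patch; no position is passed in."""
    return [
        layer.compute(bits[i : i + patch_size])
        for i in range(0, len(bits) - patch_size + 1, patch_size)
    ]


layer = SharedLayer(patch_size=4)
pattern = [1, 0, 1, 1]
bits = pattern + [0, 0, 0, 0] + pattern  # same pattern at two locations
outputs = tile(layer, bits, 4)
print(outputs[0] == outputs[2])  # True: location does not change the output
```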

Now suppose a layer of HTM column neurons received input from a randomly sampled subset of the temporal memory cells of 4, 8, or so adjacent instances of the ‘texture’ layer (the small tiled layer described above). Especially if this parent layer were tiled in the same way over all of the ‘texture’ layers, including every possible overlap, it could predict whatever lies beside the input of any one texture layer, because it would still be receiving a portion of its usual input.
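
The parent wiring described above can be sketched as follows, under some assumptions: `parent_receptive_fields` is a hypothetical helper, a "cell" is just a (tile, cell index) pair, and one random sample of relative positions is drawn once and reused by every parent, so all parent copies share the same configuration just as the texture tiles do. This only builds the inputs; the prediction itself would come from the parent layer's learning, which is not modeled here.

```python
import random


def parent_receptive_fields(n_tiles, cells_per_tile, children_per_parent,
                            sample_size, looping=False, seed=0):
    """For every window of `children_per_parent` adjacent tiles (all overlaps),
    list the (tile, cell) pairs one parent copy receives input from."""
    rng = random.Random(seed)
    # One random sample of (offset within window, cell index), drawn once and
    # reused by every parent so that all parent copies are wired identically.
    relative = sorted(rng.sample(
        [(j, c) for j in range(children_per_parent) for c in range(cells_per_tile)],
        sample_size,
    ))
    # Without looping, windows stop at the last tile; with looping they wrap.
    n_parents = n_tiles if looping else n_tiles - children_per_parent + 1
    return [
        [((start + j) % n_tiles, c) for j, c in relative]
        for start in range(n_parents)
    ]


fields = parent_receptive_fields(n_tiles=8, cells_per_tile=32,
                                 children_per_parent=4, sample_size=20)
print(len(fields))  # 5 overlapping parents over 8 tiles
```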

So, if the texture layers were represented by a numbered list, [1, 2, 3, 4], then the parent layers would be [1-2, 2-3, 3-4, and 4-1 if looping]. One level further up would be [1-3, 2-4, and 3-1 and 4-2 if looping], and so on.
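
A short sketch that reproduces the labels in that example, assuming each parent combines two adjacent nodes from the level below, so a node at level L covers L + 1 consecutive texture tiles. `hierarchy` is a hypothetical helper; with `looping=True` the tiles wrap around like a ring.

```python
def hierarchy(n_tiles, levels, looping=False):
    """Label which texture tiles each node covers, level by level (1-based)."""
    result = []
    for level in range(levels + 1):
        span = level + 1  # a node at this level covers `span` adjacent tiles
        if span > n_tiles:
            break
        count = n_tiles if looping else n_tiles - span + 1
        labels = []
        for start in range(count):
            end = (start + span - 1) % n_tiles  # wraps only when looping
            labels.append(str(start + 1) if span == 1 else f"{start + 1}-{end + 1}")
        result.append(labels)
    return result


for level, labels in enumerate(hierarchy(4, 2, looping=True)):
    print(level, labels)
# 0 ['1', '2', '3', '4']
# 1 ['1-2', '2-3', '3-4', '4-1']
# 2 ['1-3', '2-4', '3-1', '4-2']
```

With `looping=False` the wrapped entries (4-1, 3-1, 4-2) simply drop out.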

If one layer were repeated across an image like that, and its output kept only local ordering, then inferences about objects would be made relative to the local space, usually with the object at the center. The object’s orientation would still cause some recognition problems, but I have trouble with that as a human too, so I think it’s good enough.
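
One way to keep "only local ordering" is to center each patch's point cloud on its centroid, in line with the normalization suggested earlier. The sketch below uses hypothetical helpers `normalize` and `rotate` and 2-D points for brevity; it shows that centering removes position but not orientation, which matches the recognition problem described above.

```python
import math


def normalize(points):
    """Center a 2-D point cloud on its centroid so only relative positions
    remain; rounding hides tiny floating-point differences."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sorted((round(x - cx, 6), round(y - cy, 6)) for x, y in points)


def rotate(points, degrees):
    """Rotate points counterclockwise about the origin."""
    a = math.radians(degrees)
    return [(x * math.cos(a) - y * math.sin(a),
             x * math.sin(a) + y * math.cos(a)) for x, y in points]


shape = [(0, 0), (2, 0), (2, 1)]             # a small asymmetric "object"
moved = [(x + 10, y + 7) for x, y in shape]  # same object elsewhere in the image
turned = rotate(shape, 90)                   # same object, rotated

print(normalize(shape) == normalize(moved))   # True: position no longer matters
print(normalize(shape) == normalize(turned))  # False: orientation still does
```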

My second thought was that it would be cool to run networks like these backwards, because that could create 3D, dreamlike representations of what was recorded in the network.
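
Purely as an illustration of one possible reading of "running backwards" (an assumption, not an established HTM procedure): take the columns a layer activates and project them back through their connections, marking each input bit that enough active columns connect to. All names here are hypothetical, the wiring is random rather than learned, and the input is 1-D rather than 3D, so the result is only a rough "imagined" input.

```python
import random


def make_connections(n_inputs, n_columns, seed=0):
    """Hypothetical fixed random wiring: column -> set of connected input bits."""
    rng = random.Random(seed)
    return [{i for i in range(n_inputs) if rng.random() < 0.3}
            for _ in range(n_columns)]


def forward(connections, active_inputs, k):
    """Pick the k columns that overlap the active inputs most (ties by index)."""
    overlaps = [len(conn & active_inputs) for conn in connections]
    ranked = sorted(range(len(connections)), key=lambda c: (-overlaps[c], c))
    return set(ranked[:k])


def backward(connections, active_columns, n_inputs, threshold=2):
    """Run the wiring in reverse: an input bit is 'imagined' if at least
    `threshold` active columns are connected to it."""
    votes = [0] * n_inputs
    for c in active_columns:
        for i in connections[c]:
            votes[i] += 1
    return {i for i, v in enumerate(votes) if v >= threshold}


conns = make_connections(n_inputs=20, n_columns=40)
seen = {1, 2, 3, 10, 11}
dream = backward(conns, forward(conns, seen, k=5), n_inputs=20)
print(sorted(dream))
```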

So what do you guys think? Have you tried repeating the same HTM layer over small portions of an input? What does running an HTM in reverse look like?