I was not at work when this research meeting happened. It was actually a precursor to Temporal Memory via RSM-like models.
One of the issues raised in this video is how the network can know where it’s at in the sequence. This would of course be a requirement for knowing what comes next. I’d go even one step further and say this would be required if you wanted to predict what came before in the sequence (or anywhere in the sequence for that matter). So, what’s missing from their model is a sense of position (e.g. char 5 out of 10). It seems like the investigators went out of their way to deny the network any external training data beyond the images themselves (i.e. labels, positions, etc.). I can respect that, and I think it’s probably the proper way to go about this problem; however, I think their method of presenting the data to the network is flawed.
I think the problem of sequence position can be addressed by simply allowing the network to move forward and backward through the data. This would most likely require some form of grid-cell-like representation coupled to a motor output signal. In this scenario, the training set is not simply presented to the network in sequence over and over again, but is instead presented once and the network is allowed to study it as long as necessary. The network can then be tested in a similar manner by presenting it with a subset of the sequence and seeing if it correctly predicts the missing characters when it encounters an empty (or perhaps corrupted) image. It would be even more interesting if the network learned to deliberately search for certain characters in the sequence in order to reduce the uncertainty of its predictions (i.e. to reduce ambiguity).
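To make the idea a bit more concrete, here is a toy sketch in Python. It is not the model from the video and not an HTM implementation; the class `ToyExplorer` and its methods are hypothetical names I made up for illustration. It assumes the position signal is just an integer updated by the agent's own motor commands (a crude stand-in for a grid-cell-like code), and that "learning" is a plain lookup table from position to the character seen there.

```python
class ToyExplorer:
    """Toy agent: integrates its own moves into a position code and
    remembers which character it saw at each position."""

    def __init__(self, length):
        self.length = length
        self.position = 0   # integer stand-in for a grid-cell-like code
        self.memory = {}    # position -> character observed there

    def move(self, step):
        # Motor command (+1 forward, -1 backward) updates the position
        # signal; the result is clamped to the ends of the sequence.
        self.position = max(0, min(self.length - 1, self.position + step))

    def study(self, sequence, moves):
        # The sequence is presented once; the agent chooses how to scan it.
        self.position = 0
        self.memory[self.position] = sequence[self.position]
        for step in moves:
            self.move(step)
            self.memory[self.position] = sequence[self.position]

    def fill_in(self, corrupted):
        # Walk forward over a sequence with gaps (None) and predict each
        # missing character from what was stored at that position.
        self.position = 0
        result = []
        for observed in corrupted:
            if observed is not None:
                result.append(observed)
            else:
                result.append(self.memory.get(self.position))
            self.move(+1)
        return result


sequence = "HELLOWORLD"
agent = ToyExplorer(len(sequence))
# Study by sweeping forward to the end and then back to the start.
agent.study(sequence, [+1] * 9 + [-1] * 9)

corrupted = list(sequence)
corrupted[4] = None  # blank out one character
restored = "".join(agent.fill_in(corrupted))
```

The scan pattern here is fixed by hand. The more interesting version would let the agent pick its own moves, for example by jumping to the positions where its prediction is most ambiguous.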
It’s like @jhawkins has been saying: the key to intelligence is motion and feedback. The agent needs to have agency.
Good! Now you are treading the path laid out by biology.
The lizard brain gets a copy of what is presented to the visual cortex. I think it has a set of built-in archetypes, and based on what it “sees” it directs the eyes to point at areas of interest.
See this post for some more thoughts about this relationship, in particular Figure 1 in the linked paper. What is missing is the output connections to the frontal lobe that steer the eyes to areas of interest: