Additionally:
Cortical.io retina database with the correct texts loaded.
(Dealer's choice on content; I chose the WIKI database.)
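As a minimal sketch of what "loading the retina" buys you, here is one way to pull a semantic fingerprint over the Cortical.io REST API. The endpoint path, retina name, and response shape are assumptions based on the public API docs, not verified against the WIKI retina; treat them as placeholders.

```python
# Hedged sketch: fetch a semantic fingerprint (SDR) for a chunk of text
# from a Cortical.io retina. Endpoint, retina name, and response layout
# are assumed, not confirmed -- check against the retina actually loaded.
import requests

API_URL = "http://api.cortical.io/rest/text"  # assumed endpoint
RETINA = "en_associative"                      # assumed retina name

def fingerprint(text: str, api_key: str) -> list[int]:
    """Return the SDR positions (active bit indices) for a text chunk."""
    resp = requests.post(
        API_URL,
        params={"retina_name": RETINA},
        headers={"api-key": api_key, "Content-Type": "application/json"},
        json={"text": text},
    )
    resp.raise_for_status()
    # Assumed response: one fingerprint object per input, each carrying
    # a "positions" array of active bits.
    return resp.json()[0]["positions"]

def overlap(fp_a: list[int], fp_b: list[int]) -> int:
    """Semantic similarity as raw SDR overlap: count of shared active bits."""
    return len(set(fp_a) & set(fp_b))
```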
How do you handle grammar? Parsey McParseface?
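Parsey McParseface ships inside SyntaxNet and is normally driven through its bazel-built demo script rather than a Python API. As a stand-in that shows the same kind of output (a labelled dependency parse), here is the equivalent with spaCy; the label set differs slightly between the two parsers.

```python
# Stand-in for Parsey McParseface: spaCy producing a labelled
# dependency parse of the same flavour.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

def parse(sentence: str) -> list[tuple[str, str, str]]:
    """Return (token, dependency label, head token) triples."""
    doc = nlp(sentence)
    return [(tok.text, tok.dep_, tok.head.text) for tok in doc]

for tok, dep, head in parse("The retina encodes word meaning as sparse bits."):
    print(f"{tok:10s} --{dep}--> {head}")
```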
What additional soft networks or control hardware is needed?
Could you get it to handle speech in both directions?
For episodic context, I propose a short-term blackboard memory, sort of like a hippocampus.
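A minimal sketch of that hippocampus-like blackboard: a small, time-decaying store that any module can post to and read back as episodic context. The capacity and decay constant here are illustrative guesses, not tuned values.

```python
# Hedged sketch of a short-term blackboard memory. Capacity and TTL
# are illustrative assumptions.
import time
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Entry:
    content: str
    source: str  # which module posted it
    ts: float = field(default_factory=time.monotonic)

class Blackboard:
    def __init__(self, capacity: int = 64, ttl_s: float = 120.0):
        self.ttl_s = ttl_s
        self._entries: deque[Entry] = deque(maxlen=capacity)

    def post(self, content: str, source: str) -> None:
        self._entries.append(Entry(content, source))

    def recall(self) -> list[Entry]:
        """Return still-live entries, newest first (the episodic context)."""
        now = time.monotonic()
        return [e for e in reversed(self._entries) if now - e.ts < self.ttl_s]

bb = Blackboard()
bb.post("user asked about grammar handling", source="dialogue")
bb.post("retina fingerprint computed for last utterance", source="retina")
print([e.content for e in bb.recall()])
```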
Who would be the appropriate subject matter experts?
Wiki + 35k books + (C4 dataset option)
The Wiki and 35k books come to roughly 5-8bn items/glyphs, depending on the filter. C4 covers concepts that do not exist in the other two sources, but it contains "a lot" of useless data (concepts) and repetitions.
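Because of that junk and repetition, C4 would need a filter pass before it adds anything. A minimal sketch of the idea: drop exact repetitions via hashing and skip documents that look like debris. The thresholds are illustrative assumptions, not the filter behind the 5-8bn count above.

```python
# Hedged sketch of a C4-style cleanup pass: length filter, symbol-ratio
# filter, and exact-duplicate removal. All thresholds are assumptions.
import hashlib
from typing import Iterable, Iterator

def clean_stream(docs: Iterable[str],
                 min_words: int = 50,
                 max_symbol_ratio: float = 0.3) -> Iterator[str]:
    seen: set[bytes] = set()
    for doc in docs:
        words = doc.split()
        if len(words) < min_words:
            continue  # too short to carry a concept
        symbols = sum(not w.isalnum() for w in words)
        if symbols / len(words) > max_symbol_ratio:
            continue  # likely markup or table debris
        digest = hashlib.sha1(doc.strip().lower().encode()).digest()
        if digest in seen:
            continue  # exact repetition
        seen.add(digest)
        yield doc
```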
The model has to learn the grammar? How would it work any other way?