Hi Marty! I will try to answer some of your questions, but I will defer the hardware questions to @khunter.
The benchmark runs on 8-bit fixed-point arithmetic. For this particular project, using the GSC dataset, we are training a network in which the sparse topology is fixed from the start and never changed throughout training. That is the same setup as in the How Can We Be So Dense paper, with some modifications to fit the hardware, like block sparsity and better tie-breaking in k-winners.
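To make the "fixed topology" idea concrete, here is a minimal NumPy sketch of how such a layer can be trained. The layer sizes, density, and helper names are all hypothetical, not the actual GSC network; the key point is that the binary mask is chosen once before training and also applied to the gradient, so pruned connections stay at zero and the topology never changes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and density; the real GSC network differs.
n_in, n_out, density = 16, 8, 0.25

# Fixed sparse topology: pick the nonzero positions once, up front.
mask = (rng.random((n_out, n_in)) < density).astype(float)
weights = rng.standard_normal((n_out, n_in)) * mask

def forward(x):
    # Only the masked-in connections contribute.
    return (weights * mask) @ x

def sgd_step(grad, lr=0.01):
    global weights
    # Masking the gradient keeps zeroed connections at zero,
    # so the sparse topology is preserved throughout training.
    weights -= lr * (grad * mask)
```

After any number of `sgd_step` calls, every position where `mask` is zero is still exactly zero, which is what distinguishes this setup from pruning-during-training approaches.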
On the Imagenet project, we have been extensively researching how to find better sparse topologies, whether by finding a good topology at the start of training (foresight pruning), during training (dynamic sparse), or after training (pruning). We experimented with most of the state-of-the-art algorithms, but in the end we came up with our own, which is better adapted to the hardware restrictions and incorporates insights from HTM. @mrcslws is leading that research and can speak to it better. We had a few research meetings on this topic, but the work hasn't been published yet.
Training a sparse network on GPUs is actually a bit slower than training a dense one, since we don't get the benefit of sparse matrix products on SIMD architectures, and there is the added cost of k-winners, which ranks the activations on every forward pass. The benefits highlighted in the announcement are for inference only. We are thinking about how to speed up training too, but that is a future endeavor.
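To illustrate where the k-winners overhead comes from, here is a simplified NumPy sketch (not the actual nupic.torch implementation, and without the tie-breaking refinements mentioned above): every forward pass has to partially sort the activation vector to find the top k units, and that ranking is extra work a plain ReLU doesn't do.

```python
import numpy as np

def k_winners(x, k):
    """Keep the k largest activations, zero out the rest.

    np.argpartition does a partial sort to find the top-k indices;
    this ranking on every forward pass is the added training cost.
    """
    winners = np.argpartition(-x, k - 1)[:k]
    out = np.zeros_like(x)
    out[winners] = x[winners]
    return out

x = np.array([0.1, 0.9, 0.3, 0.7, 0.2])
# With k=2, only the two strongest units survive: 0.9 and 0.7.
print(k_winners(x, 2))  # [0.  0.9 0.  0.7 0. ]
```

During inference the mask of winners can be applied cheaply, which is part of why the speedups apply there but not to training.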