Quite interesting that the inference pass works well with only 0.3% activation. That raises the question of how the learning phase could achieve similarly low activation.
Curious that RTE performance is significantly worse and shows higher variability compared to BERT base (53.8 at 1% activation vs 59.9 at 0.5%, whilst 1.8% gives 56.2). Could this indicate that the 90+% scores on RTE depend on a high degree of subtle aggregate influences for the inferred determination?
You mean to accelerate training too? That would be a more difficult problem, since training is done in batches and the routing here (picking which neurons to activate) is computed from the input at each FF block.
The only way I can think of is training with a batch size of 1, but that isn't going to accelerate things much, in which case it might not make much sense to even use a GPU.
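To make the per-input routing point concrete, here is a minimal NumPy sketch under my own assumptions (a ReLU FF block, a learned gating projection `gate`, top-k selection; the names `W1`, `W2`, `sparse_ff` are illustrative, not the paper's implementation). Because each sample selects its own neuron subset, a batch can't be collapsed into one dense matmul over shared weights:

```python
# Minimal sketch of per-input routing in a sparse FF block (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden = 64, 4096
W1 = rng.standard_normal((d_hidden, d_model)) * 0.02    # FF up-projection
W2 = rng.standard_normal((d_model, d_hidden)) * 0.02    # FF down-projection
gate = rng.standard_normal((d_hidden, d_model)) * 0.02  # gating / routing projection

def sparse_ff(x, frac=0.01):
    """Run the FF block using only the top `frac` of hidden neurons for this input."""
    k = max(1, int(frac * d_hidden))
    scores = gate @ x                          # one routing score per hidden neuron
    idx = np.argpartition(scores, -k)[-k:]     # indices of the k activated neurons
    h = np.maximum(W1[idx] @ x, 0.0)           # compute only the selected rows
    return W2[:, idx] @ h                      # and only the matching columns

x_a, x_b = rng.standard_normal(d_model), rng.standard_normal(d_model)
# Two different inputs generally activate different neuron subsets, so a batch
# of them cannot be reduced to a single dense matmul over shared weights.
y_a, y_b = sparse_ff(x_a), sparse_ff(x_b)
```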
However, the gating algorithm might stabilize the slower-converging stochastic gradient descent, and parallel training on CPU would benefit from the fact that each training step only updates 1% of the weights.
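A rough sketch of that last point (again my own illustration, assuming a ReLU FF block and a router that picks ~1% of hidden neurons; `grad_h` stands in for the upstream gradient): the per-sample update only touches the weight rows that were active in the forward pass, so a CPU worker writes to a tiny slice of the parameters each step:

```python
# Illustrative per-sample sparse update: only the ~1% of active rows are touched.
import numpy as np

rng = np.random.default_rng(1)
d_model, d_hidden, k = 64, 4096, 40                # 40/4096 is roughly 1% of neurons
W1 = rng.standard_normal((d_hidden, d_model)) * 0.02
lr = 1e-2

x = rng.standard_normal(d_model)
idx = rng.choice(d_hidden, size=k, replace=False)  # neurons chosen by the router
h = np.maximum(W1[idx] @ x, 0.0)                   # forward pass uses only k rows

grad_h = rng.standard_normal(k)                    # placeholder upstream gradient
grad_active = np.outer(grad_h * (h > 0), x)        # ReLU-masked gradient, k rows only
W1[idx] -= lr * grad_active                        # update touches k rows, not d_hidden
```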