A few thoughts on the NISPA presentation

Here's the video I am talking about.

And here are some follow-up thoughts (what-ifs):

  1. Do not entirely freeze a task after it is trained; instead, loop through multiple training sessions, each with only a subset of the task's data, and freeze only the "low-hanging fruit" in each session. It's a kind of early accommodation to each task that still avoids specialization before a "general perspective" starts to form (see the first sketch after this list).

  2. As above, but occasionally run sessions with two active tasks at a time (paired randomly); this should select the parameters that are most active in both tasks, a means to emphasize cross-task correlations.

  3. Did you consider the same strategy but working upward, one layer at a time, starting with the input layer?

  4. A very wide first (or low-level) hidden layer(s) that both carries the majority of the weights and is the most sparsified/segregated by task might be a useful starting point.

  5. A trained wide layer as above, plus some automatic task recognition & selection, would be useful to accelerate inference.
    I mean that the more tasks you want the network to expand into, the more parameters you still need to add, and even with sparsity a very large "general" network will degrade in performance.
    Having a means to select only the 2% or 10% of the network that is useful for the current task would help a lot.

  6. Come to think of it, the subnet-selection trick above might be possible even at training time if a task-specific subnet is pre-selected by a task-specific SDR, as in Numenta's research. It would not select which parameters to freeze but which ones are available for the current task. And since any two SDRs for any two random tasks have some level of overlap, that overlap would allow for inter-task plasticity (see the second sketch after this list).

  7. Regarding RL: ideally, find a way for tasks to be self-defined/inferred somehow by the agent itself. Even time buffers could provide some task separation. One thing we can safely assume about RL is that the current task (whatever it might be) doesn't change abruptly at every time step (see the third sketch after this list).
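To make idea 1 concrete, here is a minimal toy sketch in plain PyTorch. The layer sizes, the `freeze_fraction`, and the weight-drift score used to pick the "low-hanging" units are all my own assumptions, not NISPA's actual procedure: each short session trains on a subset of the task's data, then only the least-moving units get frozen while the rest stay plastic.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(32, 64)
head = nn.Linear(64, 10)
opt = torch.optim.SGD(list(layer.parameters()) + list(head.parameters()), lr=0.1)

frozen_units = set()       # output units of `layer` whose incoming weights we stop updating
freeze_fraction = 0.2      # freeze 20% of the still-plastic units per session (arbitrary)

for session in range(3):
    x = torch.randn(256, 32)                # stand-in for a subset of the task's data
    y = torch.randint(0, 10, (256,))
    before = layer.weight.detach().clone()

    for _ in range(50):                     # one short training session
        opt.zero_grad()
        loss = nn.functional.cross_entropy(head(torch.relu(layer(x))), y)
        loss.backward()
        if frozen_units:                    # keep already-frozen units fixed
            idx = torch.tensor(sorted(frozen_units))
            layer.weight.grad[idx] = 0.0
            layer.bias.grad[idx] = 0.0
        opt.step()

    # "low-hanging fruit": units whose incoming weights moved the least this session
    drift = (layer.weight.detach() - before).abs().sum(dim=1)
    plastic = [u for u in range(64) if u not in frozen_units]
    k = max(1, int(freeze_fraction * len(plastic)))
    newly_frozen = sorted(plastic, key=lambda u: drift[u].item())[:k]
    frozen_units.update(newly_frozen)
    print(f"session {session}: froze {k} more units, {len(frozen_units)} frozen in total")
```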
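And a similarly hedged sketch for idea 6: a fixed, task-seeded sparse binary code over the hidden units plays the role of the SDR, and multiplying the hidden activations by it means only the selected subnet receives gradients for that task. The overlapping bits between two tasks' codes are exactly where inter-task plasticity could happen. The sparsity level and layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

HIDDEN, ACTIVE = 512, 64        # 64 of 512 active units, i.e. 12.5% density (arbitrary)

def task_sdr(task_id: int) -> torch.Tensor:
    """Deterministic sparse 0/1 mask over hidden units for a given task id."""
    g = torch.Generator().manual_seed(task_id)
    mask = torch.zeros(HIDDEN)
    mask[torch.randperm(HIDDEN, generator=g)[:ACTIVE]] = 1.0
    return mask

class MaskedMLP(nn.Module):
    def __init__(self, in_dim=32, out_dim=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, HIDDEN)
        self.fc2 = nn.Linear(HIDDEN, out_dim)

    def forward(self, x, mask):
        h = torch.relu(self.fc1(x)) * mask   # units outside the task's SDR are silenced,
        return self.fc2(h)                   # so their weights receive zero gradient

net = MaskedMLP()
mask_a, mask_b = task_sdr(0), task_sdr(1)
print(f"units shared by task 0 and task 1: {int((mask_a * mask_b).sum())} of {ACTIVE}")

# a training step on task 0 only touches the fc1 rows / fc2 columns selected by mask_a
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(net(x, mask_a), y)
loss.backward()
```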
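Finally, a toy illustration of idea 7: the agent watches a sliding window of observation statistics and declares a task switch only when the drift from a reference baseline persists for several steps, which encodes the assumption that the task doesn't change abruptly at every time step. The stream, window size, threshold, and patience values are all invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
window, threshold, patience = 50, 1.0, 10   # all invented numbers for the demo

# fake observation stream: the mean shifts at step 500, as if the task changed
stream = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])

buffer, reference, strikes, task_id = [], None, 0, 0
for t, obs in enumerate(stream):
    buffer.append(obs)
    if len(buffer) < window:
        continue
    buffer = buffer[-window:]
    current = float(np.mean(buffer))
    if reference is None:                   # baseline statistic for the current task
        reference = current
        continue
    strikes = strikes + 1 if abs(current - reference) > threshold else 0
    if strikes >= patience:                 # sustained drift, not a single noisy step
        task_id += 1
        print(f"step {t}: inferred a switch to task {task_id}")
        buffer, reference, strikes = [], None, 0    # re-baseline on the new task
```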

Their method reminds me of this neuroscience study, which found something similar: not all neurons can learn at the same time.
