Papers with Optimization during inference?

maxerbubba · February 22, 2021, 12:33am

Do you folks have any papers that involve “optimization during inference”? (Biologically plausible or otherwise).

I have a couple ideas here, and curious about relevant work. I recently found this one from DeepMind last year: Concept Learning with Energy-Based Models

steve9773 · February 22, 2021, 4:24pm

Does AlphaZero count? It does tree search every move, is that “optimization during inference”? Or what does “optimization during inference” mean? Is it a synonym of “planning / foresight”, or if not how do they relate?

maxerbubba · February 23, 2021, 5:25am

Maybe this is too broad to be useful? Like asking “Got ML papers that use weights?”
In the video, Yannic commented on the idea at 27:48 https://youtu.be/Cs_j-oNwGgg?t=1668

I am looking for a (gradient descent) optimization outer loop, iterating over data points, which backpropagates through an inner optimization loop of several small steps that produces the prediction for one data point. Each weight would be used multiple times to predict one data point.
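To make the inner/outer structure concrete, here is a toy Python sketch under loudly stated assumptions. The scalar energy E(y; x, w) = 0.5 * (y - w * x)^2, the step size, the number of inner steps, and all function names (inner_loop, outer_grad, train) are hypothetical choices for illustration, not taken from any paper mentioned here. The inner loop runs a few gradient-descent steps on y to produce one prediction. The outer loop updates the weight w by backpropagating by hand through those unrolled steps. Since w enters every inner step, each weight is used several times per prediction.

```python
# Hypothetical toy setup: scalar energy E(y; x, w) = 0.5 * (y - w * x) ** 2.

def inner_loop(x, w, steps=5, eta=0.3):
    """Predict y for one data point by running `steps` gradient-descent steps on E."""
    ys = [0.0]  # y_0 = 0; iterates kept for illustration
    for _ in range(steps):
        y = ys[-1]
        dE_dy = y - w * x          # gradient of the energy w.r.t. the prediction
        ys.append(y - eta * dE_dy)
    return ys


def outer_grad(x, t, w, steps=5, eta=0.3):
    """Loss 0.5 * (y_K - t)^2 and dLoss/dw, by backprop through the unrolled inner loop."""
    ys = inner_loop(x, w, steps, eta)
    y_final = ys[-1]
    loss = 0.5 * (y_final - t) ** 2
    adj = y_final - t              # dLoss/dy_K
    grad_w = 0.0
    # Each inner step is y_{k+1} = (1 - eta) * y_k + eta * w * x, walked backwards.
    for _ in range(steps):
        grad_w += adj * eta * x    # direct dependence of y_{k+1} on w
        adj *= (1.0 - eta)         # propagate the adjoint from y_{k+1} to y_k
    return loss, grad_w


def train(data, w=0.0, lr=0.05, epochs=200):
    """Outer loop: plain SGD over data points on the weight w."""
    for _ in range(epochs):
        for x, t in data:
            _, g = outer_grad(x, t, w)
            w -= lr * g
    return w
```

Because the inner loop stops after five steps without fully converging, the outer loop learns a w that compensates. For example, after `w = train([(x, 3.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)])`, `inner_loop(2.0, w)[-1]` ends up close to 6.0. In this linear toy the backward pass does not need the stored iterates; with a nonlinear energy it would.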

HTM, I believe, uses each weight once per data point: NO
Deep learning: only uses each weight once (but across N layers, so one could argue it counts… but…): NO
Energy-based models: can perform gradient descent for several iterations just to make one prediction (the above paper does a variation of this): YES
Meta-reinforcement learning: there is an outer-outer loop across tasks and the traditional gradient descent inside each task (but no smaller-than-task loop): NO
“Neural Ordinary Differential Equations” paper: backpropagates through an ODE solver that takes several steps for each data point/prediction: YES
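A similarly hedged sketch for the Neural ODE item uses a fixed-step Euler solver for the hypothetical scalar dynamics dy/dt = theta * y. The loss gradient comes from backpropagating through every solver step. The dynamics, the step count, and the names (euler_forward, loss_and_grad) are made up for illustration. The Neural ODE paper itself proposes the adjoint sensitivity method so that intermediate solver states need not be stored. This sketch takes the simpler route and keeps them.

```python
def euler_forward(y0, theta, t1=1.0, steps=20):
    """Integrate dy/dt = theta * y from t=0 to t1 with fixed-step Euler; keep all iterates."""
    h = t1 / steps
    ys = [y0]
    for _ in range(steps):
        ys.append(ys[-1] + h * theta * ys[-1])
    return ys


def loss_and_grad(y0, target, theta, t1=1.0, steps=20):
    """Loss 0.5 * (y(t1) - target)^2 and dLoss/dtheta via backprop through the solver steps."""
    h = t1 / steps
    ys = euler_forward(y0, theta, t1, steps)
    loss = 0.5 * (ys[-1] - target) ** 2
    adj = ys[-1] - target               # dLoss/dy_N
    grad_theta = 0.0
    for n in range(steps - 1, -1, -1):
        # Step n was y_{n+1} = y_n + h * theta * y_n
        grad_theta += adj * h * ys[n]   # needs the stored iterate y_n
        adj *= 1.0 + h * theta          # adjoint now holds dLoss/dy_n
    return loss, grad_theta
```

Here the single parameter theta is used once per solver step, so it is used many times for one prediction, which is why the Neural ODE item fits the criterion above.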

Does AlphaZero count? […] Is it a synonym of “planning / foresight”, or if not how do they relate?

Yes I think that counts, I’ll take a look, thanks! Would you consider “planning/foresight” a specific modeling technique or just an idea?

michaelklachko · February 24, 2021, 8:52am

@maxerbubba what you described is still too broad. The whole family of attention-based models (transformers) could be viewed as an “SGD outer loop + multistep optimization inner loop”, because they compute a similarity metric against other data points and use it, in addition to applying regular model weights, to produce a prediction. So I guess you could call attention “optimization during inference”.
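For reference, the similarity computation mentioned here is usually scaled dot-product attention. The following is a minimal single-query sketch in pure Python. The function name and the toy shapes are assumptions for illustration, and within one layer this is computed in a single pass.

```python
import math


def attention(query, keys, values):
    """Single-query scaled dot-product attention.

    Scores each key by its dot product with the query (scaled by sqrt(d)),
    turns the scores into weights with a softmax, and returns the weighted
    sum of the values together with the weights.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return out, weights
```

Keys that are more similar to the query receive larger weights, so the output is pulled toward their values. When all keys are identical, the output is simply the mean of the values.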
A slightly different idea is used in Capsule Networks (https://arxiv.org/pdf/1710.09829.pdf), where a small inner optimization loop is used in addition to SGD to compute the right path between layers for any given data point.
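The inner loop in the linked capsule paper is routing-by-agreement. Below is a minimal pure-Python sketch of that procedure, with notation loosely following the paper. `u_hat[i][j]` is child capsule i's predicted vector for parent j. The coupling logits `b` start at zero for every input and grow where a child's prediction agrees with the parent's output. The function names, the default of three iterations, and the toy input are assumptions for illustration.

```python
import math


def squash(s):
    """Capsule non-linearity: keeps the direction of s and maps its length into [0, 1)."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0] * len(s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, iterations=3):
    """Return (parent output vectors v, coupling coefficients c) for one input."""
    n_child, n_parent = len(u_hat), len(u_hat[0])
    dim = len(u_hat[0][0])
    b = [[0.0] * n_parent for _ in range(n_child)]   # routing logits, reset per input
    for _ in range(iterations):
        c = [softmax(row) for row in b]               # each child splits itself over parents
        v = []
        for j in range(n_parent):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_child)) for d in range(dim)]
            v.append(squash(s_j))
        # Agreement step: raise logits where prediction and parent output align.
        for i in range(n_child):
            for j in range(n_parent):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c
```

With three children that all predict the same vector for parent 0 but disagree about parent 1, every child's coupling to parent 0 rises above 0.5 after a few iterations. This per-input loop is the "path between layers" computation, run on top of the weights that SGD learns.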
