The geometry of addition

Summing is used everywhere in machine learning, yet its filtering behavior is seldom considered.
https://sites.google.com/view/algorithmshortcuts/the-geometry-of-addition


Summing doesn’t cause the filtering effect on its own. That comes from the renormalization step, where the resulting vector is projected back onto the unit sphere (or a sphere of radius w).
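
A minimal sketch of that step, assuming a plain Euclidean norm (the function name `renormalize` is mine, not from the article):

```python
import numpy as np

def renormalize(v, w=1.0):
    """Project v back onto the sphere of radius w (assumes v is nonzero)."""
    return w * v / np.linalg.norm(v)

a = np.array([3.0, 4.0])
b = np.array([1.0, -2.0])
s = renormalize(a + b)           # sum first, then project back to the unit sphere
print(s, np.linalg.norm(s))      # keeps the direction of a+b; norm is exactly 1.0
```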

I don’t think the “energy” term does anything special here beyond preventing the solution from blowing up, provided you can keep the “energy” bounded (usually via renormalization).

From a representational point of view, the most important aspect of the vector notation is its direction. The orientation of the vector in state space (or feature space) determines the semantic content of the representation. So I would argue that rotation is actually the more useful operation to consider when reasoning about the behavior of deep networks.
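
To make that concrete: a rotation leaves the “energy” (norm) untouched while changing the direction, and hence what the vector represents. A 2-D sketch (the angle and vectors here are arbitrary):

```python
import numpy as np

theta = np.pi / 4                      # rotate by 45 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([1.0, 0.0])               # points purely along one feature direction
v_rot = R @ v                          # same energy, different orientation
print(np.linalg.norm(v), np.linalg.norm(v_rot))  # both 1.0
print(v_rot)                           # [0.707..., 0.707...] -- new semantic content
```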

Assuming it is possible to enumerate a set of basis vectors that are clearly associated with specific features or concepts, projecting the state vector onto these bases gives a weight for each feature in a particular embedding space.
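
If such a basis could be enumerated, the projection is just a stack of dot products. A sketch with a made-up orthonormal basis (the feature names are hypothetical):

```python
import numpy as np

# Hypothetical feature directions, assumed unit-norm here
basis = np.array([
    [1.0, 0.0, 0.0],   # "feature A"
    [0.0, 1.0, 0.0],   # "feature B"
    [0.0, 0.0, 1.0],   # "feature C"
])

state = np.array([0.8, 0.1, -0.59])
weights = basis @ state    # dot product of the state with each basis vector
print(weights)             # per-feature weights in this embedding space
```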

Subsequent layers then transform this state vector into a different state vector in a different embedding space, until the final layer performs a decoding step that is interpreted as the output.
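
Putting the pieces together, here is a toy sketch of that pipeline (random weights; the shapes, names, and layer count are mine, chosen only for illustration): each layer maps the state into a new embedding space and renormalizes, and a final linear decode produces the output.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, w=1.0):
    """One layer: linear map into a new embedding space, then renormalize."""
    h = W @ x
    return w * h / np.linalg.norm(h)

x = rng.normal(size=16)
x /= np.linalg.norm(x)                        # start on the unit sphere
for _ in range(3):                            # a few stacked layers
    x = layer(x, rng.normal(size=(16, 16)))

decoder = rng.normal(size=(10, 16))           # final decoding step
logits = decoder @ x                          # interpreted as the output
print(logits.argmax())
```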
