The geometry of addition

Summing is used everywhere in machine learning, yet its filtering behavior is seldom considered.
https://sites.google.com/view/algorithmshortcuts/the-geometry-of-addition


Summing doesn’t cause the filtering effect. That would be the renormalization step, where the resulting vector is projected back down onto the unit sphere (or a sphere of radius w).

I don’t think the “energy” term does anything special here other than preventing the solution from blowing up, provided that you can keep the “energy” bounded (usually by renormalization).
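A minimal numpy sketch of that point (the dimension, step size, and iteration count are arbitrary choices here): additive updates alone would let the norm drift without bound, while the renormalization step pins the energy after every update.

```python
import numpy as np

rng = np.random.default_rng(0)
d, w = 16, 1.0
x = rng.standard_normal(d)

for _ in range(100):
    x = x + 0.5 * rng.standard_normal(d)  # additive (summing) update
    x = w * x / np.linalg.norm(x)         # renormalize onto the radius-w sphere

print(np.linalg.norm(x))  # always w: the energy is pinned, so nothing blows up
```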

From a representational point of view, the most important aspect of the vector notation is its direction. The orientation of the vector in state space (or feature space) determines the semantic content of the representation. So I would argue that rotation is actually the more useful operation to consider when reasoning about the behavior of deep networks.

Assuming it is possible to enumerate a set of basis vectors that are clearly associated with specific features or concepts, projecting the state vector onto those bases would give you a weight for each feature in a particular embedding space.
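As a sketch of that idea, using a randomly generated orthonormal basis as a stand-in for such a feature basis:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
# hypothetical orthonormal "feature" basis for one embedding space
basis, _ = np.linalg.qr(rng.standard_normal((d, d)))
state = rng.standard_normal(d)

weights = basis.T @ state         # weight of each feature in this embedding space
recon = basis @ weights
print(np.allclose(recon, state))  # True: the feature weights fully determine the state
```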

Subsequent layers then transform this state vector into a different state vector in a different embedding space, until you reach the final layer, which performs a decoding step that is interpreted as the output.
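A toy version of that picture, with random orthogonal matrices standing in for the layer maps and a random matrix standing in for the final decoder:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
# hypothetical layers, each a pure rotation into a new embedding space
layers = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(3)]
decode = rng.standard_normal((3, d))  # hypothetical final decoding map

x = rng.standard_normal(d)
for W in layers:
    x = W @ x        # same energy, re-expressed in the next embedding space
print(decode @ x)    # the final decoding step, read off as the output
```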


A bunch of numbers can sum to zero. They disappeared in a puff of arithmetic! I suppose different people can have different concepts of filter though. “If you would converse with me, define your terms.” and all that.

I came up with an outline statistical argument that simple elementwise multiplication of a vector by some parameters before a fast transform allows the vector to be rotated by an arbitrary linear mapping, to some degree of approximation.
The vector has to be non-sparse (it can’t be mostly zeros), and then an appeal is made to the central limit theorem.
https://sites.google.com/view/algorithmshortcuts/uniform-point-picking-on-a-sphere
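The construction can be sketched like this. It is a minimal version that uses random ±1 values for the diagonal parameters; the linked argument concerns what the composite map can approximate when those parameters are learned. One thing the sketch does verify: because each stage is orthogonal, the whole product is an exact isometry (a rotation, possibly with a reflection).

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform; length must be a power of two."""
    x = x.copy()
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)  # scale so the transform is orthonormal

rng = np.random.default_rng(3)
n = 64
x = rng.standard_normal(n)  # non-sparse input, as the argument requires
d1 = rng.choice([-1.0, 1.0], n)  # parameter diagonals (random here, learned in general)
d2 = rng.choice([-1.0, 1.0], n)

y = fwht(d2 * fwht(d1 * x))  # diagonal multiply, fast transform, repeat
print(np.linalg.norm(x), np.linalg.norm(y))  # norms match: the map preserves energy
```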

A bunch of random numbers, say between -1 and 1, added together will mostly cancel; you will just be left with a low-level Gaussian residue. If you look at the projections onto vectors orthogonal to (1,1,…,1), you likewise see Gaussian residues. And the energies of all those projections sum to the energy of the original vector.
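A quick numerical check of all three claims (the normalized sum is a Gaussian residue, the orthogonal projections are too, and the energies add back up):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1024
x = rng.uniform(-1.0, 1.0, n)

# normalized sum = projection onto (1,1,...,1)/sqrt(n): roughly N(0, 1/3) by the CLT
ones = np.ones(n) / np.sqrt(n)
print(ones @ x)

# complete an orthonormal basis around (1,1,...,1); every other projection is
# likewise a small Gaussian residue
Q, _ = np.linalg.qr(np.column_stack([np.ones(n), rng.standard_normal((n, n - 1))]))
coeffs = Q.T @ x
print(coeffs[0])                        # the same projection as above, up to sign
print(np.sum(coeffs**2), np.sum(x**2))  # energies agree: orthogonal change of basis
```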