The geometry of addition

Summing is used everywhere in machine learning, yet its filtering behavior is seldom considered.
https://sites.google.com/view/algorithmshortcuts/the-geometry-of-addition


Summing doesn’t cause the filtering effect. That would be the renormalization step, where the resulting vector is projected back down onto the unit sphere (or a sphere of radius w).

I don’t think the “energy” term does anything special here other than preventing the solution from blowing up, provided that you can keep the “energy” bounded (usually by renormalization).
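A minimal numpy sketch of that point (the dimension, step size, and iteration count are arbitrary choices here): additive updates alone would let the norm drift without bound, while the renormalization step pins the energy after every update.

```python
import numpy as np

rng = np.random.default_rng(0)
d, w = 16, 1.0
x = rng.standard_normal(d)

for _ in range(100):
    x = x + 0.5 * rng.standard_normal(d)  # additive (summing) update
    x = w * x / np.linalg.norm(x)         # renormalize onto the radius-w sphere

print(np.linalg.norm(x))  # always w: the energy is pinned, so nothing blows up
```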

From a representational point of view, the most important aspect of the vector notation is its direction. The orientation of the vector in state space (or feature space) determines the semantic content of the representation. So I would argue that rotation is actually the more useful operation to consider when reasoning about the behavior of deep networks.

Assuming it is possible to enumerate a set of basis vectors that are clearly associated with specific features or concepts, projecting the state vector onto those bases would give you a weight for each feature in a particular embedding space.
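As a sketch of that idea, using a randomly generated orthonormal basis as a stand-in for such a feature basis:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
# hypothetical orthonormal "feature" basis for one embedding space
basis, _ = np.linalg.qr(rng.standard_normal((d, d)))
state = rng.standard_normal(d)

weights = basis.T @ state         # weight of each feature in this embedding space
recon = basis @ weights
print(np.allclose(recon, state))  # True: the feature weights fully determine the state
```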

Subsequent layers then transform this state vector into a different state vector in a different embedding space, until you reach the final layer, which performs a decoding step that is interpreted as the output.
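A toy version of that picture, with random orthogonal matrices standing in for the layer maps and a random matrix standing in for the final decoder:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
# hypothetical layers, each a pure rotation into a new embedding space
layers = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(3)]
decode = rng.standard_normal((3, d))  # hypothetical final decoding map

x = rng.standard_normal(d)
for W in layers:
    x = W @ x        # same energy, re-expressed in the next embedding space
print(decode @ x)    # the final decoding step, read off as the output
```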


A bunch of numbers can sum to zero. They disappeared in a puff of arithmetic! I suppose different people can have different concepts of filter though. “If you would converse with me, define your terms.” and all that.

I came up with an outline statistical argument that simple elementwise multiplication of a vector by some parameters before a fast transform allows the vector to be rotated by an arbitrary linear mapping, to some degree of approximation.
The vector has to be non-sparse (it can’t be mostly zeros), and then an appeal is made to the central limit theorem.
https://sites.google.com/view/algorithmshortcuts/uniform-point-picking-on-a-sphere
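The construction can be sketched like this. It is a minimal version that uses random ±1 values for the diagonal parameters; the linked argument concerns what the composite map can approximate when those parameters are learned. One thing the sketch does verify: because each stage is orthogonal, the whole product is an exact isometry (a rotation, possibly with a reflection).

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform; length must be a power of two."""
    x = x.copy()
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)  # scale so the transform is orthonormal

rng = np.random.default_rng(3)
n = 64
x = rng.standard_normal(n)  # non-sparse input, as the argument requires
d1 = rng.choice([-1.0, 1.0], n)  # parameter diagonals (random here, learned in general)
d2 = rng.choice([-1.0, 1.0], n)

y = fwht(d2 * fwht(d1 * x))  # diagonal multiply, fast transform, repeat
print(np.linalg.norm(x), np.linalg.norm(y))  # norms match: the map preserves energy
```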

A bunch of random numbers, say between -1 and 1, added together will mostly cancel; you will just be left with a low-level Gaussian residue. If you look at the projections onto vectors orthogonal to (1,1,…,1), you likewise see Gaussian residues. And the energies of all those projections sum to the energy of the original vector.
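A quick numerical check of all three claims (the normalized sum is a Gaussian residue, the orthogonal projections are too, and the energies add back up):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1024
x = rng.uniform(-1.0, 1.0, n)

# normalized sum = projection onto (1,1,...,1)/sqrt(n): roughly N(0, 1/3) by the CLT
ones = np.ones(n) / np.sqrt(n)
print(ones @ x)

# complete an orthonormal basis around (1,1,...,1); every other projection is
# likewise a small Gaussian residue
Q, _ = np.linalg.qr(np.column_stack([np.ones(n), rng.standard_normal((n, n - 1))]))
coeffs = Q.T @ x
print(coeffs[0])                        # the same projection as above, up to sign
print(np.sum(coeffs**2), np.sum(x**2))  # energies agree: orthogonal change of basis
```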