Weak learners and fast transforms

You can combine weak learners by summing their outputs and then use backpropagation to update the weak learners. That gives you one output from the group of weak learners.
Say you want 2 outputs.
You can introduce another term that, instead of summing all the weak learners, sums half of them and subtracts the other half. Those 2 terms are orthogonal to each other, so updates to one term minimally affect the other.
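
As a concrete sketch (the array names and sizes here are just mine), with n weak-learner outputs the two readouts look like this, and the zero dot product between the two combining vectors is what makes them orthogonal:

```python
import numpy as np

# Minimal sketch: n weak-learner outputs f combined into two orthogonal readouts.
# y1 sums all of them; y2 sums the first half and subtracts the second half.
n = 8
f = np.random.randn(n)                          # stand-in for the weak learners' outputs
c1 = np.ones(n)                                 # <1, 1, ..., 1>
c2 = np.r_[np.ones(n // 2), -np.ones(n // 2)]   # <1, ..., 1, -1, ..., -1>
y1, y2 = c1 @ f, c2 @ f
print(y1, y2)
print(c1 @ c2)   # 0: the combining vectors are orthogonal,
                 # so a gradient step through y1 barely disturbs y2
```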
The fast Walsh Transform is full of orthogonal summing and subtracting terms.
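
For reference, a textbook in-place fast Walsh Hadamard transform is only a few lines. Applied to the vector of weak-learner outputs it gives you n mutually orthogonal sum-difference combinations at once, using O(n log n) adds and subtracts (this is a generic sketch, not code from any particular library):

```python
import numpy as np

def fwht(x):
    """In-place fast Walsh Hadamard transform. Length of x must be a power of 2.

    Each output element is a +1/-1 weighted sum of all the inputs, and the
    +1/-1 patterns of any two different outputs are orthogonal.
    """
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

print(fwht([1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]))
```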


If there are n weak learners and n outputs are wanted, is there any point in using n sum-difference terms to combine the weak learners into n outputs, or should you just use 1 weak learner for each output?
If the weak learners have binary outputs then there is a major quantization improvement in using the sum-difference terms. It is also possible to shift the output precision around: one output can be made more precise at the expense of the other outputs being made slightly less precise.
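
A quick numerical check of the quantization point, assuming the weak learners output ±1 (the particular sign pattern below is arbitrary): one binary learner per output gives only 2 possible values, while a ±1 sum-difference over all n learners can land on n+1 distinct levels.

```python
import itertools
import numpy as np

n = 8
signs = np.array([1, -1, 1, -1, 1, 1, -1, -1])   # one sum-difference pattern
# Enumerate every combination of n binary (+/-1) weak-learner outputs and
# record which values the sum-difference readout can take.
levels = {int(signs @ np.array(bits)) for bits in itertools.product([-1, 1], repeat=n)}
print(sorted(levels))   # [-8, -6, -4, -2, 0, 2, 4, 6, 8]
print(len(levels))      # n + 1 = 9 levels, versus 2 for a single binary learner
```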


Just noting that the sum of a bunch of numbers can be viewed as a dot product between the vector <1,1,…,1> and the bunch of numbers in vector form. Likewise a sum-difference term is a dot product with, say, <1,-1,-1,…,1,-1>.
Then 2 sum-difference terms are orthogonal if their 1,-1 vectors are orthogonal, i.e. their dot product equals zero.
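
For example (vectors made up for illustration), the orthogonality check is just a dot-product test:

```python
import numpy as np

u = np.array([1,  1,  1,  1])    # plain sum
v = np.array([1,  1, -1, -1])    # one sum-difference pattern
w = np.array([1, -1,  1, -1])    # another sum-difference pattern
print(u @ v, u @ w, v @ w)       # 0 0 0 -> mutually orthogonal
```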
