You can combine weak learners by summing their outputs and then use backpropagation to update the individual learners. That gives you one output from a group of weak learners.
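Here is a minimal sketch of that idea, assuming the weak learners are tiny linear models sharing one input and trained on squared error; all the names (W, lr, etc.) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 weak learners, each a one-layer linear model on the same input.
n_learners, dim = 4, 8
W = rng.normal(size=(n_learners, dim)) * 0.1   # one weight row per weak learner

x = rng.normal(size=dim)                       # a single training example
target = 1.0
lr = 0.01

outputs = W @ x                                # each weak learner's scalar output
combined = outputs.sum()                       # combine by summing

# Backpropagation: d(loss)/d(combined) flows unchanged into every weak learner.
grad_combined = 2.0 * (combined - target)      # squared-error gradient
W -= lr * grad_combined * np.outer(np.ones(n_learners), x)
```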
Say you want 2 outputs.
You can introduce a second term that, instead of summing all the weak learners, sums half of them and subtracts the other half. Those 2 terms are orthogonal to each other, so updates to one term minimally affect the other.
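A toy numeric sketch of those two combining vectors, with made-up values: <1,1,1,1> sums all four learners and <1,1,-1,-1> sums half and subtracts half. Because the vectors are orthogonal, a first-order update aimed at the first output leaves the second output unchanged.

```python
import numpy as np

outputs = np.array([0.3, -0.1, 0.7, 0.2])      # four weak-learner outputs

c1 = np.array([1, 1, 1, 1])                    # plain sum
c2 = np.array([1, 1, -1, -1])                  # sum half, subtract half
assert c1 @ c2 == 0                            # the combining vectors are orthogonal

y1, y2 = c1 @ outputs, c2 @ outputs

# A gradient step that only tries to change output 1 moves the weak learners in a
# direction proportional to c1, so its effect on output 2 is c2 @ (eps * c1) = 0.
eps = 0.05
outputs_new = outputs + eps * c1
print(c1 @ outputs_new - y1)                   # changes by eps * (c1 @ c1) = 0.2
print(c2 @ outputs_new - y2)                   # 0.0, output 2 is untouched
```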
The fast Walsh Transform is built entirely out of such orthogonal summing and subtracting terms.
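A sketch of the in-place fast Walsh Transform, assuming the number of weak learners is a power of two: it turns n weak-learner outputs into n sum-difference combinations whose +/-1 weighting patterns are mutually orthogonal.

```python
import numpy as np

def fwht(x):
    """In-place fast Walsh Transform of a length 2^k vector.

    Each output element is a +/-1 weighted sum of all input elements,
    and the n weighting patterns are mutually orthogonal.
    """
    x = np.asarray(x, dtype=float).copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

outputs = np.array([1.0, -1.0, 1.0, 1.0])       # e.g. four binary weak learners
print(fwht(outputs))                             # four orthogonal sum-difference terms
```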
If there are n weak learners and n outputs are wanted, is there any point in using n sum-difference terms to combine the weak learners into n outputs, or should you just use 1 weak learner for each output?
If the weak learners have binary outputs then there is a major quantization improvement from using the sum-difference terms, since each output then draws on all n learners rather than just one. It is also possible to shift output precision around: one output can be made more precise at the expense of the other outputs being made slightly less precise.
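A sketch of the quantization point, assuming +/-1 binary weak learners and using SciPy's hadamard helper for the combining vectors: one learner per output gives only 2 distinct output values, while a sum-difference combination of all n learners gives n+1 distinct values per output.

```python
import itertools
import numpy as np
from scipy.linalg import hadamard   # assumes SciPy is available

n = 8
H = hadamard(n)                                 # n orthogonal +/-1 combining vectors

# Enumerate every possible pattern of n binary (+/-1) weak-learner outputs.
patterns = np.array(list(itertools.product([-1, 1], repeat=n)))

per_output_levels = [len(set(patterns @ H[k])) for k in range(n)]
print(per_output_levels)         # n + 1 = 9 distinct levels per combined output
print(len(set(patterns[:, 0])))  # only 2 levels if each output is a single weak learner
```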
Just noting that the sum of a bunch of numbers can be viewed as a dot product between the vector <1,1,…,1> and those numbers in vector form. Likewise a sum-difference term is a dot product with a vector like <1,-1,-1,…,1,-1>.
Then 2 sum-difference terms are orthogonal if their 1,-1 vectors are orthogonal, i.e. their dot product equals zero.
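As a quick check of that dot-product view, with made-up numbers:

```python
import numpy as np

values = np.array([0.4, -0.2, 0.9, 0.1])

sum_term  = np.array([1, 1, 1, 1]) @ values     # plain sum as a dot product
diff_term = np.array([1, -1, -1, 1]) @ values   # a sum-difference term

# Two sum-difference terms are orthogonal when their +/-1 vectors are:
print(np.array([1, 1, 1, 1]) @ np.array([1, -1, -1, 1]))   # 0 -> orthogonal
```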