First, I was talking about online learning, and I also took the liberty of relating it to the human mind. Sorry if that wasn’t clear enough.
I would like to start with this question: do you believe that biological learning causes forgetting? In my case I believe it does, because I feel I’ve experienced it.
I believe the central limit theorem is useful only if we assume that the existence of some variables does not cause other variables to disappear or become irrelevant, and vice versa. A normal distribution is just a snapshot of a static distribution, but in the real world a distribution can shift in shape, making some points irrelevant or relevant at a particular iteration or state.
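To make the “snapshot” point concrete, here is a minimal, hypothetical sketch in plain NumPy (the drift parameters are invented for illustration): the generating distribution slowly shifts, so a normal distribution fitted to an early window describes the later samples poorly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a non-stationary stream: the generating distribution drifts
# from roughly N(0, 1) at the start to N(4, 2) at the end.
T = 10_000
t = np.linspace(0.0, 1.0, T)
stream = rng.normal(loc=4.0 * t, scale=1.0 + t)

# "Snapshot" fit: mean/std estimated from the first 20% of the stream.
snapshot = stream[: T // 5]
mu_hat, sigma_hat = snapshot.mean(), snapshot.std()

# How surprising are the latest samples under that snapshot model?
late = stream[-T // 5 :]
z_scores = np.abs(late - mu_hat) / sigma_hat
print(f"snapshot fit: mu={mu_hat:.2f}, sigma={sigma_hat:.2f}")
print(f"mean |z| of the latest samples under that fit: {z_scores.mean():.2f}")
# The late samples look like outliers to the snapshot model, even though
# they are perfectly ordinary for the distribution as it is *now*.
```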
But we all know by intuition that data samples carry hidden causal relationships. Some relationships cause one variable to lose importance as another gains it, and vice versa. Even inversely correlated variables can be modeled together in ML, but only up to a point. Beyond that it would take a huge model and a vast amount of data just to model a part of the real world, which is very impractical, let alone the computing power it requires.
My thought about ensembling is this: find the distributions that can generalize the variables with less forgetting (with some parameter that controls the scope of these distributions), build a model from each distribution, and then ensemble those models. A rough sketch of what I mean is below.
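This is only my interpretation, not an established recipe: a toy scikit-learn sketch where the number of clusters stands in for the “scope” parameter, one small model is fitted per discovered sub-distribution, and predictions are routed to the matching model so updating one regime does not overwrite the others.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Toy data from two different "regimes" (sub-distributions):
# regime A: y ~  2x, inputs around x = -2
# regime B: y ~ -3x, inputs around x = +2
xa = rng.normal(-2.0, 0.5, size=(300, 1))
xb = rng.normal(2.0, 0.5, size=(300, 1))
X = np.vstack([xa, xb])
y = np.concatenate([2.0 * xa[:, 0], -3.0 * xb[:, 0]]) + rng.normal(0, 0.1, 600)

# Step 1: discover the sub-distributions. n_clusters plays the role of the
# "scope" parameter (how finely the data is split).
n_clusters = 2
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)

# Step 2: fit one small model per sub-distribution.
experts = []
for k in range(n_clusters):
    mask = km.labels_ == k
    experts.append(LinearRegression().fit(X[mask], y[mask]))

# Step 3: ensemble at prediction time by routing each query to the expert of
# its nearest cluster (hard gating; soft weighting would also work).
def predict(x_new):
    labels = km.predict(x_new)
    return np.array([experts[k].predict(x_new[i : i + 1])[0]
                     for i, k in enumerate(labels)])

print(predict(np.array([[-2.0], [2.0]])))  # roughly [-4.0, -6.0]
```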
In the real world, an analogy would be this: it is a fact that many countries’ cultures disagree with many other countries’ cultures. But we are not currently in a world war, partly because groups organized around ideologies, beliefs, cultures, politics, etc. that agree with each other can exist, sympathize with each other, and neutralize one another at a level accepted by global society. Taken together, these groups do not cause a world war, and most importantly they generally do not cause the extinction of the other groups (forgetting). Without these groups, everyone would try to enforce their own interests with little regard for the effects on other countries, and when they succeeded they would get greedier and pursue colonization, for example. This is like generalization in modern, mainstream ML: everyone wants to model something real from static data and hopes that the resulting model can handle the unseen real-time data stream in the real world, which of course cannot be modeled by a static distribution.
I think I can summarize my thoughts in a simple question: is there a sound theory, and are there proven tests, showing that curve-fitting the real-world, real-time data stream is a practical endeavor? Maybe I missed something, as I’m not an ML guru. The self-driving vehicle would be a good test, but we all know it is still not totally reliable, and it is not purely ML-driven. A toy version of the kind of test I have in mind is sketched below.
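I don’t have a theoretical answer, but the test can at least be run on synthetic data. Here is a hypothetical prequential (“test-then-train”) sketch in plain NumPy, with invented drift and learning-rate values, comparing a model fitted once to a static batch against one that keeps updating as the stream drifts.

```python
import numpy as np

rng = np.random.default_rng(2)

# A drifting 1-D regression stream: the true slope changes slowly over time.
T = 5_000
x = rng.normal(size=T)
true_slope = 1.0 + 2.0 * np.arange(T) / T          # drifts from 1.0 to 3.0
y = true_slope * x + rng.normal(0, 0.1, T)

# "Curve-fitting" baseline: slope estimated once on an initial batch, then frozen.
batch = slice(0, 500)
w_static = np.dot(x[batch], y[batch]) / np.dot(x[batch], x[batch])

# Online alternative: the same slope, but updated by SGD on every new sample.
w_online, lr = w_static, 0.05
err_static, err_online = [], []
for i in range(500, T):
    # Prequential protocol: predict first, record the error, then update.
    err_static.append((y[i] - w_static * x[i]) ** 2)
    err_online.append((y[i] - w_online * x[i]) ** 2)
    w_online += lr * (y[i] - w_online * x[i]) * x[i]   # SGD step on squared loss

print(f"frozen fit MSE: {np.mean(err_static):.3f}")
print(f"online fit MSE: {np.mean(err_online):.3f}")
# The frozen fit degrades as the stream drifts away from the initial batch,
# while the continually updated model tracks the drift.
```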