For an image classifier, a typical input would be a digital image and the output would be a class.
The brain, on the other hand, processes (f) signals from the eye (x) and outputs an image (y). In other words, the image y is the effect of a particular brain function f, and therefore it is not the input x.
But in ML we use the image as the input. Why? Why are we comfortable assuming that the output can be used as the input? Am I missing something?
From a practical perspective: because we already have lots of simple, cheap devices that output images, called digital cameras, and we want the AI to figure out what's in front of the camera.
Joking aside, I agree from a practical point of view.
Our brain interprets the output of the camera as an image. But that image is y, and what we actually sense (the input x) we cannot see. Yet in ML we use y as the input.
The conundrum is that in ML the input is assumed to be our brain's interpretation of an object (e.g. an image), yet we expect an ML model to reach AGI. Shouldn't ML models fix the representation problem first?
If a set of signals (x) causes the brain to perceive an image (y), then there is a causal relationship between these two variables, represented by some function f, so that y = f(x). Therefore, if the brain is fed y instead, f(y) will return y only if f acts as the identity function on y; otherwise it will not return y. In ML we force f(y) to return y by training.
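To make that last point concrete, here is a minimal sketch in Python/NumPy. Everything in it is an assumption for illustration: the "brain function" f is just a random linear map W, the signal x is a random vector, and the "training" is plain gradient descent on a single example. It only shows that f(y) differs from y in general, and that a map V can be fitted so that V y is approximately y.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the brain function f: a fixed random linear map.
W = rng.normal(size=(4, 4))

def f(v, weights=W):
    return weights @ v

x = rng.normal(size=4)  # raw signal (what is sensed)
y = f(x)                # percept (what is "seen")

# Feeding the percept back into f does not reproduce it in general.
refed_error = np.linalg.norm(f(y) - y)

# "Forcing" f(y) = y by training: fit a new map V with gradient descent
# on the loss 0.5 * ||V y - y||^2, whose gradient w.r.t. V is outer(V y - y, y).
V = W.copy()
step = 0.5 / (y @ y)    # step size chosen so the residual halves each iteration
for _ in range(50):
    residual = V @ y - y
    V -= step * np.outer(residual, y)

trained_error = np.linalg.norm(V @ y - y)
print(refed_error, trained_error)
```

After training, V reproduces this one y, but it is no longer the original f, which is roughly the worry raised above: training makes the map behave like the identity on the data we show it.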
I just discovered a relevant topic in DL called Disentangled Representations. My intuition is that the perception data we use to train/test ML models (y) is “tangled”, or like spaghetti, as Y. Bengio called it. I believe it's tangled because it has already been processed, as opposed to the sensed data, which is raw. I also think it can be expressed as a CSP problem, where in the tangled version of the input data multiple mutually exclusive variables have already been aggregated, which makes it hard to generalize from them.
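A toy sketch of what "tangled" could mean here, under strong assumptions of my own: two independent factors z (think of them as hypothetical causes such as lighting and pose) are mixed by a made-up matrix A into the observed data y. The factors are uncorrelated, the observed coordinates are not, and only someone who knows the mixing can undo it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical independent factors, 1000 samples each.
z = rng.normal(size=(1000, 2))

# Made-up mixing matrix standing in for whatever process tangles the factors.
A = np.array([[1.0, 0.8],
              [0.6, 1.0]])
y = z @ A.T  # observed, "tangled" data

def offdiag_corr(data):
    # Absolute correlation between the two columns of `data`.
    return abs(np.corrcoef(data, rowvar=False)[0, 1])

factor_corr = offdiag_corr(z)    # independent factors: close to 0
observed_corr = offdiag_corr(y)  # mixed observations: strongly correlated

# If the mixing were known (it rarely is), inverting it recovers the factors.
z_recovered = y @ np.linalg.inv(A).T
print(factor_corr, observed_corr)
```

Real disentanglement methods have to estimate something like the inverse of A from y alone, and the true mixing is usually nonlinear, so this is only the easiest possible case.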