How can we infer the underlying 2-D spatial structure of a dataset when all the information we have is a series of 784-element vectors, “scrambled” in an arbitrary order?
A more detailed formulation:
- start with a number (hundreds or thousands) of 28x28 digit images from the MNIST dataset
- flatten those pixels into 784-element vectors of 0-255 scalar values.
- permute all vectors with the same random permutation, to “obfuscate” any assumptions about relative positions between pixels (which could otherwise be inferred from the flatten() implementation).
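To make the setup concrete, here is a minimal sketch of building such a scrambled dataset. The random images are a stand-in for real MNIST digits, since loading MNIST depends on your environment:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for MNIST: in practice, load the real 28x28 digit images here.
images = rng.integers(0, 256, size=(1000, 28, 28))

flat = images.reshape(len(images), 784)   # flatten each image to a 784-vector
perm = rng.permutation(784)               # ONE fixed random permutation
scrambled = flat[:, perm]                 # the same scramble applied to every image
```

The key point is that a single permutation is shared across all frames, so pixel identities stay consistent between images even though their 2-D positions are hidden.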
Given the above dataset (a list of 784-element vectors), can we have an algorithm (or a set of algorithms) which “figures out” both that:
- these 784-element vectors are in fact encodings of 2-D images, and
- hopefully projects the scrambled 784-element vectors back into actual 2-D images that look pretty much obviously like the originals?
Why do I think it matters: just an intuition that “general intelligences” implement such algorithms to make sense of (or at least “compress”) the very high dimensionality of our sensory input.
And hoping that whatever algorithm works for MNIST might be useful for arbitrarily large, fixed-size data vectors. Or for embeddings like those generated by an autoencoder: what do these N values mean? (N == embedding size)
I think this is a problem space that is best addressed with topology turned on.
Ok, here is my current … attempt at this.
If we knew the distances between arbitrary pairs of points in each scrambled (flattened & permuted) image, then the dimensional structure could be inferred in various ways, so we can begin by splitting the problem in two:
- Deriving a measure of distance between any two pixels in a given frame.
- Inferring space dimensionality from the above distance metric.
A relative distance between any two points in a B&W image can be approximated from co-occurrence statistics: the ratio between the number of frames in which either of the two pixels is “on” and the number of frames in which both are “on” simultaneously. In general, a pair of nearby pixels share the same “color” more often than a pair of faraway pixels. While this metric is easily computable, it has a few issues:
1.1 the inferred distances are non-Euclidean.
1.2 the metric can be (and often is) skewed by the dataset itself, e.g.
- pixels that often appear together in the shape of a given digit look “closer” than they really are.
- some pixels almost always have the same “color”, so no distance can be derived for them via statistical co-occurrence.
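One way to make the co-occurrence metric concrete is a Jaccard-style distance over binarized frames; this is a sketch of the idea, not the only possible formulation (the binarization threshold is an arbitrary choice):

```python
import numpy as np

def cooccurrence_distance(frames, threshold=128):
    """Estimate pairwise pixel 'distances' from binarized frames.

    frames: (n_frames, n_pixels) array of 0-255 values, arbitrarily permuted.
    Returns an (n_pixels, n_pixels) matrix where small values mean
    'these two pixels are often on together', i.e. presumably nearby.
    """
    on = (frames >= threshold).astype(np.float64)   # binarize to B&W
    both_on = on.T @ on                             # frames where both pixels are on
    counts = on.sum(axis=0)                         # frames where each pixel is on
    either_on = counts[:, None] + counts[None, :] - both_on
    # Jaccard-style similarity; the epsilon floor dodges the divide-by-zero
    # from always-off pixels (issue 1.2 above), assigning them distance 1.
    sim = both_on / np.maximum(either_on, 1e-9)
    return 1.0 - sim
```

Note this directly exhibits issue 1.1: the result is a dissimilarity matrix, not a Euclidean metric, so downstream steps have to tolerate that.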
2.1 Volume-of-(hyper)sphere-to-radius ratio: in 1-D the volume grows linearly with the radius, in 2-D it grows quadratically (the area of a circle is proportional to the radius squared), and so on.
2.2 Counting mutually equidistant points: in 2-D we can have no more than 3 points all at the same distance from each other (equilateral triangles), in 1-D only 2 points, in 3-D we can find 4 points (regular tetrahedra).
2.3 There may be other techniques similar to the two above; by combining them there's a chance to derive, from the co-occurrence metric, something closer to a Euclidean metric.
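Technique 2.1 can be sketched as a "doubling dimension" estimator: count how many neighbors fall within radius r versus 2r, and read the dimension off the growth rate. This is an illustrative sketch (the radius and the use of the median are assumptions); it works on any pairwise distance matrix, though on the non-Euclidean co-occurrence distances it would only be a rough signal:

```python
import numpy as np

def doubling_dimension(dist, r):
    """Estimate intrinsic dimension from how neighbor counts grow with radius.

    dist: (n, n) pairwise distance matrix; r: base radius.
    In d dimensions the ball volume scales as r**d, so N(2r)/N(r) ~ 2**d
    and d ~ log2 of that ratio. Median over points dampens boundary effects.
    """
    n_r = (dist < r).sum(axis=1).astype(float)
    n_2r = (dist < 2 * r).sum(axis=1).astype(float)
    ratios = n_2r / np.maximum(n_r, 1.0)
    return float(np.median(np.log2(ratios)))
```

On a 40x40 grid of points with Euclidean distances this returns roughly 2; on a line of points, roughly 1 — which is exactly the 1-D vs 2-D discrimination we want.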
The latent-space geometry can be investigated further; e.g., after counting the number of implicit dimensions, another algorithm may find relevant orthogonal axes which generate or describe the underlying space.
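For the "project back to 2-D" half, one candidate tool is classical multidimensional scaling (MDS): it turns a pairwise distance matrix into coordinates, and its eigenvalue spectrum doubles as a dimensionality hint (for a genuine 2-D pixel grid, two eigenvalues should dominate). A minimal sketch, assuming the distance matrix is at least approximately Euclidean:

```python
import numpy as np

def classical_mds(dist, k=2):
    """Embed points into k dimensions from an (n, n) pairwise distance matrix.

    Double-center the squared distances to get a Gram matrix, then take
    the top-k eigenvectors scaled by sqrt of their eigenvalues.
    """
    n = dist.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    b = -0.5 * j @ (dist ** 2) @ j            # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]        # strongest axes first
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
```

On exact Euclidean grid distances this recovers the pixel layout up to rotation/reflection; on the skewed co-occurrence distances the picture would be distorted, which is where the corrections of 2.3 would come in.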
You might be interested in the new Mamba paper - apparently they did a permuted maze task! So it had to learn the permutation, then learn to escape.