Am I correct in thinking the encoder bucket mapping is somewhat arbitrary in this class?
The type-matching logic in Encoder appears to simply choose the first bucket for a given sub-encoder. Each encoder gets back an int[] whose length equals the number of sub-encoders it has, and the code takes the first element. There does seem to be a method to this (the 4 sub-encoders in my DateEncoder are always ordered with “season” first), but otherwise it’s simply a number, which presumably needs to be predictable, non-negative, and “well-distributed” in some fashion:
...
Object o = encoderInputMap.get(name);
if(DateTime.class.isAssignableFrom(o.getClass())) {
    bucketIdx = ((DateEncoder)e).getBucketIndices((DateTime)o)[0];
} else if(Number.class.isAssignableFrom(o.getClass())) {
    bucketIdx = e.getBucketIndices((double)o)[0];
} else {
    bucketIdx = e.getBucketIndices((String)o)[0];
}
...
If this is the case, it seems to me the solution might be to add a branch like this:
...
} else if(Tuple.class.isAssignableFrom(o.getClass())) {
    bucketIdx = ((CoordinateEncoder)e).getBucketIndices((Tuple)o)[0];
} else {
...
and then overriding CoordinateEncoder#getBucketIndices (which has only a single encoder). This override might return a bucket that is some measure of its Tuple's contained vector - perhaps the floor of its Manhattan length?
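To make that last idea concrete, here is a rough sketch of what I mean by "floor of the Manhattan length". It is standalone and does not touch the real CoordinateEncoder: the class name ManhattanBucket and both method names are hypothetical, and I'm assuming the coordinate can be pulled out of the Tuple as a plain array of numbers.

```java
// Hypothetical helper, not part of the library: maps a coordinate vector
// to a single non-negative bucket index.
public final class ManhattanBucket {
    private ManhattanBucket() {}

    /** Floor of the Manhattan (L1) length: the sum of absolute components. */
    public static int bucketFor(double[] coordinate) {
        double sum = 0.0;
        for (double c : coordinate) {
            sum += Math.abs(c);
        }
        return (int) Math.floor(sum);
    }

    /** Same value wrapped in a one-element array, mirroring the int[] shape the calling code indexes with [0]. */
    public static int[] getBucketIndices(double[] coordinate) {
        return new int[] { bucketFor(coordinate) };
    }
}
```

The result is predictable and non-negative, but I'm less sure about "well-distributed": every coordinate with the same Manhattan length lands in the same bucket, so (3, 4) and (-7, 0) would collide.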