Multidimensional encodings using Java API


#21

@cogmission, this thread shows that my PR must be rejected, since my code also forces only 2D arrays to be encoded when using the sensors and the Network API…


#22

Look at lines 58 through 73 in org.numenta.nupic.encoders.GeospatialCoordinateEncoder, which make the required transformation:

	public void encodeIntoArray(Tuple inputData, int[] output) {
		double longitude = (double)inputData.get(0);
		double lattitude = (double)inputData.get(1);
		double speed = (double)inputData.get(2);
		int[] coordinate = coordinateForPosition(longitude, lattitude);
		double radius = radiusForSpeed(speed);
		
		super.encodeIntoArray(new Tuple(coordinate, radius), output);
	}
	
	public int[] coordinateForPosition(double longitude, double lattitude) {
		double[] coordinate = toMercator(longitude, lattitude);
		coordinate[0] /= scale;
		coordinate[1] /= scale;
		return new int[] { (int)coordinate[0], (int)coordinate[1] };
	}

In short, I’m certain that CoordinateEncoder is designed to support an arbitrary number of dimensions (well, up to Int.Max anyway ;)).
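To illustrate why the dimension count need not matter, here is a minimal sketch of a neighborhood enumeration that works for an int[] of any length. NeighborSketch and its neighbors method are hypothetical names written for this post, not the htm.java implementation, and the sketch assumes the neighborhood is a hypercube of side 2 * radius + 1 around the center.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of htm.java. */
final class NeighborSketch {

    /**
     * Returns every integer coordinate within `radius` of `center` along each
     * axis (a hypercube of side 2 * radius + 1), for any number of dimensions.
     */
    static List<int[]> neighbors(int[] center, int radius) {
        int side = 2 * radius + 1;
        int total = 1;
        for (int d = 0; d < center.length; d++) {
            total = Math.multiplyExact(total, side); // fail loudly on int overflow
        }
        List<int[]> result = new ArrayList<>(total);
        for (int index = 0; index < total; index++) {
            int[] point = new int[center.length];
            int remainder = index;
            // Decode `index` as a base-`side` number, one digit per dimension.
            for (int d = center.length - 1; d >= 0; d--) {
                point[d] = center[d] - radius + remainder % side;
                remainder /= side;
            }
            result.add(point);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same method, different dimensionality: 3^2 and 3^3 neighbors.
        System.out.println(neighbors(new int[] {10, 20}, 1).size());
        System.out.println(neighbors(new int[] {10, 20, 30}, 1).size());
    }
}
```

Nothing in the loop refers to a fixed number of axes, so the only practical limit is how quickly the neighbor count grows with each added dimension.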


#23

Good insight @Matheus_Araujo. It’s true the coordinate encoder should be able to handle arrays of any dimension.


#24

I’m going to do it right later and send a new PR.


#25

As seen above, it’s always a 2D array…


#26

That’s in the GeospatialCoordinateEncoder and it’s explainable because the geospatial encoding only recognizes two dimensions. There are no restrictions on size I can see in the CoordinateEncoder itself - it simply takes an int[] (and a radius).

There’s also no argument I can think of from a phenomenological standpoint for making such restrictions a priori. Even in the case of geospatial encoding, at least 3 dimensions actually seem useful (e.g. altitude).
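As a rough sketch of what a third axis could look like, the transformation quoted earlier can be extended with an altitude component. Everything here is an assumption for illustration: GeoAltitudeSketch is a hypothetical class, the projection is a plain spherical Mercator that may differ from htm.java's own toMercator, and scaling altitude by the same `scale` as the horizontal axes is just one possible choice.

```java
import java.util.Arrays;

/** Hypothetical sketch for illustration only; not part of htm.java. */
final class GeoAltitudeSketch {
    private static final double EARTH_RADIUS_METERS = 6378137.0; // WGS84 equatorial radius

    /** Spherical Mercator projection (assumed; the library's version may differ). */
    static double[] toMercator(double longitude, double latitude) {
        double x = EARTH_RADIUS_METERS * Math.toRadians(longitude);
        double y = EARTH_RADIUS_METERS
                * Math.log(Math.tan(Math.PI / 4 + Math.toRadians(latitude) / 2));
        return new double[] { x, y };
    }

    /** Projects the position, then divides all three axes by `scale` (meters per cell). */
    static int[] coordinateForPosition(double longitude, double latitude,
                                       double altitudeMeters, double scale) {
        double[] xy = toMercator(longitude, latitude);
        return new int[] {
            (int) (xy[0] / scale),
            (int) (xy[1] / scale),
            (int) (altitudeMeters / scale)
        };
    }

    public static void main(String[] args) {
        // A point on the equator at the prime meridian, 90 m up, with 30 m cells.
        System.out.println(Arrays.toString(coordinateForPosition(0.0, 0.0, 90.0, 30.0)));
    }
}
```

The resulting int[] has three entries instead of two, which is the shape a dimension-agnostic coordinate encoder would need to accept.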

But as noted above, I am very much not an expert on this topic, so if someone who is can shed light on whether or not there are additional assumptions or limits on coordinate dimensions, it would probably be useful information (though also perhaps escaping the bounds of the topic/forum a bit).


#27

It’s really useful for processing XYZ points as I did here.


#28

What’s your stack trace?

Look at this method from Layer:

    /**
     * If this Layer has a Sensor, map its encoder's buckets
     * 
     * @param sequence
     * @return
     */
    private Observable<ManualInput> mapEncoderBuckets(Observable<ManualInput> sequence) {
        if(hasSensor()) {
            if(getSensor().getMetaInfo().getFieldTypes().stream().anyMatch(ft -> {
                return ft == FieldMetaType.SARR || ft == FieldMetaType.DARR || ft == FieldMetaType.COORD || ft == FieldMetaType.GEO;
            })) {
                if(autoCreateClassifiers) {
                    throw new IllegalStateException("Cannot autoclassify with raw array input or " + " Coordinate based encoders... Remove auto classify setting.");
                }
                return sequence;
            }
            
            sequence = sequence.map(m -> {
                doEncoderBucketMapping(m, getSensor().getInputMap());
                return m;
            });
        }

        return sequence;
    }

It seems that when you are using a CoordinateEncoder you can’t auto-create classifiers…
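The guard in that method can be reduced to a small standalone sketch. The class, enum, and method names below are hypothetical stand-ins (only SARR, DARR, COORD, and GEO come from the quoted code; OTHER is a placeholder for every remaining field type), and the return value simply says whether bucket mapping would go ahead.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-ins for illustration only; these are not the htm.java classes. */
final class AutoClassifyGuardSketch {

    /** OTHER is a placeholder for every field type not named in the quoted check. */
    enum FieldKind { SARR, DARR, COORD, GEO, OTHER }

    private static final Set<FieldKind> RAW_OR_COORDINATE =
            EnumSet.of(FieldKind.SARR, FieldKind.DARR, FieldKind.COORD, FieldKind.GEO);

    /**
     * Mirrors the shape of the quoted guard: returns true when bucket mapping
     * would proceed, false when it is skipped, and throws when auto-classification
     * is requested together with a raw array or coordinate based field.
     */
    static boolean shouldMapBuckets(List<FieldKind> fieldKinds, boolean autoCreateClassifiers) {
        boolean hasRawOrCoordinate = fieldKinds.stream().anyMatch(RAW_OR_COORDINATE::contains);
        if (hasRawOrCoordinate && autoCreateClassifiers) {
            throw new IllegalStateException(
                    "Cannot autoclassify with raw array input or coordinate based encoders");
        }
        return !hasRawOrCoordinate;
    }

    public static void main(String[] args) {
        System.out.println(shouldMapBuckets(Arrays.asList(FieldKind.OTHER), true));
        System.out.println(shouldMapBuckets(Arrays.asList(FieldKind.COORD), false));
    }
}
```

So a single COORD or GEO field in the sensor's metadata is enough to trip the exception whenever the auto-classify setting is on.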


#29

Hmm thanks, that’s helpful. Because I didn’t use a sensor I didn’t get that error.


#30

@cogmission Here’s a repo that reproduces the issue. The only catch is that it requires sbt: https://github.com/keith-nordstrom/htm-stacktrace-example


#31

What is your current status? What do you have issues with, and what do you need? :)

I will be reviewing the code and the current PRs to bring things up to date (which I will need to do before introducing any new code modifications). Then I will address whatever issues you and @Matheus_Araujo might currently have.

Also, what is your timeline for making the pitch to use NuPIC in your company?


#32

@cogmission thanks for your attention (and everyone else’s). I am presently unblocked and proceeding. I’ll let you know if I run into anything else that appears blocking to me.

My timeline is somewhat up to me (privilege of being the CTO, I suppose). But my target is the end of the month - any longer than that and I will start having to make uncomfortable justifications to the very stern man who trickles a small portion of his large pile of money into my company each month.

My objective is to provide a viable alternative to a somewhat novel autoencoder approach created by my data science team to predict the health of certain pieces of industrial equipment. The category encoder concept alone gives a leg up, provided predictions can be evaluated with similar recall/precision. Once I can get this calculating something sensible I will likely hand the project to someone on that end of things for fine tuning and full evaluation. If/when that happens I’ll try to make sure context is maintained on forum posts.

Thanks again.


#33

@keith.nordstrom,

You’re very welcome. Keep us up to date?


#34

You were asking about encoders and buckets earlier. Here is a recent post you may find informative.


#35

Thanks, that’s definitely helpful. Is the lack of a bucketing scheme the reason why coordinate encoders don’t have classifiers?