HTM to process 2D SEAGen data

Hi David, can you please help me? I am trying to encode SEAGen data, which maps x,y float coordinates to an integer label of either 0 or 1. I’m using the community htm.core library with Python 3.7. I’m currently modifying the hotgym.py file to take my CSV file…
c0,c1,c2
float,float,bool
S
6.5358958546461,1.15006943124406,0
8.72474535182035,2.1233268092272,1

I can’t find any example suitable for just primitive data types …

inside hotgym.py …

'enc': {
    "value":
        {'resolution': 0.88, 'size': 700, 'sparsity': 0.02},
    "time":
        {'timeOfDay': (30, 1), 'weekend': 21}
},

Can you help me format the encoder definition for float, float, bool? The bool is the prediction field.

By the way, as far as resolution goes, I only care about the 0.0 to 9.9 range.

I can’t find any example suitable for just primitive data types …

The RDSE and ScalarEncoder are both encoders for a scalar number, which is a primitive data type. A category can be represented by a numeric index, so the RDSE or ScalarEncoder will work for it. A Boolean can be represented by the numbers 0 and 1, so it can be encoded the same way. Encoding an array of values requires one encoder for each element in the array.

You have two primitive values to encode, so you will need two encoders. The hotgym example also has two values, consumption and time. Each has its own encoder, and the results are concatenated. The DateEncoder is actually a collection of several scalar encoders, and the result returned by the DateEncoder class is a concatenation of the outputs of each of those encoders. You can concatenate as many encoder outputs as you need. I recommend that you view the HTM School session on Encoders if you want to understand how encoders work.
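To make the concatenation idea concrete, here is a minimal pure-Python sketch. It does not use htm.core; `encode_scalar` and `concatenate` are hypothetical toy helpers, and the sizes and bit counts are arbitrary illustration values. It only shows that each field gets its own encoder and that the outputs are joined end to end.

```python
def encode_scalar(value, minimum, maximum, size, active_bits):
    """Toy scalar encoder: a contiguous run of active bits whose
    position tracks the value between minimum and maximum."""
    value = min(max(value, minimum), maximum)  # clip to the known range
    buckets = size - active_bits + 1           # number of possible start positions
    start = int(round((value - minimum) / (maximum - minimum) * (buckets - 1)))
    return [1 if start <= i < start + active_bits else 0 for i in range(size)]


def concatenate(*encodings):
    """Join per-field bit lists into one input vector."""
    combined = []
    for bits in encodings:
        combined.extend(bits)
    return combined


# One encoder call per field of the first sample row: x, y, label.
x_bits = encode_scalar(6.5358958546461, 0.0, 10.0, size=100, active_bits=11)
y_bits = encode_scalar(1.15006943124406, 0.0, 10.0, size=100, active_bits=11)
label_bits = encode_scalar(0, 0, 1, size=22, active_bits=11)

combined = concatenate(x_bits, y_bits, label_bits)
print(len(combined), sum(combined))
```

The combined vector is simply as wide as the sum of the individual encoder widths, so adding another field means adding another encoder and another slice.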

I hope that helps.

Since I have three pieces of data (x, y, label), I should use 3 encoders, right?


Yes, you will need 3 encoders. You have your choice of RDSE or Scalar encoder.
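As a rough sketch of what three encoder entries could look like, the dictionary below gives one entry per column. The keys `x`, `y`, and `label` are hypothetical, and hotgym.py does not read them as written, so the script would need to be changed to build one encoder per entry. The `size` and `sparsity` values are copied from the snippet in the question; the `resolution` values are assumptions based on the stated 0.0 to 9.9 range. The reader function assumes the three header lines shown in the sample file.

```python
import csv
import io

# Hypothetical per-column encoder settings (assumed keys and resolutions).
encoder_params = {
    "x":     {"resolution": 0.1, "size": 700, "sparsity": 0.02},
    "y":     {"resolution": 0.1, "size": 700, "sparsity": 0.02},
    "label": {"resolution": 1.0, "size": 700, "sparsity": 0.02},
}

# The sample rows from the question, including the three header lines.
SAMPLE = """c0,c1,c2
float,float,bool
S
6.5358958546461,1.15006943124406,0
8.72474535182035,2.1233268092272,1
"""


def read_records(handle):
    """Yield (x, y, label) tuples, skipping the three header lines."""
    reader = csv.reader(handle)
    for _ in range(3):
        next(reader)
    for row in reader:
        if not row:  # ignore blank lines
            continue
        yield float(row[0]), float(row[1]), int(row[2])


records = list(read_records(io.StringIO(SAMPLE)))
for x, y, label in records:
    print(x, y, label)
```

Each tuple then supplies one value to each of the three encoders.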

This is an interesting use case for HTM, so I have moved it into its own topic in #engineering.

After looking up SEAGen, I assume the x,y represent locations of sensors floating in a liquid plane? What does the binary value represent?

You may not want to follow the hotgym example for this. You could construct a custom solution with the Network API. It’s more flexible if you want to swap out encoders or even create your own later on.

As for encoding advice, I’m not sure you really want to use an RDSE here when a standard scalar encoder would work. If you know your x/y/z min and max values ahead of time, the simplest way to encode would be best, IMO. It’s also helpful to be able to view the binary values and understand what they mean as you are debugging.
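Here is a small sketch of why a plain scalar encoding with known bounds is easy to debug. This is a toy encoder written for illustration, not the htm.core ScalarEncoder, and the width of 40 bits with 5 active is an arbitrary assumption. Printing the bits shows the active run sliding with the value, and nearby values share bits while distant ones do not.

```python
def encode_scalar(value, minimum=0.0, maximum=10.0, size=40, active_bits=5):
    """Toy scalar encoder with a known range: a sliding run of active bits."""
    value = min(max(value, minimum), maximum)  # clip to [minimum, maximum]
    buckets = size - active_bits + 1
    start = int((value - minimum) / (maximum - minimum) * (buckets - 1) + 0.5)
    return [1 if start <= i < start + active_bits else 0 for i in range(size)]


def as_text(bits):
    """Render bits so the active run is visible at a glance."""
    return "".join("1" if b else "." for b in bits)


def overlap(a, b):
    """Count positions that are active in both encodings."""
    return sum(x & y for x, y in zip(a, b))


for v in (0.0, 6.5, 6.8, 10.0):
    print(f"{v:5.1f} {as_text(encode_scalar(v))}")

print("overlap 6.5 vs 6.8: ", overlap(encode_scalar(6.5), encode_scalar(6.8)))
print("overlap 0.0 vs 10.0:", overlap(encode_scalar(0.0), encode_scalar(10.0)))
```

With a hashed encoder such as the RDSE, the active bits are scattered, so this kind of visual check is harder.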


Hi Matt, so in my research, for my dissertation, I’m setting up an experiment that compares HTM as a classifier against the following specific classifiers (ARF, DWM, LevBag, OAUE, OSBoost, VFDT) using 9 different data sets:

  • Artificial data streams: (Hyperplane, LED, RT, SEA)

  • Real-world data streams (Airlines, Elec., Forest, KDDcup, and Poker)

The SEA generator creates an artificial data stream of 3D points (x, y, z) and a label (0 or 1) for each point, based on 1 of 4 functions (att1 = x, att2 = y; the z value is ignored). x and y each range from 0.0 to 10.0.
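For readers unfamiliar with SEA, the labelling rule can be sketched as below. The thresholds (8, 9, 7, 9.5) and the "label 0 when x + y is at or below the threshold" convention are assumptions taken from the usual description of the SEA concepts, not from this thread, and the sketch applies no noise or concept drift. The two sample rows in the first post are consistent with this convention under the first threshold.

```python
import random

# Assumed thresholds for the four SEA concepts; verify against your generator.
SEA_THRESHOLDS = (8.0, 9.0, 7.0, 9.5)


def sea_label(x, y, concept=0):
    """Return 0 when x + y <= threshold, else 1 (assumed convention).
    The third attribute z plays no part in the label."""
    return 0 if x + y <= SEA_THRESHOLDS[concept] else 1


def sea_stream(n, concept=0, seed=42):
    """Yield n noise-free (x, y, z, label) tuples with attributes in [0, 10]."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = rng.uniform(0.0, 10.0)
        z = rng.uniform(0.0, 10.0)
        yield x, y, z, sea_label(x, y, concept)


for point in sea_stream(3):
    print(point)
```

Switching `concept` partway through a stream is one simple way to imitate an abrupt concept drift.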

I’m measuring the classification accuracy, time to classify 1 million data points, and the memory size of the data structure needed for the experiment.
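One way to collect these three measurements in a single pass is sketched below. The test-then-train loop is an assumption about the evaluation protocol, `MajorityClass` is a hypothetical placeholder standing in for any of the classifiers, and `tracemalloc` only sees Python-level allocations, so memory used inside a native library would need a different tool.

```python
import time
import tracemalloc


class MajorityClass:
    """Placeholder classifier: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {0: 0, 1: 0}

    def predict(self, features):
        return max(self.counts, key=self.counts.get)

    def learn(self, features, label):
        self.counts[label] += 1


def evaluate(model, stream):
    """Test-then-train over the stream, recording accuracy, time, and peak memory."""
    tracemalloc.start()
    started = time.perf_counter()
    correct = total = 0
    for features, label in stream:
        correct += int(model.predict(features) == label)  # test first
        model.learn(features, label)                       # then train
        total += 1
    elapsed = time.perf_counter() - started
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    accuracy = correct / total if total else 0.0
    return {"accuracy": accuracy, "seconds": elapsed, "peak_bytes": peak_bytes}


# Tiny made-up stream of ((x, y), label) pairs, for illustration only.
stream = [((6.5, 1.2), 0), ((8.7, 2.1), 1), ((9.0, 3.0), 1), ((9.5, 4.0), 1)]
result = evaluate(MajorityClass(), stream)
print(result)
```

Any model exposing `predict` and `learn` methods could be dropped into the same loop, which keeps the comparison across classifiers uniform.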

Current encoder dictionary setup:


If this is purely a spatial classification task, HTM will not perform well. It looks like there is no temporal structure to your test data.

That’s OK, as some of the other 8 data sets should contain temporal structure (electricity usage, airlines). I expect my report will highlight that HTM does well with those but not with the artificial data streams. Oh, and I forgot to mention that concept drift has been applied to the data streams, along with noise. My hypothesis is that HTM will do better with concept drift and noise on the real-world data.


I agree big time. I think this is one major way that HTM can shine over other approaches, especially in how fast it adapts to the emergence of new patterns.

In case you haven’t seen them, I’d recommend these papers to support that hypothesis:

If you end up with a paper on this I’ll cite you for sure!
