HTM to process 2D SEAGen data

barnettjv · March 29, 2020, 6:09am

Hi David, can you please help me? I am trying to encode SEAGen data, which maps x,y float coordinates to an integer label of either 0 or 1. I’m using the htm.core community fork with Python 3.7. I’m currently modifying the hotgym.py file to take my csv file…
c0,c1,c2
float,float,bool
S
6.5358958546461,1.15006943124406,0
8.72474535182035,2.1233268092272,1
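
A minimal sketch of reading a file laid out like the sample above, assuming it has three header rows before the data (as the sample shows) and that every data row is x, y, label. The name `read_rows` and the inline `SAMPLE` string are made up for illustration; the header rows are skipped, so their exact content does not matter here.

```python
import csv
import io

# Inline copy of the sample rows so the sketch runs without a file on disk.
SAMPLE = """c0,c1,c2
float,float,bool
S
6.5358958546461,1.15006943124406,0
8.72474535182035,2.1233268092272,1
"""


def read_rows(handle, header_rows=3):
    """Return a list of (x, y, label) tuples, skipping the header rows."""
    reader = csv.reader(handle)
    for _ in range(header_rows):
        next(reader)  # names, types, and the third header row
    rows = []
    for record in reader:
        if not record:  # tolerate blank lines
            continue
        x, y, label = record
        rows.append((float(x), float(y), int(label)))
    return rows


rows = read_rows(io.StringIO(SAMPLE))
```

For a real file, `read_rows(open(path, newline=""))` would take the place of the `io.StringIO` wrapper.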

I can’t find any example suitable for just primitive data types …

inside hotgym.py …

'enc': {
    "value":
        {'resolution': 0.88, 'size': 700, 'sparsity': 0.02},
    "time":
        {'timeOfDay': (30, 1), 'weekend': 21}
},

Can you help me format the encoder definition for float, float, bool? The bool is the prediction field.

barnettjv · March 29, 2020, 6:38am

Btw, as far as resolution goes, I only care about the 0.0 - 9.9 range.

David_Keeney · March 29, 2020, 1:41pm

I can’t find any example suitable for just primitive data types …

The RDSE and ScalarEncoder are both encoders for a scalar number. That is a primitive data type. A category can be represented by a numeric index, so the RDSE or ScalarEncoder will work. A Boolean can be represented by the numbers 0 and 1, so it can also be encoded this way. Encoding an array of values would require an encoder for each element in the array.

You have two primitive values to encode, so you will need two encoders. In the hotgym example there are also two values, consumption and time. Each has its own encoder and the results are concatenated. The DateEncoder is actually a collection of several scalar encoders, and the result returned by the DateEncoder class is a concatenation of the outputs of each of those encoders. You can concatenate as many encoder outputs as you need. I recommend that you view the HTM School session on Encoders if you want to understand how encoders work.
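
The idea of one encoder per field with concatenated outputs can be sketched in plain Python. This is not the htm.core RDSE or ScalarEncoder implementation; `encode_scalar`, `encode_row`, and the sizes used (100 bits per coordinate, 11 active bits) are assumptions picked only to illustrate the layout. The label is included as a third field, as discussed further down the thread.

```python
def encode_scalar(value, minimum, maximum, size, active_bits):
    """Toy scalar encoder: a contiguous run of 1s whose position tracks value."""
    if not minimum <= value <= maximum:
        raise ValueError("value outside the encoder range")
    n_positions = size - active_bits + 1  # possible start offsets for the run
    fraction = (value - minimum) / (maximum - minimum)
    start = int(round(fraction * (n_positions - 1)))
    bits = [0] * size
    for i in range(start, start + active_bits):
        bits[i] = 1
    return bits


def encode_row(x, y, label):
    """Encode each field with its own encoder, then concatenate the outputs."""
    x_bits = encode_scalar(x, 0.0, 10.0, size=100, active_bits=11)
    y_bits = encode_scalar(y, 0.0, 10.0, size=100, active_bits=11)
    # A 0/1 label is just a scalar with two possible values.
    label_bits = encode_scalar(float(label), 0.0, 1.0, size=22, active_bits=11)
    return x_bits + y_bits + label_bits


row_bits = encode_row(6.5358958546461, 1.15006943124406, 0)
```

With these sizes the two label values land on disjoint halves of the 22-bit label segment, so 0 and 1 share no active bits.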

I hope that helps.

barnettjv · March 30, 2020, 2:59am

Since I have three pieces of data (x,y,label), I should use 3 encoders, right?

David_Keeney · March 31, 2020, 2:20pm

Since I have three pieces of data (x,y,label), I should use 3 encoders, right?

Yes, you will need 3 encoders. You have your choice of RDSE or Scalar encoder.

rhyolight · March 31, 2020, 5:04pm

This is an interesting use-case of HTM, so I have moved it into its own topic in Engineering.

After looking up SEAGen, I assume the x,y represent locations of sensors floating in a liquid plane? What does the binary value represent?

You may not want to follow the hotgym example for this. You could construct a custom solution with the Network API. It’s more flexible if you want to swap out encoders or even create your own later on.

As for encoding advice, I’m not sure you really want to use an RDSE here when a standard scalar encoder would work. If you know your x/y/z min and max values ahead of time, the simplest way to encode would be best, IMO. It’s also helpful to be able to view the binary values and understand what they mean as you are debugging.
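
To illustrate why a plain scalar encoding is easy to inspect, here is a small sketch with known min and max values. It is a toy encoder written for this example, not the htm.core ScalarEncoder; the names `encode_scalar`, `show`, and `overlap` and the 40-bit size are assumptions.

```python
def encode_scalar(value, minimum, maximum, size, active_bits):
    """Toy scalar encoder: a contiguous run of 1s whose position tracks value."""
    n_positions = size - active_bits + 1
    fraction = (value - minimum) / (maximum - minimum)
    start = int(round(fraction * (n_positions - 1)))
    return [1 if start <= i < start + active_bits else 0 for i in range(size)]


def show(bits):
    """Render the bits as a string so the active run can be read by eye."""
    return "".join("1" if b else "." for b in bits)


def overlap(a, b):
    """Count the positions where both encodings are active."""
    return sum(x & y for x, y in zip(a, b))


near_a = encode_scalar(6.5, 0.0, 10.0, size=40, active_bits=7)
near_b = encode_scalar(6.8, 0.0, 10.0, size=40, active_bits=7)
far = encode_scalar(1.2, 0.0, 10.0, size=40, active_bits=7)

for name, bits in (("6.5", near_a), ("6.8", near_b), ("1.2", far)):
    print(name, show(bits))
print("overlap 6.5 vs 6.8:", overlap(near_a, near_b))
print("overlap 6.5 vs 1.2:", overlap(near_a, far))
```

Nearby values share most of their active bits while distant values share none, and the printed strings make that visible directly.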

barnettjv · March 31, 2020, 5:37pm

Hi Matt, so in my research, for my dissertation, I’m setting up an experiment that compares HTM as a classifier against the following specific classifiers (ARF, DWM, LevBag, OAUE, OSBoost, VFDT) using 9 different data sets:

Artificial data streams: (Hyperplane, LED, RT, SEA)
Real-world data streams (Airlines, Elec., Forest, KDDcup, and Poker)

The SEA generator creates an artificial data stream of 3D points (x,y,z) and a label (0 or 1) for each point based on 1 of 4 functions (att1 = x, att2 = y, ignores z value). x and y each range from 0.0 to 10.0.
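
A rough sketch of what such a labelling function could look like. The threshold of 8.0 and the direction of the comparison are assumptions, chosen only because they agree with the two sample rows posted at the top of the thread; check the documentation of the generator you use for the actual four functions. Noise and concept drift are not modelled, and `sea_label` and `sea_stream` are made-up names.

```python
import random


def sea_label(x, y, threshold=8.0):
    """Label a point from its first two attributes; z is ignored.

    ASSUMPTION: label 0 when x + y <= threshold, otherwise 1.
    """
    return 0 if x + y <= threshold else 1


def sea_stream(n, threshold=8.0, seed=0):
    """Yield n ((x, y, z), label) pairs with attributes uniform in [0, 10]."""
    rng = random.Random(seed)
    for _ in range(n):
        x, y, z = (rng.uniform(0.0, 10.0) for _ in range(3))
        yield (x, y, z), sea_label(x, y, threshold)


sample = list(sea_stream(5))
```

Switching `threshold` partway through a stream would be one simple way to imitate a change of concept.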

I’m measuring the classification accuracy, time to classify 1 million data points, and the memory size of the data structure needed for the experiment.
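
The three measurements could be collected with a harness along these lines. This is only a sketch: it assumes a test-then-train loop (predict each point, then learn from it), which the post does not specify, and it measures peak Python allocations with `tracemalloc` rather than the size of any particular data structure. `MajorityClass` is a placeholder baseline, not HTM or any of the listed classifiers.

```python
import time
import tracemalloc


class MajorityClass:
    """Placeholder classifier: always predicts the most frequent label so far."""

    def __init__(self):
        self.counts = {0: 0, 1: 0}

    def predict(self, features):
        return max(self.counts, key=self.counts.get)

    def learn(self, features, label):
        self.counts[label] += 1


def evaluate_stream(rows, model):
    """Return accuracy, elapsed seconds, and peak traced bytes for one pass."""
    tracemalloc.start()
    started = time.perf_counter()
    correct = total = 0
    for features, label in rows:
        if model.predict(features) == label:  # test first ...
            correct += 1
        model.learn(features, label)  # ... then train
        total += 1
    elapsed = time.perf_counter() - started
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    accuracy = correct / total if total else 0.0
    return {"accuracy": accuracy, "seconds": elapsed, "peak_bytes": peak}


stream = [((1.0, 2.0), 0), ((9.0, 3.0), 1), ((8.0, 4.0), 1), ((7.0, 5.0), 1)]
result = evaluate_stream(stream, MajorityClass())
```

Any classifier exposing `predict` and `learn` methods with these signatures could be dropped in place of the baseline.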

Current encoder dictionary setup:

rhyolight · March 31, 2020, 5:43pm

If this is purely a spatial classification task, HTM will not perform well. It looks like there is no temporal structure to your test data.

barnettjv · March 31, 2020, 5:48pm

That’s OK, as some of the other 8 data sets should contain temporal structure (electricity usage, airline). I expect my report will highlight that HTM does well with those but not well on artificial data streams. Oh, and I forgot to mention that concept drift has been applied to the data streams along with noise. My hypothesis is that HTMs will do better with concept drift and noise for the real-world data.

sheiser1 · March 31, 2020, 7:01pm

I agree big time. I think this is one major way that HTM can shine over other approaches, especially how fast it adapts to the emergence of new patterns.

In case you haven’t seen I’d recommend these papers to support that hypothesis:

If you end up with a paper on this I’ll cite you for sure!
