Defining Category encoders for Network configuration

Quense · October 1, 2019, 5:21pm

Dear all,
Thanks for your work on the documentation and tutorials!

I’m trying to build a network based on the tutorial available here, but I’m kind of stuck. I want to build a network for anomaly detection, and I have a problem with the model configuration file (model.yaml).

I’ve looked everywhere in the documentation and the forums but cannot find any information on how to define encoders for categories (inside the YAML file). The tutorials only show how to define RandomDistributedScalarEncoder (RDSE) or DateEncoder entries, as follows:

encoders:
  kw_energy_consumption:
    fieldname: kw_energy_consumption
    name: kw_energy_consumption
    resolution: 0.88
    seed: 1
    type: RandomDistributedScalarEncoder
  timestamp_timeOfDay:
    fieldname: timestamp
    name: timestamp_timeOfDay
    timeOfDay: [21, 1]
    type: DateEncoder
  timestamp_weekend:
    fieldname: timestamp
    name: timestamp_weekend
    type: DateEncoder
    weekend: 21

How can I define a CategoryEncoder? I tried the following:

  category:
    fieldname: category
    name: bookcategory
    type: CategoryEncoder

But it seems something is missing (I guess it’s the categories, but I don’t know how to define them in this file).
Any ideas would be really appreciated, thanks!

rhyolight · October 1, 2019, 6:39pm

Try using type: SDRCategoryEncoder. Does that help?

Quense · October 2, 2019, 9:25am

Hey Matt,
thanks for your reply, you are right I needed to use SDRCategoryEncoder. (omg mom i’m on tv)

So, I managed to make it work using this syntax (for posterity, here’s how to completely define it):

fieldname: categoryE
name: categoryE
categoryList: ['CAT1', 'CAT2', 'CAT3', 'CAT4']
n: 2048
w: 21
type: SDRCategoryEncoder
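
To make the `n` and `w` parameters above concrete: the encoder produces a representation `n` bits wide in which `w` bits are active for each category. The following is a toy sketch of that idea in plain Python. It is not NuPIC's SDRCategoryEncoder algorithm; the function name `toy_category_sdr` and the seeding scheme are hypothetical and exist only for illustration.

```python
import random


def toy_category_sdr(category, category_list, n=2048, w=21, seed=1):
    """Return the sorted indices of the w active bits (out of n) for a category.

    Toy illustration only: this is NOT how NuPIC's SDRCategoryEncoder
    chooses its bits.
    """
    if category not in category_list:
        raise ValueError("unknown category: %r" % (category,))
    # Derive a per-category seed so a category always maps to the same bits.
    rng = random.Random(seed * 1000003 + category_list.index(category))
    return sorted(rng.sample(range(n), w))


categories = ["CAT1", "CAT2", "CAT3", "CAT4"]
sdr_cat1 = toy_category_sdr("CAT1", categories)
sdr_cat2 = toy_category_sdr("CAT2", categories)

# Number of active bits shared by the two categories.
overlap = len(set(sdr_cat1) & set(sdr_cat2))
print(len(sdr_cat1), overlap)
```

With `w` much smaller than `n`, two unrelated categories share few or no active bits, which is the property that lets downstream regions tell them apart.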

However, I’m getting weird behavior from the network. I’m reading a large CSV file that has roughly 70 columns (some containing category strings). I have not set an encoder for every column, as I only use some of them. Nevertheless, the FileRecordStream appears to read the entire file:
dataSource = FileRecordStream(streamID=_INPUT_FILE_PATH)

And I get the following error if strings are present in my data:

  File "detection_network.py", line 156, in <module>
    anomaly_detection(_NUM_RECORDS)
  File "detection_network.py", line 140, in anomaly_detection
    network.run(N)
  File "/home/dse/.local/lib/python2.7/site-packages/nupic/engine/__init__.py", line 673, in run
    engine_internal.Network.run(self, n)
  File "/home/dse/.local/lib/python2.7/site-packages/nupic/bindings/engine_internal.py", line 1282, in run
    return _engine_internal.Network_run(self, *args, **kwargs)
  File "/home/dse/.local/lib/python2.7/site-packages/nupic/bindings/regions/PyRegion.py", line 186, in guardedCompute
    return self.compute(inputs, DictReadOnlyWrapper(outputs))
  File "/home/dse/.local/lib/python2.7/site-packages/nupic/regions/record_sensor.py", line 430, in compute
    self.populateCategoriesOut(categories, outputs['categoryOut'])
  File "/home/dse/.local/lib/python2.7/site-packages/nupic/regions/record_sensor.py", line 357, in populateCategoriesOut
    output[i] = cat
ValueError: could not convert string to float: N

The only way to get rid of this error is to delete the columns that contain strings from the file, but then I cannot perform anomaly detection based on the categories.

Do you have any idea what’s wrong? My code is exactly the same as the HotGym example (except for the feature names and the file source).

rhyolight · October 2, 2019, 6:16pm

If you are reading from a file, you should ensure you have a header that defines the fields in the file. See docs if you haven’t already. The format is:

f1,f2,f3,....fN
int,string,datetime,bool,...
R,S,T,,,,....
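
As an illustration of the three header rows described above (field names, field types, special flags), here is a minimal sketch that writes such a file with Python's standard `csv` module. The column layout in `FIELDS` and the helper `write_records` are hypothetical and not taken from the poster's data. The sketch is written for Python 3, whereas the NuPIC install in the traceback runs on Python 2.7.

```python
import csv
import io

# Hypothetical column layout for illustration: (name, type, special flag).
# An empty flag means the column has no special role.
FIELDS = [
    ("timestamp", "datetime", "T"),
    ("kw_energy_consumption", "float", ""),
    ("categoryE", "string", ""),
]


def write_records(out, fields, rows):
    """Write the three header rows, then the data rows."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _, _ in fields])    # row 1: field names
    writer.writerow([ftype for _, ftype, _ in fields])  # row 2: field types
    writer.writerow([flag for _, _, flag in fields])    # row 3: special flags
    writer.writerows(rows)


buf = io.StringIO()
write_records(buf, FIELDS, [["2019-10-01 17:21:00", 21.2, "CAT1"]])
print(buf.getvalue())
```

Here the string column carries no flag, so it is an ordinary input field that an encoder entry can reference by name.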

Here is an example.

Quense · October 2, 2019, 6:45pm

Thank you again for your response.
I am familiar with the format of the data. However, when I use a file format as follows:

f1, f2, f3, ..., fn
int, string, string, bool, ...
R, S, T, **C**, ...

Then it does not work. The problem appears to come from the category metatype (C) field. I think the problem lies in the Network API, because I managed to make it work by following this anomaly detection example while keeping the category metatype field.

I’d rather use the Network API, as I find it more convenient for building the network with other features.

I have no idea where to go from there, though.
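
One thing that may be worth trying, offered as an assumption rather than a confirmed fix: the `populateCategoriesOut` frame in the traceback suggests that a column flagged `C` is written to the sensor's numeric `categoryOut` output, which would explain why a string value fails to convert to float. If that is the cause, clearing the `C` flag on string columns would leave them as ordinary fields for the SDRCategoryEncoder to handle. The sketch below rewrites only the flags row of such a file; the helper name `clear_category_flags` is hypothetical, and the code targets Python 3.

```python
import csv
import io


def clear_category_flags(src, dst):
    """Copy a three-header-row CSV, blanking the 'C' flag on string columns."""
    reader = csv.reader(src, skipinitialspace=True)
    writer = csv.writer(dst, lineterminator="\n")
    names = next(reader)  # row 1: field names
    types = next(reader)  # row 2: field types
    flags = next(reader)  # row 3: special flags
    # Assumption: only string-typed columns flagged 'C' trigger the error.
    new_flags = [
        "" if (flag.strip() == "C" and ftype.strip() == "string") else flag.strip()
        for ftype, flag in zip(types, flags)
    ]
    writer.writerow(names)
    writer.writerow(types)
    writer.writerow(new_flags)
    for row in reader:
        writer.writerow(row)  # data rows are copied unchanged


src = io.StringIO("f1, f2, f3\nint, string, string\nR, , C\n1, a, N\n")
dst = io.StringIO()
clear_category_flags(src, dst)
print(dst.getvalue())
```

Whether the network then behaves as intended would still need to be checked against the actual data; this only removes the flag that appears to route the column to `categoryOut`.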
