Defining Category encoders for Network configuration

Dear all,
Thanks for your work on the documentation and tutorials!

However, I’m trying to build a network based on the tutorial available here, and I’m kind of stuck. I want to build a network for anomaly detection, and I have a problem with the model configuration file (model.yaml).

I’ve looked everywhere in the documentation and on the forums but cannot find any information on how to define encoders for categories inside the YAML file. The tutorials only show how to build RDSE (RandomDistributedScalarEncoder) or DateEncoder encoders, as follows:

encoders:
  kw_energy_consumption:
    fieldname: kw_energy_consumption
    name: kw_energy_consumption
    resolution: 0.88
    seed: 1
    type: RandomDistributedScalarEncoder
  timestamp_timeOfDay:
    fieldname: timestamp
    name: timestamp_timeOfDay
    timeOfDay: [21, 1]
    type: DateEncoder
  timestamp_weekend:
    fieldname: timestamp
    name: timestamp_weekend
    type: DateEncoder
    weekend: 21

How can I define a CategoryEncoder? I tried the following:

  category:
    fieldname: category
    name: bookcategory
    type: CategoryEncoder

But it seems something is missing. I guess it’s the categories, but I don’t know how to define them in this file.
Any ideas would be really appreciated, thanks!

Try using type: SDRCategoryEncoder. Does that help?

Hey Matt,
Thanks for your reply. You are right, I needed to use SDRCategoryEncoder. (omg mom i’m on tv)

So, I managed to make it work using this syntax (for posterity, here’s how to completely define it):

fieldname: categoryE
name: categoryE
categoryList: ['CAT1', 'CAT2', 'CAT3', 'CAT4']
n: 2048
w: 21
type: SDRCategoryEncoder
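
For anyone checking their data against a definition like this, here is a minimal sketch in plain Python (no NuPIC imports) that mirrors the parameters above as a dict and reports any values in a column that are not in categoryList. The helper name `unlisted_categories` and the sample column values are hypothetical, made up for illustration.

```python
# Same parameters as the definition above, written as a plain Python dict.
encoder_params = {
    "fieldname": "categoryE",
    "name": "categoryE",
    "categoryList": ["CAT1", "CAT2", "CAT3", "CAT4"],
    "n": 2048,
    "w": 21,
    "type": "SDRCategoryEncoder",
}


def unlisted_categories(values, params):
    """Return, sorted, the values that do not appear in params['categoryList']."""
    known = set(params["categoryList"])
    return sorted(set(values) - known)


# Hypothetical values read from the categoryE column of the input file.
column = ["CAT1", "CAT3", "CAT9", "CAT2", "CAT9"]
missing = unlisted_categories(column, encoder_params)
print(missing)
```

Running a check like this before building the network makes it easier to spot labels that were left out of the list.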

However, I’m getting weird behavior from the network. I’m reading a large CSV file that has roughly 70 columns (some containing category strings). I have not set an encoder for every column, as I only use some of them. Nevertheless, the FileRecordStream appears to read the entire file:
dataSource = FileRecordStream(streamID=_INPUT_FILE_PATH)

And I get the following error if strings are present in my data:

  File "detection_network.py", line 156, in <module>
    anomaly_detection(_NUM_RECORDS)
  File "detection_network.py", line 140, in anomaly_detection
    network.run(N)
  File "/home/dse/.local/lib/python2.7/site-packages/nupic/engine/__init__.py", line 673, in run
    engine_internal.Network.run(self, n)
  File "/home/dse/.local/lib/python2.7/site-packages/nupic/bindings/engine_internal.py", line 1282, in run
    return _engine_internal.Network_run(self, *args, **kwargs)
  File "/home/dse/.local/lib/python2.7/site-packages/nupic/bindings/regions/PyRegion.py", line 186, in guardedCompute
    return self.compute(inputs, DictReadOnlyWrapper(outputs))
  File "/home/dse/.local/lib/python2.7/site-packages/nupic/regions/record_sensor.py", line 430, in compute
    self.populateCategoriesOut(categories, outputs['categoryOut'])
  File "/home/dse/.local/lib/python2.7/site-packages/nupic/regions/record_sensor.py", line 357, in populateCategoriesOut
    output[i] = cat
ValueError: could not convert string to float: N

The only way to get rid of this error is to delete the columns that contain strings from the file, but then I cannot perform anomaly detection based on the categories.

Do you have any idea what’s wrong? My code is exactly the same as the HotGym example (except for the feature names and the file source).
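
A note on the traceback, offered as a reading rather than a confirmed diagnosis: the last frame assigns each category value into `outputs['categoryOut']`, and the error reports that the string N could not be converted to a float. That suggests the sensor expects numeric values in whichever column it treats as the category. The sketch below uses only plain Python (no NuPIC calls) to reproduce the same kind of conversion failure and to show one possible workaround: mapping string labels to integer indices before the file is written. The helper name `build_category_index` and the sample values are hypothetical.

```python
def build_category_index(labels):
    """Map each distinct label to an integer, in order of first appearance."""
    index = {}
    for label in labels:
        if label not in index:
            index[label] = len(index)
    return index


raw_column = ["N", "Y", "N", "N", "Y"]  # hypothetical string labels

# Converting a label directly raises the same kind of ValueError as above.
try:
    float(raw_column[0])
    direct_conversion_ok = True
except ValueError:
    direct_conversion_ok = False

# Replace each label with its integer index; these convert to float cleanly.
index = build_category_index(raw_column)
encoded = [index[label] for label in raw_column]
as_floats = [float(value) for value in encoded]

print(direct_conversion_ok, index, as_floats)
```

Whether integer indices are the right fix depends on how the column is meant to be used, so treat this as a starting point for debugging rather than a confirmed solution.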

If you are reading from a file, you should ensure you have a header that defines the fields in the file. See docs if you haven’t already. The format is:

f1,f2,f3,....fN
int,string,datetime,bool,...
R,S,T,,,,....

Here is an example.
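
To make the three-row header concrete, here is a minimal sketch that reads it with Python's standard `csv` module. It does not use FileRecordStream itself. The field names, types, flags, and data rows in `SAMPLE` are assumptions made up for illustration, and `read_header` is a hypothetical helper.

```python
import csv
import io

# Hypothetical file contents: row 1 = field names, row 2 = field types,
# row 3 = special flags (empty when a field has no flag), then the data.
SAMPLE = u"""timestamp,categoryE,kw_energy_consumption
datetime,string,float
T,,
2010-07-02 00:00:00,CAT1,21.2
2010-07-02 01:00:00,CAT3,16.4
"""


def read_header(stream):
    """Return (names, types, flags) from the first three rows of a CSV stream."""
    reader = csv.reader(stream)
    names = next(reader)
    types = next(reader)
    flags = next(reader)
    if not (len(names) == len(types) == len(flags)):
        raise ValueError("header rows must all have the same number of columns")
    return names, types, flags


names, types, flags = read_header(io.StringIO(SAMPLE))
for name, field_type, flag in zip(names, types, flags):
    print("%-24s type=%-10s flag=%r" % (name, field_type, flag))
```

Printing the three rows side by side is a quick way to confirm that each column has the type and flag you intended.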

Thank you again for your response.
I am familiar with the format of the data. However, when I use a file format as follows:

f1, f2, f3, ..., fn
int, string, string, bool, ...
R, S, T, **C**, ...

Then it does not work. The problem appears to come from the Category metatype (C) field. I think the problem comes from the Network API, because I managed to make it work by following this anomaly detection example while keeping the category metatype field.

I’d rather use the Network API, as I find it more convenient for building the network with other features.

I have no idea where to go from there, though.
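
One thing that might be worth trying, as an untested idea: since the failure only shows up when a string column carries the C flag, clear that flag in the third header row so the column is handled only by its own encoder. The sketch below does this with the standard `csv` module and in-memory streams. It makes no NuPIC calls, and the function name `copy_without_flag` and the sample contents are hypothetical.

```python
import csv
import io


def copy_without_flag(src, dst, flag="C"):
    """Copy CSV rows from src to dst, blanking `flag` in the third (flags) row."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row_number, row in enumerate(reader):
        if row_number == 2:  # rows are 0-indexed; the third row holds the flags
            row = ["" if cell.strip() == flag else cell for cell in row]
        writer.writerow(row)


# Hypothetical input: the categoryE column is flagged as the category (C).
source = io.StringIO(
    u"timestamp,categoryE,value\n"
    u"datetime,string,float\n"
    u"T,C,\n"
    u"2010-07-02 00:00:00,CAT1,21.2\n"
)
result = io.StringIO()
copy_without_flag(source, result)
print(result.getvalue())
```

Only the flags row is altered, so the data rows and the encoder definitions can stay as they are while you test whether the error goes away.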