maxCategoryCount of SDR classifier

I can find almost no explanation of the maxCategoryCount parameter of the SDR classifier. As I understand it, this is the maximum number of values that the classifier can learn to distinguish, so it seems this number should be tied to the range of values in the input data. If I have integer input data that takes no more than 10000 different values, the reasonable thing to do would be to set maxCategoryCount to 10000. Is my reasoning correct?

Good question. I’ll have to defer to @ycui, @subutai, or @scott.

The parameter is an implementation detail of the SDR classifier region. The SDR classifier doesn’t know the number of categories up front, so the region preallocates the array for the output predictions with a size taken from the maxCategoryCount parameter. That size must be at least as large as the number of categories actually used, or the array won’t be large enough to hold the output.

So just make sure you set it large enough that it provides enough space and you won’t have an issue. The exact value doesn’t actually matter beyond this.
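
To illustrate the point, here is a minimal sketch in plain Python. It is not NuPIC code: the class name `FixedSizeOutput` and its `write` method are made up for this example, and the real region may pad or fail differently. It only shows why a preallocated output needs room for every category, and why extra room is harmless.

```python
class FixedSizeOutput(object):
    """Toy stand-in for a region output buffer (hypothetical, not NuPIC)."""

    def __init__(self, max_category_count):
        # The buffer size is fixed up front, before any category is seen.
        self.max_category_count = max_category_count
        self.probabilities = [0.0] * max_category_count

    def write(self, likelihoods):
        """Copy one likelihood per category into the fixed-size buffer."""
        if len(likelihoods) > self.max_category_count:
            # More categories than slots: the output does not fit.
            raise ValueError(
                "got %d categories but the buffer holds only %d"
                % (len(likelihoods), self.max_category_count))
        # Unused slots stay at zero, so an oversized buffer does no harm.
        padding = [0.0] * (self.max_category_count - len(likelihoods))
        self.probabilities = list(likelihoods) + padding
        return self.probabilities


out = FixedSizeOutput(max_category_count=4)
out.write([0.5, 0.5])  # two categories fit, the rest is zero padding
```

In this sketch, any size at or above the number of categories behaves the same, which matches the advice above that the exact value doesn’t matter once it is large enough.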


According to the 1.0.5 docs, it looks like this parameter isn’t used anymore. I searched through the code to see how this is handled when the user doesn’t specify it, but I’m at a loss. I tried passing it to SDRClassifierFactory.create(), hoping the **kwargs would take care of it, but I get an error saying the keyword is unexpected. Does anyone know how I can specify it? Do I even need to?