Columns > 2048 when swarming


I was recently watching some of the videos on multivariate swarming. One of the suggestions for using multiple inputs is to use more than the default of 2048 columns, along with larger data sets. My data set is on the order of millions of rows, and I would like to swarm over 4 or 5 different fields. I can see where the number of columns is specified, but I cannot figure out how to change it for swarming purposes. I looked at the schema for the swarm_config.json file and didn’t see that variable anywhere.

Can anyone help shed some light on how to manipulate the number of columns during the swarming process?


Phil Elsasser


@pelsasser Sorry for the late response. The lower-level swarming interface described by @scott here will probably give you the flexibility you need, but running those swarms is going to take serious time and computing power. Correct me if I’m wrong, @alavin, but the SP should be able to handle an input space that is larger than the number of columns.
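One practical consequence: the column count lives in the OPF model parameters (under `spParams`/`tmParams`), not in the swarm description, so you can take the parameters a swarm produces and raise the column count before building the model. A minimal sketch, assuming the usual model-params layout (the exact key names should be checked against your NuPIC version; the dict below is a stand-in, not real swarm output):

```python
import copy

def set_column_count(model_params, column_count):
    """Return a copy of OPF model params with a new SP/TM column count."""
    params = copy.deepcopy(model_params)
    mp = params["modelParams"]
    mp["spParams"]["columnCount"] = column_count
    # The temporal memory must agree with the SP on column count.
    mp["tmParams"]["columnCount"] = column_count
    return params

# Minimal stand-in for swarm-generated params (structure assumed).
swarmed_params = {
    "modelParams": {
        "spParams": {"columnCount": 2048, "numActiveColumnsPerInhArea": 40},
        "tmParams": {"columnCount": 2048, "cellsPerColumn": 32},
    }
}

bigger_params = set_column_count(swarmed_params, 4096)
```

Deep-copying keeps the original swarm output intact, so you can build several models with different column counts from the same swarm run.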

Since you mention that you have multiple fields that might contribute to the result, I should point out that I've often thought the same thing in the past, only to find that none of those fields were chosen by the swarming algorithm to be encoded, meaning the swarming process did not think including them led to a better prediction.
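For reference, this is roughly what declaring several candidate fields looks like in a swarm description. Listing a field under `includedFields` only makes it available to the swarm; the swarm may still leave it out of the final model if it does not improve the prediction. All field names and the file path here are placeholders:

```python
# Hypothetical swarm description with several candidate fields.
swarm_description = {
    "includedFields": [
        {"fieldName": "timestamp", "fieldType": "datetime"},
        {"fieldName": "consumption", "fieldType": "float"},
        {"fieldName": "temperature", "fieldType": "float"},
        {"fieldName": "humidity", "fieldType": "float"},
    ],
    "streamDef": {
        "info": "my_data",
        "version": 1,
        "streams": [
            {
                "info": "my_data.csv",
                "source": "file://my_data.csv",
                "columns": ["*"],
            }
        ],
    },
    "inferenceType": "TemporalMultiStep",
    "inferenceArgs": {
        "predictionSteps": [1],
        "predictedField": "consumption",
    },
    "swarmSize": "medium",
}
```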

This discussion might need some details about the data you are analyzing for us to better understand how to help.

Yes, I’d suggest using a representative subset of your data to swarm over.

You are correct :slight_smile:

It’s very common (in ML in general) to find parameters of your dataset to be insignificant. Swarming can help identify if this is the case.


You can limit the number of records in the file.
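One simple way to do that: copy the header rows plus the first N data rows into a smaller file and point the swarm's stream definition at it. The sketch below assumes the NuPIC CSV layout with three header rows (field names, types, flags); adjust `header_rows` if your file differs:

```python
import csv

def write_subset(src_path, dst_path, n_rows, header_rows=3):
    """Copy header_rows plus the first n_rows data rows of a CSV."""
    with open(src_path, "r") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for i, row in enumerate(reader):
            if i >= header_rows + n_rows:
                break
            writer.writerow(row)
```

If your data has daily or weekly patterns, make sure the subset still spans a few full cycles rather than just the first chunk of rows, so the swarm sees representative behavior.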