I’ve done some hyperparameter optimisation on a temporal prediction task (203k training rows; RMSE is measured on 105k test rows) and found that the best predictions are not necessarily obtained with the default of 2048 columns. I optimised over most of the available parameters, and here’s the result:
I can provide the dataset of all the models, but it looks like the lowest RMSE I can get for this task is ~20.
The downside of having so many columns is that the model takes longer to run. So why are we given so many columns by default?
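For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a sweep that records both RMSE and wall-clock time per column count. It is not the code I ran: `run_model` is a hypothetical callable that you would write yourself to train a model with the given number of columns and return its predictions for the test rows, and `sweep_columns` is just an illustrative name.

```python
import math
import time
from typing import Callable, Dict, Sequence, Tuple


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("need at least one value")
    squared_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared_error / len(actual))


def sweep_columns(
    run_model: Callable[[int], Sequence[float]],  # hypothetical: trains and predicts
    actual: Sequence[float],
    column_counts: Sequence[int],
) -> Dict[int, Tuple[float, float]]:
    """Map each column count to (test RMSE, seconds taken by run_model)."""
    results = {}
    for num_columns in column_counts:
        start = time.perf_counter()
        predicted = run_model(num_columns)
        elapsed = time.perf_counter() - start
        results[num_columns] = (rmse(actual, predicted), elapsed)
    return results
```

Calling something like `sweep_columns(run_model, test_targets, [512, 1024, 2048])` would then give one (RMSE, runtime) pair per setting, which makes the accuracy versus speed trade-off easy to tabulate.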