[mlpack] Cross-validation and hyper-parameter tuning infrastructure

Kirill Mishchenko ki.mishchenko at gmail.com
Mon Apr 10 02:13:50 EDT 2017


Hi Ryan,

I think I’m starting to see your perspective on how the grid search optimiser should be implemented, but some concerns remain.
1. Some information (precision) can be lost in conversions between integer and floating-point values (e.g., when encoding a size_t value into a cell of an arma::mat). It is unlikely to matter in practice, since it requires very large integer values, but it should be mentioned anyway; a small sketch after this list shows the effect.
2. Constructors of machine learning algorithms (models) take other argument types besides numeric types and data::DatasetInfo. These include the WeakLearnerType template in AdaBoost, the CategoricalSplitType and NumericSplitType templates in HoeffdingTree, a std::unordered_map<size_t, std::pair<size_t, size_t>>* in HoeffdingTree, and an arma::mat in LARS. Non-numeric argument types may also appear in constructors of new machine learning algorithms.
3. In the case of hyper-parameter tuning, I guess the loss function should be a wrapper around a cross-validation class (we want to optimize performance on validation sets). But it is not clear what kind of interface it should provide: DecomposableFunctionType (as for SGD) or FunctionType (as for SA or GradientDescent; all prerequisites for the latter can potentially be combined in one class). A rough sketch of the FunctionType option also follows this list.
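
To make the first point concrete, here is a small sketch (plain Armadillo and the standard library, not mlpack code) of a size_t value that no longer round-trips after being stored in an arma::mat cell:

  #include <armadillo>
  #include <cstddef>
  #include <iostream>

  int main()
  {
    // 2^53 + 1 is the first integer a 64-bit double cannot represent
    // exactly (this assumes a 64-bit size_t).
    const size_t original = (size_t(1) << 53) + 1;

    arma::mat grid(1, 1);
    grid(0, 0) = original;                      // size_t -> double (rounds)
    const size_t decoded = size_t(grid(0, 0));  // double -> size_t

    std::cout << "original = " << original << ", decoded = " << decoded
              << ", precision lost: " << std::boolalpha
              << (original != decoded) << std::endl;
  }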
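
And to illustrate what I mean by the FunctionType option in the third point, the wrapper could look roughly like this (just a sketch: CVType and its Evaluate method are placeholders for something like KFoldCV, not existing mlpack code):

  #include <armadillo>
  #include <cstddef>

  template<typename CVType>
  class CVFunction
  {
   public:
    CVFunction(CVType& cv) : cv(cv) { }

    // FunctionType-style interface, as SA or GradientDescent expect: each
    // element of `iterate` encodes one hyper-parameter, and the return
    // value is a loss to minimize, e.g. the negated validation accuracy.
    double Evaluate(const arma::mat& iterate)
    {
      const size_t numClasses = size_t(iterate(0));  // see concern 1
      const double lambda = iterate(1);
      return -cv.Evaluate(numClasses, lambda);
    }

   private:
    CVType& cv;
  };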

Best regards,

Kirill Mishchenko

> On 7 Apr 2017, at 23:00, Ryan Curtin <ryan at ratml.org> wrote:
> 
> On Fri, Apr 07, 2017 at 10:26:45AM +0500, Kirill Mishchenko wrote:
>> Hi Ryan.
>> 
>> So far it is hard for me to imagine how to make a grid search
>> optimiser have an interface similar to already implemented optimisers
>> like SGD, since they work in slightly different domains. I guess a
>> reasonable interface for a grid search optimiser would allow usage
>> like this:
>> 
>>  arma::mat data /* = ... */;
>>  arma::Row<size_t> labels /* = ... */;
>> 
>>  GridSearchOptimizer<SoftmaxRegression<>, Accuracy, KFoldCV>
>>      softmaxOptimizer(data, labels);
>> 
>>  std::array<size_t, 1> numClasses = {5};
>>  arma::vec lambdas = arma::logspace(-3, 0, 4); // {0.001, 0.01, 0.1, 1}
>> 
>>  std::tuple<size_t, double> bestSoftmaxParams =
>>      softmaxOptimizer.Optimize(numClasses, lambdas);
>>  double bestSoftmaxAccuracy = softmaxOptimizer.BestMeasurement();
>>  SoftmaxRegression<>& bestSoftmaxModel = softmaxOptimizer.BestModel();
>> 
>> Here when we call Optimize we pass a collection of possible values for
>> each hyper-parameter (each additional constructor argument) of
>> SoftmaxRegression<>. A similar interface of grid search for
>> hyper-parameters is provided by scikit-learn:
>> http://scikit-learn.org/stable/auto_examples/model_selection/grid_search_digits.html.
>> The result of the Optimize call is a tuple of selected values (one per
>> hyper-parameter) - I think it is good to return it here since only
>> here we can deduce what type the tuple should have (if we want to be
>> able to provide the tuple of selected hyper-parameters by another call
>> after optimizing, we need to pass information about the tuple type
>> when constructing a GridSearchOptimizer object).
>> 
>> On the other hand, SGD has the following signature for Optimize:
>> 
>>  double Optimize(arma::mat& iterate);
>> 
>> which I think can’t be easily utilized by grid search. Maybe you have some ideas?
> 
> Hi Kirill,
> 
> I think I was thinking at a slightly lower level---the
> GridSearchOptimizer you proposed does the whole thing: it calculates the
> best parameters to use to maximize accuracy using k-fold
> cross-validation with a softmax regression model, and returns those to
> you.  I was thinking more along these lines:
> 
> // Maybe the name could be better.
> HyperparameterSearch<SoftmaxRegression<>, Accuracy, KFoldCV,
>                     GridSearch> search(data, labels);
> 
> And then the GridSearch class could be just like an mlpack optimizer.
> For instance if all of the parameters to SoftmaxRegression<> were
> real-valued (i.e. no categorical parameters), then we could provide the
> same interface as SGD.  If it does have categorical parameters, we could
> give a similar overload.
> 
> template<typename FunctionType>
> class GridSearch
> {
>  // Constructor can take the size of the grid and the ranges in which
>  // to search.
>  GridSearch(const double gridSize,
>             const /* not sure what type yet */ ranges);
> 
>  // This is for when all parameters are numeric.
>  double Optimize(arma::mat& iterate);
> 
>  // This allows categorical parameters.
>  double Optimize(arma::mat& iterate,
>                  data::DatasetInfo& info);
> };
> 
> And then the HyperparameterSearch class could just instantiate the
> GridSearch class (or whatever optimizer) with a loss function type that
> captures the model, cross-validation strategy, and goodness measure.
> Then it can call Optimize() and get the best values, which it can then
> return.
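> 
> A very rough sketch of that wiring (all names here are placeholders, not
> anything that exists in mlpack yet) might look like:
> 
> // Loss function capturing the model, CV strategy, and goodness measure.
> template<typename MLAlgorithm, typename Measure, typename CVType>
> class CVLoss
> {
>  public:
>   CVLoss(CVType& cv) : cv(cv) { }
> 
>   // Run cross-validation for the hyper-parameters encoded in `iterate`.
>   double Evaluate(const arma::mat& iterate);
> 
>  private:
>   CVType& cv;
> };
> 
> template<typename MLAlgorithm, typename Measure, typename CVType,
>          template<typename> class OptimizerType>
> class HyperparameterSearch
> {
>  public:
>   HyperparameterSearch(const arma::mat& data, const arma::Row<size_t>& labels)
>     : cv(data, labels) { }
> 
>   // Instantiate the optimizer with the CV-based loss and run it; the
>   // selected hyper-parameter values end up in `bestParams`.
>   double Optimize(arma::mat& bestParams)
>   {
>     CVLoss<MLAlgorithm, Measure, CVType> loss(cv);
>     OptimizerType<CVLoss<MLAlgorithm, Measure, CVType>> optimizer(loss);
>     return optimizer.Optimize(bestParams);
>   }
> 
>  private:
>   CVType cv;
> };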
> 
> What is nice about this type of strategy is that now you could use,
> e.g., simulated annealing (src/mlpack/core/optimizers/sa/) to do
> hyperparameter optimization.  (That would need to be changed to support
> categoricals too, but that is well understood.)  In addition,
> there are some neat AutoML optimizers that could be used here; for
> instance, I am hoping to implement one called SMAC soon.
> 
> I know that there are going to be problems to be worked out in what I
> have written above (it is just a first sketch), but the general idea is
> that it would be nice to allow different types of hyperparameter search,
> and it would be even nicer if the optimizers being used for
> hyperparameter search could be generic enough to be used in other
> parts of mlpack.
> 
> I hope that this is helpful; let me know if my ideas are too crazy.  But
> I think they should be possible... :)
> 
> -- 
> Ryan Curtin    | "That rug really tied the room together."
> ryan at ratml.org |   - The Dude


