[mlpack] Cross-validation and hyper-parameter tuning infrastructure

Ryan Curtin ryan at ratml.org
Fri Apr 7 14:00:26 EDT 2017


On Fri, Apr 07, 2017 at 10:26:45AM +0500, Kirill Mishchenko wrote:
> Hi Ryan.
> 
> Right now it is hard for me to imagine how to make a grid search
> optimiser have an interface similar to that of already implemented
> optimisers like SGD, since they work in slightly different domains.
> I guess a reasonable interface for a grid search optimiser would
> allow usage like this:
> 
>   arma::mat data /* = ... */;
>   arma::Row<size_t> labels /* = ... */;
> 
>   GridSearchOptimizer<SoftmaxRegression<>, Accuracy, KFoldCV>
>       softmaxOptimizer(data, labels);
> 
>   std::array<size_t, 1> numClasses = {5};
>   arma::vec lambdas = arma::logspace(-3, 0, 4); // {0.001, 0.01, 0.1, 1}
> 
>   std::tuple<size_t, double> bestSoftmaxParams =
>       softmaxOptimizer.Optimize(numClasses, lambdas);
>   double bestSoftmaxAccuracy = softmaxOptimizer.BestMeasurement();
>   SoftmaxRegression<>& bestSoftmaxModel = softmaxOptimizer.BestModel();
> 
> Here when we call Optimize we pass a collection of possible values for
> each hyper-parameter (each additional constructor argument) of
> SoftmaxRegression<>. A similar interface of grid search for
> hyper-parameters is provided by scikit-learn:
> http://scikit-learn.org/stable/auto_examples/model_selection/grid_search_digits.html.
> The result of the Optimize call is a tuple of selected values (one
> per hyper-parameter). I think it is good to return it here, since
> only here can we deduce what type the tuple should have (if we wanted
> to provide the tuple of selected hyper-parameters through a separate
> call after optimizing, we would need to pass information about the
> tuple type when constructing a GridSearchOptimizer object).
> 
> On the other hand, SGD has the following signature for Optimize:
> 
>   double Optimize(arma::mat& iterate);
> 
> which I think can’t be easily utilized by grid search. Maybe you have some ideas?

Hi Kirill,

I was thinking at a slightly lower level---the GridSearchOptimizer you
proposed does the whole thing: it calculates the best parameters for
maximizing accuracy under k-fold cross-validation with a softmax
regression model, and returns those to you.  I had something more like
this in mind:

// Maybe the name could be better.
HyperparameterSearch<SoftmaxRegression<>, Accuracy, KFoldCV,
                     GridSearch> search(data, labels);

And then the GridSearch class could be just like an mlpack optimizer.
For instance, if all of the parameters to SoftmaxRegression<> were
real-valued (i.e. no categorical parameters), then we could provide the
same interface as SGD.  If the model does have categorical parameters,
we could provide a similar overload:

template<typename FunctionType>
class GridSearch
{
 public:
  // The constructor can take the size of the grid and the ranges in
  // which to search.
  GridSearch(const double gridSize,
             const /* not sure what type yet */ ranges);

  // This overload is for when all parameters are numeric.
  double Optimize(arma::mat& iterate);

  // This overload allows categorical parameters.
  double Optimize(arma::mat& iterate,
                  data::DatasetInfo& info);
};

And then the HyperparameterSearch class could just instantiate the
GridSearch class (or whatever optimizer) with a loss function type that
captures the model, cross-validation strategy, and goodness measure.
Then it can call Optimize() and get the best values, which it can then
return.
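
To make that a bit more concrete, here is a rough sketch of what that
loss function wrapper might look like.  Everything below is made up
for illustration---the name CVFunction, the way the iterate matrix is
decoded, and the assumption that the KFoldCV class we have been
discussing ends up with an Evaluate() method that takes hyper-parameter
values and returns the average measure over the folds:

// Hypothetical adapter: presents cross-validation as a FunctionType
// that an mlpack optimizer can consume.  The iterate matrix holds one
// candidate value per hyper-parameter.
template<typename CVType>
class CVFunction
{
 public:
  CVFunction(CVType& cv) : cv(cv) { }

  // Optimizers minimize, so negate a measure we want to maximize.
  double Evaluate(const arma::mat& iterate)
  {
    // For SoftmaxRegression<>, the iterate could hold
    // { numClasses, lambda }.
    return -cv.Evaluate(size_t(iterate[0]), iterate[1]);
  }

 private:
  CVType& cv;
};

Then Optimize() in HyperparameterSearch would boil down to building a
CVFunction, handing it to the optimizer, calling Optimize() on it, and
decoding the best iterate back into a tuple of hyper-parameter values.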

What is nice about this type of strategy is that now you could use,
e.g., simulated annealing (src/mlpack/core/optimizers/sa/) to do
hyperparameter optimization.  (That optimizer would need to be changed
to support categorical parameters too, but that change is
well-understood.)  In addition,
there are some neat AutoML optimizers that could be used here; for
instance, I am hoping to implement one called SMAC soon.
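
If that works out, swapping search strategies becomes a one-line change
of template argument.  Again, this is only a sketch of the proposed
API, not anything that exists yet:

// Grid search over the hyper-parameters of SoftmaxRegression<>.
HyperparameterSearch<SoftmaxRegression<>, Accuracy, KFoldCV,
                     GridSearch> gridSearch(data, labels);

// The same search driven by the existing simulated annealing
// optimizer (src/mlpack/core/optimizers/sa/) instead.
HyperparameterSearch<SoftmaxRegression<>, Accuracy, KFoldCV,
                     SA> saSearch(data, labels);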

I know that there are going to be problems to be worked out in what I
have written above (it is just a first sketch), but the general idea is
that it would be nice to allow different types of hyperparameter search,
and it would be even nicer if the optimizers used for hyperparameter
search were generic enough to be used in other
parts of mlpack.

I hope that this is helpful; let me know if my ideas are too crazy.  But
I think they should be possible... :)

-- 
Ryan Curtin    | "That rug really tied the room together."
ryan at ratml.org |   - The Dude

