[mlpack] Cross-validation and hyper-parameter tuning infrastructure

Kirill Mishchenko ki.mishchenko at gmail.com
Sun Apr 16 23:57:21 EDT 2017


Hi Ryan.

> - Use template metaprogramming tricks to, given a type, expand all of
>   its constructor arguments into a list of numeric types.  So say we
>   had:
> 
>     Learner(double a, AuxType b)
>     AuxType(double c, double d)
> 
>   we would ideally want to extract [double, double, double] as our list
>   of types.  I can't quickly think of a strategy for this but it
>   *might* be possible...


Even if we are able to implement this approach, I suspect the usage will be quite unintuitive. The implementation (if it is possible at all) will force a user to pass AuxType constructor arguments into the hyper-parameter tuning module instead of AuxType objects themselves, since we can't extract constructor arguments from an already-constructed object.

> - Refactor all classes that take an auxiliary class to instead take a
>   template parameter pack to be unpacked into the auxiliary classes'
>   constructors.  This will still be a fair amount of metaprogramming
>   effort but I can see a closer route to a solution with this one.


For this implementation the usage should be more understandable for users (in this solution we provide a constructor of MLAlgorithm that takes the arguments we are going to pass to the hyper-parameter tuning module), even though it is still quite complex (we need to pass AuxType constructor arguments instead of AuxType objects themselves in the first place). But what are we going to do when AuxType is std::unordered_map<size_t, std::pair<size_t, size_t>>* or arma::mat (these appear in HoeffdingTree and LARS respectively)?
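
For concreteness, here is a minimal sketch of how the second approach could look for the Learner/AuxType pair from your example (the member layout, the forwarding constructor, and the usage line are just my assumption of what the refactoring would produce):

  #include <utility>

  struct AuxType
  {
    AuxType(double c, double d) : c(c), d(d) { }
    double c, d;
  };

  struct Learner
  {
    // The parameter pack is forwarded to the AuxType constructor, so the
    // call site deals only with numeric arguments.
    template<typename... AuxArgs>
    Learner(double a, AuxArgs&&... auxArgs) :
        a(a), b(std::forward<AuxArgs>(auxArgs)...) { }

    double a;
    AuxType b;
  };

  // Usage: b is constructed as AuxType(2.0, 3.0).
  Learner learner(1.0, 2.0, 3.0);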

>> 3. In the case of hyper-parameter tuning I guess a loss function
>> should be a wrapper for a cross validation class (we want to optimize
>> performance on validation sets). But it is not clear what type of
>> interface it should provide: DecomposableFunctionType (like for SGD)
>> or FunctionType (like for SA or GradientDescent, all prerequisites for
>> which can potentially be combined in one class).
> 
> I'm not sure I fully follow here, can you clarify?

Existing optimizers in mlpack take a FunctionType object as the first argument of their constructors. The requirements on what the FunctionType object should implement differ depending on the optimizer type. For instance, for SGD the FunctionType object should have the following method signature:

  double Evaluate(const arma::mat&, const size_t);

whereas for GradientDescent the FunctionType object should have this one:

  double Evaluate(const arma::mat&);
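
For reference, GradientDescent additionally requires a Gradient method. A minimal sketch of a FunctionType satisfying its interface (the quadratic objective is just a stand-in for illustration):

  #include <armadillo>

  class QuadraticFunction
  {
  public:
    // f(x) = sum of squared elements of x.
    double Evaluate(const arma::mat& coordinates)
    {
      return arma::accu(coordinates % coordinates);
    }

    // Gradient of f at the given coordinates: 2 * x.
    void Gradient(const arma::mat& coordinates, arma::mat& gradient)
    {
      gradient = 2 * coordinates;
    }
  };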

If I understand the whole discussion correctly, we are ready to restrict ourselves to optimizing only numerical parameters in order to utilize the existing interface for optimizers. If so, I think it is quite possible to design a solution that allows the following usage.

  arma::mat data /* = ... */;
  arma::Row<size_t> labels /* = ... */;

  HyperParameterTuner<HoeffdingTree<>, Accuracy, KFoldCV>
      hoeffdingTreeTuner(data, labels);

  // Bound arguments
  data::DatasetInfo datasetInfo /* = ... */;
  size_t numClasses = 5;
  bool batchTraining = false;
  size_t checkInterval = 100;

  // Setting sets of values to check
  arma::vec successProbabilities = arma::regspace(0.9, 0.01, 0.99);
  std::array<size_t, 2> maxSamplesSet = {0, 3};
  std::array<size_t, 3> minSamplesSet = {50, 100, 150};

  // Making variables for best parameters
  double successProbability;
  size_t maxSamples;
  size_t minSamples;

  // Finding best parameters
  auto bestParameters =
      hoeffdingTreeTuner.Optimize<GridSearch>(Bind(datasetInfo),
          Bind(numClasses), Bind(batchTraining), successProbabilities,
          maxSamplesSet, Bind(checkInterval), minSamplesSet);

  // Unpacking best parameters
  std::tie(successProbability, maxSamples, minSamples) = bestParameters;

In this example we mark the arguments datasetInfo, numClasses, batchTraining, and checkInterval as being bound (they should not be optimized). For the other HoeffdingTree constructor arguments we provide sets of values to investigate. Note also that we pass arguments in the same order as for the corresponding HoeffdingTree constructor.

The GridSearch interface will be similar to that of other optimizers.

  template<typename FunctionType, typename... Collections>
  class GridSearch
  {
  public:
    GridSearch(FunctionType& function,
               const Collections&... parameterCollections);

    double Optimize(arma::mat& iterate);
  };
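
Internally, Optimize would enumerate the Cartesian product of the given collections and keep the best iterate. Here is a simplified, non-variadic sketch of the idea (the free function GridOptimize and the restriction to two collections of doubles are just for illustration; the real version would recurse over the parameter pack instead):

  #include <limits>
  #include <vector>
  #include <armadillo>

  template<typename FunctionType>
  double GridOptimize(FunctionType& function,
                      const std::vector<double>& collectionA,
                      const std::vector<double>& collectionB,
                      arma::mat& iterate)
  {
    double bestObjective = std::numeric_limits<double>::max();
    // Try every combination from the Cartesian product of the collections.
    for (double a : collectionA)
    {
      for (double b : collectionB)
      {
        arma::mat parameters(2, 1);
        parameters(0) = a;
        parameters(1) = b;
        const double objective = function.Evaluate(parameters);
        if (objective < bestObjective)
        {
          bestObjective = objective;
          iterate = parameters;
        }
      }
    }
    return bestObjective;
  }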

A FunctionType function will be an instance of a cross-validation wrapper class with approximately the following interface.
 
  // Note: the template parameter pack has to be the last template parameter.
  template<typename CV, int TotalArgs, typename... BoundedArgs>
  class CVFunction
  {
  public:
    CVFunction(CV& cv, const BoundedArgs&... boundedArgs);

    double Evaluate(const arma::mat& parameters);
  };

During construction of a CVFunction object, we provide a cross-validation object and a sequence of bound arguments, each of which should carry information about its position in the argument list of the cross-validation object's Evaluate method. The template parameter TotalArgs specifies the total number of arguments that should be passed to that Evaluate method.
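
To illustrate the mapping, here is a simplified sketch of what Evaluate could do in the case of one bound argument at position 0 and two numeric parameters (the members cv and boundDatasetInfo are hypothetical; the real implementation would restore the positions generically):

  double Evaluate(const arma::mat& parameters)
  {
    // Reinsert the bound argument at its original position, and decode the
    // numeric parameters from the matrix the optimizer iterates over.
    return cv.Evaluate(boundDatasetInfo, parameters(0),
        (size_t) parameters(1));
  }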

With such a design we can reuse GridSearch in other mlpack algorithms, as well as add support for other mlpack optimizers in a relatively simple way. For example, it should be straightforward to add support for the GradientDescent optimizer with the following usage.

  HyperParameterTuner<SoftmaxRegression<>, Accuracy, KFoldCV>
      softmaxTuner(data, labels);

  // Initial value for lambda
  double lambda = 0.001;

  // Gradient descent parameters
  double stepSize = 0.001;
  size_t maxIterations = 20;

  double bestLambda;
  std::tie(bestLambda) =
      softmaxTuner.Optimize<GradientDescent>(Bind(numClasses), lambda,
          OptimizerArg(stepSize), OptimizerArg(maxIterations));
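
Here Bind(numClasses) again marks a bound argument, lambda provides the initial value for the optimization, and OptimizerArg is meant to forward its arguments (the step size and the maximum number of iterations) to the GradientDescent optimizer itself rather than treat them as hyper-parameters.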

Let me know what you think about the proposed idea.

Best regards,

Kirill Mishchenko

> On 15 Apr 2017, at 01:29, Ryan Curtin <ryan at ratml.org> wrote:
> 
> On Mon, Apr 10, 2017 at 11:13:50AM +0500, Kirill Mishchenko wrote:
>> Hi Ryan,
>> 
>> I think I’m starting to see your perspective of how grid search
>> optimiser should be implemented. But some concerns remain.
> 
> Hi Kirill,
> 
> Sorry for the slow response.
> 
>> 1. Some information (precision) can be lost during conversions between
>> integer and floating-point values (e.g., during coding size_t value
>> into a cell of arma::mat). It is not very likely to happen in practice
>> (requiring very big values for integers), but it should be mentioned
>> anyway.
> 
> Agreed.  I think with an IEEE 754 double precision floating point number
> we get 2^53 possible values before loss of precision.
> 
>> 2. There are some other types of arguments in constructors for machine
>> learning algorithms (models) beside numeric types and
>> data::DatasetInfo. These include a template WeakLearnerType in
>> AdaBoost, templates CategoricalSplitType and NumericSplitType in
>> HoeffdingTree, std::unordered_map<size_t, std::pair<size_t, size_t>>*
>> in HoeffdingTree, arma::mat in LARS. Some non-numerical types of
>> arguments can also emerge in constructors of new machine learning
>> algorithms.
> 
> Yes, this is a little bit more difficult.  In most of these situations
> where a class instance is passed, it is usually so that the user can
> specify some of the numeric parameters to those class instances.  For
> instance the AdaBoost WeakLearnerType parameter is used to set the
> parameters of each weak learner that is built.
> 
> So I can see two possibilities although maybe there are more:
> 
> - Use template metaprogramming tricks to, given a type, expand all of
>   its constructor arguments into a list of numeric types.  So say we
>   had:
> 
>     Learner(double a, AuxType b)
>     AuxType(double c, double d)
> 
>   we would ideally want to extract [double, double, double] as our list
>   of types.  I can't quickly think of a strategy for this but it
>   *might* be possible...
> 
> - Refactor all classes that take an auxiliary class to instead take a
>   template parameter pack to be unpacked into the auxiliary classes'
>   constructors.  This will still be a fair amount of metaprogramming
>   effort but I can see a closer route to a solution with this one.
> 
> What do you think?  Do you have any additional ideas?  Note that I have
> not spent significant time thinking or playing with either of these
> ideas so I am not fully sure if they will work.
> 
>> 3. In the case of hyper-parameter tuning I guess a loss function
>> should be a wrapper for a cross validation class (we want to optimize
>> performance on validation sets). But it is not clear what type of
>> interface it should provide: DecomposableFunctionType (like for SGD)
>> or FunctionType (like for SA or GradientDescent, all prerequisites for
>> which can potentially be combined in one class).
> 
> I'm not sure I fully follow here, can you clarify?
> 
> Thanks,
> 
> Ryan
> 
> -- 
> Ryan Curtin    | "This room is green."
> ryan at ratml.org |   - Kazan
