[mlpack] Cross-validation and hyper-parameter tuning infrastructure

Ryan Curtin ryan at ratml.org
Mon Apr 24 11:29:58 EDT 2017


On Thu, Apr 20, 2017 at 11:38:32AM +0500, Kirill Mishchenko wrote:
> Hi Ryan.
> 
> > However this makes it unwieldy to optimize over AuxType objects with
> > multiple parameters.  Maybe a thought then is to pass something a little
> > more complex:
> > 
> > std::tuple<std::array<double, 3>, std::array<double, 2>> t =
> >    std::make_tuple({ 1.0, 2.0, 4.0 }, { 2.0, 3.0 });
> > (I think the syntax is probably wrong here, but you get the idea.)
> > 
> > What do you think?  Are these ways that could work?  I think the design
> > space here is pretty complex so we have lots of choices, but many are
> > not good so it is tricky to pick the best design.
> 
> This approach can be problematic. While handling such arguments we
> need to create AuxType objects that will be passed to MLAlgorithm
> constructors. But from such argument types alone
> (std::tuple<std::array<double, 3>, std::array<double, 2>> in the
> example above) we don’t know what AuxType should be. I guess a
> simpler and safer solution is to add a constructor MLAlgorithm(const
> DataType&, const PredictionsType&, …, AuxTypeConstructorArgType1, …,
> AuxTypeConstructorArgTypeN, …) instead of/in addition to
> MLAlgorithm(const DataType&, const PredictionsType&, …, AuxType, …)
> where it is required. In this case we will be able to reuse the
> general approach.

Hmm, this is a difficult problem.  One of the key benefits of the
approach of having an AuxType class (which often can be templatized) is
that it can take different types of arguments.  For instance, many
mlpack methods have a MetricType template parameter, for which the
argument will usually be LMetric<> (which takes no constructor
parameters), but it could be MahalanobisDistance, which can take a
matrix as input.  This means that in many situations, an AuxType object
passed to the constructor is a templatized object whose constructor
could have many different signatures.  So adding additional MLAlgorithm
constructors could be problematic, since it is not clear which signature
to give them.
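
As a concrete illustration, here are the two metrics mentioned above side
by side (only a sketch, meant to show why the constructor arguments we
would have to forward differ with the choice of MetricType):

#include <mlpack/core.hpp>
#include <mlpack/core/metrics/lmetric.hpp>
#include <mlpack/core/metrics/mahalanobis_distance.hpp>

void MetricConstructionExample()
{
  // LMetric<> needs no constructor arguments at all.
  mlpack::metric::LMetric<2> euclidean;

  // MahalanobisDistance is constructed from a covariance matrix, so any
  // extra MLAlgorithm constructor would need a different signature here.
  arma::mat covariance(3, 3, arma::fill::eye);
  mlpack::metric::MahalanobisDistance<> mahalanobis(covariance);
}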

The key problem, like you said, is that we don't know what AuxType
should be so we can't call its constructor.  But maybe we can adapt
things a little bit:

template<typename AuxType, typename... Args>
struct Holder /* needs a better name */
{
  // This typedef allows us access to the type we need to construct.
  typedef AuxType Aux;

  // These are the parameters we will use.
  std::tuple<Args...> args;

  // Store the given arguments in the tuple.
  Holder(Args... argsIn) : args(std::make_tuple(argsIn...)) { }
};

Then we could use this in combination with the Bind() class when calling an
optimizer:

  std::array<double, 3> param3s = { 1.0, 2.0, 4.0 };
  std::array<double, 2> auxParam1s = { 1.0, 3.0 };
  std::array<double, 4> auxParam2s = { 4.0, 5.0, 6.0, 8.0 };
  auto results = tuner.Optimize<GridSearch>(Bind(param1), Bind(param2),
      param3s, Holder<AuxType>(auxParam1s, auxParam2s));
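
As written, though, Holder<AuxType>(auxParam1s, auxParam2s) won't deduce
Args..., since only AuxType is given explicitly, so we would probably also
want a small maker function; a rough sketch (the name MakeHolder is just a
placeholder):

template<typename AuxType, typename... Args>
Holder<AuxType, Args...> MakeHolder(Args... args)
{
  // The caller names only AuxType; Args... is deduced from the call.
  return Holder<AuxType, Args...>(args...);
}

and then the call above would use MakeHolder<AuxType>(auxParam1s, auxParam2s).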

Like most of my other code ideas, this is a very basic sketch, but I
think it can work.  Let me know what you think or if there is some
detail I did not think about enough that will make the idea fail. :)

> > How can we handle optimizing over a given range, though?  For instance
> > we could use, e.g., simulated annealing to optimize for the best lambda
> > to use for softmax regression, but this could result in any arbitrary
> > value between 0 and 1.  So we would want to pass in a range, not just a
> > series of values to test.
> 
> I guess simulated annealing is a good choice for such cases. But
> probably it should be implemented as a separate optimiser, as an
> extension of what we have already discussed, since grid search is
> usually understood as iterating over discrete values (see, e.g., grid
> search implemented in scikit-learn:
> http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html).

Right, I see what you mean.  We have this issue where sometimes we are
optimizing over categorical variables (which is what I would say this
grid search case is), and sometimes we would like to optimize over
numeric variables (as I understand from what you wrote below,
where GradientDescent is used and 'lambda' can take any value).

> > Also, as a side note, I see in the proposal that the example
> > hoeffdingTreeOptimizer also optimizes over the DatasetInfo, but I think
> > this doesn't make too much sense since the DatasetInfo specifies which
> > dimensions are categorical and which are numeric.  I suppose it is
> > possible that, by allowing the user to optimize over DatasetInfos, we might
> > be allowing them to decide whether it is better to have their dimensions
> > set as categorical or numeric.  Maybe that is what you had in mind?
> 
> When I was writing the proposal I didn’t have in mind the requirement
> that we really want to utilize existing optimizers and reuse grid
> search in other algorithms. Rather, I proposed the simplest solution
> (in my humble opinion) to make grid search work for hyper-parameter
> tuning - by passing to the method Optimize all additional arguments of
> MLAlgorithm as sets of values even if we want to use only one value.
> That is why datasetInfo was passed as an array containing one value.

Sure; I think maybe we should allow the user to pass in a DatasetInfo
with the training data and labels, to keep things simple.

> > Assuming that the HyperParameterTuner is able to turn all of the
> > OptimizerArg()s into a coherent arma::mat to be optimized, we can use
> > any of the mlpack optimizers.
> 
> It’s true that we can potentially reuse any of the mlpack optimisers,
> but I would like to clarify a bit more how the method Optimize is
> going to be used and implemented (what I have in mind). Firstly,
> the arguments at the beginning of an argument list should correspond
> to additional arguments of the corresponding MLAlgorithm constructor.
> For the call
> 
>   auto bestParameters =
>       hoeffdingTreeTuner.Optimize<GridSearch>(Bind(datasetInfo),
>           Bind(numClasses), Bind(batchTraining), successProbabilities,
>           maxSamplesSet, Bind(checkInterval), minSamplesSet);
> 
> such arguments are all of the passed arguments. For the call
> 
>   std::tie(bestLambda) =
>       softmaxTuner.Optimize<GradientDescent>(Bind(numClasses), lambda,
>           OptimizerArg(stepSize), OptimizerArg(maxIterations));
> 
> such arguments are Bind(numClasses) and lambda. Secondly, the
> mentioned two examples will be handled differently. In the first case
> unbound arguments (successProbabilities, maxSamplesSet, minSamplesSet)
> will be passed to the optimiser constructor (GridSearch). In the
> second case unbound arguments (lambda) will be passed to the method
> Optimize of the optimiser (GradientDescent) through packing into
> arma::mat and will be used as initial values. We decide between these
> two cases by, for example, checking whether the type of the first
> unbound argument is a class type or not. Thirdly, we use OptimizerArg
> (or a function with some more understandable name) to mark an argument
> that is not like the ones from the first example but nevertheless
> should be passed to the optimizer constructor. Without using
> OptimizerArg we don’t know whether, e.g., stepSize should be used as
> an initial value packed to arma::mat or as an argument for the
> GradientDescent constructor. HyperParameterTuner will be implemented
> as if optimizer constructors are of the following general form:
> 
>   Optimizer(FunctionType& function, ArgType1 arg1, ArgType2 arg2, ..., ArgTypeN argN,
>       const argumentsCorrespondingToAdditionalArgumentsOfMLAlgorithmConstructor...args);
> 
> Going back to our examples, successProbabilities, maxSamplesSet,
> minSamplesSet from the first example will be passed to the GridSearch
> constructor as “args"; stepSize and maxIterations from the second
> example will be passed to the GradientDescent constructor as “arg1"
> and “arg2". It is possible to implement an optimizer that should use
> both “args” and “arg1”, …, “argN”. A potential instance of that can be
> simulated annealing, which can use “arg1”, …, “argN” as its
> metaparameters and “args” for specifying ranges and distributions for
> MLAlgorithm hyper-parameters.

It looks like I misunderstood what OptimizerArg is for; thanks for the
clarification.  I had thought that OptimizerArg was specifying arguments
of the machine learning method that should be optimized, whereas Bind
specifies arguments that should not be optimized.
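
Just to check that I now follow the dispatch you describe between the two
cases, I imagine it would look roughly like this (a sketch only; the trait
name is a placeholder):

#include <type_traits>

// If the first unbound argument is a class type (e.g. std::array<double, 3>
// holding values to iterate over), the unbound arguments go to the
// optimizer's constructor (the GridSearch case); otherwise they are packed
// into an arma::mat of initial values for Optimize() (the GradientDescent
// case).
template<typename FirstUnboundArg>
struct PassUnboundArgsToOptimizerConstructor
  : std::is_class<typename std::decay<FirstUnboundArg>::type> { };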

If you like, we can easily enforce a constraint that the Optimize()
function should take no parameters other than 'arma::mat& iterate'
(looks like only L_BFGS does this, and I've just gone ahead and
deprecated the extra overload that takes maxIterations).

In this way, we could avoid the need for OptimizerArg() entirely, and
possibly make the hyperparameter optimizer easier to understand, by
having the user modify the optimizer after construction of the
hyperparameter optimizer:

// move optimizer type to class template parameter
HyperParameterOptimizer<SoftmaxRegression<>, Accuracy, KFoldCV, SA> h;

h.Optimizer().Tolerance() = 1e-5;
h.Optimizer().MoveCtrlSweep() = 3;

h.Optimize(...);

Let me know what you think.

A different question: for the examples above, am I understanding
correctly that

 * if we pass, e.g., 'double lambda' to Optimize(), then the search for
   the best lambda value will be over all possible values of lambda?

 * if we pass, e.g., 'std::array<double, 3> lambdas' to Optimize(), then
   the search for the best lambda value will only be over the three
   given values?

If that's correct, then it might be nice to implement an additional
idea: when the user passes a 'math::Range<double> lambda', the
search will be over all possible values of lambda within the given
range.  (One can simply set the objective value to DBL_MAX when lambda is
outside the bounds of the given range, or we can revisit how
optimizers can work in a constrained context.)
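
For the DBL_MAX idea, I am picturing something roughly like this inside the
cross-validation objective (only a sketch; PenalizedEvaluate and the way the
parameters are unpacked are just placeholders):

#include <cfloat>

#include <mlpack/core.hpp>

// Reject lambda values outside the user-given range by returning a huge
// objective value, so that an unconstrained optimizer effectively stays
// inside the range.
double PenalizedEvaluate(const arma::mat& parameters,
                         const mlpack::math::Range& lambdaRange)
{
  const double lambda = parameters(0, 0);
  if (lambda < lambdaRange.Lo() || lambda > lambdaRange.Hi())
    return DBL_MAX;

  // Otherwise, run k-fold cross-validation with this lambda and return the
  // resulting loss; a placeholder value stands in for that here.
  return lambda;
}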

> So I guess we can start by implementing CVFunction with the signature
> of the method Evaluate as in FunctionType classes.

Sure, that sounds good to me.
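
For reference, I am imagining the CVFunction skeleton roughly like this (a
sketch only; the class layout is just a placeholder):

#include <mlpack/core.hpp>

// CVFunction wraps a cross-validation object and exposes the Evaluate()
// signature that mlpack optimizers expect of a FunctionType.
class CVFunction
{
 public:
  // Unpack 'parameters' into MLAlgorithm constructor arguments, run
  // cross-validation, and return a value to be minimized.
  double Evaluate(const arma::mat& parameters);
};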

Thank you for the long discussions; I think it's important to get these
details worked out first.  I realize that with the questions I am asking
it would be easy to see this as scope creep of the proposal, but I don't
think that we need to stuff more things into the summer (unless you want
to of course!).  But I do think it is important to at least think
through how these things might be implemented in the future, so that the
hyperparameter optimizer and cross-validator we do get are generic
enough to work with these expected future use cases.

-- 
Ryan Curtin    | "So, it's just you 57 punks against KUNG FU JOE?"
ryan at ratml.org |   - Kung Fu Joe

