[mlpack] Cross-validation and hyper-parameter tuning infrastructure

Kirill Mishchenko ki.mishchenko at gmail.com
Wed Apr 26 02:24:18 EDT 2017


Hi Ryan.

> The key problem, like you said, is that we don't know what AuxType
> should be so we can't call its constructor.  But maybe we can adapt
> things a little bit:
> 
> template<typename AuxType, typename... Args>
> struct Holder /* needs a better name */
> {
>  // This typedef allows us access to the type we need to construct.
>  typedef AuxType Aux;
> 
>  // These are the parameters we will use.
>  std::tuple<Args...> args;
> 
>  Holder(Args... argsIn) { /* put argsIn into args */ }
> };
> 
> Then we could use this in addition with the Bind() class when calling an
> optimizer:
> 
>  std::array<double, 3> param3s = { 1.0, 2.0, 4.0 };
>  std::array<double, 2> auxParam1s = { 1.0, 3.0 };
>  std::array<double, 4> auxParam2s = { 4.0, 5.0, 6.0, 8.0 };
>  auto results = tuner.Optimize<GridSearch>(Bind(param1), Bind(param2),
>      param3s, Holder<AuxType>(auxParam1s, auxParam2s));
> 
> Like most of my other code ideas, this is a very basic sketchup, but I
> think it can work.  Let me know what you think or if there is some
> detail I did not think about enough that will make the idea fail. :)

I think this approach is quite implementable. Moreover, we should be able to support Bind for aux parameters as well:
 
  std::array<double, 3> param3s = { 1.0, 2.0, 4.0 };
  double auxParam1 = 1.0;
  std::array<double, 4> auxParam2s = { 4.0, 5.0, 6.0, 8.0 };
  auto results = tuner.Optimize<GridSearch>(Bind(param1), Bind(param2),
     param3s, Holder<AuxType>(Bind(auxParam1), auxParam2s));
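
To make this concrete, here is a rough sketch of what a Bind wrapper could look like. This is purely illustrative (BoundArg and its members are not an existing mlpack API); the point is just that Bind wraps a fixed value in a distinct type, so that the tuner (and Holder) can tell bound arguments apart from the ones to optimize over:

  // Illustrative only: a possible shape for the Bind wrapper. It carries a
  // fixed (non-optimized) value and marks it with a distinct type that the
  // tuner can detect.
  template<typename T>
  struct BoundArg
  {
    T value;  // The fixed value that should not be optimized over.
  };

  template<typename T>
  BoundArg<T> Bind(const T& value)
  {
    return BoundArg<T>{ value };
  }

Holder<AuxType>(Bind(auxParam1), auxParam2s) would then simply store a BoundArg<double> inside its tuple, and the tuner could treat it there in the same way as at the top level.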

> Sure; I think maybe we should allow the user to pass in a DatasetInfo
> with the training data and labels, to keep things simple.

Can you clarify a bit more what you mean here?

> If you like, we can easily enforce a constraint that the Optimize()
> function should take no additional parameters than 'arma::mat& iterate'
> (looks like only L_BFGS does this, and I've just gone ahead and
> deprecated the extra overload that takes maxIterations).
> 
> In this way, we could avoid the need for OptimizerArg() entirely, and
> possibly make the hyperparameter optimizer easier to understand, by
> having the user modify the optimizer after construction of the
> hyperparameter optimizer:
> 
> // move optimizer type to class template parameter
> HyperParameterOptimizer<SoftmaxRegression<>, Accuracy, KFoldCV, SA> h;
> 
> h.Optimizer().Tolerance() = 1e-5;
> h.Optimizer().MoveCtrlSweep() = 3;
> 
> h.Optimize(…);

In this approach we need to construct an optimizer before the method Optimize (of HyperParameterOptimizer (Tuner) in the example above) is called, which can be quite problematic for two reasons.
1. We don't know what FunctionType object (which wraps cross-validation) to optimize, since it depends on what we pass to the method Optimize (in particular, it depends on whether or not we bind some arguments).
2. In the case of GridSearch we also don't know the sets of values for parameters before calling the method Optimize: recall that we pass these sets of values during construction of a GridSearch object (a toy sketch of this point follows below).
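
To illustrate the second point with a self-contained toy (this is just a stand-in for the idea, not the planned GridSearch class): an optimizer of this kind receives its candidate values in the constructor, so it cannot exist before the tuner's Optimize method supplies them.

  #include <functional>
  #include <limits>
  #include <utility>
  #include <vector>

  // Toy stand-in for a grid-search optimizer: the candidate values are a
  // constructor argument, so the object cannot be created before those
  // values are known (i.e. before the tuner's Optimize() call).
  class ToyGridSearch
  {
   public:
    ToyGridSearch(std::function<double(double)> objective,
                  std::vector<double> values) :
        objective(std::move(objective)), values(std::move(values)) { }

    // Stores the best parameter in 'best' and returns the best objective.
    double Optimize(double& best) const
    {
      double bestObjective = std::numeric_limits<double>::max();
      for (const double value : values)
      {
        const double objectiveValue = objective(value);
        if (objectiveValue < bestObjective)
        {
          bestObjective = objectiveValue;
          best = value;
        }
      }
      return bestObjective;
    }

   private:
    std::function<double(double)> objective;
    std::vector<double> values;
  };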

> * if we pass, e.g., 'double lambda' to Optimize(), then the search for
>   the best lambda value will be over all possible values of lambda?


Yes. The search will be driven by the optimizer (e.g., gradient descent) and the provided initial value.

> * if we pass, e.g., 'std::array<double, 3> lambdas' to Optimize(), then
>   the search for the best lambda value will only be over the three
>   given values?

Yes again. It is worth noting that we intend to use only one optimizer at a time, which means we shouldn't mix these two cases in a single call to the method Optimize.
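
For example, sticking to the tentative syntax we have been sketching in this thread (so none of these names are settled), the two cases would be two separate calls:

  // Continuous search: lambda is a single double, so it is packed into an
  // arma::mat and used as the initial value for GradientDescent.
  double lambda = 0.01;
  double bestLambda;
  std::tie(bestLambda) = softmaxTuner.Optimize<GradientDescent>(
      Bind(numClasses), lambda);

  // Discrete search: lambdas is a set of candidate values, so it is passed
  // to the GridSearch constructor and iterated over exhaustively.
  std::array<double, 3> lambdas = { 0.001, 0.01, 0.1 };
  std::tie(bestLambda) = softmaxTuner.Optimize<GridSearch>(
      Bind(numClasses), lambdas);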

> If that's correct, then it might be nice to implement some additional
> idea such as when the user passes a 'math::Range<double> lambda', the
> search will be over all possible values of lambda within the given
> range.  (One can simply modify the objective value to be DBL_MAX when
> outside the bounds of the given lambda, or we can consider visiting how
> optimizers can work in a constrained context.)

I think this behaviour should be handled by the optimizers themselves, since we intend to call them only once. I believe we have already touched on this feature in the discussion about simulated annealing.
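
If we do end up preferring the DBL_MAX idea from your message over optimizer-level constraints, a minimal sketch of it could be a thin wrapper around the function being optimized (all names here are illustrative):

  #include <cfloat>
  #include <armadillo>

  // Returns DBL_MAX whenever the parameters leave the allowed range, so an
  // unconstrained optimizer is steered back inside it.
  template<typename CVFunctionType>
  class RangeConstrainedFunction
  {
   public:
    RangeConstrainedFunction(CVFunctionType& function,
                             const double lowerBound,
                             const double upperBound) :
        function(function), lowerBound(lowerBound), upperBound(upperBound) { }

    double Evaluate(const arma::mat& parameters)
    {
      // Penalize any parameter outside [lowerBound, upperBound].
      if (parameters.min() < lowerBound || parameters.max() > upperBound)
        return DBL_MAX;

      return function.Evaluate(parameters);
    }

   private:
    CVFunctionType& function;
    const double lowerBound;
    const double upperBound;
  };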

> Thank you for the long discussions; I think it's important to get these
> details worked out first.  I realize that with the questions I am asking
> it would be easy to see this as scope creep of the proposal, but I don't
> think that we need to stuff more things into the summer (unless you want
> to of course!).  But I do think it is important to at least think
> through how these things might be implemented in the future, so that the
> hyperparameter optimizer and cross-validator we do get are generic
> enough to work with these expected future use cases.

In light of what we have discussed recently, I think it is worth revisiting what can be implemented as a GSoC project and when. The changes to the cross-validation module that we have discussed in issue 929 should not require much additional time to implement (relative to what I have described in the proposal), so the associated part of the timeline can remain unchanged (weeks 1-7).

The changes to the hyper-parameter tuning module design that we have discussed in this mail thread are more time-consuming. Firstly, we need to implement a wrapper for the cross-validation class: CVFunction. In its Evaluate method we should construct the argument list for calling the Evaluate method of the CV object from the bound parameters and the arma::mat of current parameter values. Secondly, we have discussed significant changes to the signature of the Optimize method of HyperParameterTuner. In this method we should sort the incoming arguments into bound and non-bound ones and decide how the non-bound arguments should be used: passed into the optimizer constructor (as for GridSearch) or packed into arma::mat and passed into the optimizer's Optimize method (as for GradientDescent).

We can reserve weeks 10 and 11 for the implementation of these two features (instead of precision, recall and F1). Week 12 and the last days of the summer can be used to finish work on the main part of the proposal and to implement extra functionality, like support for existing mlpack optimizers (adding a gradient to CVFunction, handling additional parameters for optimizers).
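
As a starting point for the CVFunction part, here is a rough sketch of the shape I have in mind, simplified to one bound argument (numClasses) and one argument to optimize over (lambda); all names and the CV Evaluate signature are tentative:

  #include <cstddef>
  #include <armadillo>

  // Rough sketch of CVFunction: the real implementation would build the
  // argument list generically from the bound arguments and the arma::mat of
  // current parameter values.
  template<typename CVType>
  class CVFunction
  {
   public:
    CVFunction(CVType& cv, const std::size_t numClasses) :
        cv(cv), numClasses(numClasses) { }

    // Called by an mlpack optimizer; 'parameters' holds the non-bound
    // hyper-parameters packed into a matrix (here only lambda).
    double Evaluate(const arma::mat& parameters)
    {
      const double lambda = parameters(0, 0);
      // Recombine the bound argument with the unpacked one and delegate to
      // the cross-validation object.
      return cv.Evaluate(numClasses, lambda);
    }

   private:
    CVType& cv;
    const std::size_t numClasses;
  };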

Best regards,

Kirill Mishchenko



> On 24 Apr 2017, at 20:29, Ryan Curtin <ryan at ratml.org> wrote:
> 
> On Thu, Apr 20, 2017 at 11:38:32AM +0500, Kirill Mishchenko wrote:
>> Hi Ryan.
>> 
>>> However this makes it unwieldy to optimize over AuxType objects with
>>> multiple parameters.  Maybe a thought then is to pass something a little
>>> more complex:
>>> 
>>> std::tuple<std::array<double, 3>, std::array<double, 2>> t =
>>>   std::make_tuple({ 1.0, 2.0, 4.0 }, { 2.0, 3.0 });
>>> (I think the syntax is probably wrong here, but you get the idea.)
>>> 
>>> What do you think?  Are these ways that could work?  I think the design
>>> space here is pretty complex so we have lots of choices, but many are
>>> not good so it is tricky to pick the best design.
>> 
>> This approach can be problematic. While handling such arguments we
>> need to create AuxType objects that will be passed to MLAlgorithm
>> constructors. But only from such argument types
>> (std::tuple<std::array<double, 3>, std::array<double, 2>> in the
>> example above) we don’t know what AuxType should be. I guess a more
>> simple and safe solution is to add a constructor MLAlgorithm(const
>> DataType&, const PredictionsType&, …, AuxTypeConstructorArgType1, …,
>> AuxTypeConstructorArgTypeN, …) instead of/in addition to
>> MLAlgorithm(const DataType&, const PredictionsType&, …, AuxType, …)
>> where it is required. In this case we will be able to reuse the
>> general approach.
> 
> Hmm, this is a difficult problem.  One of the key benefits of the
> approach of having an AuxType class (which often can be templatized) is
> that it can have different types of arguments.  For instance many mlpack
> methods have a MetricType template parameter, for which the argument
> will usually be LMetric<> (which takes no constructor parameters), but
> it could be MahalanobisDistance, which can take a matrix as input.  This
> means that in many situations, if we have some kind of AuxType object
> that is passed to the constructor, that is a templatized object that
> could have many different signatures.  So adding additional
> constructors could be problematic, since it is not clear which signature
> to give.
> 
> The key problem, like you said, is that we don't know what AuxType
> should be so we can't call its constructor.  But maybe we can adapt
> things a little bit:
> 
> template<typename AuxType, typename... Args>
> struct Holder /* needs a better name */
> {
>  // This typedef allows us access to the type we need to construct.
>  typedef AuxType Aux;
> 
>  // These are the parameters we will use.
>  std::tuple<Args...> args;
> 
>  Holder(Args... argsIn) { /* put argsIn into args */ }
> };
> 
> Then we could use this in addition with the Bind() class when calling an
> optimizer:
> 
>  std::array<double, 3> param3s = { 1.0, 2.0, 4.0 };
>  std::array<double, 2> auxParam1s = { 1.0, 3.0 };
>  std::array<double, 4> auxParam2s = { 4.0, 5.0, 6.0, 8.0 };
>  auto results = tuner.Optimize<GridSearch>(Bind(param1), Bind(param2),
>      param3s, Holder<AuxType>(auxParam1s, auxParam2s));
> 
> Like most of my other code ideas, this is a very basic sketchup, but I
> think it can work.  Let me know what you think or if there is some
> detail I did not think about enough that will make the idea fail. :)
> 
>>> How can we handle optimizing over a given range, though?  For instance
>>> we could use, e.g., simulated annealing to optimize for the best lambda
>>> to use for softmax regression, but this could result in any arbitrary
>>> value between 0 and 1.  So we would want to pass in a range, not just a
>>> series of values to test.
>> 
>> I guess simulated annealing is a good choice for such cases. But
>> probably it should be implemented as a separate optimiser, as an
>> extension of what we have already discussed, since grid search is
>> usually understood as iterating over discrete values (see, e.g., grid
>> search implemented in scikit-learn:
>> http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html).
> 
> Right, I see what you mean.  We have this issue where sometimes we are
> optimizing over categorical variables (which is what I would say this
> grid search case is), and sometimes we would like to optimize over
> numeric variables (like what I am understanding of what you wrote below,
> where GradientDescent is used and 'lambda' can take any value).
> 
>>> Also, as a side note, I see in the proposal that the example
>>> hoeffdingTreeOptimizer also optimizes over the DatasetInfo, but I think
>>> this doesn't make too much sense since the DatasetInfo specifies which
>>> dimensions are categorical and which are numeric.  I suppose, it is
>>> possible, by allowing the user to optimize over DatasetInfos, we might
>>> be allowing them to decide whether it is better to have their dimensions
>>> set as categorical or numeric.  Maybe that is what you had in mind?
>> 
>> When I was writing a proposal I didn’t have in my mind the requirement
>> that we really want to utilize existing optimizers and reuse grid
>> search in other algorithms. Rather I proposed the most simple solution
>> (in my humble opinion) to make grid search work for hyper-parameter
>> tuning - by passing to the method Optimize all additional arguments of
>> MLAlgorithm as sets of values even if we want to use only one value.
>> It is why datasetInfo was passed as an array containing one value.
> 
> Sure; I think maybe we should allow the user to pass in a DatasetInfo
> with the training data and labels, to keep things simple.
> 
>>> Assuming that the HyperParameterTuner is able to turn all of the
>>> OptimizerArg()s into a coherent arma::mat to be optimized, we can use
>>> any of the mlpack optimizers.
>> 
>> It’s true that we can potentially reuse any of the mlpack optimisers,
>> but I would like to clarify a bit more how the method Optimize is
>> going to be used and implemented (what I have in my mind). Firstly,
>> the arguments in the beginning of an argument list should correspond
>> to additional arguments of the corresponding MLAlgorithm constructor.
>> For the call
>> 
>>  auto bestParameters =
>>      hoeffdingTreeTuner.Optimize<GridSearch>(Bind(datasetInfo),
>>          Bind(numClasses), Bind(batchTraining), successProbabilities,
>>          maxSamplesSet, Bind(checkInterval), minSamplesSet);
>> 
>> such arguments are the all passed arguments. For the call
>> 
>>  std::tie(bestLambda) =
>>      softmaxTuner.Optimize<GradientDescent>(Bind(numClasses), lambda,
>>          OptimizerArg(stepSize), OptimizerArg(maxIterations));
>> 
>> such arguments are Bind(numClasses) and lambda. Secondly, the
>> mentioned two examples will be handled differently. In the first case
>> unbound arguments (successProbabilities, maxSamplesSet, minSamplesSet)
>> will be passed to the optimiser constructor (GridSearch). In the
>> second case unbound arguments (lambda) will be passed to the method
>> Optimize of the optimiser (GradientDescent) through packing into
>> arma::mat and will be used as initial values. We decide between these
>> two cases by, for example, checking whether the type of the first
>> unbound argument is a class type or not. Thirdly, we use OptimizerArg
>> (or a function with some more understandable name) to mark an argument
>> that is not like the ones from the first example but nevertheless
>> should be passed to the optimizer constructor. Without using
>> OptimizerArg we don’t know whether, e.g., stepSize should be used as
>> an initial value packed to arma::mat or as an argument for the
>> GradientDescent constructor. HyperParameterTuner will be implemented
>> as if optimizer constructors are of the following general form:
>> 
>>  Optimizer(FunctionType& function, ArgType1 arg1, ArgType2 arg2, ..., ArgTypeN argN,
>>      const argumentsCorrespondingToAdditionalArgumentsOfMLAlgorithmConstructor...args);
>> 
>> Going back to our examples, successProbabilities, maxSamplesSet,
>> minSamplesSet from the first example will be passed to the GridSearch
>> constructor as “args"; stepSize and maxIterations from the second
>> example will be passed to the GradientDescent constructor as “arg1"
>> and “arg2". It is possible to implement an optimizer that should use
>> both “args” and “arg1”, …, “argN”. A potential instance of that can be
>> simulated annealing, which can use “arg1”, …, “argN” as its
>> metaparameters  and “args” for specifying ranges and distributions for
>> MLAlgorithm hyper-parameters.
> 
> It looks like I misunderstood what OptimizerArg is for; thanks for the
> clarification.  I had thought that OptimizerArg was specifying arguments
> of the machine learning method that should be optimized, whereas Bind
> specifies arguments that should not be optimized.
> 
> If you like, we can easily enforce a constraint that the Optimize()
> function should take no additional parameters than 'arma::mat& iterate'
> (looks like only L_BFGS does this, and I've just gone ahead and
> deprecated the extra overload that takes maxIterations).
> 
> In this way, we could avoid the need for OptimizerArg() entirely, and
> possibly make the hyperparameter optimizer easier to understand, by
> having the user modify the optimizer after construction of the
> hyperparameter optimizer:
> 
> // move optimizer type to class template parameter
> HyperParameterOptimizer<SoftmaxRegression<>, Accuracy, KFoldCV, SA> h;
> 
> h.Optimizer().Tolerance() = 1e-5;
> h.Optimizer().MoveCtrlSweep() = 3;
> 
> h.Optimize(...);
> 
> Let me know what you think.
> 
> Another different question, for the examples above, am I understanding
> correctly that
> 
> * if we pass, e.g., 'double lambda' to Optimize(), then the search for
>   the best lambda value will be over all possible values of lambda?
> 
> * if we pass, e.g., 'std::array<double, 3> lambdas' to Optimize(), then
>   the search for the best lambda value will only be over the three
>   given values?
> 
> If that's correct, then it might be nice to implement some additional
> idea such as when the user passes a 'math::Range<double> lambda', the
> search will be over all possible values of lambda within the given
> range.  (One can simply modify the objective value to be DBL_MAX when
> outside the bounds of the given lambda, or we can consider visiting how
> optimizers can work in a constrained context.)
> 
>> So I guess we can start by implementing CVFunction with the signature
>> of the method Evaluate as in FunctionType classes.
> 
> Sure, that sounds good to me.
> 
> Thank you for the long discussions; I think it's important to get these
> details worked out first.  I realize that with the questions I am asking
> it would be easy to see this as scope creep of the proposal, but I don't
> think that we need to stuff more things into the summer (unless you want
> to of course!).  But I do think it is important to at least think
> through how these things might be implemented in the future, so that the
> hyperparameter optimizer and cross-validator we do get are generic
> enough to work with these expected future use cases.
> 
> -- 
> Ryan Curtin    | "So, it's just you 57 punks against KUNG FU JOE?"
> ryan at ratml.org |   - Kung Fu Joe
