[mlpack] Improve mlpack's tree ensemble support - GSoC 2021

RISHABH GARG rishabhgarg108 at gmail.com
Mon Mar 29 01:29:32 EDT 2021


Hey Ryan, thanks for the feedback.


I also agree with you. XGBoost is one of the most widely used ML
algorithms, so it would be really great for mlpack to have it, and it
would undoubtedly attract more users to the library. This discussion
with you has changed my perspective, and I think we can prioritise
XGBoost over the other ideas.


As you mentioned in the previous mail, implementing the core XGBoost
algorithm should be straightforward given how flexibly the trees in
mlpack are implemented. But how can we implement optimisations like
cache-aware access and out-of-core computation with Armadillo matrices?
I remember we briefly discussed this in a chat, and you mentioned that
it could be done with a simple tweak. Could you please elaborate a bit?


Best,

Rishabh

On Sun, Mar 28, 2021 at 11:08 PM Ryan Curtin <ryan at ratml.org> wrote:

> On Sun, Mar 28, 2021 at 04:02:40PM +0530, RISHABH GARG wrote:
> > Hello everyone,
> > In continuation of the previous email: I made a small typo there. It
> > should be `DecisionTreeRegressor` instead of `RandomForestClassifier`.
> >
> > I gave it some deeper thought and realised that there is so much more
> > I can do with gradient boosted trees, like adding feature importance,
> > warm start, pruning, etc. So, I have decided to drop the idea of
> > XGBoost from the project, and I will invest the remaining time in
> > implementing these extra features.
> >
> > I have been digging deep into the decision tree implementation and I
> > figured out that it has been built very flexibly. A regression tree
> > can be implemented through it just by adding a new template parameter
> > (which will specify whether we want classification or regression) and
> > adding a few overloads of the existing helper functions. So, I think
> > there will be no need to make an abstract class, and regression can
> > be implemented without any drastic refactoring of the existing
> > `DecisionTree` class, although we will need to add a few new fitness
> > functions. I will share the full technical details in my proposal.
>
> Hey Rishabh,
>
> Actually I think the ideas are kind of one and the same---I believe that
> the XGBoost algorithm could be expressed in such a way that all you'd
> need to do would be implement some new splitting strategies and perhaps
> a new gain function.
>
> One of the reasons why we discussed XGBoost specifically is that at the
> current time, it has a lot of name recognition.  So even if we could
> implement other algorithms that might perform even better, providing
> something we can actually call XGBoost could be more useful for
> driving adoption.
>
> Anyway, just a thought---hope it's helpful.
>
> Thanks,
>
> Ryan
>
> --
> Ryan Curtin    | "Avoid the planet Earth at all costs."
> ryan at ratml.org |   - The President
>
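
P.S. To make sure I understand the gain-function idea above, here is a
rough sketch of how an XGBoost-style second-order gain could look. The
class name, signature, and regularisation handling are my own guesses
rather than existing mlpack API; with all Hessians fixed to 1, it
should reduce to the plain MSE gain that the regression tree would need
anyway.

    #include <armadillo>

    // Hypothetical XGBoost-style gain; not existing mlpack API.
    class XGBGain
    {
     public:
      XGBGain(const double lambda = 1.0, const double gamma = 0.0) :
          lambda(lambda), gamma(gamma) { }

      // Gain of splitting a node into the given left/right children,
      // where grad and hess hold the per-point first and second
      // derivatives of the loss (hess is all ones for squared error).
      double Evaluate(const arma::vec& grad,
                      const arma::vec& hess,
                      const arma::uvec& leftIndices,
                      const arma::uvec& rightIndices) const
      {
        const double gl = arma::accu(grad.elem(leftIndices));
        const double hl = arma::accu(hess.elem(leftIndices));
        const double gr = arma::accu(grad.elem(rightIndices));
        const double hr = arma::accu(hess.elem(rightIndices));

        return 0.5 * (Score(gl, hl) + Score(gr, hr)
            - Score(gl + gr, hl + hr)) - gamma;
      }

     private:
      // Leaf score G^2 / (H + lambda), as in the XGBoost paper.
      double Score(const double g, const double h) const
      {
        return (g * g) / (h + lambda);
      }

      double lambda, gamma;
    };

If the existing `DecisionTree` splitting code can be templatised over
an interface like this, then the boosting loop would just recompute
grad and hess each round and grow the next tree against them.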