[mlpack] Improve mlpack's tree ensemble support - GSoC 2021

Ryan Curtin ryan at ratml.org
Sun Mar 28 13:38:41 EDT 2021


On Sun, Mar 28, 2021 at 04:02:40PM +0530, RISHABH GARG wrote:
> Hello everyone,
> Following up on my previous email: I made a small typo there. It should be
> `DecisionTreeRegressor` instead of `RandomForestClassifier`.
> 
> I gave it some deeper thought and I realised that there is so much more
> that I can do with gradient boosting trees, like adding feature importance,
> warm start, pruning, etc. So, I have decided to drop the idea of XGBoost
> from the project and I will be investing the remaining time into
> implementing these extra features.
> 
> I have been digging deep into the decision tree implementation and I
> figured out that it has been built very flexibly, and a regression tree can
> be implemented through it just by adding a new template parameter (which
> will specify whether we want classification or regression) and adding a few
> overloads of the existing helper functions. So, I think there will be no
> need to make an abstract class, and regression can be implemented without
> doing any drastic refactoring of the existing `DecisionTree` class,
> although we will need to add a few fitness functions. I will share the full
> technical details in my proposal.

Hey Rishabh,

Actually I think the ideas are kind of one and the same: I believe that
the XGBoost algorithm could be expressed in such a way that all you'd
need to do would be to implement some new splitting strategies and perhaps
a new gain function.
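
To make the gain-function part concrete, here is a minimal sketch of the
split gain from the XGBoost paper, written as standalone C++.  It is not
mlpack code: the names `LeafScore`, `SplitGain`, `FindBestSplit`, and
`BestSplit` are hypothetical, and so is the assumption that the per-point
gradients and hessians of the loss arrive already sorted by the feature
being split on.  `lambda` is the L2 penalty on leaf weights and `gamma` is
the per-leaf complexity penalty.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an mlpack function): the regularised score
// G^2 / (H + lambda) of a node with gradient sum g and hessian sum h.
// Assumes h + lambda > 0.
inline double LeafScore(const double g, const double h, const double lambda)
{
  return (g * g) / (h + lambda);
}

// XGBoost-style gain of splitting a node into a left and a right child:
// 0.5 * (score(left) + score(right) - score(parent)) - gamma.
inline double SplitGain(const double gLeft, const double hLeft,
                        const double gRight, const double hRight,
                        const double lambda, const double gamma)
{
  const double children = LeafScore(gLeft, hLeft, lambda) +
                          LeafScore(gRight, hRight, lambda);
  const double parent = LeafScore(gLeft + gRight, hLeft + hRight, lambda);
  return 0.5 * (children - parent) - gamma;
}

struct BestSplit
{
  std::size_t index;  // Number of points sent left (0 means "do not split").
  double gain;        // Gain of that split (0 when no split helps).
};

// Scan every split position of one feature.  grad and hess must have the
// same length and be ordered by increasing feature value (an assumption of
// this sketch).
inline BestSplit FindBestSplit(const std::vector<double>& grad,
                               const std::vector<double>& hess,
                               const double lambda,
                               const double gamma)
{
  double gTotal = 0.0, hTotal = 0.0;
  for (std::size_t i = 0; i < grad.size(); ++i)
  {
    gTotal += grad[i];
    hTotal += hess[i];
  }

  BestSplit best{0, 0.0};
  double gLeft = 0.0, hLeft = 0.0;
  for (std::size_t i = 1; i < grad.size(); ++i)
  {
    // Move point i - 1 into the left child, then score the split.
    gLeft += grad[i - 1];
    hLeft += hess[i - 1];
    const double gain = SplitGain(gLeft, hLeft, gTotal - gLeft,
                                  hTotal - hLeft, lambda, gamma);
    if (gain > best.gain)
      best = BestSplit{i, gain};
  }
  return best;
}
```

For example, with gradients {-1, -1, 1, 1}, unit hessians, lambda = 1 and
gamma = 0, the scan picks the split after the second point, with a gain of
4/3.  Hooking something like this into `DecisionTree` would presumably mean
wrapping it in fitness function and splitting strategy classes; that part
is left out because it depends on the template interface.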

One of the reasons why we discussed XGBoost specifically is that it
currently has a lot of name recognition.  So even if it is possible to
implement other algorithms that may perform even better, providing
something that we can actually call XGBoost could do more to drive
usage.

Anyway, just a thought---hope it's helpful.

Thanks,

Ryan

-- 
Ryan Curtin    | "Avoid the planet Earth at all costs."
ryan at ratml.org |   - The President

