[mlpack] Updates on Improve Ensemble Support GSoC Project

RISHABH GARG rishabhgarg108 at gmail.com
Wed Jul 14 11:26:25 EDT 2021


Hello everyone,
I hope that everyone is doing well. This is the mid-GSoC update on my
project.

After finishing the bulk of DecisionTreeRegressor, I began planning the
API and the integration of the XGBoost code with DecisionTreeRegressor. I
discussed the pros and cons of different approaches with my mentors, and
we finally decided to make a few changes to the DecisionTreeRegressor
code so that it can also be used as an XGBoost tree. PR #3014 is where
this refactoring is done. A few minutes before writing this, I finished
these changes. :)
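
For anyone curious about the shape of that refactoring: the core idea is
that the regression tree is parameterized by its fitness function, so an
XGBoost loss can be plugged in where the default MSE gain normally goes.
A minimal sketch (simplified template parameters, not mlpack's exact
signature):

  template<typename FitnessFunction = MSEGain>
  class DecisionTreeRegressor
  {
   public:
    // Train on data with real-valued responses; all gain computations
    // are delegated to FitnessFunction.
    template<typename MatType, typename ResponsesType>
    void Train(const MatType& data, const ResponsesType& responses);
    // ...
  };

  // An XGBoost tree can then reuse the same machinery by plugging in
  // its own loss, e.g. DecisionTreeRegressor<SomeXGBoostLoss>.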

In the DecisionTreeRegressor PR #2905, I created an interface for optimized
FitnessFunctions to speed up the NumericSplit calculation. This can be used
for the XGBoost loss functions too. While implementing this interface, I
ran into an SFINAE error, which Ryan helped me figure out. In this PR, I
rebased the branch incorrectly, which messed up the commit history, so I
had to create another PR by cherry-picking the relevant commits.
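
For context, the detection pattern looks roughly like this (the
BinaryGains() member name is purely illustrative, not the actual mlpack
interface): the split code checks at compile time whether a
FitnessFunction exposes an optimized gain computation, and falls back to
the generic path otherwise.

  #include <type_traits>
  #include <utility>

  // HasBinaryGains<T> is true iff T has a no-argument BinaryGains()
  // member (hypothetical name); classic void_t detection idiom.
  template<typename T, typename = void>
  struct HasBinaryGains : std::false_type { };

  template<typename T>
  struct HasBinaryGains<T,
      std::void_t<decltype(std::declval<T&>().BinaryGains())>>
    : std::true_type { };

  // Chosen only when the optimized member exists.
  template<typename FitnessFunction>
  std::enable_if_t<HasBinaryGains<FitnessFunction>::value, double>
  Gain(FitnessFunction& f)
  {
    return f.BinaryGains(); // fast, precomputed path
  }

  // Generic fallback for fitness functions without the optimization.
  template<typename FitnessFunction>
  std::enable_if_t<!HasBinaryGains<FitnessFunction>::value, double>
  Gain(FitnessFunction& /* f */)
  {
    return 0.0; // recompute the gain from scratch (elided)
  }

A mismatched default argument or a malformed decltype expression in the
partial specialization silently disables the fast path, which is exactly
the kind of SFINAE pitfall that is easier to spot with a second pair of
eyes.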

I also created PR #3003, where I started implementing the Sum of Squared
Error (SSE) loss function for XGBoost.
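
A quick note on why SSE is the natural first loss: for
L(y, yhat) = 0.5 * (y - yhat)^2, the gradient with respect to the
prediction is simply yhat - y, and the hessian is the constant 1. A
minimal sketch of such a loss class (illustrative only, not the exact
interface from the PR):

  class SSELoss
  {
   public:
    // dL/dyhat for L = 0.5 * (truth - prediction)^2.
    double Gradient(const double prediction, const double truth) const
    {
      return prediction - truth;
    }

    // d^2 L / dyhat^2; constant for squared error.
    double Hessian(const double /* prediction */,
                   const double /* truth */) const
    {
      return 1.0;
    }
  };

Given per-point gradients g_i and hessians h_i, XGBoost's optimal leaf
output is -sum(g_i) / (sum(h_i) + lambda), which for SSE reduces to the
(regularized) mean residual of the points in the leaf.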

That was a sneak peek at the past 3 weeks of my project. Next, I am going
to finish the SSE loss function PR, and then I will finally be able to
begin the XGBoost implementation. The algorithm is huge, and some of its
many parts are necessary while others are good-to-have features. So, to
simplify things, I am going to start with the bare minimum in the first
iteration, and then we can slowly iterate, adding features one by one to
improve our implementation.

Thanks for reading!

Best,
Rishabh Garg

On Sun, Jun 27, 2021 at 11:28 PM RISHABH GARG <rishabhgarg108 at gmail.com>
wrote:

> Hey everyone!
>
> So far, I am having a great summer. Many thanks to my mentors Ryan and
> German for their continuous support and help, and a huge thanks to the
> mlpack community too. I hope my GSoCer (is that even a valid word?🤔)
> friends are also having a great time.
>
> I have recently achieved the first big milestone of my project, i.e. the
> DecisionTreeRegressor. #2905 is complete and under review.
> Hopefully, we can finally get it merged in the next week. I have been
> working on that PR for three months, and I feel very proud to have
> contributed such a nice method, one that almost every mlpack user will
> use.
>
> It would be awesome if more people could take a look at it and give
> their valuable feedback. NG Sai, I remember you mentioned some changes
> earlier and I told you that it was not the right time. I think now is
> the best time for those suggestions. :)
>
> Some of the highlights of the work done are:
> 1. A separate class DecisionTreeRegressor for regression trees.
> 2. An efficient prefix-sum binary split finding algorithm that runs in
> O(n) (see the sketch after this list).
> 3. With the above optimization, our DecisionTreeRegressor implementation
> is competitive with sklearn in terms of both performance and
> accuracy.
> 4. Improved memory consumption for the Tree object by removing the use of
> `arma::vec` from the splitInfo for regression. (We have some ideas for
> similarly improving the existing decision tree classifier, but that is
> going to be a future endeavour.)
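>
> For the curious, here is a rough sketch of the prefix-sum split search
> from point 2 (a minimal illustration with made-up names, not mlpack's
> exact code): once the responses are sorted by a feature's values,
> running sums of y and y^2 give each candidate split's SSE in O(1), so
> one pass over the sorted responses finds the best split.
>
>   #include <algorithm>
>   #include <armadillo>
>
>   // SSE of a segment = sum(y^2) - (sum(y))^2 / count.
>   // Assumes sortedResponses is non-empty and sorted by feature value.
>   double BestSplitSSE(const arma::rowvec& sortedResponses)
>   {
>     const size_t n = sortedResponses.n_elem;
>     const double totalSum = arma::accu(sortedResponses);
>     const double totalSqSum = arma::accu(arma::square(sortedResponses));
>
>     double bestSSE = totalSqSum - totalSum * totalSum / n; // no split
>     double leftSum = 0.0, leftSqSum = 0.0;
>
>     for (size_t i = 0; i + 1 < n; ++i)
>     {
>       // Maintain running prefix sums instead of recomputing each side.
>       leftSum += sortedResponses[i];
>       leftSqSum += sortedResponses[i] * sortedResponses[i];
>       const double rightSum = totalSum - leftSum;
>       const double rightSqSum = totalSqSum - leftSqSum;
>       const size_t nl = i + 1, nr = n - nl;
>
>       const double sse = (leftSqSum - leftSum * leftSum / nl)
>                        + (rightSqSum - rightSum * rightSum / nr);
>       bestSSE = std::min(bestSSE, sse);
>     }
>     return bestSSE;
>   }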
>
> Now, the next part is going to be even more exciting. The next milestone
> is the XGBoost Regressor. There are going to be some crazy optimizations
> involved in this one, and it will be fun to implement. The architecture
> and API have been finalised, and from tomorrow onwards, I will begin
> working on it.
>
> Thanks,
> Rishabh Garg
>
>