[mlpack] Potential Proposal for GSoC 2021

Germán Lancioni gmansoft at hotmail.com
Mon Mar 29 12:17:22 EDT 2021


Hi Anush,

Sounds good. Hopefully you can re-use the existing decision tree infrastructure in mlpack to build XGB.
If you believe the scope of work is too large, you can aim to implement just one tree method (e.g. either greedy or histogram) rather than all of them, but I will let you judge how much time it may take.

Looking forward to the proposal.

Regards,
German

________________________________
From: mlpack <mlpack-bounces at lists.mlpack.org> on behalf of Anush Kini <anushkini at gmail.com>
Sent: Sunday, March 28, 2021 09:50 AM
To: mlpack at lists.mlpack.org <mlpack at lists.mlpack.org>
Subject: Re: [mlpack] Potential Proposal for GSoC 2021

Hi everyone,

This mail continues the previous discussion on a proposal for GSoC 2021.
I spent the past few days assessing the feasibility of implementing multiple algorithms.
I have decided to focus all my time on implementing the XGBoost algorithm.

Specifically, I would like to implement an XGBoost Regressor and Classifier. This would involve adding support for XGBoost Trees.
Additionally, I am looking into adding pruning, approximate greedy algorithms (to speed up training on large datasets), and feature importance.

Will consolidate the details in a draft proposal soon.
Any opinions or suggestions are welcome.

Regards,
Anush Kini

On Wed, Mar 17, 2021 at 10:42 AM Anush Kini <anushkini at gmail.com> wrote:
Hi German,

Thanks for the feedback.
I agree. It is better to commit to fully implementing one algorithm than to partially implement many.
Will consider this in my proposal.

Regards,
Anush Kini

On Mon, Mar 15, 2021 at 11:14 PM Germán Lancioni <gmansoft at hotmail.com> wrote:
Hi Anush,

This is a great area to work on. As Omar mentioned, a good scope maximizes and focuses your GSoC effort. If you notice that the available GSoC time is not enough, I would recommend implementing just one of the algorithms, e.g. XGB, so you can concentrate on its completeness instead of stretching your time across three.

Looking forward to your proposal, very exciting!

Regards,
German

________________________________
From: mlpack <mlpack-bounces at lists.mlpack.org> on behalf of Anush Kini <anushkini at gmail.com>
Sent: Monday, March 15, 2021 09:14 AM
To: Omar Shrit <omar at shrit.me>
Cc: mlpack at lists.mlpack.org <mlpack at lists.mlpack.org>
Subject: Re: [mlpack] Potential Proposal for GSoC 2021

Hi Omar,

Thank you for the inputs.
What you said makes complete sense to me.

I will prioritise algorithm correctness, detailed documentation and tutorials over implementing multiple features.
Additionally, I will include a proof of concept with sample code and metrics in my proposal.

Thanks & Regards,
Anush Kini

On Mon, Mar 15, 2021 at 3:43 PM Omar Shrit <omar at shrit.me> wrote:
Hello Anush,

XGBoost, LightGBM and CatBoost algorithms will be a great addition for
mlpack this year. Since GSoC is shorter, I would concentrate on these
algorithms, with related tests and examples.

You need to demonstrate in your proposal that you have a good knowledge
of decision tree algorithms. As always, a good starting point is a proof
of concept with related benchmarks.

These are my suggestions, hope you find this helpful.

Thanks,

Omar

On 03/14, Anush Kini wrote:
> Hi Mlpack team,
>
> I am Anush Kini. My GitHub handle is Abilityguy
> <https://github.com/Abilityguy>.
>
> I have been getting familiar with the code base for the last couple of
> months.
> I am planning to apply for GSoC 2021 and wanted some feedback on my project
> proposal for the same.
>
> I am building on the 'Improve mlpack's tree ensemble support' idea from the
> wiki.
> I would like to implement XGBoost and LightGBM algorithms. If the schedule
> permits, I will look towards implementing CatBoost too.
>
> Additionally, I would like to work on bringing some additional features to
> the ensemble suite:
> 1. I would like to dip into 2619
> <https://github.com/mlpack/mlpack/issues/2619> which aims to add
> regression support to Random Forests.
> 2. Implementing methods to get the impurity-based feature importance
> similar to the one in scikit-learn
> <https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier.feature_importances_>
> .
>
> Finally, I plan to supplement any new features implemented with tutorials
> in mlpack/examples <https://github.com/mlpack/examples>.
> Looking forward to hearing your opinions and suggestions.
>
> Thanks & Regards,
> Anush Kini

> _______________________________________________
> mlpack mailing list
> mlpack at lists.mlpack.org
> http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
