[mlpack] Potential Proposal for GSoC 2021

Anush Kini anushkini at gmail.com
Sun Mar 28 12:50:37 EDT 2021


Hi everyone,

This mail continues the previous discussion on a proposal for
GSoC 2021.
I spent the past few days assessing the feasibility of implementing
multiple algorithms.
I have decided to focus all my time on implementing the XGBoost
algorithm.

Specifically, I would like to implement an XGBoost Regressor and Classifier.
This would involve adding support for XGBoost Trees.
Additionally, I am looking into adding pruning, approximate
greedy algorithms (to speed up the algorithm on large datasets), and
feature importance.

Will consolidate the details in a draft proposal soon.
Any opinions or suggestions are welcome.

Regards,
Anush Kini

On Wed, Mar 17, 2021 at 10:42 AM Anush Kini <anushkini at gmail.com> wrote:

> Hi German,
>
> Thanks for the feedback.
> I agree. It is better to commit to implementing one algorithm completely
> than to implement many partially.
> Will consider this in my proposal.
>
> Regards,
> Anush Kini
>
> On Mon, Mar 15, 2021 at 11:14 PM Germán Lancioni <gmansoft at hotmail.com>
> wrote:
>
>> Hi Anush,
>>
>> This is a great area to work on. As Omar mentioned, a good scope
>> maximizes and focuses your GSoC effort. If you notice that the available
>> GSoC time is not enough, I would recommend implementing just one of the
>> algorithms, e.g. XGB, so you can concentrate on its completeness
>> instead of stretching your time across three.
>>
>> Looking forward to your proposal, very exciting!
>>
>> Regards,
>> German
>>
>> ------------------------------
>> *From:* mlpack <mlpack-bounces at lists.mlpack.org> on behalf of Anush Kini
>> <anushkini at gmail.com>
>> *Sent:* Monday, March 15, 2021 09:14 AM
>> *To:* Omar Shrit <omar at shrit.me>
>> *Cc:* mlpack at lists.mlpack.org <mlpack at lists.mlpack.org>
>> *Subject:* Re: [mlpack] Potential Proposal for GSoC 2021
>>
>> Hi Omar,
>>
>> Thank you for the inputs.
>> What you said makes complete sense to me.
>>
>> I will prioritise algorithm correctness, detailed
>> documentation and tutorials over implementing multiple features.
>> Additionally, I will highlight a proof of concept through sample code and
>> metrics in my proposal.
>>
>> Thanks & Regards,
>> Anush Kini
>>
>> On Mon, Mar 15, 2021 at 3:43 PM Omar Shrit <omar at shrit.me> wrote:
>>
>> Hello Anush,
>>
>> XGBoost, LightGBM and CatBoost algorithms will be a great addition for
>> mlpack this year. Since GSoC is shorter, I would concentrate on these
>> algorithms, with relevant tests and examples.
>>
>> You need to demonstrate in your proposal that you have a good knowledge
>> of decision tree algorithms. As always, a good starting point is a proof
>> of concept with relevant benchmarks.
>>
>> These are my suggestions, hope you find this helpful.
>>
>> Thanks,
>>
>> Omar
>>
>> On 03/14, Anush Kini wrote:
>> > Hi mlpack team,
>> >
>> > I am Anush Kini. My GitHub handle is Abilityguy
>> > <https://github.com/Abilityguy>.
>> >
>> > I have been getting familiar with the code base for the last couple of
>> > months.
>> > I am planning to apply for GSoC 2021 and wanted some feedback on my
>> > project proposal for the same.
>> >
>> > I am building on the 'Improve mlpack's tree ensemble support' idea from
>> > the wiki.
>> > I would like to implement XGBoost and LightGBM algorithms. If the
>> > schedule permits, I will look towards implementing CatBoost too.
>> >
>> > Additionally, I would like to work on bringing some additional features
>> > to the ensemble suite:
>> > 1. I would like to dip into 2619
>> > <https://github.com/mlpack/mlpack/issues/2619>, which aims to add
>> > regression support to Random Forests.
>> > 2. Implementing methods to get the impurity-based feature importance,
>> > similar to the one in scikit-learn
>> > <https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier.feature_importances_>.
>> >
>> > Finally, I plan to supplement any new features implemented with
>> > tutorials in mlpack/examples <https://github.com/mlpack/examples>.
>> > Looking forward to hearing your opinions and suggestions.
>> >
>> > Thanks & Regards,
>> > Anush Kini
>>
>> > _______________________________________________
>> > mlpack mailing list
>> > mlpack at lists.mlpack.org
>> > http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
>>
>>