[mlpack] Proposing a possible project for GSOC 2021

Sun Mar 14 14:53:24 EDT 2021

On Wed, Mar 10, 2021 at 10:50:59PM +0530, Nippun Sharma wrote:
> Hi everyone,
> 
> Here I will be proposing a possible project for GSOC 2021.
> This project is inspired by #2421
> <https://github.com/mlpack/mlpack/issues/2421>.
> 
> The majority of machine learning libraries (scikit-learn, xgboost,
> catboost) follow a ".fit" and ".predict" type interface, where models
> are trained by doing something like "model.fit(X, y)" and predictions are
> made using "model.predict(X)" (in Python).
> 
> Unfortunately, mlpack does not support such an interface, which makes it
> difficult for people who use mlpack through bindings (rather than C++
> directly) to get familiar with mlpack.
> 
> So I would like to propose this as an idea for GSOC 2021 that would involve
> changing the bindings to support this interface.
> 
> Since I spend most of my time exploring various ML libraries in Python
> (most of which support this interface), it would be an amazing experience
> for me to work on this so that mlpack can also support this interface. I am
> familiar with the mlpack binding system and have worked on #2787
> <https://github.com/mlpack/mlpack/pull/2787> and am currently working on
> #2868 <https://github.com/mlpack/mlpack/pull/2868>. Apart from these PRs,
> there are more PRs that I have worked on, but these two are the ones most
> closely related to the project.
> 
> I request all the mentors to see if this can be a good GSOC project and if
> anyone would like to mentor this.

Hi Nippun,

Personally, I think that this is a good project.  Going through every
binding would be a lot of work, but if you have a good plan and know how
you will do the refactoring to each binding, then I think it can work.
Note that in C++, all mlpack algorithms do already support an interface
that splits training and prediction into two different methods (usually
`Train()` and `Predict()`), but the original binding implementation was
for the command-line, where it makes more sense to have one command-line
program do both prediction and training.
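
To make that split concrete, here is a rough Python sketch of one way a
single-call binding could be wrapped in a fit/predict class.  Everything
in it is hypothetical: `toy_binding`, `ToyClassifier`, and the parameter
and output names are made up for illustration and are not mlpack's actual
API, and the "model" is just a nearest-centroid classifier so the example
runs on its own.

```python
# Hypothetical sketch only: none of these names come from mlpack itself.

def toy_binding(training=None, labels=None, input_model=None, test=None):
    """Stand-in for a command-line-style binding.

    A single call can train, predict, or do both, depending on which
    parameters are passed.
    """
    result = {}
    model = input_model

    if training is not None:
        # "Training": compute the mean point (centroid) of each class.
        sums, counts = {}, {}
        for point, label in zip(training, labels):
            acc = sums.setdefault(label, [0.0] * len(point))
            for i, value in enumerate(point):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        model = {label: [s / counts[label] for s in acc]
                 for label, acc in sums.items()}
        result["output_model"] = model

    if test is not None:
        if model is None:
            raise ValueError("need training data or input_model to predict")

        def dist2(a, b):
            # Squared Euclidean distance between two points.
            return sum((x - y) ** 2 for x, y in zip(a, b))

        # "Prediction": label each point with its closest centroid.
        result["predictions"] = [
            min(model, key=lambda label: dist2(point, model[label]))
            for point in test
        ]

    return result


class ToyClassifier:
    """fit/predict wrapper around the single-call binding above."""

    def __init__(self):
        self.model = None

    def fit(self, X, y):
        # Training-only call: keep the model, ignore everything else.
        self.model = toy_binding(training=X, labels=y)["output_model"]
        return self

    def predict(self, X):
        if self.model is None:
            raise RuntimeError("call fit() before predict()")
        # Prediction-only call: reuse the stored model.
        return toy_binding(input_model=self.model, test=X)["predictions"]
```

With something like this, `ToyClassifier().fit(X, y).predict(X_test)`
behaves the way users of other libraries would expect, while the
underlying single call that trains and predicts at once stays available
for the command line.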

If you put together a proposal for this project, I'd suggest ensuring
you have at least one pretty clear example or proof-of-concept of taking
one binding and refactoring it in the way that you are planning to.  We
would need to make sure that the 'new' interface makes sense from the
command-line, Python, Julia, R, Go... basically every language we
support. :)

There are lots of possible ways to solve this problem, so please feel
free to get creative and come up with different ideas.

I hope this is helpful!

Thanks,

Ryan

-- 
Ryan Curtin    | "Leave your stupid comments in your pocket!"
ryan at ratml.org |   - Mark