[mlpack] Proposing a possible project for GSOC 2021

Ryan Curtin ryan at ratml.org
Mon Mar 22 20:33:08 EDT 2021


Hi Nippun,

Thanks for writing up your thoughts.  They look pretty good to me.  Some
comments:

> After a survey of the different programming languages for which mlpack
> has bindings, I think that this kind of interface can be supported either
> by using structs (in Go, Julia) or classes (in Python, R).

Don't forget, we also have to find a way to support this from the
command-line interface.  I suppose we can simply have two different
programs---one to train, one to predict.
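As a rough sketch of how a class-based interface (Python, R) could sit on top of two separate programs, one to train and one to predict, here is a minimal Python illustration. Every name in it (`train_program`, `predict_program`, `Model`) is hypothetical and not part of mlpack, and a trivial mean predictor stands in for a real algorithm.

```python
from typing import List, Optional


def train_program(data: List[float]) -> dict:
    """Hypothetical stand-in for a 'train' command-line program."""
    # Trivial "model" for illustration only: the mean of the training data.
    return {"mean": sum(data) / len(data)}


def predict_program(model: dict, n: int) -> List[float]:
    """Hypothetical stand-in for a 'predict' command-line program."""
    # Predict the stored mean for each of the n requested points.
    return [model["mean"]] * n


class Model:
    """Hypothetical class wrapper that delegates to the two programs."""

    def __init__(self) -> None:
        self._model: Optional[dict] = None

    def train(self, data: List[float]) -> "Model":
        # The class only stores the model produced by the train step.
        self._model = train_program(data)
        return self

    def predict(self, n: int) -> List[float]:
        if self._model is None:
            raise RuntimeError("call train() before predict()")
        return predict_program(self._model, n)
```

For example, `Model().train([1.0, 2.0, 3.0]).predict(2)` gives `[2.0, 2.0]`. The point of the design is that the class adds no logic of its own, so the same train/predict split can be exposed unchanged as two CLI programs.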

> I could not completely understand what you meant here by "one output
> parameter". But I have a plan to formalize this idea.

If you try to use the mlpack bindings from Python, you will see that
they return a dict mapping names of output parameters to values.  In
Julia, a tuple of results is returned.  What I am saying is that it
might make sense to restrict each binding to returning only one thing.
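To make the contrast concrete, here is a hypothetical Python sketch of the two return styles. `MeanModel`, `fit_multi_output`, `fit_single_output`, and the dict keys are invented for this example and are not mlpack functions or parameters.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MeanModel:
    """Hypothetical trained model: just the mean of the training data."""
    mean: float


def fit_multi_output(data: List[float]) -> Dict[str, object]:
    # Multiple named outputs in one dict; the caller must know which
    # key holds which result.
    model = MeanModel(sum(data) / len(data))
    return {"output_model": model, "predictions": [model.mean] * len(data)}


def fit_single_output(data: List[float]) -> MeanModel:
    # Exactly one output: the trained model. Predictions would come
    # from a separate call that takes this model as input.
    return MeanModel(sum(data) / len(data))
```

With a single output, `fit_single_output([1.0, 3.0])` returns `MeanModel(mean=2.0)` directly, with no dict or tuple to unpack.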

> We can categorize each mlpack_method into various categories. Each category
> will have a set of basic functionalities that should be provided to the
> bindings through the member_methods.
> Following are the categories that we can use:
> (These are picked from the mlpack docs page. We can edit this list
> accordingly. Maybe you can suggest some changes?)
> 
> 1) Transformations
> 2) Regression
> 3) Classification
> 4) Clustering
> 5) Preprocessing
> 6) Geometry
> 7) Others

These are already codified in each directory's `CMakeLists.txt`, in
order to generate the documentation correctly.  I think it would be fine
if we wanted to go a little further than that.

> For the "Regression" category we can have some basic member_methods such as
> "fit", "predict", "score", "get_params".
> For the "Classification" category we can have "fit", "predict",
> "predict_proba", "score", "get_params".

I would not recommend imitating the scikit-learn interface directly---it
will cause users to have various expectations that may not apply, as
mlpack algorithms are sometimes implemented differently.

> I am still working on finding an exhaustive list of basic member_methods
> for each category. After that, I will work to create bindings for a single
> mlpack_method as a proof-of-concept.

Why not just use the existing algorithms that we have implemented?  They
are already grouped into categories.

Please keep in mind that the questions highlighted in that issue are
starting places for a design.  The purpose is not to provide specific
questions which require specific answers, but instead to mention some
issues that probably require some deep thought as part of a proposal.
The questions aren't even comprehensive---as you work on your proposal,
be sure to spend time understanding the existing code and system so that
you can have an idea of whether what you're proposing is feasible to do
in the amount of time that you'll have.

-- 
Ryan Curtin    | "Then they attacked a town.  A small town, but a town
ryan at ratml.org | nonetheless.  A town of people.  People who died."