[mlpack] A potential project idea for GSOC 2021

Marcus Edel marcus.edel at fu-berlin.de
Mon Mar 15 11:42:58 EDT 2021


Personally, I would first check some recent publications to see whether there is
a method that clearly outperforms the alternatives; it may make sense to add
that method to mlpack as part of the project. If it uses the basic building
blocks you mentioned, great: we can use that as a foundation for designing an
interface that reuses those building blocks to build the more complex method.

> On 14. Mar 2021, at 14:49, Ryan Curtin <ryan at ratml.org> wrote:
> 
> On Sun, Mar 14, 2021 at 10:19:24PM +0530, RISHABH GARG wrote:
>> Hello Marcus and Ryan, I did a bit of research and found a few pitfalls in
>> the statsmodels library:
>>    1. Its algorithms are in-memory algorithms, so it cannot handle
>> datasets that do not fit in memory.
>>    2. It does not have very good documentation.
>> 
>> We can easily beat it in terms of documentation, but I am not sure about
>> the external-memory algorithms. Also, are the algorithms implemented in
>> mlpack in-memory or external-memory?
> 
> All mlpack models use Armadillo, which only supports in-memory
> computation, but the algorithms themselves are implemented in a generic
> way, so with a little bit of work and hacking it is possible to use
> external memory for mlpack computations (but I think nobody is really
> doing this).
> 
> -- 
> Ryan Curtin    | "Hey, tell me the truth... are we still in the
> ryan at ratml.org | game?" - The Chinese Waiter


