[mlpack] A potential project idea for GSOC 2021

RISHABH GARG rishabhgarg108 at gmail.com
Sun Mar 14 12:49:24 EDT 2021


Hello Marcus and Ryan,

I did a bit of research and found a few shortcomings in the statsmodels
library:
    1. Its algorithms are in-memory, so it cannot handle datasets that do
not fit in RAM.
    2. Its documentation is not very good.

We could easily do better on documentation, but I am not sure about
external-memory algorithms. Also, I would like to know whether the
algorithms implemented in mlpack are in-memory or external-memory.

Best regards,
Rishabh Garg

On Sun, Mar 14, 2021 at 4:01 AM Ryan Curtin <ryan at ratml.org> wrote:

> Hey Rishabh,
>
> Totally agreed---we could definitely provide better support for
> forecasting methods.  But, I do agree with Marcus that there needs to be
> some reason that people would pick mlpack over other frameworks.
> Typically that reason might be speed, or a better algorithmic
> implementation, but there are other possibilities too, of course.
>
> One of the things that's important regardless, though, is API---so a big
> question would be, what do we use to represent time-series data?  Is it
> seamless across mlpack algorithms?  For instance, looking at the way we
> represent time-series data for RNNs could be a starting place.  (And
> even that could be changed if there were a compelling reason.)  We'd also
> need to make sure that the way we choose to represent time-series data
> matches with the representations used by other tools that prospective
> users might already be familiar with, so that the barrier to entry for
> them is a bit lower.
>
> I hope this is helpful!
>
> Thanks,
>
> Ryan
>
> On Fri, Mar 12, 2021 at 06:23:05PM +0530, RISHABH GARG wrote:
> > Hello Marcus,
> > I think I didn't make one point very clear in my previous email. What I
> > found is that there are a couple of libraries, like statsmodels and
> > sktime, that are dedicated to time series forecasting, classification,
> > regression, etc., but I couldn't find any good open source C++ library
> > that provides easy-to-use time series models. One C++ library I found is
> > Alglib, but it is not completely open source either. Therefore, I think
> > mlpack could be one of the first big open source C++ libraries to
> > provide these methods.
> >
> > Also, the methods I mentioned in my previous email are elementary; you
> > could call them the LEGO blocks of time series analysis. One thing I
> > have noticed about forecasting methods is that they are built
> > progressively on top of each other. For example, ARIMA combines an
> > autoregressive (AR) model and a moving-average (MA) model, applied after
> > a number of differencing steps (the "I", for "integrated"), i.e. a
> > combination of three different methods. The point I am trying to make is
> > that complex models are built on top of many simpler models.
> >
> > Thus, for motivation and for the minimum we should expect from our API,
> > we can refer to the Python libraries above, as they are quite mature.
> > But I don't think benchmarking against them makes much sense, since I
> > expect C++ to beat Python in execution time.
> >
> > What I have mentioned above only scratches the surface. There is a lot
> > of research going on in the field, but I think we should start with the
> > foundations.
> >
> > Please let me know if I missed something or if anything needs more
> > detail. If you like, I can also provide more details on the
> > implementation, integration with the existing codebase, and the API.
> >
> > Sorry for the long email. Thanks for reading :)
> >
> > Regards
> > Rishabh Garg
> >
> >
> >
> > On Thu, Mar 11, 2021 at 10:00 PM Marcus Edel <marcus.edel at fu-berlin.de> wrote:
> >
> > > Hello Rishabh,
> > >
> > > Thanks for reaching out, and welcome to the community. I like the idea,
> > > but we should check how mlpack can differentiate itself from what
> > > already exists: is there a recent method that is not available in
> > > other frameworks (check for papers)? Can we make an existing method
> > > faster? As you said, there are frameworks out there that have already
> > > implemented these methods, so I think it's a good idea to check what
> > > mlpack can bring to the table.
> > >
> > > Thanks,
> > > Marcus
> > >
> > > On 10. Mar 2021, at 10:27, RISHABH GARG <rishabhgarg108 at gmail.com> wrote:
> > >
> > > Hello everyone,
> > >
> > > As most of us know, time series analysis and forecasting methods are
> > > quite useful in the real world: most practical datasets contain at
> > > least a few time-dependent features. In my opinion, every machine
> > > learning / data science library should therefore provide these
> > > methods. Unfortunately, mlpack does not implement any time series
> > > methods yet :(
> > >
> > > I would therefore like to propose a project idea for GSOC 2021:
> > > implementing time series forecasting models. Some of the best-known and
> > > most commonly used forecasting methods are listed below (mostly taken
> > > from issue #2668):
> > >
> > >    1. Naive
> > >    2. Seasonal naive
> > >    3. Seasonal-trend decomposition using LOESS (STL)
> > >    4. Holt-Winters
> > >    5. Exponential smoothing
> > >    6. ARIMA
> > >    7. Autoregression
> > >
> > >
> > > Given the limited time of GSOC 2021, it might not be possible to
> > > implement all of these, so I could pick 2-3 methods from this list and
> > > implement them. These methods will also require some basic supporting
> > > utilities, which would fall within the scope of this project as well.
> > >
> > > This would be a really interesting project for me to work on. I
> > > recently took a data science course at my university, where I came
> > > across a couple of these methods, and I was fascinated by how useful
> > > they can be in real life. I have already done some work implementing
> > > the Naive model in #2789, and I would love to continue it over the
> > > coming summer.
> > >
> > > I would ask the mentors to consider whether this could be a good GSOC
> > > project, and whether anyone would like to mentor it.
> > >
> > > Feedback from anyone in the mlpack community would be immensely
> > > helpful.
> > >
> > > Thanks and regards,
> > > Rishabh Garg
> > > _______________________________________________
> > > mlpack mailing list
> > > mlpack at lists.mlpack.org
> > > http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
> > >
> > >
> > >
>
>
>
> --
> Ryan Curtin    | "You can think about it... but don't do it."
> ryan at ratml.org |   - Sheriff Justice
>