[mlpack] A potential project idea for GSOC 2021

Ryan Curtin ryan at ratml.org
Sat Mar 13 17:31:01 EST 2021


Hey Rishabh,

Totally agreed---we could definitely provide better support for
forecasting methods.  But, I do agree with Marcus that there needs to be
some reason that people would pick mlpack over other frameworks.
Typically that reason might be speed, or a better algorithmic
implementation, but there are other possibilities too, of course.

One of the things that's important regardless, though, is API---so a big
question would be, what do we use to represent time-series data?  Is it
seamless across mlpack algorithms?  For instance, looking at the way we
represent time-series data for RNNs could be a starting place.  (And
even that could be changed if there were a compelling reason.)  We'd also
need to make sure that the way we choose to represent time-series data
matches with the representations used by other tools that prospective
users might already be familiar with, so that the barrier to entry for
them is a bit lower.
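
As a rough illustration of the representation question, here is a
minimal C++ sketch; it is not an mlpack API.  It assumes a column-major
layout in which each column holds one time step, in line with mlpack's
convention of storing one observation per column.  `TimeSeries`, `At()`,
and `NaiveForecast()` are hypothetical names made up for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical container (not an mlpack type): a multivariate series stored
// column-major, so that column t holds every dimension at time step t.
struct TimeSeries
{
  std::size_t dims;           // number of variables per time step
  std::size_t steps;          // number of time steps
  std::vector<double> values; // dims * steps entries, column-major

  TimeSeries(std::size_t dims, std::size_t steps)
    : dims(dims), steps(steps), values(dims * steps, 0.0) {}

  // Element access: variable `dim` at time step `step`.
  double& At(std::size_t dim, std::size_t step)
  {
    return values[step * dims + dim];
  }

  double At(std::size_t dim, std::size_t step) const
  {
    return values[step * dims + dim];
  }
};

// Naive forecast: repeat the last observed column for `horizon` future steps.
TimeSeries NaiveForecast(const TimeSeries& history, std::size_t horizon)
{
  if (history.steps == 0)
    throw std::invalid_argument("NaiveForecast(): history is empty");

  TimeSeries forecast(history.dims, horizon);
  for (std::size_t t = 0; t < horizon; ++t)
    for (std::size_t d = 0; d < history.dims; ++d)
      forecast.At(d, t) = history.At(d, history.steps - 1);

  return forecast;
}
```

With a layout like this, a forecaster takes and returns the same type, so
its output could be passed on to other methods without conversion; whether
that is the right choice for mlpack is exactly the open question here.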

I hope this is helpful!

Thanks,

Ryan

On Fri, Mar 12, 2021 at 06:23:05PM +0530, RISHABH GARG wrote:
> Hello Marcus,
> I think I didn't make one point very clear in my previous email. What I
> found is that there are a couple of libraries, like statsmodels and
> sktime, that are dedicated to time series forecasting, classification,
> regression, etc., but I couldn't find any good open source library in C++
> that provides easy-to-use time series models. One C++ library I found is
> ALGLIB, but that too is not completely open source. Therefore, I think
> mlpack could be one of the first big open source C++ libraries that
> provides these methods.
> 
> Also, the methods I mentioned in the previous email are elementary; you
> could call them the LEGO blocks of time series analysis. One thing I have
> discovered about forecasting methods is that they are built progressively
> on top of each other. For example, ARIMA combines an autoregressive model
> and a moving average model with a number of differencing steps, i.e. it is
> a combination of three different methods. The point I am trying to make is
> that complex models are built on top of many simpler models.
> 
> Thus, for motivation, and for the minimum we should expect from our API,
> we can refer to the Python libraries above, as they are quite mature.
> But I don't think we can benchmark against them, since C++ will surely
> beat Python in execution time.
> 
> What I have mentioned above only scratches the surface. There is a lot
> of research going on in the field, but I think we should first start
> with the foundations.
> 
> Please let me know if I missed something or if anything needs further
> explanation. Also, if you like, I can provide more details about the
> implementation, the integration with the existing codebase, or the API.
> 
> Sorry if the mail got too big. Thanks for reading :)
> 
> Regards
> Rishabh Garg
> 
> 
> 
> On Thu, Mar 11, 2021 at 10:00 PM Marcus Edel <marcus.edel at fu-berlin.de>
> wrote:
> 
> > Hello Rishabh,
> >
> > thanks for reaching out, and welcome to the community. I like the idea,
> > but we should check how mlpack can differentiate itself from the existing
> > implementations: is there a recent method that is not available in other
> > frameworks (check for papers), can we make an existing method faster, etc.
> > As you said, there are frameworks out there that have implemented these
> > methods already, so I think it's a good idea to check what mlpack can
> > bring to the table.
> >
> > Thanks,
> > Marcus
> >
> > On 10. Mar 2021, at 10:27, RISHABH GARG <rishabhgarg108 at gmail.com> wrote:
> >
> > Hello everyone,
> >
> > As most of us know, time series analysis and forecasting methods are
> > quite useful in the real world. Most practical datasets contain some or
> > many time-dependent features, which makes these methods highly useful and
> > powerful. Therefore, in my opinion, every machine learning / data
> > science library should have these methods. But unfortunately, mlpack does
> > not have any time series method implemented yet :(
> >
> > Therefore, I would like to propose implementing time series forecasting
> > models as a project idea for GSOC 2021. Some of the best-known and most
> > commonly used forecasting methods are listed below (mostly taken from
> > issue #2668):
> >
> >    1. Naive
> >    2. Seasonal naive
> >    3. Seasonal-trend decomposition using LOESS (STL)
> >    4. Holt-Winters
> >    5. Exponential smoothing
> >    6. ARIMA
> >    7. Autoregression
> >
> >
> > In the limited time of GSOC 2021, it might not be possible to implement
> > all of these, so I can pick 2-3 methods from this list and implement them.
> > These methods will also require some basic utilities, so implementing
> > those would fall within the scope of this project too.
> >
> > This would be a really interesting project for me to work on. I recently
> > took a data science course at my university, where I came across a
> > couple of these methods and was fascinated by how useful they can be in
> > real life. I have already done some work implementing the naive model in
> > #2789, and I would love to continue it over the coming summer.
> >
> > I would ask all mentors to consider whether this could be a good GSOC
> > project, and whether anyone would like to mentor it.
> >
> > The valuable feedback of anyone from the mlpack community will be
> > immensely helpful.
> >
> > Thanks and regards,
> > Rishabh Garg
> > _______________________________________________
> > mlpack mailing list
> > mlpack at lists.mlpack.org
> > http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
> >
> >
> >


-- 
Ryan Curtin    | "You can think about it... but don't do it."
ryan at ratml.org |   - Sheriff Justice

