[mlpack] A potential project idea for GSOC 2021

RISHABH GARG rishabhgarg108 at gmail.com
Fri Mar 12 07:53:05 EST 2021


Hello Marcus,
I don't think I made one point clear enough in my previous email. What I
found is that there are a couple of Python libraries, such as statsmodels and
sktime, that provide time series forecasting, classification, regression,
etc., but I couldn't find any good open-source C++ library that provides
easy-to-use time series models. The one C++ library I found is ALGLIB, but it
is not completely open source either. Therefore, I think mlpack could be one
of the first major open-source C++ libraries to provide these methods.

Also, the methods I mentioned in my previous email are elementary; you can
think of them as the LEGO blocks of time series analysis. One thing I have
noticed about forecasting methods is that they build progressively on top of
each other. For example, ARIMA combines an autoregressive (AR) model, a
moving-average (MA) model, and a number of differencing steps, i.e., three
different methods. The point I am trying to make is that complex models are
built on top of many simpler models.

Thus, since the Python libraries above are quite mature, we can use them as a
reference for motivation and for the minimum expectations of our API.
However, I don't think we can meaningfully benchmark against them, since C++
will very likely beat Python in execution time.

Everything I have mentioned above is just scratching the surface. There is a
lot of research going on in this field, but I think we should start with the
foundations.

Please let me know if I missed something or if anything needs further
explanation. Also, if you like, I can provide more details about the
implementation, integration with the existing codebase, or the API.

Sorry if this email got too long. Thanks for reading :)

Regards
Rishabh Garg



On Thu, Mar 11, 2021 at 10:00 PM Marcus Edel <marcus.edel at fu-berlin.de> wrote:

> Hello Rishabh,
>
> thanks for reaching out, and welcome to the community. I like the idea,
> but we should check how mlpack can differentiate itself from existing
> implementations: is there a recent method that is not available in other
> frameworks (check for papers), can we make an existing method faster, etc.?
> As you said, there are frameworks out there that already implement these
> methods, so I think it's a good idea to check what mlpack can bring to the
> table.
>
> Thanks,
> Marcus
>
> On 10. Mar 2021, at 10:27, RISHABH GARG <rishabhgarg108 at gmail.com> wrote:
>
> Hello everyone,
>
> As most of us know that time series analysis and forecasting methods are
> quite useful in the real world. In most of the practical life datasets, we
> see some or many time dependent features. Thus, they are highly useful and
> powerful methods. Therefore, in my opinion every machine learning / data
> science library should have these methods. But unfortunately, mlpack does
> not have any time series method implemented yet :(
>
> Therefore, I would like to propose implementing time series forecasting
> models as a project idea for GSoC 2021. Some of the best-known and most
> commonly used forecasting methods are listed below (mostly taken from issue
> #2668):
>
>    1. Naive
>    2. Seasonal naive
>    3. Seasonal-trend decomposition using LOESS (STL)
>    4. Holt-Winters
>    5. Exponential smoothing
>    6. ARIMA
>    7. Autoregression
>
>
> Given the limited time of GSoC 2021, it might not be possible to implement
> all of these, so I could pick 2-3 methods from this list and implement them.
> These methods will also require some basic utilities, which would fall
> within the scope of this project as well.
>
> This would be a really interesting project for me to work on. I recently
> took a data science course at my university, where I came across a couple
> of these methods, and I was fascinated by how useful they can be in real
> life. I have already done some work implementing the Naive model in #2789,
> and I would love to continue it over the coming summer.
>
> I would ask all mentors to consider whether this could be a good GSoC
> project, and whether anyone would like to mentor it.
>
> Any feedback from the mlpack community would be immensely helpful.
>
> Thanks and regards,
> Rishabh Garg
> _______________________________________________
> mlpack mailing list
> mlpack at lists.mlpack.org
> http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack