[mlpack] interested in GSoC

Ryan Curtin ryan at ratml.org
Wed Feb 21 10:31:59 EST 2018


On Wed, Feb 21, 2018 at 03:28:52PM +0100, Manos Stergiadis wrote:
> Hello everyone.
> 
> My name is Manos Stergiadis and I am a post-master trainee in Data Science
> with a background in software engineering and machine learning. I found two
> of the project ideas very interesting as well as relevant to my current
> work.

Hi Manos,

Thanks for getting in touch and welcome to the community.

> I am specifically interested in:
> 
> *1.  String Processing Utilities.*
> 
> I am currently contributing to a popular NLP framework called gensim
> <https://github.com/steremma/gensim>. In fact the main feature of the
> latest release was developed by myself and had to do with parsing raw
> wikipedia xml. A lot of my contributions in this project have to do with
> transforming text into numerical vectors in order to use it as input to
> sklearn style machine learning algorithms. The ways to do that range from
> simple ones (OHE which was mentioned in the project idea but also BOW or
> n-gram representations) to more complex ones (word2vec variations
> developed by Tomas Mikolov over the period 2011-2013). Even though my
> work has been in Python I also have some background (and past open source
> contributions like this <https://github.com/steremma/stdr_simulator> one)
> in C++ and I am eager to improve on it.

Ah, gensim, I am familiar with that project.  It sounds like there is a
lot of overlap between what you did there and what the string
processing utilities project might entail.

If you are looking for some more information on the project, this
previous mailing list post might be helpful:

http://knife.lugatgt.org/pipermail/mlpack/2018-January/003456.html

> *2.  Essential Deep Learning Modules*
> 
> I would be very interested in implementing one of the proposed modules
> (perhaps a BRNN) because I have recently started working through relevant
> courses on Coursera and reading the milestone papers in RNNs and LSTMs. I
> find them extremely interesting. I also have some related experience as in
> 2011 I wrote a neural network from scratch in C and parallelized it in CUDA
> (there were no libraries that I knew of back then). The code - which is
> quite ugly since I was a bachelor student back then - can be found here
> <https://github.com/steremma/digitRecognition>.

It's good to hear you are already somewhat familiar with these
techniques; this can help make you a strong candidate.  My suggestion
would be to take a look at the existing mlpack neural network code, and
perhaps implement the same digit recognition task using it.  There's
also a nice digit recognition example with mlpack for Kaggle:

https://github.com/mlpack/models/tree/master/Kaggle

> If my profile seems interesting I would love to have a discussion on next
> steps like preparing a detailed project plan for one of these projects or
> addressing a specific issue with a PR. I would also like to discuss the
> time commitment requirements as part of the GSoC period will overlap with
> my current position's responsibilities.

Sure, I would say a good next step is to become familiar with the
codebase so that you are able to develop a detailed proposal.  As for
the time commitment requirements, the expectation is that a GSoC student
will work the equivalent of a full-time job.  So if your current
position is a full-time job and you would be doing that in tandem with
GSoC, I think that this would not work.  On the other hand, if your
current position only overlaps with GSoC for a week (or something like
that), this is something that we can work around.  I'm happy to talk
further about that if you like.

Thanks!

Ryan

-- 
Ryan Curtin    | "I am a meat popsicle."
ryan at ratml.org |   - Korben Dallas 

