[mlpack] GSOC 2016 Aspirant - Parallel Stochastic Optimisation Methods

Aditya Sharma adisharma075 at gmail.com
Thu Mar 3 12:41:05 EST 2016


Hi Ryan,

I read the Hogwild! paper, which, to my understanding, gives theoretical
convergence guarantees for parallelizing SGD without locking or other
synchronization in a shared-memory model, provided the problem is sparse
enough and individual updates happen atomically.

I also went through your implementations of SGD and mini-batch SGD. I think
it would be fairly easy to OpenMP-ize the current implementations along the
lines of Hogwild!.

But, in my opinion, if we just use multi-threading, mlpack might not be
very attractive for researchers working with truly large-scale data.

I think it would be a good idea if we could add support for GPU processing
to the existing optimizers. I have prior experience working with CUDA, and I
think I would be able to add a CUDA version of Hogwild! built on the
existing SGD implementations in mlpack over the summer, such that
researchers with little knowledge of CUDA could directly use mlpack to speed
up their code without worrying about what's under the hood (much like what
Theano does for Python).

Another direction could be to add support for distributed computing, by
linking mlpack to the Parameter Server by CMU (
http://parameterserver.org) or integrating the MPI-based Parameter Server
that I've built, and parallelizing the existing SGD and mini-batch code in
mlpack along the lines of Downpour SGD (similar to the TensorFlow and
DistBelief systems developed by Google).

The distributed implementation would be a bit more complicated, but I think
I should be able to do it over the summer, as that's exactly what the focus
of my research is currently.

I would love to know your thoughts and suggestions.

Thank You.

Best,
Aditya

On Tue, Mar 1, 2016 at 8:15 PM, Ryan Curtin <ryan at ratml.org> wrote:

> On Tue, Mar 01, 2016 at 05:39:57PM +0530, Aditya Sharma wrote:
> > Hello,
> >
> > I'm Aditya Sharma, a senior at Birla Institute of Technology and Science,
> > Pilani, India (BITS Pilani) in the final year of my Bachelor's degree in
> > Electrical and Electronics Engineering and I graduate in August 2016.
> >
> > Owing to my experience in the area, I am really interested in the '
> > <
> https://github.com/mlpack/mlpack/wiki/SummerOfCodeIdeas#parallel-stochastic-optimization-methods
> >Parallel
> > stochastic optimisation methods'
> > <
> https://github.com/mlpack/mlpack/wiki/SummerOfCodeIdeas#parallel-stochastic-optimization-methods
> >
> > project
> > idea listed on the GSOC 2016 page of mlpack and would love to know more
> > details regarding the direction in which the community wants to go with
> > regards to this project.
>
> Hi Aditya,
>
> This is a pretty open-ended project.  The goal, of course, is to
> implement parallel stochastic optimization methods.  The particular
> algorithms that are chosen are less important and they are up to the
> student.  Hogwild! might be an interesting one to look into, for
> instance.
>
> mlpack has a nice interface for optimizers; take a look at the existing
> optimizers in src/mlpack/core/optimizers/ for some examples.  Any
> optimizers implemented for the project should follow the same API, so
> that they work with many mlpack methods, like logistic regression, or
> softmax regression, or NCA, and so forth.
>
> I hope this is helpful; please let me know if I can clarify anything.
>
> Thanks!
>
> Ryan
>
> --
> Ryan Curtin    | "But feel, to the very end, the triumph of being
> ryan at ratml.org | alive!"  - Jöns
>



-- 

Aditya Sharma
Fourth Year Undergraduate
Department of Electrical and Electronics Engineering,
Birla Institute of Technology and Science, Pilani
Rajasthan, India - 333031

WWW: http://adityasharma.space

E-mail: adisharma075 at gmail.com, f2012075 at pilani.bits-pilani.ac.in
LinkedIn: https://www.linkedin.com/in/adityabits