[mlpack] GSOC 2016 Aspirant - Parallel Stochastic Optimisation Methods

Fri Mar 4 21:19:54 EST 2016

Hi Ryan,

I agree that just implementing Hogwild! would be pretty trivial. Adding
support for distributed computing to mlpack, with Hogwild! handling the
multithreading on each node, could on the other hand be a much more
interesting project.
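
To make the "trivial" part concrete, here is a minimal sketch of a
Hogwild!-style update loop with OpenMP, assuming a sparse least-squares
objective. The names SparseExample and HogwildSGD are hypothetical and are
not part of mlpack; this only illustrates the idea and is not how mlpack's
SGD is written.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sparse training example: only the listed feature indices are
// non-zero. Not an mlpack type.
struct SparseExample
{
  std::vector<std::size_t> indices;
  std::vector<double> values;
  double target;
};

// Hogwild!-style SGD for least squares (illustrative sketch only).
// Threads update the shared weight vector without any locks; only the
// individual per-component reads and writes are atomic.
std::vector<double> HogwildSGD(const std::vector<SparseExample>& data,
                               const std::size_t dimension,
                               const double stepSize,
                               const std::size_t epochs)
{
  std::vector<double> weights(dimension, 0.0);
  double* w = weights.data();
  const long n = static_cast<long>(data.size());

  for (std::size_t epoch = 0; epoch < epochs; ++epoch)
  {
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
    {
      const SparseExample& ex = data[i];
      assert(ex.indices.size() == ex.values.size());

      // Prediction from a possibly stale view of the shared weights.
      double prediction = 0.0;
      for (std::size_t k = 0; k < ex.indices.size(); ++k)
      {
        double wk;
        #pragma omp atomic read
        wk = w[ex.indices[k]];
        prediction += wk * ex.values[k];
      }

      const double error = prediction - ex.target;

      // Touch only the components this example uses; no lock is taken.
      for (std::size_t k = 0; k < ex.indices.size(); ++k)
      {
        const double delta = stepSize * error * ex.values[k];
        #pragma omp atomic
        w[ex.indices[k]] -= delta;
      }
    }
  }

  return weights;
}
```

Built without OpenMP, the pragmas are ignored and the loop runs as plain
serial SGD; with -fopenmp the iterations are spread across threads, and the
result can differ slightly from run to run because the updates interleave.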

Your mention of Spark got me wondering whether there are any stable
frameworks for running C++ programs on Hadoop. Sure enough, after a quick
internet search, I came across MR4C
<http://google-opensource.blogspot.in/2015/02/mapreduce-for-c-run-native-code-in.html>,
developed by Google. From the example programs that I have seen, it appears
to have a very clean interface, which would help to keep the final code
simple.

I have not explored it in detail, but I think it could be a possible way
to add support for distributed computing to mlpack.

The same goes for CMU's Parameter Server <http://parameterserver.org>.
I have used it personally, so I know that it offers a really good speedup,
and its methods are pretty straightforward, so the code remains simple.

Do check these out and let me know whether either of them is worth
exploring for mlpack.

Thank you!

Best,
Aditya

On Sat, Mar 5, 2016 at 7:16 AM, Ryan Curtin <ryan at ratml.org> wrote:

> On Thu, Mar 03, 2016 at 11:11:05PM +0530, Aditya Sharma wrote:
> > Hi Ryan,
> >
> > I read the Hogwild! paper, which, to my understanding, gives theoretical
> > convergence guarantees for just parallelizing SGD without worrying about
> > locking, etc., in a shared memory model, if the data is large enough and
> > updates happen atomically.
> >
> > I also went through your implementations of SGD and mini-batch SGD. I
> > think it would be fairly easy to OpenMP-ize the current implementations
> > along the lines of Hogwild.
> >
> > But, in my opinion, if we just use multi-threading, mlpack might not be
> > very attractive for researchers working with truly large-scale data.
> >
> > I think it would be a good idea if we could add support for GPU
> > processing to the existing optimizers. I have prior experience working
> > with CUDA, and I think I would be able to add a CUDA version of Hogwild!
> > built on the existing SGD implementations in mlpack over the summer.
> > That way, researchers with little knowledge of CUDA could use mlpack
> > directly to speed up their code, without worrying about what's under the
> > hood (much like what Theano does for Python).
> >
> > Another direction could be to add support for distributed computing, by
> > either linking mlpack to the Parameter Server by CMU
> > (http://parameterserver.org) or integrating the MPI-based Parameter
> > Server that I've built, and parallelizing the existing SGD and
> > mini-batch code in mlpack along the lines of Downpour SGD (similar to
> > the TensorFlow and DistBelief systems developed by Google).
> >
> > The distributed implementation would be a bit more complicated, but I
> > think I should be able to do it over the summer, as that's exactly what
> > the focus of my research is currently.
> >
> > I would love to know your thoughts and suggestions.
>
> Hi Aditya,
>
> We could definitely use OpenMP on the current SGD implementations, but
> we would have to be careful to ensure that this wouldn't modify the
> result.  Hogwild! is almost certainly easiest to implement in OpenMP.
> (Actually it's sufficiently simple that just a Hogwild! implementation
> would be too little work for a GSoC project I think, but it could
> definitely be a component of a larger project.)
>
> The problem with CUDA is that you will have to be shipping the data back
> and forth from the GPU every iteration, because the optimizer is
> separate from the function it is optimizing.  The optimizer only makes
> calls to function.Evaluate() and function.Gradient(), and it's not
> reasonable to expect that every Evaluate() and Gradient() call will be
> written for GPUs.  This means that the only step that you could put on a
> GPU would realistically be the update step, and given the huge overhead
> of the communication cost, I'm doubtful that we'd see any speedup.
>
> It's a very hard challenge to support GPUs while still keeping the
> algorithms simple enough to be maintained.
>
> I think the same thing is true for MPI; the code written for MPI can end
> up being very complex and hard to maintain.  Here we have another
> problem: mlpack has no support for distributed matrices or distributed
> problems of any form (and in general isn't aimed at that use case; there
> are maybe better tools, like Spark for instance).
>
> I don't mean to say these ideas are impossible: what you've suggested is
> a set of really great improvements and ideas.  But we would need to do a
> lot of thinking to figure out how they would fit into the core
> abstractions of mlpack, how we can preserve the basic interface we have
> now, and (maybe most importantly) how we can keep the code simple.
>
> Thanks,
>
> Ryan
>
> --
> Ryan Curtin    | "Reprogram him!"
> ryan at ratml.org |   - Master Control Program
>

-- 

Aditya Sharma
Fourth Year Undergraduate
Department of Electrical and Electronics Engineering,
Birla Institute of Technology and Science, Pilani
Rajasthan, India - 333031

WWW: http://adityasharma.space

E-mail: adisharma075 at gmail.com, f2012075 at pilani.bits-pilani.ac.in
LinkedIn: https://www.linkedin.com/in/adityabits