[mlpack] GSOC 2016 Aspirant - Parallel Stochastic Optimisation Methods

Ryan Curtin ryan at ratml.org
Sun Mar 6 13:40:27 EST 2016


On Sat, Mar 05, 2016 at 07:49:54AM +0530, Aditya Sharma wrote:
> Hi Ryan,
> 
> I agree that just implementing Hogwild! would be pretty trivial. Adding
> support for distributed computing to mlpack, along with Hogwild! for
> multithreading on each node, on the other hand, could be a much more
> interesting project.
> 
> Your mention of Spark got me wondering whether there are any stable
> frameworks for running C++ programs on Hadoop. Sure enough, after a quick
> internet search, I came across MR4C
> <http://google-opensource.blogspot.in/2015/02/mapreduce-for-c-run-native-code-in.html>,
> developed by Google. From the example programs that I have seen, it appears
> to have a very clean interface, and it would surely help to keep the final
> code simple.
> 
> I have not explored it in detail, but I think it could be an option for
> adding distributed computing support to mlpack.
> 
> The same goes for CMU's Parameter Server <http://parameterserver.org>.
> I have personally used it, so I know that it offers a really good speedup
> and that its methods are pretty straightforward, so the code remains simple.
> 
> Do check these out and tell me whether either of them is worth exploring
> for mlpack.

Hi Aditya,

Like I said, the big issue with either of these frameworks is that
mlpack does not have any support for distributed matrices and is not
traditionally a distributed computing library.  So distributed matrix
support would be a prerequisite for even considering a framework like
this, unfortunately.

Thanks,

Ryan

-- 
Ryan Curtin    | "I can't believe you like money too.  We should
ryan at ratml.org | hang out."  - Frito


