[mlpack] Implementation of HOGWILD! in mlpack

Ryan Curtin ryan at ratml.org
Fri Jun 23 09:41:45 EDT 2017


On Wed, Jun 21, 2017 at 02:20:31AM +0530, Shikhar Bhardwaj wrote:
> Hello everyone!
> 
> Over the past few days, Yannis and I have been discussing the
> implementation of Parallel SGD using the ideas presented in the HOGWILD!
> paper (https://arxiv.org/pdf/1106.5730.pdf). We also have at hand an
> implementation of the algorithm by the authors of the paper
> (http://i.stanford.edu/hazy/victor/Hogwild/) and a talk given by one of
> the authors about the algorithm (https://www.youtube.com/watch?v=l5JqUvTdZts)
> (the part relevant to the parallel algorithm starts at ~26 min).
>
> Overall, we've found that the algorithm (with its free-for-all update
> model and simple work-sharing description) does not contain a lot of
> complexity. The paper describes a hypergraph structure on the loss
> function to be minimised. However, a sizeable amount of complexity lies
> in implementing the description of this loss function (which depends on
> the particular problem being solved by the Optimizer). The main concern
> we have right now is to separate this complexity (of defining the loss
> function) from the implementation of the Optimizer, while maintaining
> reasonable ease of use. The iterative procedure of the algorithm (which
> is shared across the processors) uses parts of this hypergraph
> structure.
>
> Do we need to introduce another function interface (SparseFunctionType),
> which allows us to maintain and ask for this structure in the Optimizer?
> The authors' implementation is quite difficult to understand. Any insights
> would be helpful.

Hey Shikhar,

It's been a little while since I've thought about Hogwild, so it took
me some time to refresh my memory.

I agree that the algorithm is simple, but the paper has a lot more
description, proofs, and exposition than I think is strictly necessary
(but that's just my opinion).  I think that the hypergraph induced by
the different cost functions is just one way to see things, and I don't
think Hogwild is restricted only to situations where those hypergraphs
are known.  Unless I've misunderstood, the actual hypergraph itself
doesn't need to be known.

So under the assumption of some separable cost function

f(x) = sum_{i = 0}^{k} f_i(x),

I think that all we need to hope is that each f_i(x) that we might
sample has sparse support in x (that is, if we take a gradient step
based only on f_i(x), only a few components of x will be updated).
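
To make that concrete, here is a rough sketch (not mlpack code) of the
kind of lock-free loop I have in mind, using OpenMP.  The FunctionType
here is hypothetical: just anything that can report NumFunctions() and
fill a sparse gradient for a single component f_i.

#include <armadillo>
#include <cstdlib>

// Hypothetical FunctionType: provides NumFunctions() and a
// per-component Gradient() that fills a *sparse* gradient for f_i.
template<typename FunctionType>
void HogwildSGD(FunctionType& function,
                arma::mat& iterate,
                const double stepSize,
                const size_t maxIterations)
{
  #pragma omp parallel for
  for (size_t k = 0; k < maxIterations; ++k)
  {
    // Pick a component uniformly at random (a real implementation
    // should use per-thread RNGs; rand() is just for brevity).
    const size_t i = std::rand() % function.NumFunctions();

    arma::sp_mat gradient;
    function.Gradient(iterate, i, gradient);

    // Lock-free update: touch only the nonzero components.  Colliding
    // writes can race, which is exactly what HOGWILD! tolerates when
    // the supports of the f_i rarely overlap.
    for (auto it = gradient.begin(); it != gradient.end(); ++it)
      iterate(it.row(), it.col()) -= stepSize * (*it);
  }
}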

We already have the DecomposableFunctionType abstraction, which I think
will work for Hogwild.  But I think that users should be aware that
Hogwild isn't useful for every problem---only those that are both
decomposable and sparse.  You certainly could use Hogwild in a dense
update situation; it just may not work so well...
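
For reference, the shape of that interface is roughly the following
(check the optimizer documentation for the exact requirements):

class ExampleFunction
{
 public:
  // Number of separable components f_i in the objective.
  size_t NumFunctions();

  // Objective value of the i'th component at the given coordinates.
  double Evaluate(const arma::mat& coordinates, const size_t i);

  // Gradient of the i'th component only; for Hogwild we'd hope this
  // touches only a few elements of 'gradient'.
  void Gradient(const arma::mat& coordinates,
                const size_t i,
                arma::mat& gradient);
};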

One example in mlpack where things should work well is high-dimensional
sparse logistic regression, since the gradient induced by each point in
the dataset should be pretty sparse.
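
To see why: for a single point (x_i, y_i) with y_i in {-1, +1} and
logistic loss f_i(w) = log(1 + exp(-y_i w^T x_i)), the gradient is

grad f_i(w) = -y_i x_i / (1 + exp(y_i w^T x_i)),

whose support is exactly the support of x_i.  A sketch of that
single-point gradient (again, not existing mlpack code):

#include <armadillo>
#include <cmath>

// Gradient of the logistic loss at one sparse point; nonzero only
// where x is nonzero, so each Hogwild step touches few components.
void PointGradient(const arma::vec& w,     // parameters.
                   const arma::sp_vec& x,  // one sparse data point.
                   const double y,         // label in {-1, +1}.
                   arma::sp_vec& gradient)
{
  // Compute w^T x in O(nnz(x)) by walking the nonzeros of x.
  double margin = 0.0;
  for (auto it = x.begin(); it != x.end(); ++it)
    margin += w(it.row()) * (*it);
  margin *= y;

  gradient = (-y / (1.0 + std::exp(margin))) * x;
}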

But, it's possible there's something I overlooked, or some reason that
the existing DecomposableFunctionType abstraction wouldn't work... :)

By the way, there is some ongoing discussion about potentially changing
the DecomposableFunctionType abstraction to better support batch
updates: https://github.com/mlpack/mlpack/pull/1034 , but I don't expect
any changes that come out of that discussion to significantly impact
however you choose to implement Hogwild.

-- 
Ryan Curtin    | "I like this game."
ryan at ratml.org |   - Coach

