[mlpack] Profiling for parallelization

Ryan Curtin ryan at ratml.org
Fri Mar 16 10:52:11 EDT 2018


On Fri, Mar 16, 2018 at 01:39:34PM +0530, Nikhil Goel wrote:
> Hello
> 
> Thank you for your help! I had a few more questions
> Sequential algorithms like logistic regression are very hard to
> parallelize.  While researching this project, the only approach I
> could find was computing the gradient of a batch in parallel.  But
> from what I could see in mlpack, the batch is provided as a matrix,
> and matrix operations are already parallelized since OpenBLAS is
> parallelized.  So I needn't worry about such algorithms?

Hi there Nikhil,

You are right: there are some algorithms for which specific
parallelization is not useful, and it is better to depend on a parallel
BLAS.  For logistic regression in particular, there are a few parallel
optimizers implemented; you might consider taking a look at those also.
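To make the batch-gradient idea concrete: the per-point contributions to
the logistic regression gradient are independent, so they can be
accumulated across threads with OpenMP.  This is only a hedged sketch of
the technique using plain `std::vector` (the function name and signature
are hypothetical, not mlpack's actual implementation, which works on
Armadillo matrices and leans on the BLAS as described above):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: logistic regression gradient over a batch.
// Each point i contributes (sigmoid(w . x_i) - y_i) * x_i; the sum
// over points is accumulated in parallel with per-thread buffers.
std::vector<double> LogisticGradient(const std::vector<std::vector<double>>& X,
                                     const std::vector<double>& y,
                                     const std::vector<double>& w)
{
  const std::size_t n = X.size(), d = w.size();
  std::vector<double> grad(d, 0.0);

  #pragma omp parallel
  {
    std::vector<double> local(d, 0.0); // per-thread accumulator
    #pragma omp for nowait
    for (std::ptrdiff_t i = 0; i < (std::ptrdiff_t) n; ++i)
    {
      double dot = 0.0;
      for (std::size_t j = 0; j < d; ++j)
        dot += w[j] * X[i][j];
      const double err = 1.0 / (1.0 + std::exp(-dot)) - y[i];
      for (std::size_t j = 0; j < d; ++j)
        local[j] += err * X[i][j];
    }
    // Merge each thread's partial sum into the shared gradient.
    #pragma omp critical
    for (std::size_t j = 0; j < d; ++j)
      grad[j] += local[j];
  }
  return grad;
}
```

Note that when the batch product is expressed as a single matrix
multiplication, a parallel OpenBLAS already does essentially this
internally, which is exactly why the hand-rolled version is often
unnecessary.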

> Yes, you're right that we can use environment variables, but wouldn't
> it be cleaner to provide users with an option like 'cores', with a
> default value of the maximum number of cores available (or 1,
> whichever you choose), in algorithms that have been parallelized?

No, in my view this would be an unnecessary addition of an extra API
that users have to learn.  If a user learns about OpenMP environment
variables it is useful anywhere OpenMP is used, but if a user instead
learns about some mlpack-specific parallelization API, it is not useful
anywhere except mlpack.

> Also, is bagging ensembling implemented in mlpack?  It's a pretty
> popular algorithm, but I couldn't find it.  I was wondering if it's
> needed in mlpack?

The only ensembling algorithm we have at the moment is AdaBoost.  It may
be useful to add another algorithm.
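For anyone unfamiliar with bagging, the core of it is short: train B
learners on bootstrap resamples of the data and predict by majority
vote.  A hedged, generic sketch (the `Bag` function, the `Train`
callback, and the fixed seed are all hypothetical placeholders, not
mlpack APIs):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical sketch of bagging: train B weak learners, each on a
// bootstrap resample (n points drawn with replacement) of the data.
// Prediction would then take a majority vote over the ensemble.
template <typename Learner>
std::vector<Learner> Bag(const std::vector<std::vector<double>>& X,
                         const std::vector<int>& y,
                         std::size_t B,
                         const std::function<Learner(
                             const std::vector<std::vector<double>>&,
                             const std::vector<int>&)>& Train)
{
  std::mt19937 rng(42); // fixed seed for reproducibility in this sketch
  std::uniform_int_distribution<std::size_t> pick(0, X.size() - 1);
  std::vector<Learner> ensemble;
  for (std::size_t b = 0; b < B; ++b)
  {
    // Draw a bootstrap resample of the same size as the dataset.
    std::vector<std::vector<double>> Xb;
    std::vector<int> yb;
    for (std::size_t i = 0; i < X.size(); ++i)
    {
      const std::size_t j = pick(rng);
      Xb.push_back(X[j]);
      yb.push_back(y[j]);
    }
    ensemble.push_back(Train(Xb, yb));
  }
  return ensemble;
}
```

Because the B training runs are independent, this is also an example of
an algorithm that parallelizes trivially at the ensemble level rather
than inside the linear algebra.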

Thanks,

Ryan

-- 
Ryan Curtin    | "I can't believe you like money too.  We should
ryan at ratml.org | hang out."  - Frito
