[mlpack] Profiling for parallelization
Ryan Curtin
ryan at ratml.org
Fri Mar 16 10:52:11 EDT 2018
On Fri, Mar 16, 2018 at 01:39:34PM +0530, Nikhil Goel wrote:
> Hello
>
> Thank you for your help! I had a few more questions.
> Sequential algorithms like logistic regression are very hard to
> parallelize. While researching for this project, the only way I could
> find was to compute the gradient of a batch in parallel. But from what I
> could see in mlpack, the batch is provided as a matrix, and matrix
> operations are already parallelized in mlpack since OpenBLAS is
> parallelized. So I needn't worry about such algorithms?
Hi there Nikhil,
You are right: there are some algorithms for which algorithm-specific
parallelization is not useful, and it is better to depend on a parallel
BLAS. For logistic regression in particular, a few parallel optimizers
are already implemented; you might consider taking a look at those too.
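To make the batch-gradient point concrete, here is a minimal C++ sketch (not mlpack code; the function name and plain-vector types are purely illustrative) of computing the logistic regression gradient over a batch with an OpenMP reduction. This is the same work a parallel BLAS performs when the batch is expressed as a matrix product:

```cpp
#include <cmath>
#include <vector>

// Illustrative sketch: gradient of the logistic loss over a batch,
// parallelized with an OpenMP array-section reduction (OpenMP 4.5+).
// Each thread accumulates partial gradient sums; OpenMP combines them.
std::vector<double> BatchGradient(const std::vector<std::vector<double>>& X,
                                  const std::vector<double>& y,
                                  const std::vector<double>& w)
{
  const int n = (int) X.size();
  const int d = (int) w.size();
  std::vector<double> grad(d, 0.0);
  double* g = grad.data();

  #pragma omp parallel for reduction(+ : g[:d])
  for (int i = 0; i < n; ++i)
  {
    // Compute the sigmoid of the dot product w . x_i.
    double dot = 0.0;
    for (int j = 0; j < d; ++j)
      dot += w[j] * X[i][j];
    const double p = 1.0 / (1.0 + std::exp(-dot));

    // Accumulate (p - y_i) * x_i into the shared gradient.
    for (int j = 0; j < d; ++j)
      g[j] += (p - y[i]) * X[i][j];
  }

  return grad;
}
```

If OpenMP is not enabled at compile time, the pragma is ignored and the function still produces the same result serially, which is part of why OpenMP is a low-cost choice here.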
> Yes, you're right that we can use environment variables, but wouldn't it
> be cleaner and better-looking to provide users with an option like
> 'cores', with a default value of the maximum number of cores available
> (or 1, whichever you choose), in algorithms that have been parallelized?
No, in my view this would be an unnecessary extra API that users would
have to learn. If a user learns about the OpenMP environment variables,
that knowledge is useful anywhere OpenMP is used; but if a user instead
learns some mlpack-specific parallelization API, it is not useful
anywhere except mlpack.
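For example, the standard environment variables are enough to control thread counts for any OpenMP program (the exact precedence between these variables depends on how OpenBLAS was built, so treat this as a typical setup rather than a guarantee):

```shell
# Cap OpenMP parallelism at four threads for everything run from this
# shell, mlpack's OpenMP regions included; no mlpack-specific option
# is required.
export OMP_NUM_THREADS=4

# OpenBLAS also reads its own variable; depending on the build, it may
# use this instead of OMP_NUM_THREADS for BLAS calls.
export OPENBLAS_NUM_THREADS=4

# A one-off override for a single run would look like:
#   OMP_NUM_THREADS=2 ./some_mlpack_program
```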
> Also, is bagging ensembling implemented in mlpack? It's a pretty popular
> algorithm and I couldn't find it in mlpack. I was wondering if it's
> needed in mlpack?
The only ensembling algorithm we have at the moment is AdaBoost. It may
be useful to add another.
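For reference, bagging itself is simple to sketch. The toy C++ snippet below is only an illustration (the stump learner and all names are made up, not mlpack classes): train B base learners on bootstrap resamples and combine their predictions by majority vote:

```cpp
#include <random>
#include <vector>

// A trivial stand-in base learner: classify 1 if x >= threshold.
struct Stump
{
  double threshold;
  int Classify(double x) const { return x >= threshold ? 1 : 0; }
};

// "Train" a stump by using the mean of the sample as its threshold.
Stump TrainStump(const std::vector<double>& x)
{
  double sum = 0.0;
  for (double v : x)
    sum += v;
  return Stump{sum / (double) x.size()};
}

// Bagging: fit B stumps on bootstrap resamples (sampling with
// replacement), then take a majority vote at prediction time.
int BaggedPredict(const std::vector<double>& data, double query,
                  size_t B, unsigned seed)
{
  std::mt19937 gen(seed);
  std::uniform_int_distribution<size_t> pick(0, data.size() - 1);
  size_t votes = 0;
  for (size_t b = 0; b < B; ++b)
  {
    std::vector<double> sample(data.size());
    for (double& v : sample)
      v = data[pick(gen)]; // bootstrap resample
    votes += TrainStump(sample).Classify(query);
  }
  return (2 * votes >= B) ? 1 : 0; // majority vote
}
```

A real implementation would of course wrap an arbitrary mlpack learner rather than a stump, but the resample-train-vote structure is the whole algorithm.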
Thanks,
Ryan
--
Ryan Curtin | "I can't believe you like money too. We should
ryan at ratml.org | hang out." - Frito