[mlpack] Profiling for parallelization

Ryan Curtin ryan at ratml.org
Mon Mar 12 10:27:09 EDT 2018


On Mon, Mar 12, 2018 at 06:51:20PM +0530, Nikhil Goel wrote:
> Hello
> 
> I am Nikhil Goel (github:nikhilgoel1997), a pre-final year student from
> Birla Institute of Technology and Science, Pilani (BITS, Pilani). I've been
> contributing to mlpack for the past month and have become familiar with the
> codebase. In the past I've done projects on sentiment analysis, image
> classification, and financial signal processing using machine learning.
> I wanted to do a project that would help me improve my understanding of
> multiple algorithms, and "Profiling for parallelization" is ideal for that.
> In that direction, I've studied and grown familiar with OpenMP.
> While I want to tackle every algorithm implemented in mlpack and either
> find a way to parallelize it or give a good explanation of why it is not
> parallelizable, doing that properly by the 27th (the last day to submit
> the proposal) might be a little difficult. Since the project description
> is vague, for how many algorithms should a strong proposal describe the
> parallelization in detail? (I believe 5 algorithms have already been
> parallelized in mlpack, and so far I've worked out how to parallelize
> others such as kNN, logistic regression, naive Bayes, and PCA.)
> As for the API, I think the user could be given an additional option in
> each algorithm to enable multi-core execution. Is this a good idea?
> 
> I would love to hear suggestions from the mentors to understand if they
> feel that I'm approaching this project the correct way.

Hi Nikhil,

Thanks for getting in touch.  It's tough to say how many algorithms
would be reasonable to parallelize, because some algorithms
will be harder to parallelize than others.  What I would suggest is that
you take a look at some algorithms that are interesting to you, estimate
how long it might take to OpenMP-ize them, and then use this to
structure your proposal.  Don't worry if the timeline isn't exactly
accurate; we know that sometimes it is hard to estimate, and your mentor
(which in this case I guess will be me) will work with you to
restructure the timeline and scope of work as needed.  But you should
still aim to get it as close to reality as you can.

For the API, with OpenMP I think no changes are necessary.  The user can
set the desired number of threads with environment variables such as
OMP_NUM_THREADS.
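As a rough illustration of why the API can stay the same, here is a
minimal sketch of an OpenMP-parallelized loop.  SumOfSquares is a
hypothetical helper made up for this example, not an mlpack function;
the point is only that the function signature carries no threading
parameter at all.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mlpack): sum of the squared entries.
// Note that the signature has no "number of threads" argument.
double SumOfSquares(const std::vector<double>& values)
{
  double sum = 0.0;
  // A signed loop index keeps the loop valid for older OpenMP versions.
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(values.size());

  // Each thread accumulates a private partial sum; OpenMP combines the
  // partial sums when the loop ends.  If the code is compiled without
  // OpenMP support, the pragma is ignored and the loop runs serially.
  #pragma omp parallel for reduction(+:sum)
  for (std::ptrdiff_t i = 0; i < n; ++i)
    sum += values[i] * values[i];

  return sum;
}
```

With GCC or Clang this would be compiled with -fopenmp, and the thread
count is then chosen at run time, for example by setting
OMP_NUM_THREADS=4 in the environment before launching the program.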

I hope this helps; let me know if I can clarify anything.

-- 
Ryan Curtin    | "Why is it that the landscape is moving... but the boat is
ryan at ratml.org | still?"  - Train Driver

