[mlpack] Introduction of ExecutionPolicy

Thu May 25 17:16:23 EDT 2017

On Mon, May 22, 2017 at 08:24:29PM +0530, Shikhar Bhardwaj wrote:
> Thanks for the replies everyone.
> 
> The primary goal of implementing something like ExecutionPolicy was to make
> writing code more consistent with the rest of the methods in the library
> and if possible introduce an abstraction over the parallelism offered by
> OpenMP.
> 
> For example, instead of introducing "ParallelSGD" (a separate optimizer
> with the same DecomposableFunction and UpdatePolicy policies), we could add
> a template parameter on the existing optimizer SGD, which would select the
> appropriate implementation (parallel or sequential) depending on the
> template parameter passed. This would speed up benchmarking and prototyping
> and keep the number of distinct methods (in terms of their core logic) to a
> minimum.
> 
> From the above discussion, I could understand that controlling the number
> of threads from ExecutionPolicy may not be a good idea, as OpenMP already
> gives overrides of that decision to the user in the form of the environment
> variables.

Hi Shikhar,

I would really advise avoiding introducing an abstraction over OpenMP.
The reason I say this is that if someone doesn't know mlpack and wants
to contribute, there are some technologies they will have to learn:

 - Armadillo
 - some template metaprogramming
 - parts of Boost
 - the STL
 - maybe OpenMP
 - the various bits of mlpack core functionality that are used all over

Note that all of those, with the exception of the last, are something
that a contributor might know from elsewhere.  When we start to
introduce abstractions over libraries that we are using, then people who
know those libraries now need to learn our abstractions as well,
instead of just using the knowledge they already had (which is
transferable to other situations).  Even if the abstraction turns out to
be easy to learn, there is a mental hurdle to overcome, and also at
first glance the abstraction may not appear to be easy to learn.

In 2010 when we refactored mlpack in full, one idea being floated around
was to wrap Armadillo functionality entirely, so that we could replace
it with another matrix library if, e.g., Armadillo ever died or a better
competitor came along.  (In my view neither has happened, but my
perspective is admittedly biased.)  We ended up deciding against this
approach for a couple of reasons:

 - the maintenance overhead of the abstraction itself; for Armadillo
   that would be a huge amount of code

 - the code we ended up writing would not look like Armadillo or any
   other matrix library, we'd essentially have code that looked like our
   own abstraction only, and this could cause people to avoid
   contributing to or using mlpack because of the unfamiliarity

So I would really suggest that we consider the exact benefits of the
ExecutionPolicy idea as compared to the existing functionality that
OpenMP already gives us through environment variables.  Otherwise we
introduce complexity and maintenance burden with little gain (other than
some abstraction of OpenMP).

Nothing that's currently in mlpack is parallelized in any way other than
with OpenMP, so I'm not sure that an abstraction would get us any more
consistency than we already have.
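
As a minimal sketch of what plain OpenMP usage looks like (the function
below is a hypothetical illustration, not code taken from mlpack), a
single pragma on an ordinary loop is all that is needed, and the thread
count is left to the user:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only; not an mlpack function.
// Computes the sum of squared differences between two vectors.
double SquaredError(const std::vector<double>& a,
                    const std::vector<double>& b)
{
  assert(a.size() == b.size());

  double sum = 0.0;
  // A signed index keeps the loop valid for older OpenMP implementations.
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(a.size());

  // Each thread accumulates a private partial sum; OpenMP combines them.
  #pragma omp parallel for reduction(+:sum)
  for (std::ptrdiff_t i = 0; i < n; ++i)
  {
    const double d = a[i] - b[i];
    sum += d * d;
  }

  return sum;
}
```

When this is built with OpenMP enabled (e.g. -fopenmp for GCC or Clang),
the number of threads can be chosen at run time with the OMP_NUM_THREADS
environment variable.  When it is built without OpenMP, the pragma is
ignored and the loop runs sequentially.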

So, if we are going to say 'every mlpack class has to have an
ExecutionPolicy template parameter', then there must be a very good
reason for it---otherwise, we're making contributing and maintenance
significantly harder.  The amount of overhead and learning necessary to
contribute to mlpack is already pretty high, and I want to avoid
increasing it.
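
For concreteness, here is a rough sketch of what such a template
parameter could look like.  Every name in it (SequentialPolicy,
ParallelPolicy, Accumulator) is hypothetical and chosen only for
illustration; it is not a proposed or existing mlpack API, and it uses
simple tag dispatch as one of several possible designs:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tag types; nothing like these exists in mlpack.
struct SequentialPolicy { };
struct ParallelPolicy { };

// Hypothetical class that picks an implementation from its policy.
template<typename ExecutionPolicy = SequentialPolicy>
class Accumulator
{
 public:
  // Forwards to the overload selected by the policy tag.
  double Sum(const std::vector<double>& x) const
  {
    return SumImpl(x, ExecutionPolicy());
  }

 private:
  // Sequential version: an ordinary loop.
  double SumImpl(const std::vector<double>& x, SequentialPolicy) const
  {
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
      sum += x[i];
    return sum;
  }

  // Parallel version: the same loop with an OpenMP reduction.
  double SumImpl(const std::vector<double>& x, ParallelPolicy) const
  {
    double sum = 0.0;
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    #pragma omp parallel for reduction(+:sum)
    for (std::ptrdiff_t i = 0; i < n; ++i)
      sum += x[i];
    return sum;
  }
};
```

A caller would write Accumulator<ParallelPolicy>().Sum(x) or
Accumulator<>().Sum(x).  In this sketch the parallel overload still
contains the same OpenMP pragma that would be written directly, so the
policy adds a template parameter and a dispatch layer on top of it
rather than replacing it.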

Let me know what you think.

Thanks,

Ryan

-- 
Ryan Curtin    | "Maybe the next time."
ryan at ratml.org |   - J.G. Ballard