[mlpack] Introduction of ExecutionPolicy

Shikhar Bhardwaj shikharbhardwaj68 at gmail.com
Mon May 22 10:54:29 EDT 2017


Thanks for the replies everyone.

The primary goal of implementing something like ExecutionPolicy was to make
the code more consistent with the rest of the methods in the library and,
if possible, to introduce an abstraction over the parallelism offered by
OpenMP.

For example, instead of introducing "ParallelSGD" (a separate optimizer
with the same DecomposableFunction and UpdatePolicy policies), we could add
a template parameter to the existing SGD optimizer, which would select the
appropriate implementation (parallel or sequential) depending on the
template argument passed. This would speed up benchmarking and prototyping
and keep the number of distinct methods (in their core logic) to a minimum.
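
To make that concrete, here is a rough sketch of what I have in mind (the
policy names and the simplified SGD interface below are only illustrative,
not a final design):

#include <armadillo>
#include <type_traits>

// Illustrative tag types selecting an implementation.
struct SequentialPolicy { };
struct ParallelPolicy { };

// One optimizer, parameterized on the execution policy, instead of a
// separate "ParallelSGD" class.
template<typename ExecutionPolicy = SequentialPolicy>
class SGD
{
 public:
  template<typename DecomposableFunctionType>
  double Optimize(DecomposableFunctionType& function, arma::mat& iterate)
  {
    if (std::is_same<ExecutionPolicy, ParallelPolicy>::value)
    {
      // The parallel update loop would go here.
    }
    else
    {
      // The existing sequential update loop would go here.
    }
    return 0.0; // Placeholder return value.
  }
};

// Usage: the parallel variant is just a template argument away, e.g.
// SGD<ParallelPolicy> parallelSgd;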

From the above discussion, I understand that controlling the number of
threads from ExecutionPolicy may not be a good idea, as OpenMP already
lets the user override that decision through environment variables.
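
(For reference, the override the user already has looks like this; nothing
below is mlpack-specific:)

#include <omp.h>

int main()
{
  // Same effect as running with OMP_NUM_THREADS=4 in the environment:
  // subsequent parallel regions will request up to 4 threads.
  omp_set_num_threads(4);
  return 0;
}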

-- 
Shikhar Bhardwaj

On Thu, May 18, 2017 at 8:18 PM, Ryan Curtin <ryan at ratml.org> wrote:

> On Wed, May 17, 2017 at 12:25:59AM +0100, Yannis Mentekidis wrote:
> > I agree that "OMP_NUM_THREADS" and "omp_set_num_threads()" are enough to
> > define how many threads are used.
> >
> > The reason I like Shikhar's ExecutionPolicy template idea is that I think
> > it will clean up the implementation. At the same time, it will help users
> > avoid diving down to the nitty-gritty details of the framework we are
> > using, calling OpenMP functions to select parts of the code that should
> > not execute in parallel.
> >
> > Cleaner implementation: We can encapsulate the logic that decides how
> > many threads will be used in a single place in the mlpack code, which
> > will be called from everywhere in the codebase.
> > This will provide a common interface (= template parameter) for all
> > algorithms we want to parallelize, and will make it possible to change
> > our logic or add more policies in one place to impact all of our
> > algorithms.
> > Hopefully this might also encourage other developers to parallelize
> > their code (once they are done implementing, debugging, and profiling
> > it).
> >
> > User-friendliness: We decide what the default behavior of an algorithm
> > is (e.g., to use all threads), and a user will not need to provide any
> > template parameters - the default case will be used.
> > If the user wants only 1 thread to be used, they can simply pass a
> > non-default value to the template. I would argue this is better and
> > cleaner than making the user call OpenMP functions they possibly know
> > nothing about.
> >
> > Code simplicity: Our demo might have been a bit hacky (using an enum
> > directly in the OpenMP pragma). However, the implementation could change
> > so that sequential_execution_policy and parallel_execution_policy are
> > classes, both of which implement a numThreads() function. This way, the
> > code would be something like
> >
> > // mlpack code with our own thread-count logic.
> > const int nThreads = EP.numThreads();
> > #pragma omp parallel for num_threads(nThreads)
> > for (size_t i = 0; i < n; ++i)
> > {
> >   // Loop body here.
> > }
> >
> > This way, the OpenMP lines are really as few as possible. We avoid
> > mixing thread-calculation logic inside the actual algorithm, and the
> > user does not even know we use OpenMP - only that we are parallel.
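> >
> > As a rough sketch (the class names are from above; the numThreads()
> > bodies and everything else here are only illustrative):
> >
> > #include <cstddef>
> > #include <omp.h>
> >
> > struct sequential_execution_policy
> > {
> >   int numThreads() const { return 1; }
> > };
> >
> > struct parallel_execution_policy
> > {
> >   // Defer to OpenMP's default, which respects OMP_NUM_THREADS.
> >   int numThreads() const { return omp_get_max_threads(); }
> > };
> >
> > template<typename ExecutionPolicy = parallel_execution_policy>
> > void SomeAlgorithm(const size_t n)
> > {
> >   ExecutionPolicy EP;
> >   const int nThreads = EP.numThreads();
> >   #pragma omp parallel for num_threads(nThreads)
> >   for (size_t i = 0; i < n; ++i)
> >   {
> >     // Per-iteration work goes here.
> >   }
> > }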
>
> Ah, ok, thanks for the clarification.
>
> I want to be sure that we aren't reimplementing things that are already
> supported by OpenMP; there are many other environment variables which
> can control execution:
>
> https://gcc.gnu.org/onlinedocs/libgomp/Environment-Variables.html
> (The GOMP_ variables can be ignored since we shouldn't assume libgomp.)
>
> If we only want to control the number of threads being used in the full
> program, then OMP_NUM_THREADS is fine.  Supposing that, e.g., we have
> something like
>
> #pragma omp parallel for
> for (...)
> {
>   #pragma omp parallel for
>   for (...)
>   {
>     ...
>   }
> }
>
> (i.e. nested parallelism), then by default OpenMP will make only the
> outer loop parallel, assuming that the outer loop has enough iterations
> to use all of the threads.  The user could set OMP_NESTED to true if
> they wanted to change that behavior.
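>
> (Concretely, the code-level equivalent of that environment variable is
> just this; standard OpenMP, nothing mlpack-specific:)
>
> #include <omp.h>
>
> int main()
> {
>   // Same effect as exporting OMP_NESTED=true before running: nested
>   // parallel regions may now use their own thread teams.
>   omp_set_nested(1);
>   return 0;
> }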
>
> It seems to me, then, that the ExecutionPolicy idea could be useful to
> control parallelism at a deeper level.  For instance, there is no way
> with OpenMP that we can parallelize only the inner loop in the example
> above but not the outer loop.  However, it seems possible to me that the
> ExecutionPolicy could provide some support for this.
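>
> One way that could work is OpenMP's if() clause, driven by compile-time
> flags on a (hypothetical) policy class - a rough sketch:
>
> #include <cstddef>
> #include <vector>
>
> // Hypothetical policy: parallelize only the inner loop.
> struct InnerOnlyPolicy
> {
>   static constexpr bool parallelOuter = false;
>   static constexpr bool parallelInner = true;
> };
>
> template<typename Policy>
> void Process(std::vector<std::vector<double>>& data)
> {
>   // if(false) makes the region run with a team of one thread, so each
>   // level can be switched on or off independently.  (Enabling both
>   // levels at once would additionally need OMP_NESTED=true.)
>   #pragma omp parallel for if (Policy::parallelOuter)
>   for (size_t i = 0; i < data.size(); ++i)
>   {
>     #pragma omp parallel for if (Policy::parallelInner)
>     for (size_t j = 0; j < data[i].size(); ++j)
>       data[i][j] *= 2.0;
>   }
> }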
>
> If that is the goal, my only question is, is there anywhere where
> parallelizing inner loops is a better option than parallelizing the
> outer loop?  Maybe I have not thought it through fully. :)
>
> One other important consideration is that sometimes, code at a lower
> level than mlpack will use OpenMP... OpenBLAS is one instance.  So any
> configuration with ExecutionPolicy at a higher level may be overruled at
> a lower level by OpenBLAS.  With OpenBLAS, all the behavior is expected
> to be controlled with the environment variables.  If we can stick to
> that as much as possible, I think it will keep things easy for our
> users---they may be familiar with OpenMP and its configuration, but they
> are less likely to be familiar with a new ExecutionPolicy
> infrastructure. :)
>
> It's possible that I've misunderstood some of the key functionality that
> ExecutionPolicy would provide that OpenMP environment variables don't.
> If that's the case, let me know. :)
>
> --
> Ryan Curtin    | "I am."
> ryan at ratml.org |   - Joe
>


