[mlpack] Introduction of ExecutionPolicy

Ryan Curtin ryan at ratml.org
Thu May 18 10:48:00 EDT 2017


On Wed, May 17, 2017 at 12:25:59AM +0100, Yannis Mentekidis wrote:
> I agree that "OMP_NUM_THREADS" and "omp_set_num_threads()" is enough to
> define how many threads are used.
> 
> The reason I like Shikhar's ExecutionPolicy template idea is that I think
> it will clean up the implementation. At the same time, it will help users
> avoid diving into the nitty-gritty details of the framework we are using,
> such as calling OpenMP functions to mark parts of the code that should not
> execute in parallel.
> 
> Cleaner implementation: We can encapsulate the logic that decides how many
> threads will be used in a single place in the mlpack code, which can then
> be called from everywhere in the codebase.
> This will provide a common interface (a template parameter) for all
> algorithms we want to parallelize, and will make it possible to change our
> logic or add more policies in one place and have that affect all of our
> algorithms.
> Hopefully this might also encourage other developers to parallelize their
> code (once they are done implementing, debugging, and profiling it).
> 
> User-friendliness: We decide what the default behavior of an algorithm is
> (e.g., to use all threads) and a user will not need to provide any template
> parameters - the default case will be used.
> If the user wants only 1 thread to be used, they can simply pass a
> non-default value to the template. I would argue this is better and cleaner
> than making the user call OpenMP functions they possibly know nothing about.
> 
> Code simplicity: Our demo might have been a bit hacky (using an enum
> directly in the OpenMP pragma). However, the implementation could change so
> that sequential_execution_policy and parallel_execution_policy are classes,
> both of which implement a numThreads() function. This way, the code would
> be something like
> 
> int nThreads = EP.numThreads(); // mlpack code with our own logic
> #pragma omp parallel for num_threads(nThreads)
> // for loop here
> 
> This way, the OpenMP lines are really as few as possible. We avoid mixing
> thread calculation logic into the actual algorithm, and the user does not
> even know we use OpenMP - only that the code runs in parallel.

Ah, ok, thanks for the clarification.
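
Just to make sure I follow, here is roughly what I understand the proposal
to be.  This is only a minimal sketch under my own assumptions; the class,
function, and parameter names are hypothetical placeholders, not an actual
mlpack API:

#include <omp.h>

// Hypothetical policy classes; each one only decides how many threads to use.
class SequentialExecutionPolicy
{
 public:
  int NumThreads() const { return 1; }
};

class ParallelExecutionPolicy
{
 public:
  int NumThreads() const { return omp_get_max_threads(); }
};

// An algorithm takes the policy as a template parameter; the default gives
// parallel execution, so most users never see the parameter at all.
template<typename ExecutionPolicy = ParallelExecutionPolicy>
void SomeAlgorithm(const int n, ExecutionPolicy policy = ExecutionPolicy())
{
  const int nThreads = policy.NumThreads(); // mlpack-side logic lives here.
  #pragma omp parallel for num_threads(nThreads)
  for (int i = 0; i < n; ++i)
  {
    // ... per-element work ...
  }
}

// Usage:
//   SomeAlgorithm(1000);                              // parallel by default
//   SomeAlgorithm(1000, SequentialExecutionPolicy()); // force one thread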

I want to be sure that we aren't reimplementing things that are already
supported by OpenMP; there are many other environment variables which
can control execution:

https://gcc.gnu.org/onlinedocs/libgomp/Environment-Variables.html
(The GOMP_ variables can be ignored since we shouldn't assume libgomp.)

If we only want to control the number of threads being used in the full
program, then OMP_NUM_THREADS is fine.  Supposing that, e.g., we have
something like

#pragma omp parallel for
for (...)
{
  #pragma omp parallel for
  for (...)
  {
    ...
  }
}

(i.e. nested parallelism), then by default OpenMP will make only the
outer loop parallel, assuming that the outer loop has enough iterations
to use all of the threads.  The user could set OMP_NESTED to true if
they wanted to change that behavior.
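
For reference, a self-contained version of that nested case looks something
like this (just an illustration of the default behavior, not mlpack code):

#include <cstdio>
#include <omp.h>

int main()
{
  // With the default OMP_NESTED=false, only the outer loop runs in parallel;
  // omp_get_num_threads() inside the inner region reports a team size of 1.
  // Setting OMP_NESTED=true (or calling omp_set_nested(1)) enables the inner
  // parallel region as well.
  #pragma omp parallel for
  for (int i = 0; i < 4; ++i)
  {
    #pragma omp parallel for
    for (int j = 0; j < 4; ++j)
    {
      printf("outer i = %d, inner j = %d, inner team size = %d\n",
             i, j, omp_get_num_threads());
    }
  }

  return 0;
}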

It seems to me, then, that the ExecutionPolicy idea could be useful to
control parallelism at a deeper level.  For instance, there is no way with
OpenMP's environment variables alone to parallelize only the inner loop in
the example above but not the outer loop.  However, it seems possible to me
that the ExecutionPolicy could provide some support for this.
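
To make that concrete, one way a policy could control the two levels
separately is by feeding different values into num_threads() clauses.  This
is only a hypothetical sketch of the idea, not a proposal for the actual
interface, and the names below are made up:

#include <omp.h>

// Hypothetical policy describing how many threads each loop level gets.
struct InnerOnlyPolicy
{
  static int OuterThreads() { return 1; }
  static int InnerThreads() { return omp_get_max_threads(); }
};

template<typename ExecutionPolicy>
void NestedWork(const int n, const int m)
{
  #pragma omp parallel for num_threads(ExecutionPolicy::OuterThreads())
  for (int i = 0; i < n; ++i)
  {
    // With InnerOnlyPolicy the outer region has a team of one thread, so it
    // is inactive and the inner region can still form a full team even
    // without OMP_NESTED (as far as I understand the spec).
    #pragma omp parallel for num_threads(ExecutionPolicy::InnerThreads())
    for (int j = 0; j < m; ++j)
    {
      // ... work on the (i, j) cell ...
    }
  }
}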

If that is the goal, my only question is: are there cases where
parallelizing the inner loop is a better option than parallelizing the
outer loop?  Maybe I have not thought it through fully. :)

One other important consideration is that sometimes, code at a lower
level than mlpack will use OpenMP... OpenBLAS is one instance.  So any
configuration with ExecutionPolicy at a higher level may be overruled at
a lower level by OpenBLAS.  With OpenBLAS, all threading behavior is expected
to be controlled via environment variables.  If we can stick to
that as much as possible, I think it will keep things easy for our
users---they may be familiar with OpenMP and its configuration, but they
are less likely to be familiar with a new ExecutionPolicy
infrastructure. :)

It's possible that I've misunderstood some of the key functionality that
ExecutionPolicy would provide that OpenMP environment variables don't.
If that's the case, let me know. :)

-- 
Ryan Curtin    | "I am."
ryan at ratml.org |   - Joe

