The OpenMP Programming Model
- OpenMP is an API that can be used with FORTRAN, C, and C++ for programming shared address space machines.
- OpenMP directives provide support for concurrency, synchronization, and data handling while avoiding the need for explicitly setting up mutexes, condition variables, data scope, and initialization.
- OpenMP directives in C and C++ are based on the #pragma compiler directives.
- The directive itself consists of a directive name followed by clauses.
- OpenMP programs execute serially until they encounter the parallel directive.
- This directive is responsible for creating a group of threads.
- The number of threads can be specified in the directive itself, set through the OMP_NUM_THREADS environment variable, or set at runtime by calling omp_set_num_threads().
- The main thread that encounters the parallel directive becomes the master of this group of threads and is assigned thread id 0 within the group.
- The parallel directive has the following prototype: #pragma omp parallel [clause list], followed by a structured block.
- Each thread created by this directive executes the structured block specified by the parallel directive.
- It is easy to understand the concurrency model of OpenMP when viewed in the context of the corresponding Pthreads translation.
- In Figure 6.9, one possible translation of an OpenMP program to a Pthreads program is shown.
Figure 6.9: A sample OpenMP program along with its Pthreads translation that might be performed by an OpenMP compiler.
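The idea behind the figure can be sketched in C as follows. This is only one possible translation, not what any particular compiler emits. The names run_openmp, run_pthreads, internal_thread_fn, and b_lock, and the work done in the parallel segment (each thread adds 1 to b), are assumptions made for illustration.

```c
#include <pthread.h>
#include <stddef.h>

enum { NUM_THREADS = 8 };

static int b;   /* one copy, shared by all threads in both versions */

/* OpenMP version: the directive forks the team, and the closing brace joins it. */
int run_openmp(void)
{
    int a = 0;  /* private(a): each thread gets its own uninitialized copy */
    b = 0;
    #pragma omp parallel num_threads(NUM_THREADS) private(a) shared(b)
    {
        a = 1;              /* parallel segment works on the private copy */
        #pragma omp atomic
        b += a;             /* synchronized update of the shared copy */
    }
    return b;               /* number of threads that executed the block */
}

/* Pthreads version: the structured block is outlined into a thread function. */
static pthread_mutex_t b_lock = PTHREAD_MUTEX_INITIALIZER;

static void *internal_thread_fn(void *packaged_argument)
{
    int a;                  /* the private variable becomes a local variable */
    (void)packaged_argument;
    a = 1;
    pthread_mutex_lock(&b_lock);
    b += a;
    pthread_mutex_unlock(&b_lock);
    return NULL;
}

int run_pthreads(void)
{
    pthread_t threads[NUM_THREADS];
    int i;
    b = 0;
    for (i = 0; i < NUM_THREADS; i++)   /* fork: what the directive does on entry */
        pthread_create(&threads[i], NULL, internal_thread_fn, NULL);
    for (i = 0; i < NUM_THREADS; i++)   /* join: what the end of the block does */
        pthread_join(threads[i], NULL);
    return b;
}
```

In this sketch run_pthreads() always counts eight threads, while run_openmp() counts however many threads the OpenMP runtime actually provides (a single thread if the program is compiled without OpenMP support).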
- The clause list is used to specify conditional parallelization, number of threads, and data handling.
- Conditional Parallelization: The clause if (scalar expression) determines whether the parallel construct results in the creation of threads; if the expression evaluates to false, the block is executed by the encountering thread alone.
- Only one if clause can be used with a parallel directive.
- Degree of Concurrency: The clause num_threads (integer expression) specifies the number of threads that are created by the parallel directive.
- Data Handling: The clause private (variable list) indicates that the set of variables specified is local to each thread, i.e., each thread has its own copy of each variable in the list.
- The clause firstprivate (variable list) is similar to the private clause, except that each thread's copy is initialized to the value the variable had just before the parallel directive.
- The clause shared (variable list) indicates that all variables in the list are shared across all the threads, i.e., there is only one copy.
- Threads must take special care when accessing these variables to ensure serializability.
Using the parallel directive:
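A minimal sketch of such a directive, consistent with the description that follows. The variable names is_parallel, a, b, and c come from the text; the function name count_team, the initial value 5, and the work done inside the block are assumptions made for illustration.

```c
/* Hypothetical helper: returns how many threads saw c initialized to 5. */
int count_team(int is_parallel)
{
    int a = 0;  /* private: per-thread copy, uninitialized on entry */
    int b = 0;  /* shared: a single copy for the whole team */
    int c = 5;  /* firstprivate: per-thread copy that starts at 5 */

    #pragma omp parallel if (is_parallel == 1) num_threads(8) \
            private(a) shared(b) firstprivate(c)
    {
        a = c + 1;          /* safe: a and c are this thread's own copies */
        if (a == 6) {
            #pragma omp atomic
            b += 1;         /* b is shared, so the update is synchronized */
        }
    }
    return b;
}
```

Calling count_team(0) leaves the region serial, so exactly one thread increments b. Calling count_team(1) requests eight threads, and b ends up equal to the number of threads the runtime actually provided.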
- Here, if the value of the variable is_parallel equals one, eight threads are created.
- Each of these threads gets private copies of variables a and c, and all of them share a single copy of variable b.
- Furthermore, the value of each copy of c is initialized to the value of c before the parallel directive.
- The clause default (shared) implies that, by default, a variable is shared by all the threads.
- The clause default (none) implies that the data-sharing state (for example, shared or private) of each variable used in a thread must be explicitly specified.
- This is generally recommended, to guard against errors arising from unintentional concurrent access to shared data.
- Just as firstprivate specifies how multiple local copies of a variable are initialized inside a thread, the reduction clause specifies how multiple local copies of a variable at different threads are combined into a single copy at the master when threads exit.
- The usage of the reduction clause is reduction (operator: variable list).
- This clause performs a reduction on the scalar variables specified in the list using the operator.
- The variables in the list are implicitly specified as being private to threads.
- The operator can be one of +, *, -, &, |, ^, &&, and ||.
Using the reduction clause:
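A minimal sketch of a + reduction over eight threads. The function name sum_to and the choice of work (adding the integers 1 to n, dealt out cyclically by thread id) are assumptions made for illustration; the serial fallbacks at the top are there only so the sketch also builds when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption): behave like a team of one thread. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical example: returns 1 + 2 + ... + n. */
long sum_to(long n)
{
    long sum = 0;

    #pragma omp parallel reduction(+: sum) num_threads(8)
    {
        /* Inside the region, sum is a private copy initialized to 0. */
        int id = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        long i;

        /* Thread id handles id+1, id+1+nthreads, id+1+2*nthreads, ... */
        for (i = id + 1; i <= n; i += nthreads)
            sum += i;
    }
    /* Here sum holds the combined total of all the local copies. */
    return sum;
}
```

Because every integer from 1 to n is assigned to exactly one thread, the result does not depend on how many threads the runtime provides.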
- In this example, each of the eight threads gets a copy of the variable sum.
- When the threads exit, the sum of all of these local copies is stored in the single copy of the variable (at the master thread).
- Computing PI using OpenMP directives (a Pthreads program for the same problem was presented earlier).
- The omp_get_num_threads() function returns the number of threads in the parallel region.
- The omp_get_thread_num() function returns the integer id of each thread (recall that the master thread has an id 0).
- The parallel directive specifies that all variables except npoints, the total number of random points in two dimensions across all threads, are local.
- Furthermore, the directive specifies that there are eight threads, and the value of sum after all threads complete execution is the sum of local values at each thread.
- A for loop generates the required number of random points (in two dimensions) and determines how many of them are within the prescribed circle of unit diameter.
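The points above can be sketched as the following C function. It is an illustrative reconstruction under stated assumptions, not the original listing: the names compute_pi, next_uniform, and used are hypothetical, the random numbers come from a small hand-written linear congruential generator rather than a library routine, and the per-thread variables are made local by declaring them inside the block. The serial fallbacks only let the sketch build when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption): behave like a team of one thread. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: linear congruential generator, value in [0, 1). */
static double next_uniform(unsigned int *state)
{
    *state = *state * 1664525u + 1013904223u;
    return (double)(*state >> 8) / 16777216.0;  /* top 24 bits over 2^24 */
}

double compute_pi(long npoints)
{
    long sum = 0;   /* points that fall inside the circle */
    long used = 0;  /* points actually generated across all threads */

    #pragma omp parallel default(none) shared(npoints) \
            reduction(+: sum, used) num_threads(8)
    {
        int nthreads = omp_get_num_threads();
        long sample_points_per_thread = npoints / nthreads;
        /* A different seed per thread, derived from the thread id. */
        unsigned int seed = 12345u + 7919u * (unsigned int)omp_get_thread_num();
        long i;

        for (i = 0; i < sample_points_per_thread; i++) {
            double x = next_uniform(&seed) - 0.5;
            double y = next_uniform(&seed) - 0.5;
            if (x * x + y * y < 0.25)   /* inside the circle of unit diameter */
                sum++;
        }
        used += sample_points_per_thread;
    }

    if (used == 0)
        return 0.0;     /* fewer points than threads: nothing was sampled */
    /* Circle area over square area is PI/4. */
    return 4.0 * (double)sum / (double)used;
}
```

With GCC or Clang, OpenMP support is typically enabled with the -fopenmp flag. A call such as compute_pi(1000000) should return an estimate close to PI, with accuracy improving slowly as npoints grows.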
Note that this program is much easier to write in terms of specifying creation and termination of threads compared to the corresponding POSIX threaded program.
Cem Ozdogan
2010-12-27