The OpenMP Programming Model

Figure 5.5: A sample OpenMP program along with its Pthreads translation that might be performed by an OpenMP compiler.
\includegraphics[scale=0.8]{images/openmppthread.ps}
The clause list is used to specify conditional parallelization, number of threads, and data handling.
1 Conditional Parallelization: The clause if (scalar expression) determines whether the parallel construct results in creation of threads.
2 Degree of Concurrency: The clause num_threads (integer expression) specifies the number of threads that are created by the parallel directive.
3 Data Handling: The clause private (variable list) indicates that the set of variables specified is local to each thread.
  • Each thread has its own copy of each variable in the list.
  • The clause firstprivate (variable list) is similar to the private clause, except that each thread's copy of a variable is initialized to the value the variable had before the parallel directive.
  • The clause shared (variable list) indicates that all variables in the list are shared across all the threads.
Figure 5.6: Fork-Join Model.
FORK: The master thread creates a team of parallel threads.
 Statements enclosed by the parallel region construct are executed in parallel among the various threads.
JOIN: When the team threads complete the statements in the parallel region construct, they synchronize and terminate, leaving only the master thread.
Shared Variables. In OpenMP, variables are shared by default. To make a variable private, it must be declared as such in the pragma:
    #include <stdio.h>
    #include <omp.h>
    #include <unistd.h>
    int a, b, x, y, num_threads, thread_num;
    int main()
    {
      printf("I am in sequential part.\n");
    #pragma omp parallel num_threads (8) private (a) shared (b)
      {
        num_threads = omp_get_num_threads();
        thread_num = omp_get_thread_num();
        x = thread_num;
    //  sleep(1);
        y = x + 1;
        printf("I am OpenMP parallelized part and thread %d. \n X and Y values are %d and %d. \n",
               omp_get_thread_num(), x, y);
      }
      printf("I am in sequential part again.\n");
      return 0;
    }
x and y are shared variables, so there is a risk of a data race: one thread may read x after another thread has overwritten it.

Table 5.1: Correct and Wrong outputs of the program.


Using the parallel directive:

    #pragma omp parallel if (is_parallel == 1) num_threads(8) private (a) shared (b) firstprivate(c)
    {
        /* structured block */
    }

The reduction clause:

    #pragma omp parallel reduction(+: sum) num_threads(8)
    {
        /* compute local sums here */
    }
    /* sum here contains the sum of all local instances of sum */
Parallel Loop:

Loop Scheduling in the parallel for Pragma

  • Master thread creates additional threads, each with a separate execution context.
  • All variables declared outside the for loop are shared by default, except the loop index, which is private to each thread.
  • Implicit "barrier" synchronization at end of for loop.
  • The index range is divided into contiguous blocks, one per thread:
    • Thread 0 gets $0, 1, \ldots (max/n)-1$
    • Thread 1 gets $max/n, max/n+1, \ldots 2*(max/n)-1$
    • $\vdots$
Example:
#pragma omp parallel for
for (i=0; i<max; i++) zero[i] = 0;
  • Breaks the for loop into chunks and allocates each chunk to a separate thread.
  • If max = 1000 with 2 threads: iterations 0-499 are assigned to thread 0, and 500-999 to thread 1.