Programming Shared Memory II
Dr. Cem Özdoğan
OpenMP: a Standard for Directive Based Parallel Programming
The OpenMP Programming Model
The OpenMP Design Concepts
Lecture 10
Programming Shared Memory II
OpenMP (Open Multi-Processing)
IKC-MH.57 Introduction to High Performance and Parallel Computing, December 22, 2023
Dr. Cem Özdoğan
Engineering Sciences Department
İzmir Kâtip Çelebi University
Contents
1 OpenMP: a Standard for Directive Based Parallel Programming
The OpenMP Programming Model
The OpenMP Design Concepts
OpenMP: a Standard for Directive Based Parallel
Programming I
Although standardization of and support for threaded APIs have made considerable progress, their use is still largely restricted to system programmers rather than application programmers.
One of the reasons for this is that APIs such as Pthreads are considered to be low-level primitives.
A large class of applications can be efficiently supported by higher-level constructs (or directives), which rid the programmer of the mechanics of manipulating threads.
Standardization efforts for such directive-based languages have succeeded in the form of OpenMP.
OpenMP is an API that can be used with FORTRAN, C, and C++ for programming shared address space machines.
OpenMP: a Standard for Directive Based Parallel
Programming II
OpenMP is a standard API for defining multi-threaded shared-memory programs.
It allows a programmer to separate a program into serial regions and parallel regions, rather than into concurrently executing threads.
It does NOT parallelize automatically and does NOT guarantee speedup.
General structure:
    #include <omp.h>

    int main(void) {
        int var1, var2, var3;
        /* Serial code */
        /* Beginning of parallel section: fork a team of threads
           and specify variable scoping */
        #pragma omp parallel private(var1, var2) shared(var3)
        {
            /* Parallel section executed by all threads */
            /* All threads join the master thread and disband */
        }
        /* Resume serial code */
        return 0;
    }
The OpenMP Programming Model I
OpenMP directives provide support for concurrency, synchronization, and data handling while avoiding the need for explicitly setting up mutexes, condition variables, data scope, and initialization.
OpenMP directives in C are based on the #pragma compiler directives.
The directive itself consists of a directive name followed by clauses.
    #pragma omp directive [clause list]
OpenMP programs execute serially until they encounter the parallel directive.
This directive is responsible for creating a group of threads.
The exact number of threads can be
specified in the directive (num_threads(4)),
set using an environment variable (export OMP_NUM_THREADS=4 [sh, ksh, bash]),
defined at runtime using OpenMP functions (omp_set_num_threads(4)).
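The directive-clause and runtime-function options for choosing the thread count can be compared with a small sketch. The helper names team_size_from_runtime_call and team_size_from_clause are hypothetical, and the serial fallbacks under #ifdef _OPENMP are an assumption added here so that the code also builds when the compiler is not given an OpenMP flag (for GCC, -fopenmp). The single directive is used only so that one thread records the team size.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) for builds without OpenMP. */
static void omp_set_num_threads(int n) { (void)n; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: request a team size through the runtime function,
   open a parallel region, and return the team size actually used. */
int team_size_from_runtime_call(int requested) {
    int size = 0;
    omp_set_num_threads(requested);
    #pragma omp parallel
    {
        #pragma omp single
        size = omp_get_num_threads();
    }
    return size;
}

/* Hypothetical helper: request the team size with the num_threads clause. */
int team_size_from_clause(int requested) {
    int size = 0;
    #pragma omp parallel num_threads(requested)
    {
        #pragma omp single
        size = omp_get_num_threads();
    }
    return size;
}
```

The runtime may provide fewer threads than requested, so a caller should treat the returned value as an upper-bounded result rather than assume it equals the request.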
The OpenMP Programming Model II
The main thread that encounters the parallel directive becomes the master of this group of threads, with id 0.
The parallel directive has the following prototype:
    #pragma omp parallel [clause list]
    /* structured block */
Each thread created by this directive executes the structured block specified by the parallel directive (SPMD).
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        omp_set_num_threads(4);
        /* Do this part in parallel */
        #pragma omp parallel
        {
            printf("Hello, World!\n");
        }
        return 0;
    }
Figure: Creating four threads for the "printf" function.
The OpenMP Programming Model III
Figure: A sample OpenMP program along with its Pthreads translation that might be performed by an OpenMP compiler.
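The figure itself is not reproduced in this text. As a rough illustration only, a parallel region with four threads could be lowered to Pthreads along the following lines. The names region_body and run_region, the fixed team size, and the shared counter are all assumptions of this sketch; real OpenMP compilers generate calls into their own runtime libraries, and the exact translation differs between implementations.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4  /* assumed team size for this sketch */

/* Shared counter standing in for the work done in the structured block. */
static int visits = 0;
static pthread_mutex_t visits_lock = PTHREAD_MUTEX_INITIALIZER;

/* The structured block of the parallel region, outlined into a function. */
static void *region_body(void *arg) {
    int id = *(int *)arg;
    printf("Hello from thread %d\n", id);
    pthread_mutex_lock(&visits_lock);
    visits++;
    pthread_mutex_unlock(&visits_lock);
    return NULL;
}

/* Roughly what "#pragma omp parallel num_threads(4) { ... }" might become:
   fork the extra threads, let the master run the block too, then join. */
int run_region(void) {
    pthread_t workers[NUM_THREADS];
    int ids[NUM_THREADS];
    visits = 0;
    for (int i = 1; i < NUM_THREADS; i++) {   /* fork: threads 1..3 */
        ids[i] = i;
        pthread_create(&workers[i], NULL, region_body, &ids[i]);
    }
    ids[0] = 0;
    region_body(&ids[0]);                     /* the master acts as thread 0 */
    for (int i = 1; i < NUM_THREADS; i++)     /* join: wait for the team */
        pthread_join(workers[i], NULL);
    return visits;                            /* number of threads that ran */
}
```

The explicit create, lock, and join calls are the thread mechanics that the single parallel directive hides from the programmer.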
The OpenMP Programming Model IV
The clause list is used to specify conditional parallelization, number of threads, and data handling.
1 Conditional Parallelization: The clause if (scalar expression) determines whether the parallel construct results in creation of threads.
2 Degree of Concurrency: The clause num_threads (integer expression) specifies the number of threads that are created by the parallel directive.
3 Data Handling: The clause private (variable list) indicates that the set of variables specified is local to each thread. Each thread has its own copy of each variable in the list.
The clause firstprivate (variable list) is similar to the private clause, except that the values of the variables on entering the threads are initialized to the corresponding values before the parallel directive.
The clause shared (variable list) indicates that all variables in the list are shared across all the threads.
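The difference between private and firstprivate can be checked with a short sketch. The helper name firstprivate_sees_initial_value and the values 42 and 43 are hypothetical; the atomic directive is used only to protect the shared mismatch counter.

```c
/* Hypothetical helper: returns 1 if every thread found the pre-region value
   of `seed` in its firstprivate copy, 0 otherwise. */
int firstprivate_sees_initial_value(void) {
    int seed = 42;       /* set before the parallel directive */
    int scratch = -1;    /* private copies do NOT inherit this value */
    int mismatches = 0;  /* shared; updated atomically */

    #pragma omp parallel num_threads(4) firstprivate(seed) private(scratch) shared(mismatches)
    {
        scratch = seed + 1;  /* a private variable must be assigned before use */
        if (seed != 42 || scratch != 43) {
            #pragma omp atomic
            mismatches += 1;
        }
    }
    return mismatches == 0;
}
```

Reading scratch inside the region before assigning it would be an error, because a private copy starts with an undefined value; seed is safe to read because firstprivate copies the value in.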
The OpenMP Programming Model V
Figure: Fork-Join Model.
FORK The master thread creates a team of parallel threads. Statements in the program that are enclosed by the parallel region construct are executed in parallel among the various threads.
JOIN When the team threads complete the statements in the parallel region construct, they synchronize and terminate, leaving only the master thread.
The OpenMP Programming Model VI
Shared Variables.
Variables are shared by default in OpenMP. To make a variable private, declare it as such in the pragma:
    #include <stdio.h>
    #include <omp.h>
    #include <unistd.h>

    int a, b, x, y, num_threads, thread_num;

    int main(void)
    {
        printf("I am in sequential part.\n");
        #pragma omp parallel num_threads(8) private(a) shared(b)
        {
            num_threads = omp_get_num_threads();
            thread_num = omp_get_thread_num();
            x = thread_num;
            // sleep(1);
            y = x + 1;
            printf("I am OpenMP parallelized part and thread %d.\n X and Y values are %d and %d.\n", omp_get_thread_num(), x, y);
        }
        printf("I am in sequential part again.\n");
        return 0;
    }
x and y are shared variables (as are num_threads and thread_num), so there is a risk of a data race.
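One possible way to remove the race in the program above is to give every thread its own x and y. The sketch below is an assumption-laden variant, not the lecture's own fix: it uses local variables instead of globals, the helper name count_inconsistent_threads is hypothetical, and the serial fallback exists only so the code builds without an OpenMP flag.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (an assumption of this sketch) for builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: returns how many threads observed y != id + 1.
   With x and y private, no thread can overwrite another thread's values. */
int count_inconsistent_threads(void) {
    int x = 0, y = 0;
    int bad = 0;  /* shared; updated atomically */

    #pragma omp parallel num_threads(8) private(x, y) shared(bad)
    {
        int id = omp_get_thread_num();  /* block-local, hence private */
        x = id;
        y = x + 1;
        if (y != id + 1) {
            #pragma omp atomic
            bad += 1;
        }
    }
    return bad;
}
```

Declaring a variable inside the structured block, as done for id, has the same effect as listing it in a private clause.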
The OpenMP Programming Model VII
Table: Correct and wrong outputs of the program.
The OpenMP Programming Model VIII
Using the parallel directive:
    #pragma omp parallel if (is_parallel == 1) num_threads(8) private(a) shared(b) firstprivate(c)
    {
        /* structured block */
    }
Here, if the value of the variable is_parallel equals one, eight threads are created.
Each of these threads gets private copies of variables a and c, and shares a single value of variable b.
Furthermore, the value of each copy of c is initialized to the value of c before the parallel directive.
The clause default (shared) implies that, by default, a variable is shared by all the threads.
The clause default (none) implies that the state of each variable used in a thread must be explicitly specified.
This is generally recommended, to guard against errors arising from unintentional concurrent access to shared data.
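A brief sketch of default (none) in use follows. The helper name scale_in_place and its parameters are hypothetical, and the example uses the combined parallel for form that the lecture introduces later. With default (none), a variable used in the region without an explicit data-sharing attribute is reported as a compile-time error.

```c
/* Hypothetical helper: multiply n values in place by factor.
   Every variable used in the region is given an explicit scope. */
void scale_in_place(double *data, int n, double factor) {
    int i;
    #pragma omp parallel for default(none) shared(data, n) firstprivate(factor) private(i)
    for (i = 0; i < n; i++)
        data[i] *= factor;
}
```

Removing, say, shared(data, n) from the directive should make an OpenMP compiler reject the loop, which is the safeguard the clause is meant to provide.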
The OpenMP Programming Model IX
The reduction clause:
Specifies how multiple local copies of a variable at different threads are combined into a single copy at the master when threads exit.
The usage of the reduction clause is reduction (operator: variable list).
This clause performs a reduction on the scalar variables specified in the list using the operator.
The variables in the list are implicitly specified as being private to threads.
The operator can be one of +, *, -, &, |, ^, &&, ||.
In the following example, each of the eight threads gets a copy of the variable sum.
    #pragma omp parallel reduction(+: sum) num_threads(8)
    {
        /* compute local sums here */
    }
    /* sum here contains the sum of all local instances of sum */
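To make the "compute local sums here" step concrete, the following sketch sums the integers 1..n, with each thread adding a strided subset into its private copy of sum. The helper name sum_to and the strided work split are assumptions of this sketch; the serial fallbacks exist only so the code builds without an OpenMP flag.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) for builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: sum 1..n using per-thread partial sums that the
   reduction clause combines with + when the threads exit the region. */
long sum_to(int n) {
    long sum = 0;
    #pragma omp parallel reduction(+: sum) num_threads(8)
    {
        int id = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        /* Thread id handles i = id+1, id+1+nthreads, ... */
        for (int i = 1 + id; i <= n; i += nthreads)
            sum += i;  /* updates this thread's private copy */
    }
    return sum;        /* combined value of all private copies */
}
```

For example, sum_to(100) yields 5050 regardless of how many threads the runtime actually provides, because the stride is taken from the real team size.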
The OpenMP Programming Model X
Parallel Loop:
The compiler calculates loop bounds for each thread directly from the serial source (computation decomposition).
The compiler also manages data partitioning.
Synchronization is also automatic (barrier).
The preprocessor calculates loop bounds and divides iterations among parallel threads.
The OpenMP Programming Model XI
Loop Scheduling in Parallel for pragma
The master thread creates additional threads, each with a separate execution context.
All variables declared outside the for loop are shared by default, except for the loop index, which is private per thread.
Implicit "barrier" synchronization at the end of the for loop.
Divide index regions sequentially per thread (n is the number of threads):
Thread 0 gets 0, 1, ..., (max/n) - 1
Thread 1 gets max/n, max/n + 1, ..., 2(max/n) - 1
...
Example:
    #pragma omp parallel for
    for (i = 0; i < max; i++)
        zero[i] = 0;
Breaks the for loop into chunks and allocates each to a separate thread.
If max = 1000 with 2 threads: assign 0-499 to thread 0, and 500-999 to thread 1.
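The index ranges above follow a simple formula, sketched here together with the zeroing loop. The helper names chunk_bounds and zero_array are hypothetical, and chunk_bounds assumes that max is a multiple of n, as in the max = 1000, n = 2 case. The schedule(static) clause is an addition of this sketch: it requests near-equal chunks with at most one per thread, whereas the default schedule without the clause is implementation-defined.

```c
/* Hypothetical helper mirroring the formula above: thread t of n gets
   indices t*(max/n) .. (t+1)*(max/n) - 1. Assumes max % n == 0. */
void chunk_bounds(int max, int n, int t, int *first, int *last) {
    int chunk = max / n;
    *first = t * chunk;
    *last = (t + 1) * chunk - 1;
}

/* Hypothetical helper: zero an array with a statically scheduled loop. */
void zero_array(int *zero, int max) {
    int i;  /* the loop index is private to each thread */
    #pragma omp parallel for schedule(static)
    for (i = 0; i < max; i++)
        zero[i] = 0;
    /* implicit barrier: every element has been written at this point */
}
```

Calling chunk_bounds(1000, 2, 1, &first, &last) gives first = 500 and last = 999, matching the two-thread example.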
The OpenMP Design Concepts
Load balance, Scheduling overhead, Data locality, Data
sharing, Synchronization.
OpenMP is a compiler-based technique to create concurrent code from (mostly) serial code.
OpenMP can enable (easy) parallelization of loop-based code with fork-join parallelism.
    #pragma omp parallel
    #pragma omp parallel for
    #pragma omp parallel private(i, x)
    #pragma omp atomic
    #pragma omp critical
    #pragma omp for reduction(+ : sum)
OpenMP performs comparably to manually-coded threading.
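Three of the directives listed above (atomic, critical, and for with reduction) can all protect the same shared update. The sketch below counts n loop iterations in each style; the helper names are hypothetical, and the choice of a simple counter is an assumption made to keep the comparison small.

```c
/* Hypothetical helper: atomic protects a single memory update. */
int count_with_atomic(int n) {
    int count = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        count += 1;
    }
    return count;
}

/* Hypothetical helper: critical lets one thread at a time run the block. */
int count_with_critical(int n) {
    int count = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp critical
        {
            count += 1;
        }
    }
    return count;
}

/* Hypothetical helper: reduction gives each thread a private count and
   combines the copies once, at the end of the loop. */
int count_with_reduction(int n) {
    int count = 0;
    #pragma omp parallel
    {
        #pragma omp for reduction(+ : count)
        for (int i = 0; i < n; i++)
            count += 1;
    }
    return count;
}
```

All three return n. They differ in synchronization cost: the reduction version synchronizes once per thread rather than once per iteration, which relates to the scheduling-overhead and synchronization concerns named at the top of this slide.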