Lecture 8
Programming Using the Message-Passing Paradigm IV
MPI: the Message Passing Interface; Overlapping, Multicast
IKC-MH.57 Introduction to High Performance and Parallel Computing, December 08, 2023
Dr. Cem Özdoğan
Engineering Sciences Department
İzmir Kâtip Çelebi University
Contents
1 Overlapping Communication with Computation
Non-Blocking Communication Operations
2 Collective Communication and Computation Operations
Broadcast
Reduction
Gather
Scatter
All-to-All
Overlapping Communication with Computation
The MPI programs we developed so far used blocking
send and receive operations whenever they needed to
perform point-to-point communication.
Recall that a blocking send operation remains blocked
until the message has been copied out of the send buffer
either into a system buffer at the source process
or sent to the destination process.
Similarly, a blocking receive operation returns only after
the message has been received and copied into the
receive buffer.
It would be preferable if we could overlap the transmission of the data with computation.
Non-Blocking Communication Operations I
In order to overlap communication with computation, MPI
provides a pair of functions for performing non-blocking
send and receive operations.
MPI_Isend = starts a send operation but does not complete it, that is, it returns before the data is copied out of the buffer.
MPI_Irecv
= starts a receive operation but returns before
the data has been received and copied into the buffer.
int MPI_Isend(void *buf, int count, MPI_Datatype datatype,
              int dest, int tag, MPI_Comm comm,
              MPI_Request *request)

int MPI_Irecv(void *buf, int count, MPI_Datatype datatype,
              int source, int tag, MPI_Comm comm,
              MPI_Request *request)
The MPI_Isend and MPI_Irecv functions allocate a request object and return a pointer to it in the request variable.
At a later point in the program, a process that has
started a non-blocking send or receive operation must
make sure that this operation has completed before it
proceeds with its computations.
Non-Blocking Communication Operations II
This is because a process that has started a non-blocking send operation may want to overwrite the buffer that stores the data being sent,
or a process that has started a non-blocking receive operation may want to use the data it received.
To check the completion of non-blocking send and receive operations, MPI provides a pair of functions:
1 MPI_Test = tests whether or not a non-blocking operation has finished.
2 MPI_Wait = waits (i.e., gets blocked) until a non-blocking operation actually finishes.
int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status)

int MPI_Wait(MPI_Request *request, MPI_Status *status)
The request object is used as an argument in the
MPI_Test and MPI_Wait functions to identify the operation
whose status we want to query or to wait for its completion.
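A minimal sketch of the overlap these functions enable, assuming an already initialized MPI program with at least two processes; the buffer sizes and tag are illustrative:

int a[10], b[10], myrank;
MPI_Request request;
MPI_Status status;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0) {
    MPI_Isend(a, 10, MPI_INT, 1, 1, MPI_COMM_WORLD, &request);
    /* ... computation that does not modify a overlaps the transfer ... */
    MPI_Wait(&request, &status);   /* a may be reused from here on */
}
else if (myrank == 1) {
    MPI_Irecv(b, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &request);
    /* ... computation that does not use b overlaps the transfer ... */
    MPI_Wait(&request, &status);   /* b now holds the received message */
}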
Non-Blocking Communication Operations III
MPI_Test tests whether or not the non-blocking send or receive operation identified by its request has finished.
True It returns flag = true (non-zero value in C) if it is
completed.
The request object pointed to by request is deallocated and request is set to MPI_REQUEST_NULL.
Also the status object is set to contain information about
the operation.
False It returns flag = false (a zero value in C) if it is not
completed.
The request is not modified and the value of the status
object is undefined.
The MPI_Wait function blocks until the non-blocking operation identified by request completes.
A non-blocking communication operation can be matched
with a corresponding blocking operation.
For example, a process can send a message using a
non-blocking send operation and this message can be
received by the other process using a blocking receive
operation.
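A sketch of such a mixed pairing, again assuming an initialized MPI program with at least two processes:

int a[10], myrank;
MPI_Request request;
MPI_Status status;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0) {
    MPI_Isend(a, 10, MPI_INT, 1, 1, MPI_COMM_WORLD, &request);
    /* ... other work; a must not be overwritten yet ... */
    MPI_Wait(&request, &status);          /* non-blocking send completed */
}
else if (myrank == 1) {
    /* the blocking receive matches the non-blocking send above */
    MPI_Recv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
}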
Non-Blocking Communication Operations IV
Avoiding Deadlocks; by using non-blocking communication
operations we can remove most of the deadlocks
associated with their blocking counterparts.
For example, the following piece of code is not safe.
int a[10], b[10], myrank;
MPI_Status status;
...
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0) {
    MPI_Send(a, 10, MPI_INT, 1, 1, MPI_COMM_WORLD);
    MPI_Send(b, 10, MPI_INT, 1, 2, MPI_COMM_WORLD);
}
else if (myrank == 1) {
    MPI_Recv(b, 10, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
    MPI_Recv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
}
...
However, if we replace either the send or receive operations with their non-blocking counterparts, then the code will be safe, and will correctly run on any MPI implementation.
Non-Blocking Communication Operations V
Safe version with non-blocking communication operations:

int a[10], b[10], myrank;
MPI_Status status;
MPI_Request requests[2];
...
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0) {
    MPI_Send(a, 10, MPI_INT, 1, 1, MPI_COMM_WORLD);
    MPI_Send(b, 10, MPI_INT, 1, 2, MPI_COMM_WORLD);
}
else if (myrank == 1) {
    MPI_Irecv(b, 10, MPI_INT, 0, 2, MPI_COMM_WORLD, &requests[0]);
    MPI_Irecv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &requests[1]);
}  // non-blocking communication operations
...
This example also illustrates that the non-blocking
operations started by any process can finish in any order
depending on the transmission or reception of the
corresponding messages.
For example, the second receive operation may finish before the first one does, because the message it matches is sent first by process 0.
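Before process 1 uses a or b, it still has to complete both outstanding requests; a minimal completion step for the fragment above (the MPI_Waitall call and the statuses array are additions for illustration) could be:

if (myrank == 1) {
    MPI_Status statuses[2];
    /* Block until both non-blocking receives have finished,
       so that a and b may be used safely afterwards. */
    MPI_Waitall(2, requests, statuses);
}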
Collective Communication and Computation Operations I
MPI provides an extensive set of functions for performing commonly used collective communication operations.
All of the collective communication functions provided by MPI take as an argument a communicator that defines the group of processes that participate in the collective operation.
All the processes that belong to this communicator participate in the operation, and all of them must call the collective communication function.
Even though collective communication operations do not act like barriers, they act like a virtual synchronization step.
Barrier; the barrier synchronization operation is
performed in MPI using the MPI_Barrier function.
int MPI_Barrier(MPI_Comm comm)
The call to MPI_Barrier returns only after all the
processes in the group have called this function.
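One common use, shown here as a sketch assuming an initialized MPI program, is to line all processes up before timing a code region with MPI_Wtime:

double t_start, t_elapsed;

MPI_Barrier(MPI_COMM_WORLD);          /* make sure all processes start together */
t_start = MPI_Wtime();
/* ... code region being timed ... */
MPI_Barrier(MPI_COMM_WORLD);          /* wait for the slowest process to finish */
t_elapsed = MPI_Wtime() - t_start;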
Broadcast I
Broadcast; the one-to-all broadcast operation is
performed in MPI using the
MPI_Bcast function.
int MPI_Bcast(void *buf, int count, MPI_Datatype datatype,
              int source, MPI_Comm comm)
MPI_Bcast sends the data stored in the buffer buf of
process source to all the other processes in the group.
The data that is broadcast consist of count entries of type
datatype.
The data received by each process is stored in the buffer
buf.
Since the operations are virtually synchronous, they do not require tags.
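A minimal sketch, assuming an initialized MPI program; the array size and the choice of rank 0 as source are illustrative:

int vals[4], myrank;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0) {                    /* only the source fills the buffer */
    for (int i = 0; i < 4; i++)
        vals[i] = i + 1;
}
/* After the call, every process in MPI_COMM_WORLD holds the same four values in vals. */
MPI_Bcast(vals, 4, MPI_INT, 0, MPI_COMM_WORLD);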
Broadcast II
Figure: Diagram for Broadcast.
Reduction I
Reduction; the all-to-one reduction operation is performed
in MPI using the
MPI_Reduce function.
int MPI_Reduce(void *sendbuf, void *recvbuf, int count,
               MPI_Datatype datatype, MPI_Op op, int target,
               MPI_Comm comm)
MPI_Reduce combines the elements stored in the buffer sendbuf of each process in the group,
using the operation specified in op,
and returns the combined values in the buffer recvbuf of the process with rank target.
Both the sendbuf and recvbuf buffers must contain count items of type datatype.
When count is more than one, then the combine operation
is applied element-wise on each entry of the sequence.
Note that all processes must provide a recvbuf array, even
if they are not the target of the reduction operation.
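A minimal sketch of a sum reduction to rank 0, assuming an initialized MPI program; the local values are illustrative:

int myrank, local_val, global_sum = 0;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
local_val = myrank + 1;               /* each process contributes one value */
/* Every process passes a recvbuf (&global_sum), but only rank 0 receives the result. */
MPI_Reduce(&local_val, &global_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);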
Reduction II
Figure: Diagram for Reduce.
Reduction III
MPI provides a list of predefined operations that can be
used to combine the elements stored in sendbuf (see the table below).
Table: Predefined reduction operations.

Operation    Meaning                      Datatypes
MPI_MAX      Maximum                      C integers and floating point
MPI_MIN      Minimum                      C integers and floating point
MPI_SUM      Sum                          C integers and floating point
MPI_PROD     Product                      C integers and floating point
MPI_LAND     Logical AND                  C integers
MPI_BAND     Bit-wise AND                 C integers and byte
MPI_LOR      Logical OR                   C integers
MPI_BOR      Bit-wise OR                  C integers and byte
MPI_LXOR     Logical XOR                  C integers
MPI_BXOR     Bit-wise XOR                 C integers and byte
MPI_MAXLOC   Maximum value and location   Data-pairs
MPI_MINLOC   Minimum value and location   Data-pairs
MPI also allows programmers to define their own
operations.
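As an illustration of the data-pair operations in the table, a sketch (assuming an initialized MPI program; compute_local_error is a hypothetical placeholder) that uses MPI_MINLOC with the predefined MPI_DOUBLE_INT type to find the smallest value together with the rank that owns it:

struct {
    double value;                     /* the quantity being minimized */
    int    rank;                      /* payload: here, the owner's rank */
} local, global;
int myrank;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
local.value = compute_local_error();  /* hypothetical local quantity */
local.rank  = myrank;
/* Rank 0 receives the minimum value and the rank that holds it. */
MPI_Reduce(&local, &global, 1, MPI_DOUBLE_INT, MPI_MINLOC, 0, MPI_COMM_WORLD);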
Gather I
Gather; the all-to-one gather operation is performed in
MPI using the MPI_Gather function.
int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype senddatatype,
               void *recvbuf, int recvcount, MPI_Datatype recvdatatype,
               int target, MPI_Comm comm)
Each process, including the target process, sends the data stored in the array sendbuf to the target process.
As a result, the target process receives a total of p buffers (p is the number of processes in the communicator comm).
The data is stored in the array recvbuf of the target process, in rank order.
That is, the data from the process with rank i are stored in the recvbuf starting at location i * sendcount (assuming that the array recvbuf is of the same type as recvdatatype).
Gather II
The data sent by each process must be of the same size
and type.
That is, MPI_Gather must be called with the sendcount
and senddatatype arguments having the same values
at each process.
The information about the receive buffer, its length and type applies only to the target process and is ignored for all the other processes.
The argument recvcount specifies the number of elements received from each process and not the total number of elements the target receives.
So, recvcount must be the same as sendcount and their datatypes must match.
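A minimal sketch, assuming an initialized MPI program (and <stdlib.h> for malloc); two elements are contributed per process and rank 0 is the target:

int myrank, p;
int sendbuf[2];
int *recvbuf;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Comm_size(MPI_COMM_WORLD, &p);
recvbuf = (int *) malloc(2 * p * sizeof(int));   /* significant only on the target */
sendbuf[0] = 2 * myrank;
sendbuf[1] = 2 * myrank + 1;
/* On rank 0, elements 2*i and 2*i+1 of recvbuf come from the process with rank i. */
MPI_Gather(sendbuf, 2, MPI_INT, recvbuf, 2, MPI_INT, 0, MPI_COMM_WORLD);
free(recvbuf);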
Gather III
Figure: Diagram for Gather.
Gather IV
MPI also provides the MPI_Allgather function in which the
data are gathered to all the processes and not only at the
target process.
int MPI_Allgather(void *sendbuf, int sendcount, MPI_Datatype senddatatype,
                  void *recvbuf, int recvcount, MPI_Datatype recvdatatype,
                  MPI_Comm comm)
The meanings of the various parameters are similar to those for MPI_Gather; however, each process must now supply a recvbuf array that will store the gathered data.
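A sketch analogous to the gather example, assuming an initialized MPI program and <stdlib.h>; here every process ends up with the complete array:

int myrank, p, local;
int *all;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Comm_size(MPI_COMM_WORLD, &p);
all = (int *) malloc(p * sizeof(int));   /* every process needs a receive buffer */
local = myrank * myrank;                 /* one element contributed per process */
/* After the call, all[i] on every process holds the value contributed by rank i. */
MPI_Allgather(&local, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);
free(all);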
Gather V
Figure: Diagram for Allgather.
Scatter I
Scatter; the one-to-all scatter operation is performed in
MPI using the MPI_Scatter function.
int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype senddatatype,
                void *recvbuf, int recvcount, MPI_Datatype recvdatatype,
                int source, MPI_Comm comm)
The source process sends a different part of the send buffer sendbuf to each process, including itself.
The data that are received are stored in recvbuf.
Process i receives sendcount contiguous elements of type senddatatype starting from the i * sendcount location of the sendbuf of the source process (assuming that sendbuf is of the same type as senddatatype).
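A minimal sketch, assuming an initialized MPI program and <stdlib.h>; rank 0 is the source and one element is sent to each process:

int myrank, p, mine;
int *sendbuf = NULL;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Comm_size(MPI_COMM_WORLD, &p);
if (myrank == 0) {                       /* only the source needs a send buffer */
    sendbuf = (int *) malloc(p * sizeof(int));
    for (int i = 0; i < p; i++)
        sendbuf[i] = 10 * i;
}
/* Process i receives sendbuf[i] of the source into its variable mine. */
MPI_Scatter(sendbuf, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);
if (myrank == 0)
    free(sendbuf);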
Scatter II
Figure: Diagram for Scatter.
All-to-All I
Alltoall; the all-to-all communication operation is
performed in MPI by using the MPI_Alltoall function.
int MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype senddatatype,
                 void *recvbuf, int recvcount, MPI_Datatype recvdatatype,
                 MPI_Comm comm)
Each process sends a different portion of the sendbuf array to each other process, including itself.
Each process sends to process i sendcount contiguous elements of type senddatatype starting from the i * sendcount location of its sendbuf array.
The data that are received are stored in the recvbuf array.
Each process receives from process i recvcount elements of type recvdatatype and stores them in its recvbuf array starting at location i * recvcount.
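A minimal sketch, assuming an initialized MPI program and <stdlib.h>; each pair of processes exchanges a single element:

int myrank, p;
int *sendbuf, *recvbuf;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Comm_size(MPI_COMM_WORLD, &p);
sendbuf = (int *) malloc(p * sizeof(int));
recvbuf = (int *) malloc(p * sizeof(int));
for (int i = 0; i < p; i++)
    sendbuf[i] = 100 * myrank + i;       /* element destined for process i */
/* After the call, recvbuf[i] holds the element that process i sent to this process. */
MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);
free(sendbuf);
free(recvbuf);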
All-to-All II
Figure: Diagram for Alltoall.