Lecture 6
Programming Using the Message-Passing Paradigm II
MPI: the Message Passing Interface; Unicast
IKC-MH.57 Introduction to High Performance and Parallel Computing, November 17, 2023
Dr. Cem Özdoğan
Engineering Sciences Department
İzmir Kâtip Çelebi University
Contents
1 MPI: the Message Passing Interface
Starting and Terminating the MPI Library
Communicators
Getting Information
Sending and Receiving Messages
Avoiding Deadlocks
Sending and Receiving Messages Simultaneously
MPI: the Message Passing Interface I
Many early-generation commercial parallel computers
were based on the message-passing architecture due to
its lower cost relative to shared-address-space
architectures.
Message-passing became the modern-age form of
assembly language, in which every hardware vendor
provided its own library.
Each such library performed very well on the vendor's own
hardware, but was incompatible with the parallel
computers offered by other vendors.
Many of the differences between the various
vendor-specific message-passing libraries were only
syntactic.
However, often enough there were some serious semantic
differences that required significant re-engineering to port
a message-passing program from one library to another.
The message-passing interface (MPI) was created to
essentially solve this problem.
MPI: the Message Passing Interface II
MPI defines a standard library for message-passing that
can be used to develop portable message-passing
programs.
The MPI standard defines both the syntax and the
semantics of a core set of library routines.
The MPI library contains many routines, but the number of
key concepts is much smaller.
In fact, it is possible to write fully functional
message-passing programs by using only six routines
(see Table).
Table: The minimal set of MPI routines.

MPI_Init          Initializes MPI
MPI_Finalize      Terminates MPI
MPI_Comm_size     Determines the number of processes
MPI_Comm_rank     Determines the label of the calling process
MPI_Send          Sends a message
MPI_Recv          Receives a message
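To make the table concrete, here is a minimal sketch, assuming at
least two processes are launched, in which process 0 sends a single
integer to process 1 using only these six routines (the value 42 and
the variable names are illustrative, not from the lecture):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int npes, myrank, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);                  /* initialize the MPI environment */
    MPI_Comm_size(MPI_COMM_WORLD, &npes);    /* number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);  /* rank of the calling process */

    if (myrank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (myrank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("Process %d of %d received %d\n", myrank, npes, value);
    }

    MPI_Finalize();                          /* terminate the MPI environment */
    return 0;
}

Such a program is typically compiled with an MPI wrapper compiler and
run with an MPI launcher, e.g., mpicc followed by mpirun -np 2 ./a.out.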
Starting and Terminating the MPI Library
MPI_Init is called prior to any calls to other MPI routines.
Its purpose is to initialize the MPI environment.
Calling MPI_Init more than once during the execution of a
program will lead to an error.
MPI_Finalize is called at the end of the computation.
It performs various clean-up tasks to terminate the MPI
environment.
No MPI calls may be performed after MPI_Finalize has
been called, not even MPI_Init.
Upon successful execution, MPI_Init and MPI_Finalize
return MPI_SUCCESS; otherwise they return an
implementation-defined error code.
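As a small sketch of checking these return codes (an illustration, not
from the lecture; note that by default most MPI implementations abort
on error rather than return, so such checks matter mainly when the
default error handler has been replaced):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    /* check the implementation-defined return codes described above */
    if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
        fprintf(stderr, "MPI_Init failed\n");
        return 1;
    }
    /* ... computation using MPI routines goes here ... */
    if (MPI_Finalize() != MPI_SUCCESS) {
        fprintf(stderr, "MPI_Finalize failed\n");
        return 1;
    }
    return 0;
}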
Communicators I
A key concept used throughout MPI is that of the
communication domain.
A communication domain is a set of processes that are
allowed to communicate with each other.
Information about communication domains is stored in
variables of type MPI_Comm, which are called
communicators.
These communicators are used as arguments to all
message transfer MPI routines.
They uniquely identify the processes participating in the
message transfer operation.
In general, all the processes may need to
communicate with each other.
For this reason, MPI defines a default communicator
called MPI_COMM_WORLD which includes all the
processes involved.
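Additional communication domains can be created when needed. As a
sketch (using MPI_Comm_dup and MPI_Comm_free, standard MPI routines
beyond the minimal set above), a duplicate of MPI_COMM_WORLD gives,
for example, a library its own domain, so its messages cannot be
confused with the application's:

MPI_Comm library_comm;
MPI_Comm_dup(MPI_COMM_WORLD, &library_comm);  /* same processes, separate domain */
/* ... messages sent on library_comm never match receives on MPI_COMM_WORLD ... */
MPI_Comm_free(&library_comm);                 /* release the communicator when done */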
Getting Information I
MPI_Comm_size function: returns the number of processes.
MPI_Comm_rank function: returns the label (rank) of the calling process.
The calling sequences of these routines are as follows:
int MPI_Comm_size(MPI_Comm comm, int *size)
int MPI_Comm_rank(MPI_Comm comm, int *rank)
The function MPI_Comm_size returns in the variable size
the number of processes that belong to the communicator
comm.
Every process that belongs to a communicator is uniquely
identified by its rank.
The rank of a process is an integer that ranges from zero
up to the size of the communicator minus one.
Upon return, the variable rank stores the rank of the
process.
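For example, a fragment (assumed to run between MPI_Init and
MPI_Finalize) that queries both values:

int size, rank;
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
printf("Hello from process %d of %d\n", rank, size);  /* rank is in 0..size-1 */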
Sending and Receiving Messages I
The basic functions for sending and receiving messages in
MPI are MPI_Send and MPI_Recv, respectively.
The calling sequences of these routines are as follows:
int MPI_Send(void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
             int source, int tag, MPI_Comm comm,
             MPI_Status *status)
1 MPI_Send sends the data stored in the buffer pointed to
by buf.
This buffer consists of consecutive entries of the type
specified by the parameter datatype.
The number of entries in the buffer is given by the
parameter count.
Sending and Receiving Messages II
Table: Correspondence between the datatypes supported by MPI and
those supported by C.

MPI Datatype           C Datatype
MPI_CHAR               signed char
MPI_SHORT              signed short int
MPI_INT                signed int
MPI_LONG               signed long int
MPI_UNSIGNED_CHAR      unsigned char
MPI_UNSIGNED_SHORT     unsigned short int
MPI_UNSIGNED           unsigned int
MPI_UNSIGNED_LONG      unsigned long int
MPI_FLOAT              float
MPI_DOUBLE             double
MPI_LONG_DOUBLE        long double
MPI_BYTE
MPI_PACKED
Note that for all C datatypes, an equivalent MPI datatype is
provided.
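As a brief illustrative fragment of this correspondence, a C buffer of
type double is sent with the matching MPI_DOUBLE (the destination rank
1 and tag 0 are placeholders):

double x[10];
/* count is 10 entries of type MPI_DOUBLE, not a byte count */
MPI_Send(x, 10, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);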
Sending and Receiving Messages III
MPI allows two additional datatypes that are not part of the
C language: MPI_BYTE and MPI_PACKED.
MPI_BYTE corresponds to a byte (8 bits).
MPI_PACKED corresponds to a collection of data items that
has been created by packing non-contiguous data.
Note that the length of the message in MPI_Send, as well
as in other MPI routines, is specified in terms of the
number of entries being sent and not in terms of the
number of bytes.
The destination of the message sent by MPI_Send is
uniquely specified by the dest and comm arguments.
The dest argument is the rank of the destination process in
the communication domain specified by the communicator
comm.
Each message has an integer-valued tag associated with
it.
This is used to distinguish different types of messages.
Sending and Receiving Messages IV
2 MPI_Recv receives a message sent by a process whose
rank is given by the source argument, in the communication
domain specified by the comm argument.
The tag of the sent message must match the one specified
by the tag argument.
If there are multiple messages with an identical tag from the
same process, then any one of these messages is
received.
MPI allows specification of wild card arguments for both
source and tag.
If source is set to MPI_ANY_SOURCE, then any process of
the communication domain can be the source of the
message.
Similarly, if tag is set to MPI_ANY_TAG, then messages
with any tag are accepted.
The received message is stored at contiguous locations in
the buffer pointed to by buf.
The count and datatype arguments of MPI_Recv are used
to specify the length of the supplied buffer.
Sending and Receiving Messages V
The received message should be of length equal to or less
than this length.
If the received message is larger than the supplied buffer,
then an overflow error will occur, and the routine will return
the error MPI_ERR_TRUNCATE.
After a message has been received, the status variable
can be used to get information about the MPI_Recv
operation.
In C, status is stored using the MPI_Status data structure.
This is implemented as a structure with three fields, as
follows:

typedef struct {
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
} MPI_Status;
MPI_SOURCE and MPI_TAG store the source and the tag
of the received message.
Sending and Receiving Messages VI
They are particularly useful when MPI_ANY_SOURCE
and MPI_ANY_TAG are used for the source and tag
arguments.
MPI_ERROR stores the error code of the received
message.
The status argument also returns information about the
length of the received message.
This information is not directly accessible from the status
variable, but it can be retrieved by calling the
MPI_Get_count function.
The calling sequence:
int MPI_Get_count(MPI_Status *status, MPI_Datatype datatype,
                  int *count)
MPI_Get_count takes as arguments the status returned
by MPI_Recv and the type of the received data in
datatype, and returns the number of entries that were
actually received in the count variable.
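Putting these pieces together, the following fragment (illustrative,
assumed to run inside an initialized MPI program) receives with both
wildcards and then uses status and MPI_Get_count to recover the
source, tag, and length of the message:

int buf[10], count;
MPI_Status status;

MPI_Recv(buf, 10, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);
MPI_Get_count(&status, MPI_INT, &count);   /* entries actually received */
printf("Received %d entries from process %d with tag %d\n",
       count, status.MPI_SOURCE, status.MPI_TAG);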
Sending and Receiving Messages VII
MPI_Recv returns only after the requested message
has been received and copied into the buffer.
That is, MPI_Recv is a blocking receive operation.
However, MPI allows two different implementations for
MPI_Send.
1 MPI_Send returns only after the corresponding MPI_Recv
has been issued and the message has been sent to the
receiver.
2 MPI_Send first copies the message into a buffer and then
returns, without waiting for the corresponding MPI_Recv
to be executed.
MPI programs must be able to run correctly regardless of
which of the two methods is used for implementing
MPI_Send. Such programs are called safe.
In writing safe MPI programs, sometimes it is helpful to
forget about the alternate implementation of MPI_Send
and just think of it as being a blocking send operation.
Avoiding Deadlocks I
The semantics of MPI_Send and MPI_Recv place some
restrictions on how we can mix and match send and
receive operations.
Consider the following incomplete code fragment, in which
process 0 sends two messages with different tags to process 1,
and process 1 receives them in the reverse order.
int a[10], b[10], myrank;
MPI_Status status;
...
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0) {
    MPI_Send(a, 10, MPI_INT, 1, 1, MPI_COMM_WORLD);
    MPI_Send(b, 10, MPI_INT, 1, 2, MPI_COMM_WORLD);
}
else if (myrank == 1) {
    MPI_Recv(b, 10, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
    MPI_Recv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
}
...
If MPI_Send is implemented using buffering, then this
code will run correctly (if sufficient buffer space is
available).
Avoiding Deadlocks II
However, if MPI_Send is implemented by blocking until the
matching receive has been issued, then neither of the two
processes will be able to proceed.
This code fragment is not safe, as its behavior is
implementation dependent.
The problem in this program can be corrected by matching
the order in which the send and receive operations are
issued, as sketched below.
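A sketch of the corrected receiving side: process 1 now posts its
receives in the same order as the sends issued by process 0.

else if (myrank == 1) {
    MPI_Recv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);  /* tag 1 first, matching the first send */
    MPI_Recv(b, 10, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);  /* then tag 2 */
}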
Similar deadlock situations can also occur when a process
sends a message to itself.
Improper use of MPI_Send and MPI_Recv can also lead
to deadlocks in situations when each processor needs to
send and receive a message in a circular fashion.
Avoiding Deadlocks III
Consider the following incomplete code fragment, in which
process i sends a message to process i + 1 (modulo the
number of processes) and receives a message from
process i - 1 (modulo the number of processes).
int a[10], b[10], npes, myrank;
MPI_Status status;
...
MPI_Comm_size(MPI_COMM_WORLD, &npes);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Send(a, 10, MPI_INT, (myrank+1)%npes, 1, MPI_COMM_WORLD);
MPI_Recv(b, 10, MPI_INT, (myrank-1+npes)%npes, 1, MPI_COMM_WORLD, &status);
...
When MPI_Send is implemented using buffering, the
program will work correctly,
since every call to MPI_Send will get buffered, allowing the
call of MPI_Recv to be performed, which will transfer
the required data.
However, if MPI_Send blocks until the matching receive
has been issued,
all processes will enter an infinite wait state, waiting for the
neighbouring process to issue an MPI_Recv operation.
Avoiding Deadlocks IV
Note that the deadlock still remains even when we have
only two processes.
Thus, when pairs of processes need to exchange data, the
above method leads to an unsafe program.
The above example can be made safe by rewriting it as follows:
int a[10], b[10], np, myrank;
MPI_Status status;
...
MPI_Comm_size(MPI_COMM_WORLD, &np);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank%2 == 1) {
    MPI_Send(a, 10, MPI_INT, (myrank+1)%np, 1, MPI_COMM_WORLD);
    MPI_Recv(b, 10, MPI_INT, (myrank-1+np)%np, 1, MPI_COMM_WORLD, &status);
}
else {
    MPI_Recv(b, 10, MPI_INT, (myrank-1+np)%np, 1, MPI_COMM_WORLD, &status);
    MPI_Send(a, 10, MPI_INT, (myrank+1)%np, 1, MPI_COMM_WORLD);
}
...
This version partitions the processes into two groups.
One consists of the odd-numbered processes and the
other of the even-numbered processes.
Sending and Receiving Messages Simultaneously I
The above communication pattern appears frequently in
many message-passing programs.
For this reason, MPI provides the MPI_Sendrecv function,
which both sends and receives a message.
MPI_Sendrecv does not suffer from the circular deadlock
problems of MPI_Send and MPI_Recv.
You can think of MPI_Sendrecv as performing the send
and the receive simultaneously.
The calling sequence of MPI_Sendrecv is as follows:
int MPI_Sendrecv(void *sendbuf, int sendcount,
                 MPI_Datatype senddatatype, int dest, int sendtag,
                 void *recvbuf, int recvcount,
                 MPI_Datatype recvdatatype, int source, int recvtag,
                 MPI_Comm comm, MPI_Status *status)
The arguments of MPI_Sendrecv are essentially the
combination of the arguments of MPI_Send and
MPI_Recv.
Sending and Receiving Messages Simultaneously II
The safe version of our previous example using
MPI_Sendrecv is as follows:
int a[10], b[10], npes, myrank;
MPI_Status status;
...
MPI_Comm_size(MPI_COMM_WORLD, &npes);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Sendrecv(a, 10, MPI_INT, (myrank+1)%npes, 1,
             b, 10, MPI_INT, (myrank-1+npes)%npes, 1,
             MPI_COMM_WORLD, &status);
...