Avoiding Deadlocks
- The semantics of MPI_Send and MPI_Recv place some restrictions on how we can mix and match send and receive operations.
- Consider a code fragment in which process 0 sends two messages with different tags to process 1, and process 1 receives them in the reverse order.
- If MPI_Send is implemented using buffering, then this code will run correctly (if sufficient buffer space is available).
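- However, if MPI_Send is implemented by blocking until the matching receive has been issued, then neither of the two processes will be able to proceed: process 0 blocks in its first send, whose matching receive process 1 issues only second, while process 1 blocks in its first receive, waiting for the message that process 0 sends second.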
- This code fragment is not safe, as its behavior is implementation dependent.
- It is up to the programmer to ensure that the program will run correctly on any MPI implementation.
- The problem in this program can be corrected by matching the order in which the send and receive operations are issued.
- Similar deadlock situations can also occur when a process sends a message to itself.
- Improper use of MPI_Send and MPI_Recv can also lead to deadlocks when each process needs to send and receive a message in a circular fashion.
- Consider a code fragment in which
- process i sends a message to process i + 1 (modulo the number of processes),
- process i receives a message from process i - 1 (modulo the number of processes).
- When MPI_Send is implemented using buffering, the program will work correctly, since every call to MPI_Send will get buffered, allowing the subsequent MPI_Recv call to proceed and transfer the required data.
- However, if MPI_Send blocks until the matching receive has been issued, all processes will enter an infinite wait state, each waiting for its neighbouring process to issue an MPI_Recv operation.
- Note that the deadlock occurs even when there are only two processes.
- Thus, when pairs of processes need to exchange data, the above method leads to an unsafe program.
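- The above example can be made safe by rewriting it so that neighbouring processes issue their send and receive operations in opposite orders.
- This new implementation partitions the processes into two groups.
- One consists of the odd-numbered processes and the other of the even-numbered processes; one group sends first and then receives, while the other receives first and then sends, so the processes no longer all block in MPI_Send at the same time.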
Cem Ozdogan
2010-12-27