One-Dimensional Matrix-Vector Multiplication
- The following code segment, http://siber.cankaya.edu.tr/ParallelComputing/cfiles/odrwmvm.c, multiplies a dense $n \times n$ matrix $A$ with a vector $b$, i.e., computes $x = Ab$.
 
- One way of performing this multiplication in parallel is to have each process compute a different portion of the product-vector $x$. In particular, each of the $p$ processes is responsible for computing $n/p$ consecutive elements of $x$.
 
- This algorithm can be implemented in MPI by distributing the matrix $A$ in a row-wise fashion, such that each process receives the $n/p$ rows that correspond to the portion of the product-vector $x$ it computes. Vector $b$ is distributed in a fashion similar to $x$.
 
- The code segment below shows the MPI program that uses a row-wise distribution of matrix $A$. The dimension of the matrix is supplied in the parameter n; the parameters a and b point to the locally stored portions of matrix $A$ and vector $b$, respectively; and the parameter x points to the local portion of the output matrix-vector product. This program assumes that $n$ is a multiple of the number of processes.
 
  void RowMatrixVectorMultiply(int n, double *a, double *b, double *x,
                               MPI_Comm comm)
  {
    int i, j;
    int nlocal;        /* Number of locally stored rows of A */
    double *fb;        /* Buffer that will store the entire vector b */
    int npes, myrank;

    /* Get information about the communicator */
    MPI_Comm_size(comm, &npes);
    MPI_Comm_rank(comm, &myrank);

    /* Allocate the memory that will store the entire vector b */
    fb = (double *)malloc(n*sizeof(double));

    nlocal = n/npes;

    /* Gather the entire vector b on each process using MPI's all-gather operation */
    MPI_Allgather(b, nlocal, MPI_DOUBLE, fb, nlocal, MPI_DOUBLE, comm);

    /* Perform the matrix-vector multiplication involving the locally stored submatrix */
    for (i=0; i<nlocal; i++) {
      x[i] = 0.0;
      for (j=0; j<n; j++)
        x[i] += a[i*n+j]*fb[j];
    }

    free(fb);
  }
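
- A minimal driver for this routine might look as follows. The driver is not part of the original code segment: the matrix size N, the all-ones initialization of $A$, and letting each process fill in its own rows are all illustrative assumptions, chosen so the result is easy to check (every element of $x$ should equal $0 + 1 + \cdots + (N-1) = N(N-1)/2$).

  #include <stdio.h>
  #include <stdlib.h>
  #include <mpi.h>

  /* Hypothetical driver; assumes RowMatrixVectorMultiply from above is
     compiled in the same file, and that N is a multiple of the number
     of processes. */
  #define N 8

  int main(int argc, char *argv[])
  {
    int i, npes, myrank, nlocal;
    double *a, *b, *x;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    nlocal = N/npes;

    a = (double *)malloc(nlocal*N*sizeof(double)); /* local rows of A */
    b = (double *)malloc(nlocal*sizeof(double));   /* local piece of b */
    x = (double *)malloc(nlocal*sizeof(double));   /* local piece of x */

    /* A is all ones and b[j] = j (global index), so every x[i] should
       come out to 0 + 1 + ... + (N-1) = N*(N-1)/2 */
    for (i=0; i<nlocal*N; i++) a[i] = 1.0;
    for (i=0; i<nlocal; i++)   b[i] = (double)(myrank*nlocal + i);

    RowMatrixVectorMultiply(N, a, b, x, MPI_COMM_WORLD);

    for (i=0; i<nlocal; i++)
      printf("rank %d: x[%d] = %.1f\n", myrank, myrank*nlocal + i, x[i]);

    free(a); free(b); free(x);
    MPI_Finalize();
    return 0;
  }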
 
- An alternate way of computing $x$ is to parallelize the task of performing the dot-product for each element of $x$. That is, for each element $x_i$ of vector $x$, all the processes will compute a part of it, and the result will be obtained by adding up these partial dot-products.
 
- This algorithm can be implemented in MPI by distributing matrix $A$ in a column-wise fashion. Each process gets $n/p$ consecutive columns of $A$, and the elements of vector $b$ that correspond to these columns.
 
- Furthermore, at the end of the computation we want the product-vector $x$ to be distributed in a fashion similar to vector $b$. The following code segment, http://siber.cankaya.edu.tr/ParallelComputing/cfiles/odcwmvm.c, shows the MPI program that implements this column-wise distribution of the matrix.
 
  void ColMatrixVectorMultiply(int n, double *a, double *b, double *x,
                               MPI_Comm comm)
  {
    int i, j;
    int nlocal;        /* Number of locally stored columns of A */
    double *px;        /* Partial dot-products from the local columns */
    double *fx;        /* Fully summed product-vector (valid on process 0) */
    int npes, myrank;

    /* Get identity and size information from the communicator */
    MPI_Comm_size(comm, &npes);
    MPI_Comm_rank(comm, &myrank);

    nlocal = n/npes;

    /* Allocate memory for arrays storing intermediate results */
    px = (double *)malloc(n*sizeof(double));
    fx = (double *)malloc(n*sizeof(double));

    /* Compute the partial dot-products that correspond to the local columns of A */
    for (i=0; i<n; i++) {
      px[i] = 0.0;
      for (j=0; j<nlocal; j++)
        px[i] += a[i*nlocal+j]*b[j];
    }

    /* Sum up the results by performing an element-wise reduction operation */
    MPI_Reduce(px, fx, n, MPI_DOUBLE, MPI_SUM, 0, comm);

    /* Redistribute fx in a fashion similar to that of vector b */
    MPI_Scatter(fx, nlocal, MPI_DOUBLE, x, nlocal, MPI_DOUBLE, 0, comm);

    free(px); free(fx);
  }
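
- As a possible refinement (not part of the original program), the MPI_Reduce followed by MPI_Scatter at the end of this routine can be collapsed into a single call to MPI's combined MPI_Reduce_scatter operation, which sums px element-wise across the processes and leaves each process holding only its own nlocal-element slice of the result. A sketch of the replacement, reusing the variables from the routine above (the fx buffer then becomes unnecessary):

  /* Sketch: replace the MPI_Reduce + MPI_Scatter pair with one combined
     call.  recvcounts[i] gives the number of result elements delivered
     to process i; here every process gets nlocal of them. */
  int *recvcounts = (int *)malloc(npes*sizeof(int));
  for (i=0; i<npes; i++)
    recvcounts[i] = nlocal;
  MPI_Reduce_scatter(px, x, recvcounts, MPI_DOUBLE, MPI_SUM, comm);
  free(recvcounts);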
 
- Comparing these two programs, we see that the row-wise version needs to perform only an MPI_Allgather operation, whereas the column-wise program needs to perform an MPI_Reduce and an MPI_Scatter operation.
 
- In general, a row-wise distribution is preferable, as it leads to smaller communication overhead. However, an application often needs to compute not only $x = Ab$ but also $A^T b$. In that case, the row-wise distribution can be used to compute $Ab$, but the computation of $A^T b$ requires the column-wise algorithm (a row-wise distribution of $A$ is a column-wise distribution of its transpose $A^T$).
 
- It is much cheaper to use the program for the column-wise distribution than to transpose the matrix and then use the row-wise program; a sketch of this approach follows.
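
- To make the last two points concrete, here is a hypothetical sketch (the routine name is ours, not from the original code) of computing $x = A^T b$ directly from the row-wise distributed storage of $A$: since the local rows of $A$ are exactly the local columns of $A^T$, only the indexing of the local dot-product loop changes relative to ColMatrixVectorMultiply, and no explicit transposition is performed.

  /* Hypothetical sketch: compute x = A^T b when A, b, and x are
     distributed exactly as in RowMatrixVectorMultiply above. */
  void RowMatrixTransposeVectorMultiply(int n, double *a, double *b,
                                        double *x, MPI_Comm comm)
  {
    int i, j, nlocal, npes;
    double *px = (double *)malloc(n*sizeof(double));
    double *fx = (double *)malloc(n*sizeof(double));

    MPI_Comm_size(comm, &npes);
    nlocal = n/npes;

    /* Partial dot-products: a[j*n+i] is A[offset+j][i], i.e., element
       (i, offset+j) of A^T, where offset is this process's first row */
    for (i=0; i<n; i++) {
      px[i] = 0.0;
      for (j=0; j<nlocal; j++)
        px[i] += a[j*n+i]*b[j];
    }

    /* Sum the partial results and hand each process its slice of x */
    MPI_Reduce(px, fx, n, MPI_DOUBLE, MPI_SUM, 0, comm);
    MPI_Scatter(fx, nlocal, MPI_DOUBLE, x, nlocal, MPI_DOUBLE, 0, comm);

    free(px); free(fx);
  }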
 
 
 
 
  