
Beyond OpenMP & MPI: GPU parallelization
Dr. Cem Özdoğan
Execution and Programming Models VII - Hello World II
CUDA code:

#include <stdio.h>
#include <cuda.h>
#include <cuda_runtime.h>

#define N 720        // number of computations
#define GRID_D1 20   // constants for grid and block sizes
#define GRID_D2 3    // constants for grid and block sizes
#define BLOCK_D1 12  // constants for grid and block sizes
#define BLOCK_D2 1   // constants for grid and block sizes
#define BLOCK_D3 1   // constants for grid and block sizes

__global__ void hello(void)  // this is the kernel function called for each thread
{
    // CUDA variables {threadIdx, blockIdx, blockDim, gridDim} to determine a unique thread ID
    int myblock = blockIdx.x + blockIdx.y * gridDim.x;         // id of the block
    int blocksize = blockDim.x * blockDim.y * blockDim.z;      // size of each block
    int subthread = threadIdx.z * (blockDim.x * blockDim.y)
                  + threadIdx.y * blockDim.x
                  + threadIdx.x;                               // id of thread in a given block
    int idx = myblock * blocksize + subthread;                 // assign overall id/index of the thread
    int nthreads = blocksize * gridDim.x * gridDim.y;          // total # of threads
    int chunk = 20;  // vary this value to see the changes at the output

    if (idx < chunk || idx > nthreads - chunk) {  // only print first and last chunks of threads
        if (idx < N) {
            printf("Hello world! My block index is (%d,%d) [Grid dims=(%d,%d)], "
                   "3D-thread index within block=(%d,%d,%d) => thread index=%d\n",
                   blockIdx.x, blockIdx.y, gridDim.x, gridDim.y,
                   threadIdx.x, threadIdx.y, threadIdx.z, idx);
        }
        else
        {
            printf("Hello world! My block index is (%d,%d) [Grid dims=(%d,%d)], "
                   "3D-thread index within block=(%d,%d,%d) => thread index=%d "
                   "[### this thread would not be used for N=%d ###]\n",
                   blockIdx.x, blockIdx.y, gridDim.x, gridDim.y,
                   threadIdx.x, threadIdx.y, threadIdx.z, idx, N);
        }
    }
}
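
The index arithmetic in hello() can be checked without a GPU. The following is a minimal host-side sketch in plain C++, not part of the original slide: the type Dim3 and the functions global_index and indices_are_unique are hypothetical helpers introduced only for illustration. It assumes a 2D grid of 3D blocks, as in the kernel, and repeats the same formula to confirm that every (block, thread) pair maps to a distinct index.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side stand-in for CUDA's built-in dim3/uint3 types.
struct Dim3 {
    int x, y, z;
};

// Same arithmetic as in hello(): flatten a (block, thread) pair into one index.
int global_index(Dim3 block, Dim3 thread, Dim3 grid, Dim3 blockShape) {
    int myblock = block.x + block.y * grid.x;                     // id of the block
    int blocksize = blockShape.x * blockShape.y * blockShape.z;   // threads per block
    int subthread = thread.z * (blockShape.x * blockShape.y)
                  + thread.y * blockShape.x
                  + thread.x;                                     // id within the block
    return myblock * blocksize + subthread;
}

// Visit every thread of a 2D grid of 3D blocks and check that each index
// in 0 .. nthreads-1 is produced exactly once.
bool indices_are_unique(Dim3 grid, Dim3 blockShape) {
    int blocksize = blockShape.x * blockShape.y * blockShape.z;
    int nthreads = blocksize * grid.x * grid.y;
    std::vector<int> hits(nthreads, 0);

    for (int by = 0; by < grid.y; ++by)
        for (int bx = 0; bx < grid.x; ++bx)
            for (int tz = 0; tz < blockShape.z; ++tz)
                for (int ty = 0; ty < blockShape.y; ++ty)
                    for (int tx = 0; tx < blockShape.x; ++tx) {
                        int idx = global_index(Dim3{bx, by, 0}, Dim3{tx, ty, tz},
                                               grid, blockShape);
                        assert(idx >= 0 && idx < nthreads);
                        ++hits[idx];
                    }

    for (int h : hits)
        if (h != 1) return false;
    return true;
}
```

With the constants from the slide (grid 20 x 3, block 12 x 1 x 1) there are 20 * 3 * 12 = 720 threads, which equals N. The last thread, block (19,2) with thread (11,0,0), gets index (19 + 2 * 20) * 12 + 11 = 719, so the else branch of the kernel is never taken for this configuration.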