Why Threads?
- The primary motivation for using threads is to realize potential program performance gains.
- When compared to the cost of creating and managing a process, a thread can be created with much less OS overhead.
- Managing threads requires fewer system resources than managing processes.
- Threaded programming models offer significant advantages over message-passing models, along with some disadvantages.
- Software Portability;
- Threaded applications can be developed on serial machines and run on parallel machines without any changes.
- This ability to migrate programs between diverse architectural platforms is a very significant advantage of threaded APIs.
- Latency Hiding;
- One of the major overheads in programs (both serial and parallel) is the latency of memory access, I/O, and communication operations.
- By allowing multiple threads to execute on the same processor, threaded APIs enable this latency to be hidden.
- In effect, while one thread is waiting for a communication operation, other threads can utilize the CPU, thus masking associated overhead.
- Scheduling and Load Balancing;
- While in many structured applications the task of allocating equal work to processors is easily accomplished,
- in unstructured and dynamic applications (such as game playing and discrete optimization) it is far more difficult.
- Threaded APIs allow the programmer
- to specify a large number of concurrent tasks
- and support system-level dynamic mapping of tasks to processors with a view to minimizing idling overheads.
- Ease of Programming, Widespread Use
- Due to the advantages listed above, threaded programs are often significantly easier to write than corresponding programs using message-passing APIs.
- With widespread acceptance of the POSIX thread API, development tools for POSIX threads are more widely available and stable.
- Overlapping CPU work with I/O: For example, a program may have sections where it is performing a long I/O operation. While one thread is waiting for an I/O system call to complete, CPU intensive work can be performed by other threads.
- Priority/real-time scheduling: tasks which are more important can be scheduled to supersede or interrupt lower priority tasks.
- Asynchronous event handling: tasks which service events of indeterminate frequency and duration can be interleaved. For example, a web server can both transfer data from previous requests and manage the arrival of new requests.
- A number of vendors provide vendor-specific thread APIs. Standardization efforts have resulted in two very different standard APIs for threads.
- Microsoft has its own thread API for Windows, which is not related to the UNIX POSIX standard or to OpenMP.
- POSIX Threads. Library based; requires parallel coding.
- C Language only. Very explicit parallelism; requires significant programmer attention to detail.
- Commonly referred to as Pthreads.
- POSIX has emerged as the standard threads API, supported by most vendors.
- OpenMP. Compiler directive based; can use serial code.
- Jointly defined by a group of major computer hardware and software vendors.
- The OpenMP C/C++ API was released in late 1998.
- Portable / multi-platform, including Unix and Windows platforms
- Can be very easy and simple to use - provides for "incremental parallelism".
- MPI: on-node communication
- MPI libraries usually implement on-node task communication via shared memory, which involves at least one memory copy operation (process to process).
- Threads: on-node data transfer
- For Pthreads there is no intermediate memory copy required because threads share the same address space within a single process.
- In effect, there is no data "transfer" at all.
- It becomes a cache-to-CPU or, in the worst case, memory-to-CPU bandwidth situation, and these speeds are much higher.