Introduction
Data-intensive applications such as transaction processing, information retrieval, data mining and analysis, and multimedia services have posed a new challenge for the modern generation of parallel platforms. Emerging areas such as computational biology and nanotechnology have implications for algorithm and system development, while changes in architectures, programming models, and applications have implications for how parallel platforms are made available to users in the form of grid-based services.
High performance may come from fast dense circuitry, packaging technology, and parallelism. Parallel processors are computer systems consisting of multiple processing units connected via some interconnection network, plus the software needed to make the processing units work together. Two major factors are used to categorize such systems: the processing units themselves, and the interconnection network that ties them together.
- Uniprocessor - Single-processor supercomputers have achieved great speeds and have been pushing hardware technology to the physical limits of chip manufacturing.
- Physical and architectural bounds: lithography limits at micrometer (μm) and smaller feature sizes, and destructive quantum effects.
- Proposed solutions include maskless lithography and nanoimprint lithography for semiconductor manufacturing.
- While clock rates of high-end processors have increased at roughly 40% per year over the past decade, DRAM access times have improved at only roughly 10% per year over this interval, which presents a tremendous performance bottleneck. This growing mismatch between processor speed and DRAM latency is typically bridged by a hierarchy of successively faster memory devices called caches that rely on locality of data reference to deliver higher memory-system performance (see the sketch below). In addition to latency, the net effective bandwidth between DRAM and the processor poses further problems for sustained computation rates.
- Uniprocessor systems can achieve only limited computational power and are not capable of delivering solutions to some problems in a reasonable time.
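Caches reward programs whose memory accesses exhibit locality. The following is a minimal sketch in C of that effect (the matrix size N and timing via the standard clock() call are illustrative assumptions, not from the notes): both loops sum the same matrix, but the row-major traversal walks consecutive addresses and hits in cache, while the column-major traversal strides through memory and misses far more often.

    /* Locality-of-reference sketch; illustrative, not from the course notes. */
    #include <stdio.h>
    #include <time.h>

    #define N 2048

    static double a[N][N];              /* zero-initialized static matrix */

    int main(void)
    {
        double sum = 0.0;
        clock_t t0;

        t0 = clock();
        for (int i = 0; i < N; i++)     /* row-major: consecutive addresses, */
            for (int j = 0; j < N; j++) /* so most accesses hit in cache     */
                sum += a[i][j];
        printf("row-major:    %.3f s\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

        t0 = clock();
        for (int j = 0; j < N; j++)     /* column-major: stride of N doubles */
            for (int i = 0; i < N; i++) /* defeats cache-line reuse          */
                sum += a[i][j];
        printf("column-major: %.3f s\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

        return (int)sum;                /* use sum so the loops are not optimized away */
    }

On a typical cache-based machine the second loop runs several times slower even though it performs exactly the same arithmetic.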
- Multiprocessor - Multiple processors cooperate to jointly execute a single computational task in order to speed up its execution.
- New issues arise (a minimal threaded sketch follows this list):
- Multiple threads of control vs. single thread of control
- Partitioning for concurrent execution
- Task Scheduling
- Synchronization
- Performance
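The following is a minimal POSIX threads sketch in C of the issues just listed (the thread count, array size, and all names are illustrative assumptions): the array is partitioned into per-thread slices, the operating system schedules the threads concurrently, and a mutex synchronizes updates to the shared result.

    /* Pthreads sketch: partitioning, scheduling, synchronization. Illustrative. */
    #include <pthread.h>
    #include <stdio.h>

    #define N        1000000
    #define NTHREADS 4

    static double data[N];
    static double total = 0.0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *partial_sum(void *arg)
    {
        long id = (long)arg;
        long chunk = N / NTHREADS;
        long lo = id * chunk;
        long hi = (id == NTHREADS - 1) ? N : lo + chunk;
        double local = 0.0;

        for (long i = lo; i < hi; i++)   /* each thread sums its own slice */
            local += data[i];

        pthread_mutex_lock(&lock);       /* serialize access to the shared total */
        total += local;
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];

        for (long i = 0; i < N; i++)
            data[i] = 1.0;

        for (long t = 0; t < NTHREADS; t++)   /* multiple threads of control */
            pthread_create(&tid[t], NULL, partial_sum, (void *)t);
        for (long t = 0; t < NTHREADS; t++)
            pthread_join(tid[t], NULL);

        printf("total = %.0f\n", total);      /* expect 1000000 */
        return 0;
    }

Removing the mutex turns the update of total into a race condition, which is precisely the synchronization problem named in the list above.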
Figure 1.1: View of the Field and Abstraction Layers
- Past Trends in Parallel Architecture (inside the box)
- Completely custom-designed components (processors, memory, interconnects, I/O). The first three are the major components of a parallel computation.
- Longer R&D time (2-3 years).
- Expensive systems.
- Quickly becoming outdated.
- While parallel computing, in the form of internally linked processors, was the main form of parallelism, advances in computer networks have created a new type of parallelism in the form of networked autonomous computers.
- New Trends in Parallel Architecture (outside the box)
- Instead of putting everything in a single box and tightly coupling processors to memory, the Internet achieved a kind of parallelism by loosely connecting everything outside of the box.
- A network of PCs and workstations connected via LAN or WAN forms a parallel system that competes favorably in cost/performance.
- Such systems can utilize the unused cycles of machines sitting idle.
- Parallel and Distributed Computers. The processing units can communicate and interact with each other using either shared memory or message passing methods. The interconnection network for shared memory systems can be classified as bus-based versus switch-based.
- MIMD Shared Memory
Figure 1.2: MIMD Shared Memory, MIMD Distributed Memory, SIMD Distributed Computers, and Clusters.
- Bus based
- Switch based
- CC-NUMA
- MIMD Distributed Memory
- SIMD Computers
- Clusters
- Grid Computing
- Grids are geographically distributed platforms for computation.
- They provide dependable, consistent, general, and inexpensive access to high-end computational capabilities.
- In message-passing systems, the interconnection network is divided into static and dynamic (a minimal message-passing sketch follows this list).
- Static networks have a fixed topology that does not change while programs are running.
- Dynamic networks create links on the fly as the program executes.
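Whatever the network topology, programs on message-passing machines exchange data only through explicit sends and receives. Below is a minimal MPI sketch in C (assuming an MPI installation; compiling with mpicc and running with mpirun -np 2 are conventional but assumed here): rank 0 owns a value in its private memory, and rank 1 can see it only after a copy travels across the interconnection network.

    /* Message-passing sketch with MPI; illustrative, not from the course notes. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* which process am I? */

        if (rank == 0) {
            value = 42;                           /* lives only in rank 0's memory */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }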
Figure 1.3: Interconnection Network Taxonomy and Four Decades of Computing.
Cem Ozdogan
2006-12-27