Next: Basic Cache Coherency Methods Up: Shared Memory Architecture Previous: Cache-Only Memory Architecture (COMA) Contents

Bus-based Symmetric Multiprocessors

Shared memory systems can be designed using bus-based or switch-based interconnection networks.
The simplest network for shared memory systems is the bus.
The bus/cache architecture alleviates the need for expensive multi-ported memories and interface circuitry, as well as the need to adopt a message-passing paradigm when developing application software.
However, the bus may get saturated if multiple processors are trying to access the shared memory (via the bus) simultaneously.
High-speed caches connected to each processor on one side and the bus on the other side mean that local copies of instructions and data can be supplied at the highest possible rate.
If the local processor finds all of its instructions and data in the local cache, we say the hit rate is 100%. The miss rate of a cache is the fraction of the references that cannot be satisfied by the cache, and so must be copied from the global memory, across the bus, into the cache, and then passed on to the local processor.
Hit rates are determined by a number of factors, ranging from the application programs being run to the manner in which cache hardware is implemented.
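
The hit-rate and miss-rate definitions above can be sketched in a few lines. This is only an illustration: the function name `hit_and_miss_rate` and the counts 970 and 1000 are assumptions chosen for the example, not measurements from any real system.

```python
def hit_and_miss_rate(hits, references):
    """Return (hit rate, miss rate) as fractions of all memory references.

    hits       -- references satisfied by the local cache (assumed count)
    references -- total references issued by the processor (assumed count)
    """
    if references <= 0:
        raise ValueError("references must be positive")
    if not 0 <= hits <= references:
        raise ValueError("hits must lie between 0 and references")
    misses = references - hits  # these must be fetched across the bus
    return hits / references, misses / references


# Hypothetical counts: 970 of 1000 references are found in the cache.
h, miss = hit_and_miss_rate(970, 1000)
print(f"hit rate = {h:.2f}, miss rate = {miss:.2f}")
```

Only the misses generate bus traffic, so the miss rate is the quantity that matters when several processors share one bus.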
- A processor goes through a duty cycle, where it executes instructions a certain number of times per clock cycle.
- Typically, an individual processor executes less than one instruction per cycle, which reduces the number of times it needs to access memory.
- Subscalar processors execute less than one instruction per cycle, and superscalar processors execute more than one instruction per cycle.
- In any case, we want to minimize the number of times each local processor tries to use the central bus. Otherwise, processor speed will be limited by bus bandwidth.
- We define the variables for hit rate, number of processors, processor speed, bus speed, and processor duty cycle rates as follows:
  - $N$: number of processors;
  - $h$: hit rate of each cache, assumed to be the same for all caches;
  - $(1-h)$: miss rate of all caches;
  - $B$: bandwidth of the bus, measured in cycles/second;
  - $I$: processor duty cycle, assumed to be identical for all processors, in fetches/cycle;
  - $V$: peak processor speed, in fetches/second.
- The effective bandwidth of the bus is $B*I$ fetches/second.
  - If each processor is running at a speed of $V$, then misses are being generated at a rate of $(1-h)*V$.
  - For an $N$-processor system, misses are simultaneously being generated at a rate of $N*(1-h)*V$.
  - The bus saturates when this aggregate miss rate exceeds the effective bus bandwidth, so the $N$ processors can share the bus only while $N*(1-h)*V \leq B*I$ .
- The maximum number of processors with cache memories that the bus can support is given by the relation,
  
  $\displaystyle N \leq \frac{B*I}{(1-h)*V}$ (4.1)
- Example: Suppose a shared memory system is
  - constructed from processors that can execute $V = 10^{7}$ instructions/s and the processor duty cycle $I = 1$,
  - the caches are designed to support a hit rate of 97%,
  - the bus supports a peak bandwidth of $B = 10^{6}$ cycles/s.
  - Then, $(1-h) = 0.03$, and the maximum number of processors $N$ is
    
    $\displaystyle N \leq \frac{10^{6}}{(0.03 * 10^{7})}=3.33$
  - Thus, the system we have in mind can support only three processors!
  - We might ask what hit rate is needed to support a 30-processor system. In this case,
    
    $\displaystyle h=1-\frac{B*I}{N*V}=1-\frac{10^{6}*1}{30*10^{7}}=1-\frac{1}{300}$
    so for the system we have in mind, $h \approx 0.9967$. Increasing $h$ by about 2.8% (from 0.97 to 0.9967) allows the bus to support ten times as many processors.
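
The arithmetic in the example can be checked with a short script. This is a minimal sketch of relation (4.1) and its rearrangement for $h$; the function names `max_processors` and `required_hit_rate` are illustrative assumptions, and the numeric values are the ones used in the example.

```python
import math


def max_processors(bus_bandwidth, duty_cycle, hit_rate, peak_speed):
    """Upper bound on N from relation (4.1): N <= B*I / ((1-h)*V)."""
    miss_rate = 1.0 - hit_rate
    if miss_rate <= 0:
        # A perfect cache never uses the bus, so the bus imposes no limit.
        return math.inf
    return (bus_bandwidth * duty_cycle) / (miss_rate * peak_speed)


def required_hit_rate(n_processors, bus_bandwidth, duty_cycle, peak_speed):
    """Smallest hit rate that keeps N processors within the bus bandwidth:
    h = 1 - B*I / (N*V)."""
    return 1.0 - (bus_bandwidth * duty_cycle) / (n_processors * peak_speed)


# Values from the example: B = 10^6 cycles/s, I = 1 fetch/cycle,
# V = 10^7 fetches/s, h = 0.97.
B, I, V = 1e6, 1.0, 1e7

bound = max_processors(B, I, 0.97, V)
print(f"N <= {bound:.2f}, so at most {math.floor(bound)} processors")
print(f"hit rate needed for 30 processors: {required_hit_rate(30, B, I, V):.4f}")
```

The bound is rounded down because a fractional processor cannot be added. The script also makes it easy to try other values of $B$, $I$, $V$, and $h$, which shows how strongly the supportable processor count depends on the miss rate.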


Cem Ozdogan 2006-12-27