Next: TCP Messaging Up: Linux & Cluster Previous: File Systems   Contents


Other Considerations

You can explore several other basic areas in seeking to understand the performance and behavior of your Beowulf node running the Linux operating system. Many scientific applications need just four things from a node:
  1. CPU cycles,
  2. memory,
  3. networking (message passing),
  4. and disk I/O.
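Trimming down the kernel and removing unnecessary processes can free up resources in each of those four areas.

Because the capacity and behavior of the memory system are vital to many scientific applications, it is important that memory be well understood. One of the most common ways an application can get into trouble with the Linux operating system is by using too much memory. Demand-paged virtual memory, in which pages are moved between RAM and disk on demand, lets a program address more memory than the node physically has, but performance collapses once the pages in active use no longer fit in RAM. The following program illustrates this behavior: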
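#include <stdlib.h>
#include <stdio.h>
#define MEGABYTES 300   /* array size in megabytes; the text compares 300 and 150 */
int main(void) {
  /* size_t keeps the element count from overflowing int for large sizes */
  size_t numints = (size_t)MEGABYTES*1024*1024/sizeof(int);
  int *x, *p, t=1, i;
  /* calloc instead of malloc, so the reads of *p below are well defined */
  x = calloc(numints, sizeof(int));
  if (!x) { printf("insufficient memory, aborting\n"); exit(1); }
  for (i=1; i<=5; i++) {   /* five passes over the whole array */
    printf("Loop %d\n",i);
    /* a stride of 1024 ints (4 KB with 4-byte ints) touches each typical 4 KB page once */
    for (p=x; p<x+numints-1; p+=1024) {
      *p = *p + t;
    }
  }
  free(x);
  return 0;
}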
On a Linux server with 256 megabytes of memory, this program, with the array size set to 300 megabytes, causes massive amounts of demand-paged swapping: it can take about 5 minutes to complete and can generate 377,093 page faults. If, however, you change the size of the array to 150 megabytes, which fits comfortably on a 256-megabyte machine, the program takes only half a second to run and generates only 105 page faults.

While this behavior is normal for demand-paged virtual memory operating systems such as Linux, it can lead to mystifying performance anomalies. A couple of extra processes using memory on a node can push the scientific application into swapping. Many parallel applications have regular synchronization points, so the whole application runs only as fast as its slowest node; a few extra daemons or processes on just one Beowulf node can therefore bring an entire application to a near halt.

Cem Ozdogan 2009-01-05