Cluster Computers
The invention of the mass-reproducible microprocessor by Ted
Hoff of Intel some twenty years ago paved the way towards
'network-centric' computing. Relegating mainframe or
'host-centric' computing to the background, it ushered in the
era of the distributed 'client-server' approach.
The present generation of RISC microprocessors scores over
mainframes in terms of cost and performance. A collection of
RISC microprocessors assembled together as a parallel computer
can even outperform vector supercomputers.
Clusters of high-performance workstations can realistically be
used for a variety of applications, either to replace
mainframes, vector supercomputers and parallel computers or to
better manage an already installed collection of workstations.
True, cluster computers have limitations, yet the substantial
benefits that can be derived are leading many institutions and
companies to explore this option.
Software to maintain such clusters is still at an embryonic
stage of development. However, cluster computing is a rapidly
maturing technology that is certain to play a dominant role in
the network-centric computing future. The use of clusters of
workstations to increase the throughput of user applications is
becoming popular in the US and Europe.
Six current CMS (Cluster Management Software) packages,
two public domain and four commercial, have been identified as
being worth serious investigation.
If finances permit, it is wise to choose one of the commercial
packages. This will minimize the load on the on-site staff and
leave the responsibility with the vendor to ensure that their
software is installed, used and supported properly.
Nearly all the CMS packages are designed to run on UNIX
workstations and MPP (Massively Parallel Processor) systems.
Some of the public domain packages support Linux, which runs on
PCs. Codine, a commercial package, supports Linux. JP1[1]
from Hitachi Ltd. is designed to run on Windows NT platforms.
No CMS package supports Windows 95.
WWW software and the HTTP protocol can be used as part of an
integrated CMS package. Most CMS packages work completely
outside the kernel, on top of a machine's existing operating
system. This means that installation does not require
modification of the kernel, so a CMS package is basically
installed like any other software on the machine.
Cluster computing software and the means of managing and
scheduling applications to run on these systems are becoming
increasingly commonplace. CMS packages have come about for a
number of reasons, including load balancing, utilizing spare
CPU cycles, providing fault-tolerant systems, managing access
to powerful systems and so on. But overall, the main reason for
their existence is their ability to provide increased and
reliable throughput of user applications on the systems they
manage.
The set of criteria used to assess the features and
functionality of a CMS package is heavily based upon that
devised by Kaplan and Nelson at NASA.
A job description file, which includes the job's name, maximum
run time and the desired platform, is sent by the client
software resident on the user's workstation to a master
scheduler, whose main task is to evenly balance the load on the
resources that it is managing. So, when a new job is submitted,
the scheduler not only has to match the requested resources
with those that are available, but also needs to ensure that
the resources being used are load balanced.
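
The matching and balancing step described above can be sketched
in a few lines of Python. This is only an illustration under
stated assumptions, not the algorithm of any particular CMS
package: the names Job, Node and schedule are hypothetical, and
"load" is simplified to the number of jobs assigned to a node.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Job description, as sent by the client software (hypothetical layout)."""
    name: str
    max_run_time: int  # seconds
    platform: str      # desired platform, e.g. "linux"


@dataclass
class Node:
    """A resource managed by the master scheduler (hypothetical layout)."""
    name: str
    platform: str
    load: int = 0      # assumption: load = number of jobs currently assigned


def schedule(job, nodes):
    """Assign the job to the least-loaded node that matches its platform.

    Returns the chosen node, or None if no node offers the platform.
    """
    candidates = [n for n in nodes if n.platform == job.platform]
    if not candidates:
        return None
    target = min(candidates, key=lambda n: n.load)
    target.load += 1
    return target
```

For example, given two Linux nodes with loads 2 and 0, a Linux
job goes to the idle node; a job that asks for a platform no
node provides is not placed at all.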
Another responsibility of the master scheduler is ensuring
that jobs complete successfully. It does this by monitoring
jobs until they finish. If a job fails due to a problem other
than an application runtime error, the scheduler will
reschedule the job to run again.
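
A minimal sketch of this monitor-and-reschedule behaviour
follows. The two exception classes and the max_attempts limit
are assumptions made for the illustration; the text above does
not say how a CMS package tells the two kinds of failure apart
or how many times it retries.

```python
class ApplicationError(Exception):
    """Runtime error in the job's own code (hypothetical class)."""


class SystemFailure(Exception):
    """Failure outside the application, e.g. a node crash (hypothetical class)."""


def run_until_done(job, max_attempts=3):
    """Run job() and rerun it after a SystemFailure.

    An ApplicationError is not caught: rerunning the job would not help,
    so it propagates to the caller on the first attempt.
    Returns (result, number_of_attempts).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job(), attempt
        except SystemFailure:
            continue  # reschedule the job and try again
    raise SystemFailure(f"job failed after {max_attempts} attempts")
```

A job that hits two node failures and then succeeds is reported
as finished on its third attempt, while a job that fails with
its own runtime error is run only once.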
Two useful features of a CMS package are process migration and
checkpointing, which help to balance the load and to ensure
that a job runs to completion.
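
The idea behind checkpointing can be illustrated with a toy
job that saves its state after every step and resumes from the
last saved state when restarted. This is a sketch only: real
CMS packages checkpoint whole processes, whereas here the
application saves its own state to a JSON file, and the
function name, file layout and fail_at parameter are all
hypothetical.

```python
import json
import os
import tempfile


def run_with_checkpoints(total_steps, path, fail_at=None):
    """Sum 0..total_steps-1, checkpointing to `path` after every step.

    If `path` already exists, the job resumes from the saved state.
    `fail_at` simulates a node failure just before the given step.
    """
    state = {"step": 0, "acc": 0}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["acc"] += state["step"]
        state["step"] += 1
        # Write to a temporary file first so a crash never leaves
        # a half-written checkpoint behind.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, path)
    return state["acc"]
```

If the checkpoint file sits on storage shared by the cluster,
the restart can take place on a different node, which is the
essence of migrating a job away from a failed or overloaded
machine.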
With the advent of relatively cheap inter-processor
communication (Giganet, Gigabit Ethernet, Myrinet etc.)
complete parallel HPC systems (16 - 256 processors) are
relatively inexpensive to purchase and maintain. But even so,
personnel, environment (power, cooling), software, and system
support must still be factored into the costs of a production
system. The latter expenses do not follow the commodity
component price curves.
– Subhajit Ghosh
January 4, 2000