In a nutshell, network clustering connects otherwise independent computers so that they work together in some coordinated fashion. Because "clustering" is a broad term, the hardware configuration of clusters varies substantially depending on the networking technologies chosen and the purpose (the so-called "computational mission") of the system. Clustering hardware comes in three basic flavors: "shared disk," "mirrored disk," and "shared nothing" configurations.
Shared Disk Clusters
One approach to clustering uses central I/O devices accessible to all computers ("nodes") within the cluster. We call these systems shared-disk clusters because the I/O involved is typically disk storage for ordinary files and/or databases. Shared-disk cluster technologies include Oracle Parallel Server (OPS) and IBM's HACMP.
Shared-disk clusters rely on a common I/O bus for disk access but do not require shared memory. Because all nodes may concurrently write to or cache data from the central disks, a synchronization mechanism must be used to preserve coherence of the system. An independent piece of cluster software called the "distributed lock manager" assumes this role.
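To make the lock manager's role concrete, here is a minimal Python sketch, assuming a single process holds the whole lock table. A real distributed lock manager spreads this state across the nodes and coordinates by exchanging messages, which the sketch omits. The names `LockManager`, `Mode`, `acquire`, `release`, and `release_all` are hypothetical and are not the OPS or HACMP interfaces. The sketch shows only the core compatibility rule: any number of nodes may hold a shared (read/cache) lock on a resource, but an exclusive (write) lock shuts out every other node.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    SHARED = "shared"        # read/cache access: many nodes may hold it at once
    EXCLUSIVE = "exclusive"  # write access: only one node may hold it


@dataclass
class LockState:
    mode: Mode
    holders: set = field(default_factory=set)


class LockManager:
    """Toy, single-process model of a distributed lock manager (DLM).

    Hypothetical sketch: it keeps one in-memory lock table to show the
    compatibility rules that keep cached and written data coherent.
    """

    def __init__(self):
        self._locks = {}  # resource name -> LockState

    def acquire(self, node: str, resource: str, mode: Mode) -> bool:
        """Try to grant `mode` on `resource` to `node`; False means wait and retry."""
        state = self._locks.get(resource)
        if state is None:
            # Nobody holds the resource: grant immediately.
            self._locks[resource] = LockState(mode, {node})
            return True
        if state.holders == {node}:
            # The sole holder may upgrade or downgrade its own lock.
            state.mode = mode
            return True
        if mode is Mode.SHARED and state.mode is Mode.SHARED:
            # Readers are compatible with other readers.
            state.holders.add(node)
            return True
        return False  # Conflict with an existing lock.

    def release(self, node: str, resource: str) -> None:
        state = self._locks.get(resource)
        if state is None:
            return
        state.holders.discard(node)
        if not state.holders:
            del self._locks[resource]

    def release_all(self, node: str) -> None:
        """Drop every lock held by `node`, e.g. after that node fails."""
        for resource in list(self._locks):
            self.release(node, resource)
```

For example, two nodes can both take shared locks on a hypothetical `"block-42"`, while a third node's exclusive request returns `False` until both have released. In this toy model, `release_all` also hints at why one node's failure need not stall the others: its locks are reclaimed rather than held forever.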
Shared-disk clusters support higher levels of system availability: if one node fails, the other nodes need not be affected. However, this availability comes at the cost of somewhat reduced performance, because of the overhead of using a lock manager and the potential bottlenecks of shared hardware in general. Shared-disk clusters make up for this shortcoming with relatively good scaling properties: OPS and HACMP support eight-node systems, for example.
Shared Nothing Clusters
A second approach to clustering is dubbed shared-nothing because it does not involve concurrent disk accesses from multiple nodes. (In other words, these clusters do not require a distributed lock manager.) Shared-nothing cluster solutions include Microsoft Cluster Server (MSCS).
MSCS is an atypical example of a shared-nothing cluster in several ways. MSCS clusters use a shared SCSI connection between the nodes, which naturally leads some people to believe it is a shared-disk solution. But only one server (the one that owns the quorum resource) needs the disks at any given time, so no concurrent data access occurs. MSCS clusters also typically include only two nodes, whereas shared-nothing clusters in general can scale to hundreds of nodes.
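The single-owner rule that lets a shared-nothing cluster do without a lock manager can be sketched as follows. This is a hypothetical Python model, not the MSCS API: `SharedNothingCluster`, `access`, and `fail` are illustrative names, the successor node is simply picked in alphabetical order, and the mechanism a real product uses to arbitrate disk ownership is not modeled. Every resource, including one named `"quorum"` here, has exactly one owner, and only that owner may use it; when the owner fails, ownership moves to a survivor instead of being shared.

```python
class SharedNothingCluster:
    """Toy model of shared-nothing ownership (hypothetical sketch).

    Each resource, such as a disk on a shared SCSI bus, has exactly one
    owner node at a time, so no two nodes ever access the same data
    concurrently and no distributed lock manager is needed.
    """

    def __init__(self, nodes, resources):
        self.alive = set(nodes)
        first = sorted(nodes)[0]
        # Start with one node owning everything, as in an active/passive pair.
        self.owner = {r: first for r in resources}

    def access(self, node, resource):
        """Return True only if `node` is alive and currently owns `resource`."""
        return node in self.alive and self.owner.get(resource) == node

    def fail(self, node):
        """Mark `node` as failed and hand its resources to one survivor."""
        self.alive.discard(node)
        if not self.alive:
            raise RuntimeError("no surviving node can take ownership")
        successor = sorted(self.alive)[0]
        for resource, holder in list(self.owner.items()):
            if holder == node:
                self.owner[resource] = successor
```

With two nodes, `"nodeA"` initially owns both `"quorum"` and `"data1"` and `"nodeB"` is refused; after `fail("nodeA")`, `"nodeB"` owns both. At no point do two nodes hold the same resource, which is why no lock manager appears in the model.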
Mirrored Disk Clusters
Mirrored-disk cluster solutions include Legato's Vinca. Mirroring involves replicating all application data from primary storage to a secondary backup (perhaps at a remote location) for availability purposes. Replication occurs while the primary system is active, although the mirrored backup system (as in the case of Vinca) typically does no work outside its role as a passive standby. If the primary system fails, a failover process transfers control to the secondary system. Failover can take some time, and applications can lose state information when they are reset, but mirroring enables a fairly fast recovery scheme that requires little operator intervention. Mirrored-disk clusters typically include just two nodes.
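Here is a minimal Python sketch of mirroring and failover. It assumes synchronous replication (each write is copied to the standby before `write` returns) purely for simplicity; real mirroring products may replicate differently, and this is not a description of how Vinca works. `Node`, `MirroredPair`, and `failover` are hypothetical names. The model separates on-disk data, which is mirrored, from in-memory application state, which is not, to show why applications can lose state when they are reset on the secondary.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.disk = {}    # persistent data: mirrored to the standby
        self.memory = {}  # in-memory application state: NOT mirrored


class MirroredPair:
    """Toy active/passive mirror (hypothetical sketch).

    Every write to the active node's disk is also applied to the
    standby's disk. The standby does no other work until failover.
    """

    def __init__(self, primary, standby):
        self.active = primary
        self.standby = standby

    def write(self, key, value):
        self.active.disk[key] = value
        if self.standby is not None:
            # Replicate while the primary is still running.
            self.standby.disk[key] = value

    def read(self, key):
        return self.active.disk.get(key)

    def failover(self):
        """Promote the standby after the active node fails."""
        if self.standby is None:
            raise RuntimeError("no standby available")
        self.active, self.standby = self.standby, None
        # Applications restart on the promoted node with empty memory,
        # so any state that was never written to disk is gone.
        self.active.memory.clear()
```

For instance, after a hypothetical `write("order-1", "paid")` and an in-memory `"session"` entry on the primary, calling `failover()` leaves `read("order-1")` intact on the former standby while the session entry is lost.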
Conclusion
Network clusters offer a high-performance alternative to symmetric multiprocessing (SMP) and massively parallel computing systems. Aggregate performance aside, cluster architectures can also make computer systems more reliable through redundancy. Choosing a hardware architecture is only the first step in building a useful cluster: applications, performance optimization, and system management issues must also be addressed.