Clustering on Exchange Server
Clustering is a Windows Server technology that allows a "stand-by" server (call it "Server 2") to take over the load and activity of another server ("Server 1") if that server fails (i.e., if Server 1 crashes). Clustering is basically designed to replace the "box", not the storage. Server 1 and Server 2 are both connected to "shared storage", which is normally a SAN (Storage Area Network). Only one server uses the shared storage at a time. However, all of the clustered servers access a "quorum disk" - this is what they use to communicate among themselves.
In Windows Server Enterprise Edition, you can have up to four servers in a cluster. With Datacenter Edition, you can have up to eight servers in a cluster. Just to make the vocabulary lesson complete, in cluster terminology, each server is called a node.
Each node is either "active" (meaning that it is currently doing something) or "passive" (meaning that it is just sitting there, waiting for another node to fail so it can take over that node's load). When clustering with Exchange Server, at least one node must be passive (this is called an A/P - active/passive - cluster). Using a cluster properly requires that the application be "cluster-aware".
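The active/passive failover logic described above can be sketched in a few lines. This is an illustrative model only - not a real Windows clustering API - assuming each node simply reports a healthy/failed status (as it would via the quorum/heartbeat mechanism):

```python
# Illustrative model only -- not a real Windows clustering API.
# A sketch of active/passive failover: if the active node fails,
# promote the first healthy passive node.

class Node:
    def __init__(self, name, active=False, healthy=True):
        self.name = name
        self.active = active      # currently serving the workload
        self.healthy = healthy    # as reported via the quorum/heartbeat

def failover(nodes):
    """If the active node has failed, promote the first healthy passive node."""
    active = next((n for n in nodes if n.active), None)
    if active is not None and active.healthy:
        return active  # active node is fine; nothing to do
    if active is not None:
        active.active = False  # demote the failed node
    standby = next((n for n in nodes if n.healthy and not n.active), None)
    if standby is None:
        raise RuntimeError("no healthy passive node -- cluster is down")
    standby.active = True
    return standby

# Server 1 crashes; Server 2 takes over the workload.
cluster = [Node("Server 1", active=True), Node("Server 2")]
cluster[0].healthy = False
print(failover(cluster).name)  # -> Server 2
```

Note the last line of the error path: if no healthy passive node remains, there is nothing left to fail over to - which is why an Exchange cluster requires at least one passive node.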
Microsoft has a number of cluster-aware applications - including Exchange Server, SQL Server, and Windows Server itself (think clustered file server).
As you might imagine, clusters tend to be more expensive than non-clustered solutions. You have the cost of the additional servers and the shared storage. Shared storage tends to be more expensive than DAS (direct attached storage - disks stuffed inside a server).
Clusters allow a well-managed server farm to go from 99.9% availability to 99.99% availability.
Emphasis on “well-managed”.
In my personal experience, well-managed clusters are, quite frankly, few and far between. In most shops that I’ve seen, clusters actually reduce availability.
If your server farm isn't ALREADY well-managed, then clusters are going to give you nothing but grief. They are not suddenly going to make things better for you. If your server farm is well-managed, you are probably beating 99.9% by a fair margin yourself. I averaged better than 99.95% in more than one environment in the past.
On the other hand - these days, you can install Exchange, configure it a little bit, set up your backups, and it’ll just hum along for you for months and months without requiring you to touch it. Run a patch install at 3am on Sunday morning once a month, and you’re done.
If you buy good hardware for the server – you’ll probably not have to touch it except when a disk needs replacing. And of course, you have SMART/SCSI monitoring, so you know when that happens, you pick up a phone and order a disk. Or you have it in inventory because e-mail is a critical service. So, it’s hot-swappable and auto-rebuilds, so you swap it. You’re done.
No frickin’ way you get off that easy with a cluster. It requires a LOT more care and feeding. Applying patches is at least a six-step process that must be done manually (apply to the passive node, reboot, fail over to the passive node, apply to the primary node, reboot, fail back to the primary node). And a wrong step can bring the entire cluster down.
Is this worth it?
You tell me. Three nines (99.9%) of uptime is equal to 8 hours and 45 minutes of downtime per year. Four nines (99.99%) of uptime is equal to about 53 minutes of downtime per year. Five nines (99.999%) of uptime is equal to about five minutes of downtime per year.
Three and a half nines (99.95%) is about four and a half hours of downtime per year.
You can do the math. And then compare the costs.
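If you want to do the math yourself, the figures above fall out of a one-line formula. A quick check in Python, assuming a 365-day (8,760-hour) year:

```python
# Quick check of the downtime figures above, assuming a 365-day year.

HOURS_PER_YEAR = 365 * 24  # 8,760

def downtime_hours(availability):
    """Hours of downtime per year at a given availability (e.g. 0.999)."""
    return (1 - availability) * HOURS_PER_YEAR

for label, a in [("three nines", 0.999),
                 ("three and a half", 0.9995),
                 ("four nines", 0.9999),
                 ("five nines", 0.99999)]:
    h = downtime_hours(a)
    print(f"{label:18s} {a:.3%}  {h:6.2f} h/yr  ({h * 60:6.1f} min)")
```

Three nines works out to 8.76 hours (8 hours and 45 minutes or so), four nines to about 53 minutes, five nines to about 5 minutes.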
In certain (very rare) circumstances, I can believe that the expert care and feeding that a cluster requires is worth the increased availability. But honestly, I don’t think I’ve EVER seen one.
Until next time...
As always, if there are items you would like me to talk about, please drop me a line and let me know!