Exchange ESE, Failed Backups, the Checkpoint Depth, and You
A question came up recently on a mailing list I hang around. A poster wrote: "I'm looking for a concise explanation of when/how the transaction logs are committed. Specifically, is there another way to force logs to be committed other than the backup or stopping the IS? (E2K3)"
As with many things in Exchange, when you get down to the bits and the bytes, things are not always as they seem. My response, which some of you might be interested in, went more-or-less less like the below (I've expanded it a little, and added a few paragraphs on checkpoint exhaustion). Also note that this only applies to streaming backups. Shadow-copy backups are different.
A concise explanation? Hrmmm.
Well, first, doing a backup does NOT force logs to be committed. It means that the backup process waits until a checkpoint occurs, flushes the current transaction log, and during the backup additional checkpoints are not allowed to occur (that is, nothing is allowed to be flushed to the ESE database until after the backup is complete – leading to an increasing checkpoint depth, and in extreme situations, checkpoint exhaustion – discussed a little later). The ESE buffers are not flushed. The "checkpoint depth" is how many checkpointed logfiles must be processed before the ESE database is current with the transaction log files.
So...in the normal case, data is written in a serialized fashion to the in-memory ESE cache and to the log buffers, as updates occur in an Exchange database. Logs are written to disk as logs fill up, or as transaction commits occur. (This may mean that a log can have nothing but a checkpoint record in it – but that is not the normal case.) In the ESE buffers, I/O is accumulated and prioritized by a process known as the “lazy writer”. The lazy writer scans the ESE buffers for dirty (modified) pages and builds an optimized list to flush those to disk. As that list is flushed to disk, the pages are marked as “clean”, and the checkpoint is marked as having advanced (on a transaction by transaction basis, not a log by log basis). Whenever the transaction in a log are fully advanced, then the checkpoint file and the database header are updated.
During a backup, the lazy writer is paused.
The ONLY time you are assured that logs are fully committed is during clean shutdown, or after running soft recovery.
Now, consider the situation when a backup is in process and it gets hung (i.e., waiting on a tape, or there was a fatal write error on the tape and the backup application crashed). What happens? There are three possible answers.
1) The backup application caught the exception (i.e., the error that caused the crash) and cleaned up after itself. In this case, it appears as if the bad backup never happened and everything just goes back to normal. This is the best possible case.
2) The backup application didn't catch the exception, but "notices" when it next runs that there was a backup in process and isn't one now, but the backup state is still "dirty". Then, the backup application cleans up the old "dirty" backup state, and proceeds as normal. This is the middle of the road case.
3) The backup application didn't catch the exception, and when it goes to start the next backup, it gets an error from Exchange - "backup already in process". Then, because of that error, the backup aborts. This is the worst situation. To clear the "backup in process" flag requires a restart of the Microsoft Exchange Information Store service.
In any of these cases, the checkpoint depth is continuing to advance - and the log files are accumulating, but nothing is being committed to the database. This is a problem! Eventually, the Information Store will say "enough!" and dismount the storage group. That's right - it will dismount the entire storage group.
This is called "Checkpoint depth exhaustion". After 1,008 logfiles have been generated, Exchange automatically shuts down the storage group. There is an implementation limit of 1,024 logfiles in a checkpoint, and this shutdown prevents the Exchange database from being corrupted.
After the storage group is dismounted, MSExchangeIS can simply be restarted and the storage group remounted (this will cause all of those transaction logfiles to be replayed - it may take quite awhile!).
This can also happen in very very high update situations (such as running many move mailbox operations all at once time).
Your best bet is to monitor for this. If you are using OpsMgr 2007, this is already one of the issues checked for. If you want to add this check manually, add the performance counter "Database ==> Instances\Log Generation Checkpoint Depth".
For more information on this issue, see KB 905801 and KB 819771.
Until next time...
As always, if there are items you would like me to talk about, please drop me a line and let me know!