Per our previous NOC posting, we’ve been redesigning our maintenance plan for rebalancing the user distribution on HUEY. The original plan to defrag the database was abandoned as the timeframe for completion was not acceptable.
Tonight, starting at 9 PM Eastern, we will take the HUEY database offline for about 5 minutes to clear out the memory cache in preparation for tonight’s mailbox moves. Throughout the night we will be moving users between two new databases to even out the load. While a mailbox is being moved, it will be inaccessible to its user, as Exchange 2007 does not support online mailbox moves. Once a move completes, the user will be able to access their mailbox on the new database. Move times will depend on mailbox size and item count.
Update 6:45 PM Eastern: After our previous update, our metric test completed and we noticed write lock delays on the OS drive for HUEY. We’ve adjusted the plan outlined above. Before starting the mailbox moves, we will perform a full database backup at the NTFS file level. Unfortunately, this means we will have to take the database offline, as we are capturing a raw file backup instead of a VSS backup. After the backup completes, we will run a surface scan on the OS drive for HUEY to check for corruption. We anticipate this entire process will take up to 4 hours. We will update this post as progress is made, starting at 9 PM when work begins.
Update 8:30 PM Eastern: The backup job of the OS is taking a bit longer than expected. We are pushing back dismounting the database to 9:30 PM. We will update this blog after 9:00 PM if we anticipate the backup taking longer.
Update 8:50 PM Eastern: We’ve received requests from a few West Coast customers asking us to postpone maintenance until 10 PM. In the interest of disrupting service as little as possible, we will be postponing maintenance until 10 PM Eastern.
Update 10:45 PM Eastern: The backup is estimated to complete within the next hour. We will then begin the surface test on the OS drive. This is expected to be the longest part of the process and will require a disruption of service while we take the server offline. We estimate the entire process will take 4 hours, as described earlier. We will update this post once the work begins.
Update 11:30 PM Eastern: We are beginning to dismount the mailbox databases and stop Exchange services.
Update 2:05 AM Eastern: The surface test has revealed issues on the OS drive. We are running a repair on the drive and monitoring its progress.
Update 4:17 AM Eastern: We’ve replaced a bad drive that is part of the OS drive on HUEY, and we are proceeding to perform integrity checks before turning any services back on.
Update 5:47 AM Eastern: The integrity check failed and we will be restoring from the backup image taken prior to maintenance. We will continue to update this blog as progress is made.
To clarify, the issue is specifically with the operating system and not with database integrity.
Update 9:44 AM Eastern: The restoration process is proceeding as planned. This is a courtesy update to assure partners that work is continuing.
Update 12:20 PM Eastern: The surface test following the restoration is underway, and we are looking to confirm data consistency on the OS drive.
Update 1:57 PM Eastern: To reach resolution as quickly as possible, we are also beginning to restore the backup image on a spare server in parallel, to rule out any issues that may be affecting the physical host.
Update 7:00 PM Eastern: The integrity check has processed half of the files on the OS drive, and overall progress is about 25% complete.
Update 9:00 PM Eastern: This is a courtesy update: the process described above is still running successfully and without interruption. We understand this is an urgent issue, and we appreciate your patience.
Update 1:30 AM: The integrity check has processed about 90% of the files on the drive, and overall progress is about 75% complete.
Update 2:35 AM: The integrity check has completed and we’ve successfully booted Windows into Safe Mode. We are now proceeding to boot normally and resume services on HUEY.
Update 3:15 AM: Service on HUEY has been restored and all queued mail is being delivered to user mailboxes.