Over the past week, users with accounts on backup90 have noticed sporadic periods where service was interrupted (events were posted on the NOC blog), restored, and then went offline again at random after 8 hours. Because the failure could not be reproduced on demand, diagnosing it took slightly longer than normal. Once we traced the issue to a faulty disk, we elected to upgrade the entire storage controller for backup90. Over the past quarter, backup90 has seen 340% growth in accounts, and that growth doesn’t appear to be slowing down. Unfortunately, the current controller risks slower performance for clients as growth continues over the next month.
To alleviate this issue, we’ve shut down service on backup90 to start a copy process to the new controller. After all data has been copied, we will run a random read/write seek test to ensure the stability and performance of the new controller. Unfortunately, because backups are an extremely I/O-intensive operation, leaving the Ahsay service running during the copy would yield extremely poor results and would more than likely require multiple attempts to copy the data. All in all, it would lead to a longer downtime for clients.
We estimate that the copy process will begin tonight (3/7/12) around 5:30 PM and continue throughout Friday and Saturday. We will create a new NOC entry once the upgrade process on backup90 has begun.
VP, Network Operations, ExchangeDefender
(877) 546-0316 x757