The outage appears to have been caused by an LDAP query that was null routing but not dying, and was eating up resources. We killed the query and were able to establish a connection to our test account. Users should be able to reconnect in the next few minutes as the databases reconnect. The outage looks to have started around 11:10 AM and was resolved by 11:20 AM.
Our Exchange 2007 network DEWEY is currently registering as offline in our monitoring software. We are in the process of investigating this issue now.
Update 2:09 PM Eastern: The issue is related to one of the core networks used by DEWEY. We detected and thwarted a DDoS attack. Service is back to 100% operation.
On Saturday July 31st and Sunday Aug 1st we will be performing an all-day storage upgrade on the LOUIE Exchange mailbox servers. The scheduled updates will provide clients improved speeds in performing API calls to the server, such as sending messages or resolving the GAL.
During the upgrade, clients will experience disconnections from their mailbox as their database moves to the new storage medium.
Clients requiring on-demand access to their mail should utilize LiveArchive (https://livearchive.exchangedefender.com) if they are unable to access their mailbox on the LOUIE network.
Update Saturday 3:30 PM Eastern: We are about to begin the hardware installation for the LOUIEMBOX servers. During the installation, customers will be unable to reach the Exchange server. We expect the downtime for the upgrade to be no more than 20 minutes.
Update Saturday 4:30 PM Eastern: The hardware installation completed about 30 minutes ago and everything is going as expected. Throughout the night we will be preparing the new storage array before the databases are moved.
Update Sunday 1:10 PM Eastern: The storage array initialization has completed. We are beginning the process of moving mailbox databases to the new storage array. During the mailbox move, customers may experience disconnection from Outlook and OWA. The migration is estimated to take up to an hour and a half per database.
Update Sunday 1:19 PM Eastern: The first mailbox database has finished copying to the new storage array. The next database will take about two hours to complete moving.
Update Sunday 4:44 PM Eastern: After successfully moving the databases, we discovered that the backup software cannot work with our drive structure. We are moving the databases to a temporary holding spot as we restructure the drives.
Update Sunday 10:05 PM Eastern: The upgrades to LOUIE have been completed and everything was migrated successfully. We appreciate our partners’ and their customers’ patience while this upgrade was put in place.
We are conducting an emergency reboot of the LOUIE GC servers, which include one of the mailbox servers and a CAS array, because of an inconsistency in the GC. Customers will lose connection from Outlook for up to 10 minutes, but will reconnect after the reboot is complete.
Update 1:54 PM Eastern: The servers are back from the reboot, but service has not been restored. We do not expect the outage to last more than another 10 minutes.
Update 2:00 PM: Service has been restored on the LOUIE network. We are performing a final reboot of the BlackBerry server for LOUIE.
The Exchange 2007 DEWEY server is currently experiencing issues with IIS. We are working to restore connectivity and do not expect the outage to last longer than an hour. Clients needing immediate access to email can utilize LiveArchive at https://livearchive.exchangedefender.com/owa
Update 5:32 PM Eastern: Service has been restored to DEWEY.
Throughout the night we will be installing new drives in a few LOUIE servers. The following services will be intermittently interrupted during the upgrade, which is scheduled to run from 5:30 PM to 7:00 PM:
The CAS array host will be rebooted, which will leave Outlook disconnected for up to five minutes.
The BES database is being moved to a new drive, which will leave BB devices unable to synchronize for up to 30 minutes.
The SharePoint SQL host is being moved to a new drive, which will leave SharePoint portals unavailable for up to five minutes.
Update 7:14 PM Eastern: The database move for the BES server is taking longer than expected. We are still working to finalize the upgrades as soon as possible.
Update 8:02 PM Eastern: The CAS array is being upgraded now. Users will experience two brief disconnects which are scheduled to last up to 10 minutes each.
SCROOGE (Exchange 2007) will be going down momentarily for a reboot after tripping our monitoring threshold a few times. The reboot is being done as a proactive measure to avoid any issues during ‘business hours’.
The server is expected to only be down for 15 minutes while the reboot completes.
Update 8:56 AM: Service has been restored 100% on SCROOGE.
We received reports from partners on the DEWEY Exchange 2007 server about performance issues. We’ve scheduled a reboot to happen at midnight (6/10/2010). Service is only expected to be impacted for 15 minutes while the reboot completes.
Synchronization between BlackBerry devices and the LOUIE server is currently in the ‘critical-interrupted’ state, as new mail is not being delivered to devices. We are currently working with Microsoft and RIM to pinpoint the issue, but it appears that our quickest option is to reinstall the BES server. We do not anticipate the need for BB devices to be re-paired with the BES server if a reinstallation is necessary.
Update 9:00 PM Eastern: Service has been restored on the LOUIE BES server. Partners on the LOUIE BES server should have received emails from OWN Support containing the new activation passwords for users.
Earlier this morning we had a few reports that users were not able to send outgoing emails from their BB devices. We located and resolved the issue. As an additional measure, we’ll be moving the BES server to our newer Exchange 2010 cluster. Starting at 2 PM Eastern users will be unable to synchronize from their BB devices until the migration is completed. We anticipate the migration only taking 30 minutes.
Update 2:08 PM Eastern: The server is about to go offline for the migration.
Update 3:10 PM Eastern: The migration is taking longer than anticipated. However, leaving the server on the current host provides a consistent experience for only a majority of users, not all, so we have to finish the migration.
Update 6:17 PM Eastern: Service has been restored 100% and contact with all user BB devices has completed.
We are currently in the process of moving one of the LOUIE CAS node servers to a new host. Earlier we had alerts from Exchange monitoring, which was pausing the Exchange processes. We expect service to be only temporarily interrupted, and the LOUIE network should be in normal operation in less than 15 minutes. Later tonight an additional reboot of the CAS array nodes will be needed to apply changes to our thresholds/monitoring.
Update 9:49 PM Eastern: The CAS array is about to go offline for the update of the monitoring software. Service on LOUIE should be back to 100% functionality by 10:30 PM Eastern.
Update 10:42 PM Eastern: The updates are completed and the CAS array nodes are coming back online.
Tonight, May 17th 2010 at 10 PM Eastern, we will be rebooting the DEWEY server to address issues with the OWA search indexer and the amount of resources utilized by the Exchange store. Access to mailboxes will be limited and the downtime is expected to be under 15 minutes.
Update 9:50 PM Eastern: We are beginning to stop the Exchange Services in preparation for the reboot. Customers needing immediate access to their mail can utilize LiveArchive during the reboot.
Update 10:14 PM Eastern: The reboot is complete and all services are back online.
The Exchange 2007 server DEWEY will be rebooting at 11PM EST tonight to resolve an issue with the OWA Search Indexer that is returning no results.
On Saturday, April 17th at 9 PM EST we will begin the process of moving one of the mailbox databases to an upgraded RAID array. During this time, users on DEWEYMBOX1 will be unable to connect to their mailboxes through Outlook or OWA. We strongly urge partners to familiarize clients who require constant access to their email with LiveArchive, as it will contain live mail during the downtime.
Update 8:27 PM: The database move is confirmed to begin at 9PM EST. During the move, users on MBOX1 will be unable to access their mailboxes.
Update 12:14 AM EST: The database move has completed. All service has returned to 100% operation.
This Sunday, 4-11-10 at 7PM EST we will be dismounting MailboxDB3 on the DEWEY Exchange server to move it back to the primary RAID array. During this period, users on MailboxDB3 will be unable to access their mailboxes; however, users needing immediate access to their mailbox can utilize LiveArchive to retrieve their mail.
On April 8th, 2010 @ 9PM EST we will begin the process of implementing split GALs on australia.exchangedefender.com.
During this period users in Outlook may receive warnings about not being able to resolve the name in the address book. This will be a temporary issue and shouldn’t last more than 1 hour.
Update 2:56 PM EST: We are beginning the process to implement split GALs.
Tonight at 10:00 PM EST DEWEY (Exchange 2007 US Server) will be undergoing scheduled maintenance, which is estimated to last 4 hours. During the maintenance window, DEWEY may be rebooted to update our backup software. Service is not expected to be impacted outside of the reboots.
We have identified a performance issue on the DEWEY cluster and the team is working on restoring full functionality. While most users are not affected, we recommend using https://livearchive.exchangedefender.com if you are experiencing performance issues.
Update 4:30 PM EST: Performance optimization is complete and status for all users is back to optimal.
Update 5:25 PM EST: Users on one of the databases are still reporting delays. We’ve dismounted the mailbox database to check it for inconsistencies.
Update 9:13 PM EST: The database checks came back fine and we’ve had the database mounted for a few hours. The server is going down for a reboot to add new hardware and once it’s back online, we will be running a precautionary extra backup of the database.
Update 5:16 AM EST: Stats on the server are looking normal and performance is expected to be back to normal operating levels. Throughout the week we will be migrating users across the new sets of databases.
At approximately 1:10 PM EST we started receiving reports of disconnections for Outlook clients. The issue was performance related; although that should not affect connectivity, at times it does. We restarted the mail store service and that addressed the issue.
Update 2:27 PM EST: Some partners are still reporting issues connecting via RPC over HTTPS. We are going to perform a quick reboot on DEWEY to resolve the issue.
Update 2:50 PM EST: The issue has been resolved and service has been restored 100%.