Throughout the night we prepared LOUIE for the new DAG network. As we finalize the changes, users may notice brief disconnection notices. We anticipate all the work being completed by 8AM Eastern.
Tonight at 9:30 PM Eastern we will be moving DEWEYDB3 to a new RAID array. During the move, users on this database will be unable to connect to their mailbox. After the move is completed, any mail that arrived during the transition will be delivered to the user’s mailbox. The move is estimated to take an hour and a half to complete.
Due to the massive adjustments to LOUIE over the weekend, we’ve noticed random errors with the previous index database for OWA searches. In an attempt to resolve searches returning “nothing”, we’ve reset the index database. Over the next day the index databases will repopulate with searchable results.
Update 1:15 PM Eastern: We will be rebooting the mailbox servers to reapply changes to the Exchange Schema that were lost during the weekend upgrade. The reboot will only take 15 minutes and will be conducted at 00:00 -5 GMT (Eastern) 11/17/2010.
Update 12:30 AM: The mailbox servers are now rebooting.
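For partners outside the Eastern time zone, the announced reboot window can be converted to UTC. The sketch below is illustrative only. It assumes that "00:00 -5 GMT" means midnight at a fixed UTC-5 offset, and the variable names are hypothetical rather than part of any OWN tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "00:00 -5 GMT (Eastern) 11/17/2010" is midnight at a fixed UTC-5 offset.
EASTERN_STANDARD = timezone(timedelta(hours=-5))

# Announced start of the reboot, in local (Eastern) time.
reboot_start_local = datetime(2010, 11, 17, 0, 0, tzinfo=EASTERN_STANDARD)

# The same instant expressed in UTC.
reboot_start_utc = reboot_start_local.astimezone(timezone.utc)

# The notice estimates that the reboot takes 15 minutes.
reboot_end_utc = reboot_start_utc + timedelta(minutes=15)

print(reboot_start_utc.isoformat())
print(reboot_end_utc.isoformat())
```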
We are in the midst of updating our monitoring software on the LOUIE network and installing driver updates from DELL. There will be a brief period of service interruption as the mailbox servers are rebooted.
10/14/2010 9:06 EST – We are working on the upgrade on LOUIE. Right now we are just making software checks after the upgrade to ensure all of the nodes are connected to the cluster correctly and that all the services are restarting correctly. Please stand by for an update in the NOC blog for work completion.
10/14/2010 6:22 EST – All LOUIE services have been fully restored with a successful upgrade.
We are currently rebooting the servers in the scrooge network. The services should be restored momentarily.
Update 6:47 AM: We’ve identified a severe network issue and are currently replacing the affected hardware (storage array). To eliminate any issues during business time or during the week, we’re doing an emergency hard drive swap right now.
Update 7:40 AM: We’ve replaced the hardware and we’re in the process of performing stress tests on the new hardware before allowing client connections. We’re on schedule to be completed in the next hour.
Update 8:30 AM: Service has been restored to SCROOGE and our stress tests were a success.
Update 9:19 AM: We had to perform an emergency reboot on the mailbox server for SCROOGE to finalize the drive sync. The mailbox server is expected to be up and fully operational after the reboot.
Update 9:45 AM: After the reboot the DB RAID array did not mount on POST. We are going to reseat all drives and reboot the server.
Update 11:30 AM: After mounting the databases, we received alerts that 10% of the mailboxes could not be loaded. We are in the process of restoring from a snapshot that was taken at 3 AM, and then we will merge any new data.
Update 12:04 PM: Because of how long SCROOGE has been in production, the integrity check is taking a long time to process all the log files. At this point there is no ETA on a resolution. Customers are urged to utilize LiveArchive. Optionally, for an immediate solution, partners can request to be moved to LOUIE via support request, and can export the previous mailbox if the user was in cached mode.
Update 7:20 PM: After the log replay finished, we noticed there were still CRC issues when accessing the RAID array. We are in the process of making a snapshot backup before we migrate the database to a new external RAID array. The next NOC update is scheduled for 1AM Eastern.
Update 1:24 AM: We are currently restoring from a recent backup this morning. As it stands now, no mail is expected to be lost. Once the restore is complete, an integrity check will be started to ensure reliability before the start of business. Unless there is a change in progress, the next update is scheduled around 4AM.
Update 6:20 AM: The integrity check is underway. The restore job was stopped after one hour as it showed an estimated time beyond 9AM Eastern. We reconfigured the attached drive setup which allowed for a faster restore. The integrity check will provide an estimated time in about 20 minutes.
Update 6:50 AM: The integrity check is estimated to take around 2.5 hours, putting us extremely close to the desired service restore time of 9AM Eastern. We will keep this post updated as the time changes. We will give the integrity check until 10AM Eastern before a decision is made. We do not anticipate the check failing, but if it does, we will run the day with blank mailboxes, delivering the spooled mail from yesterday. As the integrity check finishes, we will begin to migrate mail. We appreciate everyone’s patience as we are working as fast as we can to restore service without introducing more instability.
Update 7:42 AM: The integrity check on the smallest database completed successfully. Once the final two databases complete, we will attempt to mount all three databases. Time estimate still looks to complete around 10AM Eastern.
Update 9:18 AM: The integrity check has completed on all the databases successfully. We’re in the process of mounting them now. The previous ETA of 10AM Eastern is still on schedule.
Update 10:05 AM: After the integrity check completed with no errors, we attempted to mount the database and we were presented with errors. We are in the process of running a background scan, but in the interest of restoring service to clients, we’ve moved all users to a temporary database.
Users running Outlook in Cached Mode will be presented with a new option when they open Outlook, warning them that their mailbox has moved and that they can use either the new profile or the temporary profile. If users select the temporary profile, Outlook will open the cached profile (the profile from before the outage that is cached on the local machine).
After hours we are going to try a couple different methods, but as far as we can still see, all data is still intact.
Update 5:45 PM: After running a more aggressive integrity check which resulted in no errors, the database still would not mount. We’ve restored different combinations of backups; however, because this was the first Exchange 2007 server network for OWN, the backups ran as incremental backups to our secondary datacenter. Sadly, with the number of backups, it would take nearly four days to rebuild the files from backup. At this point, the quickest means of restoring data is to run a local repair on the database and then seed the previous mailbox into the new, live mailbox. We began a repair on a copy of the previous database around noon Eastern, and currently it’s only at 15%. We believe that the repairs may last into tomorrow. To be on the safe side, we are sending the backups from the secondary datacenter overnight to Dallas to begin rebuilding the files from backup in the background.
Update 10:17 PM: The repair has finally made a step forward. Although it’s not a grand step, the repair finally shows actual progress. We will gauge the speed of the progress and a resolution time estimate will be posted around 2 AM Eastern.
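The long rebuild estimate follows from how incremental backups work in general: restoring to a given point requires the most recent full backup plus every incremental taken after it, applied in order. The sketch below illustrates that idea only. The `Backup` record, the `restore_chain` helper and the day numbers are hypothetical and do not reflect OWN's actual backup catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int   # hypothetical day number on which the backup was taken
    kind: str  # "full" or "incremental"


def restore_chain(backups, target_day):
    """Return the backups that must be applied, in order, to reach target_day."""
    # Only backups taken on or before the target day are usable.
    eligible = sorted((b for b in backups if b.day <= target_day), key=lambda b: b.day)
    # Find the position of the most recent full backup.
    full_positions = [i for i, b in enumerate(eligible) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup at or before the target day")
    # The chain is that full backup plus every later incremental.
    return eligible[full_positions[-1]:]


# Example: one full backup followed by five daily incrementals (made-up data).
catalog = [Backup(0, "full")] + [Backup(d, "incremental") for d in range(1, 6)]
chain = restore_chain(catalog, target_day=5)
print(len(chain), "backup sets must be applied in order")
```

The more incrementals that accumulate after the last full backup, the longer the chain that has to be replayed, which is why a local repair can be quicker than a rebuild from backup.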
Update 4:30 AM: The repair hasn’t made any new progress, but the database has a modified timestamp from seven minutes ago. Unfortunately we cannot make an estimate on time until the next bit of progress.
Update 12:01 AM Eastern: The database repair has completed and we are now proceeding with the isinteg check before attempting to mount the database.
Update 2:06 PM Eastern: The integrity check completed and we’ve successfully mounted the databases in an RSG. We are proceeding with seeding back mailbox content.
Update 10:48 AM Eastern: We are still continuing the seeding process. Everything has been successful thus far. Unfortunately, the seeding process does not give us a measurable progress bar. But please rest assured, folks are getting their data seeded, and we will update this blog once all seeding has been completed.
Update 2:49 PM Eastern: We have completed the mailbox data seeding process for the server for everyone.
This weekend we will be performing maintenance on the following servers on Sunday, 10/10/10:
DEWEYHUB: The hub server for the DEWEY network will be moved to a new host server. Mail flow service will be impacted for up to 15 minutes, but clients will not lose connection to their mailbox.
SCROOGE: The mailbox server for SCROOGE will be performing an offline defrag. Access to mailboxes will be impacted for up to 3 hours starting around 9AM Eastern, however, clients should utilize LiveArchive during the maintenance.
DONALD: The DONALD Exchange 2003 network will be moved to a new IP address. Clients should not have to reconfigure any settings.
LAUNCHPAD: The SharePoint 2007 network, LAUNCHPAD, will be offline for 30 minutes while a hardware investigation is performed.
We’ve received a few sporadic reports from our partners about the GAL in Outlook showing up as empty. At 9:00 PM Eastern we will be rebooting the mailbox servers for LOUIE. We were unable to reboot the mailbox servers after the SP1 install because of time constraints. Clients on LOUIE should only see a 10 minute disconnect from their mailbox when their respective server goes offline.
Update 9:01 PM Eastern: We are beginning the server reboots, beginning with the mailbox servers, then hub, and finally CAS.
Update 9:38 PM Eastern: All the servers rebooted successfully. We are now in the process of adding LOUIECAS3 into the CAS Array.
Update 10:15 PM Eastern: The addition of LOUIECAS3 was completed and connection was verified with our test accounts.
Update 10:45 PM Eastern: We are in the process of rekeying the SSL certificate to apply to LOUIECAS3. Clients will be unable to connect to Outlook as the SSL gets processed.
We are about to begin the Service Pack 1 install for the LOUIE network. During the install, some Exchange services will have to be restarted, however, clients connected to cas.louie.exchangedefender.com should only see a disconnection of about 15 seconds as they switch between CAS servers.
Update 3:41 AM Eastern: We’ve completed the install on the CAS array servers. We are now proceeding with the HUB and Mailbox servers.
Update 6:43 AM Eastern: All HUB servers were updated to SP1 around 4:40, however the mailbox servers are still in the process of finalizing.
Update 7:18 AM Eastern: Half of the mailbox databases are mounted, and it looks like the SP1 installation is near completion.
Update 7:23 AM Eastern: The installation of SP1 on the mailbox servers has completed. We are in the process of bringing all the services back online.
Update 7:54 AM Eastern: Service has been restored on LOUIE.
Our monitoring software just alerted us to a back-pressured message queue on the LOUIE Exchange 2010 network. Upon investigation, there was a message in the poison queue that needed to be removed. Mail is once again flowing on LOUIE and we apologize for the inconvenience. The message entered the poison queue at 7:50 PM Eastern and was removed at 8:20 PM Eastern.
We are beginning the maintenance schedule for Scrooge (Exchange 2007). During our maintenance we will be attempting to move the mailbox database, and finally performing an offline defrag of the database. Clients should anticipate the inability to connect to their mailboxes and utilize LiveArchive during the maintenance.
As always, progress will be updated in this post.
Update 9/11/10 1:50 PM Eastern: The database copy failed as it did previously, where the server stops making progress on the copy. We are going to use the same method we used last week to move the DB to the Temp drive, but now moving it to the permanent location.
Update 9/12/10 1:30 AM Eastern: The database move was successful. We are currently letting the log files replay and then will be running an offline defrag of the database before putting it back into production.
For any clients that do not want to wait for the checks to complete, we can move you to a new database with a blank mailbox. You will not have to reconfigure Outlook, but if you use BlackBerry Enterprise Server, you will have to redo enterprise activation. After the checks are completed (likely mid Sunday) we can move you back to your original mailbox and then seed in the mail from the new database.
Update 9/12/10 8:31 AM Eastern: The log files replayed successfully. We decided to mount the databases and schedule the offline defrag for next Sunday, as opposed to the Saturday maintenance. Again, we cannot stress enough how much we appreciate everyone’s patience and understanding, as we aren’t doing this for fun, nor would we be doing it as aggressively as we are if we didn’t feel this was required for the overall network reliability and performance.
This weekend we will be performing maintenance on:
scrooge – Exchange
deweymbox2 – Exchange
Maintenance on scrooge will begin Friday (9/10/2010) evening and will likely end Saturday afternoon. We will be performing database maintenance such as offline defrag and user migration. During the maintenance on Scrooge, users should expect to be disconnected from Outlook.
Update 9/11/10 1:44 PM Eastern: The stress testing on DEWEYMBOX2 has completed and is showing excellent resource utilization. Any mail that was destined for users on DEWEYMBOX2 is being delivered from queue. Users should be able to connect to their mailboxes without issue as service is now 100% operational on DEWEY.
Maintenance on deweymbox2 will begin Saturday (9/11/2010) morning and will likely end Saturday afternoon. We will be performing hardware upgrades and running more stress tests to ensure the stability of the server. Clients may see brief periods of disconnection from their mailbox.
Throughout the night, starting at 8PM Eastern we are going to be doing various upgrades to SCROOGE, such as driver updates, monitoring software, and a Microsoft Exchange hotfix.
During the driver update and Microsoft hotfix, we will have to dismount the mailbox databases. We do not anticipate the interruption to last more than 15 minutes.
Update 8:01 PM Eastern: The installs have been rescheduled for 9:00 PM Eastern tonight.
Update 9:01 PM Eastern: We are beginning the upgrades now. We’ll start with the Microsoft hotfix which will require a restart of the information store.
Update 9:17 PM Eastern: The installs were completed. We are going to monitor the server over the next couple of hours as we move users between databases to balance out the load.
We are reattempting the database move for Scrooge to bring improved performance to all users. We anticipate the move taking all night into the early morning. As always, service will be restored by 9AM Eastern. We will update this NOC post with any updates along the way.
Update 4:00 AM Eastern: The database copy is still underway. At this point we are planning on letting the DB copy finish throughout the Friday day as we cannot afford to let the DB copy extend past this weekend.
Users on Scrooge will see empty mailboxes throughout the day; however, rest assured that your previous mailbox is still intact. Over the weekend we will converge all user mailboxes.
Clients using Outlook should have their cached profiles throughout the day that will enable them to work like normal. Clients using OWA are recommended to concurrently use LiveArchive to access mail prior to the DB move.
Update 5:10 AM Eastern: We’ve decided not to reset user mailboxes, and instead run off the current drive for one more day. We understand that users will continue to experience delays; however, we feel it would be in the best interest of all users to complete the DB move over the weekend instead of doing a converge.
Update 5:28 AM Eastern: We’ve re-enabled the previous DBs. For the time being, public folders will be unavailable on the server to save as many reads/writes as possible until the backup restarts tonight.
Update 11:42 PM Eastern: We are about to restart the database migration. We anticipate the migration taking at least 12 hours. Customers should use LiveArchive if they need access to their live mail.
Update 1:25 PM Eastern: The database backup was successfully completed and we began the process of restoring the backup to the new RAID array. We anticipate mailboxes being available by 7PM Eastern tonight and public folders remounted tomorrow afternoon.
Update 5:45 PM Eastern: We’ve had a few inquiries from partners about the lack of updates on the work done yesterday and today. While we love keeping this NOC blog updated with fairly live updates, the majority of the time in the past two days has been spent watching databases and users copy between drives. We do appreciate everyone’s patience as we attempt to finalize all changes by the end of the holiday weekend.
Update 6:57 PM Eastern: The second attempt at the restore is still in progress. The earlier restore slowed down about halfway through, so we restarted the server and disconnected the older RAID array. The current restore is looking good and is estimated to complete in 45 minutes. After the restore is complete, we are going to try to mount the databases and then move them to their final location.
Update 7:50 PM Eastern: The restore was completed and we’ve enabled OWA access to Scrooge. All pending mail is being delivered to the user mailboxes and Outlook access will be enabled after the queues are flushed out. We’re going to test a DB move to the array at 10 PM, and once an estimated time is established we will update this post.
Update 9:27 PM Eastern: Service has been restored. We’ve postponed the test move and will update this post.
We successfully stood up the new mailbox server for DEWEY. We are in the process of doing nightly moves for clients who have reported issues in the past week.
We will be taking requests to move clients via support requests starting Tuesday 10/06/10
Clients should not have to reconfigure Outlook as long as an autodiscover record was created.
During a mailbox move, the mailbox will be unavailable to Outlook or OWA until the move completes. It generally takes about an hour to process two gigabytes, but a move can take less or more time depending on the item count and the mailbox structure.
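As a rough planning aid, the rule of thumb above (about two gigabytes per hour) can be turned into a quick estimate. This is a back-of-the-envelope sketch only. The function name is hypothetical, and actual move times vary with item count and mailbox structure as noted.

```python
def estimate_move_hours(mailbox_gb: float, gb_per_hour: float = 2.0) -> float:
    """Estimate how long a mailbox move takes at a given throughput.

    The 2.0 GB/hour default comes from the rule of thumb in this post;
    it is an approximation, not a guarantee.
    """
    if mailbox_gb < 0:
        raise ValueError("mailbox_gb must not be negative")
    if gb_per_hour <= 0:
        raise ValueError("gb_per_hour must be positive")
    return mailbox_gb / gb_per_hour


# Example: a 5 GB mailbox at the default rate.
print(f"{estimate_move_hours(5):.1f} hours")
```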
Update 11:45 PM Eastern: We are beginning to move users to the new database server.
The DEWEY Exchange 2007 server stopped answering OWA requests around 2:30 PM Eastern. We’ve tried to restart the service, which brought no change. Unfortunately we will have to reboot the server to bring Exchange services back online. The server is expected to be online by 3PM Eastern.
Update 3:30 PM: The server has been online for the past twenty minutes, but we noticed an extremely large amount of traffic going from DEWEY. Upon investigation, it looks like DEWEY was encountering issues delivering RPC packets to a client who has a misconfigured router as the packets are going back to DEWEY and creating an infinite loop. We’ve disabled traffic to that host and will continue to monitor this over the next 24 hours.
Tonight we will be completing the migration efforts from last week. At 9 PM Eastern we will be dismounting the mailbox databases and moving them to the new RAID array on the new server. The move is expected to take about 3 hours to complete.
Update 9:30 PM Eastern: We are beginning the database move on Scrooge. Service is expected to be restored by 12:30 AM Eastern.
Update 1:40 AM Eastern: The database move is still in progress. Currently there is no ETA on a restore of service, however service will be restored before the start of business.
Last Friday we installed a new HUB server on the DEWEY network to alleviate the pressure noticed by users. Unfortunately, the quickest change with the largest benefit was to stop the transport services on DEWEY’s main hub and offload them to an SMTP array of servers.
Because of the changes required, users can no longer submit messages via SMTP to 126.96.36.199 and must change their outgoing SMTP server to hub.dewey.exchangedefender.com
This change should only affect users connecting to DEWEY through non MAPI or non RPC/HTTPs connections, such as Mac clients, IMAP4 and POP3 users.
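For anyone scripting outbound mail against DEWEY, the change amounts to pointing the SMTP client at the new hostname. The sketch below uses Python's standard `smtplib` and is illustrative only. The port (25), the absence of authentication or TLS, the helper names and the example addresses are all assumptions that are not stated in this notice, so check your own account settings before relying on it.

```python
import smtplib
from email.message import EmailMessage

SMTP_HOST = "hub.dewey.exchangedefender.com"  # new outgoing server from this notice
SMTP_PORT = 25  # assumption: standard SMTP port; the notice does not state a port


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a simple plain-text message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_hub(msg: EmailMessage, host: str = SMTP_HOST, port: int = SMTP_PORT) -> None:
    """Submit the message to the hub. Authentication and TLS, if required, are not shown."""
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.send_message(msg)


# Build (but do not send) a test message; the addresses are placeholders.
test_message = build_message("user@example.com", "someone@example.com", "Test", "Hello")
```

Calling `send_via_hub(test_message)` would attempt the actual submission.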
We received a couple reports of inhibited mail flow on the DEWEY server, with some clients reporting that they were disconnected from the server. Upon investigation, the transport service had a poisoned message in queue that locked up mail flow. After removing the message, clients reported they were able to connect.
The mail flow issues occurred between 5:30 PM Eastern and 6:00 PM Eastern
On Sunday August 22nd we will be migrating the HUEY and SCROOGE Exchange 2007 servers to new Dell servers. During the upgrade clients will not be able to access their mailboxes via Outlook, ActiveSync or OWA. Customers needing constant access to their mailboxes should become familiar with using LiveArchive before the outage occurs.
Maintenance is scheduled to begin at 5PM Eastern and is likely to continue until the early AM on Monday. Service will be restored prior to 9AM on Monday August 23rd.
Update 12:35 AM Eastern Monday: We are beginning the upgrade process on HUEY tonight. Access will be unavailable to users on the HUEY server during the upgrades.
Update 7:54 AM Eastern Monday: The upgrade process took significantly longer than expected and was terminated to meet the 9AM deadline. The installation will continue today at 8:30 PM Eastern.
The DEWEY Exchange 2007 server was reported to have delays in processing requests from clients. We are restarting the information store to resolve the issue.
Update 1:00 PM: The information store has finished restarting and we’ve confirmed access to mailboxes on each DB.
Update 3:46 PM: Multiple parties have reported issues in delivery speed. We are rebooting the DEWEY server in order to clear out the cached memory.
Update 4:08 PM: We’ve been monitoring the server reboot via KVM. The server is in the process of installing Windows updates before it completes the reboot.
Update 4:21 PM: The installation of the Windows updates has completed and the server is now stopping the Exchange services. The server should complete the reboot in the next 15 minutes.
Update 4:49 PM: The reboot has completed. We’ve verified that mail is being routed and confirmed that our test users could log in.
Update 5:30 PM: After the reboot, the DEWEY transport started routing mail through one of its alias IPs instead of the primary IP, causing outgoing messages to be rejected as “Relaying denied, proper authentication required”. The issue has been resolved and mail is routing out.