Uncategorized – ExchangeDefender Network Operations

We’re currently working on LiveArchive, as some mail databases are not accepting delivery of new mail. Up to this point, access to mailboxes has not been affected. We’re in the process of performing emergency maintenance. Please stand by for an update on when the work is complete.

Update: 70% of the Databases are remounted and beginning to receive mail. We’ll update shortly when the work has completed.

Update: 100% of the Databases have remounted with the exception of DB8 which was already under maintenance.

We do apologize for any inconvenience this issue with LOUIE may have caused your clients. Our engineers discovered that when the backup job on the server went to truncate the log files, it forced a dismount of the database. Our monitoring software attempted to remount the database, but the backup job kept dismounting it. The engineers have now confirmed that the backup has completed and that all services are fully restored and active.

We’re currently researching a network outage that appears to have affected the Level3 network link into our data center.

Update 11:50: This has been fully mitigated. Everything is back up and running! We apologize for any inconvenience to our Level3 clients; this shouldn’t happen again. We’ll be posting additional details as we complete our investigation.

We became aware of a malfunction in our backup system that created a backup job that “nested” itself and filled up the entire volume. When ExchangeDefender system storage reaches critical levels, additional activity is queued. As this activity cascaded through our redundant systems, some users noticed sluggish response times. During this interval no mail was lost, but performance was not where it needed to be. We have addressed the problem and the root issue that caused it, and we are monitoring additional data points, as this should not have happened in the first place.
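The "queue activity when storage is critical" behavior described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not ExchangeDefender's actual implementation: the 90% threshold and the names `is_critical` and `submit` are hypothetical.

```python
import shutil
from collections import deque

CRITICAL_FRACTION = 0.90  # assumed threshold, not an ExchangeDefender value


def is_critical(used_bytes, total_bytes, threshold=CRITICAL_FRACTION):
    """True when the used share of the volume meets or exceeds the threshold."""
    return total_bytes > 0 and used_bytes / total_bytes >= threshold


def submit(job, pending, used_bytes, total_bytes):
    """Run the job now, or queue it while the volume is at a critical level."""
    if is_critical(used_bytes, total_bytes):
        pending.append(job)  # deferred until space is freed
        return "queued"
    job()
    return "ran"


# Example against the current root volume, read with shutil.disk_usage.
usage = shutil.disk_usage("/")
pending = deque()
print(submit(lambda: None, pending, usage.used, usage.total))
```

A full volume therefore does not drop work; it only defers it, which matches the slowdown without mail loss described in this post.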

We’re currently researching access issues on backup73.ownwebnow.com. While we work on the server, access will be intermittent.

8:04 AM EST: We’ve identified some slowness on DEWEY. Please stand by while we research any issues we identify. We’ll have more details once the issue is resolved.

8:39 AM EST: We’ll be restarting the Information Store shortly. This will cause a 10-15 minute outage. We do apologize for the inconvenience. Please remember that your clients can and should be using LiveArchive during this time period.

8:52 AM EST: We were able to dissipate the delays and queue delivery issues without stopping the Information Store service. We’ll hold off on maintenance until after hours tonight unless a pressing issue arises. Thank you.

9:15 AM EST: We’ve received reports of accounts not being accessible. We’re looking at emergency maintenance. Please stand by, and remember LiveArchive is there for your business continuity needs.

9:50 AM EST: Unfortunately, we’re now rebooting the server to alleviate the problems. We should be back online shortly.

10:25 AM EST: DEWEY is back online, and we’re currently spooling all the queued mail.

We’re currently investigating an issue where a limited number of mailboxes are not accessible on LOUIE. Please be sure to have your users use LiveArchive while we investigate and resolve this issue. We will be updating this post shortly with further details.

Update 9:48 AM Eastern: The issues with the unavailable mailboxes seem to be related to user mailbox moves that began last night. To clear up the issues, we will have to restart the individual mailbox servers in LOUIE in a staggered manner. Clients on each server will experience an outage of 15 minutes.

We’ve exhausted all possible ways to resolve the communication issue, leaving the restart as the last viable option.

The LOUIE mailbox servers will begin the reboot process at 10:00 AM Eastern and will last 15 minutes per mailbox server.
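For planning purposes, the staggered windows can be worked out with a short Python sketch. It assumes the servers restart back to back with no gap between them; the function name `staggered_windows` and the server list are illustrative, and the date is a placeholder. Only the 10:00 AM start and the 15-minute window come from the update above.

```python
from datetime import datetime, timedelta


def staggered_windows(servers, start, minutes_per_server=15):
    """Yield (server, window_start, window_end), one server after another."""
    step = timedelta(minutes=minutes_per_server)
    for i, server in enumerate(servers):
        begin = start + i * step
        yield server, begin, begin + step


# Illustrative server list; the date is a placeholder.
start = datetime(2000, 1, 1, 10, 0)
for name, begin, end in staggered_windows(["LOUIEMBOX1", "LOUIEMBOX2"], start):
    print(f"{name}: {begin:%H:%M} - {end:%H:%M}")
```

Under these assumptions, the total disruption grows by 15 minutes for each mailbox server in the list, while any single client is affected only during its own server's window.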

Update 10:14 AM Eastern: In the interest of restoring service as soon as possible, we will be doing a dial tone database restore. In short, users will receive a temporary blank mailbox as we work to restore full service to users on LOUIEMBOX1.

Update 10:18 AM Eastern: The outage appears to only affect about 15% of the users on the LOUIE network. We are running a test restore of a user mailbox and if all goes well, we will be able to bring the mailboxes back to the state they were in right before the move requests completed.

Update 11:00 AM Eastern: The test restore went through successfully. We are beginning to seed in the mail for the affected users. We anticipate that this process may take the rest of the day to complete; however, users will begin to see mail populate in their mailboxes.

Update 12:51 PM Eastern: The restore process is about 10% done and appears to be processing the larger mailboxes first. We will continue to provide updated statistics throughout the restore process.

3/5/11 Update 12:08 PM Eastern: The process is well over 50% completed and we anticipate the restores to be completed by midnight.

3/6/11 Update 9:44 AM Eastern: The final batch of 50 users is taking an unusual amount of time to complete. While most mailboxes have been restored successfully, a few restores failed, and we are working to restore their full content.

3/6/11 Update 6:30 PM Eastern: We are retrying the mailbox restores for users whose content failed to restore over the weekend. In the background, we are also replaying the log files from the database from Friday morning as a possible restore point for any mailboxes that continue to fail on restore.

We’re experiencing an ISP-level issue with our ExchangeDefender nodes on our Backup Network. The only production impact should be a delay in released messages within those nodes. The number of messages impacted should be minimal, as it is not our core network. We’ll update this as we receive more information on the routing issue.

Update 8:00 EST – The ISP connectivity issue has been resolved and we’re fully operational. All queued requests for message releases should be processing right now.

Around 7 AM Eastern this morning, we received reports of users being unable to access the LOUIE network. It appears that an automation process caused the Information Store to deadlock on a memory leak. We’ve disabled the step that caused the deadlock and will continue to monitor this over the next 48 hours.

01/12/2011 10:00 AM Eastern – We experienced a DDoS attack this morning on our outbound network dedicated to our Exchange clients. A small number of emails may have been impacted; however, the impact was limited to a slight delay while we got the attack under control. The issue is resolved, and the queues have completely caught up and are currently delivering on time.

As part of our ongoing background maintenance with LOUIE, we’ve moved all user mailboxes from our active mbox server to our passive mbox server in preparation for the reinstall of mbox1.

Tonight (1/6/11) at 9PM we will be uninstalling the mbox role and HUB transport role from mbox1. Upon completion of the role uninstall, we will change the louiembox1.louie.exchangedefender.com DNS record to a CNAME record pointing to louiembox2.louie.exchangedefender.com. Finally, we will disconnect mbox1 for the rest of the week to ensure all dependent services have been moved before the reinstallation.

This change is expected to be completely transparent to the users, and the only users that will be potentially affected are POP3/IMAP based users who rely on SMTP services to transmit mail. Outlook, OWA, ActiveSync and BES users will not be affected by the change.

If any POP3/IMAP clients are unable to send mail after the change, either flush their DNS cache so louiembox1 will resolve to louiembox2, or change the client configuration to use louiembox2.louie.exchangedefender.com.
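To confirm from a client machine that the old name now follows the new one, the check can be sketched in Python. This is a minimal sketch, not an official tool: the function name `resolves_to_same_host` and the injectable `resolver` parameter are assumptions made for illustration; only the two hostnames come from this post.

```python
import socket


def resolves_to_same_host(old_name, new_name, resolver=socket.gethostbyname_ex):
    """Return True if both names resolve to at least one common IP address.

    `resolver` is injectable (an assumption, for testability). By default it
    is socket.gethostbyname_ex, which returns
    (canonical_name, alias_list, ip_address_list).
    """
    try:
        old_ips = set(resolver(old_name)[2])
        new_ips = set(resolver(new_name)[2])
    except OSError:
        # A name did not resolve (socket.gaierror is a subclass of OSError).
        return False
    return bool(old_ips & new_ips)


# Usage on a client machine (performs live DNS lookups):
# resolves_to_same_host(
#     "louiembox1.louie.exchangedefender.com",
#     "louiembox2.louie.exchangedefender.com",
# )
```

A `False` result after the change suggests the client is still holding a stale record and its DNS cache should be flushed.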

The scheduled maintenance ran about 2 hours behind to allow the backup job to fully complete. We are beginning the maintenance schedule at 5 AM Eastern and an updated time schedule will be posted shortly.

5:00 AM Eastern: The BlackBerry BES server has been shut down. We are proceeding with the database move on DB1.

5:25 AM Eastern: Maintenance is currently stopped as a preliminary safety check failed on MBOX2. DB1 has been remounted on MBOX1.

5:42 AM Eastern: Maintenance is resuming as the server passed all safety checks. DB1 is dismounted and being copied to MBOX2.

5:46 AM Eastern: The copy for DB1 is estimated to take 5 hours. We’re going to change our approach and do mailbox moves instead of a DB move, as this will allow for an online copy (users should not get disconnected).

6:20 AM Eastern: We are in the process of doing an online mailbox move for users on DB1 to tmpDB1 on MBOX2. Mailbox moves take significantly longer than an entire database move; however, users will see a seamless transition to the tmpDB and should not see a disconnect. We estimate the user moves will take at least 8 hours for DB1 and longer for DB2.

We are going to leave the BlackBerry server powered off as per the original plan.

The reinstall of MBOX1 will most likely occur on Sunday as the online mailbox copy drastically changes the time schedule. We feel that the online mailbox copies will provide users with a better experience during the maintenance period.

The next update is scheduled for 10am, but may be earlier or later depending on the progress of the copy.

Update: 10:46 AM Eastern: Due to the time estimated for the mailbox move to complete, we are going to cancel the maintenance this weekend.

The preliminary plan is to do online database moves starting 12/24, which should allow enough time for all users to be moved to the temporary database by the weekend. We are moving back the few users who were successfully moved, and then we will re-enable BES service.

Update 4:30 AM Eastern: BES service has been reenabled and all users have been moved back to the primary database.

Our ExchangeDefender network sustained an extended DDoS attack starting at approximately 7 AM EST. While the attack was addressed almost immediately, the failover routing in ExchangeDefender removed this network from the scanning grid and deactivated it. At this point, we are reactivating it and all traffic patterns should be back to normal.

Users should not have been able to notice any interruption, as the network is load balanced across other data centers. There may have been some limited delays in email delivery for any messages that were being scanned when the DDoS started.

To fully resolve the OWA Search issues on LOUIEMBOX1, we will have to reinstall Exchange on MBOX1. To minimize the impact on clients the following schedule has been established:

Friday December 3rd:

11:00 PM Eastern: BlackBerry Enterprise Server will be shut down and will remain off until the repairs are complete.

11:15 PM Eastern: Mailbox Databases will be moved from MBOX1 to MBOX2. Due to the state of MBOX1, we cannot utilize the DAG for the cutover, so during the DB moves users will be unable to access their mailboxes.


Saturday December 4th:

*Estimated* 2:00 AM Eastern: DB1 should be moved over and access to the mailboxes for users on DB1 will be restored. We will then start the move on DB2.

*Estimated* 4:00 AM Eastern: DB2 should be moved and access to all user mailboxes should be restored.

4:30 AM Eastern: The HUB role will be transferred to MBOX2 as the primary HUB server and the IP address for LOUIEMBOX1 HUB service will be moved to MBOX2.

5:00 AM Eastern: The reinstallation of OS on MBOX1 will begin.

7:00 AM Eastern: The installation of Exchange 2010 will begin on MBOX1.

The roadmap after the installation is complete will vary based on the time of completion. We will provide updates on the post that will be opened when maintenance begins.

A recent software update on one of the outbound nodes ended up being corrupted. The corrupted update affected roughly 400 messages, and the issue on that node has been addressed. It was a performance issue related to a technology we had discontinued. We caught it almost as soon as it started, but due to the volume of mail processed by the server at any given moment, a small percentage of mail was affected.

We are in the process of adding an additional CAS server to the LOUIE network. During the final parts of our testing, users connected through Outlook will experience brief disconnections as we test the failover of the cluster.

We are currently investigating reports from our monitoring software about packet loss. Only one of our remote monitoring sites is reporting packet loss, but we are working with the datacenter to track it down.


Update 9:20 AM Eastern: We experienced a partial network degradation in our Dallas DC from 8:30 – 9:00. The issue started with several servers being attacked (DDoS), and it quickly escalated as we raced to block as many of the inbound floods as possible. The attack was isolated and removed quickly, but as is usually the case with even small hiccups, the amount of pending traffic (mail, file servers, scheduled jobs) added more instability. The issue was recognized at 8:30 and addressed by 8:40, and by 9:00 AM performance was back to normal.

All inbound/outbound phone facilities are down while we resolve the issue.

Update 9:45 AM Eastern: All inbound phone calls are now being forwarded directly to support as we continue our investigation.

We’ve received an alert from our datacenter about a current unexpected outage. The initial report indicates that a major carrier went down in the data center, but the staff is already switching to the backup provider. We will have a detailed report in the morning, but service is expected to be restored within the next 30 minutes.

This post will be updated as information comes along.

Update 1:01 AM Eastern: Service has been restored to the Australia server.

Below is the current maintenance schedule between September 13th – 19th.

  • Monday 9/13/10 – 9/14/10 – Final testing on DELLA, the new EUROPE Exchange 2010 server
  • Wednesday 9/15/10 11pm Eastern – Installation of Exchange 2010 SP1 on LOUIE. During the installation, Exchange services may need to restart; however, clients connecting to cas.louie.exchangedefender.com should automatically switch between CAS servers in the event of an interruption during the install.
  • Thursday 9/16/10 11pm Eastern – Installation of Exchange 2010 SP1 on LIVEARCHIVE.
  • Friday 9/17/10 11pm Eastern – IP address change of Exchange 2003 server DONALD. Clients should not have to make any changes to automatically connect to the new IP address.
  • Sunday 9/19/10 5pm Eastern – Offline database defrag scheduled for SCROOGE. During the defrag, clients will be unable to connect to their mailboxes. The defrag is estimated to take up to 5 hours to complete, and is one of the final steps in the upgrades to Scrooge.

Update 9/15/10 4:00 PM: We have decided to push the offline defrag until October 9th-10th. We understand that September is one of the busiest months for our partners, so we’ve pushed back the maintenance until next month.

Update 9/16/10 3:00PM Eastern: We have decided to push the Exchange 2010 SP1 upgrade for LiveArchive to a later date.

