Server Information – ExchangeDefender Blog

March 7, 2012

Backup90: Offline Upgrades

Filed under: Server Information, Support — vlad @ 4:38 pm

Throughout the past week, users with accounts on backup90 have noticed sporadic periods in which service was interrupted (events were posted on the NOC blog), restored, and then randomly went back offline after 8 hours. Because the failure could not be reproduced on demand, diagnosing it took slightly longer than normal. Once we traced the issue to a faulty disk, we elected to upgrade the entire storage controller for backup90. Over the past quarter backup90 has seen 340% growth in accounts, and that growth doesn’t appear to be slowing down. Unfortunately, the current controller risks slower performance for clients as growth continues over the next month.

To alleviate this issue we’ve shut down service on backup90 to start a copy process to the new controller. After all data has been copied to the new controller, we will run a random read/write seek test to verify its stability and performance. Unfortunately, since backups are an extremely IO-intensive operation, leaving the Ahsay service running during the copy would yield extremely poor results and would more than likely require multiple attempts to copy the data. All in all, it would lead to a longer downtime for clients.
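For readers curious what a random read/write seek test looks like, here is a minimal sketch in Python. It is not the tool used on backup90; the function name, file size, block size, and operation count are all assumptions chosen for illustration. It writes random blocks at random offsets, reads each one back, and reports verification errors and latency.

```python
import os
import random
import tempfile
import time


def random_seek_test(path, file_size=4 * 1024 * 1024, block_size=4096,
                     operations=200, seed=0):
    """Write and re-read random blocks at random offsets; return a summary.

    Hypothetical helper for illustration only. Read-backs may be served
    from the OS page cache, so this checks correctness more than raw
    disk speed.
    """
    rng = random.Random(seed)
    blocks = file_size // block_size
    errors = 0
    latencies = []

    # Pre-allocate the test file so every block-aligned offset is valid.
    with open(path, "wb") as handle:
        handle.truncate(file_size)

    with open(path, "r+b") as handle:
        for _ in range(operations):
            offset = rng.randrange(blocks) * block_size
            payload = os.urandom(block_size)

            start = time.perf_counter()
            handle.seek(offset)
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # push the write down to the device
            handle.seek(offset)
            read_back = handle.read(block_size)
            latencies.append(time.perf_counter() - start)

            if read_back != payload:
                errors += 1

    return {
        "operations": operations,
        "errors": errors,
        "avg_latency_ms": 1000 * sum(latencies) / len(latencies),
        "max_latency_ms": 1000 * max(latencies),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(random_seek_test(os.path.join(scratch, "seek-test.bin")))
```

A real test against a storage controller would use a file much larger than system memory and would run for hours, so that caches cannot hide slow or failing disks.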

We estimate that the copy process will begin tonight (3/7/12) around 5:30 PM and is expected to continue throughout Friday and Saturday. We will create a new NOC entry once the upgrade process on backup90 has begun.

Travis Sheldon
VP, Network Operations, ExchangeDefender
(877) 546-0316 x757
travis@ownwebnow.com


February 20, 2012

Della Upgrade SP2 & DAG Activation

Filed under: Server Information, Support — vlad @ 10:00 am

This weekend (02/24/12 – 02/25/12 19:00 Eastern [00:00 GMT]) we will be performing SP2 upgrades to the Europe Exchange 2010 Cluster: Della. The upgrade to Exchange 2010 SP2 will be performed on all passive nodes in Della. Upon successful upgrade, clients will be moved from the active server to a passive node. This upgrade is not expected to impact customer access; however, there will be critical changes prior to the upgrade.

02/21/2012:

· New load balancer will be activated across the passive nodes.

· IP address for cas.della.exchangedefender.com will be changed to point at the new load balancer (expected to be 213.229.89.253).
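If you want to confirm from your own network that the new address has propagated, a check along these lines may help. This is a minimal sketch: the helper name resolves_to is hypothetical, and the demo below uses a fake resolver so that it runs without touching DNS.

```python
import socket


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname_ex):
    """Return True if expected_ip is among the IPv4 addresses for hostname.

    Hypothetical helper for illustration. The resolver argument defaults
    to the standard library lookup and can be swapped out for testing.
    """
    try:
        _name, _aliases, addresses = resolver(hostname)
    except OSError:
        # Covers lookup failures (socket.gaierror and socket.herror).
        return False
    return expected_ip in addresses


def fake_resolver(hostname):
    """Stand-in resolver that pretends the change has already propagated."""
    return (hostname, [], ["213.229.89.253"])


print(resolves_to("cas.della.exchangedefender.com", "213.229.89.253",
                  resolver=fake_resolver))  # True
```

To check the live record, call resolves_to("cas.della.exchangedefender.com", "213.229.89.253") without the resolver argument. A False result shortly after the change may only mean that a cached record has not expired yet.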

On Friday evening users on the active node will be moved to the passive nodes. The switch over from active to passive should be transparent to users.

Unfortunately, BES services may be interrupted as BES does not detect and handle upgrades seamlessly. If BES service is interrupted we will work on restoring service after SP2 has been successfully applied.

Travis Sheldon
VP, Network Operations, ExchangeDefender
(877) 546-0316 x757
travis@ownwebnow.com


February 13, 2012

Rockerduck Maintenance

Filed under: Server Information, Software, Support — vlad @ 9:45 am

From February 9th through February 11th, 2012, the ExchangeDefender staff performed maintenance on the Rockerduck cluster. During that maintenance, two separate outages affected client access, each running from late evening into the early hours of the following morning: one on February 9th-10th and one on February 10th-11th.

On the eve of Feb. 9th 2012 (~11:30 PM Eastern) we began upgrades on a failed/failing VPN device that connects ROCKERDUCK:DAL and ROCKERDUCK:LA for Active Directory and internal communication between sites. During the upgrade we began to notice random network events in which communication seemed saturated and sluggish across the entire network. After various attempts (and configurations) to bring the new VPN router online, we determined that the new VPN device was occasionally malfunctioning and flooding the network with ‘dead packets’. Unfortunately, the massive flood of packets from the VPN device caused the Database Availability Group (DAG) on ROCKERDUCK to lose communication between nodes and eventually lose quorum. Once quorum was lost, all databases at both sites were automatically dismounted because Exchange considered the DAG unhealthy. For the next few hours we worked to restore service to RD clients by replacing the failed VPN routers with our backup VPNs (new vendor) and restoring communication with Los Angeles. After communication was re-established, clients were able to access their mailboxes. This outage affected all clients and lasted from midnight to roughly 3:15 AM.
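To illustrate why losing communication between nodes takes every database offline, here is a minimal sketch of the majority rule behind quorum. The voter names and counts are hypothetical (this post does not list the DAG members), and the code models only the counting rule, not Exchange or Windows failover clustering.

```python
def has_quorum(partition, all_voters):
    """True if this partition can see a strict majority of all voters."""
    return len(partition) > len(all_voters) // 2


def surviving_partitions(partitions, all_voters):
    """Return the partitions still allowed to keep databases mounted."""
    return [p for p in partitions if has_quorum(p, all_voters)]


# Hypothetical voter list: two nodes per site plus a witness.
VOTERS = {"dal-1", "dal-2", "la-1", "la-2", "witness"}

healthy = [set(VOTERS)]                                   # everyone connected
site_split = [{"dal-1", "dal-2", "witness"}, {"la-1", "la-2"}]
flooded = [{name} for name in VOTERS]                     # every voter isolated

print(len(surviving_partitions(healthy, VOTERS)))     # 1
print(len(surviving_partitions(site_split, VOTERS)))  # 1
print(len(surviving_partitions(flooded, VOTERS)))     # 0
```

In the flooded case no group of voters can reach a majority, so nothing is allowed to stay mounted. A clean split between sites would normally leave one side running.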

On the eve of Feb. 10th (~10:30 PM Eastern) we began work to finalize the VPN communication by consolidating both VPN devices in California onto the one backup-vendor VPN device. We elected to replace the ‘working’ VPN device in California because we feared it would behave as abnormally as the similar VPN device in Dallas. As part of our protocol to ‘down’ a data center in Exchange hosting, we paused SMTP services on Rockerduck. After replacing the VPN device in California we resumed all services (including SMTP) and mail resumed normal flow. Around 5:30 AM Eastern we started to receive alerts about back-pressured queues in Rockerduck, which would amount to delivery delays. Upon investigation we discovered that the issue was mail delivery between the EDGE server network and the HUB server network on RD. After two hours of investigating the issue internally (and opening a case with Microsoft) we determined that our course of action would be to reapply the SP2 update to the EDGE servers. Once SP2 was reapplied to all EDGE nodes, mail delivery returned on ROCKERDUCK by 9:15 AM Eastern.

Finally, about 5% of users were left disconnected in Outlook, although they still had service through OWA (and some through ActiveSync), between Saturday and Sunday. The database that housed their mailboxes was moved to Los Angeles while the content index in Dallas was rebuilt for RDDB9. Service was restored to these users by noon Eastern.

Travis Sheldon
VP, Network Operations, ExchangeDefender
(877) 546-0316 x757
travis@ownwebnow.com


February 7, 2012

Maintenance Weekend

Filed under: Server Information, Support — Carlos @ 12:00 pm

This weekend a big part of our team will be doing some massive infrastructure upgrades to improve network performance and stability in our Dallas DC. These changes should not have any effect on service. If we find that a planned change may impact a service, we will be sure to update the NOC blog accordingly. Remember that it’s available at http://www.exchangedefender.com/noc and it’s RSS-enabled at http://www.exchangedefender.com/noc/feed/ so that you and your staff can subscribe.

We will also be increasing our power capacity by 20%, which will give us room for business growth. We’ll also begin deploying a new 2010 cluster, but I can’t provide any additional details on that; Vlad will share its purpose and target once it’s live.

The last change will impact one of our backup servers, backup90, which will receive a storage upgrade to accommodate increasing demand for that service in the upcoming months. The service impact there should be minimal and will be blogged.

So we are basically making a big push to create a big buffer of availability and performance so that your teams can continue to focus on just moving the product. We will take care of the ugly part of the business for you!

Carlos Lascano
VP Support Services, ExchangeDefender
carlos@ownwebnow.com
(877) 546-0316 x737


January 31, 2012

Rockerduck SP2 Install

Filed under: Server Information — vlad @ 11:11 am

Over the weekend of 01/27/12 – 01/29/12 our team successfully deployed Exchange 2010 Service Pack 2 to our ROCKERDUCK cluster. Exchange 2010 Service Pack 2 brings a whole slew of fixes and new features, but three were the primary reasons for expediting approval of SP2.

Cross Site OWA Redirect: Prior to SP2, in the event of a data center failure, any OWA clients that attempted to log in to an OWA site under maintenance would be redirected to the CAS servers in the secondary data center and would have to log in again. With SP2, users will be silently redirected and logged in to the secondary site CAS servers.

OWA Mobile: A quick and efficient mobile version of OWA for clients who do not have rich smartphones.

Address book policies: With address book policies we will now be able to support customers that have multiple ‘entities’ or organizations that need to be “separated” but still allow certain sharing functions like address books.

While we do currently provide GAL segregation, address book policies will allow us to give clients granular control over address lists for users in the organization.

Travis Sheldon
VP, Network Operations, ExchangeDefender
(877) 546-0316 x757
travis@ownwebnow.com


January 11, 2012

Are Your Exchange Servers Experiencing Delays?

Filed under: Server Information, Support — vlad @ 11:39 am

Imagine it’s 4:00 PM, you’re getting ready to close for the day, and your cell phone starts ringing off the hook; it is your biggest client’s CFO and he is very upset. The caller complains that “email is slow” and “it is taking forever to do xyz”, but any attempt to get more information is greeted with either hostility or an abrupt “I don’t know”.

Does that sound familiar?

Almost every request that arrives with very little detail about what is “slow” magically gets fixed, and then the client is convinced that the issue was Exchange… how do you fight back? How do you know for sure that a server or network outside your control is performing up to par? How do you know that your hosting vendor is keeping redundancy healthy and performing backups? For the most part you can’t… or can you?

One of the most common inquiries that we receive from partners is “Is XYZ server experiencing delays today?” after the partner gets alerted by their client that things seem to be “slow”. Our staff then tries to qualify the phrase “slow”…is it email? Is it Outlook response? How about OWA? After we have an idea of what the client is reporting as slow then we have to dig through logs and statistics files for performance data to provide back to the client…this process takes forever.

What if we could automate it? What if we could provide partners with an “at a glance” view of the server’s health and their client’s statistics? What if we could provide you with a list of available backup dates so you can choose what date you’d like to restore from? What if we could provide you the number of messages in queue for Exchange or overall latency for clients and response times?

What if we could provide you with up-to-the-minute stats on the CAS server your user is on, the CPU percentage used by the client, and the amount of latency experienced by the client?

We can, and we will…

As far as I know this level of information and statistics has never been provided by a service provider before…

Below is the draft version of the User Monitor that we will be adding to our Staff control panel and that will more than likely find its way into service manager.

 

[Image: draft version of the User Monitor]

 

Travis Sheldon
VP, Network Operations, ExchangeDefender
(877) 546-0316 x757
travis@ownwebnow.com