March 31, 2008
Over the past six months, as the volume of SPAM has increased nearly exponentially, we have seen more and more large mail servers fail and start rejecting mail outright. Here is an update from 1&1:
Dear Mr. Kim Lannia,
Thank you for your request.
Due to a problem in our systems on Friday, we rejected your emails between 4:30 pm and 6:00 pm (Central European Time).
We apologize for these problems, sending email is now possible again.
1&1 Internet AG
Commercial register: Amtsgericht (Local Court) Montabaur, HRB 6484
Executive Board: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Andreas Gauger, Thomas Gottschlich, Matthias Greve, Robert Hoffmann, Markus Huhn, Achim Weiss
Chairman of the Supervisory Board: Michael Scheeren
We have been working on a new infrastructure upgrade to address some of these misconfigurations, some common, others less so. We have personally investigated hundreds of "why didn't my mail get there?" support tickets, and in all but three cases (plus orange.co.uk) the mail reached the recipient's mail server without a problem. As a result, we have throttled our notifications: users now receive an alert within 3 hours and again within 1 day of a delivery being delayed, deferred, rejected or dropped, so that they can contact the recipient directly if the communication is urgent but the mail systems are not working as they should.
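To illustrate the two-step alert schedule described above, here is a minimal Python sketch. It is not our production code: the function name `alerts_due`, the status strings, and the idea of counting how many alerts have already been sent are all assumptions made for illustration. It simply works out how many alerts are owed for a problem delivery given how long ago the problem started.

```python
from datetime import datetime, timedelta

# Hypothetical alert schedule based on this post: one alert at 3 hours, a second at 1 day.
ALERT_THRESHOLDS = (timedelta(hours=3), timedelta(days=1))

# Delivery states that trigger a notification (names are illustrative).
PROBLEM_STATES = {"delayed", "deferred", "rejected", "dropped"}


def alerts_due(status, first_problem_at, now, alerts_sent):
    """Return how many new alerts should go out now (0, 1 or 2)."""
    if status not in PROBLEM_STATES:
        return 0  # delivered or otherwise healthy mail never triggers an alert
    elapsed = now - first_problem_at
    # Count the thresholds that have already passed, minus alerts already sent.
    reached = sum(1 for threshold in ALERT_THRESHOLDS if elapsed >= threshold)
    return max(0, reached - alerts_sent)


if __name__ == "__main__":
    start = datetime(2008, 3, 31, 9, 0)
    print(alerts_due("deferred", start, start + timedelta(hours=4), 0))
```

Tracking the number of alerts already sent keeps the check idempotent: running it repeatedly against the same message never sends the same alert twice.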
Later this week we will put into production split mail relays across multiple networks that implement the same intelligent routing technology we use in our inbound servers. The paperwork for agreements with larger ISPs (AOL, Yahoo) takes a little while, but we are confident it will get done this week. If you use SPF records, some changes will be required; we will note them here as we get closer to putting the new relays into production.
March 28, 2008
In about 10 minutes (11 AM CST, GMT -6) we will be shutting down the Offsite Backup infrastructure for memory and networking upgrades to accommodate the new global replication platforms and proxies. The maintenance window is expected to last less than an hour, and we do not expect anything out of the ordinary.
March 27, 2008
Earlier this morning Yahoo and Yahoo UK & Ireland started experiencing problems with their RBL code. As a result, a large number of messages from our customer base to theirs were rejected. We are still working with Yahoo to resolve the issue, and a case is still open.
Users whose email bounced will have received bounce notices in their inboxes. Please ask them to resend the messages.
Mail is flowing without issue at the moment, but the case with Yahoo is still open. We will update this site when the issue has been resolved completely.
March 26, 2008
We are currently performing investigative maintenance on backup73.ownwebnow.com to ensure data integrity and troubleshoot some failed client logins. Please stand by; we will post the expected resolution time shortly.
Update: Service restored.
March 25, 2008
The carrier for our Los Angeles data center has suffered an HVAC failure, and network connectivity has been severed for the time being. The facilities team is in touch with the building owner, and service restoration is under way. All services provided by this data center are unfortunately affected and down at the moment.
Services affected: some ExchangeDefender, some SharePoint Hosting, some Virtual Servers.
We will update this ticket when all services have been restored. This ticket is ranked urgent. Our priority will be to restore non-redundant services first: virtual servers, followed by SharePoint hosting.
Update (3:00 AM PST, GMT -8; 6:00 AM EST, GMT -5): We expect SharePoint and Virtual Server services to be restored around 6 AM PST (GMT -8). ExchangeDefender services are not impacted (please be patient with SPAM releases, however). We will update this ticket at 6 AM PST or when services start coming back online.
Update (3:44 AM PST, GMT -8; 6:44 AM EST, GMT -5): All services have been restored.
Total LA DC1 outage: 53 minutes.
March 21, 2008
At roughly 5 AM EST (GMT -5) our primary backup proxy server in Dallas, TX went down for basic hardware maintenance. Upon restart, the primary RAID array controller lost its boot configuration, and the system hung after all the drives were initialized. As you can imagine, our backup infrastructure is huge, and a restart can take 30 to 40 minutes to spin up all the drives across all controllers. Diagnosing the problem took a while because all hardware was reporting correctly; the case was escalated and addressed right away.
We are sorry about the inconvenience.
Note: Our storage infrastructure does not follow the same maintenance schedule as the rest of our network. While almost all of our services see their lightest usage during the early Saturday morning hours (EST), offsite backups tend to see their heaviest usage during those hours. Large backup sets are usually scheduled to start Friday afternoon, after most 9-to-5 workers leave, and generally run through the weekend. Likewise, we take global network snapshots over the weekend right before major maintenance tasks on the network. For this reason, our maintenance window for offsite backups is moved up one day, to early Friday morning.
March 19, 2008
Generally, this blog documents any and all activity related to something that is broken. Since things have been working quite well lately and we are mostly focused on new projects and performance-related tasks, I wanted to update you on the status of the entire network.
ExchangeDefender – Solid. No issues, no latencies, no delays and overall flawless performance. Over the past week we have not had any delay reports that turned out to be on our end, we have had no DDoS issues to report, we have our lowest latency ever (between 5 and 10 seconds, network-to-network) and the fewest false positives in a while.
Hosting – Solid. No issues; we are working on Exchange 2007 SP1 upgrades and Exchange 2003 to 2007 migrations, as well as WSS2 to WSS3 migrations. We are integrating our new global points of presence in Australia and Europe, and they are coming together well.
Offsite backups – Solid. No major issues, one patch and hardware enhancement scheduled for this weekend to help with large file backups.
VoIP – We are adding another IAX2 provider to our network; it should become available shortly.
Training – Mostly scenario and interface tests for ExchangeDefender 4.0. Bringing on new materials, including Shockey Monkey 2.0. Web site undergoing a major update.
That is all. It’s so boring it already feels like the summer.
March 13, 2008
The ExchangeDefender 4.0 engine has been online for a few days now, and as of Monday/Tuesday night (USA time) we have addressed all the outstanding issues regarding latency, non-delivery, rejections and the garden variety of performance problems. Today was a nearly flawless day, with our highest SPAM detection rate ever, our lowest false positive rate ever, and just the best result we could have hoped for after a very smooth upgrade.
With that in mind, we ask you to open a support ticket if you notice anything, however minor, affecting the performance or reliability of ExchangeDefender within the last 24 hours. Please don't go back further than that, as we cannot address problems that have very likely already been fixed.
So here's to a great ExchangeDefender 4.0 engine. If you see anything unusual, please open a support request and we will investigate it at the highest priority, free of charge. We truly appreciate your help and the time you put in to help us improve the product.
March 5, 2008
From approximately 10 AM to 2 PM EST, our shared mail hosting platform (mail1.ownwebnow.com) suffered a large-scale distributed denial of service (DDoS) attack. Everything is under full control now, and we have been able to filter out the offending systems. Unfortunately, beyond the Cisco and TippingPoint protection we already have in place, there is little more that can be done to protect against a DDoS of this scale.
DDoS attacks tend to be aggravated by regular user activity: as the system slows down, end users keep clicking Send/Receive and effectively flood the connection until it times out. Systems are back to normal, and messages are arriving in regular sequence again.
We are currently bringing the next generation of reporting infrastructure to ExchangeDefender. The new grid is being activated now, and activation should be complete within two hours. This work is meant to provide a more reliable way to deliver email reports.
We will update as soon as everything is back to 100%.
Update (8:00 GMT, March 5th, 2008): Maintenance window completed after 53 minutes, 8 seconds. All systems are back to normal and are catching up. We expect transaction latency to return to real time within the next 20 minutes. An update on resuming daily and intraday reports is coming shortly.
Update (9:30 GMT, March 5th, 2008): We are currently restoring the intraday, daily and on-demand reporting features. We expect the work to be completed by noon EST.
Update (15:00 GMT, 10 AM EST, March 5th, 2008): Maintenance completed. Reports (daily and intraday) will resume within a few minutes (at 10 AM central) and will be faster than before. This also opens up a whole new range of reporting options within ExchangeDefender, which will launch with ExchangeDefender 4.x. We will also make another offering available immediately for those of you who have a critical need for ExchangeDefender reports.
March 2, 2008
The following is an update on the extended network maintenance on the ExchangeDefender networks, spanning both the ExchangeDefender Exchange Hosting network and the ExchangeDefender SMTP Security network. The work continues; full details on the changes, upgrades and enhancements will be published Monday:
Completed network maintenance tasks:
- Full implementation of split queue and RBL prioritization and resource weights completed.
- Throttle control of delivery queues completed.
- Sender watermarking completed.
- New AS engine and SPAM policy handling completed (99.1% effectiveness, 0 false positives against the new rules, lowest SPAM complaint rate in the history of the product, woohoo!)
- Slave server database management tools and self-healing systems online.
- Additional ExchangeDefender nodes online in Dallas and Los Angeles.
- Load balancer enhancements including traffic shaping of spamming networks through the ExchangeDefender staging systems for R&D completed.
- Enhanced logging for NDR, Reject, Malformed Headers, Data Format Errors and more.
- Network latency problems fixed.
Remaining tasks for Sunday:
- Microsoft Exchange 2007 SP1 rollout
- ExchangeDefender split queue delivery handlers with skipahead enabled
- Provisioning and testing of EU LiveArchive
On Wednesday (1 AM to 3 AM CMT, 6:00 to 9:00 GMT) we will be provisioning the new reporting server infrastructure and fully integrating it into the ExchangeDefender core network and systems. This will allow us (and you) to directly manage the behavior of each filter at a granular level.
More on this on Monday.