At approximately 4AM EST we noticed a failure in updates from one of our AV vendors. That failure produced a higher than expected number of virus matches, which ended up queuing a larger than normal number of messages. We have resolved the issue with the update and are currently re-processing all the mail that was quarantined over the past few hours.
Please stand by, we will deliver all mail.
Update: 7:29 AM EST: Nearly all the mail that was affected by the faulty AV update has been processed and has been dispatched to delivery queues. As of the previous update, all new mail has been delivered in realtime. It is important to note that we are only processing the backlog for the messages that did get trapped by the faulty AV update.
Update: 9:15 AM EST: 99% of the messages have been flushed out. By the time you read this posting all the mail will have been delivered. No mail was dropped during this period. If you experience further delays, please follow our deployment guide and support documentation. We find most delays are related to on-premise issues: improper firewall configuration, connection rate limiting (by far the most common) and other SPAM/malware scanning that does not properly whitelist ExchangeDefender systems.
We are continuing the “Case of Mondays” posts covering OWN infrastructure events we are working on so you can be more connected with Own Web Now. On behalf of the entire team, thank you for the nice emails you’ve sent to us regarding this. As a result of such tremendous feedback we will soon be opening a forum to continue the conversation throughout the week.
#1 DIY RBL: We are implementing new systems to help deal with the less and less responsive RBL departments at major service providers. As a result of the SPAM problem many larger carriers have implemented their own SPAM RBL systems, which have proven less than reliable in managing SPAM. Due to the volume of messages sent by ExchangeDefender we tend to fall towards the top of the statistical profile for some of these providers and we notice issues even when there are none. At this point we are investigating complaints about Comcast, Hotmail and AT&T issuing random errors, though we are unable to replicate the issue and their offices are either nonexistent (unattended web form) or don’t work during the weekends. We are working on this issue and hope the solution we are putting in place will work.
#2 Too Much SPAM: ExchangeDefender SPAM levels are still higher than usual. Over the past two weeks we have shifted to our own honeypot system in addition to the external subscriptions we receive from others. Our Outlook 2007 addin allows you to submit SPAM, which virtually guarantees that you will never see that SPAM again. It took a lot of experimentation and management to get this into ExchangeDefender and we are still not at 100% on its deployment, so you might see elevated SPAM levels, though they should be dropping off significantly already. If you are tired of SPAM, help us kill it: go to www.exchangedefender.com and click on New! Download ExchangeDefender Outlook 2007 Addin.
#3 Offsite Backup reports from Europe: there are still some inconsistencies with successful backup email reports in our European offsite backup network as we roll out the new control panels and a new OSB product. Failed, missed and setting change reports from our Europe grid are coming through fine, but successful backup reports are still not being sent to 100% of the customer base. A solution to this issue is expected today and we are working on a workaround.
Issue with successful job Offsite Backup reports in USA has been resolved.
Issues related to Exchange 2007 back pressure have been resolved.
Issues with BT have been resolved.
We will be adding an adaptive feature for outbound mail routing to address future #1 issues. This will be announced towards the end of the week and will require an adjustment to your SPF records if you use that technology. Because large ISPs do not use the same RBLs that the rest of us use, we have no ability to monitor them and proactively address issues. Going forward we will do manual monitoring and route mail through the IPs that do not appear to be affected by a single provider.
Tonight we will be rolling out master control panels for OWN services. This will give you full control over accounts and settings for our Microsoft hosting and Offsite backup services with centralized management. As with any software rollout there could be issues, so we will be staffing extra support to address them all.
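To illustrate the routing idea only, here is a minimal sketch of picking an outbound IP that a given provider does not appear to be blocking. The function name, the pool of addresses and the provider label are all hypothetical, and the blocked set stands in for whatever the manual monitoring produces; this is not the ExchangeDefender implementation.

```python
def pick_outbound_ip(pool, blocked_by_provider, provider):
    """Return the first outbound IP not flagged for this destination provider.

    pool                -- ordered list of outbound IPs (hypothetical values)
    blocked_by_provider -- dict: provider name -> set of IPs that provider
                           currently appears to reject (from manual monitoring)
    Returns None when every IP is flagged, so the caller can queue and retry.
    """
    blocked = blocked_by_provider.get(provider, set())
    for ip in pool:
        if ip not in blocked:
            return ip
    return None


# Example with documentation-range addresses (assumed, not real OWN IPs).
POOL = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
BLOCKED = {"provider-a": {"192.0.2.10"}}
```

Any IP added to such a pool would also have to be covered by the sender's SPF record, which is why the SPF adjustment is mentioned above.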
New Exchange grids coming online this week. We will be adding another Exchange 2007 cluster to the service which will require some documentation adjustments and modifications.
A number of users have reported inability to receive email. Sending email works fine but receiving does not. In multiple tests against multiple sites, both protected by ExchangeDefender and not, we have seen nearly 60% connection drops.
At this point we believe this is related to the latest Microsoft patches – and a simple reboot appears to clear out the issue of no port 25 connectivity. Don’t worry, ExchangeDefender is holding your email. It does not appear to be fixed by removing the patches and reapplying them.
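If you want to check your own server, a quick test along these lines can tell you whether port 25 is accepting connections and what fraction of attempts are dropped. This is a rough sketch: the function names are our own for illustration, and the host you pass in is whatever mail server you are testing.

```python
import socket


def smtp_port_open(host, port=25, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def drop_rate(results):
    """Fraction of failed attempts in an iterable of True/False results."""
    results = list(results)
    if not results:
        return 0.0
    return sum(1 for ok in results if not ok) / len(results)
```

Running smtp_port_open a number of times against the same server and feeding the results to drop_rate gives a figure comparable to the connection drop percentage quoted above.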
As our friend Susan Bradley just told me via IM: “Full moon passing”
Vlad Mazek, MCSE
CEO, Own Web Now Corp
As you may have noticed over the past few weeks, the SPAM levels have increased slightly. Unfortunately, even a slight increase in the SPAM levels as a percentage can result in getting a piece or two an hour as opposed to a piece or two a day. Yesterday we finally isolated the issue that was causing this thanks to a few of our partners and the new ExchangeDefender Outlook 2007 addin. We are still working on automating the distribution and monitoring of the new processes that will keep this from coming up again.
ExchangeDefender has multiple grids around the world. All grids use a single RBL distribution database that is centrally managed and monitored. Every grid has its own DNS caching servers that hold both the RBL data and our clients' IP address information for delivery, routing and SPAM definitions. Since the latest update to our core distribution, DNS server performance has been flaky and the servers would at times simply stop returning results. Because our RBL code looks for matches in the RBL zone, a server's lack of response, or lack of a correct response, meant that messages that were certainly SPAM were allowed through to the less restrictive SPAM scanning. That accounts for a 1-2% difference in the SPAM load and, in some cases, latency on nodes that are about to go into shutdown/maintenance mode and are flushing out their queues. Because ExchangeDefender delivery queues run off the same DNS infrastructure (a technical limitation), the problem is compounded: resolutions come not from the primary (on-node) or secondary (on-grid) DNS server but from a tertiary one (central OWN NOC).
What we have done so far is implement a system that performs a local resolver check and restarts the DNS service if it is not returning proper data.
What we are currently working on is a monitoring system that centrally reports resolver latency (one of the things we currently do not measure), since lookups have to skip to the secondary or tertiary systems.
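For readers unfamiliar with how DNS-based blocklists work, the sketch below shows the general technique: the sender's IPv4 octets are reversed and looked up as a name inside the RBL zone. It also shows why a silent resolver is dangerous, since "no answer" can easily be mistaken for "not listed". The zone name and the decision labels are hypothetical; this is not our production code.

```python
import ipaddress


def rbl_query_name(ip, zone):
    """Build the DNS name to query for an IPv4 address in a DNSBL-style zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.strip(".")


def classify(lookup_result):
    """Map a lookup outcome to a scanning decision (labels are illustrative).

    lookup_result is "listed", "not_listed", or None when the resolver
    returned nothing at all.
    """
    if lookup_result == "listed":
        return "reject"
    if lookup_result == "not_listed":
        return "scan"
    # A missing answer is kept distinct so it is not silently treated as
    # "not listed", which is the failure mode described above.
    return "resolver_failed"
```

For example, rbl_query_name("192.0.2.10", "rbl.example.invalid") yields the name 10.2.0.192.rbl.example.invalid.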
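In rough terms, a check of this kind can be sketched as follows. The probe name, the expected address and the restart action are placeholders you would supply yourself; the actual restart command depends on the DNS software and platform, so it is passed in as a function here rather than assumed.

```python
def resolver_healthy(resolve, probe_name, expected_ip):
    """True only if the resolver answers the probe with the expected address."""
    try:
        return resolve(probe_name) == expected_ip
    except OSError:
        # socket.gaierror and socket timeouts are both subclasses of OSError.
        return False


def check_and_restart(resolve, restart, probe_name, expected_ip):
    """Run one health check; call restart() if the resolver looks broken."""
    if resolver_healthy(resolve, probe_name, expected_ip):
        return "ok"
    restart()
    return "restarted"
```

With the standard library, socket.gethostbyname can serve as the resolve function, and restart can wrap whatever service command applies on the node. Note that a wrong answer is treated the same as no answer, matching the "not returning proper data" condition.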
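As a sketch of the kind of measurement involved, the function below tries each resolver tier in order and records which tier finally answered and how long the whole lookup took. The tier labels mirror the primary/secondary/tertiary layout described earlier; everything else (function name, the shape of the resolver list) is an assumption made for illustration.

```python
import time


def resolve_with_fallback(resolvers, name, clock=time.monotonic):
    """Try each (tier, resolve) pair in order.

    Returns (tier, answer, seconds) for the first tier that answers, or
    (None, None, seconds) if none do. seconds includes time lost on the
    tiers that failed, which is the latency worth reporting centrally.
    """
    start = clock()
    for tier, resolve in resolvers:
        try:
            answer = resolve(name)
        except OSError:
            continue  # this tier gave no usable answer; fall through
        return tier, answer, clock() - start
    return None, None, clock() - start
```

A result whose tier is anything other than "primary" is exactly the case the monitoring would want to flag.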
We expect to have all the issues handled by the end of the weekend. From statistical breakdowns we know that the issue has not been widespread (only certain users would even have noticed the difference) and only about a dozen people have complained so far. Unfortunately for us, the people likely to notice are the people that get the most mail and the ones that likely love our product the most. We’ll get this one taken care of for you folks, thanks for your patience.
As part of a continued effort to keep you completely in the loop on all the events at Own Web Now, we will start a new post series called “Case of Mondays” where we discuss ongoing issues with the network, any changes being made over the weekend, maintenance schedules to keep the systems running, and any issues we are working on that might come up.
We are still working on British Telecom, which is rejecting messages with Relaying Denied. In the meantime please create a separate SMTP connector and attempt to deliver messages to them directly. We are still working with them on a daily basis to resolve the issue but their response so far has not led to a solution.
ExchangeDefender Outlook 2007 Addin is being improved towards 1.1 and fixes are being rolled up as well. Currently most issues reported have to do with 64-bit Vista. Other issues are noted in the documentation (for example, do not install out of a zip file, do not install from a UNC/network share).
Shockey Monkey bugfix update is scheduled for this Saturday, 10/18/2008, with changes to the reports and email notification upgrades.
Offsite Backup reports are now functioning 100% across the client base. The issue with some reports (backup pass/fail) has been resolved by AhSay with the upgrade to 5.5.1. You do not need to upgrade your agents to take advantage of this fix; the issue was on the servers, not the agents.
ExchangeDefender email reports issue has been fixed. There was a brief sync issue between the report servers and admin servers which resulted in delayed updates to the configuration changes. This has been addressed but we are also working on a completely different algorithm that can compensate for SQL replication issues.
Exchange 2007 backpressure problems have been solved by removing the feature on all servers.
Tuesday/Wednesday nightmares continue this week with our Microsoft network. We are expecting 11 updates this week that we will be rolling out within 24 hours of successful testing. Although none of the updates affect the server-side of the network (all are related to Internet Explorer and MBSA) we have a policy of keeping our servers up-to-date and will be applying the patches silently.
We have still not completed our evaluation of Exchange 2007 SP1 Update Rollup #4. The problems that we have reported still exist in the new code and at this point we do not see a reason to upgrade. Here are the outstanding issues we are tracking with Exchange 2007:
- iPhone sync crashes store
- Mailbox database dismounts without event log notification, service remains in running state.
- Store service hangs.
- Backpressure reports invalid data and does not respect threshold definitions.
We have received several reports of issues with BT. You may receive this error when sending messages to btinternet.com recipients.
The e-mail system was unable to deliver the message, but did not report a specific reason. Check the address and try again. If it still fails, contact your system administrator.
< outbound2.exchangedefender.com #5.0.0 SMTP; 554 <email@example.com>: Relay access denied>
We have notified BT by e-mail and phone regarding the issue; the problem is on their end. Since this is a configuration issue on the BT network we have no ETA, no resolution time and no idea of what may be going wrong.
We are currently tracking issues that have been reported by multiple users:
- Email reports for ExchangeDefender SPAM quarantines are not being delivered to the users that have been configured to receive them. So far we have narrowed it down to the 00:00 EST time reporting interval for daily reports. We will know more about this around midnight.
- Offsite Backup reports are not reaching some clients. We are working with AhSay to isolate the issue and will likely be applying a hotfix later in the day. This is not a widespread issue either but we are taking it seriously since it has been reported multiple times.
We will update as we get more information.
Over the past week or so we have been tracking sporadic complaints about mail delivery delays to Exchange 2007 hosting. It appears that in rare cases the Exchange 2007 network temporarily rejected the message with a 4.3.1: Insufficient system resources message even though plenty of resources were available on the system to process the message and plenty of storage was available on the volume.
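The distinction between a temporary rejection like this one and a permanent failure comes down to the first digit of the SMTP reply code: 4xx means try again later, 5xx means the message bounces. A minimal sketch of that classification is below. The function name and the return labels are our own for illustration, and the 452 prefix in the example is an assumption (the text above only gives the 4.3.1 enhanced status).

```python
def classify_smtp_reply(line):
    """Classify an SMTP reply line by the first digit of its three-digit code."""
    code = line.strip()[:3]
    if len(code) != 3 or not code.isdigit():
        return "unknown"
    return {
        "2": "accepted",      # positive completion
        "3": "continue",      # positive intermediate, more input expected
        "4": "retry_later",   # transient failure: sender should queue and retry
        "5": "bounce",        # permanent failure: sender should not retry
    }.get(code[0], "unknown")


# Illustrative reply lines (the 452 prefix is assumed, see above).
DEFERRED = "452 4.3.1 Insufficient system resources"
REJECTED = "554 Relay access denied"
```

This is why messages hit by the insufficient resources response are delayed rather than lost: the sending server keeps them queued and retries.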
We have seen this issue crop up from time to time on multiple Exchange 2007 systems and have provided information to Microsoft regarding a possible bug. So far we have been unsuccessful in getting a response, because the issue is so sporadic that it cannot be pinned down to a single factor and only a small number of messages are affected. What is even more frustrating is that even during heavy maintenance and optimization tasks the issue does not crop up where back pressure controls would be expected to trigger it. This Sunday we ran a database optimization maintenance process and out of roughly 1,000 messages processed fewer than 20 were deferred with the Insufficient system resources error, even though CPU utilization was above 80%.
Due to the level of complaints and our inability to isolate this issue with Microsoft we have disabled the back pressure feature of Microsoft Exchange 2007 on all our systems until further notice.
Further details about Exchange 2007 and back pressure are available at Microsoft TechNet.
Note: This issue is isolated to the new feature in Exchange 2007 called back pressure. Exchange 2003, ExchangeDefender and ExchangeDefender LiveArchive are not affected by this issue.