As you may have noticed over the past few weeks, the SPAM levels have increased slightly. Unfortunately, even a slight increase in the SPAM levels as a percentage can result in getting a piece or two an hour as opposed to a piece or two a day. Yesterday we finally isolated the issue that was causing this thanks to a few of our partners and the new ExchangeDefender Outlook 2007 addin. We are still working on automating the distribution and monitoring of the new processes that will keep this from coming up again.
ExchangeDefender has multiple grids around the world. All grids use a central RBL distribution database that is centrally managed and monitored. Every grid has it’s own DNS caching servers that hold both the RBL data as well as our clients IP address information for delivery, routing and SPAM definitions. Since the latest update to our core distribution the DNS server performance has been flaky and would simply stop returning results. Because our RBL code is set to look for matches in the RBL zone the servers lack of response, or lack of correct response, means that the messages that were certainly SPAM were allowed to go through the less-restrictive SPAM scanning and unfortunately that contributes to 1-2% difference in the SPAM load and in some cases latency for nodes that are about to go into the shutdown/maintenance mode and are flushing out their queues. Because ExchangeDefender delivery queues run off the same DNS infrastructure (technical limitation) this compounds the problem and issues as the resolutions do not come from the primary (on-node) or secondary (on-grid) but a tertiary (central OWN NOC) DNS server.
What we have done so far is implementing a system that does local resolver check and restarts the DNS service if it is not returning proper data.
What we are currently working on is a monitoring system to centrally report the issues with the resolver latency (one of the things we currently do not measure) as the lookups have to skip to the secondary or tertiary systems.
We expect to have all the issues handled by the end of the weekend. From statistical breakdowns we know that the issue has not been widespread (only certain users would even have noticed the difference) and only about a dozen people have complained so far. Unfortunately for us, the people likely to notice are the people that get the most mail and the ones that likely love our product the most. We’ll get this one taken care of for you folks, thanks for your patience.