ExchangeDefender eliminates Image SPAM

Spam wars tend to evolve over time. Initially, SPAM looked just like the offers you still find in your fax machine: direct, informative, actionable. You were all but pushed to buy something through countless incentives, promotions and reminders of just what a great deal you were getting. We eliminated the bulk of this years ago through Bayesian analysis of the text patterns found in SPAM messages. Notice how, when you get a spam message, you can tell within seconds that it is garbage just by glancing at its formatting?

The second evolution of SPAM came when it became convenient to make a purchase. You were no longer given a sales pitch; you were simply asked to click on a link and buy that latest watch or drug. We eliminated those easily through the use of URIBL (blacklists of specific URLs, i.e. web site addresses) and additional HTML analysis.
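To illustrate the URL blacklist idea, here is a minimal Python sketch. It is not ExchangeDefender’s implementation: the blacklist contents, the function names and the rule of matching parent domains are all assumptions made for the example, and a real deployment would query a maintained blacklist service rather than a hard-coded set.

```python
import re
from urllib.parse import urlparse

# Hypothetical blacklist entries, for illustration only.
BLACKLISTED_DOMAINS = {"cheap-watches.example", "pills.example"}

# Deliberately simple pattern: anything that starts with http:// or https://
# and runs until whitespace, a quote or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_domains(body):
    """Return the set of host names found in URLs inside the message body."""
    domains = set()
    for url in URL_PATTERN.findall(body):
        host = urlparse(url).hostname
        if host:
            domains.add(host.lower().rstrip("."))
    return domains


def is_blacklisted(host, blacklist):
    """True if the host or any of its parent domains is on the blacklist."""
    parts = host.split(".")
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))


def contains_blacklisted_url(body, blacklist=BLACKLISTED_DOMAINS):
    """True if any URL in the body points at a blacklisted domain."""
    return any(is_blacklisted(host, blacklist) for host in extract_domains(body))
```

A message body containing `http://www.cheap-watches.example/buy?id=1` would be flagged here because its parent domain is on the sample list, while a message with no links, or with links to unlisted domains, would pass.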

The latest evolution of SPAM has been the most difficult to isolate by far. You’ve seen dozens of these in your inbox nearly every day: Image SPAM. The email is very easy to characterize: it has a big GIF or JPEG image followed by paragraphs of garbage text. At first, there was just an image, which contained the same text that used to be a part of the SPAM you’ve been receiving for years. Because that text was stored in an image, it bypassed all SPAM filters. Fine, we easily discarded messages that contained no text. Then spammers started adding text. No problem, we eliminated those by calculating the ratio of the screen taken up by text vs. image. Think about it: how often do you get an email message that starts with an image that takes up most of your screen? Easy solution. But following the natural evolution of the spam war, image spam became harder and harder to detect.
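The text-versus-image heuristic described above can be sketched roughly as follows. This is only an approximation under stated assumptions: it compares decoded image bytes against text characters rather than actual screen area, and the function names and the threshold of 50 are hypothetical values chosen for the example.

```python
from email.message import EmailMessage


def image_to_text_ratio(msg: EmailMessage) -> float:
    """Ratio of image bytes to text characters across all parts of a message."""
    text_chars = 0
    image_bytes = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        payload = part.get_payload(decode=True) or b""
        maintype = part.get_content_maintype()
        if maintype == "text":
            text_chars += len(payload.strip())
        elif maintype == "image":
            image_bytes += len(payload)
    # Avoid dividing by zero for messages that contain no text at all.
    return image_bytes / max(text_chars, 1)


def looks_image_heavy(msg: EmailMessage, threshold: float = 50.0) -> bool:
    """True if the message is dominated by image content (assumed threshold)."""
    return image_to_text_ratio(msg) >= threshold
```

A message with a two-character body and a multi-kilobyte image attachment would be flagged, while a plain text message has a ratio of zero and passes.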

We have finally come up with a set of solutions that effectively eliminates nearly all known strains of Image SPAM:

  • All inline JPG and GIF images are OCR’ed. By using optical character recognition we can convert the text in the image into plain text and determine whether it is SPAM or not.
  • JPG and GIF image info is parsed. Pictures taken with a camera typically carry a series of image attributes, such as the camera maker, model, F-stop, max aperture and so on. Dynamically generated image spam does not.
  • Finally, we have spent the past month developing an image footprint database.
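As an illustration of the image info check in the list above, the following Python sketch tests whether a JPEG carries an Exif metadata segment, which is where camera attributes such as maker and model are stored. It is a simplified example, not the product’s parser: the function name is hypothetical, it covers JPEG only, and it checks only that the segment exists without reading individual attributes. Many legitimate images (screenshots, edited pictures) also lack Exif data, so a missing segment is best treated as one signal among several.

```python
import struct


def has_exif_segment(jpeg_bytes: bytes) -> bool:
    """Return True if the JPEG data contains an APP1 segment tagged as Exif."""
    if not jpeg_bytes.startswith(b"\xff\xd8"):  # SOI marker: not a JPEG
        return False
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            return False  # expected a marker here; data is malformed
        marker = jpeg_bytes[pos + 1]
        if marker == 0xFF:  # padding byte before a marker
            pos += 1
            continue
        if 0xD0 <= marker <= 0xD7 or marker == 0x01:
            pos += 2  # standalone markers carry no length field
            continue
        if marker in (0xDA, 0xD9):
            return False  # start of scan / end of image: header segments are over
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        if length < 2:
            return False  # the length field counts its own two bytes
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False
```

The function walks the header segments in order and stops at the first Exif segment or at the start of the compressed image data, so it never needs to decode the picture itself.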

The image footprint database is something exclusive to ExchangeDefender. We strip the images from known SPAM messages caught by our honeypot (public email addresses that exist only to collect SPAM) and store them in a database. We then run analysis on them and compare all new incoming messages against the known samples of SPAM.
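The general shape of such a lookup can be sketched in Python as follows. How the actual footprint is computed is not described here, so this example makes a loud assumption: it uses a SHA-256 hash of the raw image bytes, which only matches byte-identical copies of an image. The class and function names are hypothetical, and an in-memory set stands in for the database.

```python
import hashlib
from email.message import EmailMessage


class ImageFootprintDB:
    """In-memory stand-in for a database of known spam image footprints."""

    def __init__(self):
        self._known = set()

    @staticmethod
    def footprint(image_bytes: bytes) -> str:
        # Assumption: the footprint is an exact hash of the image bytes.
        return hashlib.sha256(image_bytes).hexdigest()

    def add_spam_image(self, image_bytes: bytes) -> None:
        """Record an image harvested from a known spam message."""
        self._known.add(self.footprint(image_bytes))

    def is_known_spam(self, image_bytes: bytes) -> bool:
        return self.footprint(image_bytes) in self._known


def message_has_known_spam_image(msg: EmailMessage, db: ImageFootprintDB) -> bool:
    """True if any image part of the message matches a stored footprint."""
    for part in msg.walk():
        if part.get_content_maintype() == "image":
            payload = part.get_payload(decode=True) or b""
            if db.is_known_spam(payload):
                return True
    return False
```

In this sketch, images taken from honeypot messages would be passed to `add_spam_image`, and each incoming message would be checked with `message_has_known_spam_image`. Because the same image is broadcast many times, a single stored footprint can match every copy that arrives unchanged.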

OCRing images is very expensive in terms of processor cycles. But as expensive as it is for us to analyze each incoming message, it is even more expensive for the spammers to create a new image for each SPAM they send out. Instead, they create a single SPAM message that is then broadcast millions of times, and we’re ready for it!

Thank you for your continuing support of ExchangeDefender. As always, we’ll keep your mailbox clean for you.