We currently believe that Greylisting (and its derivatives) together with SPF are the most effective techniques to fight the ever-rising tide of random incoming SPAM. DKIM represents another approach that a mail sender may use, combined with some reputation system, to authenticate that outgoing mail from a particular source is legitimate.
It is estimated that over 15 billion SPAM messages are sent every day. Some days it used to feel like they all arrived in our mailboxes.
As the volume of spam rises, anti-spam tools and content filters are becoming increasingly aggressive, to the point where the number of false positives is perilously high. We probably all know of at least one incident where genuine email either got stuck in a spam folder or was silently discarded as junk.
The problem in fighting spam is finding a cure that is not worse than the disease.
We have reviewed and rejected some potential solutions:
Black Lists: We refuse to implement a Black List because we feel it can too easily penalise legitimate mail while doing very little to stop SPAM - your SPAM-clogged mailboxes are witness to the near-total ineffectiveness of Black Lists. Having been the unwitting victim of a blacklisting - which took less than 2 hours to fix once brought to our notice, but took two years for the effects to mostly disappear (we still have one residual effect 5 years later) - we feel the average implementation is not production quality. And have you ever tried to contact a mail administrator at a busy email vendor about fixing a dumb blacklist entry that is years out of date (and those are the good implementations)? On its own, Black Listing is a fatally flawed technique. In combination with other techniques (like using a reputable blacklist source and updating it frequently) it can add value.
Incoming Mail SPAM Filters: It is not up to us, nor should it be, to decide what constitutes SPAM and what does not. One person's legitimate mail may be another person's SPAM and vice versa. While not wishing to demean the quality of spam-filtering software, especially the newer generation of Bayesian-based filters, the technology relies on inspection of the mail content. This is a very subjective matter and will inevitably lead to false positives, which is why most such systems place suspected spam in a special folder. You still have to check this material - much of it profoundly offensive. How effective is that?
Finally, the key point. Neither of these techniques has affected the rate at which SPAM can be generated by the bad guys. They do not hurt spammers, they simply inoculate the receiving site against the effects of the bad guys. Indeed, in the case of SPAM filters, the good guys pay all the penalty. SPAM filters are serious users of CPU resources and, paradoxically, use the most resources when they pass good mail, since it must, by definition, pass all the CPU-intensive filtering tests. Bottom line: what have all these passive on-site tests done to help keep the wider world SPAM free? Nothing. Absolutely nothing.
So now let's get positive and look at what can be done.
The economics of SPAM are at best marginal. Any attempt to make those economics worse is bound to work against the spammers. This, seemingly trivial, insight has profound consequences in fighting spam and has led to a whole new battery of techniques including Greylisting (credited to Evan Harris in this 2003 article).
In practical terms, this insight means that doing anything which causes the use of additional resources will disproportionately affect spammers. And have the happy side-effect of reducing their capacity to send out any spam. Greylisting was the first of a family of techniques that have the following broad characteristics:
Cause more effort to be expended by the spammer, leaving them with fewer resources to hit the good guys.
Require tighter compliance with the specifications. A Good Thing™. Most zombie mailers are, at best, trivial or marginal implementations of the mail specifications. Simply rejecting mail that is not in full compliance is surprisingly effective.
Limit the rate at which spammers can send email, and thereby the volume of spam that can be sent in any given period.
Take more time to respond when mail comes from a known - or even suspected - spam source (here Black Lists can play a serious and useful role with no downside risk). Implementations using this approach are generically referred to as tar-pit techniques, which conjures up a wonderful image of all movement slowing down. Rather than doing the obvious thing - immediately rejecting a suspected SPAM source, as most software using Black Listing and similar techniques does - tar-pit software does exactly the opposite. It takes a long, long, long time to send back the rejection (or any) message, with the maximum delay between each character. The bad guys are stuck until the last character arrives (even if they decide the receiver is a tar-pit, it takes them serious time to figure that out). And while a SPAM source is stuck communicating with a tar-pit enabled system, it can't be sending SPAM to someone else. Limiting the bad guys to help all the good guys.
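The tar-pit idea above can be sketched in a few lines. This is a minimal illustration, not any particular tar-pit implementation: the delay value and the SMTP banner text are illustrative assumptions, and a production tar-pit would sit inside a real SMTP server.

```python
import socket
import time

# Illustrative delay between characters; real tar-pits tune this to
# maximise the time a suspected spam source stays connected.
TARPIT_DELAY = 10  # seconds per character (assumption, not a standard)

def tarpit_reply(conn: socket.socket, line: str, delay: float = TARPIT_DELAY) -> None:
    """Send an SMTP reply one byte at a time, pausing between bytes.

    The peer cannot even parse the reply code until the whole line
    (terminated by CRLF) has arrived, so it is stuck waiting.
    """
    for ch in (line + "\r\n").encode("ascii"):
        conn.sendall(bytes([ch]))
        time.sleep(delay)

def serve_tarpit(host: str = "0.0.0.0", port: int = 2525) -> None:
    """Accept connections and dribble out a temporary-failure reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(5)
        while True:
            conn, _addr = srv.accept()
            with conn:
                # A 4xx reply is a temporary failure, so a compliant
                # sender will also come back later - and wait again.
                tarpit_reply(conn, "450 4.7.1 Please try again later")
```

The key design point is that the cost is asymmetric: the tar-pit spends almost nothing (a sleeping socket), while the sender's delivery slot is tied up for the full duration.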
Greylisting is currently the most highly developed, and resource-light, of these techniques and is implemented on our server (using postgrey) where it has had a dramatic effect. Currently over 90% of the SPAM load has gone. Period. Not a SPAM filter in sight. No false positives.
Greylisting looks absolutely terrifying at first glance and works like this:
Every time the mail server sees an email it constructs a unique triplet consisting of the sender's email address, the recipient's email address and the sending mail server's IP address. If the mail server has never seen this triplet before it stores the information in a database - and then discards the email with a temporary failure message. Yes. It throws the email away, without looking at its content, and will not allow it to be retransmitted for a small period of time (a blackout period) lasting perhaps two minutes, normally configurable by the mail server operator.
The email RFCs specify that compliant mail servers MUST retry under temporary rejection conditions using some form of delay back-off algorithm. Legitimate mail servers will retry automatically, normally in 5 to 15 minutes. The sender of the email is not involved in the process and sees no effect of this temporary rejection/retry policy. Spammers may also retry, but typically do so immediately and get caught in the blackout period. In any case, spammers have little incentive to retry because it consumes more resources. A marginal business just got more marginal. And a single retry operation has just reduced the spammer's total capacity by 50%. You read that number right - 50%. Not too shabby for a day's work.
Once the retried mail has been received, normally after the 5 to 15 minute retry delay, the mail server marks the mail source as valid and will not throw away any more email from it for a period of time (4 to 6 weeks, usually configurable by the email operator). The whole process is self-regulating.
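The triplet, blackout and trust-window mechanism described above can be sketched as follows. This is a minimal in-memory illustration (not postgrey itself); the blackout and whitelist windows are illustrative defaults standing in for the operator-configurable values mentioned in the text.

```python
import time

BLACKOUT = 120                      # 2-minute blackout (illustrative)
TRUST_WINDOW = 35 * 24 * 3600       # ~5 weeks of trust (illustrative)

class Greylist:
    """Toy greylist keyed on the (sender, recipient, source IP) triplet."""

    def __init__(self):
        self.seen = {}      # triplet -> time of first delivery attempt
        self.trusted = {}   # triplet -> time a valid retry was accepted

    @staticmethod
    def triplet(sender: str, recipient: str, ip: str) -> tuple:
        # Some implementations key on the /24 network rather than the
        # full IP, to cope with server farms that retry from different
        # addresses (the problem noted later in the text).
        return (sender.lower(), recipient.lower(), ip)

    def check(self, sender: str, recipient: str, ip: str, now=None) -> str:
        """Return 'accept' or 'tempfail' (SMTP 450) for this attempt."""
        now = time.time() if now is None else now
        t = self.triplet(sender, recipient, ip)
        if t in self.trusted and now - self.trusted[t] < TRUST_WINDOW:
            return "accept"            # known-good source, no delay
        first = self.seen.get(t)
        if first is None:
            self.seen[t] = now
            return "tempfail"          # never seen: reject temporarily
        if now - first < BLACKOUT:
            return "tempfail"          # retried too soon (typical zombie)
        self.trusted[t] = now          # legitimate retry after blackout
        return "accept"
```

Note that no mail content is ever inspected: the decision rests entirely on whether the sending server behaves like a compliant MTA.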
It all sounds too good to be true. And unfortunately it is. Problems can arise for legitimate email in three areas:
The first time mail is received from a new source it will be delayed by 5 to 15 minutes. Thereafter it will arrive normally. However, the delay period can affect emails for password resets or other security operations which, typically, may be time sensitive.
Some mail servers, typically through poor implementation, can take a long time (multiple hours) to retransmit - even though normal mail servers typically retry within 5 to 15 minutes.
Some very big mail server farms may use a different sending IP address with each retry - thus defeating the triplet mechanism.
There are a variety of implementation techniques that can both ameliorate the initial delays and solve the problems identified above:
Whitelists can be built to bypass checks from known good sources or domain names.
Many greylist implementations allow operators to set policies that will permanently whitelist senders after receipt of a number of emails.
The number of servers that have very long retries or use different sending IP addresses is gradually being discovered and global whitelists are emerging. However, since spammers could simply fake addresses from a whitelisted source, it is important that this technique is used in conjunction with SPF, which can then catch this abuse. A classic 1-2 punch. No third strike required here.
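The whitelist-plus-SPF combination above can be sketched as a single policy check. This is a hypothetical illustration: the whitelist entries are examples only, and the SPF verdict is passed in as a parameter because a real check requires DNS lookups (typically via an SPF library such as pyspf) rather than pure logic.

```python
# Illustrative global whitelist of domains exempted from greylisting.
# Entries here are examples, not a recommendation.
GLOBAL_WHITELIST = {"gmail.com", "yahoo.com"}

def bypass_greylisting(sender: str, spf_result: str) -> bool:
    """Skip greylisting only for whitelisted domains that also pass SPF.

    spf_result is the verdict from a real SPF check ('pass', 'fail',
    'softfail', 'none', ...); it is an input assumption here.
    """
    domain = sender.rsplit("@", 1)[-1].lower()
    if domain not in GLOBAL_WHITELIST:
        return False               # unknown domain: greylist as usual
    # A forged sender claiming a whitelisted domain fails SPF and gets
    # no free pass - this closes the loophole described in the text.
    return spf_result == "pass"
```

The design point is that the whitelist never acts alone: it only relaxes greylisting once SPF has confirmed the sending IP is authorised for the claimed domain.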
Anti-spam is increasingly not a single technique, but rather a battery of techniques. Serious work is being done in the area of email authentication (DKIM, from Yahoo and others) and further techniques are at the idea stage. Sure, the spammers will fight back, but if the problem can be made manageable then we have made progress. Maybe.
We feel very self-righteous because our Anti-Spam strategy is helping you. Now let's talk about what your Anti-SPAM technique is doing to help us......
Problems, comments, suggestions, corrections (including broken links) or something to add? Please take the time from a busy life to 'mail us' (at top of screen), the webmaster (below) or info-support at zytrax. You will have a warm inner glow for the rest of the day.