The Spam Problem: Moving Beyond RBLs

Summary: Alternatives to Realtime Blackhole Lists (RBLs) should be actively deployed because of serious well-known problems with the RBL spam filtering technique.

Table of Contents

Disclosures
Intentions
Definitions
Organizations and individual users
History and overview of the RBL architecture
Common arguments and justifications
Problems with RBLs
Other anti-spam systems
Are open relays necessary anymore?
Hard numbers
Properties of a real solution
Conclusion
Resources
Contact

Disclosures

I am not a spammer and never have been. I do not support spammers and never have done so. I believe that spam is a first-class annoyance and ought to be stopped somehow. One of my servers is listed on an RBL in spite of the fact that no spam has ever passed through it (I know all of the users of this particular system personally). The reason for this RBL listing is that the entire netblock that houses my server is blacklisted. What? The entire netblock? Yes, the entire thing. I represent Collateral Damage in some rather vigilant person's effort to change my ISP's opinions about the best way to combat spam.

Intentions

People have strong views about spam. I intend for this to be an accurate, even-handed and balanced discussion of RBLs. Please do not contact me with death threats, flames and the like, because I will ignore them. If you disagree or can provide further information on the subject of RBLs, please contact me with your opinion, supporting or refuting evidence, your own personal stories, etc. and I will endeavor to make any necessary corrections to this document.

I hope after reading this paper that, at least, you will understand what an RBL is on a deeper level than "a simple spam blocking mechanism", which it really is not. These mechanisms have implications and their architecture creates power and renders people subject to this power. As such, everybody subject to RBLs should question and understand them.

I believe that all of the ideas that I put forth here are consistent with the Electronic Frontier Foundation's public statement on the use of RBLs as tools to combat spam.

Definitions

Spam or UCE (Unsolicited Commercial Email)

Typically, when we refer to spam, we also imply bulk email, which is sent to many users at the same time. We call it unsolicited because the user does not have a relationship with the sender. We differentiate spam from "unwanted marketing emails" in that the latter may be sent to a user, for example, after making a purchase with a particular vendor. Beyond the bounds of this definition, we can argue that things start to become unclear for various reasons (e.g. the sender refuses to unsubscribe the recipient from future mailings, etc.). Specifically, emails sent from computer viruses (e.g. Klez) do not fall into the definitions used within this document. Classic examples of email that does fall into this definition are the anonymous garbage emails that you routinely delete each morning that declare, "Mortgage rates have dropped!" and the like.

The State of California, USA has a definition that is nicely summarized by FindLaw in this article (PDF file):

The statute defines "unsolicited e-mail documents" as "any e-mailed document or documents consisting of advertising material for the lease, sale, rental, gift offer, or other disposition of any realty, goods, services, or extension of credit" when the documents (a) are addressed to recipients who do not have existing business or personal relationships with the initiator and (b) were not sent at the request of or with the consent of the recipient. (§ 17538.4, subd. (e).)

It became obvious to me while talking to people about this paper that even coming to a consensus about what the term spam means is not the straightforward task that you might initially think it is. Rather than focus on boundary conditions, I am focusing on the 98+% of the emails sitting in my spamtrap folder that meet the criteria which I have outlined because they exhibit characteristics of "classic" spam.

For balance, please see an RBL operator's definition of spam and compare it with your own experience.

Open Relay

A mail server that is configured to relay mail from end users to any destination address is known as an open relay. A spammer may discover that Company A left their mail server, smtp.company-a.com, in a vulnerable state. He may then be able to send tens of thousands of messages from his desktop computer on a dial-up modem using some software that connects to the SMTP service on smtp.company-a.com which, in turn, accepts the mail for delivery to thousands of users whose domains' mail exchangers reside on potentially many other hosts across the net.

Keep in mind that not all emails passing through an open relay are necessarily spam. It is very possible (perhaps even the normal case) that an open relay is used to deliver regular "good" emails as well as spam messages. This is an important point to remember because an RBL does not focus on the contents of the message but rather the server that the mail passes through in the process of delivery to the intended recipient.

Closed Relay

A closed relay is a mail server that is configured to accept mail only when it meets certain criteria, such as the destination address or the originating IP address. Using the example of Company A above, the mail administrator may allow the SMTP service to accept a message for delivery only if it meets one of these conditions:

Messages have to originate from hosts on Company A's network OR
Messages have to be addressed to user@company-a.com OR
Messages have to be addressed to user@another-allowed-domain.com

There may be many other cases where relaying is allowed, but these are prototypical configurations that are commonly used to allow an organization's users to send messages to anybody and to only allow external users who are trying to relay through smtp.company-a.com to send messages to users at Company A.
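The three conditions above can be sketched as a small decision function. This is only an illustration: the network range and the function names are assumptions made up for the example, not anything taken from a real mail server configuration.

```python
import ipaddress

# Hypothetical values for Company A; the network is a documentation range.
LOCAL_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
LOCAL_DOMAINS = {"company-a.com", "another-allowed-domain.com"}


def closed_relay_accepts(client_ip: str, recipient: str) -> bool:
    """Return True if a relay following the three rules would accept the message."""
    addr = ipaddress.ip_address(client_ip)
    # Rule 1: the connecting host is on Company A's own network.
    if any(addr in net for net in LOCAL_NETWORKS):
        return True
    # Rules 2 and 3: the recipient is in a domain this server handles mail for.
    domain = recipient.rpartition("@")[2].lower()
    return domain in LOCAL_DOMAINS


def open_relay_accepts(client_ip: str, recipient: str) -> bool:
    """An open relay accepts everything, regardless of origin or destination."""
    return True


print(closed_relay_accepts("203.0.113.5", "user@company-a.com"))   # outsider, local recipient
print(closed_relay_accepts("203.0.113.5", "someone@example.org"))  # outsider, outside recipient
```

The second call is the case that matters for spam: an outside host asking the server to pass mail on to an outside address. A closed relay refuses it, while an open relay accepts it.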

RBL - Realtime Blackhole List / Relay Blocking List

I think there are two definitions here. The definition I am proposing is:

A system for arbitrarily rejecting email messages (spam or otherwise) based on an unknown entity's unknown criteria

Please read the rest of this document before deciding for yourself. Although I am not an RBL operator and do not intend to speak for all RBL operators, I believe that a definition acceptable to an RBL operator would be:

A list of servers which send out spam or are known to be open relays

This is fundamentally different. An RBL is a list. People who administer mail systems choose to subscribe to lists such as the RBL, presumably in order to block spam mail. Most mail system administrators also assume that they are blocking only spam, which is not true. RBL operators do not promise accuracy and frequently they say that their lists are not intended for mail blocking or suitable for anything. This is standard warranty language, though, so it does not really represent anything unusual in that regard. Some RBL operators misuse their positions of power and knowingly block open relays which have never sent spam, but could be used to do so. Other RBL operators also block websites for reasons unrelated to spam such as disagreements based on certain ideas relating to spam.

As I said above, I encourage you to read the rest of this document before you make up your own mind, rather than just accepting what people tell you (whether that comes from me or an RBL operator).

Organizations and Individual Users

Different types of people are affected by RBL usage. An organization's mail administrator may choose to install some anti-spam mechanisms to make his or her job easier and to reduce the amount of time the organization as a whole wastes in manually filtering and deleting unwanted messages. These mail system administrators are important because they inherently make choices that affect their entire user base ranging from upper management to the secretary at the front desk. Typically, an organizational mail system administrator has authoritative control over the relationship they have with their end users.

Individual users (i.e. home users with personal email addresses received from their ISP) are a different story. These users typically pay for service from a DSL, dialup or cable modem provider or perhaps they use a free web-based email service such as those offered by Hotmail or Yahoo! An individual user has the authoritative position in this relationship because he or she can acquire a new email address from another email service, absorb the switching costs and use that instead.

As you read this document, you will likely fall into both categories. You probably have an email address at home and at work. You control your personal email account in that you can stop using it and switch to a new one at any time while control of your work email account is delegated to your organization's mail system administrator who makes these choices for you.

History and Overview of the RBL architecture

Several years ago, mail system administrators noticed that a disproportionately high number of spam emails were originating on open relays. So they started blocking inbound mail from these open relays as they noticed cases of abuse. On each case of abuse, they would sometimes send a test message to themselves via the open relay. If they received the test message in their mailbox, they could deduce with some level of confidence that the system in question was an open relay since they could apparently send messages through it to anybody. Sometimes, the mail administrator would contact the owner of the open relay and ask them to close it. If the relay owner refused or did not respond to the request, the mail administrator would then blacklist the server as a known source of spam run by an "irresponsible" manager.

Then the mail sysadmins started sharing lists manually until one day someone created a distributed and automated system for sharing these lists. This type of system is called an RBL (Realtime Blackhole List). As people reported the IP addresses of possible open relays that were possibly - or in fact - being used to send out spam messages, the system would send a test message to itself via the mail server on the IP address in question. If the RBL system received the message, the IP address in question would be added to the blacklist.

At this point, other mail system administrators who had plugged the RBL filter into their mail delivery chain would automatically start rejecting messages sent from these alleged open relays. The level of spam messages that they received dropped dramatically and so they were happy because their workload was reduced, their mail server’s load was reduced and the amount of annoying spam that their users received dropped.
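This document does not spell out how a mail server consults the list, but many blacklists are published through DNS (the section on content scanners refers to "a simple DNS lookup against an RBL database"). The usual convention for such lists is to reverse the octets of the connecting IPv4 address, append the list's zone, and treat any answer as "listed". The sketch below assumes that convention; the zone name rbl.example.org is a placeholder, not a real list.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name conventionally queried for an IPv4 address on a DNS-based list."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers for this address (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # The lookup failed, most commonly because the name does not exist,
        # which this sketch treats as "not on the list".
        return False


print(dnsbl_query_name("195.72.39.250", "rbl.example.org"))
```

Note that the decision is made from the connecting address alone, before any message content has been seen.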

Sometimes, the owners of these open relays found out that other people were blocking their mail or that their system was unwittingly being used to send out spam messages. Some of these open relay owners decided to fix the problem by configuring their mail relay system so that it became "closed" such that only selected users could relay messages through it. As these open relays were configured to be closed relays, some of the RBLs removed the IP address of the "fixed" systems.

Around this time, disagreements happened, splinter groups formed, friendly competition and hackery took over and now we have probably 30 commonly used RBLs that are used by thousands of mail sysadmins in order to reduce inbound spam (see Resources for an abbreviated list). These RBLs differ in many ways, including the means by which relays are blacklisted, their policies for being removed from their blacklist, their technological performance, redundancy, and accuracy.

Common Arguments and Justifications

RBL operators commonly respond with certain arguments when they are questioned about why they provide their services. I wanted to try to summarize these points here for the curious reader. Please know that something that one RBL operator says does not necessarily apply to other RBL operators. If you are the owner of one of these sites that I have quoted and feel that the quote is taken out of context or misrepresentative, please contact me so we can find a solution.

Problems with RBLs

Network Effects and the Unscalable Nature of RBLs

If you had a mail system that rejected all inbound emails, you would not have a very useful mail system. Likewise, as RBLs grow in size and their use amongst mail system administrators increases, their value diminishes. If you block mail from a relay that sends thousands of spam messages to your organization, it may be considered a positive step in that you have reduced the amount of inbound junk email. However, what if this mail relay was also the relay used by a person important to a few of your users (e.g. a business partner, a close family relative, etc.)? Users will start asking their mail administrator why they can't receive mail from the person who legitimately uses the open relay. It is the hope of RBL operators that the feedback loop will eventually land on the desk of the owner of the open relay who will then be required to address the problem or face the ire of the inconvenienced users that he represents.

However, this process quickly becomes impractical and burdensome on the users who are ultimately inconvenienced by the RBL. Is it better to have a thousand spam messages blocked along with communications from a critical business contact than the alternative? Maybe. Maybe not. The answer depends on the circumstances of the situation and can really only be determined by the end users in that situation. Clearly, if an RBL operator were to put AOL's or Earthlink's SMTP servers onto their blacklist, the RBL operator's reputation would be questioned and the value of the RBL would be diminished because it would suddenly be blocking an enormous amount of legitimate email.

Because RBLs filter or block mail coming from certain network addresses and do not operate on an individual message basis, they rely on having a low false-positive ratio. A false positive is a legitimate email message that is treated as though it was spam due to its origin. If the number of false positives becomes too high, the value of the RBL is diminished and nobody will really care about the thousand blocked spam messages that they didn't get if they also cannot communicate with their co-workers, friends, and colleagues.
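To make the trade-off concrete, here is a small worked example. All of the numbers are invented for illustration; they are not measurements.

```python
def false_positive_ratio(legit_blocked: int, legit_total: int) -> float:
    """Fraction of legitimate messages that were rejected because of their origin."""
    if legit_total == 0:
        return 0.0
    return legit_blocked / legit_total


# Hypothetical day of inbound mail. Every message arriving via the blacklisted
# relay is rejected, good or bad, because the decision is made per address.
spam_from_relay = 1000
legit_from_relay = 40
legit_from_elsewhere = 960

ratio = false_positive_ratio(legit_from_relay, legit_from_relay + legit_from_elsewhere)
print(f"spam blocked: {spam_from_relay}, false-positive ratio: {ratio:.1%}")
```

In this made-up case the list stops a thousand spam messages at the cost of 4% of the day's legitimate mail. Whether that is acceptable depends on who sent those forty messages.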

Collateral Damage and Legitimate Users

Collateral damage is the term used to describe the predicament that unintended victims of RBL usage occasionally find themselves in. Consider a large ISP with thousands of customers on dialup accounts. Quite possibly, this ISP has hundreds of thousands of legitimate messages passing through its outbound SMTP server each day. If a few spammers can gain access to dialup accounts provided by the ISP, they can quite easily send out many spam messages before the ISP would have grounds for suspending their access to the dialup account. And it is a difficult problem to solve for ISPs providing dialup accounts, because their users can quite easily be located in foreign countries or may simply be using compromised usernames and passwords to get on the network via the dialup link.

Frequently, we see RBL operators growing impatient with the response time of the ISP in shutting down spammers or simply the volume of spam coming from this large ISP's SMTP server. So they drop the ISP onto the RBL that they operate and now the spam problem is solved. No more spam from this large ISP's mail server. Problem solved... at least, as far as the RBL operator's own needs are concerned.

However, what about all the other legitimate users of the ISP's services? Now, suddenly, their outgoing email messages are getting mysteriously rejected by all of the mail servers on the net that are using this particular RBL. The theory on the part of the RBL operator is that either this customer should pick up the phone and yell at the ISP until the spam problem is solved or they should find a more socially acceptable ISP that doesn't have as much of a spam problem.

This puts undue pressure on a potentially responsible ISP and causes a disproportionate amount of inconvenience on the part of the affected ISP's customers. Why are they being punished? Should they automatically have to shop for a new access provider, reconfigure their computer and inform everybody in their address book of the new email address provided by the new ISP? Large ISPs are almost always going to be immune from RBL operators. If an RBL operator were to put the SMTP servers of AOL, Earthlink, AT&T and a few other cable providers onto the RBL, the value of the filter would be reduced and many users would start wondering why they can no longer communicate with users at these large ISPs. RBL usage necessarily hurts small and medium size organizations whose proportional value in the network is small but who can easily be damaged by being listed on an RBL.

Geopolitics and Blackholes

A huge amount of spam is being sent through unsecured relays in Asia and South America. Consequently, an overwhelmingly large percentage of the hosts listed on RBLs are in fact based in these regions (see Wired article: Not All Asian E-Mail Is Spam). This amounts to nothing less than discrimination and isolationism that is being used to slowly cut off countries that have a critical importance in global matters. If a company cannot communicate through its ISP with a company based in the US or Europe, its ability to provide services to a foreign organization is severely limited. If you believe in democracy and the principles upon which much of the Internet's architecture is based, then for this reason alone, you should consider the discriminating effect that running an RBL has on those locales.

This is not the case of 10 open relays rendering a country's email infrastructure useless, but a slow and systematic effort on the part of some RBL operators to completely shut down email originating in Korea or China (see Hard Numbers for details; also see blackholes.us for a list of per-country IP ranges that can be used to block mail) because it's generally too much of a hassle to deal with. Blocking mail for your own purposes is one thing, but when you install a blocking mechanism that affects tens of thousands of email users, it becomes a responsibility to examine the biases that are inherent in the system you installed.

Cutting off thousands of users in foreign countries (read: not based in the US) flies in the face of the idea of equality and the possibility of sharing ideas, viewpoints and knowledge across political borders. Using certain RBLs is sometimes the technical equivalent of saying, "Hmm... I'll just block all inbound mail from Asia to fix my spam problem!" Why punish citizens of a country by cutting them off from emailing the rest of the world? The focus should always be on the spammer (see the Collateral Damage section for more info).

Invisible Authorities

When I used RBLs, I admittedly did not give much thought to who was running them and who was making the choices to put certain mail relays onto the list. Most people don't until they have a problem or take issue with one of the blacklisted hosts. In fact, it is quite likely the case that most organizations using RBLs never tell their users that their mail is being filtered. Consider the relationship at play here:

The RBL operator plays a critical role in the chain and, in most cases, their role remains wholly invisible to the end user. Let's change the scenario by substituting phone service for email service. What if you found out that a mysterious organization was at the other end of your phone company's service and that they were blocking inbound calls to you based on certain criteria that you have no control over? This is an invisible authority. When we subject ourselves to control, we ought to know who and what we are subjecting ourselves to.

There was an interesting posting on Slashdot (entitled MAPS RBL is now Censorware) a while ago that talked about the choice by MAPS (an RBL operator) to put 1500 IP addresses belonging to an ISP in California, USA onto their RBL. The reason? Because the ISP hosted sites that sold bulk email software that was allegedly used as a spamming tool. The ISP itself had an anti-spam policy and seemingly enforced it, too, which made the circumstances even more bizarre because most people using the RBL thought they were blocking only open relays. As this example alone illustrates, that is not always the case.

Dependence and Self-Direction

RBLs are, by nature, shared, common resources that are controlled by third-party RBL operators. They are not usually customized for individual use in that a single end user does not have the ability to say to their ISP, "Please block all inbound mail from smtp.bad-evil-spammer.com by putting it on your blacklist." Or the reverse: "Please take this IP off of your blacklist". As an end user of email, I feel that I should have the right to know what the filtering criteria are that are being used to process my incoming email. For example, if I want to receive all sorts of messages from the customer of an ISP in Brazil, I should be able to do so regardless of whether that particular user's SMTP relay is listed on the RBL that my ISP subscribes to. In theory, RBLs discriminate only against open relays; in practice, this is only partially true.

The idea of choice is an important one and it simply is not a part of the common architecture of RBLs. Consider the new Mail application (aptly called "Mail") that ships with Apple's OS X operating system. It contains a junk mail filter that is operated by and customized by the end user. The end user has the ability to completely shut off the junk mail feature or to make it as vigilant as he or she feels is appropriate.

RBLs do not operate in this manner. They represent a heavy-handed and desperate attempt to quell the large spam problem that gets increasingly annoying for those of us who use email. Users are generally totally unable to customize the filtering criteria based on their individual choices and instead are subjected to someone else's filtering criteria ad nauseam.

Appeals and Corrections

Sometimes, the operator of an open relay will decide to close the relay in order to be removed from an RBL so that their users are not complaining that various mail servers that use RBLs are rejecting their emails. However, it is somewhat rare to find documented appeals and corrections processes published by RBL operators. Sometimes there are none and you end up being listed indefinitely. Sometimes you are lucky and your work in closing an open relay is immediately recognized by RBL operators who gleefully remove your SMTP host from their list. Other times, this process is lengthy.

Disagreements occur. Technical glitches happen. You stay on. You are stuck. Sometimes, the only resort that you have in order to get taken off of an RBL is to change the IP address of your smtp server.

Problems with the Law and the Legality of Relay Testing

An RBL should by definition only aid in blocking emails originating from open relays. Another type of filtering mechanism may simply block messages from "known" sources of spam. But in order to know if a mail relay is open, someone must conduct a test to determine this. How is this done? Usually, this is a simple matter of configuring some software to connect to the smtp service on the server in question and attempt to send a message to an email account that you own in order to examine the headers of the email that you receive. By examining the headers of the email, you can examine the path it took in order to land in your mailbox. For example:

[...]

Received: from unknown (HELO exchange1.abc.net.uk) (195.72.39.250) 
	by cambridge.ns.whirlycott.com with SMTP; 21 Jun 2002 13:21:30 -0000
Received: from baldanpdc.baldan.co.uk (BALDANPDC [195.72.34.245])
	by exchange1.abc.net.uk
	with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
	id NL13NBJ1; Fri, 21 Jun 2002 11:09:50 +0100
Received: from host205-66.pool21759.interbusiness.it by baldanpdc.baldan.co.uk
	with SMTP (Microsoft Exchange Internet Mail Service Version 5.0.1459.74)
	id NFH0V3RK; Fri, 21 Jun 2002 09:32:39 +0100

[...]

So we can see from this actual piece of spam that I received that it was subsequently delivered to my mail server (cambridge.ns.whirlycott.com) by exchange1.abc.net.uk.
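Reading the path out of the headers can also be done mechanically. The following is a minimal sketch, not a robust parser: it uses Python's standard email module and two simple regular expressions, and the function name is made up for this example. Keep in mind that each server adds its own Received line at the top, and that the "from" name is whatever the connecting host claimed, so older entries can be forged.

```python
import email
import re

# The Received headers from the example above, followed by an empty body.
RAW = """\
Received: from unknown (HELO exchange1.abc.net.uk) (195.72.39.250)
\tby cambridge.ns.whirlycott.com with SMTP; 21 Jun 2002 13:21:30 -0000
Received: from baldanpdc.baldan.co.uk (BALDANPDC [195.72.34.245])
\tby exchange1.abc.net.uk
\twith SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
\tid NL13NBJ1; Fri, 21 Jun 2002 11:09:50 +0100
Received: from host205-66.pool21759.interbusiness.it by baldanpdc.baldan.co.uk
\twith SMTP (Microsoft Exchange Internet Mail Service Version 5.0.1459.74)
\tid NFH0V3RK; Fri, 21 Jun 2002 09:32:39 +0100

"""


def relay_path(raw_message: str) -> list[tuple[str, str]]:
    """Return (claimed sender, receiving host) pairs, oldest hop first."""
    msg = email.message_from_string(raw_message)
    hops = []
    for value in msg.get_all("Received", []):
        sender = re.search(r"\bfrom\s+(\S+)", value)
        receiver = re.search(r"\bby\s+(\S+)", value)
        hops.append((
            sender.group(1) if sender else "?",
            receiver.group(1) if receiver else "?",
        ))
    # The newest header comes first in the message, so reverse the list.
    return list(reversed(hops))


for claimed_from, received_by in relay_path(RAW):
    print(f"{claimed_from} -> {received_by}")
```

The first header shows "unknown" as the sender because that server recorded the HELO name in parentheses instead; a more careful parser would need to handle each mail server's header style.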

If I were running an RBL, I might then wonder if exchange1.abc.net.uk is an open relay, which would make a good candidate for my RBL. In order to determine this, I could send an email through the server in question to myself. If I received it, I could then say that it was an open relay and should be blocked. And this is exactly what people do. Just as an example of this type of activity, look at this.

Now, the problem is that the very act of testing may in fact be illegal. There is a lot of debate about this right now and I don't think there is anything resembling consensus out there. Some folks say that relay testing is legal and ought to stay that way. Other people say that it is not and cite 18 USC Sec. 1030(a)(2)(C) which declares lots of things illegal including:

Whoever ... intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains ... information from any protected computer if the conduct involved an interstate or foreign communication ... shall be punished as provided in subsection (c) of this section.

Still others argue that if someone has the ability to spam their mail server, they should have the right to block the mail and/or verify that the delivering server is an open relay.

Obviously, things get a little trickier when this testing is conducted across borders, whether between US states or between countries. But even within the US, the situation is a little vague. If you are conducting relay tests, you should research any applicable laws in your locale and try to enforce whatever your opinion is.

As a practical exercise, one of my colleagues prints out the following message at the beginning of every connection to his SMTP server (which happens to be an open relay):

220-[server name deleted] ESMTP Sendmail; Sun, 22 Dec 2002 19:54:33 -0500 (EST)
220-Authorized Users Only
220-Notice: Unauthorized use billed at $1 per message, $10 per bounce,
220-$90/hr for cleanup.
220 Unauthorized use over $5000 defined as criminal by 18 USC 1030(A)(4).
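The greeting above is a multi-line SMTP reply: every line but the last has a hyphen directly after the 220 code, and the last line has a space instead. As a rough sketch of how a connecting program reads such a greeting (the function name is an assumption for this example):

```python
GREETING = [
    "220-[server name deleted] ESMTP Sendmail; Sun, 22 Dec 2002 19:54:33 -0500 (EST)",
    "220-Authorized Users Only",
    "220-Notice: Unauthorized use billed at $1 per message, $10 per bounce,",
    "220-$90/hr for cleanup.",
    "220 Unauthorized use over $5000 defined as criminal by 18 USC 1030(A)(4).",
]


def parse_smtp_reply(lines: list[str]) -> tuple[int, list[str]]:
    """Split a (possibly multi-line) SMTP reply into its code and text lines."""
    code = int(lines[0][:3])
    text = []
    for i, line in enumerate(lines):
        if int(line[:3]) != code:
            raise ValueError("reply code changed in the middle of the reply")
        is_continuation = line[3:4] == "-"
        is_last = i == len(lines) - 1
        # A hyphen means more lines follow; the final line must not have one.
        if is_continuation == is_last:
            raise ValueError("continuation marker does not match line position")
        text.append(line[4:])
    return code, text


code, text = parse_smtp_reply(GREETING)
print(code)
print(text[1])
```

The program gets a numeric code to act on, and the notice itself is just free-form text attached to that code.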

I will cover the value of an open relay later on, but for now I will examine his response message to any SMTP connections. My colleague's argument is that any of his customers can use his relay. They fall into the category of "authorized users". However, anybody else who connects is notified of the agreement that they are about to accept. By actually sending mail through the relay, they agree to the terms outlined in the greeting message.

Many would rightfully argue that this is a little absurd. First of all, most people are using applications that don't print out informational messages such as this one, so they never see the agreement in the first place. And secondly, since it doesn't stop the ensuing onslaught of spam from being delivered out to the net, it is effectively useless. Still, the presence of the warning is important in that if ever a trial were to occur, the lack of such a message clearly would not help my colleague's case. It also makes us wonder: if everybody were to put a message like this into their mail server configuration such that any connection attempts received the notification, it might be interesting to see what the effect on the growing number of anti-spammer lawsuits might be.

Another relevant US law is the Sherman Antitrust Act, which states:

Every contract, combination in the form of trust or otherwise, or conspiracy,
in restraint of trade or commerce among the several States, or with foreign nations,
is declared to be illegal.

Obviously, if you are not in the US, this probably isn't terribly interesting to you, but the MAPS RBL cites the Sherman Act as a potential cause of concern and notes that no action has hitherto been taken against them to the point that it would actually test the validity and relevance of this act to managing an RBL.

Punishable Protests

Some mail system administrators reading this document may be wondering what would happen if they blocked access to their systems from some of the RBL operator servers. Discovering the IP addresses of these RBL operators is not a difficult task (see the MXDB) and implementing a block on a host is also trivial. So what happens? You are put on the blackhole list! This is hardly surprising when you think about it, though. If you were deliberately running a spam-friendly open relay server, one of the best ways to ensure that you could not be put on an RBL would be to block access from the RBL site so they could not send their test probe emails to your server.

Clearly, the RBL operators are not stupid people and realized this even before they deployed their RBL blocking mechanisms. But the end result is that most mail system administrators are obliged to accept spam probes from RBL operators; the consequence of blocking these probes is punishment by being placed on the blacklist.

Other Anti-spam Systems

Realtime Blackhole Lists are not the only type of anti-spam system. Here are a few other classes of solutions that are popular today.

Content scanners

A content scanner is a mechanism that performs analysis on an individual message to determine whether that individual message meets certain criteria representative of spam. These are deployed either on the end-user's email application or on the receiving mail gateway itself. The content scanner may examine both the mail headers as well as the message content itself during the analysis process. One key difference between a content scanner and an RBL is that a content scanner operates on a per-message basis. Using this system (assuming an unrealistic 100% accuracy on the part of a content scanner), legitimate email messages sent via an open relay could still land in the mailbox of the intended recipient. So this in itself represents an improvement over an RBL, which only looks at inbound connections from a particular server.

But good things don't come for free. The resources consumed by performing textual analysis and processing are almost always substantially greater than a simple DNS lookup against an RBL database. And some of these content scanners are quite complex, using techniques ranging from Bayesian analysis (recently, but not originally, proposed by Paul Graham) to heuristic evaluations.
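To give a feel for the per-message work involved, here is a toy Bayesian scorer. It is a textbook naive Bayes sketch with add-one smoothing, not Paul Graham's formula or any particular product's method, and the class name and training messages are invented for the example.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyBayes:
    """Toy naive Bayes classifier over word counts."""

    def __init__(self) -> None:
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        self.counts[label].update(tokens(text))
        self.docs[label] += 1

    def spam_log_odds(self, text: str) -> float:
        """Positive values lean towards spam, negative towards legitimate mail."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        total = {label: sum(c.values()) for label, c in self.counts.items()}
        # Prior odds from the number of training messages, add-one smoothed.
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for tok in tokens(text):
            p_spam = (self.counts["spam"][tok] + 1) / (total["spam"] + vocab + 1)
            p_ham = (self.counts["ham"][tok] + 1) / (total["ham"] + vocab + 1)
            score += math.log(p_spam / p_ham)
        return score


scanner = TinyBayes()
scanner.train("Mortgage rates have dropped", "spam")
scanner.train("Lowest mortgage rates, act now", "spam")
scanner.train("Meeting moved to Friday", "ham")
scanner.train("Lunch on Friday?", "ham")

print(scanner.spam_log_odds("mortgage rates today") > 0)
print(scanner.spam_log_odds("friday meeting") > 0)
```

Every incoming message has to be tokenized and scored like this, which is where the extra cost over a single lookup comes from.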

There is a Spam Conference being held at MIT (Cambridge, MA, USA) on January 17th, 2003 where a lot of these techniques will be discussed.

Distributed Notification Systems

Like a content scanner, this operates on a per-message basis, which is a leap forward over an RBL in my view. The basic idea is that since many spam messages are alike, once a person has determined that a message is actually spam, others should be able to benefit from this person's analysis instead of having to conduct their own tests. In this class of system, all of the actual spam reporting is manual. That is, all messages that are determined to be spam are actually analyzed by a user. The user in turn reports the offending message to a centralized database. Other users of the system can then theoretically be spared from having to re-analyze a particular message by looking up an incoming message automatically. If the shared database (Razor, for example) knows that the message in question is spam, the user can then treat the incoming email accordingly.
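The report-once, look-up-many idea can be sketched in a few lines. This is an in-memory stand-in written for illustration only: the class and method names are made up, and the exact SHA-256 digest used here is an assumption. A real system would be a network service and may well use signatures that tolerate small changes in the message.

```python
import hashlib


class SharedSpamDatabase:
    """In-memory stand-in for the centralized database of reported messages."""

    def __init__(self) -> None:
        self._reported: set[str] = set()

    @staticmethod
    def signature(body: str) -> str:
        # Collapse case and whitespace so trivial reformatting keeps the same digest.
        normalized = " ".join(body.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def report(self, body: str) -> None:
        """Called once by the user who manually judged the message to be spam."""
        self._reported.add(self.signature(body))

    def is_known_spam(self, body: str) -> bool:
        """Cheap lookup performed automatically for every other recipient."""
        return self.signature(body) in self._reported


db = SharedSpamDatabase()
db.report("Mortgage rates have dropped!\n")
print(db.is_known_spam("mortgage  rates have DROPPED!"))
print(db.is_known_spam("Lunch on Friday?"))
```

An exact digest like this one is easy to defeat by changing a single word in each copy, which is one reason a real system might prefer a looser signature.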

This class of solution still suffers from some of the same problems that an RBL does (see Invisible authorities, Dependence and self-direction, Appeals and corrections), but the fact that it operates on a per-message basis avoids some of the real downfalls of an RBL such as Collateral Damage and Unscalability. Expensive? Again, yes, but perhaps not as much as a content scanner since the reports are only generated once per message throughout the system while lookups are fairly lightweight.

Customized or Manual Blocking Criteria

These are configured either by the end user in a desktop mail application or, more likely, by an organization's mail administrator on the mail gateway or firewall.

I know some people who operate their organization's mail gateways using customized filtering rules. They perform both network-based and textual analysis on samples of inbound messages and simply discard anything that declares, "Mortgage rates have dropped!" or "Buy stainless online" or that originates from a particularly troublesome network. Obviously, for organizations, this affords the maximum amount of flexibility and customization, but it also requires a ton of manual, active maintenance.
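A hand-maintained rule set of this kind might look something like the sketch below. The phrase list reuses the two examples quoted above; the blocked network is a placeholder taken from the documentation address range, and the function name is hypothetical.

```python
import ipaddress

# Phrases quoted in the text above, stored in lowercase.
BLOCKED_PHRASES = [
    "mortgage rates have dropped",
    "buy stainless online",
]

# Placeholder for "a particularly troublesome network" (assumption).
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def should_discard(sender_ip, subject, body):
    """Return True if the message matches any manually configured rule."""
    address = ipaddress.ip_address(sender_ip)
    # Network-based rule: drop anything from a listed network.
    if any(address in network for network in BLOCKED_NETWORKS):
        return True
    # Textual rule: drop anything containing a listed phrase.
    text = f"{subject}\n{body}".lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)
```

Both lists have to be edited by hand every time spammers change their wording or their network, which is the maintenance burden described above.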

For the end user, it's the same story: the ultimate in control, at the cost of a whole lot of work. But, as an end user dropping your own filtering rules into your email application, you are in control and are not subject to invisible authorities, other people's political biases, etc.

In the end, this isn't so much a solution as it is a stopgap measure. It's the "no system is better than a faulty system" mentality, which can hardly be true in this day and age. Some of the mail systems listed above have accuracy rates above 95%. At that rate, it would take less time to weed out the false positives by scanning a mail folder manually than it would to generate new rules every day. So it's hard to imagine this being beneficial except in the most extreme cases of abuse (e.g. getting lots and lots of spam messages from a particular relaying mail server).

Law

Is law an anti-spam mechanism? Sure, albeit not a technical one. Laws that threaten hefty fines, jail sentences, or other punishments can act as deterrents against socially unacceptable behavior. In the US, the FTC has started investigating deceptive spammers (more info from the FTC) and some companies have even won large court settlements in their efforts to curb spam. The European Union has started its own investigations and it looks like they will outlaw unsolicited email between member states altogether.

Professor Larry Lessig also has thrown some support behind the idea of a national law (USA) that would require spammers to label their messages with an "ADV: " prefix to the Subject: header of the email message (in fact, he's bet his job on it causing a substantial reduction in the level of spam). The other part of this law also establishes a bounty for those who can track down spammers who violate the labeling requirement.
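If such a labeling requirement were obeyed, filtering labeled messages would be trivial. The sketch below shows one way a mail client could test for the prefix. The function name is hypothetical, and treating the match as case-insensitive is my assumption rather than something the proposal specifies.

```python
from email import message_from_string


def is_labeled_advertisement(raw_message):
    """Return True if the Subject: header starts with the ADV: label."""
    message = message_from_string(raw_message)
    subject = message.get("Subject", "")
    return subject.strip().upper().startswith("ADV:")


sample = "From: seller@example.com\nSubject: ADV: Cheap loans\n\nApply today.\n"
labeled = is_labeled_advertisement(sample)
```

The filter only helps against senders who comply, which is why the bounty for tracking down violators matters to the proposal.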

However, it's frequently hard to track down spammers, let alone prosecute them. Again, it gets tricky when they are spamming from another country because cross-border lawsuits are both expensive and complicated due to jurisdiction issues. Besides, laws are not nice and automated like our other anti-spam mechanisms. They take time to implement, they take time to use (i.e. prosecute) and they do not always result in the outcome that a particular user wants.

Regardless, laws are a piece of the anti-spam puzzle, and we should not write them off as useless or outdated: they provide value both as a deterrent and as a tool against gross violations that result in system crashes, economic losses, etc.

Education

Again, this is another non-technical mechanism, but consider what would happen if 100% of all users chose not to make purchases or otherwise send payments based on spam messages. My view of this falls into the traditional economic models that you study in high school. In this case, I think that sellers would realize that spam is an unprofitable channel and they would slowly stop spamming. But the problem is that the margins on spamming are quite high, so in reality, close to 100% of spam recipients would need to not make purchases, which is probably pretty close to what is currently happening. But I think a well-educated user of email should be able to recognize scams and spam messages when they see them and act accordingly by reporting the spam message and not making any purchases from that supplier.
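The margin argument above can be made concrete with a little arithmetic. The sketch below computes the response rate at which a mailing breaks even. The cost and profit figures are entirely hypothetical numbers chosen for illustration, not measurements, and the function names are my own.

```python
def break_even_response_rate(cost_per_message, profit_per_sale):
    """Fraction of recipients who must buy for a mailing to cover its costs."""
    return cost_per_message / profit_per_sale


def campaign_profit(messages_sent, response_rate, cost_per_message, profit_per_sale):
    """Net profit of a mailing: revenue from buyers minus the sending cost."""
    sales = messages_sent * response_rate
    return sales * profit_per_sale - messages_sent * cost_per_message


# Hypothetical figures: 0.01 cents to send one message, 20 dollars per sale.
COST_PER_MESSAGE = 0.0001
PROFIT_PER_SALE = 20.0

threshold = break_even_response_rate(COST_PER_MESSAGE, PROFIT_PER_SALE)
```

With these made-up figures the break-even point is about one buyer per 200,000 messages, so a mailing stays profitable even if well over 99.99% of recipients ignore it. That is why education alone would have to reach nearly everyone to change the economics.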

Are open relays necessary anymore?

I wish there were an easy answer to this question. It is the point of view of many people that open relays are totally unnecessary and should not be used at all. As of December 2002, there are several technical solutions for allowing users to send email through a particular server, ranging from authentication based on username/password combinations to restrictions based on network location. However, none of these mechanisms were specified in the original mail RFCs (RFC 821 for SMTP, RFC 822 for the message format).
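The difference between an open relay and a restricted one comes down to a single decision made for each recipient. The sketch below shows that decision using the two kinds of restriction mentioned above. The domain, the network range and the function name are placeholders assumed for illustration.

```python
import ipaddress

# Placeholder values: domains this server delivers for, and networks it trusts.
LOCAL_DOMAINS = {"example.org"}
TRUSTED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def may_accept(client_ip, authenticated, recipient):
    """Decide whether to accept a message for the given recipient."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    if domain in LOCAL_DOMAINS:
        return True  # final delivery to a local mailbox, not relaying
    if authenticated:
        return True  # the client proved its identity with a username/password
    # Otherwise relay only for clients inside a trusted network.
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in TRUSTED_NETWORKS)
```

An open relay is the degenerate case in which this function returns True for every client and every recipient.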

The SMTP protocol was designed as a mail delivery protocol that is de-coupled from the protocols that actually allow users to retrieve their mail (POP3 and IMAP are commonly used, but others exist, e.g. whatever it is that handles the backend for Microsoft Exchange). These protocols were designed to be loosely coupled so that the processes of transmitting and receiving messages were totally separate. Either mechanism could be totally ripped out and redesigned, and a new implementation could be put in its place without affecting the other.

Some effort has been made to give mail servers some understanding of the origin of an incoming TCP connection (see RFC 1413, the identification protocol used in Unix identd servers). However, this has been widely ignored, partly because some of its features have been viewed as potential security risks and partly because running an ident server is currently mostly unnecessary: most SMTP servers do not require a positive response from a remote ident server in order to accept email.

Just as open relay operators say they have the right to run the open relay for whatever technical problems they cannot solve using authentication mechanisms, many remote operators say they have an equally well-founded right to filter or block mail from these servers. Who is right? Probably both have or should have the right to perform these actions, but blocking all mail from an open relay - which is what an RBL is used for - results in the problems outlined in this document.

Other solutions include POP-before-SMTP and ASMTP. These solutions basically provide an authentication layer on top of or inside SMTP service. Both your mail client and mail server need to be modified in order to support these solutions, but they may represent interesting viable alternatives to you. These are very popular with ISPs that provide dialup service or roaming access services. Adoption of these techniques has been slow mainly because of implementation differences in various mail clients. As a service provider, it's optimal to have only one solution that works for all of your users and this has historically been somewhat tricky to do for large user bases.
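To make the POP-before-SMTP idea concrete: the server remembers which client addresses recently completed a successful POP login and allows relaying from those addresses for a limited time. The sketch below assumes a 30-minute window and uses a hypothetical class name; actual implementations differ in how they record logins and how long they honor them.

```python
import time


class PopBeforeSmtp:
    """Tracks recent POP logins and answers relay questions for SMTP."""

    def __init__(self, window_seconds=1800, clock=time.time):
        self._window = window_seconds  # how long a POP login stays valid
        self._clock = clock            # injectable so the logic can be tested
        self._last_pop_login = {}      # client IP -> time of last POP login

    def record_pop_login(self, client_ip):
        """Called by the POP server after a successful username/password check."""
        self._last_pop_login[client_ip] = self._clock()

    def may_relay(self, client_ip):
        """Called by the SMTP server before relaying for this client."""
        seen = self._last_pop_login.get(client_ip)
        return seen is not None and self._clock() - seen <= self._window
```

Because the check is keyed on the client's address alone, anyone sharing that address (for example, behind the same NAT device) inherits the permission until the window expires.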

If an open relay can be closed without detrimental impact on the functions it needs to perform, then it probably should be closed. In other words, if there's no good reason for a relay to be open, then close it (see Resources).

Hard Numbers

Data is hard to come by and varies greatly according to the RBL operator.

ORDB (an RBL) publishes a document that breaks down the number of listed open relays per TLD (top level domain). They also categorically refuse to provide a copy of their list for analysis by third parties. Obviously, there is some basis to this and it is partially understandable. If their list were published, spammers would then have a large list of approximately 250,000 available open relays to route spam through. Fair enough, but it would be immensely helpful if RBL lists could be made available to reputable university organizations and the like in order to better understand the contents of their lists and to generate high-level reports that could be published for public scrutiny.

Other RBL operators do make their lists publicly available. One of these, Blackholes.us, has a series of lists that can ostensibly be used to block mail originating in certain countries or from certain ISPs. Another RBL that is publicly available is the Danish list called no-more-funn. Some of these lists contain IP addresses while others contain entire netblocks, such as those attempting to block all inbound mail from China, as seen in these excerpts from the no-more-funn list (in each line, the second column gives a range of second octets and the third column ends with the first octet, e.g. 61 for addresses within 61.0.0.0):

$GENERATE 128-191   *.$.61              CNAME   china.spam
$GENERATE 240-243   *.$.61              CNAME   china.spam

$GENERATE 96-121    *.$.202             CNAME   china.spam

$GENERATE 40-47     *.$.210             CNAME   china.spam

$GENERATE 72-78     *.$.210             CNAME   china.spam
$GENERATE 82-83     *.$.210             CNAME   china.spam
$GENERATE 81-91     *.$.211             CNAME   china.spam
$GENERATE 94-103    *.$.211             CNAME   china.spam
$GENERATE 144-167   *.$.211             CNAME   china.spam

$GENERATE 0-31      *.$.218             CNAME   china.spam
$GENERATE 56-75     *.$.218             CNAME   china.spam
china.spam                          IN  A   127.0.0.2
                                    TXT "added 2001-04-19; china does not seem to care about spam"

Some of these entries contain tens of thousands of IP addresses and affect an unknown number of people. There are more examples of bias; this is just the tip of the iceberg.
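To get a feel for the scale of the excerpt above, the sketch below expands the $GENERATE lines and counts the addresses they cover. It rests on one assumption about how the zone is read: a generated name such as *.128.61 is a wildcard matching every address that starts with 61.128, that is, 65,536 addresses per generated record. The function names are hypothetical.

```python
EXCERPT = """\
$GENERATE 128-191   *.$.61              CNAME   china.spam
$GENERATE 240-243   *.$.61              CNAME   china.spam
$GENERATE 96-121    *.$.202             CNAME   china.spam
$GENERATE 40-47     *.$.210             CNAME   china.spam
$GENERATE 72-78     *.$.210             CNAME   china.spam
$GENERATE 82-83     *.$.210             CNAME   china.spam
$GENERATE 81-91     *.$.211             CNAME   china.spam
$GENERATE 94-103    *.$.211             CNAME   china.spam
$GENERATE 144-167   *.$.211             CNAME   china.spam
$GENERATE 0-31      *.$.218             CNAME   china.spam
$GENERATE 56-75     *.$.218             CNAME   china.spam
"""

ADDRESSES_PER_BLOCK = 256 * 256  # assumption: each wildcard covers a.b.*.*


def covered_blocks(zone_text):
    """Yield (first_octet, second_octet) for every block the excerpt lists."""
    for line in zone_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "$GENERATE":
            continue
        start, stop = (int(number) for number in fields[1].split("-"))
        # Names are written in reverse order, so the first octet comes last.
        first_octet = int(fields[2].split(".")[-1])
        for second_octet in range(start, stop + 1):
            yield first_octet, second_octet


def is_listed(ip_address, blocks):
    """Return True if the dotted-quad address falls inside a listed block."""
    first, second = (int(part) for part in ip_address.split(".")[:2])
    return (first, second) in blocks


blocks = set(covered_blocks(EXCERPT))
total_addresses = len(blocks) * ADDRESSES_PER_BLOCK
```

Under that assumption, these eleven lines alone expand to 208 blocks, or 13,631,488 addresses, all blocked by a single TXT comment dated 2001-04-19.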

Properties of a Real Solution

I will not propose a solution to the spam problem here, but instead will discuss some properties of a possible solution. The focus of this paper is really to advance the rationale for not using an RBL and to encourage those readers who are developing anti-spam mechanisms to learn from the mistakes of the RBL architecture so that they will not be repeated in future implementations.

Those of you who are actively implementing anti-spam systems are encouraged to read these ideas and measure your own architectural choices against these to see where you stand. I view the solution not on a purely technical level, but rather something that is a balanced mixture of Law, Social Behavior and Technology.

Technology

Operates on a per message basis - only a message can be considered as spam, not a TCP connection or a server or a person. Accordingly, a solution should operate on a per message basis and should not discard all messages originating from a certain locale, network or particular server.
Configurable by the end user - unlike an RBL, the end user should have an interface for creating, modifying and deleting the rules that are used to filter his or her spam. This is architecturally prohibitive with RBLs, but is more easily accomplished using mechanisms like content scanners.
Scalable (resources) - the solution should not be prohibitive in terms of computational resources required to process messages.
Scalable (architecture) - more importantly, the value of the system should not decrease as its use increases (see Unscalability, above).
Automatic - the solution should be automatic and should not rely on human intervention for message analysis.
Whitelisting - as a general principle, I probably want to accept messages from anybody or any organization that I have communicated with in the past. Any other messages probably ought to be considered as suspicious and should be analyzed accordingly. Most people use a fairly wide access control rule when receiving email, but then factor in other rules such as "allow from all, deny from a, deny from b, deny from c [...]". More probably, we should use something like "analyze from all, allow from x, allow from y, allow from z [...]". We do not want a system that automatically rejects email, but would prefer a system that automatically quarantines mail from unknown senders. Only users on our whitelist will be allowed to send email that passes straight through the filter.
Verifiable - where possible, messages should be digitally signed so that the probability of accurately recognizing a sender is greatly increased over alternate schemes such as headers or originating network location, both of which can readily be forged.
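The "analyze from all, allow from x" rule in the Whitelisting item above can be sketched as follows. This is an illustration only: the class name is hypothetical, the analyzer is any callable that flags a message as suspicious (a content scanner, for example), and adding every outgoing recipient to the whitelist is my reading of "anybody that I have communicated with in the past".

```python
class WhitelistFilter:
    """Delivers mail from known correspondents and analyzes everything else."""

    def __init__(self, analyzer):
        self._analyzer = analyzer  # callable(message_text) -> True if suspicious
        self._whitelist = set()

    def record_outgoing(self, recipient):
        """Anyone the user has written to becomes a known correspondent."""
        self._whitelist.add(recipient.lower())

    def handle_incoming(self, sender, message_text):
        """Return 'deliver' or 'quarantine'. Mail is never rejected outright."""
        if sender.lower() in self._whitelist:
            return "deliver"  # allow from x: passes straight through the filter
        if self._analyzer(message_text):
            return "quarantine"  # held for review instead of being discarded
        return "deliver"


def naive_analyzer(message_text):
    """Placeholder analyzer standing in for a real per-message scanner."""
    return "mortgage rates" in message_text.lower()


mail_filter = WhitelistFilter(naive_analyzer)
mail_filter.record_outgoing("friend@example.org")
```

The important design choice is that the worst outcome for an unknown sender is quarantine, so a false positive costs the recipient a glance at a folder rather than a lost message.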

Social Behavior

Economics - Spammers will stop spamming when it is no longer economically profitable for them to do so. The solution should make it economically unfeasible for spammers to make money, at which point their activity should be greatly reduced. My guess is that this is easier said than done.
Education - If everyday people understood the spam industry, they would be better equipped to understand when they are being scammed, why they should not pass along chain emails to a hundred of their closest friends, etc.

Law

Should not be exclusionary in nature - not only is it unreasonable to expect that all the governments of the world could agree on a policy against spam, I believe it is wrong to try to make them. Each state should be allowed to make its own laws and should not be excluded from making use of the Internet because it refuses to meet certain standards which are acceptable in other countries. Each state should have its own right to self-direction.
Should be enforceable across borders - spam sent by a spammer in country A to a user in country B should constitute a crime in country B if it violates country B's anti-spam laws. Accordingly, the user in country B should be able to sue the spammer in country A. This is likely incredibly difficult to orchestrate, but nevertheless it seems like a worthwhile goal.
Punishment - if sending spam is offensive behavior, it should be deterred through the creation of laws that threaten appropriate punishments. The grounds for this idea are that spam represents an unwarranted waste of people's time, computational resources, and also represents an invasion of privacy.

Conclusion

RBL mechanisms, in addition to pursuing their intended goal, frequently cause a lot of trouble for legitimate Internet users who are trying to send non-spam email. The intention of this paper is to outline the technical and social problems with RBLs and to talk about some high-level properties of whatever a next generation solution might be. I might get into more details about such a solution in the future, but that will depend on my time over the next few weeks.

For now, if you agree with what you have read here and are using an RBL, please consider replacing it with another anti-spam mechanism. If you are running an open relay or do not know if you are running an open relay, please get someone with experience to look at your systems and figure out if this is necessary or if access to these relays can be tightened without getting in the way of your business. If you are an end user and are affected by an RBL implemented by your ISP or organization, consider pointing the managers of this organization to this document and asking them to read it.

Resources

Incomplete list of well-known RBL service providers

MAPS

ORDB

Osirusoft

Spamhaus

Dorkslayers

/dev/null.dk

SPEWS

Spamcop

Government agencies

U.S.A. - Federal Trade Commission

European Union - see the summary of Article 13 which talks about "Unsolicited Communications"

Information on closing your open relay

I started compiling a list of links myself but I think that MAPS (an RBL operator) has a good source of information on closing open relays. If this is something you want to do, it's a good starting point.

Alternative anti-spam mechanisms

Content scanners

Spam Assassin - be sure to turn off the RBL tests!

PureMessage

Brightmail

Deersoft

Cloudmark provides a product called Authority

Also see Google's directory of antispam software

Distributed notification systems

Cloudmark provides SpamNet (Windows version of Vipul's Razor)

DCC

Uncategorized links

The Spam Archive maintains a ton of available spam messages for use in testing filtering mechanisms.

Spam Laws - a site that has lots of links to legislation relevant to spam in different locales

Contact

I invite you to contact me, Philip Jacob, by emailing [ rblwatcher at whirlycott dot com ]. I reserve the right to publicly post any emails sent to me. Also, there is a mailing list for discussing this paper here.