TerminalDigit

Good to the last bit.

7 February 2007

Technology

Sperm: The Spam We Can’t Filter

The problem

I was talking with a friend today about how much e-mail I get from people I know about topics I just don’t care about. I can’t imagine I’m alone in this. You know how it works: people at work copy you on items that don’t really pertain to you “just in case,” your crazy uncle keeps forwarding you those hilarious Chuck Norris jokes, and your medical school classmates keep inviting you to their meetings to discuss how best to help all the homeless, starving, HIV-infected orphans whose puppies have cancer. For me, it’s turning into a real problem. We have an amazing arsenal of tools for filtering spam (whitelists, blacklists, graylists, and all kinds of content-based filters), and they work pretty well. These days, I get very little spam.

Sperm

But this isn’t spam. Spam tends to come from people you don’t know, and in general, a stranger who knows nothing about you could look at your inbox and pick out the spam pretty easily. This is different. This is something I’m calling sperm. Why sperm? Well, about 99.9% of the time sperm is a nuisance, but there are a few instances in a man’s lifetime (usually fewer than five) when it turns out to be quite useful, and those few instances are what keep him from blocking it outright for the rest of his days. The e-mail I’m talking about has very similar characteristics. Spermers send you an enormous amount of garbage, but you can’t send their mail straight to the trash, because on rare occasions their messages are useful. Currently, 90% of the e-mail I get is sperm.

How can we stop sperm?

Honestly, I don’t know, but I do know it won’t be easy. Sperm filtering must be content-based, because the same sender sends both the garbage and the occasional useful message, and the line that divides sperm content from non-sperm content is subtler than it is with spam. I’m not sure we have filters capable of reliably distinguishing sperm from non-sperm, no matter how large a training set we use. Furthermore, sperm is much more personal than spam, so solutions based on group reporting won’t work either.
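To make “content-based” and “personal” a bit more concrete, here is a minimal sketch of the kind of filter this would take: a multinomial naive Bayes classifier trained only on one person’s own labels, so nobody else’s reporting is involved. The class name `PersonalFilter`, the labels `keep` and `sperm`, and the toy training messages are all hypothetical, and naive Bayes is simply a familiar technique swapped in for illustration. It shows the shape of the approach, not evidence that it can draw the subtle line described above.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


class PersonalFilter:
    """Hypothetical per-user multinomial naive Bayes filter.

    Trained on ONE user's labels: "keep" for mail they want to read,
    "sperm" for mail from people they know that is almost always unwanted.
    """

    LABELS = ("keep", "sperm")

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text: str, label: str) -> float:
        vocab = set().union(*(set(c) for c in self.word_counts.values()))
        total_docs = sum(self.doc_counts.values())
        # Smoothed log prior, so a label with no examples is not -infinity.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + len(self.LABELS)))
        counts = self.word_counts[label]
        # Laplace smoothing; the extra +1 reserves a slot for unseen words.
        denom = sum(counts.values()) + len(vocab) + 1
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / denom)
        return score

    def classify(self, text: str) -> str:
        return max(self.LABELS, key=lambda label: self.log_score(text, label))


# Usage with made-up messages; the senders could be the same people either way.
f = PersonalFilter()
f.train("chuck norris joke forward fw fw", "sperm")
f.train("meeting to discuss homeless orphans puppies", "sperm")
f.train("your lab results are ready please call", "keep")
f.train("dinner friday are you free", "keep")
print(f.classify("fw fw another chuck norris joke"))  # → sperm
```

Note that the classifier never looks at who sent a message, which is the point: a whitelist or blacklist can only say yes or no to a sender, while a spermer’s rare useful message has to be caught by its content.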

Hopefully I’m wrong. Ideas about how to implement a solution, and stories about how people are currently dealing with sperm, are welcome in the comments.
