Mac Lab Report

Fighting Spam with Bayesian Filtering

- 2003.02.27

My recent article on spam filtering led reader Lee Williams to send me this message.

A slight variant on your suggestion is rooted in AI, and it's called Bayesian classification. Rather than having the user explain to the machine every keyword it should flag, the Bayesian classifier simply assigns a probability to each word or phrase based on which class you put the email in. Then, when it sees another email, it uses these probabilities to estimate which class that email belongs in. It's actually a lot simpler than having the end user create these filters manually.

Paul Graham discusses this in his "A Plan for Spam."

The key element in both solutions is that they are completely customizable. Individual filters would not necessarily eliminate 100% of spam for each person, but in the aggregate they would mean that each spam email reaches considerably fewer people, hopefully making spam at least somewhat less cost-effective.

That article is good reading, and a filter like this is something I would definitely try. An email client with such filtering would be a great hit, in my opinion. [Editor's note: Apple's Mail client in OS X 10.2 includes a Bayesian filter. I've been using it for over a month and am still training it. dk]
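For readers who want to see the idea Lee describes in concrete form, here is a minimal sketch of a word-based Bayesian classifier. It is a textbook naive Bayes filter with add-one smoothing, not the exact formula from Graham's article and not what Apple's Mail does. The names NaiveBayesFilter, train, and spam_probability, the "spam" and "keep" labels, and the sample messages are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical two-class filter: every message is 'spam' or 'keep'."""

    LABELS = ("spam", "keep")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Record the words of one message the user has already sorted."""
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        """Estimate the probability (0 to 1) that a new message is spam."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["keep"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in self.LABELS:
            # Prior: how often the user files mail under this label.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocabulary) + 1))
            log_score[label] = score
        # Turn the two log scores into one probability; clamp to avoid overflow.
        diff = max(min(log_score["keep"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


# Illustrative training data only; a real filter learns from the user's mail.
mail_filter = NaiveBayesFilter()
mail_filter.train("cheap pills buy now", "spam")
mail_filter.train("win money now", "spam")
mail_filter.train("lunch meeting tomorrow", "keep")
mail_filter.train("project report attached", "keep")
print(mail_filter.spam_probability("buy cheap pills"))
print(mail_filter.spam_probability("lunch meeting"))
```

The only "rules" here are the messages the user has sorted. Nobody types in a keyword list, which is the simplification Lee is pointing to.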

If we can write software that recognizes their messages, there is no way they can get around that.

- Paul Graham

I wonder if a combination of traditional manual filtering and this approach would be even better. The manual filtering would let you set a rule for a specific phrase or person (say, your ex) that assigns a 100% probability of being filtered.
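A rough sketch of that combination follows, assuming some learned filter is available as a function that returns a spam probability between 0 and 1. The function name classify, its parameters, and the 0.9 cutoff are illustrative assumptions, not part of any real mail client.

```python
def classify(sender, text, bayes_score,
             blocked_senders=(), blocked_phrases=(), threshold=0.9):
    """Return 'spam' or 'keep' for one message.

    Manual rules are checked first and act like a probability of 100%.
    Everything else falls through to the learned score from bayes_score,
    a caller-supplied function (hypothetical here) mapping text to 0..1.
    """
    lowered = text.lower()
    # Rule 1: a blocked sender is always filtered, whatever the message says.
    if sender.lower() in {s.lower() for s in blocked_senders}:
        return "spam"
    # Rule 2: a blocked phrase anywhere in the message is always filtered.
    if any(phrase.lower() in lowered for phrase in blocked_phrases):
        return "spam"
    # Otherwise defer to the statistical filter and its cutoff.
    return "spam" if bayes_score(text) >= threshold else "keep"
```

Checking the manual rules first means a hand-written rule can never be outvoted by the statistics, which is the 100% behavior described above.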

One interesting point related to your statement about the email reaching fewer people, which Graham doesn't make either, is that users who actually want to receive these emails (after all, someone must be responding to them) would like this filter, too, because they could sort the spam they like into the "keep" folder.

Contrary to Graham's hypothesis, spam need not end. It would become highly targeted at those people most likely to respond to it. The rest of us would ignore it. If everyone had such filters, the things you're most interested in would be the only things that would get through.

Everyone would like this type of filtering: haters of spam, responders to spam, and senders of spam.

Now for an even spookier thought - apply the same technology to a TiVo with voice recognition and filter your TV commercials the same way.


is a longtime Mac user. He was using digital sensors on Apple II computers in the 1980s and has networked computers in his classroom since before the internet existed. In 2006 he was selected as the California Computer Using Educators' teacher of the year. His students have used NASA space probes and regularly participate in piloting new materials for NASA. He is the author of two books and numerous articles and scientific papers. He currently teaches astronomy and physics in California, where he lives with his twin sons, Jony and Ben. And there's still a Mac G3 in his classroom which finds occasional use.
