SpamSieve is a Bayesian
spam filter that is very effective and easy to use. A Bayesian filter uses characteristics
of spam and "good" messages to statistically identify spam. Paul Graham
popularized the Bayesian approach in his essay "A Plan for Spam". This approach provides
an easy way to develop a personalized spam filter by simply marking spam and "good"
messages. The approach tokenizes each spam and good message that is added to the
SpamSieve corpus. The frequency of occurrence of each token in each type of message
is computed. This information is used to calculate the probability that a message
is spam, which has proven to be an effective way to identify spam.
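The tokenize, count, and score steps described above can be sketched in a few lines of Python. This is only an illustration of the general approach from "A Plan for Spam", simplified in places. It is not SpamSieve's actual code, and the class name, tokenizer pattern, clamping limits, the 0.4 score for unseen tokens, and the 15-token cutoff are all assumptions made for the example.

```python
import re
from collections import Counter
from math import prod


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters, digits, $, ' and -.
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class Corpus:
    """Hypothetical corpus holding token counts for spam and good messages."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.good_tokens = Counter()
        self.spam_count = 0
        self.good_count = 0

    def add_spam(self, text):
        self.spam_tokens.update(tokenize(text))
        self.spam_count += 1

    def add_good(self, text):
        self.good_tokens.update(tokenize(text))
        self.good_count += 1

    def token_probability(self, token):
        s = self.spam_tokens[token]
        g = self.good_tokens[token]
        if s + g == 0:
            return 0.4  # assumed neutral-ish score for tokens never seen before
        # Frequency of the token relative to the number of messages of each type.
        spam_freq = min(1.0, s / max(1, self.spam_count))
        good_freq = min(1.0, g / max(1, self.good_count))
        p = spam_freq / (spam_freq + good_freq)
        # Clamp so that no single token is treated as absolute proof.
        return min(0.99, max(0.01, p))

    def spam_probability(self, text, top_n=15):
        probs = [self.token_probability(t) for t in set(tokenize(text))]
        # Keep the tokens whose scores are farthest from the neutral 0.5.
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        probs = probs[:top_n]
        spam = prod(probs)
        good = prod(1 - p for p in probs)
        return spam / (spam + good)
```

With this sketch, calling add_spam and add_good on a handful of messages and then spam_probability on a new one returns a value between 0 and 1. Messages built from tokens seen mostly in spam score close to 1, and a filter would compare that score against a threshold of its choosing.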
A Bayesian filter is incorporated in Apple's Mail program. SpamSieve provides this
capability for Entourage, Eudora, Emailer, Mailsmith and PowerMail running in a Mac
OS X 10.1.5 or 10.2.2 environment. The basic architecture of the tool is the same
for all five email clients. It consists of the SpamSieve application and a set of
AppleScripts that are provided for each client. To install it, you simply need to
copy the application to your /Applications folder and then copy the AppleScripts
for your email client to the scripts folder for that client. The user's manual
specifies the folder that should be used. The scripts are housed in a folder on the
.dmg image that follows the same hierarchy used for the email client. You may need
to restart the email client to get it to recognize the new script.
I use Entourage for my client, so I will describe my experience with SpamSieve in
that context. The interface differs very little among the five clients. The four
scripts used in Entourage are "Add Spam", "Add Good", "Move Spam", and "Mark Spam".
"Add Spam" simply adds a selection of spam messages to the SpamSieve corpus by
tokenizing them. "Add Good" does the same but tags the messages as good.
"Move Spam" moves any identified spam to a folder named "Spam".
"Mark Spam" simply tags the message as spam.
Figure 1 - Edit Rule window
Setting up SpamSieve
to handle spam is very easy. It is just a matter of creating a rule to invoke the
appropriate script. I chose to set up a separate spam folder rather than simply marking
it (see Figure 1 above). I used a number of items in my Inbox to start the corpus
and marked them as spam by selecting them and running the "Add Spam" script.
After getting things set up, I invoked the spam rule. It turned out to be a little
disconcerting because Entourage became unresponsive. After some force quits and restarts,
I realized that this aberrant behavior was due to the volume of messages it was asked
to process. It took about five minutes to process the 500 messages. Since that time,
I have not had any performance problems. Another adjustment that I made was to execute
SpamSieve as the last rule in my rule sequence. This allowed me to extract any white
hat messages prior to testing for spam. The manual discusses setting up the rules
structure in far more detail.
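The effect of running the spam check last can be sketched as an ordered rule list. This is a rough Python illustration, not Entourage's rule engine: it assumes that processing stops at the first rule that matches, and the sender list, the rule names, and the stand-in spam test are all hypothetical.

```python
KNOWN_SENDERS = {"alice@example.com"}  # hypothetical list of trusted senders


def from_known_sender(msg):
    return msg["from"] in KNOWN_SENDERS


def looks_like_spam(msg):
    # Stand-in for the spam filter; a real filter would score the whole message.
    return "cheap pills" in msg["subject"].lower()


# Rules are tried in order, so the spam check only sees mail that no
# earlier rule has already claimed.
RULES = [
    ("Known senders", from_known_sender, "keep in Inbox"),
    ("Spam check", looks_like_spam, "move to Spam"),
]


def apply_rules(msg, rules=RULES):
    for name, matches, action in rules:
        if matches(msg):
            return action  # assumed: the first matching rule wins
    return "keep in Inbox"
```

Under these assumptions, a message from a trusted sender is never handed to the spam check, even if its subject would otherwise trip it.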
Examples of Statistics window (Figure 2) and SpamSieve log (Figure 3)
SpamSieve has a number
of menu items that allow you to view how it is functioning. The Corpus window summarizes
the messages, showing spam versus good. The Statistics window (see Figure 2 above)
describes how well SpamSieve is functioning. The SpamSieve log (see Figure 3 above)
describes each action taken by SpamSieve.
Overall, I have found SpamSieve very easy to install and use. The filter was easy
to create by simply selecting a set of messages and assigning them a category. The performance
of the filter is very good, with few false positives and false negatives. Maintaining the
filter is intuitive and simple to do by adding a message to the spam or good category.
Despite a few problems that occurred early on, I found SpamSieve worth the $20 shareware
fee, since it has greatly simplified the management of the spam that I receive. If
you like Mail's spam filter, but prefer using one of the supported mail clients, then
I highly recommend buying SpamSieve.
- Easy to Install and Use
- Well Documented
- Processing a large number of messages takes a long time
4 out of 5 Mice