SCANMAIL(8) SCANMAIL(8)
NAME
scanmail, testscan - spam filters
SYNOPSIS
upas/scanmail [ options ] [ qer-args ] root mail sender
system rcpt-list
upas/testscan [ -avd ] [ -p patfile ] [ filename ]
DESCRIPTION
Scanmail accepts a mail message supplied on standard input,
applies a file of patterns to a portion of it, and dispatches
the message based on the results. It is a drop-in replacement
for the generic queuing command qer(8) that is executed
from the rc(1) script /mail/lib/qmail in the mail processing
pipeline. Each pattern has an associated action; the actions,
in order of decreasing priority, are:
dump the message is deleted and a log entry is written
to /sys/log/smtpd
hold the message is placed in a queue for human inspec-
tion
log a line containing the matching portion of the mes-
sage is written to a log
If no pattern matches or only patterns with an action of log
match, the message is accepted and scanmail queues the mes-
sage for delivery. Scanmail meshes with the blocking facil-
ities of smtpd(6) to provide several layers of filtering on
gateway systems. In all cases the sender is notified that
the message has been successfully delivered, leaving the
sender unaware that the message has been potentially delayed
or deleted.
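The dispatch rule described above can be sketched in Python.
This is an illustration only, not the scanmail source; the
function name disposition and the strings it returns are
hypothetical.

```python
# Illustrative sketch; the names here are hypothetical and do not
# come from the scanmail source.
def disposition(matched_actions):
    """Return the fate of a message given the set of actions that matched."""
    if "dump" in matched_actions:   # highest priority: message is deleted
        return "deleted"
    if "hold" in matched_actions:   # next: message is held for inspection
        return "held"
    # No match, or only log matches: the message is queued for delivery.
    return "queued"
```

A message matching both a hold pattern and a log pattern is
held, while one matching only log patterns is queued as usual.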
Scanmail accepts the arguments of qer(8) as well as the fol-
lowing:
-c Save a copy of each message in a randomly-named
file in directory /mail/copy.
-d Write debugging information to standard error.
-h Queue held messages by sending domain name. The
-q option must specify a root directory; messages
are queued in subdirectories of this directory.
If the -h option is not specified, messages are
accumulated in a subdirectory of /mail/queue.hold
named for the contents of /dev/user, usually none.
-n Messages are never held for inspection, but are
delivered. Also known as vacation mode.
Page 1 Plan 9 (printed 10/28/25)
-p filename
Read the patterns from filename rather than
/mail/lib/patterns.
-q holdroot
Queue deliverable messages in subdirectories of
holdroot. This option is the same as the -q option
of qer(8) and must be present if the -h option is
given.
-s Save deleted messages. Messages are stored, one
per randomly-named file, in subdirectories of
/mail/queue.dump named with the date.
-t Test mode. The pattern matcher is applied but the
message is discarded and the result is not logged.
-v Print the highest priority match. This is useful
with the -t option for testing the pattern matcher
without actually sending a message.
Testscan is the command line version of scanmail. If
filename is missing, it applies the pattern set to the mes-
sage on standard input. Unlike scanmail, which finds the
highest priority match, testscan prints all matches in the
portion of the message under test. It is useful for testing
a pattern set or implementing a personal filter using the
pipeto file in a user's mail directory. Testscan accepts
the following options:
-a Print matches in the complete input message.
-d Enable debug mode.
-v Print the message after conversion to canonical form
(see Canonicalization below).
-p filename
Read the patterns from filename rather than
/mail/lib/patterns.
Canonicalization
Before pattern matching, both programs convert a portion of
the message header and the beginning of the message to a
canonical form. The amounts of the header and message body
processed are set by compile-time parameters in the source
files. The canonicalization process converts letters to
lower-case and replaces consecutive spaces, tabs and newline
characters with a single space. HTML commands are deleted
except for the parameters following A HREF, IMG SRC, and IMG
BORDER directives. Additionally, the following MIME escape
sequences are replaced by their ASCII equivalents:
Escape Seq ASCII
---------- -----
=2e .
=2f /
=20 <space>
=3d =
and the sequence =<newline> is elided. Scanmail assembles
the sender, destination domain and recipient fields of the
command line into a string that is subjected to the same
canonical processing. Following canonicalization, the com-
mand line and the two long strings containing the header and
the message body are passed to the matching engine for anal-
ysis.
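A rough Python sketch of the text transformations described
above follows. It is not the scanmail implementation: the
function name canonicalize is hypothetical, the order of the
steps is an assumption, and the deletion of HTML commands is
omitted for brevity.

```python
import re

# Only the four escapes listed in the table above are decoded.
MIME_ESCAPES = {"=2e": ".", "=2f": "/", "=20": " ", "=3d": "="}

def canonicalize(text):
    """Approximate the canonical form (HTML handling not included)."""
    text = text.lower()                      # letters to lower-case
    text = text.replace("=\n", "")           # elide the =<newline> sequence
    # Decode escapes in a single pass so that a decoded "=" is not
    # combined with the characters that follow it and decoded again.
    text = re.sub(r"=(?:2e|2f|20|3d)",
                  lambda m: MIME_ESCAPES[m.group(0)], text)
    # Runs of spaces, tabs and newlines become a single space.
    return re.sub(r"[ \t\n]+", " ", text)
```

Lower-casing first means that an escape written with
upper-case hex digits, such as =2E, is decoded as well; whether
scanmail behaves the same way is not stated here.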
Pattern Syntax
The matching engine compiles the pattern set and matches it
to each canonicalized input string. Patterns are specified
one per line as follows:
{*}action: pattern-spec {~~override...~~override}
On all lines, a # introduces a comment; there is no way to
escape this character.
Lines beginning with * contain a pattern-spec that is a
string; otherwise, the pattern-spec is a regular expression
in the style of regexp(6). Regular expression matching is
many times less efficient than string matching, so it is
wiser to enumerate several similar strings than to combine
them into a regular expression. The action is a keyword
terminated by a : and separated from the pattern by optional
white-space. It must be one of the following:
dump if the pattern matches, the message is deleted.
If the -s command line option is set, the message
is saved.
hold if the pattern matches, the message is queued in a
subdirectory of /mail/queue.hold for manual
inspection. After inspection, the queue can be
swept manually using runq (see qer(8)) to deliver
messages that were inadvertently matched.
header this is the same as the hold action, except the
pattern is only applied to the message header.
This optimization is useful for patterns that
match header fields that are unlikely to be pre-
sent in the body of the message.
line the sender and a section of the message around the
match are written to the file /sys/log/lines. The
message is always delivered.
loff patterns of this type are applied only to the
canonicalized command line. When a match occurs,
all patterns with line actions are disabled. This
is useful for limiting the size of the log file by
excluding repetitive messages, such as those from
mailing lists.
Patterns are accumulated into pattern sets sharing the same
action. The matching engine applies the dump pattern set
first, then the header and hold pattern sets, and finally
the line pattern set. Each pattern set is applied three
times: to the canonicalized command line, to the message
header, and finally to the message body. The ordering of
patterns in the pattern file is insignificant.
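The order of application can be illustrated with a small
Python sketch. It assumes literal string patterns only and
that loff patterns are checked before anything else; the
function name first_match and the dictionary layout are
hypothetical and do not reflect the actual source.

```python
# Illustrative sketch of the matching order; not the scanmail source.
def first_match(pattern_sets, cmdline, header, body):
    """Return (action, where, pattern) for the first match, or None.

    pattern_sets maps an action keyword to a list of literal strings;
    the three inputs are assumed to be canonicalized already.
    """
    # loff patterns see only the command line; a match disables line patterns.
    line_enabled = not any(p in cmdline for p in pattern_sets.get("loff", []))
    for action in ("dump", "header", "hold", "line"):
        if action == "line" and not line_enabled:
            continue
        if action == "header":
            # header patterns are applied to the message header only
            targets = [("header", header)]
        else:
            targets = [("cmdline", cmdline), ("header", header), ("body", body)]
        for where, text in targets:
            for pattern in pattern_sets.get(action, []):
                if pattern in text:
                    return action, where, pattern
    return None
```

Because the sets are visited in a fixed order, the position of
a pattern within the dictionary, like its position in the
pattern file, does not affect which action wins.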
The pattern-spec is a string of characters terminated by a
newline, # or override indicator, ~~. Trailing white-space
is deleted, but a pattern containing leading or trailing
white-space can be enclosed in double-quote characters. A
pattern containing a double-quote must itself be enclosed in
double-quote characters, with each embedded double-quote
preceded by a backslash. For example, the pattern
"this is not \"spam\""
matches the string this is not "spam". The pattern-spec is
followed by zero or more override strings. When the spe-
cific pattern matches, each override is applied and if one
matches, it cancels the effect of the pattern. Overrides
must be strings; regular expressions are not supported.
Each override is introduced by the string ~~ and continues
until a subsequent ~~, # or newline, white-space included.
A ~~ immediately followed by a newline indicates a line con-
tinuation and further overrides continue on the following
line. Leading white-space on the continuation line is
ignored. For example,
*hold: sex.com~~essex.com~~sussex.com~~sysex.com~~
lasex.com~~cse.psu.edu!owner-9fans
matches all input containing the string sex.com except for
messages that also contain the strings in the override list.
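The line syntax described in this section can be sketched as a
small Python parser. This is a reading of the rules above, not
the parser used by scanmail; the names join_continuations and
parse_rule and the dictionary they produce are hypothetical.

```python
import re

ACTIONS = {"dump", "hold", "header", "line", "loff"}

def join_continuations(lines):
    """Drop comments, then join a line ending in ~~ with the next line."""
    logical, pending = [], ""
    for raw in lines:
        text = raw.rstrip("\n").split("#", 1)[0]   # '#' cannot be escaped
        if pending:
            text = pending + text.lstrip()         # leading white-space ignored
        if text.endswith("~~"):                    # ~~ right before the newline
            pending = text
            continue
        pending = ""
        if text.strip():
            logical.append(text)
    if pending:
        logical.append(pending)
    return logical

def parse_rule(line):
    """Split one logical line into its action, pattern-spec and overrides."""
    m = re.match(r"\s*(\*?)(\w+):\s*(.*)$", line)
    if m is None or m.group(2) not in ACTIONS:
        raise ValueError("not a pattern line: %r" % line)
    spec, *overrides = m.group(3).split("~~")
    spec = spec.rstrip()                           # trailing white-space deleted
    if len(spec) >= 2 and spec[0] == '"' and spec[-1] == '"':
        spec = spec[1:-1].replace('\\"', '"')      # unquote, unescape \"
    return {
        "literal": m.group(1) == "*",              # '*' marks a string pattern
        "action": m.group(2),
        "pattern": spec,
        "overrides": [o for o in overrides if o],  # white-space is kept
    }
```

Applied to the two-line example above, join_continuations
yields one logical line, and parse_rule reports a literal hold
pattern sex.com with five overrides.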
Often it is desirable to override a pattern based on the
name of the sender or recipient. For this reason, each
override pattern is applied to the header and the command
line as well as the section of the canonicalized input con-
taining the matching data. Thus a pattern matching the com-
mand line or the header searches both the command line and
the header for overrides while a match in the body searches
the body, header and command line for overrides.
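The scope of an override search can be sketched as follows.
The function name overridden and its arguments are
hypothetical, and the sketch searches the whole body rather
than only the section around the match.

```python
# Illustrative sketch of override scoping; not the scanmail source.
def overridden(where, overrides, cmdline, header, body):
    """Report whether any override string cancels a match.

    where names the input that produced the match: "cmdline",
    "header" or "body".
    """
    scopes = [cmdline, header]      # always searched
    if where == "body":
        scopes.append(body)         # searched only for a match in the body
    return any(o in text for o in overrides for text in scopes)
```

With the overrides essex.com and sussex.com, a match on sex.com
in the body is cancelled when the header contains sussex.com,
but a match in the header is not cancelled by essex.com
appearing only in the body.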
The structure of the pattern file and the matching algorithm
define the strategy for detecting and filtering unwanted
messages. Ideally, a hold pattern selects a message for
inspection and if it is determined to be undesirable, a spe-
cific dump pattern is added to delete further instances of
the message. Additionally, it is often useful to block the
sender by updating the smtpd control file.
In this regime, patterns with a dump action generally match
phrases that are likely to be unique. Patterns that hold a
message for inspection match phrases commonly found in unde-
sirable material and occasionally in legitimate messages.
Patterns that log matches are less specific yet. In all
cases the ability to override a pattern by matching another
string allows repetitive messages that trigger the pattern,
such as mailing lists, to pass the filter after the first
one is processed manually. The -s option allows deleted
messages to be salvaged by either manual or semi-automatic
review, supporting the specification of more aggressive pat-
terns. Finally, the utility of the pattern matcher is not
confined to filtering spam; it is a generally useful admin-
istrative tool for deleting inadvertently harmful messages,
for example, mail loops, stuck senders or viruses. It is
also useful for collecting or counting messages matching
certain criteria.
FILES
/mail/lib/patterns default pattern file
/sys/log/smtpd log of deleted messages
/mail/log/lines file where log matches are logged
/mail/queue/* directories where legitimate messages
are queued for delivery
/mail/queue.hold directory where held messages are queued
for inspection
/mail/queue.dump/* directory where dumped messages are
stored when the -s command line option
is specified.
/mail/copy/* directory where copies of all incoming
messages are stored.
SOURCE
/sys/src/cmd/upas/scanmail
SEE ALSO
mail(1), qer(8), smtpd(6)
BUGS
Testscan does not report a match when the body of a message
contains exactly one line.