Rule set 3 is the first to process every address. It puts each into a form that simplifies the tasks of other rule sets. The most common method is to have rule set 3 focus an address (place angle brackets around the host part). Then later rules don't have to search for the host part, because it is already highlighted. For example, consider trying to spot the recipient host in this mess:
uuhost!user%host1%host2
Here, user is eventually intended to receive the mail message on the host host1. But where should sendmail send the message first? As it happens, sendmail selects uuhost. Focusing on this address therefore results in the following:
user%host1%host2<@uuhost.uucp>
Note that uuhost was moved to the end, the ! was changed to an @, and .uucp was appended. The @ is there so that all focused parts uniformly contain an @ just before the targeted host. Later, when we take up postprocessing, we'll show how rule set 4 moves the uuhost back to the beginning and restores the !.
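The effect of focusing on this one address shape can be mimicked outside sendmail. The following Python sketch is only an illustration, not how sendmail works internally: the function name focus_uucp is hypothetical, and it handles nothing beyond the leading host!rest form shown above.

```python
def focus_uucp(address: str) -> str:
    """Focus a leading-bang UUCP address: host!rest -> rest<@host.uucp>.

    Hypothetical helper for illustration only; real focusing is done by
    the rewriting rules of rule set 3, not by code like this.
    """
    host, bang, rest = address.partition("!")
    if not bang or not host or not rest:
        # Not of the form host!rest, so leave the address untouched.
        return address
    return f"{rest}<@{host}.uucp>"


print(focus_uucp("uuhost!user%host1%host2"))
# prints: user%host1%host2<@uuhost.uucp>
```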
The first rule in a typical rule set 3 handles addresses that are composed of empty angle brackets. These represent the special case of an empty or nonexistent address. Empty addresses should be turned into the address of the pseudo-user that bounces mail, Mailer-Daemon:
# handle "from:<>" special case
R$*<>$*	$@<@>	empty becomes special
Here, empty angle brackets, no matter what surrounds them ($*), are rewritten to be a lone @ inside angle brackets (<@>). Other rule sets later turn this special token into $n (which contains Mailer-Daemon as its value).
Addresses can be legally expressed in only four formats: [2]
address
address (full name)
<address>
full name <address>
[2] Actually, we are fudging for simplicity. Addresses can appear in various permutations of those shown, and we completely ignore the list:members; form of address.
When sendmail preprocesses an address that is in the second format, it removes (and saves for later use) the full name from within the parentheses. The last two formats, however, contain additional characters and information that are not discarded during preprocessing. As a consequence, rule set 3 must take on the job of discarding the unwanted information:
# basic textual canonicalization
R$*<$*<$*<$+>$*>$*>$*	$4	3-level <> nesting
R$*<$*<$+>$*>$*	$3	2-level <> nesting
R$*<$+>$*	$2	basic RFC821/822 parsing
Here, we discard everything outside of and including the innermost pair of angle brackets. Three rules are required to do this because of the minimal-matching nature of the LHS operators (see Section 8.7.2, "Minimal Matching"). Consider trying to de-nest a three-level workspace using only a rule like the third:
the workspace	A < B < C < D > C > B > A
$* matches	A
< matches	<
$+ matches	B < C < D
> matches	>
$* matches	C > B > A
Clearly, the result B<C<D is not the value between the innermost pair of angle brackets and will result in an address that produces the error message:
Unbalanced '<'
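The failed match traced above can be reproduced with an ordinary regular expression. This is a rough analogy rather than sendmail's own matcher: non-greedy groups stand in for minimal matching, the regex works on characters where sendmail works on tokens, and the name denest_once is hypothetical.

```python
import re

# Rough analogue of the LHS "$* < $+ > $*". The non-greedy groups imitate
# sendmail's minimal matching (an approximation: sendmail matches tokens,
# not individual characters).
SINGLE_LEVEL = re.compile(r"^(.*?)<(.+?)>(.*)$")


def denest_once(workspace: str) -> str:
    """Return what a lone single-level rule would keep as $2."""
    match = SINGLE_LEVEL.match(workspace)
    return match.group(2) if match else workspace


print(denest_once("A<B<C<D>C>B>A"))
# prints: B<C<D
```

The middle group stops at the first > it can reach, which is why the unbalanced B<C<D is captured rather than D.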
John Halleck designed a clever alternative to the above traditional technique that is now included with V8 sendmail:
R$*	$: < $1 >	housekeeping <>
R$+ < $* >	< $2 >	strip excess on left
R< $* > $+	< $1 >	strip excess on right
R<>	$@ < @ >	MAIL FROM:<> case
R< $+ >	$: $1	remove housekeeping <>
Here, angle bracket pairs are stripped first from the left of an address, then from the right, and finally whatever is left must be the address.
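To see why this sequence of rules always ends at the innermost address, it can help to step through it by hand. The Python sketch below approximates each of the five rules with a character-level regular expression. It is an illustration under stated assumptions, not sendmail code: the function name strip_brackets is hypothetical, whitespace and tokenization are ignored, and the loops imitate the way a rule without a $: or $@ prefix repeats for as long as it matches.

```python
import re


def strip_brackets(workspace: str) -> str:
    """Approximate the five rules above with character-level regexes."""
    # R$*  $: < $1 >          housekeeping <> (applied once)
    ws = f"<{workspace}>"

    # R$+ < $* >  < $2 >      strip excess on left (repeats while it matches)
    left = re.compile(r"^(.+?)<(.*)>$")
    while (m := left.match(ws)):
        ws = f"<{m.group(2)}>"

    # R< $* > $+  < $1 >      strip excess on right (repeats while it matches)
    right = re.compile(r"^<(.*?)>(.+)$")
    while (m := right.match(ws)):
        ws = f"<{m.group(1)}>"

    # R<>  $@ < @ >           empty address: return the special token
    if ws == "<>":
        return "<@>"

    # R< $+ >  $: $1          remove housekeeping <>
    m = re.match(r"^<(.+)>$", ws)
    return m.group(1) if m else ws


print(strip_brackets("A<B<C<D>C>B>A"))          # prints: D
print(strip_brackets("Full Name <user@host>"))  # prints: user@host
print(strip_brackets("<>"))                     # prints: <@>
```

Unlike the three fixed-depth rules shown earlier, nothing in this approach depends on how deeply the brackets are nested.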
The sendmail program must be able to handle addresses that are in route address syntax. Such addresses are in the form @A,@B:user@C (which means that mail should be sent first to A, then from A to B, and finally from B to C). [3] The commas are converted to colons for easier design of subsequent rules. They must be converted back to commas by rule set 4. Rule set 3 uses a simple rule to convert all commas to colons:
# make sure list syntax is easy to parse
R@ $+ , $+	@ $1 : $2	change all "," to ":"
[3] Also see the DontPruneRoutes option in Section 34.8.20, DontPruneRoutes (R) and the F=d delivery agent flag in Section 30.8.16, F=d.
The iterative nature of rules comes into play here. As long as there is an @ followed by one or more tokens ($+), then a comma, then one or more further tokens, this rule repeats, converting one comma to a colon on each pass. The result is then carried down to the next rule, which focuses:
R@ $+ : $+ $@ <@ $1> : $2 focus route-addr
Once that host has angle brackets placed around it (is focused), the job of rule set 3 ends, and it exits (the $@ prefix in the RHS).
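The two route-address rules can be walked through in the same spirit. The sketch below is a character-level approximation in Python, not sendmail itself: focus_route_addr is a hypothetical name, and the while loop stands in for the automatic repetition of the comma rule.

```python
import re


def focus_route_addr(workspace: str) -> str:
    """Approximate the comma-to-colon rule and the route-addr focus rule."""
    # R@ $+ , $+   @ $1 : $2      repeats until no comma is left
    comma = re.compile(r"^@(.+?),(.+)$")
    while (m := comma.match(workspace)):
        workspace = f"@{m.group(1)}:{m.group(2)}"

    # R@ $+ : $+   $@ <@ $1> : $2   focus the first host, then return
    m = re.match(r"^@(.+?):(.+)$", workspace)
    if m:
        return f"<@{m.group(1)}>:{m.group(2)}"
    return workspace


print(focus_route_addr("@A,@B:user@C"))
# prints: <@A>:@B:user@C
```

Each pass of the loop converts exactly one comma, so an address with several hops, such as @A,@B,@C:user@D, simply takes more passes before the first host is focused.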
A whole book is dedicated to the myriad forms of addressing that might face a site administrator: !%@:: A Directory of Electronic Mail Addressing & Networks by Donnalyn Frey and Rick Adams (O'Reilly & Associates, 1993). We won't duplicate that work here; rather, we point out that most such addresses are handled nicely by existing configuration files. Consider the format of a DECnet address:
host::user
One approach to handling such an address in rule set 3 is to convert it into the Internet user@host.domain form:
R$+ :: $+ $@ $2 @ $1.decnet
Here, we reverse the host and user and put them into Internet form. The .decnet can later be used by rule set 0 to select an appropriate delivery agent.
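As a quick check of what the DECnet rule produces, the same rewrite can be expressed as a short Python sketch. The function name decnet_to_internet is hypothetical, and the regular expression is only a character-level stand-in for the token-based LHS.

```python
import re


def decnet_to_internet(workspace: str) -> str:
    """Approximate: R$+ :: $+   $@ $2 @ $1.decnet"""
    m = re.match(r"^(.+?)::(.+)$", workspace)
    if m:
        # $1 is the host and $2 is the user; swap them around the @.
        return f"{m.group(2)}@{m.group(1)}.decnet"
    # Not a DECnet-style address, so leave it unchanged.
    return workspace


print(decnet_to_internet("host::user"))
# prints: user@host.decnet
```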
This is a simple example of a special address problem from the many that can develop. In addition to DECnet, for example, your site may have to deal with Xerox Grapevine addresses, X.400 addresses, or UUCP addresses. The best way to handle such addresses is to copy what others have done.
The last few rules in our illustration of rule set 3 are used to process the Internet-style user@domain address:
# find focus for @ syntax addresses
R$+ @ $+	$: $1 <@ $2>	focus on domain
R$+ < $+ @ $+ >	$1 $2 <@ $3>	move gaze right
R$+ <@ $+ >	$@ $1 <@ $2>	already focused
For an address like something@something, the first rule focuses on all the tokens following the first @ as the name of the host. Recall that the $: prefix to the RHS prevents potentially infinite recursion by applying the rule only once. Assuming that the workspace started with:
user@host1@host2
this first rewrite results in:
user<@host1@host2>
The second rule (move gaze right) then attempts to fine-tune the focus by making sure only the rightmost @host is selected. This rule can move the focus right, using recursion, and can handle addresses that are as extreme as the following:
user<@host1@host2@host3@host4> becomes user@host1@host2@host3<@host4>
The third rule checks to see whether the workspace has been focused. If it has, it returns the focused workspace (the $@ prefix in the RHS), and its job is done.
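The way the gaze moves right one host at a time is easier to follow when traced step by step. The following Python sketch approximates the three rules with character-level regular expressions. It is an illustration only: focus_at is a hypothetical name, the input is assumed to be an unfocused address, and the print call exists just to show each intermediate workspace.

```python
import re


def focus_at(workspace: str) -> str:
    """Approximate the three @-syntax focusing rules above."""
    # R$+ @ $+   $: $1 <@ $2>        "$:" means apply at most once
    m = re.match(r"^(.+?)@(.+)$", workspace)
    if m:
        workspace = f"{m.group(1)}<@{m.group(2)}>"
        print(workspace)

    # R$+ < $+ @ $+ >   $1 $2 <@ $3>   repeats: move gaze right
    gaze = re.compile(r"^(.+?)<(.+?)@(.+)>$")
    while (m := gaze.match(workspace)):
        workspace = f"{m.group(1)}{m.group(2)}<@{m.group(3)}>"
        print(workspace)

    # R$+ <@ $+ >   $@ $1 <@ $2>     already focused: return unchanged
    return workspace


focus_at("user@host1@host2@host3@host4")
# prints:
# user<@host1@host2@host3@host4>
# user@host1<@host2@host3@host4>
# user@host1@host2<@host3@host4>
# user@host1@host2@host3<@host4>
```

An address with no @ at all, such as a bare username, passes through this sketch unchanged and unfocused.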
Any address that has not been handled by rule set 3 is unchanged and probably not focused. Since rule set 0 expects all addresses to be focused so that it can select appropriate delivery agents, such unfocused addresses may bounce. Many configuration files allow local addresses (just a username) to be unfocused.