Now that we've given you a nice set of tools, let's talk about how you can use them to diagnose real problems. There are some problems that are easy to recognize and correct. We should cover these as a matter of course - they're some of the most common problems because they're caused by some of the most common mistakes. Here are the contestants, in no particular order. We call 'em our "Unlucky Thirteen."
The main symptom of this problem is that slave name servers don't pick up any changes you make to the zone's db file on the primary. The slaves think the zone data hasn't changed, since the serial number is still the same.
How do you check whether or not you remembered to increment the serial number? Unfortunately, that's not so easy. If you don't remember what the old serial number was, and your serial number gives you no indication of when it was updated, there's no direct way to tell whether it's changed.[1] When you signal the primary, it will load the updated zone file regardless of whether you've changed the serial number. It will check the file's timestamp, see that it's been modified since it last loaded the data, and read the file. About the best you can do is to use nslookup to compare the data returned by the primary and by a slave. If they return different data, you probably forgot to increment the serial number. If you can remember a recent change you made, you can look for that data. If you can't remember a recent change, you could try transferring the zone from a primary and from a slave, sorting the results, and using diff to compare them.
[1] On the other hand, if you encode the date into the serial number, as many people do (e.g., 1998010500 is the first rev of data on January 5, 1998), you may be able to tell at a glance whether you updated the serial number when you made the change.
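If you do adopt the date-encoded convention from the footnote, picking the next serial number is mechanical. Here's a minimal sketch, assuming a ten-digit YYYYMMDDnn serial; the function name next_serial is ours, not part of BIND or any of its tools:

```python
from datetime import date


def next_serial(current: int, today: date) -> int:
    """Return the next YYYYMMDDnn serial number after `current`.

    Hypothetical helper: assumes the date-encoded convention described
    in the footnote (e.g., 1998010500 for the first rev on Jan 5, 1998).
    """
    base = int(today.strftime("%Y%m%d")) * 100
    if current < base:
        # First change of the day: start at revision 00.
        return base
    # Already changed today (or the serial is ahead of the date): add one.
    return current + 1
```

For example, next_serial(1998010500, date(1998, 1, 5)) gives 1998010501. Note that the sketch makes no attempt to handle more than 100 changes in a day; it simply keeps counting, which spills into the next day's range.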
The good news is that, although determining whether the zone was transferred is tricky, making sure the zone is transferred is simple. Just increment the serial number on the primary's copy of the db file and signal the primary to reload. The slaves should pick up the new data within their refresh interval, or sooner if they use NOTIFY. If you want to make sure the slaves can transfer the new data, you can execute named-xfer by hand (on the slaves, naturally):
#/etc/named-xfer -z movie.edu -f db.movie -s 0 terminator
#echo $?
If named-xfer returns 1, the zone was transferred successfully. Other return values indicate that no zone was transferred, either because of an error or because the slave thought the zone was up-to-date. (See Section 13.2.1, "How to Use named-xfer," earlier in this chapter, for more details.)
There's another variation on the "forgot to increment the serial number" line. We see it in environments where administrators use tools like h2n to create db files from the host table. With scripts like h2n, it's temptingly easy to delete old db files and create new ones from scratch. Some administrators do this occasionally because they mistakenly believe that data in the old db files can creep into the new ones. The problem with deleting the db files is that, without the old db file to read for the current serial number, h2n starts over at serial number 1. If your primary's serial number rolls all the way back to 1 from 598 or what-have-you, the slaves (versions 4.8.3 and earlier) don't complain; they just figure they're all caught up and don't need zone transfers. A 4.9 or later slave server, however, is ever watchful, and will emit a syslog error message warning you that something might be wrong:
Jun 7 20:14:26 wormhole named[29618]: Zone "movie.edu"
(class 1) SOA serial# (1) rcvd from [192.249.249.3]
is < ours (112)
So if the serial number on the primary looks suspiciously low, check the serial number on the slaves, too, and compare them:
%nslookup
Default Server: terminator.movie.edu
Address: 192.249.249.3
>set q=soa
>movie.edu.
Server: terminator.movie.edu
Address: 192.249.249.3
movie.edu
        origin = terminator.movie.edu
        mail addr = al.robocop.movie.edu
        serial = 1
        refresh = 10800 (3 hours)
        retry = 3600 (1 hour)
        expire = 604800 (7 days)
        minimum ttl = 86400 (1 day)
>server wormhole.movie.edu.
Default Server: wormhole.movie.edu
Addresses: 192.249.249.1, 192.253.253.1
>movie.edu.
Server: wormhole.movie.edu
Addresses: 192.249.249.1, 192.253.253.1
movie.edu
        origin = terminator.movie.edu
        mail addr = al.robocop.movie.edu
        serial = 112
        refresh = 10800 (3 hours)
        retry = 3600 (1 hour)
        expire = 604800 (7 days)
        minimum ttl = 86400 (1 day)
wormhole, as a movie.edu slave, should never have a larger serial number than the primary master, so clearly something's amiss.
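The comparison is simple enough to script. The sketch below is only an illustration: parse_serial and compare_serials are hypothetical names, the regular expression assumes nslookup's "serial = N" output format shown above, and the comparison is plain integer comparison that ignores serial number wraparound:

```python
import re
from typing import Optional


def parse_serial(soa_output: str) -> Optional[int]:
    """Pull the number out of nslookup's 'serial = N' line, if present."""
    match = re.search(r"serial\s*=\s*(\d+)", soa_output)
    return int(match.group(1)) if match else None


def compare_serials(primary: int, slave: int) -> str:
    """Classify the relationship between the two serial numbers."""
    if slave > primary:
        # A slave should never be ahead of its master, so the
        # primary's serial number has probably rolled back.
        return "primary rolled back"
    if slave < primary:
        # Normal until the slave's next refresh (or NOTIFY).
        return "slave behind"
    return "in sync"
```

With the serial numbers from the session above, compare_serials(1, 112) reports "primary rolled back".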
This problem is really easy to spot, by the way, with the tool we'll write in Chapter 14, Programming with the Resolver and Name Server Library Routines, coming up next.
Occasionally, you may forget to signal your primary master name server after making a change to the conf file or to the db file. The name server won't know to load the new data - it doesn't automatically check the timestamp of the file and notice that it changed. Consequently, any changes you've made won't be reflected in the name server's data: new zones won't be loaded, and new records won't percolate out to the slaves.
To check when you last signaled the name server to reload, scan the syslog output for the last entry like this:
Mar 8 17:22:08 terminator named[22317]: reloading nameserver
This is the last time you sent a HUP signal to the name server. If you killed and then restarted the name server, you'll see an entry like this:
Mar 8 17:22:08 terminator named[22317]: restarted
or, on a 4.9 name server:
Mar 8 17:22:08 terminator named[22317]: starting
If the time of the restart doesn't correlate with the time you made the last change, signal the name server to reload its data again. And check that you incremented the serial numbers on db files you changed, too.
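If your syslog file is long, you can let a script find the last such entry. This is a rough sketch that assumes the three message texts shown above ("reloading nameserver", "restarted", and "starting"); last_reload is a hypothetical name, and your name server's messages may differ:

```python
import re
from typing import Iterable, Optional

# Messages the text above associates with a reload or a restart.
_EVENTS = re.compile(r"named\[\d+\]: (reloading nameserver|restarted|starting)\b")


def last_reload(lines: Iterable[str]) -> Optional[str]:
    """Return the last syslog line that records a reload or restart."""
    last = None
    for line in lines:
        if _EVENTS.search(line):
            last = line.rstrip("\n")
    return last
```

You might call it as last_reload(open(path)), where path is wherever your syslog daemon writes named's messages; that location varies from system to system.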
If a slave name server can't get the current serial number for a zone from its master name server, it'll log a message like the following via syslog:
Jan 6 11:55:25 wormhole named[544]: Err/TO getting serial# for "movie.edu"
On a BIND 4 name server, that looks like this:
Mar 3 8:19:34 wormhole named[22261]: zoneref: Masters for secondary zone movie.edu unreachable
If you let this problem fester, the slave will expire the zone:
Mar 8 17:12:43 wormhole named[22261]: secondary zone "movie.edu" expired
Once the zone has expired, you'll start getting SERVFAIL errors when you query the name server for data in the zone:
%nslookup robocop wormhole.movie.edu.
Server: wormhole.movie.edu
Addresses: 192.249.249.1, 192.253.253.1
*** wormhole.movie.edu can't find robocop.movie.edu: Server failed
There are three leading causes of this problem: a loss in connectivity to the master server due to network failure, an incorrect IP address for the master server in the conf file, and a syntax error in the zone data file on the master server. First check the conf file's entry for the zone and see what IP address the slave is attempting to load from:
zone "movie.edu" { type slave; file "db.movie"; masters { 192.249.249.3; }; };
On a BIND 4 server, the directive would look like this:
secondary movie.edu 192.249.249.3 db.movie
Make sure that's really the IP address of the master name server. If it is, check connectivity to that IP address:
%ping 192.249.249.3 -n 10
PING 192.249.249.3: 64 byte packets
----192.249.249.3 PING Statistics----
10 packets transmitted, 0 packets received, 100% packet loss
If the master server isn't reachable, make sure that the server's host is really running (e.g., is powered on, etc.), or look for a network problem. If the server is reachable, make sure named is running on the host, and that you can manually transfer the zone:
#named-xfer -z movie.edu -f /tmp/db.movie -s 0 192.249.249.3
#echo $?
2
A return code of 2 means that an error occurred. Check to see if there is a syslog message. In this case there was a message:
Jan 6 14:56:07 zardoz named-xfer[695]: record too short from [192.249.249.3], zone movie.edu
At first glance, this error looks like a truncation problem. The real problem is easier to see if you use nslookup:
%nslookup - terminator.movie.edu
Default Server: terminator.movie.edu
Address: 192.249.249.3
>ls movie.edu        --This attempts a zone transfer
[terminator.movie.edu]
*** Can't list domain movie.edu: Query refused
What has happened here is that named is refusing to allow you to transfer its zone data. The remote server has secured its zone data with the allow-transfer substatement, the secure_zone resource record, or the xfrnets boot file directive.
If the master server is responding as not authoritative for the zone, you'll see a message like this:
Jan 6 11:58:36 zardoz named[544]: Err/TO getting serial# for "movie.edu"
Jan 6 11:58:36 zardoz named-xfer[793]: [192.249.249.3] not authoritative for movie.edu, SOA query got rcode 0, aa 0, ancount 0, aucount 0
If this is the correct master server, the server should be authoritative for the zone. This probably indicates that the master had a problem loading the zone, usually because of a syntax error in the zone data file. Contact the administrator of the master server and have him check his syslog output for indications of a syntax error (see problem 5, later in this chapter).
Because the mappings from host names to IP addresses are disjoint from the mappings from IP addresses to host names in DNS, it's easy to forget to add a PTR record for a new host. Adding the A record is intuitive, but many people who are used to host tables assume that adding an address record takes care of the reverse mapping, too. That's not true - you need to add a PTR record for the host to the appropriate in-addr.arpa domain.
Forgetting to add the PTR record for a host usually causes that host to fail authentication checks. For example, users on the host won't be able to rlogin to other hosts without specifying a password, and rsh or rcp to other hosts simply won't work. The servers these commands talk to need to be able to map the connection's IP address to a domain name to check .rhosts and hosts.equiv. These users' connections will cause entries like this to be syslogged:
Aug 15 17:32:36 terminator inetd[23194]: login/tcp: Connection from unknown (192.249.249.23)
Also, many large ftp archives, including ftp.uu.net, refuse anonymous ftp access to hosts whose IP addresses don't map back to domain names. ftp.uu.net's ftp server emits a message that reads, in part:
530- Sorry, we're unable to map your IP address 140.186.66.1 to a hostname
530- in the DNS. This is probably because your nameserver does not have a
530- PTR record for your address in its tables, or because your reverse
530- nameservers are not registered. We refuse service to hosts whose
530- names we cannot resolve.
That makes the reason you can't use anonymous ftp pretty evident. Other ftp sites, however, don't bother printing informative messages; they simply deny service.
nslookup is handy for checking whether you've forgotten the PTR record or not:
%nslookup
Default Server: terminator.movie.edu
Address: 192.249.249.3
>beetlejuice        --Check for a hostname-to-address mapping
Server: terminator.movie.edu
Address: 192.249.249.3
Name: beetlejuice.movie.edu
Address: 192.249.249.23
>192.249.249.23        --Now check for a corresponding address-to-hostname mapping
Server: terminator.movie.edu
Address: 192.249.249.3
*** terminator.movie.edu can't find 192.249.249.23: Non-existent domain
On the primary for 249.249.192.in-addr.arpa, a quick check of the db.192.249.249 file will tell you if the PTR record hasn't been added to the db file yet, or if the name server hasn't been signaled to load the file. If the name server having trouble is a slave for the zone, check that the serial number was incremented on the primary and that the slave has had enough time to load the zone.
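If you're not sure which name the PTR record belongs under, remember that the octets of the address are reversed before in-addr.arpa is appended. A small sketch (reverse_name is a hypothetical helper, and it handles only dotted-quad IPv4 addresses):

```python
def reverse_name(ip: str) -> str:
    """Return the in-addr.arpa domain name for a dotted-quad IPv4 address."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    # Reverse the octets so the most specific label comes first.
    return ".".join(reversed(octets)) + ".in-addr.arpa."
```

For beetlejuice's address, reverse_name("192.249.249.23") yields 23.249.249.192.in-addr.arpa., which is the owner name the missing PTR record should have.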
Syntax errors in the conf file and in zone database files are also relatively common (more or less, depending on the experience of the administrator). Generally, an error in the conf file will cause the name server to fail to load one or more zones. Some typos in the options statement will cause the name server to fail to start at all, and to log an error like this via syslog:
Jan 6 11:59:29 terminator named[544]: can't change directory to /var/name: No such file or directory
Note that you won't see an error message when you try to start named on the command line, but named won't stay running for long.
If the syntax error is in a less important line in the conf file - say, in a zone statement - only that zone will be affected. Usually, the name server will not be able to load the zone at all (say, you misspell "master" or the name of the data file, or you forget to put quotes around the file name or domain name). This would produce syslog output like:
Jan 6 12:01:36 terminator named[841]: /etc/named.conf:10: syntax error near 'movie.edu'
If a db file contains a syntax error, yet the name server succeeds in loading the zone, it will either answer as "non-authoritative" for all data in the zone or will return a SERVFAIL error for lookups in the zone:
%nslookup carrie
Server: terminator.movie.edu
Address: 192.249.249.3
Non-authoritative answer:
Name: carrie.movie.edu
Address: 192.253.253.4
Here's the syslog message produced by the syntax error that caused this problem:
Jan 6 15:07:46 huskymo named[693]: db.movie:11: Priority error (postmanrings2x.movie.edu.)
Jan 6 15:07:46 huskymo named[693]: master zone "movie.edu" (IN) rejected due to errors (serial 1997010600)
Jan 6 15:07:46 huskymo named[693]: slave zone "movie.edu" (IN) removed
If you looked in the db file for the problem, you'd find this record:
postmanrings2x IN MX postmanrings2x.movie.edu.
The MX record is missing the preference field, which causes the error.
Note that unless you correlate the lack of authority (when you expect the name server to be authoritative) with a problem, or scan your syslog file assiduously, you might never notice the syntax error!
Starting with BIND 4.9.4, an "invalid" host name can be a syntax error:
Jan 6 12:04:10 terminator named[841]: owner name "ID_4.movie.edu" IN (primary) is invalid - rejecting
Jan 6 12:04:10 terminator named[841]: db.movie:11: owner name error
Jan 6 12:04:10 terminator named[841]: db.movie:11: Database error (a)
Jan 6 12:04:10 terminator named[841]: master zone "movie.edu" (IN) rejected due to errors (serial 1997010600)
It's very easy to leave off trailing dots when editing a db file. Since the rules for when to use them change so often (don't use them in the boot file, don't use them in resolv.conf, do use them in db files to override $ORIGIN...), it's hard to keep them straight. These resource records:
zorba IN MX 10 zelig.movie.edu
movie.edu IN NS terminator.movie.edu
really don't look that odd to the untrained eye, but they probably don't do what they're intended to. In the db.movie file, they'd be equivalent to:
zorba.movie.edu. IN MX 10 zelig.movie.edu.movie.edu.
movie.edu.movie.edu. IN NS terminator.movie.edu.movie.edu.
unless the origin were explicitly changed.
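The rule the name server applies to names in a db file can be sketched in a few lines. This is a simplified illustration, not BIND's actual parser; qualify is a hypothetical name, and the sketch ignores special cases such as the @ notation:

```python
def qualify(name: str, origin: str) -> str:
    """Expand a db file name the way the current origin would.

    A name that ends in a dot is taken as fully qualified; any other
    name has the origin appended to it.
    """
    if name.endswith("."):
        return name
    return f"{name}.{origin}"
```

With an origin of movie.edu., qualify("zorba", "movie.edu.") gives zorba.movie.edu. as intended, but qualify("zelig.movie.edu", "movie.edu.") gives zelig.movie.edu.movie.edu., which is exactly the mistake shown above.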
If you omit a trailing dot after a domain name in the resource record's data (as opposed to leaving off a trailing dot in the resource record's name), you usually end up with wacky NS or MX records:
%nslookup -type=mx zorba.movie.edu.
Server: terminator.movie.edu
Address: 192.249.249.3
zorba.movie.edu preference = 10, mail exchanger = zelig.movie.edu.movie.edu
zorba.movie.edu preference = 50, mail exchanger = postmanrings2x.movie.edu.movie.edu
The cause of this should be fairly clear from the nslookup output. But if you forget the trailing dot on the domain name field in a record (as in the movie.edu NS record above), spotting your mistake might not be as easy. If you try to look up the record with nslookup, you won't find it under the name you thought you used. Dumping your name server's database may help you root it out:
$ORIGIN edu.movie.edu.
movie IN NS terminator.movie.edu.movie.edu.
The $ORIGIN line looks odd enough to stand out.
If, for some reason, you forget to install a cache file on your host, or if you accidentally delete it, your name server will be unable to resolve names outside of its authoritative data. This behavior is easy to recognize using nslookup, but be careful to use full, dot-terminated domain names, or else the search list may cause misleading failures.
%nslookup
Default Server: terminator.movie.edu
Address: 192.249.249.3
>ftp.uu.net.        --A lookup of a name outside your name server's authoritative data causes a SERVFAIL error...
Server: terminator.movie.edu
Address: 192.249.249.3
*** terminator.movie.edu can't find ftp.uu.net.: Server failed
A lookup of a name in your name server's authoritative data returns a response:
>wormhole.movie.edu.
Server: terminator.movie.edu
Address: 192.249.249.3
Name: wormhole.movie.edu
Addresses: 192.249.249.1, 192.253.253.1
>^D
To confirm your suspicion that the cache data are missing, check the syslog output for an error like this:
Jan 6 15:10:22 terminator named[764]: No root nameservers for class IN
Class IN, you'll remember, is the Internet class. This error indicates that because no cache data were available, no root name servers were found.
Though the Internet is more reliable today than it was back in the wild and woolly days of the ARPANET, network outages are still relatively common. Without "lifting the hood" and poking around in debugging output, these failures usually look like poor performance:
%nslookup nisc.sri.com.
Server: terminator.movie.edu
Address: 192.249.249.3
*** Request to terminator.movie.edu timed out ***
If you turn on name server debugging, though, you'll see that your name server, anyway, is healthy. It received the query from the resolver, sent the necessary queries, and waited patiently for a response. It just didn't get one. Here's what the debugging output might look like:
Debug turned ON, Level 1
Here nslookup sends the first query to our local name server, for the IP address of nisc.sri.com. You can tell it's not another name server because the query is received from a port other than 53, the name server's port. Notice that the query is forwarded to another name server, and when no answer is received, it is resent to a different name server:
datagram from [192.249.249.3].1051, fd 5, len 30
req: nlookup(nisc.sri.com) id 18470 type=1 class=1
req: missed 'nisc.sri.com' as 'com' (cname=0)
forw: forw -> [198.41.0.4].53 ds=7 nsid=58732 id=18470 0ms retry 4 sec
resend(addr=1 n=0) -> [128.9.0.107].53 ds=7 nsid=58732 id=18470 0ms
Now nslookup is getting impatient, and it queries our local name server again. Notice that it uses the same port. The local name server ignores the duplicate query and tries forwarding the query two more times:
datagram from [192.249.249.3].1051, fd 5, len 30
req: nlookup(nisc.sri.com) id 18470 type=1 class=1
req: missed 'nisc.sri.com' as 'com' (cname=0)
resend(addr=2 n=0) -> [192.33.4.12].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=3 n=0) -> [128.8.10.90].53 ds=7 nsid=58732 id=18470 0ms
nslookup queries the local name server again, and the name server fires off more queries:
datagram from [192.249.249.3].1051, fd 5, len 30
req: nlookup(nisc.sri.com) id 18470 type=1 class=1
req: missed 'nisc.sri.com' as 'com' (cname=0)
resend(addr=4 n=0) -> [192.203.230.10].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=0 n=1) -> [198.41.0.4].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=1 n=1) -> [128.9.0.107].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=2 n=1) -> [192.33.4.12].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=3 n=1) -> [128.8.10.90].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=4 n=1) -> [192.203.230.10].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=0 n=2) -> [198.41.0.4].53 ds=7 nsid=58732 id=18470 0ms
Debug turned OFF
From the debugging output, you can extract a list of the IP addresses of the name servers that your name server tried to query, and then check your connectivity to them. Odds are, ping won't have much better luck than your name server did:
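Picking the addresses out by eye gets tedious when the trace is long. Here's a rough sketch that pulls them from the "forw" and "resend" lines; it assumes the level 1 debugging format shown above, which can differ between BIND versions, and queried_servers is a name we made up:

```python
import re
from typing import List

# Matches the lines on which the name server sends or resends a query,
# e.g. "forw: forw -> [198.41.0.4].53" or "resend(addr=1 n=0) -> [128.9.0.107].53".
_SENT = re.compile(r"(?:forw|resend\([^)]*\)) -> \[(\d+\.\d+\.\d+\.\d+)\]\.53")


def queried_servers(debug_output: str) -> List[str]:
    """List each remote name server address once, in the order first queried."""
    seen: List[str] = []
    for ip in _SENT.findall(debug_output):
        if ip not in seen:
            seen.append(ip)
    return seen
```

Run over the trace above, this would return the five root name server addresses, starting with 198.41.0.4 and 128.9.0.107, which you can then hand to ping one at a time.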
%ping 198.41.0.4 -n 10        --ping first name server queried
PING 198.41.0.4: 64 byte packets
----198.41.0.4 PING Statistics----
10 packets transmitted, 0 packets received, 100% packet loss
%ping 128.9.0.107 -n 10        --ping second name server queried
PING 128.9.0.107: 64 byte packets
----128.9.0.107 PING Statistics----
10 packets transmitted, 0 packets received, 100% packet loss
If it does, you should check that the remote name servers are really running. You might also check whether your Internet firewall is inadvertently blocking your name server's queries. If you've upgraded to BIND 8 recently, see the sidebar "A Gotcha with BIND 8 and Packet Filtering Firewalls", and see if it applies to you.
If ping can't get through, either, all that's left to do is to locate the break in the network. Utilities like traceroute and ping's record route option can be very helpful in determining whether the problem is on your network, the destination network, or somewhere in the middle.
You should also use your own common sense when tracking down the break. In this trace, for example, the remote name servers your name server tried to query are all root name servers. (You might have had their PTR records cached somewhere, so you could find out their domain names.) Now it's not very likely that each root's local network went down, nor is it likely that the Internet's commercial backbone networks collapsed entirely. Occam's razor says that the simplest condition that could cause this behavior - namely, the loss of your network's link to the Internet - is the most likely cause.
Even though the InterNIC does its best to process your requests as quickly as possible, it may take a day or two for your domain's delegation to appear in the root name servers. If the InterNIC doesn't manage your parent domain, your mileage may vary. Some parents are quick and responsible, others are slow and inconsistent. Just like in real life, though, you're stuck with them.[2]
[2] Until the GTLD Memorandum of Understanding is adopted, that is. See http://www.gtld-mou.org/.
Until your delegation data appear in your parent domain's name servers, your name servers will be able to look up data in the Internet domain name space, but no one else on the Internet (outside of your domain) will know how to look up data in your name space.
That means that even though you can send mail outside of your domain, the recipients won't be able to reply to it. Furthermore, no one will be able to telnet to, ftp to, or even ping your hosts by name.
Remember that this applies equally to any in-addr.arpa subdomains you may run. Until the parent delegates those subdomains to your servers, name servers on the Internet won't be able to reverse map addresses on your networks.
To determine whether or not your zone's delegation has made it into your parent zone's name servers, query a parent name server for the NS records for your zone. If the parent name server has the data, any name server on the Internet can find it:
%nslookup
Default Server: terminator.movie.edu
Address: 192.249.249.3
>server a.root-servers.net.        --Query a root name server
Default Server: a.root-servers.net
Address: 198.41.0.4
>set norecurse        --Instruct the server to answer out of its own data
>set type=ns        --and to look for NS records
>249.249.192.in-addr.arpa.        --for 249.249.192.in-addr.arpa
Server: a.root-servers.net
Address: 198.41.0.4
*** a.root-servers.net can't find 249.249.192.in-addr.arpa.: Non-existent domain
Here, the delegation clearly hasn't been added yet. You can either wait patiently, or if an unreasonable amount of time has passed since you requested delegation from your parent, contact your parent and ask what's up.
Incorrect subdomain delegation is another familiar problem on the Internet. Keeping delegation up to date requires human intervention - informing your parent zone's administrator of changes to your set of authoritative name servers. Consequently, delegation information often becomes inaccurate as administrators make changes without letting their parents know. Far too many administrators believe that setting up delegation is a one-shot deal: they let their parents know which name servers are authoritative once, when they set up their zone, and then they never talk to them again. They don't even call on Mother's Day.
An administrator may add a new name server, decommission another, and change the IP address of a third, all without telling the parent zone's administrator. Gradually, the number of name servers correctly delegated to by the parent zone dwindles. In the best case, this leads to long resolution times, as querying name servers struggle to find an authoritative name server for the zone. If the delegation information becomes badly out of date, and the last authoritative name server host is brought down for maintenance, the information within the zone will be inaccessible.
If you suspect bad delegation from your parent to your zone, from your zone to one of your children, or from a remote zone to one of its children, you can check with nslookup:
%nslookup
Default Server: terminator.movie.edu
Address: 192.249.249.3
>server a.root-servers.net.        --Set server to the parent name server you suspect has bad delegation
Default Server: a.root-servers.net
Address: 198.41.0.4
>set type=ns        --Look for NS records
>hp.com.        --for the zone in question
Server: a.root-servers.net
Address: 198.41.0.4
Non-authoritative answer:
hp.com nameserver = RELAY.HP.COM
hp.com nameserver = HPLABS.HPL.HP.COM
hp.com nameserver = NNSC.NSF.NET
hp.com nameserver = HPSDLO.SDD.HP.COM
Authoritative answers can be found from:
hp.com nameserver = RELAY.HP.COM
hp.com nameserver = HPLABS.HPL.HP.COM
hp.com nameserver = NNSC.NSF.NET
hp.com nameserver = HPSDLO.SDD.HP.COM
RELAY.HP.COM internet address = 15.255.152.2
HPLABS.HPL.HP.COM internet address = 15.255.176.47
NNSC.NSF.NET internet address = 128.89.1.178
HPSDLO.SDD.HP.COM internet address = 15.255.160.64
HPSDLO.SDD.HP.COM internet address = 15.26.112.11
Let's say you suspect that the delegation to hpsdlo.sdd.hp.com is incorrect. You now query hpsdlo for data in the hp.com zone and check the answer:
>server hpsdlo.sdd.hp.com.
Default Server: hpsdlo.sdd.hp.com
Addresses: 15.255.160.64, 15.26.112.11
>set norecurse
>set type=soa
>hp.com.
Server: hpsdlo.sdd.hp.com
Addresses: 15.255.160.64, 15.26.112.11
Non-authoritative answer:
hp.com
        origin = relay.hp.com
        mail addr = hostmaster.hp.com
        serial = 1001462
        refresh = 21600 (6 hours)
        retry = 3600 (1 hour)
        expire = 604800 (7 days)
        minimum ttl = 86400 (1 day)
Authoritative answers can be found from:
hp.com nameserver = RELAY.HP.COM
hp.com nameserver = HPLABS.HPL.HP.COM
hp.com nameserver = NNSC.NSF.NET
RELAY.HP.COM internet address = 15.255.152.2
HPLABS.HPL.HP.COM internet address = 15.255.176.47
NNSC.NSF.NET internet address = 128.89.1.178
If hpsdlo really were authoritative, it would have responded with an authoritative answer. The administrator of the hp.com zone can tell you whether hpsdlo should be an authoritative name server for hp.com, so that's who you should contact.
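What nslookup reports as a "Non-authoritative answer" comes down to a single bit in the response: the authoritative answer (aa) bit in the DNS message header. If you ever need to check it yourself, a minimal sketch might look like this. It assumes you already have the raw response message in hand (without the two-byte length prefix used over TCP); is_authoritative is a hypothetical name:

```python
import struct

# The header is 12 bytes: ID, flags, and four 16-bit counts.
# Within the flags word, the authoritative answer bit is 0x0400.
AA_MASK = 0x0400


def is_authoritative(message: bytes) -> bool:
    """Return True if the aa bit is set in a raw DNS message."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes; message too short")
    _msg_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & AA_MASK)
```

A response from a correctly delegated, healthy name server for data in its own zone should have this bit set; the response from hpsdlo above would not.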
Another common symptom of this is a "lame server" error message:
Oct 1 04:43:38 terminator named[146]: Lame server on '40.234.23.210.in-addr.arpa' (in '210.in-addr.arpa'?): [198.41.0.5].53 'RS0.INTERNIC.NET': learnt(A=198.41.0.21,NS=128.63.2.53)
Here's how to read that: your name server was referred by the name server at 128.63.2.53 to the name server at 198.41.0.5 for a name in the domain 210.in-addr.arpa, specifically 40.234.23.210.in-addr.arpa. The server at 198.41.0.5's response indicated that it wasn't, in fact, authoritative for 210.in-addr.arpa, and therefore either the delegation that 128.63.2.53 gave you is wrong or the server at 198.41.0.5 is misconfigured.
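If your syslog file collects many of these, a script can break each message into the pieces just described. The sketch below assumes the exact message layout in the example above, which may vary between BIND versions; parse_lame_server and the field names are our own invention:

```python
import re
from typing import Dict, Optional

_LAME = re.compile(
    r"Lame server on '(?P<name>[^']+)' \(in '(?P<zone>[^']+)'\?\): "
    r"\[(?P<lame_server>[\d.]+)\]\.\d+ '(?P<lame_server_name>[^']*)': "
    r"learnt\(A=(?P<a_from>[\d.]+),NS=(?P<referred_by>[\d.]+)\)"
)


def parse_lame_server(line: str) -> Optional[Dict[str, str]]:
    """Split a 'Lame server' syslog message into named fields.

    Returns None if the line isn't a lame server message in the
    assumed format.
    """
    match = _LAME.search(line)
    return match.groupdict() if match else None
```

For the message above, the result has zone set to 210.in-addr.arpa, lame_server set to 198.41.0.5, and referred_by set to 128.63.2.53, so you know both whose delegation to question and which server to check.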
Despite the resolv.conf file's simple syntax, people do occasionally make mistakes when editing it. And, unfortunately, lines with syntax errors in resolv.conf are silently ignored by the resolver. The result is usually that some part of your intended configuration doesn't take effect: either your domain or search list isn't set correctly, or the resolver won't query one of the name servers you configured it to query. Commands that rely on the search list won't work, your resolver won't query the right name server(s), or it won't query a name server at all.
The easiest way to check whether your resolv.conf file is having the intended effect is to run nslookup. nslookup will kindly report the default domain and search list it derives from resolv.conf, plus the name server it's querying, when you type set all, as we showed you in Chapter 11, nslookup:
%nslookup
Default Server: terminator.movie.edu
Address: 192.249.249.3
>set all
Default Server: terminator.movie.edu
Address: 192.249.249.3
Set options:
  nodebug defname search recurse
  nod2 novc noignoretc port=53
  querytype=A class=IN timeout=5 retry=4
  root=ns.nic.ddn.mil.
  domain=movie.edu
  srchlist=movie.edu
>
Check that the output of set all is what you expect, given your resolv.conf file. For example, if you'd set search fx.movie.edu movie.edu in resolv.conf, you'd expect to see:
domain=fx.movie.edu
srchlist=fx.movie.edu/movie.edu
in the output. If you don't see what you're expecting, look carefully at resolv.conf. If you don't see anything obvious, look for nonprinting characters (with vi's set list command, for example). Watch out for trailing spaces, especially; a trailing space after the domain name will set the default domain to include a space. No real domain names actually end with spaces, so all of your non-dot-terminated lookups will fail.
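Since the resolver won't tell you about these mistakes, a small script can. The following is only a sketch: lint_resolv_conf is a hypothetical name, the set of directives is our assumption about what a typical resolver accepts (yours may support more or fewer), and treating lines that start with ; or # as comments is likewise an assumption:

```python
from typing import List, Tuple

# Assumed directive set; adjust it to match your resolver.
KNOWN_DIRECTIVES = {"domain", "search", "nameserver", "sortlist", "options"}


def lint_resolv_conf(text: str) -> List[Tuple[int, str]]:
    """Return (line number, problem) pairs for suspicious resolv.conf lines."""
    problems: List[Tuple[int, str]] = []
    for number, line in enumerate(text.split("\n"), 1):
        # Skip blank lines and (assumed) comment lines.
        if not line.strip() or line.lstrip().startswith((";", "#")):
            continue
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
        if any(not ch.isprintable() and ch != "\t" for ch in line):
            problems.append((number, "nonprinting character"))
        keyword = line.split()[0]
        if keyword not in KNOWN_DIRECTIVES:
            problems.append((number, f"unknown directive {keyword!r}"))
    return problems
```

Given a file whose first line is "domain movie.edu" followed by a stray space, the function reports trailing whitespace on line 1, which is just the sort of thing that's invisible in an editor.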
Failing to set your default domain is another old standby gaffe. You can set it implicitly, by setting your hostname to your host's fully qualified domain name, or explicitly, in resolv.conf. The characteristics of an unset default domain are straightforward: folks who use single-label names (or abbreviated domain names) in commands get no joy:
%telnet br
br: No address associated with name
%telnet br.fx
br.fx: No address associated with name
%telnet br.fx.movie.edu
Trying...
Connected to bladerunner.fx.movie.edu.
Escape character is '^]'.
HP-UX bladerunner.fx.movie.edu A.08.07 A 9000/730 (ttys1)
login:
You can use nslookup to check this one, much as you do when you suspect a syntax error in resolv.conf:
%nslookup
Default Server: terminator.movie.edu
Address: 192.249.249.3
>set all
Default Server: terminator.movie.edu
Address: 192.249.249.3
Set options:
  nodebug defname search recurse
  nod2 novc noignoretc port=53
  querytype=A class=IN timeout=5 retry=4
  root=ns.nic.ddn.mil.
  domain=
  srchlist=
Notice that neither the local domain nor the search list is set. You can also track this down by enabling debugging on the name server. (This, of course, requires access to the name server, which may not be running on the host the problem's affecting.) Here's how the debugging output might look after trying those telnet commands:
Debug turned ON, Level 1
datagram from [192.249.249.3].1057, fd 5, len 20
req: nlookup(br) id 27974 type=1 class=1
req: missed 'br' as '' (cname=0)
forw: forw -> [198.41.0.4].53 ds=7 nsid=61691 id=27974 0ms retry 4 sec
datagram from [198.41.0.4].53, fd 5, len 20
ncache: dname br, type 1, class 1
send_msg -> [192.249.249.3].1057 (UDP 5) id=27974
datagram from [192.249.249.3].1059, fd 5, len 23
req: nlookup(br.fx) id 27975 type=1 class=1
req: missed 'br.fx' as '' (cname=0)
forw: forw -> [128.9.0.107].53 ds=7 nsid=61692 id=27975 0ms retry 4 sec
datagram from [128.9.0.107].53, fd 5, len 23
ncache: dname br.fx, type 1, class 1
send_msg -> [192.249.249.3].1059 (UDP 5) id=27975
datagram from [192.249.249.3].1060, fd 5, len 33
req: nlookup(br.fx.movie.edu) id 27976 type=1 class=1
req: found 'br.fx.movie.edu' as 'br.fx.movie.edu' (cname=0)
req: nlookup(bladerunner.fx.movie.edu) id 27976 type=1 class=1
req: found 'bladerunner.fx.movie.edu' as 'bladerunner.fx.movie.edu' (cname=1)
ns_req: answer -> [192.249.249.3].1060 fd=5 id=27976 size=183 Local
Debug turned OFF
Contrast this with the debugging output produced by the application of the search list in Chapter 12. The only names looked up here are exactly what the user typed, with no domains appended at all. Clearly the search list isn't being applied.
One problem we've seen increasingly often in the DNS newsgroups is the "response from unexpected source." This was once called a Martian response: it's a response that comes from an IP address other than the one your server sent a query to. When a BIND name server sends a query to a remote server, BIND conscientiously makes sure that answers come only from the IP addresses on that server. This helps minimize the possibility of accepting spoofed responses. BIND is equally demanding of itself: a BIND server makes every effort to reply via the same network interface that it received a query on.
Here's the error message you'd see upon receiving a possibly unsolicited response:
Mar 8 17:21:04 terminator named[235]: Response from unexpected source ([205.199.4.131].53)
This can mean one of two things: either someone is trying to spoof your name server, or - more likely - you sent a query to an older BIND server or a different make of name server that's not as assiduous about replying from the same interface it receives queries on.