The risks of eavesdropping affect all Internet protocols, but are of particular concern on the World Wide Web, where sensitive documents and other kinds of information, such as credit card numbers, may be transmitted. There are only two ways to protect information from eavesdropping. The first is to assure that the information travels over a physically secure network (which the Internet is not). The second is to encrypt the information so that it can only be decrypted by the intended recipient.
Another form of eavesdropping that is possible is traffic analysis. In this type of eavesdropping, an attacker learns about the transactions performed by a target, without actually learning the content. As we will see below, the log files kept by Web servers are particularly vulnerable to this type of attack.
Information sent over the Internet must be encrypted to be protected from eavesdropping. There are four ways in which information sent by the Web can be encrypted:
Link encryption. The long-haul telephone lines that are used to carry the IP packets can be encrypted. Organizations can also use encrypting routers to automatically encrypt information sent over the Internet that is destined for another office. Link encryption provides for the encryption of all traffic, but it can only be performed with prior arrangement. Link encryption has traditionally been very expensive, but new generations of routers and firewalls are being produced that incorporate this feature.
Document encryption. The documents that are placed on the Web server can be encrypted with a system such as PGP. Although this encryption provides for effective privacy protection, it is cumbersome because it requires the documents to be specially encrypted before they are placed on the server and they must be specially decrypted when they are received.
SSL (Secure Socket Layer). SSL is a system designed by Netscape Communications that provides an encrypted TCP/IP pathway between two hosts on the Internet. SSL can be used to encrypt any TCP/IP protocol, such as HTTP, TELNET, or FTP.
SSL can use a variety of public-key and token-based systems for exchanging a session key. Once a session key is exchanged, it can use a variety of secret key algorithms. Currently, programs that use SSL that are distributed for anonymous FTP on the Internet and that are sold to international markets are restricted to using a 40-bit key, which should not be considered secure.
A complete description of the SSL protocol can be found at the site http://home.netscape.com/newsref/std/SSL.html.
SHTTP (Secure HTTP). Secure HTTP is an encryption system for HTTP designed by Commerce Net. SHTTP only works with HTTP.
A complete description of the SHTTP protocol can be found at http://www.eit.com/projects/s-http/.
It is widely believed that most commercial Web servers that seek to use encryption will use either SSL or SHTTP. Unlike the other options above, both SSL and SHTTP require special software to be running on both the Web server and in the Web browser. This software will likely be in most commercial Web browsers within a few months after this book is published, although it may not be widespread in freely available browsers until the patents on public key cryptography expire within the coming years.
When using an encrypted protocol, your security depends on several issues:
The strength of the encryption algorithm
The length of the encryption key
The secrecy of the encryption key
The reliability of the underlying software that is running on the Web server
The reliability of the underlying software that is running on the Web client
During the summer of 1995, a variety of articles were published describing failures of the encryption system used by the Netscape Navigator. In the first case, a researcher in France was able to break the encryption key used on a single message by using a network of workstations and two supercomputers. The message had been encrypted with the international version of the Netscape Navigator using a 40-bit RC4 key. In the second case, a group of students at the University of California at Berkeley discovered a flaw in the random number generator used by the UNIX-based version of Navigator. [3] In the third case, the same group of students at Berkeley discovered a way to alter the binaries of the Navigator program as they traveled between the university's NFS server and the client workstation on which the program was actually running.[4]
[3] The random number generator was seeded with a number that depended on the current time and the process number of the Navigator process.
[4] The students pointed out that while they attacked the Navigator's binaries as it was moving from the server computer to the client, they could have as easily have attacked the program as it was being downloaded from an FTP site.
All of these attacks are highly sophisticated. Nevertheless, most of them do not seem to be having an impact on the commercial development of the Web. It is likely that within a few years the encryption keys will be lengthened to 64 bits. Netscape has improved the process by which its Navigator program seeds its random number generator. The third problem, of binaries being surreptitiously altered, will likely not be a problem for most users with improved software distribution systems that themselves use encryption and digital signatures.
Most Web servers create log files that record considerable information about each request. These log files grow without limit until they are automatically trimmed or until they fill up the computer's hard disk (usually resulting in a loss of service). By examining these files, it is possible to infer a great deal about the people who are using the Web server.
For example, the NCSA httpd server maintains the following log files in its logs/ directory:
A list of the individual accesses to the server.
A list of the programs that have been used to access the server.
A list of the errors that the server has experienced, whether generated by the server itself or by your CGI scripts. (Anything that a script prints to Standard Error is appended into this file.)
This log file contains entries that include the URL that the browser previously visited and the URL that it is currently viewing.
By examining the information in these log files, you can create a very comprehensive picture of the people who are accessing a Web server, the information that they are viewing, and where they have previously been.
In Chapter 10, Auditing and Logging, we describe the fields that are stored in the access_log file. Here they are again:
The name of the remote computer that initiated the transfer
The remote login name, if it was supplied, or "-" if not supplied
The remote username, if supplied, or "-" if not supplied
The time that the transfer was initiated (day of month, month, year, hour, minute, second, and time zone offset)
The HTTP command that was executed (can be GET for receiving files, POST for processing forms, or HEAD for viewing the MIME header)
The status code that was returned
The number of bytes that were transferred
Here are a few lines from an access_log file:
koriel.sun.com - - [06/Dec/1995:18:44:01 -0500] "GET /simson/resume.html HTTP/1.0" 200 8952 lawsta11.oitlabs.unc.edu - - [06/Dec/1995:18:47:14 -0500] "GET /simson/ HTTP/1.0" 200 2749 lawsta11.oitlabs.unc.edu - - [06/Dec/1995:18:47:15 -0500] "GET /icons/back.gif HTTP/1.0" 200 354 lawsta11.oitlabs.unc.edu - - [06/Dec/1995:18:47:16 -0500] "GET /cgi-bin/counter?file=simson HTTP/1.0" 200 545 piweba1y-ext.prodigy.com - - [06/Dec/1995:18:52:30 -0500] "GET /vineyard/history/" HTTP/1.0" 404 -
As you can tell, there is a wealth of information here. Apparently, some user at Sun Microsystems is interested in Simson's resume. A user at the University of North Carolina also seems interested in Simson. And a Prodigy user appears to want information about the history of Martha's Vineyard.
None of these entries in the log file contain the username of the person conducting the Web request. But they may still be used to identify individuals. It's highly likely that koreil.sun.com is a single SPARCstation sitting on a single employee's desk at Sun. We don't know what lawsta11.otilabs.unc.edu is, but we suspect that it is a single machine in a library. Many organizations now give distinct IP addresses to individual users for use with their PCs, or for dial-in with PPP or SLIP. Without the use of a proxy server such as SOCKS (see Chapter 22) or systems that rewrite IP addresses, Web server log files reveal those addresses.
And there is more information to be "mined" as well. Consider the following entries from the refer_log file:
http://www2.infoseek.com/NS/Titles?qt=unix-hater -> /unix-haters.html http://www.intersex.com/main/ezines.html -> /awa/ http://www.jaxnet.com/~jdcarr/places.html -> /vineyard/ferry.tiny.gif
In the first line, it's clear that a search on the InfoSeek server for the string "unixhater" turned up the Web file /unix-haters.html on the current server. This is an indication that the user was surfing the Web looking for material having to deal with UNIX. In the second line, a person who had been browsing the InterSex WWW server looking for information about electronic magazines followed a link to the /awa/ directory. This possibly indicates that the user is interested in content of a sexually oriented nature. In the last example, apparently a user, jdcarr@jaxnet.com, has embedded a link in his or her places.html file to a GIF file of the Martha's Vineyard ferry. This indicates that jdcarr is taking advantage of other people's network servers and Internet connections.
By themselves, these references can be amusing. But they can also be potentially important. Lincoln Stein and Bob Bagwill note in the World Wide Web Security FAQ that a referral such as this one could indicate that a person is planning a corporate takeover:
file://prez.xyz.com/hotlists/stocks2sellshort.html ->http://www.xyz.com
The information in the refer_log can also be combined with information in the access_log to determine the names of the people (or at least their computers) who are following these links. Version 1.5 of the NCSA httpd Web server even has an option to store referrals in the access log.
The access_log file contains the name of the complete URL that is provided. For Web forms that use the GET method instead of the POST method, all of the arguments are included in the URL, and thus all of them end up in the access_log file.
Consider the following:
asy2.vineyard.net - - [06/Oct/1995:19:04:37 -0400] "GET /cgi-bin/vni/useracct?username=bbennett&password=leonlikesfood&cun=jayd&fun=jay+Desmond&pun=766WYRCI&add1=box+634&add2=&city=Gay+Head&state=MA&zip=02535&phone=693-9766&choice=ReallyCreateUser HTTP/1.0" 200 292
Or even this one:
mac.vineyard.net - - [07/Oct/1995:03:04:30 -0400] "GET /cgi-bin/change-password?username=bbennett&oldpassword=dearth&newpassword=flabby3 HTTP/1.0" 200 400
This is one reason why you should use the POST method in preference to the GET method!
Finally, consider what can be learned from agent_log. Here are a few sample lines:
NCSA_Mosaic/2.6 (X11;HP-UX A.09.05 9000/720) libwww/2.12 modified via proxy gateway CERN-HTTPD/3.0 libwww/2.17 NCSA_Mosaic/2.6 (X11;HP-UX A.09.05 9000/720) libwww/2.12 modified via proxy gateway CERN-HTTPD/3.0 libwww/2.17 Proxy gateway CERN-HTTPD/3.0 libwww/2.17 Mozilla/1.1N (X11; I; OSF1 V3.0 alpha) via proxy gateway CERN-HTTPD/3.0 libwww/2.17 Mozilla/1.2b1 (Windows; I; 16bit)
In addition to the name of the Web browser that is being used, this file reveals information about the structure of an organization's firewall. It also reveals the use of beta software within an organization.
Some servers allow you to restrict the amount of information that is logged. If you do not require information to be logged, you may wish to suppress the logging.
In summary, users of the Web should be informed that their actions are being monitored. As many firms wish to use this information for marketing, it is quite likely that the amount of information that Web browsers provide to servers will, rather than decrease, increase in the future.