Internet Engineering Task Force                               K. Crispin
Internet-Draft                                                  May 2001
Document: draft-crispin-alt-roots-tlds-00.txt
Expires: November 2001

                          Alt-Roots, Alt-TLDs

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This Internet Draft discusses the "alternate root" and "alternate TLDs (Top Level Domains)", in an attempt to help clear up misunderstandings on their use in the Internet.

For the past 6 years or so various organizations and individuals have implemented "alternate roots" to support their own Top Level Domains (TLDs). Some have gone so far as to argue that alternate roots are good for the Internet, and actually enhance stability. Such a position is seriously mistaken, and reflects a serious lack of understanding of technical realities involved.

The Domain Name System (DNS) is a complicated system that is commonly misunderstood, and this paper is an attempt to help clear up these misunderstandings. It is complementary to the IETF's RFC 2826, "IAB Statement on the Unique DNS Root", which very clearly states the principles involved.

Crispin                  Expires: November 2001                 [Page 1]
Internet-Draft            Alt-Roots, Alt-TLDs                  May 2001

Background

In this paper I define the term "alternate root" to mean "a DNS root zone connected to the Internet, but with contents that differ from the ICANN roots". That is, as I use the term, an alternate root by definition includes "alternate TLDs (Top Level Domains)", and hence, alternate roots and alternate TLDs are really both characteristics of the same phenomenon.

Sometimes I use the term "multiple roots", or "multiple root regime" to indicate a hypothetical situation where several distinct alternate roots exist, and are seeing significant use. [Note that the current existing alternate roots, with the possible exception of new.net's activities, are not heavily used, and consequently our current situation probably does not constitute a "multiple root regime".]

        ICANN                   Alt                     Alt
        root                  root(E)                 root(F)
         /|\                    /|\                     /|\
        / | \                  / | \                   / | \
       /  |  \                /  |  \                 /  |  \
      /   |   \              /   |   \               /   |   \
     /    |    \            /    |    \             /    |    \
 .com............      .com.......  .biz(E)    .com.......  .biz(F)

In the above diagram, there are three root systems, and all three of them support ".com", ".net", and the rest of the "legacy" Top Level Domains (TLDs). But Alt-root(E) [English] and Alt-root(F) [French] support different versions of ".biz". This situation (having the same name for two different things) is referred to as a "name conflict". Note that since the DNS is a public system, the managers of .biz(E) and .biz(F) could take information from each other, and the two .biz's could share a great deal of information as well.

If a root server provides exactly the same information as some other root server, then it is a "replica" of that server, and not an "alternate" to it. This is just a semantic distinction, but it is an important one, because replication is a highly desirable thing that provides redundancy and reliability, whereas alternate roots/TLDs have quite the opposite effect.
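
The terms defined above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a root zone can be modelled as a mapping from a TLD label to the set of nameservers that TLD is delegated to. The nameserver names and the helper functions is_replica and name_conflicts are hypothetical and are not part of any DNS software.

```python
from typing import Dict, FrozenSet, List

# Assumed model: a root zone maps each TLD label to the set of
# nameservers it is delegated to.  All names below are made up.
RootZone = Dict[str, FrozenSet[str]]


def is_replica(a: RootZone, b: RootZone) -> bool:
    """Two roots are replicas when they publish exactly the same content."""
    return a == b


def name_conflicts(a: RootZone, b: RootZone) -> List[str]:
    """TLD labels present in both roots but delegated differently."""
    shared = a.keys() & b.keys()
    return sorted(tld for tld in shared if a[tld] != b[tld])


icann: RootZone = {
    "com": frozenset({"ns.com-registry.example"}),
    "net": frozenset({"ns.net-registry.example"}),
}
# Each alternate root copies the legacy TLDs and adds its own .biz.
alt_e: RootZone = {**icann, "biz": frozenset({"ns.biz-e.example"})}
alt_f: RootZone = {**icann, "biz": frozenset({"ns.biz-f.example"})}

print(is_replica(icann, dict(icann)))   # a true copy is a replica
print(is_replica(icann, alt_e))         # differing content: an alternate
print(name_conflicts(alt_e, alt_f))     # the two versions of .biz collide
```

In this model the two alternate roots agree on every legacy TLD and differ only in the delegation of "biz", which is exactly the name conflict shown in the diagram.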
Note that if two roots provide the same information at the TLD level, then the information must be the same throughout the tree.

Since I believe that the negative characteristics of alternate roots are decisive, this document is essentially structured as one long reductio ad absurdum argument: frequently, I implicitly postulate that multiple roots exist and are in heavy use and that the Internet Corporation for Assigned Names and Numbers (ICANN) has somehow approved of them, and then show that the consequences are undesirable. That is, I frequently postulate things that I believe will simply never happen.

Economics of Name Conflicts

Name conflicts are generally considered a bad thing. If company "A" uses Alt-root(E), and company "B" uses Alt-root(F), then "A" and "B" will see different versions of .biz. "A" may put its web page in .biz(E), and "B" may put its web page in .biz(F), and neither company will see the other's web site. This may not matter much if .biz(E) only supports the English language, and .biz(F) only supports the French language, but in general one of the major values of the Web is its universal visibility.

It has been argued that economic forces will eliminate name conflicts in alternate TLDs, and that in time all root servers would converge to providing the same information. This argument, however, is based on a very simplistic view of economic reality. The core argument is presented by Karl Auerbach:

   As a general rule, customers of a root server system will act much like subscribers to a cable TV system -- they will want as many TLDs (or as many channels) as they can get. This will drive the root server system operators to include as many viable TLDs as they can into their inventory. The net result of all the root system operators following this strategy will be that they all attempt to trump one another by each including more TLDs.
   The end of this is that all root server operators will incorporate all viable TLDs. The benefit of this is that the domain names of all people and organizations who have registrations in these TLDs will be essentially universally resolvable no matter which root server system is being used [1].

A moment's thought about the French/English versions of .biz discussed above should make it clear that there are a myriad of factors other than pure economics that have an effect. In fact, there has been much talk about different countries/linguistic groups forming their own TLDs and running them in alternate roots. These efforts are apparently primarily motivated by political and cultural considerations, not economic ones.

From a purely economic point of view, competing TLDs (and roots) would essentially be waging what Shapiro and Varian [2] call a "Standards War", where the competitors are seeking to get their version of the TLD to be considered as *the* standard. And, as Shapiro and Varian note, there are many possible results of such a war -- the protracted battles over cellular phone standards, or HDTV standards, clearly indicate that competing standards can fight it out for a very long time. And, as the English/French example above indicates, the regionalization that slows the convergence of cellular standards, for example, can easily be replicated in DNS. [Note that it would be quite possible for large international companies to simply register in both versions of .biz, just as they currently register in many ccTLDs.]

Moreover, Auerbach's belief that root server operators would tend to include all TLDs rests on the simplistic assumption that the interests of root server operators are distinct from the interests of TLD operators. As the case of new.net illustrates, this is not necessarily so: clearly, new.net is not interested in supporting other versions of its own TLDs.
In fact, if multiple roots were somehow blessed by ICANN there is no question but that other large and well financed companies could enter the market instantly -- Verisign, for example, could announce that it was supporting its own versions of some of new.net's TLDs, and quite possibly Verisign's reputation, expertise, and established distribution channels would quickly destroy whatever lead new.net might have, which (as is frequently the case in Standards Wars) could lead to the demise of new.net.

From an economics point of view, it would require enormous resources to overcome the network advantages (in the economics sense) enjoyed by the ICANN root. But presuming that a multiple root regime could be established, there is no indication whatsoever that it would ever converge to a single root zone. In purely economic terms it is just as likely that a multiple root environment would lead to large scale and protracted name conflicts.

Here are some concrete examples of possible results of name conflicts:

I. A business in England, with a web-site in .biz(E), is trying to expand its business to France. The salesman calls the prospective client in France and in the conversation says "check our web site at xxx.biz for our prices". However, the French company uses .biz(F), and can't access .biz(E). The French company sends email to the English company, reporting the problem, but of course the email doesn't arrive.

II. An English "Road Warrior" tries to check his email from a cybercafe at a conference in France. ("Road Warriors" are individuals that travel extensively, and use their laptops and the Internet to keep connected with their work.) The operator of the cybercafe, being a patriot, uses .biz(F), so the Road Warrior can't check his mail.

III. There is another issue with email -- intermediate servers are used to transport the mail.
So email delivery may fail because an intermediate server uses a different root system. More insidiously, the path email takes through the network is not fixed, and a secondary path may be taken if there is a problem on the primary path. So, sometimes the mail would be delivered, and sometimes the message would be returned with an error saying that the address did not exist. And indeed, sometimes the email could be delivered, but to a different person than the intended one.

IV. The "intermediate node" problem mentioned above for email is far more common than people realize. It can be illustrated by a common service on the Internet, "ping" pages. A "ping" page is a web page that allows you to type in a domain name or network address, and the server will see if it can reach the specified site. This can be very useful in network debugging -- site a.b.com is not reachable from your location, but you access a remote "ping" page, and discover that the site *is* reachable from an external site. Of course, in our multiple root scenario, if you specify "a.b.biz", you really don't know if you are reaching the site you mean to reach. Network engineers are comfortable with raw IP addresses, but sometimes you don't get the IP address -- you may just have a trouble ticket entry with a domain name in it.

V. Trouble tickets aren't that familiar, but online orders certainly are. In almost all cases, the confirmation for the order is sent back via email, and consequently the online order form almost always asks you for your email address. But if your email address is jax@xxx.biz, and the place from which you order uses a different .biz, then you will never receive your confirmation.

VI. The above scenarios all express what we might call "inconsistency due to location" -- the result you get for a domain name lookup depends on the location in cyberspace from which you made the request.
But when you think about it, "location in cyberspace" is almost meaningless -- two computers in the same room can have vastly different locations in cyberspace. In fact, DNS largely defines what most of us mean by "location in cyberspace", and consequently, name conflicts undermine our most basic navigational tool.

VII. Finally, "inconsistencies due to location" are only a part of the problem. Another very significant issue revolves around what we might call "inconsistencies due to timing". This was hinted at above in the case where an intermediate mail transport server fails and another takes over -- where your mail goes depends on *when* it was sent, relative to the time of the server failure. In a multiple root regime, the pattern of DNS lookups over time, independent of any other factor, can cause you to get different results. How this can happen is complicated, and will be explained in much greater detail below. But this factor interacts with all the above scenarios, and adds an element of complete non-determinism.

Interlude on the Significance of Design Goals

The above discussion of name conflicts underscores a fundamental point: DNS wasn't designed to deal with name conflicts. In fact, the fundamental design goal of the DNS is to provide unique and stable names for certain resources on the Internet.

A "resource" may be, for example, an IP address (or, in some cases, a group of IP addresses), an email server, or a portion of the Domain Name Space itself. The resources are represented by objects in DNS; the fundamental service provided by the DNS is retrieval of an object, given the name for the object.

Providing unique and stable names for millions of objects requires that there be millions of unique names available, and managing millions of unique names is a large job. The most basic management task is "registration": the assignment of names to the objects that they name.
In practice, registration also involves the identification of an associated party who is responsible for the object.

The names provided by the DNS are structured in a hierarchical manner, which allows the management of the names to be distributed. Instead of a single gigantic name registry, the registration of names can be spread across many registries. The visible DNS hierarchy starts with what are called "Top Level Domains" (TLDs). The next level of the hierarchy is made up of "Second Level Domains" (SLDs), the level after that is made up of "Third Level Domains" (3LDs), and so on. The familiar ".com" is a TLD, "example.com" is an SLD, and "an.example.com" is a 3LD. "this.is.an.example.com" would be a domain name with 5 levels.

It is important to realize, however, that while the names are structured (and the registration is distributed), the fundamental purpose of DNS is still to provide unique, stable names, and that the entire obscure and complex technical design of DNS supports this fundamental purpose.

Now consider for a moment the design of a lawn-mower. Power lawn-mowers are devices that are designed to cut the grass in a lawn to a uniform height. They typically have sharp spinning blades that rotate parallel to the ground, blades that spin fast enough so that the grass is cut, rather than pushed aside. All the decisions in the design of a lawn-mower are made with that particular task in mind, and over time the design has evolved to a high degree of complexity. For example, the spinning blades also act as a fan, and the shape of the blade housing is carefully designed to confine the airflow produced by the blades. The *intended* consequence of this design is for the airflow to carry the grass clippings up and into a collection bag.

In an interesting case from product liability law, the plaintiff attempted to use his lawn-mower as a hedge trimmer, and some of his fingers were cut off.
Thus, an *unintended* consequence of the design of the blade housing is that there isn't room for fingers between the blade housing and the blades. The defendant, the manufacturer of the mower, argued that the lawn-mower was perfectly safe when used as designed, that clearly a lawn-mower is not designed to be used as a hedge trimmer, and that use of a product in a manner for which it is not designed is likely to cause all kinds of unanticipated problems.

This example illustrates an important general principle: a design has high level goals, and design decisions are made to fit those high level goals. Those design decisions may have unintended consequences, if the end product is used in a way not consistent with the original goals. If the design is very complicated, with many internal decisions, it may be *very* difficult to predict the consequences of misuse of the end product. Certainly the individual in question did not predict that his fingers would be amputated.

As mentioned above, the fundamental goal of DNS is a single unified name space, and intrinsic to that design is the assumption that there is a single root zone. DNS is a complicated protocol, and use of multiple root zones was not and is not a design goal of the DNS. Consequently, it is simply not possible to predict all the problems that may result from use of multiple root zones.

Moreover, DNS as currently deployed is an enormous system, and it provides infrastructure that is absolutely critical to the operation of the Internet. Operations on such a large scale must be considered in a different light than operations on a small scale. Hooking up tiny alternate root zones with essentially no traffic tells us very little about how such a system would work on a large scale -- pedaling a tricycle around in a circle on a schoolyard doesn't give much useful information about how to operate a super-highway.
More Serious Problems

With all the above as preliminary, we may now get to more complex and serious issues. Alt-roots don't just have the potential for creating static name conflicts, they have the potential for creating far more serious instabilities in the name space.

Problems in this category are unfortunately complex to describe, and require a bit of explanation before diving off into the complicated stuff. Here's a diagram of our English/French example:

           Alt                     Alt
         root(E)                 root(F)
           /|\                     /|\
          / | \                   / | \
         /  |  \                 /  |  \
        /   |   \               /   |   \
       /    |    \             /    |    \
  .com.......  .biz(E)    .com.......  .biz(F)

We presume that the English alternate root is located in London, and the French alternate root is located in Paris. The users of each root system are located in their respective countries. We assume that the two .biz's are well used, and further that there are overlaps in the name space -- "xxx.biz(E)" and "xxx.biz(F)", or perhaps large companies register the same name in each domain, and provide web-sites with different content for each country: "ibm.biz(E)" and "ibm.biz(F)".

An almost universal assumption in this scenario is that users of alt-root(E) would resolve names from .biz(E), and that users of alt-root(F) would resolve names from .biz(F). This assumption, however, is incorrect. The two root systems share .com, .net, and the rest of the 240 or so "legacy" TLDs, and the DNS protocol, *by design*, passes information that allows leakage of other data between the two root systems.

Basically, after some period of time, users of either root system will potentially get information from the other one, on a totally unpredictable basis. One day an English user may get the French xxx.biz, and on the next day he might get the English xxx.biz.

The next section goes through a rather complete explanation. It must be remembered, however, that this is only one of many possible scenarios.
Also, all these possible scenarios interact with the common name conflicts mentioned above. And recall the man who lost his fingers: we can't know all the possible problems with multiple roots.

Detailed example

As above, we have two root zones, each with its own version of a TLD. All other TLDs in these root zones are the same, including the in-addr.arpa zone. The two versions of .biz are ".biz(E)" and ".biz(F)".

Suppose that each version of .biz has a well-known popular SLD, run by entrepreneurs unafraid of controversy, e.g., "sex". So we have sex.biz(E) and sex.biz(F), in the subtrees from their respective root zones. Since the "sex" name is very valuable as an SLD, it is quite reasonable to assume that someone would leap at the chance to register it wherever it became available. We could also assume that the two versions of sex.biz would be in a fiercely competitive economic relationship.

Suppose further that sex.biz(E) has IP address 1.2.3.4, and sex.biz(F) has IP address 4.3.2.1. Assume that the inverse lookup is correctly configured in both cases. (Normal DNS lookup maps a domain name to an Internet address -- looking up sex.biz would return 1.2.3.4, in root(E). Inverse DNS lookup starts with the Internet address -- 1.2.3.4 -- and returns the associated domain name.) This means that if one looks up the address 1.2.3.4 one will get "sex.biz", and if one looks up the address 4.3.2.1, one will also get "sex.biz". Inverse addresses are maintained in DNS in a special SLD, "in-addr.arpa".

So far, we have a structure like this:

       .biz(E)            .biz(F)          in-addr.arpa
         /|\                /|\                /|\
        / | \              / | \              / | \
      ..  |  ..          ..  |  ..           /  |  \
          |                  |              /   ..  1.2.3.4->sex.biz
    sex->1.2.3.4       sex->4.3.2.1        /
                                      4.3.2.1->sex.biz

                         Fig. 1

This structure is maintained by DNS nameservers, which are also identified by DNS names and IP addresses. It is a quite common convention for a nameserver for an SLD to have a name like "ns1.SLD.TLD".
In our example, then, it would be quite reasonable that a nameserver for sex.biz(E) would be "ns1.sex.biz(E)", and that a nameserver for sex.biz(F) would be "ns1.sex.biz(F)". The IP addresses for these machines might be 1.2.3.5 and 4.3.2.2, respectively. Note that the name "ns1" is part of the .sex.biz subtree. We now have the following diagram:

       .biz(E)            .biz(F)          in-addr.arpa
         /|\                /|\                /|\
        / | \              / | \              / | \
      ..  |  ..          ..  |  ..           /  |  \
          |                  |              /   ..  1.2.3.4->sex.biz
    sex->1.2.3.4       sex->4.3.2.1        /
         /|\                /|\       4.3.2.1->sex.biz
        / | \              / | \
      ..  |  ..          ..  |  ..
          |                  |
    ns1->1.2.3.5       ns1->4.3.2.2

                         Fig. 2

I should stress that all of this (except for the multiple roots, of course) is completely standard, and that the conventional name "ns1" for a nameserver is *extremely* common. Moreover, the nameservers for "sex.biz" will carry records for all the names in the sex.biz domain.

One final bit of information is necessary to complete the scenario: the information stored for every domain name also includes references to the addresses of the name servers for that domain.

The critical point is that replies to DNS queries routinely return the names and addresses of the name servers involved. That is, in our example, if one were to look up the address "4.3.2.1", one would get back "sex.biz", *and* the information that the nameserver for 1.2.3.4.in-addr.arpa was "ns1.sex.biz", *and* (most important) that the address of "ns1.sex.biz" was 4.3.2.2.

This last bit of information is the source of the problem. DNS has just returned the address of "ns1.sex.biz" as 4.3.2.2, and there is no indication of *which* sex.biz is involved. The information will be cached on your local computer, and will be used whenever the computer needs to know the address of ns1.sex.biz.
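
The mechanism just described can be sketched in a few lines of Python. This is a deliberately naive model under stated assumptions, not a description of any real resolver: the class NameOnlyCache and the function absorb_reply are hypothetical names invented for this illustration, and the policy that the newest record always replaces the old one is an assumption made to keep the sketch short. The names and addresses are those of the example above.

```python
from typing import Dict, List, Optional, Tuple


class NameOnlyCache:
    """Toy resolver cache keyed by domain name alone.

    It keeps no record of which root an answer was reached through.
    """

    def __init__(self) -> None:
        self._addresses: Dict[str, str] = {}

    def store(self, name: str, address: str) -> None:
        # Assumption: the newest record always replaces the old one.
        self._addresses[name] = address

    def lookup(self, name: str) -> Optional[str]:
        return self._addresses.get(name)


def absorb_reply(cache: NameOnlyCache,
                 extra_records: List[Tuple[str, str]]) -> None:
    """Cache the nameserver addresses that accompany a reply."""
    for name, address in extra_records:
        cache.store(name, address)


cache = NameOnlyCache()

# 1. Forward lookup of sex.biz through root(E): the reply carries the
#    address of the English nameserver.
absorb_reply(cache, [("ns1.sex.biz", "1.2.3.5")])
before = cache.lookup("ns1.sex.biz")

# 2. Inverse lookup of 4.3.2.1 (say, to log an incoming packet): the
#    reply carries the French nameserver's address under the same name.
absorb_reply(cache, [("ns1.sex.biz", "4.3.2.2")])
after = cache.lookup("ns1.sex.biz")

print(before, "->", after)
```

Because the cache key is the bare name "ns1.sex.biz", nothing distinguishes the English record from the French one, so in this model later queries for names under sex.biz made by the English user would be sent to the French nameserver.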

Three further points:

1) There are all kinds of automatic processes that do inverse lookups -- when a packet arrives at a computer, it is only identified by the IP address, and if it is desirable to know the name of the machine that was the source of the packet (for logging purposes, say), an inverse query is necessary.

2) Inverse lookups aren't the only thing that can cause this to happen -- nameserver information is returned as part of many queries.

3) The discussion has been greatly simplified from the real case. There are numerous conditions that affect when a cache entry is replaced -- timing, type of query, authoritative vs non-authoritative answers, software, software version, etc -- and there is no simple rule predicting what will happen [3]. This non-deterministic behavior, of course, is just further instability.

Note that in a competitive situation, the owners of the respective sex.biz domains could contrive their own scenarios to switch nameservers. That is, the above scenario assumes accidental confusion, but there might very well be economic incentive to create deliberate confusion.

Conclusion

There is a wide variety of situations associated with alternate TLDs that can cause instabilities; they are all fairly complex, and most only arise when the conflicting names see significant traffic. This is why they aren't much of an issue when there are only a few people using alternate TLDs. But wide scale deployment of alternate roots/TLDs opens up possibilities for destructive and subtle problems.

Security Considerations

This memo does not introduce any new security issues, but it does attempt to clear up misunderstandings on the use of "alternate root" and "alternate TLDs" in the Internet.

References

[1] http://www.cavebear.com/cavebear/growl/issue_2.htm#multiple_roots

[2] "Information Rules: a Strategic Guide to the Network Economy", by Carl Shapiro and Hal Varian, Harvard Business School Press, 1999

[3] More recent versions of BIND have been more resistant to casual cache replacement. It should be stressed that DNS is a *very* much more complicated protocol than it appears, and there are multiple implementations of DNS servers. The above scenario works with widely deployed versions of current software, but may behave differently with different versions or less commonly used software.

Author's Address

Kent Crispin
EMail: kent@songbird.com