Internet Engineering Task Force                              K. Crispin
Internet-Draft                                                 May 2001
Document: draft-crispin-alt-roots-tlds-00.txt
Expires: November 2001


                          Alt-Roots, Alt-TLDs

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This Internet-Draft discusses "alternate roots" and "alternate TLDs
   (Top Level Domains)", in an attempt to help clear up
   misunderstandings about their use in the Internet.

   For the past 6 years or so, various organizations and individuals
   have implemented "alternate roots" to support their own Top Level
   Domains (TLDs).  Some have gone so far as to argue that alternate
   roots are good for the Internet, and actually enhance stability.
   Such a position is seriously mistaken, and reflects a lack of
   understanding of the technical realities involved.  The Domain Name
   System (DNS) is a complicated system that is commonly misunderstood,
   and this paper is an attempt to help clear up these
   misunderstandings.  It is complementary to the IETF's RFC 2826, "IAB
   Statement on the Unique DNS Root", which very clearly states the
   principles involved.


Crispin                  Expires: November 2001                 [Page 1]

Internet-Draft            Alt-Roots, Alt-TLDs                   May 2001


Background

   In this paper I define the term "alternate root" to mean "a DNS root
   zone connected to the Internet, but with contents that differ from
   the ICANN roots".  That is, as I use the term, an alternate root by
   definition includes "alternate TLDs (Top Level Domains)", and hence,
   alternate roots and alternate TLDs are really both characteristics of
   the same phenomenon.  Sometimes I use the term "multiple roots", or
   "multiple root regime" to indicate a hypothetical situation where
   several distinct alternate roots exist, and are seeing significant
   use.  [Note that the currently existing alternate roots, with the
   possible exception of new.net's activities, are not heavily used, and
   consequently our current situation probably does not constitute a
   "multiple root regime".]

        ICANN                 Alt                     Alt
         root                root(E)                 root(F)
          /|\                 /|\                     /|\
         / | \               / | \                   / | \
        /  |  \             /  |  \                 /  |  \
       /   |   \           /   |   \               /   |   \
      /    |    \         /    |    \             /    |    \
    .com............    .com.......  .biz(E)    .com.......  .biz(F)

   In the above diagram, there are three root systems, and all three of
   them support ".com", ".net", and the rest of the "legacy" Top Level
   Domains (TLDs).  But Alt-root(E) [English] and Alt-root(F) [French]
   support different versions of ".biz".  This situation (having the
   same name for two different things) is referred to as a "name
   conflict".  Note that since the DNS is a public system, the managers
   of .biz(E) and .biz(F) could take information from each other, and
   the two .biz's could share a great deal of information as well.
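
   The diagram above can be mimicked with a small Python sketch.  It is
   an illustration only: the dictionary layout, the operator labels, and
   the function name name_conflicts are assumptions made for this
   example, not data taken from any real root zone.

```python
# Illustrative model of the three root zones in the diagram above.
# Each root maps a TLD label to the party that operates that TLD.
# The operator labels are hypothetical placeholders.
ROOTS = {
    "icann": {"com": "legacy", "net": "legacy"},
    "alt-root-E": {"com": "legacy", "net": "legacy", "biz": "biz(E)"},
    "alt-root-F": {"com": "legacy", "net": "legacy", "biz": "biz(F)"},
}


def name_conflicts(roots):
    """Return {tld: set of operators} for TLDs run by more than one operator."""
    operators = {}
    for zone in roots.values():
        for tld, operator in zone.items():
            operators.setdefault(tld, set()).add(operator)
    return {tld: ops for tld, ops in operators.items() if len(ops) > 1}


if __name__ == "__main__":
    for tld, ops in sorted(name_conflicts(ROOTS).items()):
        print(f".{tld} is claimed by: {', '.join(sorted(ops))}")
```

   In this model ".com" and ".net" are not reported, because every root
   agrees on who operates them; only ".biz" is flagged.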

   If a root server provides exactly the same information as some other
   root server, then it is a "replica" of that server, and not an
   "alternate" to it.  This is just a semantic distinction, but it is an
   important one, because replication is a highly desirable thing that
   provides redundancy and reliability, whereas alternate roots/TLDs
   have quite the opposite effect.  Note that if two roots provide the
   same information at the TLD level, then the information must be the
   same throughout the tree.
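
   The replica/alternate distinction amounts to an equality test on the
   contents of the root zone.  The following Python sketch shows the
   idea under simplifying assumptions: a root zone is reduced to a
   mapping from TLD label to a set of nameserver names, and all of the
   nameserver names are hypothetical.

```python
# Each root zone is modelled as {TLD label: set of nameserver names}.
# Every nameserver name below is a hypothetical placeholder.
REFERENCE_ROOT = {
    "com": {"ns.com-registry.example"},
    "net": {"ns.net-registry.example"},
}
MIRROR_ROOT = {
    "com": {"ns.com-registry.example"},
    "net": {"ns.net-registry.example"},
}
# Same as the reference root, plus one extra TLD of its own.
ALT_ROOT_E = dict(REFERENCE_ROOT, biz={"ns.biz-e.example"})


def classify(candidate, reference):
    """Return "replica" if both root zones hold identical data, else "alternate"."""
    return "replica" if candidate == reference else "alternate"


if __name__ == "__main__":
    print("mirror:", classify(MIRROR_ROOT, REFERENCE_ROOT))
    print("alt-root(E):", classify(ALT_ROOT_E, REFERENCE_ROOT))
```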

   Since I believe that the negative characteristics of alternate roots
   are decisive, this document is essentially structured as one long
   reductio ad absurdum argument: frequently, I implicitly postulate
   that multiple roots exist and are in heavy use and that the Internet
   Corporation for Assigned Names and Numbers (ICANN) has somehow
   approved of them, and then show that the consequences are
   undesirable.  That is, I frequently postulate things that I believe
   will simply never happen.

Economics of Name Conflicts

   Name conflicts are generally considered a bad thing.  If company "A"
   uses Alt-Root(E), and company "B" uses Alt-root(F), then "A" and "B"
   will see different versions of .biz.  "A" may put its web page in
   .biz(E), and "B" may put its web page in .biz(F), and neither
   company will see the other's web site.  This may not matter much if
   .biz(E) only supports the English language, and .biz(F) only
   supports the French language, but in general one of the major values
   of the Web is its universal visibility.

   It has been argued that economic forces will eliminate name conflicts
   in alternate TLDs, and that in time all root servers would converge
   to providing the same information.  This argument, however, is based
   on a very simplistic view of economic reality.  The core argument is
   presented by Karl Auerbach:

      As a general rule, customers of a root server system will act much
      like subscribers to a cable TV system -- they will want as many
      TLDs (or as many channels) as they can get.  This will drive the
      root server system operators to include as many viable TLDs as
      they can into their inventory.

      The net result of all the root system operators following this
      strategy will be that they all attempt to trump one another by
      each including more TLDs.  The end of this is that all root server
      operators will incorporate all viable TLDs.  The benefit of this
      is that the domain names of all people and organizations who have
      registrations in these TLDs will be essentially universally
      resolvable no matter which root server system is being used [1].

   A moment's thought about the French/English versions of .biz
   discussed above should make it clear that a myriad of factors other
   than pure economics have an effect.  In fact, there has been much
   talk about different countries/linguistic groups forming their own
   TLDs and running them in alternate roots.  These efforts are
   apparently motivated primarily by political and cultural
   considerations, not economic ones.

   From a purely economic point of view, competing TLDs (and roots)
   would essentially be waging what Shapiro and Varian [2] call a
   "Standards War", where the competitors are seeking to get their
   version of the TLD to be considered as *the* standard.  And, as
   Shapiro and Varian note, there are many possible results of such a
   war -- the protracted battles over cellular phone standards and HDTV
   standards clearly indicate that competing standards can fight it
   out for a very long time.  And, as the English/French example above
   indicates, the regionalization that slows the convergence of cellular
   standards, for example, can easily be replicated in DNS.  [Note that
   it would be quite possible for large international companies to
   simply register in both versions of .biz, just as they currently
   register in many ccTLDs.]

   Moreover, Auerbach's belief that root server operators would tend to
   include all TLDs rests on the simplistic assumption that the
   interests of root server operators are distinct from the interests of
   TLD operators.  As the case of new.net illustrates, this is not
   necessarily so: clearly, new.net is not interested in supporting
   other versions of its own TLDs.

   In fact, if multiple roots were somehow blessed by ICANN, there is
   no question but that other large and well financed companies could
   enter the market instantly -- Verisign, for example, could announce
   that it was supporting its own versions of some of new.net's TLDs,
   and quite possibly Verisign's reputation, expertise, and established
   distribution channels would quickly destroy whatever lead new.net
   might have, which (as is frequently the case in Standards Wars)
   could lead to the demise of new.net.

   From an economics point of view, it would require enormous resources
   to overcome the network advantages (in the economics sense) enjoyed
   by the ICANN root.  But presuming that a multiple root regime could
   be established, there is no indication whatsoever that it would ever
   converge to a single root zone.  In purely economic terms it is just
   as likely that a multiple root environment would lead to large scale
   and protracted name conflicts.

   Here are some concrete examples of possible results of name
   conflicts:

   I.  A business in England, with a web-site in .biz(E), is trying to
       expand its business to France.  The salesman calls the
       prospective client in France and in the conversation says "check
       our web site at xxx.biz for our prices".  However, the French
       company uses .biz(F), and can't access .biz(E).  The French
       company sends email to the English company, reporting the
       problem, but of course the email doesn't arrive.

  II.  An English "Road Warrior" tries to check his email from a
       cybercafe at a conference in France.  ("Road Warriors" are
       individuals that travel extensively, and use their laptops and
       the Internet to keep connected with their work.)  The operator of
       the cybercafe, being a patriot, uses .biz(F), so the Road Warrior
       can't check his mail.

 III.  There is another issue with email -- intermediate servers are
       used to transport the mail.  So email delivery may fail because
       an intermediate server uses a different root system.  More
       insidiously, the path email takes through the network is not
       fixed, and a secondary path may be taken if there is a problem on
       the primary path.  So, sometimes the mail would be delivered, and
       sometimes the message would be returned with an error saying that
       the address did not exist.  And indeed, sometimes the email could
       be delivered, but to a different person than the intended one.

  IV.  The "intermediate node" problem mentioned above for email is far
       more common than people realize.  It can be illustrated by a
       common service on the Internet, "ping" pages.  A "ping" page is a
       web page that allows you to type in a domain name or network
       address, and the server will see if it can reach the specified
       site.  This can be very useful in network debugging -- site
       a.b.com is not reachable from your location, but you access a
       remote "ping" page, and discover that the site *is* reachable
       from an external site.  Of course, in our multiple root scenario,
       if you specify "a.b.biz", you really don't know if you are
       reaching the site you mean to reach.  Network engineers are
       comfortable with raw IP addresses, but sometimes you don't get
       the IP address -- you may just have a trouble ticket entry with a
       domain name in it.

   V.  Trouble tickets aren't that familiar, but online orders certainly
       are.  In almost all cases, the confirmation for the order is sent
       back via email, and consequently the online order form almost
       always asks you for your email address.  But if your email
       address is jax@xxx.biz, and the place from which you order uses
       a different .biz, then you will never receive your
       confirmation.

  VI.  The above scenarios all express what we might call "inconsistency
       due to location" -- the result you get for a domain name lookup
       depends on the location in cyberspace from which you made the
       request.

       But when you think about it, "location in cyberspace" is almost
       meaningless -- two computers in the same room can have vastly
       different locations in cyberspace.  In fact, DNS largely defines
       what most of us mean by "location in cyberspace", and
       consequently, name conflicts undermine our most basic
       navigational tool.


 VII.  Finally, "inconsistencies due to location" are only a part of the
       problem.  Another very significant issue revolves around what we
       might call "inconsistencies due to timing".  This was hinted at
       above in the case where an intermediate mail transport server
       fails and another takes over -- where your mail goes depends on
       *when* it was sent, relative to the time of the server failure.
       In a multiple root regime, the pattern of DNS lookups over time,
       independent of any other factor, can cause you to get different
       results.  How this can happen is complicated, and will be
       explained in much greater detail below.  But this factor
       interacts with all the above scenarios, and adds an element of
       complete non-determinism.
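
   Scenarios III and VII can be imitated with a toy model of mail
   relays.  This is a sketch under stated assumptions, not a description
   of any real mail system: the relay names, the host names, and the
   rule "the first relay that is up handles the message" are all
   hypothetical.

```python
# Hypothetical mail hosts for the same name in each version of .biz.
BIZ_E = {"xxx.biz": "mail.english-company.example"}
BIZ_F = {"xxx.biz": "mail.french-company.example"}

# Each relay resolves names through whichever root its operator chose.
# The list is ordered by preference: primary path first.
RELAYS = [
    {"name": "primary", "zone": BIZ_E},
    {"name": "secondary", "zone": BIZ_F},
]


def deliver(recipient_domain, relays_up):
    """Return the host that receives the mail, or None if it bounces.

    The first relay that is up handles the message, and it resolves the
    recipient domain using its own root's view of .biz.
    """
    for relay in RELAYS:
        if relay["name"] in relays_up:
            return relay["zone"].get(recipient_domain)
    return None


if __name__ == "__main__":
    print(deliver("xxx.biz", {"primary", "secondary"}))
    print(deliver("xxx.biz", {"secondary"}))  # primary path has failed
```

   The sender does nothing differently in the two calls; only the state
   of the network at the moment of sending changes, and with it the
   party that receives the message.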

Interlude on the Significance of Design Goals

   The above discussion of name conflicts underscores a fundamental
   point: DNS wasn't designed to deal with name conflicts.

   In fact, the fundamental design goal of the DNS is to provide unique
   and stable names for certain resources on the Internet.  A "resource"
   may be, for example, an IP address (or, in some cases, a group of IP
   addresses), an email server, or a portion of the Domain Name Space
   itself.  The resources are represented by objects in DNS; the
   fundamental service provided by the DNS is retrieval of an object,
   given the name for the object.

   Providing unique and stable names for millions of objects requires
   that there be millions of unique names available, and managing
   millions of unique names is a large job.  The most basic management
   task is "registration": the assignment of names to the objects that
   they name.  In practice, registration also involves the
   identification of an associated party who is responsible for the
   object.

   The names provided by the DNS are structured in a hierarchical
   manner, which allows the management of the names to be distributed.
   Instead of a single gigantic name registry, the registration of names
   can be spread across many registries.

   The visible DNS hierarchy starts with what are called "Top Level
   Domains" (TLDs).  The next level of the hierarchy is made up of
   "Second Level Domains" (SLDs), the level after that is made up of
   "Third Level Domains" (3LDs), and so on.  The familiar ".com" is a
   TLD, "example.com" is an SLD, and "an.example.com" is a 3LD.
   "this.is.an.example.com" would be a domain name with 5 levels.
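
   The level structure can be made concrete with a short Python sketch.
   The function name domain_levels is an assumption of this example, and
   the sketch ignores details of real domain name syntax such as case
   folding and length limits.

```python
def domain_levels(name):
    """Split a domain name into its labels, ordered from the TLD downwards."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return list(reversed(labels))


if __name__ == "__main__":
    levels = domain_levels("this.is.an.example.com")
    for depth, label in enumerate(levels, start=1):
        print(f"level {depth}: {label}")
    print("number of levels:", len(levels))
```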


   It is important to realize, however, that while the names are
   structured (and the registration is distributed), the fundamental
   purpose of DNS is still to provide unique, stable names, and that the
   entire obscure and complex technical design of DNS supports this
   fundamental purpose.

   Now consider for a moment the design of a lawn-mower.

   Power lawn-mowers are devices that are designed to cut the grass in a
   lawn to a uniform height.  They typically have sharp spinning blades
   that rotate parallel to the ground, blades that spin fast enough so
   that the grass is cut, rather than pushed aside.

   All the decisions in the design of a lawn-mower are made with that
   particular task in mind, and over time the design has evolved to a
   high degree of complexity.  For example, the spinning blades also act
   as a fan, and the shape of the blade housing is carefully designed to
   confine the airflow produced by the blades.  The *intended*
   consequence of this design is for the airflow to carry the grass
   clippings up and into a collection bag.

   In an interesting case from product liability law, the plaintiff
   attempted to use his lawn mower as a hedge trimmer, and some of his
   fingers were cut off.  Thus, an *unintended* consequence of the
   design of the blade housing is that there isn't room for fingers
   between the blade housing and the blades.

   The defendant, the manufacturer of the mower, argued that the lawn-
   mower was perfectly safe when used as designed, that clearly a lawn
   mower is not designed to be used as a hedge trimmer, and that use of
   a product in a manner for which it is not designed is likely to cause
   all kinds of unanticipated problems.

   This example illustrates an important general principle: a design has
   high level goals, and design decisions are made to fit those high
   level goals.  Those design decisions may have unintended
   consequences, if the end product is used in a way not consistent with
   the original goals.  If the design is very complicated, with many
   internal decisions, it may be *very* difficult to predict the
   consequences of misuse of the end product.  Certainly the individual
   in question did not predict that his fingers would be amputated.

   As mentioned above, the fundamental goal of DNS is a single unified
   name space, and intrinsic to that design is the assumption that there
   is a single root zone.  DNS is a complicated protocol, and use of
   multiple root zones was not and is not a design goal of the DNS.
   Consequently, it is simply not possible to predict all the problems
   that may result from use of multiple root zones.


   Moreover, DNS as currently deployed is an enormous system, and it
   provides infrastructure that is absolutely critical to the operation
   of the Internet.  Operations on such a large scale must be considered
   in a different light than operations on the small.  Hooking up tiny
   alternate root zones with essentially no traffic tells us very little
   about how such a system would work on a large scale -- pedaling a
   tricycle around in a circle on a schoolyard doesn't give much useful
   information about how to operate a super-highway.

More Serious Problems

   With all the above as preliminary, we may now get to more complex and
   serious issues.  Alt-roots don't just have the potential for creating
   static name conflicts, they have the potential for creating far more
   serious instabilities in the name space.  Problems in this category
   are unfortunately complex to describe, and require a bit of
   explanation before diving off into the complicated stuff.

   Here's a diagram of our English/French example:

              Alt                     Alt
             root(E)                 root(F)
              /|\                     /|\
             / | \                   / | \
            /  |  \                 /  |  \
           /   |   \               /   |   \
          /    |    \             /    |    \
        .com.......  .biz(E)    .com.......  .biz(F)

   We presume that the English alternate root is located in London, and
   the French alternate root is located in Paris.  The users of each
   root system are located in their respective countries.  We assume
   that the two .biz's are well used, and further that there are
   overlaps in the name space -- "xxx.biz(E)" and "xxx.biz(F)", or
   perhaps large companies register the same name in each domain, and
   provide web-sites with different content for each country:
   "ibm.biz(E)" and "ibm.biz(F)".

   An almost universal assumption in this scenario is that users of
   alt-root(E) would resolve names from .biz(E), and that users of alt-
   root(F) would resolve names from .biz(F).  This assumption, however,
   is incorrect.  The two root systems share .com, .net, and the rest of
   the 240 or so "legacy" TLDs, and the DNS protocol, *by design*,
   passes information that allows leakage of other data between the two
   root systems.  Basically, after some period of time, users of either
   root system will potentially get information from the other one, on a
   totally unpredictable basis.  One day an English user may get the
   French xxx.biz, and on the next day he might get the English xxx.biz.


   The next section goes through a rather complete explanation.  It must
   be remembered, however, that this is only one of many possible
   scenarios.  Also, all these possible scenarios interact with the
   common name conflicts mentioned above.  And recall the man who lost
   his fingers: we can't know all the possible problems with multiple
   roots.

Detailed example

   As above, we have two root zones, each with its own version of a TLD.
   All other TLDs in these root zones are the same, including the in-
   addr.arpa zone. The two versions of .biz are ".biz(E)" and ".biz(F)".

   Suppose that each version of .biz has a well-known popular SLD, run
   by entrepreneurs unafraid of controversy, e.g., "sex".  So we have
   sex.biz(E) and sex.biz(F), in the subtrees from their respective root
   zones.  Since the "sex" name is very valuable as an SLD, it is quite
   reasonable to assume that someone would leap at the chance to
   register it wherever it became available.  We could also assume that
   the two versions of sex.biz would be in a fiercely competitive
   economic relationship.

   Suppose further that sex.biz(E) has IP address 1.2.3.4, and
   sex.biz(F) has IP address 4.3.2.1.  Assume that the inverse lookup is
   correctly configured in both cases.  (Normal DNS lookup maps a domain
   name to an Internet address -- looking up sex.biz would return
   1.2.3.4, in root(E). Inverse DNS lookup starts with the Internet
   address -- 1.2.3.4 -- and returns the associated domain name.)  This
   means that if one looks up the address 1.2.3.4 one will get
   "sex.biz", and if one looks up the address 4.3.2.1, one will also get
   "sex.biz".  Inverse addresses are maintained in DNS in a special SLD,
   "in-addr.arpa".  So far, we have a structure like this:

       .biz(E)         .biz(F)          in-addr.arpa
        /|\             /|\                 /|\
       / | \           / | \               / | \
     ..  | ..        ..  | ..             /  |  \
         |               |               /  ..   1.2.3.4->sex.biz
    sex->1.2.3.4    sex->4.3.2.1        /
                                       4.3.2.1->sex.biz

                           Fig. 1
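
   The inverse side of Fig. 1 can be sketched in Python.  The standard
   ipaddress module builds the in-addr.arpa name for an address by
   reversing its octets; the table IN_ADDR_ARPA and the function names
   are assumptions of this example and simply restate the figure.

```python
import ipaddress


def inverse_name(address):
    """Return the in-addr.arpa name used for an inverse lookup of address."""
    return ipaddress.ip_address(address).reverse_pointer


# The single, shared in-addr.arpa tree from Fig. 1: both addresses map
# back to the same unqualified name.
IN_ADDR_ARPA = {
    inverse_name("1.2.3.4"): "sex.biz",
    inverse_name("4.3.2.1"): "sex.biz",
}


def inverse_lookup(address):
    """Return the domain name registered for address, or None."""
    return IN_ADDR_ARPA.get(inverse_name(address))


if __name__ == "__main__":
    for address in ("1.2.3.4", "4.3.2.1"):
        print(address, inverse_name(address), inverse_lookup(address))
```

   Note that the inverse name for 4.3.2.1 is 1.2.3.4.in-addr.arpa, and
   that nothing in either answer says which version of .biz is meant.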

   This structure is maintained by DNS nameservers, which are also
   identified by DNS names and IP addresses.  It is a quite common
   convention for a nameserver for an SLD to have a name like
   "ns1.SLD.TLD".  In our example, then, it would be quite reasonable
   that a nameserver for sex.biz(E) would be "ns1.sex.biz(E)", and that
   a nameserver for sex.biz(F) would be "ns1.sex.biz(F)".  The IP
   addresses for these machines might be 1.2.3.5 and 4.3.2.2,
   respectively.  Note that the name "ns1" is part of the .sex.biz
   subtree.  We now have the following diagram:


       .biz(E)         .biz(F)          in-addr.arpa
        /|\             /|\                 /|\
       / | \           / | \               / | \
     ..  | ..        ..  | ..             /  |  \
         |               |               /  ..   1.2.3.4->sex.biz
    sex->1.2.3.4    sex->4.3.2.1        /
        /|\             /|\             4.3.2.1->sex.biz
       / | \           / | \
     ..  |  ..       ..  |  ..
         |               |
    ns1->1.2.3.5     ns1->4.3.2.2


                           Fig. 2

   I should stress that all of this (except for the multiple roots, of
   course) is completely standard, and that the conventional name "ns1"
   for a nameserver is *extremely* common.  Moreover, the nameservers
   for "sex.biz" will carry records for all the names in the sex.biz
   domain.

   One final bit of information is necessary to complete the scenario:
   the information stored for every domain name also includes references
   to the address of the name servers for that domain.

   The critical point is that replies to DNS queries routinely return
   the name and address of the name servers involved.  That is, in our
   example, if one were to look up the address "4.3.2.1", one would get
   back "sex.biz", *and* that the nameserver for 1.2.3.4.in-addr.arpa
   (the inverse-lookup name for 4.3.2.1, formed by reversing the
   address) was "ns1.sex.biz", *and* (most important) that the address
   of "ns1.sex.biz" was 4.3.2.2.

   This last bit of information is the source of the problem.  DNS has
   just returned the address of "ns1.sex.biz" as 4.3.2.2, and there is
   no indication of *which* sex.biz is involved.  The information will
   be cached on your local computer, and will be used whenever the
   computer needs to know the address of ns1.sex.biz.
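
   The effect can be reproduced with a deliberately naive model of a
   cache.  This is a sketch, not the behavior of any particular
   resolver: it assumes a cache keyed by name alone in which later data
   silently replaces earlier data, and the class and function names are
   invented for the example.  Real implementations apply further rules,
   as note [3] observes.

```python
class ResolverCache:
    """A naive cache keyed by name alone (an assumption of this sketch)."""

    def __init__(self):
        self._addresses = {}

    def learn(self, name, address):
        # The key carries no record of which root the data came from,
        # so later data for the same name replaces earlier data.
        self._addresses[name] = address

    def address_of(self, name):
        return self._addresses.get(name)


# Nameserver data returned alongside answers from each side (from Fig. 2).
FROM_E = ("ns1.sex.biz", "1.2.3.5")
FROM_F = ("ns1.sex.biz", "4.3.2.2")


def nameserver_after(replies):
    """Return the cached address of ns1.sex.biz after a sequence of replies."""
    cache = ResolverCache()
    for name, address in replies:
        cache.learn(name, address)
    return cache.address_of("ns1.sex.biz")


if __name__ == "__main__":
    print(nameserver_after([FROM_E, FROM_F]))
    print(nameserver_after([FROM_F, FROM_E]))
```

   The same two replies, seen in a different order, leave the cache
   pointing at a different nameserver and therefore at a different
   version of sex.biz.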


   Three further points:

   1) There are all kinds of automatic processes that do inverse
      lookups -- when a packet arrives at a computer, it is only
      identified by the IP address, and if it is desirable to know the
      name of the machine that was the source of the packet (for
      logging purposes, say), an inverse query is necessary.

   2) Inverse lookups aren't the only thing that can cause this to
      happen -- nameserver information is returned as part of many
      queries.

   3) The discussion has been greatly simplified from the real case.
      There are numerous conditions that affect when a cache entry is
      replaced -- timing, type of query, authoritative vs
      non-authoritative answers, software, software version, etc. --
      and there is no simple rule predicting what will happen [3].
      This non-deterministic behavior, of course, is just further
      instability.
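
   Timing alone is enough to change the outcome.  The Python sketch
   below isolates just one of the conditions listed in point 3, the
   expiry time of a cache entry, and assumes a simple rule: a new record
   is accepted only when nothing unexpired is cached for that name.  The
   class name, the rule, and the time values are assumptions of this
   example, not a description of any real resolver.

```python
class TtlCache:
    """Toy cache in which an entry can be replaced only after it expires."""

    def __init__(self):
        self._entries = {}  # name -> (address, expiry time)

    def offer(self, name, address, ttl, now):
        """Store the record unless an unexpired entry already exists."""
        current = self._entries.get(name)
        if current is None or current[1] <= now:
            self._entries[name] = (address, now + ttl)

    def lookup(self, name, now):
        """Return the cached address, or None if absent or expired."""
        current = self._entries.get(name)
        if current is None or current[1] <= now:
            return None
        return current[0]


if __name__ == "__main__":
    cache = TtlCache()
    cache.offer("ns1.sex.biz", "1.2.3.5", ttl=100, now=0)   # English data
    cache.offer("ns1.sex.biz", "4.3.2.2", ttl=100, now=50)  # ignored
    print(cache.lookup("ns1.sex.biz", now=60))
    cache.offer("ns1.sex.biz", "4.3.2.2", ttl=100, now=150)  # accepted
    print(cache.lookup("ns1.sex.biz", now=160))
```

   The French record is refused at time 50 and accepted at time 150, so
   the answer a user gets depends on when the competing data happened to
   arrive relative to the expiry of the entry already held.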

   Note that in a competitive situation, the owners of the respective
   sex.biz domains could contrive their own scenarios to switch
   nameservers.  That is, the above scenario assumes accidental
   confusion, but there might very well be economic incentive to create
   deliberate confusion.

Conclusion

   There are a wide variety of situations associated with alternate TLDs
   that can cause instabilities; they are all fairly complex.  This is
   why they aren't much of an issue when there are only a few people
   using alternate TLDs.  But wide scale deployment of alternate
   roots/TLDs opens up possibilities for destructive and subtle
   problems.

Security Considerations

   This memo does not introduce any new security issues, but it does
   attempt to clear up misunderstandings on the use of "alternate root"
   and "alternate TLDs" in the Internet.


References

   [1] http://www.cavebear.com/cavebear/growl/issue_2.htm#multiple_roots

   [2] "Information Rules: a Strategic Guide to the Network Economy", by
       Carl Shapiro and Hal Varian, Harvard Business School Press, 1999

   [3] More recent versions of BIND have been more resistant to casual
       cache replacement.  It should be stressed that DNS is a *very*
       much more complicated protocol than it appears, and there are
       multiple implementations of DNS servers.  The above scenario
       works with widely deployed versions of current software, but may
       behave differently with different versions or less commonly used
       software.

Author's Address

   Kent Crispin

   EMail: kent@songbird.com

