ICANN Rio de Janeiro Meeting Topic: Internationalized Domain Names
Posted: 13 March 2003
Standards for ICANN Authorization of Internationalized Domain Name Registrations in Registries with Agreements
Internet domain names are easy-to-remember identifiers for hosts and services on the Internet. Until now, only a subset of US-ASCII[1] has been usable in domain names.
Over the past 2+ years, the Internationalized Domain Name Working Group (IDN-WG) of the Internet Engineering Task Force (IETF) has been working to internationalize the domain name system at the application layer by standardizing a system for the translation of non-ASCII characters into unique ASCII strings that can be resolved by the existing domain name system.
In October 2002, a significant milestone toward internationalization of the DNS was achieved when the Internet Engineering Steering Group approved for publication the three documents that together define Internationalizing Domain Names in Applications (IDNA), a standards-track protocol.[2] These three documents were published as RFCs 3490, 3491, and 3492 in March 2003. Implementation of the IDNA specification by DNS registries will allow users to use domain names with non-ASCII characters.
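As a rough illustration of the translation step described above, the sketch below uses the "idna" codec in Python's standard library, which implements the ToASCII and ToUnicode operations of RFC 3490. The sample name "bücher.example" and the helper names to_ascii and to_unicode are assumptions chosen for illustration; they do not come from the RFCs or from this paper.

```python
def to_ascii(domain_name: str) -> str:
    """Convert each label of a domain name to its ASCII-compatible form."""
    return domain_name.encode("idna").decode("ascii")


def to_unicode(ace_name: str) -> str:
    """Convert an ASCII-compatible name back to its Unicode form."""
    return ace_name.encode("ascii").decode("idna")


if __name__ == "__main__":
    ace = to_ascii("bücher.example")
    print(ace)              # xn--bcher-kva.example
    print(to_unicode(ace))  # bücher.example
```

The ASCII form is what the existing DNS stores and resolves; applications are expected to perform the conversion in both directions.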
Implementation of IDNA brings significant risks for the domain name system. In particular, serious concerns have been raised about the likelihood of widespread user confusion and new opportunities for cybersquatting. These risks can be greatly reduced by the adoption of sensible registry-level policies, and by the coordination of consistent technical implementations across DNS registries.
IDNA will be a major step forward for the domain name system, but only if the DNS registries undertake to implement it in a thoughtful, responsible, and standards-compliant manner. The role of ICANN in this process is, of course, limited: ICANN's agreements with the major gTLD registries give it the responsibility to expressly authorize the registration of IDNA-compliant internationalized domain names, but ICANN's mission does not include micromanaging registry-level implementation.
ICANN's agreements with registry operators (including the gTLDs .com, .net, .org, .info, .biz, .name, .pro, .aero, .coop, .museum) provide that ICANN authorization is required before the registry can begin accepting registrations of internationalized domain names (IDNs).[3] This paper sets forth a proposed answer to the question "What standards should ICANN apply in exercising its contractual responsibility to authorize IDNA registrations in these registries?" The basic premise of this paper is that ICANN should take a light-handed approach, mandating compliance with the applicable technical standards and securing a registry commitment to collaborate with the affected communities, relevant experts, and other registries in developing appropriate language-specific registration policies.
This approach anticipates that the DNS registries will collectively recognize the benefits to them and to the community of consistent cross-registry implementation of the IDNA protocol and, flowing from that, of collaborating with each other to develop common, locally-accepted IDNA implementations. In that way, local expertise (about, for example, a given language's character equivalence problems and solutions) can be developed and shared globally, to the benefit of all DNS registries, registrars, application software engineers, and, ultimately, Internet users. Implementing registries have great incentives to recognize that global adoption of IDNA will be greatly enhanced if they seek to harmonize, for example, their approaches to registry-level character encoding and variant problems. These incentives are appropriately highlighted by publication by registries of the rules they apply to IDNs.
At the same time, the premise of this paper is that it would be a mistake for ICANN to pursue a burdensome and/or intrusive approach to IDN implementation, for example by putting ICANN in the position of approving a character-equivalence table for each language, and of maintaining such tables. The deployment of IDNA within existing top-level domain registries is fundamentally a registry responsibility, and the registries will be in the best position to make appropriate implementation decisions themselves, and should have the freedom to make adjustments as experience dictates. Just as DNS registries embrace a wide diversity in registration policies and administrative procedures, reflecting the diversity of local Internet communities, it seems apparent that the vast diversity of human character sets and the languages from which they come compels a language-by-language, registry-led approach to the development of detailed registration policies and administrative procedures.
Accordingly, this paper proposes an IDN deployment environment of relatively few, vital requirements for implementing registries. This proposed approach would provide a platform of technical compatibility and transparency and promote a cooperative environment in which registries would consult with the relevant user communities to establish registration procedures that are widely adopted, predictable, and broadly acceptable to users.
This paper addresses the specific issue of authorizing the registration of IDNA-compatible domain name strings in the DNS registries with agreements with ICANN. It does not address other IDN policy and implementation issues.
As a condition of authorizing IDNA registrations, this paper proposes that ICANN require that the registries agree to four mandatory points, while encouraging them to adhere to two others. The paper synthesizes the many discussions and debates (and occasional brawls) about IDNA implementation that have taken place online and in various fora in recent months. In addition, the paper draws heavily from the work of ICANN's IDN Committee, and the Internet-draft "IDN Registration and Administration Guideline for Chinese, Japanese, and Korean," <http://www.ietf.org/internet-drafts/draft-jseng-idn-admin-02.txt>, by J. Seng and J. Klensin, eds., and K. Konishi, K. Huang, H. Qian, and Y. Ko ("CJK Guidelines").
The first four points below are proposed as mandatory requirements to which the registries would have to agree as conditions for ICANN authorization to begin accepting IDNA-compliant domain name registrations. Points 5 and 6 are proposed as strong recommendations to the gTLD registries (and, indeed, to all registries), but would not be made mandatory.
The first point, strict compliance with the applicable technical standards, may seem obvious, but it is important enough to restate.
Registry compliance with technical standards is of absolutely critical importance to the continued success of the domain name system. As with the Internet protocol itself, the success of the DNS rests upon the universal compatibility of its implementations at all points in the network. Adherence to published standards has resulted in a DNS that is reliable, scalable, resilient, predictable and globally authoritative. Standards compliance by registries, registrars, application writers, ISPs, etc., ensures that the DNS gives the same answer to a given query no matter where the query originates, what kind of application is generating the query, or what the nature of the identified host or service is.
A DNS registry that does not comply with the published DNS technical standards breaks the principle of universal compatibility, risks conflicts and confusion, and jeopardizes its own utility to users.
For a new DNS protocol like IDNA, it is thus essential that implementing registries take exceptional care to ensure comprehensive compliance with the published specifications. The broad availability of IDNA applications is vital to the deployment of IDNs and, to encourage applications to be written, it is vital to provide those applications an IDN registration system that operates strictly according to the published technical standards.
2. In implementing IDNA, top-level domain registries must employ an "inclusion-based" approach for identifying permissible code points from among the full Unicode repertoire, and, at the very least, must not include (a) line symbol-drawing characters, (b) symbols and icons that are neither alphabetic nor ideographic language characters, such as typographical and pictographic dingbats, (c) punctuation characters, and (d) spacing characters.
The reasoning behind this point has been fully articulated by the IDN Committee in its February 2002 "Input to the IETF on Permissible Code Point Problems," posted at <http://www.icann.org/committees/idn/idn-codepoint-input.htm>, and in the accompanying "Briefing Paper on Permissible Code Point Problems," posted at <http://www.icann.org/committees/idn/idn-codepoint-paper.htm>. (The paper explains "inclusion-based approach" this way: "Start with the current restricted LDH ASCII characters (a-z, A-Z, 0-9, -) and then extend [that table] to include relevant, non-problematical 'international' characters. Another way to state this model is: 'Everything that is not explicitly permitted is prohibited.'") The IETF concluded that this concern was best addressed at the registry level, through registration policies and administrative procedures, rather than at the protocol level.
The registries under contract will be expected to work together through the IDN Registry Implementation Committee to reach a common definition of the exact Unicode ranges described in sub-points (a), (b), (c), and (d), above.
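The "everything that is not explicitly permitted is prohibited" model in point 2 can be sketched as a simple allow-list check. This is a minimal illustration only: the GERMAN_EXTRA set, the function names, and the use of Unicode general categories as a stand-in for the excluded classes (a) through (d) are assumptions, not definitions agreed by any registry or by the IDN Registry Implementation Committee.

```python
import unicodedata
from typing import Optional

# Starting point: the restricted LDH repertoire (lower-cased for simplicity).
LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")

# Hypothetical extension a registry might publish for German.
GERMAN_EXTRA = set("äöüß")

PERMITTED = LDH | GERMAN_EXTRA


def is_permitted_label(label: str, permitted: set = PERMITTED) -> bool:
    """Inclusion-based check: every code point must be explicitly listed."""
    return len(label) > 0 and all(ch in permitted for ch in label)


def excluded_class(ch: str) -> Optional[str]:
    """Rough classification of a character into the excluded groups.

    Uses Unicode general categories as an approximation; the hyphen is
    itself punctuation (category Pd) and is permitted only because the
    LDH list names it explicitly.
    """
    category = unicodedata.category(ch)
    if category.startswith("P"):
        return "punctuation"
    if category.startswith("S"):
        return "symbol"  # includes box-drawing characters and dingbats
    if category.startswith("Z"):
        return "spacing"
    return None


if __name__ == "__main__":
    for candidate in ["müller", "müller!", "m\u2702ller", "caf\u00e9"]:
        print(candidate, is_permitted_label(candidate))
```

Note that "café" is rejected here not because "é" is problematic, but simply because the hypothetical table does not list it; that is the defining property of an inclusion-based approach.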
3. In implementing IDNA, top-level domain registries must (a) associate each registered domain name with one or more languages, (b) employ language-specific registration and administration rules that are documented and publicly available, such as the reservation of all domain names with equivalent character variants in the languages associated with the registered domain name, and, (c) where the registration and administration rules depend on a character variants table, allow registrations in a particular language only when a character variants table for that language is available.
For speakers of the Latin-alphabet-based languages, the easiest way to understand point 3 is to consider the following list of domain names:
In the languages that utilize Latin characters (e.g., English, Finnish, German, Italian, etc.), each letter has two variants: upper case and lower case. The Internet's basic DNS and hostname specifications provide that the upper-case and lower-case variants of each letter are considered to be equivalent. Thus, all the variant domain names in the above list are treated as the same domain name.
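The upper-case/lower-case equivalence described above can be sketched in a few lines. The sample names and the helper name canonical are hypothetical and are used here only to illustrate the idea; the lower-casing shown is valid for ASCII names only.

```python
def canonical(name: str) -> str:
    """Reduce an ASCII domain name to a single canonical (lower-case) form."""
    if not name.isascii():
        raise ValueError("this sketch covers ASCII names only")
    return name.lower()


if __name__ == "__main__":
    # Hypothetical case variants of one name.
    names = ["example.org", "EXAMPLE.ORG", "Example.Org", "eXaMpLe.oRg"]
    # All variants collapse to one canonical form, so they are one name.
    print({canonical(n) for n in names})
```

For non-ASCII characters the notion of "the same name" is not this simple, which is why the language-specific rules discussed next are needed.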
Other languages' character sets present similar problems of variants and equivalence, that is, cases in which one character (or Unicode code point) is considered to be the same as another character (or Unicode code point). "Sameness" in this context is often extremely complicated to define: equivalence can be functional, semantic, or visual, and it varies from language to language. (For a detailed discussion of these issues in the context of the characters used by the Chinese, Japanese, and Korean languages, see Section 1 of the CJK Guidelines Internet-draft, above.) Accordingly, the global DNS registries need to develop and implement a set of registration policies and administrative procedures for each language prior to accepting IDN registrations in that language's character set.
For nearly all languages, registry-level IDN policies will need to incorporate a language-specific table of character variants and equivalences. Some tables, such as the CJK table, are quite complex; others, such as those for the Romance and Germanic languages (or other languages that rely primarily on the Roman alphabet but also incorporate a limited number of extra characters or accents), may be quite short and straightforward. For example, one might imagine that the German language table will address vowels with Umlauts and the ß (Eszett or "Scharfes S"), perhaps specifying that, e.g., "ä" (a-umlaut) is equivalent to "ae" (though perhaps not in all cases). Indeed, a few languages such as Creole, Indonesian, Malay, Swahili, Xhosa, and Zulu may not require any special provision for character variants and equivalences (at least not for their Latin-character orthographies). The point is that such determinations will require careful study and close collaboration with local experts and, where possible, relevant DNS registries.
The CJK Guidelines introduce the helpful concept of an IDN "package," meaning that when an IDN is registered, the registry applies the table of character variants, determines which variations are considered equivalent, and groups them all together into a single unit (for all purposes, including transfer and deletion).[4] If more than one language is associated with the registered IDN, then each associated language's ruleset must be applied, with each generating additional character variants to be included in the package. The entire package is registered or reserved to the registrant, meaning that the registration of a single IDN also brings with it the registration or reservation of any and all equivalents. Whether or not the equivalent domain names are treated as live registrations (i.e., are included in the relevant zone file) and whether or not additional registration fees are charged for equivalents are matters for each registry to resolve for itself, after consultation with linguistic experts and others in the affected community.
(Smart registries are likely to pursue registrant-friendly practices and policies, because the competitive global market in domain name registrations means that registries that mishandle IDNA implementation (for example, by failing to treat equivalents as equivalents) will lose customers to other registries in the short term, and will do long-lasting damage to their TLD brand. By way of ASCII analogy, it doesn't take much imagination to recognize that if the .foo registry began to allow different registrants to register MICROSOFT.foo, microsoft.foo, microSOFt.foo, and so forth, both registrants and Internet users would cease to regard the .foo TLD as a reliable source of identifiers. The stakes in the implementation of IDNA are just as great. User trust is at the heart of why it is so important for registries to develop and apply appropriate, language-specific, locally-legitimated registration policies for IDN registrations.)
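The package idea can be sketched as follows. This is a deliberately simplified illustration and not the algorithm of the CJK Guidelines: the tiny GERMAN_VARIANTS table, the Registry class, and the function names are all hypothetical, and the table maps in one direction only (from an accented character to its spelled-out form).

```python
from itertools import product

# Hypothetical, deliberately tiny variants table for German.
GERMAN_VARIANTS = {
    "ä": ("ä", "ae"),
    "ö": ("ö", "oe"),
    "ü": ("ü", "ue"),
    "ß": ("ß", "ss"),
}


def build_package(label, tables):
    """Return the set of all variants of a label under every given table."""
    package = {label}
    for table in tables:
        # Each character contributes itself or its listed variants.
        choices = [table.get(ch, (ch,)) for ch in label]
        package.update("".join(combo) for combo in product(*choices))
    return package


class Registry:
    """Toy registry that reserves a whole package as a single unit."""

    def __init__(self):
        self.reserved = {}  # variant label -> label originally registered

    def register(self, label, tables):
        package = build_package(label, tables)
        conflicts = package.intersection(self.reserved)
        if conflicts:
            raise ValueError(f"already reserved: {sorted(conflicts)}")
        for variant in package:
            self.reserved[variant] = label
        return package


if __name__ == "__main__":
    registry = Registry()
    print(sorted(registry.register("müßig", [GERMAN_VARIANTS])))
    try:
        registry.register("muessig", [GERMAN_VARIANTS])
    except ValueError as err:
        print("refused:", err)
```

In this sketch, registering "müßig" reserves four labels, so a later attempt by someone else to register "muessig" is refused. Whether the reserved variants are also placed in the zone file is left open, as the paper leaves it to each registry.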
As noted, to assure broad acceptance by Internet users, the creation of language-specific policies and rules is essential and should be done by the affected DNS registries in collaboration with local experts. The creation of the CJK Guidelines document demonstrates that experts and registries can work together to produce a carefully-considered, well-documented, technically sensitive, and administratively implementable set of rules and policies for a defined set of characters. The authors of that document and the registries that supported them deserve much credit for their hard work, and for taking a pioneering lead in creating practical solutions to the many complex problems presented by the implementation of IDNA. Ideally, a guidelines document should be developed and published for each language in which a DNS registry wishes to accept IDNA registrations.
Once a registry develops registration and administration rules for a particular language, it is important that these rules be fully documented and made available online. By making the rules fully available, all the affected parties will be able to work toward the mutually beneficial goals of predictability, simplicity, and uniformity. While different registry operators should be free, with reasons that are compelling to them, to adopt different registration and administration rules, they should not be forced into taking different approaches simply by ignorance of what others are doing. It is in the mutual interest of users, application developers, and registration authorities alike to promote full transparency of registration and administration rules followed by all registries for all languages.
4. Registries must commit to working collaboratively through the IDN Registry Implementation Committee to develop character variants tables and language-specific registration policies, with the objective of achieving consistent approaches to IDN implementation for the benefit of DNS users worldwide.
This requirement is intended to assure the Internet community that the registries under contract with ICANN will continue working together for the common good of the DNS (together, we sincerely hope, with those ccTLD registries working on IDN implementation). It is extremely important that the introduction of IDNA preserve the consistency, reliability, and inter-compatibility that characterize the DNS. Registry-level collaboration in areas of common concern is essential to accomplishing that goal. This collaboration requirement, however, is not intended to constrain the freedom of registries to choose which variants tables and registration policies to apply.
This principle is stated as a recommendation rather than a requirement. At this early stage of IDNA implementation, it appears that massive confusion and cybersquatting will result if a single domain name label is allowed to include characters associated with different languages. For example, the Roman, Greek, Cyrillic, and Armenian character sets include numerous characters that appear to be identical but are separate code points on the Unicode tables. Other languages, such as Turkish, present similar problems.
Over time, after careful study and the accumulation of experience, it may become clear that the limitation of single labels to single languages' characters can be relaxed in some cases. But for now, at the outset of IDNA implementation, it seems clear that potential risks in terms of confusion and cybersquatting are enormous, and can easily be avoided by application of this straightforward rule.
Note that this principle is written carefully to recognize that some languages share characters with other languages, and that those shared characters should not somehow be excluded. The "characters associated with one language" may include characters used by other languages, too.
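The look-alike problem can be made concrete with a small check for labels that mix character sets. The sketch below is a heuristic only: it approximates a character's script by the first word of its Unicode character name (for example "LATIN" or "CYRILLIC"), and it checks scripts rather than languages, which is a coarser test than the principle discussed above. The function names are hypothetical.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Approximate the set of scripts used by the letters of a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
            name = unicodedata.name(ch, "UNKNOWN")
            scripts.add(name.split(" ")[0])
    return scripts


def is_single_script(label: str) -> bool:
    """True when all letters of the label come from at most one script."""
    return len(scripts_in(label)) <= 1


if __name__ == "__main__":
    genuine = "paypal"        # all Latin letters
    spoof = "p\u0430ypal"     # second letter is CYRILLIC SMALL LETTER A
    print(genuine == spoof)   # the two strings differ
    print(scripts_in(genuine), is_single_script(genuine))
    print(scripts_in(spoof), is_single_script(spoof))
```

The two labels render almost identically in many fonts, yet they are different code point sequences and would otherwise be registrable as different names.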
This principle, too, is written as a common-sense recommendation, not a requirement. Customers speaking a language whose characters are offered as IDNs will reasonably expect to get customer service in their language. It is, of course, up to registries and registrars to determine what languages they will support. But if a registry or registrar offers an IDN service targeted at speakers of a particular language, it would be prudent to have the ability to communicate in that language with registrants.
Subject to feedback and comments from the ICANN community, it is proposed to apply the foregoing standards to the authorization of IDN registrations by registries with agreements with ICANN. The actual procedure would be quick and straightforward: the registry would submit to ICANN an agreed statement of its commitment to abide by the required principles stated in the first four points above. ICANN would, in turn, provide written authorization to the registry to begin accepting IDNA-compliant IDN registrations.
The ongoing IDN implementation tasks of common concern to all registries implementing IDNA, such as the development of character variant and equivalence tables in consultation with local experts and affected registries, must proceed through the IDN Registry Implementation Committee and the various local and regional bodies that have taken them on. To assure the rapid introduction of IDNs in the major languages' character sets, the development of language-specific rulesets must proceed in parallel, in both formally-constituted and ad hoc groupings of experts and registries (both gTLDs and ccTLDs).
[3] More precisely, each of those registry agreements provides that the registry must reserve from initial registration "All labels with hyphens in the third and fourth character positions" (for example, xn--1k2n4h4b.org), except as expressly authorized by ICANN in writing. See, for example, the .org registry agreement, Appendix K: <http://www.icann.org/tlds/agreements/unsponsored/registry-agmt-appk-26apr01.htm>. Under the recently-finalized IDNA protocol, all internationalized domain names are converted into ASCII strings with hyphens in the third and fourth character positions.
[4] Consistent with the standard usage in the DNS technical documentation, the CJK Guidelines document correctly distinguishes between a "domain name" and a "label." A "label" in a domain name refers to a single segment or zone (i.e., the string of characters that comes between two sequential dots in a fully qualified domain name); a "domain name" is all of the segments or zones joined together. Thus, in the domain name "www.aso.icann.org," the "icann" portion constitutes one label. Because the CJK Guidelines document is intended to be applied on a zone-by-zone basis, one label at a time, it focuses on "internationalized domain labels" rather than "internationalized domain names", and thus speaks of "IDL packages," rather than "IDN packages."
This paper, intended for a less technically oriented audience, uses "internationalized domain name" or "IDN" to mean "a domain name that contains at least one internationalized domain label."
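The distinctions drawn in the notes above (label versus domain name, and the reserved hyphen positions) can be sketched as follows. The function names are hypothetical; the "xn--" prefix is the one shown in the example in the notes and is the prefix defined by IDNA.

```python
def labels_of(domain_name: str) -> list:
    """Split a domain name into its labels (the segments between dots)."""
    return domain_name.rstrip(".").split(".")


def has_reserved_hyphens(label: str) -> bool:
    """True if the label has hyphens in the third and fourth positions."""
    return label[2:4] == "--"


def is_idn(domain_name: str) -> bool:
    """True if at least one label carries the IDNA prefix "xn--"."""
    return any(lbl.lower().startswith("xn--") for lbl in labels_of(domain_name))


if __name__ == "__main__":
    print(labels_of("www.aso.icann.org"))
    for name in ["www.aso.icann.org", "xn--1k2n4h4b.org"]:
        flagged = [lbl for lbl in labels_of(name) if has_reserved_hyphens(lbl)]
        print(name, is_idn(name), flagged)
```

A registry bound by the reservation described in the notes would refuse any label for which has_reserved_hyphens returns true unless ICANN has authorized such registrations in writing.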