Internationalized Domain Names (IDN) Committee

Discussion Paper on Non-ASCII Top-Level Domain Policy Issues
16 April 2002


The purpose of this discussion paper is to outline the ICANN IDN Committee's current preliminary thinking with respect to strategies for the selection of non-ASCII top-level domains (TLDs).* As such, the paper deals with a speculative future that is possible, but by no means guaranteed: namely, a future in which the IDNA standard currently under consideration by the IETF becomes deployable at the TLD level. For that future to become reality, a number of contingencies would have to occur, including rigorous testing of non-ASCII TLD strings (meaning, more accurately, their ASCII LDH encodings, which would typically not be visible to users) in the root zone environment. As part of its charter, however, the ICANN IDN Committee is tasked with anticipating the policy issues that would arise if and when ICANN confronts demonstrably deployable non-ASCII TLDs.
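
The parenthetical about ASCII LDH encodings can be made concrete with a short sketch that converts a label to and from its ASCII-compatible form. This is an illustration under stated assumptions, not the committee's method: it relies on Python's standard `idna` codec, which implements IDNA2003 (RFC 3490, with Punycode from RFC 3492), a specification that postdates this paper. The helper names `to_ldh` and `to_unicode` are hypothetical.

```python
# Sketch only: Python's built-in "idna" codec implements IDNA2003
# (RFC 3490 / RFC 3492) and stands in here for the draft standard then
# under IETF consideration. Helper names are hypothetical.


def to_ldh(label: str) -> str:
    """Return the ASCII LDH form that would be stored in the DNS."""
    return label.encode("idna").decode("ascii")


def to_unicode(ldh_label: str) -> str:
    """Return the Unicode form an IDNA-aware application would display."""
    return ldh_label.encode("ascii").decode("idna")
```

For example, `to_ldh("日本")` returns `"xn--wgv71a"` and `to_unicode("xn--wgv71a")` returns `"日本"`; a label that is already all-LDH, such as `"museum"`, passes through unchanged. Only the `xn--` form would appear in the root zone, which is why users would typically never see it.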

With that caveat in mind, this paper outlines the broad policy analysis undertaken by the committee to date, including a preliminary framework for selection of registry operators for non-ASCII TLDs. The paper reviews some advantages and disadvantages of each of the proposed approval mechanisms. It concludes by outlining a process designed to garner ICANN community feedback and commentary on sequential drafts of this paper before the Committee submits its final report to the ICANN Board later this calendar year.

The committee welcomes feedback on the contents of this discussion paper. Because the concept and technical implementations of Internationalized Domain Names are 'works in progress', the committee emphasizes that the information and views contained in this discussion paper may become inaccurate or outdated. Comments should be sent to idn-comment@icann.org.

In simple terms, the Committee's current thinking focuses on extending to the IDN namespace the existing policies and concepts for the creation of ASCII generic TLDs (gTLDs) and ASCII country-code TLDs (ccTLDs), which have been developed and refined over time, while considering additions to and variations of those policies that account for factors unique to the use of non-ASCII characters within the DNS.


This paper suggests a tentative framework for classifying possible non-ASCII TLDs. The rationale for classification is a recognition that different types of TLDs (whether ASCII or non-ASCII) may require different selection mechanisms, depending upon the particular policy considerations that arise in connection with each category. For example, as elaborated below, current ICANN policy provides different selection mechanisms for ccTLDs and gTLDs.

A comprehensive selection and implementation process for non-ASCII TLDs would include a number of steps, including:

  • Finalization of IDNA standard;
  • Root zone implementation testing;
  • Selection of registry operators; and
  • Registry-level testing and deployment.

It is anticipated that the ICANN IDN Committee will, in the near future, publish a further paper considering potential selection processes and criteria for non-ASCII TLDs.

In thinking about the creation of non-ASCII TLDs, the committee began with the existing ICANN policy baseline for the creation of new TLDs, including the principles that (1) TLD expansion should occur in a careful and controlled fashion, with regard for the overall stability of the DNS; (2) the sudden introduction of a massive number of new TLDs would be a bad idea; (3) a new TLD can only be created if there is a willing and able registry operator to run it; (4) different categories of new TLDs may require different contractual, policy, and selection frameworks; and (5) the selection process should be transparent, allowing key stakeholders to participate in it.

The committee has generally agreed on two additional principles that relate specifically to non-ASCII TLDs, namely: (6) the core purpose of introducing non-ASCII TLDs would be to make the DNS easier to use for Internet users whose native languages include non-ASCII characters; and (7) a new TLD should only be introduced if user demand for it can be demonstrated.

For purposes of this discussion paper, the committee takes as its hypothesis the notion that potential new TLDs should logically be classified according to the evident semantic meaning of the TLD string itself. In other words, a given TLD string should be understood as having a certain intended meaning, should be classified accordingly, and should be matched to an appropriately tailored selection process.

The Committee notes that the potential for semantic ambiguity exists when non-ASCII TLDs are interpreted by Internet users who speak different languages. While every reasonable attempt will be made to minimize such ambiguities, it is impossible to eliminate this problem altogether.

The concept of semantic meaning is central to this analysis. For example, the .info TLD was treated as global and unrestricted, on the basis of the semantic link between the term "info" and the concept of "information." The .biz TLD is restricted to business registrants, on the basis of the semantic link between the term "biz" and the concept of "business." The .museum TLD is restricted to museum registrants, on the basis of the semantic link between the term "museum" and the community of "museums." In the realm of the country-code TLDs, there is a semantic link between the assigned two-letter codes and the name of the associated geographic unit. Thus, .no is semantically linked to Norway, .us to the United States, .za to South Africa, and .ch to Switzerland. Under longstanding IANA policy (see RFC-1591 and ICP-1), these two-letter codes are taken strictly from the ISO-3166-1 table (with a few historical exceptions), meaning that IANA/ICANN stays out of the business of determining what is and is not a country (or geographically distinct territory), and what name or abbreviation is semantically associated with any given geographic unit. Those problems are properly excluded from the ICANN process and are instead resolved by a politically expert, internationally recognized body: the International Organization for Standardization (ISO) and its ISO 3166 Maintenance Agency.

Following from these premises, the committee undertook to imagine the DNS namespace as a whole (including ASCII and non-ASCII TLDs), and to attempt a general (and probably incomplete) classification of potential TLD strings, oriented around semantic meaning.

The committee identified the following basic categories of potential TLD strings, based on the semantic meaning of the string itself:

1. Semantic association with Geographic Units
2. Semantic association with Languages
3. Semantic association with Cultural Groups or Ethnicities
4. Semantic association with Existing Sponsored TLDs
5. Semantic association with Existing Unsponsored TLDs
6. Everything else.

More detailed definitions of these categories, as currently understood by the committee, are set out below.

In the following sections, the committee identifies some specific questions for comment.  However, the committee does not intend to limit comment and input to these questions alone.  The committee invites general comment on any aspect of the issues raised in this paper.


1. Semantic association with Geographic Units

By "semantic association with recognized geographic units", we mean a TLD string that to a typical reader would be clearly linked to a recognized geographic unit, as is the case with the existing ASCII ccTLDs.  A non-ASCII TLD consisting of Japanese characters semantically associated with the recognized geographic unit of Japan would logically be <日本>.

Advantages:

  • The existing ISO-3166-1 table provides a clear definition of recognized geographic units.  Use of that table allows ICANN to avoid the problem of deciding what is and is not a country or geographically distinct territory.  It would be easy for ICANN to determine whether or not the table includes the geographic unit with which a given proposed TLD is associated.

  • The process for delegating ASCII ccTLDs is well-defined, and could simply be extended to the non-ASCII space.  Under the terms of RFC-1591 and ICP-1, an entity seeking to run a non-ASCII TLD semantically associated with a recognized geographic unit would have to submit a proposal demonstrating wide support among the Internet stakeholders within that defined area, and establishing the technical competence of the proposed registry operator.  In the interest of transparency and the testing of consensus, proposals for this category of non-ASCII TLD could be posted publicly, and views sought from major stakeholders (including, for example, ISP associations, the ASCII ccTLD registry operator, and the government).

  • In the area of geography, use of the ISO-3166-1 table provides certainty both by inclusion and by exclusion.  In other words, ICANN can maintain that only the geographic units recognized on that table are eligible for semantically associated non-ASCII TLDs; the names of cities, counties, provinces, and other geographic units would implicitly be excluded.  This principle would minimize the need for ICANN/IANA to make judgments about sovereign authority over a given geographic area.

  • The use of the principles of RFC-1591 and ICP-1 in the context of non-ASCII TLDs would also clarify that any TLD string semantically associated with a given geographic unit would not be eligible for delegation (regardless of the language or script) except under those principles.  In other words (by way of hypothetical example), the local Internet community in New Kryptonia is the relevant decision maker for any terms semantically associated (within reason) with its officially recognized name, whether the string in question is in English, Japanese, Arabic, or Kryptonian.

  • A limit to the total number of TLDs eligible for delegation to a given geographic unit might be set at a number equal to the number of its official languages.
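
The two checks proposed in this list, membership in the ISO-3166-1 table and a per-unit cap tied to the number of official languages, can be sketched as follows. This is a hypothetical illustration, not an ICANN/IANA procedure: the table excerpt holds only a few alpha-2 codes rather than the full ISO 3166-1 list, the function names are invented, and the cap is assumed to count only non-ASCII TLDs already delegated to the unit (the text leaves this open).

```python
# Hypothetical sketch; function names and the cap interpretation are assumptions.
# Tiny illustrative excerpt of ISO 3166-1 alpha-2 codes, not the full table.
ISO_3166_1_EXCERPT = {
    "CH": "Switzerland",
    "JP": "Japan",
    "NO": "Norway",
    "US": "United States",
    "ZA": "South Africa",
}


def is_recognized_unit(alpha2_code: str) -> bool:
    """Inclusion and exclusion: only units listed in the table qualify.

    Cities, provinces, and other units without an entry fail this check.
    """
    return alpha2_code.upper() in ISO_3166_1_EXCERPT


def may_delegate_another(alpha2_code: str,
                         non_ascii_tlds_delegated: int,
                         official_language_count: int) -> bool:
    """Apply the table check, then the suggested cap.

    Assumption: the cap counts only non-ASCII TLDs already delegated to the
    unit and equals the number of official languages supplied by the caller.
    """
    if not is_recognized_unit(alpha2_code):
        return False
    return non_ascii_tlds_delegated < official_language_count
```

For example, `may_delegate_another("XX", 0, 3)` returns `False` because "XX" is not in the excerpt; for a listed code, the function returns `False` once the delegated count reaches the official-language count passed in. The counts used here are placeholders, not data about any real country.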

Disadvantages:

  • The ISO-3166-1 table solves the problem of what is and is not a recognized geographic unit (country or geographically distinct territory).  However, the table only provides two- and three-letter ASCII codes for each such geographic unit.  The table does not solve the problem of what non-ASCII names (or abbreviations) should be assigned to each recognized geographic unit.  This is a very serious disadvantage that might be approached in three ways:

(1) Locate (or ask a properly legitimate body like the ISO to create) a table equivalent to ISO-3166-1, expanded to include names in all non-ASCII scripts.  This would be a massive undertaking with enormous political complications.

(2) Allow proposers to designate their desired non-ASCII TLD string, and rely upon a requirement of consensus within a local Internet community to determine whether the proposed TLD string is appropriate.  This could potentially place on ICANN a massive and unsustainable set of political burdens for which it is not well suited. 

(3) Ask ICANN's Governmental Advisory Committee to sort the issue out.  This option may be very unappealing to the GAC for similar reasons.

Questions for Comment:

1. Current ICANN/IANA policy permits the delegation of ASCII ccTLDs only when a given geographic unit and its associated codes appear on the ISO 3166-1 list.  The ISO-3166-1 list defines not only recognized countries and territories, but also the specific 2-letter ASCII codes associated with each.  No such authoritative list exists to define country/territory names in non-ASCII characters, leaving ICANN without an authoritative reference point.  How should this problem be handled?  Should it be left to each proposer to identify the desired TLD string and to justify it?  If so, what criteria should be applied to evaluate the proposed string's suitability for (and acceptability to) the given geographic unit?  If not, to what neutral and authoritative arbiter should this problem be referred?

2.  Current ICANN/IANA policy requires proposers for new ASCII ccTLDs to demonstrate support from the local Internet community.  How can this requirement be objectively evaluated in the case of a proposed non-ASCII TLD semantically associated with a recognized geographic unit?  What indicators of support should be expected or required?

3. What role, if any, should be played by the manager of the existing ASCII ccTLD for that geographic unit?

4. What role, if any, should be played by governments in the selection of non-ASCII TLDs associated with their country or territory?


2. Semantic association with Languages

By "semantic association with language"**, we mean a TLD string that to a typical reader would be clearly linked to the name of a language – for example, the Arabic word for "Arabic."

Advantages:

  • If the objective of IDNs is to enable users to easily type domain names in familiar, non-ASCII scripts (while preserving universal uniqueness and resolvability), it might be easiest to simply create a single TLD for each non-ASCII script, allowing the registry operator to make decisions about lower-level naming conventions.

  • A language-associated TLD string may assist in the development of global language-based Internet communities, particularly where the language speakers are widely distributed around the world, for example, the various Cambodian-speaking communities.

Disadvantages:

  • In a sense, a language-associated TLD string would be redundant: if a domain name consists of Chinese characters, a user might see no added value in requiring the term "Chinese" (in Chinese characters) as the final label.

  • There does not appear to be a recognized list of all human languages analogous to the ISO-3166-1 table.

  • Language communities cross sovereign national boundaries.  The problem of identifying and achieving consensus among the stakeholders of a given set of language communities may be extremely difficult.  ICANN/IANA might be left with competing claims backed by different stakeholders, or, worse, different national governments.  ICANN/IANA is not well-suited to resolve those kinds of disputes.

  • Languages are the products of thousands of years of history, generate tremendous emotional attachments among people, and have been sources of enormous political controversy.  For these and other reasons, the IETF has focused exclusively on characters and code points, not on languages.  

  • Attempting to create TLDs semantically linked to languages would raise a large number of extremely difficult political problems.  The Chinese-speaking community, for example, includes 1.2 billion people in mainland China, 22 million in Taiwan, and millions more in Singapore, Malaysia, and the United States.  To select a single registry operator in such a complicated and sensitive political environment might simply be an impossible task for a technical coordinating body like ICANN.

Questions for Comment:

1.  Is there any recognized, authoritative reference list of languages (analogous to ISO 3166-1) that could be employed as a reference against which to judge proposals for language-associated non-ASCII TLDs?

2.  The relevant community of interest for a given language is the set of all speakers of that language.  Is it likewise correct that the relevant community to be served by a given language-associated non-ASCII TLD would be the set of all speakers of the languages that utilize the characters that comprise the proposed language-associated non-ASCII TLD string?  If not, how should ICANN define the community to be served by the manager of a language-associated TLD?

3.  Is it correct that a non-ASCII TLD semantically associated with the name of a language is essentially redundant, given that the domain is by its very nature an expression of that language?

4.  Languages are often spoken across territorial boundaries and under different and sometimes hostile governments, making the usual requirement of community consensus extremely difficult to establish or document.  Given the near-impossibility of demonstrating, for example, a consensus among all the worldwide Internet communities consisting of Arabic speakers, should this category simply be excluded as a matter of policy pragmatism?  If not, how could "consensus" across many territorial boundaries be objectively demonstrated?

5.  What role, if any, should be played by governments in the selection of non-ASCII TLDs semantically associated with languages?

6.  What role, if any, should be played by recognized language authorities (for example, l'Académie française) in the selection of non-ASCII TLDs semantically associated with languages?


3. Semantic association with Cultural Groups or Ethnicities

By "semantic association with a cultural group or ethnicity," we mean a TLD string that to a typical reader would be clearly linked to a cultural group or ethnicity that is not defined or limited by recognized national boundaries – for example, the Kurdish or Swahili peoples.

Disadvantages:

  • All of the problems with language-associated TLDs would apply to this set of TLD strings as well. There appears to be no internationally recognized and legitimate list of cultural and ethnic groups. These groups cross national borders. Any attempt to identify and verify consensus among stakeholders in these communities would be extremely difficult.

  • As with languages, the names of cultures and ethnicities are the subjects of great emotion and, often, political controversy. For ICANN even to consider creating TLDs semantically associated with such groups would be to invite a storm of controversy.

Questions for Comment:

1. Is there any recognized, authoritative reference list of cultures and ethnicities (analogous to ISO 3166-1) that could be employed as a reference against which to evaluate proposals for non-ASCII TLD strings semantically linked to them?

2. How should we define the community to be served by the manager of a non-ASCII TLD semantically linked to a culture or ethnicity?

3. Given the vast range of complicated political controversies and the difficulty of demonstrating, for example, a consensus among all members of a given culture or ethnicity, should this category simply be excluded as a matter of policy pragmatism? If not, how could ICANN avoid being caught in the middle of innumerable emotionally charged and politically sensitive fights?

4. What role, if any, should be played by governments in the selection of non-ASCII TLDs semantically linked to a culture or ethnicity?


4. Semantic association with Existing Sponsored TLDs

By "semantic association with an existing sponsored gTLD," we mean a non-ASCII TLD string that to a typical reader would be clearly linked to an existing ASCII sponsored TLD.  Currently, that list includes .aero, .coop, and .museum.  For purposes of this paper, the category arguably also includes .edu and .int.  An example of a non-ASCII TLD semantically associated with .museum would be a TLD string consisting of the Hangul (Korean) characters meaning "museum" in Korean.

The issue with this group is simple:  Should existing registry sponsors have a right to TLD strings in non-ASCII characters with the same semantic meaning as their ASCII TLD string?  Should they be given a preference, or be treated the same as any other proposer?

Advantages of incumbent preference:

  • Giving a preference for equivalent non-ASCII strings to existing ASCII sponsored registries would be simple, and somewhat logical.  Once ICANN has concluded that a given sponsor is an appropriate proxy and policymaker on behalf of the community to be served by the TLD, it could logically be considered to have equivalent legitimacy across non-ASCII TLD strings as well.

Disadvantages of incumbent preference:

  • The selection of TLD registries is complicated.  A representative sponsor for an ASCII TLD may not be best for all communities, particularly where the scope of a given script's use is highly localized.  For that reason, it may be best to place no hard rights or prohibitions on the allocation of TLD strings with semantic association to existing sponsored TLDs, treating them as any other new TLD, open to any proposer, but also allowing the existing ASCII TLD registry sponsors to present proposals for equivalent non-ASCII TLDs, with relevant justification for their role.

  • Each TLD string should be considered on its own merits and should be open to proposals from any potential registry operator that can establish the basic requirements for a sponsored TLD, including support within the community to be served.

Questions for Comment:

What process and criteria should be employed to select non-ASCII TLDs semantically associated with an existing sponsored gTLD?  Are any special considerations required?


5. Semantic association with Existing Unsponsored TLDs

By "semantic association with existing unsponsored gTLDs", we mean a non-ASCII TLD string that to a typical reader would be clearly linked to an existing unsponsored ASCII gTLD, such as .com, .net, .org, .info, .biz, or .name.

Here again, the issue is whether to give any advantage to the current registry operators of the existing ASCII unsponsored gTLDs.

Advantages of incumbent preference:

  • There appear to be no significant advantages, other than for the existing registry operators themselves.

Disadvantages of incumbent preference:

  • Given the generic nature of the terms at issue, and the wide-ranging complexities of meaning across languages, it would be extremely difficult to determine which non-ASCII words and abbreviations should qualify for the preference; indeed, it is not clear why the preference would not also logically extend to ASCII TLD strings, such as "company" for the .com registry operator.

  • The principles of registry-level competition and geographic distribution of registries both argue strongly against giving any preference to existing ASCII gTLD registry operators.  Such a preference would promote market concentration, rather than competition.

Questions for Comment:

What process and criteria should be employed to select non-ASCII TLDs semantically associated with an existing unsponsored gTLD?  Are any special considerations required?


6. Everything else

In this category, we mean to include every non-ASCII word, abbreviation, or other string that is not semantically associated with one of the five categories above.


For this category (and perhaps in categories 4 and 5, above), the committee concludes that no distinctions should be drawn between ASCII and non-ASCII new TLDs – the process that is used for new ASCII TLDs (as refined over time and with experience) should equally apply to new non-ASCII TLDs (once the technical standard is completed and deployable at the TLD level).

The key elements of that process are:  open call for proposals, defined criteria for selection, independent review by technical and financial experts, and full transparency of all proposals.  The committee sees no reason why these elements could not apply equally to non-ASCII TLD proposals, with some added criteria for selection, perhaps focusing on the proposed registry's plans to meet the needs of (and make policy for) the language communities to be served by a given TLD string in a given script.

Questions for Comment:

What process and criteria should be employed to select non-ASCII TLDs that do not fall into one of the defined categories above – in other words, non-ASCII gTLDs?


The ICANN IDN Committee tentatively proposes to undertake the process outlined below to ensure that key ICANN community stakeholders are given ample opportunity to comment on this paper during the course of the Committee's deliberations over the ensuing months.

Interested stakeholders should make particular note of the two public comment periods detailed below where the Committee seeks feedback from stakeholders regarding the first and second drafts of this paper. Comments for the Committee's consideration should be sent to idn-comment@icann.org.

17 April 2002 - Publish Issue Paper for non-ASCII TLDs.

17-30 April - First Public Comment Period.

10 May - Publish Second Draft of Issue Paper for non-ASCII TLDs.

10-30 May - Second Public Comment Period.

17 June - Publish IDN Committee Report to the Board.

24-29 June - Discussion of IDN Committee Report by ICANN Community & Board (Bucharest).



* By "non-ASCII TLDs" we mean top-level domains that include in the TLD string itself characters other than the currently allowed ASCII "LDH code points" repertoire (meaning the code points associated with ASCII letters, digits, and the hyphen-minus; that is, U+002D, U+0030..U+0039, U+0041..U+005A, and U+0061..U+007A). "LDH" is a commonly used abbreviation for "letters, digits, hyphen."  In this paper, we use the term "non-ASCII" as shorthand for "characters other than ASCII LDH."

** The terms "language," "script," and related concepts are commonly mis-defined and therefore misunderstood in discussions regarding internationalization of computers and the Internet.  Readers are referred to Paul Hoffman's Internet Draft entitled "Terminology Used in Internationalization in the IETF" for a comprehensive list of definitions.  Also, see the IANA language registry, which is defined by RFC 3066.
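
The LDH repertoire defined in the first footnote reduces to a simple membership test over code points. The minimal sketch below uses hypothetical helper names; it checks repertoire only, not label length or hyphen placement, and treats "non-ASCII" in the footnote's shorthand sense (any character outside ASCII LDH).

```python
# Minimal sketch of the footnote's LDH repertoire; helper names are hypothetical.
LDH_CODE_POINTS = frozenset(
    {0x2D}                      # U+002D HYPHEN-MINUS
    | set(range(0x30, 0x3A))    # U+0030..U+0039 digits 0-9
    | set(range(0x41, 0x5B))    # U+0041..U+005A letters A-Z
    | set(range(0x61, 0x7B))    # U+0061..U+007A letters a-z
)


def is_ldh_only(label: str) -> bool:
    """True if the label is non-empty and every character is in the repertoire."""
    return bool(label) and all(ord(ch) in LDH_CODE_POINTS for ch in label)


def is_non_ascii(label: str) -> bool:
    """The paper's shorthand: the label contains a character outside ASCII LDH."""
    return any(ord(ch) not in LDH_CODE_POINTS for ch in label)
```

Under this shorthand, `is_non_ascii("日本")` is `True`, while `is_ldh_only("museum")` is `True`. A label such as `"a_b"` also counts as "non-ASCII" here even though every character is ASCII, because the underscore lies outside the LDH repertoire.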


Page Updated 17-Apr-2002
©2002  The Internet Corporation for Assigned Names and Numbers. All rights reserved.