Internationalized Domain Names (IDN) Committee

Discussion Paper on Non-ASCII Top-Level Domain Policy Issues
(Revised Draft)
13 June 2002


The purpose of this discussion paper is to outline the ICANN IDN Committee's current preliminary thinking with respect to strategies for the selection of non-ASCII top-level domains (TLDs).[1]  As such, the paper deals with a speculative future that is possible, but by no means guaranteed:  namely, a future in which the IDNA standard currently under consideration by the IETF becomes deployable at the TLD level.  For that future to become reality, a number of contingencies would have to occur, including rigorous testing of non-ASCII TLD strings (meaning, more accurately, their ASCII LDH encodings) in the root zone environment.  As part of its charter, however, the ICANN IDN Committee is tasked with anticipating the policy issues that would arise if and when ICANN confronts demonstrably deployable non-ASCII TLDs.[2] 

With that caveat in mind, this paper outlines the broad policy analysis undertaken by the committee to date, including a preliminary framework for selection of registry operators for non-ASCII TLDs.  The paper reviews some advantages and disadvantages of each of the proposed approval mechanisms. It concludes by outlining a process designed to garner ICANN community feedback and commentary on sequential drafts of this paper before the committee submits its final report to the ICANN Board later this calendar year.

The committee welcomes feedback on the contents of this discussion paper.  Because the concept and technical implementations of Internationalized Domain Names are "works in progress", the committee emphasizes that the information and views contained in this discussion paper may become inaccurate or outdated.  Comments should be sent to <idn-comment@icann.org>.

In simple terms, the committee's current thinking focuses on extending to the IDN namespace the existing policies and concepts for the creation of ASCII generic TLDs (gTLDs) and ASCII country-code TLDs (ccTLDs), which have been developed and refined over time, while giving due consideration to additions and variations in policy that take into account the unique factors related to the use of non-ASCII characters within the DNS.


This paper suggests a tentative framework for classifying possible non-ASCII TLDs.  The rationale for classification is a recognition that different types of TLDs (whether ASCII or non-ASCII) may require different selection mechanisms, depending upon the peculiar policy considerations that arise in connection with each category.  For example, as elaborated below, current ICANN policy provides different selection mechanisms for ccTLDs and gTLDs.

A comprehensive selection and implementation process for non-ASCII TLDs would include a number of steps, including:

  • Finalization of IDNA standard;
  • Decision to proceed to adopt non-ASCII TLDs;
  • Root zone implementation testing;
  • Selection of registry operators; and
  • Registry-level testing and deployment.

It is anticipated that the ICANN IDN Committee will publish, in the near future, a further paper that considers potential non-ASCII TLD selection processes and criteria.

In thinking about the creation of non-ASCII TLDs, the committee began with the existing ICANN policy baseline for the creation of new TLDs, including the principles that  (1) TLD expansion should occur in a careful and controlled fashion, with regard for the overall stability of the DNS;  (2) the sudden introduction of a massive number of new TLDs would be a bad idea;  (3) a new TLD can only be created if there is a willing and able registry operator to run it;  (4) different categories of new TLDs may require different contractual, policy, and selection frameworks;  and (5) the selection process should be transparent, allowing key stakeholders to participate in it.

The committee has generally agreed on two additional principles that relate specifically to non-ASCII TLDs, namely:  (6) the core purpose of introducing non-ASCII TLDs would be to make the DNS service easier to use for Internet users whose native languages include non-ASCII characters; and (7) a new TLD should be introduced only if some sort of user demand can be demonstrated to exist.

For purposes of this discussion paper, the committee takes as its hypothesis the notion that potential new TLDs should logically be classified according to the evident semantic meaning of the TLD string itself.  In other words, a given TLD string should be understood as having a certain intended meaning, should be classified accordingly, and should be matched to an appropriately tailored selection process.

The committee notes that the potential for semantic ambiguity exists when different non-ASCII TLDs are interpreted by Internet users who speak different languages. While every reasonable attempt will be made to minimize such ambiguities, it is impossible to eliminate this problem altogether. The committee further notes that at least the following policy issues are highly likely to require consideration in the future:

  • Nothing within any future non-ASCII TLD space constrains names or labels to be in any language at all.  For example, nothing in the protocols would prevent a domain label from being created (at the top level or otherwise) that consists of a Chinese character, followed by a roman-derived character, followed by a Thai character, followed by an Arabic character, followed by a Cyrillic character, etc.
  • Even if IDNs compatible with a given language at the second level of the TLD are permitted, enforcement of a restriction at the third or fourth level or beyond is likely to prove challenging.
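
The first point above can be illustrated with a minimal sketch. Python's standard unicodedata module does not expose the Unicode Script property, so the hypothetical helper rough_scripts below approximates a character's script by the first word of its Unicode name. This is a heuristic chosen for illustration only; it is not part of the IDNA proposal or of any ICANN procedure.

```python
import unicodedata

def rough_scripts(label):
    """Return the set of approximate script names used in a label.

    Assumption: the first word of a character's Unicode name (for example
    LATIN, THAI, ARABIC, CYRILLIC, CJK) stands in for its script.
    """
    scripts = set()
    for ch in label:
        name = unicodedata.name(ch, "UNKNOWN")
        scripts.add(name.split()[0])
    return scripts

# One label mixing the five kinds of character named in the text:
# Chinese, roman-derived, Thai, Arabic, Cyrillic.
mixed = "\u4e2d" + "a" + "\u0e01" + "\u0628" + "\u0434"
print(sorted(rough_scripts(mixed)))
```

A registry wishing to restrict labels to one script would need a check of roughly this kind, applied at every level of the name it controls; nothing in the protocol performs it automatically.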

These issues will be dealt with in greater detail in the committee's final report to the ICANN Board.

The concept of semantic meaning is central to this analysis.  For example, the .info TLD was treated as global and unrestricted, on the basis of the semantic link between the term "info" and the concept of "information."  The .biz TLD is restricted to business registrants, on the basis of the semantic link between the term "biz" and the concept of "business."  The .museum TLD is restricted to museum registrants, on the basis of the semantic link between the term "museum" and the community of "museums."  In the realm of the country-code TLDs, there is a semantic link between the assigned two-letter codes and the name of the associated geographic unit.  Thus, .no is semantically linked to Norway, .us to the United States, .za to South Africa, and .ch to Switzerland.  Under longstanding IANA policy (see RFC-1591 and ICP-1), these two-letter codes are taken strictly from the ISO-3166-1 table (with a few historical exceptions), meaning that IANA/ICANN stays out of the business of determining what is and is not a country (or geographically distinct territory), and what name or abbreviation is semantically associated with any given geographic unit.  Those problems are properly excluded from the ICANN process, and resolved by a politically expert, internationally-recognized body, the International Organization for Standardization and its ISO 3166 Maintenance Agency.

Following from these premises, the committee undertook to imagine the DNS namespace as a whole (including ASCII and non-ASCII TLDs), and to attempt a general (and probably incomplete) classification of potential TLD strings, oriented around semantic meaning.

The following basic categories were identified:

  1. Semantic association with Geographic Units.
  2. Semantic association with Languages.
  3. Semantic association with Cultural Groups or Ethnicities.
  4. Semantic association with Existing Sponsored TLDs.
  5. Semantic association with Existing Unsponsored TLDs.
  6. Everything else.

More detailed definitions of these categories, as currently understood by the committee, are set out below.



By "semantic association with recognized geographic units", we mean a TLD string that to a typical reader would be clearly linked to a recognized geographic unit, as is the case with the existing ASCII ccTLDs.  A non-ASCII TLD consisting of Japanese characters semantically associated with the recognized geographic unit of Japan would logically be  <日本>.
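
As an illustration of what the ASCII LDH encoding of such a string looks like, the sketch below uses the idna codec in Python's standard library. That codec implements IDNA as it was later published (RFC 3490); treating it as representative is an assumption, since the standard was still under IETF consideration when this paper was written. The function names to_ascii_label and to_unicode_label are hypothetical.

```python
def to_ascii_label(label):
    """Encode one label to its ASCII form with Python's built-in IDNA codec."""
    return label.encode("idna").decode("ascii")

def to_unicode_label(ascii_label):
    """Decode an ASCII-encoded label back to its Unicode form."""
    return ascii_label.encode("ascii").decode("idna")

japan = "\u65e5\u672c"  # the two characters of the example string above
encoded = to_ascii_label(japan)

print(encoded)                             # the ASCII form a root zone would carry
print(to_unicode_label(encoded) == japan)  # the encoding is reversible
```

It is the encoded ASCII string, not the Japanese characters themselves, that would have to be tested in the root zone environment.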


  • The existing ISO-3166-1 table provides a clear definition of recognized geographic units.  Use of that table allows ICANN to avoid the problem of deciding what is and is not a country or geographically distinct territory.  It would be easy for ICANN to determine whether or not the table includes the geographic unit with which a given proposed TLD is associated.

  • The process for delegating ASCII ccTLDs is well-defined, and could simply be extended to the non-ASCII space.  Under the terms of RFC-1591 and ICP-1, an entity seeking to run a non-ASCII TLD semantically associated with a recognized geographic unit would have to submit a proposal demonstrating wide support among the Internet stakeholders within that defined area, and establishing the technical competence of the proposed registry operator.  In the interest of transparency and the testing of consensus, proposals for this category of non-ASCII TLD could be posted publicly, and views sought from major stakeholders (including, for example, ISP associations, the ASCII ccTLD registry operator, and the government).

  • In the area of geography, the use of the ISO-3166-1 provides certainty both by inclusion and exclusion.  In other words, ICANN can maintain that only the geographic units recognized on that table are available for non-ASCII TLDs with semantic association; the names of cities, counties, provinces, and other geographic units would implicitly be excluded.  This principle would minimize ICANN/IANA having to make judgments about the sovereign authority over a given geographic area.

  • The use of the principles of RFC-1591 and ICP-1 in the context of non-ASCII TLDs would also clarify that any TLD string semantically associated with a given geographic unit would not be eligible for delegation (regardless of the language or script) except under those principles.  In other words (by way of hypothetical example), the local Internet community in New Kryptonia is the relevant decision maker for any terms semantically associated (within reason) with its officially recognized name, whether the string in question is in English, Japanese, Arabic, or Kryptonian.

  • A limit to the total number of TLDs eligible for delegation to a given geographic unit might be set at a number equal to the number of its official languages.


  • The ISO-3166-1 table solves the problem of what is and is not a recognized geographic unit (country or geographically distinct territory).  However, the table only provides two- and three-letter ASCII codes for each such geographic unit.[3]  The table does not solve the problem of what non-ASCII names (or abbreviations) should be assigned to each recognized geographic unit.  This is a very serious disadvantage that might be approached in three ways:  

(1)  Locate (or ask a properly legitimate body like the ISO to create) a table equivalent to ISO-3166-1, expanded to include names in all non-ASCII scripts.  This would be a massive undertaking with enormous political complications.

(2) Allow proposers to designate their desired non-ASCII TLD string, and rely upon a requirement of consensus within a local Internet community to determine whether the proposed TLD string is appropriate.  This could potentially place on ICANN a massive and unsustainable set of political burdens for which it is not well suited.

(3) Ask ICANN's Governmental Advisory Committee to sort the issue out.  This option may be very unappealing to the GAC for similar reasons.


  • The committee believes that ICANN should not be placed in the role of determining what is and is not an appropriate TLD string for a given geographic unit.  Accordingly, TLDs that correspond to geographic units should only be created on the basis of an authoritative list defined and maintained by a legitimate international organization, such as the ISO.  See footnote [3] for some options in this area.
  • The committee believes that no automatic preference should be given to current managers of ASCII ccTLD registries in the context of delegating non-ASCII TLDs.  The applicable requirements should continue to be the support of the local Internet community, technical competence, and acceptance of the responsibilities of service to the local and global Internet communities.  In some cases, the manager of an ASCII ccTLD registry may be able to earn the support of the local Internet community to operate a non-ASCII TLD for that same geographic unit; in other cases, it will not.  The requirements should be applied evenhandedly to all, without giving any automatic preference to current ASCII ccTLD managers.  Among commentators, those who have taken the opposite view have almost uniformly been managers of current ccTLD registries.
  • Some commentators suggested that the national government alone should be the determiner of a local Internet community's wishes with regard to any non-ASCII TLDs corresponding to that geographic unit.  This would be an easy solution from an administrative standpoint; at a minimum, in the absence of an authoritative reference list, the government's views could be considered decisive as to the definition of the non-ASCII TLD string itself.  The committee nevertheless believes that ICANN should rely on an appropriate, externally-defined authoritative list for geographic unit TLD strings, meaning that government views should be addressed to the organization that maintains the authoritative list (for example, the ISO or a similar international entity), rather than asking ICANN to address them. 
  • Several commentators urged that ICANN defer to "recognized regional bodies" or "recognized language organizations" to handle tasks such as the determination of TLD strings and the delegation of registry managers, on the theory that each language comprises a distinct namespace.  There are a number of problems with this point of view.  First, many languages share characters and code points, making clean divisions of "namespaces" by language impossible.  Second, though the committee supports the principles of localization and decentralization in decision-making, the committee believes that these decisions should primarily rest with local Internet communities.  It is not clear why regional or language-specific bodies would be free from the complex political and other pressures that could be brought to bear on ICANN.  The committee favors a simple approach that minimizes the discretionary role of ICANN.  As to the determination of TLD strings, the committee strongly believes that ICANN should maintain its policy of deferring to ISO or a similarly-qualified international body.  As to the delegation of registry managers, that is a more complex issue that should primarily be addressed within the local Internet community.  Whether there is a role for regional or language-specific organizations should be assessed on the basis of the wishes of the local Internet community.
  • In view of these many difficult and complex issues, some commentators have argued against the creation of any non-ASCII TLDs semantically associated with geography, language, or culture, instead favoring non-ASCII TLD strings consisting of generic terms.



By "semantic association with language" [4], we mean a TLD string that to a typical reader would be clearly linked to the name of a language -- for example, the Arabic word for "Arabic."


  • If the objective of IDNs is to enable users to easily type domain names in familiar, non-ASCII scripts (while preserving universal uniqueness and resolvability), it might be easiest to simply create a single TLD for each non-ASCII script, allowing the registry operator to make decisions about lower-level naming conventions.

  • A language-associated TLD string may assist in the development of global language-based Internet communities, particularly where the language speakers are widely distributed around the world, for example, the various Cambodian-speaking communities.
  • An authoritative international list of the world's languages exists in the international standard ISO 639, which consists of two parts.  ISO 639 specifies a two-letter code identifying languages. (This standard is presently under revision and will be published later this summer as ISO 639-1).  ISO 639-2 specifies a three-letter ASCII code identifying languages.  Both standards are updated by Registration Authorities -- an ISO office similar to the Maintenance Agencies.  For example, Infoterm is the Registration Authority for ISO 639-1, and the Library of Congress, Washington D.C., USA, acts as the Registration Authority for ISO 639-2. However, one commentator has noted that the ISO 639 list of languages is "not even close to exhaustive." Other commentators pointed to the Ethnologue list, which conflicts in some respects with ISO 639.  The committee has been informed that ISO's TC 37/SC 2 has initiated a process that is intended to lead to extension of the ISO 639 standard; one of the apparent objectives is to generate a comprehensive list of language identifiers. Finally, it is noteworthy that RFC 3066 specifies Internet best current practice for tags for the identification of languages.


  • In a sense, a language-associated TLD string would be redundant:  if a domain name consists of Chinese characters, a user might find no added value to require the term "Chinese" (in Chinese characters) as the final label.

  • There does not appear to be a recognized list of all human languages analogous to the ISO-3166-1 table.  While the ISO 639 list is comprehensive, authoritative, and broadly accepted, it is not universally complete; moreover, as with ISO 3166-1, the three-letter codes associated with each language are given in ASCII characters.

  • Language communities cross sovereign national boundaries.  The problem of identifying and achieving consensus among the stakeholders of a given set of language communities may be extremely difficult.  ICANN/IANA might be left with competing claims backed by different stakeholders, or, worse, different national governments.  ICANN/IANA is not well-suited to resolve those kinds of disputes.

  • Languages are the products of thousands of years of history, generate tremendous emotional attachments among people, and have been sources of enormous political controversy.  For these and other reasons, the IETF has focused, as much as possible, exclusively on characters and code points, not on languages (which cross boundaries, share scripts and characters, and can differ from place to place). 

  • Attempting to create TLDs semantically linked to languages would raise a large number of extremely difficult political problems.  The Chinese-speaking community, for example, includes 1.2 billion people in mainland China, 22 million in Taiwan, and millions more in Singapore, Malaysia, and the United States.  To select a single registry operator in such a complicated and sensitive political environment might simply be an impossible task for a technical coordinating body like ICANN.
  • The linkage of language-based non-ASCII TLD rights to a country’s list of official languages may well result in highly charged political debates between different national stakeholders, or, worse, different national governments, revolving around the relative approval priority associated with competing non-ASCII TLD bids.
  • The linkage of language-based non-ASCII TLD rights to a country’s list of its official languages may also result in a number of non-ASCII TLDs being commercialized in a way that results in the needs and requirements of the originally intended users of those official languages being ignored or forgotten in the pursuit of profit.
  • Due to all the foregoing points, defining the community of interest for a language-associated TLD is extremely complex.  Likewise, ICANN does not seem to be an appropriate forum for fostering language-community consensus on these issues; among possible alternatives would be the ISO, the ITU, and UNESCO.



By "semantic association with a cultural group or ethnicity," we mean a TLD string that to a typical reader would be clearly linked to a cultural group or ethnicity that is not defined or limited by recognized national boundaries – for example, the Kurdish or Swahili peoples.


  • All of the problems with language-associated TLDs would apply to this set of TLD strings, as well.  There appears to be no internationally-recognized and legitimate list of cultural and ethnic groups.  These groups cross national borders.  The attempt to identify and verify consensus among stakeholders in these communities would be extremely difficult.

  • As with languages, the names of cultures and ethnicities are the subjects of great emotion and, often, political controversy.  For ICANN even to consider creating TLDs semantically associated with such groups would be to invite a storm of controversy.



By "semantic association with an existing sponsored gTLD," we mean a non-ASCII TLD string that to a typical reader would be clearly linked to an existing ASCII sponsored TLD.  Currently, that list includes .aero, .coop, and .museum.  For purposes of this paper, the category arguably includes .edu and .int also.  An example of a non-ASCII TLD semantically associated with .museum would be the TLD string consisting of the Hangul (Korean) characters meaning "museum" in Korean.

The issue with this group is twofold.  First, should existing registry sponsors have a right to TLD strings in non-ASCII characters with the same semantic meaning as their ASCII TLD string?  In other words, should they be given a preference, or be treated the same as any other proposer?  Second, if one language variant is approved for a given sponsored TLD, does this automatically imply a right to all other language variants?

Advantages of incumbent preference:

  • Giving a preference for equivalent non-ASCII strings to existing ASCII sponsored registries would be simple, and somewhat logical.  Once ICANN has concluded that a given sponsor is an appropriate proxy and policymaker on behalf of the community to be served by the TLD, it could logically be considered to have equivalent legitimacy across non-ASCII TLD strings as well.  This might also lead to less confusion among users, in that registration rules and registry policies would be consistent.

Disadvantages of incumbent preference:

  • The selection of TLD registries is complicated.  A representative sponsor for an ASCII TLD may not be best for all communities, particularly where the scope of a given script's use is highly localized.  For that reason, it may be best to place no hard rights or prohibitions on the allocation of TLD strings with semantic association to existing sponsored TLDs, treating them as any other new TLD, open to any proposer, but also allowing the existing ASCII TLD registry sponsors to present proposals for equivalent non-ASCII TLDs, with relevant justification for their role.
  • Each TLD string should be treated differently, and should be open for proposals to any potential registry operator that can establish the basic requirements for a sponsored TLD, including support within the community to be served.  At the same time, it may be possible for a single organization to demonstrate its legitimacy and capacity to serve as a global coordinating registry for a given term (e.g., "museum") as to all non-ASCII strings consisting of characters that have the equivalent semantic meaning in some language.



By "semantic association with existing unsponsored gTLDs", we mean a non-ASCII TLD string that to a typical reader would be clearly linked to an existing unsponsored ASCII gTLD, such as .com, .net, .org, .info, .biz, or .name.

Here again, the issue is whether to give any advantage to the current registry operators of the existing ASCII unsponsored gTLDs.

Advantages of incumbent preference:

  • There appear to be few notable advantages, other than for the existing registry operators themselves. 
  • It should be noted that several current operators of gTLD registries have pointed to a range of arguable advantages, from their perspective, including:  ease of transition, single-point-of-registration services for registrants, consistency and stability of applicable registration rules and policies, and minimization of consumer confusion between semantically-equivalent ASCII and non-ASCII TLDs. 

Disadvantages of incumbent preference:

  • Given the generic nature of the terms at issue, and the wide-ranging complexities of meaning across languages, it would be extremely difficult to determine which non-ASCII words and abbreviations should qualify for the preference; indeed, it is not clear why the preference would not also logically extend to ASCII TLD strings, such as "company" for the .com registry operator.

  • The principles of registry-level competition and geographic distribution of registries both argue strongly against giving any preference to existing ASCII gTLD registry operators.  Such a preference would promote market concentration, rather than competition.



In this category, we mean to include every non-ASCII word, abbreviation, or other string that is not semantically associated with one of the five categories above.


For this category (and perhaps in categories 4 and 5, above), the committee concludes that no distinctions should be drawn between ASCII and non-ASCII new TLDs – the process that is used for new ASCII TLDs (as refined over time and with experience) should equally apply to new non-ASCII TLDs (once the technical standard is completed and deployable at the TLD level).  

The key elements of that process are:  open call for proposals, defined criteria for selection, independent review by technical and financial experts, and full transparency of all proposals.  The committee sees no reason why these elements could not apply equally to non-ASCII TLD proposals, with some added criteria for selection, perhaps focusing on the proposed registry's plans to meet the needs of (and make policy for) the language communities to be served by a given TLD string in a given script.


The IDN Committee received a number of thoughtful and helpful comments on this draft.  In particular, the committee thanks Asaad Alnajjar of Millennium Inc. and the Arabic Internet Names Consortium (AINC);  Marilyn Cade of AT&T;  Prof. Kilnam Chon;  Roger Cochetti of VeriSign;  Peter Constable of the Non-Roman Script Initiative, SIL International;  Håvard Hjulstad, chairman of ISO/TC37 and convener of ISO/TC37/SC2/WG1;  Hiro Hotta of JPRS;  Cary Karp of the Museum Domain Management Association;  S. Maniam of the International Forum for IT in Tamil (INFITT);  Jeffrey J. Neuman of  NeuStar, Inc.;  Stefan Probst;  James Seng;  Konstantin Vinogradov of the International Centre for Scientific and Technical Information (ICSTI); Eric Brunner-Williams of Wampumpeag LLC;  Cord Wischhöfer of the ISO 3166 Maintenance Agency; Yoshiro Yoneda;  and Danny Younger.


The committee welcomes feedback from stakeholders regarding this second draft paper. Comments for the committee's consideration should be sent to idn-comment@icann.org.

13 June – Publish Second Draft of non-ASCII TLD Paper

13 June to 20 June - Second Public Comments Period.

22 June - IDN Committee Report to the Board

28/29 June – Consideration of IDN Committee Report by ICANN Board (Bucharest)


[1]  By "non-ASCII TLDs" we mean top-level domains that include in the TLD string itself characters other than the currently allowed ASCII "LDH code points" repertoire (meaning the code points associated with ASCII letters, digits, and the hyphen-minus; that is, U+002D, 30..39, 41..5A, and 61..7A). "LDH" is a commonly-used abbreviation for "letters, digits, hyphen."  In this paper, we use the term "non-ASCII" as a shorthand for "characters other than ASCII LDH."
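
The definition in footnote [1] can be expressed as a minimal sketch, using only the code point ranges listed there. The names LDH_RANGES, is_ldh_code_point, and is_non_ascii_tld are hypothetical and chosen for illustration.

```python
# Code point ranges from footnote [1]: U+002D, 30..39, 41..5A, 61..7A.
LDH_RANGES = [(0x2D, 0x2D), (0x30, 0x39), (0x41, 0x5A), (0x61, 0x7A)]

def is_ldh_code_point(ch):
    """True if the character is an ASCII letter, digit, or hyphen-minus."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in LDH_RANGES)

def is_non_ascii_tld(tld):
    """True if the string contains any character outside the LDH repertoire.

    This follows the paper's shorthand: "non-ASCII" means "other than
    ASCII LDH", so an ASCII underscore also counts as outside the set.
    """
    return any(not is_ldh_code_point(ch) for ch in tld)

print(is_non_ascii_tld("museum"))        # LDH only
print(is_non_ascii_tld("\u65e5\u672c"))  # contains non-LDH characters
```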

[2] The Committee wishes to make it abundantly clear to all readers that in no sense is there an implied guarantee in this paper that non-ASCII TLDs will become a reality. Much technical and policy work remains to be undertaken before it is possible to contemplate such a reality with any certainty, let alone have any sense of the actual outcome, one way or the other.

[3] The statement that ISO-3166-1 exists only in French and English is correct, but some significant qualifications must be added.

First qualification: One of the two United Nations sources of the country and territory names listed in ISO 3166-1 is the UN Terminology Bulletin entitled "Country Names," which lists about 180 country names in each of the six official languages of the UN:  Arabic, Chinese, English, French, Russian, and Spanish.  UN Terminology Bulletin Country Names, United Nations New York, 1997. (Ref.: Sales No. A/C/E/F/R/S.97.I.19 (ST/CS/SER.F/347/Rev.1)).  As noted by Cord Wischhöfer of the ISO 3166 Maintenance Agency, "this document could well serve as a reference basis for authoritative input into a possible future list of non-ASCII country/territory names for Russian, Arab and Chinese name forms.  For those entities not listed in the UN Terminology Bulletin -- 99 percent of those not listed are dependent territories -- another solution would have to be found."

Second qualification:  Each national standards organization that is a member of ISO may adopt ISO standards nationally. By this process – known as "national adoption" – the ISO standard becomes a national standard recognized in that country. In many cases, this adoption entails the translation of the English version into the national language.  As a result, ISO 3166-1 has been translated into a number of the larger languages utilizing non-ASCII scripts.  For example, ISO 3166-1 is national standard JIS X 0304 in Japan, and standard KS X 1510-1 in the Republic of Korea.

These qualifications lend support to the proposition that ISO might, in fact, be able to undertake the creation and maintenance of a non-ASCII extension to ISO 3166-1.  More significantly, they suggest that ICANN could rely on national adoptions of ISO 3166-1, where they exist.  For any given geographic unit recognized on ISO 3166-1, ICANN could refer to the relevant national adoption(s) to ascertain the appropriate string (which, by virtue of its presence on the version adopted locally by the national ISO member standards organization, should be acceptable to the local Internet community).  Where conflicts arise between national adoptions, ICANN could simply defer action until the national standards bodies and their respective governments act to resolve the issue.

[4] The terms "language," "script," and related concepts are commonly mis-defined and therefore misunderstood in discussions regarding internationalization of computers and the Internet.  Readers are referred to Paul Hoffman's Internet Draft entitled "Terminology Used in Internationalization in the IETF" for a comprehensive list of definitions.  Also, see the IANA language registry, which is defined by RFC 3066.


Page Updated 14-Jun-2002
©2002  The Internet Corporation for Assigned Names and Numbers. All rights reserved.