Discussion Paper on Non-ASCII Top-Level Domain Policy Issues
16 April 2002
INTRODUCTION
The purpose of this discussion paper is to outline the ICANN IDN Committee's
current preliminary thinking with respect to strategies for the selection
of non-ASCII top-level domains (TLDs).* As
such, the paper deals with a speculative future that is possible, but
by no means guaranteed: namely, a future in which the IDNA standard currently
under consideration by the IETF becomes deployable at the TLD level. For
that future to become reality, a number of contingencies would have to
occur, including rigorous testing of non-ASCII TLD strings (meaning, more
accurately, their ASCII LDH encodings, which would typically not be visible
to users) in the root zone environment. As part of its charter, however,
the ICANN IDN Committee is tasked with anticipating the policy issues
that would arise if and when ICANN confronts demonstrably deployable non-ASCII
TLDs.
With that caveat in mind, this paper outlines the broad policy analysis
undertaken by the committee to date, including a preliminary framework
for selection of registry operators for non-ASCII TLDs. The paper reviews
some advantages and disadvantages of each of the proposed approval mechanisms.
It concludes by outlining a process designed to garner ICANN community
feedback and commentary on sequential drafts of this paper before the
Committee submits its final report to the ICANN Board later this calendar
year.
The committee welcomes feedback on the contents of this discussion paper.
Because the concept and technical implementations of Internationalized
Domain Names are 'works in progress', the committee emphasizes that the
information and views contained in this discussion paper may become inaccurate
or outdated. Comments should be sent to
idn-comment@icann.org.
In simple terms, the Committee's current thinking focuses on extending
to the IDN namespace existing policies and concepts for the creation of
ASCII generic TLDs (gTLDs) and ASCII country-code TLDs (ccTLDs), which
have been developed and refined over time, while giving due consideration
to additions and variations in policy to take into account unique factors
related to the use of non-ASCII characters within the DNS.
FRAMEWORK
This paper suggests a tentative framework for classifying possible non-ASCII
TLDs. The rationale for classification is a recognition that different
types of TLDs (whether ASCII or non-ASCII) may require different selection
mechanisms, depending upon the peculiar policy considerations that arise
in connection with each category. For example, as elaborated below, current
ICANN policy provides different selection mechanisms for ccTLDs and gTLDs.
A comprehensive selection and implementation process for non-ASCII TLDs
would include a number of steps, including:
- Finalization of IDNA standard;
- Root zone implementation testing;
- Selection of registry operators; and
- Registry-level testing and deployment.
It is anticipated that the ICANN IDN Committee will publish a further
paper that considers the issue of potential non-ASCII TLD selection processes
and criteria in the near future.
In thinking about the creation of non-ASCII TLDs, the committee began
with the existing ICANN policy baseline for the creation of new TLDs,
including the principles that (1) TLD expansion should occur in a
careful and controlled fashion, with regard for the overall stability
of the DNS; (2) the sudden introduction of a massive number of new
TLDs would be a bad idea; (3) a new TLD can only be created if there
is a willing and able registry operator to run it; (4) different
categories of new TLDs may require different contractual, policy, and
selection frameworks; and (5) the selection process should be transparent,
allowing key stakeholders to participate in it.
The committee has generally agreed on two additional principles that
relate specifically to non-ASCII TLDs, namely: (6) the core purpose
of introducing non-ASCII TLDs would be to make the DNS service easier
to use for Internet users whose native languages include non-ASCII characters;
and (7) a new TLD should only be introduced if user demand can be
demonstrated to exist.
For purposes of this discussion paper, the committee takes as its hypothesis
the notion that potential new TLDs should logically be classified according
to the evident semantic meaning of the TLD string itself. In other words,
a given TLD string should be understood as having a certain intended meaning,
should be classified accordingly, and should be matched to an appropriately
tailored selection process.
The Committee notes that the potential for semantic ambiguity exists
when different non-ASCII TLDs are interpreted by Internet users who speak
different languages. While every reasonable attempt will be made to minimize
such ambiguities, it is impossible to eliminate this problem altogether.
The concept of semantic meaning is central to this analysis. For example,
the .info TLD was treated as global and unrestricted, on the basis of
the semantic link between the term "info" and the concept of "information."
The .biz TLD is restricted to business registrants, on the basis of the
semantic link between the term "biz" and the concept of "business." The
.museum TLD is restricted to museum registrants, on the basis of the semantic
link between the term "museum" and the community of "museums." In the
realm of the country-code TLDs, there is a semantic link between the assigned
two-letter codes and the name of the associated geographic unit. Thus,
.no is semantically linked to Norway, .us to the United States, .za to
South Africa, and .ch to Switzerland. Under longstanding IANA policy (see
RFC-1591 and ICP-1), these two-letter codes are
taken strictly from the ISO-3166-1 table (with a few historical exceptions),
meaning that IANA/ICANN stays out of the business of determining what
is and is not a country (or geographically distinct territory), and what
name or abbreviation is semantically associated with any given geographic
unit. Those problems are properly excluded from the ICANN process, and
resolved by a politically expert, internationally-recognized body, the
International Organization for Standardization and its
ISO 3166 Maintenance Agency.
Following from these premises, the committee undertook to imagine the
DNS namespace as a whole (including ASCII and non-ASCII TLDs), and to
attempt a general (and probably incomplete) classification of potential
TLD strings, oriented around semantic meaning.
The committee identified the following basic categories of potential
TLD strings, based on the semantic meaning of the string itself:
1. Semantic association with Geographic Units
2. Semantic association with Languages
3. Semantic association with Cultural Groups or Ethnicities
4. Semantic association with Existing Sponsored TLDs
5. Semantic association with Existing Unsponsored TLDs
6. Everything else.
More detailed definitions of these categories, as currently understood
by the committee, are set out below.
In the following sections, the committee identifies some specific
questions for comment. However, the committee does not
intend to limit comment and input to these questions alone.
The committee invites general comment on any aspect of the issues
raised in this paper.
1. GEOGRAPHIC UNITS
Definition:
By "semantic association with recognized geographic units", we mean
a TLD string that to a typical reader would be clearly linked to a recognized
geographic unit, as is the case with the existing ASCII ccTLDs.
A non-ASCII TLD consisting of Japanese characters semantically associated
with the recognized geographic unit of Japan would logically be <日本>.
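As an illustration of what the ASCII LDH encoding of such a string might look like, the following is a minimal Python sketch. It assumes that Python's built-in "idna" codec can stand in for the encoding that the IETF standard would define; the standard was still under consideration when this paper was written, so the output is illustrative only. The helper name to_ldh_label is hypothetical.

```python
import string

# Code points permitted in a label today: ASCII letters, digits, hyphen.
LDH = set(string.ascii_letters + string.digits + "-")


def to_ldh_label(label: str) -> str:
    """Return an ASCII LDH encoding of a single label (hypothetical helper).

    Assumption: Python's built-in "idna" codec stands in for the
    encoding that the IETF standard would define.
    """
    encoded = label.encode("idna").decode("ascii")
    # Sanity check: the encoded form must use only LDH code points.
    if not set(encoded) <= LDH:
        raise ValueError(f"encoding of {label!r} is not LDH-only")
    return encoded


if __name__ == "__main__":
    ace = to_ldh_label("日本")
    print(ace)                                  # form stored in the DNS
    print(ace.encode("ascii").decode("idna"))   # form shown to users
```

The point of the sketch is that the root zone would carry only the encoded ASCII form, while users would normally see and type the non-ASCII string.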
Advantages:
- The existing ISO-3166-1 table provides a clear definition of recognized
geographic units. Use of that table allows ICANN to avoid the
problem of deciding what is and is not a country or geographically distinct
territory. It would be easy for ICANN to determine whether or
not the table includes the geographic unit with which a given proposed
TLD is associated.
- The process for delegating ASCII ccTLDs is well-defined, and could
simply be extended to the non-ASCII space. Under the terms of
RFC-1591 and ICP-1, an entity seeking to run a non-ASCII TLD semantically
associated with a recognized geographic unit would have to submit a
proposal demonstrating wide support among the Internet stakeholders
within that defined area, and establishing the technical competence
of the proposed registry operator. In the interest of transparency
and the testing of consensus, proposals for this category of non-ASCII
TLD could be posted publicly, and views sought from major stakeholders
(including, for example, ISP associations, the ASCII ccTLD registry
operator, and the government).
- In the area of geography, the use of the ISO-3166-1 table provides certainty
both by inclusion and exclusion. In other words, ICANN can maintain
that only the geographic units recognized on that table are available
for non-ASCII TLDs with semantic association; the names of cities, counties,
provinces, and other geographic units would implicitly be excluded.
This principle would minimize the need for ICANN/IANA to make judgments about
the sovereign authority over a given geographic area.
- The use of the principles of RFC-1591 and ICP-1 in the context of
non-ASCII TLDs would also clarify that any TLD string semantically associated
with a given geographic unit would not be eligible for delegation (regardless
of the language or script) except under those principles. In other
words (by way of hypothetical example), the local Internet community
in New Kryptonia is the relevant decision maker for any terms semantically
associated (within reason) with its officially recognized name, whether
the string in question is in English, Japanese, Arabic, or Kryptonian.
- A limit to the total number of TLDs eligible for delegation to a given
geographic unit might be set at a number equal to the number of its
official languages.
Disadvantages:
- The ISO-3166-1 table solves the problem of what is and is not a recognized
geographic unit (country or geographically distinct territory).
However, the table only provides two- and three-letter ASCII codes for
each such geographic unit. The table does not solve the problem
of what non-ASCII names (or abbreviations) should be assigned to each
recognized geographic unit. This is a very serious disadvantage
that might be approached in three ways:
(1) Locate (or ask a properly legitimate body like the ISO to create)
a table equivalent to ISO-3166-1, expanded to include names in all
non-ASCII scripts. This would be a massive undertaking with
enormous political complications.
(2) Allow proposers to designate their desired non-ASCII TLD string,
and rely upon a requirement of consensus within a local Internet community
to determine whether the proposed TLD string is appropriate.
This could potentially place on ICANN a massive and unsustainable
set of political burdens for which it is not well suited.
(3) Ask ICANN's Governmental Advisory Committee to sort the issue
out. This option may be very unappealing to the GAC for similar
reasons.
Questions for Comment:
1. Current ICANN/IANA policy permits the delegation of ASCII
ccTLDs only when a given geographic unit and its associated
codes appear on the ISO 3166-1 list. The ISO-3166-1 list
defines not only recognized countries and territories, but also
the specific 2-letter ASCII codes associated with each.
No such authoritative list exists to define country/territory
names in non-ASCII characters, leaving ICANN without a given
reference point. How should this problem be handled?
Should it be left to each proposer to identify the desired TLD
string and to justify it? If so, what criteria should
be applied to evaluate the proposed string's suitability for
(and acceptability to) the given geographic unit? If not,
to what neutral and authoritative arbiter should this problem
be referred?
2. Current ICANN/IANA policy requires proposers for new
ASCII ccTLDs to demonstrate support from the local Internet
community. How can this requirement be objectively evaluated
in the case of a proposed non-ASCII TLD semantically associated
with a recognized geographic unit? What indicators of
support should be expected or required?
3. What role, if any, should be played by the manager of the
existing ASCII ccTLD for that geographic unit?
4. What role, if any, should be played by governments in the
selection of non-ASCII TLDs associated with their country/territory?
2. LANGUAGES
Definition:
By "semantic association with language"**,
we mean a TLD string that to a typical reader would be clearly linked
to the name of a language; for example, the Arabic word for "Arabic."
Advantages:
- If the objective of IDNs is to enable users to easily type domain
names in familiar, non-ASCII scripts (while preserving universal uniqueness
and resolvability), it might be easiest to simply create a single TLD
for each non-ASCII script, allowing the registry operator to make decisions
about lower-level naming conventions.
- A language-associated TLD string may assist in the development of
global language-based Internet communities, particularly where the language
speakers are widely distributed around the world, for example, the various
Cambodian-speaking communities.
Disadvantages:
- In a sense, a language-associated TLD string would be redundant:
if a domain name consists of Chinese characters, a user might find no
added value in also having to type the term "Chinese" (in Chinese characters)
as the final label.
- There does not appear to be a recognized list of all human languages
analogous to the ISO-3166-1 table.
- Language communities cross sovereign national boundaries. The
problem of identifying and achieving consensus among the stakeholders
of a given set of language communities may be extremely difficult.
ICANN/IANA might be left with competing claims backed by different stakeholders,
or, worse, different national governments. ICANN/IANA is not well-suited
to resolve those kinds of disputes.
- Languages are the products of thousands of years of history, generate
tremendous emotional attachments among people, and have been sources
of enormous political controversy. For these and other reasons,
the IETF has focused exclusively on characters and code points, not
on languages.
- Attempting to create TLDs semantically linked to languages would raise
a large number of extremely difficult political problems. The
Chinese-speaking community, for example, includes 1.2 billion people
in mainland China, 22 million in Taiwan, and millions more in Singapore,
Malaysia, and the United States. To select a single registry operator
in such a complicated and sensitive political environment might simply
be an impossible task for a technical coordinating body like ICANN.
Questions for Comment:
1. Is there any recognized, authoritative reference
list of languages (analogous to ISO 3166-1) that could be employed
as a reference against which to judge proposals for language-associated
non-ASCII TLDs?
2. The relevant community of interest for a given language
is the set of all speakers of that language. Is it likewise
correct that the relevant community to be served by a given
language-associated non-ASCII TLD would be the set of all speakers
of the languages that utilize the characters that comprise the
proposed language-associated non-ASCII TLD string? If
not, how should ICANN define the community to be served by the
manager of a language-associated TLD?
3. Is it correct that a non-ASCII TLD semantically associated
with the name of a language is essentially redundant, given
that the domain is by its very nature an expression of that
language?
4. Languages are often spoken across territorial boundaries
and under different and sometimes hostile governments, making
the usual requirement of community consensus extremely difficult
to establish or document. Given the near-impossibility
of demonstrating, for example, a consensus among all the worldwide
Internet communities consisting of Arabic speakers, should this
category simply be excluded as a matter of policy pragmatism?
If not, how could "consensus" across many territorial boundaries
be objectively demonstrated?
5. What role, if any, should be played by governments
in the selection of non-ASCII TLDs semantically associated with
languages?
6. What role, if any, should be played by recognized language
authorities (for example, l'Académie française)
in the selection of non-ASCII TLDs semantically associated with
languages?
3. CULTURAL GROUPS / ETHNICITIES
Definition:
By "semantic association with a cultural group or ethnicity," we mean
a TLD string that to a typical reader would be clearly linked to a cultural
group or ethnicity that is not defined or limited by recognized national
boundaries; for example, the Kurdish or Swahili peoples.
Disadvantages:
- All of the problems with language-associated TLDs would apply to this
set of TLD strings, as well. There appears to be no internationally-recognized
and legitimate list of cultural and ethnic groups. These groups cross
national borders. The attempt to identify and verify consensus among
stakeholders in these communities would be extremely difficult.
- As with languages, the names of cultures and ethnicities are the subjects
of great emotion and, often, political controversy. For ICANN even to
consider creating TLDs semantically associated with such groups would
be to invite a storm of controversy.
Questions for Comment:
1. Is there any recognized, authoritative reference list
of cultures and ethnicities (analogous to ISO 3166-1) that could
be employed as a reference against which to evaluate proposals
for non-ASCII TLD strings semantically linked to them?
2. How should we define the community to be served by the manager
of a non-ASCII TLD semantically linked to a culture or ethnicity?
3. Given the vast range of complicated political controversies
and the difficulty of demonstrating, for example, a consensus
among all members of a given culture or ethnicity, should this
category simply be excluded as a matter of policy pragmatism?
If not, how could ICANN avoid being caught in the middle of
innumerable emotionally charged and politically sensitive fights?
4. What role, if any, should be played by governments in the
selection of non-ASCII TLDs semantically linked to a culture
or ethnicity?
4. EXISTING SPONSORED gTLDs
Definition:
By "semantic association with an existing sponsored gTLD," we mean
a non-ASCII TLD string that to a typical reader would be clearly linked
to an existing ASCII sponsored TLD. Currently, that list includes
.aero, .coop, and .museum. For purposes of this paper, the category
arguably includes .edu and .int also. An example of a non-ASCII
TLD semantically associated with .museum would be the TLD string consisting
of the Hangul (Korean) characters meaning "museum" in Korean.
The issue with this group is simple: Should existing registry
sponsors have a right to TLD strings in non-ASCII characters with the
same semantic meaning as their ASCII TLD string? Should they be
given a preference, or be treated the same as any other proposer?
Advantages of incumbent preference:
- Giving a preference for equivalent non-ASCII strings to existing ASCII
sponsored registries would be simple, and somewhat logical. Once
ICANN has concluded that a given sponsor is an appropriate proxy and
policymaker on behalf of the community to be served by the TLD, it could
logically be considered to have equivalent legitimacy across non-ASCII
TLD strings as well.
Disadvantages of incumbent preference:
- The selection of TLD registries is complicated. A representative
sponsor for an ASCII TLD may not be best for all communities, particularly
where the scope of a given script's use is highly localized. For
that reason, it may be best to place no hard rights or prohibitions
on the allocation of TLD strings with semantic association to existing
sponsored TLDs, treating them as any other new TLD, open to any proposer,
but also allowing the existing ASCII TLD registry sponsors to present
proposals for equivalent non-ASCII TLDs, with relevant justification
for their role.
- Each TLD string should be treated individually, and should be open
for proposals to any potential registry operator that can establish
the basic requirements for a sponsored TLD, including support within
the community to be served.
Questions for Comment:
What process and criteria should be employed to select non-ASCII
TLDs semantically associated with an existing sponsored gTLD?
Are any special considerations required?
5. EXISTING UNSPONSORED gTLDs
Definition:
By "semantic association with existing unsponsored gTLDs", we mean
a non-ASCII TLD string that to a typical reader would be clearly linked
to an existing unsponsored ASCII gTLD, such as .com, .net, .org,
.info, .biz, or .name.
Here again, the issue is whether to give any advantage to the current
registry operators of the existing ASCII unsponsored gTLDs.
Advantages of incumbent preference:
- There appear to be no significant advantages, other than for the existing
registry operators themselves.
Disadvantages of incumbent preference:
- Given the generic nature of the terms at issue, and the wide-ranging
complexities of meaning
across languages, it would be extremely difficult to determine which
non-ASCII words and abbreviations should qualify for the preference;
indeed, it is not clear why the preference would not also logically
extend to ASCII TLD strings, such as "company" for the .com registry
operator.
- The principles of registry-level competition and geographic distribution
of registries both argue strongly against giving any preference to existing
ASCII gTLD registry operators. Such a preference would promote
market concentration, rather than competition.
Questions for Comment:
What process and criteria should be employed to select non-ASCII
TLDs semantically associated with an existing unsponsored gTLD?
Are any special considerations required?
6. EVERYTHING ELSE
Definition:
In this category, we mean to include every non-ASCII word, abbreviation,
or other string that is not semantically associated with one of the
five categories above.
Analysis:
For this category (and perhaps for categories 4 and 5, above), the
committee concludes that no distinctions should be drawn between ASCII
and non-ASCII new TLDs: the process
that is used for new ASCII TLDs (as refined over time and with experience)
should equally apply to new non-ASCII TLDs (once the technical standard
is completed and deployable at the TLD level).
The key elements of that process are: open call for proposals,
defined criteria for selection, independent review by technical and
financial experts, and full transparency of all proposals. The
committee sees no reason why these elements could not apply equally
to non-ASCII TLD proposals, with some added criteria for selection,
perhaps focusing on the proposed registry's plans to meet the needs
of (and make policy for) the language communities to be served by a
given TLD string in a given script.
Questions for Comment:
What process and criteria should be employed to select non-ASCII
TLDs that do not fall into one of the defined categories above
(in other words, non-ASCII gTLDs)?
NEXT STEPS
The ICANN IDN Committee tentatively
proposes to undertake the process outlined below to ensure that key ICANN
community stakeholders are given ample opportunity to comment on this
paper during the course of the Committee's deliberations over the
ensuing months.
Interested stakeholders should make particular note of the two public
comment periods detailed below, during which the Committee seeks feedback
from stakeholders on the first and second drafts of this paper. Comments
for the Committee's consideration should be sent to
idn-comment@icann.org.
17 April 2002 - Publish Issue Paper for non-ASCII TLDs.
17-30 April - First Public Comment Period.
10 May - Publish Second Draft of Issue Paper for non-ASCII TLDs.
10-30 May - Second Public Comment Period.
17 June - Publish IDN Committee Report to the Board.
24-29 June - Discussion of IDN Committee Report by ICANN Community & Board (Bucharest).
FOOTNOTES
* By "non-ASCII TLDs" we mean top-level domains that
include in the TLD string itself characters other than the currently
allowed ASCII "LDH code points" repertoire (meaning the code points
associated with ASCII letters, digits, and the hyphen-minus; that is,
U+002D, U+0030..U+0039, U+0041..U+005A, and U+0061..U+007A). "LDH" is a commonly-used abbreviation
for "letters, digits, hyphen." In this paper, we use the term
"non-ASCII" as a shorthand for "characters other than ASCII LDH."
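The definition above can be expressed as a simple membership test. The following Python sketch is illustrative only; the function names is_ldh_code_point and is_non_ascii_tld are hypothetical, and the ranges are taken directly from this footnote.

```python
# Code point ranges from the footnote: hyphen-minus, digits,
# uppercase letters, lowercase letters.
LDH_RANGES = [(0x2D, 0x2D), (0x30, 0x39), (0x41, 0x5A), (0x61, 0x7A)]


def is_ldh_code_point(ch: str) -> bool:
    """True if the single character ch is in the ASCII LDH repertoire."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in LDH_RANGES)


def is_non_ascii_tld(tld: str) -> bool:
    """True if the string contains any character outside ASCII LDH.

    This follows the paper's shorthand, in which "non-ASCII" means
    "characters other than ASCII LDH".
    """
    return any(not is_ldh_code_point(ch) for ch in tld)
```

Note that under this shorthand a string containing an ASCII character outside the LDH repertoire (an underscore, say) would also be reported as "non-ASCII", even though the character itself is ASCII.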
** The terms "language," "script," and related concepts
are commonly mis-defined and therefore misunderstood in discussions
regarding internationalization of computers and the Internet.
Readers are referred to Paul Hoffman's Internet Draft entitled
"Terminology Used in Internationalization in the IETF"
for a comprehensive list of definitions. Also, see the
IANA language registry, which is defined by RFC 3066.
Page Updated 17-Apr-2002
©2002 The Internet Corporation for Assigned Names and Numbers. All rights reserved.