Discussion Paper on Non-ASCII TLD Policy Issues
Summary of Public Feedback
NOTE: This page quotes excerpts from comments submitted via email in response to
the IDN Committee's Discussion Paper on Non-ASCII TLD Policy Issues.
The quoted excerpts are presented in a structured manner, are meant to be
illustrative, and are not necessarily full quotations. A revised version of the
paper is available here.
I. NAME SPACE CREATION APPROACH
Response from Kilnam Chon:
We are discussing name space creation in each language community
(or character code community), rather than extension of the ASCII name
space to "non-ASCII". To be more specific, we are dealing with the
domain name space, which is a proper subset of the name space.
The ASCII name space was created around 20 years ago under the ARPANET Project,
along with other domain name space developments, including one from JANET
in the UK. We may consider these developments as prototypes, and we should
learn lessons from these early developments, such as the .com problems,
which include imbalance of the namespace.
Mere extension of the ASCII name space to non-ASCII would not work, as we are
finding out in Korean and other languages, and we need to approach this
issue as "name space development from scratch" for each language community.
II. STAKEHOLDER ROLES
Response from Kilnam Chon:
We need to define what roles ICANN and other relevant parties are to play
in the name space development in each language. Elliot Noss commented
in his message to the Heathrow Declaration mailing list on the role of ICANN
as "... Think of ICANN almost as a bare infrastructure provider in this
context with all innovation taking place at the edge...."[NOSS]
See the complete paragraph in Appendix 1, or the complete message in the
mailing list. We need to look for other relevant parties, such as local
language communities and international organizations, as well as current
developments in this area.
III. LANGUAGE ISSUES
Response from Kilnam Chon:
Are we dealing with "language", or "language code", or "character", or
"character code"? For example, in the case of Chinese, are we dealing
with the Chinese language, or the Chinese language code for DNS (in written form,
if we can assume that we are not dealing with the verbal form, which may
be coming very soon), or Chinese characters, which would include non-Chinese
language communities such as Japanese and Korean, or their codes?
I believe we need to deal with the latter case, the character (code), since
we have to share TLDs in "Chinese characters" among Chinese, Japanese
and Korean, rather than the Chinese language community of China, Taiwan, Hong
Kong, etc., as the IDN Discussion paper suggests. The situation is similar
in Arabic and other languages, too.
Response from Håvard Hjulstad (Chairman of ISO/TC37; Convener of
ISO/TC37/SC2/WG1):
My input relates primarily to the second issue, Languages.
There is one ISO standard, published in two parts, Codes for the representation
of names of languages. Part 1, which is currently a Final Draft with expected
publication date as an International Standard around mid-2002, replaces
ISO 639:1988 and gives alpha-2 identifiers for 182 languages. Part 2 (published
1998) gives alpha-3 identifiers for 446 languages. Both parts are updated
continuously (17 languages have actually been added to ISO 639-2 since
its publication). The list of approved identifiers is available at <http://lcweb.loc.gov/standards/iso639-2/langhome.html>.
That site will be updated fairly soon. You may wish to look at some of
the pages under
<http://www.hjulstad.com/havard/639-test/>. In particular
I should think that the PDF file <http://www.hjulstad.com/havard/639-test/total-list.pdf>
might be of interest.
PLEASE NOTE: The pages under <http://www.hjulstad.com/havard/639-test/>
have not been finalized. However, the tables of language names and identifiers
are reliable. Please bear in mind that this is a test, and that the pages
may be moved or removed without notice. If you don't get access to the
pages, I shall be happy to send the PDF file separately.
ISO 639 (both parts) states that the identifiers are composed of Latin
letters a-z only, with no diacritical marks. Although the standard (in
particular ISO 639:1988) has been translated into other languages (probably
also non-Latin-script languages) I am not aware that the identifiers in
any case have been rendered in other scripts. Parallel to the strong position
that Latin-script symbols (e.g. units of measurement and others) have
in non-Latin-script languages, I should think it unlikely that non-Latin-script
language identifiers are much used on an "authoritative level".
ISO 639 standardizes language identifiers, not language names. The names
in English and French, and the indigenous names are given for reference
and retrieval purposes. Conformance with ISO 639 does not require the
use of the name forms that are used in ISO 639. I should think that on
one level the lists of names in ISO 639 are "authoritative lists", although
they are not standardized in the strict sense.
ISO 639-1 gives indigenous names of the languages. This is not (yet) the
case for 639-2. However, it is recognized that it is of great interest
to many users that as many name forms as possible are registered. In the
PDF file (total-list.pdf) mentioned above, some languages that are not
in 639-1, but are in 639-2, have indigenous names. The intention is to
register even "indigenous names in indigenous script". The list in the
PDF file is so far very incomplete, but we are working on it. (Input from
any source is most welcome!)
Work is going on in the framework of ISO/TC37 to expand the coverage of
language identifiers greatly. It is recognized that at least 5000-6000
languages need to be included to meet the present and future requirements.
There is also a need to create mechanisms to encode language variation.
This work is currently in the planning stage, but ISO/TC37 recognizes
the importance of proceeding with this work relatively rapidly.
The use of alpha-2 (or alpha-3) language identifiers as TLDs will have
certain problems. There is a fairly high level of coordination between
ISO 3166 (countries) and ISO 639, but some "conflicts" have been inevitable.
This is partly due to historical reasons, and partly because there are
so many more languages than countries. A simple example: SE is the alpha-2
country identifier (ISO 3166) for Sweden; sv is the alpha-2 language identifier
for Swedish, while se is the language identifier for Northern Sami (used
in Norway, Sweden, and Finland). Since there is no case-sensitivity in
TLDs, .se would in theory be ambiguous. A solution could be to include
a prefix or some other mechanism, making, e.g. .se Sweden and .lang-se
(or some other modifier) Northern Sami.
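The ambiguity described above can be illustrated with a minimal sketch. The code tables below contain only the three codes named in the comment, not the full ISO 3166 or ISO 639 lists, and the function names and the "lang-" prefix handling are illustrative assumptions rather than anything proposed by ISO or ICANN.

```python
# Illustrative subset only: the codes mentioned in the comment above.
COUNTRY_CODES = {"SE": "Sweden"}                            # ISO 3166 alpha-2
LANGUAGE_CODES = {"sv": "Swedish", "se": "Northern Sami"}   # ISO 639 alpha-2


def find_collisions(country_codes, language_codes):
    """Return codes that name both a country and a language once case is ignored.

    TLD labels are case-insensitive, so "SE" and "se" are the same label.
    """
    countries = {code.lower(): name for code, name in country_codes.items()}
    languages = {code.lower(): name for code, name in language_codes.items()}
    shared = sorted(countries.keys() & languages.keys())
    return {code: (countries[code], languages[code]) for code in shared}


def language_label(code, prefix="lang-"):
    """Build a disambiguated language label, e.g. 'lang-se' (hypothetical scheme)."""
    return prefix + code.lower()


if __name__ == "__main__":
    for code, (country, language) in find_collisions(COUNTRY_CODES, LANGUAGE_CODES).items():
        print(f".{code}: {country} (country) vs {language} (language); "
              f"the language could use .{language_label(code)}")
```

Lower-casing both tables before comparing mirrors the point that case cannot be used to tell the two code sets apart in a TLD.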
It is not on the current work plan for ISO/TC37/SC2/WG1 to create non-Latin-script
language identifiers. Theoretically one could think of an extension of
our work to add "indigenous-script identifiers" to non-Latin-script languages.
This could be extended e.g. to give Icelandic the "generic identifier"
is (which it currently is) and an "indigenous-script identifier" ís
(i-with-acute), representing the indigenous íslensk, and also
to give Hindi the "generic identifier" hi (current) and an "indigenous-script identifier"
हि representing हिंदी. The committee has no such plans, and I don't expect
that the committee will add this to its plans unless there should be concrete
user needs.
If your study and report reveal such needs, ISO/TC37/SC2/WG1 will take
it into serious consideration.
Response from Konstantin Vinogradov:
I hope an alphabet TLD (aTLD) concept may be a purely technical solution.
According to ISO10646/Unicode the primary scripts currently supported
are:
"Arabic Armenian Bengali Bopomofo Buhid Canadian Syllabics Cherokee Cyrillic
Deseret Devanagari Ethiopic Georgian Gothic Greek Gujarati Gurmukhi
Han Hangul Hanunoo Hebrew Hiragana Kannada Katakana Khmer Latin Lao Malayalam
Mongolian Myanmar Ogham Old Italic (Etruscan) Oriya Runic Sinhala
Syriac Tagalog Tagbanwa Tamil Telugu Thaana Thai Tibetan Yi".
I'm not sure all of them are "alphabets", but it seems all alphabets are
here. An Alphabet Registry/Registrars should be organized to operate according
to ICANN policy.
A Cyrillic Registry, for example, should be responsible for technical
functions including coded representation for <aTLD> names in Bulgarian,
Mongolian, Russian and other Cyrillic-using languages.
An Alphabet Registrar should be responsible for particular language(s).
A Non-ASCII Domain Name Administrator should be responsible for cultural,
ethnic and ethical issues.
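As a rough sketch of how a label might be routed to a per-script registry under the aTLD idea above, the snippet below guesses the script of each character from the first word of its Unicode character name. This is a heuristic assumption, not the Unicode Script property (for example, Han characters report "CJK" and Old Italic reports "OLD"), and the function names are hypothetical.

```python
import unicodedata


def script_prefixes(label):
    """Return the set of first words of each character's Unicode name.

    For most scripts listed above this first word is the script name,
    e.g. 'CYRILLIC SMALL LETTER PE' -> 'CYRILLIC'.
    """
    return {unicodedata.name(ch).split()[0] for ch in label}


def registry_for(label):
    """Pick a single (hypothetical) script registry for a label.

    Raises ValueError when the label mixes characters from several scripts.
    """
    prefixes = script_prefixes(label)
    if len(prefixes) != 1:
        raise ValueError(f"label mixes scripts: {sorted(prefixes)}")
    return prefixes.pop()


if __name__ == "__main__":
    # "\u043f\u043e\u0447\u0442\u0430" is the Russian word for "mail".
    print(registry_for("\u043f\u043e\u0447\u0442\u0430"))
```

Rejecting mixed-script labels is a design choice made for this sketch; the comment above does not say how such labels should be handled.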
IV. NON ASCII TLDS - A TAMIL PERSPECTIVE
Response from Mr. S. Maniam, Chairman WG03 of the International Forum
for IT in Tamil (INFITT) and Chairman MINC Tamil Language Working Group
Approved by Executive Director, INFITT.
Introduction
The purpose of this paper is to comment on the ICANN IDN Committee's preliminary
thinking on the Non-ASCII Top Level Domain from the perspective of the
Tamil language community.
The International Forum for IT in Tamil (INFITT) is the first and only
coordinated effort so far amongst the Tamil speaking diaspora to address
IT and Internet issues pertaining to the Tamil language and its script.
The INFITT coordinates a number of Working Groups and one of these working
groups is tasked to deal with issues related to Internet names, such as
Tamil Internet domain names. <http://www.infitt.org/>
History
The INFITT was formed through a culmination of several years' effort,
starting from the 1997 Tamil Internet conference which gathered IT and
language professionals from the Tamil speaking communities world wide
to the National University of Singapore.
Since then, we have had regular conferences in Chennai, Tamil Nadu, India
(1999), the home of the Tamil language, back in Singapore (2000) and in
Kuala Lumpur, Malaysia (2001); this year's conference is scheduled to be held in
California, USA (September 2002). INFITT therefore represents the global
leaders, movers and shakers of the Tamil IT communities and the language
experts associated with them.
Structure
INFITT is organized with representation from communities where Tamil is
spoken. The representation includes geographical localities such as the
home of Tamil language - the State of Tamil Nadu, India, countries where
Tamil is an official language e.g. Malaysia, Singapore, Sri Lanka, and
places where Tamil language speakers constitute a significant population,
e.g. Mauritius, USA, Canada, Europe, Australasia, Middle East, etc.
The Language
The Tamil language is one of the world's oldest languages, and has been
referred to by many as an ancient language which is still in common use.
The Script
The Tamil alphabet is unique. Like English, it is phonetic, where letters
represent sounds. The Tamil script is only used by the Tamil language,
unlike other scripts such as Arabic, used by the Arabic, Urdu, Farsi and Pashto
languages, or Chinese script, used by the Chinese, Japanese and Korean languages,
or the ubiquitous Roman script.
Speakers of the Language and Users of the Script
The Tamil communities world wide are the speakers of the language, and
users of the Tamil script. In some instances, there is a transliteration
form of Tamil, where the Roman alphabet is used to represent the Tamil
script. The Tamil script itself is the preferred official
version in almost all communities.
General Comments
Internet Domain Name Space Creation for Tamil script
Just as the current ASCII character domain name space has been created
by the original Internet community, and resides within the current root
server system, any creation of Tamil Domain Name Space using Tamil script
should evolve as a separate name space distinct from the other languages
despite the issues of using ASCII as an internal representation in the
DNS server backend, a procedure currently being discussed in the IETF
IDN Working Group. This procedure converts the scripts of all languages
currently found in Unicode into an ASCII-compatible encoding (ACE) format. The
IETF IDN WG intends for this ACE format to be distinguished from other
conventional ASCII domain names by using a special prefix to denote the
ACE encoding.
However, this should not lead to the confusion that the Tamil domain name
space is a so-called "mere extension of ASCII name space to non-ASCII".
This name space for Tamil (and for other languages) should be treated
as a separate name space with a separate process for its administration
and governance without interference from non-stakeholders and non-users
of the language. So long as the underlying ASCII ACE representation is
not altered or affected, there is no reason for interference, and the
Tamil name space operates as a separate autonomous space and should not
impact the administration, policies or governance of any other language
name space.
This non-interference principle is the basis of the comments herein.
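The ACE conversion discussed in this section can be sketched with Python's built-in "idna" codec. Note the assumptions: at the time of these comments the ACE prefix had not been finalized; the codec implements the scheme later published as RFC 3490, which uses the prefix "xn--". The helper names below are illustrative only.

```python
def to_ace(label):
    """Convert a single Unicode label to its ASCII-compatible (ACE) form."""
    return label.encode("idna").decode("ascii")


def from_ace(ace_label):
    """Convert an ACE label back to its Unicode form."""
    return ace_label.encode("ascii").decode("idna")


if __name__ == "__main__":
    # The word "Tamil" written in Tamil script, given as escapes for portability.
    tamil = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"
    ace = to_ace(tamil)
    print(ace)                      # an ASCII-only label carrying the ACE prefix
    print(from_ace(ace) == tamil)   # the round trip recovers the original label
    print(to_ace("example"))        # plain ASCII labels pass through unchanged
```

The DNS servers only ever see the ASCII form, which is why the comment above treats the Tamil name space as separable from the ASCII one at the policy level.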
Roles
As coordinating bodies, the role which ICANN and IANA should play would
be that of an infrastructure enabler. It is generally not thought that it should
be that of an infrastructure provider. This provider function is already
established by the constellation of root servers under the Root Server
Consortium. Whoever runs the coordination role of what goes into
the root server system should simply ensure non-collision of namespace,
effective administration and facilitation rather than the role of a bureaucracy,
a gatekeeper or a controller.
The role of the language community and its chosen representative body
which can be identified as responsible for a particular language/script
namespace would be to determine what names go into that namespace.
Therefore local language communities and the international organizations
that represent them will likely be the appropriate authorities to manage
and administer the namespace for a particular language based on standard
internationally acceptable practice.
Language
For the Tamil language, it is a fairly clear-cut situation where only
one language community uses one set of scripts. The distinctive feature
is that no one sovereign country "owns" the Tamil script. Instead, Tamil
language speakers and Tamil Script users are scattered throughout the
world in the so-called Tamil Diaspora. Nevertheless, the source and home
of the Tamil language is still based in the state of Tamil Nadu, India,
as recognized almost universally.
The Tamil language situation is less complicated than that
of Chinese characters, which are commonly used amongst different language groups:
Chinese, Japanese and Korean. Like the Chinese language communities of
China, Taiwan, Hong Kong, where there is a similar Diaspora, the Tamil
language speakers may be found in local communities in many countries.
In response to the need to manage our own destiny in terms of Internet
and computing technologies, we have created the INFITT and designated
it as the body to deal with such issues as the latest technology offering
of Tamil domain names, amongst many other things.
V. NON ASCII TLDS – THE KOREAN PERSPECTIVE
Response from Kilnam Chon:
After two years of study on Korean (domain) name space, we are coming
up with the following recommendations for the name space architecture
and design. Detailed information will be available in the workshop proceedings,
the meeting reports, and RFC-KR, the standard document for .kr.
"ccTLD"
We would have one each for South Korea (.kr) and North Korea (.kp).
We may develop regional TLDs for large Korean communities outside of Korea.
Language Code TLD
We would have .wuri (in Korean, meaning "we") to
represent .ko (or .kor or .korean) of the language code for Korean.
"gTLD"
We would start from the following set:
.kiup(~ .com)
.hoesa(~ .com)
.hakkyo(school)
.danche(~ .org)
.net (= .net)
We may have many more gTLDs if we "run out" of the name space in the future.
We need to find out how many "decent" names each TLD can support,
and this number would be different from ASCII TLDs.
Korean Language Community
We need to have the "blessing" or consensus of the Korean language community,
which is around 75 million people, of whom .kr would represent 45 million.
This consensus development process would be time-consuming. Uneven development
of the Internet among different regions would require additional consideration.
Implementation
The implementation of the above Korean name space in the domain name space
and the non-domain name space would require additional consideration,
which we have to investigate and discuss with the relevant parties.
Testing is the obvious first step, as we need to validate our name
space concept, and this would require substantial effort. Currently, we
are testing korean.kr as well as developing many relevant standards related
to Korean language.
VI. A CASE FOR IDN COMMITTEE PROCESS MODIFICATION
Response from Roger J. Cochetti, Senior Vice President & Chief Policy
Officer, VeriSign:
VeriSign is pleased to comment on the IDN Committee's April 16th Discussion
Paper on Non-ASCII Top-Level Domain Policy Issues and we look forward
to continuing to work with the Committee in this important area. As the
Committee knows, VeriSign has been a strong supporter of efforts to make
electronic identity services, including Internet domain name and other
services, friendly to people whose native language does not use the ASCII
character set. And we are a global leader in many efforts to make the
Internet more friendly to people who use non-ASCII-based languages. These
efforts are important because in the 21st century, to be competitive and
successful, nations, linguistic groups, peoples, and individuals will
need to be network-empowered and this empowerment should not be reserved
to just those whose language includes ASCII characters.
An important part of the global effort to network-empower those whose
language does not include ASCII characters is making the Internet more
friendly to non-ASCII users. And an important part of the effort to make
the Internet more friendly to non-ASCII users is making the domain name
system more friendly to non-ASCII users. The most important, and by far
the most complex, part of this task lies in making Top-Level Domains more
friendly to non-ASCII communities.
Regardless of anyone's preferences, the Top-Level Domain space today is,
and is designed for, ASCII characters. Any effort to change that structure
involves enormous and inter-related technological, legal, operational,
political, cultural, intellectual property and economic issues. VeriSign
is committed to helping address these issues because the benefits of doing
so are important; but no one should underestimate the complexity of doing
so. The technical standards issues alone have been the subject of intense
expert discussion for several years and are likely to continue to be actively
discussed for some time to come.
In this context, we offer two comments on the process before the Committee:
First, few believe that the myriad of inter-related issues that need to
be addressed before non-ASCII-based TLDs are introduced can be addressed
effectively in a matter of months. Because of the range of inter-related,
complex issues involved, we believe that the Committee should address
the issue of non-ASCII TLDs in a deliberate and disciplined way. The Committee
should solicit input on specific questions from legal experts, linguistic
experts, technological experts, intellectual property experts, cultural
and national interests, service providers, and end-users. This process
should proceed in a careful and deliberate manner, giving expert groups
adequate time to consider each other's input.
Instead, the Committee has posted a comprehensive set of questions on
the ICANN Website and asked for initial comments on all questions at once
within a few weeks.
We think the goal of reaching comprehensive conclusions in about ten weeks
through two Web-postings should be reconsidered. We urge the Committee
to consider the benefits of soliciting expert input in a deliberate way,
and then sharing that input among various experts. This approach would
benefit the quality of the Committee's work and aid the consensus development
processes. As each question is examined, the Committee should not just
publish a request for comments on the ICANN Website. It should actively
solicit input and analyses from expert technological, legal, policy, linguistic,
operator, intellectual property, and other groups. This expert input will
help the Committee understand the inter-relationship between various solutions.
The added time necessary for a thorough and deliberate process of this
sort, compared with an effort to develop a comprehensive blueprint in
ten weeks, will be more than made up in the lasting value of the results.
Second, a useful starting point for such a deliberate exercise could be
to take the existing TLDs and try to understand the many issues associated
with their extension to non-ASCII character sets. The solution to this
complex question could provide a framework for the examination of other
questions, such as the possible addition of altogether new non-ASCII TLDs.
In this regard, as the Committee has noted, there are hundreds of TLDs
in use today and there are arguably dozens of non-ASCII character-based
languages in use. Simply extending existing TLDs to one non-ASCII character
set would raise major, interrelated operational, legal, technical, policy,
linguistic, cultural, and other issues. The Committee should start by
trying to address the question of extending existing TLDs and then move
on to other questions.
For example, the Committee would learn a great deal by pursuing in depth
an answer to a fundamental question such as "If the TLD space were to
be initially extended to a very small number of non-ASCII character sets,
how should the selection of a small number of non-ASCII character sets
be made and/or which non-ASCII character sets should be selected?" A question
like this one could be submitted to expert technical/standards groups;
expert legal groups; expert linguistic groups; expert intellectual property
groups; governments; cultural or ethnic groups; ccTLD operators; gTLD
operators; and organizations or individuals who are expert in the commercial
and non-commercial use of the Internet.
Undoubtedly, the debate over a question like this would be complex; but
no less complex than attempting to address an even larger number of difficult
questions at once. The benefit of the more narrow approach, however, is
that to the extent that a solution emerges, significant and lasting progress
will have been made, on which further progress can be based. Since the
issues that the Committee is examining raise controversial cultural, political,
economic, legal, technological, operational and other questions, it is
important that it exercise as much discipline as possible in managing
its efforts and that it rely as much as possible on objective criteria
and expert input.
We look forward to working with you in that effort.
VII. A CASE FOR NON ASCII TLDS SEMANTICALLY LINKED TO GEOGRAPHY ONLY
Response from Yoshiro Yoneya:
The following are my personal thoughts on IDN TLDs. At this moment,
I only support the first category (geographic units).
- Introduction of IDN TLDs should be pursued actively.
- In introducing IDN TLDs, priority should go to root zone implementation
testing, and the classification should be discussed in light of the results
of that testing.
- The root zone implementation testing should focus on technical
points of view:
+ Operation of the root DNS servers
+ Operation of the IDN registries
+ Accumulation of IDN name resolution how-to knowledge
- Therefore, registries that meet the following conditions would be
appropriate to define IDN TLDs during testing:
+ Intend to cooperate actively with ICANN IDN TLD testing
+ Are already registering IDNs
(already have an IDN registration mechanism)
+ Have a clear definition of the language of the IDNs being registered
+ Have a name resolution mechanism for the registered IDNs
(able to map the existing IDN name space under the current TLD
to the testing IDN TLD)
I think discussion of the other categories can start after the
evaluation of the testing.
Page Updated 26-Jun-2002
©2002 The Internet Corporation for Assigned Names and Numbers. All rights reserved.