
Discussion Paper on Non-ASCII TLD Policy Issues

Summary of Public Feedback


NOTE: This page quotes excerpts from comments submitted by email in response to
the IDN Committee's Discussion Paper on Non-ASCII TLD Policy Issues.
The excerpts are arranged by topic, are meant to be illustrative,
and are not necessarily complete quotations. A revised version of the paper is also available.

I. NAME SPACE CREATION APPROACH

Response from Kilnam Chon:

We are discussing name space creation within each language community (or character-code community), rather than an extension of the ASCII name space to "non-ASCII". More specifically, we are dealing with the domain name space, which is a proper subset of the name space.

The ASCII name space was created around 20 years ago under the ARPANET project, alongside other domain name space developments, including one from JANET in the UK. We may regard these developments as prototypes, and we should learn lessons from them, such as the .com problems, which include imbalance in the name space.

Mere extension of the ASCII name space to non-ASCII would not work, as we are finding out with Korean and other languages; we need to approach this issue as "name space development from scratch" for each language community.

II. STAKEHOLDER ROLES

Response from Kilnam Chon

We need to define what roles ICANN and other relevant parties should play in name space development for each language. Elliot Noss commented on the role of ICANN in his message to the Heathrow Declaration mailing list: "... Think of ICANN almost as a bare infrastructure provider in this context with all innovation taking place at the edge...."[NOSS] See the complete paragraph in Appendix 1, or the complete message in the mailing list. We also need to consider other relevant parties, such as local language communities and international organizations, as well as current developments in this area.

III. LANGUAGE ISSUES

Response from Kilnam Chon

Are we dealing with "language", "language code", "character", or "character code"? For example, in the case of Chinese, are we dealing with the Chinese language; a Chinese language code for the DNS (in written form, assuming that we are not dealing with the spoken form, which may be coming very soon); Chinese characters, which are also used by non-Chinese language communities such as Japanese and Korean; or the codes for those characters?

I believe we need to deal with the last case, characters (and character codes), since TLDs in "Chinese characters" will have to be shared among Chinese, Japanese and Korean users, rather than only among the Chinese language communities of China, Taiwan, Hong Kong and elsewhere, as the IDN Discussion Paper suggests. The situation is similar for Arabic and other scripts.

Response from Håvard Hjulstad (Chairman of ISO/TC37; Convener of ISO/TC37/SC2/WG1)

My input relates primarily to the second issue, Languages.

There is one ISO standard, published in two parts, Codes for the representation of names of languages. Part 1, which is currently a Final Draft with expected publication as an International Standard around mid-2002, replaces ISO 639:1988 and gives alpha-2 identifiers for 182 languages. Part 2 (published 1998) gives alpha-3 identifiers for 446 languages. Both parts are updated continuously (17 languages have actually been added to ISO 639-2 since its publication). The list of approved identifiers is available at <http://lcweb.loc.gov/standards/iso639-2/langhome.html>. That site will be updated fairly soon. You may wish to look at some of the pages under <http://www.hjulstad.com/havard/639-test/>. In particular, I think the PDF file <http://www.hjulstad.com/havard/639-test/total-list.pdf> might be of interest.

PLEASE NOTE: The pages under <http://www.hjulstad.com/havard/639-test/> have not been finalized. However, the tables of language names and identifiers are reliable. Please bear in mind that this is a test, and that the pages may be moved or removed without notice. If you cannot access the pages, I shall be happy to send the PDF file separately.

ISO 639 (both parts) states that the identifiers are composed of the Latin letters a-z only, with no diacritical marks. Although the standard (in particular ISO 639:1988) has been translated into other languages (probably including non-Latin-script languages), I am not aware of any case in which the identifiers have been rendered in other scripts. Given the strong position that Latin-script symbols (e.g. units of measurement) hold in non-Latin-script languages, I think it unlikely that non-Latin-script language identifiers are much used at an "authoritative level".

ISO 639 standardizes language identifiers, not language names. The English and French names, as well as the indigenous names, are given for reference and retrieval purposes. Conformance with ISO 639 does not require the use of the name forms that appear in ISO 639. I think that on one level the lists of names in ISO 639 are "authoritative lists", although they are not standardized in the strict sense.

ISO 639-1 gives indigenous names of the languages. This is not (yet) the case for 639-2. However, it is recognized that it is of great interest to many users that as many name forms as possible are registered. In the PDF file (total-list.pdf) mentioned above, some languages that are not in 639-1, but are in 639-2, have indigenous names. The intention is to register even "indigenous names in indigenous script". The list in the PDF file is so far very incomplete, but we are working on it. (Input from any source is most welcome!)

Work is going on within the framework of ISO/TC37 to expand the coverage of language identifiers greatly. It is recognized that at least 5000-6000 languages need to be included to meet present and future requirements. There is also a need to create mechanisms to encode language variation. This work is currently in the planning stage, but ISO/TC37 recognizes the importance of proceeding with it relatively rapidly.

The use of alpha-2 (or alpha-3) language identifiers as TLDs will raise certain problems. There is a fairly high level of coordination between ISO 3166 (countries) and ISO 639, but some "conflicts" have been inevitable. This is partly for historical reasons, and partly because there are so many more languages than countries. A simple example: SE is the alpha-2 country identifier (ISO 3166) for Sweden; sv is the alpha-2 language identifier for Swedish, while se is the language identifier for Northern Sami (used in Norway, Sweden, and Finland). Since TLDs are not case-sensitive, .se would in theory be ambiguous. A solution could be to include a prefix or some other mechanism, making, e.g., .se Sweden and .lang-se (or some other modifier) Northern Sami.
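The ambiguity described above can be illustrated with a small Python sketch. It assumes a tiny hand-picked sample of ISO 3166 alpha-2 country codes and ISO 639-1 language codes (not the full lists), folds case the way DNS compares ASCII labels, reports strings that would denote both a country and a language, and applies the illustrative "lang-" modifier mentioned above. The names `ambiguous_labels`, `language_tld` and `LANG_PREFIX` are hypothetical.

```python
# Hypothetical sample only; not the complete ISO 3166 / ISO 639-1 lists.
COUNTRY_CODES = {"SE": "Sweden", "NO": "Norway", "FI": "Finland"}
LANGUAGE_CODES = {"sv": "Swedish", "se": "Northern Sami", "no": "Norwegian", "fi": "Finnish"}

# Illustrative modifier from the text ("lang-"); not an adopted convention.
LANG_PREFIX = "lang-"


def fold(label: str) -> str:
    """DNS compares ASCII labels case-insensitively, so fold to lower case."""
    return label.lower()


def ambiguous_labels(countries: dict, languages: dict) -> set:
    """Return labels that would denote both a country and a language as TLDs."""
    country_labels = {fold(code) for code in countries}
    language_labels = {fold(code) for code in languages}
    return country_labels & language_labels


def language_tld(code: str) -> str:
    """Disambiguate a language identifier by prefixing it (hypothetical scheme)."""
    return LANG_PREFIX + fold(code)


if __name__ == "__main__":
    for label in sorted(ambiguous_labels(COUNTRY_CODES, LANGUAGE_CODES)):
        print(f".{label}: country {COUNTRY_CODES[label.upper()]!r} "
              f"vs language {LANGUAGE_CODES[label]!r} -> .{language_tld(label)}")
```

Even in this small sample, .no and .fi collide as well as .se; the difference is that for .se the country and the language are unrelated (Sweden versus Northern Sami), which is the case the prefix is meant to resolve.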

It is not on the current work plan of ISO/TC37/SC2/WG1 to create non-Latin-script language identifiers. Theoretically, one could think of extending our work to add "indigenous-script identifiers" for non-Latin-script languages. This could be extended, e.g., to give Icelandic the "generic identifier" is (which it currently has) and an "indigenous-script identifier" ís (i with acute accent), representing the indigenous íslensk, and also Hindi the "generic identifier" hi (current) and an "indigenous-script identifier" in Devanagari, representing the indigenous name of Hindi. The committee has no such plans, and I do not expect that the committee will add this to its plans unless concrete user needs arise.

If your study and report reveals such needs, ISO/TC37/SC2/WG1 will take it into serious consideration.

Response from Konstantin Vinogradov:

I hope that an alphabet TLD (aTLD) concept could be a purely technical solution. According to ISO 10646/Unicode, the primary scripts currently supported are:

"Arabic Armenian Bengali Bopomofo Buhid Canadian Syllabics Cherokee Cyrillic Deseret Devanagari Ethiopic Georgian Gothic Greek Gujarati Gurmukhi Han Hangul Hanunoo Hebrew Hiragana Kannada Katakana Khmer Latin Lao Malayalam Mongolian Myanmar Ogham Old Italic (Etruscan) Oriya Runic Sinhala Syriac Tagalog Tagbanwa Tamil Telugu Thaana Thai Tibetan Yi".

I'm not sure all of them are "alphabets", but it seems that all alphabets are here. An Alphabet Registry and Registrars should be organized to operate according to ICANN policy.

A Cyrillic Registry, for example, should be responsible for technical functions, including the coded representation of <aTLD> names in Bulgarian, Mongolian, Russian and other Cyrillic-using languages.

An Alphabet Registrar should be responsible for particular language(s). A Non-ASCII Domain Name Administrator should be responsible for cultural, ethnic and ethical issues.

IV. NON ASCII TLDS - A TAMIL PERSPECTIVE

Response from Mr. S. Maniam, Chairman of WG03 of the International Forum for IT in Tamil (INFITT) and Chairman of the MINC Tamil Language Working Group (approved by the Executive Director, INFITT).

Introduction

The purpose of this paper is to comment on the ICANN IDN Committee's preliminary thinking on the Non-ASCII Top Level Domain from the perspective of the Tamil language community.

The International Forum for IT in Tamil (INFITT) is the first and only coordinated effort so far amongst the Tamil speaking diaspora to address IT and Internet issues pertaining to the Tamil language and its script. The INFITT coordinates a number of Working Groups and one of these working groups is tasked to deal with issues related to Internet names, such as Tamil Internet domain names. <http://www.infitt.org/>

History

INFITT was formed as the culmination of several years' effort, starting from the 1997 Tamil Internet conference at the National University of Singapore, which gathered IT and language professionals from Tamil-speaking communities worldwide.

Since then, we have held regular conferences in Chennai, Tamil Nadu, India (1999), the home of the Tamil language; back in Singapore (2000); in Kuala Lumpur, Malaysia (2001); and this year in California, USA (scheduled for September 2002). INFITT therefore represents the global leaders, movers and shakers of the Tamil IT communities and the language experts associated with them.

Structure

INFITT is organized with representation from communities where Tamil is spoken. The representation includes geographical localities such as the home of the Tamil language, the State of Tamil Nadu, India; countries where Tamil is an official language, e.g. Malaysia, Singapore and Sri Lanka; and places where Tamil speakers constitute a significant population, e.g. Mauritius, the USA, Canada, Europe, Australasia and the Middle East.

The Language

The Tamil language is one of the world's oldest languages, and has been described by many as an ancient language that is still in common use.

The Script

The Tamil alphabet is unique. Like English, it is phonetic: letters represent sounds. The Tamil script is used only by the Tamil language, unlike other scripts such as Arabic, which is used by the Arabic, Urdu, Farsi and Pashto languages; Chinese script, which is used by Chinese, Japanese and Korean; or the ubiquitous Roman script.

Speakers of the Language and Users of the Script

The Tamil communities worldwide are the speakers of the language and the users of the Tamil script. In some instances, a transliterated form of Tamil is used, in which the Roman alphabet represents the Tamil script. However, the Tamil script is the preferred official form in almost all communities.

General Comments

Internet Domain Name Space Creation for Tamil script

Just as the current ASCII domain name space was created by the original Internet community and resides within the current root server system, any Tamil domain name space using Tamil script should evolve as a separate name space, distinct from those of other languages. This holds despite the use of ASCII as an internal representation in the DNS server back end, a procedure currently being discussed in the IETF IDN Working Group. This procedure converts text in any script encoded in Unicode into an ASCII-Compatible Encoding (ACE). The IETF IDN WG intends for ACE labels to be distinguished from conventional ASCII domain names by a special prefix that denotes the ACE encoding.

However, this should not lead to the misconception that the Tamil domain name space is a so-called "mere extension of ASCII name space to non-ASCII". The name space for Tamil (and for other languages) should be treated as a separate name space, with a separate process for its administration and governance, free from interference by non-stakeholders and non-users of the language. So long as the underlying ASCII ACE representation is not altered or affected, there is no reason for interference; the Tamil name space operates as a separate autonomous space and should not affect the administration, policies or governance of any other language name space.
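The ACE step described above can be sketched in Python with the standard library's `punycode` codec (the Bootstring algorithm of RFC 3492). The sketch assumes the "xn--" prefix that IDNA later standardized in RFC 3490; when this comment was written, the IETF IDN WG had not yet fixed the prefix. It deliberately omits nameprep normalization and label-length checks, so it illustrates the encoding only and is not a conforming IDNA implementation. The helper names are hypothetical.

```python
# Prefix later standardized for IDNA in RFC 3490 (an assumption relative to
# the draft stage described in the text).
ACE_PREFIX = "xn--"


def to_ace(label: str) -> str:
    """Convert one label to its ASCII-compatible form (no nameprep applied)."""
    if label.isascii():
        return label  # conventional ASCII labels are left unchanged
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace(label: str) -> str:
    """Reverse to_ace for a single label."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def domain_to_ace(domain: str) -> str:
    """Apply to_ace to each dot-separated label of a domain name."""
    return ".".join(to_ace(part) for part in domain.split("."))


if __name__ == "__main__":
    tamil = to_ace("தமிழ்")
    print(tamil, from_ace(tamil))
    print(domain_to_ace("bücher.example"))
```

The resolver and root servers only ever see the ASCII form; the Unicode label is recovered at the edge by reversing the conversion, which is why the prefix matters for telling ACE labels apart from ordinary ASCII ones.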

This non-interference principle is the basis of the comments herein.

Roles

As coordinating bodies, ICANN and IANA should play the role of infrastructure enabler. It is generally not thought that their role should be that of an infrastructure provider; this provider function is already established by the constellation of root servers under the Root Server Consortium. Whoever performs the coordination role for what goes into the root server system should simply ensure non-collision of name spaces, effective administration and facilitation, rather than act as a bureaucracy, a gatekeeper or a controller.

The role of the language community and its chosen representative body which can be identified as responsible for a particular language/script namespace would be to determine what names go into that namespace.

Therefore local language communities and the international organizations that represent them will likely be the appropriate authorities to manage and administer the namespace for a particular language based on standard internationally acceptable practice.

Language

For the Tamil language, the situation is fairly clear-cut: only one language community uses the script. The distinctive feature is that no single sovereign country "owns" the Tamil script. Instead, Tamil speakers and Tamil script users are scattered throughout the world in the so-called Tamil diaspora. Nevertheless, the source and home of the Tamil language is still the state of Tamil Nadu, India, as is almost universally recognized.

The Tamil situation is less complicated than that of Chinese characters, which are used by different language groups: Chinese, Japanese and Korean. As with the Chinese language communities of China, Taiwan and Hong Kong, which have a similar diaspora, Tamil speakers may be found in local communities in many countries.

In response to the need to manage our own destiny in terms of Internet and computing technologies, we have created the INFITT and designated it as the body to deal with such issues as the latest technology offering of Tamil domain names, amongst many other things.

V. NON ASCII TLDS – THE KOREAN PERSPECTIVE

Response from Kilnam Chon

After two years of study on Korean (domain) name space, we are coming up with the following recommendations for the name space architecture and design. Detailed information will be available in the workshop proceedings, the meeting reports, and RFC-KR, the standard document for .kr.

"ccTLD"

We would have one each for South Korea(.kr) and North Korea(.kp).
We may develop regional TLDs for large Korean communities outside Korea.

Language Code TLD

We would have .wuri (a Korean word meaning "we") to represent .ko (or .kor or .korean), the language code for Korean.

"gTLD"

We would start with the following set:

.kiup(~ .com)
.hoesa(~ .com)
.hakkyo(school)
.danche(~ .org)
.net (= .net)

We may add many more gTLDs if we "run out" of name space in future.
We need to find out how many "decent" names each TLD needs to support; this number would be different from that for ASCII TLDs.

Korean Language Community

We need the "blessing" of, or consensus within, the Korean language community, which numbers around 75 million people, of whom .kr would represent 45 million. This consensus development process would be time-consuming. Uneven Internet development among different regions would require additional consideration.

Implementation

The implementation of the above Korean name space in the domain name space and the non-domain name space would require additional consideration, which we have to investigate and discuss with the relevant parties.

Testing is the obvious first step, as we need to validate our name space concept, and this will require substantial effort. We are currently testing korean.kr, as well as developing many related standards for the Korean language.

VI. A CASE FOR IDN COMMITTEE PROCESS MODIFICATION

Response from Roger J. Cochetti, Senior Vice President & Chief Policy Officer, VeriSign

VeriSign is pleased to comment on the IDN Committee's April 16th Discussion Paper on Non-ASCII Top-Level Domain Policy Issues, and we look forward to continuing to work with the Committee in this important area. As the Committee knows, VeriSign has been a strong supporter of efforts to make electronic identity services, including Internet domain name and other services, friendly to people whose native language does not use the ASCII character set. We are also a global leader in many efforts to make the Internet more friendly to people who use non-ASCII-based languages. These efforts are important because in the 21st century, to be competitive and successful, nations, linguistic groups, peoples, and individuals will need to be network-empowered, and this empowerment should not be reserved for those whose language uses ASCII characters.

An important part of the global effort to network-empower those whose language does not include ASCII characters is making the Internet more friendly to non-ASCII users. And an important part of the effort to make the Internet more friendly to non-ASCII users is making the domain name system more friendly to non-ASCII users. The most important, and by far the most complex, part of this task lies in making Top-Level Domains more friendly to non-ASCII communities.

Regardless of anyone's preferences, the Top-Level Domain space today is ASCII-based and is designed for ASCII characters. Any effort to change that structure involves enormous and inter-related technological, legal, operational, political, cultural, intellectual property and economic issues. VeriSign is committed to helping address these issues because the benefits of doing so are important, but no one should underestimate the complexity involved. The technical standards issues alone have been the subject of intense expert discussion for several years and are likely to remain under active discussion for some time to come.

In this context, we offer two comments on the process before the Committee:
First, few believe that the myriad of inter-related issues that need to be addressed before non-ASCII-based TLDs are introduced can be addressed effectively in a matter of months. Because of the range of inter-related, complex issues involved, we believe that the Committee should address the issue of non-ASCII TLDs in a deliberate and disciplined way. The Committee should solicit input on specific questions from legal experts, linguistic experts, technological experts, intellectual property experts, cultural and national interests, service providers, and end-users. This process should proceed in a careful and deliberate manner, giving expert groups adequate time to consider each other's input.

Instead, the Committee has posted a comprehensive set of questions on the ICANN Website and asked for initial comments on all questions at once within a few weeks.

We think the goal of reaching comprehensive conclusions in about ten weeks through two Web postings should be reconsidered. We urge the Committee to consider the benefits of soliciting expert input in a deliberate way, and then sharing that input among the various experts. This approach would improve the quality of the Committee's work and aid the consensus development process. As each question is examined, the Committee should not just publish a request for comments on the ICANN Website. It should actively solicit input and analyses from expert technological, legal, policy, linguistic, operator, intellectual property, and other groups. This expert input will help the Committee understand the inter-relationships between various solutions. The added time necessary for a thorough and deliberate process of this sort (compared with an effort to develop a comprehensive blueprint in ten weeks) will be more than made up for by the lasting value of the results.

Second, a useful starting point for such a deliberate exercise could be to take the existing TLDs and try to understand the many issues associated with their extension to non-ASCII character sets. The solution to this complex question could provide a framework for examining other questions, such as the possible addition of altogether new non-ASCII TLDs. In this regard, as the Committee has noted, there are hundreds of TLDs in use today and arguably dozens of non-ASCII character-based languages in use. Simply extending existing TLDs to one non-ASCII character set would raise major, interrelated operational, legal, technical, policy, linguistic, cultural, and other issues. The Committee should start by trying to address the question of extending existing TLDs and then move on to other questions.

For example, the Committee would learn a great deal by pursuing in depth an answer to a fundamental question such as "If the TLD space were to be initially extended to a very small number of non-ASCII character sets, how should the selection of a small number of non-ASCII character sets be made and/or which non-ASCII character sets should be selected?" A question like this one could be submitted to expert technical/standards groups; expert legal groups; expert linguistic groups; expert intellectual property groups; governments; cultural or ethnic groups; ccTLD operators; gTLD operators; and organizations or individuals who are expert in the commercial and non-commercial use of the Internet.

Undoubtedly, the debate over a question like this would be complex, but no more complex than attempting to address an even larger number of difficult questions at once. The benefit of the narrower approach, however, is that to the extent that a solution emerges, significant and lasting progress will have been made, on which further progress can be based. Since the issues that the Committee is examining raise controversial cultural, political, economic, legal, technological, operational and other questions, it is important that the Committee exercise as much discipline as possible in managing its efforts and rely as much as possible on objective criteria and expert input.

We look forward to working with you in that effort.

VII. A CASE FOR NON ASCII TLDS SEMANTICALLY LINKED TO GEOGRAPHY ONLY

Response from Yoshiro Yoneya

The following are my personal thoughts on IDN TLDs. At the moment, I support only the first category (geographic units).

- The introduction of IDN TLDs should be actively pursued.

- Root zone implementation testing of IDN TLDs should take priority, and the classification should be discussed in light of the results of that testing.

- The root zone implementation testing should be focused on technical points of view.
+ Operation of the root DNS servers
+ Operation of the IDN registries
+ Accumulation of know-how on IDN name resolution

- Therefore, registries that meet the following conditions would be appropriate candidates to define IDN TLDs during testing:
+ Intend to cooperate actively with ICANN IDN TLD testing
+ Already register IDNs
    (already have an IDN registration mechanism)
+ Have a clear definition of the language of the registered IDNs
+ Have a name resolution mechanism for the registered IDNs
    (able to map the existing IDN name space under the current TLD
    to the testing IDN TLD)

I think discussion of the other categories can begin after the testing has been evaluated.