*** Disclosure: The following is the output resulting from
transcribing an audio file into a word/text document. Although the
transcription is largely accurate, in some cases may be incomplete or
inaccurate due to inaudible passages and grammatical corrections. It
is posted as an aid to the original audio file, but should not be
treated as an authoritative record.***

 IDN Variant Issues Project (VIP) Update
 Dakar, Senegal
 24 October 2011


 >>DENNIS JENNINGS:   It's 9:15 so we will start.  It's my apologies
that this meeting is on at a time when you didn't expect this meeting
to be on, because there are a lot of changes in the schedule.

 (no audio).

 People's control so we just have to live with it.

 The way we are going to conduct this session is I am going to have
a quick run through the project and where we are.  Sort of the
organizational (dropped audio).

 To update you as to where they are.  We'll take one or two
questions, if we have time, after each presentation, immediate
questions, and we will have a question session at the end (dropped
audio).

 So let me quickly give you an update on the project.  Why are we
doing this project?  Well, there has been a longstanding request from
the user community for IDN TLD.

 (dropped audio) next slide, if you would.

 Next slide, if you would.

 And that gave us the authority to move along.

 Now, what's the situation today?  Well, applicants for -- (dropped
audio).

 Project is two main things.  To create a glossary of terms so that
everybody who is using the terms is using the terms in exactly the
same way.  This is one of the significant challenges in this project
because --

 (dropped audio).

 So what do we do?  We established case study teams based on
community experts.  So this is a community-driven project.  So we
have six case studies teams.

 (dropped audio).

 Their work in early October, and end of September was the deadline,
but most nearly got there, which was a tremendous achievement.  And
the reports have been published for public comment and translations
of the reports are under way.  And there is the URL there.

 (dropped audio).

 So thank you, teams.

 Public comment, please contribute.  The closing date is the 14th of
November.  It's not too far away.  There is the URL.  We do need, the
teams need, the case study teams, need your -- (dropped audio).

 The second phase which we are starting now is called the integrated
issues report phase.  And the third phase, which we haven't sorted
out exactly what it is and how we do it, is the development of
solutions so -- (dropped audio) -- issues that are identified in the
report.

 So next slide.

 We have formed a coordination team who are representatives of the
case studies to be our advisory team.  And there is a team of ICANN
staff --

 (dropped audio).

 -- is advisory to us.  Because this time, next slide, please.  This
time, the report is an ICANN report, assisted by the coordinate- --

 (dropped audio).

 Because we have only studied six case studies, and there are
others, other scripts, language combinations in the world, and more
work is going to be needed.

 So we're hoping that this will provide a guideline for the
development of additional case study reports.

 (Dropped audio).  Issues according to type, whether they are
technical issues or policy issues or other types of issues, and
attempt to prioritize them.  Which issues need to be addressed first,
which can be shelved and maybe not addressed in the first phase.

 (Dropped audio)  Areas, technical area, whatever.  Identify issues,
areas where further study is needed, and document the level of
support for the conclusions.

 So as part of the general --

 (dropped audio)  -- use that conclusion, and again to develop some -
- and I put it in quotes, because I'm not quite sure what the form of
it is, develop some guidelines for additional cases.

 The key stakeholders are -- (dropped audio).

 And of course at the potential TLD requestors and applicants, the
most important people there, the stakeholders in this documentation --
in this activity.

 Next.

 So how are we going to do this?

 (dropped audio).

 As advisory and the coordination team in that role.

 We'll have a Wiki.  The coordination team will meet weekly by
telephone conference.  We had our first meeting on Saturday, an all-
day meeting, and this is --


 (dropped audio).

 Comment closes.  That's the closing date for public comment.

 The case studies teams, the existing case study teams do the
analysis of the comments.  We're not asking the --

 (dropped audio).

 -- to produce the first draft on the 2nd of December.  Notice
there's not much time between these dates.

 We tentatively are planning a two-day face-to-face meeting on the
8th and the 9th of December.

 (dropped audio).

 We're on a tight schedule here.

 We have a last call on the report on the 15th of February.  On the
20th of February we publish the integrated issues report based on the
public comment.

 (dropped audio).

 -- easy to achieve this schedule.

 In addition, as part of this work, in parallel with this, we are
doing a survey of the existing variant implementations.

 (Dropped audio) -- to various people but we want to get as much
information as possible about IDN variant implementations that are
already there as input to the work of the coordination team.

 So how do you stay informed?

 Well, there's -- (dropped audio) -- or have a say or make a
contribution.  So that's how we stay informed.  The community Wiki
where generally, as soon as possible, we publish the material that is
developed, so you could see the materials there.

 (dropped audio) -- in case somebody wants to actually ask me a
question.  I see no hand raised.

 Can I ask the first of our presenters and remind the presenters
that we are looking to get the presentation and some questions done
in the sort of 12 to -- (dropped audio)

 >> SARMAD HUSSAIN:  Thank you, Dennis.  So I'm going to summarize
the discussions we've had -- (dropped audio) -- from Africa which use
Arabic script.  So we are actually looking for -- (dropped audio) 

 But more intricate difference is that, when you write different
letters, so this is an example of the same letter written four times -
- these letters join up (dropped audio) -- the first task which we
had was we really -- the focus of the project was what should be the
issues, what would be the issues related to variant (dropped audio).

 So one of the first tasks which the committee -- the case study
undertook was to actually go through the Arabic code page from
Unicode.  (Dropped audio.)

 Something in finality we do need some feedback from language
communities which are not represented on the case study team.  So we
are looking for some more input.  

 Next slide, please.

 (Dropped audio.)

 Then, and we've had some very extensive discussions on this, there
is how to deal with multipart labels.  So I'm saying multipart labels
because, when we write Arabic script, as I -- (dropped audio) -- but
that's not how the script is written.  The sequence of joined forms
is broken at certain places, and each piece is what I'm referring to
as a part.  But, again, we normally call it -- (dropped audio.)

 As far as the user expectation is concerned, they expect these
breaks to be present in the script.  And they would want to or expect
these breaks to be present as part of the labels as well.

 (Dropped audio.)

 -- at is indicated by the third line.  And we discussed with -- the
possibilities of doing that, which includes possibly hyphens or non-
joiners.  (Dropped audio).

 -- rather than anything else.  And there's some examples in the
rules.  Second set of -- and then there are interchangeable cases
where the glyphs are -- look like -- they don't look like -- look
similar.  But the communities consider them as similar, and they use
them interchangeably.  So that's the third case.  

 And then the final case, the optional case, is vowel marks, which
are sometimes written and sometimes not written, and, therefore,
strings with or without these vowel marks are perceived as the same.

 So next slide, please.

 There was -- and then, again, detailed analysis of these is
available in the report with all the relevant tables.

 These, eventually -- characters and their variants need to be
represented in the form of what is normally referred to as a language
table.  But we are not quite convinced that it should actually be
called a language table because sometimes it refers to more than one
language.  It represents 2 to 3, or 5 to 10 languages in the same
table, or sometimes all languages, so it should be a script table in
that case.

 And then second thing is that it's not really a table.  Table is an
implementation detail.  It actually contains information on how
labels are generated -- variant labels are generated.  So what we are
suggesting is it should be called a label generation policy and
whether there's a table or XML file should be left to implementation.

 And it would contain not only characters and variant level
information but also rules, formal rules on generating labels and any
other meta level information like what languages it represents and so
on.

 We do need an automatic process to enable this.  The reason for
having an automatic -- requiring an automatic process is because
number of variants can be very large in Arabic.  And it's just
probably not a good idea to produce all these variants manually.
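
 To illustrate why an automatic process is needed, here is a minimal
sketch in Python.  It was not presented in the session.  The variant
sets below (kaf/keheh and yeh/Farsi yeh) are assumptions chosen for
illustration, and the names VARIANTS and generate_variants are
hypothetical; a real label generation policy would be defined by the
script community.

```python
from itertools import product

# Hypothetical variant sets, for illustration only. A real label
# generation policy would come from the script community.
VARIANTS = {
    "\u0643": ["\u0643", "\u06A9"],  # ARABIC LETTER KAF / KEHEH
    "\u06A9": ["\u0643", "\u06A9"],
    "\u064A": ["\u064A", "\u06CC"],  # ARABIC LETTER YEH / FARSI YEH
    "\u06CC": ["\u064A", "\u06CC"],
}


def generate_variants(label):
    """Return every label obtained by swapping characters for variants."""
    # Characters without an entry only map to themselves.
    choices = [VARIANTS.get(ch, [ch]) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


label = "\u0643\u064A\u0643\u064A"  # four characters, two variants each
variants = generate_variants(label)
print(len(variants))  # 2 * 2 * 2 * 2 = 16 labels from one short string
```

 The count multiplies with every character that has variants, which
is why listing them by hand does not scale.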

 And then there is also an issue of whether the variants should be
character level or position level.

 Again, this is something which is -- both are possible but we need
to really decide on this as a community for TLDs.

 One more thing which came up regarding variants -- next slide,
please -- was that variants can actually be in many states.  So
variants are, first of all, identified as a set.  Some of them can be
allocated; others can be reserved or blocked.  Then those which are
allocated can be activated.  And within activation some are delegated
and some are just mapped.  And, of those which are delegated, one is
going to be the fundamental label.  So it's slightly confusing to see
how this happens.  And then, once you're in operation, over time you
would want some reserved variants to be activated and some activated
variants to be blocked.  So we really don't know what all the
possibilities are, and we really need to figure those out and
eventually have an implementation policy around that.
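
 The states described here can be sketched as a small state machine.
This Python sketch was not presented in the session.  The state names
follow the talk, but the set of allowed transitions is an assumption,
since the speaker notes that the full set of possibilities is still
unknown.

```python
from enum import Enum


class State(Enum):
    IDENTIFIED = "identified"
    RESERVED = "reserved"
    BLOCKED = "blocked"
    ALLOCATED = "allocated"
    DELEGATED = "delegated"  # activated, appears in the zone
    MAPPED = "mapped"        # activated, mapped to another label


# Assumed transitions, for illustration only.
ALLOWED = {
    State.IDENTIFIED: {State.ALLOCATED, State.RESERVED, State.BLOCKED},
    State.RESERVED: {State.ALLOCATED, State.BLOCKED},
    State.ALLOCATED: {State.DELEGATED, State.MAPPED, State.BLOCKED},
    State.DELEGATED: {State.BLOCKED},
    State.MAPPED: {State.BLOCKED},
    State.BLOCKED: set(),
}


def transition(current, target):
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} not allowed")
    return target


# A reserved variant that is later allocated and then delegated.
state = State.RESERVED
state = transition(state, State.ALLOCATED)
state = transition(state, State.DELEGATED)
print(state.value)
```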

 Next slide, please.

 WHOIS lookup is also going to be an issue.  Because, when you have
variants at multiple levels, you could have a very difficult
situation when somebody has a set of variant labels in the domain
name and is looking it up.  One of the solutions we've recommended
was that anything which does not have the fundamental label sequence
in it should be mapped to a fundamental label.  And, when you
actually call in the fundamental label, it should give you the WHOIS
information.  Next slide, please.
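
 The recommended lookup can be sketched as follows.  This Python
sketch was not presented in the session, and every label, record, and
function name in it is a hypothetical placeholder: a variant is first
mapped to its fundamental label, and the WHOIS record is stored only
under that label.

```python
# Hypothetical data, for illustration only.
FUNDAMENTAL_OF = {
    "variant-a": "fundamental",
    "variant-b": "fundamental",
}
WHOIS_RECORDS = {
    "fundamental": {"registrant": "Example Registrant"},
}


def whois_lookup(label):
    """Resolve a label to its fundamental label, then fetch the record."""
    # A label with no mapping is treated as its own fundamental label.
    fundamental = FUNDAMENTAL_OF.get(label, label)
    return WHOIS_RECORDS.get(fundamental)


print(whois_lookup("variant-a") == whois_lookup("fundamental"))
```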

 We also discussed whether additional variants should actually have
additional fees.  And we really thought about it and -- you know, the
basic question was whose responsibility is it.  And, if you're
talking about the gTLD process, the variants are not there because
they are a requirement for business.  They are, actually, in a way an
accident of encoding and linguistic traditions.  So it's not the gTLD
businesses which are responsible for variants but how Unicode has
encoded Arabic script and how Arabic script generally works.

 So we do recommend that, in the interest of the users, there should
not be extra fees for label variants.

 Next slide, please.  

 And there are, of course, end user issues.  Once variants are
enabled, there are going to be variations in keyboards, depending on
whether you're in Iran or Pakistan or Saudi Arabia or Egypt.  Those
kinds of issues come up.  There are font variation issues.  There are
display variation issues.  There's a host of other issues which we
need to look at on the application layer and operating system layers
to resolve in order to implement these.  And that information needs
to be passed on to the relevant communities to react.  Next slide,
please.

 So that's sort of an overview of what we went through.  As I said,
details are available in the report which is published online.  This
is just to acknowledge the people who were part of this effort.  I'll
not go through the names.  These are posted online.  So thank you
very much.  And I'll take any questions, if you have any at this time.

  >>DENNIS JENNINGS:  So, Sarmad, thank you very much.  Are there
any immediate questions right now?  We will have a question and
answer session at the end.  Is there a question that someone wants to
come up and ask right now?  Let me repeat what Sarmad said.  I've
just been indicated we have -- excellent.  We have a remote
participant question.  Naela.

 >>NAELA SARRAS:  Okay.  We have a question from Mr. Raymond Doctor.
And he's still typing.  So he's saying, "I don't understand why a
keyboard is an issue.  The keyboard is only a means of inputting, and
different keyboards will and should map to the same storage."

 >>SARMAD HUSSAIN:  So I didn't get your question completely, but
the gist of it is why the keyboard is an issue, and I can respond to
that.  One simple issue is that -- I'm just going to -- if you can go
back to the second slide.  Yeah.  This one.  Next one, next one,
please.  Slide number -- sorry.  Slide number 5.

 Okay.  So, if you look at the top two rules, you see what look like
exactly the same sequences.  But there's a "not equal to" sign there,
which means that, if you're in Saudi Arabia, the Unicode sequence
your keyboard is probably going to type is what you see on the left
side of that "not equal to" sign.  If you're in Pakistan and you type
the same thing, you get what is on the other side of the "not equal
to" sign.  So for a user, when they type it, they'll look exactly the
same.  If variants are not enabled, one will resolve and one will not
resolve.

 And that's because the encodings to which these characters are
mapped differ from keyboard to keyboard.  And the keyboards we use in
Pakistan versus the keyboards we use in Saudi Arabia are different.
So users -- because they look exactly the same, users
will not be able to identify why it's not working.  But it's not
working.  It's not resolving.  So, if the variants are not there --
if variants are there, they'll be mapped on to each other or both
will resolve.  If variants are not there, user will think that
Internet is unstable.  It works in Saudi Arabia and stops working
when you go to Pakistan.  So that's where the keyboard issue comes in.
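
 The effect can be reproduced with a short Python sketch, which was
not presented in the session.  It assumes, for illustration, that one
keyboard emits U+0643 and the other U+06A9 for a letter that looks
the same at the start of a word; the sample word and the variable
names are also assumptions.

```python
import unicodedata

# Assumed for illustration: one keyboard layout emits U+0643 and
# another emits U+06A9 for a letter that looks the same word-initially.
kaf = "\u0643"
keheh = "\u06A9"

for ch in (kaf, keheh):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The same word typed on the two keyboards: only the first code
# point differs (then TEH, ALEF, BEH).
label_a = kaf + "\u062A\u0627\u0628"
label_b = keheh + "\u062A\u0627\u0628"

# The strings differ, so their encoded forms differ too, and without
# variants only one of them matches the registered label.
print(label_a == label_b)
print(label_a.encode("punycode") == label_b.encode("punycode"))
```

 Both comparisons print False: the labels look alike to the user but
are different strings to the DNS.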

  >>DENNIS JENNINGS:  Thank you.  We have a question here, please. 
I'm just going to take this question, and we'll leave other questions
for later.  But please go ahead.

 >> My name is Hamid Zarandi (phonetic.)  I'm actually the ccTLD
operator for dot Emirate, Arabic IDN domain names.  

 I went through the report.  And I would like, first of all, to
thank you for the great effort you've done to identify all the IDNs
issues, which are actually very important for the end users.  

 For us, we'd like to emphasize on the importance of adopting .3.3b,
which is the case of interchangeable case, to be considered as a very
valid and solid variant case.  

 Speaking of the Arabic language in particular, alternative strings
are a common practice within the Arabic Internet user communities.
Although the condition for using alif with hamza above or below is
well-defined in Arabic grammar, linguistically known as hamzat qat'
or hamzat wasl, that condition is not widely implemented by the
Internet community at large.  So users, on many occasions, use their
own characters due to a lack of grammar awareness or simply for the
sake of typing simplicity.  So they're using the alif without hamza.
This is a cultural typing behavior which, unfortunately, we cannot
change.  So I would also like to emphasize this point.

 The second point, which is within the same scope, is that the
interchangeable case is mentioned properly in the document, but we
don't see any appendix which describes the table.  So it is left
undefined what the variants of the letter alif are in terms of
Unicode code points, unlike other cases such as the similar cases or
the identical cases.  So we'd suggest adding an appendix which
describes the interchangeable case in the report.  Thank you very
much.

  >>DENNIS JENNINGS:  Thank you very much, indeed.  Will you submit
that as a comment in the public forum?  Thank you very much, indeed. 
Thank you for those questions.  We'll have more questions later on.

 I'd like to move on to our second presentation on the Chinese case
study report.  And James Seng is going to give us an update.

  >>JAMES SENG:  Good morning.  Good morning.

 Okay.  I'm representing the Chinese case study group, case study
team on the variant group.  The team comprises many members and is
led by Lee Xiaodong.  Unfortunately, he's still on his way here, so
I'm giving the presentation on his behalf.  

 First, on the scope of work of the team, the team focused on the
Chinese variant level and only at the top level.  But we do consider
-- understand the impact to the lower level.  And we reflected that
in the report.  But the priority and the issues that we consider
mostly focus on the top level.

 We also look at certain user expectations on how top level and the
variants is going to work at the top level.  And we discuss that in
the later slides.

 While we focus on Chinese variants, Chinese is a very tender word
(indiscernible) -- in the sense that Chinese characters are also
being used by Japanese and Korean.  So for the working group team we
include both experts from Japan and Korea.  We include them, so they
can give us -- provide feedback on the implication of our variants in
respect to their language.

 And then, lastly, we focus on Unicode.  There are many characters
in Chinese that are still not included in Unicode.  But we only focus
on those that have been included in Unicode.

 So, as I mentioned, Chinese characters are used across Chinese,
Japanese, and Korean.  In Chinese we call it Hanzi.  In Japanese it's
called Kanji.  And in Korean it's called Hanja.  Technically, it's
called the Han script, not a Chinese script.  And in Unicode we call
it a CJK unified ideograph.  In Unicode there are about 70,000
encoded ideographs at this moment.  In the BMP it's about 2,000 --
26,000; and in extensions A, B, C, D we have an additional 44,000.

 The Han script itself is -- when we say ideograph, it means that
these are actually graphic symbols that represent an idea or concept.
It evolves from pictograph over time.  Like cavemen we draw tiny
hills, and that evolves into a Chinese character called a hue, as you
see from the first example.  The second characters actually evolve
from the sun, which evolved to the last one, which is what we call
group, which is the Chinese concept for sun.

 But Chinese characters can be very complicated with many strokes.
And that imposed a restriction on education, literacy and so on.  So
in China in 1964 and 1986 they introduced the concept of simplified
Hanzi.  About 2,200 simplified Hanzi have been introduced, through
various processes as defined in the 1986 report, which simplifies
components -- complex components.  And then, through the complex
components and simplified radicals, you combine them to form
simplified Hanzi.

 So this resulted that we have -- in Chinese we have two writing
systems but using one script.  Both traditional and simplified
writing systems are actually one Han script within the Unicode. 
Unfortunately, they're not 1:1 in the simplification process, as
example given on the left-hand side.

 The concept of prosperity and the concept of hair in the
traditional characters are simplified to a different -- to the same
character in the Unicode, in the process for Chinese simplification.
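
 The many-to-one mapping in this example can be shown with a short
Python sketch, which was not presented in the session.  The two
traditional characters are the ones from the example (prosperity and
hair); the dictionary and function names are hypothetical.

```python
# Two traditional characters that simplify to the same character.
TRAD_TO_SIMP = {
    "\u767C": "\u53D1",  # prosperity (fa) -> simplified fa
    "\u9AEE": "\u53D1",  # hair (fa)       -> simplified fa
}


def invert(mapping):
    """Group traditional characters by their simplified form."""
    inverse = {}
    for trad, simp in mapping.items():
        inverse.setdefault(simp, set()).add(trad)
    return inverse


SIMP_TO_TRAD = invert(TRAD_TO_SIMP)

# One simplified character has two traditional candidates, so going
# from simplified back to traditional is ambiguous.
print(len(SIMP_TO_TRAD["\u53D1"]))  # 2
```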

 Japanese also use what they call Kanji, which were imported from
China.  This is in addition to hiragana and katakana, which are the
phonetic alphabets that they use in Japan.  And Japanese also has
simplification.  They call it shinjitai, which is derived from
kyujitai, which is the old character form.

 For those who understand Chinese, you see some of this
simplification process that was done in 1923-1949 looks very similar
to Chinese simplification.

 The answer is, yes, it is very similar. Because the Chinese
actually adopt some of this simplification process in the 1964
process.  But they're not exactly the same.  So they introduce
simplification process, but it's not exactly the same.  Both
shinjitai and kyujitai are being used interchangeably, to a certain
extent, in Japan, except in the case where it's being used in nouns
to describe names and places.  And, because domain name normally
generally refers to names, therefore, the new and old form are
considered distinct in the domain name.  And this has been reflected
by the dot JP policy in dealing with Kanji.

 Korean also uses the Han script.  They call it Hanja.  This is in
addition to Hangul.  Modern Korean uses Hangul.  It's a writing
system designed by King Sejong in the 15th century.  But it's a
phonetic script.  It is very (indiscernible), but it's phonetic.
Whereas Hanja was introduced from China in the Tang dynasty, and that
was being used mostly to learn classical Chinese.  Most Korean people
still have a Chinese name, although they don't use it much anymore.
And effective April 14, 2011, there is legislation saying that all
government documents can only be written in Hangul, and Hanja is not
allowed unless by presidential decree.  This is one step by which
they are trying to pull away from the use of Hanja within the Korean
community.

 And, therefore, if you look at dot KR, the policy for dot KR, they
do not allow Hanja in the registration.

 As a side note, there are actually simplification process within
the Korean for the Han script, and it's called Yakja, which is also
in Japanese, by the way.  

 So in our group we cover CJK, and what we define as Chinese
variant.  We define variant in a very narrow form.  We define it as
characters with different visual form, that look slightly the same or
maybe totally different but they have the same pronunciation with the
same meaning, and therefore can be used interchangeably within the
community.

 So what is meant by Chinese variants is the simplified and
traditional writing systems.  And in Unicode, there's a certain
aspect called Z-variant, which is characters that look almost the
same, but which (indiscernible) unified within Unicode, but because
of a certain technicality, round-trip compatibility with the local
encodings, they are assigned different code points, like the hu
(phonetic) and the huang (phonetic) which is (indiscernible) shown in
the slides.

 We do not believe the spelling form or the form of the characters
is a variant and we don't believe that translation, transliteration
or using the (indiscernible) are considered variant.

 So when we talk about Chinese user expectation: as I mentioned,
Chinese users use simplified and traditional as equivalent and
interchangeable.  And from our data from dot zhongguo (phonetic), in
both traditional and simplified, the data has shown that we have DNS
queries to both of them, not on code basis, obviously.  Chinese use
mostly simplified, but about 10% of the DNS queries are coming for
the simplified -- the traditional version of China.

 If you look at variants in the zone files from CNNIC, TWNIC, and
HKNIC, between 70 and 80% of the registered IDNs have variants.
So variants are very important to Chinese, and without variants, it
causes a lot of problems and confusion within the community.

 Because Han script are still heavily used between Chinese and
Japanese, and Korean has more or less opt out within the process, so
if we look at certain implication for the Chinese and Japanese
community if they use variant at the script level.

 At the script level, if you do at the second level, we have the top
ccTLD or the top level as certain contextual indicator, like (saying
word) which is (indiscernible) and the same character with variant
under dot CN is expected to be a variant and be delegated to the same
registrant.  But in Japan the same characters are considered distinct
and given two characters, although they are actually (indiscernible)
within the Japanese community.

 Unfortunately when it comes to top level there is no contextual
indicator, and therefore, we went through a long debate and we both
agreed there should be a more conservative approach.  And we believe
that we should -- let's say someone apply for the association in
Chinese, would result in multiple variants, six or seven variants to
be reserved.  That's okay.  It doesn't make a lot of sense for the
Japanese, for most of them.  It doesn't make sense for Chinese most
of the time anyway.  But one of which is traditional, and we would
like that to be delegated, too, as part of the process.

 So other than that, we went through the various issues on the
implementation for the variants.  And because consideration was a
language variant table, we believe there needs to be a Chinese
variant table for the root, and we believe that it should be based on
language, but that's under consideration and these are issues that we
need to discuss.

 There's also consideration of the process by which we define these
variant tables, and what format and standard we adopt for the
variant tables.

 We also look at certain processes for evaluation, allocation,
delegation and operation of the Chinese IDN variant at the top level.
How do we deal with string similarity, conflict with geographic
names, discovery of variants.  How to deal with contention.  What if
application A and application B collide because of a variant
collision.  What if application A collides with a variant of
application B but application B doesn't really care about the
variant.  These are cases we need to look at, and how we resolve
those issues.

 We look at allocation issues and how we actually activate and
delegate domain names, how we reserve domain names, how we block
them.  And we look at the delegation process.  We also consider what
the technical manner of delegation should be, whether it is CNAME,
DNAME, or NS.  And then, of course, like the Arabic, we look at the
fees involved in the application.

 We also consider certain impact to the other organization, that
includes IANA, root server, registrars, registrants, DNS providers,
and software application.  And the idea of IDN variant at the top
level has various implication and consideration for them that they
need to implement.

 And roughly, that concludes my presentation on the Chinese case
study.

 >>DENNIS JENNINGS:   James, thank you very much, indeed.  Are there
any immediate questions?  I think I see a questioner.

 It's on.  Yes, it is.

 >>KENNY HUANG:  Okay.  Thank you, chair.  My name is Kenny Huang,
board member of TWNIC and executive council of APNIC.

 I only have two questions.  My first question in the previous
presentation from Arabic study, there is a recommendation, there is
no extra fee for variant registration.  But I didn't see the
recommendation from Chinese study group, so probably you can comment
on this a little bit.

 The second question will be I know there's some other extra work,
such as RFC 3743, also James and I also contribute a little bit on
the RFC, and (indiscernible) mention about technology and
registration policy and process.  I didn't see the registration
policy was mentioned in the Chinese, in the reorganization.  For
example, registrant policy.  So can you also comment on that.

 Thank you.

 >>JAMES SENG:   Okay.  So on the first question, the reason it's
not been clear in my presentation is this is meant to be an issues
report, not recommendation report.  So we identify issues, and then
the issues will be posted to the community for comments.  And then
later they will come up for recommendation.

 But within the group, we do have certain discussion and it's been
reflected in the document.  We take very similar position with the
Arabic which is at the top level we do not believe that there should
be additional fee for the variant because it's not at any fault of
the applicant that they have variants.  But that's our initial
discussion and consideration.

 As for the second question on the standard that's been adopted,
Chinese has been working on this for the longest time.  We have had
RFC 3743 for eight years now, with its implementation by China,
Japan, Korea.  And then we have a more well-defined RFC 4692 -- 4691,
I think -- correct me if I'm wrong -- that looks at Chinese
registration specifically.  3743 deals with Chinese, Japanese, and
Korean, whereas China does the 4691.

 And later on there is another RFC that takes the concept from 3743
and then puts it into a more well-defined form that cuts across
different.

 And because of the different standards that's been adopted, they
are not -- although they are not in contention with each other, and
of course the current RFCs is more or less suitable for Chinese
already, but there are also consideration from ICANN that there needs
to be more generic that cuts across different language.

 So we need to look at how the tables are going to expand or take
into consideration the language like Arabic and then how do we come
up with a unified, central format.

 So this is a consideration, again, and no recommendation at the
moment.

 >>DENNIS JENNINGS:   Thank you for that.  And we will have time for
more questions later.

 In case it's not obvious, we are going through the six case studies
in alphabetic order by the name of the script, so just to clarify
that.

 So the next one after Arabic and Chinese is the Cyrillic case.  And
may I ask Vladimir Shadrunov to present his case to the board.

 >>VLADIMIR SHADRUNOV:  Dennis, thanks.  Thank you all for coming
here.  My name is Vladimir Shadrunov.  I am a member of the (dropped
audio) for the VIP project.

 I am just going to turn the slide, please.

 On the screen you will see the list of members that constituted the
Cyrillic case study team.  People from various
backgrounds, registries, registrars, ccTLDs, gTLDs, academic
environment, linguistic.  We had good support from ICANN staff and
external experts in the protocol level and the registry operations.

 The team has met since Singapore.  We met weekly, and we also had a
meeting of two full days in Paris in September to finalize the report.

 I'd like to talk a little bit about what distinguishes Cyrillic
script among other scripts.

 First of all, Russian is -- constitutes about 60% in terms of
number of speakers, but apart from Russian language, there are about
60 other languages that use Cyrillic and they all come from various
language groups.

 Therefore, although the team had quite a good representation in
terms of speakers of different languages, it was not physically
possible for us to cover all the languages that use Cyrillic.  This
is one of the issues we found.

 The feature number two about Cyrillic is that there is nothing in
the script itself that can provide some form of inherent variant
relationships between characters.  It's unlike in Chinese, for
example, where they have traditional and simplified.  There is no
such situation in Cyrillic.

 We do have variant cases, but they occur only on the language
level.  What's interesting about that is that variants that occur in
one language, variant characters that you can see in one language,
they do not have the same variant relationships when they occur in
other languages.

 That's something -- That's an issue where how to deal with it should
be considered further.

 Next slide, please.

 The very important issue with Cyrillic is that there is a vast
array of characters that are visually identical or very similar with
characters from other scripts, such as Latin and Greek.  You will see
a few examples on the screen.  In fact if you go to Wikipedia and
look up homographic attack, you will see the very first example of
homographic attack was made up by replacing a Latin character with a
Cyrillic character in a well-known trademark.

 So in the report, you will see we tried to come up with an
extensive list of characters that may be visually confusable.
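
 As a rough illustration of the homograph problem described above,
the following Python sketch compares a Latin label with a look-alike
that swaps in one Cyrillic letter.  The label "example" and the
helper scripts_in are hypothetical, and guessing the script from the
first word of the Unicode character name is only an approximation of
the real Unicode Script property.

```python
import unicodedata


def scripts_in(label):
    """Guess the scripts used in a label from each character's Unicode name.

    Assumption: the first word of the character name (LATIN, CYRILLIC,
    GREEK, ...) stands in for the script. This is a simplification.
    """
    return {unicodedata.name(ch).split()[0] for ch in label}


latin_label = "example"
# Same appearance in many fonts, but U+0430 CYRILLIC SMALL LETTER A
# replaces the Latin "a".
spoofed_label = "ex\u0430mple"

print(latin_label == spoofed_label)        # the strings are not equal
print(sorted(scripts_in(latin_label)))
print(sorted(scripts_in(spoofed_label)))   # a mixed-script label
```

 A check like this only flags mixed-script labels.  It would not
catch a label written entirely in Cyrillic that happens to look like
a Latin word, which is why a list of confusable characters is still
needed.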

 Next slide, please.

 We identified some other cases where variant issues might arise,
such as YEH and YOH in the Russian language, and a few others, which
I will comment on separately.  And our observation, one funny
observation, was that the Cyrillic space seems to be a little bit
more politically charged than other spaces.  And this might have some
consequences, such as spelling reforms occurring more often than in
other languages, other scripts, with new characters appearing in the
languages.  For example, the Ukrainian language has GHE, a character
called GHE, and another character called GHE with an upturn.  This is
something specific to the Ukrainian language, but a Russian language
speaker would probably not even recognize the difference between
those two characters.

 There is another interesting situation.  In the Ukrainian language,
they have a character called apostrophe and that apostrophe is
different from the apostrophe that you can find on most keyboards. 
It's a different Unicode codepoint.

 So in terms of variants, in many cases, this apostrophe can just be
omitted.  So it's kind of a variant with nothing.  Variant with zero.

 And the Ukrainian Internet community indicated to us that this --
it is very important for us -- for them that the apostrophe letter is
included and is part of the domain labels.
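
 The difference between the two apostrophes, and the idea of a
"variant with zero," can be sketched in Python.  This is a minimal
illustration and not the team's method: it assumes the character
meant is U+02BC (the codepoint named later in this session), and the
sample word and the helper names are hypothetical.

```python
import unicodedata

KEYBOARD_APOSTROPHE = "\u0027"  # APOSTROPHE, found on most keyboards
LETTER_APOSTROPHE = "\u02bc"    # MODIFIER LETTER APOSTROPHE (assumed here)


def describe(ch):
    """Return the codepoint, Unicode name and general category of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


def zero_variant(label):
    """Return the label with the apostrophe letter omitted (the 'variant with zero')."""
    return label.replace(LETTER_APOSTROPHE, "")


word = "\u043f\u02bc\u044f\u0442\u044c"  # hypothetical sample label

print(describe(KEYBOARD_APOSTROPHE))  # punctuation (category Po)
print(describe(LETTER_APOSTROPHE))    # a letter (category Lm)
print(zero_variant(word) != word)
```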

 So some conclusions that we came up with.

 First, the current ICANN policy is that visually confusable labels
should not be ever delegated.  We think because of the number of
situations where Cyrillic and -- Cyrillic characters and characters
from other scripts are identical, we think that it's very important
that this policy should be preserved with ICANN, and visual
confusability test criteria should be maintained.

 The second conclusion, we recommend reserving all other variant-
related cases.  What do I mean by reserving?  It means that where two
labels are in variant relationships, if -- if the variant label is
ever delegated, it can only be delegated to the same registry
operator.

 The third conclusion.  We did not identify any cases where parallel
delegation would be an absolute must from the user experience
perspective.  That simplifies the Cyrillic case a little bit.

 So some of the team members had the opinion that it could be a good
idea to provide some kind of aliasing, an alternate names approach,
but the general sense of the group is that it is not an absolute
requirement.

 As variant cases occur on the language level only, we think that to
account for all the cases with variants, a root variant table will be
needed.

 The last conclusion is that we recommend that certain codepoints
that are not a part of the Cyrillic script in Unicode should be
allowed, specifically allowed, for Cyrillic TLD labels.

 There are two characters that share that feature.  That would be
the Ukrainian apostrophe that I mentioned and the one diacritical
mark combining a (indiscernible).

 I think that's it, and if you have any questions, do not hesitate
to ask.

 >>DENNIS JENNINGS:   Thank you very much, indeed, and bang on ten
minutes so that's excellent.

 Any immediate questions?

 Okay.  There will be an opportunity to ask questions later on, so
thank you, Vladimir.  Let's move on to the Devanagari case and we
have Akshat Joshi to present the case.

 >>AKSHAT JOSHI:  Thank you, Dennis.  First of all, let me thank
ICANN for inviting us to become part of the Devanagari case study
team.  And with the background of the IDN project that was -- we were
already working on with the Department of Information Technology, we
could definitely participate well.  This was also a very enriching
experience for us.

 Next slide, please.

 The slide which you are seeing currently is the Devanagari case
study team.  You can see we are a fairly large team with expertise
from different areas.  Our case study coordinator, Dr. Govind, could
not be present here physically to present this so he sends his
apologies.  I am Akshat Joshi.  I am a team member of the case study
team.

 Next slide, please.

 Let me take you through the structure of the report.  So these are
the main points.  First one is the basic postulates.  The basic
postulates are the main assumptions which we have made while
designing this report.

 Let me give you a brief background about this, why we had these in
the first place.  When we created the first draft report, and we had
a first face-to-face meeting back in (saying name) after this case
study started, the report was very elaborate.  It had many issues
listed in it.  But in the face-to-face meeting, we came to know that
there can be many things which can be excluded, because there are
certain restrictions that IDNA 2008 itself imposes, then there are
certain restrictions that the gTLD Applicant Guidebook would impose. 
And with that in mind, to keep the main points in focus, we
introduced these basic postulates.  So some of them are like Unicode
has been taken as a base.  So Unicode, with all its normalization
rules, NFC, NFKC restrictions, then only those cases which are not
excluded by this will be considered.  And some of these points form a
part of the basic postulates.
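
 One way normalization removes a case from consideration can be shown
with Python's standard unicodedata module.  This is only a sketch:
the choice of the letter QA as the example is an assumption made for
illustration and is not taken from the report.

```python
import unicodedata

precomposed = "\u0958"     # DEVANAGARI LETTER QA (single codepoint)
sequence = "\u0915\u093c"  # DEVANAGARI LETTER KA + DEVANAGARI SIGN NUKTA


def same_after_nfc(a, b):
    """Return True if two strings are identical once both are in NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == sequence)                # different codepoint sequences
print(same_after_nfc(precomposed, sequence))  # one form after normalization
```

 Because both spellings collapse to the same normalized form, a pair
like this does not need to be listed as a variant. Only distinctions
that survive normalization remain to be analyzed.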

 And then that is followed by overview and evolution of Devanagari. 
Some of you might be aware Devanagari is a complex script and it has
characters which change shape when they are joined.  So what happens
is Devanagari is not restricted like Latin, which has only 26
characters or basic characters; Devanagari has a fairly large Unicode
table.

 So what happens is with that (indiscernible) set of characters,
even after imposing IDNA restrictions and gTLD Applicant Guidebook
restrictions, the character set that remains is really very large. 
And when they join, the number of shapes that can be formed becomes
too large.  So the analysis becomes really rigorous in this sense.

 So there is a brief overview of how Devanagari is formed.

 And there is a brief sketch of the writing system of the language. 
There is a concept of syllable in Devanagari, so that concept has
been elaborated in that part.

 Then issues and extraneous issues will be taken up in successive
slides so I will skip them for now.

 Then we come to a main part, the registry and registrar perspective. 
Over here, there are issues like at the registry we have something
like the EPP protocol to be obeyed and there are certain issues that
can come up in view of the variants.  It's not a core part of the
variants but, yes, these issues are important so they have been
listed over there.

 Then we have a list of appendices which have more detailed
information in them, in connection to the main points of the
(indiscernible).

 Next slide, please.

 So moving on to the part on the issues, the main issues at large. 
First one is the language versus script.  Devanagari is a
Brahmi-based script; there are many scripts that come under the
Brahmi family, and Devanagari forms one of those scripts.

 So what happens is in Brahmi-based languages, we have something
like one language can be written in multiple scripts.  Also, one
script can cater to multiple languages.

 So these are the things that need to be considered when we are
viewing it in terms of issues.

 Then there comes the part on variants, which will be taken up again
in successive slides.  Issues related to software behavior will also
be taken up.  We have whole-script confusables.  These have been
separated from the variants part, even though they are part of
variants, because these confusables arise from the introduction of
some other script, so they have been taken up separately.  And then
we have the case of the 02BC character, which is called the modifier
letter apostrophe.  This particular character has two issues
associated with it.  One is in relation to variants.  The first one
is that this character comes from a different code page from that of
Devanagari, which is 0900.

 So the script property of this particular character is Common.  And
there is the restriction that script mixing cannot be allowed.  This
character, first, does not come with the languages which are written
in Devanagari (indiscernible) near this character.  So if a decision
is taken so that this character gets included, then the issue comes
up that it looks like an apostrophe.  So we just need to look into
that.

 Next slide, please.

 Moving on to the main variant classification, we have two broad
categories, and those are the confusingly similar single characters
and confusingly similar composite characters.  So this is the
category which has single characters which look alike, and given the
restriction of the URL bar of the browser, so to say, you can see
that these characters can be confused with one another.  But in dot
IN policy in India wherein we are allowing IDNs in Devanagari, we
have not considered this, but that being a second level, the
restrictions can be more restrictive or something like that.  But in
gTLD we need to be cautious, so we have flagged them here as variants
as well.

 Next slide, please.

 Then these are the variants which are the characters which are
composite characters.  So each of the three columns which you are
seeing is a set of three Unicode codepoints, and those characters do
not mean the same.  They do not sound the same, but visually, they
look alike.

 So even a competent user of the language will not be able to
differentiate them.  So they have been given as the variants over
here.

 Next slide, please.

 Then we have miscellaneous issues, the issues which are very much
related to the variants.  Devanagari being a complex script, it
heavily depends on the rendering engine that is backed up by the
operating system and the font that gets applied.  Many times this is
the default font applied by the operating system.

 So the way it looks, you can see the variants that have been
proposed are mainly given on the basis of their look.  So the font
that gets applied on that is really a major thing that needs to be
considered.

 Next slide, please.

 This is the next miscellaneous issue, and it's not categorized
under variants but under miscellaneous issues because it actually
talks about a different script.  That is Gujarati.  Gujarati has not
been part of this particular case study.  It has been taken under
miscellaneous.

 As you can see, the first character, the first word, so to say,
comes from the Devanagari script.  The latter one comes from the
Gujarati script.  The only difference between them is the lack of the
header line or (saying word), as we say.

 So when we see this again in a very small font, they look alike and
can be confusing.  Now some of these characters do form part of
Unicode's confusables list, but a manual check on the (poor audio) of
the confusables list of Unicode shows there are many more characters
that should be identified under the DAG.  So they have been given
here.

 Next slide, please.

 Then there comes an issue of zero width joiner and nonjoiner. 
These are the characters which are given to get particular forms in
the language.  Those are invisible characters.  So the issues
associated with those characters have been flagged in this particular
section.
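
 To make the point about invisible characters concrete, here is a
small Python sketch showing two labels that differ only by a zero
width joiner.  The particular consonant sequence and the helper
without_joiners are hypothetical choices for illustration, not
examples taken from the report.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def without_joiners(label):
    """Remove ZWJ and ZWNJ so labels can be compared on their visible letters."""
    return label.replace(ZWJ, "").replace(ZWNJ, "")


plain = "\u0915\u094d\u0937"           # KA + VIRAMA + SSA
with_zwj = "\u0915\u094d\u200d\u0937"  # same letters, ZWJ after the virama

print(unicodedata.name(ZWJ), unicodedata.category(ZWJ))  # a format character
print(plain == with_zwj)                    # distinct codepoint sequences
print(without_joiners(with_zwj) == plain)   # they differ only by the joiner
```

 The joiner can change the form that is rendered, so the two labels
are not necessarily identical on screen. The sketch only shows that
the difference in the underlying string is a character the user
never sees directly.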

 Next slide, please.

 So here is the link for the report.  You can click on this
particular link and read the report elaborately and we definitely
solicit a good amount of comments on this so we can analyze the
issues better.

 Thank you.  I will answer any questions you may have.

 >>DENNIS JENNINGS:   Thank you very much, indeed.  Have we any
immediate questions for Akshat and the Devanagari case study?

 Yes, I have a question.  Good.  Please come forward to the
microphone.

 >> Hello, I am Sala and I am from Fiji, and the IDNs are very
interesting, and this is coming from someone who doesn't know
anything about IDNs.  But in terms -- I just wanted to ask to the
panelists, in terms of the classifications and the confusing, you
know, scripts and the challenges, are there some sort of guidelines
on how they are going to actually be treated and that sort of thing? 
Just out of curiosity.

 >>AKSHAT JOSHI:  If I understood your question correctly, you are
asking are there any guidelines that talk about how the variants
should be formed?

 >>DENNIS JENNINGS:   Can you come a little closer to the microphone?

 >> Sorry.  Are there any guidelines or are guidelines being
developed in terms -- I understand that people are pioneering and
that sort of thing.  In terms of how it's going to be treated.  I
understood from what the panelists were saying that, you know, there
are certain scripts that are politicized.  I think somebody referred
to it as politically challenged, and that sort of thing.

 So I am just interested to see the guidelines sort of thing.

 >>DENNIS JENNINGS:   Thank you for the question.

 I think I've understood what you have asked, and one of the things
that we're hoping to get out of this are some guidelines for other
case studies.

 We haven't formulated exactly how that's going to be done.  And I
think this addresses your question, that the experience of these six
case-study teams will be documented in such a way that another set of
users in another script can have a methodology for at least beginning
to define what the variants are.

 Do you want to come back and supplement the question?

 >> Yes, thank you.  I take it from your response that it is
relatively going to be a subjective experience based on the different
methodologies.

 >>DENNIS JENNINGS:   No, it's not subjective.

 >> Okay, sure.

 >>DENNIS JENNINGS:   There are -- It's as objective as possible. 
There are characters which are well-known to be variants of one
another.  That is their -- Same character with the same meaning with
a different codepoint is an example of a variant.

 >> Right.

 >>DENNIS JENNINGS:   And out of these case studies, there will be a
number of definitions for variants which will be a guide for whatever
script you are concerned about.

 >> Right.  And so that answers my questions about the confusingly
similar script perspective that the panelists addressed.

 The other question I have very quickly before I take my seat is in
terms of -- like somebody mentioned in another forum about the word
Kong.  Kyong, Khong, Cong, Kong, Kong, Kong, Kong, and how you can
have different languages but with the same phonetics.  And would that
be something that would be woven, as well, into the guidelines?

 >>DENNIS JENNINGS:   That's a very interesting question which I am
not going to answer but I will invite the panel to think about that
and we will come back to that question at the end, definitely.

 Thank you.

  >>DENNIS JENNINGS:  So I'd like to move on to the next case study
report, which is on the Greek case study.  And I'd ask Panagiotis
Papaspiliopoulos to give the presentation.

  >>PANAGIOTIS PAPASPILIOPOULOS:  Thank you, Dennis.  And hello,
everybody.  My name is Panagiotis Papaspiliopoulos.  I will make a
presentation for the Greek case study team report and on behalf of
the team.

 Here you can find -- on the next slide you can see the members of
the team.  And, unfortunately, our coordinator, Vaggelis Segredakis,
could not be here.  So I'll make the presentation, and you can see
the other members of the team.

 So the Greek case study team developed this report.  And it was
posted for comments in the public forum.  It has been posted since
the 7th of October.  And here you can see the link.  And we expect
your comments until the 14th of November.  I think we already have
one comment.  And we expect some more.  

 So the current -- I will -- I'll show you the structure of the
report.  First we have a rather long introduction and disclaimer.

 Excuse me.

 And afterwards is the definitions section.  Some useful key points
regarding the Greek language that readers should be aware of in order
to understand afterwards our proposals.

 The proposed characters for registrations, the issues concerned,
the proposed solutions -- there are two of them -- and the
recommendations of the team.  

 And at the end you can see the appendix with the table of the
proposed allowed characters for Greek top-level domain registrations. 
And I would like to say we are dealing only with top-level domain
registrations in our report.

 So here are some definitions that we used in our report.  We
defined the homograph.  Homograph is when two words or strings are
written the same in the same or different scripts.  Homophone is when
two words or strings are pronounced the same in the same or different
scripts.  

 Greeklish is a Greek word.  It's not an English word.  And we
invented this word years ago when the majority of the applications
could not support the full set of the Greek characters, the
characters with the tonos and things like this.

 So it's the representation of the Greek words and the characters
using the English characters.  The use of Greeklish has become less
common over the years.  And aliased name and name aliasing is when a
name can have two different -- a domain name can have two different
forms.  

 And then the bundling of domain names is when these different forms
of the domain name act as one.

 Tonos is the accent mark that is used in the Greek language
nowadays.  The dialytika is used when we try to separate one vowel
from the neighboring vowel instead of pronouncing them together.  And
katharevousa and dimotiki are two different forms of the Greek
language.  Katharevousa was an older form.  It was made -- it was
made by the scientists even before the Greek revolution.  It's a
successor of ancient Greek.  And it was alive until the middle '70s,
when the Greek government changed it to the dimotiki.  Dimotiki is
the current form, which is used by the people and, since the middle
'70s, is also used by the state.

 And so let's see two useful key points regarding the Greek
language.  First is the Greek language question, which is the
diglossia between katharevousa and the dimotiki.

 To other people who do not know the Greek history and the Greek
language, this issue, this dilemma was very significant.  And even
people died for it.

 And, of course, as I mentioned before, nowadays the dimotiki is
used.  But many  forms of the words that belong to katharevousa are
still used.

 And another issue is the Greek orthography.  It's the polytonic
versus the monotonic form.  Polytonic is when the accents were used. 
I was taught polytonic grammar until the first grade of high school. 
In 1982 this polytonic form was replaced by the monotonic form by the
government.  

 And you can see the example.  It's the first lines of the Lord's
Prayer.  The first example is in the polytonic form, and the second
one is in monotonic.  

 As you can see, the first one, even if it's more beautiful, let's
say -- it's more complicated.  And, nevertheless, the monotonic form
is now used.  

 So next slide, please.

 And so the team proposed for registration characters that are only
monotonic because these characters are used now for the spelling, the
correct spelling of the Greek words.  And the polytonic characters
offer no significant advantage for the user for a top-level domain
registration.

 And these polytonic characters can be used for the lower level
registrations according to each new Greek top-level domain's policy. 
But we believe that it's for the benefit of the end user and, since
there is no significant advantage in using polytonic characters, we
recommend only monotonic ones.  

 So, in order to formulate our proposals, we had to think about
several issues.  One very important is the sigma and final sigma
issue.  We have three sigmas in the Greek alphabet.  It's the small
one.  You can see them in the examples in the slide.

 It's a small one.  It's the small one, the big one, the capital
one.  And it's the final one.  The final one is a small sigma, but
it's the final small sigma because it goes at the end of the words.

 Like at the example.

 So, in the IDNA 2003 protocol, there was a mapping from the small
middle -- let's call it that -- sigma to the capital one and from the
final sigma to the capital sigma.  But, when you had to reverse this
thing, then from the capital sigma you went only to the middle sigma. 
And, as you can see from the example there, the result of this is not
a correct Greek representation of the word.

 So, fortunately, in IDNA 2008, middle sigma and final sigma are
different accepted characters and treated separately and reverse
mapping is not possible any more.  So, if you see the official name
of our country Greece, Hellas, and you consider the IDN approach of
translating this word into a domain name, you can see that Hellas has
a top tonos and the final sigma.  This is the normal writing of the
word.  If you write it without the tonos, you have another domain. 
If you write it in capital letters, even if they are not accepted in
the IDNA 2008 protocol, some application may allow the characters,
and you may have a word that has a middle sigma instead of a final
sigma, so you have a different Unicode string.  
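
 The loss of the final sigma can be approximated in Python with
Unicode case folding.  This is only a sketch: str.casefold() is used
here as a stand-in for the kind of case mapping described above and
is not an implementation of IDNA 2003, and the lowercase spelling of
Hellas and the function name are assumptions made for illustration.

```python
def fold_like_old_mapping(label):
    """Approximate a case-insensitive mapping with Unicode case folding.

    Assumption: casefold() stands in for the older mapping step. Like
    that mapping, it turns the final sigma (U+03C2) into the middle
    sigma (U+03C3).
    """
    return label.casefold()


# "Hellas" in small letters, with tonos on the alpha and a final sigma.
hellas = "\u03b5\u03bb\u03bb\u03ac\u03c2"

folded = fold_like_old_mapping(hellas)
print(folded == hellas)                   # the final sigma did not survive
print([f"U+{ord(ch):04X}" for ch in hellas])
print([f"U+{ord(ch):04X}" for ch in folded])
```

 Treating the two sigmas as separate codepoints avoids this loss, at
the cost that a label typed with the wrong sigma is a different
label.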

 So, in our study group, we tried to meet the typical user
experience.  The typical user expects to have the same result whether
he uses small letters or capital letters.

 So we concluded in our team that a word or a string without a tonos
should be considered as a variant of the accented version of the
string.

 Other issues are the homographs.  And here you can see some examples
in different scripts.  Add the capital -- initial letters of the
Greek regulator of the telecommunications market.  In the capital
form you can see it is very similar.  And in small form, okay, you
can separate them.  

 But also you can have in Greek Athina and Athena.  Athina is the
name of the capital city of Greece.  Athena was the goddess of
wisdom.  If you write them both in capital letters -- this capital
form is not accepted -- you do not understand, seeing only this word,
which one you are referring to.  Homophones are when you pronounce
the words the same.  And here is the example of the Greeklish that I
mentioned: many people are using English characters or numbers
sometimes to represent the Greek characters.  

 So, having all these issues in our thinking, and having also the
fact that the correct spelling and the correct form of writing a
Greek word is to use the tonos and to use the final sigma, we
considered that the tonos accent and final sigma characters should be
included also.  And I have to say here that many times I have used
the word "word" instead of a label or a domain name.  That is because
most of the time it's not only a string of characters.  Most of the
time a word is used in order to help somebody remember the domain
name.  That's why we had to deal in our report with words in the
Greek grammar.  Of course, we understand that a domain name can have
different characters or a set of words.  But most of the time we
dealt with the same word.

 So, in our case, as variants, we have a form in capital letters and
a form in accented letters; the given word in katharevousa and
dimotiki and in the monotonic and polytonic forms; then variation if
there is a final sigma or not at the exact position, or if you change
it; and some changes that happen in Greek words, according to the
Greek grammar.

 So here are the two proposals from the Greek team.  One -- next
slide, please.  One is called the variants proposal.  It's the
proposal that includes variants.  The accepted characters for this
proposal are the small, monotonic characters, including the
characters with tonos, dialytika, and their combination.  The domain
name will be accepted in the exact requested form.  The same domain
will be allocated to the registrant stripped of accent marks and
final sigmas.  The same domain will be allocated to the registrant
stripped of accent marks but retaining the final sigma at the exact
position.  An alternate position of the tonos is not allowed.  An
alternate position of the final sigma is not allowed.  And the forms
of this word with the same meaning in -- next slide, please --

 in katharevousa and dimotiki are not allowed.  The registrant, the
person who applies for a string for the word, can choose only one
form.  And the others should be excluded.  
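
 The allocation rule in the variants proposal can be sketched in
Python as follows.  This is a minimal illustration under stated
assumptions, not the team's specification: "accent marks" is taken
here to mean only the tonos (the dialytika is kept), and the function
names and the sample label are hypothetical.

```python
import unicodedata

TONOS = "\u0301"        # combining acute accent, used for the Greek tonos
FINAL_SIGMA = "\u03c2"
MIDDLE_SIGMA = "\u03c3"


def strip_tonos(label):
    """Remove the tonos from every letter, keeping any dialytika."""
    decomposed = unicodedata.normalize("NFD", label)
    return unicodedata.normalize("NFC", decomposed.replace(TONOS, ""))


def allocated_variants(label):
    """Return the requested form plus the two stripped forms described above."""
    no_tonos = strip_tonos(label)  # accents removed, final sigma retained
    no_tonos_no_final = no_tonos.replace(FINAL_SIGMA, MIDDLE_SIGMA)
    return {label, no_tonos, no_tonos_no_final}


requested = "\u03b5\u03bb\u03bb\u03ac\u03c2"  # hypothetical requested label
for form in sorted(allocated_variants(requested)):
    print(form, [f"U+{ord(ch):04X}" for ch in form])
```

 Forms with the tonos or the final sigma in a different position are
deliberately not generated, matching the statement that alternate
positions are not allowed.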

 And two options for handling the allocated variants, the first one
is to enter the zone as DNAMEs, and the other is to be treated as
FQDNs and registrant instead has to make it a recommendation.

 The second proposal is the small characters proposal.  It is
included in the first one, but it's without the variants.

 We are aware of the lack of the technology because in DNS you can
have DNAMES, but in other protocols these techniques are not 100%
successful, let's say.  And there is a way for this.

 So the accepted characters are the same as before, but the only
accepted form is the form that is originally submitted by the
applicant, with the tonos and the final sigma in exact positions. 
All the other variations, the variation without the tonos, the
variation without the final sigma, are not allowed.  And the
different form in the dimotiki or the katharevousa is also not
allowed.  So taking into account the advantages -- and can I have the
next slide, please?  Thank you.  

 Taking into account the advantages and disadvantages of each
proposal, and the average Internet user's experience and
expectations.  As I said, the typical Greek Internet user expects to
have the same result when using a capital or small letter form.  

 And the current protocol status acknowledges limitations.  And,
having in mind that ICANN itself recognizes the need to have variants
in order for the language to be represented as the native speakers
use it, the Greek case study team recommends as the most appropriate
solution the variants proposal.  And so the next step is that, as I
said in the beginning, we are waiting for your comments until the end
-- until the 14th of November.  We'll analyze them, and we'll
determine how to address them in order to revise our report.  And
then we believe that a useful next step is for the Greek, Cyrillic,
and Latin study teams to work together to identify the cross-script
issues that might be the same in Latin, Cyrillic, and Greek character
registration.  Sorry for talking too long.  I could talk more.  No,
that's fine.

 >>PANAGIOTIS PAPASPILIOPOULOS:  Now, I am waiting for your
questions.  Thank you very much.

  >>DENNIS JENNINGS:  Thank you very much indeed.  Maybe we'll hold
the questions for the Greek case, because we're running a little
beyond time.  We will take questions at the end on the Greek case.  

 And let's go to the final case, the Latin case.  Can I ask Cary
Karp to give the presentation.

  >>CARY KARP:  Thank you, very much.  Just noting my name is there
because I'm the one to be blamed for anything wrong with this
presentation.  It was a team that prepared it.  And at the final
slide we'll point you in the direction of the full list of
participants here.  All righty.  Next slide, please.  

 One of the things that makes life easy for us and also something
that makes it difficult for us, the Latin alphabet is at the core of
a larger number of writing systems than is any other in current use. 
Next slide.  

 The basic 26 letters of it, which are not adequate for as many
languages as you might think, not even the English language, are
these.  This is what is important, encoded in ASCII, which has been
around since the first days of the domain name system.  Next slide.

 Next slide again, please.

 All right.  The English case.  There are two ways in which non --
extended Latin letters appear in the English language.  And it's a
general situation.  Lots of languages are the same.  The first is the
word naïveté, which is very commonly typeset.  The house rules for
many publishing houses, if not most publishing houses, want to see at
least the dots over the I or the accent over the E or both.  And
that's generally regarded as decorative use of diacritical marking. 
However, the second case, the difference between the word "resume"
and the word "resumé" is significant.  Adding those diacritical marks
changes the meaning of the word.  And it is, therefore, contrastive. 
That's an important distinction.  

 If we continue now, just taking one of the vowels, one of the basic
vowels, these are some of the ways in which it can be used validly in
IDNs.  There are a few other decorated Os, using a term that I don't
really like.  But we don't have them here.  And, if you take a good
look at those, you can easily see that they are, in fact, separate
and distinct.  If we were to reduce the size of this -- I don't have
a pointer here.  But, if you take a look at the second row here, the
left-most character there and the fourth from the left, those are two
diacritical marks that are clearly distinct that become less distinct
as one reduces the size at which they're displayed.  However, -- and
this is the important thing -- to the people who use the one or the
other, the distinction is immediately obvious.  And to people who use
neither the distinction may be irrelevant.  Okay?

 The next slide.

 This is one of the consonants.  The same deal here.  And, in fact,
on the bottom row there, there are letters that look as if they are
derived from "h," but they're not.  The left-most, the bottom left,
is a heng, which is, in fact, an "h," and then a sound that exists in
English -- ng -- but is represented with two letters and in others is
represented with an eng, which is an "n" combined with a "g."  So you
have the ascending portion of an "h;" you have the body of an "n;"
and you have the descender of a "g" taken into a single glyph, as
it's called.  And that's, again, very, very important to the
communities that use them.  And, if you don't understand even what
the language is that a label including this belongs to, it's
possible, if not outright likely, that the documents that it leads
you to will be similarly difficult to parse.  

 Okay.  Next slide.

 And this is what I just said.  The distinctions between these forms
are a fundamental -- sorry.  I need the second part of it.  

 However -- and this is the crucial point -- a community that uses --
that doesn't perceive a particular distinction between two forms of a
letter will regard them in one way.  And some other community where
that distinction is central -- we've heard it in preceding
presentations -- we can't disregard them.  

 So the extent to which the way a potential variant situation is
dealt with in language A is at fundamental odds with the way that
same difference is dealt with in language B precludes any general
rule about how to manage those variants.  

 Now next slide, please.

 In the Latin script, one of the key issues is decomposition of a
marked letter.  Can you represent a marked letter with two unmarked
letters that are both in ASCII, or what is the alternate
representation otherwise?  

 Next slide.  

 These are Swedish words.  The umlauted O in Swedish is not a
diacritically marked letter.  It is the 29th letter of a 29-letter
alphabet.  It is not a marked O.  It is not regarded as that.  Nobody
sees it as that.  The first word means the north, the region that
many people call Scandinavia.  And it's perfectly conceivable for
Norden to be a regional TLD proposal.  If we were to decompose that O
the way a German might into an OE, we have something undefined.  I'm
not sure that a Swede would even read that correctly, recognize it as
a Swedish word.  And, if we do take the correct Swedish fallback
representation -- if you do not have access to the ö, then you just
do without the two dots and you write an O.  It is not correct
orthography, but it is contrastive.  The third word is what you get
when you have that -- what I'm seeing is I'm giving this backwards. 
Patrick and Stephane are -- crawling in their chairs.  I've swapped
this.  I'm not looking at my slides.

 Okay.  The bottom one, Norden, is the name of the district.  Oh, my
God.  This is embarrassing.  The top one, Nörden, is the word "nörd"
in Swedish.  So, if you take the word "nörd" and strip it of its
diacritical markings, as we would see it, you end up with the name of
the region.  And those are two utterly separate concepts, despite the
fact some fool sitting here who identified himself as responsible for
any slips in this presentation has now committed a really big one.  

 Next slide, please.  Okay.  

 A single writing system can have multiple rules dealing with
alternate representations depending on the context.  This is truly
crucial when dealing with proper names.  Next slide.

 For example, the names -- let me do this right.  The author Goethe
would never, in any orthographic form, be written with the umlaut
over the O, although a Swedish name in something of an archaic form
would certainly be written that way.  And removing the two dots from
the Swedish form gives you nothing -- expanding it to an OE gives you
a claim of literary skill that isn't -- that doesn't exist.  So,
again, a German would certainly regard an umlauted O as decomposable
to an OE, although a Swede wouldn't.  And, under some circumstances,
a German would regard combining an OE into an umlauted O as
acceptable but not in all.  There's an asymmetry to this.  There is
difference in approach within closely related languages.  So,
considering a broader linguistic community than that still is -- it's
an impossible situation.  Next slide.

 And that's what I just said.

 There may be some meaningful concept of variants that attaches to
the Latin script.  But it is not capable of global quantification. 
Okay.  Next.

 However, in a local situation, it is actually quite possible to do
this.  So a Swedish character table defining an IDN permissible
repertoire can reflect Swedish orthographic rules.  The same goes for
the German.  Okay?  Continue.
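
 The point about local tables can be sketched in code.  What follows
is only an illustration, assuming a hypothetical FALLBACKS table that
holds nothing beyond the two rules described in the talk: a Swede
drops the dots from ö, while a German expands it to OE.  The language
tags and the function name are assumptions made for this sketch, not
part of any registry's actual character table.

```python
# Hypothetical per-language fallback rules for U+00F6 (o with diaeresis).
# Only the two rules mentioned in the talk are modeled here.
FALLBACKS = {
    "sv": {"\u00f6": "o"},   # Swedish: write a plain o
    "de": {"\u00f6": "oe"},  # German: expand to oe
}


def ascii_fallback(label: str, lang: str) -> str:
    """Apply one language's fallback rules to a label, character by character."""
    table = FALLBACKS[lang]
    return "".join(table.get(ch, ch) for ch in label)


label = "n\u00f6rden"  # "nörden"
swedish = ascii_fallback(label, "sv")
german = ascii_fallback(label, "de")
print(swedish)  # norden
print(german)   # noerden
```

 The same input yields two different strings depending on the
language, and the Swedish result collides with the unrelated word
"norden," which is why a single script-wide rule does not fall out of
these tables.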

 Remembering that the graphic distinctions between one decorated
character and another are visible to anybody, if magnified enough and
visible to those for whom they're important at any degree of
magnification, we considered a situation where two different
codepoints, two different values in the Unicode system might be
ascribed to the same character, the exact same thing, not similar,
but identical.  The display form is the same as the other one.  And
the other situation where the same letter might have two different
codepoints.  And we found exactly one such situation.  

 Next slide, please.  Keep going.  Okay.  

 And that's this:  The Latin small letter turned E and the Latin
small letter schwa -- schwa, by the way, occurs frequently in the
English language, although it's never noted as such.  One of these is
used in many African orthographic systems and far northern
orthographic systems.  The other is used in academic discourse about
alphabets.

 But, in any case, a writing system in which one of these appears is
not going to include the other.  However, the root zone is supposed
to support all communities using all writing systems.  

 So here we have the case.  If one of these appears in the root
zone, it would probably be quite risky to allow the other to appear
in the root zone.  All righty.  Next slide.
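
 The two codepoints can be inspected with Python's standard
unicodedata module.  This is a minimal sketch, assuming the pair in
question is U+01DD and U+0259; the helper names are made up for the
illustration.  It shows that the two characters carry different
Unicode names and that normalization does not fold one into the
other, so any equivalence would have to be declared by policy.

```python
import unicodedata

TURNED_E = "\u01dd"  # assumed to be the "turned E" on the slide
SCHWA = "\u0259"     # assumed to be the "schwa" on the slide


def describe(ch: str) -> str:
    """Return the codepoint and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def same_after_normalization(a: str, b: str, form: str = "NFKC") -> bool:
    """Check whether a Unicode normalization form maps both strings to one."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


names = [describe(TURNED_E), describe(SCHWA)]
unified = same_after_normalization(TURNED_E, SCHWA)
for name in names:
    print(name)
print("unified by NFKC:", unified)
```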

 We, as with our Cyrillic and Greek associates, noted that there are
similar -- there's similar glyph sharing across our language and
boundaries.  But, because that was, at the outset, not part of the VIP
study, we thought it would be most useful to defer it until we got to
the stage where we are now.  And that is discussing how to merge the
sets of -- the six sets of issues into one consistent issue report
that we then put forward.  All right.

 Next and final slide.

 You will see the full lists of all the available Latin codepoints
divided into two tables.  The ones that simply have to be made
available without need for further consideration, and the others that
are, in terms of the protocol, available but someone would need to
demonstrate true need and true warrant for their appearance.  And
here in the full report, which we hope you will read and comment upon
in the forum, you will find the entire list of the people who were
the members of this study group and what their roles in it were.

 Again finally, profound apologies to the Swedes for having intended
so well to illustrate the intricacies of their writing system and
then munged it as I did.

 Okay.  That's it.

 >>DENNIS JENNINGS:   Thank you very much, indeed, Cary.

 We have 15 minutes for questions, and I haven't forgotten the
question about homophone strings, which we will come to at the end
because I think we could spend all the time about that.

 You will see that the general consideration is character variants,
not string variants, just as a preamble remark.  And I am now opening
it to questions to any or all of the case study presenters.

 Please, and, Naela, we have -- Okay.  We will do that in a minute.

 >> Thank you, Chair.  I am Oksana (saying name), and I was happy
not to participate but to follow one of these groups, and now I have
questions for all presenters.

 What issues were the most controversial during your work?

 Thank you.

 >>DENNIS JENNINGS:   So just to repeat back the question.  What
issues for each case study were the most controversial.  Does anybody
want to -- Cary, what was the most controversial issue you addressed
in the case study?

 >>CARY KARP:   Coming to the understanding that it wasn't amenable
to the kind of solution that we were charged with identifying.

 >>DENNIS JENNINGS:   Anybody else want to say?  Akshat, do you have
a most controversial comment?

 >>AKSHAT JOSHI:  Yes, specifically talking in terms of the
Devanagari language, that being a complex script and the backing up
of the (indiscernible) form was very much crucial, but the consensus
on that has not been reached.  So identification of variants with
those things in mind which are not in a unified manner, that was
really one of the most difficult calls to take while identifying the
variants.

 >>DENNIS JENNINGS:   Anybody else want to take the opportunity to
identify?  Sarmad?

 >>SARMAD HUSSAIN:  So one of the most challenging things is
actually for TLDs we have to think at script level, and none of us is
actually trained to think at script level.  Most of us are actually
trained to think at language level.

 And that's really a challenge to sort of step away from language
and look at the whole script rather than characters belonging to
different languages.

 >>DENNIS JENNINGS:   Thank you.  Naela, are you ready with the
remote question?

 (Garbled audio)

 >>NAELA SARRAS:  Thank you.  So I have a comment -- I have a
question, a comment, and then another question if you have time.

 So the comment came after Dr. Sarmad gave his presentation, from
Ayesha, and it says:  From your experience, which way is better, to
reserve the other variant or block it?  I think that was the question.

 >>DENNIS JENNINGS:   So do you want to address that first, Sarmad?

 >>SARMAD HUSSAIN:  So what we are proposing is blocked and reserved
are two different statuses.  So it's not a matter of preference
between one or the other.

 Reserved means those things which an applicant does not want, and
blocked means those things which applicant cannot have, whether he or
she wants it or not.
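
 One way to picture the two statuses is as separate values of a single
type rather than as competing options.  The sketch below is only an
illustration of the definitions just given; the REQUESTED status, the
function name and both parameters are assumptions introduced here and
do not come from the case study report.

```python
from enum import Enum


class VariantStatus(Enum):
    REQUESTED = "requested"  # hypothetical: the applicant wants it and may have it
    RESERVED = "reserved"    # the applicant may have it but does not want it
    BLOCKED = "blocked"      # nobody may have it, wanted or not


def classify_variant(allocatable: bool, wanted: bool) -> VariantStatus:
    """Assign a status to one variant label under the definitions above."""
    if not allocatable:
        # Blocking does not depend on what the applicant wants.
        return VariantStatus.BLOCKED
    return VariantStatus.REQUESTED if wanted else VariantStatus.RESERVED


print(classify_variant(allocatable=False, wanted=True))
print(classify_variant(allocatable=True, wanted=False))
```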

 >>NAELA SARRAS:  Thank you.  And then the comment was from Sayef
(phonetic).  I didn't get the last name.  I'm sorry.  It says Arabic
variant is one of the complicated variants in the world.  Since Arabs
do not agree on the way they write, there will be plenty of
consequences they will face in accepting some letters rather than
others.  Hence, we will lose the identity of our language.  That's
why we have to agree on one document that regulates the end user on
how to write IDNs.

 >>DENNIS JENNINGS:   Sarmad, did you get the question?

 >> I don't understand the question.

 >> So actually, I don't feel comfortable summarizing for him, but I
think -- I will read it again.  Arabic variants is one of the
complicated -- Arabic variants is one of the complicated variants in
the world.  Since Arabs do not agree on the way they write, there
will be plenty of consequences that they will face accepting some
letters rather than others.  Hence, we will lose the identity of our
language.

 And then later he says that's why we have to agree on one document
that regulates the end user on how to write the IDN.

 >>SARMAD HUSSAIN:  I will just make a general comment in this
context because it seems more like a remark than a question to me.

 >>NAELA SARRAS:  Yeah, it was just a comment.

 >>SARMAD HUSSAIN:  So when we are talking about TLDs, we are not
talking about a particular language, we are talking about a script
because when a TLD is being used by an end user anywhere around the
world the context of language is no longer there available for that
person.

 So when we're talking about TLDs, again we have to step away from
language level, and there was also -- I think it was also pointed out
by other case-study teams that we have to take the most conservative
approach.  Meaning that we have to protect all users.

 And so there will be compromises which will be made across
languages, and that's really been the fundamental philosophy of the
case-study team as well.

 >> Thank you.

 >>DENNIS JENNINGS:   Thank you.  And can that comment please -- can
the commenter make those comments in the public forum.

 Is there another question, Naela?

 >>NAELA SARRAS:  I have two more.  What would you like to do?

 >>DENNIS JENNINGS:   Unless I see a hand raised in the audience
here, we will take the next remote question.

 >>NAELA SARRAS:  So, I'm sorry, I think it was a person named
David.  I don't see who asked it because there were a few comments
that came after that.  It says:  Naela, if possible and if time
permits, please ask the question that was asked by the lady in the
room prior in the call.  Will we see dot com domains in other
languages?  I assume transliterations, for lack of better wording.

 >>DENNIS JENNINGS:   We will come to that.  I promised we will
spend time on that at the end.

 >>NAELA SARRAS:  Sorry, you're right.

 The other question was from Matt Harper.  I would like to ask a
question.  If a new gTLD application for dot pet, P-E-T, and dot
pets, with a plural at the end, would this be a bundled application
or two different required applications?

 >>DENNIS JENNINGS:   I will also say we never used the word bundle
in any of these presentations.  It's a term that we don't recognize,
and it doesn't exist in our lexicon.

 Pet and pets in this example, the single and plural, are not
variants of one another.

 Next question.

 >>NAELA SARRAS:  I'm done.

 >>DENNIS JENNINGS:   Okay.  Back to the audience here.  Are there
any questions people would like to ask?

 And seeing none, let me go on to the question about similar
sounding strings in multiple languages, and let me take a
noncontroversial example.  The sound bang.  The sound bang I'm sure
can be expressed in almost every language in almost every script in
the world.  However it might be spelled, it represents the sound
bang, B-A-N-G, B-J-Y-N-G, whatever, just using the alphabet that I
know.  These homophone strings are not variants and they are not
something this team has considered.

 Perhaps the members of the case study team want to contradict me,
but I understand that that is the situation.  

 Cary, would you like to lead off?

 >>CARY KARP:   It's easy.  We were considering graphemic similarity
and not phonemic similarity.

 >>PANAGIOTIS PAPASPILIOPOULOS:   And we did that exactly the same
in the Greek group.

 >> We do not consider (indiscernible) similarity as variants.

 >> Even we don't recognize them as variants, homophones.  Thank you.

 >> I think coming back to the question about controversial issues,
one of the most controversial issues was to determine what's within
the scope of the project, what's outside the scope of the project. 
And definitely this issue was outside.

 >>DENNIS JENNINGS:   So does that answer the question that was
asked earlier in the question?

 Come up and challenge us, please.

 And again, if you could speak close to the microphone so we catch it.

 >> Hello.  I say yes and no.  And again, this is from an IDN idiot.
So that's my caveat.

 >>DENNIS JENNINGS:   An IDN naive user.

 >> Yes, an IDN naive user.

 It's interesting because I'm just thinking very quickly in terms of
a policy perspective, if it were -- if that were to be the rule,
that, okay, similar phonetics would not be variant, in my view, very
quickly, it would create a floodgate of potential -- I don't know,
you know, like people could sort of use it adversely.  Off the top of
my mind, I can't think of an example.

 Would it be -- And this is a question I pose to you.  Would it be
an exception, then, rather than a rule?  Just very quickly.

 >>DENNIS JENNINGS:   Okay.  Again, using my noncontroversial
example of bang, the -- if bang in some language became a very
successful TLD there might well be a demand for the need for the
bang.  This policy issue is a policy issue that's not for this
project.  And if it is a policy issue, it will have to be addressed
in the normal policy development process in my opinion.  And I am
only expressing an opinion.  I have the signal we have three more
minutes and we have remote questions.

 >>NAELA SARRAS:  A question from Chris Dillon.  He says:  The
concept "visually similar" is important for IDN variant TLDs, but how
should it be defined?  Ideally with a scoring system, question mark?

 >>DENNIS JENNINGS:   Thank you.  So how should the visually similar
be defined.  Cary, you want to take this?

 >>CARY KARP:   Well, I should think that those reports that said
there is no way to quantify similarity have at the same time said
that that can't be supported by scoring, either.

 And the other studies that didn't draw any such conclusion I
suppose should be commenting on how they perceive it, and the extent
to which algorithmic support can make this something less of an
intricate issue.

 >>JAMES SENG:   So in the Chinese case, what we consider variants
are characters that look -- that have no visual similarity between the
two.  The traditional.

 (garbled audio) and they don't look the same.  And those are
considered variant.  It's not just visual similarity that's
considered variant.

 There are also cases where they look similar, but they may or may
not be variant depending whether if defined as Z-variant.  As I'm
(indiscernible) the document (indiscernible), but visual similarity
is not a clear-cut rule that says these are variants and not, for
Chinese, at least.

 >> Thank you, Dennis.  In our team, and I think this is obvious, we
had the result that the -- that there will be visual similarity
between the Latin, Cyrillic and Greek scripts.  And since these three
scripts use a lot of common characters, there will be visual
similarity which will be to a greater extent if we include all the
capital letters and less if we exclude them.

 But we recommend that ICANN should address a common work between
these three groups in order to define the similarity, and may propose
something about the benchmarking, making a tool.  I don't want to say
now about making some grades or which is more visual, more similar. 
But anyway, this work has to be done between these three teams.
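
 The kind of tool mentioned here might start from something as simple
as the sketch below.  It assumes a small, hand-picked LOOKALIKES table
of Cyrillic and Greek lowercase letters that are commonly drawn like
Latin ones; the table, the function names and the choice of Latin as
the comparison target are all assumptions for the illustration, not
the output of any of the three teams and not a complete inventory.

```python
# Hand-picked, deliberately incomplete table mapping a few Cyrillic and
# Greek lowercase letters to the Latin letter they resemble.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def skeleton(label: str) -> str:
    """Replace each listed look-alike with its Latin counterpart."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in label)


def look_alike(a: str, b: str) -> bool:
    """Two labels are flagged when their skeletons are identical."""
    return skeleton(a) == skeleton(b)


cyrillic_label = "\u0440\u0430\u0441\u0435"  # four Cyrillic letters
print(look_alike(cyrillic_label, "pace"))
print(look_alike("pace", "pact"))
```

 A table like this gives a yes-or-no answer only; it does not grade
how similar two labels are, which is the harder question raised about
scoring.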

 >>DENNIS JENNINGS:   Good.  Thank you.

 Edmon, you have 60 seconds -- probably only 30 seconds now.  30
seconds, quickly, and then we are going to wrap.

 >>EDMON:  That will be enough.  I just want to do an advertisement,
actually.  All of the study team reports actually touch on the
subject of user experience, and one of the sessions later
on today, at 4:00, the Joint IDN Group, I'd like to invite all of you
to go there as one of the things we would talk about is the
acceptance of IDN TLDs and that's a related issue.

 So that's it.

 >>DENNIS JENNINGS:   Thank you for the advertisement.  That's great.

 And I'd like you -- We have to close.  This room is needed.

 Thank you for attending, and could I ask you to show your
appreciation to the six work study teams for the work they have done.

 [ Applause ]