*** Disclosure: The following is the output resulting from transcribing an audio file into a word/text document. Although the transcription is largely accurate, in some cases may be incomplete or inaccurate due to inaudible passages and grammatical corrections. It is posted as an aid to the original audio file, but should not be treated as an authoritative record.***
IDN Variant Issues Project (VIP) Update
Dakar, Senegal
24 October 2011
>>DENNIS JENNINGS: It's 9:15 so we will start. It's my apologies that this meeting is on at a time when you didn't expect this meeting to be on, because there are a lot of changes in the schedule. (no audio). People's control so we just have to live with it. The way we are going to conduct this session is I am going to have a quick run through the project and where we are. Sort of the organizational (dropped audio). To update you as to where they are. We'll take one or two questions, if we have time, after each presentation, immediate questions, and we will have a question session at the end (dropped audio). So let me quickly give you an update on the project. Why are we doing this project? Well, there has been a longstanding request from the user community for IDN TLD. (dropped audio) next slide, if you would. Next slide, if you would. And that gave us the authority to move along. Now, what's the situation today? Well, applicants for -- (dropped audio). Project is two main things. To create a glossary of terms so that everybody who is using the terms is using the terms in exactly the same way. This is one of the significant challenges in this project because -- (dropped audio). So what do we do? We established case study teams based on community experts. So this is a community-driven project. So we have six case studies teams. (dropped audio). Their work in early October, and end of September was the deadline, but most nearly got there, which was a tremendous achievement. 
And the reports have been published for public comment and translations of the reports are under way. And there is the URL there. (dropped audio). So thank you, teams. Public comment, please contribute. The closing date is the 14th of November. It's not too far away. There is the URL. We do need, the teams need, the case study teams, need your -- (dropped audio). The second phase which we are starting now is called the integrated issues report phase. And the third phase, which we haven't sorted out exactly what it is and how we do it, is the development of solutions so -- (dropped audio) -- issues that are identified in the report. So next slide. We have formed a coordination team who are representatives of the case studies to be our advisory team. And there is a team of ICANN staff -- (dropped audio). -- is advisory to us. Because this time, next slide, please. This time, the report is an ICANN report, assisted by the coordinate- -- (dropped audio). Because we have only studied six case studies, and there are others, other scripts, language combinations in the world, and more work is going to be needed. So we're hoping that this will provide a guideline for the development of additional case study reports. (Dropped audio). Issues according to type, whether they are technical issues or policy issues or other types of issues, and attempt to prioritize them. Which issues need to be addressed first, which can be shelved and maybe not addressed in the first phase. (Dropped audio) Areas, technical area, whatever. Identify issues, areas where further study is needed, and document the level of support for the conclusions. So as part of the general -- (dropped audio) -- use that conclusion, and again to develop some - - and I put it in quotes, because I'm not quite sure what the form of it is, develop some guidelines for additional cases. The key stakeholders are -- (dropped audio). 
And of course at the potential TLD requestors and applicants, the most important people there, the stakeholders in this documentation -- in this activity. Next. So how are we going to do this? (dropped audio). As advisory and the coordination team in that role. We'll have a Wiki. The coordination team will meet weekly by telephone conference. We had our first meeting on Saturday, an all-day meeting, and this is -- (dropped audio). Comment closes. That's the closing date for public comment. The case studies teams, the existing case study teams do the analysis of the comments. We're not asking the -- (dropped audio). -- to produce the first draft on the 2nd of December. Notice there's not much time between these dates. We tentatively are planning a two-day face-to-face meeting on the 8th and the 9th of December. (dropped audio). We're on a tight schedule here. We have a last call on the report on the 15th of February. On the 20th of February we publish the integrated issues report based on the public comment. (dropped audio). -- easy to achieve this schedule. In addition, as part of this work, in parallel with this, we are doing a survey of the existing variants. (Dropped audio) -- to various people but we want to get as much information as possible about IDN variant implementations that are already there as input to the work of the coordination team. So how do you stay informed? Well, there's -- (dropped audio) -- or have a say or make a contribution. So that's how we stay informed. The community Wiki where generally, as soon as possible, we publish the material that is developed, so you could see the materials there. (dropped audio) -- in case somebody wants to actually ask me a question. I see no hand raised. Can I ask the first of our presenters and remind the presenters that we are looking to get the presentation and some questions done in the sort of 12 to -- (dropped audio)
>>SARMAD HUSSAIN: Thank you, Dennis. 
So I'm going to summarize the discussions we've had -- (dropped audio) -- from Africa which use Arabic script. So we are actually looking for -- (dropped audio) But more intricate difference is that, when you write different letters, so this is an example of the same letter written four times - - these letters join up (dropped audio) -- the first task which we had was we really -- the focus of the project was what should be the issues, what would be the issues related to variant (dropped audio). So one of the first tasks which the committee -- the case study undertook was to actually go through the Arabic code page from Unicode. (Dropped audio.) Something in finality we do need some feedback from language communities which are not represented on the case study team. So we are looking for some more input. Next slide, please. (Dropped audio.) Then we've had some very extensive discussions on this, is how to deal with multipart labels. So I'm saying multipart labels because, when we write Arabic script, as I -- (dropped audio) -- but that's not how the script is written. This sequence are joint forms from the broken at certain places, which I'm referring to as part. But, again, we normally call it -- (dropped audio.) As far as the user expectation is concerned, they expect these breaks to be present in the script. And they would want to or expect these breaks to be present as part of the labels as well. (Dropped audio.) -- at is indicated by the third line. And we discussed with -- the possibilities of doing that, which includes possibly hyphens or non- joiners. (Dropped audio). -- rather than anything else. And there's some examples in the rules. Second set of -- and then there are interchangeable cases where the glyphs are -- look like -- they don't look like -- look similar. But the communities consider them as similar, and they use them interchangeably. So that's the third case. 
And then final option cases are vowel marks, which are sometimes written and sometimes not written, and, therefore, with or without -- strings with or without these vowel marks are perceived the same. So next slide, please. There was -- and then, again, detailed analysis of these is available in the report with all the relevant tables. These, eventually -- characters and their variants need to be represented in the form of what is normally referred to as a language table. But we are not quite convinced that it should actually be called a language table because sometimes it refers to more than a language. It represents 2 to 3, or 5 to 10 languages in a same table or sometimes all languages, so it should be a script table in that case. And then second thing is that it's not really a table. Table is an implementation detail. It actually contains information on how labels are generated -- variant labels are generated. So what we are suggesting is it should be called a label generation policy and whether there's a table or XML file should be left to implementation. And it would contain not only characters and variant level information but also rules, formal rules on generating labels and any other meta level information like what languages it represents and so on. We do need an automatic process to enable this. The reason for having an automatic -- requiring an automatic process is because number of variants can be very large in Arabic. And it's just probably not a good idea to produce all these variants manually. And then there is also an issue of whether the variants should be character level or position level. Again, this is something which is -- both are possible but we need to really decide on this as a community for TLDs. One more thing which came up regarding variants is that a variant actually may have -- next slide, please -- was that variants can actually be at many states. So a variant is, first of all, identified as a set, variants are identified as a set. 
Some of them can be allocated; others can be reserved or blocked. Then, once -- those which are allocated can be activated. And within activation there is some are delegated and some are just mapped. And those which are delegated, one of them is going to be a fundamental label. So it's slightly confusing to see how this happens. And then, once you're in operation, then over time you would want some reserve variants to be activated and some activated variants to be blocked. So we really don't know what are all the possibilities, and we really need to figure those out and eventually have an implementation policy around that. Next slide, please. WHOIS lookup is also going to be an issue. Because, when you have variants at multiple levels, you could have a very difficult situation in -- when somebody has a set of variant labels in the domain name and looking it up. One of the solutions we've recommended was that anything which actually does not have fundamental sequence in it, fundamental label sequence in it should be mapped to a fundamental label. And, when you actually call in the fundamental label, it should give you the WHOIS information. Next slide, please. We also discussed whether additional variants should actually have additional fees. And we really thought about it and -- you know, the point was -- the basic question was whose responsibility is it. And, if you're talking about gTLD process, the variants are not because it's requirement for business. But it is, actually, in a way an accident of encoding and linguistic traditions. So it's not the gTLD businesses which are responsible for variants but how -- and Unicode's encoded Arabic script and how Arabic script generally works. So these -- we do recommend that, in the interest of the users, this -- there should not be extra fees to label variants. Next slide, please. And there is, of course, end user issues. 
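The variant states and the WHOIS recommendation described above can be sketched in code. This is only an illustration under stated assumptions: the two variant pairs (KAF/KEHEH and YEH/FARSI YEH), the sample label, and the names `VARIANT_GROUPS`, `generate_variants`, `VariantState` and `whois_lookup` are hypothetical and do not come from the case study report, which does not prescribe a data model.

```python
from enum import Enum
from itertools import product


class VariantState(Enum):
    """States a variant label can be in, as listed in the presentation."""
    BLOCKED = "blocked"
    RESERVED = "reserved"
    ALLOCATED = "allocated"
    ACTIVATED = "activated"


# Assumed character-level variant groups (illustrative, not from the report).
VARIANT_GROUPS = [
    ["\u0643", "\u06A9"],  # ARABIC LETTER KAF, ARABIC LETTER KEHEH
    ["\u064A", "\u06CC"],  # ARABIC LETTER YEH, ARABIC LETTER FARSI YEH
]
VARIANTS = {ch: group for group in VARIANT_GROUPS for ch in group}


def generate_variants(label):
    """Return every label obtained by swapping characters for their variants."""
    choices = [VARIANTS.get(ch, [ch]) for ch in label]
    return ["".join(combo) for combo in product(*choices)]


def whois_lookup(label, fundamental, records):
    """Map any variant of the fundamental label to the fundamental's record."""
    if label in generate_variants(fundamental):
        return records[fundamental]
    return None


# Hypothetical label: KAF, TEH, YEH. Two positions have two variants each.
fundamental = "\u0643\u062A\u064A"
variant_set = generate_variants(fundamental)

# One possible assignment of states: the fundamental label is activated,
# every other variant starts out reserved.
states = {
    label: VariantState.ACTIVATED if label == fundamental else VariantState.RESERVED
    for label in variant_set
}

records = {fundamental: "registrant record for the fundamental label"}
typed_elsewhere = "\u06A9\u062A\u06CC"  # same label typed with KEHEH and FARSI YEH

print(len(variant_set))  # 4 here; the count multiplies with each variant position
print(whois_lookup(typed_elsewhere, fundamental, records))
```

Because the number of generated labels multiplies with every position that has variants, a long label can produce a very large set, which is the reason given above for wanting an automatic process.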
Once variants are enabled, there's going to be variations in keyboard, depending on whether you're in Iran or Pakistan or Saudi Arabia or Egypt. Those kind of issues come up. There are font variation issues. There are display variation issues. There's a host of other issues which we need to look at on application layer and operating system layers to resolve to implement these. And that information needs to be passed on to the relevant communities to react. Next slide, please. So that's sort of an overview of what we went through. As I said, details are available in the report which is published online. This is just to acknowledge the people who were part of this effort. I'll not go through the names. These are posted online. So thank you very much. And I'll take any questions, if you have any at this time.
>>DENNIS JENNINGS: So, Sarmad, thank you very much. Are there any immediate questions right now? We will have a question and answer session at the end. Is there a question that someone wants to come up and ask right now? Let me repeat what Sarmad said. I've just been indicated we have -- excellent. We have a remote participant question. Naela.
>>NAELA SARRAS: Okay. We have a question from Mr. Raymond Doctor. And he's still typing. So he's saying, "I don't understand why a keyboard is an issue. The keyboard is only a means of inputting, and different keyboards will and should map to the same storage."
>>SARMAD HUSSAIN: So I didn't get your question completely, but the gist of it is why the keyboard is an issue, and I can respond to that. One simple issue is that -- I'm just going to -- if you can go back to the second slide. Yeah. This one. Next one, next one, please. Slide number -- sorry. Slide number 5. Okay. So, if you look at the top two rules, you see exactly the same sequences. 
But there's a "not equal to" sign there, which means that, if you're in Saudi Arabia, the keyboard you have is probably going to -- the Unicode sequence you're going to type is what you see on the left side of that "equals to" sign. If you're in Pakistan and you type the same thing, you get what is available on the other side of the "not equal to" sign. So for a user, when they type it, they'll look exactly the same. If variants are not enabled, they will -- one will resolve and one will not resolve. And that's because the encoding on which these characters are mapped are different from keyboard to keyboard. And the keyboards we use in Pakistan versus keyboards we use in Saudi Arabia are different. So users -- because they look exactly the same, users will not be able to identify why it's not working. But it's not working. It's not resolving. So, if the variants are not there -- if variants are there, they'll be mapped on to each other or both will resolve. If variants are not there, user will think that Internet is unstable. It works in Saudi Arabia and stops working when you go to Pakistan. So that's where the keyboard issue comes in.
>>DENNIS JENNINGS: Thank you. We have a question here, please. I'm just going to take this question, and we'll leave other questions for later. But please go ahead.
>> My name is Hamid Zarandi (phonetic). I'm actually the ccTLD operator for dot Emirate, Arabic IDN domain names. I went through the report. And I would like, first of all, to thank you for the great effort you've done to identify all the IDNs issues, which are actually very important for the end users. For us, we'd like to emphasize on the importance of adopting .3.3b, which is the case of interchangeable case, to be considered as a very valid and solid variant case. Speaking of Arabic language in particular, alternative string is a common practice within the Arabic Internet users communities. 
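The keyboard issue described earlier in this exchange can be illustrated with a minimal sketch. It assumes, for illustration only, that one keyboard layout emits ARABIC LETTER KAF (U+0643) and another emits ARABIC LETTER KEHEH (U+06A9) for what the user sees as the same letter; the sample word and the variable names are not taken from the slides.

```python
import unicodedata

# The word "kitab" (book) as it might be typed on two keyboard layouts.
# Assumption: the first layout produces KAF, the second produces KEHEH.
typed_on_layout_a = "\u0643\u062A\u0627\u0628"
typed_on_layout_b = "\u06A9\u062A\u0627\u0628"

for text in (typed_on_layout_a, typed_on_layout_b):
    print([f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text])

# The strings differ only in the first code point, so an exact-match
# lookup treats them as different labels.
same_code_points = typed_on_layout_a == typed_on_layout_b

# Unicode normalization does not merge them: KAF and KEHEH are separately
# encoded letters with no decomposition mapping.
same_after_nfkc = (
    unicodedata.normalize("NFKC", typed_on_layout_a)
    == unicodedata.normalize("NFKC", typed_on_layout_b)
)

print(same_code_points, same_after_nfkc)  # False False
```

Since neither comparison succeeds, only an explicit variant relationship between the two code points would let both typed forms lead to the same name.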
Although the condition of using alif with hamza above or below is well-defined in the Arabic grammar, linguistically known as hamzat qat' or hamzat wasl, such condition is not widely implemented by the Internet community at large. So users, on many occasions, they use their own characters due to a lack of grammar awareness or simply for the sake of typing simplicity. So they're using the alif without hamza. This is the cultural typing behavior, which we cannot change, unfortunately. So I would also like to emphasize on this point. The second point, which is within the same scope, that, when it comes to the interchangeable case, although it is mentioned properly in the document, however, we don't see any appendix which describes the table. It is undefined what the variants of the word "alif" are in terms of Unicode code points, similar to other cases like the similar cases or the identical cases. So we'd suggest to add an appendix which describes the interchangeable case in the report. Thank you very much.
>>DENNIS JENNINGS: Thank you very much, indeed. Will you submit that as a comment in the public forum? Thank you very much, indeed. Thank you for those questions. We'll have more questions later on. I'd like to move on to our second presentation on the Chinese case study report. And James Seng is going to give us an update.
>>JAMES SENG: Good morning. Good morning. Okay. I'm representing the Chinese case study group, case study team on the variant group. The team comprises many members and is led by Lee Xiaodong. Unfortunately, he's still on his way here, so I'm giving the presentation on his behalf. First, on the scope of work of the team, the team focused on the Chinese variant level and only at the top level. But we do consider -- understand the impact to the lower level. And we refined that in the report. But the priority and the issues that we consider mostly focus on the top level. 
We also look at certain user expectations on how top level and the variants is going to work at the top level. And we discuss that in the later slides. While we focus on Chinese variants, Chinese is a very tender word (indiscernible) -- in the sense that Chinese characters are also being used by Japanese and Korean. So for the working group team we include both experts from Japan and Korea. We include them, so they can give us -- provide feedback on the implication of our variants in respect to their language. And then, lastly, we focus on Unicode. There are many characters in Chinese that are still not included in Unicode. But we only focus on those that have been included in Unicode. So, as I mentioned, Chinese characters being used across Chinese, Japanese, and Korean. In Chinese we call it Hanzi. In Japanese it's called Kanji. And in Korean it's called Hanja. Technically, it's called a Han script, not a Chinese script. And in Unicode we call it a CJK unified ideograph. In Unicode there's about 70,000 encoded ideographs at this moment. In BMP it's about 2,000 -- 26,000; and extension A, B, C, D we have an additional 44,000. The Han script itself are -- when we say ideograph, it means that there are actually graphic symbols that represent an idea or concept. It evolves from pictograph over time. Like cavemen we draw tiny hills, and that evolves into a Chinese character called a hue, as you see from the first example. The second characters actually evolve from the sun, which evolved to the last one, which is what we call group, which is the Chinese concept for sun. But Chinese characters can be very complicated with many strokes. And it imposed an education restriction on literacy and so on. So in China in 1964 and 1986 they introduced a concept of simplified Hanzi. About 2,200 simplified Hanzi have been introduced, which involves various processes as defined in the 1986 report, which simplifies component -- complex component. 
And then, through complex components and simplified radicals, you combine them to form simplified Hanzi. So this resulted that we have -- in Chinese we have two writing systems but using one script. Both traditional and simplified writing systems are actually one Han script within the Unicode. Unfortunately, they're not 1:1 in the simplification process, as example given on the left-hand side. The concept of prosperity and the concept of hair in the traditional characters are simplified to a different -- to the same character in the Unicode, in the process for Chinese simplification. Japanese also use what they call Kanji, and they were imported from China. This is in addition to hiragana and katakana, which is the phonetic alphabet that they use in Japan. And Japanese also have simplification. They call it shinjitai, which is derived from kyujitai, which is the old character form. For those who understand Chinese, you see some of this simplification process that was done in 1923-1949 looks very similar to Chinese simplification. The answer is, yes, it is very similar. Because the Chinese actually adopt some of this simplification process in the 1964 process. But they're not exactly the same. So they introduce simplification process, but it's not exactly the same. Both shinjitai and kyujitai are being used interchangeably, to a certain extent, in Japan, except in the case where it's being used in nouns to describe names and places. And, because domain name normally generally refers to names, therefore, the new and old form are considered distinct in the domain name. And this has been reflected by the dot JP policy in dealing with Kanji. Korean also use the Han script. They call it Hanja. This is in addition to Hangul. Modern Korean use Hangul. It's a writing system designed by King Sejong in the 15th century. But it's a phonetic script. It is very (indiscernible), but it's phonetic. 
Whereas the Hanja was introduced from China in the Tang dynasty, and that was being used mostly to learn classical Chinese. Most of the Korean people still have Chinese name, although they don't use it much anymore. And effective April 14, 2011, there is legislation under which all government documents can only be written in Hangul, and Hanja is not allowed unless by presidential decree. This is one step that they are trying to pull away from the use of Hanja within the Korean community. And, therefore, if you look at dot KR, the policy for dot KR, they do not allow Hanja in the registration. As a side note, there are actually simplification process within the Korean for the Han script, and it's called Yakja, which is also in Japanese, by the way. So in our group we cover CJK, and what we define as Chinese variant. We define variant in a very narrow form. We define it as characters with different visual form, that look slightly the same or maybe totally different but they have the same pronunciation with the same meaning, and therefore can be used interchangeably within the community. So what is meant by Chinese variants is simplified and traditional writing system. And in Unicode, there's a certain aspect called Z-variant which is characters that look almost the same, but which (indiscernible) unified within Unicode, but because of certain technicality for round-trip compatibility with the local encoding, it is being assigned different codepoints like the hu (phonetic) and the huang (phonetic) which is (indiscernible) shown in the slides. We do not believe the spelling form or the form of the characters is a variant and we don't believe that translation, transliteration or using the (indiscernible) are considered variant. So when we talk about Chinese user expectation. As I mentioned, Chinese user use simplified and traditional as equivalent and interchangeable. 
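The many-to-one simplification mentioned earlier (the traditional characters for prosperity and for hair sharing one simplified form) can be sketched as a small variant table. This is a minimal sketch, not the format of any existing registry table: the three mappings are well-known examples chosen for illustration, and the names `TRAD_TO_SIMP`, `build_variant_sets` and `are_variants` are hypothetical.

```python
from collections import defaultdict

# Illustrative traditional -> simplified mappings (not a complete table).
TRAD_TO_SIMP = {
    "\u767C": "\u53D1",  # traditional "prosper/issue" -> shared simplified form
    "\u9AEE": "\u53D1",  # traditional "hair"          -> same simplified form
    "\u570B": "\u56FD",  # traditional "country"       -> simplified "country"
}


def build_variant_sets(trad_to_simp):
    """Group each simplified character with all traditional forms mapping to it."""
    grouped = defaultdict(set)
    for trad, simp in trad_to_simp.items():
        grouped[simp].update({trad, simp})
    # Every member of a group points at the whole group.
    return {ch: frozenset(members) for members in grouped.values() for ch in members}


def are_variants(label_a, label_b, variant_sets):
    """Two labels are variants if they match position by position within a set."""
    if len(label_a) != len(label_b):
        return False
    return all(
        b in variant_sets.get(a, frozenset({a}))
        for a, b in zip(label_a, label_b)
    )


variant_sets = build_variant_sets(TRAD_TO_SIMP)

traditional_label = "\u4E2D\u570B"  # "China" written with the traditional form
simplified_label = "\u4E2D\u56FD"   # "China" written with the simplified form

print(are_variants(traditional_label, simplified_label, variant_sets))  # True
print(len(variant_sets["\u53D1"]))  # 3: one simplified form, two traditional forms
```

In this sketch the two traditional characters end up in one set with the simplified form, so they are also treated as variants of each other; whether a real table should do that is a policy choice the sketch does not settle.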
And from our data from dot zongua (phonetic), in both traditional and simplified, the data has shown that we have DNS query to both of them, not on code basis, obviously. Chinese use mostly simplified, but there are about 10% of the DNS query coming for the simplified -- the traditional version of China. If you look at variants in the zone file from CNNIC, TWNIC, and HKNIC, that's between 70 to 80% of the registered IDNs have variants. So variants are very important to Chinese, and without variants, it cause a lot of problems and confusion within the community. Because Han script are still heavily used between Chinese and Japanese, and Korean has more or less opt out within the process, so if we look at certain implication for the Chinese and Japanese community if they use variant at the script level. At the script level, if you do at the second level, we have the top ccTLD or the top level as certain contextual indicator, like (saying word) which is (indiscernible) and the same character with variant under dot CN is expected to be variant and been delegated to the same registrant. But in Japan the same character are considered distinct and given two characters, although they are actually (indiscernible) within the Japanese community. Unfortunately when it comes to top level there is no contextual indicator, and therefore, we went through a long debate and we both agreed there should be a more conservative approach. And we believe that we should -- let's say someone apply for the association in Chinese, would result in multiple variants, six or seven variants to be reserved. That's okay. It doesn't make a lot of sense for the Japanese, for most of them. It doesn't make sense for Chinese most of the time anyway. But one of which is traditional, and we would like that to be delegated, too, as part of the process. So other than that, we went through the various issues on the implementation for the variants. 
And because consideration was a language variant table, we believe there needs to be a Chinese variant table for the root, and we believe that it should be based on language, but that's under consideration and these are issues that we need to discuss. There's also consideration on process of how do we define these variant tables and what the format the standard that we adopt for the variant tables. We also look at certain process on evaluation, allocation, delegation and operation of the Chinese IDN variant at the top level. How do we deal with string similarity, conflict with geographic names, discovery of variants. How to deal with contention. What if application A and application B collide because of the variant collision. What if application A collide with a variant of application B but application B don't really care about the variant. These cases where we need to look at and how do we resolve those issue. We look at allocation issue and how do we actually activate and delegate domain names, how do we reserve domain names, how do we block them. And we look at the delegation process. We also consider whether there should be the technical manager to do delegation where there is a CNAME, DNAME, NS. And then whether, of course, like the Arabic, we look at the fees involved in the application. We also consider certain impact to the other organization, that includes IANA, root server, registrars, registrants, DNS providers, and software application. And the idea of IDN variant at the top level has various implication and consideration for them that they need to implement. And roughly, that concludes my presentation on the Chinese case study.
>>DENNIS JENNINGS: James, thank you very much, indeed. Are there any immediate questions? I think I see a questioner. It's on. Yes, it is.
>>KENNY HUANG: Okay. Thank you, chair. My name is Kenny Huang, board member of TWNIC and executive council of APNIC. I only have two questions. 
My first question in the previous presentation from Arabic study, there is a recommendation, there is no extra fee for variant registration. But I didn't see the recommendation from Chinese study group, so probably you can comment on this a little bit. The second question will be I know there's some other extra work, such as RFC 3743, also James and I also contribute a little bit on the RFC, and (indiscernible) mention about technology and registration policy and process. I didn't see the registration policy was mentioned in the Chinese, in the reorganization. For example, registrant policy. So can you also comment on that. Thank you.
>>JAMES SENG: Okay. So on the first question, the reason it's not been clear in my presentation is this is meant to be an issues report, not recommendation report. So we identify issues, and then the issues will be posted to the community for comments. And then later they will come up for recommendation. But within the group, we do have certain discussion and it's been reflected in the document. We take very similar position with the Arabic which is at the top level we do not believe that there should be additional fee for the variant because it's not at any fault of the applicant that they have variants. But that's our initial discussion and consideration. As for the second question on the standard that's been adopted, Chinese has been working on this for the longest time. We have RFC 3743 for eight years now and its implementation by China, Japan, Korea. And then we have a more well-defined RFC 4692 -- 4691, I think -- correct me if I'm wrong -- that we look at Chinese registration specifically. 3743 deals with Chinese, Japanese, and Korean where China do the 4691. And later on another RFC that takes the concept from 3743 and then put it into a more well defined that cuts across different. 
And because of the different standards that's been adopted, they are not -- although they are not in contention with each other, and of course the current RFCs is more or less suitable for Chinese already, but there are also consideration from ICANN that there needs to be more generic that cuts across different language. So we need to look at how the tables are going to expand or take into consideration the language like Arabic and then how do we come up with a unified, central format. So this is a consideration, again, and no recommendation at the moment.
>>DENNIS JENNINGS: Thank you for that. And we will have time for more questions later. In case it's not obvious, we are going through the six case studies in alphabetic order by the name of the script, so just to clarify that. So the next one after Arabic and Chinese is the Cyrillic case. And may I ask Vladimir Shadrunov to present his case to the board.
>>VLADIMIR SHADRUNOV: Dennis, thanks. Thank you all for coming here. My name is Vladimir Shadrunov. I am a member of the (dropped audio) for the VIP project. I am just going to turn the slide, please. On the screen you will see there the list of members that constituted the Cyrillic case study team. People from various backgrounds, registries, registrars, ccTLDs, gTLDs, academic environment, linguistic. We had good support from ICANN staff and external experts in the protocol level and the registry operations. The team met since Singapore. We met weekly, and we also had a full two-day meeting in Paris in September to finalize the report. I'd like to talk a little bit about what distinguishes Cyrillic script among other scripts. First of all, Russian is -- constitutes about 60% in terms of number of speakers, but apart from Russian language, there are about 60 other languages that use Cyrillic and they all come from various language groups. 
Therefore, although the team had quite a good representation in terms of members of speakers of different languages, it was not physically possible for us to cover all the languages that use Cyrillic. This is one of the issues we found. The feature number two about Cyrillic is that there is nothing in the script itself that can provide some form of inherent variant relationships between characters. It's unlike in Chinese, for example, where they have traditional and simplified. There is no such situation in Cyrillic. We do have variant cases, but they occur only on the language level. What's interesting about that is that variants that occur in one language, variant characters that you can see in one language, they do not have the same variant relationships when they occur in other languages. That's something -- That's an issue that should be further considered how to deal with it. Next slide, please. The very important issue with Cyrillic is that there is a vast array of characters that are visually identical or very similar with characters from other scripts, such as Latin and Greek. You will see a few examples on the screen. In fact if you go to Wikipedia and look up homographic attack, you will see the very first example of homographic attack was made up by replacing a Latin character with a Cyrillic character in a well-known trademark. So in the report, you will see we tried to come up with the extensive list of characters that may be visually confusable. Next slide, please. We identified some other cases where variant issues might arise, such as YEH and YOH. In Russian language and a few others, which I will comment separately. And our observation, one funny observation was that Cyrillic space seems to be a little bit more politically charged than other spaces. And this might have some consequence such as spelling reforms occur more often than in other languages, other scripts, with new characters appearing in the languages. 
For example, the Ukrainian language has a character called GHE and another character called GHE with an upturn. This is something specific to the Ukrainian language, but a Russian language speaker would probably not even recognize the difference between those two characters. There is another interesting situation. In the Ukrainian language, they have a character called apostrophe and that apostrophe is different from the apostrophe that you can find on most keyboards. It's a different Unicode codepoint. So in terms of variants, in many cases, this apostrophe can just be omitted. So it's kind of a variant with nothing. Variant with zero. And the Ukrainian Internet community indicated to us that it is very important for them that the apostrophe letter is included and is part of the domain labels. So some conclusions that we came up with. First, the current ICANN policy is that visually confusable labels should not ever be delegated. We think, because of the number of situations where Cyrillic characters and characters from other scripts are identical, that it's very important that this policy should be preserved by ICANN, and visual confusability test criteria should be maintained. The second conclusion, we recommend reserving all other variant-related cases. What do I mean by reserving? It means that where two labels are in a variant relationship, if the variant label is ever delegated, it can only be delegated to the same registry operator. The third conclusion. We did not identify any cases where parallel delegation would be an absolute must from the user experience perspective. That simplifies the Cyrillic case a little bit. So although some of the team members had the opinion that it could be a good idea to provide some kind of aliasing, alternate names approach, the general sense of the group was that it is not an absolute requirement.
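Editorial illustration (not part of the spoken record): the visual-confusability concern and the "different codepoint" apostrophe described above can be sketched in a few lines of Python. The lookalike table is a small, hand-picked assumption made for this example and is not the list from the Cyrillic report; the names `CYRILLIC_TO_LATIN_LOOKALIKE`, `latin_skeleton` and `visually_confusable` are hypothetical. U+02BC is used here only as an example of an apostrophe-like character with its own codepoint.

```python
import unicodedata

# Hand-picked and deliberately incomplete (assumption for illustration only):
# Cyrillic letters that most fonts draw like a Latin letter.
CYRILLIC_TO_LATIN_LOOKALIKE = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
}


def latin_skeleton(label):
    """Replace each listed Cyrillic letter with its Latin lookalike."""
    return "".join(CYRILLIC_TO_LATIN_LOOKALIKE.get(ch, ch) for ch in label)


def visually_confusable(a, b):
    """True when two different labels collapse to the same skeleton."""
    return a != b and latin_skeleton(a) == latin_skeleton(b)


cyrillic_label = "\u0440\u0430\u0443"  # three Cyrillic letters that look like "pay"
print(visually_confusable(cyrillic_label, "pay"))

# The keyboard apostrophe and an apostrophe-like letter are distinct codepoints.
for ch in ("'", "\u02bc"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

A real check would need a much larger, script-aware table; the point of the sketch is only that two labels with no codepoints in common can still compare equal once lookalikes are folded together.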
As variant cases occur on the language level only, we think that to account for all the cases with variants, a root variant table will be needed. The last conclusion is that we recommend that certain codepoints that are not a part of the Cyrillic script and Unicode should be allowed, specifically allowed for Cyrillic TLD labels. There are two characters that share that feature. That would be the Ukrainian apostrophe that I mentioned and the one diacritical mark combining a (indiscernible). I think that's it, and if you have any questions, do not hesitate to ask. >>DENNIS JENNINGS: Thank you very much, indeed, and bang on ten minutes so that's excellent. Any immediate questions? Okay. There will be an opportunity to ask questions later on, so thank you, Vladimir. Let's move on to the Devanagari case and we have Akshat Joshi to present the case. >>AKSHAT JOSHI: Thank you, Dennis. First of all, let me thank ICANN for inviting us to become part of the Devanagari case study team. And with the background of the IDN project that was -- we were already working on with the Department of Information Technology, we could definitely participate well. This was also a very enriching experience for us. Next slide, please. The slide which you are seeing currently is the Devanagari case study team. You can see we are a fairly large team with expertise from different areas. Our case study coordinator, Dr. Govind, could not be present here physically to present this so he sends his apologies. I am Akshat Joshi. I am a team member of the case study team. Next slide, please. Let me take you through the structure of the report. So these are the main points. First one is the basic postulates. The basic postulates are the main assumptions which we have done while designing this report. Let me give you a brief background about this, why we had these in the first place. 
First, when we created the first draft report, and we had a first face-to-face meeting back in (saying name) after this case study started, the report was very elaborate. It had many issues listed in it. But in the face-to-face meeting, we came to know that there can be many things which can be excluded, because there are certain restrictions that IDNA 2008 itself imposes, then there are certain restrictions that the gTLD Applicant Guidebook would impose. And with that in mind, to make the main points get focus, we introduced these basic postulates. So some of them are like Unicode has been taken as a base. So Unicode, with all its normalization rules, NFC, NFKC restrictions, then only those cases which are excluded by this will be considered. And some of these points form a part of the basic postulates. And then that is followed by overview and evolution of Devanagari. Some of you might be aware Devanagari is a complex script and it has characters which change shape when they are joined. So what happens is Devanagari is not restricted like Latin that has only 26 characters or basic characters, but Devanagari has a fairly large Unicode table. So what happens is with that (indiscernible) set of characters, even after imposing IDNA restrictions and gTLD Applicant Guidebook restrictions, the character set that remains is really very large. And when they join, what happens is the number of shapes that can be formed becomes too large. So the analysis becomes really rigorous in this sense. So there is a brief overview of how Devanagari is formed. And a brief sketch of the writing system of the language, that forms part of the, like, there is a concept of syllable in Devanagari. So that concept has been elaborated in that part. Then issues and extraneous issues will be taken up in successive slides so I will skip them for now. Then we come to a main part of registry and registrar perspective.
Over here, there are issues like at registry we have something like an EPP protocol to be obeyed and there are certain issues that can come up in view of the variants. It's not a core part of the variant but, yes, these issues are important so they have been listed over there. Then we have a list of appendices which has larger information in it which are in connection to the main points of the (indiscernible). Next slide, please. So moving on to the part on the issues, the main issues at large. First one is the language versus script. So in Devanagari, which is a Brahmi-based language, there are many scripts that come under the Brahmi family, and Devanagari forms one of those scripts. So what happens is in Brahmi-based languages, we have something like one language can be written in multiple scripts. Also, one script can cater to multiple languages. So these are the things that need to be considered when we are viewing it in terms of issues. Then there comes a part of variants which will be taken up again in successive slides. The second part, also issues related to software behavior will be taken up. We have whole script confusables. This has been separated from the part on variants, even though it is part of variants, it has been separated because these confusables are because of introduction of some other script, so they have been taken up separately. And then we have the case of the 02BC character, which is called a modifier letter apostrophe. This particular character has two issues associated with it. One is in relation to variant. The first one is that this character comes from a different code page from that of the Devanagari, which is 0900. So the script property of this particular character is common. And with the restriction that it cannot -- script mixing cannot be allowed. This character first does not come with the languages which are written in Devanagari (indiscernible) near this character.
So if a decision is taken so that this character gets included, then the issue comes up like this: it looks like an apostrophe. So we just need to look into that. Next slide, please. Moving on to the main variant classification, we have two broad categories, and those are the confusingly similar single characters and confusingly similar composite characters. So this is the category which has single characters which look alike, and given the restriction of the URL bar of the browser, so to say, you can see that these characters can be confused with one another. But in dot IN policy in India wherein we are allowing IDNs in Devanagari, we have not considered this, but that being a second level, the restrictions can be more restrictive or something like that. But in gTLD we need to be cautious, so we have flagged them here as variants as well. Next slide, please. Then these are the variants which are the characters which are composite characters. So each of the three columns which you are seeing is a set of three Unicode codepoints, and those characters do not mean the same. They do not sound the same, but visually, they look alike. So any competent user of the language will also not be able to differentiate them. So they have been given as the variants over here. Next slide, please. Then we have miscellaneous issues, the issues which are very much related to the variants. The Devanagari being a complex language, it heavily depends on the rendering engine that is backed up by the operating system and the font that gets applied. Many times this is the default font applied by the operating system. So the way it looks, you can see the variants that have been proposed are mainly given on the basis of their look. So the font that gets applied on that is really a major thing that needs to be considered. Next slide, please. This is the next miscellaneous issue, and it's not categorized under variants but under miscellaneous issues because it actually talks about a different script.
That is Gujarati. The Gujarati has not been part of this particular case study. It has been taken under miscellaneous. As you can see, the first character, the first word, so to say, comes from the Devanagari script. The latter one comes from the Gujarati script. The only difference between them is the lack of the header line or (saying word), as we say. So when we see this again in the very small font, they look alike and can be confusing. Now some of these characters do form part of the Unicode's confusables list, but a manual check on the (poor audio) of confusables list of Unicode shows there are many more characters that should be identified under the DAG. So they have been given here. Next slide, please. Then there comes an issue of zero width joiner and nonjoiner. These are the characters which are given to get particular forms in the language. Those are invisible characters. So the issues associated with those characters have been flagged in this particular section. Next slide, please. So here is the link for the report. You can click on this particular link and read the report elaborately and we definitely solicit a good amount of comments on this so we can analyze the issues better. Thank you. I will answer any questions you may have. >>DENNIS JENNINGS: Thank you very much, indeed. Have we any immediate questions for Akshat and the Devanagari case study? Yes, I have a question. Good. Please come forward to the microphone. >> Hello, I am Sala and I am from Fiji, and the IDNs are very interesting, and this is coming from someone who doesn't know anything about IDNs. But in terms -- I just wanted to ask to the panelists, in terms of the classifications and the confusing, you know, scripts and the challenges, are there some sort of guidelines on how they are going to actually be treated and that sort of thing? Just out of curiosity.
>>AKSHAT JOSHI: If I understood your question correctly, you are asking are there any guidelines that talk about how the variants should be formed? >>DENNIS JENNINGS: Can you come a little closer to the microphone? >> Sorry. Are there any guidelines or are guidelines being developed in terms -- I understand that people are pioneering and that sort of thing. In terms of how it's going to be treated. I understood from what the panelists were saying that, you know, there are certain scripts that are politicized. I think somebody referred to it as politically challenged, and that sort of thing. So I am just interested to see the guidelines sort of thing. >>DENNIS JENNINGS: Thank you for the question. I think I've understood what you have asked, and one of the things that we're hoping to get out of this are some guidelines for other case studies. We haven't formulated exactly how that's going to be done. And I think this addresses your question, that the experience of these six case-study teams will be documented in such way that another set of users in another script can have a methodology for at least beginning to define what the variants are. Do you want to come back and supplement the question? >> Yes, thank you. I take it from your response that it is relatively going to be a subjective experience based on the different methodologies. >>DENNIS JENNINGS: No, it's not subjective. >> Okay, sure. >>DENNIS JENNINGS: There are -- It's as objective as possible. There are characters which are well-known to be variants of one another. That is their -- Same character with the same meaning with a different codepoint is an example of a variant. >> Right. >>DENNIS JENNINGS: And out of these case studies, there will be a number of definitions for variants which will be a guide for whatever script you are concerned about. >> Right. And so that answers my questions about the confusingly similar script perspective that the panelists addressed. 
The other question I have very quickly before I take my seat is in terms of -- like somebody mentioned in another forum about the word Kong. Kyong, Khong, Cong, Kong, Kong, Kong, Kong, and how you can have different languages but with the same phonetics. And would that be something that would be woven, as well, into the guidelines? >>DENNIS JENNINGS: That's a very interesting question which I am not going to answer, but I will invite the panel to think about that and we will come back to that question at the end, definitely. Thank you. So I'd like to move on to the next case study report, which is on the Greek case study. And I'd ask Panagiotis Papaspiliopoulos to give the presentation. >>PANAGIOTIS PAPASPILIOPOULOS: Thank you, Dennis. And hello, everybody. My name is Panagiotis Papaspiliopoulos. I will make a presentation for the Greek case study team report and on behalf of the team. Here you can find -- on the next slide you can see the members of the team. And, unfortunately, our coordinator, Vaggelis Segredakis, could not be here. So I'll make the presentation, and you can see the other members of the team. So the Greek case study team developed this report. And it was posted for comments in the public forum. It has been posted since the 7th of October. And here you can see the link. And we expect your comments until the 14th of November. I think we have already one comment. And we expect some more. So the current -- I will -- I'll show you the structure of the report. First we have a rather long introduction and disclaimer. Excuse me. And afterwards is the definitions section. Some useful key points regarding the Greek language that readers should be aware of in order to understand afterwards our proposals. The proposed characters for registrations, the issues concerned, the proposed solutions -- there are two of them -- and the recommendations of the team.
And at the end you can see the appendix with the table of the proposed allowed characters for Greek top-level domain registrations. And I would like to say we are dealing only with top-level domain registrations in our report. So here are some definitions that we used in our report. We defined the homograph. Homograph is when two words or strings are written the same in the -- in different scripts. Homophone is when two words or strings are pronounced the same in the same or different scripts. Greeklish is a Greek word. It's not an English word. And we invented this word years ago when the majority of the applications could not support the full set of the Greek characters, the characters with the tonos and things like this. So it's the representation of the Greek words and the characters using the English characters. Nowadays the use of this Greeklish has become less over the years. And aliased name and name aliasing is when a name can have two different -- domain name can have two different forms. And then bundling domain names is when these different forms of the domain name are acting as one. Tonos is the accent mark that is used nowadays in the Greek language. The dialytika is when we try to separate one vowel from the neighboring vowel instead of pronouncing them together. And katharevousa and dimotiki are two different forms of the Greek language. Katharevousa was an older form. It was made -- it was made by the scientists even before the Greek revolution. It's a successor of the ancient Greek. And it was alive until the middle '70s when the Greek government changed it to the dimotiki. Dimotiki is the current form which is used by the people and since the middle '70s is also used by the state. And so let's see two useful key points regarding the Greek language. First is the Greek language question, the diglossia between katharevousa and the dimotiki.
To other people who do not know the Greek history and the Greek language, this issue, this dilemma was very significant. And even people died for it. And, of course, as I mentioned before, nowadays the dimotiki is used. But many forms of the words that belong to katharevousa are still used. And another issue is the Greek orthography. It's the polytonic version, monotonic form. Polytonic is when the accents were used. I was taught polytonic grammar until the first grade of the high school. In 1982 this polytonic form was replaced by the monotonic form by the government. And you can see the example. It's the first lines of the Lord's Prayer. The first example is in the polytonic form, and the second one is in monotonic. As you can see, the first one, even if it's more beautiful, let's say -- it's more complicated. And, nevertheless, the monotonic form is now used. So next slide, please. And so the team proposed for registration characters that are only monotonic because these characters are used now for the spelling, the correct spelling of the Greek words. And the polytonic characters offer no significant advantage for the user for a top-level domain registration. And these polytonic characters can be used for the lower level registration according to its new Greek top-level domains policy. But we believe that it's for the benefit of the end user and, since there are no significant advantage to use polytonic characters, to recommend only monotonic ones. So, in order to formulate our proposals, we had to think about several issues. One very important is the sigma and final sigma issue. We have three sigmas in the Greek alphabet. It's the small one. You can see them in the examples in the slide. It's a small one. It's the small one, the big one, the capital one. And it's the final one. The final one is a small sigma, but it's the final small sigma because it goes at the end of the words. Like at the example. 
So, in the IDNA 2003 protocol, there was a mapping from the small middle -- let's call it sigma -- to the capital one and from the final sigma to the capital sigma. But, when you had to reverse this thing, then from the capital sigma you went only to the middle sigma. And, as you can see from the example there, the result of this -- from this thing is not a correct Greek representation of the word. So, fortunately, in IDNA 2008, middle sigma and final sigma are different accepted characters and treated separately and reverse mapping is not possible any more. So, if you see the official name of our country Greece, Hellas, and you consider the IDN approach of translating this word into a domain name, you can see that Hellas has a tonos and the final sigma. This is the normal writing of the word. If you write it without the tonos, you have another domain. If you write it in capital letters, even if they are not accepted in the IDNA 2008 protocol, some applications may allow these characters, and you may have a word that has a middle sigma instead of final sigma, so you have a different Unicode codepoint. So, in our study group, we tried to meet the typical user experience. The typical user expects to have the same result if he uses small letters or capital letters. So we concluded in our team that a word or a string without a tonos should be considered as a variant of the accented version of the string. Other issues are the homographs. And here you can see some examples, either in different scripts, such as the capital -- initial letters of the Greek regulator of the electronic communications market. In the capital form you can see it is very similar. And in small form, okay, you can separate them. But also you can have in Greek Athina and Athena. Athina is the name of the capital city of Greece. Athena was the goddess of wisdom. If you write them both in capital letters, this capital form is not accepted.
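Editorial illustration (not part of the spoken record): the sigma round-trip problem and the "string without a tonos" variant described above can be reproduced with Python's standard library. Unicode case folding, used here through `str.casefold()`, stands in for the IDNA 2003 mapping step as an approximation chosen for this sketch; it is not the protocol's own implementation. The function names `fold_like_idna2003` and `strip_tonos` are hypothetical.

```python
import unicodedata

HELLAS = "\u03b5\u03bb\u03bb\u03ac\u03c2"  # "ελλάς": tonos on the alpha, final sigma


def fold_like_idna2003(label):
    """Approximate a case-insensitive mapping with Unicode case folding.

    Case folding sends both the final sigma (U+03C2) and the capital sigma
    (U+03A3) to the middle sigma (U+03C3), so the final form is lost.
    """
    return label.casefold()


def strip_tonos(label):
    """Remove combining marks (tonos, dialytika) and keep the base letters."""
    decomposed = unicodedata.normalize("NFD", label)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


folded = fold_like_idna2003(HELLAS)
print(folded, [f"U+{ord(ch):04X}" for ch in folded])

unaccented = strip_tonos(HELLAS)
print(unaccented, [f"U+{ord(ch):04X}" for ch in unaccented])
```

The folded string ends in a middle sigma, which is the incorrect spelling the speaker points to, while the stripped string is the unaccented form that the team proposes to treat as a variant of the accented label.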
So you do not understand seeing only this word for which you are referring to. Homophones is like when you pronounce the same. And here is the example of the Greeklish that I said many people are using English characters or numbers sometimes to represent the Greek characters. So, having all these issues in our thinking, where -- and having also the fact that the correct spelling and the correct form of writing a Greek word is to use the tonos and to use the final sigma. We will consider that the tonos accent and final sigma characters should be included also. And I have -- I have to say here that many times I have used the word "word" instead of a label or a domain name. That is because most of the times it's not only a string of characters. Most of the times a word is used in order to address the memory of somebody to the domain name. That's why we had to deal in our report with words in the Greek grammar. Of course, we understand that a domain name can have different characters or a set of words. But most of the times we dealt with the same word. So, in our case, as a variant, we have a form of capital letters and a form of accented letters. The given word in katharevousa and dimotiki and the monotonic and polytonic form, then variation if there is a final sigma or if there is not a final sigma of the exact position or if you change it, and some changes that happen in the Greek words, according to the Greek grammar. So these -- here is the two proposals from the Greek team. One -- next slide, please. One is called the variants proposal. It's the proposal that includes variants. The accepted characters for this proposal are the small, monotonic characters include the characters with tonos, dialytika, and their combination. The domain name will be accepted at exact requested form. The same domain will be allocated to the registrant stripped of accent marks and final sigmas. 
The same domain will be allocated to the registrant stripped of accent marks but retaining the final sigma at the exact position. Alternate position of tonos is not allowed. Alternate position of final sigma is not allowed. And the same meaning of this word in -- next slide, please -- in katharevousa and dimotiki is not allowed. The registrant, the person who applies for a string for the word, can choose only one form. And the others should be excluded. And there are two options for handling the allocated variants. The first one is to enter the zone as DNAMEs, and the other is to be treated as FQDNs and registrant instead has to make it a recommendation. The second proposal, the small characters proposal, is included in the first one but it's without the variants. We are aware of the lack of the technology because in DNS you can have DNAMEs, but in other protocols these techniques are not 100% successful, let's say. And there is a way for this. So the accepted characters are the same as before, but the only accepted form is the form that is originally submitted by the applicant with the tonos and the final sigma in exact positions. All the other variations, the variation without the tonos, the variation without the final sigma, are not allowed. And also not allowed is the different form in the dimotiki and the katharevousa. So taking into account the advantages -- and can I have the next slide, please? Thank you. Taking into account the advantages and disadvantages of each proposal, the average Internet user's experience and expectations. As I said, the typical Greek Internet user expects to have the same result when using a capital or small letter form. And the current protocol status and acknowledged limitations.
And, having in mind that ICANN itself recognizes the need to have variants in order for the language to be represented as the native speakers use it, the Greek case study team recommends as the most appropriate solution the variants proposal. And so the next step is that, as I said in the beginning, we are waiting for your comments until the end -- until the 14th of November. We'll analyze them, and we'll determine how to address them in order to revise our report. And then we believe that a useful next step is that the Greek and Cyrillic and the Latin study teams should work together to identify the cross-script issues that might be the same in the Latin, Cyrillic, and Greek characters registration. Sorry for talking too long. I could talk more. No, that's fine. >>PANAGIOTIS PAPASPILIOPOULOS: Now, I am waiting for your questions. Thank you very much. >>DENNIS JENNINGS: Thank you very much indeed. Maybe we'll hold the questions for the Greek case, because we're running a little beyond time. We will take questions at the end on the Greek case. And let's go to the final case, the Latin case. Can I ask Cary Karp to give the presentation. >>CARY KARP: Thank you, very much. Just noting my name is there because I'm the one to be blamed for anything wrong with this presentation. It was a team that prepared it. And at the final slide we'll point you in the direction of the full list of participants here. All righty. Next slide, please. One of the things that makes life easy for us and also something that makes it difficult for us, the Latin alphabet is at the core of a larger number of writing systems than is any other in current use. Next slide. The basic 26 letters of it, which are not adequate for as many languages as you might think, not even the English language, are these. This is what is important, encoded in ASCII, which has been around since the first days of the domain name system. Next slide. Next slide again, please. All right. The English case.
There are two ways in which non -- extended Latin letters appear in the English language. And it's a general situation. Lots of languages are the same. The first is the word naïveté, which is very commonly typeset. The house rules for many publishing houses, if not most publishing houses, want to see at least the dots over the I or the accent over the E or both. And that's generally regarded as decorative use of diacritical marking. However, the second case, the difference between the word "resume" and the word "resumé" is significant. Adding those diacritical marks changes the meaning of the word. And it is, therefore, contrastive. That's an important distinction. If we continue now, just taking one of the vowels, one of the basic vowels, these are some of the ways in which it can be used validly in IDNs. There are a few other decorated Os, using a term that I don't really like. But we don't have them here. And, if you take a good look at those, you can easily see that they are, in fact, separate and distinct. If we were to reduce the size of this -- I don't have a pointer here. But, if you take a look at the second row here, the left-most character there and the fourth from the left, those are two diacritical marks that are clearly distinct that become less distinct as one reduces the size at which they're displayed. However, -- and this is the important thing -- to the people who use the one or the other, the distinction is immediately obvious. And to people who use neither the distinction may be irrelevant. Okay? The next slide. This is one of the consonants. The same deal here. And, in fact, on the bottom row there, there are letters that are derived from "h," but they're not. The left-most, the bottom left is a heng, which is, in fact, an "h," and then a sound that exists in English -- ng -- but is represented with two letters and in others is represented with an eng, which is an "n" combined with a "g."
So you have the ascending portion of an "h;" you have the body of an "n;" and you have the ascender of a "g" taken into a single glyph, as it's called. And that's, again, very, very important to the communities that use them. And, if you don't understand even what the language is, that a label includes this, it's possible, if not outright likely, that the documents that it leads you to will be similarly difficult to parse. Okay. Next slide. And this is what I just said. The distinctions between these forms are a fundamental -- sorry. I need the second part of it. However -- and this is the crucial point -- a community that uses -- that doesn't perceive a particular distinction between two forms of a letter will regard them in one way. And some other community where that distinction is central -- we've heard it in preceding presentations -- we can't disregard them. So, to the extent that the way a potential variant situation is dealt with with language A is at fundamental odds with the way that same difference is dealt with in language B, precludes any general rule about how to manage those variants. Now next slide, please. In the Latin script, one of the key issues is decomposition of a marked letter. Can you represent a marked letter with two unmarked letters that are both in ASCII, or what is the alternate representation otherwise? Next slide. These are Swedish words. The umlauted O in Swedish is not a diacritically marked letter. It is the 29th letter of a 29-letter alphabet. It is not a marked O. It is not regarded as that. Nobody sees it as that. The first word means the north, the region that many people call Scandinavia. And it's perfectly conceivable for Norden to be a regional TLD proposal. If we were to decompose that O the way a German might into an OE, we have something undefined. I'm not sure that a Swede would even read that correctly, recognize it as a Swedish word. 
And, if we do take the correct Swedish fallback representation -- if you do not have access to the ö, then you just do without the two dots and you write an O. It is not correct orthography, but it is contrastive. The third word is what you get when you have that -- what I'm seeing is I'm giving this backwards. Patrick and Stephane are -- crawling in their chairs. I've swapped this. I'm not looking at my slides. Okay. The bottom one, norden is the name of the district. Oh, my God. This is embarrassing. The top one, nörden, is the word "nörd" in Swedish. So, if you take the word "nörd" and strip it of its diacritical markings, as we would see it, you end up with the name of the region. And those are two utterly separate concepts, despite the fact some fool sitting here who identified himself as responsible for any slips in this presentation has now committed a really big one. Next slide, please. Okay. A single writing system can have multiple rules dealing with alternate representations depending on the context. This is truly crucial when dealing with proper names. Next slide. For example, the names -- let me do this right. The author Goethe would never, in any orthographic form, be written with the umlaut over the O, although a Swedish name in something of an archaic form would certainly be written with that. And removing the two dots from the Swedish form gives you nothing -- expanding it to an OE gives you a claim of literary skill that isn't -- that doesn't exist. So, again, a German would certainly regard an umlauted O as decomposable to an OE, although a Swede wouldn't. And, under some circumstances, a German would regard combining an OE into an umlauted O as acceptable but not in all. There's an asymmetry to this. There is difference in approach within closely related languages. So, considering a broader linguistic community than that still is -- it's an impossible situation. Next slide. And that's what I just said.
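Editorial illustration (not part of the spoken record): the point that the same marked letter falls back differently in Swedish and in German can be sketched with per-language tables. The table contents cover only the ö case mentioned in the talk; the name `FALLBACK`, the language keys and the function `ascii_fallback` are assumptions made for this example and do not come from the Latin case study report.

```python
import unicodedata

# One entry per language, limited to the letter discussed in the talk.
FALLBACK = {
    "sv": {"\u00f6": "o"},   # Swedish: drop the two dots
    "de": {"\u00f6": "oe"},  # German: expand to two letters
}


def ascii_fallback(label, lang):
    """Apply the fallback rules of one language to a label."""
    table = FALLBACK[lang]
    # NFC so that "o" + combining diaeresis is treated like precomposed "ö".
    composed = unicodedata.normalize("NFC", label)
    return "".join(table.get(ch, ch) for ch in composed)


label = "n\u00f6rden"
for lang in ("sv", "de"):
    print(lang, ascii_fallback(label, lang))
```

Because the two tables disagree about the same codepoint, no single script-wide rule can serve both communities, whereas a table scoped to one language can be written down without difficulty.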
There may be some meaningful concept of variants that attaches to the Latin script. But it is not capable of global quantification. Okay. Next. However, in a local situation, it is actually quite possible to do this. So a Swedish character table defining an IDN permissible repertoire can reflect Swedish orthographic rules. The same goes for the German. Okay? Continue. Remembering that the graphic distinctions between one decorated character and another are visible to anybody, if magnified enough and visible to those for whom they're important at any degree of magnification, we considered a situation where two different codepoints, two different values in the Unicode system might be ascribed to the same character, the exact same thing, not similar, but identical. The display form is the same as the other one. And the other situation where the same letter might have two different codepoints. And we found exactly one such situation. Next slide, please. Keep going. Okay. And that's this: The Latin small letter turned E and the Latin small letter schwa -- schwa, by the way, occurs frequently in the English language, although it's never noted as such. One of these is used in many African orthographic systems and far northern orthographic systems. The other is used in academic discourse about alphabets. But, in any case, a writing system in which one of these appears is not going to include the other. However, the root zone is supposed to support all communities using all writing systems. So here we have the case. If one of these appears in the root zone, it would probably be quite risky to allow the other to appear in the root zone. All righty. Next slide. We, as with our Cyrillic and Greek associates, noted that there are similar -- there's similar glyph sharing across our language and boundaries. But, because that was at the outset, not part of VIP study, we thought it would be most useful to defer it until we got to the stage where we are now. 
And that is discussing how to merge the sets of -- the six sets of issues into one consistent issue report that we then put forward. All right. Next and final slide. You will see the full lists of all the available Latin codepoints divided into two tables. The ones that simply have to be made available without need for further consideration, and the others that are, in terms of the protocol, available but someone would need to demonstrate true need and true warrant for their appearance. And here in the full report, which we hope you will read and comment upon in the forum, you will find the entire list of the people who were the members of this study group and what their roles in it were. Again finally, profound apologies to the Swedes for having intended so well to illustrate the intricacies of their writing system and then munched it as I did. Okay. That's it. >>DENNIS JENNINGS: Thank you very much, indeed, Cary. We have 15 minutes for questions, and I haven't forgotten the question about homophone strings, which we will come to at the end because I think we could spend all the time about that. You will see that the general consideration is character variants, not string variants, just as a preamble remark. And I am now opening it to questions to any or all of the case study presenters. Please, and, Naela, we have -- Okay. We will do that in a minute. >> Thank you, Chair. I am Oksana (saying name), and I was happy not to participate but to follow one of these group, and now I have questions for all presenters. What issues were the most controversial during your work? Thank you. >>DENNIS JENNINGS: So just to repeat back the question. What issues for each case study were the most controversial. Does anybody want to -- Cary, what was the most controversial issue you addressed in the case study? >>CARY KARP: Coming to the understanding that it wasn't amenable to the kind of solution that we were charged with identifying. >>DENNIS JENNINGS: Anybody else want to say? 
Akshat, do you have a most controversial comment? >>AKSHAT JOSHI: Yes, specifically talking in terms of the Devanagari language, that being a complex script and the backing up of the (indiscernible) form was very much crucial, but the consensus on that has not been reached. So identification of variants with those things in mind which are not in a unified manner, that was really one of the most difficult calls to take while identifying the variants. >>DENNIS JENNINGS: Anybody else want to take the opportunity to identify? Sarmad? >>SARMAD HUSSAIN: So one of the most challenging things is actually for TLDs we have to think at script level, and none of us is actually trained to think at script level. Most of us are actually trained to think at language level. And that's really a challenge to sort of step away from language and look at the whole script rather than characters belonging to different languages. >>DENNIS JENNINGS: Thank you. Naela, are you ready with the remote question? (Garbled audio) >>NAELA SARRAS: Thank you. So I have a comment -- I have a question, a comment, and then another question if you have time. So the comment came after Dr. Sarmad said his presentation, from Ayesha, and it says: From your experience, which way is better, to reserve the other variant or block it? I think that was the question. >>DENNIS JENNINGS: So do you want to address that first, Sarmad? >>SARMAD HUSSAIN: So what we are proposing is blocked and reserved are two different statuses. So it's not a matter of preference between one or the other. Reserved means those things which an applicant does not want, and blocked means those things which applicant cannot have, whether he or she wants it or not. >>NAELA SARRAS: Thank you. And then the comment was from Sayef (phonetic). I didn't get the last name. I'm sorry. It says Arabic variant is one of the complicated variants in the world. 
Since Arabs do not agree on the way they write, there will be plenty of consequences they will face in accepting some letters rather than others. Hence, we will lose the identity of our language. That's why we have to agree on one document that regulates the end user on how to write IDNs. >>DENNIS JENNINGS: Sarmad, did you get the question? >> I don't understand the question. >> So actually, I don't feel comfortable summarizing for him, but I think -- I will read it again. Arabic variants is one of the complicated -- Arabic variants is one of the complicated variants in the world. Since Arabs do not agree on the way they write, there will be plenty of consequences that they will face accepting some letters rather than others. Hence, we will lose the identity of our language. And then later he says that's why we have to agree on one document that regulates the end user on how to write the IDN. >>SARMAD HUSSAIN: I will just make a general comment in this context because it seems more like a remark than a question to me. >>NAELA SARRAS: Yeah, it was just a comment. >>SARMAD HUSSAIN: So when we are talking about TLDs, we are not talking about a particular language, we are talking about a script because when a TLD is being used by an end user anywhere around the world the context of language is no longer there available for that person. So when we're talking about TLDs, again we have to step away from language level, and there was also -- I think it was also pointed out by other case-study teams that we have to take the most conservative approach. Meaning that we have to protect all users. And so there will be compromises which will be made across languages, and that's really been the fundamental philosophy of the case-study team as well. >> Thank you. >>DENNIS JENNINGS: Thank you. And can that comment please -- can the commenter make those comments in the public forum. Is there another question, Naela? >>NAELA SARRAS: I have two more. What would you like to do? 
>>DENNIS JENNINGS: Unless I see a hand raised in the audience here, we will take the next remote question. >>NAELA SARRAS: So, I'm sorry, I think it was a person named David. I don't see who asked it because there were a few comments that came after that. It says: Naela, if possible and if time permits, please ask the question that was asked by the lady in the room prior in the call. Will we see dot com domains in other languages? I assume transliterations, for lack of better wording. >>DENNIS JENNINGS: We will come to that. I promised we will spend time on that at the end. >>NAELA SARRAS: Sorry, you're right. The other question was from Matt Harper. I would like to ask a question. If a new gTLD application for dot pet, P-E-T, and dot pets, with a plural at the end, would this be a bundled application or two different required applications? >>DENNIS JENNINGS: I will also say we never used the word bundle in any of these presentations. It's a term that we don't recognize, and it doesn't exist in our lexicon. Pet and pets in this example, the single and plural, are not variants of one another. Next question. >>NAELA SARRAS: I'm done. >>DENNIS JENNINGS: Okay. Back to the audience here. Are there any questions people would like to ask? And seeing none, let me go on to the question about similar sounding strings in multiple languages, and let me take a noncontroversial example. The sound bang. The sound bang I'm sure can be expressed in almost every language in almost every script in the world. However it might be spelled, it represents the sound bang, B-A-N-G, B-J-Y-N-G, whatever, just using the alphabet that I know. These homophone strings are not variants and they are not something this team has considered. Perhaps the members of the case study team want to contradict me, but I understand that that is the situation. Cary, would you like to lead off? >>CARY KARP: It's easy. We were considering graphemic similarity and not phonemic similarity. 
>>PANAGIOTIS PAPASPILIOPOULOS: And we did that exactly the same in the Greek group. >> We do not consider (indiscernible) similarity as variants. >> Even we don't recognize them as variants, homophones. Thank you. >> I think coming back to the question about controversial issues, one of the most controversial issues was to determine what's within the scope of the project, what's outside the scope of the project. And definitely this issue was outside. >>DENNIS JENNINGS: So does that answer the question that was asked earlier in the question? Come up and challenge us, please. And again, if you could speak close to the microphone so we catch it. >> Hello. I say yes and no. And again, this is from an IDN idiot. So that's my caveat. >>DENNIS JENNINGS: An IDN naive user. >> Yes, an IDN naive user. It's interesting because I'm just thinking very quickly in terms of a policy perspective, if it were -- if that were to be the rule, that, okay, similar phonetics would not be variant, in my view, very quickly, it would create a floodgate of potential -- I don't know, you know, like people could sort of use it adversely. Off the top of my mind, I can't think of an example. Would it be -- And this is a question I pose to you. Would it be an exception, then, rather than a rule? Just very quickly. >>DENNIS JENNINGS: Okay. Again, using my noncontroversial example of bang, the -- if bang in some language became a very successful TLD there might well be a demand for the need for the bang. This policy issue is a policy issue that's not for this project. And if it is a policy issue, it will have to be addressed in the normal policy development process in my opinion. And I am only expressing an opinion. I have the signal we have three more minutes and we have remote questions. >>NAELA SARRAS: A question from Chris Dillon. He says: The concept "visually similar" is important for IDN variant TLDs, but how should it be defined? Ideally with a scoring system, question mark? 
>>DENNIS JENNINGS: Thank you. So how should the visually similar be defined. Cary, you want to take this? >>CARY KARP: Well, I should think those are the reports that said there is no way to quantify similarity have at the same time said that that can't be supported by scoring, either. And the other studies that didn't draw any such conclusion I suppose should be commenting on how they perceive it, and the extent to which algorithmic support can make this something less of an intricate issue. >>JAMES SENG: So in the Chinese case, what we consider variants are characters that look -- that has no visual similarity between the two. The traditional. (garbled audio) and they don't look the same. And those are considered variant. It's not just visual similarity that's considered variant. There are also cases where they look similar, but they may or may not be variant depending whether if defined as Z-variant. As I'm (indiscernible) the document (indiscernible), but visual similarity is not a clear-cut rule that says these are variants and not, for Chinese, at least. >> Thank you, Dennis. In our team, and I think this is obvious, we had the result that the -- that there will be visual similarity between the Latin, Cyrillic and Greek script. And since these three scripts uses a lot of common characters, there will be visual similarity which will be in a greater extent if we include all the capital letters and less if we exclude them. But we recommend that ICANN should address a common work between these three groups in order to define the similarity, and may propose something about the benchmarking, making a tool. I don't want to say now about making some grades or which is more visual, more similar. But anyway, this work has to be done between these three teams. >>DENNIS JENNINGS: Good. Thank you. Edmon, you have 60 seconds -- probably only 30 seconds now. 30 seconds, quickly, and then we are going to wrap. >>EDMON: That will be enough. 
I just want to do an advertisement, actually. All of the study team reports actually touch on the subject of user experience, and one of the sessions later on today, at 4:00, the Joint IDN Group, I'd like to invite all of you to go there as one of the things we would talk about is the acceptance of IDN TLDs and that's a related issue. So that's it. >>DENNIS JENNINGS: Thank you for the advertisement. That's great. And I'd like you -- We have to close. This room is needed. Thank you for attending, and could I ask you to show your appreciation to the six work study teams for the work they have done. [ Applause ]