ICANN Meeting - Cairo ASIWG Open Meeting 2 November 2008 >>RAM MOHAN: Sorry to be a couple of minutes behind schedule. Welcome to the -- to this session on the Arabic Script IDN Working Group. We're here to provide a bit of an update for what the Arabic Script IDN Working Group is about, why we got organized, and what we are trying to do. Just as a way of introduction, here on stage, my name is Ram Mohan, one of the convenors of the working group. Right next to me is Dr. Sarmad Hussain, who is a contributor in the working group. We also have George Victor, who's on stage. And we have a good and strong team. In terms of how we plan to do this session today, I have a set of slides to present to you, and my thought is the following: We'll get the slides up first on the screen. And as each slide comes up, George has actually kindly agreed to speak to the content of the slide in Arabic first, and once that is done, I will simply cover a few pieces of it in English, and then we will continue on. In terms of the agenda, although the session was originally scheduled for about two hours, we are likely to finish in just a little bit shorter than that. The plan is to spend perhaps the first 30, 45 minutes, maybe 30 or 40 minutes of this session, discussing who we are and why we got together. And then Dr. Hussain here will present one concrete outcome among the several things that we have done, one concrete outcome of the work that we have done, and go through that in some level of detail, not a great deal of detail, but at a high level of detail. And then, time permitting, also present a paper that is a subject of conversation and discussion at the upcoming meeting of the Arabic Script IDN Working Group later this week. After that is over, then we're hoping that we can actually engage you in participation and in questions and answers and in your ideas and suggestions about what we can do better and how we can do more. So with that brief introduction, thank you for coming. 
And, George, if you could please do a similar introduction, and then we'll just go through the slide set. >>GEORGE VICTOR SALAMA: (in Arabic) >> I am jumping a bit, but I have a question related to digits. We know that in part of Unicode, it looks like the numbers don't have a left-to-right or a right-to-left orientation, which sometimes also affects the sequence of the string, if it has numbers. I'm jumping, but since the subject came up, the question is popping in my mind. Is there something to be done about that? For example, will there be agreement to make it right-to-left by rule, or will we keep that open? Just as a question that I am -- >>RAM MOHAN: Sure. Thank you for the question. And again, after I speak, if you wouldn't mind speaking in Arabic as well. We discussed the numbers in the working group, and from what I recall, there are three numeric systems that need to be looked at. We have not reached consensus on any rules as to right-to-left versus left-to-right for numbers. What we have agreed upon is that the digits -- most of them look the same, but some of them do not and actually have different Unicode code points underneath. So much of the focus has been on ensuring what is correct, rather than defining how to represent it. And also -- and George will speak to this in a minute in another slide -- we have been focused first on the IDNA protocol level. And at that level, numbers are allowed. So we haven't worried about numbers. We have just worried about making sure that confusing similarity is resolved. We do need to talk a little bit more about how to differentiate between left-to-right and right-to-left, but inside the Arabic working group, there is a lot of discussion. Some members have said there must be a rule. Other members have said, why do you want rules? Let the registry decide, because the registry understands how to do it. So we don't have consensus as yet.
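The point about digits that look alike but have "different Unicode underneath" can be illustrated with a minimal Python sketch using the standard `unicodedata` module. It assumes the three numeric systems mentioned are ASCII digits, Arabic-Indic digits, and Extended Arabic-Indic digits; the transcript does not name them. The helper name `describe_digit` is hypothetical and is not taken from the working group's materials.

```python
import unicodedata

# Three separate code points that all mean "1". The dictionary keys are
# informal labels chosen for this sketch.
DIGIT_ONE = {
    "ASCII": "\u0031",
    "Arabic-Indic": "\u0661",
    "Extended Arabic-Indic": "\u06F1",
}

def describe_digit(ch):
    """Return (code point, Unicode name, numeric value, bidi class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.digit(ch),
        unicodedata.bidirectional(ch),
    )

for label, ch in DIGIT_ONE.items():
    print(label, describe_digit(ch))
```

All three have the numeric value 1, but Unicode gives the Arabic-Indic digit the bidirectional class AN (Arabic number), while the ASCII and Extended Arabic-Indic digits are EN (European number). This is one reason the ordering of a string containing digits can differ depending on which digit set is used.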
>>GEORGE VICTOR SALAMA: (in Arabic) >>RAM MOHAN: Thank you. As we've gone through this presentation, it's fairly clear that it doesn't actually require me to go through the entire presentation again, because those of you who speak English can certainly read it on the screen. So what I would rather do is spend about five minutes or so speaking to what we are trying to do in the working group, especially over this week coming up. There is a Wiki. The Arabic Script IDN Working Group has a Wiki, and we have posted the agenda for this Saturday, Sunday, and Monday's meeting on this Wiki. So you are welcome to go and look it up. The general idea behind what we are trying to do is, first, to be as inclusive as possible, and second, to ensure that the kind of solutions that we come up with are extensible. They are solutions that don't work for only one language but are actually answers or solutions that can cross language boundaries. Because we come together in the Arabic script working group because in each of our countries or in each of the regions that we work in, the Arabic script is being used. And it's not always used for the Arabic language itself; right? Where someone is from, you have certainly Urdu and Pashto in parts of the country. In India, Urdu certainly uses the Arabic script. And at the last meeting, some colleagues came from Northern Africa, where again the Arabic script is being used in several Northern African languages. We had a very fascinating presentation last time from the team from Malaysia, where they were presenting to us about how the Arabic script is used in Jawi and some of the unique characteristics there. Because the Unicode character set represents the Arabic script in a certain way, they have had to kind of, I guess, hack the Unicode script to actually make it work for their language. And that's obviously not a good thing long term.
Long term, you want your language to be available online and to be represented online. But at the core of it, the reason why we are together and the reason why we ask for more participation from all of you in the community is because if we don't work together to take the Arabic script and make it work appropriately for the domain name system, for e-mails, for all of the things we take for granted in the English ASCII world, we run the risk that the populations of our country, the people in our countries and our regions and those who speak the language and use the script will actually miss out on what is happening online. And that access and providing access, making sure that the ways to be able to get online, to be online and still not have to learn some other language, especially one that is not your own, that that should not be a barrier to your ability to participate online. And that -- If you really boil it up to the guiding principle, that's why we have come together and we stay together. It's been an extraordinarily effective short duration since we have come together. But one of the big things that we always focus on is to get momentum. So if you would like to -- One of the things that we say for the actual meetings is, if you plan to show up for the meeting, you need to be there, you need to be present, and you need to make a contribution. So the mailing list, it has over 60 -- I think over 90 people right now on the mailing list. But the actual face-to-face meetings, one of the things that we certainly make sure is if you show up, you need to carry your weight. You need to do something. You can't just sit there and do nothing else. And we get vigorous discussions. We have folks inside the group who will take completely opposite sides of an issue. 
And the really fun thing about it is that we're attacking issues, and all of us inside the group, actually, there's a great deal of respect that we are building up as a result of bringing keen intellect along with some humor to solving a really important problem that is there. We're not an ICANN group. We're not a U.N. group. We are actually just self-organized. We have come together because we are interested. We worry about things like how are we going to stay funded, how are we going to stay and thrive as we go forward. So far, we have been able to get funding from several organizations, but that's one of the things that we continually worry about. But it doesn't stop us from going ahead with the work. So with that as a general preamble. If you would like to participate or observe what we are doing, please join the mailing list. It's free to all. It's available, and if you go to the Wiki, you will be able to see the links on it. There's a link to the Wiki from the ICANN schedule on the ICANN Web site that will take you there. So join in and participate and share your wisdom and share your knowledge with us, because together, I feel quite convinced that we'll actually be able to take the Arabic script and move it into the domain name system in a way that is compatible with both the Domain Name System itself as well as the layers above that. So with that, I will just take a couple of minutes and ask if you have any questions. Because after this, then I'm going to pass this on to Sarmad and we will get into a little bit more of the detail of what we have done. So before we go there, are there any questions from you in the audience? Okay. I do not see any questions, so Sarmad, it's your turn. >>SARMAD HUSSAIN: So I am going to go through some of the discussions we have had and some of the recommendations we have made to IETF in the context of IDNA protocol, which is currently being revised. 
The IDNA standard is based on the Unicode standard, which would eventually enable Arabic script domain names. I'm going to put up the slide which shows the Unicode standard for Arabic script. It's already been explained by my colleague that the Unicode table is not designed to cater to the Arabic language alone, but to all languages which use the script, and there are plenty of them. In Pakistan alone, we have 69 languages, all of which use Arabic script. So the Arabic script standard is quite vast and is currently being used by many different languages. When the IETF IDNA team set out to define this protocol, they were working at the Arabic script level. They were not working for a particular language per se. So the IDNA standard is only working at the script level. And through that standard, it has to address different problems which arise in different languages which use the Arabic script. One of the first things which the protocol did is actually use the character properties for each of the Arabic letters defined in the standard, and try to come up with an automated way, using a formula, to use the character properties and decide whether the character should be allowed in the domain name or not. The idea of character properties is, for example, that some characters in Arabic script are combining marks or diacritics for Arabic; some characters are core characters. Some are digits. So each of those different characters which are encoded in Unicode also has some properties which are also encoded in Unicode. And one can write a formula using those properties of these characters to automatically decide whether they should be allowed in the domain name or not. So this was the initial list which the IDNA -- the IDN standard came up with. And if you go through this, it actually, in the first column, shows the Unicode ranges. And then the second column tells you whether they are allowed or disallowed.
So the Pvalid characters are the allowed characters. They are protocol valid characters, and you can use them in domain names. Whereas the disallowed characters are those which cannot be used in domain names. This data was derived from the character properties which are encoded in Unicode. Let me show this to you a bit more visually, which is perhaps more understandable. So this is the code table. Let me zoom in a bit more. So there are certain characters like the number sign and the sanah sign, which is used to write dates, or some of these arithmetic signs, which are not going to be used in the domain name system. And the yellow highlighted ones were the ones which were initially disallowed by the IDNA protocol. So all the white ones were allowed initially by the IDNA protocol to be used in domain names and the highlighted ones were not. Clearly, again, the highlighted ones were either formatting markers or related to the numbering system or somehow not related to basic characters. But once we started looking at what was allowed and what was disallowed, it became very clear that just the formula was not enough to clearly distinguish the requirements of different languages. There were some characters which were disallowed but were actually required by some languages. An example, which was incorporated earlier so it's not highlighted here, is these two signs, 06FD and 06FE, which were required by the Sindhi language spoken in Pakistan and India, but were initially disallowed, so that was one problem. And the other problem was there were some characters which were allowed, but most of the community or people who came together on the ASIWG platform thought that they were probably not going to be used in domain names. And so they were redundant and would cause unnecessary confusion in domain names, even to the extent of some security issues, because they can cause confusingly similar domain names. So they should be disallowed.
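The property-based formula described here can be sketched in a few lines of Python. This is a simplified illustration, not the protocol's actual derivation: it assumes that only the Unicode General_Category is consulted, using the set of letter, digit, and mark categories that was later published as the "LetterDigits" rule in RFC 5892. The real derivation has additional rules. The function name `classify_by_property` is hypothetical.

```python
import unicodedata

# General categories treated as letters, digits, and combining marks.
# Assumption: this mirrors only the LetterDigits rule of RFC 5892; the
# protocol's other rules are omitted from this sketch.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}

def classify_by_property(ch):
    """Classify one character using only its General_Category."""
    category = unicodedata.category(ch)
    return "PVALID" if category in LETTER_DIGIT_CATEGORIES else "DISALLOWED"

# BEH, FATHA, a digit, the number sign, the sanah sign, two Sindhi signs.
SAMPLES = ["\u0628", "\u064E", "\u0661", "\u0600", "\u0601", "\u06FD", "\u06FE"]

for ch in SAMPLES:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        classify_by_property(ch),
    )
```

Under this simplified formula the two Sindhi signs, U+06FD and U+06FE, come out as disallowed because their category is "other symbol", which matches the problem described: a rule based purely on properties can exclude characters that a language actually needs.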
So one of the first tasks which we undertook was to go through each of these characters in the Unicode standard within the Arabic script block, so the 0600 block, and then the supplemental Arabic script block, which is 0750 onwards, and look at each character which was allowed or disallowed and verify, character by character, whether any disallowed character should be allowed, or whether any allowed character should be disallowed as far as domain names are concerned. So we are not talking about writing the Arabic or Urdu language or any of these languages. We are only concerned with a subset for representing those languages within the domain name. So going through that exercise, this is some of the presentation forms, very interestingly. Most of them were disallowed, as they should have been, but there was this one character there which was apparently allowed for some reason because of its character properties. So we went through each of the characters and actually analyzed whether they should be part of the domain name or not, and as Ram pointed out, we had some very healthy discussion on many of these characters, sometimes taking many hours. And eventually, this is the summary of recommendations which we have presented to IETF for Arabic script. And these recommendations have now been accepted and made part of the protocol layer for IDNA for Arabic script. So this is done not through the formula, of course. The ones with the formula were already there, so these were additional things which we thought should not be allowed. So most of this list is disallowed characters, and these have been incorporated at the protocol level through an exception list, because they were not earlier identified through the formula which the IDNA protocol had applied. So this is a better view of looking at it. So the yellow ones were already disallowed. The white ones are allowed.
And the green ones are the ones which we, during our discussions, thought were irrelevant for domain names and, therefore, should not be allowed. So green ones are also disallowed and are now incorporated in the protocol layer as an exception list. Going through some of these -- I'm not going to be able to go through all of them in detail, but just some generic concepts which we have used. Some of these Arabic characters -- let me go up a bit -- were introduced for stylistic variation. So, for example, the Tatweel character, which is used to elongate the writing of a letter, so you can actually stretch SEEN or some of the other characters in Arabic script by adding this hyphen-like character right there. And so it was not playing any significant role as far as the characters were concerned. It was just a formatting character. So there were these formatting kinds of characters which had to be taken out because they would cause confusion, as an end user would not be able to tell the difference between a SEEN with the Tatweel or without the Tatweel, and that may therefore cause phishing or security breaches. So there are formatting kinds of characters which had to be taken out. There were also some Quranic characters and marks which were not used by any of the languages, they're just used for publishing the Quran, and, therefore, people who were part of those discussions thought they may not be relevant as far as domain names were concerned, because domain names are representing different languages. So some of those Quranic marks were also taken out.
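The exception list described above can be pictured as a small table of overrides that is consulted before the property-based formula. The sketch below is an illustration under stated assumptions, not the working group's list: it reuses a simplified General_Category rule for the formula, and the three overrides shown are ones that appear in the exception table later published in RFC 5892. The names `EXCEPTIONS`, `classify`, and `label_is_valid` are hypothetical.

```python
import unicodedata

# Simplified stand-in for the property formula (General_Category only).
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}

# Hand-maintained overrides, checked before the formula. Only three
# illustrative entries are listed; the list discussed in the session
# is longer.
EXCEPTIONS = {
    "\u0640": "DISALLOWED",  # ARABIC TATWEEL: stylistic elongation only
    "\u06FD": "PVALID",      # ARABIC SIGN SINDHI AMPERSAND
    "\u06FE": "PVALID",      # ARABIC SIGN SINDHI POSTPOSITION MEN
}

def classify(ch):
    """Apply the exception list first, then fall back to the formula."""
    if ch in EXCEPTIONS:
        return EXCEPTIONS[ch]
    category = unicodedata.category(ch)
    return "PVALID" if category in LETTER_DIGIT_CATEGORIES else "DISALLOWED"

def label_is_valid(label):
    """A label passes only if every character in it is PVALID."""
    return all(classify(ch) == "PVALID" for ch in label)

SEEN = "\u0633"
SEEN_WITH_TATWEEL = "\u0633\u0640"

print(label_is_valid(SEEN))
print(label_is_valid(SEEN_WITH_TATWEEL))
```

Without the override, the Tatweel would pass the simplified formula because its category is "modifier letter"; with it, a stretched SEEN is rejected while the plain SEEN is accepted, so the two visually similar labels cannot both be registered.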
One of the reasons why they were taken out was because if they were left in, they would cause significant amount of confusion and it was a better solution to control the security rather than -- of domain names so that they eventually become (inaudible) versus allowing everything which is possible, and cause almost chaos as far as domain name security is concerned. Okay? So some of these characters, so these are, for example, some of the combining marks which were used in Quran which we thought are not used by any other languages, and therefore, they were disallowed. One of the things which we were very careful about is, if we were even slightly unsure that one of these characters may be used for one of the languages other than the ones which were represented, we kept it in. We didn't disallow it, because we don't want to limit any of the languages being represented in domain names. But beyond languages, Quranic characters, we thought, was reasonably unusable as far as domain names were concerned. Again, these are also some of the other characters. All of them are encoded in Unicode as Quranic characters, not for specific languages or generally Arabic script. So we had actually taken them out as well. So this is -- the highlighted portions, both yellow and green, are the ones which are now currently disallowed by the protocol through the formula or through the exception list. And the white ones are those which are still allowed in IDNA protocol. Even within the IDNA protocol, we are currently undergoing a discussion on -- you can still see some combining marks. There are some combining marks here which are used in South Asia. There are also some combining marks here which are used in different languages. There's also the more popular combining marks, the damma, the fatha, which are used by many different languages. These actually cause confusingly similar variations with the words in which they are not used. 
So you can write "bank" in Urdu or Farsi or Arabic with or without the combining marks, and to an end user, they would seem to be the same word, because these diacritics are optional. So one of the things we are doing is -- so we've allowed them at protocol, so that, eventually, when the IDN -- the domain name system becomes so mature that it can actually display Arabic script properly in its totality -- currently, the domain name is not -- it's displayed properly as far as ASCII is concerned. But when these new languages come in in IDNs, the small little bar at the top of Internet Explorer or Firefox is just not going to be the solution. Something more elaborate at the application layer needs to be determined to enable IDNs as well. But until that date, these characters will remain confusing. And, therefore, we are -- what we are saying is that they're allowed in protocol now, but at a higher layer, the Arabic script layer, we are disallowing them. But at a later stage, then applications can cater to these kind of ambiguities properly, we will allow them again. So that's why we are not disallowing them at protocol level, because if you disallow them at protocol level, it's disallowed forever, it cannot be renegotiated into the domain name system. That's some of the work which is already completed and already part of the IETF IDNA protocol recommendation. We are also working on some issues which are not related directly to the protocol -- perhaps they are. So this is something which is still under discussion. We will be discussing this in the upcoming meeting next weekend here in Cairo. So what I'm going to do is not now -- So this is -- the current work which I am now going to present is in progress. It's not finalized, this particular document. 
But I am going to show you what the issues are so you know what we have to deal with and eventually come up with a technological framework, which is protocol-wise stable as well, and, obviously, end user-wise usable as well. >>RAM MOHAN: Before we go there, Sarmad, I wonder if we might stop for a minute and ask if there are any questions on what we've done so far, on the previous presentation from Sarmad. Two questions there. Could we get a microphone. Thank you, Baher. >> RAMY AHMED: I just want to ask a small question. How do you get -- First of all, how many countries are being served by this set of Unicode characters? Roughly how many countries? >>SARMAD HUSSAIN: Actually, I would say most of the countries in the world, because even if a country is not directly using Arabic script as an official script, even, for example, U.S. or U.K. or some of these other European countries, or there are Arabic script-using populations who are there, so this is relevant for everybody. >> RAMY AHMED: I think that this Arabic script Unicode characters or these Arabic script characters would be of a certain benefit to the Middle East culture. I can't say that it's totally beneficial to all countries -- it's culture-oriented. That's why I'm asking this question. It's culture-oriented. >>RAM MOHAN: That's certainly a valid perspective. We've taken in the Arabic Script Working Group, we've taken a script orientation rather than a language orientation, because in the domain name system, computers, browsers, et cetera, don't really know how to distinguish and differentiate from one language to another. And as you know, they are quite tone deaf. They don't understand culture at all. So we have focused quite a bit more on the script. Because it's quite straightforward to be able to say, "Here are a few characters that, in addition to the characters used in the Arabic language, here are other characters that are required in, say, Urdu or Pashto." 
But if you start to add the cultural context, right, it's used a certain way, what we do inside the group is to say, "That's wonderful. Now let us take that and try to convert that into, how does it affect the script? Is there a character that is missing that should be added?" Because if it is not added, then that language is not represented properly, which obviously means -- culturally, that's a really bad thing. So that's been our approach. You're right, the Arabic script is -- there's got -- there's a large population in the, you know, Arab world, in the Middle East region, there's a large population. But I have to say, a lot of our contributions have come from people who live in Canada, people who live in Ireland. Because the languages that are using the script are global, international, and popular everywhere around the world. And so we have focused much more on what we can do, which is, if your culture requires the script to be used a certain way, let's make sure that that is represented properly, that that is not eliminated from the DNS. Because if we don't exercise care, you know, if Sarmad, for example, wasn't there, two of these characters would have been banned, completely banned, from the protocol level. And once characters are banned at the protocol level, it's very hard for it to come back, because all applications, everything around the world will then implement those rules; right? So that's been our approach. I hope that answers your question. >> RAMY AHMED: Yes, it does. And this goes directly to my second question. What is the criteria you have adopted in your -- in the group for admitting a set or the agreed-upon characters? >>RAM MOHAN: Consensus. >> RAMY AHMED: In what sense? There might be a country that elects two or three characters. It is a crucial point here. >>SARMAD HUSSAIN: Yeah. It is actually quite crucial. And we do realize that no matter how large we are, we cannot be 100% representative. 
So what we are doing is not looking at what should be included, but, actually, very carefully looking at what should be excluded. And as I said, we take one character at a time and see whether it can potentially be a language character or not, the way it's encoded in Unicode. Because Unicode also explains why that character was encoded. And if it is clearly an encoding which was not for linguistic purposes, but for formatting purposes or for some other extra-linguistic purposes, we will take it out and leave everything else in. >> Thank you. >> MOHAMMED ELZAROONY: First of all, thank you very much for such a nice presentation. My name is Mohammed Elzaroony, and I'm from dot AE domain administration. My question to Sarmad: you said that the list of the disallowed characters recommended by the ASIWG group has been added to the exception list in the IDNA instead of the formula itself. And I believe this is because -- and correct me if I'm wrong -- this is because the IDNA started before the ASIWG group, that's why -- I mean. My question is, will it continue to be this way, even if the IDNA has been revised and the new version of IDNA introduced? So will that still be in the exception list or will it be added into the formula? And is there any technical difference between putting it in an exception list or in the formula? >>SARMAD HUSSAIN: As far as the domain names are concerned, there is no difference whether it is through the formula or through the disallowed list. The only thing it says is that it cannot be part of the domain name. We are actually dealing with two different standards, which are owned and developed by two different organizations. IDNA is work which is being undertaken under the umbrella of IETF. And Unicode, obviously, is managed by the Unicode Consortium in conjunction with the ISO committee. So the character properties are actually defined by the ISO WG2 committee and the Unicode technical committee, in conjunction with each other.
And so what IETF receives is a standard which is already formalized. And they have to work with that standard, and based on that standard, come up with a way to define the Internationalized Domain Names. Through our work, the -- and so by using this formula, we obviously found there were some interesting differences with the way these character properties are defined and maybe the way they should be. And they have been -- the feedback has been sent back to the Unicode Consortium as well. But, again, those are two different processes. And if -- the Unicode standard actually would probably not revise the existing standard in a way that it disturbs different standards which are based on it. That's one of the Unicode standard stability statement. So I think, based on what Unicode's already done, what IETF is doing is also going to be reasonably stable over time as well. >>RAM MOHAN: Another question there. >> My name is (saying name) I work for the NTRA, Egypt. My question is about the Latin domain names, like, we are all using on the Internet and like we never think about the Latin domain names. We know different languages use the Latin character set, for example, English, French, Italian. So my question is, do they have different domain names or do they follow one domain name when it comes to the Latin characters? >>SARMAD HUSSAIN: The current domain name system is based on ASCII, not the extended ASCII, which includes some of the letters which are used in European languages. But with the revision, with the Internationalized Domain Names, that will also be possible. >> So will be, like, if we can have French domain names currently? >>RAM MOHAN: Yeah, I think in this -- if you look at this new round that ICANN is opening up, at least from what I have read -- and Baher is here from ICANN, so he might be able to speak to this directly -- but you would be able to submit, for example, a request for, you know, "café" with the "E" with an accent on it. 
In the regular domain name system today, that is not possible, because it's only the standard ASCII. But as we extend the domain name system to be internationalized and to allow Unicode, even in the Latin script, the accents and the umlauts and things like that, all of those, the possibility comes through. But perhaps, you know, since Baher is here directly from ICANN, you're a better source for this. >>BAHER ESMAT: I'm afraid I missed the question. But -- >>RAM MOHAN: Let me just paraphrase. The question is, we're talking here about what we're doing with the Arabic script. The question from the gentleman from NTRA is, what about the Latin script itself, which has characters that are currently not extended -- >>BAHER ESMAT: (inaudible) okay. So the IDNs will potentially include all sets of characters that are not currently included under ASCII. Those characters are the extended Latin ones plus any other set of characters, like Arabic script, Chinese, et cetera. So Ram gave an example of "café." We're talking here about the top level of the domain name system. Because today you can register "café" with an accent on the "E" under dot com, for instance. But you cannot have a top-level domain at the level of com, net, et cetera, that has this accent on the "E" letter. So we're talking here about top-level domains. So within the process of what we call the new gTLDs or the new generic top-level domains, IDNs will be part of this process. Also, the whole process about the IDN ccTLDs or the IDN country code top-level domains, like having dot EG in Arabic, like, for instance dot (inaudible), or dot AE for Emirates, like dot emirate, and so on and so forth. This will be part of the new process that will hopefully take place during next year, 2009. And for this, maybe I should also hint that we're going to have a couple of sessions today on both new gTLDs and IDNs.
They are starting at 1:00 in this room, and we'll have a presentation in English and another one in Arabic on the same topic. So if you are interested, you can stay with us until we have the sessions. Thank you. >>RAM MOHAN: Thank you, Baher. And another question there. Did you want to add? >>SARMAD HUSSAIN: Yeah, I just want to add another comment to it. And that is that when you extend the Latin script, it will also come up with similar issues. There will be some characters which will be confusingly similar or same. And the Latin script community will also have to go through a similar exercise to see what is actually -- what should be allowed within the domain name system so that there are no confusions. >>RAM MOHAN: Just as -- again, to follow up on what Sarmad is saying, one of the interesting things for me, watching this group evolve, is that, originally, I remember Dr. Al-Zoman from dot SA coming into the meetings and saying, "We've solved the problem, we have the Arabic language table. We've solved the problem." And as we started to discuss a little bit more, he was the same person who came in and said, "It's a bit more complex than this, because there are" -- in fact, the examples that George was showing you, those are actually the famous examples from Dr. Al-Zoman's slides. As we've gotten a better understanding, we realize that the script table is really critical, and the languages become subsets of the script table rather than the other way around. The gentleman there and then a gentleman at the back. >> WAHID ABDALLA: Good morning. My name is Wahid Abdalla. Thank you for your presentation. It was very good and informative. So let me talk with my Arabic language, if it is possible. (Speaking in Arabic.) >>GEORGE VICTOR SALAMA: Okay. I will translate here. He was just asking about a proposal to omit the numbers from the table, because -- from a culture point of view, they are not representing the Arabic characters, if you went back to history. 
So maybe Latin is not Arabic. So we can just ignore them at all and avoid the confusion. >> (Speaking in Arabic.) >>BAHER ESMAT: Just make clarification for the panel. So the gentleman is talking about the set of numbers and the disagreement among the Arab community on whether these numbers are Indians or Arabic and the other set is Latin or -- yeah. So what he's saying is that there are currently six countries in the Arab region, the North African countries, yeah, the Moroccan countries like Libya, Tunis, Morocco, Algeria, and Mauritania and the western Sahara. And they agreed a long time ago that they would only use what people call the Latin characters, and to omit the -- what people call the Arabic or -- yeah. So -- and his proposal is to -- for -- yeah, his proposal for the rest of the Arab countries, to take the same direction on the thing. >>RAM MOHAN: Okay. Got it. Let me quickly respond. First of all, thank you for the suggestion. Second of all, please join the mailing list. And please make the proposal. Because it'll then certainly get discussed in the face-to-face meeting. We'll get a chance to work on it. So far, what we've -- as Sarmad was saying, we've been focusing on what to eliminate at the protocol layer. And we have not reached consensus that, you know, numbers should be eliminated. But please join the mailing list. We need such contributions. Sarmad, you wanted to respond as well. >>SARMAD HUSSAIN: So as far as the Arabic script numbers as encoded in the Unicode are concerned, so they -- the origin may be Indic, as you are suggesting, they're used by people in communities that use Arabic script in conjunction with the Arabic script. And that's why they are being allowed within the domain name system. It's an interesting, good argument, even -- I come from Pakistan. 
Even in Pakistan, there is ongoing debate whether to use the characters which are currently used in Latin script or the ones which are currently used in Arabic script, and whether to limit to just the characters which are used in Latin script. However, please appreciate that when you're using Internationalized Domain Names, the ASCII is still allowed, along with the Arabic. So that's one of the basic things within the Internationalized Domain Names. So when you're using Arabic script, you can still use the Latin characters. It does not preclude the concurrent usage of the two scripts. So that choice will be there for the Arabic script users, at least the way protocol works right now. And they will also have the choice to use these letters which are within the Arabic script block. And so everybody's happy. If you want to limit the use of these particular characters within the Arabic script block, you can always do that at a registry level or at a national level at a ccTLD level. So those things are above the protocol, and everybody still has the liberty to control the way they want to control the language in a particular region. However, we also want to give the liberty to those people who made the choice of using the Arabic script characters. So that's why they are also included. >> Question. >> TOR KINLOK: I'm curious about Arabic-scripted gTLDs and any Unicode gTLD in general. Are these going to be implemented in Punycode or some other method? >>RAM MOHAN: Could you please introduce yourself as well. >>TOR KINLOK: My name is Tor Kin, from dot nu domain. >>RAM MOHAN: That's actually a question for Baher. >>BAHER ESMAT: Yeah, so, basically, as far as I know, the implementation of IDN gTLDs or IDN ccTLDs will have to follow the IDNA standard, which is following the Punycode algorithm. Does this answer your question? Okay. >>RAM MOHAN: Any other questions or comments? >>ERIC BRUNNER-WILLIAMS: Thank you, Eric Brunner-Williams, from CORE. 
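(Illustration, not part of the spoken session: a minimal Python sketch of what "following the Punycode algorithm" means for a label such as the "café" example mentioned earlier. It assumes Python's standard-library "idna" codec, which implements the original IDNA 2003 rules rather than the revised protocol under discussion in the working group, so it is only a rough picture of the encoding step.)

```python
# Sketch only: Python's built-in "idna" codec follows IDNA 2003 (RFC 3490),
# which is an assumption here; the revised protocol may treat some code
# points differently.

def to_ascii_label(label: str) -> bytes:
    """Convert a Unicode label to its ASCII-compatible (xn--) form."""
    return label.encode("idna")

def to_unicode_label(ace: bytes) -> str:
    """Convert an ASCII-compatible label back to Unicode."""
    return ace.decode("idna")

latin = "caf\u00e9"               # "café" with an accent on the E
arabic = "\u0645\u0635\u0631"     # an Arabic-script label (MEEM, SAD, REH)

print(to_ascii_label(latin))      # b'xn--caf-dma'
print(to_unicode_label(to_ascii_label(arabic)) == arabic)  # True
```

The DNS itself only ever sees the ASCII form; the Unicode form is what the application shows to the user.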
The previous -- the gentleman directly in front of me mentioned the use of Latin numerals. And I wonder, Ram, if you have discussed the bidirectionality problem that we have in Unicode, the bidi algorithm problem of directionality. And the use of Latin numbers would trigger that feature or bug -- your choice -- in the Unicode bidi handling algorithm. >>SARMAD HUSSAIN: We are, I think, currently going to start discussing some of the bidirectional issues. We've not directly addressed them in our meetings so far. But that's something which is definitely on the agenda, yeah. So at this time, we are just working at character level. But we are now going into -- So what we were doing so far was just at protocol layer. Only now we are going into application layer, which goes into presentation kind of issues. And that's where the bidirection comes in as well. >>ERIC BRUNNER-WILLIAMS: My point was the gentleman's proposal to use the Latin rather than the Arabic script number, you know, 06F, 039 would actually trigger that, and he should be aware. >>RAM MOHAN: Question there. >> ASHAR NISAR: My name is Ashar Nisar. I represent dot PK domain. My question is about the fast-track IDN. Are you aware of any Arabic language fast-track IDN proposals? Or is your work in any way gating that? Or is it a parallel activity? >>RAM MOHAN: We've discussed this in just a very high level inside of the working group. And we don't think that we are necessarily a gating factor. We've come together because if you want to implement -- we believe that if you want to implement the -- especially if you're trying to implement across multiple regions that speak different languages but produce the same script, you need to have a good variant table. You need to understand what is confusingly similar. And that didn't really exist before. 
We had, you know, the UAE or Saudi Arabia or other folks coming together and saying, here is an Arabic language and here is a language table for that, and let's make decisions there. This expands it a little more into Arabic language versus Urdu, versus Pashto versus Jambi, et cetera. So we hope to not be a gating factor. We hope that the work that we're doing will be useful for whoever plans to apply for, you know, a new top-level domain, if they want to, using the Arabic script. But that's not our interest. We are not together for a commercial reason. We're only here because this is essential work, and the only way to do it is to get technologists, linguists, and technocrats, you know, folks to come together to solve the fundamental problems and to leave the policy and the politics and the money piece of it to those who want to work on that. That's been our focus. >> If you disallow one of the characters from the protocol and some IDN, some registry has already registered some domains in that character, what are the ramifications? >>RAM MOHAN: That's a great question. The new protocol is not intended to be backwards-compatible. And as a result, if you disallow something at the protocol level in this new protocol, especially once it's ratified and once application is implemented, you're done. If it's disallowed, it's gone, it won't work. It will break. That's why we have to be extremely careful in making decisions or making recommendations about what to leave out. And we've generally taken a very conservative approach, which is, everything is allowed except those that must not be allowed. We've actually had a very strong debate inside of the working group for some characters that are -- clearly, obviously pose a strong phishing threat. If they are left in, they will cause a big threat for phishing. And our approach has been to not ban them at the protocol level. We've said, "Look, the language needs it. The languages use it. 
But what should be done is, at the registry level, that" -- you know, and we're trying to come up with some sort of a unified variant table so that you can have registry rules that say, "If you select this particular word, then here are all the other characters that are tied to it as a variant bundle." Right? So at the protocol level, it's still allowed. But at the layers above it, the rules are made to ensure that the confusion does not happen. So our approach has been to be as minimalistic as possible at the banning level. I think we've -- so far, the ones that we've proposed, we've proposed it. We've put it out, you know, in a public list, waited for comments. We've had, so far, zero controversy about the characters that we have suggested be removed at the protocol level. Did you want to add, Sarmad? Okay. Thanks. Any other questions before we go to -- I'm just mindful of time. We have another 20, 25 minutes. I would like to now move on to the last part of this morning's session, which is to give you a bit of a preview of what we plan to discuss at our meeting this weekend in the Arabic Script Working Group. Some of the burning issues that are in front of us and that we're looking to come and discuss and hopefully arrive at some consensus-based outcomes. So to you, Sarmad. >>SARMAD HUSSAIN: Thanks. Okay. So this second document is not about -- directly about IETF IDNA protocol, but may be addressing some of the layers above that protocol, the registry layer or the application layer, but may, obviously, have some bearing on the protocol layer as well. So the idea of presenting this document here, it's still under discussion, so it's not a final document. It's still under debate. But the point of bringing it here was to show you some of the issues which we are facing as far as Arabic script is concerned. And again, if you are interested, please join us in the weekend meeting and bring your opinion on how some of these issues may be resolved. 
So I am just going to share with you the problems, not the solutions. So there are two kinds of problems which exist, which cause confusion as far as the Arabic script is concerned. One is derived through the different shaping which Arabic characters take in different context. And the second comes through the normalization process where Arabic characters combine with some other Arabic characters to give you more different Arabic characters. Yeah. >> Can you zoom in a bit, please. >>RAM MOHAN: Is that better? >>SARMAD HUSSAIN: So I'm going to talk about these two problems separately. So first I am going to talk about shaping kind of confusions, and then eventually go into the combining character and normalization kind of confusions which are existing. In the first context, we have three categories of characters as per this particular document. There is one category of characters which have -- which one -- you have multiple characters which have exactly the same shape, but not across all possible shapes. Each character of Arabic can take up to four shapes. You all understand it. There's an isolated shape, there is an initial shape, medial shape, and the final shape. It is not necessary that if two characters are different in one shape, they will be different in all the shapes. So there are -- I will show you examples. There are actually cases where two -- in the isolated form, the two characters may look very different. But in one of the combining shapes, they are exactly the same. And that causes a phishing problem. So that's the first category we will look at. The second category is even though the characters are not same in any particular shape, they are similar culturally or otherwise to some other characters as far as the base form is concerned. And then the third category where they are confusingly similar only in the marks, the diacritical marks. The base form is not the issue. So let me go into those details. This, the KAF characters are the first example. 
So we have the Arabic KAF, 643, and Urdu or Persian KAF, which is the 6A9. They are very different looking, as you can see. They are not confusing, but when you combine them, the medial shape and the initial shape is exactly alike. And therefore, they become confusingly similar based on initial and medial shapes. So even -- So if you have them in final and isolated forms, they will be distinguishable, but not in other shapes. Similarly, the HEH character, there is a long list here, there are many characters which are confusingly similar with some other characters in particular shapes. So in the isolated form, the first and the last two are exactly the same. So the first two are also same in some of the other shapes as well. The HEH shaping is quite confusing, even by -- in Unicode standard. And obviously, that is all carried over into the domain name IDNA standard as well, because obviously that shaping is coming through. We have to eventually deal with this. Some languages are using some particular characters. So, for example, 6C1 is used by Urdu and 647 I believe is used for Arabic. But if somebody uses 647 for Urdu, it can actually cause phishing issues. The YEH character also has similar problems. So there are three different YEH characters, one with a dot in isolated form, two without dots in isolated forms, but one of these characters which does not have dots in isolated form actually gets dots in initial and medial positions and becomes exactly confusable with the middle and initial position with the YEH, which actually has dots in the isolated form as well. So it's not just about characters which are confusable in isolated form. We actually have to look at all the shaping variations and see if they are confusable even in the other three forms as well. So these are two characters which one would normally not confuse. The YEH with two vertical dots and the HEH with two vertical dots. But in the medial form they become confusing. 
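(Illustration, not part of the spoken session: a minimal Python sketch of how a registry-level variant table could tie together labels that differ only in the shape-confusable characters just described. The table below is hypothetical and contains only the two pairs whose code points are named in the talk, 643/6A9 and 647/6C1. It also ignores position, although the speaker notes that these pairs coincide only in some of the four shapes, so it is deliberately conservative.)

```python
import itertools
import unicodedata

# Hypothetical variant groups built only from code points named in the talk.
VARIANT_GROUPS = [
    {"\u0643", "\u06A9"},  # ARABIC LETTER KAF / ARABIC LETTER KEHEH
    {"\u0647", "\u06C1"},  # ARABIC LETTER HEH / ARABIC LETTER HEH GOAL
]

# Each code point maps to one representative of its group (the lowest one).
REPRESENTATIVE = {ch: min(group) for group in VARIANT_GROUPS for ch in group}

def skeleton(label: str) -> str:
    """Replace every character by its group representative."""
    return "".join(REPRESENTATIVE.get(ch, ch) for ch in label)

def are_variants(a: str, b: str) -> bool:
    """Two labels belong to the same bundle if their skeletons match."""
    return skeleton(a) == skeleton(b)

def variant_bundle(label: str) -> set:
    """Enumerate every label obtainable by swapping variant characters."""
    choices = []
    for ch in label:
        group = next((g for g in VARIANT_GROUPS if ch in g), {ch})
        choices.append(sorted(group))
    return {"".join(combo) for combo in itertools.product(*choices)}

arabic_spelling = "\u0643\u062A\u0627\u0628"  # KAF TEH ALEF BEH
urdu_spelling = "\u06A9\u062A\u0627\u0628"    # KEHEH TEH ALEF BEH

for ch in (arabic_spelling[0], urdu_spelling[0]):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
print(are_variants(arabic_spelling, urdu_spelling))  # True
print(len(variant_bundle(arabic_spelling)))          # 2
```

What a registry does with such a bundle (block the variants, or delegate them all to the same registrant) is a policy choice outside this sketch.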
So the YEH and the BEH are actually confusing. They actually share the same initial and medial forms. Similarly, these two characters, I don't know -- this is BEH, but I don't know the name of this character. Unicode says it's a YEH with three dots below it. But again they become very confusingly similar in medial and initial positions, although they are actually not confusing in isolated or final forms. There are also FEH-like characters. There are two of them which are encoded in Unicode. One is like the KAF, which has a circular shape, and then one is like a BEH, which has a flat shape. But they become exactly the same characters in initial and medial forms -- shapes. And then there is a few more characters. So this is Arabic letter TEH Marbuta, which have the same initial form but different -- but these final forms also are actually variants of each other, actually. At least in Urdu, we can use both and they would mean the same thing. And then going further down, we have similar character combinations for HEH with Hamza and then this TTEH character, which is not in Arabic, but in many other languages it, has again, the noon form and the BEH form, if you allow me to use that term. So the noon forms and the BEH forms are obviously not similar as far as isolated and final forms are concerned, but they become exactly the same in the initial and medial forms. And similarly, same confusion for these characters which have three dots above. They have also the noon form and the BEH form, and the medial and final initial forms are exactly the same. We have already talked about digits. So again, those were the shaping problems. Digits also come into the shaping issues because Unicode has, for its own historic reasons, encoded Arabic digits separately from digits which are used in other languages different from Arabic, but they obviously share exactly the same shape for most of these digits, except the 4, 5 and 6. So these are some digits which have exactly the same shape. 
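(Illustration, not part of the spoken session: a minimal Python sketch showing that the two sets of Arabic-script digits are separate code points with the same numeric value, and that they carry different bidirectional classes, which is the directionality concern raised from the floor. It assumes the two sets meant are the Arabic-Indic digits at U+0660 to U+0669 and the Extended Arabic-Indic digits at U+06F0 to U+06F9; the helper name is hypothetical.)

```python
import unicodedata

# The digit four in ASCII and in the two Arabic-script digit sets.
FOURS = ["4", "\u0664", "\u06F4"]

for ch in FOURS:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.decimal(ch),        # numeric value
        unicodedata.bidirectional(ch),  # bidi class: EN or AN
    )

def to_ascii_digits(label: str) -> str:
    """Hypothetical helper: fold any decimal digit to its ASCII form."""
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch
        for ch in label
    )

print(to_ascii_digits("\u0664\u06F4"))  # 44
```

Folding digits like this is one possible registry-level or application-level rule, not something the protocol is described as doing.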
This is slightly different, but again, confusingly similar, so we have included 5 in this list as well. So they have exactly the same shape. Not distinguishable. And then there's this other issue which is the application layer issue where underlying it is exactly the same code, but users actually see it differently because the way the presentation layer changes the shape of the same encoding, same letter in different languages. All right. So now moving on to some of the examples of characters which are not exactly the same in a particular shape but are similar, at least in some languages, in shape with each other. The first example comes from the KAF, which -- So there are these -- we have already talked about these two KAFs. The 6A9 and 643. There is another KAF which is 6AA which is used in Sindhi which is a little elongated. It is not the same KAF as the other two. So the shape is a little different. And interestingly, in Sindhi, 6AA and 6A9 are two distinct characters, and Sindhi speakers can, without any confusion, differentiate between them. But for non-Sindhi speakers, like Urdu speakers or Arabic speakers or Farsi speakers, these shapes start looking like stylistic variations of each other, and therefore become confusable. And so this character can potentially be used in non-Sindhi community as a phishing mechanism. Also, the YEH character, we have already seen the first two. We added the other two to show this. So this is a Pashto YEH which has the small little sort of ending lip. And then there is this -- another YEH which has two vertical dots rather than two horizontal dots. And in some languages, these are distinct characters, but in some other languages, Arabic or Urdu, these may actually be confusable with the YEHs which these languages have and potentially cause a phishing threat as far as end users are concerned. And then there's this interesting problem -- sorry. 
In Arabic, where these two characters, I am told -- I don't speak Arabic myself -- which are totally distinct in shape are actually considered variants of each other as well. So they also may cause confusion as far as end users are concerned. Okay. So those were characters which were confusingly similar in the base character. There are also characters which are confusingly similar, not exactly the same, just in the marks or their diacritics. Not the shapes, the base shapes. Here are some examples of those as well. So this is ALEF Hamza, and then ALEF Hamza, it's a wavy Hamza which is encoded in Unicode for some of the other languages, and they become confusingly similar. They don't have the exact shape, but they become confusingly similar. Similarly, Hamza below and the wavy Hamza below can become confusingly similar. The TEH with horizontal dots and TEH with vertical dots are confusingly similar for some languages. So if you would write this for Arabic, some end user may consider it as a stylistic variation rather than think of it as a different character which was actually encoded in a different language and therefore should not be used in this particular context. Similarly, the three dots and the inverted three dots above. These two characters can also be taken as a stylistic variation. Again, so these are images which actually encodes this character at this time, but these are also variations of each other. Not in the basic shape, but in the diacritics which associate with these shapes. And then the two YEHs with horizontal and vertical dots, and then this PEH character comes in both varieties with the dots right side up and dots upside-down. Similarly, the two -- whatever these characters are. I don't know the name. Again, you can see the variation in dots. And these variations in different characters -- oops. I think this is a mistake here. These would be similar, except the orientation. 
So all these characters are different characters in different languages, but may become confusable in other context. And all these characters are currently encoded within the Unicode standard. So we have to deal with them. How we deal with them is again a separate story. As I said, I'm just bringing the problem out. We are still discussing the solution. We don't have the solution yet. But if you want to become part of the solution, join us on this weekend. The second part -- So there was this shaping issue, shaping at two levels, one where things are exactly the same and the other where they are confusingly similar. And then there are also normalization issues. Unicode already does some normalization to cater to some of the variations which are caused. And, however, Unicode doesn't do all the normalization which should be there. So somebody has to add up on the normalization which is already prescribed and make a complete list, and that's what we are in the process of doing. So let me show you some of the normalizations which are possible, and whether they are done by Unicode or not. So the MAD sign -- MADA sign, can combine with a variety of characters. So all these combining marks will combine with a variety of characters to give you new characters. And sometimes those new characters are also encoded separately, the combined characters. So that's what causes the confusion. Whether you want to take it piece-wise or you want to take it as a whole. Obviously domain -- the protocol, IDNA protocol, would choose between the normalized or the -- the composed or decomposed versions. Assuming that we are doing the decomposed -- the composed version, all the decomposed forms will at the application level be first composed and then sent to the protocol. However, if the combination is not already part of Unicode normalization, application wouldn't know how to compose those two characters and map it onto the composed form. 
And therefore, that part has to be added within the standard. So this column actually says whether Unicode normalization is already defined or not. The D and KD actually indicates that Unicode already normalizes this. So we don't have to worry about it if the application follows the normalization Unicode prescribes. But the not-defined things are those which Unicode currently does not normalize, but they still give you the normalized form, of which a composed form is also encoded. And so that's something which has to be added on to the Unicode standard in the Domain Name System. So I am going to go through this. So again, this is one of the things which is not defined, these two -- this combination. Can also look confusingly similar with that combination -- with this letter. So we have the Hamza ray which is not currently defined in the Unicode normalization but there is a character which exists which is confusable or almost exactly the same as the composed form of these two characters within the Unicode. Again, so there are some other characters which are not defined. So we actually have to eventually find a mechanism to include this normalization process on top of the normalization process which is defined by Unicode to make sure that the users get something which is secure and not confusing. So these are, again, some of the characters. This is a combining mark which is used in some Arabic script-based languages and it combines with a host of characters and gives a host of combined versions which are not currently normalized by Unicode. So I'm just going to scroll to this list slowly, and you can.... Okay. So then there is this dot which is used by some languages in Africa. It's not a regular dot. It's a bigger dot. But in the small domain name text box, it will probably not look very different from a regular dot. So therefore it can actually combine with a host of characters to give something which is confusable with the basic characters. 
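(Illustration, not part of the spoken session: a minimal Python sketch of the two cases described, a combination that Unicode normalization composes and one that it leaves alone even though a precomposed letter exists. The choice of ALEF plus MADDAH ABOVE for the defined case and REH plus HAMZA ABOVE for the not-defined case is an assumption about what the slides showed, and the extra-composition table is hypothetical.)

```python
import unicodedata

ALEF, MADDAH_ABOVE, ALEF_WITH_MADDA = "\u0627", "\u0653", "\u0622"
REH, HAMZA_ABOVE, REH_WITH_HAMZA = "\u0631", "\u0654", "\u076C"

def composes_to(base: str, mark: str, precomposed: str) -> bool:
    """True if Unicode NFC turns base + mark into the precomposed letter."""
    return unicodedata.normalize("NFC", base + mark) == precomposed

print(composes_to(ALEF, MADDAH_ABOVE, ALEF_WITH_MADDA))  # True
print(composes_to(REH, HAMZA_ABOVE, REH_WITH_HAMZA))     # False

# Hypothetical script-level table layered on top of Unicode normalization.
EXTRA_COMPOSITIONS = {REH + HAMZA_ABOVE: REH_WITH_HAMZA}

def normalize_label(label: str) -> str:
    """Apply NFC first, then the extra compositions NFC does not define."""
    result = unicodedata.normalize("NFC", label)
    for sequence, precomposed in EXTRA_COMPOSITIONS.items():
        result = result.replace(sequence, precomposed)
    return result

print(normalize_label(REH + HAMZA_ABOVE) == REH_WITH_HAMZA)  # True
```

Whether such additions belong in the protocol, in a unified script table, or in each registry is exactly the open question the speaker describes.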
So again, I am going to go through this list. And let me just quickly wrap this up and take some questions if you have any. So what you can see is that there is a host of these problems which exist in Unicode from a domain name context. And we have to solve them in a way -- There are two tradeoffs. One tradeoff is that you want to give as much of domain space, domain name space to the end user as possible. But the other side of it is that if you give too much of it, you open it for phishing and security kind of issues. So you have to close this up a little bit. Where that compromise is going to be made, where that balance comes in is still something which we are discussing at the Arabic script group, which we are all a part of. However -- so, sorry, we do get into some very exciting discussions on this tradeoff. But so far, we have actually not been able to find a -- We are still in the process of finding a solution. The solution we are currently thinking is going to be multi-layered, some things which are going to be limited at protocol levels, some more things which are going to be limited at Arabic script level on top of protocols, then a few other things limited at the registry level, and then a few more things which can be done at the application level. So it's not going to be one solution which fits all. It's not possible. So each country, each language will have to make their own unique solution, and that solution will differ at either application level or registry level. Thank you. >>RAM MOHAN: Thank you, Sarmad. Questions for Sarmad, and then let's, George, you wanted to show what you had on the screen. But any questions for Sarmad? One question there. Microphone. There you go. >>RAMY AHMED: I'm Ramy Ahmed, from NTRA Egypt. Just something came into my mind. Anybody offered who presented any study about the statistical distribution of the characters in the relevant languages? 
Like in English, we know that the most -- the most frequently used character in the English language is E, is the letter E, followed by some statistical distribution. So perhaps we can do something similar to all the characters. And based on the most frequently used character in its relevant context or in its relevant language, it can be adopted as a standard. Anybody sort of -- >>RAM MOHAN: We haven't discussed that in our group, but let me give you my first thought on that. In the English language, for example, the letter Q is not very heavily used. But I think it would be quite a pity if it got banned. So I think the statistical analysis might be useful, but I think what we would probably end up doing is -- >>RAMY AHMED: Taking into consideration that as far as I can understand and see from the gentleman's presentation, most of the similarly confusing characters more or less have the same pronunciation or the same -- >>RAM MOHAN: Phonetics. >>RAMY AHMED: Phonetics, yes. So if we have the same phonetics, and if it's such a big problem that everyone wants to stick with his own cultural script character, so perhaps it's a thing to think about. >>RAM MOHAN: It is a good idea. >>RAMY AHMED: Yes. >>RAM MOHAN: I think what we find inside of the working group is that -- >>RAMY AHMED: After all that matters is that we get to a certain domain. We get to a certain -- actually, it would be finally resolved to an IP address. So -- >>RAM MOHAN: So okay. So let me say this. And this is my personal opinion, not that of the working group at all, but it is just my personal opinion. I would have a hard time subscribing to an approach that attempts to stitch together the lowest common denominator, you know, across different languages. Because then inevitably, you are going to find characters being orphaned that really ought to belong but are being orphaned. So our approach is to actually take the -- add as many as possible. And yes, it does cause more work. 
It does require, at the registry level, and even -- perhaps not so much at the application level, but at the registry level and at the unified Arabic script level, it does require quite a bit more work. The variant tables, for example, when we are done with this exercise, are going to be a little bit more than just simple. However, I think the problem is important enough that attempting for, you know, kind of base simplicity is the wrong answer. This is my personal opinion. I think the complexity here is because we are talking about a script that has evolved over time, and that is essential in different ways to different regions. And what we are trying to do at the computer level is to unify the use, you know, of the script, and not worry so much about whether a particular character is used frequently or less frequently. So that's just my personal opinion. I would have a very hard time looking at a character that is used solely in Jawi and not in any other of the Arabic script-using languages, and say let's leave that out. Because even though its frequency may not be very high, it is very relevant for one community, and that's good enough for me to be included. >>RAMY AHMED: It looks similar. I am not trying to debate here. Just a small comment on your comment. It actually looks similar to my character. So there wouldn't be any problem from my point of view. >>SARMAD HUSSAIN: May I also respond to that? >>RAM MOHAN: Sarmad. >>SARMAD HUSSAIN: Can I also make a short comment on it? It looks similar to us, but those two characters, as the example I gave, the KAF and the KHEH, in Sindhi, they are two very distinct characters as far as the Sindhis are concerned. And then vertical dots versus horizontal dots, in many languages they would have both variations and have them as two different characters. And then also, the phonetics of it are not always the same. So the two similar-looking KAFs are actually not pronounced the same way. 
That's why they have those two different KAFs, because each one is representing a different phoneme in that particular language. So it is not -- Frequency is something which we will probably definitely use, eventually, to make some decisions, I'm not sure how or when, but it is definitely something which is usable to make some decisions. But frequency is also tricky because many of these languages are not represented on the Web. So it's not very easy to come -- get a corpus on which these frequency calculations can actually be done. So it's -- >> RAMY AHMED: It's the responsibility of those who (inaudible) their own language. >>SARMAD HUSSAIN: That -- >> RAMY AHMED: To offer some (inaudible). >> I would like to add something to that subject. Because it is very similar -- >>RAM MOHAN: Could you please introduce yourself. >>WERNER STAUB: My name is Werner Staub from CORE. The problem is very similar to the problem experienced by many European countries with accented characters and the simplification that comes in when you only have nonaccented characters. Of course, in many cases, it means something completely different; it cannot be the same, but really changes the meaning to a degree that is not tolerable. And the -- curiously enough, the combination of both approaches turned out to be the right one. So we tried to combine the simplification with the recognition of the diversity. And if the TLDs that introduced accented European characters did not recognize both, so to speak, just looked at one extreme or the other, they did not get to a good solution. So if you look at Germany, they said, an IDN variant looks like, for instance, Müller, with an umlaut, is a different domain from Mueller, which caused the community to lose confidence. 
The confusion is something people got used to. And whenever they saw "Mueller," they said, "Oh, probably means Mueller." But we know this wrong spelling is sometimes necessary. And in the more recent TLDs that were introduced, for instance, in dot cat, we just basically created packages, and the user would automatically -- the domain registrant would automatically get access free of charge of the whole series of variants that they would have. So the only question for them is to configure them, their machines, properly. And, of course, they were automatically protected against the common misspelling, you know, the common inevitable misspelling to be taken by somebody else, which is sometimes, you know, a big problem. It causes a loss of confidence. By creating this package, we create both use, and you can actually misspell it and still get the correct domain when it's resolved. >>RAM MOHAN: Thank you, Werner. And, you know, very relevant. In the working group, one of the things that we have talked about is to create bundles, which is similar to what -- the packages that Werner was talking about, and to approach that. I hadn't rehearsed this, but since Mr. Elzaroony is here, but in the dot AE, where you're in the middle of implementing the Arabic in it, what has your approach been when it comes to these kind of confusingly similar characters? Are they bundled or how do they work? >> MOHAMMED ELZAROONY: Well, actually, we started some thinking about making our registry system flexible enough to either making this bundled or totally separate from each other. Still we haven't reached any decision about how to make it, but hopefully some registries should be allowed to take care of either decision we made, just to make sure that, you know, we don't lose anything from that perspective. >>RAM MOHAN: Thank you. There's a question in the back. Baher, thank you. >> (saying name) from PK NIC. It looks like the homoglyphs are the confusingly similar characters. 
Most are concerned about the phishing. So who made the decision that phishing has to be addressed at the protocol level or (inaudible) level when there are other solutions for phishing? >>RAM MOHAN: It's a good question. At the protocol level, you know, there's an IETF working group. And whatever we submit as the recommendation into the working group, the working group, you know, comes to conclusions, posts the conclusions for public comment, and that's the process that happens there. That's an IETF process, a well-defined, documented process. So the protocol level, that's how it works. What we have been doing, if you look at, you know, what George is showing about the four layers, at the unified Arabic script layer, if you will, that's something that we are hoping that within the working group, we'll be able to come to some ideas about outcomes, propose those outcomes in the community, and, you know, make it open for feedback. What we are doing does not have the weight, if you will, of a rule or regulation or even an RFC. We intend -- as we do our work, we intend to submit them into the IETF process as RFCs, 'cause then they kind of become Internet standards. But our primary focus at the script level has been to identify the problems first. And we spent most of this year just identifying what are all the problems that exist. Sarmad's document is a good example of that. We are not even close on identifying solutions to these problems. The more we start discussing problems, the more we realize that we better be doing a good job of being as comprehensive as possible. That's been our effort there. At the registry level, our approach has been that it's up to the registries, it's not up to us, it's not up to the protocol people. The registries are, in some cases, regulated by or managed by a government entity. In other cases, they are private entities. In other cases, they are a mixture of them. 
And as far as we are concerned, we're a group of self-interested, you know, technologists, linguists, policymakers, et cetera, but we don't set policy. We come together and we say, "Here are what we think are appropriate guidelines." And we put that together because we have talked about it, even argued about it, right. But we don't enforce that at the registry level. Whatever the registry decides is the right thing to do is what it should be doing, right? And one level above that is what applications do; right. And we anticipate that even after, in a registry, let us say the PK registry, decides to allow certain characters to be represented in a domain name, we certainly anticipate that, at an application level, you know, a Google or a Microsoft, or some application vendor may decide, "This is a phishing problem," or, "This causes some other confusion," and not support it, right. And we don't have much control over that. But what we are trying to really arrive at is, we kind of look at it as -- if you look at it as a pyramid, we're trying to create as broad a base as possible of all the characters that ought to be included. Then we go one level up and we say, "In the characters that should be included, what are confusingly similar? Let's see if we can create bundles around them." What you do with the bundles is up to the registries. And what the applications do with the domain names that the registries allow is up to the applications. So that's been our approach. Other questions? >>SARMAD HUSSAIN: Can I -- can I also comment a bit more on the previous question before we move on to the next? It's not just phishing. I think what's also probably much more fundamental is giving the users the same experience of domain names. >>RAM MOHAN: Of course. >>SARMAD HUSSAIN: -- across -- A user may not be in one place all the time. 
The user is traveling across the world, using Internet in different contexts, sometimes with Cyrillic keyboard, sometimes with Latin keyboard, sometimes with Arabic keyboard. But the essential domain name user experience should not change based on geographical location. And that's also very important. Phishing obviously is important. But the user experience, because, eventually, if the user starts getting different versions of domain names depending on where the person is, it's going to become so confusing that Arabic domain names are not going to fly at the end of the day. So I think the main motivation is to enable users to use Arabic domains and to address whatever issues come up in the process, experience-related or security-related. >>RAM MOHAN: Yeah. And what we've done -- just a second. What we've done inside our working group, we end up calling that the "side of the bus" problem. If you're from one part of the world and you go to another part of the world and you see a domain name or an e-mail address on the side of a bus, you don't actually know what language it is, so KITAB in Arabic and KITAB in Urdu, they kind of mean the same thing. But to a user, we have to make sure that when they type it in, not knowing the language they're in, that they get to the same spot. >> MOHAMMED ELZAROONY: As a registry operator, I'd like just to comment on the approach you are using in the ISOC meeting. What's important to me as a registry operator, to make sure that at the protocol level, we're trying to tackle all the security issues rather than keeping it to the registry and that one. And I think this is a very healthy approach as we are tackling all of the security-related or most of the security-related issues on the base of the pyramid, then that would make me, as a registry operator, or an implementer, much easier for me to implement the IDN.
And I shouldn't take care about all of these on the registry level rather than taking care about the -- let's say, the business logic of it rather than keeping aware of all the security issues. So I think that the approach you are using, as we are concentrating on the base, which is at the protocol level, it's a very healthy approach. >>RAM MOHAN: Thank you. Back to you, George. >>GEORGE VICTOR SALAMA: Okay. I have two final comments. The first one is about a tool that really facilitates our work in the Arabic script. This tool is actually developed by our colleagues from SE. And the link is available on the group Wiki page here, an Arabic script comparison tool. Mainly, this tool helps us to visualize and configure and dig more into the problem of the confusingly similar characters. So we can select, for example, the Arabic character "seen." See the presentation of this character, if this stands alone or, in the end, in the middle, in the beginning. Also, the Unicode presentation of this character, as well as the font. We can change the font here, two fonts, for example. And you can see in different fonts. So we can compare this character with another character in another language, and we can imagine and visualize more the problem of confusingly similar character. So this tool is already accomplished and available online. And we are looking for more tools which can facilitate and enhance our results in the working group. This is one point. Another point, personally, I think that in order for the Arabic domain names, in specific, for the Arabic language, to have good results, let's talk about Arabic content. It's from another point of view, Arabic content is about 3% of the overall Internet content, which is a very small percentage, which we really need to enhance, so the Arabic domain names project will have a clear vision. So I can't imagine a Web site having an Arabic name and the content is in English, for example. 
So this is another point of view, which is not really related to the working group, but I believe we need to work on Arabic language content in order for Arabic domain names to be a nice project. Thank you. >>RAM MOHAN: Thank you, George. That concludes this session. Thank you for coming. And, once again, if you're interested in observing or in participating, please join the mailing list. And this week, we do plan to have a face-to-face session, which typically consists of operators, et cetera. If you're interested in participating in the session, please come and talk to any of us, and we can help facilitate that. And other than that, thank you very much for coming. And look forward to a successful rest of the ICANN meeting. Baher, thank you for helping organize this. [ Applause ]
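The bundling idea discussed in this session, where confusingly similar characters are grouped so that a label such as KITAB typed on an Arabic or an Urdu keyboard leads to the same place, can be illustrated with a minimal Python sketch. The variant table below is an assumption made purely for illustration: it pairs ARABIC LETTER KAF (U+0643) with ARABIC LETTER KEHEH (U+06A9), and ARABIC LETTER YEH (U+064A) with ARABIC LETTER FARSI YEH (U+06CC). It is not the working group's table, the function names are hypothetical, and the Urdu spelling used for KITAB is assumed to use KEHEH.

```python
import unicodedata

# Illustrative variant groups only (an assumption, not an agreed table).
# Each tuple lists code points treated as confusingly similar.
VARIANT_GROUPS = [
    ("\u0643", "\u06A9"),  # ARABIC LETTER KAF, ARABIC LETTER KEHEH
    ("\u064A", "\u06CC"),  # ARABIC LETTER YEH, ARABIC LETTER FARSI YEH
]

# Map every member of a group to the first member of that group.
CANONICAL = {ch: group[0] for group in VARIANT_GROUPS for ch in group}


def bundle_key(label: str) -> str:
    """Replace each character by its group representative."""
    return "".join(CANONICAL.get(ch, ch) for ch in label)


def same_bundle(a: str, b: str) -> bool:
    """Two labels belong to one bundle if their keys are equal."""
    return bundle_key(a) == bundle_key(b)


def describe(label: str) -> list[str]:
    """List the code point and Unicode name of each character."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}" for ch in label]


# KITAB spelled with KAF and with KEHEH (spellings assumed for illustration).
KITAB_ARABIC = "\u0643\u062A\u0627\u0628"
KITAB_URDU = "\u06A9\u062A\u0627\u0628"

if __name__ == "__main__":
    for line in describe(KITAB_ARABIC) + describe(KITAB_URDU):
        print(line)
    print(same_bundle(KITAB_ARABIC, KITAB_URDU))
```

In this sketch the two spellings differ only in their first code point, so they fall into one bundle. Whether the labels in a bundle are delegated together, blocked, or handled separately is left open here, in line with the view expressed in the session that this is a decision for each registry.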