IETF IDN Working group session
9 August 2001
London, England

Agenda Bashing:
Agreement on the floor to cut Reordering, nameprep update, Uname
proposal, Hangulchar, tsconv from planned agenda.

================================================================
WG UPDATE, Marc Blanchet

Coordination with other groups/efforts:
- IETF apps area
  - "requirements" for encoding: ACE or UTF8
  - directory efforts: directory@apps.ietf.org
- Unicode/ISO
  - Any modifications to Unicode/ISO tables should be done by those
    parties, not IETF
- IETF dnsext WG
  - Any modification to DNS protocol should be discussed in dnsext
- ICANN/IANA
  - Policies

- Pool W, pool of documents that identify core of interest by WG
  - Currently:
    - requirements
      co-chairs believe there is a WG rough consensus and intend to
      forward it to IESG for Informational.
    - idna
    - nameprep
    - dude
    - aceid
    - jpchar
    - ace-eval-jp
    - mace
    - uname
    - tsconv
    - udns
    - amc-ace-z
    - hangeulchar
    - lsb-ace
- Today's focus is on standards track proposals

================================================================
ACE EVALUATION WITH IDNs ALREADY REGISTERED, Yoshiro Yoneya

- Done by CNNIC, KRNIC, TWNIC and JPNIC with data they have for
  registered domain names, focusing on ACEs in Pool W.
- The most important evaluation criterion is to maximize the number
  of characters; raw speed is less important because nameprep is the
  slow stage.
- Long IDNs (more than 15 Han characters) are already registered.
- Evaluated ACEs: DUDE, AMC-ACE-Z, MACE and RACE
- Focus on DUDE and AMC-ACE-Z with MACE & RACE as reference

Graphs of efficiency of domain names from each of KRNIC and TWNIC,
where AMC-ACE-Z shows the best compression.

Charts showing that the four NICs consider AMC-ACE-Z to be either
good or very good, while others were "bad" or "very bad" for at
least one NIC.

MACE co-authors (including the presenter, Yoneya-san) support
AMC-ACE-Z.

Recommendation from the study is: AMC-ACE-Z

================
WG Questions for sense of the group:

Question: If there is a need for an ACE, choose one:
- DUDE                                  few hands
- AMC-ACE-Z                             most hands
- MACE                                  (removed at request of authors)
- don't care but want an ACE chosen     fair bunch of hands

Erik Nordmark: The question is, if you use an ACE, this is the one.
Not saying you need to use an ACE anywhere.

?: What is re-ordering?

James Seng: Pre-processing to make more frequently used chars more
compressed.

Paul Hoffman: Not a binding vote. Should be confirmed on the mailing
list.

Consensus: AMC-ACE-Z (with many who don't care so long as one is
chosen)

Should we do reordering?
- Yes     some hands
- No      some hands
No clear result of poll.

Erik Nordmark: A lot of people participated in the former poll but
not in the latter. Why?

Bill Manning: We read the draft but didn't understand it, and need
to read it again.

Paul Hoffman: Don't understand the re-ordering draft. Does not
broaden to other scripts.

Larry Masinter: Re-ordering adds complexity.

Kilnam Chon: Re-ordering is critical for CJK but adds complexity.

Paul Hoffman: This draft adds complexity, so perhaps people are
waiting to decide how to judge whether the added complexity is worth
it.

Eric Chen: This is just intended to help CJK. Most of the interest
is in CJK. Why not?

James Seng: What I'm hearing is that the authors should do a
cost/benefit analysis, but it is clear the draft is not ready to
move forward.

Erik Nordmark: Can someone do a pro/con analysis draft, or someone
do pro and someone con, to help drive the discussion on the mailing
list?

Paul Hoffman: Let's make Adam [Costello] do it. [laughter]

Kilnam Chon: This straw poll process isn't really valid because
there is not enough representation from the people for whom this is
really important. There's always a trade-off.

James Seng: Could someone who voted against the lsb draft just
explain why you are against it?

Paul: I'd rather someone else did, but I will ...
the reordering draft is somewhat of a hack to optimize for certain
scripts, but it comes at the cost of other scripts, isn't really
generalized, and there has been no analysis of how beneficial it is
for DUDE and AMC-ACE-Z.

Dongman Lee: The author was not trying to propose this as a
generalized mechanism. It is not surprising that, since CJK is
driving internationalization, proposals would be specific to it.

Ted Hardie: As Paul pointed out, this has different effects on
different scripts, but now that we are focused on one ACE we can ask
the authors more specifically to focus on just how it affects
AMC-ACE-Z.

Consensus: discuss the reordering on the mailing list and request
the authors of the ACE and reordering drafts to come to a proposal
with analysis.

================================================================
MATCHING (NAMEPREP)

- Need for a standardized pre-processing step regardless of what IDN
  protocol we choose?
    Yes     lot of hands
    No      one hand
  (Discussion clarified the question from the original.)

Other comments:

Patrik Faltstrom: Doesn't preclude other pre-processing before it,
which some people have worried it would. But even so, IETF really
needs to have one standard way of processing Unicode.

James Seng: When you say one standard way, do you mean one with
flexibility for locale, or essentially fixed?

Patrik Faltstrom: Essentially fixed.

Dave Crocker: I thank Patrik for his comments that helped clarify
things for me. I used to be resistant to it, but am coming to accept
it. It is quite a bit like the case-insensitive/sensitive thing
we're so used to in ASCII. There are two processes here:
case-mapping and determining the legal character set. Keep them
cleanly separate.

? Russell:

Wenhui Zhang: Should have a standard that includes where local
issues can be defined, which can include their standardized
pre-processing.

?: The goal of the working group is noble, but we are trying to kill
all the birds with one stone, and so we need a really large stone.
So many legacy systems are optimized for their local languages, and
will have a lot of pain switching to what is being planned. Those
who are going to suffer most don't have much of a voice here.

?: Look into what happened in the LDAP group, how they ended up with
a bunch of language-specific things. It is difficult, but it can be
done, and since it has already been solved, build on it.

Erik Nordmark: Can we get back on the topic of this question? We
seem to be wandering into the general requirements area.

POLL: Many to 1 in favour of standard pre-processing step.

Post poll:

John Klensin: I can agree that a standard pre-processing step is
needed, but I can't agree if that necessarily means having a single
binary result even in ambiguous situations. Very concerned about
that. This working group might end up producing something that is
totally irrelevant.

Eric Brunner-Williams: The ambiguity need not exist in "uniprep"
(the first of the stages observed by Dave Crocker); the problem
arises in the other part.

Paul: I think we should now work toward an architecture that
includes pre-nameprep, nameprep and post-nameprep. The middle one
can be generally standardized while the other stages need not be.

Erik Nordmark: Addressing John's concern of irrelevance, I can see
how this work would eventually be superseded by something better,
but that doesn't mean we have to stop doing this very useful work
now.

Dave Crocker: Dealing with "language" is out of scope for this
group; this working group should just be about expanding the set of
strings that are usable as domain names. In that context
canonicalization makes a lot of sense, but not when we start talking
about natural language.

Ted Hardie: I have to take exception to Paul describing a system
that is not standardized end-to-end; it can't include processing
that is not standardized. Also agree with Dave that we can't work
with natural language; we don't have the expertise.

?: Rigorously avoid natural languages.

Eric Chen: We need to consider natural language!

Dave Crocker: The scope is very narrow and does not include
languages.

Harald: "Yes."

Paul Hoffman: Please defer all questions of language; there will be
a draft soon that addresses where it should be addressed.

Next step will be for the authors to clarify the relation between
the various proposals for processing into a cohesive architecture,
namely nameprep, tsconv, jpchar, hangeulchar.

================================================================
PROTOCOL PROPOSALS, Dave Crocker

Dave's Disclaimers:
- System oriented person
- Not a Unicode expert, or even naif
- Entirely biased -- wanted to be objective, but failed

IDN Task:
- Enhance range of domain names that are useful
  - Not human "name"
  - Not "language"
  - Has no sets
- Requires: fairness, efficiency, reliability, transition, ...

The Usual Suspects:

     Encoding                Approach
  1. ACE only                IDNA
  2. UTF-8 only              IDNA-mod, uDNS
  3. ACE then UTF-8          IDNA-mod, uDNS
  4. ACE & UTF-8 both        uDNS, uNAME
  5. Anything goes           uNAME

Encoding efficiency:
- ACE is an encoding scheme
- UTF-8 is an encoding scheme
- Both map many bits to a variable length string
- All variable length strings are unfair to some people
- Fair vs unfair unfairness:
  - longer mappings mean shorter names
  - shorter names restricted to information dense character sets

Encoding comparison:
  1. ACE is three minuses bad.
  2. UTF-8 is two minuses bad.

Charts showing that there are a lot of modules in both systems, and
we have to worry about all the modules in both systems.

ACE requires an extraordinarily minimal amount of change to make an
IDN useful, just two applications. This is about as good a
transition scheme as you can possibly get. UTF-8 is an extreme in
the opposite direction; it requires that everything work end-to-end.

  1. ACE only        four pluses good
  2. UTF-8 only      five minuses bad

Transition Interactions:

--------------------------------------------------------------------
                 Client->   Server->   ACE           UTF-8
                 Server     Client
--------------------------------------------------------------------
1. old client,   old dn     new dn     transparent   UTF-8 and ACE
   new server                                        maybe break
                                                     client
--------------------------------------------------------------------
2. new client,   new dn     old dn     transparent   break server?
   old server
--------------------------------------------------------------------

Specification comparison:

--------------------------------------------------------------------
               Efficiency         Transition            Risk/
                                                        Operational
                                                        Expense
--------------------------------------------------------------------
IDNA (ACE)     bad (data)         automatic             none
--------------------------------------------------------------------
uDNS (UTF-8)   poor (data)        how to detect         high
                                  when to use ACE?
                                  (poorly defined and
                                  not realistic)
--------------------------------------------------------------------
uName (both)   bad (round trip)   unstated              very, very
                                  (and based on CNRP,   high
                                  with no meaningful
                                  deployment)
--------------------------------------------------------------------

Olafur: Hard for me to say this to you Dave, given our history, but
good job.

Harald: I think you underestimate the cost of ACE a bit, in that
leakage will confuse users. UTF-8 leakage will also confuse users,
though, likely even a bit more! The ranking is still good.

Paul: uName doesn't actually have CNRP in it; it was put in the
draft and then explicitly shot down in the draft. It uses a new RR,
but the end result is pretty much the same as far as your
conclusions go.

Erik Nordmark: Can we vote on it without a UTF-8 draft in the pool?
Would need a draft very fast.

Poll:
- idna?      Yes  Most      No  Some
- udns?      Yes  Few       No  Most
- uname?     Yes  Few       No  Most

Interpretation by Harald and Marc was that IDNA was the only
strongly supported proposal in the room and the other two had strong
opposition. The interpretation was agreed by the floor.

Back to nameprep discussion (some time remaining)

Paul Hoffman: Good (from a marketing sense) user interfaces will do
a lot of mucking with input. It really should be defined how and
where they can do that. If you change machines, different local
translation tables can yield different names.

James Seng: It can be very hard to determine what local conversion
option to turn on. Not sure if this WG has the capability to deal
with codepoint matching. We need to reference code points outside
the IETF, at the Unicode Consortium.

Paul Hoffman: Unicode has put mapping tables out of scope.

Harald Alvestrand: This working group is about internationalized
access to domain names, not localized. This group is trying to
specify what a client must do no matter where it is in the world. I
would accept a statement of the relationship between the
pre-processing drafts. It has to be made mandatory, though, or it
should not be part of the output of this group.

Wenhui Zheng: The IDNA draft should be explicit about where the
local interface/mapping should be done.

Eric Chen: We have built a house and opened some gates but not
others. Some languages can come in and others cannot. IDNA should
open its gate to allow other languages to do their thing.

================================================================
NEXT STEPS

- AMC-ACE-Z as chosen ACE.
- Reordering to be discussed on mailing list.
- Relation between nameprep/tsconv/hangeulchar/jpchar/stringprep to
  be consolidated into one architecture.
- Go forward with IDNA.
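
Illustration of the efficiency criterion from the ACE evaluation and
the encoding comparison above (number of characters that fit in a
name). This is only a sketch under stated assumptions, not anything
presented at the session: Python's built-in "punycode" codec (RFC
3492, the later standardized form of AMC-ACE-Z) stands in for the
chosen ACE, the 4-octet prefix length is assumed, and the 63-octet
DNS label limit comes from RFC 1035. The function names are
hypothetical.

```python
# Sketch only: "punycode" stands in for AMC-ACE-Z, and the prefix length
# is an assumption made for illustration.
MAX_LABEL_OCTETS = 63   # DNS limit on a single label (RFC 1035)
ACE_PREFIX_LEN = 4      # assumed length of the ACE prefix


def ace_length(label: str) -> int:
    """Octets the label needs in ACE form, assumed prefix included."""
    return ACE_PREFIX_LEN + len(label.encode("punycode"))


def utf8_length(label: str) -> int:
    """Octets the label needs as raw UTF-8."""
    return len(label.encode("utf-8"))


def fits_in_label(label: str) -> dict:
    """Report, per encoding, whether the label fits in one DNS label."""
    return {
        "ace": ace_length(label) <= MAX_LABEL_OCTETS,
        "utf8": utf8_length(label) <= MAX_LABEL_OCTETS,
    }


if __name__ == "__main__":
    # Print octet counts side by side for a few sample labels.
    for sample in ["bücher", "日本語", "한국어도메인"]:
        print(sample, ace_length(sample), utf8_length(sample),
              fits_in_label(sample))
```

Running the script prints both octet counts for each sample, which
is the kind of per-name comparison the NIC evaluation made over
registered names. Results depend heavily on the script and on the
actual names, so a handful of samples says nothing about which
encoding is better overall.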