IETF IDN Working group session
9 August 2001
London, England

Agenda Bashing:

Agreement on the floor to cut Reordering, nameprep update, Uname
proposal, Hangulchar and tsconv from the planned agenda.

================================================================

WG UPDATE, Marc Blanchet

Coordination with other groups/efforts:
- IETF apps area
   - "requirements" for encoding: ACE or UTF8
   - directory efforts: directory@apps.ietf.org
- Unicode/ISO
   - Any modifications to Unicode/ISO tables should be done by those
   parties, not IETF
- IETF dnsext WG
   - Any modification to DNS protocol should be discussed in dnsext
- ICANN/IANA
   - Policies

- Pool W: the pool of documents that identify the WG's core of interest
- Currently:
   - requirements
     co-chairs believe there is WG rough consensus and intend to forward
     it to the IESG for publication as Informational.
   - idna
   - nameprep
   - dude
   - aceid
   - jpchar
   - ace-eval-jp
   - mace
   - uname
   - tsconv
   - udns
   - amc-ace-z
   - hangeulchar
   - lsb-ace
- Today's focus is on standards track proposals

================================================================

ACE EVALUATION WITH IDNs ALREADY REGISTERED, Yoshiro Yoneya

- Done by CNNIC, KRNIC, TWNIC and JPNIC with data they have for
  registered domain names, focusing on ACEs in Pool W.
- The most important evaluation criterion is maximizing the number of
  characters a name can hold; raw speed is less important because
  nameprep is the slow stage.
- Long IDNs (more than 15 Han characters) are already registered.
- Evaluated ACEs: DUDE, AMC-ACE-Z, MACE and RACE
- Focus on DUDE and AMC-ACE-Z, with MACE and RACE as reference

Graphs of encoding efficiency for domain names from KRNIC and TWNIC,
in which AMC-ACE-Z shows the best compression.
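
To make the "number of characters" criterion concrete, the sketch below
counts the octets one label needs as an ACE and as raw UTF-8.  It is
only an illustration under stated assumptions: Python's built-in
"punycode" codec stands in for AMC-ACE-Z (the proposal was later
published as Punycode in RFC 3492, and details may differ from the draft
evaluated here), and the sample label and the helper name encoded_sizes
are made up for this example, not taken from the NIC data.

```python
MAX_LABEL_OCTETS = 63  # DNS limit on the length of a single label


def encoded_sizes(label: str) -> dict:
    """Return the character count and encoded octet counts for one label."""
    ace = label.encode("punycode")  # stand-in for AMC-ACE-Z (assumption)
    utf8 = label.encode("utf-8")
    return {
        "chars": len(label),
        "punycode": len(ace),
        "utf8": len(utf8),
        "ace_fits_in_label": len(ace) <= MAX_LABEL_OCTETS,
    }


# Illustrative label of eight Han and Katakana characters.
sample = "日本語ドメイン名"
sizes = encoded_sizes(sample)
print(sizes)
```

The same function could be run over a list of registered names to
compare how many of them stay within the label limit under each
encoding, which is roughly the comparison the graphs summarize.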

Charts showing that the four NICs consider AMC-ACE-Z to be either good
or very good, while others were "bad" or "very bad" for at least one NIC.

MACE co-authors (including the presenter, Yoneya-san) support
AMC-ACE-Z.

Recommendation from the study is: AMC-ACE-Z

================
WG Questions for sense of the group:

Question: If there is a need for an ACE, choose one:
- DUDE                                  few hands
- AMC-ACE-Z                             most hands
- MACE                                  (removed at request of authors)
- don't care but want an ACE chosen     fair bunch of hands

Erik Nordmark: question is, if you use an ACE, this is the one. Not
saying you need to use an ACE anywhere.

?: What is re-ordering?

James Seng: pre-processing to make more frequently used chars more
compressed.

Paul Hoffman: This is not a binding vote. It should be confirmed on the
mailing list.

Consensus: AMC-ACE-Z (with many who don't care so long as one is chosen)

Should we do reordering?
- Yes                                   some hands
- No                                    some hands
No clear result of poll.

Erik Nordmark: A lot fewer people participated in this poll than in
the previous one.  Why?

Bill Manning: We read the draft but didn't understand it, and need to
read it again.

Paul Hoffman: Don't understand the re-ordering draft. Does not 
broaden to other scripts.

Larry Masinter: Re-ordering adds complexity. 

Kilnam Chon: Re-ordering is critical for CJK but adds complexity.

Paul Hoffman: This draft adds complexity, so perhaps people are waiting 
to decide how to judge whether the added complexity is worth it.

Eric Chen: This is just intended to help CJK. Most of the interest
is in CJK. Why not?

James Seng: What I'm hearing is that the authors should do a
cost/benefit analysis, but it is clear the draft is not ready to move
forward.

Erik Nordmark: Can someone do a pro/con analysis draft, or someone do pro and
someone con, to help drive the discussion on the mailing list?

Paul Hoffman: Let's make Adam [Costello] do it. [laughter]

Kilnam Chon: This straw poll process isn't really valid because there
is not enough representation from the people for whom this is really
important.  There's always a trade-off.

James Seng: Could someone who voted against the lsb draft just explain 
why you are against it?

Paul: I'd rather someone else did, but I will ... the reordering draft
is somewhat of a hack to optimize for certain scripts, but it is at
the cost of other scripts, isn't really generalized, and there has
been no analysis of how beneficial it is for DUDE and AMC-ACE-Z.

Dongman Lee: The author was not trying to propose this as a
generalized mechanism.  Since CJK is driving internationalization, it
is not surprising that proposals would be specific to that.

Ted Hardie: As Paul pointed out, this has different effects on
different scripts, but now that we are focused on one ACE we can ask
more specifically for the authors to focus on just how it affects
AMC-ACE-Z.

Consensus: discuss the reordering on the mailing list and request the
 authors of the ACE and reordering drafts to come to a proposal with
 analysis.

================================================================

MATCHING (NAMEPREP)

- Need for a standardized pre-processing step regardless of what IDN
   protocol we choose?
   Yes      lot of hands
   No       one hand

(Discussion clarified the question from the original.)

Other comments:

Patrik Faltstrom: Doesn't preclude other pre-processing before it,
which some people have worried it would.  But even so, IETF really
needs to have one standard way of processing Unicode.

James Seng: When you say one standard way, do you mean one with flexibility
for locale, or essentially fixed?

Patrik Faltstrom: Essentially fixed.

Dave Crocker: I thank Patrik for his comments that helped clarify
things for me.  I used to be resistant to it, but am coming to accept
it.  It is quite a bit like the case-insensitive/sensitive thing we're
so used to in ASCII.  There are two processes here: case-mapping and
determining the legal character set.  Keep them cleanly separate.
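
The separation Dave Crocker describes can be sketched as two small
functions that are composed but never mixed.  This is a minimal
illustration, not the nameprep draft: the mapping stage assumes Unicode
case folding followed by NFKC normalization, and the legality rule
(letters, marks, digits and hyphen only) is a hypothetical policy chosen
for the example.  The names map_chars, is_legal and prepare are
likewise invented here.

```python
import unicodedata


def map_chars(label: str) -> str:
    """Stage 1 (mapping): fold case, then apply NFKC normalization."""
    return unicodedata.normalize("NFKC", label.casefold())


def is_legal(label: str) -> bool:
    """Stage 2 (legality): hypothetical rule allowing only letters,
    combining marks, digits and the hyphen."""
    return all(
        ch == "-" or unicodedata.category(ch)[0] in "LMN"
        for ch in label
    )


def prepare(label: str) -> str:
    """Run the mapping stage, then reject labels with illegal characters."""
    mapped = map_chars(label)
    if not is_legal(mapped):
        raise ValueError(f"illegal character in label: {label!r}")
    return mapped


print(prepare("ＥＸＡＭＰＬＥ"))  # fullwidth capitals map to ASCII lowercase
print(prepare("Straße"))
```

Keeping the stages apart means the set of permitted characters can be
changed without touching the mapping rules, and the reverse.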

? Russell:

Wenhui Zhang: Should have a standard that includes where local issues
can be defined, which can include their standardized pre-processing.

?: The goal of the working group is noble, but we are trying to kill
all the birds with one stone, and so we need a really large stone.  So
many legacy systems are optimized for their local languages, and will
have a lot of pain to switch to what is being planned.  Those who are
going to suffer most don't have much of a voice here.

?: Look into what happened in the LDAP group, how they ended up with a
bunch of language-specific things.  It is difficult, but it can be
done, and since it has already been solved, build on it.

Erik Nordmark: Can we get back on the topic of this question?  We seem to be
wandering into the general requirements area.

POLL: Many to 1 in favour of standard pre-processing step.

Post poll:

John Klensin: I can agree that a standard pre-processing step is
needed, but I can't agree if that necessarily means having a single
binary result even in ambiguous situations.  Very concerned about that.
This working group might be resulting in something that is totally
irrelevant.

Eric Brunner-Williams: The ambiguity need not exist in "uniprep" (the
first of the stages observed by Dave Crocker), the problem arises in
the other part.

Paul: I think we should now work toward an architecture that includes
pre-nameprep, nameprep and post-nameprep.  The middle one can be
generally standardized while the other stages need not be.

Erik Nordmark: Addressing John's concern of irrelevance, I can see how
this work would eventually be superseded by something better, but that
doesn't mean we have to stop doing this very useful work now.

Dave Crocker: Dealing with "language" is out of scope for this group;
this working group should just be about expanding the set of strings
that are usable as domain names.  In that context canonicalization
makes a lot of sense, but not when we start talking about natural
language.

Ted Hardie: I have to take exception to Paul describing a system that
is not standardized end-to-end; it can't include processing that is not
standardized.  Also agree with Dave that we can't work with natural
language; we don't have the expertise.

?: Rigorously avoid natural languages.

Eric Chen: We need to consider natural language!

Dave Crocker: The scope is very narrow and does not include languages.

Harald: "Yes."

Paul Hoffman: Please defer all questions of language, there will be a draft
soon that addresses where it should be addressed.

Next step will be for the authors to clarify the relation between
the various proposals for processing into a cohesive architecture,
namely nameprep, tsconv, jpchar, hangeulchar.

================================================================

PROTOCOL PROPOSALS, Dave Crocker

Dave's Disclaimers:
- System oriented person
- Not a Unicode expert, or even naif
- Entirely biased -- wanted to be objective, but failed

IDN Task:
- Enhance range of domain names that are useful
- Not human "name"
- Not "language"
- Has no sets
- Requires: fairness, efficiency, reliability, transition, ...

The Usual Suspects:
     Encoding            Approach
1. ACE only             IDNA
2. UTF-8 only           IDNA-mod, uDNS
3. ACE then UTF-8       IDNA-mod, uDNS
4. ACE & UTF-8 both     uDNS, uNAME
5. Anything goes        uNAME

Encoding efficiency:
- ACE is an encoding scheme
- UTF-8 is an encoding scheme
- Both map many bits to a variable length string
- All variable length strings are unfair to some people
- Fair vs unfair unfairness:
   - longer mappings mean shorter names
   - shorter names restricted to information dense character sets
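
The point that longer mappings mean shorter names can be shown with
simple arithmetic.  The sketch below assumes the standard 63-octet DNS
label limit applies to a UTF-8 label, and asks how many copies of one
character would fit.  The sample characters and the helper name
utf8_capacity are chosen for illustration only.

```python
MAX_LABEL_OCTETS = 63  # DNS limit on the length of a single label


def utf8_capacity(ch: str) -> int:
    """How many copies of ch fit in one label when sent as UTF-8."""
    return MAX_LABEL_OCTETS // len(ch.encode("utf-8"))


samples = {"Latin a": "a", "Greek alpha": "α", "Han zhong": "中"}
capacity = {name: utf8_capacity(ch) for name, ch in samples.items()}

for name, count in capacity.items():
    print(f"{name}: up to {count} characters per label")
```

A Han label holds a third as many characters as a Latin one, but each
Han character typically carries more information, which is the sense in
which this unfairness might be called fair.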

Encoding comparison:
1. ACE is three minuses bad.
2. UTF-8 is two minuses bad.

Charts showing that there are a lot of modules in both systems, and we
have to worry about all the modules in both systems.

ACE has an extraordinarily minimal amount of change necessary to make
an IDN useful, just two applications.  This is about as good a
transition scheme as you can possibly get.

UTF-8 is an extreme in the opposite direction, it requires that
everything work end-to-end.
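
The contrast in transition cost comes down to what reaches components
that have not been upgraded.  The sketch below is illustrative only: it
uses Python's built-in "idna" codec, which implements the later
RFC 3490 form of IDNA with the "xn--" prefix and therefore postdates
this meeting, and the helpers to_ace and is_legacy_safe are hypothetical
names introduced for the example.

```python
LDH_AND_DOT = (
    b"abcdefghijklmnopqrstuvwxyz"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"0123456789-."
)


def to_ace(name: str) -> bytes:
    """Convert a domain name to its ASCII-compatible form in the application."""
    return name.encode("idna")


def is_legacy_safe(wire: bytes) -> bool:
    """True if every octet is a letter, digit, hyphen or dot, which is
    what an unmodified resolver or server already expects."""
    return all(octet in LDH_AND_DOT for octet in wire)


name = "bücher.example"
ace = to_ace(name)
utf8 = name.encode("utf-8")

print(ace, is_legacy_safe(ace))
print(utf8, is_legacy_safe(utf8))
```

With ACE only the converting application has to change, because
everything downstream still sees ordinary ASCII labels.  The UTF-8 form
carries octets above 0x7F that every intermediate component would have
to pass through unharmed.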

1. ACE only four pluses good
2. UTF-8 only five minuses bad

Transition Interactions:
-------------------------------------------------------------------------
-------------------------------------------------------------------------
                 Client->  Server->           ACE              UTF-8
                 Server    Client
-------------------------------------------------------------------------
1. old client,  old dn    new dn          transparent     UTF-8 and ACE
    new server                                           maybe break client
-------------------------------------------------------------------------
2. new client,  new dn    old dn          transparent     break server?
    old server
-------------------------------------------------------------------------
-------------------------------------------------------------------------

Specification comparison:

-------------------------------------------------------------------------
-------------------------------------------------------------------------
                 Efficiency          Transition           Risk/Operational
                                                             Expense
-------------------------------------------------------------------------
IDNA (ACE)     bad(data)            automatic               none
-------------------------------------------------------------------------
                                     how to detect
uDNS (UTF-8)   poor(data)           when to use ACE?        high
                                     (poorly defined
                                     and not realistic)
-------------------------------------------------------------------------
                                     unstated
uName (both)   bad (round trip)     (and based on CNRP,     very, very
                                     with no meaningful      high
                                     deployment)
-------------------------------------------------------------------------
-------------------------------------------------------------------------

Olafur: Hard for me to say this to you Dave, given our history, but
good job.

Harald: Think you underestimate the cost of ACE a bit, in that leakage
will confuse users.  UTF-8 leakage will also confuse users, though,
likely even a bit more!  But the ranking is still good.

Paul: uName doesn't actually have CNRP in it; it was put in the draft
and then explicitly shot down in the draft.  It uses a new RR, but the
end result is pretty much the same as far as your conclusions go.

Erik Nordmark: Can we vote on it without a UTF-8 draft in the pool?
Would need a draft very fast.

Poll:
- idna?
   Yes   Most
   No    Some
- udns?
   Yes   Few
   No    Most
- uname?
   Yes   Few
   No    Most

Interpretation by Harald and Marc was that IDNA was the only strongly
supported proposal in the room and that the other two had strong
opposition. The floor agreed with this interpretation.

Nameprep discussion resumed (some time remaining)

Paul Hoffman: Good (from a marketing sense) user interfaces will do a lot of
mucking with input.  Really should have it defined how and where they
can do that. If you change machine, different local translation
tables can yield different names.

James Seng: It can be very hard to determine what local conversion
option to turn on. Not sure if this wg has capability to deal
with codepoint matching. We need to reference code points outside 
the IETF, at Unicode Consortium. 

Paul Hoffman: Unicode has put mapping tables out of scope.

Harald Alvestrand: This working group is about internationalized access
to domain names, not localized access.  This group is trying to specify
what a client must do no matter where it is in the world.  I would
accept a statement of the relationship between the pre-processing
drafts.  It has to be made mandatory, though, or it should not be part
of the output of this group.

Wenhui Zheng: IDNA draft should be explicit where the local
interface/mapping should be done.

Eric Chen: We have built a house and opened some gates but not others.
Some languages can come in and others can not. IDNA should open its
gate to allow other languages to do their thing.

================================================================
NEXT STEPS

- AMC-ACE-Z as chosen ACE.
- Reordering to be discussed on mailing list.
- Relation between nameprep/tsconv/hangeulchar/jpchar/stringprep
   to be consolidated into one architecture.
- Go forward with IDNA.