ICANN Meetings in Carthage
Whois Workshop Captioning
Wednesday, 29 October 2003
The following is the output of the real-time captioning taken during the Whois Workshop held 29 October 2003 in Carthage, Tunisia. Although the captioning output is largely accurate, in some cases it is incomplete or inaccurate due to inaudible passages or transcription errors. It is posted as an aid to understanding the proceedings at the session, but should not be treated as an authoritative record.
Michael Roberts: good morning. We'll begin very shortly. And we expect to have our audience arrive in due course.
Welcome to all of you who have gotten up early and come on time. We have a busy agenda this morning, and so we're going to start more or less on time in order that the sessions can move more or less according to schedule.
My name is Mike Roberts, and I'm the moderator this morning. I'd like to make just a few introductory remarks about the format of the program and the purpose of the workshop. This is the second of a series of workshops on WHOIS which are designed to inform those of us who have been working closely with ICANN on WHOIS as well as those who have not spent much time on this topic.
For those of you who have not had recent contact with WHOIS, I'd like to just note that although everyone universally refers to this subject as the "WHOIS topic," in fact, WHOIS is a protocol that is designed to do data retrieval. And although we are going to discuss today the use of WHOIS for data relating to domain name registrants, it's also used elsewhere in the Internet for other purposes. Most of the issues surrounding WHOIS are related to content and not to the protocol.
So I hope that you'll bear the distinction in mind as we go through the sessions this morning that although we use the term "WHOIS," what we're really trying to deal with is the data related to registrations and registrants and where it comes from and how we use it and what policy envelope ought to apply to that information.
We've organized this session this morning around three topics that are considered to be among the more pressing issues related to WHOIS. There are, of course, many others. And, in fact, the GNSO recently did an issues list for WHOIS. And that list added up to 20 different issues that may receive consideration from the GNSO in the course of its work on WHOIS, which will be discussed later this morning.
The format of the sessions is that for two of them, we will have an overview presentation, and then we'll have comments and remarks from panelists. I'm very pleased that a number of individuals from the community with specialized knowledge about WHOIS have agreed on very short notice to join us today. And I and ICANN staff are most appreciative of their willingness to volunteer to do this.
I'd like to ask, because we have so many people with us, and on the web cast, that have English as a second language, that everyone, including those asking questions from the floor, make an effort to speak slowly, articulate well, and help everyone understand both the questions and the answers.
With that, I'd like to introduce our first panel. Mr. John Klensin, who is the IAB liaison to the ICANN board, has held many important posts in Internet architecture, and is a longtime member of the Internet Architecture Board, is going to give us a presentation on international issues related to WHOIS.
And then we're going to have remarks from a panel that includes four individuals this morning: Professor Hualin Qian from the Chinese Academy of Sciences; Ram Mohan, Technical Director at Afilias; Charles Shaban, from the organization in Jordan that has done a lot of work in the Middle East on these issues; and Dr. Wen-sung Chen from TWNIC.
John, why don't you start us off.
John Klensin: thank you, Mike.
I'm going to try to talk generally about some of the issues associated with what from an internationalization standpoint is the WHOIS problem. There are, as Mike pointed out, both protocol issues here, this port-43 protocol, and issues with databases and what you ask when you ask the question and what you get back. We're not going to try to provide any answers; we're going to try to provide an overview of what some of the issues are that the GNSO and the community will need to look at as we try to develop policy in this area. And as part of that, some of the areas in which policy needs to be developed.
I decided late last night that much as we've all been talking about internationalization of domain names for the last several years, some of you may not know -- excuse me -- what actually -- how that actually works. So I'm going to attempt to explain to you how it works technically while standing on one foot. This is going to be a very high-level overview.
You can see the foot.
For a number of technical reasons, no characters which are not ASCII ever go into the DNS under this internationalization plan. What's being used instead in order to keep all sorts of applications from breaking is an encoding scheme. The encoding scheme is very ugly. The intent is that users will never see it.
But if you see strings floating around the Internet in the next few years which look like xn dash dash followed by gibberish, ASCII characters that don't form any words at all, you are looking at an internationalized domain name in its internal DNS format. And the assumption is that will get converted to real names, which can be displayed in real characters on screens before the user sees it. How well this will work, we don't know yet.
So this special encoding for various strange reasons is called punycode. At the level of policy and general understanding, if you hear the words "IDNA" and "punycode," they are the same thing, and they are the protocol names for dealing with international domain names. The real label, after the decoding or before the encoding, is a Unicode string, and can, in principle, contain almost any character Unicode can contain, which is a lot of modern languages in the world today and a lot of things which are not modern languages. The process of internationalization changes many rules and assumptions we have made about the DNS for years, not only the DNS, but protocols at various ends of this process.
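The conversion between the display form and the internal "xn--" form can be sketched in a few lines of Python. This is only an illustration: it relies on Python's built-in "idna" codec, and the name bücher.example and the two function names are assumptions chosen for the example, not anything taken from the session.

```python
def to_dns_form(name: str) -> str:
    """Convert a Unicode domain name to its ASCII (xn--) DNS form."""
    return name.encode("idna").decode("ascii")


def to_display_form(name: str) -> str:
    """Convert an ASCII DNS form back to the Unicode name for display."""
    return name.encode("ascii").decode("idna")


if __name__ == "__main__":
    # "bücher.example" is a made-up name used purely for illustration.
    print(to_dns_form("bücher.example"))             # xn--bcher-kva.example
    print(to_display_form("xn--bcher-kva.example"))  # bücher.example
```

A name that is already plain ASCII passes through unchanged, which is why existing applications do not break.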
I said yesterday I can explain WHOIS protocol itself from a protocol standpoint in a very small number of sentences. It's an extremely simple protocol. The biggest constraint on it is that it is defined as ASCII only. You cannot push a Unicode name in Unicode form through a conforming WHOIS server or client. There are lots of nonconforming ones out there. But don't expect every one of them to be able to handle these strings.
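To make the simplicity and the ASCII-only constraint concrete, here is a minimal sketch of a port-43 client in Python. It assumes the usual exchange (open a TCP connection, send one line ending in CRLF, read until the server closes the connection); the function names are illustrative, and any server name passed to it is the caller's choice.

```python
import socket

WHOIS_PORT = 43


def build_query(name: str) -> bytes:
    """Return the bytes a client sends: the query text plus CRLF."""
    # encode("ascii") raises UnicodeEncodeError for any non-ASCII character,
    # which mirrors the ASCII-only constraint described above.
    return name.encode("ascii") + b"\r\n"


def whois(server: str, name: str, timeout: float = 10.0) -> str:
    """Send one query to a WHOIS server and return the whole reply."""
    with socket.create_connection((server, WHOIS_PORT), timeout=timeout) as conn:
        conn.sendall(build_query(name))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    # Real replies are not always clean ASCII, so decode leniently.
    return b"".join(chunks).decode("ascii", errors="replace")
```

With this sketch, a query for a name in its "xn--" form builds without error, while the same name typed in Unicode is rejected before anything is sent.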
The message there is that this is another reason why we're going to have to get rid of WHOIS as our primary way of accessing these databases. This raises the first interesting question.
The question is, when you decide to ask about a domain name through a WHOIS protocol or one of its successors, how do you ask?
One possibility is that you can use one of these unpleasant IDNA punycode names and ask. And the advantage of asking that way is that those names are in ASCII. So they go through the protocol. But users aren't going to like them and you're not going to like them and nobody's going to be able to remember them.
The second possibility is, you ask in Unicode, probably in a coding of Unicode we call UTF-8, but maybe something else. If there's any possibility of there being something else, there needs to be a negotiation about what characters are used or we need to standardize. As with almost everything else in protocol design, we have a trade off here. We can provide you more flexibility in what you want to do or we can provide a lot of compatibility. If you want more different ways of doing things, you're going to get less compatibility. That's just the way it works. It's sort of like standing here. If we all spoke Arabic, communication here would be much easier. But some of us don't.
Another possibility is to arrange things so that one could ask either in IDNA or in Unicode or in a local language. From a technical standpoint, that raises another set of issues, which is that a conversion of these databases that underlie the WHOIS protocol to use what we call technically multiple keys, different ways of asking the same question, is more complicated. This is a trade off between flexibility and cost.
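The "multiple keys" idea can be sketched as an index that files the same record under more than one form of the name. This is a toy illustration in Python, not a description of any registry's database; the class name, the record contents and the name bücher.example are all hypothetical.

```python
from typing import Dict, Optional


class RegistrationIndex:
    """Toy index that stores one record under several query keys."""

    def __init__(self) -> None:
        self._by_key: Dict[str, dict] = {}

    def add(self, unicode_name: str, record: dict) -> None:
        # Derive the xn-- form with Python's built-in "idna" codec.
        ascii_name = unicode_name.encode("idna").decode("ascii")
        # Every extra key is another entry to store and keep consistent,
        # which is where the added cost comes from.
        for key in (unicode_name.lower(), ascii_name.lower()):
            self._by_key[key] = record

    def find(self, query: str) -> Optional[dict]:
        return self._by_key.get(query.lower())


if __name__ == "__main__":
    index = RegistrationIndex()
    index.add("bücher.example", {"registrant": "Example Registrant"})
    # Both ways of asking reach the same record.
    print(index.find("bücher.example") is index.find("xn--bcher-kva.example"))
```

A third key in a local character encoding would need its own entry and its own conversion rules, so each additional way of asking adds storage and maintenance work.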
As I say, one standard would probably be a good idea. One of the issues about asking these questions in the local character set, or in Unicode, or in an original form, is that the ability to ask the question implies the ability to type the characters in which the question is being asked. I have absolutely no idea how to enter Chinese or Arabic characters on my keyboard. I may be the only one here. But if you're going to expect to query these names, you've either got to have a cut and paste algorithm which works, and most of them don't, or you have to be able to type those characters, or you're stuck with this IDNA query.
So we managed to ask the question somehow. And an answer comes back. Well, we could require that answer always be in English. But that defeats a lot of the purpose for which we've been trying to do this work, because among those to whom these data should be accessible are people operating in the country. And certainly registrars will want to be able to accept a registration for an internationalized name in the language and script in which that name is written.
But if I ask a question about a domain name and the answer comes back in Martian, I will probably not be able to read it and probably neither will any of the rest of you. So, again, there's a trade off here between having good adaptability to the local environment and having universal readability and understandability. If you want universal readability and understandability, then English isn't it, either.
We could require local language and English, but that raises some questions about why English should be special. We could require local language and UN language if the local language isn't one of them. Maybe that's a little better.
But each of these options is an option which needs to be considered and evaluated very carefully because it changes the cost structure. It changes the size of databases, it changes the query requirements.
Is it okay to produce an answer which expects the recipient to go out and hire a translator? Maybe.
So, again, if I can't type the query, it's going to be hard to get an answer. If I get an answer which I can't understand, I might not -- I might -- and can't find anybody who can help me understand, there may have been no point in my asking the question in the first place.
One of the most important pieces of technical work which has been done in internationalized names the last several years, the IDNA standard is hard and complicated technically, and an enabling technology, but it's not important, is a notion that one has to have a reasonable way of dealing with characters that look alike or have the same meaning and would confuse people who read them if they were associated with different registration entities.
Those of you who go back as far as Melbourne have seen slides about things which look alike and concerns about this issue. Several other talks have been given at ICANN meetings about this problem.
Even in roman characters with the wrong set of fonts, lower-case l and the number one look alike. It's been known to cause confusion on the network. This new work which came out of a joint engineering team in Asia -- and I have -- we are lucky enough to have two of the senior people in that work with us today -- understood that what it's important to be able to do is to say, "if a particular name is registered, then there may be some set of other names which, if they can be registered at all, can be registered only by the same entity." This introduces a new concept into the management of names of the DNS environment.
It's a concept of reserved names which belong to a particular entity but are not themselves entered into the DNS. You'll probably be hearing a lot about that, because it affects registry, registrar protocols, and other kinds of structures, and it's going to be an interesting problem with regard to thinking about charging.
Fortunately, those are not WHOIS problems, so I don't have to talk about them today. However, there's a WHOIS problem about how much information you get with these names. And it raises all of the usual privacy issues and some commercial issues.
If I ask a question about the primary name in one of these packages of reserved variants, do I get back all the names in that package? Do I get back all the names in that package which are registered in the DNS? If I ask a question about a name which does not appear in the DNS, am I entitled to know that it's part of a package and what name that belongs to?
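The three questions above amount to a choice among disclosure policies. The Python sketch below shows how the same package of names would answer differently under each one; the policy names, the Package structure and the sample labels are hypothetical and do not correspond to any adopted policy.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class Disclosure(Enum):
    PRIMARY_ONLY = "primary only"
    ACTIVE_ONLY = "primary plus variants that are in the DNS"
    WHOLE_PACKAGE = "primary plus every reserved variant"


@dataclass
class Package:
    primary: str
    reserved: Set[str] = field(default_factory=set)  # held for the registrant
    active: Set[str] = field(default_factory=set)    # reserved names also in the DNS


def names_to_report(pkg: Package, policy: Disclosure) -> List[str]:
    """Return the names a WHOIS answer would list under the given policy."""
    if policy is Disclosure.PRIMARY_ONLY:
        return [pkg.primary]
    if policy is Disclosure.ACTIVE_ONLY:
        return [pkg.primary] + sorted(pkg.active)
    return [pkg.primary] + sorted(pkg.reserved | pkg.active)


if __name__ == "__main__":
    pkg = Package("name-a.example",
                  reserved={"name-b.example", "name-c.example"},
                  active={"name-b.example"})
    for policy in Disclosure:
        print(policy.value, "->", names_to_report(pkg, policy))
```

The code itself is trivial; the point is that the answer to a query changes with the policy chosen, which is why a common recommendation matters.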
These are policy issues. Most of us don't understand the implications of them, and I certainly don't. But it would be very, very good for the GNSO to come up with a recommendation and some justification for that recommendation that domains can adopt, because if every different domain does something different, we are going to have a lot of confusion.
So my summary -- and I will then turn it over to my panelists for their remarks about what they're doing -- is that after many years of talking about internationalization and internationalized domain names, we have finally reached the point we have been wishing for. We may regret having wished for it, but we're here. And it's time to take this very seriously. If we wait, we increase the risk. We're already in a situation with ASCII names and simple labels of the kinds we've been using for years that I was talking about earlier.
There are zones in the world in which if I issue a WHOIS query with an ASCII label, I get back an answer in English. There are zones where I get back an answer in the local language. There are zones where I get back an answer in the local language and English. Those are all the combinations right now that I know of. But there are probably more. This problem is on top of us today. If we don't come up with recommendations as we move forward and get agreement about the wisdom of those recommendations, the chaos level will increase.
My Korean colleagues have been very good in that their WHOIS servers are now returning Korean and English. But a year and a half ago, when I attempted to query one of their sites in order to find out something about some sources of some security attacks on my hosts, I got back an answer in Korean. I don't read Korean well. In fact, I don't read Korean at all.
People who assume that if they sit long enough the problem will solve itself are going to be very disappointed. This problem will not solve itself. What will happen is we will have more different practices and no compatibility among them.
There are languages and scripts in Unicode which no living human being has used in conversation except in very limited academic circles for many centuries.
If we permit those names and strings to appear in labels, don't expect to be able to read what comes back in the results if the language of registration and the language of the WHOIS databases and the language of the labels and DNS are the same.
As I said earlier, the constraint on what we used to call network virtual terminal ASCII -- it's a technical restriction, just like host name -- may finally kill port 43 WHOIS.
That will make some of you happy; that will make some of you unhappy. I will be dancing with joy. But that protocol has just about outlived its useful life. We need to figure out what we're going to do with programs and scripts which legitimately query that protocol once we move on to something else. That's another policy problem.
And most importantly, and coming back to taking it seriously, we need to plan now, make policies now, and start moving forward with those policies, rather than having to clean up a giant mess later and having to incur the costs of doing something, undoing it, inventing something else, and transitioning to it.
Thank you very much.
Michael Roberts: thank you very much, John. We'll now have reports and comments from our panel.
And I'd just like to repeat for those of you who didn't hear it that we'll have a public Q and A opportunity from the floor after the conclusion of each session this morning.
Hualin Qian: good morning, everybody. This is Hualin Qian from China. Talking about the WHOIS database is my background in China.
We need the information out from the database should be understandable for the local people. So that's, for example, every registry and registrar, they should provide good service for the local community. For example, in China, when we query the WHOIS database, we always get the Chinese name, Chinese information. When the registrant they register their name, filling out the application form, we ask them to fill out both English and Chinese. Then in the domain name system, we use many of the English parts of their application -- in their application form. And for the WHOIS database, we use quite a lot -- use only Chinese.
But for some foreign company, they have office in China, they use English also. So both our WHOIS database can return English and Chinese. But for most of them are Chinese registrants. So that's good for people to look up this database with Chinese output. So that's the -- I think for the international.
For the gTLD and many other TLDs, for example, if Chinese company going abroad or going to register a domain name on dot com or dot net also, it's better to have both English and Chinese language. Otherwise, it's very difficult to recognize who is who.
For example, in China, we used the ASCII character to spell Chinese name, just like my name, "Hualin" stands for Hualin. These are according to pronunciation. And they use pronunciation regulation, rules for spell my name, my Chinese name. So when Chinese character, many different characters, they have same pronunciation. So if you spell in the ASCII code, then when it comes out, you can read it, but you don't know which character it is, because it's -- you know, it's -- multiple Chinese characters have the same pronunciation. It depends on the context sometimes.
But for the names, a person's name or a company name, it is difficult to understand that those characters from context. So that's very important, I think, for the internationalization of the WHOIS database. So that's my comment.
This protocol can support the -- the ASCII features only.
Charles Shaban: good morning. I will speak, of course, regarding the Arabic, since we are -- the panelists are speaking their experience.
So for the Arabic, it's very easy. There's still no Arabic WHOIS, I think. Some experiments only in some of the ccTLDs. So what I will talk about is some of the main concerns that I think each ccTLD manager should think about from the beginning.
For example, I will start with what was mentioned by both my colleagues: which languages the WHOIS should contain. Should it contain English and Arabic or, as in Tunisia, where the second language is not English but French, should it be Arabic and French? I think this has to be studied from the beginning: who are the people that we want to be able to read the information?
If you want only the people who know Arabic to read the information, then this means Arabic is enough. But if you want, for example, other people, who don't know the Arabic language, to know who owns this domain name and who the contacts for it are, then it should contain both languages or even sometimes, in some countries, three languages, which will be a problem, I think, a little bit.
I think this applies the same for the countries outside the Arab world. A lot of countries now have a very big population of people who speak Arabic. So should that country introduce the Arabic in their language? And if they have other people from Iran, let's say, should they include Farsi in their language? This is an issue to study.
Why I'm saying this is that just for everyone, I think, should study all the options and all the problems and concerns and find the solutions before introducing the Arabic domain name to be standard and can be -- depend on -- everyone can depend on, I mean.
I would like to point some -- one of the concerns for the intellectual property trademark owners. For trademark owners, a good WHOIS with a good dispute resolution policy will be a better solution than using a sunrise period. Why I say so, if you remember, when ICANN introduced the new top-level domains, there was a sunrise period for most of them to protect the trademark owners.
So this can -- and it showed up that this is not data sometimes. That's why they say if you have a good WHOIS that you can depend on to know exactly who registered this domain, with a good dispute resolution policy, it would be a better policy than the sunrise period.
The final thing I will mention is that the Arabic script itself is used by different languages, not only the Arabic language, by the way, if some still don't know. The Arabic script itself is used, or almost the same script is used, in at least, I think, 20 languages: Arabic, Farsi, Urdu, and there is a big list. So coordination should not be done just through the Arab ccTLDs; it should be through the languages that use the Arabic script in general, in order not to have problems with other languages with the same script.
Final point, I would like to say that there is -- there was some, as most of you heard, some kind of tests and policy testing like the IDN connect and the (inaudible) test bed. I think we should depend on these tests for the technical compatibility, and of course the language concerns for the different languages, they should be taken into consideration.
Wen-sung Chen: thank you. My name is Wen-sung Chen from TWNIC. Because of time limitation, I just share some viewpoint from the IDN challenges. And suppose this WHOIS is all a simple system for inquiry. How this protocol can support the ASCII features only. And a current state of the WHOIS (inaudible) is the current name of the WHOIS and the referral of WHOIS plus plus, this kind of protocol.
And I just show the -- even in the ASCII WHOIS environment, just like Professor Qian said, in different ccTLD or different have a different WHOIS requirement, especially local language required for the WHOIS. Just like in this case we have a -- like a Chinese company name, we have a Chinese registrant name, we have a Chinese address; so many different Chinese requirement. Even the ASCII domain name for (inaudible), we already have ASCII data in the WHOIS database.
And another issue is maybe the different registrant category. Like company, organization, registrant, and the individual registrant. Also maybe have a different requirement for those kind of development.
And in our experience, especially the individual, individual registrant, more sensitive, more concern about it, like their Chinese name, their Chinese address, and especially their telephone number is quite sensitive.
So that's kind of development we provide. We call opt-in mechanism to allow a user to select in the registration stage.
So we have a domain name registration agreement to -- when we write this domain name registration to cover the requirement and keep the flexibility for the end user to make decision, to make a selection, which kind of the sensitive data element to be displayed.
This is the first point I want to share with you. And the second is the just Qian said that the IDN concept, we like WHOIS just like the screen, we enter the IDN -- query the IDN and the WHOIS system can show the whole IDN submission including the reserved names, and some specific requirement for Chinese language, like traditional language and the simplified language.
So those kind of requirement is our requirement for the IDN WHOIS data display.
So back to this one is the -- we have a variant table. Each language maybe have their own language table. And the Chinese character variant table, you can see the variant table in this web site, in this hyperlink.
And also, we just started the IDN domain name. We can have original form, traditional form, simplified form, and reserved relevant domain names.
We have domain registration policy to regulate those kind of requirement. But very few case has exceptional case. If -- just like this case is the -- if register a trademark, and the document can have reserved domain names, can reserve domain names. And the potential WHOIS display is included an exception case, in the screen, you can see it. So the whole package information display and can handle the exception case. We like to show this WHOIS display requirement to you. So this is exception case query, just for the exception case only.
So the real challenge is the WHOIS query. You need to check the variant table. And the second is the you need to display the IDN for text information. And those kind of mechanisms still can handle some exception case.
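The step of checking the variant table before answering a query can be sketched as follows in Python. The three-entry table is a tiny illustrative stand-in, not the published Chinese variant table referred to above, and the function names are hypothetical.

```python
# Tiny illustrative table: traditional character -> simplified character.
# A real variant table is far larger and is maintained per language.
VARIANTS = {"國": "国", "臺": "台", "灣": "湾"}


def package_key(label: str) -> str:
    """Fold each character to one chosen form so that variants share a key."""
    return "".join(VARIANTS.get(ch, ch) for ch in label)


def same_package(a: str, b: str) -> bool:
    """Report whether two labels fall into the same package of variants."""
    return package_key(a) == package_key(b)


if __name__ == "__main__":
    # Traditional and simplified spellings of the same label share one key.
    print(same_package("中國", "中国"))
    print(same_package("臺灣", "台湾"))
```

A WHOIS server built along these lines would look up the folded key first and then decide how much of the package to display, including any exception cases.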
And another one is the -- maybe the IDN WHOIS standards is quite urgent to discuss. And another one is the query and display is a punycode. As Qian say, this issue -- this is troublesome and tricky to deal with this issue, and critical.
Another important thing is those kind of requirement maybe depend on how to implement IDN registration policy.
So it's my viewpoint it can apply to gTLD or (inaudible) TLD.
Michael Roberts: thanks very much. Appreciate those comments.
Ram Mohan: good morning. My name is Ram Mohan, and I'm the chief technology officer for Afilias, the registry operator for dot info and the back end service provider for dot org as well as eight other ccTLDs. Thank you for having me here to talk about WHOIS issues for gTLD registries.
I want to take just a moment to provide a little bit of a primer, and John had already done quite a bit of this work, so I've shrunk that piece here.
ICANN contracts require registries to be on port 43. They're typically quiet about what ICANN registries must do on the web; however, port 43 is a requirement. And port 43 WHOIS is often used by registrars and other mostly automated sources to get authoritative information regarding contact and other information.
Now, in the case of dot info and soon to be in the case of dot org, it's a thick registry which means in addition to information about the domain, all the contact information that is associated with the domain is also stored at the registry. And this is quite a shift from the dot com or the dot net areas where all of the contact information is stored locally, distributed at various registrars.
In my opinion, this is actually a good thing, particularly when it comes to internationalized domain names and finding ways to store and have that information, because you now have the opportunity to provide some level of uniformity when you're treating IDNs. And you will see, as I speak here later, you will see why that is of importance, because with the gTLD registry, there are certain considerations that do not even have to be worried about in a ccTLD registry.
Before I get there, though, I wanted to speak for a moment about some of the differences and the issues between ASCII and native representations.
You heard some of the other panelists talk about the importance of being able to accept input in the local language and being able to provide that output back in the local language. Port 43 and normal WHOIS today is expected to provide only ASCII input, and return only ASCII output. That's what the standard says. And in addition, internationalization of domain names is not the same as internationalization of the contact data or the ability to store contact data in local internationalized forms in your systems. If you're a registry operator, you have to be able to not only provide the ability to have domain names in the internationalized format but also potentially have the ability to store the contact information that way.
Domain input and output -- input should use punycode, output will use punycode, and there is a new registry -- there is a new WHOIS replacement protocol that is in the works called CRISP, and that protocol, as a requirement, says that it must support contact data represented in UTF-8 or non-ASCII formats.
Now, on the web there are a few choices. If you're a registry, if you're a registrar or somebody in the entire value chain, you have some choices in how you represent internationalized domain names.
Some of the simplest things that you can do: directly in HTML, the native forms can be used in many instances by escape characters. I have an example there on the screen, which is a possibility. What it does is, for a modern browser, that should be able to take the HTML escape character and put in a localized format. However, at the same time, if it is a text-based system that is accessing the same page, it still doesn't break that text-based system. It will show up in the way that you see in the bottom-most line here but it still does not break the old application, which is an important consideration.
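The slide is not reproduced in the captioning, but the escape-character technique can be sketched in Python using HTML numeric character references. The name bücher.example and the function names are assumptions for illustration.

```python
import html


def to_html_ascii(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    # Note: this does not escape &, < or >; it only handles non-ASCII characters.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_html(text: str) -> str:
    """Do what a browser does: turn the references back into characters."""
    return html.unescape(text)


if __name__ == "__main__":
    page_text = to_html_ascii("bücher.example")
    print(page_text)             # b&#252;cher.example
    print(from_html(page_text))  # bücher.example
```

The page itself stays pure ASCII, so a text-only client shows the raw reference instead of failing, while a browser renders the native character.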
In terms of the impact on registry protocols itself, what does IDN do to registry protocols?
EPP, the extensible provisioning protocol that many are going to and some work on, it does allow for the provisioning of localized addresses, so you can have two addresses, one international, which could be English, and another in a localized format.
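A registry-side data model for the two address forms might look like the Python sketch below. The labels "int" and "loc" are assumed here to mirror the international and localized forms just described, with the international form restricted to ASCII; the field names and the sample contact are hypothetical, and this is not an EPP implementation.

```python
from dataclasses import dataclass


@dataclass
class PostalInfo:
    kind: str          # "int" (international, ASCII) or "loc" (localized)
    name: str
    city: str
    country_code: str

    def __post_init__(self) -> None:
        if self.kind not in ("int", "loc"):
            raise ValueError("kind must be 'int' or 'loc'")
        if self.kind == "int":
            # Assumption: the international form is limited to ASCII text.
            for value in (self.name, self.city):
                if not value.isascii():
                    raise ValueError("international form must be ASCII")


if __name__ == "__main__":
    # Hypothetical contact stored in both forms.
    international = PostalInfo("int", "Beispiel Mueller", "Muenchen", "DE")
    localized = PostalInfo("loc", "Beispiel Müller", "München", "DE")
    print(international)
    print(localized)
```

Keeping both forms lets an ASCII-only channel such as port 43 fall back to the international form while web displays use the localized one.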
In the case of dot info, we just announced the launch, and this is going to be happening later in the year, early next year, the ability to register dot info domain names in German script with umlauts in them. And these are some of the issues we are looking into. Port 43 WHOIS, how do you represent non-ASCII. What do you do with registrars? How do you help registrars display their pages accurately so that their users and the end users and the customers can actually see the information in the way that they expect to see it?
And finally, to talk to registrars about how they store information in their own local databases.
Now, having spoken about all of this, there are some unique issues with the gTLD that is different from a country code top-level domain. You are now no longer having to support just one language input or one language output. It's not just Arabic, it's not just Korean, it's not just Chinese. The user who is coming in could be providing the information in any language that they want that the registry supports, and expects to get a response back in the appropriate manner.
In addition, a registry has to figure out how to differentiate between variants and represent packages across multiple languages, not just -- and scripts, not just in one script.
And finally, the search of WHOIS records and contact information, even from within a registry, or from within a registrar, is now much more difficult because you now have to be able to support the search in multiple scripts and languages.
And think about this: billing and invoicing to a registrar or to an end user with WHOIS information. Today, in the registrars, registries, they provide WHOIS information as a default when they provide the bills, but think about that. You have a registry that is a gTLD registry with multiple scripts that are represented, and a registrar who is supporting them, just showing them xn dash dash is effectively, you know, it's accurate, but useless. However, if you're representing it in the various localized formats, then this is perhaps a requirement on registrars and other folks to be able to understand what they're actually seeing, to make sure that it's accurate on their own end.
And in terms of audits, the trail just got quite a bit harder.
That was all I had in prepared presentation. Thank you for the time.
Michael Roberts: thank you very much, Ram. We now have some time for questions or comments from the floor.
There's a mike right down here, if you would like to comment.
Masanobu Katoh: is this working? Thank you for the great presentations. I would like to make a couple of observations. Or -- oh, I'm sorry, Masanobu Katoh. ICANN board.
I'd like to clarify some of the points you made. First, on the issue of a variant, this is a unique issue, I understand, in general, in the Chinese language. In the case of Chinese, you have simplified and traditional languages. But in many other languages, we don't have these variant issues. So what we are talking about, in the context of WHOIS, is whether we have such a concrete variant table or not. If somebody has a clear, concrete variant table, people can follow using the same variant table for the proper server WHOIS. So this is not a unique WHOIS question. It is more like a language question. That's one point I'd like to make.
And the other one, somebody talked about the UTF-8 and other different, you know, language questions. But this is, you know, in my words, more like an application question. Once the computers are using the same applications, namely, punycode, like Ram said, we don't have this question. Again, therefore, this is not the unique WHOIS question. It is more like the implementation of an internationalized domain name.
And if everybody follows the IETF standard for this purpose, eventually, I don't think this would be a problem. If we can have a common means for that purpose, again, we can answer this WHOIS question.
There are some other points we had about combining languages, in the case of some similar language or characters. This is, again, not the unique WHOIS question. This is more general, you know, IDN question. And, therefore, I understand that ICANN strongly suggests not to combine, you know, more than one different languages for the purpose of domain names.
If we follow this principle, which is one of the six major principles ICANN proposed for the authorization of the internationalized domain names, I think we can answer this question, too. And, well, in addition to this, ICANN is suggesting to -- or when we are implementing internationalized domain names to pick up one language and putting language for those purposes. Again, if we follow this principle, we do not have this issue as the unique issue for WHOIS.
Having said this, I'm sure there are some unique WHOIS questions because of internationalized domain names. At least, like Professor Qian proposed, in the case of the language where you are using the -- you know, the language variant table, whether the gTLDs -- you know, gTLDs who are registries are using the same language should follow the same WHOIS standard, this is an open question for ICANN.
And I have my personal observations. But, again, I would characterize this question not as a unique IDN question. The question here is whether we should have a common standard for all WHOIS, even if it is internationalized domain names.
Thank you very much.
Michael Roberts: thank you, Katoh. Does anyone wish to comment?
John Klensin: Katoh-san, each time we look at issues with Asian nonalphabetic languages and then look at alphabetic languages or look at alphabetic languages and then look at Asian nonalphabetic languages, we discover they're different; and it's not surprising.
We also discover that there are interesting analogies, some of which are better than others. But they often turn out to be the same problem. The reason why variants are needed with Chinese, for example, is, as you identified, a simplified/traditional issue. And that doesn't even apply in Japanese or Korean, much less in alphabetic languages. But the confusion opportunities are ultimately much the same, although the underlying linguistic and semantic character of them is different.
If you begin to look at German, there are several things you -- and I'm choosing German because it's a roman script and it's easy. There are hard ones.
If you look at German, you suddenly discover that there are centuries-old conventions about how to write characters with umlauts if you don't have enough type slugs in the box that you keep your movable type in. I want to date it to that long ago. And those conventions say that you could write a character as an o umlaut or oe. There's a problem in German.
The first problem is that you arguably would like to make a policy -- you don't need to make that policy, but it's something which can get careful consideration -- that you should not be able to have the name with oe and the name with o umlaut registered by two people, because it is the route to total confusion.
And the second problem is that because there are import words in German and some other things that you can't tell by looking, while you can always map o umlaut to oe, there are some words which contain oe which you cannot map to o umlaut. And that is ultimately the same problem as the variant problem. It's got to be done by tables which reserve things. Or you have to decide it isn't a very important problem.
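The one-way mapping described here can be sketched as a small variant table. This is only an illustration under stated assumptions: the table contents, the `canonical` function, and the `Registry` class are hypothetical names, not any registry's actual policy or software. It folds each umlaut to its conventional ASCII spelling and refuses a registration when another holder already has a name with the same folded key.

```python
# Hypothetical variant table: umlaut (and sharp s) -> conventional ASCII spelling.
# The mapping only runs in this direction; "oe" is never turned back into "ö",
# because some words legitimately contain "oe".
TRANSCRIPTIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def canonical(label: str) -> str:
    """Fold a label to a single comparison key: lowercase, umlauts spelled out."""
    label = label.lower()
    for umlaut, ascii_form in TRANSCRIPTIONS.items():
        label = label.replace(umlaut, ascii_form)
    return label


class Registry:
    """Toy registry (hypothetical) that reserves all variants for the first holder."""

    def __init__(self) -> None:
        self._holders = {}  # canonical key -> registrant

    def register(self, label: str, registrant: str) -> bool:
        """Return True if accepted, False if a variant belongs to someone else."""
        key = canonical(label)
        holder = self._holders.setdefault(key, registrant)
        return holder == registrant


registry = Registry()
print(registry.register("möller", "Registrant A"))   # first holder is accepted
print(registry.register("moeller", "Registrant B"))  # variant of A's name, refused
```

The cost of folding in one direction is over-blocking: a name that contains a genuine "oe" is treated as a variant of the "ö" spelling even when no such word exists, which is the trade-off the speaker says has to be settled by tables or declared unimportant.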
Now, the traditional answer to the very simple versions of this problem with pure ASCII characters -- we'll do it by dispute resolution.
But the problem with these particular character variant situations in these kinds of contexts, whether it be with languages based on what were originally Chinese characters or with these problems deriving especially from a script or almost the same script as is used in multiple languages, which is the Arabic problem; or a situation in which we've got one set of scripts which have very common origins and shared characters, like roman and Cyrillic and Greek, is that if you get a domain name label which contains more than about two characters and in that label there are very many of these things, the combinations explode.
And if we have to handle by dispute resolution a situation in which one name has hundreds of possible look-alike conflicts, we will be spending a lot of money on cybersquatters who have discovered that the new price for selling off a cybersquatted domain name is $4,995.
Why? Because it costs $5,000 to file a UDRP complaint.
If we can't figure out a better variation on those problems than dealing with them by dispute resolution one at a time, it is my personal position that we are going to be in big trouble.
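The way the combinations explode can be shown with a little arithmetic: if each character in a label has a few look-alikes, the number of confusable labels is the product of those counts. The sketch below assumes a tiny, hypothetical look-alike table covering five Latin letters and their Cyrillic or Greek twins; a real table would be far larger.

```python
from itertools import product
from math import prod

# Hypothetical look-alike table: Latin letter -> all characters that look like it
# (the Latin letter itself comes first in each string).
LOOKALIKES = {
    "a": "a\u0430",        # Latin a, Cyrillic a
    "c": "c\u0441",        # Latin c, Cyrillic es
    "e": "e\u0435",        # Latin e, Cyrillic ie
    "o": "o\u043e\u03bf",  # Latin o, Cyrillic o, Greek omicron
    "p": "p\u0440",        # Latin p, Cyrillic er
}


def variant_count(label: str) -> int:
    """Number of visually confusable spellings, including the label itself."""
    return prod(len(LOOKALIKES.get(ch, ch)) for ch in label)


def variants(label: str) -> list:
    """Enumerate every confusable spelling of the label."""
    choices = (LOOKALIKES.get(ch, ch) for ch in label)
    return ["".join(combo) for combo in product(*choices)]


for name in ("pea", "peace", "cocoa"):
    print(name, variant_count(name))
```

With this table a five-letter label such as "peace" already has 32 spellings (2 x 2 x 2 x 2 x 2) and "cocoa" has 72, so resolving each conflict one dispute at a time scales badly.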
Michael Roberts: professor.
Hualin Qian: I totally agree with Katoh-san, but I would like to add something a little bit different from the IDN.
In IDN, we have the simplified and the traditional versions of each character. That's the trouble when we are resolving and reserving the name for the registrant. So that's the one thing.
But for this WHOIS database retrieval, I think we have a little bit different situation. In IDN, we are encountering difficulties that a character has a meaning. This meaning corresponds to different characters, different scripts, meaning two characters. Now when we are using spelling, ASCII spelling of Chinese characters in the WHOIS database, the trouble we encounter is from the pronunciation: two different scripts, two different characters.
I take an example, for example, my name, the last character is "lin." This lin means tree or forest. But many other characters are pronounced the same lin. For example, linxiu, means neighborhood. Means sensitive. Same lin, they have different meaning.
So when we retrieve the database, we cannot understand from the spelled ASCII character what is this character. That's another different issue.
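The loss of information in an ASCII spelling can be sketched by inverting a small pronunciation table. The table below is an illustrative sample chosen for this sketch, not data from any WHOIS database, and the function name is hypothetical.

```python
from collections import defaultdict

# Illustrative sample only: a few Chinese characters with their toneless
# ASCII (pinyin) spelling.
PINYIN = {
    "林": "lin",  # forest
    "邻": "lin",  # neighbor
    "临": "lin",  # to approach
    "华": "hua",  # splendid
}


def candidates_by_spelling(table: dict) -> dict:
    """Invert the table: ASCII spelling -> every character it could stand for."""
    reverse = defaultdict(list)
    for char, spelling in table.items():
        reverse[spelling].append(char)
    return dict(reverse)


candidates = candidates_by_spelling(PINYIN)
print(candidates["lin"])  # several characters share one spelling
print(candidates["hua"])
```

Going from character to spelling is a function; going back is not, so a record stored only in ASCII spelling cannot tell the reader which character the registrant actually used.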
Michael Roberts: thank you. Ram.
Ram Mohan: thank you.
Katoh-san, one of the points he had made is that UTF-8 representations, once applications catch up and implement punycode, you don't have to worry about UTF-8 or other representations. That is true.
However, I think in WHOIS, there is an extra element. The -- as a registry, you are required to provide information back for a query, a WHOIS query. If you provide that information back purely in punycode, which is what the standard specifies right now, on the receiving end, you are going to get xn, dash, dash, and a bunch of stuff. It's stuff that nobody knows what it means. And you need a converter or some special tool to be able to convert it from this ASCII string into a real local string. And I believe that is unique to WHOIS; that is a unique WHOIS issue. And it's pervasive not only for ccTLDs, but also for gTLDs.
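The kind of converter mentioned here can be sketched in a few lines. This sketch assumes Python's built-in `idna` codec, which follows the 2003 IDNA standard (RFC 3490); the function names and the sample domain are illustrative only.

```python
def to_unicode(domain: str) -> str:
    """Convert an ASCII-compatible ("xn--") domain name to its local-script form."""
    return domain.encode("ascii").decode("idna")


def to_ascii(domain: str) -> str:
    """Convert a local-script domain name to the ASCII form sent on the wire."""
    return domain.encode("idna").decode("ascii")


wire_form = to_ascii("münchen.info")
print(wire_form)              # the "xn--" string a port-43 response would carry
print(to_unicode(wire_form))  # what the end user expects to see
```

A WHOIS display page, a bill, or an audit report could run each label through a step like `to_unicode` before showing it, while the stored and transmitted value stays in the ASCII form.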
Masanobu Katoh: (inaudible).
Denise Michel: could you repeat that in the microphone? The transcriber couldn't hear it.
Masanobu Katoh: sorry, Ram.
But it is -- I agree that it is -- it is unique. There's a time every (inaudible) and application is updated.
Michael Roberts: Vittorio, you're finally up. Please identify yourself.
Vittorio Bertola: I'm Vittorio Bertola, the chairman of the At-Large Advisory Committee.
And the point I would like to make is that I can't see why you would require a registrant to provide contact details in a language or script different from his own language or script. I don't see the point, because I understand you would want to do it for contactability. But if someone doesn't know that language or the script or doesn't know the English language or script, maybe he could manage to provide the details, but when you write to him in English, he will not understand what you are saying.
So I really don't understand why, unless what you want to do is that someone from, for example, some IP lawyer from Washington, D.C. can send a cease and desist letter to a Chinese registrant and say they didn't comply.
But I can't see any other reason. If that is what you want to do, then you should keep the same opportunity to add. You should ask everybody to provide contact details in Japanese, too, because an IP lawyer from Tokyo may want to write.
I think you should allow people to provide English characters as well as French or Russian contact details, you should maybe even encourage them to provide them. But I don't see how you can ask them to compulsorily provide those details.
Michael Roberts: go ahead.
Ram Mohan: just a very quick response.
Yes, you're right; the compulsory is certainly an issue. However, when you're a gTLD registry that is providing registrations or allowing registrations from multiple countries, some level of uniformity does help.
And perhaps English is a bad choice; but perhaps it's -- it's the best of the other choices that we have right now. And EPP does allow for -- and registries have the choice to -- in addition to the English requirement, to also have the ability to support it in a local language.
Charles Shaban: one more small point.
For gTLDs, yes, it will be hard, as Vittorio said. But ccTLDs, I think, since they know who usually will access their WHOIS, they should -- they could say, for example, you can't register a domain name unless you supply the French, Arabic, and English contact information.
And there are three languages, I mean.
Michael Roberts: any other comments for our panel?
Well, thank you very much. And thank you, panelists. And we'll take a stand-up one-minute break while we change panels.
Michael Roberts: if we could have you take your seats, we'll begin the second session.
This session is on WHOIS data element review. And you will see that the large number of panelists up here reflects the large number of ways in which the WHOIS data elements are used today. We wanted to provide an opportunity for the major user constituencies and groups to comment on the data element situation we have today.
In order to lead this off, Bruce Tonkin, who is with Melbourne IT and is the current chair of the GNSO, will provide us with an overview.
In order for me to not have to introduce each of the panelists, let me just, by name, going from my immediate left down the table, we have Jane Mutimear from Intellectual Property Constituency; David Maher, from the Registry Constituency; Marilyn Cade from the Business Constituency; Hakon Haugnes, from the -- pardon me for having to back up here -- as a registrar; Thomas Keller for registrars; Thomas Roessler from At-Large; and Kathy Kleiman from Noncommercial Users Constituency.
Brian Cute who is also representing a registrar, is expected to join us in a few minutes.
With that, Bruce, I think we can turn it over to you.
Bruce Tonkin: okay. Thanks, Mike.
Is this on? Hopefully.
As Mike mentioned earlier this morning, using the term "WHOIS" really confuses the issue, because technical people have a specific meaning for the term "WHOIS," and in the wider community, "WHOIS" is often referred to more or less as a service that's provided by registries or registrars to retrieve information. So for most of this presentation, I won't mention the term WHOIS and I'll just talk about the actual information itself.
In terms of the information that's collected at the time of registration, probably one of the most important pieces of information is who is the registered name holder for the domain. And the term that's often used is "registrant" to identify that person or entity. It's intended to be the legal holder of that domain name.
It's the person or legal entity which could be an organization, like ICANN, could be a company, like Microsoft, it could be partnership between, say, two lawyers, whatever the entity is, it's the entity that is the registered name holder.
And in some cases, that entity is an individual.
The other data collected is the -- what's called the administration contact for that domain name, the technical contact for that domain name, and the billing contact for that domain name. Now, I can't actually tell you what the purposes are for the administration contact or the technical contact. They're required pieces of information to be provided at the time of registration. But there's not a clear definition as to their purpose. And subsequently, they're often used in a number of different ways by domain name registrants. Often payment details are collected, such as credit card information. And that is often stored within a registrar's system.
And then there's the domain name details themselves, which is the name of the domain name, like icann.org; information about what that domain name refers to, and in technical terms, we refer to that as name server information. I'll mostly focus this presentation on the contact information rather than the technical information.
If we look at the first contact piece of information, and that is who is the legal holder of the name, the registrant. (inaudible) there are the name. And interestingly, organization is optional. And quite often this leads to confusion, because some people register a name on behalf of their companies or they're an employee of the company, but they put their name, like Bruce Tonkin is -- because I'm the name of the person that's registering the name. And they haven't really collected or identified that I'm actually Bruce Tonkin from Melbourne IT. And it's Melbourne IT that is the registered name holder, a corporate entity.
The other piece of required information is full postal address, which is the street address, the country, the post code, all the information associated with our being able to send mail. The phone number for the registered name holder is optional. The fax number for the registered name holder is optional. The e-mail is optional.
Quite commonly, registrars collect information that they can use in future to authenticate who that registered name holder is. And mostly registrars never actually see the registered name holder in person. It's mostly an electronic, Internet transaction.
So the only way of identifying that person in the future is by collecting some sort of information, which is typically a password or a unique number of some sort, that's used in future to authenticate that the person that originally registered the name is the person that's asking you to do something with that name.
The administration and technical contacts, which aren't defined, actually, there's a lot more compulsory information required. You need to collect the name, again, organization is optional. You need to collect the full postal address. You need to collect their phone number. You need to collect their fax number, if it's available. In other words, if the entity has a fax number, you must collect it. You must collect their e-mail address, irrespective of whether they have one or not, you still must collect it. Then the authentication data is optional.
Now, one of the problems if we compare these two pieces of information, we have registrant data; then we have admin and technical contacts. But because it's not clear what the different purposes of this information are, they're often all set to be the same. And it's interesting that the requirements for admin and technical are stronger than the requirements for the registrant information.
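The collection requirements just listed can be written down as a small table, which makes the asymmetry easy to see. This is a sketch of the rules as described in this talk, not a rendering of any contract text; the field names and the `missing_fields` function are hypothetical.

```python
# Required fields per contact type, as described in the talk.
# Organization and authentication data are optional for every contact type.
# Fax for admin/technical is "required if available", which cannot be checked
# from the record alone, so it is left out of the required sets here.
REQUIRED = {
    "registrant": {"name", "postal_address"},
    "admin": {"name", "postal_address", "phone", "email"},
    "technical": {"name", "postal_address", "phone", "email"},
}


def missing_fields(role: str, contact: dict) -> list:
    """Return the required fields that are absent or empty for this contact type."""
    return sorted(field for field in REQUIRED[role] if not contact.get(field))


contact = {"name": "Bruce Tonkin", "postal_address": "Melbourne, Australia"}
print(missing_fields("registrant", contact))  # nothing missing for the name holder
print(missing_fields("admin", contact))       # the same record fails as admin contact
```

The same record that is complete for the registered name holder is incomplete as an admin or technical contact, which is the point made above about the stronger requirements.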
Other information that's -- this is entirely optional to collect, is billing contact information. But most registrars do collect this so that they can send a renewal advice or for payment to that name holder. And often at the same time they collect payment information.
So the issues for the data collected is there's no clear purposes defined for these contacts. The purpose of collecting WHOIS is defined, the purpose of collecting data as a whole is defined. That is, you're collecting the data to allow you to contact the registered name holder. But the mechanisms for doing that are defined as registrant, admin, and technical. And they each have different requirements in terms of the data collected.
The problem with this is that this is often handled in a number of different ways. Some registrants -- registrars have tried to simplify and make it fast for people to register a name. So, often, the registrant's not given all those options, but just asked to give their full contact data, phone numbers, address, e-mail address. That information is placed into all these data elements and displayed publicly.
In other situations, domain names are registered through an agent of some sort, like an Internet service provider. And the Internet service provider will often put their contact details down for the admin and technical contacts, and then put the legal name holder in the registrant field.
Another variation of this is to protect the privacy of registrants. Some organizations offer a service where they will put their own name and address in the registrant field, and then if somebody wants to contact the real registrant, they first need to send information to that agent, and that agent then passes it on to the real registrant behind the scenes.
So there are a number of different uses of these data elements. But the overall objective is to provide contact information for the domain name holder.
So I've talked about what data is collected. Now I want to talk about how that data is accessed. There are three main methods for accessing the data. The first method is using a protocol called WHOIS. And to distinguish that protocol from WHOIS more generally, it -- people often refer to it as port-43 WHOIS.
And that's really something that the technical community has stuck in front of the word "WHOIS" to try and distinguish their technical definition of WHOIS from the general community definition, which is really mostly talking about content rather than the technology.
So the WHOIS protocol, as John Klensin said earlier, is very simple. Basically, a user of that protocol sends some information, and that user gets some information back. The protocol itself does not specify what that information is, either what information is sent or what information is retrieved. However, the contracts for the global top-level domain specify what data you must display using that protocol.
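The exchange described here is simple enough to sketch with a plain TCP socket: connect to port 43, send one line, and read text until the server closes the connection. The function name and parameters are assumptions made for illustration, and the server name in the usage comment is a placeholder.

```python
import socket


def whois_query(query: str, server: str, port: int = 43, timeout: float = 10.0) -> str:
    """Send one WHOIS query line and return whatever text the server sends back."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        # The protocol only says: send a line of text terminated by CRLF.
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server signals the end by closing the connection
                break
            chunks.append(data)
    # The protocol does not define the response format or encoding; decode leniently.
    return b"".join(chunks).decode("utf-8", errors="replace")


# Usage (placeholder server name, not a real endpoint):
# print(whois_query("example.org", "whois.example.net"))
```

Nothing in the code knows about registrants or contacts; what comes back is free text, and its content is set by contract and registry practice rather than by the protocol.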
The other aspect of this protocol is that, in the past, many implementations of the protocol allowed you to type your best guess of a domain name and the protocol would return a list of entries that possibly match that domain name. Most major registrars don't provide that service any longer. And it's usually an exact match lookup. So unless you know the exact domain name, you won't get a response.
The other method for accessing WHOIS which the general public generally use is using what's referred to in the contract as an interactive web page. Most people would probably call this interactive web page WHOIS. If you go to a registrar web site and you see the word "WHOIS" on the web site and you click on that word, generally you'll get to a page which allows you to enter in a domain name. And that page will display the results from that domain name. There's no real standardized way of doing this. And so this is subject to a wide range of different implementations.
The other -- the third method, which is not very commonly used, because it costs money, the previous two methods are generally provided for free. The last method, typically, a registrar will charge $10,000 U.S. for the privilege of being able to access the entire WHOIS information. And that's usually retrieved via a file transfer of some sort. And there are some protocols available for doing that. So this allows you to access the entire information, which is -- turns out to be fairly unusual now in the domain name environment across the world.
Now, talking about what data needs to be displayed for free by port-43 or via the interactive WHOIS web page, obviously, it needs to display information about the domain name. It also displays the creation date and time for that domain name, and the expiry date and time.
While that seems like a good idea, it also causes problems, because, effectively, you're providing companies that are in the business of selling domain name services with the exact information on when a particular domain name is due to expire. And that information is then used for marketing purposes.
Another element that is optional but is becoming more common is the status of the domain name. And the domain name may be in an active status, which is where the domain name typically resolves to an electronic mail or to an interactive web site of some sort. But the other statuses are things like that the name might be on hold, which means that it's been registered to a registrant but it's not available for use on the Internet. The name might be listed as waiting to transfer between two registrars. Or it could be listed for pending delete, and so on. There's a number of statuses there.
Registry WHOIS data typically displays information on who the registrar is for that particular domain name.
Other information that's starting to become available is things like the last date a change was made to the record; and the last date a transfer between registrars was made. The data displayed in terms of the registrant data -- and this is what is compulsory to be displayed -- is only the name and address.
But quite often, registries and registrars publish the name, address, phone, fax, and e-mail for the registrant. The data that must be displayed for admin and technical is name, address, phone, fax, and e-mail. Now, as I mentioned before, there's not a great deal of understanding of the different meanings of these data elements. So, quite often, they default to all being the same. So registrant, admin, and technical, are the same. And the complete information is made public.
The problems that -- in the community at this stage, one is the free port-43 and interactive web pages are used anonymously by individuals and organizations to collect large quantities of contact information. And that contact information is then used for marketing purposes.
The registrars are now trying to address this problem, but each of them are addressing the problem in a different way.
So one of the problems is that we have started with a standard. We have port-43 as a standard. We have interactive web pages that initially started out all looking fairly similar. And now they're starting to diverge. And this becomes a problem for users, because users of this information then have to design special applications to access the different forms of WHOIS. And we're losing the benefits of a standard approach.
Registrants themselves are not clear on the purpose for the data collected, nor do they know how the data is provided to users of that data.
There are concerns about the accuracy of the data.
Users that wish to use the contact information for legitimate purposes find that the contact information is quite often -- has errors in it, maybe the e-mail address has been mistyped or the postal address has been mistyped.
But even worse, sometimes there are deliberate attempts to provide false information.
Then there's the issue that people have that are individuals that are registering domain names. And they're concerned that their information is made public and that information is perhaps their home address, their home phone number, and their e-mail address. So they are concerned about their privacy.
Because some of these people are concerned about their privacy, some of these people provide deliberate false information. And that is also a problem.
There are authorization problems. Who has the authority to change a domain name record? Quite often, the employee of a company registers a domain name. That employee provides their personal name, like Bruce Tonkin; and perhaps some address information. Then that employee leaves the company, and the company then, in several years down the track, decides it wants to change something about their domain name. They find that they don't have the authentication information to be able to do that. So then they need to go to the registrar and say, "I actually own this domain name, but you wouldn't know it because my name's not there. But please believe me, I do own this domain name." And then we have the problem of trying to identify whether they do, in fact, own the domain name and whether they are, in fact, authorized to make changes to that domain name.
So I'll leave it there and invite members of the panel to comment on their view of issues associated with either the data collected or the data displayed.
Michael Roberts: thanks very much.
I think we'll take our panelists in the order from my left, beginning with Jane.
Jane Mutimear: if I knew I had to go first and I had to answer all the questions first, I might not have sat where I did. But all the seats at this end were taken by the time I got up to the podium.
I'm Jane Mutimear from the Intellectual Property Constituency. And I'm just going to talk very briefly about what intellectual property owners and their advisors use WHOIS for. I'm not going to go into a lot of detail on that because I discussed that in -- at more length in Montreal.
And then I am going to look at the types of data which we use and why we use it. Because the title of this talk is "data elements."
We use WHOIS in order to prevent intellectual property infringement or -- on the Internet as much as we can. It takes all sorts of forms.
Oops. Oh, that's the end already.
For example, this site, which I happen to have on my hard disk from the last talk, is a sort of spoofing site where somebody registers a name that is -- includes a client's trademark, puts up a site requesting bank information, and sent around an SMS text message which told people that if they went to this site and filled in their details, they could win a new Nokia phone.
Obviously, we needed to take action against something like this very quickly, because there were probably consumers out there who were busy filling in their bank details to somebody who wasn't Nokia.
We also use it in relation to fraudulent e-mail.
Quite recently, I had a case where a client found out that somebody was sending around an e-mail using their name, and it was a .eu.com ending. And they were holding themselves out as a European Operations Officer, trying to get information and products from companies. If you typed in the domain name, the mark .eu.com, it resolved straight to the main dot-com web site. So it made it look very legitimate. But they weren't actually -- they didn't have their own web site.
And there we used the WHOIS to find out who was behind that.
The run-of-the-mill sort of stuff for intellectual property owners is web sites offering counterfeit or pirated goods which you need to close down or go against. I was kept quite busy in recent weeks where trade secrets were leaked out from a company and then they were being posted on the Internet. Obviously, you need to act quite quickly when you're dealing with trade secrets, because they don't remain secret for very long once they're on the Internet.
And also cybersquatters, where somebody's taking a domain name, incorporating, or the client's trademark, and you want to either take legal proceedings or UDRP proceedings against them.
So we use the registrant name because we need to know who we're dealing with. We need it for context. Is this the same guy that we were dealing with last month, last week? Is it an ex-employee who is disgruntled? Is it someone in your Singapore office who's just got a bit confused and is doing something wrong?
Also, you need to know who you're dealing with in order to negotiate properly. And if it can't be resolved, you need to know who the registrant is in order to bring legal proceedings, unless you're in the States.
The address is relevant. You need an address if you're going to be sending any hard-copy documents. And, of course, you need an address if you end up having to serve legal proceedings for service of process.
Also, the address will help you when you're figuring out which court will have jurisdiction in relation to whatever you're worried about.
E-mail is very useful, because, as half of you in the audience at the moment will know, it provides very quick communication. And things can often be resolved very quickly without the need to escalate things to legal proceedings. And phone numbers, again, are very important from that perspective.
If you can actually pick up the phone and talk to somebody and explain what the problem is, more than 50% of the time, you'll get it resolved. Fax numbers generally, I think, get less used. I've never had a case where I've resolved it by the fax number alone. But I'm sure there are some people out there where the fax number has been useful.
Really, providing it's accurate, it's useful, because, quite often, as Bruce was saying, there's a lot of inaccurate data there, and you'll go through the whole list until you find something which you can actually use in order to contact the person.
So that's what's out there that we currently use and how we use it.
What else would be nice? Well, we'd quite like reverse lookup. And this is where typing in the name of the registrant to see what else he or she owns. This is particularly important in UDRP cases -- UDRP cases, where, in trying to establish bad faith, one of the things that you may need to show is a pattern of conduct, that they have a habit of registering the domain names owned by famous trademark owners.
Another thing which would be quite helpful is historical data. Because this would help you figure out what went wrong where in a domain name registration. And I think this is similar to what the Security and Stability Advisory Committee were requesting last year when they were asking for data which showed what data was last verified and when.
Another thing which I just thought I'd throw in, it would be quite useful from the commercial aspect, because domain names, although they aren't proprietary rights, they are interests which can be assigned; they can be pledged; they are worth a lot of money in some circumstances. And some companies need to be able to take security on them. And one thing I think would be quite good to look at from the commercial perspective is the possibility of recording security interests or pledges on the WHOIS database.
And I think that's all I had.
Oh, no, I didn't. I have got one more.
Why accurate and accessible WHOIS? What's good about it? Well, it means that registrars and registries get out of the fight.
Because if you know who has registered it and if you can figure out where they live and how to contact them, then you don't need to start threatening the registry or the registrar or the ISP, particularly where you've got a UDRP or a local dispute resolution policy in place. It means that problems can be addressed quickly. And that's good for all Internet users, because the sort of things which we're talking about aren't just cybersquatting of a domain name that nobody ever uses or looks at.
It also encompasses fraudulent uses that are detrimental to consumers using the Internet. And it means that you can take action, appropriate action.
You can take quick -- pick up the phone, talk to somebody, get something taken down, or, if you need to, you can end up going into court and getting an injunction against them. But the fact that the data is there and available for you means that you can deal with it in the most appropriate way.
And now that's it.
Michael Roberts: thank you very much.
I think that we'll go next with David.
David Maher: thank you.
I'm David Maher, the Chairman of the Public Interest Registry. I'm wearing two hats, in a way.
First, that of the Public Interest Registry, which manages the dot org; but also all the other registries.
The registries really don't need very much data to perform their functions. From our standpoint, all we really need to know is the domain name, the name servers, the IP numbers, and the registrar, which is the entity that we deal with. The registries, of course, do not deal directly with the registrants.
I also sometimes describe myself as a recovering intellectual property lawyer, so I understand the need that was just outlined very clearly. And I understand the usefulness of the UDRP and also making other information available to resolve disputes; but perhaps even more importantly, to avoid disputes. There's a real social interest in that.
On the other hand, the Public Interest Registry in particular, which is devoted primarily to serving the noncommercial, nonprofit community, has a very strong interest in privacy. Some of you I believe heard one of our directors, Marc Rotenberg, speak yesterday or the day before about privacy and the importance of its protection. We really believe that the default position should be that personal information is not available publicly. There have to be mechanisms for law enforcement bodies and for law firms, private parties to obtain information. But we think that the privacy element is paramount. And we're going to be devoting a great deal of time and energy to the protection of the privacy of registrants.
One other thing I'd like to just mention briefly.
We are, as you heard, pir.org is going to be a thick registry. So we will end up having all of that information that currently is stored by the registrars. There's a problem there that sometimes there are discrepancies. The registrar is supposed to inform us, inform the registry, of changes in data, which they may not always do very promptly.
And also, as one other footnote to this, at least one registry, dot name, of course, has a much greater interest in knowing the names of the registrants, even though they may not deal with them directly.
And that concludes my presentation. As I say, the emphasis, we believe, should be on protecting the privacy, particularly of individual users.
Michael Roberts: thank you very much.
Marilyn Cade: thank you.
I particularly appreciate the opportunity to talk today, because I am going to tell you a story, which I hope you will find helpful and interesting in understanding what commercial and business users -- how we feel, what we think about the role of data elements, which ones are important and why.
So first let me just tell you who the Commercial and Business Constituency is. By charter, and it's important to understand that all constituencies at ICANN have a charter which defines who we are and who we represent.
By charter, we represent a very disparate population of all sizes, all the way from small entrepreneurs, one and two-person businesses, up to very large corporations like my own and I'm sure some much larger than AT&T, and many trade associations and associations that encompass very small enterprises as well as large enterprises. We have members in all five regions. Commercial and business users are very, very diverse.
We use WHOIS to deal with network problems. Most people think that the only -- some people think that the only people who operate DNS servers are registries and registrars. I would contradict that and say, by the way, it's not even just ISPs. So we, too, deal with DDoS attacks, Spam problems, network problems.
We use the WHOIS to check for availability of a possible new registration for a new service or product.
We deal with conflicts between domain name registrations and trademarks. We deal with consumer protection issues, fraud problems. And we cooperate with law enforcement to where we are involved in using WHOIS and working with law enforcement to deal with not just civil problems, but with criminal investigations.
There are many, many misunderstandings about how WHOIS is used. And I think the workshops last session and the workshop this session are very important to help to build a base of understanding. But even today, there are many, many opinions, but very few facts.
So I'm going to tell you a story about a famous and well-known brand and its experience in a holiday weekend in the U.S. We use on the net ATT to represent our famous and well-known brand. AT&T holds the world's tenth most well-known brand. And we register that brand widely, not just in the U.S., but very widely throughout much of the rest of the world. Most people recognize that brand.
It's registered in many countries as a domain name in the country codes. It's also heavily registered in the dot-com, dot net, some registration in the other generic TLDs. We have a very strong and consistent brand policy, and our brand policy extends into how we use our brand in the domain names. The brand is trusted and known for integrity.
So my story is about corporate identity theft, individual identity theft, consumer fraud, and trademark infringement all in the matter of a three-day weekend.
The site that we call att-global.net, which is how it is registered, went live at 6:00 on a holiday Friday night. Three-day weekend in the United States. By 6:18, the team had detected the presence of the site on the net.
We immediately recognized there was some question about whether it was an AT&T registered name because it was not compliant with our brand policy. The team checked the dot net WHOIS and we could immediately tell it was not registered by an authorized individual.
It did, however, have complete WHOIS information. And eventually, when we were able to contact this poor individual in Ohio in the United States, around 7:00, much to his chagrin, he found his identity had been stolen as well.
He had registered att-global.net, or someone had, using a credit card that was not his. So lots of complete information, but lots of false information. Through using the ARIN and RIPE databases, we were able to find the ISP.
We also notified, of course, corporate security and corporate security began working with law enforcement.
Here's what the site did. It cloned a subscription database, complete clone, including the copyright notice, and sent an e-mail to well over a million subscribers telling them that if they did not immediately click on this site and go to a secure site and reenter their credit card information and their password, that they would lose access to the Internet within a matter of a few hours. This is a very common fraud in these days and times.
And many of you may have read that eBay has had a problem with this; CitiBank has had a problem with this; we have had a problem. There are many other examples of this.
In this particular situation, speed is absolutely essential, because these fraudsters do not stay in the same place for a long period of time, and it's not their goal to continue this scam for a long period of time.
The goal is to steal credit cards, charge a small amount to each of those credit cards, typically trying to gather up to a few hundred thousand credit cards or even a few hundred credit cards, they charge $50 to $75 to each credit card that often goes undetected, and in many cases, depending on the credit card law within a country, the subscriber never actually pays the $50, so the credit card company themselves may be underwriting this particular scam in an accidental sort of way.
In any case, let me continue the saga. So by 7:00, we had reached the registrant. We sent a DMCA notice to the ISP because of the copyright infringement. And the ISP accepted the notice and took the site down.
By 9:00 the next morning, however, the site was up again with a second ISP. We had by that time sent notices to our subscribers and had put a notice up on our web site telling the world and the media about this fraud.
Let me tell you that global companies -- and I would suspect any company who values their brand -- does not like to send an e-mail to their subscribers saying, "please do not trust the e-mails you receive from us." This is not a good thing. But that is what we were forced to do. We were also, of course, forced to notify the media of this scam.
The good news is that we received very few inquiries, and over time, we were able to verify that very few of our subscribers actually bought into the scam, but a few did. A sufficient number did that really demonstrates harm.
The second ISP responded to a DMCA-compliant notice and took the site down by mid-afternoon on Saturday. By this time, law enforcement was engaged and trying to find the person who was actually behind this scam.
For a while, it looked like it was someone from another country. That turned out not to be the case. The site went back up again by 9:00 P.M. on Saturday night, and another solicitation was sent.
So our subscribers received three times this message alerting them that they could lose their ISP services. And every time, some subscribers respond to that. No amount of education that all of us do can deal with that.
We used every data element in the WHOIS for dot net, and we used the RIPE and ARIN databases in order to try to pursue this. Ultimately, we resorted, of course, to a temporary restraining order, which we could not get until Monday morning.
And without the cooperation of the ISPs in complying with the DMCA notices and without the involvement of law enforcement in trying to begin the investigation, the site would have stayed up consistently from Friday night at 6:00 until Monday morning at 9:00 when we could find a judge and get a temporary restraining order, and a lot more harm would have been done.
So the enforcement teams that work at my company or the individual sysadmins use WHOIS for self-help. They use it to try to deal with network problems, problems like the one I just described to you. This is a very urgent situation.
In this case, we can't take weeks to get a UDRP.
We did eventually go back to a UDRP to recover the name from the individual who had never registered it. It was not a problem to get the name back at all in that case.
But the urgency we were dealing with was getting the site out of the zone, getting it off the net so that further harm could not happen to consumers.
Accurate and complete data elements for all contacts, we believe, are needed.
But let me say something about individuals. The commercial and business constituency agrees that accuracy of data elements is incredibly important. We think it is as yet unresolved on how access and display of those data elements should be managed.
But it's important to remember that this fraud that I just described to you, which involved corporate identity theft, individual identity theft, consumer fraud, and trademark infringement, was perpetrated by an individual. So we can't just think that it's really black and white. All individuals are over here, and certain rules apply to them; and all corporations and organizations are over here, and certain rules apply to them. There's a lot more gray to this.
And I wanted to tell you this story to sort of illustrate that point.
Michael Roberts: thank you very much.
Thomas Roessler: good morning.
My name is -- whoops. I hope you can hear me. My name is Thomas Roessler, and I am a member of the At-Large Advisory Committee.
This is a somewhat unusual panel for me to sit on, because according to the definition, again, this is a user's panel. Again, I had a hard time preparing what I was going to say here. Being a mathematician, my first instinct was to confuse you with a long, logical deduction. I am not going to do that.
Usually talking about registrant privacy, my second instinct was to actually ask whether a data element review isn't something that needs both perspectives of data subjects and of data users. I could then have gone on to speak about individual data elements and what kinds of risks their compulsory publication in the WHOIS would pose to individuals' privacy. I am not going to do that, either.
But I will point you to a statement that was adopted by the At-Large Advisory Committee last night and is posted at alac.info. The statement elaborates a little bit more about the need to move quickly on privacy for individuals.
Now, the one thing I can do is to actually answer the questions we were asked to answer here.
Question 1 was, what registrant -- what one for anybody but Bruce, was, which elements do major user groups consider essential? One important thing to note here is that the question is about essential data elements.
And I believe, although I was -- ah. That's the URL for the statement.
One reason why it is important to talk about essential data elements here is that we are ultimately dealing with a trade off, a trade off between privacy and use of WHOIS data. In such a trade off, you do not give up privacy for data elements that are nice to have. You do not force registrants to publish data elements which are nice to have. This is about essential data elements.
Now, from the point of view of an individual that uses WHOIS, that consumes WHOIS data, I can't tell you a single essential data element. I tried hard to find some. I can't find a single one.
There are lots of applications of WHOIS from an individual's point of view which are nice. I still remember from the WHOIS task force a survey that some people mentioned using WHOIS for finding old friends. That's a very nice thing to do. But it's not an essential application of a service that is set up by forcing registrants to give up their privacy.
So when we have no essential data elements but a lot of nice-to-have things, then we can, of course, go to the second question that's asked, that's what additional -- that basically boils down to the question of how to meet these needs with greatest security, less cost, and in compliance with privacy, security, and stability considerations.
But when we -- at least from an individual's point of view, we are talking about nice-to-have services, we are basically talking about a service that should be offered to registrants. We are talking about -- we are getting to a point where WHOIS could be a service that is offered to registrants to publish the kind of contact data about them that they want to be put out in the public, even if it is none.
Michael Roberts: okay. Now we'll have another Thomas.
Tom Keller: thank you. I'm Thomas Keller, I'm from the Registrar Constituency and I guess my remarks will be rather brief because if you look at what the registrars actually do with WHOIS data it's a real narrow thing. So we have basically two kinds of business operations, as you probably know. One is the registration of the name where you don't need any WHOIS data at all because you get the data from your customer and that's what you put in the WHOIS database to start with.
The second basic thing you need is -- I'm sorry. My computer just died on me.
The second thing we do is transfer -- so what we need WHOIS for is to verify who the owner of a domain name is, so when someone requests us to transfer a domain name from one registrant to another, then we can check that that person is who they claim to be. Another thing we retrieve out of a WHOIS database is some kind of contact information, so that we can contact that person and ask him if he really wants to have that transfer done.
But if you talk about verification of a person, we don't need like a telephone number, street address, something like that. It comes down to that we have some means to identify our customer. That could be done by, for example, a token, a code like we have for nTLDs. So we don't need personal data for doing the things we do with WHOIS.
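The token idea mentioned here can be sketched in a few lines. This is a minimal illustration only, assuming the registrar of record issues a random code to its customer and the transfer is approved when the same code is presented again; the function names `issue_transfer_token` and `verify_transfer_token` are hypothetical and do not describe any registry's actual procedure.

```python
import hmac
import secrets


def issue_transfer_token():
    """Generate a random, URL-safe code to hand to the domain holder."""
    return secrets.token_urlsafe(16)


def verify_transfer_token(stored_token, presented_token):
    """Compare in constant time so the check does not leak partial matches."""
    return hmac.compare_digest(stored_token.encode(), presented_token.encode())


token = issue_transfer_token()
print(verify_transfer_token(token, token))
print(verify_transfer_token(token, "wrong-code"))
```

The point of the sketch is that the check involves no name, address, or telephone number: possession of the code is what identifies the customer.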
There is one other thing our customers expect from us, which is that we take their data and handle it in a really responsible fashion. So they trust us, and at the time we put it on the WHOIS, certain people might feel there is mistrust because we're using the data they give us on a private basis to display to the public.
So at the end of the day, as I said, I'll be very brief, we only need to verify and that's pretty much it. So there's no magic to it for the registrars.
Michael Roberts: Kathy? Or we're going to do Brian next. Sorry.
Brian Cute: good morning, bonjour. Sorry for the technical interruption. My name is Brian Cute. I'm the Director of Policy with Network Solutions. Network Solutions, as many of you know, is the largest ICANN accredited registrar in terms of registered names in the com, net, and org TLDs with approximately 8 million registrations sponsored by us. And our customer base consists of both business and individual users.
Following up on Thomas's presentation, I echo what Thomas said about registrars as users of WHOIS data and in the area of transfers, that is very important to us, that that operates in a clean and efficient way. And it's very important to our customers.
But I wanted to take another point of view on the question, and the orientation that I took on the question was should we, as a registrar, care about the data that we are obligated to collect and display? Throughout the process of this WHOIS policy development process or the many meetings that we have had in the past, there have been many suggestions, many opinions, and not many facts, or few facts.
So as a service provider, looking around the playing field and hearing about privacy concerns, and for example seeing competitors of ours introducing services such as private registrations, which provide the ability to register a name but have alternative data in the WHOIS field, we decided to take a look at whether or not we should care about this issue from a service provider perspective.
And with that being said, I won't bore you with the first slide because Bruce already did a nice overview of what it is we are obligated to collect and display. But we decided to ask our customers, who are registered name holders, who are, in fact, consumers, and a major user group of a certain sort, to determine for ourselves whether there are issues here that we should care about and think about.
And what I'm going to present to you are just four quotes that we received when we contacted our customers and asked them about this issue and what was or was not priority for them. And I believe that these quotes underscore some of the issues that have been floating around, some of the suggestions that have been made in terms of the question of WHOIS, the question of data elements, and the question of privacy.
The first one is very straightforward. "I have never liked having my home address and phone number freely listed for anyone who looks at my domain name information. As an individual, I do not have access to P.O. boxes or corporate numbers to use as valid replacements. Requiring us to update this highly personal information and make it public is preposterous."
Two other notions that confirmed certain things that have been suggested in prior discussions, number one, "because my information was listed on WHOIS, a man who has been harassing me online for about a year was able to get my home address and telephone number and step up his harassment of me."
And this next quote, and the last one, whereas the first two underscore some of the privacy concerns we've heard, the last two are of particular interest, as you might imagine, to a service provider in this particular sector. "I should have the choice of omitting my personal information, home address, phone number and e-mail address, from this public database."
And then lastly, "I would register more domains if I were not concerned about the privacy ramifications."
Network Solutions, as a service provider, is, in a certain sense, with all registrars, kind of at the crossroads or the crux of this question that we're all addressing. And I think that we appreciate an intelligent balance of the interest at the end of the day, but I think that this feedback from our customer underscores what we believe to be both concern and demand in the consumer marketplace for domain name registrations.
Michael Roberts: thank you, Brian.
Kathryn Kleiman: I apologize for speaking only in English because I know there are people here who speak other languages as well, particularly French.
My name is Kathryn Kleiman, and I'm an attorney and co-founder of the Noncommercial Users Constituency.
And when I started my preparation for the meeting today, I went back and looked at the WHOIS record. Do you know it's 21 years old? 21 years ago, there wasn't an Internet. It was the NSFnet connected with other networks, and it connected large users, large research institutions, large universities, and large U.S. defense contractors.
Today I'm happy to say that we're here. The Internet is diverse. It's expanding, and in fact, the majority of Internet users are small. Small noncommercial organizations; many small and home-based businesses; and many, many individuals.
And we all use domain names. Large and small, we use domain names to post our communication on-line. Domain names are the street signs of our communication on the Internet.
So I wasn't at the Montreal meeting, but I listened to the videos and watched the videos of the Montreal meeting, so I thank the ICANN staff for posting that.
At the Montreal meeting we learned a few basic principles, and one is that domain name owners are protected by national laws, and that their rights include personal privacy and freedom of expression. And that these rights are often in conflict with the complete exposure of personal information in the WHOIS record. So very briefly, because many people here were at the Montreal meeting, I'll just review that the Montreal meeting highlighted that the data elements of the registrant field, with name, home address, e-mail, and often home telephone, conflict with the privacy laws of many countries.
It also conflicts with common sense. And let me read to you a small quote from a much longer letter that was sent to Paul Twomey yesterday by the Electronic Privacy Information Center and over 46 civil liberties organizations and over 21 countries.
The small quote says "anyone with Internet access can now have access to WHOIS data, and that includes stalkers, governments that restrict dissidents' activities, law enforcement agents without legal authority, and Spammers." But there's another issue as well. The domain name registration process requires us, as a condition of entry, to disclose our identities before we can share our communication. I'm particularly concerned about communication of personal and noncommercial web sites. The web sites that have our family pictures, our hobbies, and our deeply held human rights concerns.
It turns out that this kind of disclosure is inconsistent with some national laws, and also the precedent and the weight of world history.
And let me read you again one quote, and this is from the U.S. Supreme Court in 1995 looking at a law that said that a person who published a political pamphlet had to put their name and address on the pamphlet or they would be guilty and there was a fine levied. And someone did that.
And the Supreme Court said, "anonymous pamphlets, leaflets, brochures, and even books have played an important role in the progress of mankind. The interest in having anonymous works enter the marketplace of ideas unquestionably outweighs any public interest in requiring disclosure as a condition of entry." Okay. So we have this information. It causes a problem. Are there any quick fixes?
I've heard a lot about proxy services, and Wendy Seltzer an attorney with the Electronic Frontier Foundation asked me to tell you today, since she couldn't be here herself, that proxy or escrow services proposed as a privacy solution have not developed to fill the gap. They do not work in practice. They give up the names of clients on a mere request, and even in theory they're a poor second best to registrants seeking full control of their identities. The public deserves better.
And then Thomas Keller -- and I picked this up from the video, not directly, from Montreal -- said something I think illuminates the issue as well, that privacy is a right, not a service.
What do we do now? What information do we need in the WHOIS record? Technical contact seems to be needed. I read the Security and Stability Advisory Committee's report from February, and they told us that technical contacts are needed for security and stability. I believe them.
But the report also noted that WHOIS records must also protect a registrant's privacy. From that I think we can conclude that stability, security, and privacy are not incompatible goals.
Okay. How do we put it all together? What data elements do we really need in a WHOIS record? I've got a proposal for a new WHOIS record that I'll put out briefly today.
First, this is today's WHOIS record. I tried to do a printout, you know a screen dump but it wouldn't be legible. You have the domain name, the creation information, the domain name registration data which for the small user is personal data. The administrative contact, which Bruce told you was often the same as the domain name registrant data. So that's also personal data for a small user. The technical contact, and the name servers.
Now, let's try a little variation. How about a new and improved WHOIS record? You've got the domain name. The technical contact. Explain it to people. Tell them that this should be whoever handles their primary functions with DNS. Their web host, their ISP, their registrar, their proxy, whoever.
Let's put the registrar in, name, address, e-mail, phone, fax. Let's put the registry in. Name, address, e-mail, phone, fax. And then the name servers.
What would be the advantages of this new type of WHOIS record, this variation? It's not that new.
One is it provides not a single but multiple technical contacts who can help to resolve problems quickly. Two, it keeps the customer data for registrars confidential. And three, there's no global disclosure of personal data. So it protects Internet stability and security; it protects against the invalid DNS renewal notices that we heard a lot about in Montreal; and from my perspective it protects privacy, free expression, and due process of law.
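The record layout proposed above can be written down as a small data structure. This is only a sketch of the elements as they were listed (domain name, technical contact, registrar, registry, name servers); the class names, field names, and sample values are illustrative assumptions, not an agreed format.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Contact:
    """Contact details for an organisation or role, not a private registrant."""
    name: str
    address: str = ""
    email: str = ""
    phone: str = ""
    fax: str = ""


@dataclass
class ProposedWhoisRecord:
    """Public record limited to the elements listed in the proposal."""
    domain_name: str
    technical_contact: Contact
    registrar: Contact
    registry: Contact
    name_servers: List[str] = field(default_factory=list)

    def public_field_names(self):
        """Return the names of the elements this record would publish."""
        return [f.name for f in fields(self)]


# Hypothetical sample values using reserved example domains.
record = ProposedWhoisRecord(
    domain_name="example.org",
    technical_contact=Contact(name="Example Web Host", email="noc@example.net"),
    registrar=Contact(name="Example Registrar", email="support@registrar.example"),
    registry=Contact(name="Example Registry", email="info@registry.example"),
    name_servers=["ns1.example.net", "ns2.example.net"],
)
print(record.public_field_names())
```

There is deliberately no registrant or administrative contact field in this sketch: under the proposal as described, that data stays with the registrar rather than in the published record.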
Thank you very much.
Michael Roberts: thank you, Kathy. We're almost up to our break time, but in consideration of the importance of this panel I think that we should take some questions or comments from the floor. So if you would like to please come forward, identify yourself, and speak slowly and clearly.
Elana Broitman: hi. Can you hear me? Elana Broitman from register.com.
I want to thank everybody on the panel. I thought it was very informative and I have a specific question for Jane.
Jane, you talked about the types of information that you need off the WHOIS. What I'm wondering about is if you could make a distinction between needing that information on a one-off basis, which was more or less the examples you gave, sort of here is a domain name, we're looking for who owns that domain name, how to contact them, and when you might need names -- or when you might need information off of port 43, which one could take in wholesale, as a database, and the second part of the question is sort of how often and who needs that. It's clear enough to me when you say an intellectual property attorney like yourself may need that, but I've heard talk about various third parties who may need that as well in the context of the sort of IP interests, and I'm not clear on who that might be. So if you could talk about those two things. Thank you.
Jane Mutimear: it's always difficult when there's two questions because by the time I've listened to the second one I've forgotten what the first one was.
Generally, the first -- I remember, the first one, generally, the sorts of abuses that we're looking at, we just want who owns that and how do we contact them.
The reason that having somebody who can aggregate the data and then provide a service to IP owners and other companies is useful is so that you can do the sort of reverse lookup. Hey, this guy, we've come across him twice now. What else does he have?
Because if we're going to have to go off to court or go to have to do a UDRP, we don't want to have to be doing one every few months. We'd rather clear out what he's got.
So being able to look up what else that person owns, and we need somebody to be aggregating that data for us. And I think that's where the questions of bulk WHOIS and port 43 access come in.
We don't do it ourselves; we rely on a service, a third-party service provider to do that for us.
Was that both questions?
Elana Broitman: I think that sort of answers both in a way. How often was part of the first.
Jane Mutimear: oh, how often. And the other thing I would actually point out when I'm talking about needing aggregated data is sort of, to one side of enforcement is where companies are trying to police their own portfolios and figure out what they've got and who owns it so they can get it all in one place. And at the moment, you can't do that unless you've got somebody who has aggregated the data so you can actually check what you own yourself.
So that goes on quite a lot, your own aggregation.
When you're looking at infringers, it depends. It does -- you would like to do it more often than I actually do it because of the lack of availability, but I'd probably say 50% of the time you want to be able to do more advanced lookups than you can do with web access.
Michael Roberts: thank you. Roberto.
Roberto Gaetano: Roberto, At-Large Advisory Committee but speaking on a personal capacity.
I have two comments. One is rather general, and I think that we have kind of a misunderstanding about the information society. We think it's about information dissemination, and in fact, it is not. It is about information management. It is about giving to everybody access -- thank you -- access to the information that they need to know and that they are authorized to.
So to this effect, I think that we are trying to treat the domain name system as something that is completely different from other domains.
I'm sure that there are frauds that are perpetrated using the domain name system that can be easily detected by complete access to all information related to the domain name owners.
But -- in providing reverse lookup and all those facilities to whoever wants to access this. But that will be equivalent to say that there are lots of frauds that are perpetrated in the financial system and therefore the public needs to have complete information about who owns what in terms of assets, in terms of shares, in terms of -- that will widely reduce the possibility of perpetrating financial frauds. But this is not done because we think there is a certain right to privacy for ownership of financial assets that overrules or that regulates, that manages this domain.
So the key word here is not to provide complete access to public but to manage the information.
And the second short comment is that I think that, as usual, Marilyn has provided a very telling argument. What I see is that -- I don't know what the intention was, but I see that is a very precise argument for privacy. Because -- for privacy of the individual, of the end user, because there are two things that I can see.
First of all, that the only piece of information in the WHOIS that proved to be completely useless was the address of the registrant because he was completely innocent in this whole story and was just a victim of the thing. And the second thing is that this fraud has been perpetrated because of insufficient protection of information.
I would like to know how come that a million register list came into hands that shouldn't have had that in the first place, and I think that that proves the point that the protection of the information is one of the key issues.
Vittorio Bertola: I'm Vittorio Bertola again from the At-Large Advisory Committee.
So there are a couple of short points I'd like to make first.
One is that I really think that we need to start looking at the applicable legislation, because otherwise, we are going to produce a policy that's not going to work in practice.
So I think that any policy making group on this field should review what's actually possible to do and impossible to do according to existing legislation in the different countries. And the second point is I also think you really need to separate two parts of the data collection, so one is identifying the customer who registers a domain name. So getting the identity of who owns that and storing it so that we know who owns a certain domain. And the other one is publishing that identity to the public. So taking the data and exposing it to the public. And I think these are two completely different issues because for example you don't need the second one to identify who has put up a web site and to chase him if he's doing something illegal.
So I think the policy should be really different for the two cases.
And thirdly, I'd like to say a thing about accuracy. I think that comes from my own experience because the reason I started to get some interest in ICANN, in year 2000 the company I was working for asked me to register some 40 or 50 domains in ccTLDs and gTLDs for the company. Of course that was a total nightmare because everything is different and also the (inaudible) was different. That is nothing compared to the issue of keeping them updated after that.
So even in, for example, if you go and find my name in the RIPE database now you will find my name and you will find a company for which I have not been working for the last two years. You will find a physical address that is not even the last address for that company but is, I guess, two -- the one before the last before the last. You will find phone numbers and fax numbers that don't work anymore, and you will find an e-mail address that is no more.
But unfortunately, I think I have no means of updating that because (inaudible) working there anymore so how can I identify myself. And I do want to keep it updated; simply, I don't have a chance.
And I think this is really important, because even if you want to publish your data, you want to keep it accurate, you must have a simple way for doing that. And I think something is fundamentally wrong in the way the system is designed now, because, I mean, many years ago we moved from the (inaudible) table to the DNS system because otherwise people wouldn't keep it updated. And it's still the same thing for WHOIS. So the system is not updated because it's centralized, and you have to go through a long chain of people if you're going to have access to your data and update them, because often you go to your web hosting company that has bought the name from a reseller that has taken it from the registrar that needs to go to the registry in some cases to update them.
So there's no chance it's going to be updated. And this is why I think that really you need to think of this at the architectural level.
So you maybe need to find someone that brings the ownership of this data to the registrant.
For example, I think that contact data for the registrant should be embedded in the DNS zones. I think that would be a good way to keep them updated if you want to keep them updated and if you don't want to keep them updated, you're not going to get them anyway.
Thomas Roessler: could I briefly follow-up? Just a brief follow-up.
I'm very glad that Vittorio brought up this point. The point he was making was keep users in control of their own data. And that's what the accuracy principle that is listed within the OECD privacy principles, it's about keeping users, and keeping data subjects in control of the data that's being stored. That's all I wanted to say.
Michael Roberts: go ahead, George.
George Papapavlou: thank you, Mike. I'm George Papapavlou from the European Commission. I will be much shorter, and I will only focus on the issue of bulk access.
Bulk access to WHOIS data has an obvious big potential for privacy infringement, and this has been indicated also in the debate on WHOIS that has taken place in the previous few years with the result that bulk access for marketing purposes is increasingly being questioned.
My question is what other purposes will justify bulk access to WHOIS data that would override the really enormous privacy infringement potential of this access, given that bulk access essentially, at least to me, means everybody's guilty until proven to be innocent.
I think this is a very, very large leap to take, and so I would like to ask what other purposes is there for bulk access that would justify such access.
Michael Roberts: I might observe, as someone who is involved in the drafting and adoption of the Registrar Accreditation Agreement that there was discussion about retaining the bulk access processes that were in place before ICANN was created, and the conclusion at the time was that the arrangements facilitated and would continue to facilitate a fair competition in the registrar business.
Now, that was not only a debatable conclusion at the time it was arrived at, it's certainly a debatable conclusion today but I thought it would just be useful to say that that's why that provision is in the Registrar Accreditation Agreement.
Are there any other comments? Marilyn.
Marilyn Cade: Mike, I'd like to do two things. I'd like to respond to George's comment on behalf of the previous WHOIS task force. The task force -- the survey that we -- Anna took and analyzed strongly objected to marketing uses of any of the bulk access and I think that was a strong focus. I think that broadly, we found that the community, in further consultation and in comments really focused heavily on any marketing uses of the WHOIS data. And I just wanted to offer that as a further thought, because I think that many of us feel that, although there may have been very legitimate reasons for the steps and the way that bulk access was taken at the time, certainly the task force felt that a reassessment of the purpose and what uses were possible was needed. And one of our policy recommendations addressed that.
Michael Roberts: thank you. If there are no further questions, we should now take a break. And I'd like to ask you to come back by 10 after 11:00, please. That will start our last session a little bit late, so please be prompt.
Michael Roberts: I think we'll begin very shortly, if you would take your seats.
The third session this morning is devoted to the subject of registrant/user classification and practices concerning registrant data in various WHOIS systems.
We have panelists, from my left, Chris Disspain from the Australian registry; Bruce Beckwith from the dot org PIR registry; Elana Broitman, who's going to speak to registrar practices; Keith Davidson, from the New Zealand registry, who is going to speak to registrar/registry practices in New Zealand; and Hakon Haugnes from Global Name Registry, who is going to speak to the issue of user classifications in a specialized registry.
There's been a good deal of discussion about ways in which we might protect user access, user privacy by various classification schemes. None of these have reached any sort of full definition, but there's a good deal of interest in them, which is why we have put this session together.
Some of the pertinent questions are: what is the feasibility of distinguishing different classes of domain name holders or different classes of users of WHOIS data such that the WHOIS information collected and displayed could reflect differing types of use and potentially different privacy and accuracy considerations? What are some of the ways that global TLD and country TLD registrars address accuracy and privacy issues, including data collection and verification, complaint procedures, and investigatory methods for false information, and third-party registration practices?
With that introduction and overview, I think we'll begin with Chris Disspain from the Australian registry.
Chris Disspain: thank you, Mike. Good morning, everybody. Thank you for the opportunity to come and talk about what we do in .au.
I thought I'd start by just giving a very brief overview of how .au is set up. Because we have -- our DNS is set up into separate second levels. And each one of those separate second levels has different eligibility criteria. For the purposes of this discussion, if you take com.au, that is for commercial -- commercial organizations, and it's only available to certain types of organizations, companies, registered business names, and so on.
And at the other end of the scale is .id.au, which is for individuals. And there are some in between that basically follow the com.au model.
Now, we collect all of the data that everyone's been talking about this morning, name and address and telephone number and fax number and e-mail address, et cetera, and in the case of our commercial second levels, we verify that data.
So one of the entities that is entitled to a com.au name is a company. And if you are a company, you have to provide us with what's called an Australian company number. And we check that to make sure that it is yours and that it's working. And there are various other checks and balances that we do to verify.
In the case of id.au, at the other end of the scale, which is for individuals, we don't actually require proof of Australian citizenship, even though you're supposed to be an Australian citizen if you have an id.au name.
We rely in the case of individuals on warranty that they are who they say they are and that they comply with the policy.
So one of the key things for us is that you must -- that the data that we get shows that you are actually the registrant of the domain name. We check to ensure that you are who you say you are. But what we display is only a very limited set of the data that we collect.
We only display, as far as registrant data is concerned, the name of the registrant, the verification piece of information. So in the case of a company, the Australian company number -- sit further back?
And then we have the registrant contact. And that is a contact for the registrant. And that must be either the registrant, or if it's -- obviously, the registrant is an organization, a person at that organization. It cannot be someone at the ISP or someone at the registrar or someone at a reseller.
We also publish an e-mail address for the registrant contact; we publish the technical contact's name, and the technical contact's e-mail address.
But we do not publish street addresses, telephone numbers, or fax numbers. Now, because of the way that our system works, having very easily defined second-level domains, it would be possible, if we chose to do so, to publish different levels of data for different second-level domains.
So we could, for example, publish street address, telephone number for business domain names, com.au, net.au. And not do so for individual domain names, such as id.au.
We don't do that. We have had discussions about whether we could. We don't. And we don't do it because we take a view about WHOIS that is perhaps slightly different from a lot of the views of the commercial organizations and certainly different to the views of intellectual property lawyers and some law enforcement agencies.
We think there would be -- we know there would be a cost burden to introducing separate levels of data in the WHOIS. Our feedback from the community tells us that they think it would be confusing. They don't particularly want it. They're not particularly fussed. They think the level of data we provide in au is more than sufficient.
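As an illustration only (not part of the remarks as delivered), the idea of publishing different levels of data for different second-level domains, which .au has chosen not to adopt, could be sketched roughly as follows. The field names, the policy table, and the helper functions are all hypothetical and do not describe any real registry system.

```python
# Hypothetical sketch: choose which collected fields to display based on the
# second-level domain. Field names and the policy table are assumptions.

# Fields displayed for every name (roughly the set described for .au today).
BASE_FIELDS = {
    "registrant_name", "registrant_id",
    "contact_name", "contact_email",
    "tech_name", "tech_email",
}

# Hypothetical per-second-level policy: business names show more than individuals.
PUBLISH_POLICY = {
    "com.au": BASE_FIELDS | {"street_address", "telephone"},
    "net.au": BASE_FIELDS | {"street_address", "telephone"},
    "id.au": BASE_FIELDS,
}


def second_level(domain):
    """Return the last two labels, e.g. 'example.com.au' -> 'com.au'."""
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def published_view(domain, record):
    """Keep only the fields the policy allows for this second-level domain."""
    allowed = PUBLISH_POLICY.get(second_level(domain), BASE_FIELDS)
    return {key: value for key, value in record.items() if key in allowed}
```

Under these assumptions, the same collected record would show a street address when looked up under com.au and omit it under id.au.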
Interestingly, also, similar types of database, the company and business authority in Australia is called ASIC, the Australian Securities and Investments Commission. If you go and do a business search on the ASIC site or a company name search on the ASIC web site, you will not get the address of the company. You will not get the telephone number of the company. You will simply get confirmation that the company exists and a few other bits and pieces of information.
And we're not sure why WHOIS should be held to a higher disclosure standard than sites such as the company web site of ASIC. As a consequence of the level of data that we publish, our data is -- has a higher level of accuracy than the gTLD space.
Obviously, it's accurate because it's verified to a degree. But it's also accurate because people know it's not going to get published.
So if we do have a situation where we had to, for example, send by ordinary mail a letter to registrants, we would be able to do that and know that there -- that the data is relatively accurate. And we wouldn't get that many returns.
We do have a system for dealing with complaints, complaints about inaccurate data.
There are occasions where, for example, a domain name has been registered to somebody and their business name -- the business name that is -- that makes them eligible for the domain name has expired. With almost no exceptions, we always give our registrants an opportunity to correct the data to ensure that it is accurate and that they comply with the eligibility requirement.
There is one exception. There are occasions when a company has a domain name registered and the information in WHOIS shows that the company is the registrant and the company has ceased to exist, having been de-registered.
And on that basis, you can't actually correct. Because the company is a legal entity and if it ceases to exist, then the registrant ceases to exist.
We do not allow bulk access to our WHOIS under any circumstances whatsoever. We occasionally allow targeted access to more than single lookups. But we don't allow bulk access.
We have protocols with most of the law enforcement agencies in Australia. And those protocols are based on a simple principle that they can come to us and ask us for the full data on a particular domain name or series of domain names if they're related to a particular investigation. We have an agreed process for doing that. It has to be signed by the relevant person, et cetera. And that works quite well.
What we do not allow, however, is law enforcement agencies to go on fishing expeditions into the database to find information about domain names. And neither do we allow intellectual property lawyers to do that.
I guess what I have tried to do is simply give you an overview of the way that one ccTLD works in respect to WHOIS. We are quite used to being told by copyright attorneys and the odd law enforcement agency that our WHOIS process is obstructive and doesn't work for them.
It does, however, work for the vast majority of our domain name registrants, who seem to be quite happy and content about the fact that their data, whilst it is held by us, is not displayed by us.
Michael Roberts: thank you very much.
Bruce Beckwith: thank you. My name is Bruce Beckwith, and I am with the Public Interest Registry.
And what I'd like to speak about momentarily here are not specifically distinguishing different classes of domain name holders and different classes of users for WHOIS, because speaking on behalf of the gTLD registries, really, it made more sense to focus on data issues.
And the data issues, as has been discussed at the gTLD registry level, are, as noted here, a vast array of issues that, when you look at it, we have to classify gTLD registries in two different groupings.
One, the thin registries; and then the thick registries.
Thin -- the largest and most visible thin registry is the Verisign com, net registry, where they literally maintain the domain name, the name server, some IP information, obviously, some dates related to that, and who the registrar is.
On the thick side, you have most of the new gTLD registries, and now dot org is transitioning from a thin to a thick registry.
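As an illustration only (not part of the remarks as delivered), the thin/thick distinction can be pictured as two record shapes. This is a minimal sketch: the class and field names are hypothetical and do not reproduce any registry's actual schema.

```python
# Hypothetical sketch of thin versus thick registry records.
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass
class ThinRecord:
    """What a thin registry holds: the name, name servers, dates, registrar."""
    domain: str
    name_server_ips: Dict[str, str]  # name server host name -> IP address
    created: date
    expires: date
    registrar: str


@dataclass
class Contact:
    """Contact details, held only by the registrar in the thin model."""
    name: str
    email: str
    postal_address: str = ""
    telephone: str = ""


@dataclass
class ThickRecord(ThinRecord):
    """A thick registry additionally stores the contact data itself."""
    registrant: Optional[Contact] = None
    admin_contact: Optional[Contact] = None
    tech_contact: Optional[Contact] = None


def to_thin(record: ThickRecord) -> ThinRecord:
    """Project a thick record down to the thin subset."""
    return ThinRecord(record.domain, record.name_server_ips,
                      record.created, record.expires, record.registrar)
```

In the thin model, anyone needing contact details has to go on to the registrar named in the record; in the thick model the registry can answer directly.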
Remember that on the registry level, be you thin or thick, the relationship with the registrant is handled at the registrar level. Therefore, one looks to the ICANN contracts for guidance. And that is where one finds that the responsibility for the data, all the issues that were on the prior slide, are all the responsibility of the registrar.
Now, without going into a lot of detail, let me just point out two sections of the Registrar Accreditation Agreement that have a specific role here. And I've highlighted, for ease of trying to see it, and I'm not going to read the whole thing by any means.
But note that in section 3.7.7.1 of the RAA, it reads, in portion: "the registered name holder shall provide to registrar accurate and reliable contact details and promptly correct and update them during the term of the registered name registration." Simply, what that means is, as a -- the requirement is that a registrant provide accurate information. It does not necessarily have to be the information of their home address; it could be their business; it could be a post office box; it could be some other location. But it must be an accurate location, and other accurate details.
By the same token, let me point you to the very next section, which is 3.7.7.2. And here, again, I'm highlighting just the portion for the purpose of this discussion that is of significance, but I would certainly recommend reviewing the whole document. And here it states, "willful provision of inaccurate or unreliable information, its willful failure promptly to update information provided to registrar," and so on. So, again, it puts the responsibility onto the registrant. The registrant must communicate with the registrar to update the information. And then, depending on whether or not the registry is thin or thick, the registrar then has to update the registry database.
Given that, I would point us back to the data issues. And as significant as they are, though there may be a role for registries, really, we have to look at the registrar and the registrant relationship to help identify how to handle many of these issues.
Michael Roberts: thank you.
Elana Broitman: thank you.
The issue for registrars is how much information to provide to the public, as well as how much information to collect and provide to a more limited base of users and who those users are.
The reason this issue comes up is, one, because we hear from our customers, as you've heard this morning, that they're not happy with all their data being provided.
And this is not just individuals who are unhappy because of privacy considerations, but also companies who are unhappy with having their full contact details provided when they're about to launch a new product, for example, and want to create a -- want to reserve a domain name and create a web site, attach that domain name, before they're ready to go public with the product.
And what we've seen is that the WHOIS database has actually been abused by Spammers, by others who are not just using Spam to send marketing information, but to also elicit, fraudulently elicit from registrants their credit card information and other private data.
So while we are waiting for a long-term technical solution that doesn't have to distinguish among users as much as we might today, the long-term solution being something in the nature of CRISP, there are -- various registrars have started to implement their own interim solutions to deal not with what data is displayed, but to at least deal with how that data can be mined by Spammers and others for fraud.
So I want to draw the distinction between the web-based WHOIS and the port-43 WHOIS.
The web-based WHOIS has been dealt with by a number of registrars by requiring a password -- by requiring a user to input a password displayed on the screen. That cuts down on the readability of that information and cuts down on the data mining.
Port-43 has been a more complicated issue.
And while there has been discussion of tiered access or tiers of data provided, that seems to have fallen off the table.
And what has been under discussion more recently and what's actually becoming implemented by at least several registrars just in the past couple weeks is the idea of setting up white lists so that you've got far less data being provided to the public at large. That data may include just what's currently provided in the thin WHOIS registry; it may also include, say, a technical contact.
But you've got the full data being provided to a white list, if you will, of approved users. These users can come in through digital security licenses, they can come in through some sort of central authority or a decentralized authority provided by each registrar or registry.
The examples of users would be certainly other registrars, because that data, at least currently, is important for transfers and other considerations of just doing business, for registries for the same reasons, and for other, quote, unquote, legitimate users.
The types of users that we've -- the types of stakeholders that we've talked to so far who seem to have a traditional recognition of legitimacy are law enforcement authorities or government authorities coming in under the proper authorization, intellectual property interests.
Again, that's a somewhat difficult definition, but it's something we're working with the IP constituency on. As well as, of course, the technical users of the WHOIS database.
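As an illustration only (not part of the remarks as delivered, and not a description of what any named registrar has implemented), the white-list idea can be sketched as a lookup that returns a reduced record to the public and the full record to approved requesters. The field names and requester identifiers are hypothetical, and how a requester proves its identity is deliberately left out.

```python
# Hypothetical sketch of a white-list check for port-43 style lookups.
# Credential verification is out of scope and assumed to have happened already.

# Fields shown to the public at large: roughly thin-registry data plus,
# optionally, a technical contact.
PUBLIC_FIELDS = ("domain", "registrar", "name_servers", "tech_contact")

# Hypothetical identifiers for approved users (registrars, registries,
# properly authorized law enforcement, and so on).
APPROVED_REQUESTERS = {
    "registrar:example-registrar",
    "registry:example-registry",
    "law-enforcement:example-agency",
}


def whois_response(record, requester_id=None, approved=APPROVED_REQUESTERS):
    """Return the full record to white-listed requesters, a reduced one otherwise."""
    if requester_id is not None and requester_id in approved:
        return dict(record)
    return {key: record[key] for key in PUBLIC_FIELDS if key in record}
```

The design choice here is that the default is the reduced view: an unknown or anonymous requester never sees more than the public fields.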
Now, as I said, there are several registrars today who are about to implement this kind of solution. And because it's been out on public list, I'll just mention that they are Go Daddy and eNom. And I'm sure you can go to them directly and ask them for a better description of what they're doing. I'm told that the solution they're going to be implementing soon is going to be a publicly distributed solution. So I think we'll get better information about that.
One thing just to keep in mind is that even if port-43 is limited in terms of which users get the full information, you are still left with full information available to the public through the web-based WHOIS. So you're not dealing with -- ultimately, with privacy and data elements, but you are certainly dealing with the ability to mine that data without a registrar even being able to know beforehand that that data is going to be taken and used potentially against them or their customers.
Thank you very much.
Michael Roberts: thank you very much.
Keith Davidson: hello, my name is Keith Davidson from .nz. And as you see in the picture, we live next-door to Australia down in the Pacific Ocean, but we have quite a different structure for the way we operate ccTLD to Australia, so quite interesting that people in the very corner of the South Pacific can come up with such different solutions to similar problems.
(inaudible) is for a not-for-profit society, and our concerns of the Internet are far greater than with just our .nz, more aligned to ICANN principles but running .nz is what we do. We take a fairly structured idea of society and make it difficult by running things through all different committees.
We run our registry through a wholly-owned subsidiary of .nz and we regulate through the domain name commissioner, who is responsible through a committee to Internet.nz. It operates totally on a first come, first served basis and we will accept any name at all, in most of our second levels.
In 1999, we started the development of our shared registry system for .nz. During that process, broad consultation occurred with the local Internet community and really evolved the appropriate policies for governance and operations of the SRS including WHOIS. With WHOIS we took careful consideration of the privacy and consumer protection laws, and I would say New Zealand's Privacy Act is not different from much of the legislation in the OECD countries.
In New Zealand we looked at what other registers are publicly available and used these as reference points as to why we should or shouldn't provide information through WHOIS. Our company office allows full access to not only company names but directors' and shareholders' names. So you can go on-line and do a wildcard search and find people's names and which companies they're associated with. We have the same for motor vehicles. If you have a motor vehicle registration number you can search for and find the owner's address. And your name and address must be listed unless you have specific law enforcement concern and get a restraining order to have your name omitted. And we have our white pages on-line with wildcard searches available throughout New Zealand.
Our WHOIS discloses the full WHOIS data that most have talked about: the full details of name and address, phone number and fax and e-mail for registrant, registrar, admin contact and technical contact, and the name servers' names and IP numbers. When someone does a WHOIS search on port 43, they're given the terms and conditions. And these are the terms and conditions that apply to the use of the WHOIS data.
If it's a web-based query it has no such statement of restriction. We have 137,000 names in .nz. We have approximately 1,070,000 WHOIS queries per month. There is a limit of 50 queries per hour from a single IP number, and there are registrars who set higher limits than the 50 per hour.
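As an illustration only (not part of the remarks as delivered), a limit of 50 queries per hour from a single IP number could be enforced with a sliding-window counter along these lines. This is a minimal sketch, not the .nz implementation; the class name and the choice of a sliding rather than a fixed window are assumptions.

```python
# Hypothetical sketch of a per-IP hourly query limit (sliding window).
import time
from collections import defaultdict, deque


class HourlyQueryLimiter:
    def __init__(self, limit=50, window_seconds=3600):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # IP -> timestamps of accepted queries

    def allow(self, ip, now=None):
        """Return True and record the query if this IP is under its limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Forget queries that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A higher limit for a particular registrar, as mentioned above, would simply be a second limiter constructed with a larger `limit` value.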
I also, as a part-time job, run a small commercial ISP, and, for example, I use a WHOIS every 24 hours comparing to the last WHOIS and checking the data, just verifying that all my customers' details have -- where they have changed, that they're legitimate changes, and as an ISP I don't think I could live without that sort of level of WHOIS query.
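As an illustration only (not part of the remarks as delivered), the daily check described here, comparing each day's WHOIS output with the previous day's, amounts to a field-by-field difference between two snapshots. The sketch below assumes each snapshot has already been fetched and parsed into a dictionary keyed by domain name; the function name and data layout are hypothetical.

```python
# Hypothetical sketch: report which WHOIS fields changed between two snapshots.
# Each snapshot maps a domain name to a dict of field name -> value.

def diff_snapshots(previous, current):
    """Return {domain: {field: (old_value, new_value)}} for every changed field."""
    changes = {}
    for domain in previous.keys() | current.keys():
        old = previous.get(domain, {})
        new = current.get(domain, {})
        changed = {
            name: (old.get(name), new.get(name))
            for name in old.keys() | new.keys()
            if old.get(name) != new.get(name)
        }
        if changed:
            changes[domain] = changed
    return changes
```

Anything reported would then be reviewed by hand to decide whether the change was legitimate.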
We have no bulk WHOIS access and no wildcard WHOIS search.
Our registrars and registrants terms and conditions require that information will be published by WHOIS. And where false information is discovered, the registrar must take action to correct the incorrect information.
The final remedy is name removal from the zone file, and that's done at the discretion of the domain name commissioner, and there are various rules, depending on the severity of the situation, as to how long and what would constitute reasonable grounds for the removal of a name.
So far in New Zealand, we've only had one significant instance of abuse for a distributed dictionary attack through various IP collection points. It was from our neighbors in the West Island, and it involved thin (inaudible) and pro forma invoicing to registrants for equivalents of their .nz names. We've had considerable difficulty taking legal action against the perpetrator as they live in Australia and have not, as far as we know, visited New Zealand, however our thanks go to Chris Disspain for his assistance in taking action in Australia.
So I think, you know, in a small, very democratic country with a very benevolent government and a strong level of consultation with our local Internet community, I think we've been able to develop a WHOIS policy that has stood the test of time.
What we're seeing now, though, is some demand for a who-was service, mostly from the legal community wishing to know domain name details from the past. And that's something that we haven't yet addressed but we see it as coming up in due course.
We have no international domain name issues in .nz, and we see it as a low priority.
And for those who want the links, specifically, to our WHOIS and privacy policies, I'm sure this will be available on the web site later, and we have the full list of .nz policies also.
Thank you very much.
Michael Roberts: thank you, Keith.
Hakon Haugnes: great. I'm going to talk about one implementation of WHOIS, as we consider it, in a name space that is dedicated to individuals. And I've called it "would you really like to be found today?" And I'll start by telling you an interesting story that happened to us -- or not to us, not to me, fortunately, but to someone we know. And I call the story the Florida Spammer.
There was a guy, Mr. Good Guy, and he got some Spam from a certain Mr. X. Mr. X was a very well-known Florida-based Spammer, and if I tell you his name here, he would probably Spam me as well.
Mr. Good Guy put up a web site saying Mr. X is a Spammer and he didn't like that, and although the rest of the world knew it, he didn't like it personally. He went to the WHOIS and there are conditions in WHOIS that say this is not allowed but nonetheless, he got about a million e-mail addresses. He sent X million, probably to his own list as well, very obscene e-mails to the whole world, and he faked the headers of the e-mails so it appeared that it was actually Mr. Good Guy sending the e-mail.
Not only that, he, in fact, put Mr. Good Guy's address in the bottom of that e-mail saying if you don't like obscene e-mails from me, then you may just as well -- and here I'd like to have a beeping tone. "I don't care. Just contact me at my home address and my telephone number." And as a result, Mr. Good Guy got thousands of hate letters, hate calls, even visits to his home. And he took a long while to recover under police protection. He may have even had to move his house.
In fact, a privacy protected WHOIS would have avoided this issue. So I'm going to ask you a question. Do you really want to be found today?
I'm going to take a little bit different approach to what the others have said, and they're all very important remarks. But there's one thing I feel hasn't been really emphasized enough.
There are data protection principles enacted in national law that are applicable to any European registrar or registry for any individual on any TLD. We, as a registry, are obliged to live by these laws, and I would personally go to jail if we broke them.
One of these -- or three of these principles that are very important in relation to WHOIS are the following.
The first one is the collection of information shall be relevant and it shall not be excessive. In other words, you should collect information that is needed for the service provisioning. Now, for an ISP registrar, you certainly will have to collect billing information, name server information, and maybe even address if you want to contact the registrant at some point in time. But you're certainly not required to publish that information to provide the service.
The second important principle is that consent to information has to be freely given. Those of you who heard Mark speak from EPIC the other day heard him say that privacy is not always about security. It's about openness. You have to be open about the information that you collect. You have to be open to the end user about how you intend to process and possibly publish that information.
And this consent has to be freely given. In other words, if you have to consent to something to get service, it's not freely given. That's the case with domain name registration today. You simply won't even get a domain name unless you consent to publishing all that information in the WHOIS database.
Finally, an important issue to data protection principles, because it's national law, is to not transfer that information outside of the European Economic Area. That creates a problem. WHOIS is globally accessible, and it is very difficult, maybe impossible, to restrict that transfer or to have parameters depending on where you ask from.
There's another problem with WHOIS. It's a leak. It's a big leak in WHOIS today. It relates to how we now have thick and thin registries. And you heard before that some new registries are thick. We collect all the information on registrants. But registrars, as per their contract, are still obliged to provide WHOIS. Even if we are a thick registry, we hold all the information. We are the authoritative source for that information. Registrars are still obliged to provide WHOIS in their own data models. And that creates a problem. It creates a mutual security risk. We're at risk because the data we hold could be published by someone else. And the registrar carries a risk in other cases if the registry is in a different jurisdiction than they are, the registry may be a legal risk to the registrar.
This duplication seems unnecessary now that we have thick registries and there should no longer be a need for registrars to provide WHOIS.
I was asked to consider classification parameters, and there are many classification parameters that you can apply. Who is it that you're asking about? Which registrant is it? Which jurisdiction is the registrant residing in? What's the jurisdiction of the registrar, what's the jurisdiction of the registry?
That may be several classification parameters you use. What are registrant preferences? Do you want to display your e-mail, do you want to display your phone, address? All? Registrants may have a preference about that and you can classify them in that way. And very importantly, who are you? If you're anonymous, why should you get any information at all? It's just polite, if you're asking someone who they are, I'd say tell me who you are and I'll tell you about me. And it seems to be an important classification principle, also in WHOIS. If you can identify well the requester, track him and legally bind him, it seems that access to information could be wider.
What's the frequency of requests? Are you asking once per quarter or ten times per millisecond? There are both types out there.
And what do you want? Do you want deep private information or just shallow, technical information?
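As an illustration only, and not part of the captioned remarks, the "frequency of requests" parameter could be measured with a sliding-window counter per requester. The class name `FrequencyTracker`, its parameters, and the thresholds used with it are hypothetical; the speaker does not describe how any registry actually counts requests.

```python
from collections import defaultdict, deque


class FrequencyTracker:
    """Sliding-window count of requests per requester (hypothetical helper)."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # requester identifier -> timestamps of recently accepted requests
        self._seen = defaultdict(deque)

    def allow(self, requester: str, now: float) -> bool:
        """Return True if this request stays within the requester's budget.

        `now` is passed in explicitly (seconds) so the behaviour is
        deterministic; a real service would read a clock instead.
        """
        times = self._seen[requester]
        # Forget requests that have fallen out of the window.
        while times and now - times[0] >= self.window_seconds:
            times.popleft()
        if len(times) >= self.max_requests:
            return False
        times.append(now)
        return True
```

With `FrequencyTracker(3, 60.0)`, a requester asking once per quarter is never refused, while one sending many queries within the same minute is refused from the fourth query on.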
You can apply some of these classification principles and we've done that. We've created a different service for WHOIS that is now being implemented and considered.
I'm not going to go into what the standard service today actually does, but I'm going to just speak quickly about how those principles are applied to the privacy enhanced WHOIS. And this applies both to what you said, port 43, and web-based WHOIS.
There should be a standard service where everybody can get some level of information. If you're anonymous, you'll get it. And you should get at least three things. You should get whether a domain name exists. You should get its status and you should get the IDs of the registrant contacts that are associated with that domain name, billing contacts, admin contacts. And this is primarily for technical purposes.
There should be another level where you could get more information because there are legitimate uses of WHOIS. Some would argue that WHOIS is a very outdated mechanism for providing that kind of information. Is it a public service, like the telephone directory, in which case you should be able to opt out for any detailed information. It's a very hefty discussion but you heard from speakers that getting access to that WHOIS information is important. So there should be a level where you can get more.
If you enter into an agreement when you ask for the information, you will get a password that will expire after a very short period. If you provide your credit card, we can track you. We can find out more reasonably who you are. And you should provide a valid e-mail address that we can send the information to.
All those things are part of making the guy who asks more responsible. In that case, you'll get the name, address, and you'll get all contacts that differ from the registrant contact.
And I would reiterate again that it's a big problem that the registrant contacts and all the admin, tech, and billing contacts are usually the same.
Finally, there should be an extensive level where certain entities that have a particularly valid reason for looking into WHOIS will get what we call a persistent password. In other words, pretty open access, just like it is today.
It's not easy to get. You may have to send an agreement that is signed by a lawyer to us and maybe that has to be sent back to you by FedEx. In that case, you could access the whole set of information and there are only a few entities with such access.
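The three access levels described above can be sketched as a function that filters one record by requester tier. This is a minimal illustration, not the registry's implementation: the names `Tier`, `Contact`, `DomainRecord`, and `whois_view` are hypothetical, and the `email` and `phone` fields standing in for "the whole set of information" at the extensive level are an assumption, since the remarks do not list what that level adds.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    STANDARD = "standard"    # anonymous requester
    AGREEMENT = "agreement"  # short-lived password after accepting an agreement
    EXTENSIVE = "extensive"  # persistent password, granted to few entities


@dataclass
class Contact:
    contact_id: str
    name: str
    address: str
    email: str  # assumed field, not named in the remarks
    phone: str  # assumed field, not named in the remarks


@dataclass
class DomainRecord:
    domain: str
    status: str
    registrant: Contact
    contacts: dict  # role name (e.g. "admin", "billing") -> Contact


def whois_view(record: DomainRecord, tier: Tier) -> dict:
    """Return only the fields the given tier is allowed to see."""
    # Standard level: existence, status, and contact IDs only.
    view = {
        "domain": record.domain,
        "exists": True,
        "status": record.status,
        "registrant_id": record.registrant.contact_id,
        "contact_ids": {role: c.contact_id for role, c in record.contacts.items()},
    }
    if tier is Tier.STANDARD:
        return view

    # Agreement level: registrant name and address, plus any contacts
    # that differ from the registrant contact.
    view["registrant_name"] = record.registrant.name
    view["registrant_address"] = record.registrant.address
    view["other_contacts"] = {
        role: {"name": c.name, "address": c.address}
        for role, c in record.contacts.items()
        if c.contact_id != record.registrant.contact_id
    }
    if tier is Tier.AGREEMENT:
        return view

    # Extensive level: everything held (here, the assumed extra fields).
    view["registrant_email"] = record.registrant.email
    view["registrant_phone"] = record.registrant.phone
    return view
```

When the admin, tech, and billing contacts are all the registrant, `other_contacts` comes back empty at the agreement level, which reflects the speaker's point that those roles are usually the same person.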
Now, does this create a good WHOIS for individuals? We think so. We think it incorporates a set of balanced principles for a WHOIS for individuals.
We don't think it's appropriate with unrestricted access to individuals' details. It's a threat to privacy, to freedom of speech and personal safety. Not only in Florida but in many other parts of the world as well.
ICANN needs to allow registrar and registries to adhere to the data protection directive and national laws. There are contracts that need to be updated. We need to remove the WHOIS leak. Why should registrars provide WHOIS and have that financial and legal burden when thick registries are providing the service?
And why do we have a multitude of contacts? How many individuals have a billing, technical, or admin contact? Is that your brother, your father and your daughter? Or is it totally irrelevant for individuals?
In fact, most individuals, I would hypothesize, have no clue what WHOIS is anyway, and they find it a big surprise, like you heard from other presenters, that information even exists out there. Preposterous, was one word that they used.
We need to maintain the accessibility to WHOIS, but we need to remove the vulnerability. And there are proxying services or e-mail forwarding services that would maybe implement that well.
And finally, because these are balanced principles, we need to maintain urgent take-down procedures. There are some very bad guys out there that we want to take down. We can't keep them on the net. But that is usually solved at the registrar level. If you want someone out of the net in 15 seconds there's no point in sending him a letter. And that's totally unrelated to WHOIS. You don't need WHOIS information to take a site down in 15 and a half seconds.
That's all I have. Thank you.
Michael Roberts: thank you very much. Before we begin the final wrap-up, we do have time for a few questions from the floor if we have some.
I'd like to repeat, please identify yourself and please speak very slowly and clearly.
Willie Black: Willie Black, Nominet UK. I was interested in the white list. I think at a previous workshop we had an attempt to debulk WHOIS and it's easy for the crooks to get into the white list if you're not careful. I'd be interested if you'd like to say a bit more on how you prevent somebody masquerading as a benign trademark agent or a perfectly benign registrar. It really requires a fair amount of judgment because once the database is out as a whole, it's fresh for six months, can be used for spam, you sue the guys, they disappear into the back of beyond and then reappear as yet another, you know, cooperative group.
Elana Broitman: Willie, you've pinpointed the exact most cumbersome issue. And Jane and I were talking about this between sessions so let me say a couple of things.
One, in terms of freshness of the data, we actually find as a registrar that the spammers want the data within a couple of days, even just hours of registration. That's when it's at its freshest because that's when they can come in and often masquerade as an affiliate.
So the further we can move away from the date of download of the data the better off everybody is in terms of safety.
But in terms of how to craft the white list, there's no conclusion of course at this point but I think one way to approach would be you have licenses being granted to those on the white list. There could be licenses that are more permanent that go through a -- and I use the word license in a generic way. It could be digital certificates or something of that sort that facilitates Internet commerce.
So you can have ones that are more permanent that are granted based on a larger degree of scrutiny, and could be granted to better known users, and based on also contracts. Not that contracts are perfect in any way, but it's something else to try and keep users to the same kind of criteria that registrars, for example, are kept to. No marketing, that sort of thing.
And then there could be temporary licenses where there's a degree of scrutiny that may be different from permanent licenses where they can go -- they can get access either to less full data or to data on a temporary basis before they're approved for a more permanent license.
These are just some ideas. None of them are perfect but it's just ways to -- I think what we're talking about here is ways to close the gaps and close the loopholes as much as possible, recognizing we will never completely get rid of them.
Willie Black: a rider to that. The problem is being nondiscriminatory, and in our position as registries, we really have to be nondiscriminatory. And if you start making judgments about whether somebody is a good guy or a bad guy, it can lead you into competition problems in lots of jurisdictions.
So I just throw that one in about trying to make a judgment based on just whether somebody is a good guy or bad. Something to think about.
Elana Broitman: thank you, and we recognize that. And for registrars, certainly, there are different issues depending on the universe of user you're dealing with in terms of how much you can distinguish between the users in that universe.
Kathryn Kleiman: Kathryn Kleiman, Noncommercial Users Constituency. Sorry, Elana, you're the one being picked on today. Following up on the white list, you mentioned, as you kind of listed people who would be accessing it, that law enforcement and government, under proper authority. So I'm going to ask you what that means about proper authority, whether it's like an open pipe to the U.S. FBI or whether it's subpoenas, and whether there are similar type of proper authority for intellectual property.
And a fact question for Keith Davidson. Has anyone ever sought, in their domain name registration, anonymity, for example, being a human rights group?
Keith Davidson: no is the simple answer. What we do is if -- or what we have in case of that situation is if someone said "I do not want my data in WHOIS," we would say please go and apply to some other registry. If you're in .nz you must publish in the WHOIS.
Elana Broitman: on the question of proper authorities, as a variety of various interest groups within ICANN have been doing, we've been talking to the interagency group in the U.S. government regarding the WHOIS. And of course one of the questions, which we haven't fully explored, is what is the nature of proper authorities.
And I think that's a continuing and evolving issue. I think it's an issue that will come up in different contexts in different jurisdictions because it's such a -- as you know, such a specific issue to each jurisdiction.
And because of the changes in legal authorities, in the past two years, especially in the U.S., I think that's a particularly sensitive issue that we absolutely need to explore very carefully. Because I don't believe -- and now I'm speaking personally -- that many people outside of maybe certain U.S. government agencies would think that the WHOIS database ought to be an opportunity to go beyond what is authorized legally today.
Chris Disspain: Mike, can I give you a real example about proper authority? In Australia we have, as I said, protocols with various government agencies and I don't think there would be any argument that the Australian Federal Police, for example, and so on, the Australian Consumer Commission.
But we were approached the other day by one of our local councils who are the smallest area of government if indeed you can call them government at all, and their parking inspectors were asking for access to the WHOIS database on the grounds that "we're law enforcement officers and we're entitled to it." So you have to be very careful about making judgments. If you say if it's government, it's okay. Maybe it isn't. It depends what you mean by government, so you have to be very, very careful.
Stephane Bruno: good morning. My name is Stephane Bruno from UNDP Haiti.
I see that the discussion for the WHOIS system, we want to choose between a restrictive system or a real open one. I believe we should come to a flexible one because what we can say is that national laws in different countries depend on the conception of the society we have in this country.
For example, in some developing countries, in some villages, everybody knows everybody and everybody knows where everybody lives so we don't really have this concern about protecting our identity. But in other industrialized, really developed countries, where in the city you rarely meet people you know in the industry, you have more and more concerns about protecting your privacy. So we should find a way to accommodate all the situations because we have two extremes and have a flexible system.
Second, we need to have two different policies. I repeat here what Vittorio said, because collecting data is one thing, publishing it is something else. We can have a flexible system that allows us to collect whatever information we need and have a flexible policy also to know exactly what we publish based on the different laws and situation in different societies.
So that's what I wanted to say. Thank you.
Michael Roberts: thank you.
Panel, do you have any more comments?
Well, that wraps up the third session this morning. Let's give our panelists a hand.
Michael Roberts: we'll very shortly go into some words from Bruce Tonkin on the current status of the GNSO's work on WHOIS and plans.
Bruce Tonkin: okay. In the interest of time, I might as well start.
I'm here in a slightly different role. I have a number of different roles. This role is as acting chair of the GNSO WHOIS steering committee.
The GNSO decided after -- earlier this year, the ICANN staff produced an issues report on WHOIS, and there were around 20 issues identified. Then there was a workshop in Montreal where people also raised further issues that they had, and we were starting to get a sense of the concern in the community about different aspects of WHOIS.
We decided to form a steering group within the GNSO to determine what are the highest priority issues, and how should we structure further policy development work on those issues.
So far, the steering group identified three distinct areas. One, and we've heard quite a bit about this today, is limiting the data mining of contact information. The second one is reviewing what data is currently collected, is all that data necessary, and reviewing what data is displayed and working out whether we should perhaps restrict the amount of data displayed or perhaps choose to display different data to different people. And thirdly, accuracy has continuously been mentioned as a key issue for users of the WHOIS information.
The first area, data mining. The first issue here is really trying to collect information on the current users that use WHOIS today. And one thing we're trying to get is we understand the general sense that people in the intellectual property community or consumers, et cetera, are accessing the WHOIS. What we don't have is specific information on how many queries does a typical intellectual property organization need to provide on a daily basis. What data do they need in each query, and so on.
And so the first step in looking at this problem is to collect information.
Secondly, reviewing the approaches that are being used by registrars today to prevent data mining, and perhaps specify requirements or best practices that exist today in the area of protecting against data mining.
Hopefully then we can feed requirements to the IETF protocol community to standardize approaches to data mining and there is currently a work group within the IETF called CRISP which is trying to develop a next-generation protocol which will do far more than the current port 43 protocol.
But that, in itself, is a problem, because as we've seen recently, with wildcards, protocols can do many things and we still need the policy development to determine what is appropriate for that protocol to do in the context of privacy.
The second area is looking at what data is collected and what data is displayed. First issue is finding out how registrants today are informed during their registration process about the data that's collected and the data that's made accessible by others.
Presently, most registrars probably include these terms and conditions in an agreement with the registrant that is probably several pages long, and the experience that most Internet users have, whenever they see terms and conditions, is to click "okay" and not read them.
So we need to determine what is the best practice to inform registrants on the data issues.
Then we need to conduct an analysis of the purposes for which data is collected, and really understand what is the purpose of the registrant, the admin, and the technical contacts. We need to determine what is necessary to be displayed as well. And we need to balance contactability, which there's obviously many reasons why it is good to be able to contact the person associated with the domain name, but need to balance that with protecting the privacy of the person responsible for the domain name.
So we need to determine what methods we can use to protect privacy and determine whether -- what methods are there today and whether they need to be enhanced.
The third area is accuracy. One, again, first thing here is collecting further information. What do registrars currently do today to verify contact information at the time of registration? Look at what other providers of electronic services do to verify contact information, what do telecommunications do, what do credit card companies do when they're verifying information?
Out of all this, create a best practices document that can describe techniques that can be used to verify contact information. And this document can be shared fairly widely.
The GNSO may decide to make it a consensus policy, and, therefore, enforceable in the context of the registry/registrar agreements. But the best practice document should be able to be used in other areas which are basically relying on voluntary compliance.
The final thing here, then, is to determine how to cost-effectively deal with the problem where deliberate false information is provided. This is a far more difficult issue. Just correcting for people's typing mistakes, there's certainly some techniques that can be used to try to increase the accuracy, whether somebody has mistyped an e-mail address or something. It's a different problem when someone has deliberately provided false information and it's far more expensive and that's one of the issues that needs to be considered.
So the next steps for the GNSO, the GNSO today will be meeting to review these three areas and review these terms of reference to decide whether or not to commence policy development. So we have not yet made a decision on going the next step.
If we did decide to go ahead with the policy development, then the first key step is collecting factual information, accurate factual information, on these issues. And for that we seek help from the ICANN community, and we've had panelists here today providing information.
But it's really very important to get the right information to form a solid basis for policy development.
So I'll leave it there.
Michael Roberts: thank you, Bruce.
We did provide that if the audience had any questions for you at this time that they could ask them; however, I'd like to observe that the council session is open and has its own opportunity for comments from the floor; is that correct?
Bruce Tonkin: the council session is certainly open this afternoon. We don't normally have public comments during that process.
Michael Roberts: okay. If anybody wants to say something that Bruce should bear in mind while he's chairing the council session today, now is your opportunity.
All right, Bruce, I think you're in good shape.
We're going to turn now to some remarks from -- okay. We have one comment.
Naomasa: I'm not sure my specific question is already asked by somebody or not. In the consideration of the data mining protection, I think it has very close relation to the wildcard searching, because the -- if one permit the wildcard searching, then very easy algorithm will permit to retrieve the whole database using the iteration of the wildcard search. This is a very technical -- very easy technical thing, if one knows the computer programming.
On the other hand, wildcard search very much have the demand from the IP, intellectual property, people because they want to know about the type of cybersquatting. So-called cybersquatting.
My question is whether or not this relation between the wildcard search and data mining protection is already discussed among the -- in the GNSO committee. This is my question.
Bruce Tonkin: yes, I think it's certainly been discussed. And I guess this is part of the GNSO community here today in that we do have representatives from intellectual property and registrars and so on. So it has certainly been discussed.
I think the issue, you're quite right, that if you do wildcard searches in the present port 43 or interactive WHOIS it does make it easier for data mining. And the protocols as they're currently written can support that.
What you'll find is that most major registrars don't provide wildcard searching today.
The other issue, though, is that the information that's available via bulk WHOIS, third parties may well take the information from bulk WHOIS and they, in turn, may provide wildcard searching. So one of the issues is actually tracking what happens to the data after it's registered. It first goes to the registrar. The registrar provides it to the registry, and then the registrar also, by contract, must provide bulk access. And then those people that have bulk access, in turn, can provide other services.
So it is quite a large problem to consider, and I think the view in the community is -- from many of us in the community is one of the critical issues is anonymous data mining where people are accessing a free service without any kind of identification, carrying out fairly sophisticated searches on the data. And I think what some others have said here today is that if there is a strong need in the intellectual property area to have more advanced capability to search large quantities of data that they also probably need to, one, be identified, and perhaps provide specific reasons for each case of searching.
But it will certainly be one of the things in the first terms of reference for data mining is to collect information from people like intellectual property community as to what the current uses of WHOIS are and what they're seeking.
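The concern raised in the question above, that iterating wildcard searches lets a client retrieve the whole database, can be shown with a toy model. This is a sketch under stated assumptions, not a description of any real WHOIS service: `prefix_search` stands in for a hypothetical "prefix*" search that caps the number of results it returns, and `enumerate_all` is the kind of simple loop the questioner had in mind.

```python
import string

# Characters assumed to be possible in a name, including the dot before the TLD.
ALPHABET = string.ascii_lowercase + string.digits + "-."


def prefix_search(db: set, prefix: str, limit: int):
    """Toy wildcard search for 'prefix*' with a cap on results.

    Returns (matches, truncated): at most `limit` names in sorted order,
    and whether more matches existed than were returned.
    """
    matches = sorted(name for name in db if name.startswith(prefix))
    return matches[:limit], len(matches) > limit


def enumerate_all(db: set, limit: int, prefix: str = "") -> set:
    """Recover every name using only capped wildcard searches.

    Whenever a search is truncated, the prefix is extended by one
    character and the narrower searches are tried in turn.
    Requires limit >= 1.
    """
    results, truncated = prefix_search(db, prefix, limit)
    found = set(results)
    if truncated:
        for ch in ALPHABET:
            found |= enumerate_all(db, limit, prefix + ch)
    return found
```

Even with a cap of two results per query, `enumerate_all` returns the complete set of names, which is why a result cap alone does not prevent data mining if wildcard searching is offered anonymously.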
Michael Roberts: thank you.
We're going to have some closing remarks now from Paul Twomey, summarizing the situation from his point of view, and next steps.
Paul Twomey: thanks, Mike.
And can I say thanks very much to all the participants at the workshop here today, and to the people who have been attending.
I take you back to the origins of this workshop. And the origins have been a variety of positions on an issue that's been before this community for quite some time. And when we started this process off in Montreal, we first saw the process as being a two-part one.
One is a process of sharing the information and fact-gathering; and the second one moving to a process of having the various parts of the ICANN constituencies and supporting organizations to find a pragmatic way to move forward. And I think we're coming towards the second stage now quite clearly.
I appreciate very much the presentations that have been made and the issues that have been raised.
I think we now have to think both within the GNSO process but also the GAC process, and others with the -- of how we can now move together to bring a cross-constituency approach to make sure we move towards defining four, five, six major issues through which Bruce has just taken us, through -- I understand the issue of -- a couple of other issues that I'll come back to may still be on the table, that we actually move, then, to having a cross-group look at what are pragmatic solutions forward. I think this is very important.
I think, potentially, the stage of information-sharing now may be as useful and should continue. The risk with information-sharing is that it reinforces people's positions. And while we appreciate that if we're an effective organization and play an effective stewardship role for the technical administration of aspects of the Internet, part of that effectiveness is that we get things done, we chop wood.
And so I'm now calling very much upon the community to move, you know, progressing from where we have been now to the next stage of really defining priority areas that we can come to pragmatic consensus decisions on.
In that process, probably nobody will be happy. That's -- the definition is do we have enough people in the -- a little bit unhappy, not too unhappy or whatever. But it really is important for us now to move to practical, pragmatic outcomes.
I would reinforce in that process that there's not necessarily a one-size-fits-all approach to that, and I think there is a lot to be learned, as is indicated here today, I think there's a lot to be learned in that process and to be observed about the practice being pursued by various ccTLD managers, administrators, as well as some of the discussion within the gTLD community.
I think there are some areas of your participation that I would ask for you to really consider.
In light of the workshop's presentations and discussions, I really think participants here need to consider what additional input, if any, needs to be provided to the IETF's CRISP working group to inform their efforts to refine requirements, for instance, to identify community of users, decide on scope, identifying needs, or determining features.
I strongly exhort this because that CRISP work is continuing and it also will come to a conclusion. Part of the responsibility of bottom-up is participation.
So that's one point that I'd say, please, if you've got any further thoughts or issues or considerations for the CRISP working group, now is the time to address them.
I certainly think one of the issues that has come up from this workshop is how we need to move forward to address the internationalization issue of searches and presentation of WHOIS data, that the ICANN community needs to address.
And I understand that even in that discussion, there are potentially, you know, two streams of thought; one, that there are issues here to be addressed and, other, that potentially the implementation process will iron those processes out.
I suspect we need to head that down as sort of one of the issues for further consideration by the process going forward.
Process going forward obviously has to pick up the issues, I think, that Bruce outlined the GNSO is considering.
I think also in considering some of the ways the gTLD and ccTLD registrars address various accuracy and privacy issues, we should ask the question, are there any methods that the community believes are worth investigating further or any lessons that can be learned and applied broadly to advance the ICANN community's work on WHOIS.
So I suppose not just recognizing that different stuff is happening, but coming to the idea of best practice, is there an opportunity, then, to put down on paper and to synthesize for many potentially what is the best practices being followed by others.
How do we go forward around those sort of requests. The GNSO has a process that it's undertaking, and I applaud that. And that needs to continue, obviously.
But the GAC, I suspect, will probably also continue -- consider this issue to be important. I expect it will continue in the future. I know the ALAC has as well.
The president's committee on WHOIS, which we've -- we foreshadowed in Montreal, is still, I think, an appropriate mechanism we need to utilize to try to bring those various constituencies and various viewpoints to a common position.
And I will remind you as CEO that I have no intention of promoting the idea of having the board having to get, if you like, silo-type communications. I made quite a speech about this in Montreal. It's very important on this sort of an issue that the various constituencies come together and work together and try to come to a common position, and that has to be a pragmatic approach. I'm afraid that's simply a statement of reality.
Purist approaches, the word "purist" by the way is quite a word that floats around descriptions of ICANN at the moment. Purist approaches to come up through the different silos and result in the board being asked to consider a cross of range of potential options is not something that is sort of envisaged in the ICANN 2.0 framework that has been put together.
So the pressure is on yourselves to start moving pragmatically towards what are the key things to be addressed and the key things to be solved.
I think there's been, you know, a number of the issues there of the three things that the GNSO's considering, the internationalization issue, issues, I spoke to potentially one as well, that are really the key questions that need to be looked at and addressed.
I would exhort you, therefore, for individuals and organizations interested in that cross-constituency process, to send recommendations, again, for individuals to serve on the president's committee on WHOIS. We will be convening that in the next weeks. We have candidates already.
I've left it open for this workshop to get further candidates. That committee will clearly, obviously, liaise and interplay with the GNSO in established processes and also GAC processes. But we want to convene that as a mechanism to have that sort of cross-constituency discussion.
So the next stage of this WHOIS discussion is one which moves now from sort of workshops, if you like, now much more to committee-type determination of here's what the issues and proposals for potential pragmatic solutions.
I don't know if there's any questions to that.
Bruce Tonkin: Paul, I might just respond briefly on the CRISP issue.
And I discussed this with John Klensin yesterday, who's representing the IETF with the ICANN board.
And the perception John has, and I think it's a good one, is the initial question that was asked of me is, are there any additional requirements that should be included in the work on CRISP.
And it turns out that's actually the wrong question.
I looked through the CRISP requirements, and it's basically a classic case of getting a lot of technical people together and all of them putting what they think their pet requirements should be to create the greatest new protocol for the future.
The correct question to ask is, what are the most important issues that CRISP needs to solve?
And I think using that as a question, that the -- one of the outcomes of some of the work in the GNSO and, clearly, your president's committee, is really defining what are the important things that CRISP needs to solve, and to solve those quickly. And the expectation I think John has is, if we are able to narrow down the requirements rather than expand them, we will be able to decrease the amount of time it takes to converge on a new protocol.
But bear in mind that we've got to be very clear that a protocol is not a policy solution. You need a policy, and a protocol can implement that.
And so to give you an example with port 43, it's perfectly acceptable from a protocol point of view for you to send me a request saying, "please tell me about the domain name icann.org." And it's perfectly correct from a protocol point of view to tell you, in my response, "get lost." So -- and that's perfectly correct technically.
But, clearly, that's not the outcome you're looking for.
So we need to be careful we distinguish policy from technical protocol development.
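The port 43 example can be made concrete with a short sketch. A WHOIS query is a single line of text sent over TCP, terminated by CRLF, and the response is whatever text the server sends before closing the connection; the protocol places no constraint on the content of the reply. The function names and the `RefusingHandler` server below are hypothetical illustrations, and the demonstration server listens on an ephemeral local port rather than port 43.

```python
import socket
import socketserver
import threading


def build_query(domain: str) -> bytes:
    """A port 43 query is just the name followed by CRLF."""
    return domain.encode("ascii") + b"\r\n"


def whois_query(host: str, domain: str, port: int = 43, timeout: float = 5.0) -> str:
    """Send one query and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_query(domain))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


class RefusingHandler(socketserver.StreamRequestHandler):
    """Hypothetical server policy: read the query line, then refuse."""

    def handle(self) -> None:
        self.rfile.readline()  # consume "<name>\r\n"
        self.wfile.write(b"get lost\r\n")


def start_refusing_server() -> socketserver.TCPServer:
    """Start the refusing server on an ephemeral local port (not 43)."""
    server = socketserver.TCPServer(("127.0.0.1", 0), RefusingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The client code is identical whether the server answers with a full record or a refusal. What the server returns, and to whom, is decided by policy rather than by the protocol.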
Michael Roberts: well, thank you.
I think with that, we'll adjourn the workshop.
I'd remind everyone of Paul's words, that this has been a way stop on a process, and we hope that this has now transitioned from educating ourselves and others to being more concrete about proposals to modernize WHOIS.
(workshop adjourned. 12:35 p.m.)