D15.2
Technical Plan for the proposed registry operation
This should present a comprehensive technical plan for the proposed registry
operations. In addition to providing basic information concerning the
operator's proposed technical solution (with appropriate diagrams), this
section offers the registry operator an opportunity to demonstrate that it has
carefully analyzed the technical requirements of registry operation. Factors
that should be addressed in the technical plan include:
The technical plan for KDDISOL's
new gTLD Registry is composed of two parts.
The first part, Phase 1, is expected to last approximately one year. During this time our subcontractor, VeriSign
Global Registry Services ("NSI"), will be responsible for operating a
"virtual" registry. This will
give us time to secure the facilities, equipment and, most of all, technical
know-how that we believe to be vital to the successful implementation of the
new gTLD. Every effort will be made to
expedite the transition from this initial operational period to "Phase
2", at which time we will assume full operating responsibility for the
registry. We believe that by drawing on
the advanced technology and experience of NSI we will smoothly introduce
not only the new gTLD but also ourselves as the registry operator.
The overriding
concern which directed us to opt for a phased approach and to use the services
of NSI was that we considered the requirement to assure the continued stability
of the internet to be the highest priority of the global community regarding
the introduction of new gTLDs. As is
set forth in this document, the experience and capacity of KDD in basic
research, technical innovation and international marketing and
telecommunications provides a remarkably compatible opportunity for the
selection of an Asian-based registry.
There is no doubt that we might have been able to successfully implement
our new gTLD without the assistance of NSI.
We believe, however, that speculation about a matter of this
significance was not in the interest of either ourselves or the global Internet
community. It was in this environment
that our technical plan for the proposed registry was developed.
D15.2.1
General
Description of Facilities and Systems
Address
all locations of systems. Provide diagrams of all of the systems operating at
each location. Address the specific types of systems being used, their
capacity, and their interoperability, general availability, and level of
security. Describe in detail buildings, hardware, software systems, environmental
equipment, Internet connectivity, etc
Figure D15.2-1 - Domain Name
Registration and Resolution Overview
A Shared Registration System (SRS) and
Top-Level Domain (TLD) infrastructure are the two major components of the
Registry. The Registry SRS enables the Registration Service, Directory Service
(Whois), and Customer Service, while supporting the Domain Name Resolution
Service by generating and distributing zone files. The TLD system provides the
infrastructure and common platform for the Domain name Resolution Service.
The SRS is a protocol and associated
hardware and software that permits multiple registrars to provide Internet
domain-name registration services within the TLDs administered by the Registry.
The SRS provides equivalent access to all Registrars to register domain names
in the TLDs administered by the Registry. The System will generate the zone
files for the new TLD and distribute them to a TLD constellation to enable
domain-name resolution across the Internet.
A Whois service will be provided
through the SRS that will allow users to query the availability of a domain
name.
Registrars access the System through a
Registry Registrar Protocol (RRP) to register domain names and perform domain
name-related functions such as registering name servers, renewing
registrations, and deletions, transfers and updates to domain names registered
by that registrar. Registrars have a web-based interface to access the System
to perform administrative functions, generate reports, perform global domain
name updates, and perform other self-service maintenance functions not
available through RRP.
The Registry invoices the registrars
for the domain names registered, renewed, and transferred. The Registry provides
support to the registrars through Customer Support Representatives (CSRs). The
CSRs have their own web-based interface to the Registry, through which they can
query and perform updates per the registrar requests after authenticating the
registrar. Registry CSRs are trained to provide first-level customer support,
and are proficient in customer care skills.
Other external interfaces include
Registry users who perform Whois queries to the System to determine the
availability of a particular domain name or names. The Whois service is
available via both a standard command-line interface and a web-based interface.
The TLD infrastructure includes
geographically dispersed TLD name servers.
These name servers will be located within the Internet at the topological
cores, which roughly correspond to major peering centers for the backbone
network providers. Locating these
servers at or near the major peering centers ensures low-latency access from
networks that carry the bulk of the Internet traffic. Initially, there will be three
name servers, located in Asia and the United States. Overall performance of the
Internet and the services that depend on name resolution is enhanced by this
server placement strategy.
In Phase 1, services
will be provided by KDDISOL through the new state-of-the-art facility of its subcontractor, VeriSign Global Registry Services,
in Lakeside II, a 101,875-square-foot office building
in Lakeside @ Loudoun Tech Center in Sterling, Virginia. The space will include
the computer facility, known as the data center, and most personnel involved
with the Proposed Registry, including operations personnel, engineering,
quality assurance staff, administrative support staff, and customer care
support staff (See D15.2.1.7.1).
In Phase 2, services
will be provided by KDDISOL’s own servers in the KDD Otemachi Building, a 380,000-square-foot
office building in Otemachi, Tokyo. KDDISOL will also distribute
the gTLD servers worldwide, ensuring geographic and topological diversity
(See D15.2.1.7.2 and D15.2.1.7.3).
D15.2.1.2
System/network Diagrams
D15.2.1.2.1
Registry Architecture
Figure D15.2-2 Sample Registry Architecture
D15.2.1.2.2
System configurations
The registry onsite
and TLD system configurations will consist of multi-processor UNIX
configurations with up to 16 GB of memory. Other equipment used to support the
Registry includes large capacity border routers, high-performance firewalls,
load balancers, and switches. The entire system and network are built so that
there is no single point of failure, and includes mechanisms to automatically
fail over when errors are detected. A second level of redundancy is provided by
an offsite Disaster Recovery (DR) facility where the Registry processes can be migrated
on short notice.
To accommodate future
growth the configuration can be scaled to handle additional registrar
connections and registrations. There is an n-to-n relationship of RRP
Application Gateways to RRP Application Servers; depending on where the
bottlenecks occur additional servers can simply be added. Because changing the
database systems is more complex, it is designed to support the full complement
of registrations expected over the next four to five years.
Equipment, processes
and procedures have been designed for the seamless operation and support of the
Registry and TLD systems. A Registry Command Center (RCC) will be established
and equipped with the latest tools for proactively monitoring all components,
so that issues can be identified and resolved before they
become problems. There will be an isolated Operations, Test, and Evaluation
(OT&E) environment for Registrars to test their interface to the SRS
software. KDDISOL will also test any new versions of SRS software or hardware
configuration upgrades before they are introduced into the production
environment.
Table D15.2-1 Equipment List for Sample Registry Architecture
Description | Product | No.
Networking | Load balancer | 2
Networking | Cisco Router | 2
RRPAG | Unix Server | 2
DNS | Unix Server(Sun) | 2
Web | Unix Server(Sun) | 2
Mail | Unix Server(Sun) | 2
FTP | Unix Server(Sun) | 2
Firewall | Redundant Firewall | 2
Dynamo | Unix Server(Sun) | 2
Whois | Unix Server(Sun) | 2
RRPAS | 2CPU Unix Server | 2
RRPAS | 2CPU Unix Server | 2
RRPRS/SS | 2CPU Unix Server | 1
RRPRS/SS | External Storage | 1
ZCK | 2CPU Unix Server | 1
ZCK | External Storage | 2
DB1 | 2CPU Unix Server | 1
DB1 | External Storage | 1
DB2 | 2CPU Unix Server | 1
DB2 | External Storage | 1
Storage | EMC | 1
Backup solution | EDM Symmetric | 1
The systems will be
initially configured with up to 16 GB of memory and over 100 GB of storage.
This is more than sufficient to support the introduction of a new TLD. When needed, the systems are scalable both
vertically, through the addition of memory and disk space, and horizontally, with
additional systems.
D15.2.1.4
System Interoperability
The Shared
Registration System (SRS) is a protocol and associated hardware and software
that permit multiple registrars to provide Internet domain-name registration
services within the TLDs. It has been designed and is operated as a single,
interoperable system, where each component is a critical element in the
registry processing. An extensive evaluation and quality assurance process
ensures compatibility and interoperability when new features, software, or
hardware are added to the system.
D15.2.1.5
System Availability
The objective of the
Registry design is to provide 100% planned system availability. This is accomplished through complete system
and configuration redundancy, and a process commitment to not execute any
system or application changes until they are thoroughly tested in an isolated
Operations, Test, and Evaluation (OT&E) environment.
D15.2.1.6
Facility and Site Descriptions
D15.2.1.6.1 VeriSign Global
Registry Production Data Center.
This data center is
located in the Lakeside II building in the Lakeside Technical Center in
Sterling, VA. The 10,600 data center is operated 24x7x365. Onsite staff from
the Registry Command Center (RCC) operates and monitors the site and the equipment
in the data center room. This data center is not located in any flood plains.
Ceiling height is a minimum of 8.5 feet with ventilation being provided via
under-floor airflow generated by eight air-cooled HVAC units of 25 tons each,
providing for N+3 redundancy.
Temperature is maintained at 70 degrees Fahrenheit +/- 2 degrees. Static
conditions are maintained within equipment manufacturers' tolerances.
Power to this
facility is routed through an Uninterruptible Power Supply (UPS) capable of
sustaining the data center for at least 15 minutes. However, the UPS is needed
only for the few seconds it takes for a 750KW generator to start automatically.
A second 900KW generator is available as additional backup. Power is routed
through eight power distribution units (PDUs) with each server being
redundantly supplied via two separate PDUs. All racks and equipment are grounded.
D15.2.1.6.2
KDD Otemachi Building
This
data center is located in KDD Otemachi Building in Otemachi, Tokyo. It is
operated 24x7x365. Onsite staff from the Registry Command Center (RCC) operates
and monitors the site and the equipment in the data center room. This data
center is not located in any flood plains. Ceiling height is a minimum of 9
feet with ventilation being provided via under-floor airflow generated by
air-cooling units on each floor. Temperature is held constant at all
times. Static conditions are maintained within equipment manufacturers'
tolerances.
Power
to this facility is routed through an Uninterruptible Power Supply (UPS) capable
of sustaining the data center for at least 15 minutes. However, the UPS is needed
only for the few seconds it takes for a 3000KW generator to start
automatically. All racks and equipment are grounded.
KDDISOL will distribute
the gTLD name servers worldwide to best serve the Internet community. Each
remote site is required to meet high standards for support of the TLD servers.
The geographically and topologically diverse sites provide space in secure,
high-availability collocation centers designed and built using industry best
models. At these sites, TLD servers are housed in secure areas and supported by
n+1 power and cooling capabilities. They are redundantly connected to the
facility’s switching fabric with full-duplex 100Mbps connections and have
diverse access to large capacity backbone circuits. Access to the TLD servers
is controlled by Access Control Lists (ACLs) on border routers that exclude all
traffic from the Internet other than UDP and TCP queries. There are 99.7+%
uptime requirements for connectivity, power, and cooling to ensure
uninterrupted availability.
D15.2.1.7
Internet connectivity
Refer to Network Capacities in Section D15.2.10.3
D15.2.2
Registry Registrar Model
Please
describe in detail.
D15.2.2.1
Registry Registrar Model
The Registry accepts registrations and
registration service requests from all accredited, licensed registrars, while
protecting the integrity of registrations from unauthorized access and
interference by third parties. Every new domain name application is checked to
ensure that the domain name is not already registered. This function demands
exceptional speed and accuracy to confirm registrations definitively and to
arbitrate near-simultaneous requests for the same domain name.
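The arbitration of near-simultaneous requests can be illustrated with a minimal sketch. It assumes a uniqueness constraint on the domain name so that the database itself decides which request wins; it uses Python's built-in sqlite3 module purely as a stand-in for the Registry database, and the table, column, and registrar names are hypothetical.

```python
import sqlite3

def make_registry():
    """Create an in-memory table whose primary key enforces name uniqueness."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE domain (name TEXT PRIMARY KEY, registrar TEXT NOT NULL)"
    )
    return conn

def register(conn, name, registrar):
    """Try to register a name; return True if this request won, False if taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO domain (name, registrar) VALUES (?, ?)",
                (name.lower(), registrar),  # domain names compare case-insensitively
            )
        return True
    except sqlite3.IntegrityError:
        # Another request already holds the name, so this one is refused.
        return False

conn = make_registry()
first = register(conn, "Example.tld", "registrar-a")   # hypothetical registrars
second = register(conn, "example.tld", "registrar-b")
```

Because the check and the insert are a single constrained operation, two requests for the same name cannot both succeed, whichever arrives first.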
The Registry maintains information on domain name registrations and name
servers, including domain name, name servers, IP address, registrar name,
transfer date, registration period, expiration date, status, registration
creation date, created by, updated date, and updated by.
The Registry is the authoritative source for its TLD zone
file content (i.e., domain name, name server, and associated IP address). This
is commonly referred to as a “thin” registry model. The registrar of the
particular domain name or name server maintains all other customer data. This
protects customer privacy, gives greater flexibility to registrars, and allows
them to determine their business model. KDDISOL will have a formal contractual
relationship with each individual registrar accredited for registering domain
names in the new TLD.
The Registry database used to support inquiries to identify
the registrar associated with a specific domain name is currently called
“Whois.” Whois enables registrars and potential registrants to establish the
availability for registration of selected domain names. Internet users also use
it to identify the registrar controlling a domain name.
Registration of a domain name or name server in the Registry
database does not automatically create entries in the Internet DNS. For this to
occur, a zone file associating all registered domain names with their
corresponding IP addresses is generated and exported to the name servers
for the TLD. KDDISOL will operate and maintain distributed TLD name servers to
which the zone file is exported and from which the domain name information is
disseminated to the Internet community. The deployment and operation of the
new TLD name servers is the responsibility of KDDISOL.
To enable close to 100% Registry
availability, multiple database servers are used, with off-site backup to
protect against catastrophic data loss. Redundancy is found at almost every
level within the Registry to ensure high-availability of the systems and
applications for the Registrars.
SRS is the Registry architecture and processes used to
enable registrations by multiple registrars. It includes the Registry Registrar
Protocol (RRP), which is used to support communications between the Registry
and Registrars, and provides the security and authentication functions to
protect the Registry database while supporting all necessary registrar
operations. RRP is also used during the certification process for accredited
registrars for operational testing and evaluation of registrar implementations
of the RRP prior to commencement of actual registrar operations. KDDISOL will be responsible for providing
the RRP software interfaces, documentation, and training to accredited
registrars for the new TLD. Hands-on technical support will
be available from KDDISOL to help new registrars resolve any difficulties in
interfacing with the Registry.
D15.2.2.2 RRP Description
RRP was developed by the Network
Solutions, Inc. Registry under the auspices of the Shared Registration System
program. The protocol was initially deployed in April 1999 as
part of a test bed implementation of the Shared Registration System with five
registrars. Additional registrars began
using the protocol in July 1999. RRP has been published as Informational
RFC 2832, and open source software is available for both clients and
servers.
The Registry stores information about registered domain
names and associated name servers. A domain name's data includes its name, name
servers, registrar, registration expiration date, and status. A name server's
data includes its server name, IP addresses, and registrar. RRP provides a
mechanism to perform various functions to domain names, such as:
· Update the name servers of a domain name.
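As an illustration of how such a request might be put on the wire, the sketch below frames a name server update as a line-oriented message. It assumes the framing described in RFC 2832 (a command name, a series of Key:Value lines, and a terminating line containing a single period, each ended by CRLF); the attribute names and values shown are illustrative and should be verified against the RFC before use.

```python
CRLF = "\r\n"

def format_rrp_command(command, attributes):
    """Build one request: command line, attribute lines, then a lone '.'."""
    lines = [command]
    lines += [f"{key}:{value}" for key, value in attributes]
    lines.append(".")  # a single period terminates the request
    return CRLF.join(lines) + CRLF

# Hypothetical example: add a name server to an existing domain.
request = format_rrp_command(
    "mod",
    [
        ("EntityName", "Domain"),
        ("DomainName", "example.com"),
        ("NameServer", "ns3.example.com"),
    ],
)
```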
Each RRP session is
encrypted using the current Secure Socket Layer (SSL) v3.0 protocol. SSL
provides privacy services that reduce the risk of inadvertent disclosure of
registrar-sensitive information, such as the registrar's user identifier and
password.
All registrant specific information is retained
by Registrars.
Database
size, throughput, scalability, procedures for object creation, editing, and
deletion, change notifications, registrar transfer procedures, grace period
implementation, reporting capabilities, etc.
D15.2.3.1
Size
The Registry uses Oracle RDBMS to store all of the domain names for a
TLD. Since the size of the Registry is determined by the number of domain names
which are to be stored, the size will vary as new domains are added. Oracle is
used by many organizations around the world to store large amounts of
information – in many cases, significantly more than will be required for even
the largest domain.
The throughput of the system depends on several characteristics
of the hardware being used: number of processors, amount of memory, and disk drive
configuration. The current configuration can support well in excess of 600
million transactions a month.
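To put the monthly figure in perspective, it can be converted to an average sustained rate. The short calculation below assumes a 30-day month and an evenly spread load; real traffic is bursty, so peak rates would be higher than this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def average_tps(transactions_per_month, days_in_month=30):
    """Average transactions per second, assuming an evenly spread load."""
    return transactions_per_month / (days_in_month * SECONDS_PER_DAY)

rate = average_tps(600_000_000)  # roughly 231 transactions per second
```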
Oracle can scale in several ways, depending on the requirements
placed upon it. Given the
anticipated size of the new gTLD, we do not expect Oracle
database scaling to present a problem.
The Registry implementation performs management of the Registry objects
at both the database and business layer levels. In general, the business layer
validates any request to the database and an Oracle stored procedure is used to
perform the actual changes to the database.
D15.2.3.5
Domain Level Capabilities
For each instance
where a second level domain holder wants to change its Registrar for an
existing domain name (i.e., a domain name that appears in a particular
top-level domain zone file), the gaining Registrar shall obtain express
authorization from an individual who has the apparent authority to legally bind
the second level domain holder (as reflected in the database of the losing
Registrar). In those instances when the Registrar of record is being changed
simultaneously with a transfer of a domain name from one party to another, the
gaining Registrar shall also obtain appropriate authorization for the transfer.
This information shall be provided to the losing registrar if requested. The
form of the authorization is left to the discretion of the gaining registrar.
The registration
agreement between each Registrar and its second level domain holder shall
include a provision explaining that a second level domain holder will be
prohibited from changing its Registrar during the first 60 days after initial
registration of the domain name with the Registrar.
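The 60-day restriction reduces to a simple date comparison. The sketch below is one possible reading, assuming that a transfer becomes permissible once 60 full days have elapsed since initial registration; the function names are hypothetical.

```python
from datetime import date, timedelta

TRANSFER_LOCK_DAYS = 60  # from the registration agreement provision above

def earliest_transfer_date(registered_on):
    """First date on which a change of Registrar is assumed to be permitted."""
    return registered_on + timedelta(days=TRANSFER_LOCK_DAYS)

def transfer_allowed(registered_on, requested_on):
    """True if the request falls outside the initial 60-day lock."""
    return requested_on >= earliest_transfer_date(registered_on)

registered = date(2001, 1, 1)
```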
The transfer
procedure is an RRP command executed by the gaining registrar.
The SRS will automatically
renew domain names as their current registration periods expire. Following
an auto-renewal, a Registrar has a 45-day grace period to delete the domain
name. Any names not deleted during the 45-day grace period will be included on
the auto-renewal invoice.
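The grace period rule can be sketched as follows. The example assumes that a name becomes billable once its 45-day window has closed without a deletion inside that window; the data structures and function names are hypothetical and do not represent the SRS billing implementation.

```python
from datetime import date, timedelta

AUTO_RENEW_GRACE_DAYS = 45

def grace_deadline(renewed_on):
    """Last day on which the Registrar may delete an auto-renewed name."""
    return renewed_on + timedelta(days=AUTO_RENEW_GRACE_DAYS)

def billable_auto_renewals(renewals, deletions, as_of):
    """Names whose grace period has ended without a deletion inside it.

    renewals:  dict mapping domain name -> auto-renewal date
    deletions: dict mapping domain name -> deletion date (if deleted)
    """
    billable = []
    for name, renewed_on in sorted(renewals.items()):
        deadline = grace_deadline(renewed_on)
        deleted_on = deletions.get(name)
        deleted_in_grace = deleted_on is not None and deleted_on <= deadline
        if as_of > deadline and not deleted_in_grace:
            billable.append(name)
    return billable

renewals = {
    "a.tld": date(2001, 1, 1),
    "b.tld": date(2001, 1, 1),
    "c.tld": date(2001, 3, 1),
}
deletions = {"b.tld": date(2001, 1, 20)}
invoice_names = billable_auto_renewals(renewals, deletions, as_of=date(2001, 3, 1))
```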
The system will be able to produce a variety of reports to
help monitor and analyze the type of operations performed on the system. These
reports are summarized in the following table:
Table D15.2-2 Registrar Reports Summary
Generation Date | Type/Description | Audience | How Available
Daily | Describe registrar transactions pertaining to that particular registrar | Registrar-specific report to that registrar | Registrar tool or FTP site
Transfer | Describe domain transfers pertaining to that particular registrar | Registrar-specific report to that registrar | Registrar tool or FTP site
Common | Each row contains a full Registrar description. | ICANN, Third-Party Escrow Company | FTP Site
Weekly | Total domain name count, total name server count, total domains hosted by name server count | Registrar-specific report to that registrar | Registrar tool or FTP site
D15.2.3.6
Registrar Add/Delete/Modify Procedures
Additions, deletions, and
modifications to the domain name records are performed by the registrars
through RRP. During the certification process the Registrars are instructed on
how to process new registrations and make changes to existing records.
Refer to Section D15.2.14.5 for a complete description of the
Registrar Tool that the registrars use to interact with the backend registry.
Procedures
for changes, editing by registrars, updates. Address frequency, security,
process, interface, user authentication, logging, data back-up.
D15.2.4.1 Registrar Manipulation
of Zone Data
Registrars can access their
domain data via three methods (presented in order of automation):
1. RRP protocol as specified in the Informational RFC 2832.
3. Contacting the Registry Customer Service Representative who uses the Customer Service Tool web based interface
to access and manipulate domain and registrar data directly for unusual scenarios.
D15.2.4.2
Zone File Generation Process Overview
Custom applications have been developed to securely and
accurately extract domain registration data from the registry database to
construct the appropriate zone files. The overall process is as follows:
1. A database “snapshot” is prepared
2. Custom applications are launched to extract data from the database and format the data into zone files
3. Validation checks are performed on the static zone files
4. Zone files are loaded on production-like servers and dynamic checks are performed against the server
5. Validated zone files are moved to the zone distribution process
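Step 2 of the process above can be sketched as a function that renders zone file text from a frozen snapshot of registry data. This is an illustration only, not the custom applications referred to above: the SOA host names, timer values, and input structures are hypothetical, and the addresses come from the documentation range 192.0.2.0/24.

```python
def build_zone(origin, serial, delegations, glue):
    """Render a minimal zone file from a point-in-time snapshot.

    delegations: dict mapping domain name -> list of name server host names
    glue:        dict mapping name server host name -> list of IPv4 addresses
    """
    lines = [
        f"$ORIGIN {origin}.",
        "$TTL 86400",
        # Hypothetical SOA owner and contact names; timers are placeholders.
        f"@ IN SOA ns1.nic.{origin}. hostmaster.nic.{origin}. "
        f"({serial} 3600 900 604800 86400)",
    ]
    for domain in sorted(delegations):
        for server in delegations[domain]:
            lines.append(f"{domain}. IN NS {server}.")
    for host in sorted(glue):
        for address in glue[host]:
            lines.append(f"{host}. IN A {address}")
    return "\n".join(lines) + "\n"

zone = build_zone(
    "tld",
    2001010101,
    {"example.tld": ["ns1.example.tld", "ns2.example.net"]},
    {"ns1.example.tld": ["192.0.2.1"]},
)
```

Sorting the records makes successive zone files easy to compare, which helps the validation and diff-based distribution steps that follow.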
D15.2.4.3
Validation
After the zone files
are created, a number of checks are performed against the files to ensure they
contain valid data in the proper format. Serial numbers, data values, and file
size checks are performed on the resultant static zone files.
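The static checks described above might look like the following sketch. It assumes three simple rules: the SOA serial must increase, the file must not shrink by more than a chosen fraction relative to the previous zone, and every resource record must have at least four fields. The 10% shrink threshold and the function name are assumptions made for illustration.

```python
import re

# Captures the first number inside the SOA parentheses (the serial).
SOA_SERIAL = re.compile(r"\bSOA\b.*?\(\s*(\d+)")

def validate_zone(text, previous_serial, previous_size, max_shrink=0.10):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    match = SOA_SERIAL.search(text)
    if match is None:
        problems.append("no SOA serial found")
    elif int(match.group(1)) <= previous_serial:
        problems.append("serial did not increase")
    size = len(text.encode("ascii", errors="replace"))
    if size < previous_size * (1 - max_shrink):
        problems.append("zone file shrank more than expected")
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        # Skip directives ($ORIGIN, $TTL) and the SOA line written with "@".
        if fields and not line.startswith(("$", "@")) and len(fields) < 4:
            problems.append(f"line {number}: too few fields")
    return problems

sample = (
    "@ IN SOA ns1.nic.tld. hostmaster.nic.tld. (2001010102 3600 900 604800 86400)\n"
    "example.tld. IN NS ns1.example.net.\n"
)
result = validate_zone(sample, previous_serial=2001010101, previous_size=len(sample))
```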
The zone files are
then copied to a name server (to simulate the distribution process) and loaded
to verify that the named application loads them properly. After the process is started,
the name server log file is reviewed to verify that no error messages
resulted. Once the name server is operational, the serial numbers
are verified again and sample queries are run against the database.
D15.2.4.4
Frequency
Zone files are
generated at least twice daily, at 12-hour intervals. The database is
constantly being updated but the zone files are generated from a point-in-time
version of the database to avoid corruption of previously extracted data.
The RRP Application Gateway (RRPAG) is
a gateway to the RRP Application Server (RRPAS) from the outside world. The
Application Server runs behind the firewall, whereas the Gateway runs on a
machine that is visible to the outside world and listens on a well-known port.
Registrars connect to RRPAG using SSLv3.
The primary purpose of the Gateway is
to provide transport layer security using SSLv3. The RRPAG authenticates each initial connection
based on the X.509 certificate that the connecting registrar presents at the time of the connection.
After a successful SSL handshake, the Gateway opens a dedicated connection with
the Application Server for the connecting entity.
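A gateway that refuses connections without a valid client certificate can be sketched with Python's standard ssl module. This is an illustration under stated assumptions rather than the RRPAG implementation: current Python releases negotiate TLS, the successor to SSLv3, and the certificate and key file paths are hypothetical parameters that a real deployment would supply.

```python
import ssl

def make_gateway_context(ca_file=None, cert_file=None, key_file=None):
    """Server-side context that requires each client to present a certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Reject any handshake in which the client does not present a certificate
    # that chains to a trusted authority.
    context.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        # The gateway's own certificate and private key (hypothetical paths).
        context.load_cert_chain(cert_file, key_file)
    if ca_file:
        # Authorities trusted to issue registrar certificates.
        context.load_verify_locations(cafile=ca_file)
    return context

context = make_gateway_context()
```

In use, the gateway would wrap its listening socket with this context and open the dedicated Application Server connection only after the handshake succeeds.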
Because the RRPAG is exposed to the outside
world, it is vulnerable to attack. If the RRPAGs suffer a Denial
of Service (DoS) or Distributed Denial of Service (DDoS) attack, no connections will be
possible through RRP. To protect the system against intrusion, an Intrusion Detection
System (IDS) must be deployed to detect intrusions and respond to them. KDDISOL
proposes to use EMERALD, a sophisticated IDS developed by SRI
International. EMERALD should be applied to all servers outside the firewall,
such as the DNS, FTP, and WWW servers, as well as to servers on the internal network.
See D15.2.9.1 for more details.
The database and zone generation and
validation processes are conducted on the registry internal network and systems,
protected by firewalls that restrict access to the network. File replication
between the systems behind the firewalls is controlled by a File Replication Tool
that copies files between hosts over encrypted channels.
Access to the systems is limited to a
“need-to-know” basis. Physical data center access is limited to selected
Registry engineering and operations staff. System logon IDs and passwords are
provided only to technical staff in Operations who are involved in the zone
generation and distribution processes, and secure shell (SSH) is used for all
logins. User logins are monitored and
logged for audit purposes and to recreate any sequence of events if a failure
occurs.
The zone generation
process is done via custom interactive applications that are controlled by
operations personnel. Some applications are automated but manual checks are
performed at many points in the process to ensure proper construction of the
zone files before they proceed to the distribution process.
All production
registry systems require the use of SSH with public/private keys and encryption
for interactive login sessions.
All transactions that
impact the zone files are captured in activity and status log files using
standard (e.g. Syslog) and custom-built logging utilities.
Processing logs will
be created to capture processing statistics, such as number of records
processed, passed, or failed, for each audit rule. The format of the logs will comply with the monitoring tool
requirements so that the monitoring tool can be used to monitor the
processing.
The Customer Service
Representative (CSR) and Registrar tools use the registration system’s
configuration-driven logging system. The developer and operator can specify how
to log messages, given their origin, type, and severity. The log message
provides valuable information to pinpoint when the event occurs and for what
reason.
The
EMC Data Manager (EDM) Symmetrix Timefinder Replication tool is used by the
Registry to perform backups of the systems and databases. Timefinder is a
utility that makes exact physical copies of Symmetrix disk
volumes on a second set of Symmetrix disks called Business Continuance Volumes
(BCVs). The BCVs can then be mounted on a server, producing an exact physical
copy of the original disks. Timefinder can be integrated with Oracle's online
backup procedure to allow the replication of a database instance and to
greatly enhance the speed and functionality of database backup and recovery.
The copied data is then backed up to tape by an EDM network backup.
D15.2.5 Zone File Distribution
and Publication
Locations of nameservers, procedures for and means of
distributing zone files to them.
D15.2.5.1
Name Server Location
TLD name servers will be located in diverse geographic
locations and on diverse Internet Service Provider (ISP) networks. The select
TLD server sites will all be housed within leading Internet collocation centers
located at or near major centers of peering among Internet backbone providers.
Each of these sites will be chosen using a rigorous set of requirements
covering network, security, power, fire suppression, and other key factors. In
terms of network availability, the following requirements are met by all of the
sites:
· Diverse Internet connectivity – minimum of two diverse circuits,
· Extensive public and private peering – number and quality of peering
and transit relationships in force at each of the proposed facilities (at
Otemachi, Tokyo, 2.4Gbps peering connections to over 50 ISPs),
· Fully redundant routing and switching infrastructure –
each facility network follows accepted best practices for high availability,
including the use of multiple ingress/egress routers, dynamic routing protocols
(BGP and OSPF), redundant layer 2 switching infrastructure, and HSRP (or VRRP)
for default router redundancy,
· Facilities – secure facilities with n+1 power and cooling capabilities, and
· On-site support – each facility operator has a 24x7x365
NOC with on-site “hands and eyes” support.
D15.2.5.2
Distribution Procedures
Zone files are
distributed on an infrastructure completely separate from the zone generation
process, so the two processes do not impact one another. Once the extraction
process generates zone files, they are transferred to dedicated machines for
preparation and distribution to TLD servers.
TLD zones will be
distributed on a separate infrastructure from the .com, .net & .org
infrastructure to avoid interruption of service. The Service Level is designed
to be comparable.
Distribution of zone
files is performed by the rsync application over an encrypted channel using SSH
and an encrypted private VPN to all TLD servers. Distribution via this method
uses compression and a Unix “diff” type file to decrease transfer time, and
uses MD5 to verify the integrity of the file received after the transfer
process. Multiple instances of the process will be started to update all TLD
servers within a narrow time interval. Name servers are restarted at staggered
intervals to avoid disrupting DNS service and to also ensure the proper
operation of name servers with the new zone files.
The distribution
procedure will be semi-automated and closely controlled and monitored by
operations personnel. NOC personnel
monitor the distribution process from start to finish and can intercede at any
time should a situation require the interruption of the process.
Operations personnel use an MD5 checksum application on the
final TLD zone file to verify its integrity against the reference zone file. Once
the zones are verified, the name server will be restarted. Operations personnel
will monitor the name server error log files during application restart to
verify the error-free loading of new zone files. Dynamic queries will then be run against the name server to verify
proper operation and accurate responses.
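The checksum comparison can be sketched with Python's standard hashlib module. This is a minimal illustration, assuming the reference digest is computed where the zone file was generated and conveyed to the receiving side separately; the function names are hypothetical.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def zone_matches_reference(received_path, reference_md5):
    """True if the received zone file has the same digest as the reference."""
    return md5_of_file(received_path) == reference_md5.lower()
```

Reading in chunks keeps memory use flat even for large zone files. The name server would be restarted only when `zone_matches_reference` returns True.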
Technical
characteristics, system security, accessibility.
Finance reports are used for financial
analysis of KDDISOL’s Internet domain name registration business and for
billing purposes. These reports facilitate KDDISOL’s invoice preparation and
distribution processes and aid registrars in invoice reconciliation. Finance reports are available to Registry
staff through the Registrar tool of the Shared Registration System (SRS) and
the reporting server FTP site.
Detail and summary reports are produced on a monthly basis
for billing. Only summary reports are generated for revenue analysis and made
available internally to the Finance department. Detailed reports with domain
names that meet specified criteria for registration renewals, transfers, and
deletions are distributed to each registrar.
Registrar Tool.
A Registrar will be able to check its available credit using the Registrar Tool
on the Registry’s web site.
Low Balance Emails.
Prior to beginning registrations, each registrar selects a “Low Balance
Notification Percentage” value. The Low Balance Notification Percentage
indicates at what point a registrar wishes to be notified of a low account
balance. When a registrar’s available
credit is equal to or less than its Low Balance Notification Percentage times
its total credit limit, the system sends automated email notifications to the
Registrar’s routine email address. Emails are generated at 7:00 AM and 7:00 PM JST (Japan
Standard Time).
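The notification rule above is simple arithmetic and can be sketched as follows. The function and parameter names are assumptions made for illustration; the SRS implementation is not described in this document.

```python
from decimal import Decimal


def needs_low_balance_notice(available_credit, credit_limit, notification_percentage):
    """True when available credit is at or below the registrar's chosen threshold.

    notification_percentage is the "Low Balance Notification Percentage",
    expressed as a percentage (for example Decimal("10") for 10%).
    """
    threshold = credit_limit * notification_percentage / Decimal(100)
    return available_credit <= threshold
```

For example, a registrar with a credit limit of 10,000 and a notification percentage of 10 would be notified once its available credit falls to 1,000 or less.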
D15.2.6.2
Technical Characteristics
The Registry provides billing reports to its Registrar
customers that will allow them to review and reconcile their accounts. These
reports are generated automatically and made available through a secure web
site or from a secure FTP server. The Registry also uses these reports to
prepare monthly invoices, which are currently prepared and submitted manually.
No changes will be made to the SRS for billing until volumes
increase to a point where manual processes are inadequate.
Table D15.2-3 Billing Report Summary
Generation Date | Type/Description | Audience | How Available |
Weekly | Summary revenue. Subtotals by registrar within each report | Finance | E-mail distribution |
1st of month | Summary revenue | Finance | E-mail distribution |
7th of month | Summary billing & revenue | Finance | E-mail distribution |
7th of month | Detail reports for registrations, transfers, extensions, and refund/no refund deletions for each transaction type; registrar-specific | Registrars (registrar-specific info only) | Registrar tool and FTP site |
17th of month | Auto-renewal | Finance | E-mail distribution |
Examples
of the reports to be generated for the registrars are as follows:
Table D15.2-4 Billing Report Examples
Monthly Billing Reports (Detailed and Summary as currently in SRS)
· Monthly Registration Report
· Monthly Transfer Report
· Monthly Auto-renewal Report
· Monthly Additional Years Added Report
· New Registration Deletion Report (Refund and Non-Refund)
· Auto-Renewal Deletion Report (Refund and Non-Refund)
· Additional Years Deletion Report (Refund and Non-Refund)
· Transfer Deletion Report (Refund and Non-Refund)
Revenue Reports (Monthly and weekly as currently in SRS)
· Registration Report
· Transfer Report
· Auto-Renewal Report
· Additional Years Added Report
· Auto-renewal Report
D15.2.6.3
Accessibility and Security
There are two ways to
access the registrar billing reports: through the Registrar Tool using a
browser, and by logging on to a secure FTP site and downloading the
reports. IP filtering based on source
address restricts access to the FTP server to accredited registrars, and all
logon attempts are logged and periodically checked.
Denial of Service (DoS) attacks occur
when one or more systems flood a network or individual services on that network
with disruptive traffic. These attacks may come from many source addresses–a
so-called distributed DoS (DDoS) attack–or from a single address. In either
case, recovery options are limited and involve quenching the source of the
attack either by filtering traffic at network routers or tracing the attack
back to the origin and taking the originating server(s) off the network.
It is therefore crucial to detect DoS and DDoS attacks as early as possible,
which requires an automatic detection mechanism.
KDDISOL will deploy an Intrusion Detection System (IDS) called
EMERALD, which combines rule-based and statistical intrusion detection techniques.
All logon access to the registrar billing information is
limited to specific points of contact at the registrars, who are provided
unique IDs and passwords. Any changes to registrar contacts must be authorized
and authenticated through Customer Support.
D15.2.7
Data Escrow and Backup
Frequency and procedures for backup of data. Describe
hardware and systems used, data format, identity of escrow agents, procedures
for retrieval of data/rebuild of database, etc.
The
goal of the Escrow Process is to periodically encapsulate all
Registrar-specific information into a single Escrow File and to make this file
available to a third party for escrow storage.
Existing
Daily and Weekly reports as well as a new Registrars Report will be used to
construct the Escrow File because these reports, when taken together, describe
completely the entire set of Registrars.
The
Escrow Process employs a method of encapsulation whereby the Daily, Weekly, and
Registrar reports are concatenated, compressed, signed, and digested into a
single file. The format of this encapsulation enables the single file to be
verified for Completeness, Correctness, and Integrity by a third party.
Steps of the escrow process
require that a format file be created for each report file. A “tar” utility is
used to concatenate the files into a single data file, which is then
compressed. For authentication, a digital signature is applied to the data
file. A “checksum” algorithm is then used to compute a
message digest for the digitally signed file. The message digest file is then
concatenated to the data file to create a single file suitable for escrow.
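The encapsulation steps above (concatenate, compress, sign, digest, append) can be sketched as follows. This is an illustration under stated assumptions, not the Registry's escrow format: HMAC-SHA256 stands in for a real digital signature, SHA-256 stands in for the unspecified “checksum” algorithm, and the fixed 32-byte trailers and all function names are hypothetical.

```python
import hashlib
import hmac
import io
import tarfile

TRAILER = 32  # length in bytes of an SHA-256 digest or HMAC-SHA256 tag


def build_escrow_file(reports, signing_key):
    """Package report files (name -> bytes) into a single escrow blob."""
    # Concatenate and compress the report and format files with tar + gzip.
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name in sorted(reports):
            info = tarfile.TarInfo(name=name)
            info.size = len(reports[name])
            archive.addfile(info, io.BytesIO(reports[name]))
    data = buffer.getvalue()

    # "Sign" the data file (HMAC is an assumed stand-in for a signature).
    signature = hmac.new(signing_key, data, hashlib.sha256).digest()
    signed = data + signature

    # Compute a message digest over the signed file and append it.
    return signed + hashlib.sha256(signed).digest()


def verify_escrow_file(blob, signing_key):
    """Check the digest, then the signature; return the reports or raise ValueError."""
    signed, digest = blob[:-TRAILER], blob[-TRAILER:]
    if not hmac.compare_digest(hashlib.sha256(signed).digest(), digest):
        raise ValueError("digest mismatch")
    data, signature = signed[:-TRAILER], signed[-TRAILER:]
    expected = hmac.new(signing_key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("signature mismatch")
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        return {m.name: archive.extractfile(m).read() for m in archive.getmembers()}
```

Verification reverses the steps in the opposite order, which is also how recovery from escrow would unpack the files.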
D15.2.7.3
Data Verification
The
verification process uses layers of meta-data encapsulated in the escrow file
to construct a verification report, which indicates whether an escrow file
meets the above authentication requirements.
D15.2.7.4 Data Format
Standard UNIX utilities are used
to concatenate and compress the files into a single file for more efficient
storage and recovery.
D15.2.7.5
Restoration Process from Escrow Data
If file recovery from the escrow
data is required, the tapes are retrieved from the offsite storage facility and
the escrow steps reversed to uncompress and recover the files.
D15.2.7.6
Backup Procedures
The domain name database is
backed up fully on a daily basis.
D15.2.7.7
Backup Hardware and Software
KDDISOL will use EMC and Storage Tek hardware and Veritas
software for backing up the files for escrow.
D15.2.7.8
Escrow Agent Identity
KDDISOL will engage a reliable escrow agent in
Phase 2.
If escrow data are needed, KDDISOL’s offsite storage facility is
contacted and the appropriate tape or tapes are couriered back to KDDISOL.
D15.2.8
Publicly accessible lookup/Whois Service
Address
software and hardware, connection speed, search capabilities, coordination with
other Whois systems, etc.
D15.2.8.1
Hardware and software
The Whois daemon
will run on multiple servers that are scalable with more memory, CPUs and disk
space as needed. These servers are actively/dynamically load balanced to
provide optimum response time and reliability. Each server accepts connections
from a variety of clients, and accesses a local copy of the Whois data files.
This architecture is scalable as query traffic increases by adding additional
servers and/or increasing the capacity of the existing servers.
The Whois service is implemented via two major software
components:
1. Data extraction and format applications
2. Whois
server daemon
The Whois data
extraction applications generate the Whois data files and indexes from a static
read-only portion of the Registry database. These applications will run on
servers located on the internal network of the registry and cannot be accessed
by the Internet population.
The formatted Whois
data files are then transported to the Whois server machines. All Whois servers
have the same data and will be actively load balanced. These Whois servers
handle Internet users’ queries directly after the queries pass through site load
balancing equipment.
The Whois daemon runs on each of several
servers, accepts connections from a variety of clients, and accesses a local
copy of the generated files. The daemon is configured using a configuration file
that may be edited, then re-read on the fly. This configuration file controls
much of the dynamic behavior of the daemon, including disclaimer and other
query response output, maximum load, and speculation control. The daemon may be
configured to have different properties for each of several ports, thus
allowing users of different classes to obtain different qualities of service.
The Whois daemon gives the administrator as much control
as is reasonable over the number, type, and behavior of incoming sockets. This
control does not affect the rest of
the daemon architecture (e.g., logging, error handling, searching, and state
management).
In the daemon, two fundamental objects must be configured: sockets
and behaviors. Customizing these
objects enables the Registry to tune the operation of the server to provide
almost any level of service required.
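The relationship between sockets and behaviors can be illustrated with a small configuration parser. The configuration syntax, the directive names, and the field names below are hypothetical, since the daemon's real configuration format is not given in this document. Port 43 is the standard Whois port; the second port is an arbitrary example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    disclaimer: str  # text appended to query responses
    max_load: int    # maximum concurrent queries accepted


@dataclass(frozen=True)
class Socket:
    port: int
    behavior: str    # name of the Behavior applied on this port


def parse_config(text):
    """Parse 'behavior NAME MAX_LOAD DISCLAIMER...' and 'socket PORT NAME' lines."""
    behaviors, sockets = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        kind, rest = line.split(None, 1)
        if kind == "behavior":
            name, max_load, disclaimer = rest.split(None, 2)
            behaviors[name] = Behavior(disclaimer, int(max_load))
        elif kind == "socket":
            port, name = rest.split()
            sockets.append(Socket(int(port), name))
        else:
            raise ValueError(f"unknown directive: {kind}")
    # Every socket must refer to a behavior that was actually defined.
    for sock in sockets:
        if sock.behavior not in behaviors:
            raise ValueError(f"socket {sock.port} uses undefined behavior {sock.behavior}")
    return behaviors, sockets


EXAMPLE = """
# hypothetical configuration
behavior public 100 For information purposes only.
behavior registrar 500 Registrar access.
socket 43 public
socket 4343 registrar
"""
```

Because the parser returns fresh objects on each call, re-reading an edited file on the fly amounts to calling parse_config again and swapping in the result.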
D15.2.8.2
Network Connectivity
Whois servers will be located in a segmented LAN
configuration to segregate them from other internal Registry functions for
performance and security reasons. The Whois service is supported by the same
Internet connectivity that supports the Registrar-to-Registry interaction.
Multiple connections to multiple ISPs provide the capacity and redundancy
required for high availability Whois services. See Section D15.2.10.3 for more
network connectivity details.
The Whois
implementation will use the standard Whois server application used by the
Internet population. This application can be used to look up records in the
registry database (via the Whois data files) to provide information about
domains, nameservers, and registrars. Searches for text strings embedded in
domain information fields will be supported to the extent permitted by current standard
Whois server implementations.
In the future, KDDISOL will require registrars, as part of their contracts, to include
new information such as a more detailed company profile in the records of the registrar
database. KDDISOL will also develop new
Whois client software that is upward compatible with existing software,
so that registrants can obtain more useful information from the
registrar database. The stability of the Internet will be maintained because no modification
to the registry database is required.
D15.2.8.4
Coordination with Other Whois systems
Referral Whois (RWhois) can be
implemented in a controlled, test-bed fashion if interaction with other
registrars’ or registries’ Whois services is required. This service is not currently supported at the Registry.
Technical
and physical capabilities and procedures to prevent system hacks, break-ins,
data tampering, and other disruptions to operations. Physical security.
D15.2.9.1
Registry System/Network Security
The Registry will be connected to the
Internet via two border routers and multiple DS3 connections for diversity.
Border routers will use Access Control Lists (ACLs) to control access from the
Internet. RRP Application Gateway, Whois, and web servers will reside behind
the border routers but outside the firewalls, and have access to them
controlled by destination IP address and port number. Access to the application
Gateway is also filtered by source address block, ensuring that no one other
than the accredited registrars will gain access. One of the TLD servers will
also reside on this network and be accessible from the Internet to answer
queries.
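Filtering by source address block, as described above for the Application Gateway, can be sketched as follows. This is only an illustration of the check; in practice it would be enforced by router ACLs rather than application code. The function name is an assumption, and the address blocks in the usage note are documentation ranges, not real registrar networks.

```python
import ipaddress


def is_allowed(source_ip, allowed_blocks):
    """True when source_ip falls inside one of the accredited registrars' blocks."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(block) for block in allowed_blocks)
```

With allowed_blocks set to ["192.0.2.0/24", "198.51.100.0/24"], a connection from 192.0.2.17 is accepted and one from 203.0.113.5 is rejected.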
The Application Gateway servers will be
configured with internal and external interfaces, each assigned to a different
subnet. External interfaces will receive queries and registration requests from
the Internet, whereas the internal interface will be used for communicating to
the application and database servers. Acting as a proxy, the Application
Gateway will accept and pass query requests and registration information
through the firewall to the application server, thereby eliminating direct
registrar access to the backend servers. This approach provides superior security
from hackers or other Internet based threats.
Firewalls will be used to secure the
internal network and the application and database servers. The firewall will be
configured with rules to allow only data traffic between the Application
Gateway on the external network and the application and the database servers on
the internal network. Additional rules will allow the Registry’s internal
management systems to access the servers for monitoring purposes and to refresh
files as necessary.
Changes to the ACLs and firewall rules
are tightly managed by Operations, who use structured change management
techniques to oversee changes when registrars are added or deleted, or other
changes are made. The Registry utilizes security scanning software to
constantly monitor its network for security leaks, and has contracted with an
outside firm to run “friendly” scans against the network at least twice a
year. Results of the scan are promptly
reported to Registry Operations.
Even with these standard
safeguards, a determined attacker may bypass the defenses. Perimeter security also weakens
as the registry business opens its networks to registrars and Internet users. The need to secure core assets therefore calls for an
Intrusion Detection System (IDS) to monitor and respond to misuse. As shown in Figure D15.2-3, we
chose EMERALD as the Registry IDS. EMERALD, developed by SRI International, is a comprehensive,
highly scalable open system, and was reported by MIT Lincoln Laboratory on 13
Dec. 1999 as having the highest overall
performance in its intrusion detection evaluation program.
Figure D15.2-3 Registry Business Requires Distributed Monitoring
EMERALD consists of monitors
for host computers and network traffic together with rule-based and statistical analysis
engines. It has a three-tier system architecture that separates data collection
from analysis and reporting. This highly
customizable platform allows rapid addition of new monitors, analysis engines,
correlators, and reports for new intrusion threats during the course of
Registry business. In rule-based intrusion detection, a
stream of events is mapped against an abstract representation (i.e., “rules”)
of target activity, and an expert system characterizes known attacks and
vulnerabilities. The analysis engine sends
an alarm when matches are identified.
Because Internet technology is evolving rapidly, many new attacks continue to
appear. IDSs other than
EMERALD are weak against these new or unknown attacks and cannot be used for the
Registry service. EMERALD’s
advantage is that it offers Statistical Anomaly Detection for
these unknown attacks. As shown in Figure D15.2-4, it builds a profile of ‘normal’ activity (individual
user profiling, client-server session analysis, network traffic profiling), compares
short- and long-term activity patterns, and raises an alarm when use departs from
established patterns.
Figure D15.2-4 Statistical Anomaly Detection Required for
New Hackers
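The general idea of statistical anomaly detection, comparing current activity against a profile of normal activity, can be shown with a simple z-score test. This is a generic sketch and not EMERALD's algorithm, which is not described in this document; the function name, the threshold of three standard deviations, and the sample figures are assumptions.

```python
import statistics


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` when it departs from the profile built from `history`.

    history: past measurements (e.g., queries per minute) forming the
    long-term profile. current: the latest short-term measurement.
    """
    if len(history) < 2:
        return False  # not enough data to build a profile
    mean = statistics.fmean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        return current != mean
    return abs(current - mean) / spread > threshold
```

With a history of [100, 102, 98, 101, 99, 100, 103, 97] (mean 100, sample standard deviation 2), a current value of 101 is treated as normal and a value of 150 raises an alarm.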
Security Breach Recovery
A
security breach occurs when one or more systems are accessed (and potentially
modified) by unauthorized personnel. Often such breaches occur via a network
connection. Recovery from security breaches is straightforward, but is often
time-consuming, and potentially disruptive to the services hosted on the affected
systems. Certain security breaches may disable a service, for example
Registration, for the duration of the recovery and cleanup activities.
Following is a summary of the steps involved in recovering from a security
breach:
1. Using IDS (Intrusion Detection System) and its logging data,
identify affected systems and remove them from the network to prevent further
damage
Physical security for the Registry is of paramount
importance based on the value of the services provided to the Internet
community. In this regard, the following precautions will be enabled:
Base Building
Physical Security
D15.2.9.3
Others
The Registry must be
operated according to well-documented principles for information and physical
security, implemented by adequately trained personnel. It will be a paramount
target of sophisticated hackers worldwide, motivated by curiosity, malice, or
greed. It therefore must incorporate the most robust information assurance
technology to protect the database and other servers from corruption, preclude
theft of private information by unauthorized third parties, and resist external
denial-of-service attacks.
The physical Registry
system must be secured against intrusion and protected against normal vicissitudes
of operation that might compromise operational security. The Shared Registration System (SRS) is the
Registry architecture and processes used to enable registrations by multiple
registrars. It includes the Registry
Registrar Protocol (RRP), which is used to support communications between the
Registry and Registrars, and provides the security and authentication functions
to protect the Registry database while supporting all necessary registrar
operations. RRP is also used during the
certification process for accredited registrars for operational testing and
evaluation of registrar implementations of the RRP prior to commencement of
actual registrar operations. KDDISOL
will be responsible for providing the RRP software interfaces, documentation,
and training to accredited registrars for the new TLD. Hands-on technical support to new registrars
should be available from KDDISOL to assist them in resolving difficulties in
successful interfacing with the Registry.
Personnel responsible
for software implementation and hardware operation must be screened carefully
to eliminate potential internal security risks. KDDISOL Registry provides a
web-based maintenance tool, Registry CSR Tool, for domain updates and
administrative functions. This site is
password protected to maintain security for individual Registrar information. Through this site, the Registrar will have
access to daily and weekly reports, billing information, and the ability to
update administrative and domain information.
To assist in providing quality support, KDDISOL CS will have access to
view and update individual Registrar administrative and billing information
through this web-based tool. KDDISOL CS
will also have the ability to update domain information for the Registrar and
view Registrar reports.
In addition, to ensure the integrity of remote distribution
of the Registry products, the Registry must have 100 percent control of the
remote distribution services, that is, the TLD servers.
Technical
capability for handling a larger-than-projected demand for registration or
load. Effects on load on servers, databases, back-up systems, support systems,
escrow systems, maintenance, personnel.
D15.2.10.1 Average System Capacities
D15.2.10.2 Peak System Capacities
Peak system
capacities are dependent on equipment configurations. KDDISOL is designing the
new TLD registry infrastructure to accommodate numbers and growth rates similar
to .com. As of June 2000, the subcontractor was processing over 20 million
transactions a day and had over 19 million domain names. Individual system capacities, including those of the
escrow and backup systems, are scalable as needs require. In addition, the
registry systems are designed to be expanded by adding systems and
load balancing between them. By expanding horizontally with additional
systems as well as vertically with additional processors, memory, and disk
space, there is considerable growth potential. The Oracle database supports
significantly more records than required even for the largest domain.
D15.2.10.3 Network Capacities
Phase 1
In Phase 1, KDDISOL
will subcontract with the VeriSign Global Registry. The subcontractor designed
and constructed its registry network to deliver exceptional availability,
performance, scalability, security, and maintainability. In terms of bandwidth
and connectivity the registry supports four DS3 connections to the Internet
from four different major ISPs. The border routers pass up to 1 million packets
per second to and from the Internet. KDDISOL and the subcontractor monitor the
circuits constantly for utilization and upgrade the circuits when they reach
50% average utilization.
Phase 2
KDDISOL designs and
constructs its network to deliver exceptional availability, performance,
scalability, security, and maintainability by connecting to KDDI’s public
Internet service at sufficient bandwidth. In terms of bandwidth and
connectivity at the registry’s main location (Otemachi, Tokyo), KDD supports about 2.4 Gbps of IX and direct
peering capacity to over 50 ISPs in Japan and has approximately 1.7 Gbps of connections to
the US and 330 Mbps to Asia. Each border router passes at least 1 million packets per
second to and from the Internet. KDDISOL and KDD monitor the circuits
constantly for utilization and upgrade the circuits when they reach 50%
average utilization.
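The 50% upgrade rule used in both phases can be expressed as a short check. This is a sketch only: the function names and sample values are assumptions, and the way utilization samples are collected is not specified in this document. A DS3 circuit carries 44.736 Mbps.

```python
DS3_BPS = 44_736_000  # line rate of one DS3 circuit in bits per second


def average_utilization(samples_bps, capacity_bps):
    """Mean of the traffic samples as a fraction of circuit capacity."""
    if not samples_bps:
        raise ValueError("at least one sample is required")
    return sum(samples_bps) / (len(samples_bps) * capacity_bps)


def needs_upgrade(samples_bps, capacity_bps, threshold=0.50):
    """True when average utilization has reached the upgrade threshold."""
    return average_utilization(samples_bps, capacity_bps) >= threshold
```

For instance, samples averaging 23 Mbps on a DS3 are about 51% utilization and would trigger an upgrade, while samples averaging 12.5 Mbps (about 28%) would not.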
Future upgrades to the registry production network will
include increasing the size of the circuits to the Internet and replacing fast
Ethernet links with gigabit Ethernet links.
The
TLD configurations are also designed to scale in the same manner as the size of
the zones and the number of queries increase.
D15.2.10.5 Personnel
KDDISOL operates on a 24x7x365 basis with a full complement
of support staff for supporting the registry, back office, and TLD
infrastructures. In critical situations, all the technical staff can be
contacted via pagers or cell phones. Sufficient personnel are available to
monitor and maintain current systems, troubleshoot, and develop additional
features to the registry infrastructure. Each server’s location is a major
technology center of each area with access to a deep pool of engineering and
operations talent.
Define, analyze, and quantify quality
of service.
D15.2.11.1 System Reliability,
Availability, Serviceability
The Registry system
is designed to be highly reliable with state-of-the-practice architectural
elements and operational procedures applied throughout. Using elements such as
component redundancy, load balancing, high-availability (HA) configurations,
hot spares, aggressive vendor maintenance contracts, and optionally, multi-site
operations, the Registry will be able to ensure the uninterrupted availability
of Registry services. The Registry will be designed to meet the following
goals:
· Provide uninterrupted service redundancy to mitigate the risk of most
system failures
In
addition to the core Registry infrastructure, the TLD name servers are to be distributed
in multiple locations throughout the world. Although each TLD site depends on
the facility where it resides, the TLD system, as a whole, will not depend on
the Registry site except for updated zone files. Even with a loss of the
Registry, the global TLD servers will continue to provide basic Domain Name
Resolution Service within current zones.
The
Registry will use the Business Continuity Volume (BCV) software feature of the
EMC Symmetrix array to periodically perform backups, ad hoc and regularly
scheduled reporting, and corruption detection. Backups and restores are
performed using the EMC EDM backup product; complete images of the
Oracle database are written to tape on a daily basis. Both ad-hoc and regularly
scheduled reports are constructed from a physically separate reporting server
connected to the Symmetrix array using BCV technology for the daily Oracle
database image. Exhaustive Oracle block level corruption detection and
application-level data scrubbing are performed on the BCV image so operations
personnel can detect corruption, determine actionable root cause of failure,
and implement solution alternatives early in the process. Both the primary and
secondary sites have equal and compatible backup and restore technology.
The Registry will provide a variety of tools to support the
system. For problems that occur within the normal operation of the system
(e.g., Customer Service requests), a web-based tool is available that allows
for a variety of domain operations to be performed. For troubleshooting of
system problems, a Registry Diagnostic Tool will be used which interrogates
each of the system components to verify their proper functioning. This
includes:
D15.2.11.4
Processes and Procedures
KDDISOL will document
and use standard operating procedures (SOPs) in running the registry. Each step
in the process of registering domain names, generating zone files, distributing
zone files, and maintaining the backend infrastructure will be tested in an
isolated Quality Assurance (QA) environment before being released. The QA
environment will be designed to closely emulate the operational environment,
and QA Engineers stress test hardware, software, and processes and procedures
to ensure they will integrate cleanly and not be the cause of an interruption
of service. The results of the tests are thoroughly documented and test results
are reported back to Engineering and Operations. This process is a closed loop
process; any problems encountered during testing are fed back through the
process, corrected, and retested.
For the most part,
registry processes will be automated. Where operations intervention is
required, there will be strict guidelines and checklists to ensure that all
steps process correctly. The RCC monitors all the processes on a 24x7x365
basis. When a problem occurs, the RCC staff follows pre-defined procedures to
identify and resolve the problem. If
the problem cannot be quickly resolved, there will be an aggressive escalation
path to quickly involve the appropriate technical management and staff.
Registrars will be required to be accredited by ICANN. Once
accredited, they must pass certification by KDDISOL to begin registering
domain names. This process is an essential ingredient ensuring that registrars
will not face complications when beginning to register domain names in
production mode. To assist when needed, there will be CSRs available on a
24x7x365 basis to answer questions and provide transactional assistance when
required.
We will use change
management systems and processes in both Engineering and Operations departments
to keep KDDISOL’s Registry systems in operation. This includes periodic
planned outages to perform maintenance on the registry systems. As indicated
above, integrating changes into the registry requires passing a rigorous
testing and evaluation stage before being allowed.
KDDISOL will employ technical project managers to plan and
track execution of changes made to the Registry. They will conduct a risk
analysis of any proposed change, and ensure that all affected parties are
involved in any change.
D15.2.11.6
Service Level Agreement (SLA) Summary
The Registry will provide
a world-class level of service to its customers. A Service Level Agreement will
be used to provide metrics and remedies to measure performance of the Registry
and to provide accredited and licensed Registrars with credits for certain
substandard performance by the Registry coupled with a yet to be defined
Registrar License and Agreement.
Shared Registration
System ("SRS") Availability shall mean when the SRS is operational.
By definition, this does not include Planned Outages or Extended Planned
Outages. Planned outage shall mean the periodic pre-announced occurrences when
the SRS will be taken out of service for maintenance or care. The Registry will
achieve 99.4% or better availability for the SRS system.
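To put the 99.4% figure in concrete terms, the unplanned downtime it permits can be computed directly. The function name is an assumption, and the calculation treats the whole month as the measurement base, which the final SLA may define differently (planned outages are excluded from availability by the definition above).

```python
def allowed_downtime_minutes(availability_percent, days_in_month):
    """Minutes of unavailability permitted in a month at the given availability."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100 - availability_percent) / 100
```

At 99.4% availability, a 30-day month (43,200 minutes) allows roughly 259 minutes, or about 4.3 hours, of unplanned unavailability.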
Unplanned outages are
generally defined as the amount of time recorded between a trouble ticket first
being opened by the Registry in response to a Registrar’s claim of SRS
unavailability for that Registrar through the time when the Registrar and
Registry agree the SRS Unavailability has been resolved with a final fix or a
temporary work around, and the trouble ticket has been closed. Unplanned
outages are also defined as any time that exceeds the planned outage time or
the planned outage time interval.
SRS Unavailability
shall mean when, as a result of a failure of systems within the Registry’s
control, the Registrar is unable to either:
a) Establish a session with the SRS gateway which shall be
defined as:
b) Execute a 3 second
average round trip for 95% of the RRP check domain commands and/or less than 5
second average round trip for 95% of the RRP add domain commands, from the SRS
Gateway, through the SRS system, back to the SRS Gateway as measured during
each monthly Timeframe.
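One possible reading of criterion b) is that the average round trip of the fastest 95% of commands in a month must stay within the stated limit. The sketch below implements that reading only; the interpretation, the function name, and the use of an inclusive limit are all assumptions, and the negotiated SLA would govern the actual measurement.

```python
def meets_round_trip_target(round_trips_s, limit_s, percent=95):
    """True when the mean of the fastest `percent` of round trips is within limit_s.

    round_trips_s: measured round-trip times in seconds for one command type
    (for example RRP check domain) over the monthly timeframe.
    """
    if not round_trips_s:
        raise ValueError("no measurements for this timeframe")
    keep = max(1, len(round_trips_s) * percent // 100)
    fastest = sorted(round_trips_s)[:keep]
    return sum(fastest) / keep <= limit_s
```

Under this reading, a single very slow command among twenty does not by itself breach the 3 second check domain target, because it falls in the excluded 5%.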
The Whois service
will be updated once a day and availability will be equal to or better than that
defined for the SRS system.
TLD servers will be
updated a minimum of once a day and the collection of servers as a whole will
provide 100% query service availability to the Internet population. The TLD servers’
geographic and network diversity ensures that multiple servers will be
operating at any given time.
If any service levels
are not met during a defined interval (e.g. Month), a credit based on the
volume of add domain transactions will be given to the affected registrar(s).
The maximum credit provided will be limited to 5% or 10% depending on the
metric that was exceeded or not met.
A
specific SLA will be negotiated after contract award.
D15.2.12
System Outage Prevention
Procedures for problem detection, redundancy of all systems,
back up power supply, facility security, technical security, availability of
back up software, operating system, and hardware, system monitoring, technical
maintenance staff, server locations.
Although
high-availability features will be designed into all the registry systems and
services, efforts will be concentrated on making core services “bullet-proof”.
These core services include those that are required for the smooth operation of
the Internet and are immediately evident to the Internet community in the event
of a failure. These core services include:
Other services that are important to the operation of the
Registry, but whose failure or degradation would not affect operation of the
Internet include:
The Registry intends
to employ IBM and Sun UNIX systems in high-availability configurations to
ensure no single point of failure. In addition, we expect to use offsite tape
storage and an offsite disaster recovery facility that will be constantly
updated with current information. Such a site would be utilized during full
outage and some partial outage scenarios. See Section D15.2.1 for more system
information, and Section D15.2.13 for more fail over information.
Note: Not all
registry services include secondary facility support.
D15.2.12.2
TLD Systems and Constellation
The TLD
configurations are designed so there are no single points of failure. This is
accomplished through the use of redundant components, both at the system and
component level. For example, multiple switches and load balancing devices will
back one another up in the event one fails, and the devices will be configured
with dual power supplies when available. Configurations are designed so that
when a failure is detected, the service will fail over to the backup systems.
High-availability operational procedures as established in RFC 2870, “Root Name
Server Operational Requirements”, will be used as guidelines for building and
maintaining the name servers.
There will initially be three
geographically distributed TLD name servers to support the new TLD in Phase 2.
Up to six more servers will then be added over the following five years, bringing the
total to as many as nine. These name servers will be strategically placed at topological
cores of the Internet, that is, the areas that serve the greatest number of hosts and
users. In addition to topological diversity, there will be geographic diversity to ensure
that manmade or natural disasters in a single region will not affect the
ability of the remaining servers to answer queries. It is anticipated that the
name servers will be placed in the following locations:
1. Tokyo, Japan
TLD query rates will be constantly
monitored, and the TLD name servers re-deployed as necessary to best serve the
needs of the Internet users of the new TLD.
The
DNS software is also designed to handle a failure of one or more name servers,
so a failure of one or more servers in the constellation will not materially
affect TLD resolution services.
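The behavior described above, in which resolution continues when one or more name servers fail, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Registry's DNS software: the function name `resolve_with_failover`, the `query` callback, and the server names in the usage note are hypothetical.

```python
from typing import Callable, Iterable, Optional


def resolve_with_failover(
    name: str,
    servers: Iterable[str],
    query: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Ask each name server in turn and return the first answer received.

    `query(server, name)` is a caller-supplied (hypothetical) function that
    returns an answer string, returns None for no answer, or raises OSError
    when the server cannot be reached.
    """
    for server in servers:
        try:
            answer = query(server, name)
        except OSError:
            continue  # this server is down or unreachable; try the next one
        if answer is not None:
            return answer
    return None  # every server in the constellation failed
```

A caller would pass the list of TLD name servers and a real query function; as long as one server in the list answers, the lookup succeeds.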
D15.2.12.3
Network Architecture
The network infrastructure is designed
with redundant devices, multiple physical routes and physical diversity. The
objective is to isolate single-point failures with no interruption of services
or degradation in performance. In most cases, isolation of failures is
automatic and occurs within a few seconds of the event. It would take a minimum
of two simultaneous network-component failures to disable the network
infrastructure. Certain component failures (such as firewall failure) may
require manual intervention to complete the fail-over.
Internet connectivity is enabled
through KDD’s own backbone as well as through peering and transit relationships
with multiple ISPs. A failure of the KDD backbone or another ISP’s network will
not disable access by registrars.
KDDISOL will utilize a range of standard and custom enterprise systems
management tools to monitor and manage the registry production systems and the
globally dispersed TLD constellation. These tools are used both by the Network
Operations Center and the Registry Operations staff for system and network
monitoring. A brief description of each tool and its use is outlined below.
WebNM is an SNMP-based monitoring tool used to monitor system attributes such as:
Tool features will include monitoring
real-time system availability for servers and network devices, an interactive
web interface, and graphical displays of historical performance data.
Thresholds can be set from which alarms are generated and forwarded to the RCC.
Concorde SystemEdge is an agent-based monitoring tool that uses SNMP to monitor
system-specific attributes, including:
This tool’s features will also include an integrated alert manager, an
interactive web interface, system
self-monitoring, and logfile monitoring. Thresholds can be set from which
alarms are generated and forwarded to the RCC.
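The threshold-based alarming mentioned for both monitoring tools can be sketched as below. This is only an illustration: the `Threshold` type, the metric names, and the alarm text format are assumptions, and the actual tools' configuration and forwarding to the RCC are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str   # hypothetical metric name, e.g. "cpu_percent"
    limit: float  # an alarm is raised when the sampled value exceeds this


def check_thresholds(samples: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return one alarm message per metric whose sample exceeds its limit."""
    alarms = []
    for threshold in thresholds:
        value = samples.get(threshold.metric)
        if value is not None and value > threshold.limit:
            alarms.append(
                f"ALARM {threshold.metric}={value} exceeds {threshold.limit}"
            )
    return alarms
```

In this sketch a metric with no sample is simply skipped; a production monitor would more likely treat a missing sample as an alarm in its own right.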
A DNS Remote Real Time Monitor will be used to monitor the real-time traffic flow
of root and TLD DNS servers. It monitors the following attributes:
§ Response time of last DNS query
§ Real-world query to server and compare to expected result
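The two attributes listed above, response time and agreement with an expected result, can be measured with a probe along these lines. This is a hedged sketch: `probe`, `ProbeResult`, and the `query` callback are hypothetical names, and the real monitor's query mechanism is not described in this document.

```python
import time
from typing import Callable, NamedTuple


class ProbeResult(NamedTuple):
    elapsed_ms: float       # response time of the query just issued
    matches_expected: bool  # True when the answer equals the expected result


def probe(query: Callable[[str], str], name: str, expected: str) -> ProbeResult:
    """Time one query and compare its answer with the expected result.

    `query` is a caller-supplied (hypothetical) function that sends a real
    query for `name` to the monitored server and returns the answer.
    """
    start = time.perf_counter()
    answer = query(name)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return ProbeResult(elapsed_ms, answer == expected)
```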
EMERALD is an intrusion monitor that shows the real-time results of IDS
operations over the target Registry system (locations of intrusion, methods of
attack, internal network traffic trends, etc.).
TeamQuest is a performance analysis, diagnostic, management and modeling product
suite. It incorporates highly detailed operating system statistics, process
accounting, custom data, and RDBMS performance data, including:
§ Identification of server problems
§ Drill-down investigation of events, alarms, and unusual system behavior
§ Root cause analysis of system performance issues
§ Trend analysis
§ Correlation of cause and effect
§ Compliance with service level objectives
§ Understanding the impact of substantial changes or new applications
§ Modeling (Analytical Queuing Analysis or Discrete Event Simulation)
When problems are either reported to or observed by the NOC,
the NOC staff will open a trouble ticket and perform preliminary analysis to
determine the severity, diagnose the root cause and correct the problem if
possible. Problems are assigned one of the following categories:
· Severity 1 – service outage; severe or potentially severe
impact
· Severity 2 – service degradation; impact is not severe
· Severity 3 – component outage; redundant components or
workarounds prevent any service impact.
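The three severity categories can be expressed as a simple classification. The sketch below is illustrative only: the `Severity` enum and the two boolean inputs are assumptions about how a trouble-ticket system might encode the categories listed above.

```python
from enum import IntEnum


class Severity(IntEnum):
    SERVICE_OUTAGE = 1       # severe or potentially severe impact
    SERVICE_DEGRADATION = 2  # impact is not severe
    COMPONENT_OUTAGE = 3     # redundancy or workarounds prevent service impact


def classify(service_down: bool, service_degraded: bool) -> Severity:
    """Map the observed service impact to one of the three categories."""
    if service_down:
        return Severity.SERVICE_OUTAGE
    if service_degraded:
        return Severity.SERVICE_DEGRADATION
    return Severity.COMPONENT_OUTAGE
```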
If the remote Registry NOC cannot resolve the problem, it
will immediately escalate through the KDDISOL NOC to either the on-call System
Administrator (SA) or on-call DNS engineer in KDDISOL Technical Operations
(depending on the nature of the problem). In the unlikely event that the
problem cannot be resolved at this level, the problem is escalated to KDDISOL
Engineering. A workaround may be provided until the issue is resolved. The
KDDISOL NOC will update the remote Registry via phone or email on a
periodic basis until the problem is resolved.
Monitoring of the remote Registry will also be conducted
from the KDDISOL NOC. Any detected problems at the NOC will be communicated to
the new Registry NOC for resolution. If the problem cannot be resolved locally,
the problem will be escalated through the NOC as described above.
Phase 1: VeriSign Global Registry
Production Data Center
The VeriSign Global
Registry production data center is
protected by onsite security staff 24x7x365 and the use of card readers. Only employees are permitted unescorted
access to the building. Additionally, the data center room is further
restricted (via card readers) to only those employees who perform hardware
installations or maintenance. Between the hours of 7pm and 7am all card access
is disabled, and anyone requiring access to the data center must obtain a
special entry badge from the Network Operations Center.
Phase 2: KDD Otemachi Data Center
The KDD Otemachi data center is protected by onsite security staff 24x7x365
and the use of card
readers. Only employees are permitted unescorted access to the building.
Additionally, the data center room is further restricted (via card readers) to
only those employees who perform hardware installations or maintenance.
Remote Sites
All remote sites
provide 24x7x365 onsite security that meets or exceeds the security at KDDISOL.
KDDISOL equipment is contained in locked cabinets and, in some cases, locked
cages. Most sites also provide separate data center rooms with limited access
to each room.
Please refer to Section D15.2.11
VeriSign Global Registry is located in a new
state-of-the-art facility in Dulles, Virginia.
The 10,600 square foot data center will house primary Registry systems
and personnel engaged in Phase 1 activities.
Please refer to Section D15.2.1.7 for more primary site details.
The secondary data
center is located at a facility in suburban Maryland that provides secondary
site support services. There are
multiple high-speed direct connections to this site from the VeriSign Global
Registry Production Data Center to facilitate backup and fail-over
scenarios. The facility is supported by
n+1 power and cooling, and is staffed 24x7x365.
KDDISOL’s Registry will be located in the KDD Otemachi Building
in Otemachi, Tokyo. The 66,000 square foot data center will house primary
Registry systems and personnel. Please refer to Section D15.2.1.7 for more
primary site details.
D15.2.12.9
Natural and Man-Made Disaster Impact and Fire Suppression
VeriSign Global Registry Production
Data Center.
This data center, located in northern Virginia, is not in an earthquake zone
and therefore does not need protection against earthquakes. It does provide
protection from
flooding, but only limited protection from other natural disasters. Fire
suppression is provided by an FM200 system that is smoke activated. As a backup,
a heat-activated water sprinkler system will engage sprinkler heads
individually.
VeriSign Global Registry Secondary Data
Center
Same as above
except that protection from all natural disasters is provided in a structurally
reinforced facility.
KDD Otemachi Data
Center
This data center is protected against earthquakes and has a heat-activated,
non-water-based fire suppression system using halon gas. It does provide
protection from flooding, but only limited protection from other natural
disasters.
Remote Sites.
Some remote sites provide for earthquake “hardening”
depending on specific location. All the
sites are in data collocation centers that are designed to withstand natural
disasters endemic to the respective area. The sites all have fire suppression
systems similar to that employed in the KDD data center, with a non-water-based
system.
Redundant UPS
units protect the data center. Additional redundant power features include:
Heating, ventilating and cooling (HVAC) units are air cooled, so no cooling
water pipes are located within any of the aforementioned data centers.
Additionally, the current HVAC units provide sufficient redundancy that some
of them could fail and the remaining units would still maintain the data
center within designed tolerances.
WAN connectivity has been designed with physical and logical diversity as a
design goal. Tier-1 Internet Service Providers have been selected to guarantee
network and routing diversity in case one or two carriers experience problems.
Physical diversity is realized by working with the local access provider(s) to
ensure that diverse physical routing of circuits is used where possible. At
Otemachi, Tokyo, the main registry location in Phase 2, KDDISOL uses KDD’s
public Internet network, which has sufficient diversity. KDD supports about
2.4 Gbps of IX and direct peering connectivity to over 50 ISPs in Japan, and
has approximately 1.7 Gbps of connections to the US and 330 Mbps to Asia.
Local
Area Network diversity is enabled through diverse pathing and employing routing
and switching configurations that automatically detect failures and re-route
packets transparently. The network is designed to exclude any single point of
failure.
D15.2.12.11
Technical Maintenance Staff
KDDISOL technical maintenance staff check logs from all critical servers and
routers several times a day via an automatic error log monitoring system, and
proactively look for symptoms of critical failures, system intrusions, and so
forth. Staff continuously follow the activities of security advisory bodies
such as CERT (Computer Emergency Response Team) and JPCERT. If these bodies
announce a problem that threatens our systems, KDDISOL remedies it as soon as
possible.
D15.2.13
System Recovery Procedures
Procedures for restoring the system to operation in the
event of a system outage, both expected and unexpected. Identify
redundant/diverse systems for providing service in the event of an outage and
describe the process for recovery from various types of failures, the training
of technical staff who will perform these tasks, the availability and backup of
software and operating systems needed to restore the system to operation, the
availability of the hardware needed to restore and run the system, backup
electrical power systems, the projected time for restoring the system, the
procedures for testing the process of restoring the system to operation in the
event of an outage, the documentation kept on system outages and on potential
system problems that could result in outages.
As described in the System Reliability section of this document, the Registry will employ
infrastructure and operational processes to mitigate the possibility of a
crippling failure. However, there also
are a variety of methods available to handle various system problems that might
occur.
Business continuity and reliability are not aftermarket products. They are
designed into services and systems from the outset. The Registry’s application
of business continuity design elements, coupled with rigorous test and
validation procedures, ensures that the critical services provided by the
Registry, and the systems that support them, are sufficiently robust to
mitigate the risk of potential business interruptions.
To support the scope
of this section, Registry Services are separated into Critical Services and
Non-critical Support Functions. The Registry Critical services are those
required for the smooth operation of the Internet. They include:
• Domain Name Resolution Service
• Registration Service
• Whois Service
• Customer Service
Critical Services are defined as those services that directly
support registrars and DNS resolution
services available to all Internet users at large. Non-critical Support
Functions are other processes for which the external impact of an outage would
be minor or nonexistent.
Two types of failures can affect the provision of DNS services to the Internet
at large:
1. Zone file generation failure
2. TLD server failure
(1) Zone File Generation Failure
A full fail-over means all processes are manually shifted in
a controlled manner to operate on the
secondary site. During a full fail-over, any zone-generation processes running
at the primary site may be terminated (as necessary) to allow for the secondary
site to take over these functions. Any zone files currently under construction
are treated as unreliable and are discarded. If fail-over to the secondary site
occurs while the zone-generation process is not running, no steps are necessary
for the fail-over to occur.
A partial fail-over
means all processes are shifted to operate on the secondary site in an uncontrolled manner. During a partial
fail-over, terminating zone-generation processes running at the primary site
may or may not be necessary.
If the zone-generation process is not running at the time at
which fail-over to the secondary site occurs, no steps are necessary to
fail-over zone generation. If, however,
the fail-over occurred during zone file distribution, then the administrator
will execute procedures to initiate the file distribution process to the sites
affected.
Through the use of
the Business Rules Engine in the Registry systems, data is validated before it
is placed in the Registry database. If the data in the database has been
corrupted, then the administrator will perform database cleansing procedures.
In addition, an attempt would be made to determine if the corrupted data has
been propagated to the TLD servers. If it has, the administrator will follow
procedures for reverting the TLD servers to a previous copy of the affected
zone file(s).
Zone files are distributed within and outside the Registry
system and their contents are validated at each step. If the validation ever
disagrees with the master copy, then the replication is considered to have
failed and the flawed copies are destroyed. If a host intrusion on the zone
file tagging area or any of the root and TLD servers is detected, then the
one(s) on the affected host(s) should be compared with the master copies on the
zone generation machine inside the Registry firewall. Standard Operating
Procedures regarding the rollback of corrupt zone files on a root or TLD server
should be followed to repair the damage.
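The comparison of distributed zone files against the master copy can be sketched as below. The document does not say how the validation is performed; comparing SHA-256 digests is an assumption made for illustration, and the names `sha256_hex` and `find_flawed_copies` are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a zone file's contents as hex."""
    return hashlib.sha256(data).hexdigest()


def find_flawed_copies(master: bytes, copies: dict[str, bytes]) -> list[str]:
    """Return the hosts whose zone file copy disagrees with the master copy.

    `copies` maps a host name to the zone file bytes found on that host.
    Copies on the returned hosts would be discarded and replaced.
    """
    expected = sha256_hex(master)
    return sorted(
        host for host, data in copies.items() if sha256_hex(data) != expected
    )
```

The same check serves both cases in the text: validating each replication step, and comparing a possibly compromised host against the master copy held inside the Registry firewall.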
(2) TLD Server Failure
Various components at the TLD locations are configured in a
high availability configuration. Should a redundant component fail, the
“backup” component is designed to take over automatically. If a specific hardware
component is not redundant, NOC personnel will work with onsite personnel to
isolate the problem. Once the failed component is found, NOC personnel will
initiate procedures to replace the defective component. Due to the “load
balancing” nature of the DNS protocol, any single TLD failure is dynamically
accommodated by standard DNS processes and a different TLD server would be
utilized.
The Registry NOC will
constantly monitor the health of the TLD constellation to maintain performance
and availability goals. Once an anomaly is detected by NOC management systems,
troubleshooting procedures will be initiated by NOC personnel to isolate the
problem. Name server log files on the TLD server and archived log files will be
reviewed to determine the nature of the problem.
Once the problem is corrected, log files are reviewed and
queries are performed against the server to verify proper operation. Depending
on the size of the zone files being used, the name server application will
resume operation within 2 to 20 minutes after a restart of the application has
been initiated.
Extensive procedures have been developed to ensure that zone data files located
on the TLD servers are error-free. Situations may nevertheless occur in which
one or more zone files resident on a TLD server are corrupted, accidentally or
intentionally. Once a determination is made that a current zone file is
corrupt, NOC processes will be executed to restart the name server application
using local copies of previously used zone files. The local versions are
created automatically and stored in an archive area on the local hard disk
each time a new version of the zone files is loaded.
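The archive-and-restore behavior described above can be modeled in a few lines. This is a sketch under assumptions: the `ZoneArchive` class is hypothetical and keeps versions in memory, whereas the document describes an archive area on the local hard disk.

```python
from typing import List


class ZoneArchive:
    """Keeps every loaded zone file so an earlier version can be restored."""

    def __init__(self) -> None:
        self._versions: List[bytes] = []

    def load(self, zone_data: bytes) -> None:
        """Record a newly loaded zone file; older versions stay archived."""
        self._versions.append(zone_data)

    def current(self) -> bytes:
        """Return the zone file currently in use."""
        return self._versions[-1]

    def roll_back(self) -> bytes:
        """Discard the current (corrupt) version and return the previous one."""
        if len(self._versions) < 2:
            raise LookupError("no previously used zone file is archived")
        self._versions.pop()
        return self._versions[-1]
```

After `roll_back()`, the name server application would be restarted with the returned zone data.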
The registration service is primarily supported by the Shared Registration System
(SRS), which consists of a protocol and the associated hardware and software
that permits multiple registrars to provide Internet domain-name registration
services within the TLD administered by the Registry.
A number of entities interface with the SRS, primarily
registrars and Registry Customer Service Representatives (CSRs). Registrars
access SRS through the Registry-Registrar Protocol (RRP) to register domain
names and perform domain-name related functions such as the registration of
name servers, renewal of registrations, deletions, transfers, and updates to
domain names registered by that registrar. Registrars also have a web-based
interface to access SRS to perform administrative functions, generate reports,
perform global domain-name updates, and perform other self-service maintenance
functions not available via RRP. The Registry provides support to the
registrars for the SRS through the CSRs. The CSRs have a separate web-based
interface to the Registry, through which, after authenticating the registrar,
they can query and perform updates per the registrar requests.
The SRS consists of the following
components:
The majority of disasters result in
some sort of physical damage to the SRS hardware, facilities or communication
channels; however, some of these disasters are less obvious in nature. For
example, a denial-of-service attack could adversely affect the performance
of gateway servers, rendering them useless for the duration of the attack. A
hacker could compromise security and subsequently jeopardize the integrity of
the SRS data. A software virus could infect one of the production servers and
adversely affect performance, or result in data corruption.
There are different levels of severity
associated with each of the potential disaster scenarios. For example, a small
flood may destroy only a small section of a data center, bringing down one set
of components in the system. On the other hand, a severe flood could damage or
destroy the entire building, resulting in a complete loss of the primary data
center. The disaster recovery process that would be followed for the former
case may differ from the process followed for the latter. A careful review of
the potential failure scenarios identified four categories of failure:
1. Full fail-over
2. Partial fail-over
3. Non-fail-over
4. Business reconstruction
Full fail-over
There are many types of failures that would result in a
full fail-over from the primary site to the secondary site. For example, if
the primary site were unavailable to the registrars because of a fiber cut,
then a full fail-over would be necessary. If the primary site data center was
destroyed or rendered unserviceable as a result of a severe natural disaster
(e.g. flood, tornado, earthquake, etc.), then a full fail-over would obviously
be warranted.
Since the
other secondary site components should all be in stand-by mode, they would not
need to be reconfigured. All of the secondary site processes should be started.
The registrars should be notified of this fail-over and instructed to use the
secondary address(es) only to access the SRS.
Partial fail-over
Certain types of failures can occur which would be
considered a disaster, but would not require a full fail-over to the secondary
site. An example of this type of disaster would be some sort of primary site
Oracle HA cluster failure. The servers themselves could be physically
destroyed, or the power supply to the cluster could be interrupted
indefinitely. Whatever the reason for the failure, a partial fail-over to the
secondary would be required. A partial fail-over is when one or more components
fail over to the secondary site, but a portion of the primary site remains
operational.
Certain types of failures or disasters will not require a
fail-over to the secondary site at all. If the hardware and physical network
are still available, then it’s probable that the failure is due to user
behavior, a security breach, or a software issue of some sort. These types of
failures would most likely affect both the primary and secondary site and
should be directly rectified, if possible. For example, if performance of the
system were degraded as a result of a denial of service attack, both the
primary and secondary sites would be affected by the attack. In this situation
a full or partial fail-over to the secondary site would not make any sense.
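The choice between the first three failure categories can be summarized as a decision function. This is a simplified sketch: the inputs and the `Recovery` enum are assumptions, real decisions involve operator judgment, and the fourth category (business reconstruction) is left out because its criteria are not given here.

```python
from enum import Enum


class Recovery(Enum):
    FULL_FAILOVER = "full fail-over"
    PARTIAL_FAILOVER = "partial fail-over"
    NON_FAILOVER = "non-fail-over"


def choose_recovery(
    primary_reachable: bool, failed_components: int, total_components: int
) -> Recovery:
    """Pick a recovery category from the state of the primary site.

    A site that is unreachable or has lost every component needs a full
    fail-over; a site with some failed components needs a partial one; a
    site whose hardware and network are intact (a software, security, or
    user-behavior problem) is repaired in place.
    """
    if not primary_reachable or failed_components >= total_components:
        return Recovery.FULL_FAILOVER
    if failed_components > 0:
        return Recovery.PARTIAL_FAILOVER
    return Recovery.NON_FAILOVER
```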
Directory
service consists of two major components: Whois servers and the Whois data
extraction process.
The Whois
daemon runs on each of several servers, accepts connections from a variety of
clients, and accesses a local copy of the directory service database to answer
these queries. The Whois data extraction process generates the directory
service database from the Registry database.
Directory service is able to run at both the primary and
secondary sites. Whois queries are load balanced to the directory-service
servers across both sites. Also, the directory service process is run in test
mode at the secondary site to verify functionality and accuracy in case site
fail-over is required.
Full fail-over
In full
fail-over, the directory service is manually switched over from primary site to
secondary site. Since Whois daemons on both the sites provide directory
service, if all the daemons at one site fail, the daemons at the other site
continue to provide the service. There is no fail-over required.
If the Whois
file generation system becomes unavailable, the Whois file generation service
is failed over to the secondary site and the Whois daemon servers are shut down
on the primary site. The Whois file generation process on the secondary site is
configured to run in production mode. It generates the Whois database,
validates it and replicates it on Whois daemon servers on the secondary site
only.
If the database becomes unavailable on the primary site, the
Whois file generation process is disabled on the primary site. The Whois daemon
servers are shut down on the primary site. Whois file generation process is
enabled on the secondary site to run in production mode. It generates the Whois
database, validates it and replicates it on Whois daemon servers on the
secondary site only.
In an uncontrolled fail-over there is no opportunity to gracefully shut down the
service on the primary site. In this scenario, the Whois daemon and Whois file
generation both go out of service due to unforeseen circumstances, and the
disaster results in the service being unavailable. In such a situation the
service is manually enabled on the secondary site. The Whois file generation
process is enabled on the secondary site to run in production mode. It
generates the Whois database, validates it and replicates it on the Whois
daemon servers on the secondary site only.
Non-fail-over
Denial of service (DoS)
Directory
service can also become unavailable because of a DoS attack. The Whois daemon
has built-in defenses against DoS attacks. It is configured to block IP
Addresses that send more than a pre-configured number of queries per second.
This is not a complete defense against denial of service attack because Whois
daemon resources are used in determining the IP Address of the client sending
queries. This results in degradation of the quality of directory service.
Failing over to the secondary site is not a solution because directory service
load is distributed across both the sites and hence both the sites are under this
attack. Denial of service attacks are best solved at border router level. The
offending IP Address is blocked at the border router itself. This saves the
directory service resources from identifying the offending IP Address and
blocking it. KDDISOL also utilizes an Intrusion Detection System (IDS) called
EMERALD, which can detect new and unknown attacks using a statistical anomaly
detection method, making the system considerably more robust.
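The per-address query limit built into the Whois daemon can be illustrated with a small rate limiter. This is a sketch only: the `RateLimiter` class, its fixed one-second counting window, and the permanent block list are assumptions, since the daemon's actual algorithm is not described.

```python
from collections import defaultdict


class RateLimiter:
    """Blocks any client address that exceeds a per-second query limit."""

    def __init__(self, max_per_second: int) -> None:
        self.max_per_second = max_per_second
        self._counts = defaultdict(int)  # (ip, whole second) -> queries seen
        self.blocked = set()             # addresses that exceeded the limit

    def allow(self, ip: str, now: float) -> bool:
        """Return True if a query from `ip` at time `now` should be served."""
        if ip in self.blocked:
            return False
        key = (ip, int(now))
        self._counts[key] += 1
        if self._counts[key] > self.max_per_second:
            self.blocked.add(ip)
            return False
        return True
```

As the text notes, every call still consumes daemon resources to identify the client, which is why blocking at the border router is preferred. A long-running version would also need to expire old counters.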
If a hacker
compromises the Whois daemon servers and the service is consequently
unavailable, a full fail-over to the secondary site is initiated.
If the Whois database on one of the Whois daemon servers is corrupted at the
primary site or the secondary site, that server is shut down, uncorrupted data
is copied over from one of the other Whois daemon servers, and the server is
brought back up. If all the Whois daemon servers at one site have a corrupted
database, all of the Whois daemons at that site are shut down, uncorrupted data
is copied over from the other site, and the servers are brought back up. If all
the Whois daemon servers at both sites have a corrupted database, all the Whois
daemons at both sites are shut down. The Whois database is reverted to the
previous day’s known-good database and the Whois daemons are restarted. The
Whois dumper is started to regenerate the database on the primary site. Once
Whois database generation is complete, it is replicated to the Whois daemon
servers at both sites. All the Whois daemon servers at both sites are then
restarted to refresh their data.
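The three corruption cases above differ only in where a clean copy can still be found. The sketch below captures that decision; the function name, the input shape, and the returned action strings are hypothetical, and it assumes every site has at least one Whois daemon server.

```python
def plan_whois_recovery(corrupted: dict[str, list[bool]]) -> str:
    """Choose a recovery action from per-server corruption flags.

    `corrupted` maps a site name to one flag per Whois daemon server at
    that site; True means that server's database is corrupted.
    """
    if not any(any(flags) for flags in corrupted.values()):
        return "no action"
    site_fully_corrupted = [all(flags) for flags in corrupted.values()]
    if all(site_fully_corrupted):
        # No clean copy exists anywhere: fall back to yesterday's database.
        return "revert to previous day's known-good database and regenerate"
    if any(site_fully_corrupted):
        return "copy uncorrupted data from the other site"
    return "copy uncorrupted data from another server at the same site"
```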
Customer Service provides 24-hour technical support via
telephone and e-mail. One-on-one support includes both general information and
problem resolution. CSRs have their own Web-based tool (CSR Tool) for querying
and modifying the database. This tool gives the CSRs the ability to query
registration information at the request of the contacting registrar. CSRs with
appropriate access levels can modify the registration information to correct
errors made by the registrars. If a problem occurs that is beyond the scope of
the CSRs to rectify, a well-defined escalation process is followed to alert
appropriate Operations and Engineering personnel.
Impacts of Failures
The following scenarios address the system-level disaster
recovery processes (tools and E-mail). There are two failure points in the
systems: CSR web-server fail-over and underlying database fail-over. Along with
the system fail-over decisions, the decision must be made whether to relocate
the CSRs to the backup location, entailing rerouting of telephone
communications.
In a full fail-over, it will be necessary to complete any
write transactions (database modifications) in progress in the CSR tool. After
the write transactions are complete, the next action depends on the area where
the failure was detected:
· If the failure occurs in the CSR web servers, the underlying network routing
mechanisms will automatically route further actions to the operational web
servers. No further actions are necessary besides disabling the currently
active web server.
· If the failure occurs in the underlying database, the web
servers will have to be pointed to the secondary database. This action requires
changing a configuration parameter (IP address) on both of the web servers and
restarting the web server application.
· If the decision is made to relocate the CSRs, the CSRs will physically move
to the secondary site and begin their operations at that site. No system
changes are necessary.
For an uncontrolled fail-over, the process is the same as
for a full fail-over, except that there is the possibility that transactions in
progress have not completed successfully. Once the underlying systems have
successfully failed over, the CSRs will have to query the database to determine
if their last action was completed successfully (using the CSR Tool). At this
point, it may be necessary for the CSRs to contact the customer to ensure that
the data is correct.
Non-fail-over
Denial of service (DoS)
Since almost all of the CSR Tool operates on an internal network, it is not
very susceptible to many typical service interruptions (loss of communications
lines, DoS attacks, etc.). Furthermore, KDDISOL will use an Intrusion Detection
System (IDS) called EMERALD on any WWW server directly connected to the
Internet outside the firewall.
For the identified
areas of vulnerability, the actions are:
· CSR Tool – follow the process for uncontrolled fail-over.
· Database – follow the appropriate process for database fail-over.
The CSRs are a resource that can detect data corruption (e.g., a customer
notices a failure in a registered domain or name server). However, the CSR
tools have no inherent capability of detecting or correcting data corruption.
In the event of
large-scale data corruption, the procedure to be followed would be the
procedure for recovering the database.
D15.2.13.2.1
Data Recovery
To protect and recover data associated
with critical services, the Registry will employ the EMC Synchronous Remote
Data Facility (SRDF) product in conjunction with the Oracle Database Management
System (DBMS). SRDF provides for significant operational flexibility in the
following areas:
Each TLD
location maintains a tape backup of its system configuration in case of a
hardware failure. If multiple name servers are present at the location, once
the downed system has been repaired/replaced, it is rebuilt from system tapes.
The zone data is either copied to the TLD server from the NOC or is transferred
locally in the case of a multiple name server location.
TLD servers also keep backup copies of previous valid zone
files in case the current zone file becomes corrupt or the application has
problems using the current zone file. Restoration of name server operation will
occur with the backup copies of the zone data until a valid current
The network
infrastructure (both WAN and LAN) is designed to isolate single-point failures
with no interruption of services or degradation in performance. In most cases,
isolation of failures is automatic and occurs within a few seconds of the
event. Certain component failures (such as firewall failure) may require manual
intervention to complete the fail-over.
The Network Operations Center (NOC) will be proactively
monitoring all equipment and WAN circuit activity at local Registry data
centers as well as remote TLD sites to prevent outages. Once an outage occurs,
the NOC will act immediately to isolate the problem and initiate actions to
repair the problem.
Denial of Service (DoS) attacks occur
when one or more systems flood a network or individual services on that network
with disruptive traffic. These attacks may come from many source addresses–a
so-called distributed DoS (DDoS) attack–or from a single address. In either
case, recovery options are limited and involve quenching the source of the
attack either by filtering traffic at network routers or tracing the attack
back to the origin and taking the originating server(s) off the network.
It is therefore crucial to detect DoS and DDoS attacks as early as possible,
which requires an automatic detection mechanism. KDDISOL will deploy an
Intrusion Detection System (IDS) called EMERALD, which employs advanced
intrusion detection techniques.
A security breach occurs when one or
more systems are accessed (and potentially modified) by unauthorized personnel.
Often such breaches occur via a network connection. Recovery from security
breaches is straightforward, but is often time-consuming, and potentially disruptive
to the services hosted on the affected systems. Certain security breaches may
disable a service, for example Registration, for the duration of the recovery
and cleanup activities:
1. Identify affected systems and remove them from the network to prevent further damage
3. Notify appropriate law-enforcement authorities of the event
4. Correct weaknesses exploited on all systems including those
not breached
5. Collect and preserve evidence and other information for
turnover to law enforcement
D15.2.13.4
Redundancy/diversity
Please refer to Section D15.2.11 for information on system
redundancy.
D15.2.13.5
Training of Technical Staff
KDDISOL will train its staff to recover systems. Staff are chosen from experts
who have deep knowledge of IP and UNIX technology and extensive experience in
designing large IP networks and server systems. KDDISOL staff should be
qualified at a level equal to or higher than Cisco Certified Internetwork
Expert (CCIE), Oracle Certified Professional (OCP), and so forth.
D15.2.13.6
Facilities
Please refer to Sections D15.2.1.7 and D15.2.12.8 for
information on facilities.
D15.2.13.7
Process and Procedures
KDDISOL maintains a four-tiered data storage architecture
for production data that includes the following:
1. Primary on-line data and Critical Data Archive (CDA)
The primary on-line data is dynamic data that is created and maintained on a
real-time basis as the Registry performs normal business operations. The
dynamic data may change as often as hundreds of times a second or only through
periodic ad-hoc changes. Full-copy disk mirroring protects most primary tier-1
online data. The Critical Data Archive (CDA) also stores tier-1 data, but holds
data that has been moved off the production OLTP database for capacity reasons.
Tier-2 data is less critical because it is copied periodically from the
production systems.
Periodic, or tier-2, disk copies serve several purposes. First, they serve as
the backup for tier-1 data. Second, they provide the ability to execute
read-only instructions and batch activities without impacting performance on
the main production OLTP database.
On-site, or tier-3, backups and archives are stored in automated tape
libraries. These tapes contain not only backups of data but system
configurations as well. Retention periods vary based on the nature and
criticality of the data.
Offsite, or tier-4, tape backups and archives are copies of
a subset of the on-site backups. There
is nothing off-site that does not also exist on-site. Critical backups (for
disaster recovery) and long retention archives are stored offsite.
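The relationship between the tiers, and in particular the rule that nothing exists off-site without also existing on-site, can be expressed as a simple consistency check. The sketch below is illustrative only: the `StorageTier` type, the `offsite_not_on_site` helper, and the backup set names are hypothetical and are not taken from the Registry's actual systems.

```python
from dataclasses import dataclass, field


@dataclass
class StorageTier:
    level: int
    name: str
    location: str                       # "on-site" or "off-site"
    items: set = field(default_factory=set)


def offsite_not_on_site(tiers):
    """Return backup sets held off-site (tier 4) that are missing on-site (tier 3)."""
    by_level = {t.level: t for t in tiers}
    return by_level[4].items - by_level[3].items


# Hypothetical inventory; the set names are placeholders for the example.
tiers = [
    StorageTier(1, "primary on-line data and CDA", "on-site"),
    StorageTier(2, "periodic disk copies", "on-site"),
    StorageTier(3, "tape backups and archives", "on-site",
                {"db-full", "sys-config", "logs"}),
    StorageTier(4, "tape backups and archives", "off-site",
                {"db-full", "sys-config"}),
]

violations = offsite_not_on_site(tiers)  # empty when the rule holds
```

An empty result means the off-site copies are a subset of the on-site backups, as the architecture requires.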
KDDISOL thoroughly documents the following items:
· Backup and Archive Policies
· Technical Operations Plan
D15.2.14
Technical and Other Support
Support for registrars and for Internet
users and registrants. Describe technical help systems, personnel
accessibility, web-based, telephone and other support, support services to be
offered, time availability of support, and language-availability of support.
Customer Services provides 24-hour
technical support via telephone and e-mail.
One-on-one support includes both general information and problem
resolution. CSRs have their own Web-based tool (CSR Tool) for querying and
modifying the database. This tool gives the CSRs the ability to query
registration information at the request of the contacting registrar. CSRs with
appropriate access levels can modify the registration information to correct
errors made by the registrars. If a problem occurs that is beyond the scope of
the CSRs to rectify, a well-defined escalation process is followed to alert
appropriate Operations and Engineering personnel.
KDDISOL intends to contract with a translation service to provide real-time
translation for over 155 languages. When Customer Service receives a call from
a contact who speaks neither Japanese nor English, the translation service will
be conferenced in so that the problem or issue can be addressed immediately.
D15.2.14.2
Registry Command Center
The NOC provides 24x7x365 global systems monitoring and
support. Automated systems monitoring tools and technology (see System Outage
Prevention) continually assess the health and well-being of servers, networks,
and applications. This often enables the Command Center to detect and address
anomalies before they result in service outages. Strong problem management and
escalation procedures ensure that issues are identified, escalated and quickly
resolved.
D15.2.14.3
Registry Technical Operations
The KDDISOL Technical Operations staff provides 24x7x365
onsite or on-call support of all production systems operated by KDDISOL.
This includes the following operational systems management disciplines:
· Performance & Capacity Planning
· Data Center Planning & Management
· Deployment Planning & Execution
· Data & Systems Backup, Restore & Archive
· Business Continuity & Disaster Recovery
· Problem & Change Management
· Asset & Configuration Management
· Metrics Collection & Reporting
The Technical
Operations staff is continuously on-site or on-call to address urgent problems
and/or service degradation. Routine inquiries and requests (such as reports,
metrics, etc.) are handled during standard business hours.
D15.2.14.4
Remote TLD Site Technical Support
Contractual arrangements are in place for technical support at each remote
TLD site. This support includes 24x7x365 “smart hands” support from staff
employed at the site as well as quick response by vendor field engineers.
The Registry provides web-based tools that are used by both the registrars and
Registry Customer Support Representatives. Registrars can use the Registrar
Tool to access domain name and name server status and availability information,
update registrar information, and generate Registrar Daily Transaction and
Weekly Snapshot Reports. The CSR Tool provides the ability to add, delete, or
modify domain name and name server information.
Registrar Tool
The Registrar Tool site provides the
Registrar with access to registrar specific information about transactions with
the Registry. It is accessed through the Registry web site and uses SSL as
supported by version 4.0 and above of Netscape, Microsoft Internet Explorer and
AOL browsers for securing the connection. The registrars can perform the following
tasks with the tool:
CSR Tool
The CSR version of the tool provides all
the above functionality, but has additional capabilities to allow the CSRs to
access the database and make changes directly to the domain name and name
server records. This real-time capability provides superior service by enabling
the CSRs to address and resolve issues immediately. The following functions
can be performed by CSRs with the CSR Tool:
· Query, add, update, delete, transfer, renew, and purge a domain on behalf of the registrar
· Query, add, update, and delete a name server on behalf of the registrar
· Delete domain Credit
· Query, add, and update a Registry user
· Update a registrar’s credit
· Produce various reports
· Administer a registrar’s account. This includes querying, adding, and updating registrar information, as well as querying, adding, and updating registrar contact information.
The CSR tool will not
allow CSRs to register new domain names on behalf of a registrar. Registrars
must enter this information themselves.
To further empower
the registrars, the Registrar Tool will be enhanced in the near future to
provide all the functionality in the CSR Tool, except for the ability to add
domain names.
Figure D15.2-5 Customer Support Process Diagram
D15.2.14.6
Personnel Accessibility
The Registry will have multiple layers
of personnel dedicated to ensuring the uninterrupted operation of the SRS, TLD,
and other systems, and to provide registrar support around-the-clock. There are
pre-established escalation procedures that ensure that the appropriate person
can be contacted at all times to quickly and effectively deal with any issues
that may arise. Phone and email support are both used at various points in the
escalation process.
Table D15-2.4 Personnel Accessibility
Resource | Time of Availability | Contact
Customer Service Representatives | 24x7x365 | Phone, email
Technical Operations | 24x7x365 | Phone, email
Engineering | 8x5 plus 24x7x365 emergency call | Phone, email
Management | 8x5 plus 24x7x365 emergency call | Phone, email
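The availability rules in the table above can be read as a small lookup that an escalation procedure might consult. The following sketch is illustrative only; the function names are hypothetical, and the interpretation of "8x5" as Monday to Friday, 09:00 to 17:00 local time is an assumption not stated in this proposal.

```python
from datetime import datetime

ALWAYS = "24x7x365"
BUSINESS = "8x5"  # assumed here to mean Mon-Fri, 09:00-17:00 local time

# Coverage per resource, taken from the Personnel Accessibility table.
COVERAGE = {
    "Customer Service Representatives": ALWAYS,
    "Technical Operations": ALWAYS,
    "Engineering": BUSINESS,
    "Management": BUSINESS,
}


def in_business_hours(when):
    # weekday() returns 0 for Monday through 6 for Sunday.
    return when.weekday() < 5 and 9 <= when.hour < 17


def how_to_reach(resource, when, emergency=False):
    """Return how a resource can be reached at a given time."""
    coverage = COVERAGE[resource]
    if coverage == ALWAYS or in_business_hours(when):
        return "regular contact (phone, email)"
    if emergency:
        return "emergency call"
    return "wait for business hours"
```

For example, under these assumptions Engineering would be reached by emergency call for an urgent problem on a Sunday night, while Technical Operations would be reachable through regular contact at any time.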
D15.2.14.7
Operations Testing and Evaluation Support (OT&E)
The OT&E environment will provide a protected environment in which to
validate the operability of prospective registrars. It will replicate the
production software environment, separate from all production data and
operations, and will allow interoperability issues to be debugged. It will
also serve as an ongoing test area for evaluating future system upgrades.
The OT&E process will ensure that a
registrar’s system is compatible with the Registry. To participate in the process, the following steps will occur:
1. Registrar requests OT&E activation
5. Registrar passes OT&E and is activated in the production environment.
The OT&E environment will have an RRP gateway outside a
firewall. All other activities will be directed through the Registry
Application and Database servers with other equipment added as needed. Initial
capability will be hosted on multi-processor UNIX servers.
D15.2.14.8
Non-Technical Registrar Support
D15.2.14.8.1
Account Management
Account Management will be responsible for maintaining and
nurturing the relationship between the Registry and the Registrars (our
clients). This team will be dedicated to constantly interfacing with the
registrars and providing feedback to the Registry regarding the level and
quality of service. As often as possible, the Account Managers will meet
face-to-face with the registrars to discuss the relationship and explore ways
to improve it.
D15.2.14.8.2
Customer Affairs Office
The Customer Affairs staff will be responsible for the contractual
relationship with the registrars and for support during the ramp-up process.
They will also be responsible for interpreting and complying with ICANN
guidelines, and for communicating this information both internally and to the
registrars.
D15.3
Subcontractors
If you intend to subcontract any of the following:
· all of the registry operation function;
· any portion of the registry function accounting for 10% or more of overall costs of the registry function; or
· any portion of any of the following parts of the registry function accounting for 25% or more of overall costs of the part: database operation, zone file generation, zone file distribution and publication, billing and collection, data escrow and backup, and Whois service
please (a) identify the subcontractor; (b) state the scope and
terms of the subcontract; and (c) attach a comprehensive technical proposal
from the subcontractor that describes its technical plans and capabilities in a
manner similar to that of the Technical Capabilities and Plan section of the
Registry Operator's Proposal. In addition, subcontractor proposals should
include full information on the subcontractor's technical, financial, and
management capabilities and resources.
KDDISOL will subcontract most of the registry functions to Network Solutions,
Inc. (NSI) during the first of the two phases currently planned for the
implementation of its new gTLD.
During the initial
period of KDDISOL’s TLD registry administration, many of the basic
responsibilities will be handled by our subcontractor, Network Solutions, Inc.
(NSI) of Herndon, Virginia. They are the acknowledged world leader in registry
services with sufficient financial and technical resources (see attached 10K)
to accommodate our requirements. The duration of this phase is anticipated to
be approximately one year. NSI’s responsibilities will include designing systems
and software sufficient for KDDISOL to operate and manage a world-class
registry. The basic operational relationship envisioned between KDDISOL and NSI
is one that will be designed to diminish as a function of time. By the end of
Phase 1, KDDISOL will be fully capable of operating its registry and by the end
of Phase 2, the necessity for NSI’s direct operational participation will have
been eliminated.
We understand that
ICANN has had a long and close relationship with NSI. In this sense, supporting
documentation of NSI’s Registry capabilities seems relatively unnecessary.
However, should you require additional information about NSI or about NSI’s
relationship with KDDISOL, aside from what is included herein, please advise.
Please see attached
Onsite Registry Service Proposal.
_______________________________
Signature
Tohru Asami____________________
Name (please print)
_______________________________
Title
_______________________________
Name of Registry Operator
_______________________________
Date