(i) General description of proposed facilities and
systems. Describe all system locations. Identify the specific types of
systems being used, their capacity and interoperability, general
availability and level of security of technical environment. Describe in
appropriate detail buildings, hardware, software systems, environmental
equipment and Internet connectivity.
|
Facilities are the foundation for building, operating, and maintaining reliable
platforms that guarantee the uptime of critically hosted Internet services.
VeriSign classifies its major data centers as critical facilities, and also
operates services from outsourced facilities that are used to host a number of
our DNS name server sites. All facilities are available for inspection upon request.
VeriSign Advantages:
+ Facilities declared Critical Infrastructure by U.S. government
+ Over 30 business offices and 18 technical facilities worldwide
+ Long-term investments in robust, fully redundant facilities and systems
+ Comprehensive, world-class security integrated into all facilities and systems
+ Thorough background checks and screening of all employees and contractors with
periodic review.
This section outlines the general description of VeriSign's facilities and
systems for the .net registry. Throughout this section, we describe all system
locations with the specific types of system being used in each facility, including:
* Description of all System Locations. VeriSign carefully selects facilities and
locations based on well-established criteria for each system.
* Specific types of systems being used, their capacity, interoperability, and
general availability. All systems in VeriSign data centers are designed and
operated 24x7x365 to the highest specifications requisite for critical
infrastructure services.
* Level of Technical Security. VeriSign provides world-class security for all
data centers. This security protects against data tampering, system hacks, and
physical break-ins. It includes 24x7x365 security and biometric access controls.
* Detailed description of building details, hardware and software systems, and
environmental equipment. VeriSign buildings are hardened and fully secured, with
infrastructures that are designed, built, and managed specifically to support
critical Internet infrastructure services. VeriSign data centers maintain fully
redundant environmental systems to provide reliable services.
* Internet Connectivity. VeriSign data center facilities maintain redundant
Internet connections with diverse high capacity service.
VeriSign currently operates the .net registry from its major data centers in the
continental United States and hosted resolution facilities across the globe.
Our major data centers are hard-shell, fully secured, and supported critical
infrastructure buildings designed, built, and managed to the highest possible
standards. A facilities failure would compromise the mission-critical objective
of providing the uptime reliability necessary for critical services supporting
the Internet. VeriSign has invested enormous resources into building the most
reliable data centers possible by:
* Partnering with the best critical facility design engineers in the world,
drawing on their combined knowledge and experience to design and build the most
robust critical data centers possible, each designed to provide as close to 100
percent Internet uptime as technically possible.
* Operating data centers so secure, reliable, and important to the Internet
infrastructure that they have been designated as critical infrastructure sites
by the United States Department of Homeland Security since 2001.
* Incorporating and operating multiple redundant support systems designed to
maximize service uptime.
* Employing world-class engineers to manage and operate the data centers, using
best possible maintenance standards regardless of cost.
* Periodically replacing critical infrastructure equipment well before end of
lifecycles, using the best new equipment possible.
* Using cutting edge technology to support availability requirements by linking
multiple data centers together to provide the most robust and reliable solutions
possible.
* Equipping data centers with multiple, redundant, extremely high-bandwidth
optical fiber connections.
* Securing data centers with the best possible electronic surveillance systems,
highly trained and capable security teams, and 24x7x365 remote monitoring from a
highly classified Global Security Operation Center.
Description of All System Locations
VeriSign has set the industry standard for registry availability by delivering
scalable, secure, and stable registry services with the global presence required
for critical Internet infrastructure. VeriSign operates the .net TLD in
facilities designed specifically to support large-scale domain name registries.
Services for essential elements of the Internet infrastructure must be housed in
fully redundant, world-class facilities that support all registry operations and
reliably meet growing demands on registry systems. This section discusses
VeriSign's selection criteria and locations.
The Naming and Directory headquarters, located in Dulles, Virginia, provide the
management of registry facilities for all registry technical functions. These
include the fully redundant, functionally identical primary and alternate
primary facilities for the SRS and the registry database, our global
constellation of DNS services hosted at carrier grade facilities, Customer
Service, and the Network Operations Center (NOC). Figure 5(b)i-1 shows the
locations of VeriSign facilities for .net.
The primary data center is located within 10 miles of the Dulles, Virginia
headquarters and provides production services for the shared registration system
(SRS), the registry database, zone generation and distribution, and Whois. The
location for the data center was chosen to enable the operations staff to
maintain high-volume, real-time data synchronization between the primary and
alternate primary data centers and to relocate personnel quickly, while
remaining sufficiently distant to isolate it from a catastrophic event at the
alternate primary data center.
The alternate primary data center is functionally identical to the primary data
center. This data center provides fully functional registry systems.
Specific Types of Systems Used, Capacity, Interoperability and General Availability
The infrastructure supporting the primary data center is depicted in Figure
5(b)i-2. The systems in each facility have the capacity to operate the .net
registry well into the next decade and are designed to operate 24x7x365. The
design, functions, and operation of these systems are described in the following
paragraphs.
Electrical Systems. The electrical supply for the primary data center features
design elements that are common to all data centers. If the normal utility
should fail (storms, construction accident, brownout, etc.), all three
generators are signaled to start after 15 seconds by the programmable logic
controller (PLC). The uninterruptible power supply (UPS) units use battery power to
provide critical power to the data center and base building until the generators
provide emergency power or the normal utility returns. Once started and running,
all generators automatically supply emergency power to the main distribution
boards via the paralleling switchboard, until the normal utility returns. The
load shed capability, a fuel savings measure, allows the generators to shut down
and start automatically as the load requires. The NOC annunciator panels reflect
UPS and generator operation status. The Infrastructure Monitoring System (IMS)
is monitored by the NOC for all mission critical equipment.
Load Balancing. Registry services are load-balanced within the data centers.
Each data center has independent infrastructure and security. This design,
coupled with the ability to load-balance and/or shift services between the
primary data center and alternate primary data center facilities, provides for
the most robust infrastructure imaginable for the .net registry.
Heating, Ventilation, Air Conditioning (HVAC). Each data center's HVAC and
humidity control system is designed to a minimum of N+2 redundancy. This means
that two HVAC and/or humidifier units could fail (or be taken down for
maintenance), and the system would still provide proper cooling and humidity.
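As an illustration only, the N+2 rule can be expressed as a simple capacity check. This is a minimal sketch and not a description of how the HVAC plant is actually sized; the unit counts, per-unit capacities, and the function name are hypothetical values chosen for the example.

```python
def survives_failures(units_installed: int, unit_capacity_kw: float,
                      required_load_kw: float, failures: int = 2) -> bool:
    """Return True if the units left after `failures` outages cover the load.

    All figures are hypothetical; the text states only the N+2 design rule.
    """
    remaining = max(units_installed - failures, 0)
    return remaining * unit_capacity_kw >= required_load_kw


# Hypothetical case: 4 units carry the load (N = 4), so N+2 installs 6 units.
n_plus_2 = survives_failures(units_installed=6, unit_capacity_kw=100.0,
                             required_load_kw=400.0)
# With only N+1 (5 units), losing two units leaves 3, which is not enough.
n_plus_1 = survives_failures(units_installed=5, unit_capacity_kw=100.0,
                             required_load_kw=400.0)
print(n_plus_2, n_plus_1)
```

With two units out of service, the N+2 configuration still covers the hypothetical load, while an N+1 configuration would not.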
Fire Suppression. Primary fire suppression is provided by a clean agent (FM200)
gas with individually activated sprinkler heads as secondary. The sprinkler
system is a preaction system, which means that compressed air keeps water from
the overhead pipes in the data center to avoid the risk of water leaks damaging
equipment. In the event of an FM200 discharge, all data center HVAC equipment
would be turned off to allow the clean agent to suppress any combustion. Only
authorized emergency response personnel can reset the system, after which HVAC
equipment restarts automatically. FM200 does not damage equipment. A data
center equipped with FM200 can be back up and running after a discharge, as soon
as the reason for the discharge is identified and fixed. No equipment cleanup is
required. Data centers, as well as UPS and battery rooms, are fully protected by
FM200 gas.
Zone Protection. At the data center facilities, an extra step has been taken by
designing two separate data center zones in one facility. Each data center zone
has its own electrical infrastructure, HVAC, humidity control, and fire suppression.
Air Handlers (Air Conditioning [AC] Units). Each of these units is designed to
supply conditioned air into the subfloor to maintain the data center
temperature. Condensate pumps below each unit dehumidify the return air and pump
excess water into adjacent building storm drains. There are also water leak
detectors located underneath each unit. Each unit has a built-in alarm panel.
Air handlers, by design, are not on UPS backup due to electrical load demands.
If a power outage occurs, the units will momentarily shut off, restart within 15
seconds, and indicate a power restart alarm.
PermAlert Leak Detection Monitor. This unit uses a braided cable in the subfloor
to monitor the area for water leaks. Water creates an electrical path at the
cable and triggers the PermAlert unit alarm, which gives the approximate
distance to the detected leak.
Fike Alarm Panel. This panel is located in the entrance area of the data center.
It monitors all the ceiling and subfloor smoke detectors in the data center and
the UPS room and battery rooms. This panel has a remote information display
(RID) unit in the NOC, which is manned 24x7x365. This is a fully cross-zoned,
automatic, clean gas fire suppression system that fully complies with National
Fire Protection Association (NFPA) 2001.
Remote Monitoring Panels in NOC. There are four generator annunciator panels,
four UPS annunciator panels, and the Fike Alarm system RID panel located in the
NOC. These panels are labeled and provide light emitting diode (LED) lights and
alarms for any change in the status of the aforementioned equipment. In
addition, the IMS mission critical equipment monitoring system is monitored by
the NOC.
Remote Facility Systems
The following specifications define VeriSign's standards for our global
constellation of DNS services that are hosted at carrier grade facilities. The
facility operator provides HVAC and other environmental components, including,
but not limited to UPS, power, breaker panels, lightning, and fire suppression,
all of which must be operational 24x7x365.
Space/Power. Physical (rack space) requirements vary by site and include
provisions for growth. Electrical service includes primary and backup power
requirements noted below:
* Power generators:
- Dedicated to the building
- Capacity of at least twice the electrical power load of the data center
- 48 hrs of emergency fuel reserve
* UPS:
- Engages immediately upon power interruption to support the data center,
including cooling systems.
- Continues to operate until power generators start up.
- Full load battery reserve time of 15 minutes to transition to the standby
generator system
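The power requirements above lend themselves to a simple site checklist. The sketch below is illustrative only: the `SitePower` fields and the `power_shortfalls` function are hypothetical names, and reading "twice the capacity" as twice the data center load is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SitePower:
    """Power figures reported by a facility operator (hypothetical fields)."""
    data_center_load_kw: float
    generator_capacity_kw: float
    generator_dedicated: bool
    fuel_reserve_hours: float
    ups_full_load_minutes: float
    ups_covers_cooling: bool


def power_shortfalls(site: SitePower) -> list[str]:
    """List every stated power requirement that the site fails to meet."""
    problems = []
    if not site.generator_dedicated:
        problems.append("generators are not dedicated to the building")
    # Assumption: "twice the capacity" means twice the data center load.
    if site.generator_capacity_kw < 2 * site.data_center_load_kw:
        problems.append("generator capacity is below twice the data center load")
    if site.fuel_reserve_hours < 48:
        problems.append("emergency fuel reserve is below 48 hours")
    if site.ups_full_load_minutes < 15:
        problems.append("UPS full-load battery reserve is below 15 minutes")
    if not site.ups_covers_cooling:
        problems.append("UPS does not support cooling systems")
    return problems


# Hypothetical candidate site that falls short on fuel reserve only.
candidate = SitePower(data_center_load_kw=500.0, generator_capacity_kw=1200.0,
                      generator_dedicated=True, fuel_reserve_hours=36.0,
                      ups_full_load_minutes=20.0, ups_covers_cooling=True)
print(power_shortfalls(candidate))
```

An empty list means the candidate site meets every power requirement listed in this section.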
Network. The facility provides access to multiple network providers with a
minimum of 1 Gigabit (burstable) Internet connectivity.
Security. Physical access to the space is restricted to facility operator and
VeriSign personnel (or subcontractors hired by either party) who are directly
involved with the operation and support of the VeriSign Servers or who are
performing obligations of either party under a master services agreement.
Escort and/or Monitored Access Service. The facility operator provides lobby
security where badges are issued to visitors, who must sign in and be escorted.
The facility operator provides a suitable number of security personnel to patrol
the site on a regular basis, and monitor the security equipment installations at
all times throughout the year. The facility operator provides fax/email
equipment to enable VeriSign to communicate its access requirements.
Level of Technical Security
A world-class security team provides security for all VeriSign data centers.
Security measures common to all three of the discussed data centers, and to all
of VeriSign's critical facilities, are:
* Extensive pre-hire background checks, including criminal and credit checks,
for all data center employees, including security staff
* Highly trained, 24x7x365 security force ready to deploy the appropriate
emergency response protocols and procedures
* 24x7x365 secured single entrance/exit
* Tiered security zones using escalating biometrics and identification badge
access control
* Redundant monitoring from both on premises security and from the Global
Security Operation Center.
* 24x7x365 video surveillance of the external grounds, emergency generators,
exterior doors, UPS rooms, transfer switch rooms, and transformer rooms.
Registry services are load-balanced between Data Center A and Data Center B in
the primary data center. Each side of the data center has separate
infrastructure and security. This design, coupled with the ability to
load-balance and/or shift services between the primary data center and alternate
primary data center facilities, provides for the most robust infrastructure
imaginable for the .net registry.
Description of Building Details, Hardware, Software, and Environmental Systems
VeriSign data center facilities provide the secure underlying physical
infrastructure required to support a growing critical Internet infrastructure at
a time when external attacks (physical and logical, malicious and nonmalicious)
are an ever-growing reality. A detailed description of buildings, hardware and
software systems is provided in Table 5(b)i-1. All facilities are available for
inspection by ICANN.
VeriSign has carefully considered the choice of operating its Internet services
from VeriSign owned data center facilities or outsourced data center hosting.
Outsourced data centers are usually caged off spaces within large collocation
centers, which represent a reliability risk and also have other operational
constraints. Building and owning reliable and scalable data centers with
inherent redundant critical infrastructure is prohibitively expensive for all
but the most successful and reliable companies. The cost to build a large,
first-class data center runs into the tens of millions of dollars, and to fully
populate the data center with cutting edge servers runs into the hundreds of
millions of dollars.
VeriSign has opted to invest in owned facilities for all major data centers.
Owned facilities are not always practical or affordable in cases where a large
number of sites are necessary to extend our global network coverage.
Facility security must address the entire spectrum of threats, including:
inadvertent or malicious activity, natural disasters, and terrorist activities.
The data center facilities possess the most obvious characteristics of security,
including:
* Low profile (e.g., no external markings or signage)
* Isolated from easements, rights of way, and adjoining tenants
* Hardened against regional weather events (e.g., high winds or hurricanes)
* Located outside flood areas
* Multi-level physical security, including 24x7x365 onsite security force, badge
readers, and biometric access control devices
* 24x7x365 video surveillance.
Internet Connectivity
Internet connectivity is a critical element for any facility supporting registry
and global DNS functions. Sufficient bandwidth is the primary defense against
Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks.
Internet connectivity is provisioned through multiple providers and through
multiple physical routes. The data centers have multiple DS-3 and OC-3
connections to the Internet provisioned through diverse providers. At data
center facilities, redundant Internet connections enter the facility through
diverse cable conduits, travel to the border routers via separate conduits
within the facility, and terminate at border routers positioned in separate
cabinets in different sections of the data center. Name servers positioned with
collocation partners have, at a minimum, diverse 1 Gigabit connections. A
redundant, dense wavelength division multiplexing (DWDM), dedicated optical
fiber ring is used to connect the data centers. Figure 5(b)i-3 depicts the
network connectivity at the .net sites.
Conclusion
VeriSign data centers are designated Critical Infrastructure by the U.S.
government, meaning they warrant additional protection in the event of a
national emergency. VeriSign provides world-class security for all data centers;
all systems are designed and operated to the highest specifications requisite
for critical infrastructure services. Our comprehensive technical and physical
security protects against data tampering, system hacks, and physical break-ins.
Each VeriSign data center facility maintains redundant Internet connections with
diverse high capacity service. These provide the bandwidth for all services and
support functions and have the capacity to support the forecasted demand for .net. |
|
(ii) Stability of resolution and performance
capabilities, including: response times and packet loss targets;
availability of authoritative name servers; processes, tools and automated
monitoring to ensure accuracy of zone data for resolution; diversity of
DNS infrastructure; diversity and redundancy of network and DNS
infrastructure to handle bandwidth congestion and network failures of ISPs
and host providers.
|
It is critical that the .net operator provide extremely stable and high
performance resolution for this critical TLD. VeriSign has managed the largest
DNS constellation in the world for more than a decade with 100 percent
resolution availability. We have sustained this performance track record through
extraordinary growth in Internet usage and under increasingly complex security
threats.
VeriSign Advantages:
+ 100 percent uptime for DNS resolution for over 7 years
+ Resolution supported by our award-winning ATLAS platform
+ Support for 20x average daily load
+ Lowest packet loss and fastest response times in the industry
The performance capabilities of this system exceed the capabilities of any other
registry. VeriSign is committed to continue providing the greatest stability
with the best performance on our world-class resolution system.
This section presents VeriSign's solution for stability of resolution and
performance capabilities, including:
* Response Times and Packet Loss Targets. Within a resolution site, our goal is
5 millisecond response time and no packet loss. We strategically locate our
resolution sites to distribute queries evenly among our global constellation and
to provide low response time and packet loss rates to all DNS clients throughout
the Internet. In practice, our monitoring shows that we meet or exceed ICANN's
Cross-Network Nameserver Response Time requirements while other registries do not.
* Availability of Authoritative Name Servers. Over the past 7 years we have
maintained 100 percent availability of the .net resolution system. We
maintain this high level of availability with extraordinary capacity, a
redundant architecture, a cautious approach to maintenance, and extensive
monitoring.
* Processes, Tools, and Automated Monitoring to Ensure Accuracy of Zone Data for
Resolution. We maintain absolute zone data integrity through use of checksums
for file transfers, a comprehensive approach to data validation before
publication, a process that audits the zone once it is published, and extensive
end-to-end monitoring of the system from zone data generation through publication.
* Diversity of DNS Infrastructure. The .net DNS infrastructure relies on two
completely separate name server implementations of a primary system and a
warm-standby backup and uses diverse hardware and operating system software so
that no single failure or vulnerability can affect the entire system.
* Diversity and Redundancy of Network and DNS Infrastructure to Handle Bandwidth
Congestion and Network Failures of ISPs and Host Providers. The use of diverse
ISPs at our resolution sites, combined with the large number of sites, ensures
that the .net name server system can continue to function even in the face of
one or more network failures and congestion.
1. Response Times and Packet Loss Targets
(a) Measurement Perspectives
When considering DNS response times and packet loss, it is important to specify
exactly how and where these performance measurements are made. VeriSign
distinguishes performance based on measurements from two different perspectives:
(i) Intra-site performance is measured from the perspective within each of
VeriSign's current 14 worldwide resolution sites, without regard to external
factors beyond the site itself. We have very specific DNS resolution performance
goals:
* DNS response time under 5 milliseconds (5 ms). The time period from when a DNS
query is received at a resolution site, processed by our ATLAS authoritative
name server, to when a response is sent will not exceed 5 milliseconds. In
practice, our ongoing monitoring confirms that we consistently meet or exceed
this performance target at every one of our resolution sites.
* DNS packet loss of zero percent (0 percent). We strive to answer every DNS
query that reaches our resolution sites. Any level of packet loss within our
sites is unacceptable and represents a problem that must be corrected. Our
network engineers and system administrators troubleshoot any reports of packet
loss until the underlying problem is identified and corrected.
(ii) Internet performance refers to the latency and packet loss measured at
various points across the Internet. This measurement is not an easy task because
of the size and breadth of the Internet.
VeriSign's goal with .net resolution is to have the lowest latency and packet
loss possible to every possible Internet client. Achieving this goal means
deploying the maximum number of authoritative name servers in an optimal
distribution.
(b) Number of Sites
For reasons relating to maximum DNS packet size, the historical maximum number
of authoritative name servers for a single zone has been 13. Recent experiments
have shown that anycast, when used conservatively and sensibly, can expand the
number of name servers.
Choosing appropriate locations for our resolution sites is a complicated process
that is described in Section 5(b)vi. Our goal is to optimize the strategic
distribution of constellation sites all over the world. An optimal distribution
results in roughly even distribution of DNS queries among all the sites and
roughly even latency and packet loss to all DNS clients. We have stringent
facilities requirements for resolution sites that are detailed in Section 5(b)i.
Resolution site distribution undergoes constant refinement; we adjust site
locations as traffic patterns and other requirements change. Section 5(b)vi describes
this process in detail.
We currently use anycast technology to expand the number of resolution sites
beyond the traditional maximum of 13. We first applied anycast technology to the
j-root server to increase its capacity and reliability in the wake of the DDoS
attacks of October 2002. (The j-root servers are live at 15 sites around the
world.) Other root operators have deployed anycast as well, and our joint
experience proves that judicious use of anycast provides an excellent mechanism
for increasing the number of authoritative name servers for a zone. The root
zone is an excellent example, because some of its servers are anycast while
others remain unicast for stability and diversity. A total reliance on anycast
(that is, anycasting all of a zone's name servers such as the current
implementation used for .info and .org) is a questionable operational practice
because of unexplained problems and outages presumed to result from Border
Gateway Protocol (BGP) routing anomalies.
We have applied anycast conservatively to the .net name servers. In July 2004,
we added a 14th resolution site using anycast in Seoul, South Korea. A 15th site
in Beijing, China is planned for the first quarter of 2005, and more sites in
traditionally underserved and emerging markets are planned for 2005. Details of
these expansion plans are provided in Section 5(b)vi.
(c) Cross Network Name Server Performance Requirements
ICANN's Cross Network Name Server Performance Requirements represent a good
starting point for minimum quantifiable measurements of name server performance
for latency and packet loss.
VeriSign constantly monitors our own name server performance across several
dimensions, all of which are discussed in greater detail in Section 5(b)xvi,
System Outage Prevention. One trend we measure is latency and packet loss from
eight of our resolution sites to all of our other sites. Figure 5(b)ii-1 shows
typical cross-site latency measurements. The graphs in this figure show that
many inter-site roundtrip times from each monitoring site to multiple resolution
sites are below 100 milliseconds. The graphs also show almost no packet loss.
This same monitoring software also takes latency measurements of other TLD name
servers. As a contrasting example, measurements to the .org and .biz
authoritative servers for the same time period, shown in Figure 5(b)ii-2,
indicate higher average latency. The latency measurements show erratic response
times, which are an indication of under-provisioning and poor management. The
same graph also shows more packet loss to these servers than to the VeriSign
.net authoritative name servers.
VeriSign is committed to externally measured performance of less than 100
milliseconds response time, averaging less than 1 percent packet loss per month,
with less than 5 percent packet loss in any 5-minute period.
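The three thresholds in this commitment can be checked mechanically against a month of probe data. The following is a minimal sketch under stated assumptions: the function and constant names are hypothetical, the response-time target is read as a monthly mean, and every 5-minute window is assumed to contain the same number of probes so that averaging window percentages gives the monthly figure.

```python
from statistics import mean

MAX_AVG_RTT_MS = 100.0       # less than 100 milliseconds response time
MAX_MONTHLY_LOSS_PCT = 1.0   # less than 1 percent packet loss per month
MAX_WINDOW_LOSS_PCT = 5.0    # less than 5 percent loss in any 5-minute period


def check_month(rtt_ms: list[float], loss_pct_by_window: list[float]) -> dict[str, bool]:
    """Evaluate one month of probe data against the three stated thresholds.

    rtt_ms: round-trip times in milliseconds (hypothetical probe samples).
    loss_pct_by_window: packet loss percentage for each 5-minute window.
    """
    if not rtt_ms or not loss_pct_by_window:
        raise ValueError("need at least one sample of each kind")
    return {
        "response_time": mean(rtt_ms) < MAX_AVG_RTT_MS,
        "monthly_loss": mean(loss_pct_by_window) < MAX_MONTHLY_LOSS_PCT,
        "window_loss": max(loss_pct_by_window) < MAX_WINDOW_LOSS_PCT,
    }


# Hypothetical data: a single bad 5-minute window fails only the window target.
result = check_month(rtt_ms=[42.0, 55.5, 61.2],
                     loss_pct_by_window=[0.0, 0.0, 6.0] + [0.0] * 97)
print(result)
```

Keeping the three checks separate makes it visible which part of the commitment a given month would miss.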
2. Availability of Authoritative Name Servers
The availability of the .net authoritative name servers can be considered from
two perspectives: the availability of the system as a whole and the availability
of individual name servers.
(a) Name Server System Availability
From this perspective, all the .net name servers are viewed as a system. The
system is considered available if a sufficient number of name servers with
adequate capacity are responding to .net queries. See Appendix D for detailed
availability specifications. VeriSign's goal is nothing less than 100 percent
uptime for the .net name server system. We have met this goal for the past 7 years.
Many factors contribute to this uptime record:
* Extraordinary Capacity. We have .net name servers deployed at our worldwide
resolution sites, and each site is over-provisioned with excess capacity.
This surplus of sites and capacity means that the .net name server system can
continue to function even when some sites are unavailable, such as in the event
of an ISP failure or a DDoS attack. For security reasons, VeriSign does not
discuss performance capabilities in detail. We can guarantee that the .net
system can continue to serve even peak query levels with only a fraction of the
14 sites available in the event of an unforeseen catastrophic failure.
* Redundant Architecture. Each resolution site is designed with a highly
redundant architecture, which is described in the next section. The resilience
and reliability of each individual site contribute to the overall system's
historical high availability. VeriSign designs redundancy at all levels into all
our systems.
* Cautious Approach to Maintenance. We have three hot standby sites, which are
exact copies of the resolution sites deployed globally. When we perform
scheduled maintenance on a resolution site, we first transfer that site's
various services to one of these backup sites. With a site's services handled by
a standby site, we can safely perform maintenance without any fear of
interrupting or impacting a production service. When maintenance is complete, we
transfer service back to the main site after exhaustive testing. Thus we do not
need to take an outage for even routine maintenance and upgrades. Standby sites
are described in more detail in Section 5(b)viii.
* Monitoring. Our extensive monitoring infrastructure, which ranges from
standard tools to monitor health of individual systems to customized tools that
report DNS-specific statistics, allows us to quickly identify and address problems.
More details on the processes we have to increase system availability are
described in Section 5(b)xvi, System Outage Prevention.
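The system-level definition of availability given in (a) can be sketched as a predicate over the set of responding sites. This is an illustration only: the site names, capacity figures, and the `min_sites` parameter are hypothetical, since actual performance capabilities are not disclosed here.

```python
def system_available(site_capacity_qps: dict[str, float],
                     responding: set[str],
                     peak_load_qps: float,
                     min_sites: int = 1) -> bool:
    """Return True if enough responding sites can jointly carry the peak load.

    site_capacity_qps: queries per second each site can serve (hypothetical).
    responding: names of the sites currently answering queries.
    """
    usable = [site_capacity_qps[name] for name in responding
              if name in site_capacity_qps]
    return len(usable) >= min_sites and sum(usable) >= peak_load_qps


# Hypothetical three-site constellation, each site able to serve 100 qps.
capacity = {"site-a": 100.0, "site-b": 100.0, "site-c": 100.0}
print(system_available(capacity, {"site-a", "site-b"}, peak_load_qps=150.0))
print(system_available(capacity, {"site-a"}, peak_load_qps=150.0))
```

In this toy example the system stays available with one site down but not with two, which is the kind of margin that over-provisioning is meant to provide.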
(b) Individual Name Server Availability
VeriSign also strives for 100 percent uptime of individual name servers at our
resolution sites. At each site, the .net zone is served by ATLAS, our
proprietary high-performance, highly scalable, and highly available name server.
A single .net name server site comprises the multiple systems that are part of
an ATLAS name server installation, along with various supporting network
equipment. Figure 5(b)ii-3 shows a high-level architecture illustrating the
redundancy of our ATLAS resolution sites.
The diagram clearly shows how all components at a resolution site are deployed
in at least a redundant pair configuration. This redundant architecture, present
at each of the 14 resolution sites, helps ensure that a site itself can
withstand a failure of any single component (and in some cases, multiple
components) and continue to handle queries.
The highly redundant architecture of each resolution site and the redundancy of
the overall system, along with our careful maintenance approach and
sophisticated monitoring, combine to make the .net name server system one of
the most reliable pieces of computing infrastructure in the world.
3. Processes, Tools, and Automated Monitoring to Ensure Accuracy of Zone Data
for Resolution
VeriSign considers data integrity to be an extremely serious issue. A company's
Internet presence is now a critical resource; it is therefore vital that the DNS
delegation information contained in .net always be published completely and
accurately. This section describes some of the checks VeriSign has in place to
ensure that .net zone data is always published correctly. More information about
the zone distribution and publication process is provided in Section 5(b)viii.
(a) Checksums for File Transfer
VeriSign produces .net zone data in several formats, as explained in Section
5(b)vi. Regardless of the format, whenever a file containing .net zone data
needs to be moved, we calculate an MD5 checksum over the file. The source file
and the checksum, contained in a separate file, are copied to the destination.
At the destination, another MD5 checksum is computed over the transferred
file and compared against the source checksum. This process allows us to detect
any errors in transit. Even local copies of zone data within a file system
follow this process.
When a name server loads zone data, it verifies the MD5 checksum associated with
the file as well. ATLAS was designed with this verify-on-load feature and our
engineers retrofitted this feature into Berkeley Internet Name Domain (BIND) as
well. Zone data, therefore, is completely integrity protected from generation to
name server load.
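The copy-and-verify step described above can be sketched as follows. This is a minimal illustration only, not VeriSign's tooling: the function names, the `.md5` sidecar file naming, and the use of a local directory as the "destination" are all assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_checksum(source: Path, dest_dir: Path) -> Path:
    """Copy a file and a sidecar checksum file, then verify at the destination."""
    # Hypothetical convention: the checksum lives next to the file as <name>.md5.
    checksum_file = source.with_name(source.name + ".md5")
    checksum_file.write_text(md5_of(source))

    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = Path(shutil.copy(source, dest_dir))
    copied_checksum = Path(shutil.copy(checksum_file, dest_dir))

    # Recompute over the transferred file and compare with the source checksum.
    if md5_of(copied) != copied_checksum.read_text().strip():
        raise ValueError(f"checksum mismatch for {copied}")
    return copied
```

A loader could call `md5_of` again just before reading the file, which mirrors the verify-on-load behavior described above.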
(b) ATLAS Validation
Before zone data is distributed to the ATLAS name servers at the resolution
sites, a rigorous validation process verifies the file as part of our serious
commitment to absolute data integrity. The file is loaded by a local ATLAS name
server, whose contents are then compared against the registry database to ensure
that the published data will match exactly. Only then is the file published to
all resolution sites.
(c) ATLAS Auditing
The ATLAS auditor is a separate component that continually verifies that the
data served by name servers at the resolution sites matches the contents of the
.net registry database. Auditing complements the validation process.
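Validation and auditing both come down to comparing what a name server holds against the registry database. The sketch below illustrates that comparison under simplifying assumptions: each side is reduced to a mapping from domain name to its set of name servers, and the function names are hypothetical rather than part of ATLAS.

```python
def diff_zone(registry: dict[str, frozenset[str]],
              served: dict[str, frozenset[str]]) -> dict[str, list[str]]:
    """Report domains that are missing, unexpected, or delegated differently."""
    missing = sorted(registry.keys() - served.keys())
    extra = sorted(served.keys() - registry.keys())
    changed = sorted(
        name for name in registry.keys() & served.keys()
        if registry[name] != served[name]
    )
    return {"missing": missing, "extra": extra, "changed": changed}


def matches_exactly(registry: dict[str, frozenset[str]],
                    served: dict[str, frozenset[str]]) -> bool:
    """True only when the served data matches the registry data exactly."""
    return not any(diff_zone(registry, served).values())
```

In this framing, validation would run the check once before publication, while auditing would repeat it continually against the live resolution sites.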
(d) Monitoring
VeriSign has developed and implemented extensive monitoring for all aspects of
the .net registry. Some monitoring tools are standard, while others were
developed in-house to suit our particular needs. All our monitoring systems are
described thoroughly in Section 5(b)xvi.
We developed a graphical Heads Up Display (HUD) to show the instantaneous
performance and status of the .net name server constellation. Specifically with
regard to data accuracy, the HUD uses a color code to display the status of each
resolution site. This monitoring software constantly probes each .net name
server and uses color codes to indicate potential problems with a site. The HUD
also shows the status of individual near real-time updates as they propagate to
all the resolution sites. Any propagation delay of these updates, which
ultimately would cause inconsistent data among the resolution sites, is clearly
visible on the display.
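As an illustration of the color-coding idea, the following sketch maps each site's propagation lag to a status color. The thresholds, color names, and function name are assumptions chosen for the example; the actual HUD rules are not described in this section.

```python
def site_status(lag_seconds: dict[str, float],
                warn_after: float = 60.0,
                alarm_after: float = 180.0) -> dict[str, str]:
    """Map each site to a color based on how far it lags the newest update."""
    colors = {}
    for site, lag in lag_seconds.items():
        if lag >= alarm_after:
            colors[site] = "red"      # update is badly delayed at this site
        elif lag >= warn_after:
            colors[site] = "yellow"   # update is slower than expected
        else:
            colors[site] = "green"    # site is keeping up
    return colors
```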
4. Diversity of DNS Infrastructure
VeriSign believes diversity at all levels of the .net registry system is an
important factor in the system's stability and security. Diversity is related to
redundancy. A system with redundant components can survive the failure of one of
those components. Diversity can be considered another way to add redundancy to a
system. With sufficient diversity, no single vulnerability, failure, or attack
can affect the entire system.
At the DNS infrastructure level, we have diversity in name server software
and in name server host hardware and operating systems.
(a) Name Server Software Diversity
All .net name servers run ATLAS, which was deployed starting in November 2002.
Since that time, ATLAS has performed well, and we are very confident in its
stability and security. Because the .net zone is such a critical piece of
Internet infrastructure, we have designed diversity into the .net authoritative
name servers at the DNS software level.
We run BIND name servers in a warm standby mode as a backup to ATLAS at each
resolution site. These BIND name servers are always running with an up-to-date
copy of the .net zone. In the event of a critical ATLAS failure, these BIND name
servers can begin answering queries because a systemic failure of .net cannot be
tolerated under any circumstances.
(b) Name Server Host Hardware and Operating System Diversity
In addition to the diversity of DNS software mentioned above, we deploy diverse
hardware and operating systems for the computers on which this software runs.
ATLAS runs on two completely different hardware and operating system
combinations, guaranteeing that no single vulnerability can affect every
resolution site.
5. Diversity and Redundancy of Network and DNS Infrastructure to Handle
Bandwidth Congestion and Network Failures of ISPs and Host Providers
VeriSign designs for diversity and redundancy at all levels of the .net registry
system. In addition to the diversity specifically related to the DNS
infrastructure described above, all other aspects of the resolution system are
engineered with diversity and redundancy in mind. The way in which this
diversity and redundancy help overcome network-related problems is described below:
(a) DNS and Network Diversity
The diversity of our resolution sites continues down to the network
level. We have two different network designs for our resolution sites. Each
design uses the same network components, as shown in Figure 5(b)ii-3, but from
different vendors. There is no vendor overlap between the two designs, so any
failure or vulnerability of a given vendor's equipment will not affect all of
our resolution sites.
At a higher level, our resolution sites do not depend on a single ISP. Our 14
resolution sites use a wide array of providers, so that a problem with a single
provider's network will not affect all of our resolution sites.
(b) DNS and Network Redundancy
Earlier in this section, we described the completely redundant architecture of
our resolution sites, as shown in Figure 5(b)ii-3. We also described redundancy
at a higher level. There are 14 .net name server locations to handle DNS
queries, and just a fraction of these sites would be sufficient to answer even
the peak query loads seen today.
Conclusion
VeriSign currently manages the largest DNS constellation in the world and has
done so for more than a decade. Our system will continue to sustain 100 percent
resolution availability, with the highest levels of stability of any registry.
The performance capabilities of this system exceed the requirements of our
current service level. VeriSign combines extraordinary over-provisioned capacity
and a redundant architecture with a cautious approach to maintenance and
extensive monitoring.
VeriSign is committed to continue providing this world-class resolution system
for .net, meeting or exceeding our SLAs with the greatest stability and the
highest performance. |
|
(iii) Operational scalability sufficient to handle
existing registry database and projected growth; DNS queries including
peak periods and projected growth; DDoS attacks, viruses, worms and spam;
and restart capabilities.
|
Operational scalability is the ability to service increasing registry workloads
and the ability to ensure the system can be extended to handle the anticipated
and unanticipated demands as needs grow. To ensure a high level of service,
VeriSign uses various Quality of Service (QoS) technologies at every layer of
the .net registry.
VeriSign Advantages:
+ Proven and reliable provisioning and resolution systems have met extraordinary
scalability demands over many years
+ Experienced staff provides security and reliability through proven ability to
deflect continual abuse, inadvertent misuse of services, and DDoS attacks
+ Rapid problem resolution and root cause analysis to continually improve
processes and operations
This section presents VeriSign's approach to providing operational scalability,
including:
* Scalability sufficient to handle the projected growth of the .net registry
* DNS query capacity, including peak periods and projected growth
* DDoS attacks, viruses, worms, and spam
* Restart capabilities
Operational Scalability Sufficient to Handle Existing Registry Database and
Projected Growth
Extraordinary and unpredictable demands due to aggressive business growth or
even malicious DoS attacks are a fact of life on the Internet. We are able to
predictably deliver uninterrupted service even when demand exceeds historic peak
volumes.
VeriSign registries have maintained the operational capacity and scalability to
support domain name registrations as the demand has grown by orders of
magnitude. Initial registration rates have grown from 400 new registrations per
month in 1993 to a current rate of 1 million new registrations per month. Since
the introduction of the SRS in 1999, transaction workloads have grown by a
factor of 300, with a peak daily volume of more than 225 million transactions,
while the average transaction response time has been reduced to less than
one-twentieth of its original value. The number of registrars using the SRS to
conduct business has also grown dramatically, as shown in Figure 5(b)iii-1.
Transaction Volumes. In 2001, transaction volumes rose dramatically as
registrars began competing for deleted names, running large arrays of systems
capable of consuming 100 percent of the available .net registry's resources.
Rates of 3 million registration attempts per hour for .net are not uncommon.
Figure 5(b)iii-2 shows the historical daily average and peak SRS transactions by
quarter. The increase in 2002 was largely due to the competition for deleted
domain names. VeriSign worked with registrars and modified our operational
procedures and technical systems to greatly improve transaction efficiency for
registrars and the registry. A registry operator must have significant
scalability expertise to support this type of unpredictable behavior.
Dataset Increase. Figure 5(b)iii-3 shows the number of domain names and active
domains registered in the SRS database. This growth in registered domains was
accompanied by a rapid corresponding growth in transaction volumes. Based on
forecasts from historical data, projections of average and peak transaction
volumes are shown in Figure 5(b)iii-4. Figure 5(b)iii-5 shows the anticipated
growth of registered domains.
Scalability of VeriSign's Provisioning System
The provisioning architecture that was designed in 1999 continues to satisfy
today's demands. In addition to the Network Layer, the SRS is a three-tiered
system to manage the existing load and provide sufficient scalability to address
projected growth. These tiers (see Figure 5(b)iii-6) are the Gateway Tier,
Application/Business Logic Tier, and Database Tier.
Network Layer. Our SRS network infrastructure provides bandwidth in excess of 1
Gigabit per second from multiple network providers. Additional bandwidth is
provisioned should the normal workload approach 50 percent of capacity. This provides highly
available, fault-tolerant network connectivity. The SRS is tuned such that each
registrar receives an equivalent slice of bandwidth. This is done through the
use of QoS equipment. This equipment also allows us to deflect DDoS activity,
throttle aberrant behavior, and regulate workloads to preserve system response
times.
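The idea of giving each registrar an equivalent slice of capacity and throttling aberrant behavior can be illustrated with a per-registrar token bucket. This is a sketch of the general technique only: the text states that the real regulation is performed by QoS equipment, and the class names, rates, and registrar identifiers here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate: float            # tokens replenished per second
    capacity: float        # maximum burst size
    tokens: float = 0.0    # tokens currently available
    updated: float = 0.0   # time of the last refill, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then spend tokens if enough remain."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RegistrarLimiter:
    """Gives every registrar an identical, independent bucket."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self._buckets = {}  # registrar id -> TokenBucket

    def allow(self, registrar_id: str, now: float) -> bool:
        bucket = self._buckets.get(registrar_id)
        if bucket is None:
            # New registrars start with a full bucket.
            bucket = TokenBucket(self.rate, self.capacity,
                                 tokens=self.capacity, updated=now)
            self._buckets[registrar_id] = bucket
        return bucket.allow(now)
```

Because each registrar draws from its own bucket, one registrar exhausting its allowance does not reduce what the others receive.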
Gateway Tier. The Gateway Tier manages registrar connections into the SRS by
stripping off the Secure Sockets Layer (SSL) encryption and forwarding commands
to the Application Tier. The Gateway Tier was designed to protect the business
logic from potential compromise by placing the registrar connection management
outside an additional firewall layer of network security and placing the
Application Tier inside the firewall.
Business Logic Tier. The Business Logic/Application Tier was designed to manage
the business logic of the SRS. To support both short- and long-term growth, the
Business Logic/Application Tier, along with the Gateway Tier, was designed to
scale proportionally with the needs of the registrar community. Scaling is
achieved using horizontal scaling as described below.
As the demands on the system increase, additional hardware can be safely added
to the SRS in the form of sets to accommodate the increased growth. Each set of
hardware consists of multiple gateway servers, which connect to an application
server. Adding a new set of hardware can provide an increased number of
connections into the SRS, while avoiding any impact on the existing sets. As
demand increases, scaling in the SRS can be achieved with a simple and
transparent increase in the number of sets in the system. This methodology
ensures our ability to scale and simultaneously addresses the need to maintain
high levels of performance.
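Scaling by sets lends itself to simple sizing arithmetic. The sketch below assumes, purely for illustration, that every set has the same number of gateway servers and that each gateway supports a fixed number of registrar connections; neither figure is given in this section.

```python
import math


def sets_needed(target_connections: int,
                gateways_per_set: int,
                connections_per_gateway: int) -> int:
    """Number of identical hardware sets required for a connection target."""
    per_set = gateways_per_set * connections_per_gateway
    return math.ceil(target_connections / per_set)
```

For example, with hypothetical sets of 4 gateways at 100 connections each, a target of 1,000 connections needs 3 sets, and raising the target past 1,200 adds a fourth set without touching the first three.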
Database Tier. This tier is dedicated to managing the most essential asset of
the SRS: the data and historical transactions associated with all the .net
domain names. The Database Tier is designed to be scalable in both storage for
the domain name base and compute power. The current .net database is served on
the same system as that of .com. This system manages over 38 million domain
names and is capable of scaling, at its current design, to handle 250 million
domain names. The design point for the primary database was set in 2000 during
the migration to the current IBM systems, 5 years ahead of its need. We are
currently prototyping the next generation database that is being designed to
meet the .net registry needs for the next 5 years.
Capacity Planning
VeriSign has invested considerable effort and resources in delivering system
capacity to meet forecasted demands. We collect extensive amounts of data on
system performance. Dedicated operations staff analyze this data for trends,
compare the results to historical data, and take proactive steps to ensure the
SRS scales and performs as expected.
Design Principles
VeriSign uses both vertical and horizontal scaling solutions to provide a highly
scalable application. Our systems are designed, implemented, and operated with
operational scalability as a key focus, as demonstrated by the following design
principles:
* Design systems for loads greater than peak by using peak as a starting point.
VeriSign's systems are designed to perform consistently and predictably through
anticipation of loads even greater than known peaks.
* Design for scalability to promote the growth of the .net TLD. We use the data
collected from capacity planning efforts, and incorporate design metrics that
ensure our ability to address future needs.
* Regulate system workloads to prevent crisis situations by using various QoS
technologies at every layer of the .net registry.
* Constantly monitor systems and QoS to protect registrars and Internet users
whose business would suffer if registry performance deteriorated.
Pool System
In response to the demand for recently deleted domain names, VeriSign designed
and deployed a pool system that ensures equivalent access and maintains
consistent system performance for the batch pool and non-batch pool activity.
Traditional registration activity, which accounts for over 95 percent of the
registration activity, is protected from what could otherwise be a daily
degradation of services.
Scalability Sufficient to Handle DNS Queries Including Peak Periods and
Projected Growth
Resolution scalability means ensuring that access to the DNS lookup service is
maintained under the most severe conditions. Using our ATLAS technology and
significant network capacity, we have deployed a highly scalable, diverse
solution that meets our current and future DNS resolution needs.
Current Peak Periods and Projected Growth
Between March 2001 and October 2004, DNS query rates for .com and .net grew from
just over 3 billion queries per day on average to over 14 billion queries per
day, with peaks of 19 billion. This is a growth rate of over 350 percent (Figure
5(b)iii-7). Although .net comprises approximately 15 percent of our .com and
.net domain name registrations, .net is responsible for approximately 30 percent
of the DNS queries. The peak for .net queries routinely exceeds 60,000 queries
per second. During a DDoS attack, this query rate can easily exceed many
multiples of the routine peaks.
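The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the rounded numbers quoted above (3 billion and 14 billion queries per day, and a 30 percent share for .net); the helper names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def percent_growth(start: float, end: float) -> float:
    """Growth from start to end, expressed as a percentage of start."""
    return (end - start) / start * 100.0


def per_second(daily_queries: float) -> float:
    """Average queries per second implied by a daily total."""
    return daily_queries / SECONDS_PER_DAY


growth = percent_growth(3e9, 14e9)        # roughly 367 percent
net_daily = 0.30 * 14e9                   # roughly 4.2 billion .net queries per day
net_average_qps = per_second(net_daily)   # roughly 48,600 queries per second
```

On these rounded inputs, the .net average works out to just under 50,000 queries per second, which is consistent with routine peaks above 60,000 queries per second.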
Scalability of DNS Resolution
DNS resolution is the most critical component to maintain Internet stability.
Poor resolution can cripple Internet traffic and negatively impact a vast number
of daily business transactions. During a DDoS attack, the query rate may be
several times the typical peak. We place the highest priority on delivering 100
percent availability and we work diligently to maintain the capacity to support
growth without sacrificing performance. VeriSign's patented ATLAS technology
ensures that we can scale to achieve this lofty objective.
By 2011, we project that DNS query rates for .com and .net will grow to over 120
billion queries per day (Figure 5(b)iii-8). Given historical growth rates of
Internet hosts and forecasted growth of new devices, .net is projected to
account for approximately 40 billion queries per day, which will require a DNS
capacity of 800 billion queries per day. The capacity of our current ATLAS
generation supports loads exceeding 400 billion daily queries, with planning
efforts underway to ensure the next generations of VeriSign's DNS solutions can
handle millions of queries per second well before 2010.
Network Capacity and Segregation
To accommodate ever-increasing DNS traffic, every site in our DNS resolution
constellation is provisioned with at least one Gigabit of network bandwidth with
redundancy and fault tolerance facilitated through multiple network providers.
Each site is located at a major telecommunications peering point in which
various large ISPs have established hubs for management of their portion of the
Internet backbone. In 2005, we are scheduled to expand our constellation
footprint with an additional site in Asia and plans for future sites in South
America.
Capacity Planning. Extraordinary and unpredictable demands, in the form of
legitimate commerce or malicious DoS attacks, are a fact of life on the
Internet. Our centralized DNS data store, fed by statistics relayed by the
various monitoring applications located at each site, currently houses daily
statistics dating to the year 2000. Data is collected every 4 seconds and stored
with an extremely fine granularity at the IP, TCP/UDP, and DNS levels. The
database and related tools allow for on-the-fly report and graph generation. Our
technical staff uses these tools to analyze any combination of statistics for
any specified time period and compare new trends against historical data. These
records also feed our detailed uptime and traffic growth analysis and reporting
efforts.
Our continuous monitoring and planning initiatives enable us to trend demand
versus usage and legitimate versus malicious activity. These valuable statistics
help build the right scenarios to test our ability to maintain this "always on"
capability. Stress testing, load testing, and performance testing are emphasized
to deliver a robust and stable system. Our current DNS solution has been
designed and tested to scale well beyond 400 billion queries per day, with peaks
of 2.3 million queries per second.
DDoS Attacks, Viruses, Worms, and Spam
The experiences we have garnered over the years have led us to realize that
existing guidelines for DNS operators are not adequate for critical TLDs that
are continually under attack (both intentional and unintentional). The Root Name
Server Operational Requirements (Request for Comment [RFC] 2870) provides
guidelines for root zone management that also serve as a guide for the operation
of other major zones such as the .net TLD.
Section 2.3 of RFC 2870 states: "At any time, each server MUST be able to handle
a load of requests for root data, which is three times the measured peak of such
requests on the most loaded server in then current normal conditions."
Experience in managing the resolution of .net and other major TLDs led VeriSign
to implement a capacity requirement much larger than recommended in RFC 2870.
This includes availability during DDoS attacks, increases in network traffic due
to viruses and worms, spikes that occur as a result of occasional third-party
DNS configuration errors, and spam. VeriSign's position on managing these types
of events is a simple one: to be able to subdue an attack on the Internet, one
must first be able to withstand it. If the event can be withstood, it can then
be analyzed, allowing for the necessary countermeasures to be implemented. This
approach has served us well and ensures our ability to withstand extreme
increases in query load.
VeriSign's DNS constellation has experienced extreme fluctuations in DNS traffic
as a result of DDoS attacks, viruses, worms, and spam; however, some of the most
significant spikes have come from unintentional misconfigurations of DNS entries
by various individuals and corporations. Due to our vigilant monitoring and
practice of over-provisioning resolution capacity, none of these events has
impacted the availability of DNS services for .net, even though some of these
events have caused peak load to increase by more than 200,000 queries per
second. As Internet usage continues to rise, these events are certain to
increase in frequency and intensity. Specific examples of these incidents are
extremely sensitive and, due to security concerns, are not publicly disclosed.
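A short calculation shows why a three-times-peak rule can fall short for a zone of this kind. The sketch below applies the RFC 2870 factor to the routine .net peak of 60,000 queries per second and tests it against a surge of 200,000 queries per second, both figures taken from this section. Treating the constellation-wide peak as a single aggregate is a simplifying assumption, since the RFC states its rule per server, and the function names are illustrative.

```python
def required_capacity(measured_peak_qps: float, factor: float = 3.0) -> float:
    """Capacity implied by provisioning a multiple of the measured peak."""
    return measured_peak_qps * factor


def withstands(capacity_qps: float, baseline_peak_qps: float,
               surge_qps: float) -> bool:
    """True if capacity covers the routine peak plus an additional surge."""
    return capacity_qps >= baseline_peak_qps + surge_qps


rfc_capacity = required_capacity(60_000)             # 180,000 queries per second
rfc_holds = withstands(rfc_capacity, 60_000, 200_000)
```

Here `rfc_holds` is false: a 200,000 query-per-second surge on top of the routine peak exceeds three times that peak, so larger headroom is needed to ride out such an event.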
One of the most effective tools for resolving end user configuration errors has
been the rapid update of DNS information. Because 95 percent of .net updates are
available on our global constellation within 3 minutes, correcting a
configuration error can be quick and straightforward.
Restart Capabilities
Restart capabilities allow the registry operator to rapidly recover from any
catastrophic event and ensure minimal interruption to the Internet community
worldwide. These capabilities are addressed for the provisioning systems and DNS
resolution services.
Provisioning System Restart Capabilities
The redundancy built into SRS architecture allows for very flexible restart
capabilities. This is demonstrated by the fact that VeriSign operated the .net
registry at an exceptional reliability exceeding 99.98 percent over the past 3
years. In addition, VeriSign provides the ability to rapidly recover the .net
registration systems from events that can disrupt registry services. This is
accomplished by maintaining a fully operational alternate primary site that is
built as an exact copy of the primary.
Under most circumstances, restoring service at the alternate primary data center can be
accomplished within 30 minutes from the time the event is identified. Since
registry transactions flow through the alternate primary data center in
real-time as part of normal business, connectivity to the facility is ensured.
Procedures for Testing Restart. Well-developed and thoroughly documented
procedures are vital for restarting .net registration services. The procedures
must be rehearsed regularly to minimize outage time and maintain data integrity.
VeriSign periodically tests failure scenarios and the procedures for restart. We
maintain a staging environment that provides the opportunity to practice
database and system restart without impacting production operations.
During production maintenance periods, we also take the opportunity to test
high-availability database failovers. These procedures include database failure
scenarios and test our ability to restart production databases and application
servers within minimal time standards.
DNS Restart Capabilities
VeriSign's DNS constellation has operated at 100 percent availability since
1997. Still, the possibility of a catastrophic failure remains a reality. The
ability to restart a DNS service after such an event is complicated by the fact
that any restarted service will experience load spikes as clients turn to it for
service. The ATLAS architecture includes two attributes that mitigate this effect.
The first attribute is an approach to congestion avoidance that answers the
maximum number of requests within reasonable timeouts to avoid excessive
queuing. In the ATLAS architecture, this congestion avoidance is implemented by
the protocol engine (PE), which allows the lookup engine (LUE) to continue to
perform at maximum rates. The standard testing procedures for ATLAS resolution
sites include saturating the network with queries.
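The general technique of avoiding excessive queuing can be illustrated with a bounded queue that refuses new work when full and discards queries that have already waited past their timeout. This is a sketch of the idea only and is not the ATLAS protocol engine; the class name, queue depth, and timeout values are assumptions.

```python
from collections import deque
from typing import Optional


class DeadlineQueue:
    """Bounded FIFO that never hands out a query older than the timeout."""

    def __init__(self, max_depth: int, timeout: float):
        self.max_depth = max_depth
        self.timeout = timeout
        self._items = deque()   # entries are (arrival_time, query)
        self.dropped = 0

    def offer(self, query: str, now: float) -> bool:
        """Accept a query unless the queue is already at its depth limit."""
        if len(self._items) >= self.max_depth:
            self.dropped += 1
            return False
        self._items.append((now, query))
        return True

    def take(self, now: float) -> Optional[str]:
        """Return the oldest query still within its timeout, if any."""
        while self._items:
            arrived, query = self._items.popleft()
            if now - arrived <= self.timeout:
                return query
            # The client has likely given up; answering would waste capacity.
            self.dropped += 1
        return None
```

Shedding stale or excess queries early keeps the lookup stage busy only with requests that can still be answered in time.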
The second attribute of ATLAS that helps it withstand extreme loads is simply a
very large capacity. Having enormous capacity at individual sites lessens the
burden and coordination required during restart, and minimizes the critical
overload period.
VeriSign maintains a minimum of three additional sites, known as hot standby
sites. These sites are activated when any of the resolution sites are taken
offline for routine maintenance, or if there is an unplanned loss of service.
As a result of these regular maintenance activities, we have extensive
experience restarting a DNS site. This operation is transparent to end users. If
the need occurs, we can switch operations from a failed DNS resolution site to
any of our three hot-standby sites within an hour.
Conclusion
VeriSign has built proven and reliable DNS and SRS systems that have exceeded
every operational scalability challenge over the past decade. Our systems
provide the operational scalability sufficient to handle the existing registry
database and projected growth. Our provisioning system has successfully
supported extraordinary and unprecedented demands through meticulous capacity
planning while using the scalable design of the SRS and database.
Our resolution forecasting, capacity planning and early deployment of scalable
systems have ensured exceptional DNS responsiveness with 100 percent
availability. The resolution infrastructure currently deployed supports a
request load at least 20 times higher than peak loads, and capacity is
continually being increased, which mitigates risks from DDoS attacks, viruses,
worms, and spam.
The redundancy built into the SRS and DNS architecture of VeriSign's .net
registry and the procedures rehearsed regularly allow for very flexible restart
capabilities. |
|
(iv) Describe the registry-registrar model and
protocol; availability of a shared registration system, including
processing times for standard queries (add, modify, delete); and duration
of any planned or unplanned outages.
|
The Registry-Registrar Model and Protocol describe the operational structure
used for registry-registrar interaction and the communications language used
between them. The structure and language are important because they define and
constrain the features and functions available to both registries and
registrars. A poor model or a poor protocol can place needless restrictions on
the features available to Internet users and the business models available to
registries and registrars. Additionally, the various models and protocols have
privacy implications for the holders of domain names.
VeriSign Advantages:
+ Authorship of Registry-Registrar protocols
+ Experience operating both RRP and EPP
+ Offer the lowest risk, highest performance, and most stable registry-registrar
model and protocol for .net
This section describes VeriSign's .net registry-registrar model and protocol
proposal:
* Registry-Registrar Model and Protocol. VeriSign will continue to support the
existing "thin" registry-registrar model using the Registry-Registrar Protocol
(RRP) documented in RFC 2832. We will deploy the Extensible Provisioning
Protocol (EPP) in 2005 as part of a transition plan that has been coordinated
with the registrar community to minimize stability risks. We will also explore
the possibility of a separate transition from the "thin" registry-registrar
model to the "thick" registry-registrar model in cooperation with the ICANN
community.
* Availability of a Shared Registration System. VeriSign consistently exceeds
our SRS requirements for SLA availability. We will continue to provide the most
stable and reliable SRS in the domain name industry.
* Processing Times for Standard Queries (add, modify, delete). VeriSign's SRS
consistently delivers the best performance. Our monthly average response time of
15 milliseconds for add, modify, and delete transactions and 10 milliseconds for
query transactions is more than 10 times faster than other registry operators.
* Duration of Any Planned or Unplanned Outages. VeriSign consistently meets and
exceeds our SLA requirements, providing a stable, reliable SRS for our registrar
customers.
Registry-Registrar Model and Protocol
In 1997, the United States Department of Commerce (DoC) issued an RFC on DNS
administration. The RFC solicited public input on issues relating to the overall
framework of the DNS administration, the creation of new TLDs, policies for
domain name registrars, and trademark issues. The comments received led the DoC
to publish two papers to explore the issues associated with changing the
InterNIC model. The first paper, "A Proposal to Improve the Technical Management
of Internet Names and Addresses," and known more commonly as the "Green" paper,
was published in January 1998. Discussions followed, and the concepts presented
in the Green paper were refined into a proposal that was presented in a second
paper titled "Management of Internet Names and Addresses." This second paper,
published in June 1998, is more commonly known as the "White" paper.
Both the Green and White papers described a model for competitive registrars
working with a shared registry. When this model was put into practice, two
distinct models ("thin" and "thick") of registrant data management evolved.
A thin registry model is a data management model in which information associated
with each registered domain name is distributed between the registry and the
sponsoring registrar. The registry maintains delegation information needed to
publish the DNS zone, while the registrar maintains information describing the
registrant and other contacts (such as technical, administrative, and billing)
associated with the domain.
A thick registry model is a data management model in which the registry maintains
copies of all information associated with registered domains, including
registrant and contact information. Registrars typically maintain their own
copies of registration information; therefore, registry-registrar
synchronization is required to ensure that both registry and registrar have
consistent views of the technical and social information associated with
registered domains.
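The difference between the two models can be pictured as a difference in what the registry's record for a domain contains. The sketch below is illustrative only: the class and field names are hypothetical and do not reflect the actual SRS schema.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    role: str   # for example "registrant", "technical", "administrative", "billing"
    name: str


@dataclass
class ThinRecord:
    """What a thin registry holds: delegation data and the sponsoring registrar."""
    domain: str
    registrar_id: str
    name_servers: list[str]


@dataclass
class ThickRecord(ThinRecord):
    """A thick registry additionally holds registrant and contact data."""
    contacts: list[Contact] = field(default_factory=list)


def delegation_lines(record: ThinRecord) -> list[str]:
    """Zone publication needs only the delegation data, in either model."""
    return [f"{record.domain}. IN NS {ns}." for ns in record.name_servers]
```

Publishing the zone produces the same output from either record type; the thick model differs in the contact data the registry must store and keep synchronized with registrars.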
Registry-Registrar Protocol: The "language" used by registries and registrars to
exchange information is known as a protocol. While different registry-registrar
communities have used different protocols over time, two protocols have been
developed and widely deployed to implement the concepts described in the Green
and White papers.
Network Solutions, Inc., (NSI) Registry-Registrar Protocol (RRP) has been used
in the management of .net since April 1999. Designed and developed by NSI (and
later VeriSign) architects, and first documented publicly as Informational RFC
2832, this protocol is tailored for use in the thin registry model currently
used for .net.
The Extensible Provisioning Protocol (EPP), designed and first developed by
Scott Hollenbeck of NSI (and later VeriSign), was first published in November
2000. EPP was designed to provide features unavailable in RRP. It addresses
requirements of additional registries, for example, by providing features to
support both thick and thin registry models. EPP also supports provisioning of
products other than domain names. EPP was adopted by the Internet Engineering
Task Force (IETF) "provreg" working group and published as a series of
Informational and Proposed Standard RFCs (3730, 3731, 3732, 3733, 3734, and
3735) in March 2004.
VeriSign's Solution: VeriSign plans to deploy EPP for use with .net in the first
half of 2005. This effort will require a transition from RRP to EPP. VeriSign's
transition plan provides the lowest risk solution for migrating from RRP to EPP
by allowing registrars to operate both protocols in parallel until both registry
and registrar implementations of EPP can be confirmed to be completely
operational. The VeriSign transition plan has been developed in conjunction with
the registrar community, ensuring that acceptance and adoption will proceed with
minimal risk.
Both RRP and EPP have been extended to add support for functions not described
in the core protocol specifications. VeriSign's implementation plan includes
support for current RRP extensions used by .net (including migration of those
extensions to an EPP implementation), while gradually adding support for EPP
extensions to support new features, such as those defined for ICANN's Redemption
Grace Period and DNS security.
VeriSign proposes to continue parallel support for RRP, while deploying and
testing a fully RFC-conformant implementation of EPP. Continued use of RRP,
while gradually transitioning to EPP, provides several advantages that help
preserve the stability of the .net gTLD:
* Registrars have gained significant operational and business logic experience
with VeriSign's implementation of RRP. Continued use of RRP on the existing
VeriSign infrastructure eliminates the risk of transitioning to another registry
operator with a newly developed implementation of RRP.
* VeriSign has been able to optimize its RRP implementation to the point of
being able to support peaks of more than 183 million transactions per day, and a
sustained daily average of more than 170 million transactions per day for a
month, and up to 300,000 transactions per minute. No other registry operator has
demonstrated similar processing capabilities with RRP.
* RRP is stable. Parallel use of RRP will support registry operations that have
no timetable dependencies on the migration to EPP.
* Both registries and registrars are still gaining experience with EPP, which
became a proposed standard in March 2004. Initial implementations are likely to
have defects and performance limitations that can only be discovered and
corrected over time.
VeriSign will use its existing RRP implementation, designed and developed by the
original authors of the protocol, as a springboard to a full migration to EPP.
Our RRP implementation has been refined, optimized, and tailored to ICANN
processes over the course of 5 years of high-pressure, high-volume registry
operations. RRP provides registrars with levels of real-time service and
performance unmatched in the domain name industry. Satisfactory levels of
stability, reliability, and error-free performance are guaranteed with this
implementation of RRP.
VeriSign has been the driving force behind the specification of EPP and has been
developing implementations of EPP for several years. Our first implementation
was developed in 2001, based on Internet-draft EPP specifications for the
provisioning of Internet keywords. A second implementation was developed and
deployed in 2001 to support operations of the .name gTLD registry. That
implementation has been refined and updated over time and is still in use today.
Finally, VeriSign has developed and deployed an RFC-conforming version of EPP
for the provisioning and management of domain names in the .cc, .tv, and .bz
country code top-level domains (ccTLDs). An active project is in place to deploy
EPP for use in the provisioning and management of domain names in the .com and
.net gTLDs in 2005.
Our early implementations of EPP have already been subjected to several rounds
of development and formal QA testing. With protocol stability and transition
costs being a significant concern in the ongoing management of .net, VeriSign
will provide registrars with extensive "hands-on" support for several months to
facilitate the transition from RRP to EPP. During the transition period,
registrars will be able to access consistent back-end registry data using both
RRP and EPP. Free client software development kits (SDKs) that interoperate with
other EPP servers will be available in multiple programming languages to
minimize the amount of software development required of registrars. An isolated
operational test and evaluation (OT&E) environment will be available for
registrar testing before "live" operations, allowing registrars to develop and
test their software systems with no risk to "live" data or systems. Issues,
defects, and errors found during OT&E testing will be corrected rapidly and
deployed in both the OT&E and "live" environments, ensuring that both
environments use the exact same software code bases, and that transition will
require little more than minor changes to client configuration parameters. A
detailed description of the transition plan for retiring RRP and deploying EPP
is provided in Section 8-9.
The migration solution proposed by VeriSign is one that has been developed in
coordination with the registrar community as part of the migration to EPP for
the .com gTLD. Unlike the earlier transition of RRP to EPP that was proposed and
implemented by PIR/Afilias as part of the .org migration, our solution allows
registrars to update and deploy their systems using registry-provided tools and
services on a schedule that works for them. Our transition effort will be
conducted as a partnership between VeriSign and its registrar customers.
Availability of Shared Registration System
The availability of the SRS directly impacts the ability of the registrar
community to operate its businesses efficiently. SRS availability is affected by
multiple factors, including unplanned outages and planned outages for scheduled
maintenance or system upgrades.
The registry must also maintain SRS availability and performance, while
providing registrars with equivalent access and supporting the large transaction
volume without performance degradation. VeriSign currently sets a minimum of 15
connections for each registrar with 3.5 Gigabits per second of bandwidth equally
divided among all registrars.
The principal indicator of VeriSign's QoS approach is historical availability of
the SRS, which has been consistently above 99.9 percent over each of the past 7
years and has averaged 99.98 percent since January 2002. This level of ensured
availability is of the utmost importance to registrars because unexpected or
extended outages create operational and business challenges.
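As a worked illustration of what these percentages mean in practice, the short sketch below converts an availability percentage into the downtime it permits. The function name and the choice of a 365-day year are assumptions made for the example; they are not part of any SLA definition in this proposal.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float,
                             period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the downtime, in minutes, that a given availability level permits."""
    return period_minutes * (1 - availability_percent / 100)


for pct in (99.9, 99.98, 99.99):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct} percent availability allows about {minutes:.1f} minutes per year")
```

Under these assumptions, 99.9 percent availability permits roughly 8.8 hours of downtime per year, while 99.98 percent permits under 2 hours.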
Table 5(b)iv-1 compares published gTLD availability statistics; the information
summarized is reported monthly by each gTLD and is posted on the world wide web
at http://www.icann.org/tlds/monthly-reports/. Note that .info and .org do not
report planned outages exceeding the Planned Outages SLA as unplanned outages.
Table 5(b)iv-1 shows that during comparable periods, VeriSign consistently
delivers the highest planned availability for .net that is well above the .net
SLA, while other registries frequently miss their SLA. The overall impact to
registrar business is not obvious in the gross percentages; however, more
detailed analysis highlights the registrar impact from the average duration of
outages and chronic excess time required for planned outages.
Processing Times for Standard Queries
VeriSign delivers exceptional processing times for SRS queries. VeriSign's
performance and testing on the effects of load on the registry system are
discussed in detail in Section 5(b)v. VeriSign will commit to improved SLA
response times of 50 ms for add operations and 100 ms for delete and modify
operations for 95 percent of all transactions.
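A percentile-based commitment of this kind can be checked mechanically. The sketch below is one minimal way to do so, assuming that a transaction counts as compliant when its response time is less than or equal to the threshold; the names SLA_MS and meets_sla and the sample values are hypothetical.

```python
SLA_MS = {"add": 50, "modify": 100, "delete": 100}  # thresholds from the text above
REQUIRED_FRACTION = 0.95                            # 95 percent of all transactions


def meets_sla(operation: str, response_times_ms: list) -> bool:
    """Return True if at least 95 percent of samples are within the threshold."""
    if not response_times_ms:
        raise ValueError("at least one response time sample is required")
    threshold = SLA_MS[operation]
    within = sum(1 for t in response_times_ms if t <= threshold)
    return within / len(response_times_ms) >= REQUIRED_FRACTION


if __name__ == "__main__":
    samples = [15] * 19 + [120]       # hypothetical: one slow add out of twenty
    print(meets_sla("add", samples))  # 19 of 20 samples are within 50 ms
```

Measuring against a percentile rather than an average prevents a small number of very fast responses from masking a long tail of slow ones.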
Actual response times and the standard deviation of those times vary greatly
among the TLDs operated by different registry providers. The exceptionally rapid
response times for .net consistently exceed those of other registries.
Inconsistency in response times of other registries is indicated by the large
standard deviation. VeriSign's load and stress testing has shown that a large
response time deviation is an indicator of inability to scale under increasing
transaction loads. This appears to be the case in other registries where
increasing response times are closely correlated to increasing transaction
volumes. The standard deviations reported reflect only deviation of monthly
averages, since more granular information is not reported. Average times are not
reported for .biz. This information is published monthly on the world wide web
at http://www.icann.org/tlds/monthly-reports/.
* .net (VeriSign):
- 15 ms average add/write response time
- 6.6 ms standard deviation add/write response time
- 10 ms check/query response time
- 0 ms standard deviation check/query response time
* .com (VeriSign):
- 15 ms average add/write response time
- 6.6 ms standard deviation add/write response time
- 10 ms check/query response time
- 0 ms standard deviation check/query response time
* .biz (NeuLevel):
- 99.7 percent within 3 sec (3000 ms) average add/write response time
- unknown standard deviation add/write response time
- 99.9 percent within 1.5 sec (1500 ms) check/query response time
- unknown standard deviation check/query response time
* .info (Afilias):
- 823 ms average add/write response time
- 552 ms standard deviation add/write response time
- 212 ms check/query response time
- 115 ms standard deviation check/query response time
* .org (PIR/Afilias):
- 1527 ms average add/write response time
- 539 ms standard deviation add/write response time
- 168 ms check/query response time
- 140 ms standard deviation check/query response time
Duration of Planned or Unplanned Outages
Scheduled maintenance is required for a registry to maintain predictable,
reliable service. Scheduled maintenance is announced at least 30 days in advance
of each planned outage. We also remind registrars 7 days and 48 hours before
scheduled maintenance and issue a final announcement on the day of maintenance.
These planned outages are scheduled between 0100 and 0900 GMT on Sunday to
minimize the impact
on registrar operations. Planned outages will not exceed 45 minutes per month,
with a target SLA of 99.99 percent availability.
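The announcement schedule and maintenance window described above can be expressed as a small calculation. The following sketch is illustrative only: it assumes the outage start time is given in GMT (UTC), that the 30-day announcement is sent at the latest permitted moment, and that the day-of announcement is sent at 0000 GMT. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

WINDOW_OPEN = timedelta(hours=1)   # 0100 GMT
WINDOW_CLOSE = timedelta(hours=9)  # 0900 GMT
SUNDAY = 6                         # datetime.weekday() value for Sunday


def notification_times(start: datetime) -> list:
    """Return the announcement times for an outage that begins at `start`."""
    day_of = start.replace(hour=0, minute=0, second=0, microsecond=0)
    return [
        start - timedelta(days=30),   # initial announcement (latest allowed)
        start - timedelta(days=7),    # first reminder
        start - timedelta(hours=48),  # second reminder
        day_of,                       # final announcement on the day itself
    ]


def in_maintenance_window(start: datetime, duration: timedelta) -> bool:
    """Return True if the outage fits inside the Sunday 0100 to 0900 GMT window."""
    midnight = start.replace(hour=0, minute=0, second=0, microsecond=0)
    return (start.weekday() == SUNDAY
            and midnight + WINDOW_OPEN <= start
            and start + duration <= midnight + WINDOW_CLOSE)


if __name__ == "__main__":
    outage = datetime(2005, 1, 2, 1, 0, tzinfo=timezone.utc)  # a Sunday
    print(notification_times(outage))
    print(in_maintenance_window(outage, timedelta(minutes=45)))
```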
Occasionally, system maintenance tasks, such as a major database upgrade, cannot
be performed within the 45-minute maintenance window. Once per year, VeriSign
may incur a 4-hour planned outage. No more than once every 3 years, VeriSign may
incur one extended planned outage of up to 8 hours in duration. Extended
planned outages represent total allowed planned outages for the month. Extended
outages have been required for the .net registry only twice in the past 4 years.
Before a planned outage, the maintenance on the .net registry is staged and
rehearsed in an operations staging area to minimize the risk of a problem with
the system or a problem with the time required to complete the maintenance. Table
5(b)iv-1 shows the results of this practice with VeriSign never exceeding the
planned outage SLA. The planned outages for the .net registry have always been
within the SLA terms and never exceeded the time announced for scheduled
maintenance.
As shown in Table 5(b)iv-1, the frequency and duration of VeriSign's unplanned
outages are the lowest in the industry. Outages can be caused by any number of
factors; our experience has shown that unplanned outages are not caused by a
single event, but most occur from a combination of events that include unique
failures of hardware and/or support systems. This reinforces the need for
in-depth monitoring and immediate access to experienced staff for
troubleshooting as well as access to third-party support staff to quickly
restore systems, regardless of external dependencies. The .net registry has not
had an unplanned outage that caused any data corruption or was caused by a
problem with the registry system itself. The architectural redundancies of the
.net registry have prevented a number of incidents, such as a routine hardware
failure or loss of electrical power, from becoming outages. These are often
resolved without any impact noticeable to registrars.
Conclusion
VeriSign offers the lowest risk, highest performance, and most stable
registry-registrar model and protocol for .net. Our engineers are recognized
industry leaders in the design, development, and deployment of these
technologies. Our demonstrated excellence in operating .net stands in sharp
contrast to the documented performance of our competitors. We are committed to
maintaining the same levels of stable performance that we have demonstrated to
date, and we are equally committed to exceeding our historic performance levels
in the future. |
|
(v) Database capabilities including database software,
size, throughput, scalability, procedures for object creation, editing,
and deletion, change notifications, registrar transfer procedures, grace
period implementation, availability of system with respect to unplanned
outage time, response time performance; ability to handle current volumes
and expected growth and reporting capabilities.
|
The primary online transaction processing (OLTP) database is the most critical
element of the .net registry provisioning function. To support .net at current
levels, the registry operator must demonstrate the ability to support over 5
million domains and transaction peaks of 100,000 operations per minute without
degraded performance or increased risk of data corruption.
VeriSign Advantages:
+ Unparalleled, proven reliability with 100 percent data integrity and low downtime
+ High performance platform designed to process registry functions
+ No Data Loss architecture complete with fully redundant, alternate primary site
+ Incremental zone propagation hosted on award-winning ATLAS platform
The Oracle OLTP database used by VeriSign (currently version 9i) has delivered
years of scalability, performance, and security to .net registrars and Internet
users. Our long-term relationship with Oracle has resulted in over 13 years of
experience managing the current .net database with a record of 100 percent data
integrity, no data loss, and unparalleled uptime.
Our .net registry database architecture is scalable to over 250 million
registrations while still operating within our SLAs and providing robust
business continuity capabilities. This section describes the following
database capabilities:
* Database Software, Size, Throughput, Scalability. VeriSign emphasizes the
performance, throughput, and scalability of the .net database in every decision
about the .net database architecture.
* Procedures for Object Creation, Editing, and Deletion. Every modification made
to objects within the .net registry database is audited and traceable back to
the registrar, session, and transaction that initiated the modification.
* Change Notification. We have developed a notification framework that can scale
to millions of messages.
* Registrar Transfer Procedures. VeriSign is fully compliant with the most
recent transfer policies intended to protect the interests of registrants and
ensure the stability of the .net domain.
* Grace Period Implementation. VeriSign uses an extensible grace period
framework that supports an array of grace period rules and permits rule changes
to occur in real-time and without system outages.
* Availability with Respect to Unplanned Outage Time. VeriSign's .net database
system maintains multiple live copies of the OLTP database both onsite and
offsite to provide data recovery without data loss from a complete failure of
all primary data systems.
* Response Time Performance. VeriSign's .net database processes check
transactions in less than 10 milliseconds and add transactions in less than 20
milliseconds, while processing transaction volumes that vary from 30 million to
over 200 million per day.
* Ability to Handle Current Volumes and Expected Growth. The .net registry
database architecture has been engineered to handle enormous growth leveraging
Oracle's partitioning features.
* Reporting Capabilities. VeriSign maintains a real-time copy of our databases,
which is used strictly for reporting purposes; this prevents report processing
demands from degrading the performance of the registry database itself.
Database Software, Size, Throughput, and Scalability
This section discusses the primary databases required for the .net TLD,
including the registry, resolution, and Whois databases.
Registry Database
The .net registry database uses Oracle technology and is designed and tested to
host a minimum of 250 million registrations, while operating within strict
performance levels. The database provides robust business continuity
capabilities that exceed projected capacity requirements over the next 6 years.
It has a proven history of handling more than 200,000 transactions per minute on
a continuous basis, with peaks of more than 300,000 per minute and more than 4
billion per month. In handling this massive transaction volume, VeriSign's .net
registry consistently exceeds the best SLAs in the industry; typically
processing check commands in an average of less than 10 milliseconds and add
commands in less than 20 milliseconds. Figure 5(b)v-1 illustrates the average
daily transaction volumes, and the average daily peaks for the period from Q2
2000 through Q3 2004.
The reliability and scalability of our database architecture are based on the
application of a diverse set of hardware and software systems. The technologies
listed below are a subset of the technologies used to run the .net registry and
are considered the best of breed in the industry:
* IBM High Availability Cluster Multi-Processing (HACMP) manages a dual-node
cluster of redundant database servers that host the registry database. If one
server of the cluster fails, the architecture automatically fails over to a second
server so that database processing can continue with minimal disruption.
* EMC Symmetrix Remote Data Facility (SRDF) synchronously mirrors the entire
primary database in real-time to a remote EMC storage device, while EMC
TimeFinder software is used to make frequent, identical copies of the database
for backup, reporting, and other purposes.
* Oracle Partitioning segments large volumes of data grouped by month and year
so that database responsiveness and performance can remain stable as size increases.
The .net registry database leverages these technologies to create a scalable,
fault tolerant database architecture where copies of the database are replicated
and stored in different locations, as shown in Figure 5(b)v-2. VeriSign enjoys
preferred-vendor relationships with IBM and Oracle and has a technical
representative from EMC working onsite.
The .net database is replicated within the primary data center facility, in
addition to being replicated to a geographically diverse alternate primary site.
EMC SRDF software provides real-time replication of the .net registration
database to the alternate primary site. In case of site failure, the full .net
registry data is available in near real-time at the alternate primary site. The
alternate primary site is designed to replicate the primary site database
systems and uses identical servers and storage, so that registry activities can
continue with no performance or functionality degradation. VeriSign frequently
rehearses the failover procedure to ensure that the site is fully operable in
the event of a disaster.
Multiple copies of the database significantly reduce the possibility of data
loss or corruption and allow for rapid recovery of services in the event of a
disaster. Tape backups and transaction logs are also generated and stored in the
unlikely event that they are needed. Refer to Sections 5(b)x and 5(b)xi for
information on backup and data escrow processes and to Section 5(b)xviii for
information on recovery procedures.
Resolution Database
Our zone generation process, described in detail in Section 5(b)vii, begins with
a change of .net zone information in the SRS database. VeriSign's award-winning
and innovative ATLAS system then extracts zone information and changes from the
database. Incremental zone file updates are extracted from the database within 1
minute, then validated and subsequently distributed via a secure virtual private
network (VPN) to another set of distribution servers at each site in the
VeriSign DNS constellation, which is described in detail in Section 5(b)xviii.
The zone updates are then loaded into multiple in-memory databases for
publication, making the new information available for DNS query resolution. In
addition, a full copy of the .net zone file is extracted twice daily to provide
a recent full copy of the .net zone.
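The idea of applying validated incremental updates to an in-memory copy of the zone can be sketched as follows. This is a simplified illustration and not a description of the ATLAS implementation: the ZoneChange type, the "set" and "remove" actions, and the dictionary-based zone are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneChange:
    action: str               # "set" replaces a delegation, "remove" deletes it
    domain: str
    name_servers: tuple = ()  # name server host names for a "set" change


def apply_changes(zone: dict, changes: list) -> dict:
    """Return a new zone mapping with every change applied, or raise on bad input.

    Working on a copy means an invalid batch never leaves the published zone
    in a partially updated state.
    """
    updated = dict(zone)
    for change in changes:
        if change.action == "set":
            if not change.name_servers:
                raise ValueError(f"no name servers given for {change.domain}")
            updated[change.domain] = change.name_servers
        elif change.action == "remove":
            updated.pop(change.domain, None)
        else:
            raise ValueError(f"unknown action: {change.action}")
    return updated


if __name__ == "__main__":
    zone = {"example.net": ("ns1.example.net", "ns2.example.net")}
    batch = [
        ZoneChange("set", "new-name.net", ("ns1.example.net",)),
        ZoneChange("remove", "example.net"),
    ]
    print(apply_changes(zone, batch))
```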
Whois Database
The VeriSign Whois system, described in Section 5(b)xii, offers domain name,
name server, IP address, and registrar search capabilities. Redundant Whois
databases at each data center provide dependable Whois service. Each Whois
database is based on a duplicate, point-in-time copy of the primary .net OLTP
database.
Procedures for Object Creation, Editing, and Deletion
VeriSign maintains stringent procedures for the creation, editing, and deletion
of any object within the .net registry database. Modifications to objects can be
traced to the appropriate registrar using transaction- and session-level auditing
within the application. VeriSign maintains auditing fields within every object
that capture when it was last modified and by whom. Pre- and post-copies of
each object are captured when a transaction has completed. All objects within
the database adhere to these procedures including domains, name servers, IP
addresses, both RRP and EPP statuses, and their associated links. All historical
data, dating back to 1999, is archived to a separate Oracle database with the
full transaction and object modification history.
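The auditing approach described above (per-object audit fields plus before and after copies tied to a registrar, session, and transaction) can be illustrated with a small sketch. The AuditRecord fields, the modify_object function, and the dictionary representation of a registry object are hypothetical and do not reflect the actual database schema.

```python
import copy
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    registrar_id: str
    session_id: str
    transaction_id: str
    modified_at: datetime
    before: dict  # copy of the object before the modification
    after: dict   # copy of the object after the modification


def modify_object(obj: dict, changes: dict, registrar_id: str, session_id: str,
                  transaction_id: str, audit_log: list) -> None:
    """Apply `changes` to `obj` and append a traceable record to `audit_log`."""
    before = copy.deepcopy(obj)
    now = datetime.now(timezone.utc)
    obj.update(changes)
    # Audit fields stored on the object itself: who changed it last, and when.
    obj["last_modified_by"] = registrar_id
    obj["last_modified_at"] = now
    audit_log.append(AuditRecord(registrar_id, session_id, transaction_id,
                                 now, before, copy.deepcopy(obj)))


if __name__ == "__main__":
    log = []
    domain = {"name": "example.net", "name_servers": ["ns1.example.net"]}
    modify_object(domain, {"name_servers": ["ns1.example.net", "ns2.example.net"]},
                  "registrar-1", "session-9", "txn-42", log)
    print(log[0].before, log[0].after)
```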
This tracking is particularly important for financial transactions, and the
integration of the .net registry with the VeriSign Registrars Billing and
Collection System (RBACS), which is detailed in Section 5(b)ix.
Change Notification
Registrar interaction with the .net registry database occurs either through an
Application Program Interface (API) or through a web-based tool. Change
notifications for registrars using the Registry Registrar Protocol (RRP) are
delivered by email. With the introduction of VeriSign's implementation of EPP,
notification of registry changes occurs through poll queue messages to support
increased automation of registrar operations. Our poll queue implementation uses
Oracle Advanced Queuing and can scale to thousands of registrars and millions of
messages. Refer to Section 5(b)iv for more information on the benefits of
registry interaction through EPP.
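The request-then-acknowledge pattern used by EPP poll queues can be sketched with an in-memory stand-in. The example below does not use Oracle Advanced Queuing; the PollQueues class and its method names are assumptions that mirror the EPP poll operations, in which a request returns the oldest message and an acknowledgement removes it.

```python
from collections import defaultdict, deque
from itertools import count


class PollQueues:
    """One first-in, first-out message queue per registrar."""

    def __init__(self):
        self._queues = defaultdict(deque)
        self._ids = count(1)

    def enqueue(self, registrar_id: str, text: str) -> int:
        """Add a change notification and return its message ID."""
        msg_id = next(self._ids)
        self._queues[registrar_id].append((msg_id, text))
        return msg_id

    def request(self, registrar_id: str):
        """Return the oldest unacknowledged message without removing it."""
        queue = self._queues[registrar_id]
        return queue[0] if queue else None

    def acknowledge(self, registrar_id: str, msg_id: int) -> bool:
        """Remove the oldest message if `msg_id` matches it."""
        queue = self._queues[registrar_id]
        if queue and queue[0][0] == msg_id:
            queue.popleft()
            return True
        return False


if __name__ == "__main__":
    queues = PollQueues()
    queues.enqueue("registrar-1", "Transfer requested for example.net")
    message = queues.request("registrar-1")
    print(message, queues.acknowledge("registrar-1", message[0]))
```

Because a message stays at the head of the queue until it is acknowledged, a registrar client that fails mid-processing sees the same message again on its next request.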
Registrar Transfer Procedures
VeriSign is fully compliant with the Inter-Registrar Transfer Policy, which
protects the interests of registrants and enhances the stability of the .net domain.
The following process outlines domain name transfers between registrars:
* The requesting registrar processes a transfer request on behalf of the
domain's registrant.
* The transfer request includes a minimum of a 1-year extension. Our EPP
implementation, described in Section 5(b)iv, allows registrars to specify
multi-year extensions with transfer requests.
* The registrar of record has 5 days to acknowledge or reject the transfer
request through the SRS.
* If acknowledged, the transfer is immediately executed.
* If rejected, the transfer is cancelled, and transfer fees are credited to the
requesting registrar.
* If the registrar of record does not respond within 5 days, the transfer is
automatically processed by the system and the transfer is completed.
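The transfer steps listed above amount to a small state machine with a 5-day timeout. The sketch below is a minimal illustration: the TransferState names and the resolve_transfer function are hypothetical, and `response` is assumed to be the answer, if any, that the registrar of record submitted within the 5-day window.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional

RESPONSE_WINDOW = timedelta(days=5)


class TransferState(Enum):
    PENDING = "pending"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


def resolve_transfer(requested_at: datetime, now: datetime,
                     response: Optional[str] = None) -> TransferState:
    """Return the state of a transfer request at time `now`.

    `response` is "ack", "reject", or None when the registrar of record
    has not answered.
    """
    if response == "ack":
        return TransferState.COMPLETED   # executed immediately
    if response == "reject":
        return TransferState.CANCELLED   # fees credited to the requester
    if now - requested_at >= RESPONSE_WINDOW:
        return TransferState.COMPLETED   # automatic approval after 5 days
    return TransferState.PENDING


if __name__ == "__main__":
    requested = datetime(2005, 1, 1, tzinfo=timezone.utc)
    print(resolve_transfer(requested, requested + timedelta(days=5)))
```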
In the event of the default, acquisition, or closure of a registrar, the .net
database has a bulk transfer capability to transfer domain registrations to
another registrar either en masse or according to a specific list of domain
registrations, per ICANN's specification.
Grace Period Implementation
Grace periods, pending periods, and overlapping periods comprise a complex set
of rules with a large number of corner cases and what-if scenarios. VeriSign has
implemented and refined the specific rules covering these cases and maintains a
library with tens of thousands of test cases. These test cases are critical for
validating that proper functionality is maintained through every system update
and for ensuring that registrars are correctly credited for any transactions that
fall within the grace period rules.
In the current .net database, grace periods are configurable. In the event that
changes to the current grace period implementations are considered, VeriSign is
prepared to cooperate fully with ICANN and the Internet community with regard to
proposed changes.
A detailed description of system behavior for each type of grace period is
available in VeriSign's Registrar Reference Manual, which is available on
request or through our website. The following paragraphs contain brief
descriptions of the various grace periods:
Add Grace Period. This is a specified number of calendar days, currently 5 days,
following the initial registration of a domain. The system ensures that if a
domain is deleted or explicitly renewed during this grace period, then the
appropriate business logic is executed.
Explicit Renew Grace Period. This is a specified number of calendar days,
currently 5 days, following the explicit renewal of a domain. The system ensures
that if a domain name is Transferred, Deleted, or Explicitly Renewed during this
grace period, the appropriate business logic is executed.
Auto Renew Grace Period. This is a specified number of calendar days,
currently 45 days, following the auto renewal of a domain. The system ensures
that if a domain is Transferred, Deleted, or Explicitly Renewed during this
grace period, the appropriate business logic is executed.
Transfer Grace Period. This is a specified number of calendar days, currently 5
days, following the transfer of a domain. The system ensures that if a domain is
Transferred, Deleted, or Explicitly Renewed during this grace period, the
appropriate business logic is executed.
Redemption Grace Period. This is a specified number of calendar days, currently
30 days, following the deletion of a domain name that is outside the Add Grace
Period. The system permits a registrar to restore a previously deleted domain name.
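Because the grace periods are configurable, they can be pictured as a lookup table consulted by the business logic. The sketch below uses the durations listed above; the dictionary keys, the function name, and the treatment of the final instant of the period as outside the grace period are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Durations in calendar days, as currently configured per the descriptions above.
GRACE_PERIOD_DAYS = {
    "add": 5,
    "explicit_renew": 5,
    "auto_renew": 45,
    "transfer": 5,
    "redemption": 30,
}


def in_grace_period(kind: str, event_time: datetime, now: datetime) -> bool:
    """Return True if `now` falls within the grace period that began at `event_time`."""
    elapsed = now - event_time
    return timedelta(0) <= elapsed < timedelta(days=GRACE_PERIOD_DAYS[kind])


if __name__ == "__main__":
    registered = datetime(2005, 1, 1, tzinfo=timezone.utc)
    deleted = registered + timedelta(days=3)
    # A delete inside the Add Grace Period would trigger the crediting logic.
    print(in_grace_period("add", registered, deleted))
```

Changing a rule then means changing a table entry rather than code, which is consistent with rule changes taking effect without a system outage.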
System Availability with Respect to Unplanned Outage Time
VeriSign's .net database systems can fully recover from a complete failure of
all primary data systems to an alternate data center site without data loss.
The reporting database is an archive database of all registry transactions since
the very first day of registry operations. It can be updated from the primary
database without downtime and can respond to queries 24x7x365. As with all
complex hardware and software systems, every risk and contingency must be
considered and accounted for in prevention and recovery processes. For the
volumes and load that the .net registry database must support, VeriSign believes
that recoverability starts with the design of the database and continues into
the hardware and software architecture. Our investment in the .net
database architecture supports VeriSign's commitment to system availability of
99.99 percent. Refer to Section 5(b)xviii for a detailed description of these
recovery procedures.
Response Time Performance
The .net OLTP database must provide consistently rapid responses. Historically,
most registry transactions are commands to check domain availability and to
register domains. The .net registry will continue to process check commands in
less than 25 milliseconds, add commands in less than 50 milliseconds,
and modify and delete commands in less than 100 milliseconds. Refer to Table
5(b)v-1 for a comparison of historical performance and proposed service levels.
Section 5(b)xiv provides information about response time performance under peak
capacities.
Ability to Handle Current Volumes and Expected Growth
Based on thorough consideration of the following architectural design and
implementation factors, VeriSign's database has remained stable and scalable:
Capacity. The current VeriSign .net database has a demonstrated ability to
process more than 5,000 transactions per second (300,000 transactions per
minute) during peak periods. Beyond handling transactional volume, the .net
registry database is also designed with capacity ample enough to store
information associated with 100 million domain name registrations. To meet
capacity requirements dictated by overall registry activity and registrar
demand, the .net registry database must be integrated with the entire registry
architecture. Our database has been designed as a component of the entire
system, including server performance, bandwidth, network architecture, business
rule logic, batch processing, monitoring and reporting, data backup, escrow, and
disaster recovery. The system supports a high transaction volume that varies
over time, with minimal variation in response time and central processing unit
(CPU) use, as described in Section 5(b)xiv.
Scalability. VeriSign's database is designed to support sustained loads of over
200 million transactions per day with daily peaks of over 250 million
transactions. VeriSign designs its registries to meet unexpected growth demands
for every variable, including growth in the number of registrars, transaction
volumes, and database size. In addition, the .net registry database has the
capacity to scale well beyond current growth projections. The database is
horizontally scalable to support the most optimistic growth forecasts over the
next 6 years by increasing capacity of the storage arrays.
Reliability. VeriSign has operated the .net registry with unparalleled
reliability and availability with 100 percent data integrity and with no
unplanned outages due to database failures. Our .net database has a
well-established history of providing the performance, availability, and data
accuracy necessary for reliable and trusted operations. These qualities of
VeriSign's database have been achieved through a proven and highly available
architectural design, a high degree of technical and operational database
expertise, and a close relationship with a carefully selected
group of hardware and software vendors. These combined factors help minimize the
risk of data loss and service outages.
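The throughput figures quoted under Capacity and Scalability use different time units, and a short calculation relates them. The sketch below is simple arithmetic on the numbers stated above; the function names are illustrative.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_minute(per_second: float) -> float:
    """Convert a per-second transaction rate to a per-minute rate."""
    return per_second * SECONDS_PER_MINUTE


def average_per_second(per_day: float) -> float:
    """Convert a daily transaction total to an average per-second rate."""
    return per_day / SECONDS_PER_DAY


peak_per_second = 5_000
print(f"peak: {per_minute(peak_per_second):,.0f} transactions per minute")
for daily_total in (200_000_000, 250_000_000):
    rate = average_per_second(daily_total)
    print(f"{daily_total:,} per day averages about {rate:,.0f} per second")
```

A daily total of 200 million transactions averages about 2,315 per second and 250 million averages about 2,894 per second, so both design loads sit below the demonstrated peak of 5,000 transactions per second.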
Reporting Capabilities
VeriSign maintains a real-time copy of our databases for reporting purposes to
prevent processing demands from our reporting process from degrading the
performance of the registry database itself. This database copy, called the
Critical Data Archive (CDA), is used by the reporting component of RBACS,
described in Section 5(b)ix, to generate a variety of daily, weekly, and monthly
reports. In compliance with equivalent access requirements, each registrar has
access to their data in identical report formats. The CDA is also used to
generate the monthly ICANN registry reports.
The various audits that VeriSign undergoes as part of our normal operations rely
heavily on these robust reporting capabilities. The full list of VeriSign
audits, their scope, and their frequency is described in Section 5(b)xiii.
Conclusion
VeriSign's experience with Oracle 9i is the foundation for the fastest, most
reliable database system deployed by any TLD registry operator. VeriSign
maintains critical database administration procedures and monitoring
capabilities to deliver the overall reliability, availability, and performance
of the various .net registry database systems.
VeriSign maintains data centers that contain identical hardware and software
systems. Fail-over is rehearsed frequently to confirm the viability of each
site. Each data center is fully functional as a primary site with no degradation
of performance. In the case of a system fault, we can fully recover from a
complete failure of all primary data systems to a remote data center site
without data loss.
VeriSign processes transactions with the most consistent, rapid response times
of any TLD, while processing transaction volumes that range from 30 million to
over 200 million per day. These transactions rely on our proven implementation
of the complex interdependencies associated with grace periods and pending periods.
VeriSign's database architecture has remained stable and scalable because
VeriSign designed and built this powerful, innovative data store, based on
thorough consideration of capacity, scalability, reliability, security, and
equivalent access. |
|
(vi) Geographic network coverage, including physically
diverse sites and support of growing and emerging
markets.
|
The quality of service for a registry depends on global network coverage, which
is essential to maintaining the stability of .net and ensuring continuous
business operations. VeriSign has created a vast constellation of resolution
servers for .net that provides the fastest and most reliable service possible
with the highest degree of protection. VeriSign is the only registry operator
that has a proven infrastructure which can provide the stable and robust service
this critical TLD necessitates. Unlike TLDs hosted by other registry operators,
.net has not had a resolution failure in the seven years it has been under
VeriSign's stewardship.
VeriSign Advantages:
+ Over 7 years of experience providing global network coverage without failure
+ Robust and diverse infrastructure with 14 sites strategically located to
support high demand and provide local redundancy
+ Time-proven tools and resources to monitor and quickly respond to
ever-changing Internet needs
+ Planned expansion already underway in support of growing and emerging markets
VeriSign has partnered with premier hosting providers in the United States,
Europe, and Asia to deliver fast, reliable DNS responses. Resolution sites are
carefully selected, then continually monitored and reevaluated against the
ever-changing Internet traffic requirements. This ensures that existing sites
are measured against regional demand and that new sites are added in the most
strategic locations available.
In this section, we explore the following facets of global DNS
service delivery for the .net TLD:
* Geographic Network Coverage. How DNS resolution network performance is
measured and how this impacts the selection of physical locations for .net name
servers
* Physically Diverse Sites. The evolution of VeriSign's network coverage,
detailing the 14 .net resolution server locations as they relate to network
topology and geographic traffic demands
* Support of Growing and Emerging Markets. VeriSign's future plans for
dramatically increasing the global .net footprint to meet the ever-changing
needs of growing and emerging markets
Geographic Network Coverage
Global DNS coverage is determined by the geographic distribution of name servers
with the capability for each site to provide adequate capacity. The reliability
and stability of any global TLD dictate that DNS responses are given quickly by
providing answers close to the point where a query is generated. The standard
measure for DNS performance is round trip time (RTT) measured from the time a
lookup is issued to the time at which a response is received. Current ICANN
guidelines call for DNS query RTTs of less than 300 milliseconds (as measured
from four root name server locations: US East Coast, US West Coast, Europe and
Asia). There are two major factors that affect RTTs for DNS lookups: latency and
number of network hops. Both of these factors can be managed by locating the
answer close to the point of question, from both a physical and network standpoint.
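The RTT criterion described above can be illustrated with a short sketch. This is not VeriSign's monitoring code: the function name, the vantage-point labels, and the sample values are all hypothetical, and the use of the median as the summary statistic is an assumption, since the guideline quoted above does not name one.

```python
from statistics import median

RTT_LIMIT_MS = 300.0  # guideline threshold cited in the text


def vantage_points_over_limit(samples_ms, limit_ms=RTT_LIMIT_MS):
    """Return the vantage points whose median RTT is not below the limit.

    samples_ms maps a vantage-point label to a list of round trip times
    in milliseconds, each measured from query issue to response receipt.
    """
    return sorted(
        name
        for name, rtts in samples_ms.items()
        if median(rtts) >= limit_ms
    )


# Invented sample data, for illustration only.
samples = {
    "us-east": [22.0, 25.0, 31.0],
    "us-west": [40.0, 38.0, 45.0],
    "europe": [95.0, 110.0, 102.0],
    "asia": [310.0, 290.0, 335.0],
}
flagged = vantage_points_over_limit(samples)
```

With the invented data above, only the "asia" vantage point is flagged, because its median RTT of 310 milliseconds is not below the 300 millisecond guideline.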
VeriSign has continually sought to locate .net name servers in geographic
locations where demand is the highest. In selecting locations for the global
.net resolution footprint, VeriSign researched the prime geographic areas
through which Internet traffic is exchanged. The Internet backbone, according to
the Global Internet Geography Report issued annually by TeleGeography, Inc.
(Figures 5(b)vi-1 and 2), would logically necessitate name servers in the United
States, Europe, and Asia.
However, it is not enough to simply locate sites within these targeted
geographic areas; we must also consider the connectivity to the specific sites
relative to major Internet peering points. There is a direct relationship
between the number of hops and DNS performance for individual users. For this
reason, VeriSign carefully selects hosting partners for .net resolution sites
who provide premier facilities that offer only the closest connections to the
Internet's major backbone providers. VeriSign continually researches demand and
performance to evaluate the effectiveness of existing sites and to propose new
candidate locations for TLD resolution servers. The evolution of the specific
locations that VeriSign has chosen for .net DNS service hosting is described below.
The .net global footprint by year's end in 2000 consisted of 11 name server
sites to support the resolution of DNS queries. Until this time, the .com, .net
and .org zones were hosted on the j-root name servers. Starting in early 2000,
VeriSign began a large-scale rollout, seeking to deploy dedicated name servers
for these critical TLDs at distributed sites around the globe. These 11 sites
were selected based on proximity to global peering points to minimize the
number of hops for the maximum number of users. Once deployed, these 11
locations (Figure 5(b)vi-3) collectively became known as our resolution
constellation.
As Internet traffic grew in Asia, Europe, and South America, it became evident
that additional sites were needed to support growing demand. Not only was DNS
traffic dramatically increasing as more Internet users came online, but also the
geographic dispersion of queries was widening. Through vigilant monitoring and
ongoing research, the global footprint of .net resolution servers has been
continually adjusted to meet changing demand. Four years later, the view of
geographic network support for .net looks considerably different (Figure 5(b)vi-4).
Physically Diverse Sites
Today, VeriSign provides diverse geographic coverage through 14 .net resolution
sites in eight countries, with five additional countries slated for deployment
in 2005. This evolution included the relocation of almost all the original sites
deployed in 2000, as well as additional deployments at three new locations, all
within the past four years.
Given the distribution of .net hosts and DNS query load, 60 percent of the .net
resolution sites are now located on the East and West Coasts of the United
States, and the remaining 40 percent are hosted in Europe and Asia. For maximum
coverage, growth allowance and physical redundancy, no fewer than three sites are
maintained in each targeted geographic area.
Hosting Site Selection
Network proximity to major peering points remains a primary factor in site
selection for resolution services. In addition, VeriSign applies several other
factors in determining the specific locations to host a resolution site. As
detailed in Section 5(b)i, Proposed Facilities and Systems, site selection of
each resolution location is an important consideration for the continued
security and functionality of the .net registry. VeriSign defines and evaluates
multiple criteria for the selection of each site, including:
* Connectivity through a minimum of two separate network providers
* Facility's environmental controls and redundancy
* Geopolitical stability and physical security
* Ability to recruit and retain skilled personnel
VeriSign has invested in both internal and third-party research to identify
prime targets for hosting DNS resolution service to ensure that .net resolution
servers are positioned in the most strategic locations to achieve the best
levels of service to the regions in which they are hosted. Each of the locations
in North America, Europe and Asia and strategy for site placement is further
explained in our more detailed response at www.verisign.com/nds.
Site Selection: A Case Study
While seemingly intuitive, it is not always a simple matter to determine the
ideal network location for a .net name server. For a TLD of this importance, it
is not enough to choose any well-connected site within a major metropolitan
area. Sites must be continually evaluated for performance to determine actual
usefulness to the geographic region in which they are located. For example, the
original .net installation in Hong Kong was one of only two sites located in all
of Asia. Daily monitoring and input from Internet users revealed round trip
times exceeding 300 milliseconds for queries originating in this region. Average
traffic was less than 60 percent of loads observed at other .net resolution
sites. This site was the worst performing and most underused .net DNS
installation. The daily snapshot shown in Table 5(b)vi-1 shows DNS traffic
growth from 2002, and is representative of normal traffic distribution across
the 13 .com and .net DNS sites, listed by highest traffic load to smallest.
Upon further investigation, VeriSign's Operations staff determined that although
this site was hosted within a premier Tier 1 data center, network connectivity
limitations resulted in increased latency for responses. In many cases, .net DNS
lookups originating in Hong Kong were resolved by servers in the United States
due to faster response times. Therefore, in 2003, this site was relocated to
Singapore, resulting in a significant improvement in both performance and load
dispersion among the Asian name servers, as shown in Table 5(b)vi-2. This is
just one example of the ongoing investment that is required to evaluate and
continually improve service for this critical domain.
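The two signals in this case study (regional round trip times above 300 milliseconds, and average traffic below 60 percent of the load seen at other sites) can be combined into a simple screening check. The sketch below is a hypothetical illustration under those two stated thresholds; the site labels, data, and function name are invented and do not describe VeriSign's actual evaluation tooling.

```python
from statistics import mean


def underperforming_sites(sites, rtt_limit_ms=300.0, load_ratio=0.60):
    """Flag sites that are both slow for their region and underused.

    sites maps a site label to (regional_rtt_ms, average_load), where
    average_load is any consistent traffic measure, such as queries
    per second.
    """
    flagged = []
    for name, (rtt_ms, load) in sites.items():
        other_loads = [l for n, (_, l) in sites.items() if n != name]
        if not other_loads:
            continue  # nothing to compare against
        slow = rtt_ms > rtt_limit_ms
        underused = load < load_ratio * mean(other_loads)
        if slow and underused:
            flagged.append(name)
    return flagged


# Invented figures, for illustration only.
example_sites = {
    "site-a": (80.0, 1000.0),
    "site-b": (120.0, 900.0),
    "site-c": (340.0, 500.0),
}
candidates = underperforming_sites(example_sites)
```

A site flagged this way is only a candidate for investigation. As the Hong Kong example shows, the root cause (here, network connectivity limitations) still has to be established by operations staff before a relocation decision is made.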
Performance and Growth Evaluation
Without the proper tools, personnel and time investment, it is impossible for a
registry operator to assess the effectiveness of its geographic network coverage
or identify growth areas for better global service. VeriSign's goal is to
continually evaluate server locations for their usefulness to the regions that
they serve and to analyze traffic patterns to identify potential areas for
expansion or relocation. VeriSign has invested significant time and resources
into the development of programs to research, monitor, and forecast Internet
traffic trends. These efforts allow better insight into network trends, as well
as provide performance statistics and traffic flow data regarding the current
.net DNS constellation. Global DNS service performance and growth are evaluated
using several means:
* Engineers analyze data gathered by VeriSign's extensive monitoring platform to
identify growth trends indicated at existing resolution sites.
* Monitoring data is compared to maps generated by VeriSign Research
Laboratory's NetMapper IP traffic geolocation tool to drill in on the specific
origins of traffic (Figure 5(b)vi-5).
* Engineers confirm candidate locations using third-party analysis from CAIDA,
RIPE, and commercial research firms which track worldwide network capacity and
Internet growth trends.
Given the detailed statistics collected at the resolution sites (see Section
5(b)xvi), VeriSign Engineers are able to easily identify and analyze growth
patterns. Figure 5(b)vi-6 shows the actual growth rates in received traffic over
the past 2 years for each of the geographic regions in which VeriSign hosts .net
resolution servers.
Overall monitoring data reveals far greater growth in traffic at Asian and
European resolution sites than in the United States. This would indicate that
new sites are warranted in the geographic areas currently served by existing
servers in Asia and Europe. To identify these emerging markets, VeriSign
engineers must further investigate the origins of the traffic being received at
these sites.
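The regional comparison described above amounts to computing a growth rate per region and ranking the results. The sketch below shows that arithmetic only; the region labels and traffic volumes are invented placeholders, not the data behind Figure 5(b)vi-6, and the function names are hypothetical.

```python
def growth_percent(earlier, later):
    """Percentage growth from an earlier traffic volume to a later one."""
    if earlier <= 0:
        raise ValueError("earlier volume must be positive")
    return (later - earlier) / earlier * 100.0


def rank_regions_by_growth(traffic):
    """Return (region, growth_percent) pairs, fastest-growing first.

    traffic maps a region label to (earlier_volume, later_volume).
    """
    rates = {
        region: growth_percent(earlier, later)
        for region, (earlier, later) in traffic.items()
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Invented volumes, for illustration only.
traffic = {
    "united-states": (100.0, 130.0),
    "europe": (100.0, 180.0),
    "asia": (100.0, 210.0),
}
ranking = rank_regions_by_growth(traffic)
```

Regions at the top of such a ranking are the ones where the origin of traffic merits closer study before a new site is proposed.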
The NetMapper project was developed with the goal of providing operators with a
view into the origins of DNS traffic, as shown by each of the .net name server
sites. NetMapper IP maps have served as a useful tool for tracking the
geographic origin of queries that are actually resolved at existing .net
resolution sites. Figures 5(b)vi-7 and 8 are representative of the Internet
backbone at two locations in the VeriSign network that identify particular
regions of increased activity.
The Atlanta .net name server installation (Figure 5(b)vi-9) shows significant
traffic from Brazil, Argentina, and Australia; the Singapore resolution site
(Figure 5(b)vi-10) shows increased traffic from India, China, and Australia.
These two maps are particularly interesting as they show queries from Australia
resolving at .net DNS sites as close as Singapore and as far away as the East
Coast of the United States. This is a prime example of a geographic area that
could be better served by a resolution site in-country, in theory decreasing
overall round trip times for these users. These and other NetMapper views have
helped VeriSign's engineers confirm general growth patterns indicated on the
resolution constellation and identify target areas for expansion.
Support for Growing and Emerging Markets
As detailed above, VeriSign's efforts to support growing markets have resulted
in the relocation of multiple resolution sites to support the dramatic growth of
Internet traffic in Europe, Asia, and North and South America. In addition to
these sites, VeriSign has also identified five additional growing or emerging
markets which are targeted for further expansion:
* Australia
* Africa
* South/Central America
* Middle East/India
* Eastern and Central Europe.
Given these target markets, VeriSign engineers developed a scaled-down platform,
known as a Regional Resolution Site (RRS), which is sized to meet regional
demand and, therefore, is less expensive to deploy and operate. Regional
resolution sites will improve local connectivity in growing and emerging markets
and increase the visibility and granularity of Internet usage patterns. Other
benefits include:
* Better end user experience
* Improved protection against DDoS attack and other malicious attacks because of
added capacity and localization of attack traffic
* Faster DNS lookups for .com and .net.
While international connectivity and proximity to major peering points remain
considerations in the selection of new locations, these factors are often of
less importance for emerging markets. Some of the candidate locations are
remarkably lacking in significant network bandwidth; this is the very reason for
their consideration. Areas without easy access to major international peering
points are perhaps the most in need of regional service to improve response
times by decreasing the number of network hops and overall latency for DNS requests.
On the other hand, higher traffic growth areas, such as China and South America,
warrant site location in the most well-connected points within the continent
given the large number of users in the region that will likely utilize the site.
Emerging market identification and the key considerations for each of the
targeted expansion sites are described in more detail below.
Australia
Australia is a prime target for the deployment of an RRS in 2005. With average
traffic loads comparable to other major cities such as Beijing and Singapore,
Sydney perfectly fits the profile for the anycast solution. Sydney is also the
only major peering point within the continent (Figure 5(b)vi-11), so this is a
natural choice of location for the new installation. Site evaluation and
selection will begin in early 2005 with a deployment target of late 3Q 2005 or
early 4Q 2006.
Africa
The African continent has become an increasingly feasible expansion target, with
Internet bandwidth growth rates of 71 percent in 2003 and 41 percent in 2004.
Bandwidth costs are projected to continue decreasing with the growth of network
connectivity and increasing numbers of Internet providers. Candidate locations
include less connected parts of the continent, such as Kenya, where bandwidth
limitations cause high latency for DNS lookups to other parts of the world.
Perhaps the lowest demand area of all the proposed expansion sites, Africa is of
special interest because early site deployment will allow for historic traffic
growth monitoring for this region as Internet usage gradually increases.
Location evaluation is currently underway, with targeted deployment in mid-2005.
Central and South America
In 2002, VeriSign relocated a .net resolution site to Miami, FL, a major peering
point for Central and South America, to better serve Internet users in this
region and to track overall growth for these areas. Monitoring at this site has
shown a consistent increase in traffic, presumably the result of continued
growth of Internet usage in these regions. Because the region is increasingly
well connected, engineers have a variety of locations (Figure 5(b)vi-12) from
which to choose for site deployment.
Brazil, Chile, Peru and Argentina all offer possibilities, with Brazil leading
the candidate countries with the best connectivity and highest Internet usage
rates on the continent. Brazil actually ranks 10th on the list of countries with
the highest number of Internet users, just behind Canada and France. However,
Argentina is also a prime target for consideration given its high inter-regional
connectivity to Chile and Brazil. Location evaluation is currently underway, and
this deployment is slated to be an early expansion target for 2005.
Middle East/South Asia
The Middle East is perhaps the most rapidly emerging market with Internet usage
experiencing almost 230 percent growth since 2004, higher than any other
geographic region in the world. Connectivity is also increasing in these
regions. TeleGeography's 2004 Global Internet Geography report specifically
calls out these regions, forecasting rapid Internet growth in the next year.
This report details major connectivity additions which are planned to be
completed in 2005, including several fiber installations connecting the Middle
East to South Asian sites such as India, Sri Lanka, Bangladesh, as well as to
Europe and East Asia.
Given both the current limited access to international peering points where .net
resolution servers are located, and the anticipated connectivity growth between
Middle Eastern countries and to/from outside regions, this typifies a "best of
both worlds" location for the deployment of an anycast server. Early deployment
will better serve this area with regional resolution to decrease round trip
times to sites further away. However, as connectivity and Internet usage
increase, this could grow to a high-traffic resolution site, providing service
to South Asian countries as well as Eastern and Central Europe.
Eastern and Central Europe
Of the top 20 countries in Europe ranked for Internet usage, 6 are located in
Eastern and Central Europe, with two, Poland and Russia, in the top 10. VeriSign
engineers are seeking both to provide faster .net resolution to these and other
countries in this region and to redistribute load from the busiest resolution
sites in Western Europe. More research into strategic location targets is
needed and will begin in 2005.
Conclusion
VeriSign's global constellation is designed and managed with one thought in
mind: providing the highest quality service, rapid response times, and 100
percent availability. This is implemented through rigorous selection of remote
sites, constant re-evaluation of those sites, and taking into consideration not
only physical location but also network diversity.
Market need and user demand drive site selection, resulting in a highly focused
network expansion plan for the coming decade. Many critical network services
rely on the VeriSign constellation. VeriSign has achieved flawless resolution
service by continually enhancing our geographic sites, layering redundancy and
increasing security. Finally, we remain firmly committed to continuing to
expand and improve this coverage for growing and emerging markets. |
|
(vii) Zone file generation including procedures for
changes, editing by registrars and updates. Address frequency, security,
process, interface, user authentication, logging and data
back-up.
|
The .net zone comprises information about more than 5 million second-level
domains, including nearly 13 million name server records and almost 200,000
address records. Many authoritative name servers, including those used by nearly
every TLD, have names ending in .net. More than 58 percent of all Internet hosts
rely on .net (source: "Net TLD Project: Measuring the Role of .net in the
Internet," Matthew Zook, Ph.D., ZookNIC Consulting, April 20, 2004.) Any
inaccuracy in, or unavailability of, .net zone information would have a
significant impact to the Internet community beyond just .net registrants.
Therefore, it is essential that the .net registry operator has the proven
ability to generate the .net zone file reliably, accurately, and securely. It is
also important to have the ability to generate .net zone file information in
near real-time. This capability satisfies the increasing expectations of .net
registrants, who want changes to their second-level domain information to be
visible in the Internet's DNS infrastructure more quickly than in the past.
VeriSign Advantage:
+ Commitment to, and investment in, highly accurate .net zone generation
processes and technologies, which result in reliable services for critical
Internet infrastructure
This section addresses:
* Zone file generation, including procedures for changes, editing by registrars,
and updates. Any changes to .net zone information result from create, delete, or
modification actions initiated by ICANN-accredited registrars, VeriSign customer
service representatives, or automated batch processes. Registrars do not edit
.net zone information directly, but change data through the SRS, or through the
web-based Registrar Tool.
* Frequency, security, process, interface, user authentication, logging, and
data backup. VeriSign updates the .net zone in near real-time with updates
generated approximately once every 15 seconds. These updates are visible in all
.net name servers within approximately 1 minute. The zone information is
generated on secure servers in a private network. Registrars use the SRS or the
web-based Registrar Tool to make changes to .net zone information. These
protocols use SSL to provide a secure communications channel; registrars are
authenticated with a digital certificate, source IP address range and user name
and password. All applications that are part of the .net registry, including
zone generation, write extensive log files. Authoritative zone file information
is stored in an enterprise data storage system from EMC with mirrored Redundant
Array of Independent Disks (RAID) protection and offsite real-time replication.
VeriSign has demonstrated a proven record of success, having generated the .net
zone reliably, accurately and securely for 13 years. Recently, after a period of
thorough testing, we have begun updating .net zone information in near-real time
to increase the level of service to the Internet community.
Procedures for Changes, Editing by Registrars, and Updates
Any changes to .net zone information result from create, delete, or modification
actions initiated by ICANN-accredited registrars, VeriSign customer service
representatives, or automated batch processes.
Registrars do not edit .net zone information directly, but change data in the
SRS. More information about SRS protocols is available in Section 5(b)iv.
VeriSign also distributes the Registrar Tool to registrars, which provides an
easy graphical user interface (GUI) to perform common administrative tasks, such
as adding, modifying and deleting domain names and name servers.
In addition, authorized VeriSign customer service representatives make changes
in the SRS on behalf of, and with the consent of, registrars using web-based
customer service tools. Regularly scheduled automated batch processes also
change data in the SRS database. These batch processes implement published
business rules, such as automatic renewal of expired domains and the Redemption
Grace Period (RGP).
Update Process and Frequency
A description of VeriSign's .net zone file update process first requires some
background on ATLAS, the award-winning resolution platform that VeriSign developed.
VeriSign developed the ATLAS name server because no off-the-shelf solution could
be found to handle the unique requirements of the .com and .net TLDs. ATLAS's
economic scaling properties allow us to provision overwhelming capacity to
satisfy the extreme performance and reliability demands of these largest and
busiest of all TLDs. One of the ways ATLAS provides such high capacity is by
placing an in-memory relational database holding the .net zone information at
each resolution site. This in-memory database contains a subset of the core .net
registry database and allows DNS queries to be answered at tremendously high
rates. The standard zone file format defined in RFC 1034 is not well-suited to
the task of updating these in-memory databases at the resolution sites as data
changes in the SRS. Instead, ATLAS distributes the database using replication
files designed to support transactional database updates and several levels of
data integrity verification. These files are analogous to RFC 1034 zone files,
but have a different format and include database metadata.
The .net registry system still produces an RFC 1034-compatible zone file, which
is used by our BIND-based warm-standby backup DNS infrastructure and for
publication to participants in our TLD Zone File Access Program. Our BIND-based
backup system is described in Section 5(b)ii.
The remainder of this section describes how the .net registry system generates
both the ATLAS-specific zone information and RFC 1034-compatible zone files in a
reliable, accurate, and secure manner.
ATLAS
ATLAS updates .net zone data in a three-step process of extraction, validation,
and distribution. Figure 5(b)vii-1 shows the ATLAS architecture and the
extraction, validation, and distribution components.
Extraction. A process called extraction produces zone information for use by
ATLAS in two formats from the core .net registry database.
The first file format is called an Initial Send File (ISF), which is somewhat
analogous to a traditional RFC 1034-compatible zone file. An ISF contains
complete information about the .net zone. ISFs are produced twice per day at
12-hour intervals. An ATLAS name server only needs an ISF when the server
process is initially started up so it can load the .net zone contents into its
in-memory database. In this way, the ISF serves the same purpose as a
traditional zone file. After an ATLAS name server loads from an ISF, it receives
subsequent updates to .net zone information through the near real-time
incremental update feature.
This near real-time update feature uses a second zone information file format,
the Send File (SF). Each SF is simply a list of all changes (in the form of
additions, modifications, or deletions) to the .net zone since the previous SF
was generated.
The extraction process generates SFs continually as changes occur in the core
registry database, resulting from RRP transactions and other activity. The
extraction process generates a new SF approximately every 15 seconds.
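The relationship between the two file types can be sketched as a full snapshot followed by a stream of change batches. The code below is a minimal illustration of that idea only. The actual ISF and SF formats are not described in this section, so the record layout (a domain name mapped to a set of name servers) and the operation labels "add", "modify", and "delete" are assumptions made for the example, as are the function names.

```python
def load_initial_send_file(isf_records):
    """Build an in-memory zone from a complete snapshot (ISF analogue)."""
    return {name: set(servers) for name, servers in isf_records.items()}


def apply_send_file(zone, changes):
    """Apply one batch of incremental changes (SF analogue), in order.

    Each change is a tuple (operation, domain_name, name_servers).
    """
    for operation, name, servers in changes:
        if operation in ("add", "modify"):
            zone[name] = set(servers)
        elif operation == "delete":
            zone.pop(name, None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return zone


# Hypothetical snapshot and change batch.
zone = load_initial_send_file({
    "example.net": ["ns1.example.net"],
    "old.net": ["ns1.old.net"],
})
apply_send_file(zone, [
    ("add", "new.net", ["ns1.new.net"]),
    ("modify", "example.net", ["ns2.example.net"]),
    ("delete", "old.net", []),
])
```

At one batch roughly every 15 seconds, a name server started from a snapshot would process on the order of 86,400 / 15 = 5,760 such batches per day.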
Validation. Before an ISF or SF is distributed to the ATLAS name servers at the
resolution sites, a rigorous validation process verifies the file as part of our
serious commitment to absolute data integrity. Both ISFs (containing complete
.net zone information) and SFs (containing a list of recent changes to .net zone
information) are subject to this same validation process.
This process runs on a validation server, which duplicates the operational
environment at the resolution site as closely as possible. A validation server
is essentially a captive name server with additional verification capabilities.
First, the validation process checks the file for syntax and other obvious
errors. Then, it loads the file and compares the contents of its in-memory
database with the core registry database to ensure that both match. If the two
match, then the information in the file was processed successfully and the file
can be safely sent to the ATLAS name servers at the resolution sites, and the
data contained in the file will be properly published. Any discrepancies
indicate a potential data integrity problem, so an alarm is triggered and human
intervention is required. The validation process verifies every record in every
file, both ISF and SF, in this manner. VeriSign has a patent pending on this
innovative data validation process.
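The comparison step of validation can be pictured as a record-by-record check between what the validation server loaded and what the core registry database holds. The sketch below assumes both views are available as simple mappings from domain name to a set of name servers. That representation, and the function names, are hypothetical and are not a description of the patent-pending process itself.

```python
def find_discrepancies(loaded_zone, registry_zone):
    """Compare the validation server's view with the registry's view.

    Returns a list of (name, loaded_value, registry_value) tuples for
    every record that differs; an empty list means the two views match.
    """
    discrepancies = []
    for name in sorted(set(loaded_zone) | set(registry_zone)):
        loaded = loaded_zone.get(name)
        expected = registry_zone.get(name)
        if loaded != expected:
            discrepancies.append((name, loaded, expected))
    return discrepancies


def release_decision(loaded_zone, registry_zone):
    """Return 'distribute' when the views match, otherwise 'alarm'."""
    if find_discrepancies(loaded_zone, registry_zone):
        return "alarm"
    return "distribute"


# Hypothetical data: one record is missing from the loaded file.
registry_view = {"example.net": {"ns1.example.net"}, "new.net": {"ns1.new.net"}}
loaded_view = {"example.net": {"ns1.example.net"}}
decision = release_decision(loaded_view, registry_view)
```

In this example the decision is "alarm", because the record for new.net is present in the registry view but absent from the loaded file. This mirrors the rule above that any discrepancy halts distribution and requires human intervention.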
Distribution. Once the file passes validation, it is distributed to the
resolution sites in a process described in more detail in the next section,
Section 5(b)viii, Zone File Distribution and Publication. The ATLAS name servers
at each resolution site process files as they arrive. ISFs are extracted,
validated, and processed twice per day; however, SFs are extracted, validated,
and processed in a continual stream as the core registry database changes. In
the case of an SF, the entire process of extracting a batch of changes,
validating them, and distributing them to all resolution sites takes about 1
minute on average.
RFC 1034-Compatible Zone Files
The .net registry system also produces an RFC 1034-compatible .net zone file
twice per day at 12-hour intervals. This file is used mainly for two purposes:
* As described in Section 5(b)ii, VeriSign runs BIND name servers in a warm
standby mode as a backup to ATLAS at each resolution site. These BIND name
servers are always running with an up-to-date copy of the .net zone, which they
load twice per day from this RFC 1034-compatible zone file. In the event of a
critical ATLAS failure, these BIND name servers could begin answering queries
after a simple load balancer configuration change at each resolution site. While
ATLAS has been extremely stable in production, and we are quite confident in its
reliability, we use the BIND warm backup to be prudent and conservative. A
systemic failure of .net cannot be tolerated under any circumstances, and this
solution offers additional protection and peace of mind.
* RFC 1034-compatible zone data files for .net are also used for the TLD Zone
File Access Program, which allows access to the .com and .net zone files by
participants who sign an agreement governing the use of the files. More
information is available at
http://www.verisign.com/products-services/naming-and-directory-services/naming-services/com-net-registry/page_001052.html
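For readers unfamiliar with the traditional format, the sketch below renders delegation (NS) and address (A) records as lines in the textual master file style used by BIND-type name servers. It is an illustration only: the function name, the sample domain and address, and the TTL value are assumptions, and it does not reproduce the layout of the actual .net zone file.

```python
def render_zone_lines(delegations, glue, ttl=172800):
    """Render NS and A records as master-file-style text lines.

    delegations maps a domain name to a list of name server host names.
    glue maps a name server host name to an IPv4 address string.
    The default TTL is an illustrative value, not a documented one.
    """
    lines = []
    for domain in sorted(delegations):
        for server in sorted(delegations[domain]):
            lines.append(f"{domain}. {ttl} IN NS {server}.")
    for host in sorted(glue):
        lines.append(f"{host}. {ttl} IN A {glue[host]}")
    return lines


# Hypothetical records using a documentation address range.
zone_lines = render_zone_lines(
    {"example.net": ["ns1.example.net"]},
    {"ns1.example.net": "192.0.2.53"},
)
```

A file of this kind carries the complete zone, which is why it suits the twice-daily load by the BIND warm-standby servers rather than continual incremental updates.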
Security
The preceding zone file generation processes are automated using thoroughly
tested software on secure servers in a protected network. Zone file distribution
occurs over VPNs to each of our DNS resolution sites. Administrative access to
each component in the system is limited to authorized VeriSign administrators.
Each server is built using a locked-down operating system image from a standard
build and scanned for security vulnerabilities before being put into production.
Each server is further monitored and scanned periodically for security
vulnerabilities once in the production environment.
Section 5(b)xiii describes the security of VeriSign's .net registry systems in
much greater detail.
Interface
Registrars make changes to .net zone information through the SRS or through the
web-based Registrar Tool. There is no end user interface per se to the zone
generation process itself.
User Authentication
Registrar authentication for SRS access with the RRP API uses SSL certificate
authentication, source IP address authentication, and application-based
authentication with registrar user name and password. Registrar authentication
for changes with the Registrar Tool uses the registrar user name and password.
Each zone generation, validation, distribution, and publishing process uses
server and application authentication to ensure only authorized accounts are
used to update .net zone information. Each generation and validation process is
authenticated at the application level using server account information, and at
the database level using account information specific to the particular process.
Each distribution and publishing process is authenticated at the application
level using server account information. Each process is designed to provide
notification to the NOC if access is attempted by an unauthorized account.
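The three registrar checks for SRS access (certificate, source IP address, and user name with password) can be sketched as a single function in which every check must pass. This is a simplified illustration, not the SRS implementation. In practice the certificate check happens during the SSL handshake; here it is modelled as a fingerprint comparison. The registrar record layout, the field names, and the password hashing parameters are all assumptions made for the example.

```python
import hashlib
import hmac
import ipaddress


def hash_password(password, salt):
    """Derive a password hash (parameters chosen for illustration only)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def authenticate(registrar, cert_fingerprint, source_ip, username, password):
    """Return True only if all three authentication factors succeed."""
    cert_ok = hmac.compare_digest(
        cert_fingerprint, registrar["cert_fingerprint"]
    )
    address = ipaddress.ip_address(source_ip)
    ip_ok = any(
        address in ipaddress.ip_network(network)
        for network in registrar["allowed_networks"]
    )
    user_ok = hmac.compare_digest(username, registrar["username"])
    password_ok = hmac.compare_digest(
        hash_password(password, registrar["salt"]),
        registrar["password_hash"],
    )
    return cert_ok and ip_ok and user_ok and password_ok


# Hypothetical registrar record using a documentation address range.
salt = b"demo-salt"
registrar = {
    "cert_fingerprint": "ab:cd:ef",
    "allowed_networks": ["192.0.2.0/24"],
    "username": "registrar1",
    "salt": salt,
    "password_hash": hash_password("s3cret", salt),
}
accepted = authenticate(registrar, "ab:cd:ef", "192.0.2.10", "registrar1", "s3cret")
rejected = authenticate(registrar, "ab:cd:ef", "198.51.100.5", "registrar1", "s3cret")
```

The second call is rejected because the source address falls outside the registrar's allowed range, even though the certificate fingerprint and credentials match. Requiring all factors together means that a leaked password alone is not sufficient for access.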
Logging
VeriSign applications log .net zone file updates at each phase of the
registration process. In the registration process, activity from the RRP API,
EPP API, and Registrar Tool is logged by the specific applications. Database
transaction logging occurs in the SRS and serves as the authoritative source for
all transaction information. Each zone generation and validation process is
logged by the application and error-checked using our distributed monitoring
system. Each zone file update, and any errors in the generation or validation
process, are included in the logs. The logs are periodically reviewed by system
administrators to make certain appropriate validation criteria are met.
Data Backup
Authoritative zone file information is kept in the .net SRS using an enterprise
data storage system from EMC with mirrored RAID protection and offsite real-time
replication. Generation of zone files is performed using a server connected to
the enterprise storage system. Each distribution and resolution server, in turn,
uses mirrored data storage to provide data protection in case of hard drive
failure. A compilation of recent zone file updates is kept on each distribution
server. The compilation provides for fast publishing times after server
maintenance. Backup processes and technologies are described in greater detail
in Section 5(b)x.
Conclusion
VeriSign currently delivers reliable .net zone file generation processes and
commits to continuing highly accurate .net zone generation processes and
technologies, resulting in reliable services for critical Internet infrastructure. |
|
(viii) Zone file distribution and publication.
Locations of name servers, procedures for and means of distributing zone
files to them.
|
VeriSign delivers distribution, publication, and resolution services based on
technologies that promote data accuracy and that enable the rapid growth of the
.net gTLD. VeriSign provides .net domain name system (DNS) resolution services
from our constellation of DNS resolution sites. VeriSign follows mature and
rigorous procedures for distributing the .net DNS zone files.
VeriSign Advantage:
+ Distribution, publication, and resolution services, based on technologies that
promote data accuracy and that enable the rapid growth of the .net gTLD
This section describes zone file distribution and publication processes, including:
* Locations of Name Servers. VeriSign serves .net from 14 resolution sites
located in the United States, Europe, and Asia. We also use three warm standby
sites for maintenance and emergency capacity.
* Procedures for and Means of Distributing Zone Files. Zone files are validated
and then distributed over a secure virtual private network. The end-to-end
publication time is within 3 minutes 95 percent of the time.
The .net operator must have the proven ability to quickly and accurately
distribute .net zone information to a set of name servers distributed around the
world. Our customers demanded rapid update of .net zone information: registrants
wanted the domain name changes they made at registrars to be reflected quickly
in DNS. VeriSign responded with ATLAS and its rapid update feature. With other
registries, it can take 5 minutes or even an entire day for changes to appear in
DNS. But VeriSign guarantees changes made to the SRS will appear in DNS within 3
minutes 95 percent of the time.
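A commitment of this form ("within 3 minutes 95 percent of the time") can be checked against measured publication latencies. The sketch below is only an illustration; the function names and the sample data are hypothetical.

```python
def fraction_within(latencies_seconds, limit_seconds=180.0):
    """Fraction of updates whose SRS-to-DNS latency is at or under the limit."""
    if not latencies_seconds:
        raise ValueError("at least one latency sample is required")
    within = sum(1 for seconds in latencies_seconds if seconds <= limit_seconds)
    return within / len(latencies_seconds)


def meets_commitment(latencies_seconds, limit_seconds=180.0, required_fraction=0.95):
    """True when enough updates were published within the limit."""
    return fraction_within(latencies_seconds, limit_seconds) >= required_fraction


# Hypothetical sample: 19 updates published in 30 seconds, one in 400 seconds.
samples = [30.0] * 19 + [400.0]
print(meets_commitment(samples))
```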
Locations of Name Servers
VeriSign provides .net DNS resolution services from our constellation of 14 DNS
resolution sites, which are located throughout the world, as shown in Figure
5(b)viii-1. We select resolution sites based on a rigorous process involving
multiple factors, as detailed in Section 5(b)vi, Geographic Network Coverage.
Site locations are adjusted over time as conditions, such as Internet traffic
patterns, change.
Each site has multiple load-balanced systems running ATLAS, our proprietary
high-performance name server. ATLAS is fully compliant with all relevant DNS
RFCs, including RFCs 1034 and 1035 which are the core of the DNS specification.
Each site is managed remotely over secure VPNs and monitored around the clock.
Each site is based on a redundant server architecture and uses a complete set of
redundant hardware components and supporting network infrastructure to eliminate
single points of failure. For example, each site has a minimum of 2 gigabit
Ethernet connections and is served by at least two separate Tier 1 network
providers. The diversity of network providers is important so connectivity
issues for a single provider do not interrupt services. Resolution site
architecture is discussed further in Section 5(b)ii, Stability of Resolution and
Performance Capabilities. Each resolution site around the globe must adhere to
strict facility standards established by VeriSign.
Standby Sites
In addition to the 14 main resolution sites, VeriSign uses three warm standby
sites. The standby sites have identical configurations to the main resolution
sites, including all the redundant system and network components, along with
similar Internet connectivity and bandwidth. These sites are used primarily for
maintenance, but can also be activated to provide emergency resolution capacity,
such as in the event of a DDoS attack. For security purposes, we do not publish
the locations of these sites.
To perform extensive maintenance on a given main resolution site, we redirect
all the services provided by the main site to one of the standby sites. The
standby site concept is possible because each of the 14 main resolution sites
uses provider-independent IP address space, and VeriSign controls the
announcement of this address space using the BGP routing protocol. To activate
a standby site, we first configure the site's equipment with the same IP
addresses as the main resolution site it will stand in for, effectively
"cloning" the main site. After verifying that all systems at the standby site
are ready to handle service, we withdraw the BGP route announcement at the main
site and announce the same address space from the standby site. The effect is
that all traffic moves from the main site to the standby site with only a brief
outage for that particular site as the Internet routing information changes. We
can then perform the necessary maintenance at the main site without worry.
During this time, individual name servers and resolvers throughout the Internet
continue to use the site's services, unaware that they are accessing the standby
site instead of the main site. This process is reversed to restore service to
the main site.
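The order of operations in the standby procedure (verify readiness, withdraw the route at the main site, announce it from the standby site) can be sketched with a toy model. The `RoutingTable` class is a hypothetical stand-in for global BGP state, and the prefix is a documentation address range; no real router API is implied.

```python
class RoutingTable:
    """Toy model of BGP state: prefix -> site currently announcing it."""

    def __init__(self):
        self.origin = {}

    def announce(self, prefix, site):
        self.origin[prefix] = site

    def withdraw(self, prefix, site):
        # Only the site that currently announces the prefix can withdraw it.
        if self.origin.get(prefix) == site:
            del self.origin[prefix]


def fail_over(table, prefix, from_site, to_site, is_ready):
    """Move a prefix between sites, refusing if the target is not ready."""
    if not is_ready(to_site, prefix):
        raise RuntimeError("target site not ready; current site left in service")
    table.withdraw(prefix, from_site)   # brief outage begins for this site only
    table.announce(prefix, to_site)     # traffic converges on the new site


table = RoutingTable()
prefix = "192.0.2.0/24"                 # documentation prefix, not a real site
table.announce(prefix, "main")
fail_over(table, prefix, "main", "standby", lambda site, p: True)
# Maintenance happens here; the same call with the sites swapped restores service.
fail_over(table, prefix, "standby", "main", lambda site, p: True)
```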
Our standby sites have been a significant factor in allowing us to maintain
our unparalleled track record of 100 percent uptime for .net resolution services
for the past 7 years.
Expansion Using Anycast
VeriSign plans to increase the Internet footprint of .net DNS resolution
services by using a technique called anycast. Anycast allows a service to be
replicated in multiple locations using Internet routing. Anycast works by
configuring multiple instances of a service in different locations, but each
with the same service IP address. Each instance announces its availability using
the BGP routing protocol. The global Internet routing table shows multiple ways
to reach the service at this IP address, just as when an organization's network
is connected to the Internet via multiple providers and announces the same
address space to each provider. In both cases, Internet routing chooses the best
route from a given client to the destination server. With a multihomed network,
the client always reaches the same destination. In the case of anycast, the
client reaches the topologically closest anycast instance, as determined by BGP.
But it doesn't matter which instance a client reaches, because all are
configured with the same IP address and provide the same service.
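As a rough illustration of how a client ends up at one anycast instance, the sketch below picks the instance with the shortest AS path. Real BGP best-path selection weighs several attributes, so AS-path length is used here only as a simplified stand-in, and the AS numbers are documentation values rather than real routes.

```python
def closest_instance(paths_by_instance):
    """Pick the instance whose AS path, as seen by the client, is shortest.

    paths_by_instance maps an instance name to the list of AS numbers on the
    path toward it. Ties are broken alphabetically so the result is stable.
    """
    if not paths_by_instance:
        raise ValueError("no anycast instances are reachable")
    return min(sorted(paths_by_instance),
               key=lambda name: len(paths_by_instance[name]))


# Hypothetical view from one client: both instances announce the same address.
view = {
    "sunnyvale": [64496, 64497, 64498],
    "seoul": [64499],
}
print(closest_instance(view))
```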
Anycast is now a proven method for increasing the number of name servers for a
given DNS zone. Many of the operators of Internet root name servers now use
anycast to expand the number and geographic scope of their name servers.
VeriSign, one of the original root name server operators, uses anycast to expand
the j-root server to 15 geographically distributed locations throughout the world.
VeriSign has also used anycast to expand the number of .net (and .com)
resolution sites beyond the traditional DNS protocol-imposed limit of 13. In
July 2004, a site in Seoul, Korea, became the 14th site in our global resolution
constellation as an anycast instance of b.gtld-servers.net: services for
b.gtld-servers.net are now offered from both Sunnyvale, California, and Seoul,
Korea.
We plan to add an additional five .net resolution sites over the next year in
emerging markets. Section 5(b)vi, Geographic Network Coverage, describes our
expansion plans in more detail.
Procedures for and Means of Zone File Distribution
Section 5(b)vii describes how VeriSign generates .net zone information in two
file formats: traditional RFC 1034-compatible format and an internal format
suitable for ATLAS' database replication needs. Zone information in both file
formats is distributed to all the resolution sites. Throughout the rest of this
section, the term "zone file" refers to files in both formats.
VeriSign follows mature and rigorous procedures for distributing the .net DNS
zone files. After generation, each zone file is validated against the .net SRS
database. Any validation error will stop the distribution process and prevent
erroneous information from being published on the Internet. After validation,
zone files are ready for distribution to the resolution sites.
We do not use the standard DNS zone transfer protocol, which is not well suited
to a large zone such as .net. The zone transfer protocol is wasteful of
bandwidth because it does not allow incremental changes; the entire zone file
must be sent each time the zone is transferred. The standard zone transfer
protocol is also based on TCP, which does not perform well in difficult
conditions of high network latency (delay) and packet loss.
Instead, VeriSign developed our own zone file distribution mechanism. This
mechanism supports incremental transfer of large zone files, eliminating the
need to send the entire file. We also developed our own UDP-based file transfer
protocol, which is designed to perform well even under high latency and packet
loss. In practice, this protocol distributes .net zone information quickly and
efficiently to all resolution sites.
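The idea of incremental transfer can be sketched as a delta between two versions of a zone. This is not the proprietary distribution format; the dictionary representation (domain name mapped to a tuple of name servers) and the function names are assumptions for illustration.

```python
def compute_delta(old_zone, new_zone):
    """Return (upserts, removals) that turn old_zone into new_zone."""
    upserts = {name: servers for name, servers in new_zone.items()
               if old_zone.get(name) != servers}
    removals = sorted(name for name in old_zone if name not in new_zone)
    return upserts, removals


def apply_delta(zone, upserts, removals):
    """Apply a delta to a copy of the zone and return the updated copy."""
    updated = dict(zone)
    for name in removals:
        updated.pop(name, None)
    updated.update(upserts)
    return updated


old = {"example.net": ("ns1.example.net",), "old.net": ("ns1.old.net",)}
new = {"example.net": ("ns2.example.net",), "fresh.net": ("ns1.fresh.net",)}
upserts, removals = compute_delta(old, new)
# Only the changed and removed names travel over the network, not the whole zone.
assert apply_delta(old, upserts, removals) == new
```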
The zone files are distributed via secure VPN to each resolution site. Every
zone file is generated with a cryptographic checksum that is verified every time
the file is transferred. After this checksum is verified at the resolution site,
the file is loaded by the name server.
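The verify-then-load step can be sketched as follows. The text does not name the checksum algorithm, so SHA-256 is an assumption, as are the function names.

```python
import hashlib
import hmac


def checksum(data: bytes) -> str:
    """Checksum computed when the zone file is generated (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


def load_if_valid(data: bytes, expected_checksum: str, load) -> bool:
    """Hand the file to the name server only if its checksum matches."""
    if not hmac.compare_digest(checksum(data), expected_checksum):
        return False          # corrupted or tampered in transit: do not load
    load(data)
    return True


zone_file = b"example.net. NS ns1.example.net.\n"
published = []
load_if_valid(zone_file, checksum(zone_file), published.append)
```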
Each point of the distribution and publication process is monitored by two
separate monitoring systems. The VeriSign distributed monitoring system checks
for server health and application health. VeriSign's custom DNS monitoring also
monitors the status of a generated zone file and its serial number along each
point in the distribution process. The serial number, assigned at the time of
file generation and based on the time of day, provides an effective method to
identify and track each zone file as it is being distributed and published.
VeriSign's extensive monitoring systems are described further in Section
5(b)xvi, System Outage Prevention.
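Tracking a zone file by serial number through the pipeline can be sketched as below. The text says only that the serial is based on the time of day; using the Unix timestamp is an assumption, and the stage names and `SerialTracker` class are hypothetical.

```python
from datetime import datetime, timezone

# Assumed pipeline stages, in the order a zone file passes through them.
STAGES = ("generated", "validated", "distributed", "published")


def serial_for(moment):
    """Derive a serial number from the generation time (Unix seconds assumed)."""
    return int(moment.timestamp())


class SerialTracker:
    """Records how far each serial has progressed and rejects skipped stages."""

    def __init__(self):
        self._reached = {}

    def record(self, serial, stage):
        index = STAGES.index(stage)
        if index != self._reached.get(serial, -1) + 1:
            raise ValueError(f"serial {serial}: stage {stage!r} recorded out of order")
        self._reached[serial] = index

    def stage_of(self, serial):
        index = self._reached.get(serial)
        return None if index is None else STAGES[index]


tracker = SerialTracker()
serial = serial_for(datetime(2005, 1, 1, tzinfo=timezone.utc))
tracker.record(serial, "generated")
tracker.record(serial, "validated")
```

A monitor built this way can alert when a serial stalls at one stage or appears at a later stage without having passed the earlier ones.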
Zone File Publication with ATLAS
ATLAS provides advanced publication features that contribute to high-performance
resolution of .net DNS queries. ATLAS uses an innovative in-memory database that:
* Allows ATLAS to perform the .net DNS resolution services and to simultaneously
receive the .net zone file data feed from the central distribution servers. Zone
changes are incorporated at the resolution site, while queries continue to be
answered.
* Offers advantages over commercial database products, such as Oracle. Since all
zone data is stored in memory, queries for even less popular .net domain names
and name servers are answered quickly. Name servers based on commercial
database products have longer query response times because zone data must be
retrieved from disk.
ATLAS uses advanced techniques for scrubbing the data sets in memory and
comparing every instance across all sites to catch data anomalies before a bad
response can be returned. In the case of DNS resolution for TLDs, a single
record error for a domain can cause widespread disruption of services on a scale
unthinkable for most industries dependent on the Internet.
As a result of these features of VeriSign's award-winning ATLAS architecture,
query response times measured internally consistently average less than 5
milliseconds.
Conclusion
VeriSign distributes and publishes .net zone information quickly and accurately.
Registration changes are guaranteed to be reflected in DNS within 3 minutes 95
percent of the time, and in practice, changes appear in less than 1 minute. Both
our guarantee and our ongoing performance are the fastest in the industry. In
addition to supporting these rapid updates, our proprietary ATLAS platform
includes other features critical to a TLD of .net's importance, such as
comprehensive data validation and efficient and reliable zone file distribution. |
|
(ix) Billing and collection systems. Technical
characteristics, system security, accessibility.
|
VeriSign provides registrars 24x7x365 access to the registry billing and
collection system with functionality that is fully integrated with the SRS.
VeriSign uses multi-tiered system security and conducts a variety of periodic
third-party audits of our systems and networks.
VeriSign Advantage:
+ ICANN-accredited registrars benefit from VeriSign's experience in providing
advanced billing functionality, including online maintenance of account
information, improved invoice format and content, and reporting.
This section outlines VeriSign's billing and collection systems, including:
* Technical Characteristics. VeriSign's billing and collection system
functionality is fully integrated with the SRS.
* System Security. VeriSign uses multi-tiered system security and conducts a
variety of periodic third-party audits of our systems and networks to maintain
secure billing and collection systems.
* Accessibility. VeriSign provides registrars 24x7x365 access to the registry
billing and collection systems through the Registrar Tool with extensive
detailed and summary reporting.
Technical Characteristics
System security, accessibility, auditability, and reliability are the
cornerstones of the VeriSign Registrar Billing and Collection System (RBACS).
ICANN-accredited registrars benefit from the ability to operate and manage their
businesses based on the functionality of this advanced Oracle-based system. This
system is fully integrated with the VeriSign SRS and has been tailored during
years of service to the needs of registrar business operations.
The RBACS comprises three main components, as depicted in Figure 5(b)ix-1.
First, the main financial and invoicing component consists of Oracle Financials
(Oracle 11i) and is responsible for debiting and crediting registrar accounts
and for generating monthly invoices. Second, the web-based Registrar Tool provides
registrars with 24-hour access to their SRS account information, the ability to
modify certain account parameters, and the ability to set permissions. Finally,
the reporting engine delivers a variety of accounting and operational reports
that registrars use for financial reconciliation and business operations.
While registrars have 24-hour self-service access to their SRS account
information through the web-based Registrar Tool, registrars can also access
additional assistance 24x7x365 by contacting the VeriSign Customer Service Desk
(CSD). VeriSign personnel can provide help to registrars not only through the
direct use of the Registrar Tool but also through a variety of support systems,
including the VeriSign Customer Service Representative (CSR) Tool and Clarify
Issue Tracking System to provide responsive and reliable customer service.
Finally, VeriSign is committed to frequent improvement to the RBACS through the
introduction of additional functionality and more advanced tools to assist
registrars in managing their businesses. Consolidated invoicing and additional
reporting are examples of recent improvements, and, with the deployment of our
EPP implementation described in Section 5(b)iv, VeriSign will introduce the
VeriSign Registrar Console that will serve as the platform for registrars to
access EPP-enabled registry functionality and will replace the existing
Registrar Tool.
Registrar Financial Requirements: VeriSign establishes registrar financial
requirements during the new registrar setup process. Within 24 hours of the
receipt of a registrar accreditation notice from ICANN, VeriSign sends the new
registrar a Welcome Package that includes the Registrar Credit Application and
the Registrar Data Information Sheet, which captures billing contact information
and can optionally be completed on the VeriSign website.
Once VeriSign receives both the Registrar Credit Application and the Registrar
Data Information Sheet, VeriSign creates an account for the registrar in the
RBACS. This process also allows each registrar to provide a payment security in
the amount identified on the Registrar Credit Application and required in the
Registry-Registrar Agreement. Registrars have three options available to satisfy
the payment security:
* Deposit Account (funded via check or wire)
* Irrevocable Letter of Credit
* Payment Security Bond
Once secured, the payment security amount establishes the registrar's credit
limit within the SRS. Registration volumes during the monthly billing cycle may
not exceed the registrar's credit limit. To assist registrars in monitoring
their available credit balance, the RBACS generates and transmits low balance
notices similar in form and content to the sample provided in Figure 5(b)ix-2
when remaining credit balances fall below the registrar-configurable threshold.
Batch processes will continue to automatically send the registrar low balance
notifications six times daily until the registrar sends VeriSign funds to bring
its available credit to an amount above the registrar-established threshold. If
a registrar's available credit reaches zero dollars, the registrar will no
longer be able to perform billable transactions within the SRS.
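The credit rules described above (a credit limit set by the payment security, a registrar-configurable low balance threshold, and no billable transactions once credit is exhausted) can be sketched as follows. The class, the exception, and the dollar amounts are hypothetical and do not reflect the RBACS implementation.

```python
from decimal import Decimal


class InsufficientCredit(Exception):
    """Raised when a billable transaction would exceed available credit."""


class RegistrarAccount:
    def __init__(self, credit_limit, low_balance_threshold):
        self.available = Decimal(credit_limit)
        self.low_balance_threshold = Decimal(low_balance_threshold)

    def charge(self, amount):
        """Debit a billable transaction, refusing it if credit is insufficient."""
        amount = Decimal(amount)
        if amount > self.available:
            raise InsufficientCredit("billable transaction refused")
        self.available -= amount

    def deposit(self, amount):
        """Funds sent by the registrar restore available credit."""
        self.available += Decimal(amount)

    def needs_low_balance_notice(self):
        """True while available credit is below the registrar's threshold."""
        return self.available < self.low_balance_threshold


account = RegistrarAccount(credit_limit="100.00", low_balance_threshold="20.00")
account.charge("85.00")
if account.needs_low_balance_notice():
    print("send low balance notice")
```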
Registrars can apply for emergency credit from VeriSign 24x7x365 to allow them
to continue to conduct billable transactions within the SRS. Our process for
granting registrars emergency credit is well-established and strictly followed
to treat all registrars in an equivalent manner.
System Security
Registrars access the SRS through the Registrar Tool or through an API to
perform registration activities. All billable transactions performed by
registrars in the SRS are passed to the RBACS via an API in near real-time. Each
SRS session is authenticated and encrypted using two-way SSL protocol. In
addition to SSL protection, all registrars are contractually obligated to
authenticate every client connection with the SRS using both an X.509 server
certificate and its password. All registrar self-service functions are performed
via the Registrar Tool, which also requires registrar-specific user IDs and
passwords. Each registrar is responsible for safeguarding its user name and
password and for notifying VeriSign within 4 hours of learning that its password
has been compromised, or that its server certificate has been revoked or
compromised.
VeriSign undergoes a variety of periodic audits of our systems and networks,
including the RBACS. Penetration tests are performed by an outside company and
include numerous tests for exploits, versions, and patch levels. Various groups,
such as Klynveld Peat Marwick Goerdeler (KPMG), regularly perform Statement on
Auditing Standards (SAS) 70 and British Standard (BS) 7799 audits to validate
the effectiveness of the security measures that VeriSign uses. Detailed
information on security audits is available in Section 5(b)xiii.
System Accessibility
VeriSign provides registrars full-time access to the RBACS through the Registrar
Tool. In addition, the RBACS delivers reports and invoices on the schedule
described below, and customer service support is available to all registrars.
Access to customer support as well as reports, the Registrar Tool, and invoice
information is granted to all registrars on an equivalent basis. Finally,
VeriSign monitoring systems, operations personnel, and outage prevention
procedures, all detailed in Section 5(b)xvi, cover RBACS to promote efficient
and reliable registrar operations.
System Auditability
Registrars require easy-to-use tools and reports to assist them in managing
their business. The VeriSign RBACS meets this need by providing daily, weekly,
and monthly reports that are available to registrars via a secured file transfer
protocol (FTP) server. To assist registrars in their ability to monitor their
registration activity throughout the month, VeriSign generates daily and weekly
reports for each of our registrars, as listed in Table 5(b)ix-1.
Registrars are able to use these reports to monitor their registration
activities and to reconcile their activity to the monthly billing reports. These
reports are published on the first day of each month and provide registrars with
a means to reconcile their monthly invoices to their transaction activities,
listed below:
* Registrations (including deletions within the grace period)
* Transfers (including deletions within the grace period)
* Auto-renewals (including deletions within the grace period)
* Explicit Renewals (including deletions within the grace period)
* Restores
* Syncs
* Non-Refund Deletions
The monthly reports include the transactions for the previous month as well as
the grace period deletes for the previous month. The grace period deletions are
flagged with a deletion date. The domains that are deleted outside the grace
period will continue to be listed on separate reports. These "non-refund"
reports include all deletions from the previous month that occurred outside the
grace period, whether or not they were added, renewed, auto-renewed, or
transferred during that month or during a prior month.
The reports reflect the actual activity for the period that affected the
registrar deposit account. Registrars receive one monthly invoice reflecting
the counts for all billable activity for the month. The detail behind the
summary counts on the invoice is provided in the detail reports identified
above. The system associates a transaction ID with every transaction that occurs
in the system. These transaction IDs provide an audit trail of all financial
transactions allowing registrars, as well as VeriSign personnel, to easily trace
activity during the audit process. Figures 5(b)ix-3 and 5(b)ix-4 provide a sample of
the new invoice format that VeriSign recently made available to registrars.
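Reconciling invoice summary counts against the detail reports, using transaction IDs as the audit trail, might look like the sketch below. The row layout and type names are assumptions; the actual report formats are not described in this section.

```python
from collections import Counter


def reconcile(invoice_counts, detail_rows):
    """Compare invoice summary counts with detail report rows.

    invoice_counts maps a transaction type to the count on the invoice.
    detail_rows is an iterable of (transaction_id, transaction_type) pairs.
    Returns (mismatches, duplicates), where mismatches maps a type to the
    pair (invoiced count, detailed count) and duplicates is the set of
    transaction IDs that appear more than once in the detail.
    """
    seen, duplicates, counts = set(), set(), Counter()
    for transaction_id, transaction_type in detail_rows:
        if transaction_id in seen:
            duplicates.add(transaction_id)   # counted once, flagged for review
            continue
        seen.add(transaction_id)
        counts[transaction_type] += 1
    mismatches = {}
    for transaction_type in set(invoice_counts) | set(counts):
        invoiced = invoice_counts.get(transaction_type, 0)
        detailed = counts.get(transaction_type, 0)
        if invoiced != detailed:
            mismatches[transaction_type] = (invoiced, detailed)
    return mismatches, duplicates
```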
System Reliability
As with the SRS, the Oracle 11i billing system reliability and availability are
paramount to VeriSign's ability to provide registrars with an accurate view of
their financial status with the registry. The financial systems that support the
registry have had online availability of 99 percent over the past 2 years.
Typically, errors are identified within 4 hours of occurring and corrected
within 4 hours of identification. We expect the Oracle 11i availability to
increase to 99.9 percent when an infrastructure upgrade is completed in January
2005.
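To put the two availability figures in concrete terms, the arithmetic below converts an availability percentage into permitted downtime over a year. It is a worked calculation only: 99 percent corresponds to roughly 87.6 hours per year, and 99.9 percent to roughly 8.8 hours.

```python
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(availability_percent, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted in the period at the given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_hours * (100 - availability_percent) / 100


for target in (99, 99.9):
    print(f"{target}% -> {allowed_downtime_hours(target):.1f} hours per year")
```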
Conclusion
VeriSign operates a highly secure and reliable billing and collections system
that includes a web-based Registrar Tool, allowing registrars to add users, set
permissions, view credit limit/account balance information, look up credit
transaction history, and set low balance percent threshold. This self-service
allows registrars to quickly make changes and access information necessary to
operate efficiently and effectively. Our comprehensive daily and weekly activity
reports provide registrars a variety of operating metrics for use in financial
and operational planning. Registrars receive clear and concise invoices on the
first business day of each month that include all products to make
reconciliation and processing fast and easy. Finally, registrars have immediate
access to monthly transaction reports to aid invoice reconciliation. VeriSign's
comprehensive solution to billing and collections is reliable and customer focused. |
|
(x) Backup. Describe frequency and procedures for
backup of data. Describe hardware and systems used, data format, identity
of suggested escrow agent(s) and procedures for retrieval of data/rebuild
of database.
|
VeriSign has developed a comprehensive backup and escrow system to protect the
data of the .net registry. Backup procedures, techniques, and hardware are used
to recover lost or destroyed data and to keep systems operating. They are
essential to continuity of service for a stable and reliable system.
VeriSign Advantages:
+ Protection of registry data is maximized by a layered architecture supported
by robust data backup procedures.
+ Operational five-tiered architecture to ensure critical data backup reliability.
This section describes VeriSign's approach to data backup and the procedures
necessary for recovery.
* Frequency. VeriSign's SRS system has a tiered backup architecture
incorporating real-time backup of data to multiple disks, full database
snapshots made several times a day, and full backups to tape every day.
* Procedures. Data from the mirrored disks are backed up to snapshots, which are
then used to create daily tape backups. Additionally, the data is mirrored in
real-time across a fiber link to a fully redundant alternate primary site, and
separate database transaction logs are compiled and downloaded to our escrow
agent on a daily basis.
* Hardware and Systems Used. Our SRS data backup and recovery system is composed
of a primary site with a secondary backup site. Each site uses a combination of
multiple EMC Symmetrix storage frames for live and snapshot databases and IBM
3584 high-speed tape libraries.
* Data Format. The SRS data is stored in the native Oracle 9i data format
under compression.
* Escrow Agents. Our escrow agent is Iron Mountain Intellectual Property
Management, and our long-term data storage facility is operated by Iron Mountain.
* Retrieval of Data/Rebuild of Database. VeriSign's recovery procedures are
choreographed around our five-tier backup architecture. Depending on the nature
of the failure, recovery can be performed in a matter of minutes with our
redundant EMC Symmetrix storage frames or within 48 hours from tape in the
worst-case scenario.
Because of the importance of the .net registry to the stable operation of the
entire Internet, our backup procedures are critical to our ability to safely and
confidently operate the registry. VeriSign's five-tiered structure using
industry-leading storage hardware and software for protecting critical data
provides the following:
* Maximum protection against the corruption of critical data
* Maximum confidence in the ability to never drop or lose a single real-time
transaction, which is especially important in an SRS environment that is
processing 300,000 real-time transactions every minute
* Ability to quickly restore data
Frequency and Procedures for Backup of Data
The VeriSign five-tiered hardware and software structure, shown in Figure
5(b)x-1, is focused on the maximum protection of the primary .net registry
database. This database is the most important element of any registry
provisioning function.
Tier 1. Enterprise Symmetrix Remote Data Facility (SRDF) replication technology
ensures the integrity of data replicated across database storage and provides
high-performance replication. Each disk drive in the EMC Symmetrix storage frame is
fully mirrored, with significant automated checking for physical corruption and
failover. Additionally, VeriSign creates periodic business continuance volumes
(BCVs), or snapshots, from the primary database. These BCVs provide the ability
to quickly restore the primary database in the event of an emergency. This also
allows us to perform various administrative batch activities (e.g.,
reports and backups) without affecting the performance of the primary OLTP
database. This architecture is critically important to maintaining SLAs in an
environment with high transaction volumes and significant transaction peaks.
Tier 2. VeriSign uses the BCVs created in Tier 1 to generate tape backups from
the OLTP database. These tapes are stored in a tape library at the primary data
center facility. VeriSign uses enterprise tape storage software to store data
onto tape libraries. Copies of these tapes are created daily and stored at a
short-term offsite tape storage facility. These tapes are accessible within 10
minutes to operations personnel. The data on these tapes includes database
transaction logs, which can be used to recreate transaction history in the case
of recovery and contribute to overall data integrity.
Tier 3. Each real-time operation against the primary OLTP database is
synchronized to the primary EMC Symmetrix enterprise storage frame, which then
is synchronized to a secondary Symmetrix enterprise storage frame at the
secondary site using SRDF technology. This synchronization ensures that all OLTP
database transactions are replicated in real-time to a secondary database at a
different location and facilitates rapid recovery if needed.
Tier 4. Daily full backup tapes are transported weekly from the short-term
offsite tape storage facility to a secure long-term offsite tape storage
facility operated by a third-party storage vendor. These tapes are retrievable
in a few hours at the request of specifically named and authorized individuals.
Tier 5. OLTP critical transaction logs are compiled and packaged for daily
download by the escrow vendor. The data is available in the event that registry
services are lost, and the data is needed to recreate registration information.
Hardware and Systems Used
Detailed backup procedures developed specifically for registry operations
through years of operations experience, and supported by state-of-the-art
hardware and software, help deliver reliable and efficient data processing,
storage, and retrieval. VeriSign uses hardware and backup systems from leading
vendors, including EMC and IBM. These proven systems enable us to confidently
support our assertion that we do not lose registry data.
Hardware. VeriSign uses EMC Symmetrix storage systems to back up the online .net
registration database. In the backup processes associated with Tier 3, the data
resides on mirrored storage at the primary site with fully synchronized data at
the alternate primary mirrored storage site. A full copy of the database is
taken each day and backed up on tape using the IBM 3584 tape library with an
expansion frame and a minimum of 16 tape drives and 721 available tape slots.
This solution is capable of physically storing up to 144 Terabytes (TB) of data
in the library. This tape media has a storage life of up to 30 years. The tape
drives are each capable of data transfer rates of 35 Megabytes per second (MB/sec).
With the 16 drives, the backup solution can currently back up 2 TB per hour, or
48 TB per day. At the current size of the .net database, 400 full tape backups of
live data, or 133 full backups of archived .net data, could be performed per
day. Although the solution has this capacity, VeriSign is unlikely to need it
because EMC BCV technology is used.
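The throughput figures above can be checked with simple arithmetic. The sketch below uses decimal units (1 TB = 1,000,000 MB). The per-slot capacity and the database sizes it prints are only what the quoted figures imply; they are not stated in the text.

```python
DRIVES = 16
MB_PER_SEC_PER_DRIVE = 35
TAPE_SLOTS = 721
LIBRARY_CAPACITY_TB = 144
QUOTED_TB_PER_DAY = 48
FULL_BACKUPS_PER_DAY_LIVE = 400
FULL_BACKUPS_PER_DAY_ARCHIVED = 133

MB_PER_TB = 1_000_000  # decimal units

# Aggregate throughput with all drives streaming in parallel.
tb_per_hour = DRIVES * MB_PER_SEC_PER_DRIVE * 3600 / MB_PER_TB
tb_per_day = tb_per_hour * 24

# Values implied by the quoted figures (inferred, not stated in the text).
gb_per_slot = LIBRARY_CAPACITY_TB * 1000 / TAPE_SLOTS
implied_live_db_gb = QUOTED_TB_PER_DAY * 1000 / FULL_BACKUPS_PER_DAY_LIVE
implied_archived_db_gb = QUOTED_TB_PER_DAY * 1000 / FULL_BACKUPS_PER_DAY_ARCHIVED

print(f"throughput: {tb_per_hour:.3f} TB/hour, {tb_per_day:.1f} TB/day")
print(f"implied capacity per slot: {gb_per_slot:.0f} GB")
print(f"implied live database size: {implied_live_db_gb:.0f} GB")
print(f"implied archived data size: {implied_archived_db_gb:.0f} GB")
```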
The BCV allows us to take a point-in-time snapshot of the .net database within
minutes and mount it to a server anywhere on the network where backups, database
restores, or reporting functions are performed. IBM P-Series servers are
deployed to provide the tape storage capabilities for the Veritas NetBackup
software. Each site has a master server and multiple media servers that connect
to the tape library via a storage area network. This allows for securely
directing data from each separate network environment to tape without the
concern of competing backup jobs limiting bandwidth or drive usage.
Disaster Recovery. In addition to the primary backup solution, VeriSign has a
disaster recovery solution that uses the same architecture and implementation.
Any tape that is written at the primary site can be retrieved from the offsite
facility and restored at either site in case of emergency.
As a failsafe there are redundant 2 Gigabit (Gb) fiber circuits between the
primary and alternate primary sites that can be used to back up .net data in a
matter of minutes. If the entire primary facility were lost, the disaster
recovery backup solution can perform backups of the .net disaster recovery
systems immediately (once databases are recovered). This solution is always
online and operational. The primary and alternate primary sites are listed in
Table 5(b)x-1. The software specifications are listed in Table 5(b)x-2.
Software and Systems Solutions. VeriSign uses leading software and systems
solutions with proven technology to provide a robust software backup solution
for the .net gTLD. We use the Oracle relational database for the .net
registration system, which is deployed on a pair of database servers clustered
using IBM High Availability Cluster Multi-Processing (HACMP) software. This
software provides protection against database server or network failure. The
database servers are connected to an EMC Symmetrix storage array. The database
uses EMC TimeFinder software to provide mirrored storage protection on the EMC
Symmetrix storage arrays. In addition, EMC SRDF provides further protection
against site failures by replicating data in real-time to the secondary
registration site.
A snapshot of the database is backed up to tape daily using Veritas NetBackup
software, which is used for application and database backups onto an IBM tape
library. The NetBackup software consists of two main packages:
* Media Server
* Master Server
Media Server. The Media server maintains information about the storage devices
and individual tapes. It is responsible for managing information related to:
* Location of stored data
* Data contents of each volume (tapes, disks, etc.)
* Condition of the volumes (how old, etc.)
Media server functions cover four main areas:
* Media Management: the tape location, age, and index of data stored on each tape.
* Device Management: the drives available for backup.
* Robot Management: tape inventory in the tape library.
* Maintenance Management: tape drive cleaning.
Finally, the media server groups tapes in two ways to accomplish the tape
management role:
* Volume Pools. Each tape belongs to a volume pool that determines what the tape
will be used for, such as incremental or full backups.
* Volume Groups. Tapes also belong to a volume group that determines where a
tape is located, such as in a robot or offsite.
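The two groupings can be pictured as two independent attributes on each tape.
The following is a minimal sketch of that idea only; the class, field names,
and tape labels are hypothetical and do not represent the NetBackup data model
or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tape:
    label: str         # hypothetical barcode label
    volume_pool: str   # what the tape is used for, e.g. "full" or "incremental"
    volume_group: str  # where the tape is located, e.g. "robot" or "offsite"


def tapes_by(tapes, attribute):
    """Group tape labels by the named attribute (pool or group)."""
    grouped = defaultdict(list)
    for tape in tapes:
        grouped[getattr(tape, attribute)].append(tape.label)
    return dict(grouped)


# Illustrative inventory; the labels are made up.
inventory = [
    Tape("NET001", "full", "robot"),
    Tape("NET002", "incremental", "robot"),
    Tape("NET003", "full", "offsite"),
]

by_pool = tapes_by(inventory, "volume_pool")
by_group = tapes_by(inventory, "volume_group")
```

Keeping purpose (pool) and location (group) as separate attributes lets the
same tape be found either by what it holds or by where it is stored.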
Master Server. The NetBackup master server software uses the services provided
by the media server to provide the following backup, archive, and restore functions:
* Unattended full and incremental backups
* Administrator initiated manual backups
* User directed backups/restores
* Scheduling and reporting
Tape backups are generated from the primary .net database and system
configurations and are stored in a tape library at the primary data center
facility. Each day, copies of these tapes are created and stored at a short-term
offsite tape storage facility. These tapes are accessible within 10 minutes to
operations personnel. Daily full backup tapes are transported each week from the
short-term offsite tape storage facility to a secure long-term offsite tape
storage facility operated by Iron Mountain, the leading media storage facility
in the United States.
These tapes are retrievable in hours at the request of specifically named and
authorized individuals. Backup retention is conducted in accordance with
agreements between VeriSign and ICANN.
Veritas NetBackup performs scheduled cleaning of tape drives, testing of media,
and tracking of media usage to ensure quality tape backups. Tapes are purchased
in bulk and refreshed near the end of the tape life-cycle. Media destruction is
performed by Iron Mountain.
VeriSign uses several methods to promote successful backups. Veritas NetBackup
provides a GUI interface for visual inspection of backup processes. Veritas also
provides a command-line interface that offers the ability to script backup
commands and poll results. VeriSign uses this command line interface to automate
backups in processes, such as the database snapshot. The results of a backup are
then distributed through email and monitors.
Frequency and procedures for data backup. Full backups of the .net database and
database archives are performed daily. The full backup is a point-in-time
snapshot of the live database. A full backup can be used to restore the database
to the exact condition it was in at the time of the snapshot. The full archive
backup is a collection of all critical database archives. Therefore, VeriSign
can extract historical .net reports, such as the number of domains registered
one year ago, compared to current registered domains.
The backup, which is automatically initiated through the use of a script,
performs the following functions:
1) Creates a snapshot (mirror) image of the live .net database
2) Makes multiple copies of the database, which are then mounted on reporting
and backup servers
3) Initiates the tape backup and remains mounted (in case it is needed for
restoration purposes) until the following day
4) Dismounts the copies and restarts the snapshot process.
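The four steps above can be sketched as a simple ordered driver. This is an
illustration only: the step names and the stub actions are hypothetical
stand-ins for the real snapshot and tape commands, and the wait until the
following day between steps 3 and 4 is omitted.

```python
from typing import Callable, Dict, List

# Ordered step names mirroring the numbered list; the names are hypothetical.
STEPS = [
    "create_snapshot",        # 1) snapshot (mirror) image of the live database
    "mount_copies",           # 2) copies mounted on reporting and backup servers
    "run_tape_backup",        # 3) tape backup; copy stays mounted meanwhile
    "dismount_and_restart",   # 4) dismount copies and restart the snapshot
]


def run_daily_backup(actions: Dict[str, Callable[[], bool]],
                     log: List[str]) -> bool:
    """Run each step in order and stop at the first failure."""
    for name in STEPS:
        ok = actions[name]()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            return False
    return True


# Stub actions that always succeed, standing in for real commands.
stub_actions = {name: (lambda: True) for name in STEPS}
run_log: List[str] = []
succeeded = run_daily_backup(stub_actions, run_log)
```

Stopping at the first failed step keeps a bad snapshot from being written to
tape. The log list is what a notification step could later distribute.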
Once the full backup has been completed, a flag is sent to the database server,
which begins the data archival process and runs a script upon successful
completion. At this time, the tape library ejects the tape and sends out a
notification, and the tape is placed in a fireproof safe located in another
building. Once every week, Iron Mountain picks up the tapes for storage at its
climate-controlled facility.
Data Format
Data kept on tape storage remains in a native binary format compressed to
improve the volume of storage on each data tape.
Identity of Suggested Escrow Agents
VeriSign currently provides escrowed data of the .net registry database through
Iron Mountain. The registry data is packaged, compressed, encrypted, and
deposited to the escrow agent via secure file transfer. VeriSign performs the
compilation, processing, and packaging of .net TLD data required for escrow.
VeriSign provides this in an electronic format. Under this arrangement, the .net
gTLD database transaction logs are electronically delivered in a secure mode on
a weekly basis to the escrow agent, as well as incremental updates on a daily
basis. The data escrow agent receives the data, conducts verification testing
for completeness and integrity, and stores the data onto DVD. This process
ensures that current registration data is always available. The terms of the
escrow agreement also specify the conditions under which the data will be released.
Retrieval of Data/Rebuild of Database
If a situation occurs that requires data recovery, the severity of the event
determines the specific procedures that VeriSign executes. In the event of a
failure of the primary EMC Symmetrix data storage device, the database would be
recovered from a secondary EMC Symmetrix data storage device at the alternate
primary data center. This second device is kept up-to-date through real-time
data mirroring from the primary OLTP database. Therefore, the recovery time is
minimal and the confidence in data integrity is high. Should the primary data
center be rendered completely offline, registration functions would be recovered
at the secondary data center as well. A dedicated EMC Symmetrix data storage
device has been kept up-to-date in real-time, promoting a speedy and reliable
recovery.
With all this data protection, redundancy, and reliability in place, it is
difficult to envision a scenario in which data recovery from tape would be
necessary. However, this contingency has been planned for as well. There are
three EMC Symmetrix frames located in various data center facilities that could
be used to restore data in the event of an emergency. In a worst-case scenario,
where all online copies of the .net registration database are completely
destroyed and the primary data center facility is offline, full recovery of .net
registration functions could be accomplished in less than 48 hours at one of
VeriSign's other facilities. DNS functions, the most critical functions for the
stability of the Internet, would not be impacted. A description of backup
procedures for shared systems, development, operational test and evaluation
environment and corporate services is available at www.verisign.com/nds.
Systemwide Backups. This default type of backup captures all locally mounted
file systems on a given computer system. These backups can be used in
recovering system configuration and application files needed in case of complete
system failure. This data is kept short-term and does not include critical data,
such as financial records.
Individual Backups (Non-Systemwide). These backups are specifically designed to
capture only the files necessary to recover critical data that is kept
long-term. An example of this would be the backup of the registry database. This
type of backup reduces the number of tape media required, which lowers cost and
makes recovery and searching much faster than with a systemwide backup.
Backup Personnel. VeriSign backup personnel have unique insight in that they
have participated in the architectural, engineering, and operational aspects
that went into the planning, provisioning, and deployment of the backup
solution. These employees have also attended numerous third-party training
courses on the hardware and software components that make up this solution.
These third parties include Iron Mountain, Veritas, and EMC. Training is kept
up-to-date with the latest software versions to come from each vendor. Backup
and recovery methods are continually tested by VeriSign to ensure timely and
accurate recovery in disaster recovery scenarios.
Conclusion
Backups provide the last line of defense against data loss. VeriSign considers
backup processes and procedures a critical facet of our solution and uses
hardened methods to verify that backups are made in an accessible and
retrievable format. Some of the primary features and benefits include:
* EMC Symmetrix storage of the live database, which provides real-time backup of
data to facilitate data retrieval.
* EMC SRDF, which provides rapid recovery of data to minimize operational
downtime and maintain high levels of availability.
* EMC BCV, which reduces stress on the OLTP database and supports data recovery
by providing a snapshot of the database for backups and batch processes.
* Our tape libraries, offsite storage, and relationship with our world-class
escrow provider ensure fast, secure recoverability. |
|
(xi) Escrow. Describe arrangements for data escrow, or
equivalent data backup security, data formats, insurance arrangements and
backup plans for data recovery.
|
An escrow agreement between the registry operator, the escrow service provider,
and ICANN is essential to ensure business continuity and stability of the .net
TLD. In an average week, .net registration data undergoes nearly 200,000
modifications, including new domain names, deleted domain names, modifications
of name server IP addresses, and transfers between registrars. The registry
operator must have an escrow arrangement in
place with ICANN and a reputable escrow agent. VeriSign has an existing data
escrow service for the .net TLD. This agreement will continue into a new .net
agreement without interruption, thereby never creating a window without escrow data.
VeriSign Advantages:
+ A tested, end-to-end solution for data escrow and recovery
+ Comprehensive insurance arrangements
This section presents VeriSign's comprehensive escrow solution. We define our
solutions for:
* Arrangements for Data Escrow. VeriSign has a long-standing contract with Iron
Mountain Intellectual Property Management for data escrow services.
* Data Formats. VeriSign completes daily and weekly deposit of reports and
meta-data for all ASCII and non-ASCII domain name registrations. These are
tested for accuracy to avoid failure to restore due to data corruption.
* Insurance Arrangements. This contains a description of the property, business
interruption, and general liability insurance that VeriSign maintains.
* Backup Plans for Data Recovery. The process that VeriSign would undertake to
retrieve and restore escrowed data is regularly tested and refined to ensure
execution of the plan.
Failure to have all escrow arrangements in place, even for a short time, could
result in a significant data loss. The risk of a registry failure by a new
operator during a transition period is significant and warrants confirmation
that a fully verified escrow process is in place before any transition of a
critical registry, such as .net.
VeriSign tests all backups to ensure the data formats are accurate and the files
are not corrupted. We have a verified, proven record of delivering completeness,
correctness, and integrity of the data within each escrow file.
Organizations with Internet services built on .net domains rely on the
continuity of their domain registrations. Escrow provides ICANN, .net
registrars, and registrants with a high level of confidence in the continuity of
the registration information managed by the registry operator.
In the event of a catastrophic registry loss or termination of the Registry
Agreement, failure to have a verifiable escrow arrangement or failure to
retrieve data from escrow could cause irreparable damage. With more than
25,000 changes to registration data in the .net registry database every day,
reconstructing the .net TLD would be extremely difficult, if not impossible,
without current data. This would certainly not be possible in a timely manner.
Data backup by a registry is important for continuity by the registry operator.
In the event that any registry operator is unable or unwilling to deliver data
from a primary or backup source, data escrow with a neutral third party provides
ICANN with this capability.
While VeriSign has architected multiple layers of redundancy in the registry
platform that provide levels of failure prevention, our escrow arrangements and
processes are essential to maintaining the stability of the .net TLD and
ensuring business continuity and will remain in place.
Arrangements for Data Escrow
VeriSign has a long-standing relationship with Iron Mountain Intellectual
Property Management (Iron Mountain), formerly DSI Technology Escrow Service, for
data escrow services for the .net TLD. With over 30 years of experience, $1.5
billion in revenues, and the broadest service platform serving the most global
markets, Iron Mountain is the "leader in records and information management
services." Refer to Iron Mountain's website at www.ironmountain.com for more
information about the company and its escrow services.
Our .net Escrow Agreement has been in place since 30 November 2001 with
automatic yearly extensions. In accordance with Appendix S of the .net
Agreement, VeriSign is compliant with all requirements to provide periodic
updates in escrow of registry data in an electronic format mutually agreed on by
ICANN and VeriSign.
VeriSign and Iron Mountain have a proven, rigorous escrow process with daily and
weekly deposits of registry data, as described in the following section. The
remainder of this section discusses the process flow for providing escrowed data
to Iron Mountain, the format and frequency of the escrowed data, the insurance
agreements in place, and the process for recovering and restoring data.
Escrow Process
The steps of the escrow process are listed below and shown in Figure 5(b)xi-1.
1) The report files are concatenated into a single data file.
2) The data file is compressed.
3) A digital signature is applied to the data file based on the message digest
of this file. The data file is also encrypted in this process.
4) This data file is ready for escrow.
Escrow files are then transmitted and stored on a secure FTP server in
VeriSign's data center. VeriSign provides a secure ID and password for Iron
Mountain to use when pulling the file from the secure server.
Upon receipt of the data file, Iron Mountain performs the verification process
within 2 business days of receipt of the escrow data files. The verification
process is as follows:
* Obtain the data escrow file
* Decrypt the data escrow file and verify the digital signature
* Decompress the single file
* Extract the report files using provided scripts. These scripts will produce a
Verification Report that Iron Mountain then submits to ICANN on a monthly basis
* If Iron Mountain discovers that any data files fail the validation process, it
will notify ICANN of the nonconformity within 48 hours
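The packaging and verification steps above can be sketched end to end with the
Python standard library. Several parts are assumptions: the JSON framing used
to concatenate the reports is hypothetical, an HMAC over the SHA-256 message
digest stands in for the digital signature (a real signature would use an
asymmetric key pair), and the encryption step is omitted because the text does
not name a cipher.

```python
import gzip
import hashlib
import hmac
import json


def package(reports, key):
    """Concatenate, compress, and sign report files for deposit."""
    # 1) Concatenate the report files into a single data file.
    #    The JSON framing here is a hypothetical choice.
    data_file = json.dumps(reports, sort_keys=True).encode("utf-8")
    # 2) Compress the data file.
    compressed = gzip.compress(data_file)
    # 3) Sign based on the message digest of the file.
    #    HMAC is a stand-in for a true digital signature; encryption omitted.
    digest = hashlib.sha256(compressed).digest()
    signature = hmac.new(key, digest, hashlib.sha256).hexdigest()
    return compressed, signature


def verify_and_extract(compressed, signature, key):
    """Check the signature, decompress, and extract the report files."""
    digest = hashlib.sha256(compressed).digest()
    expected = hmac.new(key, digest, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("signature check failed")
    return json.loads(gzip.decompress(compressed).decode("utf-8"))
```

A deposit produced by `package` and passed unchanged to `verify_and_extract`
with the same key returns the original reports; a deposit altered in transit
raises `ValueError` before any extraction takes place.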
Once the verification process is completed, Iron Mountain will store the file in
a secure location, generate a File Listing, and forward the File Listing and
Verification Report to VeriSign via secure email. This report is forwarded
within 10 days of receipt of the data files.
Data Formats
VeriSign's escrow agreement with Iron Mountain specifies the daily and weekly
deposits of registry data, consisting of a snapshot of each registrar's data. As
shown in Table 5(b)xi-1, the weekly deposits are comprised of a complete set of
registry data. The daily deposits provide a transaction log for each operational
registrar, representing transactions that occurred over the previous 24-hour period.
The daily and weekly escrow processes encapsulate existing registrar reports,
along with certain meta-data, into single daily and weekly escrow files. The
reports contain data for both ASCII and non-ASCII domain names. The format of
this encapsulation enables the escrow service provider to verify the
completeness, correctness, and integrity of the data within the daily and weekly
escrow file.
Completeness, correctness, and integrity are defined as follows:
* A data file transfer is "complete" if all data files transferred from the
source machine are present on the destination machine
* A data file transfer is "correct" if each data file on the destination machine
has the same information content as that on the source machine
* A data file transfer has "integrity" if no data file was altered by a third
party while in transit
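The three definitions can be expressed as simple checks. In this sketch, files
are modeled as a mapping of file name to bytes, and integrity is checked with
a keyed MAC computed at the source. Both choices are assumptions made for
illustration and are not the mechanism specified in the escrow agreement.

```python
import hashlib
import hmac


def is_complete(source, destination):
    """Every file transferred from the source is present at the destination."""
    return set(source) <= set(destination)


def is_correct(source, destination):
    """Each destination file has the same content as the source copy."""
    return all(
        name in source
        and hashlib.sha256(data).digest()
        == hashlib.sha256(source[name]).digest()
        for name, data in destination.items()
    )


def make_tags(files, key):
    """Keyed MAC per file, computed at the source before transfer."""
    return {
        name: hmac.new(key, data, hashlib.sha256).hexdigest()
        for name, data in files.items()
    }


def has_integrity(destination, tags, key):
    """No file was altered in transit: each MAC still matches its tag."""
    return all(
        name in tags
        and hmac.compare_digest(
            hmac.new(key, data, hashlib.sha256).hexdigest(), tags[name]
        )
        for name, data in destination.items()
    )
```

A transfer that drops a file fails `is_complete`, while one that changes a
file's bytes fails both `is_correct` and `has_integrity`.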
Weekly Escrow Files
VeriSign deposits a complete set of registry data into escrow on a weekly basis
by electronically and securely transmitting a snapshot of each operational
registrar's data (the "deposit materials"). The snapshot captures the state of
each registrar's data at the time the snapshot was created. The weekly deposit
materials consist of four reports described below:
* Registrar Domain Report. This report contains all domains associated with a
specific registrar. The domain is listed once with each current status and
associated name server. Refer to Figure 5(b)xi-2 for the format of this report.
* Registrar Name Server Report (Listed with IP Address). This report contains
all name servers associated with a specific registrar. The name server is listed
once with each associated IP address. Refer to Figure 5(b)xi-3 for the format of
this report.
* Registrar Name Server Report (Listed with Domain). This report contains all
name servers associated with a specific registrar. The name server is listed
once with each associated domain name. Refer to Figure 5(b)xi-4 for the format
of this report.
* Registrar Common Report. This report contains one row for each registrar.
Fields of the report contain name, location, contact, financial, and business
information. The format of this report is shown in Figure 5(b)xi-5.
Weekly database snapshots are taken at midnight EST on Sundays and are made
available to Iron Mountain no later than 6 p.m. each Monday. Before making the
Weekly Deposit Materials available to Iron Mountain, VeriSign runs the
verification process described above to ensure the file is complete and accurate.
Notification
In conjunction with the delivery to Iron Mountain of the Weekly Deposit
Materials, VeriSign delivers a written statement to Iron Mountain, via secure
email, specifically identifying all items deposited, and stating that the
Deposit Materials have been inspected by VeriSign and are complete and accurate.
The format of the Weekly Deposit email is shown in Figure 5(b)xi-6.
Daily Escrow Files
VeriSign will securely and electronically deposit a transaction log for each
operational registrar representing transactions that occurred over the previous
24-hour period. The logs will be escrowed daily and will include all registrar
activity, such as add, delete, and transfer of a domain name. The daily deposit
materials will consist of one report described below:
Registrar Transaction Report. This report contains transactions associated with
a specific registrar. Domain operations produce one row for each associated name
server. Name server operations produce one row for each associated IP address. A
transaction ID is included to allow unique identification of transactions. The
format of this report is shown in Figure 5(b)xi-7.
Daily transactional data is made available at the close of business, six days a
week, Tuesday through Sunday, for the previous calendar day. For example,
transactional data created on Monday would be available to Iron Mountain on
Tuesday at the close of business. Before making the Daily Escrow File available
to Iron Mountain, VeriSign runs the verification process described above to
ensure the file is complete and accurate.
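The weekly and daily schedules described above can be combined into one
lookup. This is a sketch under stated assumptions: the function name and
return shape are hypothetical, and the weekly snapshot is attributed to the
Sunday before the Monday delivery.

```python
from datetime import date, timedelta


def deposit_due(day: date):
    """Return (deposit kind, date the data covers) for a delivery day.

    Monday: the weekly deposit, from the snapshot taken on Sunday.
    Tuesday through Sunday: the daily transaction log for the previous
    calendar day.
    """
    previous_day = day - timedelta(days=1)
    if day.weekday() == 0:  # Monday
        return ("weekly", previous_day)
    return ("daily", previous_day)
```

For example, `deposit_due(date(2005, 1, 18))`, a Tuesday, returns the daily
deposit covering Monday, 17 January 2005, which matches the example in the
text.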
Notification
In conjunction with the delivery to Iron Mountain of the Daily Escrow File,
VeriSign delivers a written statement to Iron Mountain, via secure email,
specifically identifying all items deposited and stating that the Escrow File
has been inspected by VeriSign and is complete and accurate. The format of the
Daily Deposit email is shown in Figure 5(b)xi-8.
Insurance Arrangements
As a financially sound, U.S.-based, public company with robust technical
capabilities, VeriSign presents negligible risk of a sudden business
interruption to the .net registry. VeriSign maintains property, business
interruption, and general liability insurance to mitigate the risk of a
catastrophic event that would necessitate reassignment of the .net registry with
reconstitution of the registry from escrow. To further provide assurance of
.net's stability, VeriSign has the following insurance coverage and limits:
* Property Insurance
- Coverage. Real property in which VeriSign has an insurable interest,
including personal property owned by VeriSign and property of others in
VeriSign's custody to the extent that VeriSign has legal liability for physical
loss or damage. Coverage for income lost and extra expenses incurred to avoid or
minimize the suspension of business as a result of damage to VeriSign's property
caused by a covered loss.
- Insurer. Factory Mutual Insurance Company (FM Global)
- Property and Business Interruption Limits. $500,000,000 per occurrence loss limit
* Commercial General Liability Insurance
- Coverage. For legal liability as imposed by law or assumed under a contract
for bodily injury, property damage, personal injury, or advertising injury
arising from an occurrence during the policy period.
- Insurer. Chubb
- Limits. $5,000,000 per occurrence
Insurance program information, additional insurance coverage (worker
compensation, professional liability, and auto liability) and limits are
available upon request.
Backup Plans for Data Recovery
The primary plan for data recovery is discussed in Section 5(b)xviii, which
outlines VeriSign's detailed processes and mechanisms to restore the .net
registry. In nearly every possible scenario, VeriSign remains fully capable of
supporting data recovery and is committed to provide ICANN with full
cooperation. VeriSign is also committed to establishing appropriate measures to
facilitate data recovery and full system restoration in the most remote and
unlikely scenarios that might occur in the absence of VeriSign's active
participation. There are three major components to VeriSign's plans for data
recovery:
1) Recovery of Zone File Data. DNS resolution is the most critical aspect of
preserving operations; this data is vital to continuity of DNS operations for
.net. Because of resolver behavior, caching, and other unpredictable behavior by
a wide range of legacy services that depend on the current DNS service, this
service cannot be seamlessly transferred. VeriSign expends considerable effort
and resources to maintain a robust infrastructure. However, in the event that a
smooth transition is not possible, this must be the highest priority issue
addressed.
A complete loss of VeriSign's primary and backup DNS services would require an
immediate modification of the root zone with distribution to each of the root
zone servers designating new .net name servers. VeriSign posts a BIND formatted
zone file for .net on an FTP server, with over 1,000 zone file access customers,
many of whom download the zone daily. A DNS provider, or multiple providers with
sufficient capacity and experience, would have to be designated to publish
the .net zone file. Upon completion of the root zone modification, DNS traffic
would begin to be routed to the substitute servers. Such a sudden change would
cause an unpredictable shift in DNS traffic.
2) Data Recovery. The data held in escrow by Iron Mountain is validated for
completeness, correctness, and integrity. Refer to Appendix R for detailed data
retrieval provisions specified in the .net Escrow Agreement.
3) System Reconstitution. Restoring the .net registry in a scenario without
VeriSign's participation would require extensive planning, development, and time
by any operator. The extensive cooperation required between registries to
execute a smooth transition is a good indicator of the difficulty and complexity
under the most ideal circumstances.
During the time required for reconstitution, assuming DNS continues to operate
with static zone information, businesses and organizations would not be able to
update any data. Therefore, they would be forced to maintain the status quo
until the provisioning system is reconstituted. Given the interdependencies of
.net and nearly every other gTLD and ccTLD, the impact would be tremendous.
Simply migrating data to a new operator would not be sufficient, nor is this
expected to be the most expedient solution to restoring a secure, stable
registry. Therefore, in addition to maintaining all necessary data to restore
services from any of VeriSign's major data centers, VeriSign proposes to
maintain the following information with a third party that would be released to
ICANN in the event that VeriSign were unavailable to support a smooth transition:
* Detailed System Specifications
* Architecture Diagrams
* Technical Documentation
* Operations Guides
Conclusion
VeriSign has a long-standing relationship with Iron Mountain for data escrow
services for the .net TLD. We are compliant with all requirements to provide
periodic updates in escrow. Through a time-proven process, we have a verifiable
record of delivering completeness, correctness, and integrity of the data within
each escrow file.
As a financially sound, U.S.-based, public company with robust technical
capabilities, VeriSign presents negligible risk of a sudden disruption to the
.net registry operation. VeriSign maintains property, business interruption, and
general liability insurance to mitigate the risk of a catastrophic event that
would necessitate reassignment of the .net registry.
VeriSign has a carefully developed plan for data recovery, including provisions
for DNS restoration, provisions for data retrieval, and provisions to facilitate
system reconstitution. |
|
(xii) Publicly accessible WHOIS service. Address
software and hardware, connection speed, search capabilities and
coordination with other WHOIS systems. Frequency of WHOIS updates,
availability and processing times. Identify whether you propose to use a
“thick” registry model or “thin” registry model, and explain why you
believe your choice is preferable.
|
VeriSign has operated the Whois lookup service for .net, .com, and other TLDs
since 1991. We will continue to provide these proven services for the .net
registry, work with the Internet community to improve the utility of Whois data
while thwarting its use for abusive purposes, and lead the development of an
implementation based on the emerging Internet Registry Information Service
(IRIS) standard.
VeriSign Advantages:
+ Proven track record of providing a fast and reliable Whois service
+ Commitment to solve the problems of Whois through open standards and protocols.
The term "Whois" refers both to the data regarding a domain registration and to
the means to access the data. Both of these definitions stem from the original
definition of Whois specified in RFC 812, which described the types of data
needed for a white pages service of Network Control Protocol (NCP) host
operators of the original ARPAnet and a simple, centralized transactional
protocol with which to access the information. Later updated by RFC 954 to
describe the service over Internet Protocol Version 4 (IPv4) instead of NCP, the
Whois protocol is now most accurately described by RFC 3912.
This section describes VeriSign's publicly accessible Whois service for the .net
registry. We define this service as follows:
* Software. We provide access to Whois via the traditional port 43 Whois/Nicname
protocol interface (as defined by RFC 3912) and via a web-based interface. Our
software is custom-written to meet the high performance needs of over 500,000
queries per minute.
* Hardware. VeriSign operates the .net Whois service on five IBM servers located
at multiple sites. The architecture of our Whois service allows us to scale our
hardware both vertically and horizontally.
* Connection Speed. Our primary site is connected via two OC-3 and two OC-12
connections. Our alternate primary site is connected via two OC-3 connections.
* Search Capabilities. Our Whois search capabilities allow searches on exact
match and partial match of domain names, name servers, and registrars.
* Coordination with Other Whois Systems. VeriSign works carefully with domain
registrars to ensure that no restrictions are placed on their Whois systems. We
are also leaders in the effort to bring about the next generation Whois service,
the IRIS.
* Frequency of Updates. We currently update our Whois database twice per day,
and we are working with the various Internet constituencies to deploy near
real-time Whois updates by the end of 1Q 2006.
* Availability. Our service currently meets, and will continue to meet,
availability levels comparable to those of our DNS service. We have the ability
to throttle back
queries based on individual IP addresses to avoid DoS attacks.
* Processing Time. VeriSign currently processes and will continue to process 95
percent of Whois queries within 5 milliseconds.
* Registry Model. Continuity in the operation of the .net registry is
imperative; therefore, VeriSign will continue to operate the .net registry using
the "thin" registry model and promote the use of the IRIS protocol to give end
users a seamless view of registry/registrar thin data and the thick data of
other TLD registries.
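The per-IP throttling mentioned under Availability could take many forms. The
text does not say which mechanism VeriSign uses, so the token bucket below is
an assumed technique chosen for illustration, and the class name, rate, and
burst values are hypothetical.

```python
import time


class PerIpThrottle:
    """Token bucket per client IP address (an assumed technique)."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec   # tokens added per second
        self.burst = burst         # maximum tokens a client can hold
        self.clock = clock         # injectable clock, useful for testing
        self.buckets = {}          # ip -> (tokens, last_seen)

    def allow(self, ip):
        """Return True if this query may proceed, False if throttled."""
        now = self.clock()
        tokens, last = self.buckets.get(ip, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[ip] = (tokens - 1, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False
```

Because each address has its own bucket, one client that floods the service
exhausts only its own allowance and other clients are unaffected.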
This section includes a discussion of the next generation of Whois services,
IRIS. VeriSign employees are the primary developers of this important new
standard that balances global privacy concerns with legitimate needs for Whois
data access.
The failure to provide an adequate Whois service will seriously hinder the
timely resolution of many technical problems, investigatory phases of law
enforcement, and many other legitimate, non-abusive uses of domain registration
meta-data. One of the fastest growing uses for Whois data today is in the
automated, analytic engines of Internet reputation services used to prevent spam
and combat net-based identity crimes, such as phishing.
VeriSign recognizes the need to maintain the current Whois service as well as
the need for a seamless implementation of new protocol access mechanisms to
Whois data. While VeriSign has been an active participant in the creation of the
next-generation Whois system, we understand that many currently deployed Whois
clients depend on the Whois/Nicname protocol interface on port 43 (as defined by
the Internet Assigned Numbers Authority (IANA) protocol port number registry).
Many of these clients assume that whois.internic.net contains both .com and
.net domain registration data. VeriSign is committed to
providing the necessary resources to ensure continued function for the currently
deployed clients, while promoting unhindered solutions, such as the new IRIS
protocol.
Software and Hardware
VeriSign currently answers between 300 and 400 million Whois queries per month
for .net, and we can handle peaks of 500,000 queries per minute. Only .com, with
between 800 and 900 million queries
per month and also operated by VeriSign, exceeds the number of queries processed
for .net. Other gTLD registry operators process no more than approximately 100
million queries per month.
VeriSign operates two Whois environments containing information from the .net
registry: whois.verisign-grs.net and whois.internic.net. As depicted in Figure
5(b)xii-1, both services contain identical information, and each environment can
be housed in a separate facility. For each environment, VeriSign operates a
web-based Whois service using standard Common Gateway Interface (CGI) methods
and a Whois/Nicname protocol interface on port 43, as specified by the IANA
protocol port registry (this is commonly referred to as the "command-line"
interface).
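As an illustration of the port 43 interface, the following Python sketch shows how a client might issue a Whois/Nicname query in the manner RFC 3912 describes: open a TCP connection, send the query terminated by CR LF, and read until the server closes the connection. The function names, timeout, and buffer size are assumptions chosen for illustration; this is not VeriSign's client or server software.

```python
import socket

WHOIS_PORT = 43  # Whois/Nicname port listed in the IANA port registry


def build_request(query: str) -> bytes:
    """Encode a query as one line terminated by CR LF, as RFC 3912 requires."""
    return query.strip().encode("ascii") + b"\r\n"


def whois_query(host: str, query: str, timeout: float = 10.0) -> str:
    """Send a single query and return everything sent before the server closes."""
    chunks = []
    with socket.create_connection((host, WHOIS_PORT), timeout=timeout) as conn:
        conn.sendall(build_request(query))
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection to end the response
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Hypothetical usage (requires network access to the Whois server):
#     print(whois_query("whois.verisign-grs.net", "domain example.net"))
```

Because the protocol is one request and one response per connection, clients of this kind place a connection setup and teardown on the server for every query.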
At each facility, the Whois servers are connected to the Internet by multiple
OC3 connections (450 Mb of network bandwidth). Redundant routers, QoS devices,
and load balancers at each site provide reliability and scalability. Finally,
refined and tested intrusion detection software and procedures provide proven
security in the Whois environment.
VeriSign's redundant Whois databases further contribute to overall system
availability and reliability. Our Whois databases run on five IBM servers with
memory and kernel configurations tuned to deliver the fastest possible query
response. The Whois software is written by VeriSign and is fully compliant with
RFC 3912, which obsoletes RFC 954 and was written by a VeriSign employee. It
uses an advanced in-memory database technology to provide overall system
performance and security.
The hardware and software for our service are architected so that we may scale
our Whois service both horizontally (adding more servers) and vertically (adding
more CPUs and memory to existing servers) to meet future need.
Connection Speed
We understand the importance of reliable and high-performance Internet
connections in the Whois environment. Each facility hosting the Whois service
has multiple Internet connections. The primary Whois facility connections
consist of two OC3 connections and two OC12 connections, while the alternate
primary facility also has two OC3 connections. Each facility is served by more
than one ISP. Our Whois databases are optimally tuned at the TCP network layer
to accept connections, deliver query responses, and drop the connection at
levels required to support millions of Whois queries each day.
Search Capabilities
The information contained in the .net Whois database is based on the thin
registry model. Record types that can be searched include .net domains, name
servers, and registrars. The search for name servers can be performed using IP
addresses or host names. Whois will perform a broad search, based on the user
input. Searches will match everything beginning with the input if a trailing
period ('.') or the 'PArtial' keyword is used. Using the following
keywords/characters, a user can narrow a search or change the behavior of Whois:
Expand: Show all parts of display without asking
FUll or '=': Show detailed display for EACH match
SUMmary or '$': Always show summary, even for only one match
HELP: Enter help program for full documentation
PArtial or trailing '.': Match targets STARTING with given string
Search Examples:
domain root
nameserver nic
nameserver 198.41.0.250
registrar Network Solutions Inc.
net.
= net
FU net
$ ibm.com
SUM ibm.com
Conducting a search for a domain, name server, or registrar using its full name
will ensure that a search matches a single record. If VeriSign implements a
thick registry during the term of the agreement, Whois search capabilities will
be extended to search for additional record types for registrants and contacts.
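The keyword rules above can be sketched as a small query parser. This is an illustrative Python sketch only, not the VeriSign Whois software: the class and function names are hypothetical, and it assumes that keywords are case-insensitive, that the capitalized letters in the list (FU, SUM, PA) are the minimum accepted abbreviations, and that symbols are separated from the target by a space, as in the examples.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed minimum abbreviations, taken from the capitalized letters in the list.
MODIFIERS = {"full": "fu", "summary": "sum", "partial": "pa"}
SYMBOLS = {"=": "full", "$": "summary"}
RECORD_TYPES = ("domain", "nameserver", "registrar")


@dataclass
class WhoisQuery:
    target: str
    record_type: Optional[str] = None
    full: bool = False
    summary: bool = False
    partial: bool = False


def _modifier(token: str) -> Optional[str]:
    """Map a token such as 'FU', 'SUMmary', or '$' to a flag name, if it is one."""
    if token in SYMBOLS:
        return SYMBOLS[token]
    low = token.lower()
    for name, minimum in MODIFIERS.items():
        if low.startswith(minimum) and name.startswith(low):
            return name
    return None


def parse_query(line: str) -> WhoisQuery:
    tokens = line.split()
    query = WhoisQuery(target="")
    # Consume leading modifiers, but always leave at least one token as the target.
    while len(tokens) > 1:
        flag = _modifier(tokens[0])
        if flag is None:
            break
        setattr(query, flag, True)
        tokens.pop(0)
    if len(tokens) > 1 and tokens[0].lower() in RECORD_TYPES:
        query.record_type = tokens.pop(0).lower()
    target = " ".join(tokens)
    if target.endswith("."):  # a trailing period requests a partial match
        query.partial = True
        target = target[:-1]
    query.target = target
    return query
```

Under these assumptions, `parse_query("FU net")` and `parse_query("= net")` produce the same query, and `parse_query("net.")` yields a partial match on `net`.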
Coordination with Other Whois Systems
VeriSign has the ability to fine-tune access to our Whois database on an
individual IP address basis. We currently work with domain registrars to ensure
that their services are not limited by any restrictions placed on Whois.
Frequency of Whois Updates
VeriSign currently updates the .net information in Whois twice per day, and is
committed to enhancing this service with near real-time updates. Full data
extracts are performed against a copy of the registration database and copied to
the Whois servers for publication.
VeriSign has experience performing near real-time updates in the .name Whois
service. As information is updated in the .name registration database, the
information is propagated to the Whois servers for quick publication. This type
of change is not something that should be rushed into production without careful
consideration and coordination with appropriate Internet constituencies. Upon
completion of system testing and coordination, VeriSign intends to provide
real-time updates for the .net Whois service to align with the real-time
publication of DNS information as it is updated in the registration database.
While the technical requirements for real-time or near real-time updates are not
complicated, VeriSign and the Internet community must consider the impact of
more frequent updates. The Whois database will become a more lucrative target
for data mining, and other changes in behavior may significantly increase query
volume. VeriSign intends to implement near real-time Whois updates. While it is
important to understand the implications of increased traffic for bandwidth,
server capacity, and rate-limiting solutions, we must also consider the
non-technical impact, such as privacy concerns, the needs of the intellectual
property community, and the impact on registrar systems.
Availability and Processing Time
With 100 percent availability, our Whois response time is less than 5
milliseconds for 95 percent of all Whois queries. The response time, combined
with our capacity, provides the capability for the Whois system to respond to up
to 30,000 searches (or queries) per second (a total capacity of 2.6 billion
queries per day). As shown in Figure 5(b)xii-2, our Whois service has sustained
loads over 1.38 billion queries in a peak month, and we have seen query traffic
as high as 60 million queries in a single day.
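The capacity figures above can be checked with simple arithmetic. The short Python sketch below uses only the numbers quoted in this section; the variable names are ours, and the averages it derives say nothing about how traffic is distributed within a day.

```python
SECONDS_PER_DAY = 24 * 60 * 60

capacity_per_second = 30_000     # stated Whois capacity (queries per second)
peak_day_queries = 60_000_000    # highest single-day traffic cited above

# 30,000 queries per second sustained for a full day.
daily_capacity = capacity_per_second * SECONDS_PER_DAY

# Share of daily capacity consumed on the busiest day cited, and the
# corresponding average rate over that day.
peak_day_utilization = peak_day_queries / daily_capacity
peak_day_average_qps = peak_day_queries / SECONDS_PER_DAY

print(f"Daily capacity:       {daily_capacity:,} queries")
print(f"Peak-day utilization: {peak_day_utilization:.1%}")
print(f"Peak-day average:     {peak_day_average_qps:,.0f} queries per second")
```

The product is 2,592,000,000, which rounds to the 2.6 billion queries per day quoted above. A 60 million query day averages well below the per-second capacity, although short bursts within a day can run far above the daily average.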
To further promote reliable and secure Whois operations, the current Whois
service has rate-limiting characteristics within the software, such as the
ability to throttle a specific requestor if the query rate exceeds a
configurable threshold to prevent abusive behavior, such as data mining. In
addition, QoS technology enables rate limiting of queries before they reach the
actual servers, which provides protection against DoS and DDoS attacks. Our
software also permits restrictions on search capabilities. For example, wild
card searches can be disabled. VeriSign is generally not in favor of restricting
searches unless it is clear that the results of the search are being used in
ways not beneficial to the .net registrants. If the need arises, it is possible
to temporarily restrict and/or block requests coming from specific IP addresses
for a configurable amount of time. Additional features that are configurable in
the Whois software include help files, headers and footers for Whois query
responses, statistics, and methods to memory map the database.
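The per-requestor throttling described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the VeriSign implementation: the class name, the sliding-window approach, and the default threshold, window, and block duration are all hypothetical. The caller supplies the current time so that behavior is easy to verify; a deployment would more likely pass a monotonic clock reading.

```python
from collections import defaultdict, deque


class RequestorThrottle:
    """Allow at most max_queries per window (seconds) for each IP address.

    A requestor that exceeds the threshold is blocked for block_seconds.
    All three values are configurable; the defaults are illustrative only.
    """

    def __init__(self, max_queries=100, window=60.0, block_seconds=300.0):
        self.max_queries = max_queries
        self.window = window
        self.block_seconds = block_seconds
        self._recent = defaultdict(deque)  # ip -> timestamps of recent queries
        self._blocked_until = {}           # ip -> time at which the block ends

    def allow(self, ip: str, now: float) -> bool:
        if now < self._blocked_until.get(ip, 0.0):
            return False  # still inside a temporary block
        recent = self._recent[ip]
        # Discard timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_queries:
            # Threshold exceeded: start a temporary block for this requestor.
            self._blocked_until[ip] = now + self.block_seconds
            recent.clear()
            return False
        recent.append(now)
        return True
```

Keeping state per IP address means one abusive requestor is throttled without affecting others, which matches the intent of restricting specific addresses for a configurable period.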
Since many registrars use Whois to verify name availability during a scheduled
maintenance period, we never perform scheduled maintenance on the SRS and Whois
at the same time.
Registry Model
The information contained in the .net Whois database is based on the thin
registry model. It contains records for .net domains, name servers used to host
domains delegated in .net, and contact information for .net capable registrars.
VeriSign believes that the thin registry model is the best overall fit for .net
for the reasons described in Table 5(b)xii-1.
As described above, the primary discriminator in favor of a thick Whois model is
the impact to end users who lack a single source of data and a consistent Whois
interface. Both of these factors are addressed through the IRIS protocol. A
sample of Whois data from thick registries shows that registry Whois data is
frequently inconsistent with the data displayed by the associated registrar.
This sample data highlights the potential end user confusion and registrar
challenges of keeping even a modest amount of data current across multiple
systems. Not all registrars are willing to disclose the details about their
customer base that are inherently available through a thick registry model. The
growing number of registrars that offer private registrations exacerbates this
problem. Through our IRIS service and coordination with registrars, we will
offer seamlessly integrated access to the .net Whois data held by the .net
registry and .net registrars, providing the advantages of a thick registry
without the liabilities of an unnecessary conversion.
Enhancing the State-of-the-Art
In addition to continuing to deliver Whois services through the current
architecture, VeriSign announced a pilot IRIS service, based on the standard
documented in RFCs 3981, 3982, and 3983, and developed within the IETF with
input from the registrar community. In conjunction with the IRIS
pilot, we will contribute to the further development of technical standards and
policies related to centralized Whois access. We are a leader in the Cross
Registry Information Service Protocol (CRISP) working group of the Internet
Engineering Task Force (IETF), which is actively defining this new protocol to
allow greater flexibility for coordination among Whois systems. The IRIS
protocol applies the lessons learned over the past 20 years to resolve many of
today's Whois lookup problems. In this context, IRIS offers:
* Access controls to address privacy concerns, while allowing flexible policies
in accordance with law and property rights enforcement
* Standards-based support of IDNs and other internationalization features, such
as client localization
* Standardized mechanisms for structured queries and responses
* Search continuations and entity references
* DNS label server location.
We will work with ICANN, registrars, and applicable community parties to
determine an appropriate transition from the current Whois implementation to the
IRIS implementation. Our current plan is to implement an IRIS pilot program
early in 2005. Key to our approach is the timing for the final "cutover" from
the current version of Whois to the IRIS-based Whois capability. Refer to
Section 5(b)xvii, Ability to Support Current Feature Functionality of .net, for
more information about VeriSign's IRIS pilot.
The IRIS-based Whois system we are developing will comply with Appendix W of our
.net agreement with ICANN and demonstrates our commitment to identifying,
developing, and implementing new technologies capable of enhancing DNS
functionality. Another important VeriSign innovation related to our Whois
services is the lightweight domain availability service currently under
development. This new technology will enable more efficient domain name
availability queries to benefit registrars and Internet users.
No other TLD operator has been as active in finding true community consensus for
solutions to Whois problems. In addition to participation in IETF working
groups, ICANN task forces, and numerous privately held design team meetings of
key stakeholders, VeriSign has held public hearings and talks at the following
events:
* First UWho Consultation, August 15, 2001; Washington, DC, USA
* Second UWho Consultation, November 15, 2001; Marina del Rey, CA, USA
* Third UWho Consultation, November 19, 2001; Washington, DC, USA
* DNR WG of RIPE 40, October 1-5, 2001; Prague, Czech Republic
* Database WG of RIPE 40, October 1-5, 2001; Prague, Czech Republic
* General Session of NANOG 23, October 21-23, 2001; Oakland, CA, USA
* DNR WG of RIPE 41, January 14-18, 2002; Amsterdam, The Netherlands
* Database WG of RIPE 41, January 14-18, 2002; Amsterdam, The Netherlands
* NANOG 24 Universal Whois BOF, February 10-12, 2002; Miami, FL, USA
* CENTR General Assembly, February 21-22, 2002; Rambouillet, France
* CENTR Technical Advisory Working Group 11, July 17, 2003; Vienna, Austria
* CRISP/EPP BoF at APRICOT, February 26, 2004; Kuala Lumpur, Malaysia
* Database WG of RIPE 48, May 3-7, 2004; Amsterdam, The Netherlands
* DNS WG of RIPE 49, September 20-24, 2004; Manchester, UK.
Through our commitment to finding true community consensus on the problems of
Whois, we have also worked with the RIRs in pursuit of solutions that affect our
common constituencies. VeriSign has held private consultations and symposiums
with the American Registry for Internet Numbers (ARIN), the Reseaux IP Europeens
Network Coordination Center (RIPE NCC), and the Asia-Pacific Network Information
Center (APNIC).
Unlike other gTLD operators, VeriSign has invested resources in the development
of advanced client and server software to meet the needs of sophisticated users
and to better serve non-English speaking end users. VeriSign is unique among
gTLD operators in that we make this software freely available under common,
well-understood open source licensing terms.
Conclusion
VeriSign has a proven track record of providing a fast and reliable Whois
service to support the registry functions of .net and other TLDs. Combined with
our commitment to solve the problems of Whois through open standards and
protocols, open source software, and consistent dialog with the Internet
community, we can confidently describe our Whois service as the best in the
industry. |
|
(xiii) System security and physical security.
Technical and physical capabilities and procedures to prevent system
hacks, break-ins, data tampering and other disruptions to
operations.
The applicant's response to Part 2, Section 5, Subsection
b, Question xiii will remain confidential to ICANN and the independent
evaluators. ICANN will use reasonable efforts to avoid publication or
release of this information, but in no circumstance will ICANN's liability
for any release of this information exceed the amount of the application
fee.
|
[RESPONSE IS CONFIDENTIAL] | |
(xiv) Peak capacities. Technical capability for handling a
larger-than-projected demand for registration or load. Effects on load on
servers, databases, back-up systems, support systems, escrow systems,
maintenance and personnel.
|
VeriSign has engineered the .net registry as a system of integrated components,
based on peak demands to deliver highly reliable, highly available, and highly
scalable registration and resolution systems for the .net registry. Our systems
and processes for stress, load, and performance testing are scaled to exceed the
projected .net capacity demands, including the addition of IPv6 and DNSSEC.
VeriSign Advantages:
+ Capacity to support over 400 billion queries daily with peaks over 500,000
queries per second
+ Registry systems proven reliable through DDoS attacks, viruses, worms, and spam
+ Scalability to meet extraordinary demand to maintain business continuity
+ Demonstrated experience in peak capacity planning and forecasting
This section describes VeriSign's capabilities for capacity planning to meet
peak demands, including:
* Technical capability for handling larger-than-projected demand for
registration or load. VeriSign proactively designs systems to scale both
horizontally and vertically in anticipation of peak load requirements, and has
demonstrated experience in maintaining excess processing capacity.
* Effects on load on servers, databases, backup systems, support systems, escrow
systems, maintenance, and personnel. VeriSign has gained unique experience as
the longest-serving gTLD operator in the world. Our operational experience helps
us understand and plan for the effects of load on all of our systems.
Since VeriSign began operating domain name registries, registration rates have
grown from 400 per month to exceed 1 million new registrations per month. Total
transaction volumes have increased at an even greater rate, with a peak daily
volume of more than 225 million transactions. However, the average transaction
response time has been reduced to less than one-twentieth of its value 4 years ago.
Over the past 4 years, VeriSign's DNS resolution demands have grown at an
exponential rate from less than 1 billion queries per day to over 13 billion
queries per day, with peaks over 15 billion per day. Although .net comprises
approximately 15 percent of the combined .com and .net registrations, .net is
responsible for approximately 30 percent of the DNS queries. The peak for .net
queries routinely exceeds 60,000 queries per second. During a DDoS attack, this
query rate can easily exceed many multiples of the routine peaks.
Technical Capability for Handling Larger than Projected Demand
VeriSign addresses peak capacities through a three-part approach: a scalability
strategy, scalability standards, and capacity planning.
Scalability Strategy. VeriSign systems are designed to perform consistently and
predictably through anticipation of loads greater than known peaks. Our capacity
management begins by analyzing historical data for .net domain growth, DNS query
growth, and registrar transaction behavior. New registry services, such as IPv6
and DNSSEC, are also considered. Network, server, and storage architectures are
designed, scaled, and tested to provide capacities for unexpected peak loads.
Capacity planning and monitoring provide feedback for production systems to
determine when scaling is required and when projections should be adjusted.
VeriSign has incorporated QoS considerations into the design and testing
requirements for mission critical registry services. For example:
* Hardware is selected for its ability to meet the forecasted demand for 4 years.
* Load-balanced arrays, such as resolution, gateway, or application servers, are
N+1 horizontally scalable systems.
* Core systems, such as OLTP servers and mass storage frames, use
state-of-the-art equipment and provide substantial vertical scalability.
* Component performance is profiled and analyzed to determine the effects of
various workload factors, including transaction type, rate, and connection count.
* Provisioning systems are designed and tested to handle, at a minimum, double
the expected peak workloads, without deterioration in the QoS.
* Critical systems are tested to scale in a QA environment and re-tested in an
operations staging environment before every upgrade, configuration change, and
code deployment to verify consistent performance under stress.
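Two of the design rules above (N+1 load-balanced arrays and headroom for at least double the expected peak) can be combined into a simple sizing calculation. The Python sketch below is illustrative only: the function name, its parameters, and the way the two rules are combined are our assumptions, and the figures in the usage note are hypothetical rather than actual VeriSign server capacities.

```python
import math


def servers_required(expected_peak, per_server_capacity, headroom=2.0, spares=1):
    """Size a load-balanced array.

    N servers must carry `headroom` times the expected peak workload, and
    `spares` additional servers are added so the array remains at N+1.
    """
    if per_server_capacity <= 0:
        raise ValueError("per_server_capacity must be positive")
    n = math.ceil(expected_peak * headroom / per_server_capacity)
    return n + spares
```

For example, with a hypothetical expected peak of 1,000 transactions per second and servers that each sustain 300, `servers_required(1000, 300)` returns 8: seven servers to carry double the peak, plus one spare.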
VeriSign engineers have developed test cases, scripts, and load generation
resources, and have the extensive experience and tools necessary to evaluate the
results. All changes to the registry system undergo the following tests before
being deployed into a production environment:
* Stress Testing. To validate that systems will perform to specification,
VeriSign conducts full load tests in the QA environment. This verifies that
systems meet capacity and performance requirements without degradation over an
extended period of time.
* Load Testing. Realistic loads are generated and applied to the system to
validate that the system will exceed the system capacity design specifications.
The loads are continually increased to determine how far beyond the design
specification systems will operate before they begin to fail.
* Performance Testing. By evaluating a series of related metrics, such as
transaction throughput, response time, and CPU utilization for each component of
the system, we test and validate the expected performance in preproduction
environments.
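As an illustration of how performance test metrics of this kind might be evaluated, the Python sketch below summarizes a set of measured response times as throughput and 95th-percentile latency. The function names and the nearest-rank percentile method are our assumptions; the 5 millisecond default target is borrowed from the Whois processing-time commitment in Section 5(b)xii purely as an example threshold.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with pct% at or below it."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(response_times_ms, duration_seconds, target_ms=5.0):
    """Summarize one test run as throughput, 95th percentile, and pass/fail."""
    p95 = percentile(response_times_ms, 95)
    return {
        "throughput_per_second": len(response_times_ms) / duration_seconds,
        "p95_ms": p95,
        "meets_target": p95 < target_ms,
    }
```

Judging a run by its 95th percentile rather than its mean keeps a handful of very slow responses from being hidden by a large number of fast ones.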
The following are two examples of technologies implemented by VeriSign:
1. The SRS network infrastructure includes QoS equipment that ensures each
registrar receives an equitable slice of system bandwidth. This equipment allows
VeriSign to identify and resolve aberrant behavior to preserve system
performance for all registrars.
2. A new form of competition has evolved, resulting in highly automated, repeated
attempts to register a name. In this scenario, competing registrars run large
arrays of systems consuming all available registry resources. Traffic rates of 3
million registration attempts per hour for .net are not uncommon. VeriSign
designed and deployed a discrete pool system that maintains equivalent access,
while separating the DDoS-style registration workload from the conventional
registration activity.
Process workload monitoring and QoS are designed to anticipate and rapidly
detect problems and are discussed in Section 5(b)xvi. Our QoS metrics, detailed
and analyzed in Section 5(b)xv, exceed our SLA performance metrics.
Resolution System Scalability Standards. The ATLAS resolution system is scaled
for extraordinary demands associated with the exponential growth of Internet
communications. Worms, viruses, spam, and DDoS attacks provide constant
challenges to availability. VeriSign offers a resolution system with the
scalability that absorbs the impact of Internet events, while providing the
Internet community with continued high-performance DNS resolution services for
the .net domain.
VeriSign addresses this requirement by scaling each resolution site to the
following standards:
* Scale sufficient to handle 4 times the projected growth of the zone file over
the next 6 years, including the longer IP addresses for IPv6 and the addition of
128-bit keys for DNSSEC.
* Scale to handle DNS query loads, including normal peaks and projected growth.
* Scale to handle events, such as DDoS attacks and traffic generated by viruses,
worms, and spam. VeriSign designs and deploys excess capacity of at least 10
times the measured peak rate on the most loaded server as a requirement. To
handle simultaneous attacks across the network, VeriSign maintains multiple
geographically dispersed points of presence.
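The excess-capacity rule in the last standard above (at least 10 times the measured peak rate on the most loaded server) reduces to a short calculation. The Python sketch below is illustrative: the function name is ours, and the site names and peak rates are hypothetical values, not VeriSign measurements.

```python
def required_site_capacity(peak_qps_by_server, multiple=10):
    """Return the most loaded server and the capacity the rule calls for."""
    if not peak_qps_by_server:
        raise ValueError("at least one measurement is required")
    busiest = max(peak_qps_by_server, key=peak_qps_by_server.get)
    return busiest, peak_qps_by_server[busiest] * multiple


# Hypothetical measured peak query rates (queries per second) per server.
measured = {"site-a": 4_000, "site-b": 6_500, "site-c": 5_200}
busiest, capacity = required_site_capacity(measured)
print(f"{busiest} sets the requirement: {capacity:,} queries per second")
```

Sizing against the busiest server, rather than the average, means the headroom still holds where query load is most concentrated.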
Registration Systems Scalability Standards. Transaction volumes and the number
of registrars, together with developments such as IPv6 and DNSSEC, present
challenges in deploying scalable systems. VeriSign standards for the .net
registration system include:
* Scale beyond historic peak volumes and projected growth
* Scale to 2 times base capacity
* Database CPU utilization less than 50 percent during peak loads
* Database memory allocations less than 40 percent of the total available
physical memory
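The numeric standards above lend themselves to an automated check against measurements taken during peak load. The Python sketch below is one hypothetical way to express them: the class, field names, and units are assumptions, and "scale to 2 times base capacity" is interpreted here as deployed capacity of at least twice the base capacity.

```python
from dataclasses import dataclass


@dataclass
class PeakSample:
    cpu_utilization: float        # database CPU during peak, as a fraction 0..1
    memory_allocated_gb: float    # database memory allocations
    memory_physical_gb: float     # total available physical memory
    deployed_capacity_tps: float  # transactions per second the system can carry
    base_capacity_tps: float      # base capacity the standard is measured against


def violations(sample: PeakSample) -> list:
    """Return a description of each standard the sample fails to meet."""
    problems = []
    if sample.cpu_utilization >= 0.50:
        problems.append("database CPU utilization is not below 50 percent")
    if sample.memory_allocated_gb >= 0.40 * sample.memory_physical_gb:
        problems.append("memory allocations are not below 40 percent of physical")
    if sample.deployed_capacity_tps < 2 * sample.base_capacity_tps:
        problems.append("deployed capacity is below 2 times base capacity")
    return problems
```

An empty list means the sample meets all three numeric standards; any entries identify which threshold should trigger a scaling review.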
Capacity Planning
VeriSign's dedicated capacity planning engineers work with development engineers
and QA teams to baseline performance profiles for each registry system. Results
for all transaction types are benchmarked, including CPU usage, input/output
rates, and operations rates. The performance points provide measurable
thresholds to determine when increased capacity is required. VeriSign performs
real-time monitoring and historical analysis and reassesses the performance
points with each major architectural change in hardware or software.
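To illustrate how a performance threshold can drive a capacity decision, the Python sketch below projects how many months remain before a measured load reaches a threshold. It assumes steady compound monthly growth, which is a simplification we introduce for the example; the function name and the figures in the usage note are hypothetical.

```python
def months_until_threshold(current_load, threshold, monthly_growth_rate):
    """Months until load reaches the threshold under compound monthly growth.

    Returns 0 if the threshold is already reached, and None if the load is
    not growing and will therefore never reach it.
    """
    if current_load >= threshold:
        return 0
    if monthly_growth_rate <= 0:
        return None
    months = 0
    load = current_load
    while load < threshold:
        load *= 1 + monthly_growth_rate  # apply one month of growth
        months += 1
    return months
```

For example, a load of 100 units growing 10 percent per month reaches a 200-unit threshold in 8 months, so `months_until_threshold(100, 200, 0.10)` returns 8. Comparing that lead time with the time needed to procure, test, and deploy new capacity indicates when scaling should begin.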
Effects of Load on Servers
SRS Server Load. The SRS application and gateway servers are affected by
transaction load and connection count. VeriSign systems are designed to deliver
consistent performance during tremendous transaction spikes. A typical day's
transaction volume is shown in Figure 5(b)xiv-1. Daily peaks are the result of
demand for expired domain names. Optimum system performance is characterized
with an orthogonal test matrix. VeriSign engineers determine the most efficient
processing level for each type of system. They then tune system components and
QoS equipment to the appropriate connection and transaction rate protocol.
VeriSign has deployed excess capacity to handle twice the observed and expected
peak systems load. QoS hardware meters the incoming workload to prevent systems
from being driven to a queuing condition. Figure 5(b)xiv-2 shows a typical day's
response times with minimal effect corresponding to the daily demand surge for
expired domain names.
DNS Server Load. VeriSign places the highest priority on delivering 100 percent
availability, while maintaining the capacity to support this growth and weather
attacks without sacrificing response time. The DNS server load is determined by
the zone file size, frequency and number of updates, and volume of DNS queries.
VeriSign's current system easily handles increases in the size of the .net zone
file. For example, the .com zone is more than 6 times the size of the .net zone
and our ATLAS resolution system stores the entire zone file in physical memory.
The effect of zone file updates on the .net DNS servers is determined by the
number of domains and name servers added, deleted, and modified. Currently,
VeriSign supports an average of more than 130,000 updates per day, while
maintaining 100 percent data accuracy and integrity. These updates are
distributed via highly available VPNs to each resolution site. While our DNS
capacity requirements are driven by the peak demand for .com and .net lookups,
.net peak queries routinely exceed 60,000 queries per second. During a DDoS
attack, our servers are scaled to support many multiples of the routine peaks.
Load on Whois Servers
VeriSign applies the same capacity methodologies to the Whois services that
apply to other VeriSign registry systems. Details of Whois service are described
in Section 5(b)xii.
Effect of Load on Databases
VeriSign's registry storage systems are continually assessed to meet future
demands. The most recent upgrade, deployed in May 2004, was designed to scale to
the projected needs for the next 4 years, with the ability to scale horizontally
without the need for a major outage. This design point includes the possibility
of conversion to a thick registry, which would add contact records to the
registry database. Upcoming technologies, such as IPv6 and DNSSEC, together with
the increasing number of registrars and .net domains, necessitate highly
scalable OLTP and reporting databases.
Database Performance. VeriSign designed the .net database systems to maintain
CPU use rates below 50 percent during the daily peak transaction demand to
maintain stable registry operations. Figure 5(b)xiv-3 shows a sample 24-hour
time range that includes the period when expiring names are released. The
increase in system usage is minimal relative to the increase in demand.
Database Scalability. To alleviate load on the OLTP database, VeriSign uses BCV
technology to make several copies of the database to run reports, DNS, Whois,
and backups. The growth of the OLTP database and transaction logs is reflected
in each mirrored copy of the database that could be used for backups and
generation of reports. Implementation of DNSSEC will significantly increase
database size.
Extracts and Reporting Capacity
BCV instances are used by separate systems for the production of reports, Whois
data, and DNS data. Therefore, substantial changes in workload and batch
activity will not degrade provisioning and resolution services.
Effects of Load on Backup Systems
The VeriSign backup and restore implementation, which relies on a proven
five-tiered backup design, mitigates typical problems associated with subjecting
traditional tape backup systems to increases in data quantity if implemented on
a production network. The VeriSign approach, therefore, mitigates the risks of
increased backup times and increased restore times associated with increased
load. The details of this process are described in Section 5(b)x.
Effect of Load on Support Systems
VeriSign's support infrastructure is mature, efficient, and capable of handling
the next 6 years of projected increases in registration demand. Increases in
registration or resolution transaction load might cause residual effects within
the support systems. Typical effects include the following:
* The NOC supports .net and other domain name registries. Short-term changes in
registration demand do not directly affect these systems. Long-term changes,
such as an increase in the number of systems to be monitored, have an
incremental, but minimal overall effect.
* Security systems, such as firewalls, VPNs, and intrusion detection devices, are
already sized appropriately to handle the demands of .com and .net. This
equipment is also guarded by QoS systems, which can deny overload traffic, as
described earlier in this section.
* Customer services are most affected by the number of registrars and the rate
of certifying new registrars. Customer Service has maintained consistently
superior service, described in Section 5(b)xix, as the number of operational
registrars has grown from 61 to more than 250 over the past 4 years, as shown in
Figure 5(b)xiv-4. During this time, VeriSign aggressively worked to streamline
the ramp-up process, reducing the average time for a registrar to become fully
operational by 30 percent. VeriSign has been extremely successful supporting
this growth.
Facility Support
VeriSign's facilities are prepared to rapidly scale systems in response to
increased loads. Each data center is designed with extra capacity for rack
space, power, and cooling to accommodate sufficient servers and storage systems
exceeding optimistic growth forecasts over the next 6 years.
Effects of Load on Escrow Systems
Escrow load is proportional to the total number of registered domains.
Short-term increases in registration traffic do not significantly affect
VeriSign's escrow services. Larger than projected demand for registrations will
ultimately increase the volume of data backed up for data escrow. The result is
increased generation times for data escrow reports, increased data storage for
the larger reports, and increased transfer times to the third-party escrow
agent. VeriSign has proven the ability to scale escrow systems to support more
than 35 million domains.
Effects of Load on Maintenance and Personnel
Registry system hardware maintenance is typically scheduled during off-peak
hours for load-balanced systems or during scheduled maintenance periods for
mission critical systems. Mission critical systems are designed so that the
failure of a single field replaceable unit results in performance that will meet
production demands. This type of architecture permits scheduling repairs during
off-peak hours or during planned maintenance windows.
VeriSign's registry systems are sized, based on the following system maintenance
considerations:
* N+1 or greater status for load-balanced systems is maintained at all times.
Load-balanced arrays are expanded, according to measured and forecasted demand.
* Mission critical systems are sized to meet production demands, sustaining
nominal and design peak workloads even in degraded mode.
* Mission critical systems are configured in symmetrical high availability (HA)
pairs and/or include fault tolerance. The OLTP database HA configuration
provides reliable detection of failed components.
* VeriSign maintains fully redundant data centers to shorten maintenance
periods. The maintenance can be performed at an alternate site; services are
then redirected to that site while maintenance is completed at the primary site.
* VeriSign maintains three standby resolution sites (in addition to the 14 sites
around the world) that provide the ability to maintain full resolution service
capacity without impacting availability even during peak periods.
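The N+1 sizing rule above can be illustrated with a minimal sketch. The function names and the example figures (five deployed units, four needed at peak) are assumptions chosen for illustration and do not describe VeriSign's capacity planning tools.

```python
def spare_units(deployed: int, required_at_peak: int) -> int:
    """Units that can fail or be under maintenance while peak load is still met."""
    return deployed - required_at_peak


def meets_n_plus(deployed: int, required_at_peak: int, spares: int = 1) -> bool:
    """True when the array holds at least `spares` units beyond peak need (N+1 by default)."""
    return spare_units(deployed, required_at_peak) >= spares


def units_to_add(deployed: int, required_at_peak: int, spares: int = 1) -> int:
    """How many units must be added to restore N+spares after demand grows."""
    return max(0, required_at_peak + spares - deployed)


# Illustrative figures only: five units deployed, four needed at peak.
print(meets_n_plus(5, 4))   # array is N+1
print(units_to_add(5, 5))   # demand grew by one unit, so one more is needed
```

Re-running the check against measured and forecasted peak demand shows when a load-balanced array needs to be expanded.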
Personnel
VeriSign's .net registry does not rely on routine manual components that are
affected by changes in registration demand. Indirect effects of increases in
registry system load include:
* Software Engineering, QA, and stress testing must support and meet higher
performance benchmarks. VeriSign has a mature development infrastructure with 5
years of experience supporting .net and other high-volume DNS architectures.
* Network Administration must manage increased bandwidth and server use. Scaling
of network components, provisioning of multiple ISPs, and load balancing lead
to efficiencies in manpower.
* Systems Administration will see incremental increases in workload due to
larger load balanced arrays. VeriSign has achieved exceptional efficiency by
using well-developed standards and automation tools.
* Database Administration will see incremental increases in workload due to
transaction volumes. VeriSign's database and storage infrastructure provides
capacity to process increased transactions without operator involvement.
* The 24x7x365 on-call staff can receive more system alerts and escalations due
to increased numbers of systems or the workload on those systems. Potential
system overload problems are unlikely due to the QoS design characteristics of
our registry services.
* Customer Service and Customer Affairs workloads generally increase with the
number of registrars and not with the load on registry systems.
* The 24x7x365 NOC uses VeriSign's highly automated alerting system to achieve
efficiency of scale.
Conclusion
VeriSign provides a proven history of reliable operations for the .net registry.
Our demonstrated performance includes:
* A proven, reliable SRS:
- Eliminates the risk of transition
- Preserves continuity of business for registrars with processes based on known
system interactions
- Delivers assurance of QoS through demonstrated ability to handle peak loads.
* An SRS scaled for extraordinary demand driven by the secondary market for
deleted domain names:
- Allows registrars to compete aggressively for valuable domain names, while
delivering equivalent access.
- Maintains continuity of service of both registration and resolution systems.
* A staff experienced in peak capacity planning with demonstrated ability to
deliver systems that exceed peak demands:
- Provides security and reliability through proven ability to deflect continual
abuse, inadvertent misuse of services, and DDoS attacks. |
|
(xv) System reliability. Define, analyze and quantify
quality of service.
|
VeriSign has stable and robust DNS and SRS systems that have provided
consistent availability. In addition, VeriSign customer service has provided the
entire registrar community with industry-leading service.
VeriSign Advantages:
+ 100 percent DNS resolution availability for over 7 years and industry-leading SRS reliability
+ Resilience to myriad forms of DDoS attacks, viruses, worms, and spam
+ Numerous proactive methods to ensure the stability and security of the Internet.
This section includes the following:
* Definition of System Reliability. The probability that a system, including all
hardware, firmware, and software will satisfactorily perform the task for which
it was designed or intended, for a specified time and in a specified environment.
* Analyze Quality of Service. VeriSign measures the availability, data accuracy,
and response times of the registration and resolution systems.
* Quantify Quality of Service. The most significant portion of the registry is
resolution; this can be summarized as being available 100 percent of the time.
The registration component has been meeting service levels above 99.9 percent
uptime.
VeriSign has designed reliability into the registry solution and sites,
resulting in a scalable and supportable infrastructure to achieve availability.
The concepts of simplicity and parallelism have been pursued in both code and
systems. Going counter to feature-rich development patterns, the number of lines
of code between the end user and the data delivered is intentionally minimized.
The result is a network of restorable components that provide both rapid,
accurate updates and name resolution lookups.
Definition of System Reliability
System reliability is the probability that a system, including all hardware,
firmware, and software, will satisfactorily perform the task for which it was
designed or intended, for a specified time and in a specified environment.
* Availability is defined as the percentage of time for which a system is
available for use. That is, A = Uptime/(Uptime + Downtime). As downtime
approaches zero, availability goes to 100 percent. Thus, where component
reliability fails, redundancy, frequent failover testing, and a supply of spare
parts reduce downtime, and therefore, increase availability.
* Operational Availability: We measure every incident where a system or any
significant component thereof becomes unavailable for any reason, and how long
it stays unavailable. The availability for the existing .net provisioning system
takes into account planned and unplanned downtime. Overall availability is the
product of planned availability and unplanned availability.
* Data Accuracy: Data accuracy is defined as the percentage of
registrar-initiated transactions that are eventually distributed to the name
servers and Whois servers with no data loss or data corruption.
* Performance: Provisioning system performance is defined as the time from when
the registry receives a request to add, modify, delete, or query the .net
domain, measured starting at the registry gateway, to when the system returns a
response.
For actual DNS traffic, internal response times are measured from receipt of DNS
query at a name server until said name server provides the response.
Additionally, ICANN has established the Cross Network Name Server Performance
metrics from four geographically dispersed sites.
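The availability definitions above can be expressed as a short sketch. The formula A = Uptime/(Uptime + Downtime) and the product rule for overall availability come from the text. The 720-hour window and the downtime figures are hypothetical values chosen for illustration, not measurements.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """A = Uptime / (Uptime + Downtime)."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("measurement window must be positive")
    return uptime_hours / total


def overall_availability(planned: float, unplanned: float) -> float:
    """Overall availability as the product of planned and unplanned availability."""
    return planned * unplanned


# Hypothetical 30-day month (720 hours); downtime figures are illustrative only.
WINDOW_HOURS = 720.0
planned = availability(WINDOW_HOURS - 2.0, 2.0)    # 2 hours of scheduled maintenance
unplanned = availability(WINDOW_HOURS - 0.5, 0.5)  # 30 minutes of unscheduled outage
print(f"planned={planned:.5f} unplanned={unplanned:.5f} "
      f"overall={overall_availability(planned, unplanned):.5f}")
```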
Analysis of Quality of Service
Our registry environment has a provisioning system with customer-facing
gateways, web servers, and ancillary servers. These hosts connect back to
application servers, which connect to a redundant database of record.
Performance is increased by adding more servers and bandwidth, and by increasing
the computational throughput of the database server. Refer to VeriSign's website
for a detailed analysis of quality of service at verisign.com/nds.
The difficulty lies in providing a consistently performing system. Quality of
Service can then be expressed in terms of availability:
* 95 percent - A nonredundant three-tier provisioning system running mature code.
* 99 percent - Adding local redundancy in the way of parallel network gear,
servers, and database clustering.
* 99.9 percent - Adding a full, pre-equipped, four-hour failover disaster
recovery site with a zero data loss replication mechanism.
Downtime includes planned maintenance, link failures, code-related breakage,
data center issues, etc. A scalable approach to build further availability into
the SRS is to add redundant active provisioning sites, but in a manner that
doesn't introduce complex interdependencies. We can achieve 99.99 percent system
availability by making the recovery site active, but directing the application
servers at this site toward the database at the primary site. Synchronous
database commits at both sites ensure zero data loss in the event of failure.
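To put the availability tiers above in concrete terms, the following sketch converts each percentage into the downtime it allows per year. It assumes a 365-day year; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Downtime per year permitted by a given availability percentage."""
    return (1.0 - availability_percent / 100.0) * HOURS_PER_YEAR


for tier in (95.0, 99.0, 99.9, 99.99):
    print(f"{tier:>6}% -> {annual_downtime_hours(tier):8.2f} hours/year")
```

Under this assumption the tiers correspond to roughly 438 hours, 87.6 hours, 8.76 hours, and just under 53 minutes of downtime per year, respectively.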
We can similarly examine alternative approaches to resolution sites:
Single site - If .net resolution traffic peaks at 70 million queries per second,
then all DNS requests could be serviced via a single Internet feed using a fast
BIND server. However, there would be a minute or two of downtime every time the
zone file was refreshed.
Multisite - At first glance, setting up six or more 95-percent available
parallel sites should result in zero downtime for DNS resolutions. But DNS
resolvers do not always query all advertised name servers, and network failover
mechanisms such as BGP anycast do not always respond appropriately to failure
and can be slow to converge.
Local redundancy - This reduces the likelihood of any one site failing. Single
site availability in normal traffic patterns should climb to 99 percent by
adding local redundancy. Yet a single bad update still corrupts all sites.
Validation code - Code is introduced to validate updates before they are
distributed, and possibly additional code is used to compare query responses at
the resolution sites to the system of record. A DDoS attack or other asymmetric
event can increase traffic eight-fold on the resolution sites, overwhelming the
system.
VeriSign's DNS solution recognizes the shortfalls of each of these approaches by
maintaining 14 operational sites with 3 hot standby sites. Each site is fully
locally redundant to maintain service during updates. Zone updates are validated
during each step of the generation, distribution and publication process.
A detailed description of our ATLAS DNS service is provided in Sections 5(b)ii,
vii, and viii.
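The "first glance" multisite estimate discussed above can be made concrete with a small sketch. It assumes that site failures are fully independent and that every client can reach any surviving site. The text explains why neither assumption holds in practice, so the result is an idealized upper bound rather than a prediction.

```python
def parallel_availability(site_availability: float, sites: int) -> float:
    """Probability that at least one of `sites` independent sites is up."""
    if sites < 1:
        raise ValueError("at least one site is required")
    return 1.0 - (1.0 - site_availability) ** sites


# Six sites at 95 percent each, as in the multisite example.
ideal = parallel_availability(0.95, 6)
print(f"idealized availability: {ideal:.10f}")
print(f"idealized unavailability: {1.0 - ideal:.3e}")
```

The gap between this idealized figure and observed behavior is what local redundancy and update validation are intended to close.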
Quantify Quality of Service
VeriSign's core SRS currently is run from one locally-redundant site, with an
equivalent alternate primary site ready to receive transactions. Each
provisioning site is comprised of about 100 servers; overall availability is
above 99.5 percent per site. The SRS has also been operated in an active-active
configuration, which is necessary for further increases in overall system
availability.
Each of VeriSign's resolution sites has from 12 to 24 servers; most of these are
front-end protocol engines, which receive the Internet DNS queries and transmit
responses. In addition, a BIND backup service on separate servers also exists at
these sites, in the event of loss of ATLAS. Diverse systems are used to resist
day-zero exploits. Hardware is replaced on a 3 to 5 year cycle.
The provisioning system and ATLAS system components are described below in terms
of accuracy, performance, and availability. Detailed reliability predictions of
each component layer are made based on past performance, and overall system
operational availability is then determined [Tables 5(b)xv-1 through 3].
Analysis of Provisioning System
VeriSign is the only registry to undergo an annual audit by ICANN to ensure all
registrars are guaranteed equivalent access. Equivalent access for the .net
registry is managed via a QoS device that acts as a front-end to the SRS. This
QoS device manages two critical aspects of registrar access to the .net database:
SSL connections - Each registrar is permitted the same maximum number of SSL
connections to the database.
Equivalent access - The QoS device is configured so that each registrar has an
equivalent amount of network bandwidth.
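The equal connection ceiling described above can be sketched as a simple counter per registrar. This is an illustrative model only: the class name, the registrar identifiers, and the cap of two connections are assumptions, and the actual QoS device is a network appliance whose configuration is not described here.

```python
from collections import defaultdict


class ConnectionLimiter:
    """Gives every registrar the same maximum number of concurrent connections."""

    def __init__(self, max_per_registrar: int) -> None:
        self._max = max_per_registrar
        self._open = defaultdict(int)  # registrar id -> open connection count

    def try_open(self, registrar_id: str) -> bool:
        """Admit a new connection unless the registrar is already at the cap."""
        if self._open[registrar_id] >= self._max:
            return False
        self._open[registrar_id] += 1
        return True

    def close(self, registrar_id: str) -> None:
        """Release one connection held by the registrar, if any."""
        if self._open[registrar_id] > 0:
            self._open[registrar_id] -= 1


# Hypothetical cap of two connections per registrar.
limiter = ConnectionLimiter(max_per_registrar=2)
print([limiter.try_open("registrar-a") for _ in range(3)])  # third attempt is refused
print(limiter.try_open("registrar-b"))  # another registrar is unaffected
```

Because the cap is identical for every registrar, heavy use by one registrar cannot consume connections reserved for another.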
When the SRS experienced massive load spikes, resulting from intense registrar
demand for expired and subsequently deleted domains, VeriSign quickly deployed a
unique system where registrars access the registry through three connection
pools, each with equivalent connections and associated bandwidth.
Reducing capacity for registrar connections, transaction rates, or response
times to levels comparable to other registries would require changes to
registrar systems and/or reduce their business transactions. For example, if
domain name check response times increase from 10 to 400 milliseconds,
registrars can lose customers whose first choice of domain is not available and
might be unable to offer suitable alternatives as quickly.
Table 5(b)xv-4 compares response times and the standard deviation of those times
for different registries. The exceptionally rapid response times for .net
consistently exceed those of other registries. Just as important is the
inconsistency of response times of other registries, as indicated by the large
standard deviation.
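The two statistics used in this comparison, mean response time and standard deviation, can be computed as in the sketch below. The sample lists are synthetic placeholders chosen only to contrast a steady series with an erratic one. They are not measurements of any registry.

```python
from statistics import mean, pstdev


def summarize(samples_ms):
    """Return (mean, population standard deviation) of response times in milliseconds."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    return mean(samples_ms), pstdev(samples_ms)


# Synthetic placeholder data, not real measurements.
steady = [10, 11, 9, 10, 10]
erratic = [10, 400, 25, 150, 15]

for label, samples in (("steady", steady), ("erratic", erratic)):
    avg, spread = summarize(samples)
    print(f"{label}: mean={avg:.1f} ms, stdev={spread:.1f} ms")
```

A low standard deviation indicates that registrars can expect similar response times from one request to the next, which is the consistency the comparison highlights.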
Over the past 2 years, the average time of an SRS outage has been less than 50
minutes, which includes time to identify, diagnose, and resolve the problem. The
.net registry has never exceeded the time announced for scheduled maintenance.
This provides registrars with consistent, predictable performance and rapid
restoration of service when infrequent incidents occur.
Table 5(b)xv-5 demonstrates the availability comparisons among the generic
top-level domains (gTLDs). The information summarized in Table 5(b)xv-5 is
reported monthly by each gTLD and posted at
http://www.icann.org/tlds/monthly-reports/.
The VeriSign SRS is designed with a focus on the integrity of each database
transaction. The use of well-known Oracle and EMC technologies promotes
transactional integrity. These technologies also serve as the foundation for
comprehensive and redundant logging (Refer to Section 5(b)x).
Provisioning System Availability Review
The primary provisioning site is hosted at a VeriSign data center. An
equivalent, alternate primary site is also active at a separate VeriSign data
center facility. VeriSign advertises four autonomous systems at each SRS site
through independent providers. Edge routers perform BGP load sharing. They are
used to transmit Registry Registrar Protocol (RRP) requests, Whois requests, and
various other services, such as https, FTP, and email. VPNs also are in place
from each SRS site to the individual DNS resolution sites; these are used to
transmit updates and receive log information.
Figure 5(b)xv-1 depicts RRP traffic flows and local redundancy in any of the
provisioning sites. Similar paths are followed for the Whois and https traffic
into the provisioning site. Local redundancy is maintained for each piece of
equipment.
On average, the individual pieces of equipment (routers, switches, load
balancers, servers) have shown availability of about 99.99 percent. Spare
network and host equipment is available at the SRS data centers.
Switches: Load is segmented across trunked switches: one-half of the protocol
gateways are attached to one switch, while the remaining gateways are connected
to the other switch. The firewalls are similarly connected. This was a
deliberate decision to favor simplicity over a cross-connected cabling approach.
Load Balancers: Three pairs of load balancers are deployed in the provisioning
system. In each case, two load balancers are set up in an active-standby
configuration. One set of load balancers is dedicated to the RRP gateway
traffic; another is dedicated for Whois traffic; and a third is used for web,
mail, and file transfers. In each case, the active load balancer distributes
load via a round-robin mechanism to the underlying servers. The choice of
round-robin load distribution was made in favor of simplicity.
Shared services servers: Web, mail, and FTP servers are set up in an N+1
configuration.
Gateway Servers. As depicted in Figure 5(b)xv-1, two branches of gateways are
deployed at each SRS site. These gateways are divided into three groupings,
corresponding to the guaranteed, overflow, and auto-batch pools set up for
registrar transactions. If the switch or load balancer in front of the gateways
fails, that entire branch of gateways becomes unavailable. Failure determination
and recovery were judged to be simpler in this approach than channeling
traffic to an alternate switch. If one gateway branch cannot carry the entire
RRP load for the entire site, performance degradation would occur, and failover
to the alternate site becomes necessary.
Application Servers. Application servers are scaled according to the number of
gateway servers, in a 1:4 ratio. Similar to the arrangement of the gateway
servers, application servers are connected to different switches, so that if one
switch fails, the entire infrastructure is not lost. Application servers connect
to the single OLTP database.
Whois Servers. Five hosts service Whois requests from memory-mapped databases
that are rebuilt twice daily from a mirrored copy of the OLTP database.
Currently, four of the five systems are required to handle peak load.
Disaster Recovery: An identical set of production hardware exists at the
alternate primary site.
OLTP Database. As detailed in Section 5(b)v, the OLTP database is the most
critical element of the .net registry provisioning function. Thus, the OLTP
database is clustered at the operating system level in an active-passive
configuration. Data files are stored on an external EMC frame. The array fills
the role of ensuring that database commits are performed at both the primary and
alternate primary sites synchronously over metropolitan fiber links to ensure
zero loss in the event of outage. Archive/redo logs are copied to the alternate
site.
Report database. The critical data archive acts as the SRS data warehouse. It
uses Oracle transportable tablespaces to retrieve records from an array-based
copy of the OLTP database. This is necessary because aged transaction data is
periodically purged from the OLTP database. Figure 5(b)xv-2 depicts array-based
data replication between the primary and alternate SRS sites.
Analysis of Resolution Systems
For the .net name server constellation, QoS is measured primarily in terms of
the following elements:
* Performance, measured in queries per second and packet loss, for valid domains
and nonexistent domains.
* Availability of the service based on system uptime across the constellation
* Accuracy of zone file data.
VeriSign has supported the growth in average daily volumes of DNS queries, as
discussed in Section 5(b)xiv. Figure 5(b)xv-3 shows the DNS activity for a
typical day and reveals obvious peaks. Thus, resolution sites are provisioned to
support volume spikes, rather than meeting averages. The constellation sites
must still maintain the surplus capacity necessary to withstand malicious
attacks and frequent inadvertent DNS errors that can massively increase
resolution site load. VeriSign, therefore, maintains resolution capacity that is
scaled to 20 times the peak volume on the most loaded server. VeriSign also
combines capacity with overall performance by delivering response times
averaging less than 5 milliseconds with less than 0.5 percent packet loss
measured internally. From VeriSign's external monitoring of cross network name
server performance from six global locations, the response times (including
Internet latency) are well under 100 milliseconds.
Resolution Site Availability Review
Eleven of the DNS constellation sites are collocated with six partners, which
provide space, power, and bandwidth. The remaining sites are hosted at VeriSign
data centers. VeriSign advertises all the IP addresses of all constellation
sites in a single Autonomous System (AS). Peering at the individual facilities
is delegated to the bandwidth provider. Figure 5(b)xv-4 shows DNS traffic flows
and local redundancy in any of the resolution sites.
The ATLAS protocol engine (PE) receives the DNS requests, which it packages and
forwards to the .net memory-mapped database on the lookup engine (LUE) for
actual name/address resolution. The results are returned to the PE host, where
they are unbundled and transmitted to the host that originated the query.
Switches: Server switches are dedicated for DNS traffic. Load is segmented: half
the PEs are attached to one switch, while the remaining PEs are connected to the
other server switch. The LUEs are similarly connected. This was a deliberate
decision to favor simplicity over a cross-connected cabling approach.
Load Balancers: The two load balancers are set up in an active-standby
configuration, where the active load balancer distributes load via a round-robin
mechanism to the underlying PEs. Health checks are performed from the load
balancer to the underlying PEs.
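Round-robin distribution with health checks, as described for the load balancers above, can be sketched as follows. The class, the PE names, and the health-check callback are hypothetical. A real load balancer would probe each PE over the network instead of consulting an in-memory set.

```python
from itertools import cycle
from typing import Callable, Iterable


class RoundRobinBalancer:
    """Hands out servers in rotation, skipping any that fail the health check."""

    def __init__(self, servers: Iterable[str], is_healthy: Callable[[str], bool]) -> None:
        self._servers = list(servers)
        if not self._servers:
            raise ValueError("at least one server is required")
        self._ring = cycle(self._servers)
        self._is_healthy = is_healthy

    def next_server(self) -> str:
        # Try each server at most once per call so a fully failed pool cannot loop forever.
        for _ in range(len(self._servers)):
            candidate = next(self._ring)
            if self._is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy servers available")


# Hypothetical pool of three protocol engines with one marked as failed.
failed = {"pe2"}
balancer = RoundRobinBalancer(["pe1", "pe2", "pe3"], lambda name: name not in failed)
print([balancer.next_server() for _ in range(4)])  # prints ['pe1', 'pe3', 'pe1', 'pe3']
```

Round-robin keeps no per-server load state, which matches the stated preference for simplicity over more elaborate distribution schemes.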
Two sets of PEs are deployed at each site, each set receiving content through
its own switch and feeding bundled queries to a single LUE.
At least two LUEs are configured at each site. These LUEs maintain a database of
name servers and glue records. Sendfile updates are received and applied to that
in-memory database around the clock. These boxes currently also provide BIND as
a backup authentication mechanism to ATLAS.
Support Systems. Customer service for .net registrars is available via telephone
and email 24x7x365 with local language support, as required. The first line of
support is highly trained to quickly resolve a range of technical, financial,
and procedural questions and to quickly escalate problems that cannot be
immediately resolved. Section 5(b)xix details VeriSign's customer and technical
support capabilities.
Conclusion
VeriSign is committed to continue providing a stable and robust system that has
had consistent availability since the inception of the SRS. This is confirmed by
VeriSign's past performance delivering SRS availability over each of the past 7
years. At the same time, the VeriSign DNS resolution system has achieved an
availability rate of 100 percent for the .net zone. No other registry operator
matches this reliability. |
|
(xvi) System outage prevention. Procedures for problem
detection, redundancy of all systems, backup power supply, facility
security and technical security. Outline the availability of backup
software, operating system and hardware. Outline system monitoring,
technical maintenance staff and server locations.
|
VeriSign's ability to prevent unplanned system outages is a result of our robust
planning processes and disciplined execution of proven practices.
VeriSign Advantages:
+ A highly available architecture with multiple layers of redundancy
+ Extensive proactive monitoring that minimizes the likelihood, frequency, and
duration of unplanned outages
+ Systems that withstand distributed denial of service (DDoS) attacks, viruses,
worms, and other nefarious activity.
This section describes the following areas:
* Procedures for Problem Detection. Extensive monitoring systems managed from
our Network Operations Center (NOC) identify issues before they become problems.
* Redundancy of All Systems. VeriSign has full redundancy built into each aspect
of our service delivery.
* Backup Power Supply. Each data center has redundant power systems for service
continuity in the event of electrical failure.
* Facility Security. Security for all VeriSign facilities is monitored from our
Security Operation Center.
* Technical Security. VeriSign's security practices protect against physical and
logical intrusions.
* Availability of Backup Software, Operating System, and Hardware. Each
component of VeriSign's backup solution is architected in the same
highly redundant manner as our provisioning and resolution systems.
* System Monitoring. Our NOC continuously monitors the .net gTLD through a
central event management console.
* Technical Maintenance Staff. VeriSign provides onsite and on-call 24x7x365
technical support.
* Server Locations. VeriSign operates 18 technical facilities around the world
that are strategically selected, based on Internet demand.
Procedures for Problem Detection
The NOC is the central point for rapid detection and resolution of any registry
problem [Figure 5(b)xvi-1]. A distributed monitoring system, based on the
standard Simple Network Management Protocol (SNMP), polls system devices at
1-minute intervals. Upon detection of an error condition or threshold violation,
a prioritized service alert is sent to a single event management queue. Each
event is acknowledged by the NOC staff, and the incident is logged and tracked.
Escalation procedures are used to troubleshoot and resolve problems. Issues that
the NOC cannot resolve are escalated to a skilled operations team for
second-level support.
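The flow described above, in which a threshold violation produces a prioritized alert on a single event queue, can be sketched with a priority heap. The class names, device names, metrics, and thresholds are hypothetical, and the sketch does not perform SNMP polling. It assumes a poller supplies each measured value once per interval.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Alert:
    priority: int   # lower number means more urgent
    sequence: int   # preserves arrival order among equal priorities
    device: str = field(compare=False)
    metric: str = field(compare=False)
    value: float = field(compare=False)


class EventQueue:
    """Single queue of alerts, handed to operators most urgent first."""

    def __init__(self) -> None:
        self._heap = []
        self._counter = itertools.count()

    def check(self, device: str, metric: str, value: float,
              threshold: float, priority: int) -> bool:
        """Queue an alert if the polled value exceeds its threshold."""
        if value <= threshold:
            return False
        alert = Alert(priority, next(self._counter), device, metric, value)
        heapq.heappush(self._heap, alert)
        return True

    def acknowledge(self):
        """Pop the most urgent outstanding alert, or None if the queue is empty."""
        return heapq.heappop(self._heap) if self._heap else None


# Hypothetical poll results for one interval.
queue = EventQueue()
queue.check("router-1", "cpu_percent", 97.0, threshold=90.0, priority=2)
queue.check("db-1", "disk_percent", 99.0, threshold=95.0, priority=1)
print(queue.acknowledge().device)  # the priority 1 alert is handled first
```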
Problem detection is accomplished using our worldwide monitoring infrastructure.
Servers are individually and collectively monitored down to the Internet
Protocol (IP), port, transaction, and packet level. Monitoring systems run on a
distributed architecture, and details are fed back to a centralized data store
that keeps a history of every transaction. Monitoring output determines if
mission-critical services are operating properly and provides an indication of
service availability specific to the server. Each router, switch, load balancer,
quality of service (QoS) device, and firewall is measured at the packet level
for load level, failover, and SYN floods. Should the need arise, packet sniffers
are used at critical network points to analyze specific events that can cause
degradation of service.
So critical are VeriSign's DNS sites to the global stability of the Internet,
and so extensive is our monitoring, that the National Communications Center
(NCC), the FBI's National Infrastructure Protection Center (NIPC), and others
have requested and received a direct link to the screens used by the NOC to
monitor the status and performance of these critical resources.
Redundancy of All Systems
VeriSign recognizes the significant role redundancy plays in preventing outages.
This section describes the redundancy built into various components of the .net
system.
Registration System
The .net registration system is protected by a highly redundant server and
facility infrastructure. Registrars benefit from the use of multiple Internet
Service Provider (ISP) connections that use redundant routers and switches
configured for automated failover. Registrar access is protected by QoS devices
that prevent abusive behavior from affecting all registrars. Each transaction
terminates at a load-balanced gateway server. The load balancers provide the
ability to remove a server for maintenance or after a failure without system
impact. The gateway servers communicate through redundant firewalls to clustered
application servers.
A failure in any application server will be recognized by the gateway, which
will take the affected server out of rotation. The .net registration data is
served by a pair of Oracle database machines that are configured using IBM's
High Availability Cluster Multi-Processing (HACMP) software. Failures
affecting a database server or device will trigger failover protection to
prevent or minimize system outage. These servers are connected to an EMC
Symmetrix storage system, equipped with mirrored storage for data protection and
internal component redundancy to minimize the likelihood of a system failure.
The EMC storage system replicates the .net database in real-time to an alternate
primary data center for protection against site failure [Figure 5(b)xvi-2]. The
system is designed to support failover and service resumption within 30 minutes
from the time the severity of the problem is recognized.
Resolution System
System outage prevention for the resolution system is enhanced by the use of 14
global sites. Each site is designed and built with complete independence and
redundancy. Each location has multiple ISP gigabit Ethernet feeds that connect
to a redundant network and redundant load balancers that distribute load across
the DNS servers and route around servers should a component fail.
A critical element of preventing DNS outages is over-provisioning. A DDoS attack
can be an extreme challenge. Under most circumstances, the system must absorb
the attack through massive capacity until the event can be analyzed and a
countermeasure can be devised. The ability to absorb such an attack lessens the
potential impact. Our global DNS solution can process more than 400 billion DNS
queries a day, with fully redundant, geographically dispersed sites.
Network Redundancy
All VeriSign facilities have diverse Internet connectivity with extensive public
and private peering and a fully redundant routing and switching infrastructure.
Both the primary and alternate primary data centers offer multiple, high-speed
connections to the Internet provisioned through diverse network providers that
enter the building via multiple physical entry points. Inside the facilities,
the fiber travels by disparate routes and terminates at routers in physically
separate cabinets. The resolution sites are each similarly architected and have
a minimum of two diverse gigabit Ethernet connections.
Redundant Facilities
The ability to shift operations between servers or facilities is a critical
element of system outage prevention. We maintain redundant equipment at both our
primary and alternate primary data centers [Figure 5(b)xvi-2]. Not only is each
facility able to assume full operations, but each facility can also assume
partial operations because the application servers at one site can service the
registry database at the other. This partial operations model provides a
recovery option that is faster than full site failover and enhances our ability
to deliver reliable and stable service to registrars and Internet users. Table
5(b)xvi-1 shows general types of failures that could be expected and the
recovery procedures.
Backup Power Supply
Each VeriSign data center is protected with dual power feeds from the local
power provider. Should local power fail, the data center is further protected by
a series of power-protection devices. Each rack of servers is connected to two
separate power distribution units (PDUs) to monitor and distribute electrical
load. Databases, storage devices, application servers, and network devices are
connected to two separate PDUs that provide protection in case of a single PDU
failure. The primary protection devices are large uninterruptible power supplies
(UPSs) that can sustain the load of the entire data center for up to 30 minutes.
Should a power event last more than 10 seconds, redundant power generators are
automatically started, with each generator capable of handling the load for the
entire facility.
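The power protection sequence above can be summarized in a small decision function. The 30-minute UPS runtime and the 10-second generator trigger come from the text. The function itself is a simplified sketch that assumes a generator carries the load as soon as it is started and ignores real-world spin-up and transfer time.

```python
UPS_RUNTIME_S = 30 * 60        # UPS can sustain the load for up to 30 minutes
GENERATOR_START_DELAY_S = 10   # generators start once an event exceeds 10 seconds


def power_source(seconds_since_utility_loss: float, generator_ok: bool = True) -> str:
    """Return which source is assumed to carry the data center load."""
    if seconds_since_utility_loss < 0:
        return "utility"  # utility power has not failed
    if seconds_since_utility_loss > GENERATOR_START_DELAY_S and generator_ok:
        return "generator"
    if seconds_since_utility_loss <= UPS_RUNTIME_S:
        return "ups"
    return "none"  # UPS exhausted and no generator available


for t in (5, 60, 1801):
    print(t, power_source(t), power_source(t, generator_ok=False))
```

The sketch shows why the UPS runtime matters mainly as a margin: it covers the interval before the generators start and the case where a generator fails to start.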
Facility Security
Physical security includes 24x7x365 onsite security guards, card readers, and
biometric controls. VeriSign strictly enforces the use of biometric controls to
access our data centers. All biometric access, as well as card access, is
monitored and logged. We also insist on this discipline for all facilities
hosting our global DNS constellation. In addition, we insist that:
* Buildings are isolated from easements, rights of way, and adjoining tenants.
* A security guard is positioned at the entrance to the building or grounds to
prevent unauthorized access.
* No exterior signs that indicate the type of business are visible from any
public way.
* Backup generators are located within a secured space.
* Data center access is controlled using a mantrap, card readers, and biometric
controls.
* Cameras are located at all entrance points and in sufficient quantity to
monitor all critical infrastructures.
Technical Security
VeriSign's security policies have been developed within the following general
outline:
* Do everything humanly possible. This includes staying abreast of the latest
exploits and technology, as well as implementing all known state-of-the-practice
safeguards.
* Develop contingency plans for those events that are outside your control.
* Monitor frequently.
* Conduct security tests on all devices before placing them into the production
environment.
Dedicated IT Security Staff
Our dedicated IT security staff performs a number of functions in support of our
security posture, including: development and enforcement of application and
network security standards; implementation and management of network security
devices (e.g., firewalls and Access Control Lists (ACLs)); working with
government and industry entities responsible for critical infrastructure, such
as the NCC and the NIPC; and participation in government and industry
cooperative forums.
Strategic Vendors and Suppliers
One of the most important elements of any critical operation is the selection
of, and relationship with, vendors and suppliers. As required, vendor staff is
onsite during business hours, alongside Information Technology (IT) staff during
major deployments, and, if necessary, on-call 24x7x365. Additionally, because of
our long-standing relationships and the critical nature of the services we host,
these vendors frequently advise of potential bugs or exploits before public
announcements are made. As a result, we are generally able to devise and deploy
a security-related solution before widespread attempts to use that exploit
actually occur. We strictly follow a policy that requires that each deployed
computing device must pass an intense security check before connecting to the
network. By following these practices, we have often detected and informed
vendors of vulnerabilities that their testing did not discover.
Hardened Network
The .net registration service is connected to the Internet via two border
routers that use ACLs to control access from the Internet and block all but
authorized users. The registry gateway layer is configured with internal and
external interfaces, each assigned to a different subnet. External interfaces
receive queries and registration requests from the Internet, whereas the
internal interfaces communicate with the application and database servers.
Acting as a proxy, the gateway layer will accept and pass information through
the firewall to the application server, thereby eliminating direct access to the
backend servers. This approach provides superior security from hackers or other
Internet-based threats and is consistent with industry best practices for this
type of service.
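The access-control idea described above can be sketched as a simple allow-list check. This is an illustrative sketch only: the source does not publish its ACL entries, so the networks (taken from documentation address ranges) and the port numbers below are assumptions, and the function name permit is hypothetical.

```python
import ipaddress

# Hypothetical allow-list; real ACL entries are not given in the source.
ALLOWED_SOURCES = [
    ipaddress.ip_network("192.0.2.0/24"),      # documentation range, stands in for a registrar
    ipaddress.ip_network("198.51.100.16/28"),  # documentation range, stands in for a registrar
]
ALLOWED_PORTS = {648, 700}  # illustrative registration-protocol ports (assumed)


def permit(src_ip: str, dst_port: int) -> bool:
    """Return True only when both the source address and the port are authorized."""
    addr = ipaddress.ip_address(src_ip)
    if dst_port not in ALLOWED_PORTS:
        return False
    return any(addr in net for net in ALLOWED_SOURCES)
```

Everything not explicitly listed is refused, which mirrors the "block all but authorized users" stance of the border routers.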
Firewalls
Firewalls are used to secure the internal network. The firewall is configured
with rules to allow only specific types of data traffic between the gateway
layer on the external network, and the application layer and database servers on
the internal network. Additional rules allow internal management systems to
access the servers via a separate interface, for monitoring purposes and to
refresh files. Security scanning software is used to constantly monitor for
potential vulnerabilities; an outside firm has been contracted to run "friendly"
scans against the network at least twice a year. Results of the scan are
reported to our technical staff, and noted exceptions are promptly corrected.
Secure Virtual Private Network
VeriSign uses encrypted IPSEC virtual private network (VPN) connections to
manage and monitor the global DNS constellation. The VPNs are configured to only
permit connections from authorized machines and users. They are also configured
in a high-availability mode to provide secure connectivity with failover protection.
Availability of Backup Software, Operating System, and Hardware
The architecture of our backup system, including details on software, operating
system and hardware, is covered fully in Section 5(b)x, Backup. Its availability
is 100 percent.
System Monitoring
Distributed Monitoring System
VeriSign's enterprise monitoring toolset uses many web-based utilities tied
together by a portal. All monitoring alerts are fed to a central event
management console used by the NOC to oversee the status of our worldwide
assets. The major component of this monitoring toolset is Nagios, an
open-source, web-based tool that uses a secure and reliable distributed
architecture.
Nagios alerts provide information for the NOC to quickly diagnose critical
issues. Historical data is evaluated to provide further insight and ascertain
root cause. Error code handling is configured with each new application.
Product-specific error codes and configurations are included with each code
release. Batch processes provide comprehensive logging and error handling to
quickly diagnose any failures that occur.
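As an illustration of how a check can feed alerts of this kind, the sketch below follows the widely used Nagios plugin convention of four status codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The check itself, its name check_latency, and its thresholds are hypothetical and are not taken from VeriSign's configuration.

```python
# Status codes used by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_latency(latency_ms, warn_ms=200.0, crit_ms=500.0):
    """Map a measured latency to a (status code, one-line message) pair.

    latency_ms is None when no measurement could be taken.
    The thresholds are illustrative defaults, not production values.
    """
    if latency_ms is None:
        return UNKNOWN, "LATENCY UNKNOWN - no measurement available"
    if latency_ms >= crit_ms:
        return CRITICAL, f"LATENCY CRITICAL - {latency_ms:.0f} ms"
    if latency_ms >= warn_ms:
        return WARNING, f"LATENCY WARNING - {latency_ms:.0f} ms"
    return OK, f"LATENCY OK - {latency_ms:.0f} ms"
```

A real plugin would print the message and exit with the status code so that the monitoring server can raise the corresponding alert.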
Custom and Application-Specific Monitoring
VeriSign uses the advanced capabilities of the TeamQuest performance and
capacity management tool to provide custom monitoring of per transaction detail
for the .net registration system. TeamQuest's output is fed to Nagios for
real-time alerting and is simultaneously stored for subsequent analysis and
historical trending. This information is used to create performance and capacity
trend lines that are vital to our ongoing efforts to prevent outages and system
degradation by predicting when proactive intervention is necessary. As new
software packages are developed, we create custom methods to monitor detailed,
process-level activity to further identify even internal application status. In
addition to feeding our real-time monitoring tool, Nagios, these custom monitors
provide information used to optimize software code and tune system performance
parameters.
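The trend-line idea can be illustrated with an ordinary least-squares fit. This is a minimal sketch and not a description of how TeamQuest works internally; the function names and the sample data are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(days, utilization, threshold):
    """Estimate the days remaining (after the last sample) until the fitted
    trend reaches the threshold. Returns None when the trend is flat or falling."""
    slope, intercept = fit_line(days, utilization)
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - days[-1])
```

For example, utilization samples of 40, 42, 44, and 46 percent on days 0 through 3 give a slope of 2 points per day, so an 80 percent threshold would be reached on day 20, or 17 days after the last sample.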
DNS Monitoring
In addition to the monitoring detailed above, VeriSign has invested
significantly in DNS application monitoring to ensure 100 percent availability
of the .net resolution service. These in-house tools were developed to provide
real-time visibility into the status of our global DNS resolution assets and
consist of four major components.
DNS Site-Level Monitoring. This infrastructure provides a real-time view of DNS
traffic for each resolution site, including the overall health of the site,
network traffic statistics, potential anomalies, and zone version information.
Figure 5(b)xvi-3 shows a sample view of this data for a site.
Heads-Up Display (HUD). The HUD was created to provide a global "dashboard" view
of the overall status and health of our resolution service. Site-level
statistics are aggregated and presented in a detail-rich interface that shows
traffic load, along with a color-coded status for each site. Figure 5(b)xvi-4
shows the HUD. Of special note on the HUD display is the graph on the lower
right, showing the amount of traffic received for the current day versus the
same day from the previous week. This graph [Figure 5(b)xvi-5] provides a
continual reference point for operators, allowing them to assess traffic peaks
and valleys in context.
ATLAS Component-Level Monitoring. Every component of ATLAS is instrumented to
report its status and traffic statistics. This information is fed into the
real-time monitoring system and produces alerts when problems occur. Figure
5(b)xvi-6 shows ATLAS components at one resolution site. The green color
indicates all components are healthy. Zone version information is also shown,
allowing operators to quickly assess health and, if necessary, troubleshoot
problems.
Central Storage and Reporting. A central monitoring database serves as a
repository for all historical summary statistics relayed by the various
monitoring applications at each DNS site. This database currently houses daily
statistics dating to the year 2000. Data is stored with an extremely fine
granularity (at the IP, TCP/UDP, and DNS levels) and is collected every 4
seconds at each site. The database and related tools allow for on-the-fly report
and graph generation of any combination of statistics for any specified time
period. Our technical staff uses these tools to compare new trends against
historical data. These records also feed our detailed uptime and traffic growth
analysis and reporting. In addition, our technical staff uses this data to
contact and assist administrators of misconfigured name servers. Figure
5(b)xvi-7 shows anomalous DNS traffic received at three sites on July 22, 2003.
A single DNS misconfiguration was responsible for this spike in traffic. Our
engineers contacted the organization responsible, and the graph clearly shows
the moment when the problem was fixed.
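To give a sense of scale, a 4-second collection interval yields 86,400 / 4 = 21,600 samples per site per day. The sketch below shows one way such samples could be rolled up into a daily summary; the field names and the input format (queries counted per interval) are assumptions for illustration, not the schema of the actual monitoring database.

```python
SAMPLE_INTERVAL_S = 4                                   # from the text above
SAMPLES_PER_DAY = 24 * 60 * 60 // SAMPLE_INTERVAL_S     # 21,600 per site


def daily_summary(samples):
    """Summarize per-interval query counts (one integer per 4-second sample)."""
    total = sum(samples)
    return {
        "samples": len(samples),
        "total_queries": total,
        "peak_qps": max(samples) / SAMPLE_INTERVAL_S,
        "mean_qps": total / (len(samples) * SAMPLE_INTERVAL_S),
    }
```

Keeping the raw samples alongside summaries like this is what allows a spike, such as the one caused by a single misconfigured name server, to be compared against historical data.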
Technical Maintenance Staff
We employ a highly skilled maintenance staff experienced in the operation of the
.net registry service. As shown in Table 5(b)xvi-2, our staff provides onsite
and on-call 24x7x365 technical support. Our staff retention rates are
extremely high, which is important for stable, problem-free operations.
Server Locations
VeriSign operates two data centers (in Dulles and Ashburn, Virginia) for the
.net registration system. Either site can be active for registration, and either
can provide full registration services in the event of a site failure. Section
5(b)vi, Geographic Network Coverage, describes our comprehensive process for
selecting resolution sites, which comprise 14 active and three
warm-standby sites in the locations shown in Figure 5(b)xvi-10.
Conclusion
VeriSign's ability to prevent unplanned system outages is a direct result of our
robust planning processes and disciplined execution of proven system reliability
practices. Our staff's technical experience in running large registry
services is unparalleled by our competitors. Our well-designed, highly
scalable system architecture and continuous system monitoring also help prevent
or minimize unplanned outages. Our robust DNS solution is operated using our
award-winning ATLAS technology and provides real-time updates, 100 percent
uptime, and the capacity to withstand even the largest DDoS attacks. VeriSign's
proven, reliable registry system provides stable operations for a critical
component of Internet infrastructure. |
|
(xvii) Ability to support current feature
functionality of .NET (to the extent publicly or otherwise available to
the applicant), including IDNs, support of IPv6,
DNSSEC.
|
Registrants, registrars, and Internet users depend on features available in .net
today, and in some cases have relied on them for years. The stability and
continuity of services in the .net registry depend on the availability of these
features. These features are in production and in active pilot programs.
VeriSign Advantage:
+ Robust and diverse feature functionality in .net that benefits registrars,
registrants, and end users.
In addition to the standard registry features and functionality, VeriSign is
committed to continued support for the following:
* Internationalized Domain Names (IDN). VeriSign currently has nearly 90,000 IDNs
representing 350 languages in the .net registry. We have been a pioneer in the
support for IDNs and will continue to improve upon our services as they relate
to IDNs.
* Internet Protocol Version 6 (IPv6). VeriSign currently supports the
registration of IPv6 DNS resource records and the resolution of DNS over IPv6
networks.
* DNSSEC. VeriSign has been a participant and leader in the development of the
DNSSEC protocol. Having deployed several DNSSEC pilots, VeriSign is committed to
a rollout of DNSSEC in .net once the final standard is ready.
* Redemption Grace Period (RGP). VeriSign was the first registry to fully
support the RGP policy, and we will continue to fully support RGP in .net.
* Internet Registry Information Service (IRIS). In our continuing endeavor to
provide stability to the Internet infrastructure, VeriSign has been a leader in
the standardization of the IRIS protocol, and we are the first registry to
deploy an IRIS pilot.
* ConsoliDate. In responding to requests from registrars and registrants,
VeriSign has implemented a program for domain renewal, allowing a registrant to
synchronize the renewal dates of a large number of domains. The program, called
ConsoliDate, alleviates an administrative burden for customers with large
portfolios of domain names.
These features are described below with additional information available on
VeriSign's Naming and Directory Services website. When any available feature or
service fails, or a decision is made to change the implementation or terminate
the service, every customer who depends on the service is affected. If an IDN
implementation did not support each of the scripts currently supported, then all
registrations that depend on that script would fail, and any services that
depend on that domain name would be affected.
Feature 1: IDN
The Internet is used by more than 500 million people around the world. As the
Internet grows, more and more users will speak languages other than English.
IDNs provide a convenient mechanism for users to access websites and other
Internet resources in their preferred language. IDNs are an important factor in
the transformation of the Internet into a truly global and multilingual tool.
Use of IDNs allows Internet users to:
* Navigate to Internet content and address email in their preferred language and
script
* Preserve local culture and support their preferred language
IDNs enable registrants to:
* Reach target audience more effectively by communicating in their preferred
language
* Protect, strengthen, and extend existing brand and trademarks; identify and
secure brand equity in local markets
* Eliminate any confusion on brand communication
* Improve the customer's navigation experience
* Create global utility that adheres to international standards and leverages DNS
IDNs enable registrars to:
* Provide complementary extension to existing product offering and line of business
* Leverage existing infrastructure investment
* Increase revenue opportunity
* Expand addressable markets; reach new segments
* Provide opportunity to extend new products into reseller network.
VeriSign has supported IDNs in the .net registry since November 2000 when the
IDN testbed was opened to test proposed standards for the deployment of IDN
technology and to provide operational experience with those proposed standards.
As of 31 October 2004, the .net registry contained nearly 90,000
standards-compliant IDN registrations representing most of the 350+ available
languages. VeriSign is the only gTLD to support all available code points and
languages for IDNs.
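For readers unfamiliar with how a standards-compliant IDN is carried in the DNS, the sketch below converts a Unicode name to its ASCII-compatible form and back. It uses Python's built-in "idna" codec, which implements the IDNA 2003 rules (RFC 3490); the helper names are hypothetical and the example name is illustrative, not a registration taken from the .net registry.

```python
def to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII-compatible (xn--) form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Convert an ASCII-compatible domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


# "b\u00fccher.net" is "bucher.net" with a u-umlaut in the first label.
encoded = to_ascii("b\u00fccher.net")   # "xn--bcher-kva.net"
decoded = to_unicode(encoded)           # back to the Unicode form
```

Only the ASCII-compatible form is stored and resolved in the DNS; applications, or a plug-in such as i-Nav, perform the conversion on behalf of the user.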
Several applications are IDN-enabled, but Microsoft's Internet Explorer (IE)
browser continues to lack IDN support. Therefore, VeriSign developed the free
i-Nav plug-in for IE to serve as a bridging technology for a majority of the
Internet's end users. The i-Nav plug-in is a browser tool that enables
standards-based IDN web navigation for all languages and all standard-compliant
TLD IDN implementations. The objective was to create a critical mass of
IDN-enabled desktops and motivate application developers to include IDNs as a
feature in their product lines.
While the plug-in has enjoyed significant adoption worldwide, VeriSign
recognized that end user assistance was only one small step. VeriSign made the
core IDN technology from i-Nav freely available by releasing an IDN software
developer's kit (SDK) in April 2003. This SDK is now the basis of many
IDN-enabled applications in use today.
VeriSign has continued to demonstrate comprehensive leadership in support of IDN
adoption. At the end of the second quarter in 2003, VeriSign recognized the need
for a multi-faceted industry organization to propagate the usage of IDNs and
increase standards adoption in the applications used by Internet end users. This
led to the creation of the IDN Software Developer Consortium (SDC).
In November 2003, VeriSign sponsored an IDN summit for several domain name
registries, registrars, and application developers to discuss how to advance the
opportunity and usage of IDN products across the globe. Table 5(b)xvii-1 shows a
list of application developers, registries, and registrars who have participated
in the SDC. Subsequent meetings of the SDC have been held to drive the adoption
and further proliferation of IDN technologies. The success of this consortium
has been shown through the growing number of IDN applications and announcements
by others concerning the inclusion of IDN capabilities.
Details about VeriSign's implementation are provided at
http://www.verisign.com/products-services/naming-and-directory-services/
naming-services/internationalized-domain-names/index.html
Feature 2: IPv6
IPv6 features a 128-bit addressing scheme, as opposed to the 32-bit addressing
scheme of IPv4, supporting a much larger number of addresses. Beyond the
expanded addressing capabilities, it features other improvements over IPv4, as
specified in RFC 2460.
VeriSign has been advancing IPv6 functionality over the past several years and
fully supports IPv6 for the .net registry. VeriSign currently supports IPv6 name
server provisioning, DNS resolution, and transport. All three features must be
considered by the .net registry operator.
(a) Provisioning Support
In May 2002, VeriSign began accepting registration of AAAA records in the .net
(as well as .com and .org) registry. IPv6 addresses allowed must be from a block
allocated to a Regional Internet Registry (RIR). As Internet Assigned Numbers
Authority (IANA) and the RIRs post updates to the allowable IPv6 blocks, the
.net registry system is updated.
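The provisioning rule above (an IPv6 address is accepted only if it falls within an allocated block) can be sketched as follows. The prefixes listed are examples chosen for illustration and are not a complete or current list; in practice the list would be refreshed from the updates that IANA and the RIRs post. The function name is hypothetical.

```python
import ipaddress

# Illustrative prefixes only; the real list is maintained from IANA/RIR updates.
ALLOWED_BLOCKS = [
    ipaddress.ip_network(prefix)
    for prefix in ("2001:200::/23", "2001:400::/23", "2001:600::/23")
]


def is_provisionable(address: str) -> bool:
    """Accept an address only if it is valid IPv6 and inside an allowed block."""
    try:
        ip = ipaddress.IPv6Address(address)
    except ValueError:
        # Not a syntactically valid IPv6 address (includes IPv4 literals).
        return False
    return any(ip in block for block in ALLOWED_BLOCKS)
```

Under these example prefixes, 2001:200::1 would be accepted, while a link-local address such as fe80::1 would be rejected.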
(b) Resolution Support
VeriSign has also supported DNS resolution of IPv6 since May 2002. DNS queries
return AAAA records, if present, along with A records in the additional section
of replies.
(c) IPv6 Transport
VeriSign introduced support for IPv6 transport after ICANN's acceptance of the
Root Server System Advisory Committee (RSSAC) recommendation to "proceed with
adding AAAA glue (name server) records to the delegations of those TLDs that
request it."
VeriSign had conducted extensive internal testing, had participated in an IPv6
pilot with the root server testbed network, and provisioned IPv6 network
connectivity from multiple providers before submitting a root zone change
request to the IANA.
VeriSign added support for accessing the .com and .net zones using IPv6
transport on 19 October 2004. On that day, AAAA records for a.gtld-servers.net
and b.gtld-servers.net were added to the root and gtld-servers.net zones.
Community reaction has been positive, resulting in more than 500 queries per
second at each site over IPv6 transport.
Feature 3: DNSSEC
The original DNS specification does not include support for data
integrity or data authentication. DNSSEC uses digital signatures based on public
key cryptography to add these features.
(a) DNSSEC Pilots
No other registry operator has been involved in DNSSEC research as extensively
as, or for as long as, VeriSign. VeriSign engineers and the Applied Research
Department have played an active role in DNSSEC development almost from its
beginning, contributing to DNSSEC's design over the past several years, and have
edited the current DNSSEC specifications document set. To demonstrate the
application of DNSSEC, VeriSign has developed several pilots.
(i) DNSSEC Lookaside Validation (DLV)
This mechanism for creating a cryptographic chain of trust to a zone that
bypasses the normal DNS delegation hierarchy was developed by the Internet
Systems Consortium (ISC). It is primarily described as a transition mechanism to
help DNSSEC adoption for zones whose parent zones do not support DNSSEC.
Essentially, the DLV proposal uses the concept of a trusted "lookaside" zone as
a method for finding keying information for a signed zone without a secure
delegation from its parent.
VeriSign's DLV pilot implements an example DLV registry. It allows external
users and developers to experiment with the DLV model without setting up their
own lookaside zone. It allows anyone with a publicly available signed zone to
register that zone with the pilot. This pilot is the first known publicly
available DLV registry and is provided at http://dlv.verisignlabs.com.
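The lookaside concept can be illustrated by showing which names a validator would try inside a lookaside zone. This sketch follows the approach in the published DLV proposal, where the lookaside domain is appended to the zone name and leading labels are stripped on each retry; treat that behavior as an assumption here, since the pilot's exact lookup rules are not described in this section. The lookaside zone name used is a placeholder, not the pilot's actual zone.

```python
from typing import List


def dlv_candidates(zone: str, lookaside: str) -> List[str]:
    """List the names to try, most specific first, when searching a
    lookaside zone for keying information about the given zone."""
    labels = zone.rstrip(".").split(".")
    suffix = lookaside.rstrip(".")
    return [
        ".".join(labels[i:]) + "." + suffix + "."
        for i in range(len(labels))
    ]
```

For a zone named example.net and a placeholder lookaside zone dlv.example.org, the first name tried would be example.net.dlv.example.org.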
(ii) .net DNSSEC Pilot
This pilot provides a DNSSEC signed version of the .net TLD zone, using the
latest version of the standards published by the IETF. This pilot provides both
VeriSign and external users and developers with operational experience with the
latest DNSSEC specification.
The pilot allows anyone with a valid .net domain name to register a secure
delegation. By participating in the pilot, users can get experience with DNSSEC
using their own zones, and can get experience with secure name resolution within
.net.
The pilot also demonstrates VeriSign's experience with signing, managing, and
serving a DNSSEC-signed version of the .net zone. The pilot uses incremental zone
signing and automatic, scheduled key rollovers, all designed to minimize the
overall size of the zone. The pilot uses modern cryptographic hardware security
modules to both increase signing speeds and to protect the security of the
private keys while allowing their constant, online use. This pilot is in
operation and is provided at http://dnssec-net.verisignlabs.net.
(b) Successful Implementation of DNSSEC in .net
A registry operator can offer to implement DNSSEC; however, successful
implementation will require coordination of and cooperation from many
constituencies, as well as a deep understanding of this complicated protocol. An
implementation plan should address the coordination and role of the registry
operator to facilitate a successful deployment, and the operator should be able
to point to a record of working in the DNSSEC research community.
VeriSign has conducted market research on multiple occasions to determine the
level of concern and interest with implementation of DNSSEC, and has led several
forums to encourage community involvement, including our most recent effort to
establish a DNSSEC Applications and Service Providers Consortium.
Implementing DNSSEC will require signing the .net zone, which by our estimates
could increase the size of the zone by a factor of four. Additionally, the
amount of data returned in the DNS response is increased, so the impact on
bandwidth and server capacity must be considered.
VeriSign is committed to implementing DNSSEC in .net once the protocol is published
as an RFC or a Proposed Standard. Only VeriSign has the experience and resources
to successfully implement DNSSEC in .net.
Feature 4: Redemption Grace Period
In response to a growing trend of complaints about unintentional domain-name
deletions, in 2002, ICANN proposed an RGP for deleted names. RGP expanded the
5-day delete pending period to create a new "safety net" following any deletion
of a domain name. The grace period would allow the domain-name registrant,
registrar, or registry operator time to detect and correct any mistaken deletions.
VeriSign was the first registry to implement RGP in a solution that is fully
compliant with ICANN policy. In the current .net registry, any "delete" of a
domain name outside the add grace period results in a 30-day Deleted Name
Redemption Grace Period. This grace period will
allow the domain name registrant, registrar, and/or registry time to detect and
correct any mistaken deletions. Figure 5(b)xvii-2 describes the RGP provisioning
process and domain deletion lifecycle.
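The deletion lifecycle can be sketched as simple date arithmetic. The 30-day redemption period and the 5-day delete pending period come from the text above; the length of the add grace period and the assumption that the pending period follows the redemption period are illustrative choices, and the function and field names are hypothetical.

```python
from datetime import date, timedelta

ADD_GRACE_DAYS = 5        # assumption: not stated in this section
REDEMPTION_DAYS = 30      # from the text above
PENDING_DELETE_DAYS = 5   # from the text above; ordering after redemption is assumed


def deletion_timeline(created: date, deleted: date) -> dict:
    """Return the key dates that follow a delete request."""
    if (deleted - created).days <= ADD_GRACE_DAYS:
        # Deleted inside the add grace period: no redemption period applies.
        return {"redemption_applies": False, "available_on": deleted}
    redemption_ends = deleted + timedelta(days=REDEMPTION_DAYS)
    return {
        "redemption_applies": True,
        "redemption_ends": redemption_ends,
        "available_on": redemption_ends + timedelta(days=PENDING_DELETE_DAYS),
    }
```

Under these assumptions, a name deleted on 1 June 2004 could be restored until 1 July 2004 and would become available again on 6 July 2004.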
VeriSign has authored an extension to the Extensible Provisioning Protocol (EPP)
to support RGP and other registry grace periods. This extension was published as
Proposed Standard RFC 3915 in September 2004. Support for RGP will be included
as VeriSign migrates from RRP to EPP in 2005.
Feature 5: IRIS
The Internet Registry Information Service (IRIS) is a protocol developed by the
IETF's Cross-Registry Internet Service Protocol (CRISP) Working Group. It is
intended to be a replacement for the aging Nicname/Whois protocol currently
defined in RFC 3912.
VeriSign has been a leader in the development of the IRIS standard. In August
2004, VeriSign announced the availability of an IRIS pilot service for the .com
and .net registries. This pilot is offered to help foster the adoption of the
IRIS protocol and move the domain industry forward in ways compatible with the
needs of private citizens, law enforcement, and intellectual property holders.
Users can interact with the service either directly using an IRIS client or can
query it using our web interface.
VeriSign has authored multiple client and server software packages and
libraries, which are freely available and licensed under common open source
terms. We operate a website dedicated to the promotion of IRIS to solve the
meta-data problems surrounding domain names.
Feature 6: ConsoliDate
The .net registry supports synchronizing expiration dates of domain names
through a function called ConsoliDate. This feature was implemented at the
request of registrars specifically to support registrants who have multiple
domain names, which often have different expiration dates and therefore come
up for renewal at different times.
This service has been available in .net since May 2002 and allows registrars'
customers to adjust the expiration dates of their .net domain names,
consolidating them under a calendar day (or days) of their own choosing. The
service addressed a frequent request for the capability to allow registrars to
help their customers manage renewals for sizable domain name portfolios.
ConsoliDate is implemented with the RRP SYNC command. This capability was
implemented without requiring any changes by registrars who did not choose to
implement it. Only minimal modifications were required for those registrars who
did choose to support it.
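The date synchronization itself can be sketched as follows. This is not the RRP SYNC command syntax, which is not reproduced in this section; it only illustrates the calculation, under the assumption that an expiration date is moved forward to the next occurrence of the chosen month and day. The function name is hypothetical.

```python
from datetime import date


def consolidated_expiry(current_expiry: date, month: int, day: int) -> date:
    """Return the next occurrence of (month, day) on or after the current
    expiration date. Raises ValueError for an impossible date such as
    29 February in a non-leap year."""
    candidate = date(current_expiry.year, month, day)
    if candidate < current_expiry:
        candidate = date(current_expiry.year + 1, month, day)
    return candidate


# Example portfolio consolidated onto 1 June.
portfolio = [date(2005, 3, 15), date(2005, 9, 20), date(2005, 6, 1)]
synced = [consolidated_expiry(d, 6, 1) for d in portfolio]
```

In this example the three names would expire on 1 June 2005, 1 June 2006, and 1 June 2005 respectively, so every renewal falls on the same calendar day.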
The benefits of ConsoliDate to the registrar community include:
* Increased customer satisfaction rates (via better account management, etc.)
* Increased renewal rates (reduction in lapsed/deleted domains)
* Decreased average cost of renewals (including software and hardware operating
infrastructure, mailing costs, data management, service costs, manhours, etc.)
* New revenues (depending on implementation model)
* Protection against downward price pressure.
Should a registry operator choose not to implement this service, customers who
have already used the service will be unable to synchronize expiration dates of
any future registrations. The impact to registrars will vary, based on the
extent of changes required to their systems to accommodate removal of a service
from their portfolio.
Conclusion
VeriSign offers robust and diverse feature functionality in .net that benefits
registrars, registrants, and end users as follows:
* Support of IDNs. VeriSign has supported development of IDN standards since
November 2000. As of 31 October 2004, the .net registry contained nearly 90,000
IDN registrations representing more than 350 languages. As a bridge technology
for web browsers that are not IDN enabled, VeriSign has developed the i-Nav(tm)
plug-in as a web browser companion to enable IDN web navigation.
* IPv6. VeriSign fully supports IPv6 for the .net registry, including
registration of IPv6 name servers, resolution of AAAA records, and access to
.net name servers over native IPv6 transport.
* DNSSEC. VeriSign's experience in DNSSEC is unparalleled by any other registry.
We have been involved in DNSSEC development almost since its beginning and have
implemented several pilot programs to gain real-world operational experience.
Only VeriSign has the resources and experience to support a successful
implementation of DNSSEC in .net.
* RGP. VeriSign was the first registry to implement RGP in a solution that is
fully compliant with ICANN policy.
* IRIS. VeriSign has been a leader in the development of the IRIS standard
within the IETF's CRISP Working Group. In August 2004, VeriSign announced the
availability of an IRIS pilot service for the .com and .net registries. VeriSign
has authored multiple client and server software packages and libraries, which
are freely available and licensed under common open source terms. VeriSign
operates a website dedicated to the promotion of IRIS to solve the meta-data
problems surrounding domain names.
* ConsoliDate. Through ConsoliDate, VeriSign supports synchronizing expiration
dates of domain names. This service was implemented in response to registrar
requests to help them improve service to their customers.
It is our commitment to security and stability, coordination with industry, and
a passion to innovate that created this feature functionality. |
|
(xviii) System recovery procedures. Procedures for
restoring the system to operation in the event of a system outage, both
expected and unexpected. Identify redundant/diverse systems for providing
service in the event of an outage and describe the process for recovery
from various types of failures. Describe the training of technical staff
that will perform these tasks, the availability and backup of software and
operating systems needed to restore the system to operation and the
availability of the hardware needed to restore and run the system.
Describe backup electrical power systems and the projected time for system
restoration. Describe procedures for testing the process of restoring the
system to operation in the event of an outage, the documentation kept on
system outages and on potential system problems that could result in
outages.
|
VeriSign has a comprehensive system recovery solution.
VeriSign Advantage:
+ Comprehensive recovery procedures and reliable technology infrastructure
provide the ability to rapidly recover .net registration and resolution systems
from any critical event.
This section defines the following considerations for restoring registry systems
to operation:
* Procedures for restoring the system to operation in the event of a system
outage, both expected and unexpected. VeriSign has created a framework and
processes for fast and successful restoration.
* Identification of redundant/diverse systems for providing service in the event
of an outage. VeriSign has built a registry infrastructure that includes
redundancy at all levels.
* Process for recovery from various types of failures. VeriSign has detailed the
processes necessary to address failure scenarios that can occur within the system.
* Training of the technical staff that will perform these tasks. All VeriSign
technical operations staff undergoes training in system restoration procedures.
* Availability and backup of software and operating systems needed to restore
the system to operation. VeriSign maintains a change control system with
production versions of all software and operating systems used.
* Availability of the hardware needed to restore and run the system. We maintain
redundant server and network devices at each registration and resolution site.
* Backup electrical power systems. Each data center incorporates a backup
electrical power system that allows for service continuation in the event of an
electrical failure or planned shutdown of any component.
* Projected time for system restoration. We have ranked the major functions of
the registry business and identified the recovery windows for each that we have
committed to meeting.
* Procedures for testing the process of restoring the system to operation in the
event of an outage. For all components, procedures are rehearsed regularly to
ensure restoration in a timely fashion.
* Documentation kept on system outages. Any outages and incidents are captured,
and each month, we submit information on availability to ICANN.
* Potential system problems that could result in outages. We identify various
potential scenarios that could impact the availability of the registry system.
VeriSign understands that if any of the .net systems fail, impact to the
Internet community and to global commerce could be widespread. Together with the
procedures and training of our experienced staff, we provide the ability to
rapidly recover the registration and resolution systems from events that could
disrupt registry services. We have invested millions of dollars in our systems
and continually strive to provide the best registry infrastructure in the world.
To ensure business continuity, we have investigated the pros and cons of various
alternatives for both restoration methods and recovery site options. Our
solution uses a combination of many alternatives [Tables 5(b)xviii-1 and -2] to
ensure business continuity:
* Synchronous/Mirrored Replication: We deploy enterprise storage systems with
mirrored backup protection at both the primary and alternate primary data
centers. In case of site failure, the full registration data is readily
available, with zero data loss, at the alternate primary site.
* Internal Hot-Site Data Center with Full Disaster Recovery: We provide two
separate data facilities for the registration system. Either site can be active
for registration and has the capability to provide full registration services
in the event of a site failure at the other site [Figure 5(b)xviii-1].
In 2005, we will implement an improved solution for managing the registry
system. This implementation will introduce a tertiary hot-site with asynchronous
data replication and log shipping [Figure 5(b)xviii-2]. This new implementation
will accomplish the following:
* Continued synchronous replication of the registry database to the alternate
primary site. The alternate primary site is scheduled to be relocated to a Tier 4
facility between 75 and 300 miles from the primary site and will maintain the
synchronous, mirrored replication of data, ensuring no data loss.
* VeriSign will install a tertiary site that is functionally equivalent to the
primary registry system and will use asynchronous data replication. To address
scenarios where multiple failures at multiple sites could result in data
corruption or loss, log shipping (copying of Oracle redo logs) ensures that the
data sets are complete before restoration of synchronous or asynchronous
replication to the alternate primary or tertiary sites, respectively.
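The difference between the three mechanisms above can be illustrated with a small model. This is a minimal sketch only, not VeriSign's implementation: the class and method names are hypothetical, transactions are reduced to integer ids, and the redo log is modeled as a plain list.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical site holding the ordered list of applied transaction ids."""
    name: str
    applied: list = field(default_factory=list)


class ReplicatedRegistry:
    """Toy model: one primary, one synchronous mirror, one asynchronous copy."""

    def __init__(self):
        self.primary = Site("primary")
        self.alternate = Site("alternate-primary")  # synchronous mirror
        self.tertiary = Site("tertiary")            # asynchronous copy
        self.redo_log = []                          # shipped redo records (ids only)

    def commit(self, txn_id):
        # Synchronous: the write lands at both sites before commit returns,
        # so the alternate primary never lags the primary.
        self.primary.applied.append(txn_id)
        self.alternate.applied.append(txn_id)
        # Log shipping: keep a redo record so a lagging site can catch up.
        self.redo_log.append(txn_id)

    def missing_at(self, site):
        """Transactions in the redo log that `site` has not applied yet."""
        return [t for t in self.redo_log if t not in site.applied]

    def drain_async(self, count):
        """Asynchronous: the tertiary site applies up to `count` pending writes."""
        for txn_id in self.missing_at(self.tertiary)[:count]:
            self.tertiary.applied.append(txn_id)

    def replay_logs(self, site):
        """Complete a lagging data set from the shipped redo records."""
        for txn_id in self.missing_at(site):
            site.applied.append(txn_id)
```

In this model the alternate primary is never behind, the tertiary site can be behind by whatever has not yet been drained, and replaying the shipped log closes that gap before replication resumes.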
Procedures for Restoring the System in the Event of Expected and Unexpected Outages
Expected Outages
For proper deployment and maintenance of registration and resolution
applications, operations personnel follow detailed Deployment Plans, Operations
Installation Guides, and Operations Tasks Guides. These guides are certified
through our QA process and are delivered with each code release from
Engineering. Our technical staff also maintains a searchable knowledge base that
provides essential information useful in recovery and maintenance duties.
Unexpected Outages
VeriSign has developed detailed Escalation Procedures for every aspect of the
Registry System that dictate the proper steps to follow during event management.
All steps are based on the Operations Installation Guides and Operations Tasks
Guides certified through our QA process and delivered with each code drop from
Engineering. Our library of Escalation Procedures resides in an online
searchable knowledge base on internal web-based systems. In the event of an
outage to our online searchable knowledge base, paper copies of all procedures
are stored in the NOC. These copies are indexed for rapid access to necessary
information in the event of an emergency.
SRS
Through detailed analysis of various disaster scenarios, we have designed the
system for the following situations:
* Full Failover: If the primary site data center were destroyed or rendered
unserviceable, then a full failover would be warranted. Figure 5(b)xviii-3
provides a depiction of full failover. Table 5(b)xviii-3 describes the steps
that would be taken for service restoration. Once full failover is complete, the
registrars will be notified of this condition and be provided any additional
instructions, if necessary. A full failover can be conducted in less than 30
minutes.
* Partial Failover: A partial failover is when one or more components fail over
to the alternate primary site, but a portion of the primary site remains
operational. We have identified three different partial failover scenarios. Any
of the following scenarios can be conducted in less than 30 minutes.
- Primary Oracle HA Cluster Failure: In this scenario, only the primary Oracle
HA cluster fails, while all remaining components remain active. Steps identified
in Table 5(b)xviii-3 are taken for service restoration [Figure 5(b)xviii-4].
- Primary SRS Application Server Failure: If the SRS Application Servers or SRS
Gateway Servers at the primary site fail, steps identified in Table 5(b)xviii-3
are taken for service restoration [Figure 5(b)xviii-5]. Operations of all other
components are unaffected in this scenario.
- Primary Registrar Tool or Customer Service Tool Server Failure: In this
scenario, only the Registrar Tool web interfaces and/or the CSR Tool servers are
affected. Steps identified in Table 5(b)xviii-3 are taken for service
restoration [Figure 5(b)xviii-6]. Operations of all other components are
unaffected in this scenario.
* Tertiary Site Failover: VeriSign will implement a revised solution in 2005
introducing a tertiary hot site that will allow for restoration of the registry
system within 1 hour of event recognition. Since synchronous data replication is
not technically possible across long distances, asynchronous replication will be
used. This tertiary site provides a means of rapid recovery should a large-scale
disaster impact the primary and alternate primary locations [Figure 5(b)xviii-7].
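The choice among the scenarios above can be sketched as a small decision function. This is an illustrative sketch under stated assumptions: the component identifiers, the `choose_failover` function, and the rule that any primary-site failure goes to the tertiary site when the alternate primary is unavailable are hypothetical. Only the target windows (30 minutes and 1 hour) come from the text.

```python
from enum import Enum


class Failover(Enum):
    NONE = "no failover"
    PARTIAL = "partial failover to alternate primary"
    FULL = "full failover to alternate primary"
    TERTIARY = "failover to tertiary site"


# Component groups named in the text; the identifiers are illustrative.
PRIMARY_COMPONENTS = frozenset(
    {"oracle_ha_cluster", "srs_app_servers", "registrar_tools"}
)

# Target restoration windows stated in the text, in minutes.
TARGET_MINUTES = {
    Failover.PARTIAL: 30,
    Failover.FULL: 30,
    Failover.TERTIARY: 60,
}


def choose_failover(failed_at_primary, alternate_available=True):
    """Pick a failover scenario from the set of failed primary-site components."""
    failed = set(failed_at_primary) & PRIMARY_COMPONENTS
    if not failed:
        return Failover.NONE
    if not alternate_available:
        # Assumption: with the alternate primary also impacted, use the tertiary site.
        return Failover.TERTIARY
    if failed == PRIMARY_COMPONENTS:
        return Failover.FULL
    return Failover.PARTIAL
```

For example, `choose_failover({"oracle_ha_cluster"})` selects a partial failover, while losing every component group selects a full failover.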
Network Services
VeriSign's network infrastructure is redundant within each facility and across
facilities. The alternate primary site mirrors the primary site in all
aspects: equipment, topology, configuration, capacity, connectivity, etc. It
takes a minimum of two network component failures to disable the network
infrastructure [Figure 5(b)xviii-8] at a facility. It would take a minimum of
four network component failures (e.g., failure of all four Internet border
routers) to disable the primary and the alternate primary facilities.
The network at the primary site [Figure 5(b)xviii-9] is designed to be modular
and easily supports upgrade, maintenance, and component additions that scale
performance, capacity, or functionality.
Under normal operations (primary site in service), each site uses separate,
non-overlapping addresses. ISP connections are preconfigured to allow the
address advertisements to move from the primary site to the alternate primary
site. Propagation of routing changes throughout the Internet takes less than 15
minutes.
The network infrastructure is designed to isolate single-point failures with no
interruption of services or degradation in performance. In most cases, isolation
of failures is automatic and occurs within a few seconds of the event. Certain
component failures (such as firewall failure) can require manual intervention to
complete the failover. Where manual failover mechanisms currently exist, we are
working with the hardware and software vendors to develop products that support
automated failover.
Resolution Systems
VeriSign's DNS, both in architecture and protocol, is inherently tolerant of
failures. Each of our name server locations is advertised from its own Class C
address space. This allows our network technicians to make a Border Gateway
Protocol (BGP) announcement and quickly "move" a remote location to one of the
three available standby DNS sites. This capability is used extensively for
maintenance activities and allows us to provide access to the service even
though it has been "logically" moved to a new location. This capability is also
extremely useful in ensuring continued operation should a third-party vendor be
unable to provide service (as was the case with a major European ISP
approximately 3 years ago).
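The idea of "moving" a location by changing which site announces its address block can be modeled as a lookup table. This is a conceptual sketch only: it does not speak BGP, the `AnnouncementTable` class and site names are hypothetical, and the prefix in the usage note is a reserved documentation range rather than a real name server address. A Class C block is modeled as a /24.

```python
import ipaddress


class AnnouncementTable:
    """Tracks which site currently announces each name server prefix."""

    def __init__(self, standby_sites):
        self.standby_sites = set(standby_sites)
        self.origin = {}  # ip_network -> name of the announcing site

    def announce(self, prefix, site):
        net = ipaddress.ip_network(prefix)
        if net.prefixlen != 24:
            raise ValueError("each location is modeled with its own /24")
        self.origin[net] = site

    def move_to_standby(self, prefix, standby):
        """Re-home a prefix to a standby site; returns the previous site."""
        net = ipaddress.ip_network(prefix)
        if standby not in self.standby_sites:
            raise ValueError(f"{standby!r} is not a standby site")
        if net not in self.origin:
            raise KeyError(f"{prefix} is not announced")
        previous = self.origin[net]
        self.origin[net] = standby
        return previous
```

After `table.announce("192.0.2.0/24", "remote-a")`, a call to `table.move_to_standby("192.0.2.0/24", "standby-2")` leaves clients using the same addresses while the service is answered from the standby site.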
Redundant/Diverse Systems for Providing Service in the Event of an Outage
VeriSign has built an infrastructure that includes redundancy at all levels:
networking, databases, storage, and applications, as well as redundant facility
elements. Additionally, most service outages are mitigated through automated
methods, such as failover to redundant systems or through load balancing. Larger
events can require direct failover to an alternate primary network, system,
database, and/or environment.
* SRS: Not only are systems at our primary and alternate primary sites able to
assume full operations due to built-in redundancy, but each facility can
also assume partial operations. This partial operations model provides a
recovery option that is faster than full site failover, and therefore,
contributes to our ability to deliver reliable and stable service.
* Resolution: VeriSign has built a constellation of sites for the resolution of
.net. The constellation consists of 14 worldwide sites [Table 5(b)xviii-4] that
manage the query volume of both .net and .com. Each site is located at a major
telecommunications peering point in which various large ISPs have established
hubs for management of their portion of the Internet backbone. Each site of the
constellation has a minimum of 1 gigabit of burstable bandwidth from a minimum
of two high-speed connections to the Internet, each provisioned through a unique
ISP. Additionally, we have deployed multiple high-speed servers to support
capacity and availability demands that, in conjunction with our suite of
recovery software, offer automatic failover, load balancing, and threshold
monitoring of critical servers.
Process for Recovery from Various Types of Failures
VeriSign has spent extensive time and effort conducting failure mode analyses on
every registry component. Figure 5(b)xviii-5 shows general types of failures
that could be expected for the registry and the procedures used to recover from
them. Sophisticated hardware and software automatically handle the majority of
these failures. This is only a subset of all types of failures that have been
identified through our failure mode analyses. Our Business Continuity Plan
details the processes necessary to address failure scenarios that can occur
within the registry system.
Training of the Technical Staff That Will Perform These Tasks
All technical operations staff undergoes training in system restoration
procedures. This training takes the form of system outage scenarios, automated
and manual responses, documentation on hardware outages and restorations, and
documentation of indications of potential problems requiring maintenance
actions. Additionally, personnel receive training on all vendor software and
operating systems used in the registration and resolution system.
Availability and Backup of Software and Operating Systems Needed to Restore the
System to Operation
Installations of application software and operating systems are based on
standard builds developed by our Infrastructure Engineering (IE) and Product
Development teams. Vendor software is built and installed in our production
environment in compliance with internally developed procedures and
standards. The production versions of all vendor software and operating systems
used are kept on change-controlled servers. Operating systems are also
standardized and tuned by our IE team and are available on change-controlled
build servers as well. These build servers are routinely audited and maintained
to ensure that the most current certified build is available as well as builds
for all legacy systems in production.
All application software developed by VeriSign is checked into our Configuration
Management System. This software is installed and configured in accordance with
the Installation and Operations Tasks Guides certified by QA. For further
preservation, the registry database software and configurations are backed up to
tape each day.
Availability of the Hardware Needed to Restore and Run the System
VeriSign maintains redundant server and network devices at each registration and
resolution site. Many of these servers are load balanced, so that a failure of
one will result in the remaining servers assuming the load. We maintain a ready
supply of spare servers for our network, application, and protocol devices. We
have put rapid-response maintenance contracts in place with all the hardware
vendors of the larger servers, such as the databases and storage systems,
ensuring response within 4 hours. We also maintain direct technical
relationships with our major vendors to address any concerns that might be
discovered by our engineers.
Backup Electrical Power Systems
Each data center incorporates a backup electrical power system that allows for
service continuation in the event of an electrical failure or planned shutdown
of any component. Since most high-end IT equipment today is outfitted with
dual-electrical connections and dual-power supplies, each component of the
electrical infrastructure, from the street to the CPU, is, and will continue to
be, fully redundant. Redundancy also supports scheduled maintenance. The
electrical infrastructure is designed to continue to provide service in the
event of a failure or planned shutdown of any component. Details of the
electrical power systems at VeriSign facilities are provided in Section 5(b)i,
Facilities and Systems.
Projected Time for System Restoration
Table 5(b)xviii-5 provides a description of the priorities for restoring major
registry functions and VeriSign's commitment to recovery timeframes.
Procedures for Testing the Process of Restoring the System to Operation in the
Event of an Outage
For all components of the registry system, procedures for restoring system
operations are rehearsed regularly to ensure restoration in a timely fashion. We
periodically test failure scenarios and the procedures for restoration. We
maintain an operations staging environment that provides the opportunity to
practice database and system recovery, without impacting production operations.
We also take the opportunity, during production maintenance periods, to test
high-availability database failovers. Finally, fire drills are conducted at the
alternate primary site on a periodic basis - and not always during normal
business hours. These drills are timed and response times are tracked.
Documentation Kept on System Outages
System outages for the registry are determined by information contained in
network logs, application logs, monitoring event logs, and the shift logs of NOC
staff. Any outages and incidents are captured by the NOC in an incident
reporting system. Each business day, the VeriSign NOC and Operations staff
review system outages and other incidents that could affect system performance.
Each month, we submit information on availability to ICANN, which is publicly
available at http://www.icann.org.
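A monthly availability figure can be derived from the recorded outage intervals. The sketch below is one possible calculation, not the reporting method VeriSign uses. The function name is hypothetical, and it assumes outages are recorded as non-overlapping (start, end) pairs.

```python
from datetime import datetime


def monthly_availability(outages, month_start, month_end):
    """Return availability in percent for the window [month_start, month_end).

    `outages` is an iterable of (start, end) datetimes, assumed non-overlapping.
    """
    total = (month_end - month_start).total_seconds()
    down = 0.0
    for start, end in outages:
        # Clip each outage to the reporting window.
        clipped_start = max(start, month_start)
        clipped_end = min(end, month_end)
        if clipped_end > clipped_start:
            down += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (total - down) / total
```

Under these assumptions, a single outage of 432 minutes in a 30-day month (43,200 minutes) yields 99.0 percent availability.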
Potential System Problems That Could Result in System Outages
Some primary events that can occur to services are:
* DDoS Attack: A DDoS attack could easily affect both the primary and alternate
primary sites. Steps in recovery vary from identifying the source of the attack
to notifying appropriate authorities.
* Security Breach: Recovery from security breaches is straightforward, but is
often time-consuming and potentially disruptive to the services hosted on the
affected systems. Certain security breaches can disable a service for the
duration of the recovery and cleanup activities. Proper steps can result in
restoration of security and system service.
* Data Corruption: In the event of data corruption at both the primary and
alternate primary sites, data will be restored from online disk backups,
onsite tape backups, or offsite tape backups.
Conclusion
VeriSign ensures rapid recovery from critical events through:
* Established, robust recovery procedures: Promotes quick problem resolution and
root cause analysis to continually improve processes and operations.
* Extensive infrastructure investment providing redundant hardware and software:
Minimizes risk of servicewide failure and isolates impact of any component or
device failure so that overall system availability and operation are not
affected by cascading failures.
* Geographically dispersed resolution sites and NOC: Isolates impact of local
service interruption caused by catastrophic events in an area to maintain
service continuity. |
|
(xix) Technical and other support. Support for
registrars and for Internet users and registrants. Describe technical help
systems, personnel accessibility, web-based, telephone and other support
services to be offered, time availability of support and
language-availability of support. Ability to support new and emerging
technologies.
|
Delivering world-class technical support is the result of experienced and
trained staff and a commitment to respecting the customer. The point of contact
for support says everything about a company - it can even be considered by the
customer to be the heart of the business. Securing the right technical/customer
support organization to service .net registrars is critical. Amid elevated
stakes for domain name registrants, registrars must be able to rely on
registries. VeriSign provides our customers with technical and other support
services built on stable and scalable systems, streamlined processes, and
highly trained and knowledgeable service staff, resulting in the speedy
resolution of a registrar's service issues.
VeriSign advantage:
+ VeriSign's commitment and investment in world-class facilities and services
have been essential to delivering reliable support services to registrars,
Internet users, and registrants.
This section describes VeriSign's commitment to providing high quality technical
and other support capabilities to all constituents, including:
* Technical help systems, personnel accessibility, web-based, telephone, and
other support services. VeriSign provides world-class help services via
the web, email, and telephone. We provide outreach services to
registrars and registrants to educate them about products and services before
they launch to better serve their needs.
* Time availability of support and language-availability of support. VeriSign's
support staff is reachable by telephone and email every day, all the time. Many
members of our support team are multilingual. We provide real-time translation
services using the AT&T Language Line for languages not spoken by our staff.
* Ability to support new and emerging technologies. Our support team regularly
surveys customers to ensure that our methods and practices meet their needs as
they build new and innovative systems. We perform ongoing reviews to ensure that
our own systems use the most effective, up-to-date solutions. We train our
support staff in the use of new tools and technologies. VeriSign's ATLAS
platform provides protocol-agnostic features that allow us to deploy new systems
using existing technology, reducing the need for support training on new systems.
Technical Help Systems
VeriSign provides comprehensive, leading-edge technical help systems. Our
customers benefit from the following systems:
Knowledgebase. Registrars, Internet users, and registrants are able to search a
database of solutions online at their convenience. This knowledgebase is
maintained to provide accurate, thorough support for .net registrars.
Customer Relationship Management (CRM) System. Advanced, commercial grade CRM
software tracks all customer contacts and registrar incidents. The CRM solution
provides registrars with the following capabilities:
* Access support through multiple access channels (i.e., telephone, email,
website, knowledgebase, online information, and publications)
* Access an online database of solutions to various issues
* Submit service requests online at their convenience via the VeriSign website
* Monitor and track online, real-time progress on their service request as well
as review historical service requests
* Provide real-time feedback on the service and timeliness of response to their
service request
The CRM solution provides customer support and technical support teams with the ability
to conduct the following functions:
* Organize customer contact information
* Organize company information
* Organize/maintain customer contracts
* Capture registrar problem data
* Respond quickly to customer inquiries
* Categorize requests by access method (email/telephone)
* Create case types (e.g., billing problem, technical problem, etc.)
* Dispatch customer requests to other departments for tier 2, 3, and 4 support
* Assign a level of priority to each customer request
* Provide real-time status/update on service requests
* Create a database of solutions to various issues
* Generate detail and summary operational reports
When a service request is submitted (whether by registrar online or by VeriSign
support professionals), a service request number is automatically assigned and
provided to the registrar, which enables them to monitor and track resolution
progress and timeliness online. For issues requiring technical expertise and/or
product management specific assistance, which are beyond what the Customer
Service Center (CSC) can provide, service requests are dispatched to Tier 2, 3,
or 4 support teams who use the same CRM system. This includes: our NOC, systems
administrators, database administrators, networking support, software engineers,
QA engineers, and product/business management teams. Once the registrar's
problem is resolved, the service request is closed. The registrar then has the
opportunity to provide feedback via an online survey regarding the service and
timeliness of response.
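The service request lifecycle described above can be sketched as a small record type. This is a hypothetical illustration, not the commercial CRM system's data model: the `ServiceRequest` class, its field names, and the treatment of the CSC as Tier 1 are assumptions made for the example.

```python
import itertools
from dataclasses import dataclass, field

# Assumption: request numbers are sequential integers assigned on creation.
_request_numbers = itertools.count(1)


@dataclass
class ServiceRequest:
    registrar: str
    summary: str
    number: int = field(default_factory=lambda: next(_request_numbers))
    tier: int = 1            # assumption: Tier 1 is the CSC
    status: str = "open"
    history: list = field(default_factory=list)  # audit trail of actions

    def escalate(self, tier):
        """Dispatch an open request to a Tier 2, 3, or 4 support team."""
        if self.status != "open":
            raise ValueError("only open requests can be escalated")
        if not 2 <= tier <= 4:
            raise ValueError("escalation targets are Tier 2, 3, or 4")
        self.tier = tier
        self.history.append(f"dispatched to tier {tier}")

    def close(self):
        """Close the request once the registrar's problem is resolved."""
        self.status = "closed"
        self.history.append("closed")
```

Because every action is appended to `history`, the record doubles as the kind of audit trail a registrar could review when tracking progress on a request.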
Daily operational reports are generated showing the status and duration of all
open issues, along with the support team currently working each service
request. The system provides a permanent audit trail of actions and statuses
that can be accessed in real-time by registry business, technical, and customer
service staff. The overall service metrics described in Table 5(b)xix-1 are
monitored to ensure customers receive world-class service. Effective Q2 2005,
CSC will have the ability to measure first contact resolution.
Registrar Client. VeriSign provides registrars with a web-based management tool
that allows registrars to be self-sufficient for many domain name or name server
issues. The features of this tool include:
* Domain name maintenance: modify, delete, renew, transfer
* Name server maintenance: create, modify, delete
* Registrar account maintenance: add/delete contacts, check account balance.
Personnel Accessibility
The VeriSign in-house 24x7x365 CSC can instantly escalate technical issues to
the appropriate resource if an issue cannot be immediately resolved.
VeriSign strives for continuous improvement in this process and has
implemented a series of feedback mechanisms to assist in identifying areas for
improvement. Feedback is sought from registrars on an ongoing basis, as well as
through a more formal annual survey. The survey measures overall registrar
satisfaction and is administered online. VeriSign customer service has
maintained at least an 8 rating for overall satisfaction on a scale of 10 (with
8 being defined as world-class service) since the survey started in 2000. In
addition, VeriSign is currently the only registry that conducts annual
third-party audits of internal controls and customer service methods to certify
that registrars gain equal access to VeriSign regardless of size or volume.
Web-based Support
VeriSign offers access to a website that provides important information and
tools for registrars, which is accessible through a user identification (ID) and
password assigned by the CSC as part of the certification process. Through this
secured access, registrars have tools that provide access to the registry
database, enabling registrars to perform account updates and domain name and name
server maintenance. As requested by the registrars, the customer support team is
able to access the registrar's account and update the same functions, as
directed. VeriSign finance staff is also able to add or transfer funds within
the registrar accounts through these same tools. Processes are in place to audit
and log all modifications.
The following list covers the types of material available to the registrars on
the VeriSign website:
* How to Become a Registrar. Step-by-step process informing potential registrars
of all requirements, including online forms and agreements that can be easily
downloaded for quick execution.
* Registrar Reference Manual. A one-source reference manual that covers various
registry-registrar protocol usages and miscellaneous registry functions related
to second-level .net domain names.
* Registrar Tool User Guide. Provides registrars specific information about
transactions with the registry, including domain name administration, name
server administration, registrar administration, and reports.
* Software Developer Kits (SDKs). Registrars have online access to SDKs in C and
Java that assist them with technical integration and certification to the .net
registry. SDKs contain the source code required to perform RRP and EPP commands.
Customer service is available 24x7x365 to assist registrars who have difficulty
downloading SDKs. The SDKs available online include: EPP software development
kits, IDN software development kits, EPP example instance and schema files, and
RRP software development kits.
* Registrar Reference Manual. A nontechnical guide to using EPP and
understanding the business rules applicable to domain and name server registrations
* OT&E Acceptance Criteria:
- Contains the test cases required to be certified as a registrar
- OT&E environment allows registrars to test their system before taking the
formal exam.
* Registrar Listing and Contact Information. Lists all registrars and their
contact information to facilitate ease of communication
* Performance Measurements. Online charts that display registry operational metrics
* Registrar Mailing List. This list is an opt-in service that provides a forum
for registrars and VeriSign to discuss issues related to registry/registrar
operations
* Registrars Email List Archives. A repository of all emails sent to the
registrars mailing list.
* Frequently Asked Questions (FAQs). Lists the most popular questions and
answers received across a wide range of topics, including ramp-up process, OT&E
processes, production environment, domain status, transfers, Whois, billing,
general support inquiries, etc.
* Calendar of Scheduled Maintenances. Lists upcoming maintenance services to be
performed over the next 60 days.
* News Bulletins and Archives of all Registrar Communications/Advisories:
- These announcements focus on customer feedback, bugs, changes in policy, etc.
- History of all communications with registrars.
* Links to Registrar Tool. Direct link to the web-based Registrar Tool
(described below).
Telephone Support
Registrars have the ability to contact VeriSign customer/technical support by
telephone 24x7x365. VeriSign uses standard telephone technology, including
automatic call distribution, voice mail, and paging. Phone calls are routed to
the most appropriate support professional for immediate assistance. Phone
service is measured and key metrics are tracked to ensure world-class service.
This year, registrars continued to rank customer service with high marks in
phone support. On a scale of 10 with 8 identified as "world-class" service, our
support team received the following rankings for telephone support:
Time to reach customer service rep: 8.41
Courtesy and professionalism of customer service rep: 8.67
Other Support
Other support available to registrars includes the following:
Email Support. Registrars have the ability to contact VeriSign
customer/technical support by email 24x7x365. Customers can either send an email
directly to our support email address or can opt to use the web-based service
requests system. Either way, all customer emails are immediately received by the
customer/technical support teams. As with phone calls, emails are routed to the
most appropriate support professional for immediate assistance. As in previous
customer surveys, registrars continue to give customer service high
marks for email support. On a scale of 10 with 8 being identified as
"world-class" service, our support team received the following rankings for
email support:
Timely response to request: 8.6
Quality of query resolution: 8.0
Industry Events/Conferences. VeriSign actively participates in many other
industry events, such as ICANN meetings, DNSSEC consortium meetings, IBC
Euroforum's ISP conferences, IDN workshops, and the Registry constituency meetings.
Seminar/Webinar Presentations. VeriSign hosts events that bring together
industry leaders and influencers, both technical and business, in an open forum
to discuss the future of the Internet for specific regions and the world.
Events occur annually in the U.S., Europe, Asia, and South America.
Additionally, VeriSign periodically hosts Webinars for registrars to facilitate
communication about upcoming launches of ICANN consensus policies (e.g.,
transfer dispute resolution policy), share results of market research, and
announce special marketing programs available to all registrars. Multiple
Webinar sessions are offered to accommodate our customers' geographic locations
and time zones.
Marketing Tools for Registrars. VeriSign has developed marketing tools to help
registrars and resellers improve renewals and increase domain name sales. Use of
these tools is entirely optional. All tools and information can be made
available by registrars to resellers. Tools include:
* .net Logos for Registrar Website. A .net logo that easily notifies consumers
that .net domain names are available for registration on registrar websites.
* Online Banner Ads. Online information available to registrars that includes:
"Suggestions for Banner Placements," "Directions for Downloading Banners," and
"Internationalized Domain Name Banners," which are available in English,
Japanese, Korean, Simplified Chinese, and Traditional Chinese.
* Domain Name Renewal Tools:
- Best Practices Guide for Domain Name Renewals. A guide that provides
recommendations for registrars toward building a comprehensive and effective
renewal program, or increasing the effectiveness of current renewal programs. It
includes tips for renewal programs and recommended schedule for addressing renewals.
- Sample Renewal Messages. VeriSign has prepared a set of email templates (HTML
and TXT) for registrars' use in renewal campaigns. In addition, we offer
prepared IDN-specific messaging that highlights the value proposition for IDNs
as related to renewals. These templates are available in English, Simplified
Chinese, and Korean.
- Marketing Reports on Expiring Names. VeriSign generates reports for
registrars that provide a list of domain names due to expire within 30, 60, and
90 days.
* Retail Demo Tools. VeriSign has developed these tools and resources to help
registrars integrate new products and services with existing order flows.
* TLD Zone Files. VeriSign maintains files that contain data describing a portion
of the domain name space for specific TLDs.
* Newsletters. VeriSign publishes a monthly industry newsletter available to all
registrars on an opt-in free subscription basis. Its primary audience is
ICANN-accredited registrars and selected domain name industry professionals.
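The expiring-names report listed among the renewal tools above can be approximated with a simple bucketing routine. This is a sketch under assumptions: the function name and input shape are hypothetical, and each name is placed only in the smallest window that contains it, whereas the actual report may list names cumulatively.

```python
from datetime import date


def bucket_expiring(names, today, windows=(30, 60, 90)):
    """Group domain names by the smallest window (in days) covering their expiry.

    `names` maps a domain name to its expiry date. Names that have already
    expired, or that expire beyond the largest window, are left out.
    """
    report = {window: [] for window in windows}
    for name, expiry in names.items():
        days_left = (expiry - today).days
        if days_left < 0:
            continue
        for window in sorted(windows):
            if days_left <= window:
                report[window].append(name)
                break
    return report
```

A registrar could feed the 30-day bucket into the most urgent renewal message and the 90-day bucket into an early reminder.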
Time Availability of Support
VeriSign staffs the CSC 24x7x365 to handle the variety of registrar issues that
can surface in support of day-to-day operations, thereby enabling support to
registrars in their own time zones. These issues normally focus on connectivity
problems, registration problems, reporting problems, domain name or name server
maintenance, and billing issues. Onsite coverage allows the support team to
escalate issues immediately to either business or technical teams on behalf of
the registrars. Onsite support also assists with internal company escalations.
If any issues require communication with the registrars, the CSC can communicate
the problem to the registrars and handle questions or concerns from the customer
base. Refer to Figure 5(b)xix-1 for VeriSign's support and escalation process.
Language Availability of Support
The CSC is capable of providing assistance in English, Spanish, German, and
Farsi. In addition to using multilingual support staff, VeriSign uses a
real-time translation service, AT&T Language Line, to assist with telephone and
document translation needs. AT&T Language Line is a 24x7x365 operation that
offers instant access to interpreters versed in more than 150 languages, which
represent 98 percent of all language needs. If a registrar is having a problem,
the CSC can instantly initiate a conference call to an interpreter and collect
information regarding the registrar's problem. The features and benefits of this
service are described in Table 5(b)xix-2.
Occasionally, the CSC receives foreign-language emails. If the language is not
covered by the multilingual CSC support staff, an online translation tool is
used to interpret the email. Telephone calls remain the primary use for AT&T
Language Line.
Geographic and Time Coverage of Services Offered
VeriSign supports a global network of registrars with diverse technical and
linguistic capabilities. In an effort to provide the best possible support to
all registrars, VeriSign has implemented a variety of tools and policies to
support registrars wherever they might be located. We have recently opened
additional offices in Asia to supplement current support available from offices
in Europe, North America, and South America.
Ability to Support New and Emerging Technologies
VeriSign will continue to encourage customers and the Internet community to
build solutions using new and emerging technologies. Customer/technical support
teams recently upgraded support tools to a leading-edge customer relationship
management solution. We will leverage this application and its future
enhancements to continue providing world-class service to our customers. We
will also continue to survey our customers regarding their support needs and
deploy appropriate technologies that meet those needs.
Recently, VeriSign was selected as a Top 100 Innovator by Red Herring. This
prestigious award recognized our development of ATLAS, a sophisticated software
platform designed to scale the Internet's core infrastructure, the DNS, and
support the convergence of voice and data networks. ATLAS is a
protocol-agnostic convergence platform designed to connect any network, whether
data or voice-based. It has been built with future growth in mind: it is not
just a bigger "box," but has built-in intelligence that allows the system to
scale to more than 400 billion queries per day and 24,000 updates per second.
ATLAS has also proven highly reliable, with 100 percent data integrity and
service availability since the service was launched in 2002. Because ATLAS is a
technology-neutral convergence platform that can connect to any network, it is
able to support new protocols, facilitating the implementation of new services.
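To put the ATLAS capacity figures above on a common time base, the short sketch
below converts the stated daily query capacity into an average per-second rate
and the stated per-second update capacity into a daily total. It assumes load is
spread evenly across the day, which is a simplification (real peak rates would
be higher than the average), and the function and variable names are
illustrative only, not part of any VeriSign system.

```python
# Illustrative arithmetic only: converts the capacity figures quoted above
# between per-day and per-second rates, assuming an evenly spread load.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(rate_per_day: float) -> float:
    """Average per-second rate for a given per-day total."""
    return rate_per_day / SECONDS_PER_DAY


def per_day(rate_per_second: float) -> float:
    """Per-day total for a sustained per-second rate."""
    return rate_per_second * SECONDS_PER_DAY


QUERIES_PER_DAY = 400e9        # "more than 400 billion queries per day"
UPDATES_PER_SECOND = 24_000    # "24,000 updates per second"

avg_qps = per_second(QUERIES_PER_DAY)
updates_daily = per_day(UPDATES_PER_SECOND)

print(f"{avg_qps:,.0f} queries per second (average)")
print(f"{updates_daily:,.0f} updates per day")
```

Under these assumptions, the stated query capacity averages roughly 4.6 million
queries per second, and the stated update rate, if sustained, corresponds to
about 2.07 billion updates per day.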
Conclusion
VeriSign is committed to continuing to offer high-quality technical and customer
service support. Our well-regarded system has proven effective and efficient. We
are prepared to continue providing this support throughout the next term of a
.net Agreement and for all newly emerging technologies. |