Registry Advantage Test Plan
The purpose of this paper is to describe the automated and manual tests that were conducted to validate the functionality, performance and scalability of the Registry Advantage infrastructure and applications.
To maximize the validity of the tests, the full .org dataset was loaded into the registry database. In addition, the hardware and network configurations used in the tests were those that are already in place at Registry Advantage’s primary data center and are proposed for operation of the .org registry (for details, see C17.1). The test methodology utilized strict definitions of test criteria and expected results as a priori inputs to each test case. The expected test results were established based on several triangulated inputs wherever possible, e.g., ICANN advisories, SnapNames’ State of the Domain reports, Registry Advantage’s experience as a registry outsourcing provider, and public information supplied by VeriSign.
The SRS tests included queries of all the major types (adds, checks, infos, deletes) combined into test sets that were representative of ‘typical’ loads as well as atypical ‘add storm’ loads. The DNS tests covered a range of .org query mixes (successful queries, failed queries, malformed packets, etc.) conducted in a random order to determine maximum queries per second (q/s) and round trip times (rtt). The Whois tests were performed using a similar methodology to determine maximum queries per second and round trip times.
The following key test results were achieved:
In addition, this document provides the results of high availability tests of the hardware, database, network and data interconnects.
The purpose of this paper is to describe how the Registry Advantage infrastructure and applications were validated by automated and manual testing. The definitions and procedures in this document were used for conducting and analyzing the validation process.
A boundary test is one that approaches a known limit. As the limit is approached, behavior is well known. At the limit, and beyond, behavior may be undefined. This type of test is designed to ensure that a system behaves normally at least until the known limits are reached. This is sometimes referred to as "limit testing".
A capacity test is one that determines the capacity of a system or service by measuring its ability to process data in terms of total volumes in a given time quantum. An example of this may be the total number of DNS queries a system can process per second. Resource sizing is also a possible metric. For example, the total number of records a database can store.
A compatibility test is one that determines whether or not a system is compatible with another system, an existing implementation of the same system, a published API, or a known protocol.
A fault test is one that determines the behavior of a system during a fault condition. These fault conditions may be intentionally introduced hardware failures (or simulated failures), as well as broken interconnects and inoperable services.
A regression test is conducted in such a way that all functionality for a system is tested. This differs from "unit testing" or "modular testing" in which only a specific system module, or set of functionality, is tested. The significance of regression testing should not be underestimated. A simple change to a system may pass its unit test, but have side effects on other parts of the system that will only be found with regression testing. The regression testing conducted by Registry Advantage includes the sum of all individual tests and test cases described in this document.
For each area of testing, the validation criteria were taken from specific known behavioral patterns, and from industry norms where specific data was not known in advance. These criteria were used as the expected results documented for that test. If the actual results did not fall within the accepted results set, the test failed and remedial action was taken. If the actual results were within the expected results set, the test passed. In all cases the validation criteria were documented for each test in advance.
Validation for a test suite was given only when all tests for that suite passed successfully in the same testing cycle.
Data Points: An e-mail from VeriSign’s Scott Hollenbeck to the ietf-whois mailing list in January 2001 indicated that the VeriSign registry’s Whois servers performed approximately 30 million queries per day. Dividing the queries equally over the course of a day yields a rate of 347 queries per second. This rate applies to total queries done for all .com, .net, and .org (CNO) domain names.
Typical Rate: Registry Advantage presumed that the percentage of queries related to .org domain names would be roughly equivalent to the portion of .org names in the CNO database, or approximately 10%. This analysis suggests a typical Whois query rate of approximately 35 queries per second for .org names.
Peak Rate: Based on its experience as a registry operator, as well as on the experience of its parent company, Register.com, which provides Whois service for over two million domain names, Registry Advantage has determined that peak Whois query rates will be as high as ten times the average rate. Consequently, the peak Whois query rate for .org is estimated to be approximately 350 queries per second.
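The arithmetic behind these figures is straightforward; the sketch below simply reproduces it. The 10% .org share and the 10x peak multiplier are the assumptions stated above, not independently measured values:

```python
# Reproduces the Whois rate estimates described above.
# Assumptions (from the text): 10% of CNO queries are .org; peak = 10x typical.
CNO_QUERIES_PER_DAY = 30_000_000        # VeriSign figure, January 2001
SECONDS_PER_DAY = 24 * 60 * 60

cno_rate = CNO_QUERIES_PER_DAY / SECONDS_PER_DAY  # ~347 q/s across .com/.net/.org
org_typical = cno_rate * 0.10                     # ~35 q/s typical for .org
org_peak = org_typical * 10                       # ~347 q/s, quoted above as ~350

print(round(cno_rate), round(org_typical), round(org_peak))  # → 347 35 347
```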
Data Points: A September 2000 ICANN statement on GTLD registry best practices indicated that the “A” root server, which was also acting as a GTLD server at the time, handled approximately 5000 DNS queries per second, with peaks as high as 8000 queries per second.  Similarly, a presentation by David Conrad of Nominum to the ITU in January 2001, indicated that moving GTLD data from the root servers to independent name servers resulted in a shift of approximately 5000 queries per second to the GTLD servers. 
Typical Rate: Assuming all 13 root servers experienced typical loads to the “A” root server, Registry Advantage calculated that the total number of DNS queries per second to the root server constellation was approximately 65,000 on average. Registry Advantage assumed that growth in DNS queries would roughly correlate to growth in the total number of hosts on the Internet. Telcordia Technologies’ Netsizer tool indicates that in September 2000, there were 91.2 million hosts, growing to 189.8 million hosts in May 2002, a ratio of roughly two to one. Consequently, in order to account for growth in the number of DNS queries since September 2000, Registry Advantage doubled this number, yielding a typical requirement of 130,000 queries per second across the entire DNS constellation. Once again, the assumption was made that approximately 10% of the typical CNO traffic would apply to .org queries, resulting in a system-wide typical requirement of 13,000 queries per second. This requirement is spread over a total of eight locations, resulting in a typical requirement of 1625 queries per second at each site.
Maximum Rate: From the ICANN data point, Registry Advantage knew that the maximum rate of transactions experienced by the “A” server was approximately 60% greater than the typical rate, resulting in a value of 2600 queries per second. However, because it is possible that traffic may not be evenly distributed across all sites, Registry Advantage nearly doubled the expected maximum rate, to 5000 queries per second, anticipating that this would be the peak load at the site with the greatest amount of traffic.
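The DNS sizing steps can be reproduced as follows. The 10% .org share, the doubling for host growth, and the near-doubling for uneven site distribution are the assumptions described above:

```python
# DNS query-rate requirements derived as in the text.
ROOT_SERVERS = 13
A_ROOT_RATE = 5_000      # q/s at the "A" root server (ICANN, September 2000)
HOST_GROWTH = 2          # Netsizer host count roughly doubled by May 2002
ORG_SHARE = 0.10         # assumed .org share of CNO traffic
SITES = 8

constellation = ROOT_SERVERS * A_ROOT_RATE * HOST_GROWTH  # 130,000 q/s total
org_total = constellation * ORG_SHARE                     # 13,000 q/s for .org
per_site_typical = org_total / SITES                      # 1,625 q/s per site
per_site_max = per_site_typical * 1.6   # A-root peak was ~60% above typical
per_site_design = 5_000                 # nearly doubled for uneven distribution

print(round(per_site_typical), round(per_site_max))  # → 1625 2600
```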
Data Points: Due to a lack of sufficiently granular data, creating typical data sets for testing purposes, as well as expected results, was challenging. Registry Advantage correlated numerous data points in order to make an estimate. The total number of domain names in the .org database, approximately 2,700,000, was used as a baseline against which the number of domain registration events was calculated. VeriSign’s presentation to the North American Network Operators’ Group (NANOG) in February 2002 provided raw data for the number of CNO failed write and check domain events as of December 2001 at 420 million and 3.6 billion, respectively. Other data sources were used to cross-check various assumptions and generate the full “typical load” test data set. These included various SnapNames State of the Domain Reports (SOTD), as well as ICANN’s Second Advisory Concerning Equitable Allocation of Shared Registration System Resources.
Typical Rate: For the purposes of this testing, add and renew commands were considered to be identical as they have similar impacts on the registry systems. Registry Advantage derived the typical number of add commands based on the total number of .org domain names. Although the total number of .org domain names is slowly declining, the rate was considered to be low enough that for the purposes of determining these rates, Registry Advantage assumed that all domains would be either renewed or re-registered in the month that they expired. An additional assumption was made that the renewal dates of these domains were evenly spread throughout the calendar year. On this basis, it was determined that approximately 225,000 domains were re-registered or renewed each month. Registry Advantage further assumed that registrations would only occur on one of approximately 20 business days per month, and that all registration events would occur within a twelve hour period per day. These assumptions yield a rate of 938 registrations per hour, or approximately one every four seconds.
To determine the typical rate of check commands, Registry Advantage noted that the “add storm” events that began in approximately June 2001 seemed to be responsible for roughly two-thirds of all check commands, or 2.4 billion of the monthly total, leaving 1.2 billion check commands as part of the typical usage pattern. Once again, only about 10% of the total CNO usage can be attributed to .org, resulting in a total volume of 120 million checks associated with .org names. According to the ICANN advisory cited above, add storm activity was concentrated within a four hour window each day. Registry Advantage spread the 120 million check commands across the remaining 20 hours of each day, resulting in an hourly rate of 200,000 check commands, or 56 per second.
In a similar vein, Registry Advantage analyzed the data to determine the expected typical number of changes, deletes and info transactions. Registry Advantage derived the number of change transactions based on its knowledge that the ratio of adds to changes in its ccTLD registries is approximately 2:1. The expected deletes were based on the turnover and renewal rates typical with com/net/org: approximately 50% of adds and renewals. The number of info queries was based on the data from VeriSign’s North American Network Operator’s Group presentation. Registry Advantage distributed info commands in the same manner as the check commands, deriving expected typical rates of 8 per second.
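The typical-rate derivations above reduce to a few lines of arithmetic; this sketch reproduces the add/renew and check calculations. All inputs are the assumptions stated in the text:

```python
# Typical SRS transaction rates, derived as in the text.
ORG_DOMAINS = 2_700_000
renewals_per_month = ORG_DOMAINS / 12      # even spread of expiry dates: 225,000
per_hour = renewals_per_month / 20 / 12    # 20 business days, 12-hour window
print(round(per_hour))                     # → 938 adds/renews per hour

CNO_CHECKS_PER_MONTH = 3_600_000_000       # NANOG figure, December 2001
typical_checks = CNO_CHECKS_PER_MONTH / 3  # two-thirds attributed to add storms
org_checks = typical_checks * 0.10         # 10% .org share: 120M per month
checks_per_sec = org_checks / 30 / 20 / 3600  # spread over 20 hours of each day
print(round(checks_per_sec))               # → 56 checks/s
```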
Peak Rates: Peak rates of domain creation were assumed to occur in situations in which registrars were performing batch processing of registration events. Consequently, the peak rate is unlikely to be correlated to externally observable trends. Based on its experience as a registry operator and previous experience of various staff members working at its parent company, Register.com, Registry Advantage estimated that peak rates during these batched events would be unlikely to exceed fifty registration events per second.
Peak check command rates were derived by spreading 240 million check commands across the remaining four hours of each day, resulting in a rate of 2,000,000 checks per hour, or approximately 560 checks per second. Registry Advantage believes that this estimate may significantly overstate the peak requirement, as add storm activity is likely to relate disproportionately to .com domains, which are considered more valuable in the secondary market.
The peak expected changes and deletes were considered to be potentially similar to the peak amounts of add transactions. For the info transactions, based on the VeriSign North American Network Operators’ Group data, the peak number of info commands is expected to be approximately 25 per second.
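The peak check rate follows the same pattern, compressing the add-storm share of checks into the four-hour daily window described in the ICANN advisory:

```python
# Peak check rate: add-storm checks compressed into a 4-hour daily window.
storm_checks_cno = 2_400_000_000           # monthly CNO checks from add storms
storm_checks_org = storm_checks_cno * 0.10 # 240M/month under the 10% .org assumption
peak_per_hour = storm_checks_org / 30 / 4  # 30 days, 4-hour window per day
peak_per_sec = peak_per_hour / 3600

print(round(peak_per_hour), round(peak_per_sec))  # → 2000000 556 (quoted as ~560)
```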
Each of these areas will be tested with its own test suite:
Registry Advantage has developed a proprietary DNS server (RA-DNS) designed specifically for extended capacities. The testing done against this server included a dataset of 5 ccTLDs, as well as the most recently available .ORG dataset. The number of queries per second and the average round trip time per query (measured in milliseconds) were used as the metrics for these tests. The expected results were taken from leading industry service metrics wherever possible; many of these metrics were not available prior to this test, however, and so were extrapolated from the data we were able to obtain.
Each test was performed with 400 query clients to generate the load.
The MAX success test was performed by having every domain in .ORG queried in random order, a total of four times each, across all 400 clients. The remaining tests used both .ORG domain data and manually created data to produce the desired query mixes (such as failed queries, where the domains are not present in the zone, and malformed packets).
The following tests were performed:
In the course of our RA-DNS capacity testing, we were unable to fully stress the server because we did not have sufficient load generator capacity. These numbers so far exceed any of the projected requirements, however, that they are reported as a client-constrained maximum result set.
The SRS application cluster was tested for capacity in a number of ways. The metrics used to measure capacity for these tests were taken from publicly available data from ICANN, SnapNames SOTD reports, our own experience as a registrar and registry outsourcing provider, and public information supplied by VeriSign to NANOG. From NANOG and SOTD data for December 2001, we extrapolated a typical hourly load consisting of approximately 0.03% adds, 10.14% failed creates, 87.15% successful checks, 2.30% infos, 0.12% changes, and 0.06% deletes, and scaled these for today's volumes. In addition, we considered the impact of an "add storm" as described in the ICANN advisory from August 10, 2001, on the performance of the typical load base case. The add storm we generated was more than double the volume reported in the ICANN advisory. All of these figures are in excess of current transaction trends.
We measured the number of successful adds, checks, changes, deletes, and info commands on a test cluster of 3 servers, and observed linear scaling. We then tested the MAX performance for what is overwhelmingly the most common command – check. Last, we tested our capacity to support client connections by testing the maximum number of connections per SRS cluster member, with linear scalability. These expected results are taken from the public information mentioned above. We also project that even under an add storm of outrageous proportion, due to the linear scaling of our cluster, we would have no problem managing the additional load.
These are the tests we performed:
These tests were performed against a cluster consisting of three SRS cluster members, with linear scaling in most cases. The actual complement of SRS cluster members in production is 2N, or six cluster members. Therefore, in the general case where we are at full capacity, performance will be twice the actual results.
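The projection from test cluster to production complement is a simple linear scaling, which holds only under the linear-scaling result reported above; the measured rate passed in below is a hypothetical figure for illustration:

```python
def projected_capacity(measured_qps: float,
                       test_members: int = 3,
                       prod_members: int = 6) -> float:
    """Scale a test-cluster measurement to the production complement,
    assuming the linear scaling observed in testing holds."""
    return measured_qps * prod_members / test_members

# Hypothetical measured rate of 1000 q/s on the 3-member test cluster
# projects to 2000 q/s on the 6-member (2N) production cluster.
print(projected_capacity(1000.0))  # → 2000.0
```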
The Whois application cluster capacity was tested for MAX queries per second under an extreme load, while tracking the round trip times for these queries. It was then subjected to a similar load where 50% of the queries were for objects not present in the loaded data set. The expected results for this test come from internal experience and publicly available information on the VeriSign Whois service supporting .COM, .NET, and .ORG (concurrently). The objective of the test was to achieve at least the peak capacity under atypical peak loads.
The tests we performed were:
The Whois cluster members scale nearly linearly in our load balanced configuration. To support the peak query requirements for the entire Whois service of an estimated 350 queries per second, the standard cluster size of three was found to significantly exceed the performance requirements, with cluster performance exceeding 815 queries per second. A 2N cluster of six Whois servers with this sizing will be deployed at both the primary and secondary sites to support the peak load requirements.
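Under the near-linear scaling observed, the reported cluster figure implies the following per-member rate and headroom over the peak requirement:

```python
# Whois cluster sizing from the figures reported above.
CLUSTER_QPS = 815     # measured rate of the standard 3-member cluster
PEAK_REQUIRED = 350   # estimated peak .org Whois query rate

per_member = CLUSTER_QPS / 3            # ~272 q/s per member (near-linear scaling)
headroom = CLUSTER_QPS / PEAK_REQUIRED  # ~2.3x over the peak requirement

print(round(per_member), round(headroom, 1))  # → 272 2.3
```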
The DNS functionality testing consisted of testing the RA-DNS for appropriate response codes and data in all sections of the DNS response packet for various types of requests. This same test was performed against a BIND server running on identical hardware with identical data as a reference case, in addition to referencing the RFCs. We also tested RA-DNS with a variety of DNS client resolver platforms to ensure compatibility with each. The tests included:
In every case the RA-DNS response was correct. In all of the cases we tested, the performance was markedly faster than that of the reference BIND server under the same fault conditions.
Registry Advantage currently supports two SRS protocols: EPP v06/04 and SRP v1.1. Each of these has a suite of automated tests associated with it that validates proper functionality.
The SRP tests consist of the following:
EPP is very similar to SRP at a high level, but differs considerably in the details of how it functions. The following tests were conducted against our EPP implementation:
These tests are rigorous and very comprehensive. They are automated and so can be run at will against a target SRS cluster for validation.
The Whois protocol functionality testing consisted of validating the service against the RFC 954 definition of port 43 Whois, as well as Registry Advantage's own strict formatting requirements. The following tests were performed:
As with the other functionality tests, these tests are fully automated and can be re-run at will.
The Oracle Cluster consists of a pair of Sun Enterprise 6500 servers configured with identical hardware and interconnects. These host the active and standby Oracle instances, synchronized at the application level. The metrics used to assess the high availability of the Oracle database infrastructure were the amount of time it took to recover from a failure, expressed as our Recovery Time Objective, as well as our ability to recover the database without data loss, expressed as our Recovery Point Objective.
The following faults were introduced to implement this test:
The network and data interconnect testing consisted of introducing faults to determine the behavior of the network and data architecture under failure conditions. The expected results for each test were taken from either documented behavior for the device or interconnect and from industry standard expectations.
The following tests were performed:
The application cluster redundancy was tested for site independence and DNS service availability. Faults were introduced to simulate various types of site failures and the response of the overall collection of satellites was measured for each test.
The following list of tests was performed:
The combination of boundary, capacity, compatibility, and fault testing represents the complete set of regression tests run by Registry Advantage to validate its applications and infrastructure. These tests demonstrate the ability to operate within a broad range of operational circumstances with expected results, at a predictable level of performance and stability.