C16. The third section of the .org Proposal is a description
of your technical plan. This section must include a comprehensive, professional-quality
technical plan that provides a full description of the proposed technical
solution for transitioning and operating all aspects of the Registry Function.
The topics listed below are representative of the type of subjects that
will be covered in the technical plan section of the .org Proposal.
C17. Technical plan for performing the Registry Function.
This should present a comprehensive technical plan for performing the
Registry Function. In addition to providing basic information concerning
the proposed technical solution (with appropriate diagrams), this section
offers the applicant an opportunity to demonstrate that it has carefully
analyzed the technical requirements for performing the Registry Function.
Factors that should be addressed in the technical plan include:
This section will discuss all of the technical aspects of implementing the .org registry. It will provide a complete description of the technical solution, implementation, operation, and maintenance of the .org registry.
C17.1. General description of proposed facilities
and systems. Address all locations of systems. Provide diagrams of all
of the systems operating at each location. Address the specific types
of systems being used, their capacity, and their interoperability, general
availability, and level of security. Describe buildings, hardware, software
systems, environmental equipment, Internet connectivity, etc.
Due to the number of distributed users and the critical nature of the .ORG Registry, the facilities and systems used will be state-of-the-art, well-tested and highly available. The deployment strategy has four main components:
- Geographical diversity - Strategically place servers throughout the world. This will withstand "acts of God", war or terrorism.
- Geographical distribution - Place servers close to those who will use them and near the backbone. This is for performance.
- Provider diversity - Use a variety of vendors and facilities. This will create redundancy for network availability and stability in the event a provider becomes insolvent or goes out of business.
- Redundancy - Use stateless replication in the system architecture to provide for scalability, fault tolerance and stability.
The .ORG Registry will be housed in seven (7) world-class data centers as shown in Figure 1. Each data center will have at least the following:
- Physical security: badge readers with PIN, security cameras, escorted access and guards
- Power: power distribution units with UPS capability for 15 minutes of uptime and generators for indefinite power availability
- Bandwidth: redundant multi-homed burstable bandwidth. 6Mb burstable at Operational Data Centers and 3Mb burstable at Regional Data Centers
- Environment: Climate control including HVAC, humidity and fire suppression
- Support: 24x7 onsite, remote hands support.
Figure 1 - .ORG Data Centers
There will be two types of data centers: 1) Distributed Services Data Centers ("Regional") that will house distributed real-time gTLD DNS servers and 2) Operational Data Centers that will not only house DNS servers but also the operational systems (e.g., registry database, whois, billing, etc.).
Seattle, WA (Main Operational Data Center)
The main Operational Data Center will be collocated within an Internap data center in the Fisher building in Seattle, WA. It will have redundant DS3 connections for maximum speed and reliability.
This private data center is located within a bank in Wheaton, IL. It will be a Regional Data Center with bandwidth being provided by PSI Net and Genuity.
This collocation data center is provided by Equinix and is located in Ashburn, VA. It will be in an Internap cage with redundant 10Mb throughput.
Houston, TX (Backup Operational Data Center)
A complete replica of all the operational systems will be colocated in the Worldcom data center located in Houston, TX. This redundant data center will act as a "cold" spare system to be brought online in case of catastrophic failure. In that event, the complete system can be switched over in less than 15 minutes, and it can be switched back to the main data center in less than 5 minutes. The "cold" spare system will be protected by a locked firewall (no production traffic may pass), and all production code pushes will be made to both the Main Operational Data Center and the Backup Operational Data Center.
This is a Worldcom collocation data center and will act as a Regional Data Center.
This is a Worldcom collocation data center and will act as a Regional Data Center.
This is a Worldcom collocation data center and will act as a Regional Data Center.
The following hardware has been used successfully at eNom for its highly-available registrar services. So, the design and configuration will be replicated for the .ORG Registry implementation.
Note: For fault-tolerance, all Intel-based servers will have RAID1 hot-swappable, mirrored hard drives for their OS.
- F5 Networks BigIP HA+ will be used for load balancing and content switching
- F5 Networks 3DNS will be used for distributed fault tolerance (switching to cold standby)
- Cisco 7206 NPE 300 Routers
- Cisco 2900 and 3500 Series Catalyst switches
- Netscreen 100 Firewalls
- Intel Servers with dual 750 MHz processors and a minimum of 1 GB of RAM running Windows 2000 Advanced Server
- Database servers will be Compaq Proliant 8000 Servers running Windows 2000 Advanced Server and SQL Server 2000.
As shown in the example of Figure 2, the network design in all data centers will have complete redundancy. There will be no single point of failure in any of the network equipment. In addition, servers that duplicate one another's function will be placed on separate switches. Because scalability will be achieved through stateless replication, virtual IPs (provided by the BigIP load balancers) will be used to load balance traffic and to provide for failover. The load balancing hardware will participate in monitoring the overall health of the individual systems and will remove poor or non-performing systems from participation.
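The health-monitoring behavior described above can be sketched roughly as follows. This is a minimal illustration of the idea only, not the BigIP configuration: the class name `VirtualIPPool`, the server names, and the three-failure threshold are all hypothetical.

```python
from itertools import cycle


class VirtualIPPool:
    """Round-robin pool that skips servers marked unhealthy (illustrative only)."""

    def __init__(self, servers, max_failures=3):
        self.servers = list(servers)
        self.max_failures = max_failures  # assumed threshold, not from the proposal
        self.failures = {s: 0 for s in self.servers}
        self._ring = cycle(self.servers)

    def record_check(self, server, ok):
        """Record one health-check result; a success resets the failure count."""
        self.failures[server] = 0 if ok else self.failures[server] + 1

    def healthy(self):
        return [s for s in self.servers if self.failures[s] < self.max_failures]

    def pick(self):
        """Return the next healthy server, or raise if none remain."""
        if not self.healthy():
            raise RuntimeError("no healthy servers behind the virtual IP")
        while True:
            server = next(self._ring)
            if self.failures[server] < self.max_failures:
                return server


pool = VirtualIPPool(["whois-1", "whois-2", "whois-3"])
for _ in range(3):
    pool.record_check("whois-2", ok=False)  # whois-2 fails three checks in a row
picks = [pool.pick() for _ in range(4)]
print(picks)
```

Because the servers behind the virtual IP are stateless replicas, removing one from rotation only reduces capacity; a later successful check puts it back.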
A dedicated /20 subnet from ARIN will be used for the .ORG Registry. BGP load balancing will be utilized between the bandwidth providers.
In Operational Data Centers, all backups and bulk data transfers between systems happen on a separate VPN over a second network card in the servers. This prevents non-critical data from adversely affecting OLTP traffic.
Figure 2 - Sample Redundant Network Design
All of the logical components required to operate the registry are shown in Figure 3 below. Each system component will be load balanced with a virtual IP across multiple servers. This allows for scalability, redundancy, and uninterrupted system upgrades.
Figure 3 - .ORG Registry Infrastructure
The database will be a 4-node cluster with a shared fiber-channel RAID system. The database will be distributed into four functional areas including Registry, Check, WHOIS, and Reports. Each node (or server) in the db cluster will be able to access the shared RAID array of hard drives and will be able to take over the job of any of the other nodes in case of system failure. Other system components that will be housed in the main Operational Data Center include DNS, RRP/EPP, FTP, Web, WHOIS, Backup and archiving, and Utility servers. Each of these components will be described in more detail throughout this proposal.
As mentioned, the gTLD name servers will be hosted in different locations throughout the world. Each location will have several DNS servers load balanced behind a single IP address and will be protected with a firewall allowing access through port 53. eNom's proprietary real-time dynamic name servers will receive real-time zone changes via data replication. Data updates will be delivered in a secure manner over a VPN to each name server and will have a latency of up to 5 seconds.
C17.2. Registry-registrar model and protocol. Please
describe in detail, including a full (to the extent feasible) statement
of the proposed RRP and EPP implementations. See also item
The domain name registration and maintenance software shall provide all existing functionality provided by the current registry operator (Verisign) for .ORG top-level domains. Existing registrars will be able to continue to use existing RRP software and tools to register and maintain .org domains as per RFC 2832.
The design will follow Microsoft's current best practices for scalable, high-performance servers. The core design will leverage multiprocessor systems to allow it to scale up when installed on more powerful hardware.
The software leverages multiple processors and maximizes performance through the use of LIFO thread pools, IO completion ports and overlapped socket IO. Our primary strategy for scalability is to scale out by adding servers when demand makes it necessary. All non-essential activities (such as activity logging) will be queued or handed off to other servers in the system.
The RRP / EPP server software has a high performance, component-based event architecture for command processing. Unlike a monolithic service, this design can be extended through event sinks that can process a command as it is passed through the system as shown in Figure 4 below.
Figure 4 - RRP / EPP Server Software Architecture
By designing a component-based system, we can not only provide the basic functionality but also extend it in directions we may not have foreseen. The design also allows additional logging and monitoring to be attached, without system downtime, when issues arise and need to be investigated, and to be removed when the investigation is complete to avoid performance degradation.
This architecture is a proven design. It is currently in use in eNom's production environments, running everything from DNS servers and reseller API servers to our registry connection management servers. The framework is similar in design to Microsoft's IIS and SMTP servers.
The server will support both the RRP and EPP protocols in the current form of the draft. RRP commands will be translated into EPP commands in an early-loaded plug-in, then translated back to an RRP response in a final plug-in as shown in Figure 5 below. Once all existing registrars have transitioned to the EPP protocol, the RRP filters will be removed, leaving only the EPP protocol.
Figure 5 - EPP / RRP Translation
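The translation flow can be pictured as a short chain of plug-ins around a core that only understands EPP-style commands. The sketch below is a simplified illustration under stated assumptions: the dictionaries stand in for parsed messages and are not the RRP or EPP wire formats, and the function names (`rrp_to_epp`, `core_handler`, `epp_to_rrp`) are hypothetical.

```python
def rrp_to_epp(request):
    """Inbound plug-in: map a simplified RRP request onto an EPP-style command."""
    if request.get("protocol") == "rrp":
        return {"protocol": "epp", "via_rrp": True,
                "command": request["command"], "name": request["name"]}
    return dict(request, via_rrp=False)


def core_handler(command, registered):
    """Core processing: sees only EPP-style commands."""
    if command["command"] == "check":
        return dict(command, available=command["name"] not in registered)
    raise ValueError("unsupported command in this sketch")


def epp_to_rrp(result):
    """Outbound plug-in: answer in RRP only for sessions that arrived as RRP."""
    protocol = "rrp" if result["via_rrp"] else "epp"
    return {"protocol": protocol, "name": result["name"],
            "available": result["available"]}


def process(request, registered):
    """Run one request through the inbound plug-in, the core, and the outbound plug-in."""
    return epp_to_rrp(core_handler(rrp_to_epp(request), registered))


registered = {"example.org"}  # hypothetical registry contents
rrp_reply = process({"protocol": "rrp", "command": "check", "name": "example.org"}, registered)
epp_reply = process({"protocol": "epp", "command": "check", "name": "free-name.org"}, registered)
print(rrp_reply)
print(epp_reply)
```

With this shape, retiring RRP amounts to unloading the two translation plug-ins; the core handler is unchanged.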
The RRP / EPP servers will be protected by redundant firewalls. The true IP of the servers will be hidden behind a virtual IP on the load balancer. Strict coding practices will be used to avoid unchecked buffers. Registrars must register IP subnets with eNom to be allowed access through the firewall. The server will perform mutual authentication with the connecting registrar, so a valid SSL certificate will be required. The certificate must be signed by a trusted certificate authority as specified by eNom.
A comprehensive test suite will be compiled and run on every build of the server. These tests will cover a variety of areas including functional, boundary, stress, security, constricted resource, performance, network failure and hardware failure. Every anomaly found will be turned into a regression test and added to the suite to prevent recurrence. Builds are done on a regular basis from snapshots of the source tree and then propagated to a development server for testing. Builds of the server, with full source, are archived for reference or emergency.
The committed availability of the system will be at 99.4%. Individual servers can be removed from the system without impacting normal registry activities or performance. Operations personnel will be notified immediately when any part of the system is not performing to specifications or has been removed from participation by the load balancing hardware.
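To make the 99.4% commitment concrete, the arithmetic below converts it into a downtime budget. The helper name and the choice of a 365-day year and a 30-day month are illustrative assumptions.

```python
def allowed_downtime_hours(availability_pct, period_hours):
    """Hours of downtime permitted in a period at a given availability percentage."""
    return (1 - availability_pct / 100) * period_hours


per_year = allowed_downtime_hours(99.4, 365 * 24)
per_month = allowed_downtime_hours(99.4, 30 * 24)
print(f"{per_year:.2f} hours per year, {per_month:.2f} hours per 30-day month")
```

In other words, a 99.4% commitment leaves a budget of roughly 52.6 hours per year, or about 4.3 hours in a 30-day month.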
Round trip time is defined as the time from when the command is received, through its transport across the system, to when the response is sent to the registrar (i.e., excluding network latency). Response times for 95% of all check commands will not exceed 3 seconds. Response times for 95% of add commands will not exceed 5 seconds. When response times are repeatedly outside acceptable levels, the events will be logged and operations staff notified.
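One way to evaluate the 95% targets against measured round trip times is sketched below. The function name and the sample data are hypothetical; only the 3-second and 5-second limits come from the text.

```python
def meets_target(samples_sec, limit_sec, fraction=0.95):
    """True if at least `fraction` of the samples are at or under `limit_sec`."""
    if not samples_sec:
        return True  # nothing measured, nothing violated
    within = sum(1 for s in samples_sec if s <= limit_sec)
    return within / len(samples_sec) >= fraction


# Made-up measurements: 100 check commands and 100 add commands.
check_times = [0.2] * 96 + [4.0] * 4   # 96% of checks finish within 3 seconds
add_times = [1.0] * 90 + [6.0] * 10    # only 90% of adds finish within 5 seconds

print(meets_target(check_times, 3.0))
print(meets_target(add_times, 5.0))
```

A monitoring job could run a check like this over a sliding window and raise the operations alert when it returns False for consecutive windows.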
Through effective monitoring and ongoing capacity planning efforts, system needs will be anticipated. A complete suite of standby servers will be maintained that can be seamlessly added to the production servers to relieve unusually high loads.
International domain names will continue to be supported to the same extent as they are today. Future support will be accomplished through close cooperation with Verisign and the standards community.
All data is stored in Unicode format to maximize globalization capabilities. The web site and documents will be designed such that they are localizable to a wide variety of languages.
In addition to the standard database audit logging, a central logging system will log all transactions. The system will be designed such that logging can be done in an asynchronous, queued manner to avoid placing any undue burden on any system. Logs will be backed up and archived on a regular basis. Custom logging and tracing can be enabled on an as-needed basis by syncing to events from the server at any point necessary to help resolve customer transaction issues.
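The asynchronous, queued logging described above might look roughly like the following. It is a minimal sketch using an in-process queue and a worker thread; the class name `AsyncTransactionLog` and the in-memory `records` list (a stand-in for the central log store) are assumptions for illustration.

```python
import queue
import threading


class AsyncTransactionLog:
    """Callers enqueue log records and return at once; a worker writes them later."""

    def __init__(self):
        self._queue = queue.Queue()
        self.records = []  # stand-in for the central log store
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, registrar, command, result):
        """Cheap for the caller: only a queue insert happens on the request path."""
        self._queue.put((registrar, command, result))

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is None:  # shutdown sentinel
                self._queue.task_done()
                break
            self.records.append(item)
            self._queue.task_done()

    def close(self):
        """Flush everything that was queued, then stop the worker."""
        self._queue.put(None)
        self._queue.join()
        self._worker.join()


txlog = AsyncTransactionLog()
txlog.log("registrar-a", "check example.org", "ok")
txlog.log("registrar-a", "add example.org", "ok")
txlog.close()
print(len(txlog.records))
```

The request path pays only for the enqueue, so a slow log store delays log visibility rather than registrar transactions.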
C17.3. Database capabilities. Database size, throughput,
scalability, procedures for object creation, editing, and deletion,
change notifications, registrar transfer procedures, grace period implementation,
reporting capabilities, etc.
The .ORG Registry database will consist of a 4-server (node) cluster system with a shared fiber channel RAID system. Each node of the database will be a Compaq Proliant 8000 quad Xeon server with at least four 900MHz processors and between 2 and 4 GB of RAM. The RAID system will consist of a Compaq Proliant 4100 storage system with 52 hot-swappable 18GB hard drives. The drives will be configured in a RAID 10 or RAID 5 configuration depending on the data stored on the drive.
The cluster system provides a robust and completely fault tolerant and redundant system at each level of the system. The system could continue to operate without interruption upon drive failure, operating system failure, and server failure. The hard drive RAID configuration allows any drive to completely fail without interruption. The server cluster configuration allows any one server to take over the job of any other node upon server or operating system failure.
The .ORG database will be divided into four functional areas, with each node of the cluster system responsible for one area, but with any node having the capability to run any other functional area. The four functional areas will include Registry, Check, WHOIS, and Reports. The Registry database will be the authoritative transactional processing (OLTP) registry database which will contain all domain, registrar, name server, contact, accounting, and all other pertinent registry data. All data insertion, updates, and deletion will occur in this database. The Check database will be a replicated, read-only copy of all data required to perform checks. All check commands requested from the RRP/EPP servers will use this database to determine availability. The WHOIS database will be a replicated, read-only copy of all data required to perform WHOIS. All WHOIS requests received by the registry's WHOIS servers will be routed to this database. The final functional area of reporting will also be a replicated, read-only copy of all data required for reporting and zone file generation. This database will be used for internal reporting, registrar reports, and zone file generation published to the FTP server.
The distributed database model for the .ORG registry is designed for scalability and load balancing. The registry database will be designed for transactional processing (OLTP) and as such will use relational database design and concepts that are geared towards increased performance and scalability. Such concepts include data normalization, vertical and horizontal data partitioning, narrow indexing, and referential and domain integrity. Although it is estimated that the .ORG Registry OLTP database will only need to handle 30-40 transactions per second on average (not including check commands), the proposed OLTP database will be able to handle 400 transactions per second with peak time and spike capabilities of 700 transactions per second.
eNom's real-time dynamic DNS servers are database driven and therefore rely on standard and extremely well tested commercial data replication to deliver zone updates. Each DNS server will house a local database that contains replicated, read-only data required for DNS queries. Initial data synchronization is delivered to each DNS server via ftp. Data modifications are delivered to each subscriber (DNS server) in binary format via a secure OLEDB connection. Data update latency will be between 1 and 5 seconds. If a particular DNS server is unavailable, transactions will be queued and delivered sequentially when the DNS server becomes available. Data replication is coordinated by the replication distributor, which will reside on one of the database cluster nodes.
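The queue-and-redeliver behavior for an unavailable DNS server can be illustrated with a small model. This is a sketch of the general idea only; the commercial replication product handles this internally, and the `Subscriber` and `Distributor` classes and the update strings are hypothetical.

```python
from collections import deque


class Subscriber:
    """One DNS server holding a replicated, read-only copy of the zone data."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.applied = []       # updates applied to the local copy, in order
        self.pending = deque()  # updates held while the server is unreachable


class Distributor:
    """Fans each update out to every subscriber, preserving per-subscriber order."""

    def __init__(self, subscribers):
        self.subscribers = list(subscribers)

    def publish(self, update):
        for sub in self.subscribers:
            sub.pending.append(update)
            self.flush(sub)

    def flush(self, sub):
        """Deliver queued updates sequentially for as long as the subscriber is up."""
        while sub.online and sub.pending:
            sub.applied.append(sub.pending.popleft())


dns_a, dns_b = Subscriber("dns-a"), Subscriber("dns-b")
dist = Distributor([dns_a, dns_b])

dns_b.online = False                     # dns-b drops off the network
dist.publish("add ns1.example.org")
dist.publish("update example.org")
queued_while_down = list(dns_b.pending)

dns_b.online = True                      # dns-b returns and catches up in order
dist.flush(dns_b)
print(dns_b.applied)
```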
All data modifications done against the OLTP database will be logged in audit tables. The information logged will include the modification made, the time, and the name of the application and server that made the modification. This data will be retained for one year to support legal inquiry. Direct, ad-hoc access into the production databases will not be allowed. Data security will be provided at multiple layers in the system. The firewall at each data center location will not allow direct connections to the database that will reside on a port unknown to the outside world. Only application servers will be allowed access to the appropriate database. For example, the WHOIS application servers will only be allowed access to the WHOIS database. Data replication that occurs outside of the Operational Data Center will be delivered over a VPN. Data updates will be delivered via a secure OLEDB connection and in an encrypted format.
Backup Operational Data Center
eNom will maintain a replica of its OLTP data in an offsite "cold" (site will not take traffic under normal conditions) spare data center. Data updates will be delivered via log shipping technologies in 15 minute intervals. The transaction log of the OLTP database in the Redmond data center will be backed up every 15 minutes. This backup will be securely sent over ftp through a VPN to the "cold" spare data center. Here it will be restored onto the local OLTP database. The "cold" spare data will be no more than 15 minutes old at any given time.
Any changes to the production OLTP database will be performed in an orderly fashion and as part of the registry development cycle. All database code will be maintained under source control and thoroughly tested in the registry's development and staging environments. Once a week, there will be a scheduled code drop into production. Any hot fixes or code rollbacks will be performed if and when necessary.
The .ORG Registry database will be able to handle all registry functions such as checks, registrations, transfers, renewals, deletions, statuses, name server additions and modifications, accounting, reporting, and all other pertinent registry functions.
The implementation of the registry transfer process is illustrated in Figure 6 below. The .ORG registry transfer process will mirror the current Verisign Global Registry transfer process (with the exception that transfers will be free).
Figure 6 - Registry Transfer Process
The .ORG database will also support a grace period implementation. The default grace period will be 5 days for registration and renewals and 45 days for auto-renewals, but the registrar may change this if they desire. The registry grace period implementation is illustrated in Figure 7 below.
Figure 7 - Grace Period Implementation
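The default grace period lengths can be expressed as a simple lookup with a per-registrar override. The sketch below only answers whether a given moment falls inside the grace period; what the registry does inside that window is defined by the grace period process itself and is not modeled here. The function and dictionary names are hypothetical.

```python
from datetime import datetime, timedelta

# Defaults from the text: 5 days for registrations and renewals, 45 for auto-renewals.
DEFAULT_GRACE_DAYS = {"registration": 5, "renewal": 5, "auto-renewal": 45}


def in_grace_period(kind, event_time, now, grace_days=None):
    """True if `now` falls within the grace period that began at `event_time`.

    `grace_days` models a registrar-specific override of the default length.
    """
    days = DEFAULT_GRACE_DAYS[kind] if grace_days is None else grace_days
    return event_time <= now <= event_time + timedelta(days=days)


registered = datetime(2002, 6, 1, 12, 0)  # made-up timestamps
print(in_grace_period("registration", registered, datetime(2002, 6, 4)))
print(in_grace_period("registration", registered, datetime(2002, 6, 10)))
print(in_grace_period("auto-renewal", registered, datetime(2002, 7, 10)))
```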
C17.4. Zone file generation. Procedures for changes,
editing by registrars, updates. Address frequency, security, process,
interface, user authentication, logging, data back-up.
Each registrar will have access to modify, create, and delete zone file information for domains for which it is the registrar-of-record. All modifications will come through the RRP or EPP interface. RRP and EPP access security and authentication will be done on several levels. The firewall will only allow access to the RRP and EPP servers from IP ranges that were pre-approved for access. Once a successful connection is established, the registrar credentials will be authenticated and the transaction will be processed.
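The two-level check (source address first, credentials second) can be sketched as follows. This is an illustration under stated assumptions: the address ranges are reserved documentation prefixes, the registrar name and password are made up, and a real deployment would enforce the first step at the firewall and would not store plain-text passwords.

```python
import hmac
import ipaddress


def source_allowed(source_ip, approved_ranges):
    """Level 1: is the connecting address inside a pre-approved range?"""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(r) for r in approved_ranges)


def authorize(source_ip, registrar_id, password, approved_ranges, credentials):
    """Level 2 runs only if level 1 passes: verify the registrar credentials."""
    if not source_allowed(source_ip, approved_ranges):
        return "rejected: source address not pre-approved"
    expected = credentials.get(registrar_id)
    # compare_digest avoids leaking how many leading characters matched
    if expected is None or not hmac.compare_digest(expected.encode(), password.encode()):
        return "rejected: bad credentials"
    return "accepted"


approved = ["192.0.2.0/24"]               # hypothetical pre-approved range
credentials = {"registrar-a": "s3cret"}   # hypothetical credential store

print(authorize("192.0.2.15", "registrar-a", "s3cret", approved, credentials))
print(authorize("198.51.100.7", "registrar-a", "s3cret", approved, credentials))
print(authorize("192.0.2.15", "registrar-a", "wrong", approved, credentials))
```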
All EPP or RRP transactions will be logged and zone file data will be backed up as part of the registry's regular backup procedures. A complete zone file will be generated each night from the Reports database and placed on the FTP server. Each accredited registrar with an account at the .ORG registry will be given ftp access to download the complete zone file. The firewall will only allow ftp access from pre-approved IP ranges and the ftp servers will authenticate the user name and password of the registrar.
Also, since the registrar-of-record info is publicly available via the registry's whois, this information will also be provided in the bulk information to reduce the load on the registry's whois server.
Verisign has offered to keep the DNS servers up to date for one year. So, even though the DNS will be provided by eNom, the zone file will be sent to Verisign as a backup. On instruction from ICANN, the root servers can then be pointed to Verisign's backup name servers.
C17.5. Zone file distribution and publication.
Locations of nameservers, procedures for and means of distributing zone
files to them. If you propose to employ the VeriSign global resolution
and distribution facilities described in subsection
5.1.5 of the current .org registry agreement, please provide details
of this aspect of your proposal.
eNom will maintain DNS servers in different locations throughout the world in order to provide uninterrupted gTLD resolution services. As shown in Figure 8 below, each location will contain at minimum two hardware load balanced DNS servers behind one virtual IP address. This will provide a fault tolerant solution for resolving a high volume of queries.
Figure 8 - DNS Load Balancing
Every location will also be protected by a firewall and will be monitored by the operations center's 24x7 monitoring software. Each DNS server will run the Windows 2000 Advanced Server operating system and will have a minimum of dual 800 MHz processors and 2GB of RAM.
eNom's proprietary DNS software will power each DNS server and BCP0040 and RFC 2870 (Root Name Server Operational Requirements) will be fully implemented on name servers in all locations. The eNom DNS software is a modular service utilizing an extensible plug-in architecture for name resolution and administration. As shown in Figure 9 below, the service is highly optimized to take advantage of multiple processor architecture.
Figure 9 - DNS Architecture
This same software currently provides DNS service for over 600,000 domain names with 1.6 million host records (sub-domains) and has been in continuous production for over two years. The DNS software is database-driven and relies on data replication to deliver zone file updates. The mechanism and security of data replication to the DNS servers were discussed in earlier sections.
C17.6. Billing and collection systems. Technical
characteristics, system security, accessibility.
Each registrar will be required to maintain an account balance at the .ORG registry in order to register, renew, or extend domain names. Each registrar will have an account balance and an available account balance. The .ORG registry will have a centralized accounting process where multiple applications may deduct available balance and create accounting transactions. However, only one process can adjust the actual account balance and process transactions in the accounting queue.
Registrars will have the option to deposit funds in their account via check, money order, wire transfer, electronic fund transfer, or credit card. Upon receipt of funds, a registrar's available balance will immediately be increased to reflect the receipt. The registrar's actual account balance will be increased by the accounting system.
When a registrar's available balance is insufficient for processing transactions, all future transactions will be rejected until the registrar replenishes its account. The accounting system will have the ability to give registrars credit lines when needed. This will mainly be used when a payment is in transit. A credit line affects only a registrar's available balance and not its actual account balance.
The central accounting process maintains an audit trail of all accounting transactions, and a registrar's balance is logged after each transaction. At the end of each month, each registrar's account will be reconciled to ensure that the ending account balance matches the beginning account balance adjusted for all transactions in the accounting system for that month.
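The split between available balance and actual balance can be modeled in a few lines. The sketch below is an interpretation of the description above, not eNom's accounting code: the class and method names and the amounts are hypothetical.

```python
from collections import deque
from decimal import Decimal


class RegistrarAccount:
    def __init__(self, opening_balance):
        self.actual = Decimal(opening_balance)
        self.available = Decimal(opening_balance)
        self.queue = deque()  # accounting transactions awaiting posting
        self.posted = []      # audit trail: (amount, actual balance after posting)

    def grant_credit(self, amount):
        """A credit line raises only the available balance."""
        self.available += Decimal(amount)

    def charge(self, amount):
        """Callable by any application: reserve funds and queue the transaction."""
        amount = Decimal(amount)
        if amount > self.available:
            return False  # rejected until the registrar replenishes its account
        self.available -= amount
        self.queue.append(-amount)
        return True

    def deposit(self, amount):
        """Funds received: available balance rises immediately, posting comes later."""
        amount = Decimal(amount)
        self.available += amount
        self.queue.append(amount)

    def post_queue(self):
        """Run only by the single central accounting process."""
        while self.queue:
            amount = self.queue.popleft()
            self.actual += amount
            self.posted.append((amount, self.actual))


def reconciles(opening_balance, account):
    """Month-end check: opening balance plus posted transactions equals actual."""
    return Decimal(opening_balance) + sum(a for a, _ in account.posted) == account.actual


acct = RegistrarAccount("100")
first = acct.charge("6")      # hypothetical fee
second = acct.charge("200")   # exceeds the available balance
acct.deposit("50")
before_posting = (acct.available, acct.actual)
acct.post_queue()
print(first, second, before_posting, acct.actual, reconciles("100", acct))
```

Keeping every change to the actual balance inside `post_queue` mirrors the rule that only one process may adjust it, which is what makes the month-end reconciliation a simple sum.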
Registrars will be provided with the ability to monitor and replenish account funds in real-time on their administration web site. The registry's accounting department will also have in-house administration interfaces to record funds received.
This system is the same system that has been in use at eNom for over three years and currently supports over 4000 reseller customers.
C17.7. Data escrow and backup. Frequency and procedures
for backup of data. Describe hardware and systems used, data format,
identity of escrow agents, procedures for retrieval of data/rebuild
of database, etc.
The registry's production database will be continually backed up onto a dedicated archiving and backup server in the Main Operational Data Center. Backups will be delivered to the backup server via a separate backup network so as not to interfere with production traffic. A full backup of the OLTP Registry database will be performed each night. In addition, a differential backup will be done each afternoon, and a transaction log backup every 15 minutes. The transaction log backup will be shipped to the registry's Backup Operational Data Center upon completion of each log backup.
A full backup set will be sent on a nightly basis via ftp to the registry's escrow agent. In addition, a complete XML data feed will be generated on a weekly basis out of the Reports database and placed on the backup server. The data feed will be sent to the escrow agent upon generation. The registry's backup servers will always maintain one week's worth of backups on hand. A third party will be retained to retrieve the nightly backups and store them offsite.
If the need arises, the Registry's OLTP database can be completely rebuilt to point of failure from backups. A document outlining procedures for recovering and rebuilding the database will be kept with the registry's master operation manual. Furthermore, the complete registry functionality can be failed over to its Backup Data Center. Procedures to switch the registry to its alternate location will also be included in the registry's operation manual.
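Given a nightly full backup, an afternoon differential, and 15-minute log backups, a rebuild applies the latest full backup, then the latest differential taken after it, then every later log backup in order. The sketch below only selects which backups to apply; the function name and the timestamps are hypothetical, and the actual restore would be carried out with the database's own tools.

```python
from datetime import datetime


def restore_chain(backups, failure_time):
    """Pick the backups to restore, in order.

    backups: list of (kind, finished_at) with kind in {"full", "diff", "log"}.
    """
    usable = sorted((b for b in backups if b[1] <= failure_time), key=lambda b: b[1])
    fulls = [b for b in usable if b[0] == "full"]
    if not fulls:
        raise ValueError("no full backup precedes the failure")
    full = fulls[-1]
    diffs = [b for b in usable if b[0] == "diff" and b[1] > full[1]]
    base = diffs[-1] if diffs else full
    logs = [b for b in usable if b[0] == "log" and b[1] > base[1]]
    return [full] + ([base] if diffs else []) + logs


day = (2002, 6, 18)  # made-up date
backups = [
    ("full", datetime(*day, 1, 0)),
    ("log", datetime(*day, 1, 15)),
    ("diff", datetime(*day, 14, 0)),
    ("log", datetime(*day, 14, 15)),
    ("log", datetime(*day, 14, 30)),
    ("log", datetime(*day, 14, 45)),
]
chain = restore_chain(backups, failure_time=datetime(*day, 14, 40))
print([(kind, t.strftime("%H:%M")) for kind, t in chain])
```

Log backups taken before the differential are skipped because the differential already contains their changes.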
C17.8. Publicly accessible look up/Whois service.
Address software and hardware, connection speed, search capabilities,
coordination with other Whois systems, etc.
The registry's WHOIS service will be hosted at its Main Operational Data Center with redundant systems in its backup location. The WHOIS service will reside on multiple servers load balanced behind a single IP address. Traffic to whois.registry.org will normally be routed to this IP address. Each WHOIS server will run the Windows 2000 Advanced Server operating system and will contain at minimum dual 800MHz processors with 2GB of RAM.
The WHOIS service will include a web-based and HTTP WHOIS look up service as well as a port 43 WHOIS service compliant with RFC 1834. Specialized "speed bump" software will ensure that the WHOIS system is not used for unintended purposes and will ensure system stability in case of excessive querying. Any party that abuses the WHOIS access agreement will be permanently blocked out of the WHOIS system (pending ICANN review).
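The proposal does not describe how the "speed bump" works internally. One common approach to this kind of protection is a sliding-window limit per source address, sketched below; the class name, the limit of three queries, and the 60-second window are purely illustrative assumptions.

```python
from collections import defaultdict, deque


class SpeedBump:
    """Allow at most `max_queries` per source address in any `window_sec` span."""

    def __init__(self, max_queries, window_sec):
        self.max_queries = max_queries
        self.window_sec = window_sec
        self._seen = defaultdict(deque)  # source IP -> timestamps of recent queries

    def allow(self, source_ip, now):
        recent = self._seen[source_ip]
        # Forget queries that have aged out of the window.
        while recent and now - recent[0] >= self.window_sec:
            recent.popleft()
        if len(recent) >= self.max_queries:
            return False
        recent.append(now)
        return True


bump = SpeedBump(max_queries=3, window_sec=60)
burst = [bump.allow("192.0.2.10", t) for t in (0, 1, 2, 3)]
later = bump.allow("192.0.2.10", 61)      # earlier queries have partly aged out
other = bump.allow("198.51.100.20", 3)    # a different client is unaffected
print(burst, later, other)
```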
The WHOIS service will query the WHOIS database that will be dedicated to this service. The WHOIS database contains data replicated from the OLTP registry database and will have a latency of between 1 and 5 seconds.
Initially, the port 43 whois service will operate identically to the registry's current whois. Information given out by the whois service will include, at minimum, the domain's current nameservers and information to point the client to the appropriate registrar's whois server for detailed information. Additionally, the output will include all contact or supplemental information maintained in our database.
The WHOIS architecture is illustrated in Figure 10 below.
Figure 10 - WHOIS architecture
C17.9. System security. Technical and physical
capabilities and procedures to prevent system hacks, break-ins, data
tampering, and other disruptions to operations. Physical security.
Application and system security is discussed in previous sections. Beyond those measures and the physical security already mentioned, several other measures will be taken to protect the .ORG Registry systems:
- Internet Security System's (ISS) RealSecure Protection software will be used to provide comprehensive protection for intrusion detection and response. ISS agents will be installed on all operational systems and monitored by a third-party managed security service provider.
- Norton Antivirus software will be installed on all servers.
- Production servers will always run the latest security software and patches as the operation staff is notified immediately when a new version is available.
- Firewalls - The Registry will be protected by a set of firewalls, which only allow access based on source IP, protocol and port. For example, only a specific port on the RRP/EPP servers will be opened for a given source subnet; only port 80 on the web server, port 21 (the FTP control port) on the FTP server, and port 43 on the WHOIS server will be available through the firewall. Once a connection is made through one of these ports, additional authentication methods will be employed.
- Private networks will be used to isolate data layers. No public access will be given to any database server. Production and non-production networks will be isolated by VPN.
- Communications to Regional Data Centers will travel through a VPN established at the firewalls.
- IP spoof detection and blocking will be implemented at the edges of the Operational Data Centers.
- Proprietary request-throttling technology developed by eNom, which can throttle by IP, will be deployed to prevent denial of service (DoS) attacks and UDP flood attacks.
C17.10. Peak capacities. Technical capability
for handling a larger-than-projected demand for registration. Effects
on load on servers, databases, back-up systems, support systems, escrow
systems, maintenance, personnel.
The proposed .ORG registry system is designed to operate at roughly 20% capacity and therefore would be able to handle a larger than projected demand for registration. Each application is load balanced across multiple servers. Additional scalability can easily be achieved by adding application servers. The database is designed to operate at between 10% and 15% of capacity under expected average demand. Any node can be removed without adverse performance impact. The distributed nature of the database design makes it easy to increase scalability if and when needed, for example through additional cluster nodes. Additional servers are added when average capacity exceeds 40%. Any server exhibiting over 80% capacity is considered a "Sev 2" event (see Section C17.13 System Reliability).
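The two capacity thresholds can be read as a simple rule applied to monitoring data. The sketch below is illustrative only: the function name, the server names, and the utilization figures are hypothetical, while the 40% and 80% thresholds come from the text.

```python
def capacity_actions(avg_utilization, per_server_utilization):
    """Apply the thresholds: over 40% on average adds servers, over 80% on one is Sev 2."""
    actions = []
    if avg_utilization > 0.40:
        actions.append("add servers")
    hot = sorted(name for name, u in per_server_utilization.items() if u > 0.80)
    if hot:
        actions.append("Sev 2: " + ", ".join(hot))
    return actions


normal = capacity_actions(0.20, {"epp-1": 0.25, "epp-2": 0.15})
spike = capacity_actions(0.45, {"epp-1": 0.85, "epp-2": 0.30})
print(normal)
print(spike)
```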
Back-up systems should not be affected by larger-than-projected demand since backups are performed through a separate private network. Personnel requirements may increase in the case of larger-than-projected demand. Additional technical support, network operations, and development personnel may be required. The .ORG registry is operated in Redmond, WA, where Microsoft is headquartered, and therefore there is an abundance of technology personnel familiar with Microsoft technologies readily available for hire.
C17.11. Technical and other support. Support for
registrars and for Internet users and registrants. Describe technical
help systems, personnel accessibility, web-based, telephone and other
support, support services to be offered, time availability of support,
and language-availability of support.
eNom will maintain a technical support group which will be available by phone or e-mail 24 X 7, and incidents will be tracked through a web-accessible trouble ticketing application. The technical support group will be ready to assist registrars and end users with technical and domain-related issues. Support is organized on three levels:
- Level 1 - Operations Support Staff. This group monitors the systems, responds to alerts and takes initial inquiries from customers.
- Level 2 - System and network engineers. This group responds to problems in the network or problems with operating systems and servers.
- Level 3 - Database administrators and software engineers. This group responds to data or application level problems.
Level 1 support will be on-site 24 hours a day, 7 days a week. Level 2 and Level 3 support will be on-site from 8AM to 6PM PDT and on-call otherwise.
The .ORG registry will also maintain complete technical documentation on all aspects of the .ORG registry, including interfacing via RRP or EPP, the domain transfer process, the renewal process, the deletion process, and all other pertinent registry functions. Technical documentation will be available to registrars on an admin site in both web-based and PDF formats.
C17.12. Compliance with specifications. Describe
the extent of proposed compliance with technical specifications, including
compliance with at least the following RFCs: 954,
eNom has reviewed all RFCs specified by ICANN and will fully comply with specifications in the following RFCs:
954 - NicName/WHOIS
1034 - Domain Names - Concepts and Facilities
1035 - Domain Names - Implementation and Specification
1101 - DNS Encoding of Network Names and Other Types
2181 - Clarifications to the DNS Specification
2182 - Selection and Operation of Secondary DNS Servers
Compliance with future registry- or DNS-related RFCs or BCPs (including RRP and IDN) will be achieved at no additional cost to the registrars.
C17.13. System reliability. Define, analyze, and
quantify quality of service.
The .ORG Registry will have an uptime of 99.5% with scheduled downtime of up to 4 hours per month. However, DNS services will maintain 100% uptime. Scheduled downtimes may be taken for major system upgrades. Minor code pushes and general maintenance will not require downtime due to the redundant nature of the system. An automated registrar communication system will be provided which will inform registrars of both scheduled and unscheduled system outages. The communication system will post outage notices by e-mail, to each registrar's admin site, and to the registry's web site.
The total uptime of the registry will be quantified as a percentage each month and corrective measures will be taken if total scheduled uptime falls below 99.5%. Reports will generally be available through a website.
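The monthly uptime figure could be computed as in the sketch below. It assumes that scheduled maintenance is excluded from the measured period, which is one reading of "total scheduled uptime" above; the function names and inputs are illustrative.

```python
def monthly_uptime_percent(total_hours, scheduled_down_hours, unscheduled_down_hours):
    """Uptime as a percentage of scheduled service hours.

    Assumption: scheduled maintenance does not count against uptime, so it
    is removed from the denominator before the percentage is taken.
    """
    scheduled_service_hours = total_hours - scheduled_down_hours
    if scheduled_service_hours <= 0:
        raise ValueError("no scheduled service hours in the period")
    up_hours = scheduled_service_hours - unscheduled_down_hours
    return 100.0 * up_hours / scheduled_service_hours


def meets_target(uptime_percent, target=99.5):
    """True when the measured uptime satisfies the 99.5% commitment."""
    return uptime_percent >= target
```

Under this reading, a 30-day month (720 hours) with the full 4 hours of scheduled downtime leaves 716 service hours, so the 99.5% target tolerates about 3.58 hours of unscheduled outage.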
To achieve this uptime, monitoring will occur at several levels. Each level will have a proactive component (detecting a potential upcoming failure) and a reactive component (alerting when something has failed or is failing intermittently).
- Bandwidth - Bandwidth and packet statistics will be monitored for unexpected spikes
- Hardware - ping tests and alerts for memory failures
- Operating system - check disk space, memory and CPU usage
- Commercial Off-the-shelf (COTS) - monitor resource usage of IIS and SQL Server 2000
- Registry Application - invoke test transactions to measure response time and availability (e.g., checks)
- Data integrity - daily asynchronous checks of registry database against zone file and transaction log
- Statistical - gather ongoing data (e.g., number of hits/day) for capacity planning and reporting.
A combination of commercial software will be used to monitor the health of the overall systems. MRTG will be used to monitor bandwidth statistics. WebTrends from NetIQ will be used for hardware and Registry Application checking as well as for tracking overall system uptime for SLA compliance. AppManager Suite from NetIQ will be used for database and web server monitoring (COTS and OS level). SNMP messaging will be integrated into the central alerting system. Enterprise Manager for SQL Server 2000 will be used for database management, monitoring and alerting.
System monitoring will occur from both an internal location and an outside location (via a third-party managed service provider). The outside location will monitor and log the accessibility of all services required from the outside, including EPP and RRP connections, FTP, WHOIS, and web services. Internal monitoring will be done on each level of the system, including server monitoring, operating system, disk monitoring, application monitoring, database and replication monitoring, and all other critical components. Functional monitoring will also be done for all major functional systems. For example, a WHOIS query will be run every minute, a check command every 15 seconds, etc.
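A functional monitor of this kind could be sketched as follows. The two intervals come from the text; the probe names, the helper functions, and the result format are assumptions made for illustration. The WHOIS helper sends the query followed by CRLF on TCP port 43 and reads until the server closes the connection.

```python
import socket
import time

# Intervals taken from the text; the probe names are hypothetical.
PROBE_INTERVALS_SEC = {"whois_query": 60, "check_command": 15}


def whois_query(host, domain, port=43, timeout=5.0):
    """Send one WHOIS query and return the raw reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def due_probes(last_run, now, intervals=PROBE_INTERVALS_SEC):
    """Names of probes that have never run or whose interval has elapsed."""
    return sorted(name for name, every in intervals.items()
                  if name not in last_run or now - last_run[name] >= every)


def run_probe(probe, clock=time.monotonic):
    """Run one probe callable; report success and elapsed seconds."""
    start = clock()
    try:
        probe()
        ok = True
    except Exception:
        ok = False
    return {"ok": ok, "elapsed": clock() - start}
```

A monitoring loop would call `due_probes` on each tick, pass each due probe to `run_probe`, and hand failures or slow responses to the alerting system.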
Each time the monitoring system detects a problem, it will page the appropriate group of personnel. The monitoring groups will be divided into network operations, database, and functional groups such as EPP, WHOIS, etc. Upon receipt of a page, the appropriate personnel will investigate the problem that caused the page and attempt to resolve the matter. If the matter cannot be resolved, it will be escalated. Due to the redundancy in the registry system, most problems will not cause a system outage or failure. The main purpose of the monitoring system is to detect problems before they escalate into system-wide outages. Events will be managed according to their level of severity.
Levels of Severity
Within the operation center procedures, three different levels of severity will be used. Events will be assigned a severity which will control the priority and needed response time.
- Severity ("Sev") 1 - Most severe, immediate action required. E.g., a registrar cannot connect to the registry, or more than one of a single type of component is down.
- Severity 2 - Medium level of severity, action must be taken within 24 hours regardless of day of the week. E.g., systems are outside acceptable SLA, one component is down.
- Severity 3 - Low level of severity, action must be taken within 2 business days (M-F). E.g., compliance with judicial mandates.
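The response windows above can be turned into concrete deadlines. The sketch below is one possible reading: Sev 1 is due immediately, Sev 2 within 24 clock hours, and Sev 3 after two business days counted Monday through Friday. Holidays are not considered, and the function names are illustrative.

```python
from datetime import datetime, timedelta


def add_business_days(start, days):
    """Advance by whole business days (Mon-Fri), keeping the time of day."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            remaining -= 1
    return current


def response_deadline(severity, opened_at):
    """Latest time by which action must be taken for an event."""
    if severity == 1:
        return opened_at  # immediate action required
    if severity == 2:
        return opened_at + timedelta(hours=24)  # regardless of day of week
    if severity == 3:
        return add_business_days(opened_at, 2)
    raise ValueError(f"unknown severity: {severity}")
```

For an event opened on a Friday morning, a Sev 2 deadline falls on Saturday morning, while a Sev 3 deadline skips the weekend and falls on the following Tuesday.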
C17.14. System outage prevention. Procedures for
problem detection, redundancy of all systems, back up power supply,
facility security, technical security, availability of back up software,
operating system, and hardware, system monitoring, technical maintenance
staff, server locations.
As discussed in the previous section, robust system monitoring will detect problems before they lead to system-wide outages. All registry systems are redundant, and a failure of any one system will not cause a system outage.
eNom has operated its production data centers for over three years and has a proven track record for managing real-time, widely-distributed, mission-critical applications.
C17.15. System recovery procedures. Procedures
for restoring the system to operation in the event of a system outage,
both expected and unexpected. Identify redundant/diverse systems for
providing service in the event of an outage and describe the process
for recovery from various types of failures, the training of technical
staff who will perform these tasks, the availability and backup of software
and operating systems needed to restore the system to operation, the
availability of the hardware needed to restore and run the system, backup
electrical power systems, the projected time for restoring the system,
the procedures for testing the process of restoring the system to operation
in the event of an outage, the documentation kept on system outages
and on potential system problems that could result in outages.
Complete documentation describing system recovery will be maintained, including a full disaster recovery plan. This plan will be reviewed quarterly and tested annually by an outside third party.
System recovery procedures will depend on the nature of the problem and on the personnel responsible for restoring the failed system. The Network Operations Center (NOC) Analyst (Level 1 Support) will address requests from registrars and attend to alerts from monitoring. Once a problem has been identified, it will be handled according to the Severity Level and the necessary Support Level. Hardware, operating system, firewall, router, and any other equipment failures will be handled by network operations (Level 2 Support). Database, replication, and cluster failures will be handled by the database group (Level 3 Support). The development team will handle application level failures (Level 3 Support). The team lead of each group is ultimately responsible for ensuring that all systems handled by the group are fully operational and are recovered quickly when necessary.
An in-house inventory of spare parts and servers will be available if necessary for restoring any component of the system. Personnel will also be trained on disaster recovery procedures for the registry. Again, a complete replica of the registry will be maintained in an off-site location, and complete failover of the registry to this location will take less than 15 minutes.
To secure all data and registry integrity, a third party will pick up and retain the nightly backups in an off-site location. All data will be delivered in a secure manner to an escrow agent who will be responsible for protecting the data.
C17.16. Registry failure provisions. Please describe
in detail your plans for dealing with the possibility of a registry
failure due to insolvency or other factors that preclude restored operation.
If the .ORG Registry Operator becomes non-operational due to insolvency, eNom will maintain operations in good faith until ICANN determines a corrective action. In the event that eNom becomes insolvent, all data (including backups) are available per the sub-contracting agreement with the registry operator.
As mentioned, the zone files will be sent to Verisign every 24 hours for the first year as a secondary backup mechanism. Subsequent years may be negotiated with Verisign if need be.
C18. Transition Plan. This should present a detailed
plan for the transition of the Registry Function from the current facilities
and services provided by VeriSign, Inc., to the facilities and services
you propose. Issues that should be discussed in this detailed plan include:
This section will describe in detail the plan for seamlessly migrating the .org registry from Verisign, Inc. to eNom.
C18.1. Steps of the proposed transition, including
sequencing and scheduling.
The following areas will be addressed in the transition.
- Legal - updated RRAs with all registrars, policy updates.
- Procedures - for becoming a .org registrar, acceptance plans, etc.
- Website - publish information to the public including status of transition and test data.
- Administration - creation of administration site for registrars.
- Hardware procurement and configuration.
- Technical - The technical transition of the .org registry will be accomplished in six phases. These phases include registry preparation, development and testing, deployment, validation, registrar transition, and registry cutover. The implementation and timeline of each phase is shown in Figure 11 below and is detailed more fully in the following paragraphs.
Figure 11 - ORG Transition Plan Schedule
The preparation phase will include a delivery of all .org registry data structures and formats by the Verisign registry. eNom will analyze these existing data structures and based on its analysis, will design the new .org registry data structure and formats. The new registry data structures will incorporate data design to support the RRP as well as the EPP protocols. This planning phase will also include a mutually agreed upon XML data structure for the snapshot data exchange and RRP transaction data exchange between the registries.
The development phase will consist of the development of all components that will ultimately make up the new .org registry. Most components will be developed in parallel in order to meet development deadlines. All systems developed during this phase will ultimately become the permanent development platform of the registry. The registry components that will be developed during this phase include database, RRP/EPP interfaces, DNS, WHOIS, registry web site, registrar administrative web site, FTP, zone file generation, registrar reports, transfer process, deletion process, auto-renewal process, support and monitoring tools, test suite, and documentation.
Once all components are developed and integrated, complete and thorough functional testing will be performed. The testing will be done through a customized test suite that will perform all RRP commands, WHOIS queries, and DNS queries that may come through the system during normal registry operation.
Once development and functional testing are complete, the registry's OT&E and production environments will be built out in the deployment phase. This will include building all hardware and loading all software required to power the registry. All firewall and load balancing systems will be deployed and all proprietary software will be installed. Once the OT&E and production environments are built out, the registry's standardized development cycle will be implemented. All code pushes and upgrades will be propagated from development to OT&E to production in accordance with this standardized development cycle. Once the OT&E and production environments are set up, integrated system testing will be performed.
All components of the registry system will be fully tested by trained personnel following documented procedures. This includes network and security hardware, external services, software, and related internal services. Documentation of a complete test plan for each component will be written and reviewed by qualified personnel. Tests to be performed will be enumerated to ensure they are repeatable and measurable. Personnel will be trained appropriately on testing procedures and auditing of progress and results. Progress and results will be documented and published internally to monitor progress and quality levels. Tests will consist of:
- basic and extended functionality
- request routing
- failure testing and recovery
Network hardware will be tested from both an external network and internal network. Components to be tested include base Internet connectivity, routers, switches, and machine connectivity. Load balancing configuration, logic, and fail over testing will be performed. Configuration will be documented, reviewed and archived.
The load balancing hardware will also provide SSL acceleration. Configuration and validation of certificates will be performed and documented. Pertinent certificate information will be stored in a secure manner accessible by appropriate personnel. Testing will be performed to verify certificate validity period, subject, and issuer. Private key security will be reviewed. Private keys will be encrypted and secured with a suitable pass phrase. Only key personnel will know the pass phrase. Tests will be performed to certify mutual authentication is performed on connecting clients. Client certificates will be tested for validity, issuer and appropriate subject information.
Firewalls will be extensively tested to verify configuration, integrity and fail-over capability. Verifications will be made that connection limitations are enforced, IP ranges are appropriately opened for clients and connections are refused outside that range. All ports will be scanned to verify only appropriate ports are open and connections are only accepted from those that are specified in the firewall rules. Hardware and power failure will be simulated to test redundancy and fail-over.
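The firewall verification described above amounts to comparing observed behavior against the intended rule set. The sketch below shows one way to model that comparison; the rule table, the example subnets (taken from documentation address ranges), and the registrar port number are all assumptions for illustration.

```python
import ipaddress

# Hypothetical rule set: (allowed source network, protocol, destination port).
RULES = [
    ("0.0.0.0/0", "tcp", 80),         # public web server
    ("0.0.0.0/0", "tcp", 43),         # public WHOIS
    ("198.51.100.0/24", "tcp", 648),  # one registrar subnet, assumed RRP port
]


def is_allowed(source_ip, protocol, port, rules=RULES):
    """True if any rule permits this source, protocol, and port."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(net)
               and protocol == proto and port == allowed_port
               for net, proto, allowed_port in rules)


def unexpected_open_ports(scan_results, source_ip, rules=RULES):
    """Ports a scan found open that the rules do not permit from source_ip.

    scan_results maps a TCP port number to True (open) or False (closed).
    """
    return sorted(port for port, is_open in scan_results.items()
                  if is_open and not is_allowed(source_ip, "tcp", port, rules))
```

Feeding the results of a port scan run from outside any registrar subnet into `unexpected_open_ports` should return an empty list; anything else indicates a rule that is broader than intended.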
Because the transactional data will be time-stamped, it will also be possible to evaluate performance relative to the current registry.
After system testing is complete and the OT&E and production environments are deployed, the next phase will consist of registry validation. The registry validation process is designed to ensure that the new registry processes are consistent with the existing registry. The first portion of this validation testing will consist of transactional testing. The Verisign registry will deliver a complete, time-stamped data feed to eNom. After 24 hours, Verisign will deliver a complete log of all .org transactions (data modification commands only) it received during this time frame, as well as a complete data feed as of the last transaction. eNom will run all transactions through its RRP interface and then compare its database to the data feed sent by Verisign. An exact match will validate that all transactions are being handled in a manner equivalent to the Verisign registry. Any data discrepancies will be investigated and the process will be repeated until the post-snapshot data is an exact match. This is illustrated in Figure 12.
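The replay-and-compare step can be sketched as follows. The record layout (a dictionary per domain and simplified "add", "mod", and "del" operations) is a stand-in chosen for the example; the actual exchange format would be the XML structure agreed with Verisign, which is not specified here.

```python
def replay(snapshot, transactions):
    """Apply modification transactions to a copy of the starting snapshot."""
    state = {name: dict(attrs) for name, attrs in snapshot.items()}
    for tx in transactions:
        op, name = tx["op"], tx["domain"]
        if op == "add":
            state[name] = dict(tx["attrs"])
        elif op == "mod":
            state[name].update(tx["attrs"])
        elif op == "del":
            del state[name]
        else:
            raise ValueError(f"unsupported operation: {op}")
    return state


def diff_states(ours, theirs):
    """Domains missing on either side, plus domains whose attributes differ."""
    return {
        "missing_in_ours": sorted(theirs.keys() - ours.keys()),
        "missing_in_theirs": sorted(ours.keys() - theirs.keys()),
        "mismatched": sorted(name for name in ours.keys() & theirs.keys()
                             if ours[name] != theirs[name]),
    }
```

Validation passes when all three lists are empty; otherwise the listed domains are the starting point for investigating the discrepancy before the process is repeated.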
Figure 12 - Registry Data Migration/Validation
The next validation test will be performed on the DNS services. After a complete data feed from Verisign, all data will be synchronized to eNom's DNS servers. eNom will then generate a list of 20,000 random .org domain names from the Verisign zone file. eNom will then perform DNS queries on each domain using both Verisign's and eNom's name servers. The results will then be compared and any differences will be reconciled. Any differences that are due to a recent change will be reflected in the "updated" date and will not be considered inconsistent results, since Verisign's DNS servers contain data that is delayed by 12-24 hours.
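The sampling and comparison logic might look like the sketch below. The resolver functions are passed in as parameters, so the example makes no assumption about which DNS library performs the actual queries; the 24-hour window mirrors the upper bound of the propagation delay mentioned above, and all names are illustrative.

```python
import random
from datetime import datetime, timedelta


def sample_domains(zone_domains, sample_size, seed=None):
    """Pick a random sample of domain names from the zone file contents."""
    rng = random.Random(seed)
    population = sorted(zone_domains)
    return rng.sample(population, min(sample_size, len(population)))


def compare_dns(domains, resolve_old, resolve_new, updated_at, now,
                propagation_window=timedelta(hours=24)):
    """Return domains whose answers differ and are not explained by a recent update.

    resolve_old and resolve_new take a domain name and return its name servers;
    updated_at maps a domain name to its last "updated" timestamp.
    """
    inconsistent = []
    for name in domains:
        if set(resolve_old(name)) == set(resolve_new(name)):
            continue
        changed = updated_at.get(name)
        if changed is not None and now - changed <= propagation_window:
            continue  # expected lag on the old servers, not a real mismatch
        inconsistent.append(name)
    return inconsistent
```

With a sample size of 20,000, the list returned by `compare_dns` is the set of names that would need to be reconciled by hand.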
A similar validation will be performed with the WHOIS service. WHOIS queries will be performed on 20,000 random .org domain names and the results will be compared. This validation test will be repeated as necessary until eNom's WHOIS returns results that are consistent with the Verisign WHOIS. The final validation will be performed on the FTP reports and zone file generation. Automated registrar reports will be generated that are comparable to those generated by the registry today. The report content will be verified and compared to samples provided by Verisign. Similarly, a complete zone file will be generated and the structure will be compared to the one published by the Verisign registry.
The next phase in the registry transition will involve registrar transition from the Verisign registry to the new .org registry. This phase will involve procedures to validate existing ICANN-accredited .org registrars and set each up with an account and access permissions. Each registrar will be allowed to connect to the OT&E environment for testing beginning 60 days prior to registry cutover. Registrars will also have access to their administrative and FTP sites in advance of the cutover date.
The final phase in the registry transition will be the registry cutover. The registry cutover plan is described in section C18.2.
C18.2. The duration and extent of any interruption
of any part of the Registry Function.
The key operations for the cutover transition from Verisign will have been run through many times during development and testing of the new registry system. The estimated required time to transition will be well known and we anticipate a blackout period of four hours to perform the final data transition.
Seven days prior to the blackout period, the new host and port numbers for the production environment will be communicated to registrars. They will be invited to connect to the production environment to resolve any connectivity issues before the new registry goes live. No transactions will be allowed other than the DESCRIBE command to maintain connectivity. At the designated time of the blackout period, all registrar connections will be dropped and further connections will not be allowed until the published start time.
At the beginning of the blackout period, Verisign must stop accepting .org as a valid TLD. Verisign will then produce a final snapshot of their database data, compress it and make it available. The data will be downloaded and imported as performed many times during testing. DNS servers will have been set up and configured previously with test zone information cleared from all DNS servers.
Once imported, a final test suite will be run. This test will verify that all data was imported successfully and completely. Integrity testing will also be performed on the data to verify any relationships are valid.
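A minimal version of the post-import checks is sketched below: one completeness check (record count against the count reported with the snapshot) and two relationship checks (every domain references a known registrar and known name server hosts). The record layout and field names are assumptions for illustration, not the registry schema.

```python
def integrity_errors(registrars, domains, hosts, expected_domain_count):
    """Return a list of human-readable problems found in the imported data.

    registrars and hosts are sets of identifiers; domains maps a domain name
    to a record with "registrar" and "nameservers" fields (hypothetical layout).
    """
    errors = []
    if len(domains) != expected_domain_count:
        errors.append(
            f"domain count {len(domains)} != expected {expected_domain_count}")
    for name, record in sorted(domains.items()):
        if record["registrar"] not in registrars:
            errors.append(f"{name}: unknown registrar {record['registrar']}")
        for host in record["nameservers"]:
            if host not in hosts:
                errors.append(f"{name}: unknown name server {host}")
    return errors
```

An empty list means the import passed these checks; a real test suite would add further relationships, such as contact references and status values.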
Replication from the master database will then automatically propagate zone information to all DNS server locations and WHOIS server locations. Initially, DNS queries will continue to be handled by Verisign, but our DNS servers will be continually updated with new zone information and testing will proceed on the live data to verify its integrity. The DNS transition is described later in the document.
A documented checklist of systems will be signed off. All systems will be verified and no changes will be made related to firewall or other systems affecting connectivity that may prevent or invalidate registrar connectivity. Precisely at the blackout end time, connectivity will be restored to registrars and normal registry operations will begin.
All transactions will be logged. This will include commands received from registrars and responses given. They will be in the same format provided by Verisign during the testing phases. This transaction information can be used to analyze any issues that may arise and aid the technical team in correcting any problems.
In the worst case, the new registry can be shut down and the transaction logs sent to Verisign, which could apply them as a transactional update to its final database snapshot so that registry operations could be temporarily restored at Verisign.
Upon successful registry startup, zone file information will be generated at appropriate times and Verisign's root servers will be updated with zone information. eNom's DNS servers will be continually updated. Within 30 days after final verification of eNom DNS functionality, the host name for the root server will be changed to reflect eNom's DNS IP addresses. Zone information will continue to be sent to Verisign's DNS servers for backup so in the event of complete failure root functionality can be restored as before.
C18.3. Contingency plans in the event any part
of the proposed transition does not proceed as planned.
Verisign has offered to continue DNS hosting for 1 year after cutover. We intend to continue sending zone files to them for updates to their servers even though the root servers will not point to them as the authority. At any time on the authority of ICANN, the root servers could change the delegation back to Verisign.
eNom will capture all raw transactional data (excluding checks) to and from the RRP service. This will be stored in the same format in which Verisign has provided the data to eNom. This will allow eNom or anyone else (presumably including Verisign, since it is their format) to recreate a current state of the data or detect errors or omissions. This data will be retained for the first year to ensure stability.
In the event that cutover cannot occur on January 1, an extension should be given to the existing Verisign contract.
C18.4. The effect of the transition on (a) .org
registrants and (b) Internet users seeking to resolve .org domain names.
Both .org registrants and Internet users will be minimally affected by the registry transition. Registrars will need to apply for a registrar account at the new registry and change their software to point to the new RRP servers at any time starting 25 days prior to the switch. Registrars will not need to change the way they run commands or any other process.
Internet users will be unaware of the switch. Domain name resolution will remain uninterrupted, and Internet users will be affected only by the inability to register new .org domains or modify existing ones during the scheduled downtime. Internet users may not know that a switch is being performed, as the current .org registry schedules frequent downtimes.
C18.5. The specifics of cooperation required from
There will be several items required from Verisign in order to make the registry transition as smooth as possible. Verisign, Inc. will need to provide eNom with its current data structure for the .org registry. This will include a description of all registry related data that it currently stores. Verisign, Inc. will also need to develop an XML data structure and a mechanism to export all registry data into this XML structure. It will then need to zip and ftp the file to an eNom server. This process will need to be repeated several times to ensure data transfer integrity and to practice the process.
If not already doing so, Verisign, Inc. will also need to make a modification to its RRP interface for .org domains to log transactional traffic. This modification would allow eNom to route RRP transactions to its systems using the registrar's user id only. Existing passwords of registrars of the Verisign registry will not be shared with the new registry for security reasons.
Verisign, Inc. will also need to be ready to continue providing registry service for .org domains for up to 6 weeks past the scheduled switch. This will be in case the registry migration does not proceed according to plan.
It is absolutely critical that Verisign provides information and cooperates in a timely manner. Milestones will be established and made public. We suggest that any extension to the current contract to compensate for late delivery by Verisign carry substantial financial penalties to keep everyone's interests aligned.
C18.6. Any relevant experience of the applicant
and the entities identified in item C13 in performing
eNom has vast experience in designing, building, and operating world class database systems. Its current registrar database operation is built in a very similar fashion to the proposed .org registry implementation offered in this proposal. During its history, eNom has had to migrate to newer versions of operating systems, databases and networks. eNom employs very experienced personnel that have participated in very large-scale migrations (i.e., hundreds of systems) for Fortune 100 companies and this experience has provided for a stable data environment.
C18.7. Any proposed criteria for the evaluation
of the success of the transition.
In summary, for a successful transition to take place, the following must occur:
- The complete transition process will be completed in less than 4 hours.
- Registrars will not have to make any code changes to register and manage .org domain names.
- .org name resolution will be uninterrupted.
The extensive validation testing of comparing DNS responses, zone file outputs, WHOIS data and request/response data will ensure that the systems are working before cutover is attempted. Quantitative milestones will be established and reported on as the transition project unfolds. To aid in oversight and visibility, all of this data will be publicly available via our registry website.
C19. Please describe in detail mechanisms that you
propose to implement to ensure compliance with ICANN-developed policies
and the requirements of the registry agreement.
- If the Registry Service Provider is not complying with the ICANN policies, their initial term as the Registry Service Provider will not be extended. Additionally, non-compliance with ICANN policies is grounds for termination of the .Org Facilities and Services Contract and the Service Provider can be replaced with one that does comply.
- A dedicated employee from The .Org Foundation and from the registry service provider will stay informed of ICANN policies and IETF activities by attending their meetings and by participating on the email lists and in the formulation of new policy.
- The registry service provider cannot, due to its contract with the operator, unilaterally offer non-complying services because it is prohibited from offering any services that are not in the registry agreement.
- To ensure that the registry remains in compliance on technical issues such as service level agreements, as previously stated in the technical section, automated measurements and reporting will be implemented to provide a feedback mechanism that will be used to correct the problem and bring the system back into compliance. A suite of regression tests will also be maintained to ensure that new code continues to remain in compliance.