C17. Technical plan for performing the Registry Function.

This should present a comprehensive technical plan for performing the Registry Function. In addition to providing basic information concerning the proposed technical solution (with appropriate diagrams), this section offers the applicant an opportunity to demonstrate that it has carefully analyzed the technical requirements for performing the Registry Function. Factors that should be addressed in the technical plan include:

C17.1. General description of proposed facilities and systems. Address all locations of systems. Provide diagrams of all of the systems operating at each location. Address the specific types of systems being used, their capacity, and their interoperability, general availability, and level of security. Describe buildings, hardware, software systems, environmental equipment, Internet connectivity, etc.

The Registry Advantage facilities and systems are described in detail throughout Question C17, but are summarized in the remainder of this section. All major elements of the registry infrastructure are operational today. Upon the award of the .org registry to DotOrg Foundation, Registry Advantage will also deploy services at four additional locations. No single point of failure exists throughout the registry infrastructure, and multiple layers of redundancy ensure that even in the event of multiple components failing simultaneously, the system will continue to function normally. This approach allows Registry Advantage to deliver levels of reliability normally associated with mission-critical applications such as core telephone networks and trading floor environments.

Locations of Facilities

Registry Advantage will provide services from eight geographically dispersed facilities worldwide. Four of these facilities are operational today. The registry architecture includes two general types of facilities: "subscription" and "publication". The "subscription" facilities, which are provided at two locations, support registration services (including the SRS and the registrar's Account Management Interface) as well as Whois and DNS services. The other six locations provide a read-only "publication" interface to registry data using the DNS protocol.

In order to provide a highly reliable registry infrastructure, Registry Advantage will operate core registration services in two locations. Each of these locations will provide Shared Registry System (SRS), database operations, account management, DNS and Whois services. One of the two locations is currently operational at an AT&T data center located at 811 Tenth Avenue in New York City. For the purposes of this document, this location is referred to as the “primary site”. Registry Advantage will also duplicate the core registry functions at a “secondary site” in Asia, which will be built out and deployed upon award of the registry to DotOrg Foundation. Registry Advantage has selected the Tokyo, Japan region as a likely location for the secondary site, although the final location will be determined after the selection of the successor registry operator. In the event of a failure at the primary site, or other unforeseen circumstances, core registry functions will be transferred from the primary site to the secondary site. Only one site will host the core registry functions at any given point in time.

In addition to the primary and secondary sites, Registry Advantage currently operates DNS services at the following facilities:
Figure 17.1.1 Facilities Locations

Further information about each of these facilities is available in Attachment O. Additional DNS facilities will be built out in the event that DotOrg Foundation is selected. Although the final locations of these facilities will also be determined in the future, the intent is to locate one in each of the following areas:
Facilities will be selected with the intent of maximizing network diversity and performance. Typically, facilities will be located at or near major Internet exchange points.

Building Security, Environment and Connectivity

All Registry Advantage data center locations are subject to a rigorous set of requirements relating to security, physical plant, network connectivity, and policies and procedures. These requirements include:
Capacities of Systems

Registry Advantage has designed its systems to ensure maximum reliability, performance and security for the .org registry. Its existing database and registry systems already support over five million domain names and can process at least one hundred thousand new registrations per day. Registry Advantage has built systems that can support a peak registration capacity of over 200 new registrations per second and up to 1000 check queries per second. These current levels are five times (5x) the expected maximum ‘add storm’ peaks based on analysis of publicly available data. (See section C17.10 for details; an illustrative headroom calculation also follows the Systems Overview below.)

Furthermore, the DotOrg Foundation is prepared to commit to service level agreements that meet or exceed the ‘best-of-breed’ based on an analysis of the Service Level Agreements in place for all of the major gTLDs. Availability for DNS will be 100%, while Whois and SRS will meet or exceed an uptime of 99.99% (see sections C17.13 and C28 for definitions and details). Registry Advantage, on behalf of the DotOrg Foundation, has engineered its systems to meet or exceed those guaranteed service levels.

Systems Overview

The transactional hub of the Registry Advantage infrastructure is an Oracle database running on a Sun Enterprise 6500 server. Storage for the database is provided by an EMC Symmetrix 8530 storage array. McData Sphereon 3016 (ES-16) FC/FA fibre channel switches provide connectivity between the database and its storage, as well as to a Spectra Logic AIT3 tape backup library. Additionally, network-attached storage for front-end servers is provided by a Network Appliance storage array.

Front-end registry functions such as the Shared Registry Service, DNS, and Whois run in a Linux environment on clusters of IBM X330 1U servers. Requests are load-balanced among the hosts in these clusters by Big-IP load balancers from F5 Networks and by the load-balancing features of Extreme Networks Summit Ethernet switches. Additional network components include Cisco routers and switches, and Netscreen firewalls. This baseline infrastructure is deployed at Registry Advantage’s secondary site in Asia, as shown in Figure 17.1.2 below.
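To make the headroom in the capacity figures above concrete, the short calculation below simply restates the arithmetic they imply. It is an illustrative sketch only: the constants are the numbers quoted in this section, not new measurements.

    # Illustrative restatement of the capacity figures cited above; nothing here
    # is a new measurement.
    PEAK_ADDS_PER_SECOND = 200         # engineered peak registration capacity
    DAILY_NEW_REGISTRATIONS = 100_000  # new registrations supported per day
    ADD_STORM_MULTIPLIER = 5           # engineered peak vs. expected add-storm peak

    # Expected add-storm peak implied by the 5x engineering margin.
    expected_storm_peak = PEAK_ADDS_PER_SECOND / ADD_STORM_MULTIPLIER   # 40 adds/sec

    # Seconds of peak-rate operation needed to absorb a full day's volume.
    seconds_at_peak = DAILY_NEW_REGISTRATIONS / PEAK_ADDS_PER_SECOND    # 500 seconds

    print(f"expected add-storm peak: {expected_storm_peak:.0f} adds/sec")
    print(f"a full day's volume clears in {seconds_at_peak:.0f} s at the peak rate")

At the engineered peak rate, an entire day's supported registration volume could be absorbed in roughly eight minutes of operation.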
Figure 17.1.2 Secondary Data Center

In addition to this baseline infrastructure, further components are added at the primary site in New York to provide additional redundancy and reliability. A duplicate Sun Enterprise 6500 server acts as a hot standby database, and an EMC CLARiiON 4700 storage array acts as alternate storage for both database systems.
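Conceptually, the hot-standby arrangement amounts to a monitor-and-promote loop: the standby is kept synchronized and is promoted only once the primary is confirmed down. The Python sketch below is purely illustrative of that control flow and is not Registry Advantage's implementation; primary_is_healthy, promote_standby, and redirect_clients are hypothetical helpers named only to make the steps explicit.

    import time

    # Illustrative monitor-and-promote loop for a hot-standby database. The three
    # callables are hypothetical helpers that make the control flow concrete.
    FAILURE_THRESHOLD = 3        # consecutive failed checks before failing over
    CHECK_INTERVAL_SECONDS = 10  # pause between health checks

    def monitor_primary(primary_is_healthy, promote_standby, redirect_clients):
        consecutive_failures = 0
        while True:
            if primary_is_healthy():
                consecutive_failures = 0
            else:
                consecutive_failures += 1
                if consecutive_failures >= FAILURE_THRESHOLD:
                    # Promote only after repeated failures, so a transient glitch
                    # does not trigger an unnecessary failover.
                    promote_standby()
                    redirect_clients()
                    return
            time.sleep(CHECK_INTERVAL_SECONDS)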
Figure 17.1.3 Primary Data Center

Network Architecture

The network infrastructure is designed to be robust, fast and scalable. The design is based on standard, well-tested network architecture principles that have proven highly reliable. It is designed to have no single point of failure, and to continue operating even in the event of failures in multiple components. The network consists of several zones: border, core and access.

At the border, two Cisco 7206 routers attach to upstream ISPs via Fast Ethernet and DS-3 interfaces. Each router operates a Border Gateway Protocol (BGP) peering session with the ISPs that attach to it. BGP is used to make route announcements for all public IP addresses used within the location, as well as to receive routing updates from the ISPs. The two routers are connected to one another via a gigabit Ethernet link. For redundancy purposes, the routers use Cisco’s Hot Standby Router Protocol to provide automatic failover in the event of a failure. The routers have public IP addresses, but they will make use of extensive access lists to serve as a preliminary layer of network security and to prevent large streams of malicious packets from reaching the firewalls.

Also in the border zone are the firewalls. Registry Advantage uses Netscreen firewalls. These firewalls have automatic failover capabilities, which allow a standby unit to assume active operations in the event of a failure of the currently active unit. All traffic passes from the routers through the firewalls before moving to the internal network. The firewalls will use public IP address space on their outside interfaces and private IP address space on their inside interfaces.

In the core, Registry Advantage operates two Extreme Networks Summit 5i non-blocking gigabit switches. These multilayer switches will provide core switching (layer 2) and routing (layer 3) functions at wire speed (up to one gigabit per second on each port). Network paths will exist from both of these switches through the firewalls to the border routers. Additionally, these switches will be connected to one another via redundant gigabit links on different ports.
The other service provided in the core zone is load balancing. Two Big-IP load balancers from F5 Networks will be used for this function. Each of these devices will attach to one switch via two gigabit Ethernet connections. (The two connections do not provide redundancy; one is considered the “outside” interface by the Big-IP and will use public IP addresses, and the other is considered the “inside” interface by the Big-IP and will use private IP address space.) Requests from outside the network for public services such as Whois or the SRS service will actually be routed to addresses on the “outside” Big-IP interface. The request will be processed by the load balancer and handed off to an appropriate internal system. The Big-IP can use a number of algorithms to determine the best server to hand an individual request to, but generally requests are handed to individual hosts in a cluster on a round-robin basis. Note that the load balancer will only attempt to process requests destined for legitimate public services. No attempt is made to translate packets and move them into secure internal systems such as the database or NFS storage arrays. The Big-IPs will use a high availability configuration. At any given time, only one of the Big-IPs will be active. If a Big-IP or one of its network links should fail, the other load balancer will become active and take over all virtual IP addresses used to provide load balancing functions.

Finally, the access layer will consist of a number of Extreme Networks Summit 48 switches. These switches will connect to each of the core switches via a gigabit Ethernet connection, and will use the spanning tree algorithm and Extreme’s Standby Router Protocol to prevent routing loops and allow for redundancy in the event of a link or core switch failure. Individual hosts will attach to the Summit 48 switches via Fast Ethernet. Hosts requiring gigabit Ethernet access to the network will attach directly to the core switches.

Logical Network Architecture

The data flow between the three architectural layers described above and the Internet can be additionally described in terms of the logical access between these layers. Systems that are logically associated with the access layer are only accessible via private (RFC 1918) address space. Additionally, the VLANs in the access layer only allow specific VLAN and IP source traffic through the use of access restrictions on the switches. This effectively limits inbound connections to those originating from the core layer. Similarly, systems in the access layer are only able to directly communicate with other systems in the access layer, or with systems in the core layer that have private addresses. Similar restrictions exist between the core layer and the border layer, where connections originating from the Internet may only reach systems between the border and core layers (e.g., the Big-IP). This effectively forms two virtual demilitarized zones (DMZs) within the physical network architecture.
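The default round-robin hand-off described above can be pictured with a small sketch. The Python below is an illustrative approximation, not the Big-IP's actual algorithm, and the host names are hypothetical placeholders.

    import itertools

    # Illustrative round-robin hand-off across a cluster of front-end hosts.
    # Host names are hypothetical placeholders, not production systems.
    WHOIS_CLUSTER = ["whois-01.internal", "whois-02.internal", "whois-03.internal"]

    class RoundRobinPool:
        def __init__(self, hosts):
            self._rotation = itertools.cycle(hosts)

        def next_host(self):
            # Each incoming request is handed to the next host in the rotation.
            return next(self._rotation)

    pool = RoundRobinPool(WHOIS_CLUSTER)
    for request_id in range(5):
        print(f"request {request_id} -> {pool.next_host()}")

Because every host in a cluster is interchangeable, a simple rotation spreads load evenly without tracking per-request state.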
Figure 17.1.5: Satellite Network Design

At remote DNS locations, the network is significantly simplified. At each of these locations, Internet connectivity will generally be from a single ISP via a 100-megabit Fast Ethernet connection (or, in some circumstances, possibly a 10-megabit Ethernet connection) directly into a Summit 48i layer 2 and layer 3 switch. This switch is capable of performing wire-speed routing and switching functions, and all equipment at the location will attach directly to the switch. The switch also has software that enables it to communicate with the ISP using BGP to propagate route announcements into the global Internet. Additionally, the switch can provide the basic load balancing services required to intelligently distribute requests across multiple hosts in a cluster, in a manner similar to the Big-IP load balancers described above. A Netscreen firewall is also attached to the switch. VLANs are created on the switch in order to separate the network into inside and outside portions, allowing the firewall to effectively protect the servers on the network. As with DotOrg Foundation’s Cisco 7206 routers, access control lists will also be applied on the switch to restrict access to the network even before packets reach the firewall.

More detailed descriptions of key components of the network infrastructure are included in Attachment M3. Specifically, documents are attached which describe:
Database Systems

The Registry Advantage database is an Oracle 8.1.7 server instance tuned for both online transaction processing (OLTP) and decision support system (DSS) functionality. Registry Advantage database operations make use of Sun Enterprise 6500 servers, a proven high-availability and high-performance hardware platform. In order to provide maximum reliability, Registry Advantage will operate three database servers in two locations, any of which can rapidly assume the function of the active database.

Primary Site

The primary data center houses two of the three Oracle database servers in the Registry Advantage high availability architecture. These are configured in a tightly coupled high availability cluster, each with redundant access to a pair of synchronized high availability storage arrays. These Oracle instances are instrumented and actively monitored in real time by dedicated Oracle database administrators and by enterprise management software.

Secondary Site

The secondary data center will house the third Oracle database server in the Registry Advantage high availability architecture. This server will be dual-attached to a single McData FC/FA switch, providing access to another high availability storage array, also synchronized with the active database server at the primary site. Database operations are described in detail in our response to Question C17.3 below.

Storage Systems

Registry Advantage has a state-of-the-art storage network composed of industry-leading, high-performance storage frames carrying a 100% uptime guarantee, deployed in a fully redundant SAN Fibre Channel fabric, as well as clustered Network Attached Storage (NAS) devices. All storage hardware is provided by market-leading vendors.

Primary Storage

At both the primary and secondary locations, the primary storage device is an EMC Symmetrix 8530 attached via Fibre Channel to a redundant pair of McData switches. This provides a highly available storage subsystem, achieved through EMC’s redundant platform design, multiple Fibre Channel interconnects, PowerPath load balancing, and the Veritas volume manager and file system. The EMC Symmetrix is configured to provide the following basic storage components:
This configuration allows for inter-day snapshots and archiving to the secondary data storage, hundreds of thousands of I/O operations per second, and the fastest random I/O performance in the industry. It also carries a 100% uptime guarantee from EMC.

Network Attached Storage

A Network Appliance 740 cluster in a highly available configuration addresses all network attached storage needs. This storage will be accessed using NFS and will allow all currently specified nodes to connect. In addition to the compressed database copies and archived logs (staged as part of the Global Application Data Synchronization Framework), this device also supports the Information Request System redundant masters and other front-end application systems (see the application section for details). Network Appliance is the market leader in NFS filer appliance technology, and the clustered F740 filer is a state-of-the-art high availability NFS filer utilizing the latest Data ONTAP operating environment. It is capable of dynamic resizing of volumes, multiple point-in-time snapshots per volume, automated failover, and proactive management using industry standard tools such as SNMP and Secure Shell. The proprietary WAFL file system can handle extremely large numbers of files per directory, and very large files, with exceptional performance.

Alternate Storage

An EMC CLARiiON 4700 RAID subsystem, dual fabric attached, provides an additional 500GB+ of alternate direct-attached storage. This hosts the standby database copy and the archived logs, and has the following characteristics:
The EMC CLARiiON 4700 operates within the same order of magnitude as the primary storage device, handling over 100,000 I/O operations per second and achieving over 200MB/s of throughput.

Backup

All data storage is backed up via Veritas NetBackup to a Spectra Logic Spectra 12000 AIT3 tape library, which is capable of storing over 30TB of data in a single 120-tape AIT3 set and can back up over 800GB of data per hour. The backup server has access to the primary data storage over a direct Fibre Channel connection. The secondary data storage is accessed as a NetBackup client over the local gigabit Ethernet connection. Full (level 0) database and system backups run daily. Retention policies store weekly backups for one month and monthly backups for twelve months.

Component Failure in the Managed Storage

The Symmetrix is a highly redundant subsystem with advertised “zero downtime” as part of its feature set. It can withstand any single component failure with no loss of service, and can be configured with spare disk drives to tolerate multiple drive failures. Therefore, to the extent spares are configured, component failure (single in most cases, or possibly multiple in the case of disk drives) presents zero downtime, zero data loss, and zero risk to the operation of the Oracle database. Likewise, the NetApp cluster is similarly redundant, and any single redundant component failure will result in zero downtime and zero data loss. Since the Oracle database is not dependent on the alternate storage for operation, failures in that unit have no impact on the production system. However, the unit is a redundant RAID subsystem and can withstand single component failures. This would only matter in the event of a total Symmetrix failure, and only for as long as it takes to restore the Symmetrix to full operation.

Recovery

If the primary storage fails or is corrupted, the network attached storage device has copies of both the database volumes and the archived logs. Additionally, the alternate storage device has a point-in-time copy of the database and all subsequent logs, keeping the standby database copy in a near-current state. Application logs can be used, if need be, to replay any transactions not yet applied to the standby database copy. In the event the network attached storage fails, no downtime will be experienced. However, the secondary copies of the data and logs will be unavailable, and no new secondary copies will be stored until the network-attached storage is available again.

Secondary Site

At the secondary site, the storage environment is simplified. A single storage array attaches to a McData switch fabric. No alternate storage is provided. Network attached storage is once again provided by a Network Appliance storage array.

More detailed descriptions of key components of the storage infrastructure are included in Attachment M2. Specifically, documents are attached which describe:
Application Servers and Software

All applications are served from clusters of high-performance Intel-based systems that run the open-source GNU/Linux operating system, tuned to satisfy the demands of the registry environment. The application software was written by Registry Advantage to meet the registry's needs for high performance, reliability and security. The proprietary DNS application service, in particular, provides security advantages over alternatives such as the open-source BIND service. Key features of the application servers include:
Application Servers at the Primary Site

The following applications are supported by the IBM X330 GNU/Linux platform:
Figure C17.1.6: Application Software Architecture

Additional Services Supported at the Primary Site

In addition to the primary application clusters, DotOrg Foundation also maintains dedicated systems and network management servers, including Host and Network Intrusion Detection Systems (H/NIDS), as well as a complete set of development and testing servers and a fully functional replication of the production environment for Quality Assurance testing.

Supported Applications at the Secondary Site

The secondary site will host a complete replication of the production application server pools. Each application cluster will be replicated with N=N redundancy (a one-for-one duplicate of each production cluster), ensuring the secondary site is fully capable of operating the registry at any time. The redundancy in the number of 1U servers and network interconnects, as well as the load balancing capabilities of the primary site, will all be fully present at the secondary site. Additionally, as the primary site expands, the secondary site will expand in lock step, maintaining the reliability of the architecture.

Supported Applications at the Satellite Locations

At this time, Registry Advantage maintains three satellite locations in addition to the primary and secondary sites. These locations support three additional DNS Points of Presence (PoPs), each with its own cluster of both BIND and proprietary DNS servers. Like the primary site, these satellites use the advanced layer 4 load balancing capabilities of the Extreme Networks Summit 48i switches for high performance and availability. The geographically distributed PoPs may also be load balanced globally using an implementation of RFC 1546 "anycasting", leveraging the stateless "best effort" nature of the DNS.
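Because a DNS query over UDP is a self-contained request, any PoP announcing the shared anycast address can answer it, and the routing system independently steers each query to the nearest announcing PoP. The sketch below is an illustrative Python model of that behavior only; the PoP names and routing costs are hypothetical and do not describe the production routing configuration.

    # Illustrative model of anycast DNS: every PoP announces the same service
    # address, and each query is answered by the lowest-cost PoP still announcing
    # it. Both the PoP names and the cost table are hypothetical examples.
    ROUTING_COST = {
        # (resolver region, PoP) -> notional routing cost
        ("eu", "pop-a"): 1, ("eu", "pop-b"): 4, ("eu", "pop-c"): 7,
        ("us", "pop-a"): 4, ("us", "pop-b"): 1, ("us", "pop-c"): 6,
        ("asia", "pop-a"): 7, ("asia", "pop-b"): 6, ("asia", "pop-c"): 1,
    }
    ANNOUNCING_POPS = {"pop-a", "pop-b", "pop-c"}  # PoPs currently announcing the prefix

    def serving_pop(resolver_region):
        # Routing delivers each query to the lowest-cost PoP that is still
        # announcing the shared anycast prefix; no state is shared between PoPs.
        candidates = [(cost, pop) for (region, pop), cost in ROUTING_COST.items()
                      if region == resolver_region and pop in ANNOUNCING_POPS]
        return min(candidates)[1]

    print(serving_pop("eu"))          # pop-a (closest to "eu")
    ANNOUNCING_POPS.discard("pop-a")  # a failed PoP withdraws its announcement
    print(serving_pop("eu"))          # pop-b (queries shift to the next-closest PoP)

In practice the "cost" is determined by BGP route selection in each resolver's upstream networks; a PoP that stops announcing the prefix simply drops out of consideration and its queries shift to the next-closest location.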