[registerOrg .org Proposal]

Registry Advantage Global Application Data Synchronization Framework (GADSF)...................1

Introduction

As Registry Advantage extends the enterprise to every corner of the globe, the complexity of application data synchronization grows. Traditional data replication techniques do not scale beyond the private network. And even in a cloud of private circuits and interconnects, the best commercial long haul replication peaks at the thousands of kilometers mark. [1] ^{,
[2]}

If we are to succeed in maintaining data integrity, availability, and reliability, we must overcome several obstacles:

These objectives are impossible to achieve with today's available technology, regardless of cost, in a generic, commercially viable way. Fortunately, we do not need to produce a generic, commercial solution. GADSF is a framework of software modules and protocols that specifically target our data replication requirements.

Replication Requirements

Our replication requirements are specific to the repository and directory services infrastructure currently deployed at our primary data center. This includes Oracle 8i data files and transaction logs, as well as application logs for Extensible Provisioning Protocol (EPP), Simple Registration Protocol (SRP), Generic E-mail Parsing System (GEPS), and Account Management Interface (AMI). Additional application interfaces to the repository and directory services infrastructure are likely to be added in the near future as the product offering grows.

The rest of the data, including DNS and WHOIS distributed caches, can be built up from scratch (if need be) from the recovered repository in under 70 minutes. DNS in particular, is already globally distributed and does not rely directly on the primary site for continuous service -- only for updates. This reduces the replication requirement problem space significantly.

In assessing the problem at hand, and searching for solutions, we reviewed commercial products from Veritas, Hitachi Data Systems, HP, EMC, Nishan Systems, and CNT, in addition to other internal systems (OLBCP). The Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) we considered are described in the table below.

In order to meet our ideal RTO/RPO objective, the only viable approach is a custom, highly focused solution that targets the problem space specifically. GADSF was designed from the ground up to meet these objectives and overcome the obstacles that prevent other solutions from working reliably.

Architectural Overview

The GADSF is a multi-component framework that enables the reliable distribution of data files to a large number of remote locations over unreliable, low bandwidth, high latency networks. It has the ability to bring any DSP up to date from any previous point in a Data Synchronization Channel Instance (DSCI). It also has the ability to integrate external triggers at either end of the DSP Transport (as part of the Data Source Interface and Target Delivery Interface), to tightly couple data recovery scripts and procedures to the replication process.

Rather than trying to create a generic data replication solution, GADSF specifically targets the data replication requirements of the Registry Advantage repository and directory services infrastructure. This enables us to make certain assumptions about the kind of data being distributed, and how it is used to ensure reliability, availability, and integrity at the application level.

The GADSF was specifically designed to replicate data that can be moved incrementally, in small chunks, over the wire to the interested DSPs. Oracle transaction logs, and IRS application logs are precisely this type of data.

Data Source Modules collect log files for replication into Data Replication Packages by pushing them into a Data Synchronization Channel (DSC) queue, where they are assigned a unique validation hash, DSCI sequence number, and a global DSC identifier. DSC identifiers are globally unique to help ensure the correct files are in the correct queues across DSCIs. The sequence number is DSCI specific to simplify manual DRP reconciliation between DSPs, although the GADSF has robust queue management built into it already.

Data Synchronization peers communicate using TCP. The DSP hosting the DSCI listens to a well known port for other DSPs to register interest in the channel. It keeps a list of interested DSPs and their last known state, and allocates a separate thread to handle communicating with each. At each queue update cycle, the DSP hosting the DSCI communicates with all DSPs registered to that channel on a separate well known port.

There is no inherent difference in the roles DSPs play in the synchronization scheme, so a remote DSP can register interest in one DSCI, while hosting another. This is basically a point to point publish and subscribe methodology.

There is currently no discovery protocol implementation in the GADSF, so DSPs have to be configured to publish or subscribe to a DSCI. Additionally, all communications are secured with over the wire encryption and each channel requires an explicit list of allowed subscribers. Cryptographic keys are used to authenticate channel subscriptions and implement symmetric encryption over the wire.

Once a DSP successfully subscribes to a DSCI, it receives regular DRP updates from the publishing DSP. Each update includes the interval for the next update message. Publishers send update messages containing lists of DRPs to be processed by the subscriber. This list may be potentially empty. Any DRPs received by the subscriber are processed by the Target Delivery Module associated with the DRP. This is basically the compliment of the Data Source Module, and the DRP format is self describing to simplify TDM associations. The required TDM merely needs to exist on the DSP and it is automatically associated with the DRP that requires it.

TDMs can also be configured with external triggers, which are useful for tightly coupling the recovery process to the data synchronization process. For example, an Oracle transaction log file can be distributed to a DSP, where the TDM can call a script to apply to log to an Oracle instance running in recovery mode on the DSP.

If the subscriber does not receive an update message from the publisher within the interval specified in the last update message, an event is created using the Event Notification Service. This service interfaces with an enterprise systems management framework to report the missed update. Currently, the only supported framework is Register.com's proprietary Ops2 platform. However, it is trivial to add support for additional platforms, such as Tivoli, BMC Patrol, HP OpenView, etc.

Likewise, if the publisher is unable to communicate with a subscriber, an event is created using the Event Notification Service. Other conditions in which an event is created include any situation where the DSPs cannot talk to each other. This may be due to a network outage, or incorrect encryption keys, etc.

Core Components

GADSF is comprised of several related components and modules. At the core of these are the Data Distribution Queuing Engine, the DSP Transport, the Data Source Interface, the Target Delivery Interface, and the Event Notification Service. Each of these is described below.

Data Distribution Queuing Engine (DDQE)

The DDQE is the main component in the GADSF and provides both the queuing functionality as well as the sequencing and global identifiers. Data Synchronization Channels are instantiated by the DDQE component of the DSP. This involves initializing the channel with a unique identifier and an initial sequence number. A channel may also be instantiated in recovery mode, in which case, the initialization recovers the previous unique ID and subscriber state information. These channels are then available for publishers and subscribers to communicate with. DSCs may be re-initialized at any time (see below), which effectively creates a new instance of the DSC.

As soon as a DSP has an active DSCI, subscribers may register their interest by sending a subscription request. This request includes the subscriber's key, and their current state (last DSCI subscribed to, last DRP sequence processed, a potentially empty list of DRP sequence numbers to be re-transmitted, interval request). The DDQE then authenticates the subscriber, creates or updates the associated state information, and creates a DSP Transport thread for the subscriber.

Publishers must also authenticate to get access to the DLCI before it can push any DRPs into the queue. Once a publisher authenticates itself, the DDQE creates associated state information, and starts a DSI thread to communicate with the DSM agent.

The DDQE is also responsible for throttling the queue and maintaining the queue length and state. As DSM agents publish DRPs, the queue may become overrun with data. In this case, it will create an event using the Event Notification Service, and pause all DSI trheads on the DSCI until the queue drains. If a particular DSP is slow or unresponsive, the DDQE will mark it as down, create an event using the ENS, and clear the queue of any unsent DRPs. The DSI threads will be taken off pause at this time, and the DSM agents will send any backlogged DRPs. The DSP may re-subscribe to the DSCI later, but will need to complete a channel recovery before getting a DSP Transport thread assigned.

As DRPs are sent to subscribers, they are marked as sent in the subscriber state (see the "DSP Transport" section below). When all current subscribers have received the DRP, it is moved to a special archive area. If new subscribers join the channel late, they must complete a channel recovery before normal subscription service can begin. This recovery involves creating a special DSP Transport thread that continuously send all outstanding DSPs. If new DRPs are archived during this period, these will also need to be sent by the special recovery DSP Transport thread. Once the subscriber is up to date, the associated state is updated, a normal DSP Transport thread is assigned and normal subscription operations begin.

A DSCI may be re-initialized at any time. If a DSCI is re-initialized, the DDQE will pause publishers and subscribers. DRPs already scheduled for transmission will be transmitted and archived, then the archive area will be marked as stale. Stale archives can be removed at any time. All queued DRPs left will be resequenced and stamped with the new DSCI global identifier. All subscribers will be sent an "end of subscription" message, causing them to re-subscribe to the channel under the new instance. DSI threads will be taken off pause at this time, and again, DSM agents will transmit any backlogged DRPs. These will now be sequenced and stamped in the new DSCI.

Data Synchronization Peer (DSP) Transport

This is the part of the DSP that initiates network connections to other DSPs. If a DSP is hosting a DSC, each subscriber DSP will have a separate DSP Transport thread assigned to it. This thread will handle sending update messages and receiving update responses from the remote DSP. When the response is received, it will contain the remote DSPs current state.

If the update response message state does not indicate the DSP requires maintenance, the update is marked as being sent successfully for the DSP.

If a DSP is marked as needing maintenance after an update message is sent, a special maintenance DSP Transport thread is created. Each DSP can have a configurable number of maintenance threads. If the maximum number of maintenance threads is created and the DSP is still in a needs maintenance state, the DDQE will mark the DSP as down, and all current DSP Transport threads associated with the DSP will be canceled. A special recovery DSP Transport thread will be assigned until the DSP clears the archived DRPs. Once that's done, a normal DSP Transport thread will again be created and the DSP will proceed with normal update operations.

If more than some configurable percentage of DSPs are in a "needs maintenance" state, the DDQE will throttle the DSCI until all DSPs are caught up, and an event will be created using the ENS. If this ever occurs, the publish or subscribe intervals should be adjusted, as well as the number of maximum maintenance threads per DSP. If there is still no practical way to configure the GADSF to keep up with the publishing demand, then multiple DSCIs can be created .

As many as one DSCI per DSM agent may exist. Remote DSPs may subscribe to multiple DSCIs for the same DSC. TDMs locally will synchronize their activity through a single master TDM thread on the subscribing DSP in such a case. The global identifier in the DRPs will help the TDM agent serialize the data (see the "Target Delivery Module" section below).

If this still does not produce tolerable results, and the network circuit is saturated, then individual DSCIs and even indiviual DSP subscribers can be routed over separate circuits. If required, additional circuits can be added to accomplish this. If this still does not resolve the problem, the practical limits of the GADSF have been reached.

Essentially, the GADSF can scale to the extent the network capacity and the data synchronization tolerance allow. There are no inherent limits.

Data Source Interface (DSI)

The DSI is a generic communication thread for DSM modules to communicate with. DSM modules (see below) are specific to a data source, but since the DRPs they create are self describing, there is no need for a data source specific DSI on the hosting DSP. DSI threads are configured manually on the hosting DSP and store DSM publisher information, including encryption keys, to authenticate publishers. DSIs and their corresponding DSMs negotiate the transmission (and re-transmission) of DRPs into a DSC.

The DSI interfaces with the DDQE to push DRPs into the DSCI queue. This is where the DDQE hashes the DRP and assigns the DSCI sequence and global identifier. When the DDQE pauses a DSI, the DSI tells the DSM to stop sending DRPs. The DSM will queue a configurable number of DRPs before marking the DSI as down and creating an event using the ENS (see the Data Source Module section below). Once the DDQE takes the DSI off pause, it informs the DSM and the queued DRPs in the DRM are transmitted normally.

The DSI is responsible for interfacing all data sources with the DDQE. If it is unable to interface with the DDQE, or if it loses its connection with the DSM unexpectedly, or if the DSM authentication fails, an event is created using the ENS. The DSI thread and the DSM module negotiate the nature of the connection between them. It may use an interval based reconnect paradigm, or a single long lived connection. Additionally, they negotiate the recovery method in the event something goes wrong. There is less state maintained about the DSM and DSI, so they use a very loose method of determining where to resume the DRP transmissions from. DSMs will maintain a back log of DRPs according to the negotiated recovery method. Since the DDQE uses a content hash to uniquely identify a DRP, this is perfectly safe.

Target Delivery Interface (TDI)

The TDI is the subscriber compliment to the DSI. The subscriber will start a TDI thread when it authenticates access to a DSCI. This thread is responsible for listening to update messages from its DSP Transport. Update messages are processed by the TDI to select an appropriate TDM (see below). The TDM will de-encapsulate the DRP into a staging area and authenticate the original data file using the content hash. This usually results in the original data file being moved into its final location. However, triggers may be assigned to the TDM for additional processing.

The TDI is also responsible for coordinating multiple DSP connections for split DSCI scenarios. In this case, the TDI will assign a separate thread for each DSCI connection, and an additional thread for the master TDM to interface with the other TDI/TDM threads. This is described in greater detail in the TDM section below.

Each DRP is processed separately and in sequence. If a DRP is received out of sequence, it is stored in a temporary location until all DRPs are processed from the update message, or until the missing sequence numbers are processed. If the missing sequences are not received in the update message, or if there are any problems in processing any DRPs, then the TDI will return the appropriate state to the DSP Transport thread and set the update interval to 0.

Event Notification Service (ENS)

The ENS is a generic event monitoring service. It takes any events created by the GADSF and translates them into alarms for an enterprise management platform to handle. The ENS is intended to use an out of band messaging system to help mitigate risk associated with using GADSF framework to notify an event console about problems with the GADSF. If there are problems with the GADSF, no assumptions should be made about its ability to deliver messages at all.

The ENS out of band interface is modular and robust. It includes a self healing capability that can send messages to close out alarms created previously if the error conditions are no longer present. [5] It also has the ability to leverage more than one communication method at the same time. For example, it could be configured to transmit an Ops2 message, an email, and a TEC Adapter event at the same time, for any given ENS event. This behavior is configurable by event class and or severity. The ENS event is translated into the target management platform message format using a simple translation rule table. Part of adapting a new management platform into the ENS involves defining these transaction rules. Currently, only Ops2 is supported.

Additional Modules

The GADSF supports modular extension from the ground up. In fact, all of the non-core components are built as extension modules. Additional extension modules are straight forward to implement. The following is a list if the current set of modules.

Data Source Module (DSM)

Data Source Modules are specific modules designed for a single data source. They communicate with the DSI layer of the GADSF core in a similar manner as a subscriber DSP. The DSM is effectively the publisher DSP. Unlike the subscriber interface, the DSM/DSI relationship is much looser. The two threads negotiate the initial connection in much the same way as the subscribers do, but they have a much looser protocol between them after that.

The DSM is responsible for locating data files for replication, encapsulating them into a DRP (including an initial content hash used by the DDQE for content verification prior to assigning a sequence number), and storing them into a local queue for delivery. Part of the encapsulation process also identifies the data source in the DRP itself. This allows for a loose coupling between the DSM and the DSI, and even the DDQE and DSP Transport. The only part of the framework that needs to care about the type of DRP is the TSI/TSM layer on the subscriber DSP.

The DSM is also responsible for delivering the DRPs to the DSI thread or creating ENS events if it is unable to. This includes negotiating a back log windowing strategy for recovery purposes, agreeing a purging cycle for the original data files (and their associated DRPs), and setting time out and back log length values to be used in a paused or erred state. The initial communication between the DSI and DSM is made according to DSC specific manual configuration values, but most of these are later re-negotiated by the agents.

If a DSI requests recovery for data outside the agreed recovery window, or from data files already purged, an ENS event is created and an appropriate error is sent to the DSI thread. Similarly, if the DSI thread pauses the DSM for longer than the agreed pause window, or the local DRP queue grows beyond the negotiated size, an ENS event is created and the DSI is marked as down in the DSM. A DSI that has been marked down will need to be re-established using the previously agreed configuration values, but the data being synchronized has now been compromised. For this reason, the local DRP queue length should be generous, as should the recovery window.

If a DSM marks the DSI as down, it will also create an ENS event and request that the DSC be re-initialized. Since this has very serious ramifications for all subscribers, great lengths should be taken to ensure this situation only happens when there is absolutely no other choice (such as the local disk filling on the publisher). Since the data sources are usually pegged to a point in time copy of a larger data set, a new point in time copy needs to be delivered to the subscribing DSPs prior to re-initializing the DSC on ths hosting DSP.

Target Delivery Module (TDM)

The TDM is responsible for de-encapsulating the DRP and authenticating the data files using the content hash provided. The DRP sequence is used to ensure proper serialization of data files on the subscribing DSP. Once the data files are authenticated, they are moved into their target directory. If there is an external trigger associated with the TDM, it is called only after the TDM has successfully moved the data file. If there is an error in the de-encapsulation process, authentication process, or the DRP is not in sequence, an error is returned to the TDI and an ENS message is created. If the external trigger fails for any reason, an ENS message is created, but no error is returned to the TDI.

If any errors are encountered prior to executing the external trigger, the trigger is not called. Additionally, if there are multiple TDMs being coordinated by a single master TDM, only the master TDM will return errors to the corresponding TDI, and only the master TDM will execute external triggers. In this case of multiple TDMs, sequence errors are returned if any single TDI sequence is skipped, or if the relative global identifier sequence cannot be resolved in the currently active set of updates. This may result in many additional TDI and TDM threads, or eventually a pause in the DDQE. Eventually, the sequencing will resolve itself and processing will continue. Its important, therefore, to configure appropriate (short) intervals and (long) time outs for split DSCI configurations.

Summary

The GADSF achieves the RTO/RPO objectives by applying very specific remedies to the specific requirements of the Registry Advantage repository and directory services infrastructure. It is highly configurable through a set of simple modular component configuration files, and is extremely robust. It can take full advantage of all network circuits available between any two DSPs to ensure data gets delivered in the required time frame. Extensive monitoring from within the GADSF add to its robustness.

The development of external triggers for applying Oracle transaction logs to a standby Oracle instance in recovery mode, and a trigger to organize EPP, SRP, GEPS and AMI logs into the recovery tool format are all that is required to fully leverage this framework. These triggers also include Ops2 monitoring hooks and can create Ops2 messages directly. Additionally, they can be run as stand alone programs in addition to being called as TDM triggers. This also simplifies manual recover from any error situation.

[1] http://www.nishansystems.com/techlib/downloads/Promontory_Performance_Note_01.pdf

[2] http://www.nishansystems.com/techlib/ipsans.php#_Toc9654738

[3] It is unacceptable operationally to use the IRS as a backup and recovery strategy for the IRS.

[4] OLBCP could be leveraged to achieve this, but not easily, and it would not scale beyond a single local and single remote location. It also does not have robust monitoring and fault tolerance capabilities required for a well managed solution, and so would need to be extended.

Registry Advantage Global Application Data Synchronization Framework (GADSF)White Paper

Table of Contents