Unity Registry: Time to re-organise
The Proposal
 

C17.10. Peak capacities. Technical capability for handling a larger-than-projected demand for registration. Effects on load on servers, databases, back-up systems, support systems, escrow systems, maintenance, personnel.

The registry system has a number of features that enable it to cope with peak capacities. These range from features built into the AusRegistry code, to the hardware used, to the scalable design of the whole system. The code itself has a number of “break points”, or limiting factors: whenever one of these conditions is detected (e.g. the number of processes in the system exceeding a certain amount), the machine refuses to accept any more connections. If a single machine reaches its configurable capacity, no more connections are given to that machine until some of its existing connections are closed. This stops a machine from simply accepting connection after connection until it crashes.
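The break-point behaviour can be illustrated with a minimal sketch. This is not the AusRegistry code: the class name `AdmissionGate`, the `max_connections` parameter and the limit of 2 are assumptions chosen only to show the idea of refusing new connections at a configured limit and accepting them again once existing ones close.

```python
from dataclasses import dataclass


@dataclass
class AdmissionGate:
    """Refuses new connections once a configured limit is reached (illustrative only)."""

    max_connections: int  # hypothetical per-machine limit
    active: int = 0

    def try_accept(self) -> bool:
        """Accept a connection only if the machine is below its limit."""
        if self.active >= self.max_connections:
            return False  # break point hit: refuse rather than overload
        self.active += 1
        return True

    def release(self) -> None:
        """Record that an existing connection has closed."""
        if self.active > 0:
            self.active -= 1


gate = AdmissionGate(max_connections=2)
results = [gate.try_accept() for _ in range(3)]  # third attempt is refused
gate.release()                                   # one connection closes
results.append(gate.try_accept())                # capacity is available again
print(results)
```

A real implementation would probably track several conditions (process count, memory, and so on) rather than a single counter, but the refuse-then-recover pattern is the same.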

The routers and packet shapers smooth out fluctuations in bandwidth usage. This rate limiting reduces sudden bursts of connections reaching the registry machines and regulates them so that they are handled quickly and efficiently. The load balancers allow for a fully scalable design, which means that adding more machines to the network is a simple process requiring no downtime. Upon receiving new hardware, a machine can be up and running in the registry within 10-20 minutes. Should monitoring indicate that the system is constantly reaching peak load, more machines will be added to reduce the overall system load and allow much higher throughput.
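One common way to smooth bursts is a token bucket. The sketch below shows that general technique only; the proposal does not state which algorithm the packet shapers use, and the `TokenBucket` class, its `rate` and `capacity` parameters and the example values are all assumptions.

```python
class TokenBucket:
    """Token-bucket rate limiter with an explicit clock, for illustration."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate          # tokens added per second (assumed value)
        self.capacity = capacity  # maximum burst size (assumed value)
        self.tokens = capacity    # start with a full bucket
        self.last = 0.0           # time of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a connection arriving at time `now` may pass."""
        elapsed = max(0.0, now - self.last)
        # Refill according to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=2.0, capacity=3.0)
burst = [bucket.allow(now=0.0) for _ in range(5)]  # five arrivals at once
later = bucket.allow(now=1.0)                      # one arrival a second later
print(burst, later)
```

With these values, a burst of five simultaneous connections is trimmed to the bucket capacity of three, and traffic is admitted again once the bucket has refilled.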

If the bottleneck appears to be the database itself, the Sun Fire system can have more CPUs and more memory added to it without any downtime. If all CPU bays in the Sun Fire unit (16 in total) become filled, the clustering capabilities of the Oracle database permit more database servers to be added. The provision of the connection Grades described above will also help reduce the effect of peak loads on the system, in particular by stopping one or two registrars from taking all registry resources and preventing other registrars from accessing the service.
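The fairness effect of connection Grades can be sketched as a per-registrar connection cap. The grade names, the limits in `GRADE_LIMITS` and the `RegistrarLimiter` class are hypothetical; the actual Grades are defined elsewhere in the proposal.

```python
from collections import defaultdict

# Hypothetical grades and limits, for illustration only.
GRADE_LIMITS = {"standard": 10, "premium": 20}


class RegistrarLimiter:
    """Caps concurrent connections per registrar according to its grade."""

    def __init__(self, grades: dict[str, str]) -> None:
        self.grades = grades            # registrar name -> grade name
        self.active = defaultdict(int)  # registrar name -> open connections

    def try_connect(self, registrar: str) -> bool:
        """Accept the connection unless the registrar is at its grade limit."""
        limit = GRADE_LIMITS[self.grades[registrar]]
        if self.active[registrar] >= limit:
            return False
        self.active[registrar] += 1
        return True


limiter = RegistrarLimiter({"registrar_a": "standard", "registrar_b": "standard"})
# registrar_a attempts 50 connections; only its grade limit is accepted.
a_accepted = sum(limiter.try_connect("registrar_a") for _ in range(50))
# registrar_b is unaffected by registrar_a's attempts.
b_accepted = limiter.try_connect("registrar_b")
print(a_accepted, b_accepted)
```

Because each registrar is counted separately, one registrar exhausting its allowance does not consume capacity reserved for the others.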

Effect of load

The load will be readily detected by all monitoring scripts, and alerts will notify registry engineers that systems are peaking at capacity. However, because of the break-point mechanisms built into the code and the network bandwidth limiting described above, it would be near impossible to produce a load significant enough to bring down the registry system. The worst case is that registrars who are using all of their bandwidth will receive timeouts when trying to create new connections. It is our view that every registrar having 10 working connections at a reasonable speed is preferable to unlimited connections at speeds too low to be of real value.
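A monitoring script needs to tell a brief spike from load that is constantly at peak. The following is a minimal sketch of such a check, assuming utilisation samples between 0 and 1; the function name `sustained_peak`, the 0.9 threshold and the window of five samples are illustrative assumptions, not values from the registry's monitoring scripts.

```python
from collections import deque


def sustained_peak(samples, threshold=0.9, window=5):
    """Return True if the last `window` samples all meet or exceed `threshold`."""
    recent = deque(samples, maxlen=window)  # keeps only the newest `window` samples
    return len(recent) == window and all(s >= threshold for s in recent)


spike = [0.4, 0.95, 0.5, 0.45, 0.5, 0.4]          # one short burst
sustained = [0.5, 0.92, 0.95, 0.97, 0.93, 0.96]   # load stays near capacity
spike_alert = sustained_peak(spike)
sustained_alert = sustained_peak(sustained)
print(spike_alert, sustained_alert)
```

Only the second series would raise an alert, which matches the case where adding machines is worth considering.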

Backup

Even during times of peak load, the system will still be able to perform backups reliably by using Sun StorEdge[tm] Instant Image 3.0. This software allows us to take an instant image of the RAID file system onto a third volume and then back up the image, leaving the primary volumes free to concentrate on database operations. Service will be maintained in line with the service levels discussed, and is fully scalable to meet changes in demand.

Support services/maintenance

The management servers are not used by registrars, and hence maintenance and support can still be performed successfully. If bandwidth is too congested (which should not occur, because the packet shaper gives management packets priority over registry connections), we can still reach the registry via the redundant fiber optic links that are maintained to the secondary site.

Personnel

Since the system is highly automated, the personnel involvement here is minimal. Support staff will need to keep a closer eye on system statistics and warn system administrators of any areas of concern. Development personnel will collect statistics from these periods to help identify areas of the system that can be improved to increase performance.

Development staff are always available to move temporarily to a support and maintenance role if an immediate shortage occurs. Where additional shortages occur, properly qualified and experienced contract staff can be employed.