Advice from SunGard Availability Services.
Virtualization technology has changed the landscape of IT and data centers, delivering substantial benefits not only to production environments but also to disaster recovery. However, while data centers are becoming increasingly virtualized, most IT operations are a mix of physical and virtual systems: a hybrid environment.
While newer applications may run exclusively on virtual workloads, there are still many mission-critical applications running on a combination of mainframes, Windows servers, Linux/Unix systems and virtual machines. Managing a recovery site for these applications also requires enterprises to purchase a whole new set of costly application software licenses for the secondary location.
This reality has created an IT issue that is still flying ‘under the radar’ of many IT organizations: how best to protect and recover applications in hybrid environments, and to do it in a way that works within business and cost constraints.
The three top challenges for enterprises looking to recover hybrid environments are addressing their needs to:
- Recreate a multi-layer, multi-platform hybrid stack for each mission-critical application.
- Recover mission-critical applications within the time requirements needed to avoid unacceptable consequences to the business (recovery time objective – RTO).
- Avoid busting the IT budget on CAPEX for building a secondary site for recovery and OPEX for maintaining the site.
Why recovery in hybrid environments is so difficult
To better understand the complexity and difficulty in managing recovery in hybrid environments, let's examine a typical three-tier web application – for instance, an e-commerce application. The application may have a database layer that runs on two different systems – a Linux system running Oracle and a Microsoft Windows server running SQL Server. Next, the middleware – or business logic – of the application could be on a Win2K server running WebLogic, and its job is to aggregate data from the Oracle and SQL Server databases. Lastly, the application has a web layer on an ESX server running Apache.
Add into this scenario some of the hardware supporting the application. For example, the web and middleware tiers are stored on an EMC SAN device, with the Oracle database on a NetApp SAN device and the SQL server on a Dell storage device.
Here is what this enterprise faces: multiple storage platforms, multiple compute platforms, multiple operating systems, and a mix of physical and virtual environments. So when a disaster or outage hits, if the enterprise has not created the identical physical and virtual stacks in its recovery environment to accommodate all three layers, the recovery will fail.
If the enterprise has the wrong version of VMware's hypervisor running in the recovery environment, the recovery will fail. If it has the wrong hypervisor running in the recovery environment (say, Xen), the recovery will fail. If the enterprise only has the ability to recover the database layer by itself, or both the database and middleware layers without the web layer, the recovery will fail.
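The failure conditions above can be checked before a disaster rather than during one. The following is a minimal Python sketch, not a SunGard tool: it models the e-commerce example as a list of tiers and reports every reason the recovery site could not host the application. The Tier fields, the recovery_gaps function, the tier names and the version labels "A" and "B" are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One layer of an application stack (all field names are illustrative)."""
    name: str                     # e.g. "web", "middleware", "database-oracle"
    platform: str                 # "physical" or "virtual"
    os: str
    hypervisor: str = ""          # empty for physical servers
    hypervisor_version: str = ""  # placeholder labels, not real version numbers


def recovery_gaps(production, recovery):
    """Return the reasons the recovery site cannot host the production stack."""
    recovery_by_name = {tier.name: tier for tier in recovery}
    gaps = []
    for tier in production:
        match = recovery_by_name.get(tier.name)
        if match is None:
            # A layer with no recovery target means the whole application fails.
            gaps.append(f"{tier.name}: no recovery target")
            continue
        if match.platform != tier.platform or match.os != tier.os:
            gaps.append(f"{tier.name}: platform or OS mismatch")
        if match.hypervisor != tier.hypervisor:
            gaps.append(f"{tier.name}: wrong hypervisor ({match.hypervisor or 'none'})")
        elif match.hypervisor_version != tier.hypervisor_version:
            gaps.append(f"{tier.name}: wrong hypervisor version")
    return gaps


# Production stack from the e-commerce example (versions are made-up labels).
production = [
    Tier("database-oracle", "physical", "Linux"),
    Tier("database-sql", "physical", "Windows"),
    Tier("middleware", "physical", "Windows"),
    Tier("web", "virtual", "guest", "VMware ESX", "A"),
]

# A recovery site that lacks the middleware layer and runs Xen instead of ESX.
recovery = [
    Tier("database-oracle", "physical", "Linux"),
    Tier("database-sql", "physical", "Windows"),
    Tier("web", "virtual", "guest", "Xen", "A"),
]

for gap in recovery_gaps(production, recovery):
    print(gap)
```

An empty result would mean every layer has a matching target. A real inventory would also need to cover storage platforms and application versions, which this sketch leaves out.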
And now add in another level of complexity. The previous scenario is just one application. What if the organization has 50, 80 or even more than 100 applications to recover?
As enterprises examine the challenge of recovering a large number of important applications – all with aggressive recovery time objectives – the reason why recovery in hybrid environments is so difficult becomes very clear.
SunGard Availability Services recommends that organizations address the following set of questions when developing a recovery strategy for hybrid environments:
- Is your production environment 100 percent virtualized, or do you run a hybrid environment with multiple platforms, operating systems, hypervisors and storage technologies?
- Do you have a full understanding of your recovery environment? Is it compatible with your production environment in terms of platforms, operating systems, hypervisors, storage and application data? Do you understand all the interdependencies within your mission-critical applications?
- Do you have the diverse skills and the automation technologies to recover all of your applications in an application-consistent way and to meet their RTOs and recovery point objectives (RPOs)?
- Have you created the processes and procedures to recover your hybrid environment? Have you tested your ability to meet your RTOs?
- Is your disaster recovery runbook current? In particular, have all production configurations been captured in the recovery environment – addressing change management?
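The testing question above comes down to a simple comparison once test results are recorded. As a rough sketch in Python, assuming each application has a stated RTO and an optional measured recovery time from its most recent test, the function below lists the applications that missed their objective or were never tested. The application names, the durations and the missed_rtos function are hypothetical.

```python
from datetime import timedelta


def missed_rtos(rto_by_app, measured_by_app):
    """Return apps whose last test exceeded the RTO, or that were never tested."""
    misses = {}
    for app, rto in rto_by_app.items():
        measured = measured_by_app.get(app)
        if measured is None:
            misses[app] = "not tested"
        elif measured > rto:
            misses[app] = f"over RTO by {measured - rto}"
    return misses


# Hypothetical objectives and test results, for illustration only.
rto_by_app = {
    "e-commerce": timedelta(hours=4),
    "payroll": timedelta(hours=24),
    "reporting": timedelta(hours=48),
}
measured_by_app = {
    "e-commerce": timedelta(hours=6, minutes=30),
    "payroll": timedelta(hours=20),
}

for app, reason in missed_rtos(rto_by_app, measured_by_app).items():
    print(f"{app}: {reason}")
```

Treating an untested application as a miss is a deliberate choice in this sketch: an RTO that has never been exercised is an unknown, not a pass.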
What's needed to achieve recovery in hybrid environments
In order to support recovery of a hybrid environment, an enterprise needs to have in place:
- The right technologies for each platform and operating system at a secondary site.
- A well-documented disaster recovery playbook that contains all recovery processes.
- The right staff and expertise (a multi-discipline team skilled in VMware, Oracle, Windows, storage technologies and more) – trained and tested in running the playbook.
- Change management processes so that all changes in production configurations – which happen frequently in enterprises – make their way into the recovery environment.
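The change management requirement in the list above can be illustrated with a configuration drift check. This is a minimal Python sketch under the assumption that production and recovery configurations can each be exported as a flat dictionary of settings; the config_drift function and every key and value shown are hypothetical.

```python
def config_drift(production, recovery):
    """Return settings that differ, as {key: (production_value, recovery_value)}.

    A setting that is missing on one side is reported with None on that side.
    """
    drift = {}
    for key in sorted(production.keys() | recovery.keys()):
        prod_value = production.get(key)
        rec_value = recovery.get(key)
        if prod_value != rec_value:
            drift[key] = (prod_value, rec_value)
    return drift


# Hypothetical exported settings, for illustration only.
production_config = {
    "hypervisor": "VMware ESX",
    "middleware_patch": "p2",
    "database_name": "SHOP",
}
recovery_config = {
    "hypervisor": "VMware ESX",
    "middleware_patch": "p1",
}

for key, (prod_value, rec_value) in config_drift(production_config, recovery_config).items():
    print(f"{key}: production={prod_value!r} recovery={rec_value!r}")
```

Running a comparison like this after each production change, and treating any reported difference as an open item in the runbook, is one way to keep the recovery environment from falling behind.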