Tuesday, August 28, 2012

IT Continuity Framework

Develop a framework for IT continuity to support enterprise-wide business continuity management using a consistent process. The objective of the framework should be to assist in determining the required resilience of the infrastructure and to drive the development of disaster recovery and IT contingency plans. The framework should address the organisational structure for continuity management, covering the roles, tasks and responsibilities of internal and external service providers, their management and their customers, and the planning processes that create the rules and structures to document, test and execute the disaster recovery and IT contingency plans.
The plan should also address items such as the identification of critical resources, noting key dependencies, the monitoring and reporting of the availability of critical resources, alternative processing, and the principles of backup and recovery.

Value Drivers
    Continuous service across IT
    Consistent, documented IT continuity plans
    Governed services for business needs
    Achieved short- and long-range objectives supporting the organisation’s objectives

Risk Drivers
    Insufficient continuity practices
    IT continuity services not managed properly
    Increased dependency on key individuals

Control Practice

·       Assign responsibility for and establish an enterprise-wide business continuity management process. This process should include an IT continuity framework to ensure that a business impact analysis (BIA) is completed and that IT continuity plans support business strategy, a prioritised recovery strategy, necessary operational support based on these strategies and any compliance requirements.
·       Ensure that the continuity framework includes:
o   The conditions and responsibilities for activating and/or escalating the plan
o   Prioritised recovery strategy, including the necessary sequence of activities
o   Minimum recovery requirements to maintain adequate business operations and service levels with diminished resources
o   Emergency procedures
o   Fallback procedures
o   Temporary operational procedures
o   IT processing resumption procedures
o   Maintenance and test schedule
o   Awareness, education and training activities
o   Responsibilities of individuals
o   Regulatory requirements
o   Critical assets and resources and up-to-date personnel contact information needed to perform emergency, fallback and resumption procedures
o   Alternative processing facilities as determined within the plan
o   Alternative suppliers for critical resources
o   Chain of communications plan
o   Key resources identified
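
The "necessary sequence of activities" in a prioritised recovery strategy can be derived from the dependencies between resources. The following is a minimal sketch, assuming a hypothetical dependency map (the service names are illustrative and not taken from any real plan); it uses Python's standard graphlib module to produce an order in which every service is restored only after the services it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what must be restored first.
DEPENDENCIES = {
    "network": set(),
    "storage": {"network"},
    "database": {"storage"},
    "middleware": {"database"},
    "web": {"middleware"},
}


def recovery_sequence(dependencies):
    """Return an order in which every service follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, service in enumerate(recovery_sequence(DEPENDENCIES), start=1):
        print(f"{step}. restore {service}")
```

If the map contains a circular dependency, static_order() raises graphlib.CycleError, which is itself a useful finding to resolve before the plan is relied upon.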

·       Ensure that the IT continuity framework addresses:
o   Organisational structure for IT continuity management as a liaison to organisational continuity management
o   Roles, tasks and responsibilities defined by SLAs and/or contracts for internal and external service providers
o   Documentation standards and change management procedures for all IT continuity-related procedures and tests
o   Policies for conducting regular tests
o   The frequency and conditions (triggers) for updating the IT continuity plans
o   The results of the risk assessment process (PO9)


Without a business continuity program in place, even a minor disruption to systems, facilities or other key resources can potentially halt operations, impact customers or harm the financials of an organization.
Therefore, it’s essential for organizations to understand how an unplanned outage would impact their business and know the steps they need to take to respond effectively. This requires a holistic view not only of threats to availability but also threats to a business continuity program’s continued viability.

To provide a starting point for those new to the subject of business continuity, the following list offers seven essential components of a successful business continuity plan:

1. Conduct a business impact analysis
Carry out a thorough analysis of people, information, application and other resources to build an understanding of the consequences – financial and operational – of losing vital components. Take particular care to uncover interdependencies across the organization that are critical to remaining operational. This analysis will provide a solid foundation for establishing recovery priorities and timeframes in your plan, allowing you to make informed decisions on where and how much to invest in business continuity.
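
As a rough illustration of how BIA results can feed recovery priorities, the sketch below ranks business processes by how long an outage the business can tolerate and, within that, by estimated hourly loss. The field names, the ranking rule and any figures passed in are assumptions made for this example; a real analysis would also weigh operational and reputational impact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessImpact:
    name: str
    hourly_loss: float          # estimated financial loss per hour of outage
    max_tolerable_hours: float  # longest outage the business says it can absorb


def recovery_priorities(processes):
    """Order processes: shortest tolerable outage first, then highest hourly loss."""
    return sorted(processes, key=lambda p: (p.max_tolerable_hours, -p.hourly_loss))


def loss_at_limit(process):
    """Estimated loss if the outage lasts exactly as long as the tolerable limit."""
    return process.hourly_loss * process.max_tolerable_hours
```

Calling recovery_priorities() on the list of assessed processes gives a first-cut recovery order, and loss_at_limit() gives a figure to compare against the cost of protecting each process.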

2. Your business continuity plan needs to be a living document 
Creating a business continuity plan is an important step, but not the end state. It takes more than words tucked away in a plan to enable readiness. Business continuity preparedness means having a living program – which is continually validated, communicated, tested, updated and improved. It also means having an organization that is ‘situation ready’: with skills honed through training and supported by robust planning tools to respond to a significant business disruption. It is important to remember that your business continuity plan needs to keep pace with new workflows, business applications and computer systems.

3. Don’t plan in a vacuum
You need to involve all key stakeholders in the business continuity planning process, including IT, business leaders, human resources, corporate communications, and physical and information security managers. Be sure that in planning you coordinate with other business units in your organization to avoid potential conflicts, such as multiple business units depending on the same facility as a secondary site in response to an interruption.

4. Bridge the gap between business and IT
The success or failure of a business continuity program hinges on having the entire organization on the same page. In order to be successful, IT professionals need to involve their business unit colleagues, garner executive support, and keep all parties involved.

5. Not all employees will be available
Betting the future of your business on the assumption that all key employees will be available is not the best course of action. Facing a disruption may cause some employees to focus on their home lives, or encounter obstacles that prohibit them from performing their jobs normally. It is critical that your organization communicates the business continuity plan to all employees and ensures that everyone understands their roles and responsibilities, even if they are not part of the ‘recovery’ team.

6. Test your business continuity plan
Many companies think they have an effective business continuity plan in place. However, the true effectiveness of a plan can only be fully understood after it is tested. Even if your company cannot arrange a full-scale exercise, look for smaller ways to test portions of your plan. For example, arrange a test of your company’s call tree or review recent organizational changes and assign new responsibilities based on the current structure and available resources.

7. Consider the benefits of business continuity management software
BCM software helps organizations develop planning strategies that simplify processes and manage the entire lifecycle of their continuity program – regardless of the technologies used for business continuity. BCM software is a tool built to accommodate change and address the unique demands of information availability. It can also be scaled appropriately to fit companies of any size, regardless of the maturity of their business continuity program or budget.


Advice from SunGard Availability Services.

Virtualization technology has changed the landscape of IT and data centers, delivering substantial benefits not only to production environments but also to disaster recovery. However, while data centers are becoming increasingly virtualized, most IT operations are a mix of physical and virtual systems: a hybrid environment.
While newer applications may run exclusively on virtual workloads, there are still many mission-critical applications running on a combination of mainframes, Windows servers, Linux/Unix systems and virtual machines. And managing a recovery site requires enterprises to purchase a whole new set of costly application software licenses for the secondary location.
This reality has created an IT issue that is still flying ‘under the radar’ of many IT organizations: how best to protect and recover applications in hybrid environments, and do it in a way that works within business and cost constraints?
The three top challenges for enterprises looking to recover hybrid environments are addressing their needs to:
  • Recreate a multi-layer, multi-platform hybrid stack for each mission-critical application.
  • Recover mission-critical applications within the time requirements needed to avoid unacceptable consequences to the business (recovery time objective – RTO).
  • Avoid busting the IT budget on CAPEX for building a secondary site for recovery and OPEX for maintaining the site.
Why recovery in hybrid environments is so difficult
To better understand the complexity and difficulty in managing recovery in hybrid environments, let's examine a typical three-tier web application – for instance, an e-commerce application. The application may have a database layer that runs on two different systems – a Linux system running Oracle and a Microsoft Windows server running SQL Server. Next, the middleware – or business logic – of the application could be on a Win2K server running WebLogic, and its job is to aggregate data from the Oracle and SQL Server databases. Lastly, the application has a web layer on an ESX server running Apache.
Add into this scenario some of the hardware supporting the application. For example, the web and middleware tiers are stored on an EMC SAN device, with the Oracle database on a NetApp SAN device and the SQL server on a Dell storage device.
Here is what this enterprise faces: multiple storage platforms, multiple compute platforms, multiple operating systems, and a mix of physical and virtual environments. So when a disaster or outage hits, if the enterprise has not created the identical physical and virtual stacks in its recovery environment to accommodate all three layers, the recovery will fail.
If the enterprise has the wrong version of VMware's hypervisor running in the recovery environment, the recovery will fail. If it has the wrong hypervisor running in the recovery environment (say, Xen), the recovery will fail. If the enterprise only has the ability to recover the database layer by itself, or both the database and middleware layers without the web layer, the recovery will fail.
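
The failure conditions above (a missing tier, the wrong hypervisor, the wrong hypervisor version) can be checked mechanically before a disaster rather than discovered during one. The following is a minimal sketch, assuming a hypothetical inventory format in which each tier is described by a small dictionary of attributes; the tier names follow the e-commerce example, while the version number is invented for illustration.

```python
# Hypothetical inventory: tier name -> attributes the recovery site must match.
PRODUCTION = {
    "database-oracle": {"os": "Linux"},
    "database-sql": {"os": "Windows"},
    "middleware": {"os": "Windows 2000"},
    "web": {"hypervisor": "ESX", "hypervisor_version": "4.1"},  # version is made up
}


def recovery_gaps(production, recovery):
    """List the reasons the recovery site could not host the production stack."""
    gaps = []
    for tier, expected in production.items():
        actual = recovery.get(tier)
        if actual is None:
            gaps.append(f"{tier}: tier missing at recovery site")
            continue
        for attribute, value in expected.items():
            if actual.get(attribute) != value:
                gaps.append(
                    f"{tier}: {attribute} is {actual.get(attribute)!r}, expected {value!r}"
                )
    return gaps
```

An empty list means the recovery inventory matches production on every attribute listed; it says nothing about attributes that were never recorded, so the value of the check depends on how complete the inventory is.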
And now add in another level of complexity. The previous scenario is just one application. What if the organization has 50, 80 or even more than 100 applications to recover?
As enterprises examine the challenge of recovering a large number of important applications – all with aggressive recovery time objectives – the reason why recovery in hybrid environments is so difficult becomes very clear.
SunGard Availability Services recommends that organizations address the following set of questions when developing a recovery strategy for hybrid environments:
  • Is your production environment 100 percent virtualized, or do you run a hybrid environment with multiple platforms, operating systems, hypervisors and storage technologies?
  • Do you have a full understanding of your recovery environment? Is it compatible with your production environment in terms of platforms, operating systems, hypervisors, storage and application data? Do you understand all the interdependencies within your mission-critical applications?
  • Do you have the diverse skills and the automation technologies to be able to recover all of your applications in an application-consistent way and be able to meet the RTOs and recovery point objectives (RPOs) for all of your applications?
  • Have you created the processes and procedures to recover your hybrid environment? Have you tested your ability to meet your RTOs?
  • Is your disaster recovery runbook current? In particular, have all production configurations been captured in the recovery environment – addressing change management?
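
One way to answer the question about tested RTOs is to compare the recovery times measured in exercises against each application's target. The sketch below is illustrative only: the function name, the use of minutes as the unit and the three status labels are assumptions for this example.

```python
def rto_report(targets_minutes, measured_minutes):
    """Classify each application as 'met', 'missed' or 'untested' against its RTO."""
    report = {}
    for app, target in targets_minutes.items():
        measured = measured_minutes.get(app)
        if measured is None:
            report[app] = "untested"   # no recovery exercise on record
        elif measured <= target:
            report[app] = "met"
        else:
            report[app] = "missed"
    return report
```

Applications reported as "untested" deserve as much attention as those reported as "missed", since an RTO that has never been exercised is only an assumption.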
What's needed to achieve recovery in hybrid environments
In order to support recovery of a hybrid environment, an enterprise needs to have in place:
  • The right technologies for each platform and operating system at a secondary site.
  • A well-documented disaster recovery playbook that contains all recovery processes.
  • The right staff and expertise (a multi-discipline team skilled in VMware, Oracle, Windows, storage technologies and more) – trained and tested in running the playbook.
  • Change management processes in place so all changes in production configurations – which happen frequently in enterprises – make their way into the recovery environment.
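
To illustrate the change management point, the sketch below flags systems whose configuration at the recovery site no longer matches production. It assumes, hypothetically, that each system's configuration can be exported as a JSON-serialisable dictionary; the function names are invented for this example and do not refer to any particular product.

```python
import hashlib
import json


def fingerprint(config):
    """Stable SHA-256 digest of a configuration dictionary."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drifted_systems(production, recovery):
    """Names of systems whose recovery copy is missing or differs from production."""
    return sorted(
        name
        for name, config in production.items()
        if name not in recovery or fingerprint(recovery[name]) != fingerprint(config)
    )
```

Sorting the keys before hashing means two exports with the same settings in a different order produce the same fingerprint, so only real differences are reported. Run on a schedule, a check like this turns "have all production changes reached the recovery environment?" into a list that can be reviewed.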

Organizational Resilience…Try Framing It As A Strategic Roadmap!

In today’s ‘go fast, go hard, go global’ business transaction environment, management teams and boards are less inclined to dismiss or characterize materialized risks and business disruptions as merely being embarrassing events or short-lived inconveniences. 
In large part that’s because of recent events, à la BP, Massey Coal, Toyota, Wall Street, etc., coupled with a reconsideration of (and appreciation for) the speed with which negative events can occur or certain risks materialize and literally cascade throughout an enterprise to irreversibly infect and adversely affect what matters most to companies operating in ‘intangible asset’ dominated economies, i.e., their (a.) brand, reputation, image, and goodwill, and (b.) supply – value chain!
That, in my view, is sufficient rationale for management teams and boards to initiate organizational resilience planning, quite apart from conventional business continuity-contingency planning.  But, to further serve a company in 2010, 2011, 2012 and beyond, it’s useful for management teams and boards to conceive and frame their organizational resilience planning initiative in the context of a ‘strategic roadmap’ to achieve business goals and objectives absent impediments and/or interruptions.
The strategic roadmap would of course also include achieving a level of (enterprise) preparedness sufficient to effectively (1.) mitigate, counter, defend, and manage certain risks, and (2.) enable a company to recover from adverse events and/or materialized risks more speedily.
The heart of an organizational resilience program, in my view, lies in framing and conceptualizing a ‘strategic roadmap’ that includes effectively designed ‘infrastructures’ that are operationally resilient to the ever growing array of risks and vulnerabilities by being able to perform, at minimum, two broad but essential functions under duress, i.e., (1.) safeguard supply – value chains by ensuring an acceptable level of preparedness and functionality exists so a company may continue to produce and deliver goods, services, products, etc., for the duration of the adverse event, and (2.) enable a company to return to an acceptable level of operational normalcy as rapidly as possible.
Company infrastructures are increasingly complex and nuanced, however, and routinely consist of numerous disparate components and inter-dependencies, which, in most instances, extend well beyond a company’s conventional walls or perimeter by virtue of what seem to be ever evolving, dynamic chains of suppliers, distributors, partners, customers, and other stakeholders that presumably converge to achieve the company’s business goals and objectives.
Framing organizational resilience planning in a ‘strategic roadmap’ context also elevates management team and board ‘buy-in’ by bringing more clarity and insight to company exposures and vulnerabilities to certain risks, particularly within the supply – value chain.

Article Copied from: Business IP and Intangible Asset Blog http://kpstrat.com/blog