Tuesday, March 20, 2012

Business Continuity Management (BCM) Vs. Insurance

Insurance gives a business the comfort of knowing that, in the event of loss or damage due to an insured peril, it will be able to replace or repair material items. Additionally, business interruption insurance typically covers the shortfall in gross profits for a specified period following the incident.

However, no matter how effectively a business protects itself through insurance, there are always some risks that cannot be anticipated or insured against. For instance, insurance can never provide cost-effective security against the long-term or permanent loss of customers, market, quality, reputation and employee loyalty.

The only effective protection against serious disruption to your business is BCM. BCM can be most simply described as “understanding and controlling risk and being best able to recover your business, regardless of the causes of interruption”. Insurance companies recognize the mutual benefit to be gained from BCM:
  • BCM is seen by insurers as a means to improve the quality of the business they are underwriting, and insurers confirm that BCM helps organisations mitigate impact, recover faster and minimize losses.
  • BCM can be used to protect against losses incurred through traditionally non-insurable risks such as supplier insolvency or pandemic influenza.
  • BCM can be used to better understand the requirement for business interruption cover.

BCM plays a vital role in negotiations: most insurers will want to see clear evidence that the company seeking business interruption cover is managing its potential loss exposures effectively and taking the necessary mitigation steps. This is where the BC Plan can play a valuable role, demonstrating that the organization has implemented measures to limit possible risks. The underwriter will also expect to see evidence of processes in place to ensure that the business can return to full operation as quickly as possible after the event.
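To make the business interruption point concrete, here is a minimal Python sketch of how a gross-profit shortfall might split between the part falling inside a policy's indemnity period and the part falling after it. It is a simplified model, assuming monthly figures and a cover that simply pays the shortfall for the first `indemnity_months` months; real policy wordings define gross profit, the indemnity period and the claim calculation in their own terms. The function name and all figures are hypothetical.

```python
def split_gross_profit_shortfall(expected, actual, indemnity_months):
    """Split a monthly gross-profit shortfall into the part inside the
    indemnity period and the part after it (hypothetical, simplified model)."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must cover the same months")
    covered = 0.0
    uncovered = 0.0
    for month, (exp, act) in enumerate(zip(expected, actual)):
        # Months that meet or beat expectation contribute no shortfall.
        shortfall = max(exp - act, 0.0)
        if month < indemnity_months:
            covered += shortfall
        else:
            uncovered += shortfall
    return covered, uncovered


# Hypothetical figures: recovery takes six months, indemnity period is three.
covered, uncovered = split_gross_profit_shortfall(
    expected=[100] * 6,
    actual=[20, 40, 60, 80, 90, 100],
    indemnity_months=3,
)
print(covered, uncovered)  # 180.0 30.0
```

If the BCM analysis shows recovery taking longer than the indemnity period, the uncovered figure indicates where the cover, or the recovery plan itself, needs revisiting.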

Wednesday, March 7, 2012



Recovery times for both the business organization’s business continuity plan and the IT department’s disaster recovery plan need to be developed collaboratively by both parties for either plan to provide the proper protection. However, in my thirty-five years in the business continuity and resiliency field, I have found that in many situations they are not.
The reasons for this can be timing or a lack of knowledge of the overall business continuity and/or disaster recovery planning process coupled with a lack of understanding of each other’s real recovery timing needs.
The purpose of this article is to provide a framework in which the recovery time objectives (RTOs) for the business continuity plan and the disaster recovery plan can be developed together.

Reasons for inconsistencies and failures

Generally the drivers for business continuity and disaster recovery planning are considered to be one and the same, but this is not always the case. Often the design process for IT infrastructure itself requires the IT organization to develop disaster recovery thinking and plans early in the application and/or systems development process. So, early in the development of a new application or system, IT must have some understanding of the recovery time and recovery point that will be needed to support the technology being deployed. IT will try to obtain the RTO and RPO (recovery point objective) numbers, but the business is most often focused on ensuring that the new business process or function is rolled out on time and within budget. The business organization is not thinking about business continuity planning at this point. So IT takes it upon itself to develop a best guess of the required recovery times, either based on conversations with the business organization or on its own if the business cannot or will not commit to a number.
In other cases that I have seen, there is a clear lack of knowledge about business continuity and disaster recovery planning. Each organization knows that it needs either a business continuity or a disaster recovery plan, but neither is trained in the overall steps of developing such plans. As such, the business organization does not understand the risks, tradeoffs, and costs involved in developing a proper business continuity plan. The business organization also often does not understand that it needs to analyze the operation properly to understand the recovery requirements during the development phase of the systems/process development life cycle or, as ITIL defines it, the application life cycle (ALC). The business organization needs to quantify the impact of losing that process or system, yet it may not be sure of the right questions to ask, not only in terms of lost productivity but also in terms of the cost of processing manually if a system is lost or fails. Can the organization develop and use manual processes at all if the system or IT infrastructure fails? Does the organization have the human resources to perform the necessary manual processes, or will it need to bring in contingent workers, and if so, for how long and at what cost? Every business organization needs to clearly understand and articulate its operation’s maximum tolerable period of disruption (MTPD).
MTPD is the maximum time an activity or resource can be unavailable before irreparable harm is caused to the organization. This applies to both customer-facing and internal activities. Note that the recovery time objective specifies the time by which an organization intends to recover an activity or resource: the maximum tolerable period of disruption is the upper bound on this time.

The business needs to use the MTPD to develop its processes and contingency processes, and the IT organization needs to understand the MTPD to properly develop its technology and RTO, which in turn will enable the business to achieve its own RTOs.
At the same time, IT needs to utilize the recovery time numbers developed by the business organization as a basis for its system and infrastructure RTO values.
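The relationship described in the last few paragraphs can be expressed as a simple consistency check: the business RTO must not exceed the MTPD, and the IT RTO must not exceed the business RTO it is meant to support. The sketch below assumes all targets are expressed in hours for a single activity; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ActivityTargets:
    """Recovery targets for one business activity (hypothetical structure)."""
    name: str
    mtpd_hours: float          # maximum tolerable period of disruption, from the BIA
    business_rto_hours: float  # when the business intends the activity to be running again
    it_rto_hours: float        # when IT intends the supporting systems to be back


def check_targets(t: ActivityTargets) -> list[str]:
    """Return the inconsistencies found in one activity's targets (empty if none)."""
    problems = []
    # The MTPD is the upper bound on the business RTO.
    if t.business_rto_hours > t.mtpd_hours:
        problems.append(
            f"{t.name}: business RTO ({t.business_rto_hours}h) exceeds MTPD ({t.mtpd_hours}h)"
        )
    # IT's RTO is based on the business RTO, so systems must be back no later.
    if t.it_rto_hours > t.business_rto_hours:
        problems.append(
            f"{t.name}: IT RTO ({t.it_rto_hours}h) exceeds business RTO ({t.business_rto_hours}h)"
        )
    return problems


print(check_targets(ActivityTargets("Order entry", mtpd_hours=72, business_rto_hours=24, it_rto_hours=36)))
# ['Order entry: IT RTO (36h) exceeds business RTO (24h)']
```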
Standards and planning process
There are so many business continuity and disaster recovery standards to choose from, as well as other related standards of practice, that the sheer number may be part of the reason for the confusion. The fact that none of these standards really talks about integrating the business recovery and IT technology recovery plans into the overall process or application development life cycle complicates the matter even further.

There is also the issue that business continuity and/or disaster recovery planning classes are usually only electives in business administration or computer technology/information systems curricula. So we are not exactly preparing our next generation of business and technology leaders to properly understand the methods, or the importance, of contingency planning.
All that being said, most existing standards share a fairly consistent set of predefined steps that are reasonably successful. So if we take all of the contingency planning steps and align them with the ITIL ALC phases, the planning cycle will integrate system development with continuity planning at the best possible time in the development process.
Below I outline the steps in developing business continuity and disaster recovery plans, with their corresponding points in the ITIL application life cycle:
1) Understand the Organization
   a. Risk Assessment
   b. Business Impact Assessment
      i. Determine MTPD for operation
      ii. Develop RTO for Critical Systems
      iii. Develop RPO for Critical Systems
   ITIL ALC phase: Requirements – requirements gathered based on business needs of the organization
2) Evaluate and Determine Strategy
   a. BC strategy to meet RTO/RPO
   b. DR strategy to meet RTO/RPO
   ITIL ALC phase: Design – requirements translated into specifications
3) Develop Plans
   a. BCP – Business Organization
   b. DRP – IT Organization
   ITIL ALC phase: Build – application and operational model are made ready for deployment
4) Exercise Plan
   ITIL ALC phase: Operate – IT operates the application as part of the business service
5) Audit and Maintain Plan
   ITIL ALC phase: Optimize
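One way to keep this alignment visible during a project is to hold it as data, for example in a project checklist. The Python sketch below encodes the pairing exactly as listed and returns the BC/DR steps that belong to a given ITIL ALC phase or an earlier one; the names `PLAN_ALIGNMENT` and `steps_up_to` are hypothetical.

```python
# BC/DR planning steps paired with the ITIL ALC phase each is aligned with,
# in the order given in the list (hypothetical representation).
PLAN_ALIGNMENT = [
    ("Understand the Organization", "Requirements"),
    ("Evaluate and Determine Strategy", "Design"),
    ("Develop Plans", "Build"),
    ("Exercise Plan", "Operate"),
    ("Audit and Maintain Plan", "Optimize"),
]


def steps_up_to(phase: str) -> list[str]:
    """Return the BC/DR steps aligned with `phase` or any earlier ALC phase."""
    phases = [p for _, p in PLAN_ALIGNMENT]
    if phase not in phases:
        raise ValueError(f"unknown ALC phase: {phase}")
    cutoff = phases.index(phase)
    return [step for step, _ in PLAN_ALIGNMENT[: cutoff + 1]]


print(steps_up_to("Design"))
# ['Understand the Organization', 'Evaluate and Determine Strategy']
```

Asking for the Design phase, for instance, shows that the risk assessment, BIA and strategy work should already be in hand before specifications are finalized.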

Using the standards and good practices
During the requirements gathering phase of the ITIL ALC, the business owner should also have conducted the risk assessment and business impact analysis (BIA). The results of these two activities allow the business owner to see clearly the impact on the business of a failure or discontinuation of either or both of the business and IT operations. The owner can then translate that knowledge into quantifiable RTO and RPO numbers to be used in the next phase of business continuity and disaster recovery planning (Evaluate and Determine Strategy) and in the Design phase of the ITIL ALC.

The RTO and RPO numbers are used to develop alternative strategies that meet the recovery time and recovery point needs. A cost is developed for each alternative design. The cost is the total of the IT cost to design, implement, build and operate; the business cost of any workarounds or special handling during the outage period; and the cost of loading any transactions processed during the outage into the system (processing resynchronization) once the systems are back online and processing as they were before the incident.
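The total cost of an alternative, as described above, is a straightforward sum of its IT and business components. This minimal sketch assumes all components are expressed in the same currency over the same planning horizon; the class, its fields and the figures in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryAlternative:
    """Cost components of one recovery alternative (hypothetical structure)."""
    name: str
    it_design: float
    it_implement: float
    it_build: float
    it_operate: float
    business_workaround: float  # workarounds / special handling during the outage
    resynchronization: float    # loading outage-period transactions back into the system

    def total_cost(self) -> float:
        """IT cost to design, implement, build and operate, plus business costs."""
        it_cost = self.it_design + self.it_implement + self.it_build + self.it_operate
        return it_cost + self.business_workaround + self.resynchronization


warm = RecoveryAlternative("Warm standby", 50_000, 20_000, 30_000, 40_000, 15_000, 5_000)
print(warm.total_cost())  # 160000
```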

The alternative strategies are then evaluated with a cost and benefit analysis (time, reduced workaround complexity, etc.) of each alternative. The best option will return operations in a reasonable time at an acceptable cost to the business and IT. However, selecting the alternative requires input from both IT and the business to properly address the risk of outage. The business will need to ensure that it can perform the workarounds and still meet all of the business, regulatory and audit needs of the operation for the full period that, under the selected alternative, IT needs to restore the systems required to restart the application and its associated services.
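A heavily simplified version of the selection step can be sketched in Python. It assumes the benefit side of the analysis reduces to a single question, whether the alternative's recovery time fits within what the business can tolerate with its workarounds, and then picks the cheapest feasible option; a real analysis would also weigh factors such as workaround complexity. The names and figures are hypothetical.

```python
from typing import NamedTuple, Optional


class Alternative(NamedTuple):
    name: str
    rto_hours: float
    total_cost: float


def select_alternative(alternatives: list[Alternative],
                       tolerable_outage_hours: float) -> Optional[Alternative]:
    """Cheapest alternative whose recovery time the business can tolerate, or None."""
    feasible = [a for a in alternatives if a.rto_hours <= tolerable_outage_hours]
    if not feasible:
        return None  # no option is fast enough; revisit the strategies or the workarounds
    return min(feasible, key=lambda a: a.total_cost)


options = [
    Alternative("Hot site", 4, 400_000),
    Alternative("Warm standby", 24, 160_000),
    Alternative("Cold site", 72, 60_000),
]
print(select_alternative(options, tolerable_outage_hours=24).name)  # Warm standby
```

With a 24-hour tolerance the 4-hour hot site is feasible, but paying for it would buy speed the business does not need.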
For the plans to be effective and ‘fit for purpose’, it is very important that the business and IT are on the ‘same sheet of music’ as to recovery times and points. It is no good if the business has planned its resources and workarounds expecting a system recovery time of 24 hours only to find that the system will be down for 48 hours. On the other side of the coin, it is not fiscally responsible to pay to expedite an IT system’s recovery time to under four hours if the business can tolerate an outage of 24 hours or more, which would allow a much less costly final IT solution.
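Both failure modes in this example can be caught with the same comparison. The sketch assumes that the business's planned recovery time and IT's recovery time are each a single figure in hours, and uses a hypothetical `overspend_ratio` to decide when IT is recovering so much faster than needed that the cost deserves a second look; the 0.5 default is illustrative, not a recommended value.

```python
def compare_recovery_times(business_planned_hours: float,
                           it_recovery_hours: float,
                           overspend_ratio: float = 0.5) -> str:
    """Classify how IT's recovery time lines up with what the business planned for.

    overspend_ratio is a hypothetical threshold: an IT recovery time below
    business_planned_hours * overspend_ratio is flagged for a cost review.
    """
    if it_recovery_hours > business_planned_hours:
        return "gap"  # business workarounds run out before the systems return
    if it_recovery_hours < business_planned_hours * overspend_ratio:
        return "possible overspend"  # faster, costlier recovery than the business needs
    return "aligned"


print(compare_recovery_times(24, 48))  # gap
print(compare_recovery_times(24, 4))   # possible overspend
print(compare_recovery_times(24, 20))  # aligned
```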
Once it has been confirmed that both plans are consistent with each other, the actual plans can be developed. While the business prepares to implement the new application and/or service, IT readies the systems and infrastructure needed to meet the business’s implementation schedule.

Exercising the plans
There is one caveat, however. Even if both sides have planned together and developed their plans around a single, consistent recovery time, they still need to verify, by exercising the plans together, that the IT recovery timing (the disaster recovery plan, which includes hardware restoration, software restoration, database synchronization, etc.) actually comes in on time to meet the business’ needs as provided for in the business continuity plan.
Only by testing and timing the two recovery processes to ensure that they coincide can an organization truly be confident that the overall plans will succeed.
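Recording the timings from a joint exercise makes this comparison explicit. The sketch below assumes the DR steps named above ran one after another, so total recovery time is the sum of the measured durations (steps run in parallel would need the elapsed wall-clock time instead); the function name and timings are hypothetical.

```python
def exercise_result(measured_step_hours: dict[str, float],
                    business_rto_hours: float) -> tuple[bool, float]:
    """Return (met, total_hours) for one timed DR exercise.

    Assumes the recovery steps ran sequentially, so the elapsed
    recovery time is the sum of the measured step durations.
    """
    total = sum(measured_step_hours.values())
    return total <= business_rto_hours, total


timings = {
    "hardware restoration": 6,
    "software restoration": 8,
    "database synchronization": 12,
}
met, total = exercise_result(timings, business_rto_hours=24)
print(met, total)  # False 26
```

A result like this one, two hours over the business RTO, is exactly the kind of mismatch that only shows up when the plans are exercised together.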

Social media can transform enterprise business continuity management

Social media can hold the key to transforming enterprise business continuity management, especially crisis/incident management and communications practices, according to Gartner.
Gartner analysts predict that, by 2015, 75 percent of organizations with business continuity management systems will have public social media services in their crisis communications strategies; and BCM professionals are advised to immediately begin assessing social media's opportunities and risks.

"Enterprises simply cannot afford to ignore social media as a crisis communications tool," said Andrew Walls, research vice president at Gartner. "In many cases, social media may represent the only available means of locating and contacting personnel; providing stakeholders with the information and assistance they need; informing citizens, customers and partners of product/service availability; and taking other business-critical actions following a disruptive event."
However, Mr. Walls said that effective use of a new communications channel requires forward planning and practice. Attempting to leverage social media for the first time during a crisis can cause more harm than good. Instead, he said that organizations must develop comprehensive social media strategies and tactics for crisis/incident management and integrate social media with the enterprise's established business continuity management processes.
The use of social media for user input and knowledge sharing can create a conflict for organizations when the sites are used during a crisis by the workforce and by others who are involved in, or watching, the event as it unfolds.

"As the workforce develops personal, digital friendships that might take precedence over the official spokesperson of the organization, a conflict over who is the authority during an event can emerge, leading to unanticipated and negative results if official procedures are not followed," said Roberta Witty, research vice president at Gartner. "Such usage shouldn't turn into a battle for control, but organizations must protect their reputations and the effectiveness of their communications during stressful times. Therefore, putting forth a social media management strategy as part of a business continuity management program is essential to ensure that the organization's crisis communications effectiveness is protected, and that response and recovery plans and procedures are followed."

Social media is very different, technically and culturally, from the tightly controlled technologies and means of communication that enterprises are accustomed to using and supporting (such as corporate email systems). The use of social media for collection and distribution of information can create serious challenges for enterprises:
  • Maintaining an authoritative and credible information source;
  • Enlisting active, effective participation of staff and the public that are active in social media;
  • Collecting, filtering, analyzing and applying information gathered from social platforms.
"Organizations developing social media strategies and tactics for crisis/incident management must take these factors into account by establishing effective authorization processes, content guidelines, and monitoring and message retention capabilities," Ms. Witty said. "The bottom line is that no enterprise's business continuity management efforts can afford to ignore the opportunities and risks presented by social media. BCM and crisis management specialists should begin working now to integrate social media tools and practices into their BCM efforts."