
Saturday, June 23, 2012

Royal Bank of Scotland Group suffers major business continuity failure




British-based banks belonging to the Royal Bank of Scotland Group have experienced a major business continuity failure, with customers unable to access bank accounts for more than 24 hours. NatWest customers have been the most affected, with some RBS and Ulster Bank customers also impacted.
The systems failure was the result of a problem with a software application. This has now been resolved, but a huge backlog of information that requires processing means that customers are still unable to make transactions.
Artco Solutions will recommend the following…

Incident detection

Response BEFORE an incident occurs, upon detection of one or a series of related events that become incidents
  • Detecting incidents at the earliest opportunity minimizes impact to services, reduces recovery effort, and preserves quality of service
  • Investment in detection should be linked to the business continuity needs
  • Hardware failures - Malfunctions in racks, servers, storage arrays, tape devices
  • Network - Data connectivity interruptions, intrusion detection etc.
  • Software - Upgrade issues, unauthorized software, malware etc.
  • Data - Corrupted data sets, incomplete data sets etc.
  • Processes - System changes, maintenance etc.
  • Suppliers - Power failure, telecoms outage
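
As a rough illustration of the categories above, the following Python sketch sorts raw event descriptions into the six sources and flags a source once it shows a series of related events. The keyword table, the function names and the threshold of two events are all assumptions made for this example, not part of any real monitoring product.

```python
from collections import Counter
from enum import Enum


class Source(Enum):
    HARDWARE = "hardware"
    NETWORK = "network"
    SOFTWARE = "software"
    DATA = "data"
    PROCESSES = "processes"
    SUPPLIERS = "suppliers"


# Hypothetical keyword table built from the examples in the list above;
# a real deployment would derive this from its own monitoring taxonomy.
KEYWORDS = {
    Source.HARDWARE: ("rack", "server", "storage array", "tape"),
    Source.NETWORK: ("connectivity", "intrusion"),
    Source.SOFTWARE: ("upgrade", "unauthorized software", "malware"),
    Source.DATA: ("corrupted", "incomplete data"),
    Source.PROCESSES: ("system change", "maintenance"),
    Source.SUPPLIERS: ("power failure", "telecoms outage"),
}


def classify(event):
    """Return the first Source whose keyword appears in the event text, else None."""
    text = event.lower()
    for source, words in KEYWORDS.items():
        if any(word in text for word in words):
            return source
    return None


def related_incidents(events, threshold=2):
    """Return the sources with at least `threshold` classified events."""
    counts = Counter(s for s in map(classify, events) if s is not None)
    return {source for source, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    log = [
        "Software upgrade failed on batch scheduler",
        "Malware alert",
        "Telecoms outage at branch",
    ]
    print(related_incidents(log))
```

Simple substring matching is used only to keep the sketch short; it will misfile events whose wording overlaps several categories.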



Incident prevention

ICT Readiness promotes resilience
  • Facilitates identification of critical components in each of the elements which make up the ICT environment
  • Relates ICT criticality to wider business criticalities
  • Priorities also driven by BC requirements
  • Justifies resource and budget for appropriate resilience measures
  • Monitors the performance of resilience measures
  • Review and improvement following exercises, tests and incidents


Response
  • Confirm nature and extent of incident
  • Take control of situation
  • Contain the incident
  • Communicate with stakeholders
  • Confirm nature and extent of incident
    • Acquire information
    • Assess
    • How does it affect the elements of the ICT environment?
      • How might these affect service users and the critical activities of the organisation?


Take control of situation
  • Automatic or manual failover?
  • Determine priorities for mitigating incident
    • People
    • Facilities
    • Technology
    • Data
    • Processes
    • Suppliers
  • Determine resource requirements
  • Communicate
  • Contain the incident
    • Automatic or manual failover?
    • Direct resources to manage situation
  • Communicate
    • Communication is essential throughout the response process
    • Integration with overall BC incident management process
    • Is there concurrent activation of BC Incident Management?
    • Liaise with rest of organisation
    • Activate relevant contingency arrangements

Recovery
  • Technical recovery plans
    • In conjunction with organisational business continuity plans
    • Failover of immediately time-critical systems
    • Recovery of less time-sensitive systems
  • Manage recovery process
    • Over hours, days, weeks...
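
The split between immediate failover and later recovery could be sketched roughly as below. The sketch assumes each system carries a recovery time objective (RTO) in hours taken from the business continuity plan; the `System` class, the `recovery_plan` function, the four-hour cutoff and the example systems are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    rto_hours: float  # assumed recovery time objective from the BC plan


def recovery_plan(systems, failover_cutoff_hours=4.0):
    """Split systems into immediate failover and later recovery, most urgent first.

    The cutoff is a hypothetical value; each organisation would set its own.
    """
    ordered = sorted(systems, key=lambda s: s.rto_hours)
    failover = [s.name for s in ordered if s.rto_hours <= failover_cutoff_hours]
    later = [s.name for s in ordered if s.rto_hours > failover_cutoff_hours]
    return failover, later


if __name__ == "__main__":
    # Example systems and RTO values are invented for the sketch.
    estate = [
        System("reporting", 72),
        System("payments", 1),
        System("online banking", 4),
        System("archive", 168),
    ]
    failover, later = recovery_plan(estate)
    print("Fail over now:", failover)
    print("Recover later:", later)
```

Sorting by RTO keeps the ordering tied to business priorities rather than to whichever system happens to be easiest to restore.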

Learning
  • Audits/self-assessment
  • Feedback from periodic business impact analyses (BIAs) and risk assessments
  • Corrective action following incidents
  • Preventive action
