British-based banks belonging to the Royal Bank of Scotland Group have
experienced a major business continuity failure, with customers unable to
access their bank accounts for more than 24 hours. NatWest customers have been the
most affected, with some RBS and Ulster Bank customers also impacted.
The systems failure was the result of a problem with a software
application. This has now been resolved but a huge backlog of information that
requires processing means that customers are still unable to make transactions.
Artco Solutions recommends the following:
Incident detection
Respond before an incident occurs, upon detection of one event, or a series of related events, that may become an incident
- Detecting incidents at the earliest opportunity minimizes impact to services, reduces recovery effort, and preserves quality of service
- Investment in detection should be linked to the business continuity needs
- Hardware failures - Malfunctions in racks, servers, storage arrays, tape devices
- Network - Data connectivity interruptions, intrusion detection etc.
- Software - Upgrade issues, unauthorized software, malware etc.
- Data - Corrupted data sets, incomplete data sets etc.
- Processes - System changes, maintenance etc.
- Suppliers - Power failure, telecoms outage
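As an illustration of the detection idea above, the following is a minimal sketch in Python, not a production monitoring design. It tags each event with one of the six categories listed and raises an incident either when a single event is marked critical or when several related events (same category) arrive within a short time window. The class names, the `threshold` of 3 events and the 300-second window are all hypothetical values chosen for the example; real values would come from the business continuity requirements. The sketch also assumes events are observed in time order.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    # The six event sources listed above.
    HARDWARE = "hardware"
    NETWORK = "network"
    SOFTWARE = "software"
    DATA = "data"
    PROCESSES = "processes"
    SUPPLIERS = "suppliers"


@dataclass(frozen=True)
class Event:
    category: Category
    timestamp: float        # seconds; events are assumed to arrive in time order
    critical: bool = False  # True if this single event already counts as an incident


class IncidentDetector:
    """Flags an incident on one critical event or a burst of related events."""

    def __init__(self, threshold: int = 3, window_seconds: float = 300.0):
        # Hypothetical defaults: 3 related events within 5 minutes.
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._recent = {category: deque() for category in Category}

    def observe(self, event: Event) -> bool:
        """Record the event and return True if it should be treated as an incident."""
        recent = self._recent[event.category]
        recent.append(event.timestamp)
        # Forget events in this category that fall outside the time window.
        while recent and event.timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return event.critical or len(recent) >= self.threshold


if __name__ == "__main__":
    detector = IncidentDetector()
    for t in (0.0, 100.0, 200.0):
        is_incident = detector.observe(Event(Category.NETWORK, t))
        print(f"network event at t={t}: incident={is_incident}")
```

Grouping by category is only one way to decide that events are "related"; grouping by affected service or by supplier would follow the same pattern.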
Incident prevention
ICT Readiness promotes resilience
- Facilitates identification of critical components in each of the elements which make up the ICT environment
- Relates ICT criticality to wider business criticalities
- Sets priorities that are also driven by BC requirements
- Justifies resource and budget for appropriate resilience measures
- Monitors the performance of resilience measures
- Supports review and improvement following exercises, tests and incidents
Response
- Confirm nature and extent of incident
- Take control of situation
- Contain the incident
- Communicate with stakeholders
Confirm nature and extent of incident
- Acquire information
- Assess
- How does it affect the elements of the ICT environment?
  - How might these affect service-users and the critical activities of the organisation?
Take control of situation
- Automatic or manual failover?
- Determine priorities for mitigating incident
  - People
  - Facilities
  - Technology
  - Data
  - Processes
  - Suppliers
- Determine resource requirements
- Communicate
Contain the incident
- Automatic or manual failover?
- Direct resources to manage situation
Communicate
- Communication is essential throughout the response process
- Integrate with the overall BC incident management process
- Is there concurrent activation of BC Incident Management?
- Liaise with rest of organisation
- Activate relevant contingency arrangements
  - Technical recovery plans
    - In conjunction with organisational business continuity plans
    - Failover of immediately time-critical systems
  - Recovery of less time-sensitive systems
  - Manage recovery process
    - Over hours, days or weeks
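The split between failing over immediately time-critical systems and recovering less time-sensitive ones later can be sketched as a simple ordering exercise. The Python below is an illustrative sketch only: it assumes each system has been given a recovery time objective in hours (`rto_hours`, a common business continuity measure that the text above does not name), and the one-hour cut-off and the system names in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    rto_hours: float  # assumed recovery time objective, in hours


def plan_recovery(systems, failover_threshold_hours=1.0):
    """Split systems into immediate failover and later recovery, most urgent first.

    The threshold is a hypothetical cut-off: systems whose assumed RTO is at or
    below it are failed over immediately, the rest are recovered afterwards.
    """
    ordered = sorted(systems, key=lambda s: s.rto_hours)
    failover_now = [s.name for s in ordered if s.rto_hours <= failover_threshold_hours]
    recover_later = [s.name for s in ordered if s.rto_hours > failover_threshold_hours]
    return failover_now, recover_later


if __name__ == "__main__":
    # Hypothetical systems and objectives, for illustration only.
    systems = [
        System("reporting", 48.0),
        System("payments", 0.5),
        System("archive", 168.0),
        System("online-banking", 1.0),
    ]
    failover_now, recover_later = plan_recovery(systems)
    print("Fail over immediately:", failover_now)
    print("Recover later:", recover_later)
```

In practice the ordering would also need to respect dependencies between systems and the organisational business continuity plans, which this sketch leaves out.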
Learning
- Audits and self-assessment
- Feedback from periodic business impact analyses (BIAs) and risk assessments
- Corrective action following incidents
- Preventive action