Your Resilience Plan Needs a Degraded Mode
A practical article for community bank and credit union boards and senior leaders on governing the degraded operating mode after a technology disruption, including control bound...
The hardest technology day at a financial institution is not always the day something breaks.
Sometimes it is day two.
The system is half back. The vendor says progress is being made. Employees are using workarounds that were supposed to last two hours and are now stretching into a week. Customers and members can do some things, but not enough things. Leadership is getting updates. The board is getting reassurance. Everyone is exhausted. Nobody is fully sure which compromises are acceptable anymore.
That is the degraded mode. And most institutions do not govern it nearly well enough.
Business continuity plans usually focus on the event itself. Disaster recovery plans focus on restoration. Incident response plans focus on containment, communication, and escalation. All of that matters.
But the real governance test often shows up in the messy middle between failure and normal operations.
Can the institution keep serving customers and members safely while systems are only partly available? Who gets to approve manual workarounds? Which control steps can be delayed, and for how long? What volume limits should trigger a change in operating posture? When does leadership stop treating the situation like an inconvenience and start treating it like a strategic risk?
Those are not technical questions. They are operating model questions. Which means they belong in front of senior leadership and, at the right level, in front of the board.
Outages are events. Degraded mode is a management system.
Most executives think about resilience as a binary.
The system is up, or it is down.
Real life is uglier than that.
A customer portal may be online while identity checks are lagging. Loan processing may be available while document imaging is delayed. Card transactions may clear while dispute handling falls behind. The core may be stable while reconciliations pile up in spreadsheets and inboxes.
When that happens, the institution is not simply recovering. It is operating inside a temporary business model. Different staffing assumptions. Different risk tolerance. Different customer communication needs. Different evidence trails.
If nobody has defined that temporary model in advance, people improvise. Improvisation is sometimes necessary. It is also where inconsistency, hidden risk, and control drift love to live.
Boards do not need to write the playbook for every scenario. But they do need confidence that management has answered a few very practical questions:
- What degraded states have we actually planned for?
- What customer or member services are allowed to continue in each one?
- What controls must stay intact no matter what?
- What manual workarounds are time-limited, and who monitors the backlog they create?
- How will leadership know when the institution is accumulating too much operational debt to keep limping forward?
Without those answers, resilience turns into optimism with nicer formatting.
Example one: Change Healthcare showed how fast the ugly middle gets expensive
The 2024 Change Healthcare cyberattack happened in healthcare, not banking, but the governance lesson travels well. Providers across the country were forced into manual processes, delayed payments, slower approvals, and improvised workarounds while major systems remained impaired. The attack itself drew the headlines. The slower bleed that followed is the part bank and credit union leaders should pay attention to.
When a critical external platform is impaired, the institution does not just lose technology. It loses cadence. Decision velocity changes. Backlogs grow. Exceptions rise. Employees start creating informal ways to keep work moving. Customers feel the inconsistency before leadership has a clean dashboard for it.
That is why resilience governance should never stop at "do we have a vendor incident clause?" A better question is "what does our institution look like operationally on day three if a critical provider is partly unavailable?"
Example two: Patelco showed that member trust gets tested in the manual gap
Patelco Credit Union's 2024 ransomware disruption is closer to home for this audience. Public reporting described service interruptions that affected online banking and other member-facing functions for an extended period while the credit union worked through recovery.
During a prolonged disruption, members do not experience your architecture diagram. They experience uncertainty.
Can I see my money? Can I move it? Will my payment process? Who can answer me clearly? Are the frontline staff getting the same answer as the website?
For boards and senior leaders, that means resilience oversight should include customer and member experience thresholds, not just technical restoration steps. If manual servicing grows, if communication starts drifting by channel, or if exception handling becomes inconsistent, that is not just operations noise. That is governance signal.
Institutions love to say they are member-focused or customer-focused. Fine. Prove it in the degraded mode.
Example three: CrowdStrike reminded everyone that partial functionality still breaks businesses
The 2024 CrowdStrike incident was not a ransomware event and it was not bank-specific. But it demonstrated something executives should remember: a system can be "recoverable" on paper and still cripple operations in practice.
Flights were grounded. Hospitals, retailers, and financial firms dealt with device issues, manual intervention, and delayed normalization. In many organizations, the question was not "is there a backup?" It was "how many humans does it take to restore enough function to keep the business moving?"
Community financial institutions should take that lesson seriously because lean teams amplify the pressure. You do not need a full data center failure to end up in degraded mode. One endpoint dependency, authentication issue, vendor outage, telecom problem, or third-party software failure can be enough to shift the institution into an expensive manual posture.
That posture needs rules.
Five governance decisions to make before you need them
1. Name the degraded states
"Business disruption" is too vague.
Define a few realistic states in plain English. For example: member channels available but slow, core processing available with manual exceptions, payments operating with reconciliation delay, branch operations functioning with limited authentication confidence.
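For teams that keep this list in a shared register rather than a slide, the idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended system: the state names paraphrase the examples above, and the owners, services, and the `unplanned_states` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class DegradedState(Enum):
    # Names paraphrase the plain-English examples in the text.
    MEMBER_CHANNELS_SLOW = "member channels available but slow"
    CORE_MANUAL_EXCEPTIONS = "core processing available with manual exceptions"
    PAYMENTS_RECON_DELAY = "payments operating with reconciliation delay"
    BRANCH_LIMITED_AUTH = "branch operations with limited authentication confidence"


@dataclass(frozen=True)
class StateProfile:
    owner: str               # hypothetical accountable role
    services_allowed: tuple  # services that continue in this state
    services_paused: tuple   # services suspended until the state clears


# Hypothetical register entries, for illustration only.
REGISTER = {
    DegradedState.MEMBER_CHANNELS_SLOW: StateProfile(
        owner="Head of Digital Banking",
        services_allowed=("balance inquiry", "internal transfers"),
        services_paused=("new payee setup",),
    ),
    DegradedState.PAYMENTS_RECON_DELAY: StateProfile(
        owner="Chief Operating Officer",
        services_allowed=("card transactions",),
        services_paused=("wire limit increases",),
    ),
}


def unplanned_states(register):
    """Return the named states that have no profile in the register yet."""
    return [state for state in DegradedState if state not in register]


for state in unplanned_states(REGISTER):
    print("No plan on file for:", state.value)
```

The useful output is the gap list: any state leadership has named but nobody has planned for.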
2. Decide what cannot be waived
Every institution has controls that become tempting to soften under pressure. Dual control. Reconciliation timing. Callback discipline. Segregation of duties. Exception review. Identity verification thresholds.
Some of these can flex temporarily with documented approval. Others cannot be waived at all. Do not wait until people are tired and angry to figure out which is which.
3. Put time limits on workarounds
Manual processes have a way of becoming unofficial systems.
If staff are using spreadsheets, shared inboxes, handwritten logs, or side-channel approvals, someone should own how long that workaround is acceptable, how backlog gets measured, and what trigger forces reevaluation.
A workaround without an expiration date is just a future audit issue in business casual.
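One way to make the expiration date concrete is a simple workaround log that records an owner, a start time, a time limit, and a backlog ceiling. The sketch below is a hedged illustration only: the `Workaround` class, its field names, and the 48-hour and 300-item limits are assumptions chosen for the example, not figures from any standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Workaround:
    name: str
    owner: str                # who answers for this workaround
    started: datetime
    max_duration: timedelta   # how long it is acceptable
    backlog_items: int        # current measured backlog
    backlog_limit: int        # backlog size that forces a review

    def needs_reevaluation(self, now):
        """True once the time limit has passed or the backlog exceeds its limit."""
        expired = now >= self.started + self.max_duration
        over_backlog = self.backlog_items > self.backlog_limit
        return expired or over_backlog


# Hypothetical example: a spreadsheet standing in for reconciliation.
recon_sheet = Workaround(
    name="Manual reconciliation spreadsheet",
    owner="Controller",
    started=datetime(2024, 7, 1, 9, 0),
    max_duration=timedelta(hours=48),
    backlog_items=120,
    backlog_limit=300,
)

print(recon_sheet.needs_reevaluation(datetime(2024, 7, 2, 9, 0)))  # False: day one
print(recon_sheet.needs_reevaluation(datetime(2024, 7, 4, 9, 0)))  # True: past 48 hours
```

The point is not the tooling. It is that every workaround carries a named owner and a trigger that someone agreed to before the pressure started.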
4. Set customer and member communication thresholds
What level of disruption requires executive review of messaging? When do frontline scripts need to change? When does the board get informed? When do you stop using general reassurance and start using precise operating guidance?
That sequence should not be invented live.
5. Define the handoff from management issue to board issue
Not every outage belongs in the boardroom immediately. But prolonged degraded mode often does.
The board should know what triggers a higher level of oversight: customer harm risk, prolonged manual processing, control exceptions above threshold, financial exposure, regulator attention, or unresolved vendor dependence.
If the escalation line is fuzzy, leadership may wait too long because nobody wants to sound dramatic.
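A fuzzy escalation line gets sharper when the triggers are written down as explicit checks. The following is a minimal sketch, assuming hypothetical thresholds (72 hours, 10 exceptions, $250,000) and a hypothetical `DisruptionStatus` record; each institution would set its own values and may weigh the triggers differently.

```python
from dataclasses import dataclass


@dataclass
class DisruptionStatus:
    customer_harm_reported: bool
    hours_in_manual_processing: float
    open_control_exceptions: int
    estimated_exposure_usd: float
    regulator_inquiry: bool
    vendor_recovery_date_confirmed: bool


# Hypothetical thresholds, for illustration only.
MAX_MANUAL_HOURS = 72
MAX_CONTROL_EXCEPTIONS = 10
MAX_EXPOSURE_USD = 250_000


def board_triggers(status):
    """Return the labels of every board-escalation trigger that has tripped."""
    checks = [
        ("customer harm risk", status.customer_harm_reported),
        ("prolonged manual processing",
         status.hours_in_manual_processing > MAX_MANUAL_HOURS),
        ("control exceptions above threshold",
         status.open_control_exceptions > MAX_CONTROL_EXCEPTIONS),
        ("financial exposure",
         status.estimated_exposure_usd > MAX_EXPOSURE_USD),
        ("regulator attention", status.regulator_inquiry),
        ("unresolved vendor dependence",
         not status.vendor_recovery_date_confirmed),
    ]
    return [label for label, tripped in checks if tripped]


# Hypothetical day-four snapshot.
day_four = DisruptionStatus(
    customer_harm_reported=False,
    hours_in_manual_processing=80,
    open_control_exceptions=4,
    estimated_exposure_usd=50_000,
    regulator_inquiry=False,
    vendor_recovery_date_confirmed=False,
)

print(board_triggers(day_four))
```

A non-empty list does not replace judgment. It removes the excuse that nobody knew the line had been crossed.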
Good resilience governance reduces improvisation, not judgment
Institutions get into trouble when they confuse flexibility with vagueness.
The goal of degraded mode governance is not rigidity. It is shared judgment under pressure, and it gives executives, operators, and directors a clearer answer to a simple but very expensive question:
How do we run safely when normal is unavailable, but stopping entirely is not an option?
That is the question more boards should be asking.
Because the outage may get the headlines. The degraded mode is where institutions quietly win or lose trust.
Discussion questions
1. What would your institution's most likely degraded operating mode look like, in plain English?
2. Which manual workaround in a disruption would create the most hidden risk if it lasted a week?
3. At what point would your board expect a prolonged technology disruption to become a governance issue rather than just a management update?
Sources
- American Hospital Association, public statements and updates on the Change Healthcare cyberattack, 2024
- Reuters, coverage of Patelco Credit Union service disruption and recovery, July 2024
- Reuters, coverage of the global CrowdStrike outage impact, July 2024
- FFIEC, "Business Continuity Management" booklet, November 2019
- NIST, "Cybersecurity Framework 2.0," February 2024