
1. Operational Resilience: Building Peace of Mind in an Unstable Environment

In recent years, the word resilience has appeared in nearly every industrial conversation. It shows up in strategic plans, corporate presentations, and executive speeches. Sometimes I get the sense it is used as an abstract, almost aspirational concept. But on the plant floor, resilience is much more concrete and far less sophisticated: it’s the ability of the factory to keep running when reality takes a turn.

And today, reality takes a turn more often than we would like to admit.

From my experience supporting industrial companies through transformation processes, one clear observation emerges: disruptions are no longer exceptions. Supply issues, critical breakdowns, sudden demand changes, digital incidents, or cyberattacks are no longer “black swans.” They are part of the operational context. That is why resilience cannot be treated as a one-off project or a technological layer added at the end. It is a way of managing.

Operational resilience is not about preventing failures; that is utopian. It’s about detecting issues early, responding better, and recovering faster. And to achieve that, the starting point is not technology but deep knowledge of the processes and of the business’s truly critical points.

2. Key KPIs

2.1. Visibility: Mean Time Between Failures

1. The first pillar is visibility. Too many plants still operate with fragmented, delayed, or unreliable information. When you don’t know what’s happening in real time, any deviation becomes a surprise—and in industry, surprises usually arrive at the worst possible moment. Digitalization does not mean filling the plant with sensors or screens; it means reducing the time between something starting to go wrong and someone seeing it and acting. Whether we call it a KPI or not, this interval is one of the best indicators of resilience. The shorter it is, the more control the organization has over its operations.
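
The interval between a deviation starting and someone acting on it can be tracked with very little machinery. The Python sketch below is illustrative only: the event records, their field names (`onset`, `acted`) and the timestamps are hypothetical, and it assumes both moments are already being logged somewhere, for example by an alarm system or a process historian.

```python
from datetime import datetime
from statistics import mean

# Hypothetical event log: when a deviation began (e.g. first out-of-range
# reading) and when someone acknowledged it and acted.
events = [
    {"onset": datetime(2026, 2, 2, 8, 0), "acted": datetime(2026, 2, 2, 8, 25)},
    {"onset": datetime(2026, 2, 3, 14, 10), "acted": datetime(2026, 2, 3, 14, 20)},
    {"onset": datetime(2026, 2, 5, 22, 30), "acted": datetime(2026, 2, 5, 23, 45)},
]


def detection_latencies_min(records):
    """Return each event's onset-to-action interval, in minutes."""
    return [(r["acted"] - r["onset"]).total_seconds() / 60 for r in records]


latencies = detection_latencies_min(events)
print(f"Mean onset-to-action time: {mean(latencies):.1f} min")  # 36.7 min
print(f"Worst case: {max(latencies):.1f} min")  # 75.0 min
```

The worst case is reported next to the mean because an average can hide a single slow detection.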

MTBF (Mean Time Between Failures): Measures the reliability of critical assets. A higher MTBF means greater operational stability.
Operational Availability (%): Reflects the real impact of interruptions on production.
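
As a rough sketch, both indicators can be computed from a simple downtime log. The figures below are invented, and the definitions used (MTBF as operating time divided by the number of failures; operational availability as operating time over planned production time) are common conventions assumed here, not necessarily the exact formulas any given plant applies.

```python
# Hypothetical one-month record for a single critical asset, in hours.
planned_hours = 720.0              # 30 days x 24 h of planned production
downtime_hours = [4.0, 2.5, 5.5]   # duration of each unplanned stop
failures = len(downtime_hours)

# Time the asset was actually running during the planned period.
uptime_hours = planned_hours - sum(downtime_hours)


def mtbf(uptime, n_failures):
    """Mean Time Between Failures: operating time per failure."""
    # With no failures in the window, MTBF has no finite value.
    return uptime / n_failures if n_failures else float("inf")


def operational_availability_pct(uptime, planned):
    """Share of planned production time the asset was running, in %."""
    return 100.0 * uptime / planned


print(f"MTBF: {mtbf(uptime_hours, failures):.1f} h")  # 236.0 h
print(f"Availability: {operational_availability_pct(uptime_hours, planned_hours):.2f} %")  # 98.33 %
```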

2.2. Maintenance: Mean Time To Repair

2. Maintenance is another clear mirror of a plant’s resilience. Many organizations talk about preventive maintenance, but in practice, they still operate reactively. When failures repeat or repairs take too long, it’s rarely bad luck. It’s information that’s not being interpreted correctly. In many projects, I’ve seen how moving from “fix when it breaks” to proactive maintenance changes more than costs: it changes the culture. Teams stop chasing breakdowns and start managing business continuity with judgment, backed by data and experience.

MTTR (Mean Time To Repair): Evaluates how quickly a system recovers after a failure. A key KPI of responsiveness.
Predictive vs. Reactive Maintenance Ratio: Measures the degree of anticipation. The greater the weight of predictive maintenance, the higher the resilience.
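
A minimal Python sketch of both maintenance indicators follows. The repair durations and work-order counts are hypothetical, and the ratio is assumed here to be a simple count of predictive work orders per reactive one; a plant could equally weight it by hours or by cost.

```python
from statistics import mean

# Hypothetical repair durations (hours) for each unplanned failure in a period.
repair_hours = [3.0, 5.0, 1.5, 2.5]

# Hypothetical maintenance work-order counts for the same period.
work_orders = {"predictive": 18, "preventive": 30, "reactive": 12}


def mttr(durations):
    """Mean Time To Repair: average time to restore operation after a failure."""
    return mean(durations)


def predictive_reactive_ratio(counts):
    """Predictive work orders per reactive one (assumed definition)."""
    if counts["reactive"] == 0:
        return float("inf")  # no reactive work at all in the period
    return counts["predictive"] / counts["reactive"]


print(f"MTTR: {mttr(repair_hours):.2f} h")  # 3.00 h
print(f"Predictive/reactive: {predictive_reactive_ratio(work_orders):.2f}")  # 1.50
```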

2.3. Supply Chain: Supply Chain Resilience Buffer

3. Resilience is not built solely within the factory walls. The supply chain is an inseparable part of the system. One of the most uncomfortable yet necessary questions any organization should ask is: What risks are we taking on to gain efficiency? Minimizing inventory or planning with tight margins may look brilliant in Excel, but in an unstable environment it’s a risky bet. Measuring the tension between efficiency and robustness makes it possible to choose between them consciously. Often, a little redundancy is not inefficiency but operational prudence.

This tension can be measured as the ratio between the days of inventory available and the actual recovery time, also in days, of critical suppliers. A value below one indicates that the organization is prioritizing efficiency over resilience, implicitly accepting the risk of downtime during any disruption.
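
The buffer ratio described above reduces to one division per critical supplier: inventory days divided by supplier recovery days. In this sketch the supplier names and day counts are made up, and `resilience_buffer` is a hypothetical helper, not part of any library.

```python
# Hypothetical critical suppliers: days of stock on hand versus the days each
# supplier would need to resume deliveries after a disruption.
suppliers = {
    "bearings": {"inventory_days": 20, "recovery_days": 14},
    "control_boards": {"inventory_days": 10, "recovery_days": 45},
    "packaging": {"inventory_days": 7, "recovery_days": 7},
}


def resilience_buffer(inventory_days, recovery_days):
    """Inventory coverage divided by supplier recovery time (both in days)."""
    return inventory_days / recovery_days


buffers = {
    name: resilience_buffer(d["inventory_days"], d["recovery_days"])
    for name, d in suppliers.items()
}

for name, ratio in buffers.items():
    # Below one: stock runs out before the supplier recovers.
    status = "at risk" if ratio < 1 else "covered"
    print(f"{name:<15} {ratio:5.2f}  {status}")
```

A ratio of exactly one, as for the packaging line here, leaves no margin at all, so a plant might reasonably set its own alert threshold somewhat above one; that threshold is a judgment call rather than part of the metric.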

2.4. Industrial Cybersecurity: Operational Recovery Time after Cyber Incident

4. In recent years, another factor once considered external to the plant has emerged: industrial cybersecurity. In an environment where production, planning, and management systems are connected, a digital incident is effectively a massive breakdown. Digital resilience is no longer just about protecting data; it’s about protecting the plant’s physical availability. The ability to detect an intrusion, isolate it, and recover operations without compromising production is now a fundamental pillar of operational continuity.

Operational Recovery Time after Cyber Incident (TRO-C): Average time for the plant to return to normal operations after a cybersecurity incident.
Interpretation:

Low TRO-C → high digital resilience
High TRO-C → fragile cybersecurity for operations
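
Because TRO-C is an average over incidents, it only needs the moment operations were lost and the moment they were fully restored. The incident timestamps and the target below are hypothetical, and what counts as "restored" (for example, production back at its usual rate) is an assumption each plant would have to define for itself.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (operations lost, operations fully restored).
incidents = [
    (datetime(2025, 3, 4, 6, 0), datetime(2025, 3, 4, 18, 0)),
    (datetime(2025, 9, 12, 2, 30), datetime(2025, 9, 13, 14, 30)),
]

TARGET_HOURS = 12.0  # hypothetical internal target, not an industry benchmark


def tro_c_hours(log):
    """Average hours to return to normal operations after a cyber incident."""
    total = sum((restored - lost for lost, restored in log), timedelta())
    return total.total_seconds() / 3600 / len(log)


value = tro_c_hours(incidents)
print(f"TRO-C: {value:.1f} h")  # 24.0 h
print("within target" if value <= TARGET_HOURS else "above target")  # above target
```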

3. Conclusion

After supporting many companies on this journey, I keep reaching the same conclusion: the most resilient plants are not necessarily the most technologically advanced. They are the ones that know themselves best. They understand their processes, identify their critical points, and use data not to decorate reports but to empower decision-makers under pressure.

Operational resilience is not improvised on the day of the problem. It is built beforehand, with knowledge, judgment, and an honest view of how the organization truly operates. In industry, it’s not about never failing. It’s about having peace of mind knowing that when failures happen, we are ready to respond.

Josep Maria Riera

9 February 2026
