Seven Challenges to Overcome to Develop Electrical System Reliability

BY Chip Angus


There are very few organizations that are not concerned with following reliability best practices. It is more efficient and less expensive to prevent a breakdown than it is to fix something that breaks. Monitoring and data management are becoming ubiquitous in the industrial world, the utility world, and across most industries and sectors as we work towards a more reliable and interconnected future. Electric power systems, however, have not been included in this shift away from break/fix and towards data-centric reliability.

In our experience as a leading transformer management company, we’ve found that even within organizations that employ healthy reliability programs for other equipment, there are still barriers to developing a reliability program for the electrical system that powers them: organizational barriers, which arise when there is not enough involvement across different departments within an organization; a tendency to prefer “the way we have always done it” over a new approach to electric power system reliability; fear of a loss of jobs as technology becomes more vital to monitoring and analysis; fear of a loss of control from the plant level to the corporate level; distrust of electric power reliability as a “flavor of the month” idea dictated from the corporate level; unclear expectations; and unrealistic timelines that do not factor in the complexities of bringing an electric power system back up to where it needs to be.

This article provides context for those barriers and offers some solutions that can help an organization overcome them. These solutions can help an organization pursue a reliability program that includes the asset that everything else in the operation relies on: the electric power system.


Last year we conducted a focus group composed of reliability practitioners to discuss electric power reliability (Figure 1). Our company works mainly with transformer reliability, but this group was concerned with the reliability of entire electrical systems in the industrial market, including transformers, cabling, breakers, and relays and protective equipment. One of the questions we asked them was whether they had a high-voltage electric power reliability program. Few did. Only 8.3 percent said that they were well on their way to developing an electric power system reliability program, and 6.7 percent said they were just beginning to develop one. One quarter told us that they knew they should have a program but were not currently addressing it. Most significantly, half of the people in the group said that they had no program whatsoever. Some weren’t even aware of the risk.

Every person in the focus group did have one thing in common, however: they have programs for every other asset class in their organization. They had programs for the servers in their data centers; the steel mill people had programs for their furnace trucks, furnaces, auxiliaries, LMFs and EFS; every industry represented had a CMMS that covered their other critical assets. Most had nothing for their electric power system.

We all know that it is a lot cheaper to prevent a breakdown than it is to repair one. But that kind of prevention doesn't tend to happen on electrical systems. Asset owners often don't realize that the cost of a breakdown in an electrical system is not the cost of the asset alone. What does that asset power? It could be an entire plant. We've seen downtime and replacement costs in the range of $100,000 to $50 million for the breakdown of a $400,000 transformer. It's not the cost of the asset that's the issue; it’s the cost of downtime while you wait for the replacement to be installed.
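To make that arithmetic concrete, here is a minimal Python sketch of how downtime can dwarf the price of the asset. Only the $400,000 transformer price comes from the figures above; the two-week outage and the hourly production loss are hypothetical values chosen for illustration, and `total_failure_cost` is a name invented for this example.

```python
def total_failure_cost(replacement_cost, downtime_hours, loss_per_hour):
    """Return asset replacement cost plus production lost during the outage."""
    return replacement_cost + downtime_hours * loss_per_hour


# The asset price is the figure quoted in the article.
transformer_cost = 400_000
# Hypothetical assumptions, not data from the article:
downtime_hours = 14 * 24   # two weeks waiting for a replacement
loss_per_hour = 25_000     # lost production per hour of downtime

cost = total_failure_cost(transformer_cost, downtime_hours, loss_per_hour)
downtime_share = (cost - transformer_cost) / cost

print(f"Total cost of failure: ${cost:,.0f}")
print(f"Share of cost due to downtime: {downtime_share:.0%}")
```

Under these assumed inputs, nearly all of the loss comes from downtime rather than from the transformer itself, which is the point of the paragraph above. Substitute your own outage duration and hourly loss to see where your operation falls.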

There is not an organization that I know of that is not trying to implement reliability best practices. A lot of the work I do is to extend that understanding to the electric power system. The first step towards this goal is to categorize equipment by criticality and evaluate the real risk of electric power system failure.

Figure 1. The real risk of electrical system failure is not known to many reliability leaders


By defining a fleet of transformers by their criticality and risk, as well as their history (Figure 2), it is possible to prioritize the urgency with which to perform maintenance. We can develop a criticality ranking by asset class and sort the fleet into four distinct groups: non-critical, auxiliary, system-critical, and mission-critical.

     Non-critical transformers

Non-critical transformers can be run-to-failure. Downtime is an inconvenience and an unplanned expense, but failure of these assets does not have a significant knock-on effect to the rest of the operation. Testing is all that is required so that you are aware of the condition of your equipment and you can plan accordingly.

     Auxiliary transformers

Auxiliary transformers power equipment and structures which are important to the operation of your business, but an unexpected loss would not cause significant disruption of a line or the shuttering of a facility. These transformers require regular testing and interpretation of test data, regular field inspection, and transformer management training for key staff. For auxiliary transformers, reactive maintenance is acceptable.

     System-critical transformers

System-critical transformers power production lines and hospital wards: places where a loss of power is unacceptable and creates substantial losses in revenue or poses a real danger to life and limb. These transformers require regular testing and interpretation of test data, regular field inspection, electrical testing, and advanced transformer management training for key staff. Predictive maintenance is required for system-critical transformers. This equipment requires single-gas real-time monitoring to detect incipient faults and allow for an immediate response to prevent downtime.

     Mission-critical transformers

Mission-critical transformers are pieces of equipment so vital to the overall operation of a facility that failure could be catastrophic to the entire operation of a business or organization. These transformers require regular testing and interpretation of test data, regular field inspection, electrical testing, advanced bushing and PD monitoring, engineer evaluation and review, and advanced transformer management training for key staff that covers the lifecycle of the equipment, from installation to decommissioning. Preventative maintenance is required for mission-critical transformers. This equipment requires multi-gas real-time monitoring to detect incipient faults and to provide immediate data on what is causing the fault.

Figure 2. Best practice for transformer reliability
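The four tiers described above can be captured as a simple lookup table. The Python sketch below is one possible encoding, not a standard or a tool we ship: the class and field names (`Criticality`, `TierPlan`, `plan_for`) are invented for illustration, and the contents of each tier are taken from the descriptions in this section. Where the text names no real-time monitoring for a tier, the sketch records `None`.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Tuple


class Criticality(IntEnum):
    NON_CRITICAL = 1
    AUXILIARY = 2
    SYSTEM_CRITICAL = 3
    MISSION_CRITICAL = 4


@dataclass(frozen=True)
class TierPlan:
    maintenance: str               # maintenance approach named in the text
    monitoring: Optional[str]      # real-time gas monitoring, if any is named
    activities: Tuple[str, ...]    # required activities listed in the text


# Activities shared by every tier above non-critical.
BASE = (
    "regular testing and interpretation of test data",
    "regular field inspection",
)

PLANS = {
    Criticality.NON_CRITICAL: TierPlan("run-to-failure", None, ("testing",)),
    Criticality.AUXILIARY: TierPlan(
        "reactive", None, BASE + ("transformer management training",)
    ),
    Criticality.SYSTEM_CRITICAL: TierPlan(
        "predictive",
        "single-gas real-time",
        BASE + ("electrical testing", "advanced transformer management training"),
    ),
    Criticality.MISSION_CRITICAL: TierPlan(
        "preventative",
        "multi-gas real-time",
        BASE + (
            "electrical testing",
            "advanced bushing and PD monitoring",
            "engineer evaluation and review",
            "advanced lifecycle transformer management training",
        ),
    ),
}


def plan_for(tier: Criticality) -> TierPlan:
    """Look up the maintenance plan for a criticality tier."""
    return PLANS[tier]


for tier in Criticality:
    plan = plan_for(tier)
    print(f"{tier.name}: {plan.maintenance}, monitoring={plan.monitoring}")
```

Using an ordered enum means tiers can be compared directly (for example, `tier >= Criticality.SYSTEM_CRITICAL`) when deciding which units need real-time monitoring.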

Organizational barriers

Your organization is set up to run very efficiently. Things don’t get done unless there is a standardized process and everyone in the system not only knows their role, but also excels at it. Organizations are structured in this way because, for productive operational processes, it makes sense to be as effective and as streamlined as possible. What organizations are not set up for, however, is cross-departmental and cross-hierarchical sea-changes in culture. Every challenge I’ve laid out in this list is about communication, and each can be met, challenged, and overcome if we get everyone on the same page.

Anyone who is not part of the solution could potentially be a barrier. In Figure 3 I show everyone who can influence the process of culture change in terms of electric power reliability. I’ll focus here on the challenges with Procurement versus Purchasing, why it’s important to involve I.T. from the beginning, and the benefit of using Risk Management professionals to lend expert credence to the concept of electric power system reliability. In my experience, these areas are where many of the major barriers arise.

Figure 3. Organizational barriers to overcome to develop electric power system reliability

     Procurement vs. Purchasing

One of the main challenges is the difference in the way purchasing and procurement people approach solutions. Purchasing people tend to be interested in one thing: lowest price. Procurement people, however, are interested in acquiring the best solution for the problem. We need to ensure that the procurement people are part of the reliability team, because their influence will impact the plant management and E&I people more than anything. I've had a lot of success bringing procurement people in and involving both procurement and purchasing in the process from the outset.

In one meeting we had the procurement and purchasing people together in the same room, and we were discussing how to approach securing $1.2 million for the maintenance required to bring the electric power system up to speed.

“Could you get that approved in one year?” I asked, and the procurement person looked to the purchasing person… who immediately said “no.”

“How about if you did that over three or four years, at around $300,000 or $400,000 annually?” That response came back almost as quickly as the first: “Absolutely.”

I like to work with the procurement person from the very beginning and include purchasing in the discussion. Procurement is looking for the best solution, not the cheapest price, and not a solution that just meets specs. Involve both teams from the outset when you can.
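The phasing in that conversation is simple arithmetic, sketched below in Python. The $1.2 million total and the three- and four-year splits come from the anecdote above; the function names and the idea of a fixed annual approval ceiling are assumptions made for illustration, since real approval limits vary by organization.

```python
def phase_budget(total, years):
    """Split a total into whole-dollar annual amounts that sum to the total."""
    base, remainder = divmod(total, years)
    # Spread any leftover dollars across the earliest years.
    return [base + (1 if i < remainder else 0) for i in range(years)]


def years_needed(total, annual_ceiling):
    """Smallest number of years that keeps each year at or under the ceiling."""
    return -(-total // annual_ceiling)  # integer ceiling division


total = 1_200_000  # figure from the meeting described above

print(phase_budget(total, 3))  # three equal annual amounts
print(phase_budget(total, 4))  # four equal annual amounts

# Hypothetical approval ceiling of $400,000 per year.
print(years_needed(total, 400_000))
```

Asking purchasing what annual amount they can approve, then working backward to the number of years, turns a flat “no” into a plan.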

     Information Technology

The minute that you start talking about monitoring, I.T. must be involved. Part of their responsibility in your operation involves cybersecurity, so there’s a natural tendency for more cautious professionals to say that monitoring can’t be done securely. But it can be done, and it is being done all across the industrial world. In fact, it must be done to effectively protect your electrical assets and prevent unexpected failure. Convincing I.T. of this isn’t difficult once you start the conversation, but you have to involve them in the process from the beginning; you can't just spring it on them at the end. With communication and involvement, they become part of the solution rather than a challenge to overcome.

     Risk Management

At some point, risk management will get involved. Whether that's the insurance expert who walks through the plant as a consultant or your own internal risk management expert, they will need to be brought into the project. Risk management professionals are an excellent resource when you're trying to bring about culture change because they understand the real risk of not investing in reliability and can deliver that message right to the C-suite.


One of the biggest barriers we face is not with the data or logic of the concepts we lay out, but with mindsets. People often prefer the tried-and-tested way of doing things. We affectionately call this resistance to change “TTWWHADI,” or “That’s The Way We Have Always Done It.” Plants are run by experts, and maintenance teams have followed processes that have kept production at peak efficiency for years. Communication here is once again essential. They haven’t been doing it wrong; the success and strength of your operation is testament to a job consistently well done. But with critical electric power systems, there are ways to protect that efficiency that go well beyond the current mode of thinking. Changing hearts and minds might be the most challenging barrier to overcome. But it’s an essential one.

Fear of loss of jobs

When new processes challenge TTWWHADI—the standard way of doing things—the first assumption for many at the plant level is that technology will replace people. Far from it. We’re in the business of maintaining production and profitability, not removing expertise. The most important aspect of any reliability program is wisdom, and we need to communicate that from the beginning.

Fear of loss of control from plant to corporate

Plant managers own their P&L, and I’ve experienced a reluctance from plant managers to allow corporate decision-making to impinge on that metric. This reluctance is almost always assuaged when we can show plant managers that a transformer reliability program is in their best interests. A reliable electric power system means that a plant can continue to operate at peak efficiency and maintain its profitability.

Again, the importance here is in communicating the value of long-term system reliability over quarterly figures. This is a delicate balancing act that requires involvement from all levels during reliability discussions, and a focus on collaboration rather than working in silos. For us, silos are a constant fight. The only way to overcome them is by communicating.

Flavor of the month

When an organization decides to make significant process changes, such as implementing a transformer reliability program, it’s important to emphasize across the entire organization that those changes represent more than just a novel way of doing things; they represent an organization-wide culture change. Buy-in is essential, from the boots on the ground at the plant level to those in the C-suite. If reliability is seen as another “flavor of the month” from corporate, rather than as real culture change, there will always be friction and resistance. Get everyone involved from the beginning and make the expectations clear to all.

Unclear expectations

If we know that silos stifle the implementation of electric power reliability programs, then it stands to reason that they will also create confusion if expectations are not communicated thoroughly across the organization. In an organizational structure that is scaffolded around efficient processes, it’s imperative to make sure that everyone knows their role. The concept of culture change begins with laying out expectations at every level. Who owns the asset, and what is its current state at the start of this analysis? Who is committed to supporting the effort? How are they committed to it? Are you going to have monthly meetings? There must be a team with representatives from every part of the organization to ensure expectations are stated, communicated, and met.

Unrealistic time frames

If a problem has developed in an electrical system over a timeline that is measured in decades, it is unrealistic to expect that the problem can be resolved in a matter of months. The focus should be on criticality and deciding which issues are most important to resolve.

You can’t solve these problems overnight. Transformer reliability (and, by extension, electric power system reliability) is achieved through the execution of a long-term strategy that is based on standards. SDMyers has developed standards for transformer health, and we’ve worked alongside companies that specialize in cables, relays, and breakers, each of which has developed standards based on its own set of data.

These long-term strategies require multi-year budgets and an appreciation of lifecycle management. From purchasing through monitoring, inspection, maintenance, and repair, it’s important to note that a multi-million-dollar electric power system reliability program is more cost-effective than a plan that involves massive capital expenditure every few years.


There are challenges to overcome to develop electrical system reliability, and most of them involve communication, either between the solution provider and the organization or between silos within the organization itself. It is first important to define what the real risk of failure is, and this can be illustrated by developing a criticality ranking by asset class. Use past and current data (or gather additional data if you don’t have recent and accurate information) to prioritize the current needs of the equipment based on that ranking. Develop long-term standards using unit criticality, failure impact, and current condition, and rank each unit for priority service and next steps. Create a multi-year maintenance budget and a plan to extend the life of these assets and minimize downtime losses. Create capital budgets over a 3- to 5-year timeline to maximize ROI and minimize downtime losses. During this process, ensure that everyone involved is aware of the risks of doing nothing, and that decisions made during this process will impact not only medium- to long-term reliability, but the entire culture of the organization.
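As a rough illustration of ranking units by criticality, failure impact, and current condition, the Python sketch below multiplies the three factors into a single priority score. The scoring formula, the 1-to-5 scales, and the example fleet are all hypothetical; they are not an SDMyers standard, and a real program would base each factor on test data and engineering review.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    criticality: int     # 1 = non-critical ... 4 = mission-critical
    failure_impact: int  # hypothetical scale: 1 (minor) to 5 (severe)
    condition: int       # hypothetical scale: 1 (good) to 5 (poor)


def priority_score(unit):
    """Higher score means the unit should be serviced sooner (assumed formula)."""
    return unit.criticality * unit.failure_impact * unit.condition


def rank_units(units):
    """Return units ordered from highest to lowest service priority."""
    return sorted(units, key=priority_score, reverse=True)


# Hypothetical fleet for illustration only.
fleet = [
    Unit("T1 main substation", 4, 5, 3),
    Unit("T2 warehouse lighting", 1, 1, 5),
    Unit("T3 production line", 3, 4, 4),
    Unit("T4 office HVAC", 2, 2, 2),
]

for unit in rank_units(fleet):
    print(f"{priority_score(unit):>3}  {unit.name}")
```

Note that the warehouse lighting unit is in the worst condition but ranks last, because a failure there has little effect on the operation. That is the point of ranking by criticality first rather than by condition alone.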
