Repairing fences instead of catching chickens


Under high time pressure, failures in development and plant operation must be eliminated quickly. This leaves no room for root-cause investigation and leads to quick-and-dirty measures. These are often called "solutions", but the name does not fit, because as a rule they are exactly what "quick and dirty" implies: they secure availability in the short term but allow the failure to recur again and again, since its causes have not been eliminated. The development or maintenance teams end up "catching chickens instead of repairing fences".

Recurring failures indicate misjudgement. They are unavoidable as long as not all causes have been understood, validated and remedied. High innovation rates tend to devalue experiential knowledge and therefore exacerbate the problem.

Damage analysis is like forensics. For engineers, damage analyses are better than any TV thriller, because here they are the detectives themselves. Being allowed to ask naive, seemingly trivial questions, as the TV detective Columbo once did, is highly addictive. Damage analyses are a learning booster: they close the knowledge gaps and correct the misjudgements that so often prove to be the causes of problems.

 Failure affinity instead of failure aversion 

Reliability problems and failures usually occur several times along the development chain. They are often played down, explained away or kept quiet: the tests are excessively hard, prototypes had to serve as test specimens, the test bench probably had a problem, and so on. Any of these may be true, which is why such concerns must be settled before the respective activity, not after it. Otherwise indicators are ignored, the weak points migrate with the development process into the next generation and, after the start of production (SOP), to the customer, where they reveal themselves as serial defects.

Failures are indicators of a lack of understanding. They bring novel weak points to light, reveal unexpected operational load cases, unintended control phenomena and inadequate quality assurance, and expose poorly coordinated development of hardware and software. Failures were and are the trigger for investigating unknown damage mechanisms, which makes them an efficient driver of reliability across the entire mechatronic industry. But damage cases interfere with the proof of reliability: the goal of learning is in temporal conflict with the goal of proving. This conflict must be managed, for example in the steering meetings of a development project, where failures and indicators of weak points, not just completed test hours, should be in focus.

All methods of reliability growth assume that the causes of failures are clarified and that the necessary changes are quickly incorporated into the subsequent generation. For the planning of product validation, on the other hand, the volume of endurance runs is often used as the relevant evaluation parameter. De facto, however, the quality and speed of the forensic work are the more important parameters, because time cannot be bought:

At the end of the validation interval, a product goes into series production, if need be together with the problems that could not be solved sustainably in that time. To make matters worse, recurring failures terminate the endurance runs prematurely, before further weak points can manifest themselves as failures; these hidden problems therefore also go into series production. Quick-and-dirty measures can thus be understood as a method of generating unreliability. Nobody does this on purpose, of course, and there are countermeasures.

If you want to understand a problem, you should tackle it with all methods.

A major hurdle for problem solving, the division of knowledge into specialist departments, is circumvented by setting up a mixed problem-solving team. Its way of working is, analogous to agile development, interdisciplinary, non-hierarchical and result-driven. Covering all potential types of causes is central to the composition of such a team: in addition to design, simulation and experimental development, quality and production must be represented. Problem solving should be controlled by a process, e.g. the 8D process, which is supported by our software Uptime SOLUTIONS.

The statistical analysis of failure cases is powerful. It usually provides a reliable diagnosis of whether a quality or a lifetime problem is at hand, whether the cause is seasonal in nature, whether certain modes of operation are causing the damage, and so on.
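A common first step in such a statistical diagnosis is a Weibull fit: a shape parameter below 1 points to early failures and thus a quality problem, a value well above 1 to wear-out and thus a lifetime problem. The sketch below uses median-rank regression on hypothetical failure times; it illustrates the general technique and is not Uptime's tooling.

```python
# Minimal Weibull analysis by median-rank regression.
# The failure times are hypothetical illustration data.
import math

def weibull_mrr(times):
    """Fit Weibull shape (beta) and scale (eta) via median-rank regression."""
    t = sorted(times)
    n = len(t)
    xs, ys = [], []
    for i, ti in enumerate(t, start=1):
        # Bernard's approximation of the median rank of the i-th failure.
        f = (i - 0.3) / (n + 0.4)
        xs.append(math.log(ti))
        ys.append(math.log(-math.log(1.0 - f)))
    # Ordinary least squares on the linearised Weibull CDF:
    # ln(-ln(1-F)) = beta * ln(t) - beta * ln(eta)
    mx = sum(xs) / n
    my = sum(ys) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    eta = math.exp(mx - my / beta)
    return beta, eta

# Hypothetical times-to-failure in operating hours (tightly clustered).
hours = [3100, 3600, 4000, 4300, 4700, 5100, 5800]
beta, eta = weibull_mrr(hours)
# beta < 1: early failures (quality); beta > 1: wear-out (lifetime problem)
print(f"beta = {beta:.2f}, eta = {eta:.0f} h")
```

For the illustration data above, the shape parameter comes out well above 1, which would point to a wear-out mechanism rather than a manufacturing-quality issue.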

In practice, it is critical that the engineers and technicians with the deepest expertise are not available to the necessary extent. Integrating external support eliminates this bottleneck quickly and effectively. Even more important, however, is the view from outside: it uncovers blind spots, i.e. aspects that have receded into the background during the long occupation with the matter, or that have not been sufficiently investigated for other reasons.

The results of damage analyses primarily serve the sustainable solution of a specific problem. Beyond that, they can be used for future product development, for preventive plant operation or for optimized plant maintenance, if damage indicators are derived from the analyses. These indicators are used to detect precursor effects and initiate maintenance before failure. They also provide the input for automated diagnostics, on whose basis the remaining service life of damaged components can finally be determined. This can be automated for risk-focused plant analysis in our Uptime HARVEST software.
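In its simplest form, such a damage indicator becomes a maintenance trigger by extrapolating its trend to an alarm threshold. The sketch below does this with a linear least-squares fit on hypothetical vibration readings; it is a stand-in for remaining-useful-life estimation under these assumptions, not the method used in Uptime HARVEST.

```python
# Sketch: turning a trending damage indicator into a maintenance trigger.
# Readings and threshold are hypothetical; a linear trend is assumed.

def remaining_life(hours, indicator, threshold):
    """Extrapolate a linearly trending damage indicator to a threshold.

    Returns the estimated remaining operating hours and the fitted slope.
    """
    n = len(hours)
    mt = sum(hours) / n
    mi = sum(indicator) / n
    # Least-squares slope of the indicator over operating hours.
    slope = (sum((t - mt) * (y - mi) for t, y in zip(hours, indicator))
             / sum((t - mt) ** 2 for t in hours))
    # Fitted indicator value at the latest reading.
    current = mi + slope * (hours[-1] - mt)
    return (threshold - current) / slope, slope

# Hypothetical trend of a bearing vibration indicator (e.g. RMS in mm/s),
# with maintenance required once it reaches 4.0.
hours = [0, 100, 200, 300, 400, 500, 600]
indicator = [1.0, 1.2, 1.35, 1.6, 1.8, 2.05, 2.2]
rul, slope = remaining_life(hours, indicator, threshold=4.0)
print(f"estimated remaining life: {rul:.0f} operating hours")
```

Real degradation is rarely linear, so production systems fit physically motivated damage models instead; the point here is only the principle of scheduling maintenance from a precursor trend rather than after failure.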

 Damage analyses are therefore useful beyond the specific case for reliability throughout the entire product life cycle: for efficient development, preventive operation, early diagnostics and accurate forecasting. 

So what to do?


The failure rate in product development and fleet operation increases sharply with the degree of innovation of a product. The degree of innovation should therefore be assessed, e.g. via the Technology Readiness Level (TRL) scale developed by NASA.


The capacity of dedicated problem-solving teams should be sized accordingly, ideally supplemented by external partners and experts.


The validated results and their consequences should be incorporated into a central knowledge base. They can be used in a variety of ways: for product development, for preventive plant operation, for system monitoring and for preventive maintenance.
