Essentially, detecting anomalous conditions means detecting a change in measured variables that does not result from the load. Such deviations indicate a change in a system property and can serve as an indicator of the early phase of damage. We want to detect them as accurately and as early as possible so that measures can be taken to prevent failures.
The relevant parameters are determined from the failure risks of a system. Control data is often useful for this, supplemented by lean instrumentation. The instrumentation should provide risk-related information with as few data channels as possible and at the lowest possible sampling rate. Under no circumstances should you have all possible correlations between all possible channels evaluated. That only leads to “black box” chaos.
Of course, it is useful to design specific models for anomaly detection based on the failure risks. However, if the input data for such models is missing, it is still possible to detect deviations of a general nature. Energy and material balances are robust methods for this. Although they do not identify a specific damage mechanism, they still allow vulnerable instances to be identified.
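A balance check of this kind can be sketched in a few lines. The following is a minimal illustration, not a prescribed method: the function names, the optional storage term and the 5 % tolerance are assumptions chosen for the example. The tolerance has to cover the expected losses and the measurement uncertainty of the real system.

```python
def energy_balance_residual(power_in_w, power_out_w, stored_change_w=0.0):
    """Return the unexplained power in W: input minus output minus change in storage."""
    return power_in_w - power_out_w - stored_change_w


def balance_is_anomalous(power_in_w, power_out_w, stored_change_w=0.0, tolerance=0.05):
    """Flag the balance when the residual exceeds a fraction of the input power.

    tolerance is an assumed value (5 % of the input power), not a recommendation.
    """
    residual = energy_balance_residual(power_in_w, power_out_w, stored_change_w)
    return abs(residual) > tolerance * abs(power_in_w)


if __name__ == "__main__":
    # 1000 W in, 900 W out, nothing stored: 100 W are unaccounted for.
    print(balance_is_anomalous(1000.0, 900.0))
    # The same readings while 80 W go into storage (e.g. thermal mass heating up).
    print(balance_is_anomalous(1000.0, 900.0, stored_change_w=80.0))
```

The check says nothing about where the missing energy goes. It only marks the instance as worth a closer look.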
A detector only monitors one aspect of reality. A combination of input channels is therefore required to cover the risk landscape. More data requires more effort, which is only worthwhile if it creates additional information about risks. It is therefore wise and cost-effective to link as few data sources as possible, but of different types, e.g. time series of load data and load responses with system states, fault reports and maintenance information.
The stationary operation of a system is the trivial case because the expected values for the intact system are constant and can be compared with the actual state without any modeling. To do this, you use the characteristic properties, such as the speed of a fan. However, if the fan is controlled to a constant speed, it makes sense to monitor the power consumption as well. An anomaly then shows up as a change in power consumption even though the speed itself does not change.
The relevant systems are the transient ones. Their condition changes depending on the load, the boundary conditions and the control. The response behavior depends on the controlled system and on its inertia (e.g. its thermal mass). To create the reference, we have to determine this load-response behavior consistently. For the heating element of a dishwasher, for example, this would be the heating curve as a function of the filling level, the ambient temperature and the power consumption. These curves are measured and stored as a reference map. When the dishwasher heats up during operation, comparing the measured temperature rise with the reference shows whether everything is OK.
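The target/actual comparison against a reference map can be sketched as follows. This is only an illustration under stated assumptions: the map layout, the nearest-point lookup, the axis scaling and all numbers in `REFERENCE_MAP` are hypothetical placeholders, not measured dishwasher data. A real map would be denser and would typically be interpolated.

```python
# Hypothetical reference map:
# (fill level in L, ambient temperature in °C, heater power in W)
#   -> expected heating rate in K/min.
# The numbers are placeholders for illustration, not measurements.
REFERENCE_MAP = {
    (3.0, 15.0, 1800.0): 7.9,
    (3.0, 25.0, 1800.0): 8.2,
    (4.0, 15.0, 1800.0): 6.0,
    (4.0, 25.0, 1800.0): 6.2,
}

# Assumed scaling per axis so that no single unit dominates the distance.
AXIS_SCALE = (1.0, 10.0, 100.0)


def expected_heating_rate(fill_l, ambient_c, power_w, reference=REFERENCE_MAP):
    """Return the heating rate stored for the nearest operating point in the map."""
    query = (fill_l, ambient_c, power_w)

    def distance(point):
        return sum(((p - q) / s) ** 2 for p, q, s in zip(point, query, AXIS_SCALE))

    nearest = min(reference, key=distance)
    return reference[nearest]


def heating_deviation(measured_rate, fill_l, ambient_c, power_w):
    """Return measured minus expected heating rate in K/min."""
    return measured_rate - expected_heating_rate(fill_l, ambient_c, power_w)


if __name__ == "__main__":
    # Measured 6.5 K/min at 3.1 L, 24 °C ambient, 1800 W heater power.
    print(heating_deviation(6.5, 3.1, 24.0, 1800.0))
```

A negative deviation means the water heats more slowly than the reference predicts for this operating point.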
If there are significant deviations, an indicator event is generated. It indicates deviating system behavior across the entire spectrum of use, i.e. even without an extreme value being exceeded. Such a residual analysis detects a deviation as soon as it becomes larger than the spread of the reference. For early detection, it is therefore crucial to determine the reference as precisely as possible.
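The link between the spread of the reference and the detection threshold can be made concrete with a small sketch. It assumes that the spread is described by the sample standard deviation of residuals recorded on the intact system and that an event is raised outside a band of `k` standard deviations. Both the function names and `k = 3` are assumptions for illustration.

```python
from statistics import mean, stdev


def build_reference(reference_residuals):
    """Return (mean, spread) of residuals recorded on the intact system."""
    return mean(reference_residuals), stdev(reference_residuals)


def indicator_events(residuals, ref_mean, ref_spread, k=3.0):
    """Return the indices at which a residual leaves the band ref_mean +/- k * ref_spread."""
    return [
        i for i, r in enumerate(residuals)
        if abs(r - ref_mean) > k * ref_spread
    ]


if __name__ == "__main__":
    ref_mean, ref_spread = build_reference([-0.1, 0.0, 0.1, 0.0, -0.1, 0.1])
    observed = [0.05, -0.2, 0.3, 0.1]
    print(indicator_events(observed, ref_mean, ref_spread))
    # A more precise reference (smaller spread) flags smaller deviations.
    print(indicator_events(observed, 0.0, 0.05))
```

With the narrower reference band, the second call reports an additional event for the same observations, which is why a precise reference translates directly into earlier detection.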
With microelectronic devices, measuring system behavior is only possible to a limited extent due to miniaturization. But if the system architecture is known, the behavior can be simulated. We do this for all expected operating conditions that put significant strain on the system in order to derive the reference in the form of the current-voltage characteristic. As with a measured reference, this characteristic then serves as the basis for the target/actual comparison.
Detecting deviations becomes easier for a fleet of equivalent units. In this context, equivalent means the same hardware, the same software, the same control parameters and the same load. This applies, for example, to the brakes of a train bogie. They are operated synchronously and in the same way, so the difference between their braking pressures must always be close to zero. Similarly, the turbines in a wind farm should normally align with the wind direction and set the same blade angles. Differences between an instance and its equivalent neighbors indicate malfunctions. Transient differences are generally indicators of control problems. The charm of the comparative method is that you do not even need to know the technical details of the systems being monitored. You do not have to model the control characteristics either, because this information is contained in the behavior of the reference system(s). However, you do need a basic understanding of the technology and the system characteristics. Otherwise you will get lost in the jungle of measured values, or you might end up comparing apples with oranges.
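The comparative method needs very little code. The sketch below compares each unit with the median of its equivalent neighbors at the same point in time. The unit names, the pressure values and the tolerance are hypothetical, and the median is one possible choice of reference, picked here because a single faulty neighbor does not distort it.

```python
from statistics import median


def fleet_outliers(readings, tolerance):
    """Compare each unit with the median of its peers.

    readings:  dict mapping unit name -> value measured at the same time
    tolerance: largest accepted absolute difference to the peer median
    Returns a dict mapping each deviating unit to its difference.
    """
    outliers = {}
    for unit, value in readings.items():
        peers = [v for other, v in readings.items() if other != unit]
        if not peers:
            continue  # a single unit has nothing to be compared with
        difference = value - median(peers)
        if abs(difference) > tolerance:
            outliers[unit] = difference
    return outliers


if __name__ == "__main__":
    # Hypothetical brake pressures in bar for four synchronously operated brakes.
    brake_pressure = {"brake_1": 3.02, "brake_2": 3.00, "brake_3": 2.98, "brake_4": 2.40}
    print(fleet_outliers(brake_pressure, tolerance=0.2))
```

No model of the brake or its controller appears anywhere in the code. The neighbors are the model, which only works as long as the units really are equivalent in the sense described above.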
Many failures announce themselves over a long period of time through a gradual change in a parameter. Certain types of bearing damage show up as increasing particle loads in the oil up to six months before the actual failure occurs. Trend analysis allows such phenomena to be detected even while the values are still within the scatter band of the reference. We have developed a statistical procedure that provides the earliest possible indicator, and therefore the longest possible advance warning time. That is what matters, so that a repair can be completed before (!) the failure occurs.
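The statistical procedure itself is not described here. To illustrate the general idea of detecting a trend inside the scatter band, the sketch below uses the Mann-Kendall trend test as a stand-in. This is a standard non-parametric test and not necessarily the procedure referred to above. The implementation omits the correction for tied values, relies on the normal approximation (reasonable for roughly ten or more samples), and the sample values are hypothetical.

```python
import math


def mann_kendall(values):
    """Return (S, z) of the Mann-Kendall trend test, without tie correction."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # +1 for a rise, -1 for a fall, 0 for a tie
    variance = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(variance)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(variance)
    else:
        z = 0.0
    return s, z


def has_upward_trend(values, z_critical=1.645):
    """One-sided test: True if an upward trend is significant at about the 5 % level."""
    _, z = mann_kendall(values)
    return z > z_critical


if __name__ == "__main__":
    # Hypothetical particle loads; every value lies between 10.0 and 11.5.
    rising = [10.0, 10.4, 10.1, 10.6, 10.3, 10.9, 10.7, 11.2, 11.0, 11.5]
    steady = [10.2, 10.0, 10.3, 10.1, 10.2, 10.0, 10.3, 10.1, 10.2, 10.0]
    print(has_upward_trend(rising), has_upward_trend(steady))
```

A limit check with a band of 10.0 to 11.5 would pass both series. The trend test separates them because it evaluates the order of the values rather than their magnitude.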