Anomaly detection in Multivariate Time Series
Josu Ircio Fernández
- DIRECTORS: Aizea Lojo and Jose A. Lozano
- UNIVERSITY: UPV/EHU
The fourth Industrial Revolution has brought about many advances in the monitoring of industrial systems. The development of new types of sensors and their low cost make it possible to obtain information about machines’ performance more efficiently. Hence, tasks such as predictive maintenance have become crucial for the competitiveness of companies. Predictive maintenance consists of analyzing machines’ operation through the data reported by the sensors to try to detect anomalies that may indicate a possible forthcoming failure. Anticipating these failures can be vital to reduce the chance of unexpected breakdowns, thereby reducing the associated maintenance costs, reducing the time required to repair or recondition malfunctioning equipment and mitigating the risk of accidents related to machine failures.
An anomaly in machines’ performance can be defined as a non-permitted deviation of the system from the acceptable, usual, or standard condition. In most cases, an anomaly is determined not only by an abnormal value, but also by the time at which it occurs and by its inconsistency with previous or future values. Therefore, to exploit the information reported by these sensors over time when searching for anomalies, it is common to use time series theory.
A time series is defined as a sequence of observations ordered in time. In general, observations close together in time will be more correlated than observations further apart. This is one of the features that distinguishes time series data from non-temporal data, in which there is no natural ordering of the observations. Therefore, the techniques used in time series analysis must account for this temporal correlation. In addition, the nature of time series data also includes some characteristics that make their analysis difficult, such as large volumes of data, high dimensionality and continuous updating.
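The temporal correlation mentioned above can be made concrete with the sample autocorrelation of a series. The following is a minimal sketch, not part of the dissertation: the `autocorrelation` helper and the synthetic sine-wave `signal` are assumptions introduced purely for illustration.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of a univariate series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    covariance_sum = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance_sum / variance_sum

# Hypothetical sensor signal: a slow sine wave with a period of 50 samples.
signal = [math.sin(2 * math.pi * t / 50) for t in range(200)]

r_near = autocorrelation(signal, 1)   # neighbouring observations
r_far = autocorrelation(signal, 25)   # observations half a period apart

print(f"lag 1:  {r_near:.3f}")
print(f"lag 25: {r_far:.3f}")
```

For this synthetic signal, neighbouring observations are strongly positively correlated, while observations half a period apart are negatively correlated. Shuffling the observations would destroy this structure, which is why methods designed for non-temporal data cannot be applied without adaptation.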
It should be noted that in real-life scenarios, the system to be monitored is often complex and needs to be described using more than one temporal variable. In this case, apart from the temporal correlation between the observations of each variable, inter-correlations between the variables can also exist. Consequently, all the variables need to be considered together in order to analyze the complete system. To handle these situations, multivariate time series (MTS) are defined. An MTS is a set of univariate time series which provides information about a complex system.
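As a rough illustration of this definition, an MTS can be stored as a collection of equally long univariate series, one per sensor, and the inter-correlation between two variables can be measured with the Pearson coefficient. The sensor names and values below are invented for the example and do not come from the dissertation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Hypothetical MTS: three univariate series recorded over the same six time steps.
mts = {
    "temperature": [20.0, 21.0, 23.0, 26.0, 30.0, 35.0],
    "pressure": [1.0, 1.1, 1.3, 1.6, 2.0, 2.5],
    "vibration": [0.5, 0.1, 0.4, 0.2, 0.5, 0.1],
}

r_temp_pressure = pearson(mts["temperature"], mts["pressure"])
r_temp_vibration = pearson(mts["temperature"], mts["vibration"])

print(f"temperature vs pressure:  {r_temp_pressure:.3f}")
print(f"temperature vs vibration: {r_temp_vibration:.3f}")
```

In this toy data, pressure is an exact linear function of temperature, so the two variables are perfectly correlated, whereas vibration is only weakly related to temperature. Analyzing each series in isolation would miss this kind of relationship between variables.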
The importance of this kind of data lies in the fact that, nowadays, it can be extracted from any component that contains sensors and whose operation is monitored. For this reason, in the past few years, the use of techniques to extract knowledge and useful information from this type of temporal data has increased. Specifically, a whole field of research, called time series data mining, has been devoted to extending classical machine learning tasks and algorithms to time series data and to creating new specific algorithms. Among the most important tasks that have been studied with time series are the following: time series forecasting, clustering, classification, temporal pattern discovery, rule discovery, segmentation, and anomaly detection.
The research presented in this dissertation will focus on anomaly detection in multivariate time series. Although it is a general problem with diverse applications, in this case, the target will be anomaly detection in the operation of industrial systems. In these scenarios it is common to have a set of example time series where correct and abnormal operation series are identified (labeled). Consequently, the MTS anomaly detection problem will be approached as a supervised MTS classification (MTSC) problem. Thus, the objective will be to learn a classifier that is able to distinguish between correct operation and abnormal operation time series.
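To show what the supervised formulation looks like in practice, the sketch below uses a one-nearest-neighbour classifier with Euclidean distance over labeled MTS. This is a generic baseline chosen for brevity, not the method developed in the dissertation, and the functions `mts_distance` and `predict_1nn` as well as the toy data are assumptions.

```python
import math

def mts_distance(a, b):
    """Euclidean distance between two MTS of identical shape.

    Each MTS is a list of variables, and each variable is a list of values.
    """
    return math.sqrt(
        sum((x - y) ** 2 for var_a, var_b in zip(a, b) for x, y in zip(var_a, var_b))
    )

def predict_1nn(train, query):
    """Return the label of the training MTS closest to the query."""
    nearest_series, nearest_label = min(
        train, key=lambda pair: mts_distance(pair[0], query)
    )
    return nearest_label

# Hypothetical labeled examples: two variables observed over four time steps.
train = [
    ([[1.0, 1.0, 1.0, 1.0], [0.2, 0.2, 0.2, 0.2]], "normal"),
    ([[1.0, 1.1, 1.0, 0.9], [0.2, 0.3, 0.2, 0.2]], "normal"),
    ([[1.0, 1.5, 2.5, 4.0], [0.2, 0.6, 1.1, 1.9]], "abnormal"),
]

drifting = [[1.0, 1.4, 2.4, 3.8], [0.2, 0.5, 1.0, 1.8]]
steady = [[1.0, 1.0, 1.1, 1.0], [0.2, 0.2, 0.2, 0.3]]

print(predict_1nn(train, drifting))
print(predict_1nn(train, steady))
```

The classifier only needs labeled examples of correct and abnormal operation, which matches the setting described above. A baseline of this kind ignores the streaming, dimensionality and imbalance issues that arise in industrial monitoring.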
Apart from the inherent difficulty of the anomaly detection problem in MTS, anomaly detection in industrial systems involves the following additional challenges that must also be taken into account:
- Streaming data. In real system monitoring scenarios, data is continuously arriving and almost immediate processing of the incoming time series streams is required. Once the new series is examined and according to the obtained results, it is necessary to react in a particular way as soon as possible, e.g. by issuing a malfunction alert. Therefore, the learning methodology should take this scenario into account.
- High dimensionality. In complex industrial systems it is common to have a high number of sensors to analyze. This high dimensionality demands considerable computational time and resources, which conflicts with the requirements of a streaming scenario. In addition, the large number of variables can complicate the analysis and even make the anomaly detection results less accurate. Several variables, represented as univariate time series in this case, might be redundant in the presence of others or might not provide relevant information for the target task. Therefore, being able to select only the relevant time series for classification can be decisive and can improve the final results. For all these reasons, the existing methods for feature subset selection will be investigated, specifically in the field of multivariate time series classification.
- Imbalance. The anomaly detection problem addressed from a supervised approach is almost inevitably linked to the class imbalance problem, since the malfunctions that cause anomalies are usually rare. Most of the available time series will correspond to the machine in normal operation, as opposed to a very small percentage in which the machine has been malfunctioning. These imbalanced scenarios pose a problem for traditional models, which implicitly assume equally distributed classes and are prone to generating predictions biased in favor of the majority class. This implies that the minority class is predicted with low recall, that is, anomalies remain largely undetected. Consequently, another key aspect of the research will be to work on multivariate time series classifiers that are able to cope with class imbalance.
- Degradation. In real scenarios where the aim is to predict a failure through continuous monitoring and anomaly detection, it must be taken into account that, prior to a failure, the system undergoes a certain degradation. At some point, the system’s normal operation is altered and gradually deteriorates until the failure occurs. In most cases, information about when the system breaks down is available. However, we usually do not have information concerning when it began to malfunction. Being able to identify these moments is fundamental for anticipating failures. Hence, it is necessary to investigate this aspect and to develop new learning methodologies that consider it.
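The class imbalance challenge listed above can be illustrated with a small numeric sketch. It assumes a hypothetical data set with 95 normal and 5 abnormal series and a trivial classifier that always predicts the majority class. The helper functions and the class proportions are illustrative assumptions, not results from the dissertation.

```python
def accuracy(y_true, y_pred):
    """Fraction of series whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def recall(y_true, y_pred, positive):
    """Fraction of truly positive series that were predicted as positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    total_pos = true_pos + false_neg
    return true_pos / total_pos if total_pos else 0.0

# Hypothetical imbalanced test set: 95 normal series and 5 abnormal ones.
y_true = ["normal"] * 95 + ["abnormal"] * 5

# A trivial classifier that always predicts the majority class.
y_pred = ["normal"] * 100

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred, positive="abnormal")

print(f"accuracy: {acc:.2f}")
print(f"recall on abnormal class: {rec:.2f}")
```

The trivial classifier looks accurate overall yet detects none of the abnormal series. For this reason, measures computed on the minority class are more informative than overall accuracy when classifiers are evaluated in imbalanced settings.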