DIAGNOSIS OF AIR COMPRESSOR CONDITION USING MINIMUM REDUNDANCY MAXIMUM RELEVANCE (MRMR) ALGORITHM AND DISTANCE METRIC BASED CLASSIFICATION

Finding a reliable machine condition monitoring technique has attracted many researchers, with the aim of avoiding sudden machine failures and their unexpected consequences. This work proposes a fault diagnosis method for air compressors using frequency-based features and distance metric-based classification. The analyzed experimental datasets contain one healthy condition and seven different fault conditions. Features are extracted from the frequency spectrum, the best feature sets are then selected using the MRMR algorithm, and finally the classification is conducted using a distance metric classifier. The results demonstrate automatic classification with a correct classification rate of more than 97%. The effect of the selected feature set size and of the training sample size on the classification accuracy is also investigated. The results indicate that this method of analysis can be used for early detection of faults with high accuracy.


INTRODUCTION
Reciprocating compressors are commonly used in many industries, such as the chemical and petroleum industries. The condition of the compressor must be monitored to maintain a safe working environment and to avoid economic losses caused by unplanned shutdowns. Fault diagnosis of compressors still attracts the interest of many researchers, who keep developing techniques that detect compressor faults efficiently and automatically.
Various condition monitoring methods have been developed for fault diagnosis in reciprocating compressors. A simulation study of a two-stage reciprocating compressor has been conducted in [1]. Five physical processes are involved in the mathematical modelling: the speed-torque characteristics of the induction motor, the rotational movement of the crankshaft, the variation of cylinder pressure, the vibration of the valve plates and the flow characteristics through the valves. In that study, valve spring defects and valve leakage are modelled. It was found that the pressure variation and the instantaneous angular speed of the crankshaft are clearly sensitive to the presence of valve leakage and spring deterioration. The autocorrelation of the spectrogram difference is also utilised for the detection of broken valves under variable load conditions [2]. In Reference [3], acoustic emission recordings of the valve motion are used to assess valve health, with the motion of a normal valve used as the reference for comparison. Two fault conditions, namely delayed valve closing and valve flutter, as well as the normal condition are considered in the analysis.
The authors of [4] compare the sensitivity of feature sets obtained from different signal transforms for fault detection. These feature sets are used for the classification of three different compressor states, namely healthy, inlet valve leakage fault and outlet leakage fault. Generally, it was found that the feature sets obtained from the time-frequency transformation perform better than the others.
A study was conducted in [5] to investigate the effect of sensor position on the accuracy of fault detection in air compressors. More recently, a hybrid deep neural network has also been applied to fault diagnosis in air compressors [6]. The Teager-Kaiser energy operator is combined with a deep belief classifier to identify the condition of the air compressor valve [7]. The researchers in [8] utilise Local Mean Decomposition (LMD) in combination with a stacked denoising autoencoder for diagnosing faults in reciprocating compressors using vibration signals. Signal features are extracted using LMD and then classified using the stacked denoising autoencoder model. A feed-forward neural network is also utilised for diagnosing high-pressure compressor valve faults. Thirty-two features are acquired from the compressor, including voltages, pressure, etc. The most important features are then selected based on the scattering matrices [9]. In Reference [10], the performance of an optimised SVM in compressor fault diagnosis is compared to other SVM techniques. In that work, ninety-two features are extracted from the compressor pressure readings and divided equally into training and testing samples. In Reference [11], the simulated motion of the suction and discharge valves as well as the measured AE signals are studied for the purpose of fault diagnosis of valves in air compressors. It was found in that study that a valve fault can be identified as an increase in the AE amplitude.
DIAGNOSTYKA, Vol. 22, No. 4 (2021) Al-Bugharbee H, Samaka H, Zubaidi SL.: Diagnosis of air compressor condition using minimum … 26
The present paper has the following structure. The first section presents the state of the art of techniques used for fault detection in reciprocating compressors. The second section briefly describes the proposed methodology, including feature extraction, feature selection and classification. The third section explains the details of the experimental datasets used in the present paper. The fourth section presents and discusses the most important results obtained in the current work. The fifth section concludes the paper.

METHODOLOGY
This paper suggests a new methodology for detecting faults in air compressors. The methodology can be divided into three parts: feature extraction, feature selection and classification.

Feature extraction
The frequency representations of the readings are obtained using the Fourier transform, and every frequency spectrum is then divided into a number of frequency bandwidths. The mean amplitude of every bandwidth is then arranged to form the feature vector (fv) as in equation (1).
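The band-averaging step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function name band_mean_features, the use of the one-sided FFT amplitude scaled by the signal length, and the choice of bands starting at 0 Hz are all assumptions introduced here.

```python
import numpy as np


def band_mean_features(signal, fs, band_hz):
    """Return the mean spectral amplitude of consecutive frequency bands.

    signal  : 1-D array of samples
    fs      : sampling rate in Hz
    band_hz : width of each frequency band in Hz (e.g. 25 or 50)

    Assumption: bands are [0, band_hz), [band_hz, 2*band_hz), ... up to fs/2.
    """
    signal = np.asarray(signal, dtype=float)
    if fs / signal.size > band_hz:
        raise ValueError("frequency resolution is coarser than the band width")

    # One-sided amplitude spectrum and the frequency of every bin.
    amplitude = np.abs(np.fft.rfft(signal)) / signal.size
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)

    # Band number of every bin; only full bands below fs/2 are kept.
    band_index = np.floor(freqs / band_hz).astype(int)
    n_bands = int((fs / 2) // band_hz)

    # Feature vector: one mean amplitude per band.
    return np.array([amplitude[band_index == k].mean() for k in range(n_bands)])


if __name__ == "__main__":
    fs = 1000                                # illustrative sampling rate
    t = np.arange(fs) / fs                   # one second of samples
    tone = np.sin(2 * np.pi * 110 * t)       # synthetic 110 Hz tone
    fv = band_mean_features(tone, fs, band_hz=25.0)
    print(fv.shape, int(np.argmax(fv)))      # the 100-125 Hz band dominates
```

With a 50 kHz sampling rate and a 25 Hz bandwidth, the same function would return 1000 band averages per reading, which is why a subsequent feature selection step is useful.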

Feature selection
The use of all extracted features is not always the right choice for a classifier input. It might negatively affect the accuracy of the classifier as well as cause additional computational burden. The most effective and important features, which provide the best classification, can be selected using the Minimum Redundancy Maximum Relevance (MRMR) algorithm. This algorithm sorts the extracted features according to their importance for the classification process. The feature importance of a given feature fij can be expressed according to equation (2) [12,13], where:
fij is the j-th feature for the i-th …
I(.,.) is the mutual information.
S is the set of selected features.
|S| is the number of features in S.
The mutual information I(.,.) measures the "amount of information" obtained about one random variable through observing the other random variable [14].
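A greedy MRMR ranking can be sketched as below. This is a hedged illustration, not the paper's equation (2): it assumes the common "difference" form of the criterion (relevance I(f, class) minus the mean redundancy I(f, s) over the already selected set S), and it estimates mutual information from equal-width histograms. The function names, the number of bins and the difference form itself are assumptions made for this example.

```python
import numpy as np


def mutual_information(x, y):
    """Mutual information (in nats) of two discrete 1-D arrays."""
    _, x_inv = np.unique(np.ravel(x), return_inverse=True)
    _, y_inv = np.unique(np.ravel(y), return_inverse=True)
    joint = np.zeros((x_inv.max() + 1, y_inv.max() + 1))
    np.add.at(joint, (x_inv, y_inv), 1.0)      # joint histogram of (x, y)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)      # marginal of x
    py = joint.sum(axis=0, keepdims=True)      # marginal of y
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


def discretise(X, n_bins=8):
    """Equal-width binning of every column (assumption used to estimate MI)."""
    X = np.asarray(X, dtype=float)
    out = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        edges = np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)
        out[:, j] = np.digitize(X[:, j], edges[1:-1])
    return out


def mrmr_rank(X, y, n_select, n_bins=8):
    """Greedy MRMR: return the indices of n_select features, best first."""
    Xd = discretise(X, n_bins)
    n_features = Xd.shape[1]
    relevance = np.array(
        [mutual_information(Xd[:, j], y) for j in range(n_features)]
    )
    selected = [int(np.argmax(relevance))]     # start with most relevant feature
    while len(selected) < n_select:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # Mean mutual information with the features already in S.
            redundancy = np.mean(
                [mutual_information(Xd[:, j], Xd[:, s]) for s in selected]
            )
            score = relevance[j] - redundancy  # assumed difference criterion
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

In this sketch a feature that nearly duplicates an already selected one is pushed down the ranking even if it is highly relevant on its own, which is the behaviour the MRMR criterion is designed to produce.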

Classification
The selected feature sets are divided into a training sample and a testing sample. The training sample is used to build a reference space for every compressor condition, while the testing sample is used to evaluate the accuracy of the classification model. The feature vectors of the testing sample are used as input to the Mahalanobis distance-based classifier. In this classifier, the distance between each testing feature vector and every training sample category is measured. The feature vector is then assigned to the category for which the measured distance is a minimum. The accuracy of the classification can be evaluated using the so-called confusion matrix. The main diagonal of the matrix represents the rates of correct classification, while the off-diagonal elements represent the misclassifications.
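The classification step can be illustrated with the following minimal sketch. It assumes that each compressor condition is represented by the mean and covariance of its training feature vectors, and that a pseudo-inverse is used so that a near-singular covariance does not stop the calculation. These choices, together with the class and function names, are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


class MahalanobisClassifier:
    """Assign each vector to the class with the smallest Mahalanobis distance."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_, self.inv_covs_ = [], []
        for c in self.classes_:
            Xc = X[y == c]                                 # reference space of class c
            self.means_.append(Xc.mean(axis=0))
            cov = np.atleast_2d(np.cov(Xc, rowvar=False))
            self.inv_covs_.append(np.linalg.pinv(cov))     # assumption: pseudo-inverse
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Mahalanobis distance of every sample to every class.
        d2 = np.stack(
            [
                np.einsum("ij,jk,ik->i", X - m, ic, X - m)
                for m, ic in zip(self.means_, self.inv_covs_)
            ],
            axis=1,
        )
        return self.classes_[np.argmin(d2, axis=1)]


def confusion_matrix_percent(y_true, y_pred, classes):
    """Rows: true class, columns: predicted class, values in percent per row."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    cm = np.zeros((len(classes), len(classes)))
    for i, c_true in enumerate(classes):
        for j, c_pred in enumerate(classes):
            cm[i, j] = np.sum((y_true == c_true) & (y_pred == c_pred))
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)
```

A typical use would be clf = MahalanobisClassifier().fit(X_train, y_train), followed by confusion_matrix_percent(y_test, clf.predict(X_test), clf.classes_), where the training and testing arrays hold the selected features.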

EXPERIMENTAL DATA SETS
The proposed methodology was applied to acoustic datasets of a single-stage reciprocating air compressor. These datasets were acquired from a compressor with an air pressure range of 0-35 kg/cm² [12]. They were acquired at a 50 kHz sampling rate using a microphone and an NI DAQ device. The total number of recordings was 1800 (i.e. 225 samples x 8 compressor conditions). Figure 1 shows the air compressor from which the acoustic datasets were acquired.
It can be easily seen that a direct comparison of frequency spectra is not always helpful or practical. In addition, using the whole spectrum is not practical for the classification of the compressor condition. For this reason, features are formed by dividing the frequency spectrum into a number of bandwidths. The average amplitude value of every bandwidth is then considered as a feature. Figure 3 illustrates the average value of the frequency bands at a bandwidth of 25 Hz.

Feature selection
The representation of these spectra by a compact set of values still needs further processing in order to alleviate the computational burden and to select only the features that are useful for compressor condition classification. As mentioned in section (II), the MRMR algorithm is used to rank the extracted features according to their importance in the classification process. Figure 4 illustrates the rank of the frequency bandwidths according to their importance score in the classification.

Classification
The automatic selection of features is important because it removes the burden of forming feature sets randomly and also helps to reduce the calculation time. The classification performance is investigated for different training sample sizes, different feature vector lengths and different frequency bandwidths.
For the detection process, the classifier is first investigated for the two-class healthy/faulty case. Table 1 shows the percentages of the correct classification rate for the testing sample in the detection process. These rates are calculated for two cases, namely raw features and MRMR-based selected features (the latter shown in bold and between brackets). Only the first three features are used for the detection of faults. The results are also shown for different training sample sizes (i.e. 50, 100 and 125 feature vectors). The significant effect of the MRMR-based feature selection in improving the correct classification rate is clearly seen for both bandwidths. For example, for the case where the frequency bandwidth is 25 Hz, the detection accuracy rose to 97.7% at a small training sample size (i.e. 50), to 99.4% at a training sample size of 100, and to 99.6% at a training sample size of 125.
For the fault diagnosis (i.e. identification of fault types), longer feature vectors are required in comparison with the detection process, as more information is required for distinguishing among the different compressor conditions.
[Figure caption, partial: … Hz bandwidth case, and bottom (first three selected features based on the MRMR algorithm) at 50 Hz bandwidth case]
From the figure above, the improvement in the separation between the healthy and faulty categories due to the selection algorithm is easily observed. Figure 6 shows a three-dimensional bar representation of the minimum correct classification rate percentage for the raw (left) and MRMR-Mahalanobis distance (right) cases. As Figure 7 represents the minimum value of the correct classification rate, the average value of the correct classification rates will be higher. It is also shown that increasing the training sample size as well as the feature vector length improves the correct classification rates.
The same discussion and findings also apply to the results obtained for the 50 Hz bandwidth case (see Figure 5 and Table 1). Figure 8 illustrates the confusion matrix of the correct classification percentage rates obtained using a feature vector length of 15 and different training sample sizes. The improvement in classification accuracy is clearly seen as the off-diagonal elements become smaller while the main diagonal elements of the matrix become larger. The same improvement can also be seen for the case where the frequency band is 50 Hz.
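The relation between the minimum and the average correct classification rate mentioned above can be made concrete with a short sketch. The confusion matrix counts below are purely illustrative toy values chosen for this example, not results from the paper, and the function name per_class_rates is an assumption.

```python
import numpy as np


def per_class_rates(cm_counts):
    """Correct classification rate (percent) of every class.

    cm_counts: square confusion matrix of counts,
               rows = true class, columns = predicted class.
    """
    cm_counts = np.asarray(cm_counts, dtype=float)
    return 100.0 * np.diag(cm_counts) / cm_counts.sum(axis=1)


if __name__ == "__main__":
    # Hypothetical three-class confusion matrix (toy numbers, not measured data).
    toy_cm = [[50, 0, 0],
              [1, 48, 1],
              [0, 3, 47]]
    rates = per_class_rates(toy_cm)
    # The minimum is a worst-case figure; the mean can never be lower than it.
    print("minimum rate:", rates.min())
    print("average rate:", rates.mean())
```

Reporting the minimum per-class rate is therefore the more conservative choice, since any class that is recognised better than the worst one raises the average.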

CONCLUSION
In the present study, the diagnosis of fault presence and fault type in air compressors is investigated. The methodology links MRMR-based feature selection with distance metrics for the purpose of fault diagnosis. The features are extracted from the frequency spectrum of the compressor acoustic signals. The MRMR-based selected features are then used as input to a Mahalanobis distance classifier to identify the compressor health condition. The methodology is evaluated for different training sample sizes and different frequency bandwidths. The results show a very high correct recognition rate, reaching 99% on average.