A DIAGNOSTIC ALGORITHM DIAGNOSING THE FAILURE OF RAILWAY SIGNAL EQUIPMENT

Failures of railway signal equipment can disrupt normal railway operation, so they must be diagnosed promptly. This study takes the 2016–2020 failure data of a railway bureau as an example. First, the data were denoised and features were extracted. Then, the adaptive synthetic sampling (ADASYN) method was used to synthesize minority class samples. Finally, three algorithms were used for failure diagnosis: the back-propagation neural network (BPNN), the support vector machine (SVM) and C4.5. All three algorithms performed poorly on the original data but significantly better once the synthesized samples were added. The BPNN algorithm performed best, with an average precision, recall rate and F1 score of 0.94, 0.92 and 0.93, respectively. These results verify the effectiveness of the BPNN algorithm for failure diagnosis and suggest that it can be further promoted and applied in practice.


INTRODUCTION
Failure diagnosis has strong engineering applications and plays an important role in shortening maintenance cycles and improving maintenance quality. As industrial equipment becomes increasingly complex, failure diagnosis receives more and more attention [6]. Early failure diagnosis relied on traditional methods: signals were detected with instruments, and failures were judged from empirical knowledge. With the development of technology, intelligent methods [4] such as acoustic diagnosis [8] and vibration diagnosis [1] have emerged and have been widely used in various complex systems. Jiang et al. [9] proposed a bearing failure diagnosis method based on an autoregressive (AR) model and fuzzy clustering and found experimentally that it could identify different types of faulty bearings. Chine et al. [3] used an artificial neural network (ANN)-based method to diagnose failures of new photovoltaic (PV) systems, validated it on an experimental database of climatic and electrical parameters from a PV string installed at the Renewable Energy Laboratory (REL) of the University of Jijel, and found that it accurately identified different failures. Zhao et al. [16] combined wavelet packet decomposition (WPD) with multiscale permutation entropy (MPE) to diagnose rolling bearing failures, ran experiments on a data set from the Case Western Reserve University bearing data center, and found that the method identified failures accurately. Cerrada et al. [2] studied gear failure detection, used a genetic algorithm and a random forest algorithm to classify several failure types, and obtained a classification accuracy of over 97%. To ensure safe railway transportation, algorithms for diagnosing failures of railway signal equipment have become an increasingly important topic.
With the continuous development of railway construction, new safety requirements have been placed on railway signal equipment, and failure diagnosis must become more accurate and faster to meet the needs of railway operation. In this paper, the failure data of a railway bureau from 2016 to 2020 were processed and analyzed, and the diagnostic performance of different algorithms was compared, with the aim of contributing to better equipment failure diagnosis.

RAILWAY SIGNAL EQUIPMENT FAILURE DATA AND PROCESSING
Railway signal equipment includes signal machines, track circuits, etc. According to different criteria, railway signal equipment failures can be divided into different categories, as shown in Fig. 1.
In this paper, failure diagnosis was analyzed using the failure data of the signal equipment of a railway bureau from 2016 to 2020; the ten most frequent failure types are listed in Table 1. First, noise must be removed from the data. This paper uses wavelet analysis [12] for this purpose, with the following steps.
(1) Let φ(t) be the parent (mother) wavelet function, with transform $\hat{\varphi}(\omega)$. … where $W_f$ denotes the wavelet transform coefficient. Then, the real state data x(t) of the railway signal equipment are obtained.
(4) The formula in step (3) is evaluated by recursive wavelet decomposition: $x_f(j+1,k) = x_f(j,k)\,h(j,k)$ and $W_f(j+1,k) = x_f(j,k)\,g(j,k)$, where h and g are the low-pass and high-pass decomposition filters.
(5) The wavelet transform coefficients are set to 0, and wavelet reconstruction is then performed on the failure data to obtain the noise-free data: $x_f(j-1,k) = x_f(j,k)\,h'(j,k)\,g'(j,k)$, where h′ and g′ are the reconstruction filters.
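The decomposition and reconstruction steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the paper does not name the parent wavelet, so the Haar wavelet is assumed here, and, following step (5) literally, every detail coefficient is set to 0 before reconstruction. The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_decompose(x):
    """One level: approximation (low-pass h) and detail (high-pass g) coefficients."""
    half = len(x) // 2
    approx = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(half)]
    detail = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(half)]
    return approx, detail


def haar_reconstruct(approx, detail):
    """Inverse of haar_decompose (Haar reconstruction filters)."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return x


def wavelet_denoise(x, levels):
    """Hypothetical helper: decompose `levels` times, zero all details, reconstruct."""
    if len(x) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    approx, detail_lengths = list(x), []
    for _ in range(levels):
        approx, detail = haar_decompose(approx)
        detail_lengths.append(len(detail))
    # Step (5): every detail coefficient W_f is replaced by 0.
    for n in reversed(detail_lengths):
        approx = haar_reconstruct(approx, [0.0] * n)
    return approx


print(wavelet_denoise([1.0, 3.0, 1.0, 3.0], 1))  # pairwise means, approximately [2, 2, 2, 2]
```

Zeroing every detail coefficient replaces each block of 2^levels samples by its mean, which also removes genuine fast transients; practical denoising usually thresholds the detail coefficients instead, and a library such as PyWavelets offers many more wavelet families.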
After the noise is removed, features are extracted using empirical mode decomposition (EMD) [5]. The detailed steps are as follows.
(2) The modal moment is obtained from each intrinsic mode function (IMF) component. The calculated feature vectors for different failures are shown in Table 2.
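The definition of the "modal moment" is not recoverable from the text, so the sketch below assumes it is the normalized energy moment of each IMF component, $E_i = \sum_k t_k\,c_i(k)^2$ with $t_k = k\,\Delta t$, divided by the sum over all components. The IMF components themselves are assumed to come from an EMD implementation that is not shown; `imf_energy_moments` is a hypothetical helper.

```python
def imf_energy_moments(imfs, dt=1.0):
    """Hypothetical 'modal moment' features: normalized energy moment of each IMF.

    imfs: list of IMF components c_i (lists of floats) from some EMD routine.
    E_i = sum_k (k * dt) * c_i(k)**2 for k = 1..N; features are E_i / sum_j E_j.
    """
    moments = [sum((k + 1) * dt * c * c for k, c in enumerate(imf)) for imf in imfs]
    total = sum(moments)
    if total == 0:
        # A silent signal carries no energy, so every feature is 0.
        return [0.0] * len(imfs)
    return [m / total for m in moments]


# Two toy IMFs: energy early in the record vs. late in the record.
print(imf_energy_moments([[1.0, 0.0], [0.0, 1.0]]))  # about [0.333, 0.667]
```

Because of the time weighting $t_k$, the feature reflects not only how much energy an IMF carries but also where in the record that energy occurs.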

Generation of minority class samples
As Table 1 shows, some failure types have only a small amount of data, which hinders failure diagnosis. This paper therefore uses the adaptive synthetic sampling (ADASYN) method [14] to synthesize minority class samples. Its principle is to derive a probability distribution and use it as the criterion for deciding how many samples to synthesize for each minority class sample. For a minority class sample $x_i$, its K nearest neighbors in the n-dimensional feature space are found, and the ratio $r_i = \Delta_i / K$ is calculated, where $\Delta_i$ is the number of majority class samples among the K neighbors of $x_i$. The ratios are then normalized, $\hat{r}_i = r_i / \sum_i r_i$, so that $\hat{r}_i$ is a probability distribution with $\sum_i \hat{r}_i = 1$.
Finally, the failure data are supplemented by calculating the number of samples to synthesize for each minority class sample as $g_i = \hat{r}_i \times G$, where $g_i$ is the number of samples to be synthesized for $x_i$ and $G$ is the total number of synthesized samples.
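The ADASYN procedure above can be sketched as follows for one minority class against all other samples. This is a minimal pure-Python illustration, not the authors' code. Three elements are assumptions taken from the original ADASYN formulation rather than from this paper: the balance parameter `beta` in $G = (m_l - m_s)\beta$, the interpolation rule $s = x_i + \lambda (x_{zi} - x_i)$ with random $\lambda \in [0, 1)$, and the function name `adasyn`.

```python
import math
import random


def adasyn(minority, majority, k=5, beta=1.0, seed=0):
    """Sketch of ADASYN for one minority class versus the rest.

    minority, majority: lists of equal-length feature tuples.
    beta: desired balance level (1.0 = fully balanced).
    Returns a list of synthetic minority samples.
    """
    rng = random.Random(seed)
    # G: total number of synthetic samples to generate.
    G = round((len(majority) - len(minority)) * beta)
    if G <= 0 or len(minority) < 2:
        return []
    data = [(x, True) for x in minority] + [(x, False) for x in majority]

    # r_i = Delta_i / K, Delta_i = majority samples among the K nearest neighbours of x_i.
    ratios = []
    for i, x in enumerate(minority):
        others = [p for j, p in enumerate(data) if j != i]  # exclude x_i itself
        nearest = sorted(others, key=lambda p: math.dist(x, p[0]))[:k]
        delta = sum(1 for _, is_minority in nearest if not is_minority)
        ratios.append(delta / k)

    total = sum(ratios)
    # Normalize so that sum(r_hat) == 1; fall back to uniform if no x_i has majority neighbours.
    r_hat = [r / total for r in ratios] if total > 0 else [1 / len(minority)] * len(minority)
    # g_i = r_hat_i * G, rounded, so the total is only approximately G.
    g = [round(r * G) for r in r_hat]

    synthetic = []
    for i, x in enumerate(minority):
        # Interpolation partners: K nearest neighbours within the minority class.
        mins = [m for j, m in enumerate(minority) if j != i]
        neigh = sorted(mins, key=lambda m: math.dist(x, m))[:k]
        for _ in range(g[i]):
            z = rng.choice(neigh)
            lam = rng.random()
            # s = x_i + lambda * (x_zi - x_i): a point on the segment from x_i to x_zi.
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, z)))
    return synthetic
```

For the ten failure types of Table 1, such a function would be called once per minority class. For production use, the `imbalanced-learn` package provides a maintained `ADASYN` class.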

Classification algorithm
After denoising and feature extraction, the failure data are diagnosed using classification algorithms. This paper studies three commonly used classification algorithms.
(1) Back-propagation neural network (BPNN) algorithm [15]: the input layer takes the feature vector of the failure data and contains five nodes, and the output layer gives the failure type and contains 10 nodes. The number of hidden-layer nodes is determined by the empirical formula $l = \sqrt{m + n} + a$, $1 \le a \le 10$, where n, m and l are the numbers of nodes in the input, output and hidden layers, respectively. On this basis, the hidden layer was given 11 nodes. The Levenberg-Marquardt (L-M) algorithm is used for training, with the weight adjustment $\Delta w = (J^{T}J + \mu I)^{-1} J^{T} e$, where w is a weight, J is the Jacobian matrix, e is the error vector, I is the identity matrix, and μ is a scalar damping factor.
(2) Support vector machine (SVM) algorithm [13].
(3) C4.5 algorithm: the information gain ratio of every attribute is calculated, the attribute with the largest gain ratio is used as the root node, and the decision tree is generated by repeating this selection on each branch.
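The BPNN sizing formula and the L-M weight update in item (1) can be checked numerically with the sketch below. It is illustrative only. The candidate list rounds $\sqrt{m+n} + a$ for each integer a. `lm_step` is a hypothetical helper that assumes J is the Jacobian of the network outputs with respect to the weights and e = target − output. Under that sign convention, $\Delta w = (J^TJ + \mu I)^{-1}J^Te$ is added to w.

```python
import math


def hidden_layer_candidates(n, m):
    """Hidden-node counts l = sqrt(m + n) + a for a = 1..10, rounded to integers."""
    return [round(math.sqrt(m + n) + a) for a in range(1, 11)]


def solve(A, b):
    """Solve A x = b (small dense system) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def lm_step(J, e, mu):
    """Levenberg-Marquardt update dw = (J^T J + mu I)^-1 J^T e.

    J: rows = samples, columns = weights; e: target minus output.
    """
    p, s = len(J[0]), len(J)
    JtJ = [[sum(J[k][i] * J[k][j] for k in range(s)) + (mu if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Jte = [sum(J[k][i] * e[k] for k in range(s)) for i in range(p)]
    return solve(JtJ, Jte)


print(hidden_layer_candidates(5, 10))  # [5, 6, ..., 14]
# Toy linear model y = w0 + w1 * x with w = (0, 0), x = 0, 1, 2 and targets 1, 3, 5.
J = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
print(lm_step(J, [1.0, 3.0, 5.0], mu=0.0))  # [1.0, 2.0]: one Gauss-Newton step fits exactly
```

With n = 5 inputs and m = 10 outputs, the formula admits 5 to 14 hidden nodes, so the 11 nodes used here correspond, for example, to a = 7 with rounding. As μ → 0 the update approaches a Gauss-Newton step, and for large μ it approaches a short gradient-descent step of about $J^Te/\mu$.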
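The attribute-selection criterion of the C4.5 algorithm, $\mathrm{GainRatio}(A) = \mathrm{Gain}(A) / \mathrm{SplitInfo}(A)$, can be sketched as follows. The sketch assumes categorical attributes. C4.5 handles continuous features, such as those in Table 2, by testing binary threshold splits. That step is omitted here, as are pruning and missing-value handling, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Information gain ratio of one categorical attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Conditional entropy H(D|A) and split information SplitInfo(A).
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    gain = entropy(labels) - cond
    # An attribute with a single value cannot split the data.
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels):
    """Index of the attribute with the largest gain ratio (the split at this node)."""
    return max(range(len(rows[0])), key=lambda a: gain_ratio([r[a] for r in rows], labels))


rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["fault", "fault", "normal", "normal"]
print(best_attribute(rows, labels))  # 0: attribute 0 separates the classes perfectly
```

Applying `best_attribute` recursively to each subset produced by the chosen split yields the decision tree.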

EXPERIMENTAL ANALYSIS
First, minority class samples were generated by the ADASYN method, and the results are shown in Table 3. Seventy percent of the data set was used for training and 30% for testing, and 10-fold cross-validation was adopted. The algorithms were first tested on the original data, and their diagnostic performance is shown in Fig. 2. As Fig. 2 shows, on the original data the precision, recall rate and F1 score of all algorithms were relatively low: precision ranged from 0.81 to 0.91, recall rate from 0.75 to 0.85, and F1 score from 0.78 to 0.88. Overall, the BPNN algorithm performed best, followed by the SVM algorithm and then the C4.5 algorithm.
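The precision, recall rate and F1 score used throughout this section can be computed from a confusion matrix as sketched below. The paper does not state how per-class values are averaged. The sketch therefore assumes an unweighted (macro) mean over failure types, and the function names are hypothetical.

```python
def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if p + r else 0.0


def per_class_metrics(cm):
    """(precision, recall, F1) per class; cm[i][j] = count of true class i predicted as j."""
    n = len(cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # column sum
        actual = sum(cm[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        results.append((p, r, f1_score(p, r)))
    return results


def macro_average(results):
    """Unweighted mean of each metric over classes."""
    k = len(results)
    return tuple(sum(m[i] for m in results) / k for i in range(3))


# Toy two-class example (not data from the paper).
print(macro_average(per_class_metrics([[8, 2], [1, 9]])))
```

As a consistency check, `f1_score(0.91, 0.85)` is about 0.88 and `f1_score(0.81, 0.75)` about 0.78, matching the highest and lowest F1 scores reported for the original data.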
Next, the algorithms were applied to the data set supplemented with the synthesized samples; the precision of each algorithm is shown in Fig. 3.

Fig. 3. Comparison of precision between different algorithms
As Fig. 3 shows, with the synthesized samples the diagnostic precision of every algorithm was above 0.8 for all failure types; precision was highest for normal samples and lowest for on-board equipment failures. Overall, the BPNN algorithm outperformed the SVM algorithm, which in turn outperformed the C4.5 algorithm. The average precision of the BPNN, SVM and C4.5 algorithms was 0.94, 0.91 and 0.86, respectively, so the average precision of the BPNN algorithm was 0.03 higher than that of the SVM algorithm and 0.08 higher than that of the C4.5 algorithm.
The recall rates of different algorithms are shown in Fig. 4.

Fig. 4. Comparison of recall rates between different algorithms
As Fig. 4 shows, the recall rates of all algorithms were above 0.7 for every failure type. The BPNN algorithm had the highest recall rate, followed by the SVM algorithm, and the C4.5 algorithm had the lowest. The highest and lowest recall rates of the BPNN algorithm were 0.94 and 0.91, respectively. The average recall rates of the BPNN, SVM and C4.5 algorithms were 0.92, 0.86 and 0.79, respectively, so the average recall rate of the BPNN algorithm was 0.06 higher than that of the SVM algorithm and 0.13 higher than that of the C4.5 algorithm.
Finally, the F1 scores of the algorithms were compared, and the results are shown in Fig. 5. The lowest F1 score of the BPNN algorithm was 0.91; the F1 scores of the SVM algorithm were all below 0.92, and those of the C4.5 algorithm were all below 0.9, indicating that the BPNN algorithm had the best performance. The average F1 scores of the BPNN, SVM and C4.5 algorithms were 0.93, 0.88 and 0.82, respectively, so the average F1 score of the BPNN algorithm was 0.05 higher than that of the SVM algorithm and 0.11 higher than that of the C4.5 algorithm.
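The reported averages are mutually consistent, which can be checked by taking the harmonic mean of each algorithm's average precision and recall rate. This is only an approximate check under an assumption: a macro-averaged F1 score need not equal the harmonic mean of the macro-averaged precision and recall, so small discrepancies would not indicate an error.

```latex
\begin{aligned}
F_1 &= \frac{2PR}{P+R} \\
F_1^{\mathrm{BPNN}} &\approx \frac{2(0.94)(0.92)}{0.94+0.92} = \frac{1.7296}{1.86} \approx 0.930 \\
F_1^{\mathrm{SVM}} &\approx \frac{2(0.91)(0.86)}{0.91+0.86} = \frac{1.5652}{1.77} \approx 0.884 \\
F_1^{\mathrm{C4.5}} &\approx \frac{2(0.86)(0.79)}{0.86+0.79} = \frac{1.3588}{1.65} \approx 0.824
\end{aligned}
```

All three values round to the reported averages of 0.93, 0.88 and 0.82.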

DISCUSSION
Railway signal equipment must be kept in good working condition to maintain normal railway operation, so railways carry out strict scheduled and unscheduled maintenance to ensure traffic safety. However, such maintenance cannot predict equipment failures, and repeated maintenance and inspection consume considerable time and effort [10]. Real-time failure diagnosis is receiving increasing attention [7]. Using modern technology to provide information about the equipment and to track its status in a timely manner is important for improving maintenance and ensuring system reliability.
In this paper, failures of railway signal equipment were briefly analyzed, the failure data were denoised and their features extracted, and failure diagnosis experiments were conducted with three algorithms. On the original samples, the BPNN algorithm performed best, with a precision, recall rate and F1 score of 0.91, 0.85 and 0.88, respectively. After ADASYN synthesis, the samples were more balanced and failure diagnosis improved significantly. The BPNN algorithm performed best in precision, recall rate and F1 score, followed by the SVM and C4.5 algorithms, indicating that it was the most capable of identifying different types of failures. The average precision of the BPNN, SVM and C4.5 algorithms was 0.94, 0.91 and 0.86, respectively, their average recall rates were 0.92, 0.86 and 0.79, and their average F1 scores were 0.93, 0.88 and 0.82, which verifies the effectiveness of the BPNN algorithm in failure diagnosis.
Although this paper has obtained some results on the failure diagnosis of railway signal equipment, several shortcomings remain for future research, such as (1) analyzing a wider range of failure types and (2) comparing more failure diagnosis algorithms.

CONCLUSION
Based on the 2016–2020 data of a railway bureau, this study designed a process of data denoising, feature extraction, minority class sample synthesis and failure diagnosis, and analyzed it experimentally. The experiments showed that the diagnostic performance of all algorithms improved markedly after minority class samples were synthesized, and that among the BPNN, SVM and C4.5 algorithms, BPNN performed best in failure diagnosis. These results indicate that the approach designed in this study is reliable and effective for diagnosing failures of railway signal equipment. The BPNN algorithm can therefore be further promoted and applied in practice to improve the diagnosis of railway signal equipment failures and meet the demands of safe railway operation.