THE MINING OF HIGH RISK EQUIPMENT BASED ON THE ALGORITHM OF HR-TREE’S DECISION

Because the subsystems of the power grid are built differently, the information held by the various systems is not closely connected. Today's network is complex and changeable, and its degree of automation keeps rising. This article takes the high-risk equipment set of a substation in Liaoyang as the research background. It constructs an HR-Tree for the device set and establishes a high-risk equipment evaluation system based on the HR-Tree context. High-risk equipment sets are then calculated within the structure of the overall data set. By establishing the original data set and a prior knowledge system of equipment risk, the non-candidate high-risk equipment sets are removed from the local paths of the high-risk equipment set. We refer to this process of reducing data as minus branch (branch reduction). After the threshold is established, the branches are reduced and the highest-risk equipment set is obtained. Finally, a scoring system is used to find the probability of occurrence of associated devices, which makes such information more accessible. An example shows that the method can effectively express high-risk device sets and that managers can obtain early warning information from it. It helps people monitor the power system and also provides new ideas for monitoring projects.


INTRODUCTION
State Grid Corporation of China includes many systems for collecting various types of information. These subsystems in the dispatch control system have great limitations in terms of data-exchanging and data-sharing, where a unified connection has not yet been established for the multi-source data of each system. The situation cannot meet the needs for development of automated application systems and the integration of smart grid information [1].
In the process of multi-dimensional information fusion, this article establishes an overall framework. The system will reasonably allocate information to categories and provide scientific theoretical basis according to the needs of on-site dispatch.
In this article, an HR-Tree information network is established on the framework of fused information, which helps information analysis and makes the exploration of high-risk equipment on the fusion platform of power grid information more efficient. HR-Tree is one of the main methods for testing system safety, and its causality is clear. It can intuitively and comprehensively reflect the internal mechanism of faults. After the basic events are analyzed, the contribution rate of each basic event to faults can be obtained [2].
The research of industrial multi-dimensional data networks involves high-risk decision trees in many fields. HR-Tree generates high-risk equipment sets and reduces candidate branches to obtain the highest-risk equipment set. In addition, the method of this paper explores the association between high-risk equipment sets and then obtains the rules of failure occurrence, which are mainly reflected in the score of related equipment failures.
As an effective and basic method, system safety evaluation has been gradually applied to various engineering fields [3]. Traditional evaluation methods such as fuzzy comprehensive evaluation rely on a large amount of historical experience and expert opinion [4]. Literature [5] presents results on a methodology for high-power EM based risk assessment of large structures, considering the example of smart grid substations. The methodology developed in that paper evaluates the threat, vulnerability, impact, and protective measures as indicators in various scenarios of both conducted and radiated intentional EM interference (IEMI) threats to these systems. In literature [6], data-mining analysis was carried out using the C4.5 decision tree algorithm for the aforementioned three events using five different splitting criteria. Literature [7] introduces the APRICOIN algorithm, which combines frequent pattern mining and a fuzzy logic system to assess a container's risk score. The frequent pattern growth algorithm is proposed to retrieve the key criteria for evaluating container risk. Literature [8] presents a scientific literature information extraction architecture that uses text mining techniques to assess the human health risk of electromagnetic fields (EMFs) generated by wireless sensor devices in the Internet of Things. To extract high-quality patterns in real-life applications, literature [9] extends the occupancy measure to also assess the utility of patterns in transaction databases. Research on power equipment and fault assessment methods at home and abroad has grown vigorously in the past 20 years and has formed a scientific theoretical basis.
DIAGNOSTYKA, Vol. 21, No. 2 (2020). Wang S, Zhang Y, Li Y, Gao S, Yang F.: The mining of high risk equipment based on the algorithm of …
Literature [10][11] obtained fault diagnosis classification by studying massive monitoring data of transformers, and proposed a collaborative variable prediction model based on Spark computing framework. Literature [12][13] used deep learning models for transformer fault diagnosis, while abandoning the shortcomings of traditional DBN-based deep learning models, and then proposed adaptive improvements. Finally, an adaptive deep learning model transformer fault diagnosis method was obtained. Literature [14][15] introduced the calculation of the critical importance of each component under each fault and the order of troubleshooting by using the T−S fuzzy gate algorithm, and combined with the component fault self-diagnostic program. The hybrid reasoning based on fuzzy fault tree analysis in literature [16][17] uses a targeted artificial intelligence search algorithm, which will more accurately search for the fault location of the system and provide a solution for fault analysis from multi-source data sources.
Methods for exploring faults at home and abroad have gradually become intelligent in recent years. The rise and development of data mining analysis methods has opened up a new technical line for condition evaluation and fault diagnosis of power equipment. Data mining in industry places high demands on the parameter information of equipment condition [18]. In recent years, data mining algorithms have been used in a variety of industrial situations.
Although the innovation of digital technology has brought convenience to system monitoring and equipment management, the integration of digital information in the power system has not yet matured. This is mainly because the framework of information fusion has not yet been constructed, so the efficiency of information utilization does not yet meet the requirements. In a dispatching department of the State Grid Corporation of China, multi-professional knowledge can be presented on several platforms in a short period of time, but the manual release of commands is delayed. In summary, this paper reduces branches between various data sets, and the high-risk equipment set obtained after branch reduction has guiding significance for the dispatch center. The method of this paper calculates the relationship between various data sets and the estimated occurrence of device faults. Therefore, research on electric power data mining still has great development prospects.

EVALUATION METHOD OF CHARACTERIZATION FROM EQUIPMENT RISK
Facts have shown that equipment failure is often the direct cause of power outages and the key factor in the expansion of accidents, so equipment risk is one of the cores of grid risk assessment. As shown in Figure 1, equipment risk impact is mainly composed of equipment importance and equipment hidden dangers, which in turn are composed of equipment cost, voltage level, level of power supply area, related scale, alarm level, impact of failure, and maintenance frequency. The interaction between equipment importance and equipment hidden dangers determines the final risk value of the equipment. The seven components are described in detail below:
A, Equipment cost. From the economic point of view, the more expensive the equipment, the more important it is. Its level has a clear guiding effect on the evaluation of the equipment.
B, Voltage level. The voltage level is established from the magnitude of the impact on the environment and residents after the failure of high-voltage equipment, that is, from the magnitude of the loss caused by the resulting blackout.
C, Level of power supply area. The risk factor is proportional to the magnitude of the load. The higher the administrative level of the department in which the equipment is located, the higher the risk factor, so a high level of power supply area represents high risk.
D, Related scale. The scale of the equipment affected during an equipment failure is counted to determine the importance of the equipment.
E, Alarm level. The alarm characterization of the device is the weighted summation of each alarm level. Its formula is as follows:
$$W_R = \sum_{i=1}^{t} w_i K_i \qquad (1)$$
In the formula, $w_i$ is the alarm level, $K_i$ is the frequency of occurrence of that alarm level, and $t$ is the number of alarm levels. The larger the value of $W_R$, the greater the alarm level and the greater the degree of risk damage.
F, Impact of failure. The equipment's fault characterization is a weighted summation of the fault levels, and its formula is as follows:
$$\sum_{i=1}^{s} g_i T_i \qquad (2)$$
In the formula, $g_i$ is the fault level, $T_i$ is the frequency of occurrence of that fault level, and $s$ is the number of fault levels.
G, Maintenance frequency. Equipment maintenance is divided into planned maintenance and maintenance after failure. The number of maintenance operations objectively reflects the equipment's hidden dangers.
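The weighted summations described for the alarm level (item E) and the impact of failure (item F) can be illustrated with a minimal Python sketch. The function name `weighted_level_sum` and the sample weights and counts are hypothetical and chosen only for illustration; they are not taken from the substation data.

```python
def weighted_level_sum(level_weights, level_counts):
    """Weighted summation: sum over levels of (level weight * frequency of that level)."""
    if len(level_weights) != len(level_counts):
        raise ValueError("each level needs exactly one weight and one frequency")
    return sum(w * k for w, k in zip(level_weights, level_counts))


# Hypothetical example: three alarm levels weighted 1, 2 and 3,
# observed 4, 2 and 1 times respectively.
alarm_weights = [1, 2, 3]
alarm_counts = [4, 2, 1]
alarm_score = weighted_level_sum(alarm_weights, alarm_counts)
print("alarm characterization:", alarm_score)
```

The same helper applies to fault levels by passing fault weights and fault frequencies instead.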
These non-quantifiable indicators are shown in Tables 1 to 3. The process is based on expert evaluation and manual weighting. Equipment importance is evaluated with the same method as equipment hidden dangers. In practice, only two indicators, Equipment cost and Voltage level, are static, while the other indicators (Level of power supply area, Related scale, Alarm level, Impact of failure, and Maintenance frequency) change dynamically, which is also caused by the physical structure of the device itself. The Half-ladder model includes the Half-lift ladder model and the Half-step ladder model. To facilitate subsequent calculations, the quantized data is subjected to isotropic processing, that is, the quantized results all lie in the interval [0,1]. In the Half-lift ladder model, the smaller the value, the better the corresponding state and running status.
The Half-lift ladder scoring model is expressed by formula (3), and the Half-step ladder scoring model by formula (4). We quantify each index using formula (3) and formula (4). Such algorithms are accurate and fair, and they are widely used in engineering applications. In these formulas, a and b are thresholds, and m is a quantized parameter value. The thresholds and parameter values of each state quantity are set and revised according to the relevant regulations and maintenance experience.
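As an illustration of how a half-ladder score maps a parameter value into [0,1], the sketch below uses the standard rising and falling semi-trapezoidal functions. This is an assumption: the exact expressions of formulas (3) and (4) are not reproduced here, and the function names `half_lift` and `half_step` are hypothetical. Only the roles of the thresholds a, b and the parameter value m follow the text.

```python
def half_lift(m, a, b):
    """Rising semi-trapezoid (assumed form): 0 below a, 1 above b, linear in between."""
    if a >= b:
        raise ValueError("threshold a must be smaller than threshold b")
    if m <= a:
        return 0.0
    if m >= b:
        return 1.0
    return (m - a) / (b - a)


def half_step(m, a, b):
    """Falling semi-trapezoid (assumed form): the mirror image of half_lift."""
    return 1.0 - half_lift(m, a, b)


# Hypothetical thresholds a = 0 and b = 10 for some state quantity.
for value in (-1, 5, 12):
    print(value, half_lift(value, 0, 10), half_step(value, 0, 10))
```

Both functions return values in [0,1], which matches the isotropic processing described above.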

Assessment system in equipment risk impact
The above related equipment set is defined as: According to the establishment of each characterization of equipment risk in Section 2.1, the mathematical model for establishing the evaluation system is shown in formula (5):

Definition of equipment high risk set
The risk value of device

HR-Tree mining algorithm based on equipment risk impact data
The HR-Tree algorithm describes a tree structure for classifying specific parameters. The algorithm moves down recursively until it reaches the leaf node, and finally assigns the instance to the class of the leaf node [19].
The HR-Tree is constructed from the equipment risk impact obtained in Section 2. As Figure 2 shows, the HR-Tree starts from the integration of the original data and obtains the initial equipment risk set through the information screening of rule 1. It then eliminates the low-threshold equipment sets through rule 2 and the equipment risk threshold. This article defines the first rule, the second rule, and the pruning of the HR-Tree as follows:
Rule 1: If the risk value of a certain item of equipment does not meet the minimum risk threshold, the items that include this device are eliminated. After some branches of a device set have been eliminated, the branches need to be rebuilt.
Rule 2: If a certain device set is determined to be a high-risk set, every subset of that device set also meets the minimum risk threshold. This principle is also called downward closure.
Pruning process:
Step 1: Perform risk value evaluation on the initial data of the equipment to obtain the initial risk equipment set.
Step 2: Calculate the minimum risk threshold as A and construct the global equipment risk set.
Step 3: Eliminate the equipment sets that do not meet the requirements, and reduce the number of branches to form a new HR-Tree.
Step 4: Find the equipment set with the highest risk from the local equipment sets and determine the environmental safety index in the new situation.
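The pruning steps can be sketched in Python under one loudly stated assumption: the risk of a device is taken here as the sum of the total risk of every transaction that contains it, a transaction-weighted measure borrowed from high-utility itemset mining because it satisfies downward closure. The paper's own risk formula is not reproduced, so this is a stand-in, and the function name, device labels, and numbers are hypothetical.

```python
def prune_low_risk(transactions, min_risk):
    """Drop devices whose transaction-weighted risk is below min_risk.

    transactions: list of dicts mapping device name -> risk value in that transaction.
    Returns (device_risk, pruned_transactions).
    """
    # Step 1: total risk of each transaction (assumed: sum of its device risks).
    transaction_risk = [sum(t.values()) for t in transactions]

    # Step 2: global risk of each device = sum of the risks of transactions containing it.
    device_risk = {}
    for t, risk in zip(transactions, transaction_risk):
        for device in t:
            device_risk[device] = device_risk.get(device, 0) + risk

    # Step 3: eliminate devices below the threshold and rebuild the transactions.
    kept = {d for d, r in device_risk.items() if r >= min_risk}
    pruned = [{d: v for d, v in t.items() if d in kept} for t in transactions]
    pruned = [t for t in pruned if t]  # discard transactions that became empty
    return device_risk, pruned


# Hypothetical integer risk scores for three transactions.
transactions = [{"S1": 1, "S2": 5}, {"S2": 4, "S3": 6}, {"S3": 7}]
device_risk, pruned = prune_low_risk(transactions, min_risk=10)

# Step 4: the device with the highest global risk among those kept.
highest = max((d for t in pruned for d in t), key=device_risk.get)
print(device_risk, pruned, highest)
```

Because a device's transaction-weighted risk is an upper bound on the risk of any set containing it, removing low-risk devices first cannot discard a high-risk set, which is what rule 2 relies on.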

The analysis of Equipment risk impact
This paper selects nine items of equipment from a 220 kV substation in Liaoyang to evaluate the equipment risk impact. This section categorizes the importance-index information of the nine devices, as shown in Table 4. Table 6 collects various types of alarm information for the 220 kV substation in Liaoyang in 2019 and sorts the original data into an example of the original set of equipment failures. The total transaction impact of the equipment risk is obtained from the total value of the single transaction risk in Table 6 and is used for reducing branches. The S1 equipment is discarded, and all branches are arranged according to the total transaction risk value.

The construction and reduction of HR-Tree
The HR-Tree is constructed from the basic data in Section 3.1. In this section, branch A is established first. The process is as follows:
(1) Establish the root node of the tree and name it top.
(2) Insert the transaction set: the HR-Tree L1 path is constructed, as shown in Figure 3. The L2, L3, L4, L5, and L6 paths are then inserted alongside the L1 path to form a partially complete HR-Tree, and the branches with the same numbered devices are connected, as shown in Figure 4.
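The two construction steps resemble prefix-tree insertion with links between nodes that carry the same device. The following Python sketch shows that idea only; the class `Node`, the function `build_tree`, the device labels, and the risk values are hypothetical, and the actual node contents of the HR-Tree in Figures 3 and 4 are not reproduced.

```python
class Node:
    """One tree node: a device, its accumulated risk, and its child nodes."""

    def __init__(self, device, parent=None):
        self.device = device
        self.parent = parent
        self.children = {}
        self.risk = 0


def build_tree(paths):
    """Insert (device list, risk) paths under a root named 'top'.

    Returns the root and a dict linking each device to all nodes that hold it.
    """
    top = Node("top")
    links = {}
    for devices, risk in paths:
        node = top
        for device in devices:
            child = node.children.get(device)
            if child is None:
                # New branch: create the node and link it to same-named nodes.
                child = Node(device, parent=node)
                node.children[device] = child
                links.setdefault(device, []).append(child)
            child.risk += risk  # shared prefixes accumulate risk
            node = child
    return top, links


# Hypothetical paths: the second shares a prefix with the first, the third starts a new branch.
paths = [(["H1", "H2", "H3"], 5), (["H1", "H2"], 3), (["H2", "H3"], 4)]
top, links = build_tree(paths)
print(sorted(top.children), len(links["H2"]), top.children["H1"].risk)
```

The `links` dictionary plays the role of connecting branches with the same numbered devices, so all paths containing a given device can be visited during branch reduction.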
According to the total transaction risk value $R_3$, a non-candidate lower-risk path is obtained, in which the path {H2} has the smallest total transaction risk value. Its thinning process is as follows: classify the three paths containing {H2}, remove the non-candidate high-risk set $G_1$, and readjust the path. The process is shown in Table 7. The minimum global risk value of local devices is shown in Table 8. Reconstruct the local path risk value and describe the first path in 1266. The partial branch reduction is shown in Figure 5.
According to on-site dispatching, high-risk sets have a great impact on the environment, and this information plays an important role in the early warning of dispatchers and maintenance teams. An investigation found that the on-site 110 kV transformer and 10 kV capacitor had been in service for a long time and showed signs of aging, which is highly consistent with the early warning of high-risk equipment.
After that, this section looks for connections between transactions, explores the law of transaction occurrence, and provides a reference for the dispatchers on site. The simulation results are shown in Figure 6. The transaction set is divided into three levels: low level, intermediate level, and high level. The sets at each level are arranged according to the score of the occurrence of transactions. At each level, two sets of transactions obtain a high score, which shows that the probability of these high-scoring transactions is high. The 100% correlation transaction are: . The above symbol indicates that when the former item occurs, the occurrence probability of the latter item is 100%.
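A correlation of 100% between a former item and a latter item corresponds to what association-rule mining calls a rule confidence of 1. The Python sketch below computes that standard confidence as an illustration; it is not the paper's scoring system, and the function name and the transaction data are hypothetical.

```python
def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    if n_antecedent == 0:
        return None  # the rule cannot be evaluated without any supporting transaction
    n_both = sum(1 for t in transactions if both <= set(t))
    return n_both / n_antecedent


# Hypothetical failure transactions, each listing the devices damaged together.
failures = [
    {"Bus bar", "Transformer", "Capacitor"},
    {"Transformer", "Capacitor"},
    {"Capacitor"},
    {"Bus bar", "Transformer", "Capacitor"},
]
print(confidence(failures, ["Bus bar"], ["Transformer", "Capacitor"]))
print(confidence(failures, ["Capacitor"], ["Transformer"]))
```

In this made-up data every transaction with a damaged bus bar also contains the transformer and the capacitor, so the first rule has confidence 1, while the reverse direction is weaker.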
The above analysis indicates that Transformer and Capacitor will also be damaged after Bus bar is damaged, so the loss must be stopped in time by turning off the relevant switches. When Transformer is damaged, attention must also be paid to Capacitor. The correlation between device sets is also significant: when Bus bar and Transformer are damaged at the same time, Capacitor must be assumed to be damaged as well.
When such a chain of equipment damage occurs, knowledge of the chain reaction allows on-site early warning measures to be simulated, which is very meaningful for protecting the project. Early warning is based on the scores generated after the occurrence of each transaction set. Figure 6 shows the score of the occurrence of transactions. The selection of thresholds in this section varies according to the level of the transaction occurrence chain.
In the low-level environment, the threshold is selected as 0.35. In the intermediate-level environment, the threshold is selected as 0.24. In the high-level environment, the threshold is selected as 0.24. Each level of the transaction generation chain therefore has two transactions with the highest score, and early warning can be given for these transactions. The content of the early warning is the highest-scoring transaction set. This article hopes that such early warning can give field operation staff guidance for maintenance.
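The per-level threshold selection can be sketched as a simple filter. The thresholds 0.35, 0.24, and 0.24 come from the text; everything else is an assumption made for illustration, including the function name `early_warnings`, the transaction labels, the scores, and the choice to warn when a score is greater than or equal to the threshold.

```python
LEVEL_THRESHOLDS = {"low": 0.35, "intermediate": 0.24, "high": 0.24}


def early_warnings(scores_by_level, thresholds=LEVEL_THRESHOLDS):
    """Return, per level, the transaction sets whose score reaches the level threshold."""
    warnings = {}
    for level, scores in scores_by_level.items():
        limit = thresholds[level]
        flagged = [(name, score) for name, score in scores.items() if score >= limit]
        flagged.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
        warnings[level] = [name for name, _ in flagged]
    return warnings


# Hypothetical scores; the real values are those plotted in Figure 6.
scores_by_level = {
    "low": {"T1": 0.40, "T2": 0.35, "T3": 0.10},
    "high": {"T7": 0.30, "T8": 0.20},
}
print(early_warnings(scores_by_level))
```

Keeping the thresholds in one dictionary makes it easy to adjust them per level, as the text describes.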

The Influence of the number of branches on HR-Tree data mining
This article extracts equipment data from three accidents in the past five years and selects equipment data from three different units. Each number denotes a group of data about a transformer and its accessories, and the size of the control group ranges from few to many. The scale factor in the experimental method is set to 0.4, and Table 9 shows the data mining results of the high-risk equipment set during the three accidents. Table 9 also shows the relationship between the number of branches and the set of high-risk equipment. Abnormal devices denotes the number of damaged devices on site. As can be seen from Table 9, the number of branches reflects, in another sense, the number of high-risk equipment sets: as the number of branches increases, the number of high-risk equipment sets also increases. The increase in high-risk equipment sets serves as a warning to the site supervisor. Compared with the abnormal events on site, this is in line with reality.

CONCLUSION
As the degree of automation of power equipment increases, personnel also handle accidents more efficiently. The power network framework has gradually changed. Digitalization has brought a huge operational revolution to dispatchers and substation employees, but the information still needs to be standardized and summarized. Without analysis of multi-dimensional information sources, it will bring the result of digital