Abstract
Weather-related failures pose a significant challenge to the reliability of distribution systems. To enhance the risk management of weather-related failures, an interpretable extra-trees based weather-related risk prediction model is proposed in this study. In the proposed model, interpretability is introduced into extra-trees by analyzing and processing the paths of the decision trees that compose the ensemble. The interpretability of the proposed model is reflected in three respects: it can output the importance of weather variables, the contribution of weather variables, and the threshold of weather variables at high risk. The importance of weather variables helps in developing a long-term risk prevention plan. The contribution of weather variables provides targeted operation and maintenance advice for the next prediction period. The threshold of weather variables at high risk is critical for further preventing high risks. Compared with black-box machine learning risk prediction models, the proposed model overcomes their application limitations: in addition to generating predicted risk levels, it provides further guidance information for the risk management of weather-related failures.
WEATHER-RELATED failures pose a significant challenge to the reliability of distribution systems. Service interruptions often occur under unfavorable weather conditions [
Many studies have focused on improving the performance of weather-related risk prediction. Poisson regression models [
ML methods have always been a hot topic in risk prediction. However, ML methods currently in use do not provide thorough support for risk management because interpretability is sacrificed [
To solve these problems, developing an interpretable ML model for weather-related risk prediction is necessary. General ML models mine hidden rules from data, and, therefore, the source of knowledge is the data. Because the interpretable ML model is not exactly a black-box model, the model itself can provide valuable information [
This study introduces an interpretable ML model for weather-related risk prediction and illustrates how interpretability can help in developing risk management plans.
An interpretable weather-related risk prediction model based on extra-trees is proposed. The extra-trees algorithm is an ensemble of decision trees built with strong randomization, in which both the attribute and the cut-point are chosen randomly when splitting a tree node [
Regarding the long-term plan for a region, effectively using limited investments to strengthen and update the weaknesses of power systems is critical. The task is to determine which weather-induced risks should be prioritized. The proposed model can derive the importance of weather variables, which represents the overall degree of impact of each weather variable on weather-related risk, helping in the formulation of a long-term plan.
The short-term plan is intended to guide ex-ante resource preparation and formulate preventive measures by clarifying the source of risks in the next prediction period. The proposed model can meet this requirement because of its interpretability. The proposed model can derive the contributions of weather variables, which indicates each weather variable’s contribution to the predicted risk level in the next prediction period.
High risks require more attention when preparing risk management plans. The interpretability of the proposed model can derive the threshold of weather variables at high risk, which measures when weather variables promote the occurrence of high risks. Therefore, it can be used as a guide for developing a quantified high-risk prevention plan.
The main contribution of this study is obtaining and harnessing the valuable guidance information using the proposed model for weather-related risk management. This study represents a new perspective on weather-related risk management beyond merely pursuing prediction performance using previous black-box models, thus making the risk prediction model more practical. The obtained information can be a useful reference for making long-term, short-term, and high-risk prevention plans for weather-related risk management. In addition, the proposed model offers better prediction performance than other ML models that can provide the same degree of interpretability. Therefore, the proposed model is an excellent choice for weather-related risk management, whether from the interpretability that brings sufficient guiding information or prediction performance.
The remainder of this study is organized as follows. Related works are described in Section II. Section III introduces the development of interpretable extra-trees. The interpretable extra-tree based weather-related risk prediction model is described in Section IV. Section V presents the weather-related risk management with the help of the proposed interpretability. Section VI concludes this paper.
Interpretability is critical for weather-related risk prediction. Decision makers in high-risk fields are reluctant to base decisions on the prediction results of a model whose operating principle they cannot inspect. When the interpretability of the prediction model can be revealed, it facilitates the practical application of ML-based risk prediction because the model becomes more credible and can produce useful guiding information.
The research path of interpretable ML has two main directions: intrinsic and post-hoc interpretability. The models with intrinsic interpretability, which include many statistical models, have relatively simple structures and are already interpretable when they are designed. The relationship between the model variables and outputs can be easily explained for statistical models such as linear regressions. This is due to the availability of model parameters and their statistical significance. The coefficients of the linear regression model intuitively reflect the degree of influence of the variables on the predicted results. For black-box models such as extra-trees, this information is hidden inside the model structure. For intrinsic interpretable models, the model structure itself can explain why the model makes a certain prediction. When the model is too complex to interpret based on its structure, its interpretability can be tested using post-hoc methods. Compared with post-hoc interpretable models, intrinsic interpretable models are more intuitive and easier to understand for decision makers.
The relationship between prediction performance and model interpretability is illustrated in

Fig. 1 Relationship between prediction performance and model interpretability.
A. Extra-trees
Extra-tree algorithm [
Algorithm 1: decision tree algorithm (attribute list A)

1. Create a node N
2. If all samples are of the same class C, then label N with C; terminate
3. Select the attribute a ∈ A with the lowest Gini index; label N with a
4. For each value v of a:
1) Grow a branch from N with the condition a = v
2) Let Sv be the subset of samples in S with a = v
3) If Sv is empty, then attach a leaf labeled with the most common class in S
5. Repeat the above steps until leaf nodes are found
A criterion such as the Gini index [
Gini(S) = 1 - Σ_{i=1}^{m} p_i²  (1)

where m is the number of output labels; and p_i is the probability that a sample belongs to the ith class.
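As a minimal sketch, the Gini index in (1) can be computed from a list of class labels (the function name is ours, for illustration):

```python
from collections import Counter

def gini_index(labels):
    """Gini index of a sample set: 1 - sum_i p_i^2, where p_i is the
    fraction of samples belonging to class i (see (1))."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# a pure node has Gini 0; a 50/50 binary node has Gini 0.5
```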
Algorithm 2: extra-tree splitting algorithm (attribute list A)

1. Select K attributes {a_1, ..., a_K} in A
2. Let a_i,min and a_i,max denote the minimal and maximal values of a_i in A
3. Draw a random cut-point a_i^c uniformly in [a_i,min, a_i,max]
4. Return the split s_i = [a_i < a_i^c]
5. Draw K splits {s_1, ..., s_K}
6. Return a split s* such that Score(s*) = max_{i=1,...,K} Score(s_i)
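Under the assumption that Score is a user-supplied split-quality function (e.g., the Gini reduction of the candidate split), the splitting procedure above can be sketched as:

```python
import random

def extra_tree_split(samples, labels, attributes, K, score):
    """Algorithm 2 sketch: pick K random attributes, draw one uniform
    random cut-point per attribute, and keep the best-scoring split."""
    best = None
    for a in random.sample(attributes, K):
        values = [s[a] for s in samples]
        cut = random.uniform(min(values), max(values))  # random cut-point
        sc = score(samples, labels, a, cut)
        if best is None or sc > best[0]:
            best = (sc, a, cut)
    return best[1], best[2]  # chosen attribute and cut-point
```

This strong randomization is what distinguishes extra-trees from random forests, which search for the locally optimal cut-point instead of drawing it uniformly.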
B. Interpretable Extra-trees
1) Interpretable Decision Trees
Because extra-trees are difficult to interpret and fully understand, they often become black boxes. This derives from the fact that the extra-tree model consists of a large number of deep trees, each of which is split with strong randomness. However, when the fundamentals of the extra-tree model are thoroughly analyzed, the model can be better understood, and its interpretability can be expressed [
1) Path: for a sample input to a decision tree, the path refers to the combination of all inference rules that the sample passes through from the root node to the leaf node such as path 1, as illustrated in

Fig. 2 Schematic of an interpretable decision tree.
2) Value: each node of the decision tree has a value represented by vi, as shown in

Fig. 3 Three aspects of interpretability in interpretable extra-trees.
3) Contribution: the contribution value is derived from the value of the current node minus that of the previous node and represents the contribution of the split attribute to the prediction path.
The paths of decision trees can be used to obtain more information. Each path starts from the root of the tree and consists of a series of decisions, each guarded by a particular attribute. Each decision along a path either adds to or subtracts from the value given by the parent node. All decision paths contribute to the final prediction results of decision trees. Therefore, the prediction process can be defined as the sum of the attribute contributions and the value of the root node, i.e., the mean value given by the topmost region that covers the entire training set. The prediction function can be written as (2), where Vroot is the value at the root node and ck is the contribution from the kth attribute.

f(x) = Vroot + Σ_{k=1}^{K} ck  (2)

Note that the contribution of each attribute is not a single predetermined value. It depends on the rest of the attribute vector, which determines the decision path that traverses the tree and thereby the contributions picked up along the way.
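As an illustrative sketch with scikit-learn (function and variable names are ours; each node's `value` array is normalized to a class-probability distribution), the decomposition in (2) can be computed by walking a fitted tree's decision path:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def path_contributions(tree, x):
    """Decompose a single prediction into V_root plus per-attribute
    contributions along the decision path, as in (2)."""
    t = tree.tree_
    # per-node class distribution, normalized to probabilities
    values = t.value[:, 0, :]
    values = values / values.sum(axis=1, keepdims=True)
    node = 0
    v_root = values[0]
    contrib = {}
    while t.children_left[node] != -1:  # -1 marks a leaf node
        feat = t.feature[node]
        node_next = (t.children_left[node] if x[feat] <= t.threshold[node]
                     else t.children_right[node])
        # a split's contribution is the change in class distribution it causes
        contrib[feat] = contrib.get(feat, 0.0) + (values[node_next] - values[node])
        node = node_next
    return v_root, contrib  # v_root + sum of contributions == leaf distribution
```

Summing `v_root` and all per-attribute contributions reproduces the tree's predicted class distribution for the sample, which is exactly the decomposition used by the interpretable model.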
2) Pseudocode of Interpretable Extra-trees
As an ensemble of decision trees, the interpretable definition of extra-trees is based on the interpretable decision trees, the pseudocode for which is shown in
Algorithm 3: interpretable extra-trees

1. Obtain the value Vroot at the root node of each tree in the extra-trees
2. Calculate the contribution ck of each attribute along the decision path
3. The prediction function of a single decision tree can be written as: f(x) = Vroot + Σ_{k=1}^{K} ck
4. The prediction of the extra-trees is the average of the predictions of its M trees: F(x) = (1/M) Σ_{m=1}^{M} f_m(x)
3) Three Aspects of Interpretability
As
1) Importance of variables: the importance of variables measures the degree of impact of variables on the overall prediction, which can be obtained by calculating the Gini index in interpretable extra-trees. The importance ranking reflects the extent to which each variable determines the prediction. For a decision tree, it is necessary to find the attribute that can best distinguish the data set as the prioritized inference condition. The Gini index can be used to select the best attribute when dividing nodes. The smaller the Gini index of an attribute, the better its ability to divide nodes. The definition of the Gini index is given in (1). We first select a variable and then calculate the sum of the Gini index of all nodes split by this variable in each decision tree derived from the extra-trees. The variables can be ranked in the order of importance based on the sum of the Gini index of variables. The importance of variables is calculated during the training phase. Its source is the training data, i.e., the historical data, so it is a static index.
2) Contribution of variables: the importance of variables is used to evaluate those variables which are crucial for the overall prediction model, whereas the contribution of variables can provide more information when we are interested in a particular variable. The contribution of variables is a dynamic index of interpretability. It is used to evaluate the contribution of variables to a certain prediction. Therefore, the contribution of variables can be determined during each prediction period. The contribution value of variables on the prediction can be positive or negative. A positive or negative value indicates that the variable has a facilitating or hindering effect on the prediction, respectively. For example, as shown in
3) Threshold of variables: managers may sometimes pay greater attention to specific classes of predicted outputs and their interpretability. For example, in the prediction of weather-related failure risks, risk managers are more vigilant against high risks. With respect to the prediction of the specific class, the threshold of variables can measure when variables will promote this prediction. For each variable, its contribution to the prediction of a specific class is related to its own value. In general, the larger the value of a variable, the larger its contribution to the high risk. Therefore, if we try to analyze the relationship between the contribution values of variables on the predicted output and their own values, the threshold of variables under different classes of prediction can be obtained, providing quantifiable guidance information.
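The threshold extraction can be sketched as locating the value at which a variable's contribution to the high-risk class changes sign (a minimal linear-interpolation version; in practice the contribution pattern may first be smoothed):

```python
def contribution_threshold(values, contributions):
    """Estimate the variable value at which its contribution to the
    high-risk prediction turns from negative to positive."""
    pairs = sorted(zip(values, contributions))
    for (v0, c0), (v1, c1) in zip(pairs, pairs[1:]):
        if c0 <= 0 < c1:
            # linear interpolation of the zero crossing between the two points
            return v0 + (v1 - v0) * (-c0) / (c1 - c0)
    return None  # contribution never turns positive in the observed range
```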
An interpretable extra-tree based weather-related risk prediction model is developed in this study using an actual data set.
A. Weather-related Failure Data Source and Analysis
The data used in this study were collected from a city in eastern China. The utility company recorded weather-related failures from January 2011 to October 2018 that included failure information by date, time, location, and type, and a simple weather description. To quantify the effects of the weather, we obtained quantified weather parameters from the meteorological bureau in the studied city.

Fig. 4 Distribution of weekly weather-related failure counts.
Weather-related risk increases with the number of failures. The greater the number of failures in a given period, the higher the requirement for the coordination and preparation of manpower and material resources for power recovery. High weather-related failure counts indicate high risk, which imposes higher requirements on the risk management capabilities of utility companies. Therefore, accurately predicting the occurrence of high risk is critical. However, as

Fig. 5 Importance of weather variables in studied city.
Type | Proportion (%)
---|---
Overhead bare line | 9.10
Overhead insulated line | 35.40
Cable line | 55.50
B. Weather-related Risk-level Classification
In order to reasonably characterize the risk caused by weather-related failure counts, the failure counts are classified into three risk levels. The classification details are presented in
Failure level | Number of weather-related failures
---|---
0 | 0
1 | 1, 2, 3
2 | ≥ 4
The failure counts from 1 to 3 are classified as one class due to the high occurrence frequency, which can be thought of as the common risk level. However, when the failure counts are larger than 3, the failure occurrence frequency is reduced to 4.19%, which is beyond the 95% confidence level [
It is reasonable for utility companies to classify the risk level into common and rare because a well-designed power grid should perform well under both conditions. In addition, the classification of failure counts helps utility companies in conducting risk management because different operation and maintenance plans correspond to different risk levels.
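The classification rule above can be expressed directly in code (a trivial sketch; the function name is ours):

```python
def risk_level(failure_count):
    """Map a weekly weather-related failure count to a risk level:
    0 = no failures, 1 = common risk (1-3 failures), 2 = rare/high risk (>3)."""
    if failure_count == 0:
        return 0
    if failure_count <= 3:
        return 1
    return 2
```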
C. Weather Variables and Prediction Period
According to the investigation of historical failure data, weather-related failures are mainly caused by wind, rain, and thunder, for a variety of specific reasons. For example, a strong wind can blow down trees, overhead lines, and equipment in a distribution system. Mild winds can also blow small objects such as plastic bags and branches into the air, resulting in contact with lines. Many failures occur in humid environments. Heavy rain is often accompanied by strong winds, and when thunder occurs, the probability of failure increases. In general, the several weather stations in a city produce different weather parameters, and the degree of difference depends on the geography of the city under study. Owing to the small area and uniform geographical environment of the studied city, the differences between its weather stations are typically small. In this study, a weather station located in the center of the city is used to obtain weather parameters. We chose the following six weather variables as attributes of the proposed model.
1) Feature 1: weekly average wind speed.
2) Feature 2: weekly maximum wind speed.
3) Feature 3: weekly average rainfall.
4) Feature 4: weekly maximum rainfall.
5) Feature 5: thunder days within a week.
6) Feature 6: weekly average humidity.
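The six weekly features can be derived from daily weather records, for instance with pandas (column names and values are illustrative, not the actual data set):

```python
import pandas as pd

# daily weather records for one week; names and values are illustrative
daily = pd.DataFrame({
    "wind_speed": [2.0, 3.5, 1.0, 4.0, 2.5, 3.0, 2.0],   # m/s
    "rainfall":   [0.0, 5.0, 0.0, 12.0, 1.0, 0.0, 0.0],  # mm
    "thunder":    [0, 1, 0, 1, 0, 0, 0],                 # 1 if thunder that day
    "humidity":   [60, 70, 55, 80, 65, 62, 58],          # %rh
}, index=pd.date_range("2017-01-02", periods=7, freq="D"))

# aggregate daily records into the six weekly features
weekly = pd.DataFrame({
    "f1_avg_wind":     daily["wind_speed"].resample("W").mean(),
    "f2_max_wind":     daily["wind_speed"].resample("W").max(),
    "f3_avg_rain":     daily["rainfall"].resample("W").mean(),
    "f4_max_rain":     daily["rainfall"].resample("W").max(),
    "f5_thunder_days": daily["thunder"].resample("W").sum(),
    "f6_avg_humidity": daily["humidity"].resample("W").mean(),
})
```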
The prediction period should be reasonably determined. Daily and weekly predictions were used in previous studies. In [
D. Prediction Performance
1) Evaluation Metrics
Due to the unbalanced nature of the data in terms of risk levels, evaluating the prediction performance based on accuracy is not reasonable, where accuracy is defined as the proportion of all correctly predicted samples to the total samples in the test set. We introduced the F1 score to evaluate the performance of risk prediction, which is suitable for evaluating ML methods under unbalanced sample sets [
Underestimating weather-related risk may cause utility companies to neglect its prevention, leading to the inability to cope with the risk. Overestimating the risk results in an increase in risk prevention costs, including waste of workforce as well as material and financial resources. To better reflect the model’s prediction performance and the prediction propensity for risk, we define two evaluation metrics in this study: risk underestimation rate (RUR) and risk overestimation rate (ROR). The definitions of RUR and ROR are given in (3) and (4), respectively. In the test set, the numbers of samples with underestimated and overestimated risk are denoted as u and o, respectively, and the total number of samples is denoted as t.
RUR = u / t × 100%  (3)

ROR = o / t × 100%  (4)
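Metrics (3) and (4) amount to counting, over the test set, the predictions that fall below and above the true risk level (a direct sketch):

```python
def rur_ror(y_true, y_pred):
    """Risk underestimation rate (RUR) and risk overestimation rate (ROR)
    in percent, as defined in (3) and (4)."""
    t = len(y_true)
    u = sum(p < a for a, p in zip(y_true, y_pred))  # underestimated samples
    o = sum(p > a for a, p in zip(y_true, y_pred))  # overestimated samples
    return 100.0 * u / t, 100.0 * o / t
```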
2) Experiments and Results
In our experiments, the weather-related risk data were divided into a training data set (from 2011 to 2016) and a test data set (from January 2017 to October 2018). The numbers of training and test samples were 312 and 94, respectively. Previously, we introduced the interpretability of the proposed model, which is based on interpreting the decision tree through an analysis of its decision paths. Therefore, decision tree (DT) and random forest (RF) [
Model | Maximum depth of trees | Number of trees | F1 score | RUR (%) | ROR (%)
---|---|---|---|---|---
Extra-trees | 20 | 50 | 0.939 | 3.191 | 3.191
RF | 20 | 200 | 0.918 | 4.255 | 4.255
DT | 3 | 1 | 0.877 | 4.255 | 7.446
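With scikit-learn, the three models can be configured with the hyperparameters in the table. The data below are a synthetic stand-in for the 312 weekly training samples (so the scores are not those reported here), and weighted-average F1 is one plausible choice for the unbalanced classes:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
# synthetic stand-in: 312 weekly samples, 6 weather features, 3 risk levels
X = rng.random((312, 6))
y = (X[:, 1] + X[:, 3] > 1.2).astype(int) + (X[:, 1] + X[:, 3] > 1.7)

models = {
    "Extra-trees": ExtraTreesClassifier(n_estimators=50, max_depth=20, random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, max_depth=20, random_state=0),
    "DT": DecisionTreeClassifier(max_depth=3, random_state=0),
}
scores = {name: f1_score(y, m.fit(X, y).predict(X), average="weighted")
          for name, m in models.items()}
```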
In
Approaches to tasks with unbalanced data can be divided into two categories [
The proposed model not only provides rich interpretability but also exhibits the best risk prediction performance among the models that have the same degree of interpretability. Thus, the proposed model is an excellent choice for utility companies in managing weather-related risk.
3) Experiments on Robustness
The weather data we used are monitoring data from weather stations, as in many other studies [
Error (%) | Extra-trees | RF | DT
---|---|---|---
0 | 0.939 | 0.918 | 0.877
+1 | 0.929 | 0.888 | 0.877
-1 | 0.907 | 0.897 | 0.877
+2 | 0.927 | 0.897 | 0.877
-2 | 0.890 | 0.875 | 0.877
+3 | 0.930 | 0.884 | 0.802
-3 | 0.876 | 0.863 | 0.837
+4 | 0.930 | 0.900 | 0.809
-4 | 0.859 | 0.851 | 0.845
+5 | 0.930 | 0.910 | 0.802
-5 | 0.868 | 0.875 | 0.861
+6 | 0.942 | 0.901 | 0.879
-6 | 0.900 | 0.873 | 0.869
+7 | 0.950 | 0.901 | 0.884
-7 | 0.855 | 0.873 | 0.869
+8 | 0.950 | 0.927 | 0.884
-8 | 0.864 | 0.848 | 0.857
+9 | 0.950 | 0.900 | 0.876
-9 | 0.873 | 0.848 | 0.857
+10 | 0.930 | 0.900 | 0.876
-10 | 0.861 | 0.840 | 0.845
Mean | 0.908 | 0.884 | 0.859
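The robustness protocol can be sketched as applying a fixed percentage error to all weather features before prediction. We assume a systematic scaling here; the exact perturbation scheme used in the experiment may differ:

```python
import numpy as np

def perturb(X, error_pct):
    """Scale every weather feature by (1 + error/100) to simulate a
    systematic measurement error of error_pct percent."""
    return np.asarray(X, dtype=float) * (1.0 + error_pct / 100.0)

# usage with a fitted model and a metric:
# for e in range(-10, 11):
#     score = f1_score(y_test, model.predict(perturb(X_test, e)), average="weighted")
```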
Compared with the previous ML risk prediction models that output only risk levels, the proposed model can further reveal the relationship between weather variables, which considerably helps in developing risk management plans. In this section, we describe in detail how the proposed interpretability helps weather-related risk management.
Interpretability includes the “importance of weather variables”, “contribution of weather variables”, and “threshold of weather variables”. As

Fig. 6 Different guidance functions of three aspects of interpretability.
A. Interpretation 1: Importance of Weather Variables
The importance of weather variables reflects the influence of each weather variable on the severity of risk, which helps inform the development of a long-term plan for utility companies. The calculated importance of the weather variables in the studied city is shown in
B. Interpretation 2: Contribution of Weather Variables
For the development of short-term plans, the contribution of weather variables, which is produced dynamically during each prediction, is useful. The interpretation can provide dynamic guidance information by creating a targeted operation and maintenance plan for the next prediction period.
As previously stated, a positive contribution value indicates that the weather variable has a facilitating effect on the predicted risk, whereas a negative value indicates a hindering effect. When attempting to prevent weather-related failures, more attention should be given to the weather variables that contribute to the emergence of risk. Thus, decision makers can develop more specific prevention strategies based on the different contribution values of the variables. For example, in

Fig. 7 Examples of application analysis of contribution values of weather variables. (a) Sample occurred from 2018-06-04 to 2018-10-06. (b) Sample occurred from 2017-10-23 to 2017-10-26.
It is worth mentioning that even when the model misjudges the risk as risk-free, the contribution of weather variables can still help in risk prevention. Errors inevitably occur in prediction owing to the randomness of weather-related failures. In this case, a negative contribution value of a weather variable indicates that the variable hinders the risk-free prediction result. Therefore, when the risk is underestimated as risk-free, it is instructive to focus ex-ante operation and maintenance decisions on the weather variables with large negative contribution values, as they are the most likely causes of the risk. This means that even when the prediction model indicates risk-free, utility companies can further suppress the risk by taking reasonable prevention measures with respect to the weather variables with large negative contribution values. As an example, in
Statistically, when the largest negative contribution value corresponds to the true cause of failure, and when we assume that taking measures in advance can effectively avoid failure, the RUR can be further reduced from 3.191% to 2.128%. Therefore, this interpretability provides a targeted prevention direction for the situations underestimated as risk-free, further improving the risk-management capabilities of distribution systems.
For the validity of the interpretability, it is crucial to realize that the results are not artifacts of one particular realization of an extra-tree model but that they convey actual information held by the data [

Fig. 8 Boxplot of contribution values of weather features for an instance.
C. Interpretation 3: Threshold of Weather Variables
In risk management, more attention should be given to high risk (level 2) because its lower frequency and insufficient learning samples make the prediction difficult. Simultaneously, it has a more severe effect on distribution systems.
When the contribution value of weather variables to high risk is positive, the occurrence probability of high risk increases. When the contribution value changes from negative to positive, the transition point can be used as a warning threshold. The contribution is related to the values of weather variables. Therefore, we can analyze the patterns of contributions of weather variables at high risk and then obtain the threshold of weather variables that trigger high risk, thereby providing quantitative reference information for high-risk management.
When the value of a single weather variable exceeds the captured threshold, the weather variable begins to make a positive contribution to the occurrence of high risk. A high risk occurs only if the positive contributions accumulate to a certain level, so a single variable exceeding its threshold is insufficient to warn of high risk. However, when every weather variable exceeds its captured threshold simultaneously, this can be used as a quantitative early-warning signal of high risk. When all weather variables begin to contribute positively to high risk, multiple weather factors jointly facilitate its occurrence, indicating a complex weather situation in which the probability of high risk is greatly increased. The theoretical reasons are as follows.
After we endow the extra-tree prediction model with intrinsic interpretability, the process of one prediction can be expressed by (2). Therefore, the proposed model can be expressed by the following equation in our application.
prediction = Vroot + c1 + c2 + c3 + c4 + c5 + c6  (5)
where c1-c6 represent the contribution values of weather features 1-6 to a certain predicted risk level.
Considering that the values of features 1-6 are all greater than or equal to 0 and that the trained value of Vroot is also greater than 0, if the contribution value of each weather variable to high risk is also greater than 0, multiple weather factors simultaneously contribute positively to the occurrence of high risk, and its occurrence probability is high.

Fig. 9 Relationship between contribution value of weather variables to high risk and values of weather variable. (a) Feature 1. (b) Feature 2. (c) Feature 3. (d) Feature 4. (e) Feature 5. (f) Feature 6.
Feature | Threshold value | Precision_single
---|---|---
1 | 2.3 m/s | 0.737
2 | 5.1 m/s | 0.842
3 | 5.2 mm | 0.842
4 | 8 mm | 0.895
5 | 1 day | 0.895
6 | 64 %rh | 0.642
We test the effectiveness of the proposed thresholds on the high-risk data set using the precision metric, which measures the proportion of samples meeting the threshold criteria that are indeed high-risk samples, i.e., the probability of high-risk occurrence when the threshold is exceeded. The precision metrics calculated using the threshold of a single weather variable and the thresholds of all weather variables as criteria of high risk are named precision_single and precision_all, respectively. We find that a single weather variable meeting its threshold does not necessarily imply high risk. However, when the thresholds of all weather variables are exceeded at the same time, the probability of high-risk occurrence (precision_all) is 100%. Therefore, the proposed thresholds can serve as a quantifiable early-warning signal for high risk, guiding utility companies in making high-risk prevention arrangements.
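An early-warning check based on the thresholds in the table can be sketched as follows (feature names are ours, and treating the threshold as inclusive is an assumption):

```python
# high-risk thresholds for the studied city, taken from the table above
THRESHOLDS = {
    "f1_avg_wind": 2.3,       # m/s
    "f2_max_wind": 5.1,       # m/s
    "f3_avg_rain": 5.2,       # mm
    "f4_max_rain": 8.0,       # mm
    "f5_thunder_days": 1,     # days
    "f6_avg_humidity": 64.0,  # %rh
}

def high_risk_warning(week):
    """Signal high risk only when every weather variable reaches its
    threshold simultaneously; single exceedances are not sufficient."""
    return all(week[name] >= limit for name, limit in THRESHOLDS.items())
```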
Predicting weather-related failure risk can provide useful guidance information for utility companies to develop ex-ante risk prevention plans. The interpretable extra-tree based weather-related risk prediction model is proposed, whose interpretability has three aspects that provide effective advice for risk prevention. Specifically, the importance of weather variables helps in making long-term operation and maintenance plans. The contribution of weather variables supports the development of specific risk prevention plans prior to the next prediction period. The threshold of weather variables at high risk yields a quantitative high-risk prevention plan. The proposed model overcomes the limitations of black-box ML models, making the risk prediction model more practical and further improving the weather-related risk management capabilities of utility companies. In comparison with ML models that can provide the same degree of interpretability, the proposed model has the best weather-related risk prediction performance. In addition, the proposed model can provide a way to guide decisions on other prediction issues in power systems.
References
D. H. Vu, K. M. Muttaqi, A. P. Agalgaonkar et al., “Recurring multi-layer moving window approach to forecast day-ahead and week-ahead load demand considering weather conditions,” Journal of Modern Power Systems and Clean Energy, vol. 10, no. 6, pp. 1552-1562, Nov. 2022. [Baidu Scholar]
H. Li, L. A. Treinish, and J. R. M. Hosking, “A statistical model for risk management of electric outage forecasts,” IBM Journal of Research and Development, vol. 54, no. 3, pp. 1-11, May 2010. [Baidu Scholar]
X. Wei, J. Zhao, T. Huang et al., “A novel cascading faults graph based transmission network vulnerability assessment method,” IEEE Transactions on Power Systems, vol. 33, no. 3, pp. 2995-3000, May 2018. [Baidu Scholar]
J. He, D. W. Wanik, B. M. Hartman et al., “Nonparametric tree-based predictive modeling of storm outages on an electric distribution network,” Risk Analysis, vol. 37, no. 3, pp. 441-458, Mar. 2017. [Baidu Scholar]
H. Liu, R. A. Davidson, D. V. Rosowsky et al., “Negative binomial regression of electric power outages in hurricanes,” Journal of Infrastructure Systems, vol. 11, no. 4, pp. 258-267, Dec. 2005. [Baidu Scholar]
S. R. Han, S. D. Guikema, S. M. Quiring et al., “Estimating the spatial distribution of power outages during hurricanes in the Gulf coast region,” Reliability Engineering & System Safety, vol. 94, no. 2, pp. 199-210, Feb. 2009. [Baidu Scholar]
H. Liu, R. A. Davidson, and T. V. Apanasovich, “Spatial generalized linear mixed models of electric power outages due to hurricanes and ICE storms,” Reliability Engineering & System Safety, vol. 93, no. 6, pp. 897-912, Mar. 2007. [Baidu Scholar]
P. Kankanala, A. Pahwa, and S. Das, “Regression models for outages due to wind and lightning on overhead distribution feeders,” in Proceedings of 2011 IEEE PES General Meeting, Detroit, USA, Jul. 2011, pp. 1-4. [Baidu Scholar]
P. Kankanala, A. Pahwa, and S. Das, “Exponential regression models for wind and lightning caused outages on overhead distribution feeders,” in Proceedings of 2011 North American Power Symposium, Boston, USA, Aug. 2011, pp. 1-4. [Baidu Scholar]
P. Kankanala, S. Das, and A. Pahwa, “AdaBoost: an ensemble learning approach for estimating weather-related outages in distribution systems,” IEEE Transactions on Power Systems, vol. 29, no. 1, pp. 359-367, Jan. 2014. [Baidu Scholar]
P. Kankanala, A. Pahwa, and S. Das, “Estimation of overhead distribution system outages caused by wind and lightning using an artificial neural network,” in Proceedings of International Conference on Power System Operation & Planning (ICPSOP), Juja, Kenya, Jan. 2012, pp. 1-6. [Baidu Scholar]
Y. Du, Y. Liu, X. Wang et al., “Predicting weather-related failure risk in distribution systems using Bayesian neural network,” IEEE Transactions on Smart Grid, vol. 12, no. 1, pp. 350-360, Aug. 2020. [Baidu Scholar]
I. Bratko, “Machine learning: between accuracy and interpretability,” Learning, Networks and Statistics, vol. 382, pp. 163-177, Jan. 1997. [Baidu Scholar]
B. Kim and R. Khanna, “Examples are not enough, learn to criticize criticism for interpretability,” in Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, Dec. 2016, pp. 2288-2296. [Baidu Scholar]
D. V. Carvalho, E. M. Pereira, and J. S. Cardoso, “Machine learning interpretability: a survey on methods and metrics,” Electronics, vol. 8,no. 8, pp. 832-838, Jul. 2019. [Baidu Scholar]
C. Molnar. (Aug. 2019). Interpretable machine learning: a guide for making black box models explainable. [Online]. Available: https://christophm.github.io/interpretable-ml-book [Baidu Scholar]
F. Doshi-Velez and B. Kim. (2017, Mar.). Towards a rigorous science of interpretable machine learning. [Online]. Available: https://arxiv.org/abs/1702.08608 [Baidu Scholar]
P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees,” Machine Learning, vol. 63, no. 1, pp. 3-42, Mar. 2006. [Baidu Scholar]
C. Desir, C. Petitjean, L. Heutte et al., “Classification of endomicroscopic images of the lung based on random subwindows and extra-trees,” IEEE Transactions on Biomedical Engineering, vol. 59, no. 9, pp. 2677-2683, Sept. 2012. [Baidu Scholar]
A. Zhang. (2021, Dec.). Explainable artificial intelligence. [Online]. Available: http://statsoft.org/ [Baidu Scholar]
R. J. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, no. 1, pp. 81-106, Mar. 1986. [Baidu Scholar]
R. I. Lerman and S. Yitzhaki, “A note on the calculation and interpretation of the Gini index,” Economics Letters, vol. 15, no. 3, pp. 363-368, Feb. 1984. [Baidu Scholar]
G. Tam. (Sept. 2017). Interpreting decision trees and random forests. [Online]. Available: https://engineering.pivotal.io/post/interpreting-decision-trees-and-random-forests/ [Baidu Scholar]
A. Palczewska, J. Palczewski, R. M. Robinson et al., “Interpreting random forest models using a feature contribution method,” in Proceedings of 2013 IEEE 14th International Conference on Information Reuse & Integration (IRI), San Francisco, USA, Aug. 2013, pp. 112-119. [Baidu Scholar]
Y. Du, Y. Liu, Q. Shao et al., “Single line-to-ground faulted line detection of distribution systems with resonant grounding based on feature fusion framework,” IEEE Transactions on Power Delivery, vol. 34, no. 4, pp. 1766-1775, Aug. 2019. [Baidu Scholar]
Y. Sun, A. K. Wong, and S. K. Mohamed, “Classification of imbalanced data: a review,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 23, no. 4, pp. 687-719, Oct. 2009. [Baidu Scholar]
B. Ci, “Confidence intervals,” Lancet, vol. 1, no. 8531, pp. 494-497, Jan. 1987. [Baidu Scholar]
G. Wang, T. Xu, T. Tang et al., “A Bayesian network model for prediction of weather-related failures in railway turnout systems,” Expert Systems with Applications, vol. 69, pp. 247-256, Oct. 2016. [Baidu Scholar]
L. Breiman, “Random forests,” Machine Learning, vol. 45, pp. 5-32, Oct. 2001. [Baidu Scholar]
C. Bouveyron, B. Hammer, and T. Villmann, “Recent developments in clustering algorithms,” in Proceedings of the 20th European Symposium on Artificial Neural Networks, Bruges, Belgium, Apr. 2012, pp. 447-458. [Baidu Scholar]
H. Kaur, H. S. Pannu, and A. K. Malhi, “A systematic review on imbalanced data challenges in machine learning,” ACM Computing Surveys (CSUR), vol. 52, no. 4, pp. 1-36, Jul. 2020. [Baidu Scholar]
W. S. Cleveland, “Robust locally weighted regression and smoothing scatterplots,” Journal of the American Statistical Association, vol. 74, no. 368, pp. 829-836, Apr. 1979. [Baidu Scholar]