Abstract
Hidden failures are widespread in power systems and can give rise to cascading failures. Identifying them is challenging because their occurrence probabilities are very low. This paper proposes a state-failure-network (SF-network) method to overcome this difficulty. The SF-network is formed by searching failures and states under the guidance of risk estimation indices, so that only the failures and states contributing to the blackout risk are searched and duplicated searches are avoided. Therefore, sufficient hidden failures can be obtained with acceptable computation. Based on the state and failure value calculations in the SF-network, hidden failure critical component indices are obtained to quantify the criticality of the lines. The proposed SF-network method is superior to common sampling based methods in risk estimation accuracy. In addition, the state and failure value calculations in the SF-network, which are used to re-estimate the risk after measures against hidden failures are deployed, take less time than other risk re-estimation methods. The IEEE 14-bus and 118-bus systems are used to validate the method.
Cascading failures in power systems can lead to large blackouts and cause severe losses to society [
Some techniques have been proposed to detect the potential hidden failures before they occur. With the application of wide area measurement systems (WAMSs) and phasor measurement units (PMUs), the data analyses enable the operators to adopt on-line detection methods [
In order to minimize the potential costs and risks caused by hidden failures, the maintenance strategies are optimized to bolster the reliability of the protection systems [
As one of the factors that deteriorate the system during the cascading failures, the effects of hidden failures on system risks and reliability are studied in [
In this paper, a state-failure-network (SF-network) method is proposed to identify the critical hidden failures and critical hidden failure lines. The SF-network method in this paper is improved based on the original proposed SF-network in [
The rest of the paper is organized as follows. Section II illustrates the model of hidden failure and the cascading failure simulation considering hidden failures. Then the SF-network method is introduced in Section III. Finally, two cases are used to validate the proposed method in Section IV.
The undetected defects of the relays can give rise to the hidden failures. If any adjacent line of line (connected to the same bus as line ) fails, the defective relay of line will be exposed and line may be falsely tripped [
During a cascading event, line exposure can occur multiple times to a single line. However, the hidden failures of the lines are more likely to occur on the first exposure than on the subsequent exposures [
(1) |
The cascading failures considering hidden failures are simulated by a DC power flow based simulator modified from the simulator in [

Fig. 1 Flow chart of cascading failure simulation considering hidden failures.
Step 1: input the initial operation point of the power system.
Step 2: set initial contingencies to trigger the cascading failures.
Step 3: detect islands. If new islands are detected, re-balance the generation and loads in each island. Otherwise, re-dispatch the islands based on the dispatch method in [
Step 4: expose the adjacent lines of the failed lines.
Step 5: check the exposed lines according to (1). If no line is exposed, go to Step 6. For the exposed lines whose , select a line to trip as the hidden failure based on the Roulette-Wheel algorithm [
(2) |
(3) |
where is the number of lines in the system; and is the simulated probability of no hidden failure.
According to (2) and (3), it always holds that . If any hidden failure occurs, go to Step 3. Otherwise, go to Step 6.
Step 6: check the overloaded lines. In fact, the lines whose loads are close to their capacities are taken as overloaded and can fail with a certain probability [
(4) |
(5) |
where is a load dependent variable whose value is between 0 and 1 when the line is overloaded, and equals 0 otherwise [
According to (4) and (5), it always holds that . When any line is tripped, go to Step 3. If no line is overloaded or tripped, go to Step 7.
Step 7: the cascade ends, and the total loss of the chain is recorded.
The simulation offers a cascading failure sample after the cascade ends.
It is worth noting that the events of hidden failures show higher priority than those of overloaded failures in the simulation procedure. In other words, the overloaded failure can only occur when no hidden failure occurs at the current stage.
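The priority rule in Steps 5 and 6 can be sketched as follows. This is a minimal illustration rather than the simulator used in this paper: the function names `roulette_select` and `next_failure`, the dictionaries of per-line probabilities, and the rule that the no-failure share of the wheel is one minus the sum of the line probabilities are all assumptions made for the example.

```python
import random


def roulette_select(weights, rng):
    """Pick one key of `weights` with probability proportional to its weight."""
    total = sum(weights.values())
    pick = rng.random() * total
    cumulative = 0.0
    for key, weight in weights.items():
        cumulative += weight
        if pick < cumulative:
            return key
    return key  # guard against floating-point round-off


def next_failure(hidden_probs, overload_probs, rng):
    """Return (line, kind) for the next tripped line, or (None, None).

    Hidden failures are tried first; overloaded failures are only tried
    when no hidden failure has been selected (hypothetical helper).
    """
    for kind, probs in (("hidden", hidden_probs), ("overload", overload_probs)):
        if not probs:
            continue
        wheel = dict(probs)
        # Assumed form of the no-failure probability: the remaining mass.
        wheel[None] = max(0.0, 1.0 - sum(probs.values()))
        line = roulette_select(wheel, rng)
        if line is not None:
            return line, kind
    return None, None
```

Calling `next_failure({3: 0.01}, {7: 0.4}, random.Random(0))` returns either a hidden failure of line 3, an overloaded failure of line 7, or `(None, None)` when the cascade would end.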
In general, risk estimation needs an enormous number of Monte-Carlo (MC) samples to obtain the expected loss as:
$$R = \frac{1}{N}\sum_{i=1}^{N} L_i \tag{6}$$
where $N$ is the number of samples; $L_i$ is the loss of the $i$th simulation sample; and $R$ is the risk of the power system. Reference [

Fig. 2 Risks estimated by cascading failure models with and without hidden failures.

Fig. 3 CCDFs of blackouts estimated by simulations with and without hidden failures.
However, sampling based methods such as the MC have two major defects. One is that collecting sufficient samples of hidden failures to identify the critical hidden failures can be infeasible due to the very low occurrence probabilities.
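A rough calculation illustrates the first defect. The sketch below assumes independent samples and, purely for illustration, a specific chain containing three hidden failures of probability 0.01 each with all other factors ignored; the function name is hypothetical.

```python
def prob_seen_at_least_once(p_chain, n_samples):
    """Probability that a chain of probability p_chain appears in n_samples draws."""
    return 1.0 - (1.0 - p_chain) ** n_samples


# Illustrative chain probability: three hidden failures of probability 0.01 each.
p_chain = 0.01 ** 3
for n in (10_000, 100_000, 1_000_000):
    print(n, round(prob_seen_at_least_once(p_chain, n), 3))
```

Under these assumptions, the chance of sampling the chain even once is about 1% with ten thousand samples and still only about 63% with one million samples.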

Fig. 4 Mean values of recorded failures and exposed line among samples of IEEE 14-bus system.
Besides, it should also be noted that the difference between the estimated risks of scenarios in
Therefore, a method is needed to collect sufficient hidden failures and more accurately estimate the blackout risks, and the SF-network method proposed by the authors in their previous work [
The structure of SF-network is mainly formed by the states and failures, which can be obtained by the cascading failure chains [
First, a -length cascading failure chain comprised of the failure sequence and final loss is denoted by where the subscripts in brackets are the occurrence order of the failures. Then the states are denoted as vectors , and the th state of the failure chain is defined as .
In addition, we denote the initial state where no failure occurs as , and the ending mark of a failure chain as . Then, recombine the states and corresponding failures as a tuple sequence: .
The subscripts of the state vectors are the stage numbers, which are also the numbers of failed lines.
Secondly, nodes and edges are used to signify the states and failures, respectively. The nodes and edges are joined based on the tuple sequences as shown in

Fig. 5 State-failure sequence.
We can get the structure of the SF-network from tuple sequences as illustrated in

Fig. 6 Structure of SF-network.
As in the figure, the SF-network originates at the initial state , spreads along the subsequent chains of states and failures, and terminates at the states with .
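The merging of failure chains into a network can be sketched as below. The representation is an assumption made for the example: a state is stored as the ordered tuple of lines failed so far, the ending mark is the string `"end"`, and `build_sf_network` is a hypothetical helper rather than the implementation used in this paper.

```python
def build_sf_network(chains):
    """Merge failure chains into a network keyed by state.

    chains: iterable of (failure_sequence, final_loss) pairs.
    Returns {state: {failure_or_end_mark: next_state_or_None}}.
    """
    end_mark = "end"
    network = {}
    for failures, _loss in chains:
        state = ()  # initial state: no line has failed
        for line in failures:
            next_state = state + (line,)
            # Chains sharing a prefix reuse the same state and edge.
            network.setdefault(state, {})[line] = next_state
            state = next_state
        network.setdefault(state, {})[end_mark] = None
    return network


net = build_sf_network([((1, 2), 10.0), ((1, 3), 5.0), ((1, 2), 10.0)])
```

In this toy case the three chains share the initial state and the state after line 1 fails, so the network holds four states instead of the seven visited by the individual chains.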
The states in this paper differ from those in the original SF-network in [
In the authors’ previous work [
In the improved SF-network, the impact of sampling randomness on the risk estimation is eliminated. The loss of the th failure chain in (6) is the sum of the losses that occur at the states along the failure chain and can be formulated as:
(7) |
where is the loss that occurs at state .
Substituting (7) into (6), we can get:
(8) |
Since the similar states in the failure chains, which share similar failure sequences, are merged into a single state in the SF-network, the number of times that a state is gathered is denoted as . Thus, (8) can be rewritten as:
(9) |
where is the final stage of SF-network; and is the number of states at the stage.
As the number of samples increases, the fraction will converge to the corresponding occurrence probability of , . Thus, we have:
(10) |
can be derived from:
(11) |
Therefore, the risk can be obtained as the sum of the risks of the states in the SF-network, which is more accurate than the risk estimated from random samples by (6).
(12) |
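The regrouping behind this derivation can be checked numerically. The sketch below assumes that each sample is a list of (state, loss at state) pairs, that a state appears at most once per chain, and that the loss at a state is the same in every chain that visits it; both function names are hypothetical.

```python
from collections import Counter


def risk_by_samples(samples):
    """Mean of the per-chain losses, each chain loss being the sum over its states."""
    return sum(loss for chain in samples for _state, loss in chain) / len(samples)


def risk_by_states(samples):
    """Same quantity regrouped: sum over distinct states of frequency times loss."""
    counts = Counter(state for chain in samples for state, _loss in chain)
    loss_at = {state: loss for chain in samples for state, loss in chain}
    n = len(samples)
    return sum(counts[s] / n * loss_at[s] for s in counts)


chain_a = [((1,), 0.0), ((1, 2), 10.0)]
chain_b = [((1,), 0.0), ((1, 3), 4.0)]
samples = [chain_a, chain_a, chain_b, chain_a]
```

Both functions give 8.5 for these four samples. Replacing the sampled frequencies with the occurrence probabilities of the states removes the sampling randomness from the estimate.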
According to (12), the risk can be obtained by searching the states of the SF-network and summing the risks of the states. Instead of sampling the cascading failure chains randomly based on their probabilities, the failures of high risks and the hidden failures leading to high losses are the interest of this paper. Thus, we introduce the risk estimation indices to guide the searching process.
When the search reaches a state, the risk estimation indices indicate the risks of the failures of the operating lines at the state. To calculate the risk estimation indices, the failure probabilities of the lines and estimated losses are worked out as follows.
1) Failure probability calculation: according to Section II-B, the occurrence of hidden failures shows higher priority than that of the overloaded failures at a new stage. However, all the failures can be taken as mutually exclusive events, so the failure probability of line at state can be obtained by:
(13) |
(14) |
where is the hidden failure probability of line l at state ; is the overloaded failure probability of line l at state ; and is the failure of line l.
Particularly, the probability of no failure can be derived from (3) and (5) as:
(15) |
Once line is selected and fails, the probability of the next state after line fails is obtained by:
(16) |
2) Estimated loss calculation: the failures can cause losses. In this paper, the loss of system splitting and loss of overloading are considered. At state , both and can be calculated based on the admittance matrix of the system and the Moore-Penrose pseudo-inverse of as introduced in [
(17) |
where is the system splitting loss coefficient; and is the overloading loss coefficient. The values of and depend on the loss preference of the operator, and their sum is 1. If the loss of system splitting requires more attention, they should be set as , and vice versa. Without loss of generality, the two kinds of estimated losses are treated equally by setting in this paper.
Given the failure probability and estimated loss, the risk estimation index of line can be obtained by:
(18) |
Hence, the search is guided toward the failures with higher risk estimation indices so that the high-risk states are included in (12). Specifically, the failures at a certain state are searched based on the probabilities derived from the Roulette-Wheel algorithm and the risk estimation indices obtained by (18). Thus, the failures with higher indices are more likely to be searched. When there are no overloads or exposures, or the probability of the searched failure chain is below a threshold , the search on the current failure chain terminates and continues on a new failure chain. The search avoids duplicated failure chains by updating the risk estimation indices after finishing searching a failure chain [

Fig. 7 Searching procedure of SF-network.
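One step of the guided search can be sketched as follows. The sketch assumes that the risk estimation index is the product of the failure probability and the weighted estimated loss, with both loss coefficients set to 0.5 as in this paper; the function names and input dictionaries are hypothetical, and the index update that avoids duplicated chains is not shown.

```python
import random


def risk_indices(fail_prob, split_loss, overload_loss, alpha=0.5, beta=0.5):
    """Risk estimation index per line: probability times weighted estimated loss."""
    return {
        line: fail_prob[line] * (alpha * split_loss[line] + beta * overload_loss[line])
        for line in fail_prob
    }


def pick_failure(indices, rng):
    """Roulette-wheel pick: lines with larger indices are chosen more often."""
    lines = list(indices)
    weights = [indices[line] for line in lines]
    return rng.choices(lines, weights=weights, k=1)[0]


def chain_should_stop(chain_prob, candidates, threshold):
    """Stop when no line can fail or the chain has become too improbable."""
    return not candidates or chain_prob < threshold
```

For example, `pick_failure(risk_indices(p, ls, lo), random.Random(0))` selects the next failure to search at a state; `rng.choices` raises an error if every index is zero, so that case has to be handled by the caller.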
After forming the SF-network, the state values (abbreviated as S-value) and failure values (abbreviated as F-value) can be worked out by the SF-network value calculations [
1) S-value calculation:
(19) |
where is the set of failures that occur at ; and is the F-value of the failure at .
2) F-value calculation: the F-value of the failure equals the S-value of its next state, which is denoted as . The F-value of equals the total loss of the system at .
(20) |
where is the loss of state .
The calculation starts at the states where cascades terminate, and performs successively from larger stages to smaller ones (from right to left in
1) Get the F-value of at the backmost state at the maximal stage in the SF-network according to (20). Then, the S-value of the backmost state equals the F-value of the according to (19).
2) Move to the next smaller stage. According to (20), the F-values of the failures all equal the S-values of their next states, which have been obtained at the previous step. Then, the S-values of the states at the current stage are worked out based on (19).
3) Continue the calculation until the first stage. Then , the S-value of initial state , is finally worked out and equals the system blackout risk.
After state and failure value calculations of the SF-network, every state and failure gets a value that quantifies the expected loss after its occurrence.
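The backward calculation can be sketched as below. The sketch assumes that the S-value of a state is the probability-weighted sum of the F-values of the failures at that state, and it uses memoized recursion in place of the explicit stage-by-stage sweep, which visits the states in an equivalent order. The data layout and the name `s_value` are hypothetical.

```python
END = "end"


def s_value(state, network, loss, memo=None):
    """Expected loss from `state` onward.

    network[state] maps each failure (or END) to (probability, next_state).
    loss[state] is the total system loss if the cascade stops at `state`.
    """
    if memo is None:
        memo = {}
    if state not in memo:
        total = 0.0
        for failure, (prob, next_state) in network[state].items():
            # F-value: loss at this state for END, else S-value of the next state.
            if failure == END:
                f_value = loss[state]
            else:
                f_value = s_value(next_state, network, loss, memo)
            total += prob * f_value
        memo[state] = total
    return memo[state]


network = {
    (): {1: (1.0, (1,))},
    (1,): {2: (0.1, (1, 2)), END: (0.9, None)},
    (1, 2): {END: (1.0, None)},
}
loss = {(): 0.0, (1,): 5.0, (1, 2): 50.0}
risk = s_value((), network, loss)
```

In this toy network the S-value of the initial state is 0.1 × 50 + 0.9 × 5 = 9.5, which plays the role of the system blackout risk.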
The critical failures at a state can be identified as the ones whose F-values are higher than the S-value of the state. The critical component index (CCI) can be calculated by summing up the risks of the critical failures in the SF-network.
(21) |
(22) |
where is the indicator function; is the number of states at the
Since the probabilities of hidden failures and overloaded failures are distinguishable in the SF-network, the hCCI similar to the CCI can be obtained by:
(23) |
Thus, the lines with high are the critical hidden failure lines.
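The accumulation of the two indices can be sketched as follows. Because the exact weighting is not reproduced here, the sketch assumes that the risk of a failure is the occurrence probability of its state times its failure probability times its F-value, and that the hidden failure index uses only the hidden failure part of the failure probability. The record layout and the name `critical_indices` are hypothetical.

```python
from collections import defaultdict


def critical_indices(records):
    """Accumulate CCI and hCCI per line.

    records: (line, state_prob, fail_prob, hidden_prob, f_value, s_value) tuples,
    where hidden_prob is the part of fail_prob attributed to hidden failure.
    """
    cci = defaultdict(float)
    hcci = defaultdict(float)
    for line, state_prob, fail_prob, hidden_prob, f_value, s_val in records:
        if f_value > s_val:  # indicator: the failure is critical at this state
            cci[line] += state_prob * fail_prob * f_value
            hcci[line] += state_prob * hidden_prob * f_value
    return dict(cci), dict(hcci)


records = [
    (5, 1.0, 0.2, 0.05, 30.0, 10.0),  # critical: F-value above S-value
    (5, 0.5, 0.1, 0.10, 8.0, 10.0),   # not critical at this state
    (7, 0.5, 0.4, 0.00, 20.0, 10.0),  # critical, but purely an overloaded failure
]
cci, hcci = critical_indices(records)
```

Here line 7 has a nonzero CCI but a zero hCCI, which shows why the two rankings can differ.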
Once the critical hidden failure lines are identified, the result should be verified to validate the SF-network method. To this end, the risk of the system after the critical lines are upgraded needs to be re-estimated to quantify the effect. Since the risk assessment can measure the robustness of the power system to withstand the hidden failures and cascading failures, upgrading the most critical lines should be the most effective in increasing the robustness and decreasing the risks [
However, the sufficiently searched SF-network already contains the information of the states and cascading failures. Thus, the new risks can be re-calculated by the state and failure value calculations in the SF-network with the changed failure probabilities. More specifically, the hidden failure probability of line is decreased from to , and the probabilities of the failures in the SF-network are then all re-calculated according to (2)(4), (13), (14). Afterwards, the calculation algorithm in Section III-D obtains a new S-value of , which is the new risk . The risk drop between the original risk and the new risk is derived from
(24) |
where upgrading the most critical hidden failure is expected to achieve the largest .
Compared with the repetitive load flow and re-dispatch computations needed to search and form a new SF-network, the re-calculations in the SF-network are all algebraic calculations that take very little time. In addition, the efficiency of risk re-estimation makes the SF-network method superior to the MC sampling methods.
As hidden failure probability changes only impact the probabilities in the SF-network, (24) can be rewritten as:
(25) |
where is the final stage of SF-network. It indicates that the system risk change can be expressed as the sum of the risk changes of the states. Though the is expected to be positive when the failure probabilities of critical lines are decreased, some of the items in (25) can be negative, indicating that the risks of some states increase.
The reason lies in the interaction effects of the probabilities of the line failures at a state. Since it always holds at an arbitrary state that:
(26) |
where is the set of failures that occur at state .
Thus, the probability decreases of some failures might increase the occurrence probabilities of the other failures. According to the probability relationship between the failures and states in (16), the items can be either larger or smaller than the corresponding items in (25). Then, (25) can be rewritten as:
(27) |
where is the th state in the th stage of the SF-network; and are the failure probabilities of state with the risk before and after SF-network updating, respectively; and are the failure probabilities of state before and after SF-network updating, respectively; and are the risk values greater and less than 0, respectively. Therefore, if , will be negative and the system risk will increase after the hidden failure probabilities are decreased. In general, the negative is more likely to result from decreasing the probabilities of non-critical hidden failure lines, which increases the probabilities of critical failures at the corresponding states.
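A small numerical example shows how lowering the probability of a non-critical hidden failure can raise the value of a state. The proportional rescaling used here is an assumption chosen only to keep the probabilities at the state summing to 1; in the method itself the probabilities are re-calculated from the failure probability equations. All names and numbers are illustrative.

```python
def rescale_after_upgrade(probs, upgraded, new_prob):
    """Lower one outcome's probability and rescale the rest so the sum stays 1."""
    rest = 1.0 - probs[upgraded]
    scale = (1.0 - new_prob) / rest
    return {k: (new_prob if k == upgraded else p * scale) for k, p in probs.items()}


def state_value(probs, f_values):
    """Probability-weighted sum of the F-values at one state."""
    return sum(probs[k] * f_values[k] for k in probs)


# Outcome "A" is a non-critical hidden failure: its F-value is below the state value.
f_values = {"A": 5.0, "B": 100.0, "end": 10.0}
before = {"A": 0.1, "B": 0.1, "end": 0.8}
after = rescale_after_upgrade(before, "A", 0.01)
```

The state value rises from 18.5 to 19.85, because the probability mass released by outcome A partly moves to the critical outcome B.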
The test program is developed and tested in MATLAB on a computer with 2.4 GHz processor and 32 GB RAM. In both the cases below, and for all lines, and . The system data are accessible in [
The initial operation point is set based on the settings in [
It takes 3546 searches and 170.17 s to form the SF-network, and the estimated risk is 26.63 MW. The risks estimated by four groups of random MC samples (each group takes about 680 s) and the SF-network are given in

Fig. 8 Cascading failure risks estimated by MC sampling method and SF-network method in IEEE 14-bus system.
After the state and failure value calculations in the SF-network, the CCIs are calculated according to (21) and listed in
The hCCIs are obtained according to (23) and listed in

Fig. 9 Identified critical lines in IEEE 14-bus system.
To verify the identified critical hidden failure lines, three groups of lines are chosen to be upgraded in different scenarios according to their rankings in
The complementary cumulative distribution functions (CCDFs) of the blackouts in the scenarios are given in

Fig. 10 CCDFs of blackouts after upgrading different groups of lines in IEEE 14-bus system.
It can also be seen that the hidden failures have greater influence on the probabilities of blackouts with larger scale than those of blackouts with smaller scale.
Here, each risk re-estimation by the calculations in the SF-network takes only about 2 s, which demonstrates the efficiency of the SF-network method. Upgrading the top-ranking lines achieves the highest risk drop of 0.2548 MW, whereas upgrading the middle-ranking lines has very little effect. In particular, the risk drops of the group of lowest-ranking lines are negative. As seen from the corresponding and of the lowest-ranking lines, decreasing the probabilities of their hidden failures can raise the failure probabilities of other critical lines and increase the system risk.
In addition, the risk variations in the three groups, which are 0.2548 MW, 0.0066 MW and 0.0310 MW respectively, are all smaller than the fluctuations of MC sampling method shown in
Moreover, we test the method in different situations, where the loads are set to 100%, 130%, 150%, 200% and 250% of their original values, respectively, to cover situations from the best to the worst. Then, the rankings of the lines by hCCI obtained by the method in these situations are listed in
The operation point is set as that of the IEEE 14-bus system in the last case. The failures of lines 96 and 66 are selected as the initial contingencies.
It takes 50260 searches to form the SF-network.

Fig. 11 Cascading failure risks estimated by MC sampling and SF-network methods by searching in IEEE 118-bus system.
Compared with the IEEE 14-bus system, forming the SF-network in this case demands longer time and more computations. It takes about 4642.09 s, whereas the MC sampling method needs about 7622.80 s to gather 50260 samples. As the sufficiency of searched hidden failures is the priority in this paper, the cumulative numbers of new hidden failures searched by the SF-network method and the MC sampling method are shown in

Fig. 12 New hidden failures searched by SF-network and MC sampling methods in IEEE 118-bus system.
The hCCIs of the lines in the IEEE 118-bus system are obtained by the SF-network and listed in
To verify the identified critical hidden failure lines, three groups of lines (the top-ranking lines 115, 117, 65, 60, the middle-ranking lines 12, 112, 136, 113 and the low-ranking lines 66, 75, 95, 51) are chosen based on their rankings in the table. The hidden failure probabilities of the chosen lines are reduced from 0.01 to 0.001 to simulate system improvement measures. Re-estimating the risks by the calculations in the SF-network takes about 17 s, which is dramatically shorter than the time needed to form the SF-network or to run the MC sampling method.
The CCDFs of the blackouts after upgrading different groups of lines are given in

Fig. 13 CCDFs of blackouts after upgrading different groups of lines in IEEE 118-bus system.
According to [
This paper proposes an SF-network method to identify the critical hidden failure lines. The search that forms the SF-network ensures that sufficient hidden failures are found and that duplicated searches of failure chains are avoided. Once the SF-network is formed, the state and failure value calculations in the SF-network can efficiently obtain the indices that identify the critical hidden failure lines and re-estimate the risks to verify the identification. In comparison with the commonly used sampling based methods, the proposed method achieves not only more accurate risk estimation, but also high efficiency in risk re-estimation. The simulations validate the accuracy and efficiency of the proposed method.
Our future work includes forming the SF-network with fewer searches and applying the method to more complicated analyses.
References
M. Vaiman, K. Bell, Y. Chen et al., “Risk assessment of cascading outages: methodologies and challenges,” IEEE Transactions on Power Systems, vol. 27, no. 2, pp. 631-641, May 2012.
O. P. Veloza and F. Santamaria, “Analysis of major blackouts from 2003 to 2015: classification of incidents and review of main causes,” The Electricity Journal, vol. 29, no. 7, pp. 42-49, Sept. 2016.
J. Thorp, A. Phadke, S. Horowitz et al., “Anatomy of power system disturbances: importance sampling,” International Journal of Electrical Power & Energy Systems, vol. 20, no. 2, pp. 147-152, Feb. 1998.
A. Phadke and J. S. Thorp, “Expose hidden failures to prevent cascading outages in power systems,” IEEE Computer Applications in Power, vol. 9, no. 3, pp. 20-23, Jul. 1996.
J. Chen, J. S. Thorp, and I. Dobson, “Cascading dynamics and mitigation assessment in power system disturbances via a hidden failure model,” International Journal of Electrical Power & Energy Systems, vol. 27, no. 4, pp. 318-326, May 2005.
L. Zhao, X. Li, M. Ni et al., “Review and prospect of hidden failure: protection system and security and stability control system,” Journal of Modern Power Systems and Clean Energy, vol. 7, no. 6, pp. 1735-1743, Nov. 2019.
A. G. Phadke, P. Wall, L. Ding et al., “Improving the performance of power system protection using wide area monitoring systems,” Journal of Modern Power Systems and Clean Energy, vol. 4, no. 3, pp. 319-331, Jul. 2016.
Z. Jiao, H. Gong, and Y. Wang, “A D-S evidence theory-based relay protection system hidden failures detection method in smart grid,” IEEE Transactions on Smart Grid, vol. 9, no. 3, pp. 2118-2126, Sept. 2016.
H. F. Albinali and A. P. S. Meliopoulos, “Resilient protection system through centralized substation protection,” IEEE Transactions on Power Delivery, vol. 33, no. 3, pp. 1418-1427, Jun. 2018.
Y. Cai, Y. Cao, Y. Li et al., “Cascading failure analysis considering interaction between power grids and communication networks,” IEEE Transactions on Smart Grid, vol. 7, no. 1, pp. 530-538, Jan. 2016.
Y. Han, C. Guo, S. Ma et al., “Modeling cascading failures and mitigation strategies in PMU based cyber-physical power systems,” Journal of Modern Power Systems and Clean Energy, vol. 6, no. 5, pp. 944-957, Sept. 2018.
Y. Wang and H. Pham, “A multi-objective optimization of imperfect preventive maintenance policy for dependent competing risk systems with hidden failure,” IEEE Transactions on Reliability, vol. 60, no. 4, pp. 770-781, Dec. 2011.
B. Liu, R. Yeh, M. Xie et al., “Maintenance scheduling for multicomponent systems with hidden failures,” IEEE Transactions on Reliability, vol. 66, no. 4, pp. 1280-1292, Dec. 2017.
K. Bae and J. S. Thorp, “A stochastic study of hidden failures in power system protection,” Decision Support Systems, vol. 24, no. 3, pp. 259-268, Jan. 1999.
D. C. Elizondo, J. L. Ree, A. G. Phadke et al., “Hidden failures in protection systems and their impact on wide-area disturbances,” in Proceedings of 2001 IEEE PES Winter Meeting, Columbus, USA, Jan. 2001, pp. 710-714.
F. Yang, A. S. Meliopoulos, G. J. Cokkinides et al., “Effects of protection system hidden failures on bulk power system reliability,” in Proceedings of 38th North American Power Symposium, Carbondale, USA, Sept. 2006, pp. 517-523.
N. A. Salim, M. M. Othman, I. Musirin et al., “Risk assessment of cascading collapse considering the effect of hidden failure,” in Proceedings of 2012 IEEE International Conference on Power and Energy (PECon), Kota, Malaysia, Dec. 2012, pp. 778-783.
S. Mei, F. He, X. Zhang et al., “An improved OPA model and blackout risk assessment,” IEEE Transactions on Power Systems, vol. 24, no. 2, pp. 814-823, Apr. 2009.
O. A. Mousavi, R. Cherkaoui, and M. Bozorg, “Blackouts risk evaluation by Monte-Carlo simulation regarding cascading outages and system frequency deviation,” Electric Power Systems Research, vol. 89, pp. 157-164, Aug. 2012.
Y. Cai, Y. Li, Y. Cao et al., “Modeling and impact analysis of interdependent characteristics on cascading failures in smart grids,” International Journal of Electrical Power & Energy Systems, vol. 89, pp. 106-114, Jul. 2017.
Z. Ma, C. Shen, F. Liu et al., “Fast screening of vulnerable transmission lines in power grids: a PageRank-based approach,” IEEE Transactions on Smart Grid, vol. 10, no. 2, pp. 1982-1991, Dec. 2017.
J. Guo, F. Liu, J. Wang et al., “Toward efficient cascading outage simulation and probability analysis in power systems,” IEEE Transactions on Power Systems, vol. 33, no. 3, pp. 2370-2382, May 2018.
S. Tamronglak, S. Horowitz, A. Phadke et al., “Anatomy of power system blackouts: preventive relaying strategies,” IEEE Transactions on Power Delivery, vol. 11, no. 2, pp. 708-715, Apr. 1996.
L. Li, H. Wu, Y. Song et al., “A state-failure-network method to identify critical components in power systems,” Electric Power Systems Research, vol. 181, Apr. 2020.
L. Li, H. Wu, and Y. Song, “Temporal difference learning based critical component identifying method with cascading failure data in power systems,” in Proceedings of 2018 IEEE PES General Meeting (PESGM), Portland, USA, Aug. 2018, pp. 1-5.
R. Yao, S. Huang, K. Sun et al., “A multi-timescale quasi-dynamic model for simulation of cascading outages,” IEEE Transactions on Power Systems, vol. 31, no. 4, pp. 3189-3201, Sept. 2015.
A. Lipowski and D. Lipowska, “Roulette-wheel selection via stochastic acceptance,” Physica A: Statistical Mechanics and its Applications, vol. 391, no. 6, pp. 2193-2196, Mar. 2012.
Y. Jia, Z. Xu, L. Lai et al., “Risk-based power system security analysis considering cascading outages,” IEEE Transactions on Industrial Informatics, vol. 12, no. 2, pp. 872-882, Apr. 2016.
B. A. Carreras, V. E. Lynch, M. Sachtjen et al., “Modeling blackout dynamics in power transmission networks with simple structure,” in Proceedings of the 34th Annual Hawaii International Conference on System Sciences, Washington DC, USA, Jan. 2001, pp. 719-727.
S. Soltan, D. Mazauric, and G. Zussman, “Analysis of failures in power grids,” IEEE Transactions on Control of Network Systems, vol. 4, no. 2, pp. 288-300, Nov. 2015.
R. Yao, S. Huang, K. Sun et al., “Risk assessment of multi-timescale cascading outages based on Markovian tree search,” IEEE Transactions on Power Systems, vol. 32, no. 4, pp. 2887-2900, Oct. 2016.
Power Systems Engineering Research Center (PSERC). (2019, Jun.). Matpower (version 7.0). [Online]. Available: https://matpower.org/
L. Che, X. Liu, Y. Wen et al., “A mixed integer programming model for evaluating the hidden probabilities of N-k line contingencies in smart grids,” IEEE Transactions on Smart Grid, vol. 10, no. 1, pp. 1036-1045, Oct. 2017.
I. Dobson, B. A. Carreras, and D. E. Newman, “How many occurrences of rare blackout events are needed to estimate event probability?” IEEE Transactions on Power Systems, vol. 28, no. 3, pp. 3509-3510, Aug. 2013.