Abstract
This paper proposes a neural-network-based state estimation (NNSE) method that aims to achieve higher time efficiency, improved robustness against noise, and extended observability when compared with the conventional weighted least squares (WLS) state estimation method. NNSE consists of two parts, the linear state estimation neural network (LSE-net) and the unobservable state estimation neural network (USE-net). The LSE-net functions as an adaptive approximator of linear state estimation (LSE) equations to estimate the nominally observable states. The inputs of LSE-net are the vectors of synchrophasors while the outputs are the estimated states. The USE-net operates as the complementary estimator on the nominally unobservable states. The inputs are the estimated observable states from LSE-net while the outputs are the estimation of nominally unobservable states. USE-net is trained off-line to approximate the veiled relationship between observable states and unobservable states. Two test cases are conducted to validate the performance of the proposed approach. The first case, which is based on the IEEE 118-bus system, shows the comprehensive performance of convergence, accuracy, and robustness of the proposed approach. The second case study adopts real-world synchrophasor measurements, and is based on the Jiangsu power grid, which is one of the largest provincial power systems in China.
MODERN power systems rely on many autonomous control algorithms to improve the response speed and decision-making toward system state changes [
As the number of phasor measurement unit (PMU) installations continues increasing world-wide, some regional systems have become nominally observable due to pure PMU data available at the transmission level [
The concept of pure PMU data-driven LSE is first proposed in [
Many data pre-processing methods have been proposed to aid the application of LSE, such as statistical methods that remove outlier measurements in an iterative manner [
In recent studies, NNs have shown high time efficiency and robustness compared with traditional methods in power system state estimation. For the state estimation at the transmission level, a deep NN-based online estimation method is proposed in [
This paper proposes an NN-based state estimation (NNSE) method that aims to achieve higher time efficiency, better robustness against noise, and extended observability when compared with conventional WLS state estimation methods.
The contributions of this paper are threefold: a novel NN-based state estimation method incorporating the LSE formulation is proposed; a deep NN is proposed to extend the system observability by learning the relationship between observable and unobservable states; and parallel and distributed formulations are developed to improve the computational efficiency of the proposed approach for large-scale power systems.
This paper is organized as follows. Section II briefly discusses the formulations of LSE. Section III introduces the proposed NNSE method, including an NN-based LSE (NNLSE), an NN-based unobservable state estimation (NNUSE), and a multi-thread training and updating architecture. Case studies are discussed in Section IV while future work and conclusions are drawn in Section V and Section VI.
LSE leverages the linear relationship between the voltage and current phasors. The PMUs are usually installed at the ends of lines, and their measurements include the three-phase current and voltage phasors in polar coordinates. Transmission systems are usually considered to be three-phase-balanced in this analysis. Hence, positive sequence measurements can be extracted from three-phase measurements through the phase-to-sequence transformation in (1).
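$$\dot{V}_{012} = \frac{1}{3}\begin{bmatrix} 1 & 1 & 1 \\ 1 & \alpha & \alpha^{2} \\ 1 & \alpha^{2} & \alpha \end{bmatrix}\dot{V}_{abc} \tag{1}$$
where $\dot{V}_{012}$ is the sequence voltage phasor vector, which includes the zero, positive, and negative sequence measurements labeled as 0, 1, and 2, respectively; $\dot{V}_{abc}$ is the three-phase voltage phasor vector of phases a, b, and c obtained directly from PMU measurements; and $\alpha$ is the rotation operator, equal to $e^{j2\pi/3}$. LSE at the transmission level is generally implemented upon the positive sequence measurements [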
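The phase-to-sequence transformation in (1) can be checked numerically with a minimal sketch. The function name `phase_to_sequence` and the sample phasor values are illustrative assumptions, not taken from the paper; the transformation matrix is the standard symmetrical-component matrix.

```python
import numpy as np

def phase_to_sequence(v_abc):
    """Map three-phase phasors [Va, Vb, Vc] to sequence phasors [V0, V1, V2]."""
    alpha = np.exp(2j * np.pi / 3)  # 120-degree rotation operator
    transform = np.array([[1, 1, 1],
                          [1, alpha, alpha**2],
                          [1, alpha**2, alpha]]) / 3.0
    return transform @ np.asarray(v_abc, dtype=complex)

# Hypothetical balanced three-phase set (values chosen only for illustration).
alpha = np.exp(2j * np.pi / 3)
va = 1.02 * np.exp(1j * np.deg2rad(10.0))
v_abc = np.array([va, va * alpha**2, va * alpha])

v0, v1, v2 = phase_to_sequence(v_abc)
print(abs(v0), abs(v1), abs(v2))
```

For a balanced set such as this one, the positive sequence component equals the phase-a phasor and the zero and negative sequence components vanish, which is why the positive sequence alone is used under the three-phase-balanced assumption.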
For a system with $n$ nodes and $l$ lines, in which some nodes and lines are equipped with PMUs so that there are $m_v$ voltage measurements and $m_i$ current measurements, the state vector $\dot{x}$ includes the voltage phasors of all nodes. The measurement vector $\dot{z}$ includes the voltage and current phasors of the nodes with PMU installation. The measurement model of PMU data can be derived from Ohm’s law as:
$$\dot{V} = A\dot{x}, \qquad \dot{I} = \dot{Y}_f\dot{x} \tag{2}$$
where $\dot{Y}_f$ is the from-end system admittance matrix used to calculate the current injection at the “from” end of the measured lines; $\dot{I}$ is the current phasor measurement vector; and $A$ is the relationship matrix between the state vector $\dot{x}$ and the voltage phasor measurement vector $\dot{V}$. If the voltage phasor of node $j$ is the $i$-th component in the measurement vector of voltage phasors, then $a_{ij}=1$; otherwise $a_{ij}=0$, where $a_{ij}$ is the element of $A$ on the $i$-th row and $j$-th column.
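By combining the voltage and current measurements into one formulation, the measurement model of PMU data can be represented by the complex matrix $\dot{H}$ in (3), which stacks the relationship matrix $A$ and the from-end admittance matrix $\dot{Y}_f$ of (2) and links the complex state vector $\dot{x}$ to the complex measurement vector $\dot{z}$.
$$\dot{z} = \begin{bmatrix} \dot{V} \\ \dot{I} \end{bmatrix} = \begin{bmatrix} A \\ \dot{Y}_f \end{bmatrix}\dot{x} = \dot{H}\dot{x} \tag{3}$$
Although the model in (3) is linear, its components are complex numbers. It can be further expanded into the rectangular-coordinate formulation in (4). The corresponding measurement model becomes (5).
$$\mathrm{Re}(\dot{z}) + j\,\mathrm{Im}(\dot{z}) = \left(H_r + jH_i\right)\left[\mathrm{Re}(\dot{x}) + j\,\mathrm{Im}(\dot{x})\right] \tag{4}$$
$$\begin{bmatrix} \mathrm{Re}(\dot{z}) \\ \mathrm{Im}(\dot{z}) \end{bmatrix} = \begin{bmatrix} H_r & -H_i \\ H_i & H_r \end{bmatrix}\begin{bmatrix} \mathrm{Re}(\dot{x}) \\ \mathrm{Im}(\dot{x}) \end{bmatrix} \tag{5}$$
where $\mathrm{Re}(\cdot)$ and $\mathrm{Im}(\cdot)$ are the functions that take the real part and imaginary part of a complex number, respectively; $H_r$ and $H_i$ are the real and imaginary parts of the matrix $\dot{H}$, respectively [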
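The equivalence between the complex model in (3) and its rectangular-coordinate form in (5) can be illustrated with a small numerical sketch. The helper names `to_real_model` and `stack_real_imag`, as well as the random test matrices, are assumptions made only for this illustration.

```python
import numpy as np

def to_real_model(h_complex):
    """Expand a complex matrix into the block form [[Hr, -Hi], [Hi, Hr]]."""
    h_r, h_i = h_complex.real, h_complex.imag
    return np.block([[h_r, -h_i], [h_i, h_r]])

def stack_real_imag(v):
    """Stack the real part of a complex vector on top of its imaginary part."""
    return np.concatenate([v.real, v.imag])

# Hypothetical complex measurement matrix and state vector.
rng = np.random.default_rng(0)
h_c = rng.normal(size=(5, 3)) + 1j * rng.normal(size=(5, 3))
x_c = rng.normal(size=3) + 1j * rng.normal(size=3)

z_c = h_c @ x_c                                        # complex model
z_real = to_real_model(h_c) @ stack_real_imag(x_c)     # real-valued model

print(np.allclose(z_real, stack_real_imag(z_c)))
```

The real-valued model doubles both dimensions, so a complex model with five measurements and three states becomes a 10-by-6 real system.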
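Based on the formulation in (5), it is possible to solve the states directly. Denoting the real-valued measurement vector, measurement matrix, and state vector in (5) by $z$, $H$, and $x$, respectively, the solution $\hat{x}$ is given in (6).
$$\hat{x} = \left(H^{\mathrm{T}} W H\right)^{-1} H^{\mathrm{T}} W z \tag{6}$$
where $W$ is a diagonal matrix, of which the diagonal components are weights for the corresponding measurements [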
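As a hedged illustration of the direct weighted solution, the sketch below builds a hypothetical 3-node, 2-line system with PMUs at nodes 1 and 2, forms the stacked measurement matrix from a node-selection matrix and a from-end admittance matrix, and solves the weighted least squares problem on the real-valued model. Line impedances, state values, shunt-free lines, and unit weights are all assumptions; `wls_lse` is an illustrative name.

```python
import numpy as np

def wls_lse(h_c, z_c, weights):
    """Weighted least squares on the rectangular-coordinate measurement model."""
    h_r, h_i = h_c.real, h_c.imag
    h = np.block([[h_r, -h_i], [h_i, h_r]])
    z = np.concatenate([z_c.real, z_c.imag])
    # The same weight is applied to the real and imaginary parts of a measurement.
    w = np.diag(np.concatenate([weights, weights]))
    x = np.linalg.solve(h.T @ w @ h, h.T @ w @ z)
    n = h_c.shape[1]
    return x[:n] + 1j * x[n:]

# Hypothetical series admittances of lines 1-2 and 2-3 (shunts neglected).
y12 = 1.0 / (0.01 + 0.10j)
y23 = 1.0 / (0.02 + 0.20j)

# Voltage phasors measured at nodes 1 and 2.
a = np.array([[1, 0, 0],
              [0, 1, 0]], dtype=complex)
# From-end currents: I_12 = y12 (V1 - V2), I_23 = y23 (V2 - V3).
y_f = np.array([[y12, -y12, 0],
                [0, y23, -y23]])
h_c = np.vstack([a, y_f])

x_true = np.array([1.0, 0.98 * np.exp(-0.05j), 0.97 * np.exp(-0.08j)])
z_c = h_c @ x_true                        # noiseless measurements
x_hat = wls_lse(h_c, z_c, weights=np.ones(4))
print(np.abs(x_hat - x_true).max())
```

Node 3 carries no PMU in this example, yet it is still estimated because the current measured on line 2-3 ties its voltage to node 2; this is the sense in which a node can be observable without a local voltage measurement.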

Fig. 1 Flowchart of NNSE method.
The estimated states are further fed into the unobservable state estimation neural network (USE-net) to get the estimations of unobservable states. The final estimation is a concatenation of the estimated observable and unobservable states. The USE-net is an off-line trained NN that learns the veiled relationship between observable states and unobservable states. The training data set consists of simulation data and historical data. Simulation data set up the baseline of outputs, and the historical data help the estimator to capture the recent slow dynamics of the system and are updated periodically. The training process of the USE-net consumes more time than the online training of LSE-net. Hence, the time intervals for updating the USE-net parameters are longer. To avoid the conflict between the networks, updating of LSE-net and USE-net is performed by two independent threads. This multi-thread updating architecture aims to reduce the estimation time and prevent numerical failure propagation.

Fig. 2 Schematic diagram of NNLSE.
This subsection introduces the proposed NNLSE from three aspects: the architecture of NN-based estimator; the BP and loss function of NNLSE; and the SGD training of NNLSE.
Feed-forward NNs are widely used for universal function approximation [

Fig. 3 Typical feed-forward NN architecture.
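$$y = f_o\left[W_o f_h\left(W_h u + b_h\right) + b_o\right] \tag{7}$$
where $u$ and $y$ are the input and output vectors of the network, respectively; $W_h$ and $W_o$ are the weight matrices for the hidden layer and output layer, respectively; $b_h$ and $b_o$ are the bias vectors applied to the hidden layer and output layer, respectively; and $f_h$ and $f_o$ are the activation functions that introduce nonlinearity to the outputs of the hidden layer and output layer, respectively. Reference [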
() | (8) |
As shown in
$$L = \frac{1}{m}\sum_{i=1}^{m}\left(z_i - \hat{z}_i\right)^2 \tag{9}$$
where $z_i$ and $\hat{z}_i$ are the $i$-th elements of the measurement vector $z$ and the estimated measurement vector $\hat{z}$, respectively; $m$ is the dimension of the measurement vector; and $L$ is the loss, which is calculated after the measurement model because the actual values of the states are unknown and target values of the states are therefore not accessible. However, the measurements and the estimated measurements are comparable and reflect the gaps between the estimated states and the actual states, which are minimized indirectly by minimizing the measurement residual through (9).
LSE-net is updated by BP through online training. The gradient (i.e., the partial derivative) of the loss function with respect to each network parameter is calculated through the chain rule and multiplied by a learning rate to obtain the update step size. The gradient with respect to the LSE-net output is derived separately in (10) because the BP through the measurement model is a specialized part of NNLSE. The inherent BP within LSE-net then follows from this output gradient, as formulated in (11).
(10) |
(11) |
where is the intermediate output of the hidden layer; and is the pseudo inverse of the measurement model calculated using the Moore-Penrose method [
With the loss function, as well as the gradient and learning rate determined, the NN can be updated in a gradient descent (GD) manner to minimize the loss as formulated in (12).
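$$\theta \leftarrow \theta - \eta\frac{\partial L}{\partial \theta} \tag{12}$$
where $\theta$ denotes a network parameter (a weight or a bias); $L$ is the loss; and $\eta$ is the learning rate, whose value is usually tuned between 0.0001 and 0.01.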
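The online training idea, in which the loss is evaluated on the measurement residual rather than on the unknown states, can be sketched as follows. This is a minimal illustration under several assumptions: a single tanh hidden layer with a linear output, a mean-squared measurement residual as the loss, and a gradient obtained by the plain chain rule through a hypothetical real-valued measurement matrix `h_model`. The paper's own output gradient uses a Moore-Penrose pseudo-inverse, which this sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, hidden = 8, 4, 8                       # measurements, states, hidden units
h_model = 0.5 * rng.normal(size=(m, n))      # hypothetical measurement model
x_true = rng.normal(size=n)                  # used only to simulate measurements

params = {
    "w_h": 0.1 * rng.normal(size=(hidden, m)), "b_h": np.zeros(hidden),
    "w_o": 0.1 * rng.normal(size=(n, hidden)), "b_o": np.zeros(n),
}

def train_step(p, z, lr=0.01):
    """One gradient-descent update driven by the measurement residual."""
    hid = np.tanh(p["w_h"] @ z + p["b_h"])   # hidden-layer output
    x_hat = p["w_o"] @ hid + p["b_o"]        # estimated states
    z_hat = h_model @ x_hat                  # estimated measurements
    loss = np.mean((z - z_hat) ** 2)
    g_z = 2.0 * (z_hat - z) / z.size         # dL/dz_hat
    g_x = h_model.T @ g_z                    # back through the measurement model
    g_pre = (p["w_o"].T @ g_x) * (1.0 - hid ** 2)  # back through tanh
    p["w_o"] -= lr * np.outer(g_x, hid)
    p["b_o"] -= lr * g_x
    p["w_h"] -= lr * np.outer(g_pre, z)
    p["b_h"] -= lr * g_pre
    return loss

# Stream of slightly noisy measurements of a fixed operating point.
losses = [train_step(params, h_model @ x_true + 0.001 * rng.normal(size=m))
          for _ in range(500)]
print(losses[0], losses[-1])
```

No state labels appear anywhere in the update; the network only ever sees measurements, which is what allows the estimator to keep training online.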
In traditional GD optimization, the average gradient of all data points is used to update the estimation. In NN training, GD is still an efficient method for linear or quadratic cases. However, in non-linear and non-convex cases, the averaged gradient may lead the network toward a local minimum, where updating stops. In other cases, the training data may arrive in batches, and waiting for the entire training data set to become available is time-consuming. SGD optimization is introduced to handle these issues: it updates the network parameters with the average gradient of a subset of the data points, and iterates through the data set until every subset is visited. SGD has been proven to have better performance than GD in both computational complexity and converging speed [
The training process of the LSE-net is an SGD process. The estimation and update are performed on each data point, meaning the batch size is one, and the average gradient of the subset is the gradient itself. Moreover, the batch size is adjustable depending on the accuracy and speed requirement. When the batch size changes, the update step size is calculated upon the average gradient of the data in that batch.
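The role of the batch size can be illustrated on a simple linear least squares problem. The sketch below is an assumption-laden stand-in for the LSE-net training loop: the model is a plain linear map, the data are synthetic, and `minibatches` and `sgd_epoch` are hypothetical helper names. With `batch_size=1`, the averaged gradient of each subset is the gradient of a single data point.

```python
import numpy as np

def minibatches(n_samples, batch_size, rng):
    """Yield index subsets that together visit every sample once."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

def sgd_epoch(w, inputs, targets, lr, batch_size, rng):
    """One pass over the data, updating with the average gradient per subset."""
    for idx in minibatches(len(inputs), batch_size, rng):
        residual = inputs[idx] @ w - targets[idx]
        grad = 2.0 * inputs[idx].T @ residual / len(idx)
        w = w - lr * grad
    return w

rng = np.random.default_rng(1)
inputs = rng.normal(size=(200, 3))
w_true = np.array([0.5, -1.0, 2.0])
targets = inputs @ w_true                     # noiseless synthetic targets

w = np.zeros(3)
for _ in range(50):
    w = sgd_epoch(w, inputs, targets, lr=0.01, batch_size=1, rng=rng)
print(w)
```

Raising `batch_size` trades update frequency for a smoother averaged gradient, which mirrors the accuracy and speed trade-off described above.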

Fig. 4 Schematic diagram of NNUSE.
Off-line training is unavoidably time-consuming. With a large amount of data and large input and output dimensions, the network loss can take hours to reach the convergence tolerance. Moreover, only a well-trained estimator can be applied to online estimation; its intermediate training states cannot. Hence, the USE-net is unable to capture system dynamics with time constants shorter than hours. The USE-net parameters can still be updated during online estimation to adapt to slow dynamics. For instance, the USE-net parameters can be refreshed every few hours with the off-line re-trained network, using training data that are collected dynamically. In this way, the estimation of the USE-net is expected to be more accurate.
An intrinsic challenge associated with NNs is their scalability. As the dimensions of the inputs increase, both the weight matrices and bias vectors grow accordingly. Therefore, the processing time and memory required by the training and estimation computation increase rapidly with the dimensions of inputs and outputs. Since the dimensions of measurement vectors, observable states, and measurement model are fixed, the computation complexity of the LSE model does not have much room for improvement in terms of time consumption. However, the NNUSE approach introduced in Section III-B can be further improved through decomposition and parallelism techniques.
The changes of the states are caused by load condition variation. We observe that load profiles tend to be similar among adjacent nodes. Inspired by the K-nearest neighbor (KNN) algorithm, we propose the distributed-NNUSE architecture that decouples the estimation of unobservable states into parallel processes.

Fig. 5 Distributed-NNUSE architecture.
One USE-net is assigned to each unobservable node. Each USE-net estimates only the states of its own node, taking as input the states of that node's nearest observable nodes in terms of electrical distance. The number of input substations is a hyperparameter that needs to be fine-tuned. With this architecture, not only is the input dimension reduced, but the unobservable states can also be estimated in parallel to achieve higher time efficiency.
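The input selection for the distributed architecture can be sketched as a nearest-neighbor lookup. The example assumes that a symmetric electrical-distance matrix between nodes is already available; how that distance is computed is not specified here, and the matrix values, node indices, and the name `nearest_observable` are purely illustrative.

```python
import numpy as np

def nearest_observable(distance, observable, unobservable, k):
    """For each unobservable node, list its k closest observable nodes."""
    obs = np.asarray(observable)
    groups = {}
    for u in unobservable:
        order = np.argsort(distance[u, obs], kind="stable")[:k]
        groups[u] = obs[order].tolist()
    return groups

# Hypothetical electrical distances among five nodes.
distance = np.array([
    [0.0, 0.2, 0.5, 0.9, 0.3],
    [0.2, 0.0, 0.4, 0.7, 0.6],
    [0.5, 0.4, 0.0, 0.1, 0.8],
    [0.9, 0.7, 0.1, 0.0, 0.2],
    [0.3, 0.6, 0.8, 0.2, 0.0],
])

groups = nearest_observable(distance, observable=[0, 1, 2],
                            unobservable=[3, 4], k=2)
print(groups)  # {3: [2, 1], 4: [0, 1]}
```

Each entry of `groups` defines the input set of one small estimator, so the estimators share no parameters and can be trained and evaluated independently.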
As mentioned in Sections III-A and III-B, online estimation, BP, as well as the updating of LSE-net and USE-net are performed on individual threads. To coordinate them so that they work together and minimize the risk of interrupting online estimation, a multi-thread NN training and updating architecture is proposed.

Fig. 6 Training and updating architecture of multi-thread estimator.
In NN training, BP consumes the majority of computation time, and this can be volatile. If the BP is included in the online state estimation, not only is the average time efficiency compromised, but it is also difficult to guarantee the upper bound of time consumption at each step. With this multi-thread architecture, the time consumption and the unpredictable part of each training are removed from the online estimation. The updating of LSE-net and USE-net is decoupled as well. Both NN-based estimators can update at their own frequencies without interrupting the online estimation.
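One possible way to decouple training from online estimation is a lock-protected parameter store that the estimation path only reads and the training thread only replaces. The sketch below is an assumption about how such coordination could look, not the paper's implementation; `ParameterStore`, `trainer`, and `estimate` are hypothetical names, and the "training" step is a placeholder increment rather than real BP.

```python
import threading
import numpy as np

class ParameterStore:
    """Holds the latest published network parameters behind a lock."""

    def __init__(self, params):
        self._lock = threading.Lock()
        self._params = params
        self._version = 0

    def read(self):
        with self._lock:
            return self._params, self._version

    def publish(self, new_params):
        with self._lock:
            self._params = new_params
            self._version += 1

def trainer(store, n_updates):
    """Background thread: compute new parameters, then swap them in."""
    for _ in range(n_updates):
        params, _ = store.read()
        store.publish(params + 0.1)   # placeholder for a BP update

def estimate(store, measurement):
    """Online path: a forward pass with whatever parameters are current."""
    params, version = store.read()
    return params @ measurement, version

store = ParameterStore(np.zeros((2, 3)))
worker = threading.Thread(target=trainer, args=(store, 5))
worker.start()
estimate(store, np.ones(3))           # may run while training is in progress
worker.join()
final_estimate, final_version = estimate(store, np.ones(3))
print(final_estimate, final_version)
```

Because the trainer publishes a new array instead of modifying the current one in place, the estimation path never observes a half-updated network, and its cost per step is limited to a forward pass plus a short lock acquisition.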
The proposed NNSE method is tested in two systems. The IEEE 118-bus system is used in the first case study to show the comprehensive performance in large-scale systems in terms of estimation accuracy, time efficiency, and robustness against noise. LSE is applied to this system as the benchmark. The data are generated in MATPOWER, a power system analysis toolbox run in MATLAB [
To justify the performance of the proposed method in real-world applications, we applied NNSE to a practical system using real PMU data in the second case. This test system is the high-voltage transmission system of the Jiangsu power grid in China. Both PMU data and SCADA results are collected and stored in time series. NNSE and LSE are performed based on PMU data, while SCADA acts as the reference to check the accuracy when their time stamps overlap. Note that the data are not collected in real time, but are read from a database in time-series order so that the proposed method acts as an online estimation in the simulation. The PMU reporting frequency in the Jiangsu system is 25 Hz.

Fig. 7 IEEE 118-bus system topology.

Fig. 8 Convergence of network loss and estimation error. (a) NN loss. (b) Estimation error.
One of the motivations of the proposed method is to improve the computation efficiency. The time consumption of the proposed method is compared with several versions of LSE, which are solved with different matrix-handling algorithms. The base method, labeled LSE-PI, computes the matrix pseudo-inverse using the Moore-Penrose algorithm [

Fig. 9 Step-wise time consumption comparison.
It is indisputable that LSE is the “optimal” state estimation solution in theory. However, measurement noise is unavoidable in real-world measurements. Therefore, the robustness against noise is important for online state estimation. In order to test the robustness of the proposed method, the estimation error and standard deviation (STD) are compared with LSE results at five different noise levels. To separate the impact from the warm-up stage of the LSE-net, the estimator, in this case, is pre-trained to a suboptimal solution. The comparison of the estimation performance is summarized in

Fig. 10 Comparison of estimation error against different noise levels. (a) Comparison of RMSE. (b) Comparison of STD.
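A noise-sensitivity comparison of this kind can be organized as in the sketch below, which evaluates only an equally weighted least squares baseline on a synthetic linear model. The measurement matrix, state values, noise levels, and the definition of the step-wise RMSE are assumptions for illustration and do not reproduce the IEEE 118-bus experiment.

```python
import numpy as np

def rmse_and_std(h, x_true, sigma, n_steps, rng):
    """Mean and standard deviation of the step-wise RMSE at one noise level."""
    step_rmse = np.empty(n_steps)
    for k in range(n_steps):
        z = h @ x_true + rng.normal(scale=sigma, size=h.shape[0])
        x_hat, *_ = np.linalg.lstsq(h, z, rcond=None)
        step_rmse[k] = np.sqrt(np.mean((x_hat - x_true) ** 2))
    return step_rmse.mean(), step_rmse.std()

rng = np.random.default_rng(42)
h = rng.normal(size=(12, 4))                  # hypothetical measurement model
x_true = np.array([1.0, 0.98, 0.02, -0.05])   # hypothetical states
noise_levels = [0.0, 0.001, 0.01, 0.05]       # assumed noise standard deviations

results = {s: rmse_and_std(h, x_true, s, 200, rng) for s in noise_levels}
for sigma, (mean_rmse, std_rmse) in results.items():
    print(f"sigma={sigma}: RMSE={mean_rmse:.2e}, STD={std_rmse:.2e}")
```

An NN-based estimator would be evaluated by replacing the `np.linalg.lstsq` call with its forward pass while keeping the same noise realizations, so that both methods are compared on identical data.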
Examples of step-wise estimation error curves are given in

Fig. 11 Estimation error trajectories at 0.01 noise level.

Fig. 12 Estimation of bus 19. (a) Estimation of voltage magnitude. (b) Estimation of voltage angle.
The leverage measurements have higher impact on the performance of LSE, and therefore they are more likely to be attacked [
In real-time power system operation, the states are always changing due to the volatility of load and generation. The state estimation algorithms are expected to capture the relatively slow dynamics of state changes. In this case, a load ramp-down scenario is designed to test the estimation performance under state changes. The average ramp-down rate is 1% per second, which is a steep change for transmission-level power systems. The average load factor decreases from 1 to 0.99 in the 1-second interval from the
The average estimation error and the corresponding STD during the 1-second ramp-down window are summarized in
The step-wise estimation error and the load profile curves are shown in

Fig. 13 Step-wise estimation error under ramp-down transient.

Fig. 14 Estimation of bus 19 in ramp case. (a) Estimation of voltage magnitude. (b) Estimation of voltage angle.
Breaker status change and line tripping happen in power system operation from time to time. It is important that state estimation algorithms are capable of adapting to topology changes. This test case compares the performance of LSE and NNLSE in a topology change scenario, where the line from bus 3 to bus 5 is opened at the second. The dimensions of the state vector and measurement vector are the same before and after the topology change; thus the dimensions of the NN-based estimator are compatible with the new measurement model. The fast transient of opening the breaker is neglected, and the system topology is known before and after the change.

Fig. 15 Estimation of bus 19 in topology change case. (a) Estimation of voltage magnitude. (b) Estimation of voltage angle.
Data transmission failure is unavoidable in practice, including missing data, package inversion and data displacement within a package. These three types of data failure are considered in this test to validate the performance of the proposed method under data failure. The load condition considered here is a slow ramping decreasing from 1.0 to 0.99 within the 10-second time window, to differentiate the effect of the failed data from the correct ones. The length of a data package under consideration is 1 s, which corresponds to 50 data points. The estimation results in three scenarios are summarized in
The Jiangsu power grid has the second highest provincial energy consumption in China. The power grid also has four HVDC terminals, which receive power from Shanxi, Sichuan, Hubei, and Inner Mongolia. The Jiangsu power grid consists of thousands of nodes at the transmission level. More importantly, it has the largest number of PMU installations among the provincial power grids in China. Its large scale and complexity, high requirement for system stability, and good observability owing to extensive PMU coverage together make the Jiangsu power grid suitable for LSE and NNSE study.
This numerical experiment covers the high-voltage (220 kV and above) transmission system of the Jiangsu power grid, which includes 763 substations of 230 kV, 525 kV, and 1000 kV. The numbers of substations of 1000 kV, 525 kV, and 230 kV are 4, 103, and 656, respectively. One hundred and thirty-two substations are equipped with high-quality and reliable PMUs, which makes 235 substations nominally observable. The states of the observable substations are estimated by NNLSE. The remaining 528 nominally unobservable substations are estimated via NNUSE. The SCADA results, which offer observability of the entire system, are collected to validate estimation accuracy. The SCADA results used are essentially the state estimation solutions from the energy management system (EMS), which are generally considered to be accurate. We use these SCADA-based state estimation results as the reference to validate the accuracy of the proposed NNSE method.
As discussed in Section III-B-2), applying NNUSE on all unobservable states together causes a scalability issue. Therefore, we implement distributed NNUSE architecture to reduce training and estimation time through parallelism. First, a sensitivity study on the hyperparameter of the input dimension is performed.
A comprehensive NNSE is performed on the Jiangsu power grid based on a 5-input NNUSE, and a comparison with LSE is summarized in
The simulation results above justify the feasibility and superiority of the proposed NNSE method. The NNLSE for the observable state estimation is tested in the IEEE 118-bus system. Its convergence is fast (less than 1 s) and its time consumption is approximately 50% lower than that of the conventional LSE method. The reliability of NNLSE is also examined in various scenarios, including a noise sensitivity study on measurements (especially the leverage measurements), load condition change scenarios, topology change scenarios, and data transmission failure scenarios. The estimation error and standard deviation of NNLSE are lower than those of the traditional LSE method in all testing scenarios, indicating that it has better robustness and reliability than LSE in online estimation. The NNUSE method is mainly tested in the large-scale Jiangsu power grid. The optimal hyperparameter of the NNUSE input number is selected based on a sensitivity study. The resulting estimation error of the unobservable states, though higher than that of the observable states, is acceptably low for providing some insight into the operation conditions of the traditionally unobservable nodes. The time consumption of the entire NNSE method is less than 10 ms, so running the observable and unobservable state estimation at the PMU data reporting frequency (60 Hz and below) is well supported.
The proposed NNSE method provides a novel solution to online state estimation in large-scale power systems. The NN-based estimator achieves online state estimation at a higher frequency than traditional methods. The proposed method also shows its superiority in robustness against noise. The unobservable states are estimated using a data-driven approach. The multi-thread updating architecture improves the stability and time efficiency of the online estimation process. The proposed method is tested in the IEEE 118-bus system (small scale with simulation results) and the Jiangsu power grid (large scale with real PMU/SCADA data). The convergence, accuracy, time efficiency, and robustness of the proposed method have been validated through numerical experiments. Although this method improves the online state estimation performance, some improvements can still be made in the future. In particular, the design of the unobservable state estimator could be further improved. As the USE-net is trained off-line, there is great potential to explore more sophisticated models such as a recurrent neural network (RNN), so as to take temporal information into account and further improve the accuracy.
References
N. K. Saxena, “Voltage control by optimized participation of reactive power compensation using fixed capacitor and STATCOM,” in Optimization of Power System Problems, 1st ed. Cham, Switzerland: Springer, 2020, pp. 313-364.
A. Abur and A. G. Exposito, “Network observability analysis,” in Power System State Estimation, 1st ed. New York: Marcel Dekker, 2004, pp. 74-113.
K. R. Mestav, J. Luengo-Rozas, and L. Tong, “Bayesian state estimation for unobservable distribution systems via deep learning,” IEEE Transactions on Power Systems, vol. 34, no. 6, pp. 4910-4920, May 2019.
A. Monticelli, “Real-time modeling of power networks,” in State Estimation in Electric Power Systems, 1st ed. Cham, Switzerland: Springer, 1999, pp. 1-13.
J. R. Gracia, M. A. Young, D. T. Rizy et al. (2016, Mar.). Advancement of synchrophasor technology. [Online]. Available: https://www.smartgrid.gov/document/Synchrophasor_Report_201603.html
L. Zhang, A. Bose, A. Jampala et al., “Design, testing, and implementation of a linear state estimator in a real power system,” IEEE Transactions on Smart Grid, vol. 8, no. 4, pp. 1782-1789, Jan. 2016.
R. Raz, “On the complexity of matrix product,” in Proceedings of the Thirty-fourth Annual ACM Symposium on Theory of Computing, Montreal, Canada, May 2002, pp. 144-151.
E. Klarreich, “Multiplication hits the speed limit,” Communications of the ACM, vol. 63, no. 1, pp. 11-13, Dec. 2019.
A. Phadke, J. Thorp, R. Nuqui et al., “Recent developments in state estimation with phasor measurements,” in Proceedings of 2009 IEEE/PES Power Systems Conference and Exposition, Seattle, USA, Mar. 2009, pp. 1-7.
M. Netto and L. Mili, “Robust data filtering for estimating electromechanical modes of oscillation via the multichannel Prony method,” IEEE Transactions on Power Systems, vol. 33, no. 4, pp. 4134-4143, Nov. 2017.
J. Zhu and A. Abur, “Improvements in network parameter error identification via synchronized phasors,” IEEE Transactions on Power Systems, vol. 25, no. 1, pp. 44-50, Aug. 2009.
J. Chen and A. Abur, “Placement of PMUs to enable bad data detection in state estimation,” IEEE Transactions on Power Systems, vol. 21, no. 4, pp. 1608-1615, Oct. 2006.
L. Zhang, G. Wang, and G. B. Giannakis, “Real-time power system state estimation and forecasting via deep unrolled neural networks,” IEEE Transactions on Signal Processing, vol. 67, no. 15, pp. 4069-4077, Jul. 2019.
E. Manitsas, R. Singh, B. C. Pal et al., “Distribution system state estimation using an artificial neural network approach for pseudo measurement modeling,” IEEE Transactions on Power Systems, vol. 27, no. 4, pp. 1888-1896, Apr. 2012.
G. Tian, Q. Zhou, R. Birari et al., “A hybrid-learning algorithm for online dynamic state estimation in multimachine power systems,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 12, pp. 5497-5508, Dec. 2020.
L. Wang, Q. Zhou, and S. Jin, “Physics-guided deep learning for power system state estimation,” Journal of Modern Power Systems and Clean Energy, vol. 8, no. 4, pp. 607-615, Jun. 2020.
R. H. Park, “Two-reaction theory of synchronous machines generalized method of analysis: Part Ⅰ,” Transactions of the American Institute of Electrical Engineers, vol. 48, no. 3, pp. 716-727, Jul. 1929.
A. G. Phadke and J. S. Thorp, “Phasor estimation of nominal frequency inputs,” in Synchronized Phasor Measurements and Their Applications, 1st ed. Cham, Switzerland: Springer, 2008, pp. 29-48.
E. H. Moore, “On the reciprocal of the general algebraic matrix,” Bulletin of American Mathematical Society, vol. 26, pp. 394-395, Jun. 1920.
G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of Control, Signals and Systems, vol. 2, no. 4, pp. 303-314, Dec. 1989.
B. Karlik and A. V. Olgac, “Performance analysis of various activation functions in generalized MLP architectures of neural networks,” International Journal of Artificial Intelligence and Expert Systems, vol. 1, no. 4, pp. 111-122, Feb. 2011.
O. Shamir and T. Zhang, “Stochastic gradient descent for non-smooth optimization: convergence results and optimal averaging schemes,” in Proceedings of International Conference on Machine Learning, Atlanta, USA, Feb. 2013, pp. 71-79.
MATPOWER: A MATLAB Power System Simulation Package, 1st ed., Power Systems Engineering Research Center, Ithaca, USA, 1997, pp. 1-10.
R. Penrose, “A generalized inverse for matrices,” Mathematical Proceedings of the Cambridge Philosophical Society, vol. 51, pp. 406-413, Jul. 1955.
K. Tanabe and M. Sagae, “An exact Cholesky decomposition and the generalized inverse of the variance–covariance matrix of the multinomial distribution, with applications,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 54, no. 1, pp. 211-219, Sept. 1992.
T. A. Davis and W. W. Hager, “Modifying a sparse Cholesky factorization,” SIAM Journal on Matrix Analysis and Applications, vol. 20, no. 3, pp. 606-627, Mar. 1999.
A. Abur, F. Magnago, and F. Alvarado, “Elimination of leverage measurements via matrix stretching,” International Journal of Electrical Power & Energy Systems, vol. 19, no. 8, pp. 557-562, Nov. 1997.
A. Majumdar and B. C. Pal, “Bad data detection in the context of leverage point attacks in modern power networks,” IEEE Transactions on Smart Grid, vol. 9, no. 3, pp. 2042-2054, Oct. 2016.