Neural-network-based Power System State Estimation with Extended Observability

Guanyu Tian; Yingzhong Gu; Di Shi; Jing Fu; Zhe Yu; Qun Zhou

Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL 32816, USA; GEIRI North America, San Jose, CA 95134, USA; State Grid Jiangsu Electric Power Company, Nanjing, China

Updated: 2021-09-27

DOI: 10.35833/MPCE.2020.000362

Abstract

This paper proposes a neural-network-based state estimation (NNSE) method that aims to achieve higher time efficiency, improved robustness against noise, and extended observability when compared with the conventional weighted least squares (WLS) state estimation method. NNSE consists of two parts, the linear state estimation neural network (LSE-net) and the unobservable state estimation neural network (USE-net). The LSE-net functions as an adaptive approximator of linear state estimation (LSE) equations to estimate the nominally observable states. The inputs of LSE-net are the vectors of synchrophasors while the outputs are the estimated states. The USE-net operates as the complementary estimator on the nominally unobservable states. The inputs are the estimated observable states from LSE-net while the outputs are the estimation of nominally unobservable states. USE-net is trained off-line to approximate the veiled relationship between observable states and unobservable states. Two test cases are conducted to validate the performance of the proposed approach. The first case, which is based on the IEEE 118-bus system, shows the comprehensive performance of convergence, accuracy, and robustness of the proposed approach. The second case study adopts real-world synchrophasor measurements, and is based on the Jiangsu power grid, which is one of the largest provincial power systems in China.

Keywords

State estimation; linear state estimation; stochastic gradient descent; neural network; wide area measurement system (WAMS).

I. Introduction

MODERN power systems rely on many autonomous control algorithms to improve the response speed and decision-making toward system state changes [1]. Since many of these algorithms are used for making decisions on system states in real time, numerous state estimation methods have been proposed to improve estimation accuracy, efficiency, and robustness. These methods can be classified into two categories: minimizing estimated measurement residuals and probability-based estimation. Examples of the first type include the weighted least squares (WLS) method, the weighted least absolute value (WLAV) method, and the optimization-based methods that formulate the measurement residuals in their objective functions [2]. The second type, namely probability-based estimation, tries to find the value of states with the highest likelihood, e.g., Bayesian estimation, the details of which are discussed in [3].

As the number of phasor measurement unit (PMU) installations continues to increase worldwide, some regional systems have become nominally observable from PMU data alone at the transmission level [2], [4]. For example, the North American grid has over 1700 PMUs and 200 phasor data concentrators (PDCs), which provide a good amount of synchrophasor data for relevant applications [5], especially for linear state estimation (LSE) methods [6]. However, the traditional LSE method has some drawbacks when applied to the online monitoring of large-scale systems. The first one is the measurement noise handling capability. The measurements in practical systems typically contain noise, which is unavoidable. It is important that a state estimation algorithm can attain a certain computational accuracy in the presence of noise. For example, for WLS, which is the most widely used state estimation method, the estimation accuracy is highly affected by the weight matrix that describes the quality of the measurements, but the weight matrix is hard to quantify directly in practice. Hence, a method that requires no calibration of the noise matrix would be beneficial. Using neural networks (NNs) is an option, as they have been applied to noise filtering and data processing. Another drawback is the computational efficiency. Matrix inversion is the major contributor to the time consumption of LSE, and its cost grows faster than quadratically with the dimension of the system states ($O(n^{2.3727})$) [7]. The forward propagation of an NN contains only multiplication and addition operations. Therefore, its complexity is $O(n \lg n)$, which is much lower than that of a conventional LSE algorithm [8]. If an NN is able to duplicate the function of LSE, the complexity can be reduced. Considering the above two motivations, we propose the NN-based power system state estimation method in this paper.

The concept of pure PMU data-driven LSE is first proposed in [9]. It has a higher time efficiency compared with conventional state estimation methods based on supervisory control and data acquisition (SCADA) due to its linear and non-iterative feature in the solution process. The drawbacks of PMU data-driven LSE, however, are that the calculation of the measurement model matrix $H$ is computationally complex, especially with large state dimensions, and it is sensitive to erroneous measurements. Moreover, since LSE is a direct solution method, the erroneous data may significantly affect the estimation accuracy.

Many data pre-processing methods have been proposed to aid the application of LSE, such as statistical methods that remove outlier measurements in an iterative manner [10]-[12]. Larger systems with more measurements usually have a larger number of erroneous measurements. Therefore, the time consumption involved in bad data pre-processing tends to be higher, which limits the online implementation of LSE in large-scale systems.

In recent studies, NNs have shown high time efficiency and robustness compared with traditional methods in power system state estimation. For the state estimation at the transmission level, a deep NN-based online estimation method is proposed in [13], where the online estimator is trained off-line. This estimator is capable of estimating the observable states of the system with the high-frequency input of PMU data, while the unobservable states are not considered. Since the system observability at the transmission level is usually high, this is not a major issue. However, the observability of distribution systems is poor due to the limited number of measurements. Reference [3] proposes the Bayesian network method for unobservable states in distribution systems, where the Bayesian network is an off-line trained NN that predicts the probability of the unobservable states. Another method to estimate the unobservable states in distribution systems is to generate virtual measurements. Reference [14] introduces a virtual measurement generation method using an off-line trained NN. The unobservable states can then be estimated from the combination of real and virtual measurements. However, the NNs in these methods are all trained off-line; therefore, the accuracy cannot be guaranteed without feedback. A hybrid learning mechanism proposed in [15] and [16] fills this gap by incorporating the system model into the back propagation of the NN, so that the estimator can be updated online to adjust to its online estimation error. Nevertheless, the unobservable states are not considered in the hybrid learning methods, because they are not included in the system model.

This paper proposes an NN-based state estimation (NNSE) method that aims to achieve higher time efficiency, better robustness against noise, and extended observability when compared with conventional WLS state estimation methods.

The contributions of this paper are threefold: a novel NN-based state estimation incorporating the LSE formulation is proposed; a deep NN is proposed to expand the system observability by building connections between observable and unobservable states; and parallel and distributed formulations are developed to improve the computational efficiency of the proposed approach for large-scale power systems.

This paper is organized as follows. Section II briefly discusses the formulations of LSE. Section III introduces the proposed NNSE method, including an NN-based LSE (NNLSE), an NN-based unobservable state estimation (NNUSE), and a multi-thread training and updating architecture. Case studies are discussed in Section IV while future work and conclusions are drawn in Section V and Section VI.

II. Formulations of LSE

LSE leverages the linear relationship between the voltage and current phasors. The PMUs are usually installed at the ends of lines, and their measurements include the three-phase current and voltage phasors in polar coordinates. Transmission systems are usually considered to be three-phase-balanced in this analysis. Hence, positive sequence measurements can be extracted from three-phase measurements through the phase-to-sequence transformation in (1).

$$V_{012} = \frac{1}{3} \begin{bmatrix} 1 & 1 & 1 \\ 1 & α & α^{2} \\ 1 & α^{2} & α \end{bmatrix} V_{abc} \tag{1}$$

where $V_{012}$ is the sequence voltage phasor vector, which includes the zero, positive, and negative sequence components labeled as 0, 1, and 2, respectively; $V_{abc}$ is the three-phase voltage phasor vector of phases a, b, and c directly from PMU measurements; and $α$ is the rotation operator, equal to $e^{i \frac{2 π}{3}}$. LSE at the transmission level is generally implemented upon the positive sequence measurements [17].
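The transformation in (1) can be illustrated with a short NumPy sketch. The function name `phase_to_sequence` and the balanced test phasor (1.02 p.u. at 10 degrees) are assumptions made for illustration only; they do not come from the paper.

```python
import numpy as np

# Rotation operator alpha = exp(i*2*pi/3), as defined for (1).
ALPHA = np.exp(2j * np.pi / 3)

# Phase-to-sequence matrix of (1); rows yield zero, positive, negative sequence.
T_SEQ = (1.0 / 3.0) * np.array([
    [1, 1, 1],
    [1, ALPHA, ALPHA**2],
    [1, ALPHA**2, ALPHA],
])

def phase_to_sequence(v_abc):
    """Map a three-phase phasor vector [Va, Vb, Vc] to [V0, V1, V2]."""
    return T_SEQ @ np.asarray(v_abc, dtype=complex)

# Hypothetical balanced measurement with a-b-c phase rotation:
# Vb lags Va by 120 degrees and Vc leads Va by 120 degrees.
v_a = 1.02 * np.exp(1j * np.deg2rad(10.0))
v_abc = np.array([v_a, v_a * ALPHA**2, v_a * ALPHA])

v_012 = phase_to_sequence(v_abc)
print("zero, positive, negative:", np.round(v_012, 6))
```

For a balanced set, the positive sequence component equals the phase-a phasor and the other two components vanish, which is why only the positive sequence is passed on to LSE under the balanced assumption.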

For a system with $N$ nodes and $L$ lines, in which some nodes and lines are deployed with PMUs so that there are $n$ voltage measurements and $l$ current measurements, the state vector $x \in C^{N \times 1}$ includes the voltage phasors of all nodes. The measurement vector $z \in C^{(n + l) \times 1}$ includes the voltage and current phasors of the nodes with PMU installation. The measurement model of PMU data can be derived from Ohm’s law and is formulated as:

$$\begin{cases} V = A x \\ I_{f} = Y_{f} x \end{cases} \tag{2}$$

where $Y_{f} \in C^{l \times N}$ is the from-end system admittance matrix used to calculate the current injection at the “from” end of the measured lines; $I_{f}$ is the current phasor measurement vector; and $A \in R^{n \times N}$ is the relationship matrix between the state vector $x$ and the voltage phasor measurement vector $V$. If the voltage phasor of node $j$ is the $i^{th}$ component in the measurement vector of voltage phasors, then $A_{i, j} = 1$; otherwise $A_{i, j} = 0$, where $A_{i, j}$ is the element of $A$ on the $i^{th}$ row and $j^{th}$ column.

By combining the voltage and current measurements into one formulation, the measurement model of PMU data can be represented by the complex matrix $\dot{H}$ in (3).

$$z = \begin{bmatrix} V \\ I_{f} \end{bmatrix} = \begin{bmatrix} A \\ Y_{f} \end{bmatrix} x = \dot{H} x \tag{3}$$

Although the model in (3) is linear, its components are complex numbers. It can be further expanded into the rectangular-coordinate formulation in (4). The corresponding measurement model becomes (5).

$$\begin{cases} x = \begin{bmatrix} real(x) \\ imag(x) \end{bmatrix} \\ z = \begin{bmatrix} real(z) \\ imag(z) \end{bmatrix} \end{cases} \tag{4}$$

$$z = \begin{bmatrix} H_{real} & - H_{imag} \\ H_{imag} & H_{real} \end{bmatrix} x = H x \tag{5}$$

where $real(\cdot)$ and $imag(\cdot)$ are the functions that take the real part and imaginary part of a complex number, respectively; $H_{real}$ and $H_{imag}$ are the real and imaginary parts of the complex matrix $\dot{H}$, respectively [18]; and $H$ is the linear model for LSE in rectangular form.
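As a rough illustration of (3)-(5), the sketch below assembles the complex model from a selection matrix and a from-end admittance matrix and then expands it into the real block form. The 3-bus network, its admittance values, and all function names are hypothetical, and line shunt elements are ignored for brevity.

```python
import numpy as np

def build_complex_model(pmu_buses, y_f, n_bus):
    """Stack the selection matrix A on top of Y_f, as in (3)."""
    a = np.zeros((len(pmu_buses), n_bus))
    for i, j in enumerate(pmu_buses):
        a[i, j] = 1.0  # measurement i is the voltage phasor of bus j
    return np.vstack([a.astype(complex), y_f])

def to_rectangular(h_dot):
    """Expand the complex model into the real block matrix of (5)."""
    return np.block([[h_dot.real, -h_dot.imag],
                     [h_dot.imag, h_dot.real]])

def stack_real_imag(v):
    """Stack real and imaginary parts of a complex vector, as in (4)."""
    return np.concatenate([v.real, v.imag])

# Hypothetical 3-bus chain (lines 0-1 and 1-2), series admittances only.
y01, y12 = 1.0 - 5.0j, 0.8 - 4.0j
y_f = np.array([[y01, -y01, 0.0],
                [0.0, y12, -y12]], dtype=complex)

h_dot = build_complex_model(pmu_buses=[0, 1], y_f=y_f, n_bus=3)
h = to_rectangular(h_dot)

x_complex = np.array([1.00 + 0.00j, 0.99 - 0.02j, 0.98 - 0.05j])
z_complex = h_dot @ x_complex
print(np.allclose(h @ stack_real_imag(x_complex), stack_real_imag(z_complex)))
```

The final check confirms that the real-valued model reproduces the complex product, so the two formulations carry the same information.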

Based on the formulation in (5), it is possible to solve the states directly. The solution of $x$ is given in (6).

$$\hat{x} = (H^{T} W^{-1} H)^{-1} H^{T} W^{-1} z \tag{6}$$

where $W \in R^{(n + l) \times (n + l)}$ is a diagonal matrix, of which the diagonal components are weights for the corresponding measurements [19].
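A minimal sketch of evaluating (6) for a diagonal $W$ is given below. The random model matrix, the state vector, and the diagonal value of $0.01^{2}$ are illustrative assumptions; the gain matrix is solved rather than explicitly inverted, which is a common numerical choice and not necessarily what the authors implemented.

```python
import numpy as np

def lse_solve(h, z, w_diag):
    """Evaluate (6) for a diagonal W given by its diagonal entries."""
    w_inv = 1.0 / np.asarray(w_diag, dtype=float)
    ht_winv = h.T * w_inv          # scales column k of H^T by 1 / W[k, k]
    gain = ht_winv @ h             # H^T W^-1 H
    return np.linalg.solve(gain, ht_winv @ z)

# Hypothetical real-valued model with full column rank.
rng = np.random.default_rng(0)
h = rng.normal(size=(10, 4))
x_true = rng.normal(size=4)
z_clean = h @ x_true               # noise-free measurements
w_diag = np.full(10, 0.01**2)      # assumed diagonal entries of W

x_hat = lse_solve(h, z_clean, w_diag)
print(np.max(np.abs(x_hat - x_true)))
```

With noise-free measurements the estimate matches the true state to machine precision, in line with the near-zero LSE error reported for the no-noise row of Table I.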

III. NNSE

Figure 1 is the flowchart of the proposed NNSE method. Three-phase PMU measurements are transformed into sequence data, and the positive sequence data are fed into the LSE-net. The LSE-net is the observable state estimator whose output is the estimated states $\hat{x}$ that contain both observable and unobservable states. LSE-net is randomly initialized at the beginning, and a short pre-training is needed before online implementation. The updating of LSE-net parameters is achieved by stochastic gradient descent (SGD) based backpropagation (BP) in an independent thread synchronized at PMU reporting rate. The nominally observable and unobservable states can be distinguished by their convergence. The estimation of the observable states is expected to converge to the actual states, while the estimation of unobservable states is updated less frequently.

Fig. 1 Flowchart of NNSE method.

The estimated states are further fed into the unobservable state estimation neural network (USE-net) to get the estimations of unobservable states. The final estimation $\hat{x}'$ is a concatenation of the estimated observable and unobservable states. The USE-net is an off-line trained NN that learns the veiled relationship between observable states and unobservable states. The training data set consists of simulation data and historical data. Simulation data set up the baseline of outputs, and the historical data help the estimator to capture the recent slow dynamics of the system and are updated periodically. The training process of the USE-net consumes more time than the online training of LSE-net. Hence, the time intervals for updating the USE-net parameters are longer. To avoid the conflict between the networks, updating of LSE-net and USE-net is performed by two independent threads. This multi-thread updating architecture aims to reduce the estimation time and prevent numerical failure propagation.

A. Observable State Estimation

Figure 2 shows the schematic diagram of NNLSE. LSE-net is a 3-layer feed-forward NN that takes the input of measurement vector $z$ and yields an output of the estimated state $\hat{x}$ .

Fig. 2 Schematic diagram of NNLSE.

This subsection introduces the proposed NNLSE from three aspects: the architecture of NN-based estimator; the BP and loss function of NNLSE; and the SGD training of NNLSE.

1) Proposed State Estimator

Feed-forward NNs are widely used for universal function approximation [20]. They usually contain three or four layers: one input layer, several hidden layers, and one output layer. The layers consist of neurons, which are the smallest units needed to build an NN. Figure 3 shows a typical feed-forward NN structure. Equation (7) is the forward propagation calculation.

Fig. 3 Typical feed-forward NN architecture.

$$x = a_{o} [w_{o} a_{h} (w_{h} z + b_{h}) + b_{o}] \tag{7}$$

where $w_{h}$ and $w_{o}$ are the weight matrices for the hidden layer and output layer, respectively; $b_{h}$ and $b_{o}$ are the bias vectors applied to the hidden layer and output layer, respectively; and $a_{h}$ and $a_{o}$ are the activation functions that introduce nonlinearity to the outputs of the hidden layer and output layer, respectively. Reference [21] provides a comprehensive analysis of different types of activation functions. In the proposed NNLSE, we choose the sigmoid function formulated in (8) as the hidden layer activation function. The output layer activation function is a linear function to avoid applying limits on the range of output values.

$$sig(x) = \frac{1}{1 + e^{-x}} \tag{8}$$
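The forward propagation in (7) with the sigmoid hidden activation of (8) and a linear output layer can be sketched as follows. The layer sizes, the initialization scale, and the helper names are assumptions chosen for illustration, not values reported in the paper.

```python
import numpy as np

def sigmoid(v):
    """Sigmoid activation of (8)."""
    return 1.0 / (1.0 + np.exp(-v))

def init_params(n_in, n_hidden, n_out, seed=0):
    """Random initialization (hypothetical scale of 0.1)."""
    rng = np.random.default_rng(seed)
    return {
        "w_h": rng.normal(scale=0.1, size=(n_hidden, n_in)),
        "b_h": np.zeros(n_hidden),
        "w_o": rng.normal(scale=0.1, size=(n_out, n_hidden)),
        "b_o": np.zeros(n_out),
    }

def forward(params, z):
    """Forward pass of (7): sigmoid hidden layer, linear output layer."""
    u = sigmoid(params["w_h"] @ z + params["b_h"])   # hidden layer output
    x_hat = params["w_o"] @ u + params["b_o"]        # a_o is the identity
    return x_hat, u

params = init_params(n_in=8, n_hidden=16, n_out=6)
z = np.linspace(-1.0, 1.0, 8)      # placeholder measurement vector
x_hat, u = forward(params, z)
print(x_hat.shape, u.shape)
```

Only matrix-vector products, additions, and element-wise activations appear in the pass, which is the property the paper relies on for fast online estimation.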

As shown in Fig. 2, the measurement model is inserted between the output of LSE-net and loss function. The loss is defined as the L2 norm of estimated measurement residual defined in (9).

$$e_{Loss} = \|z - \hat{z}\|_{2} = \sqrt{\sum_{i = 1}^{M} (z_{i} - \hat{z}_{i})^{2}} \tag{9}$$

where $z_{i}$ and $\hat{z}_{i}$ are the $i^{th}$ elements of $z$ and $\hat{z}$, respectively; $M$ is the dimension of the measurement vector; and $e_{Loss}$ is the loss calculated after the measurement model because the values of actual states are unknown and the target values of states are not accessible. However, the measurements and the estimated measurements are comparable and can reflect the gaps between the estimated states and the actual states, which are minimized indirectly by minimizing the measurement residual through (9).

LSE-net is updated by BP through online training. The gradient, i.e., the partial derivative of the loss function with respect to each network parameter, is calculated through the chain rule and multiplied by a learning rate $η$ to get the update step size. The gradient with respect to the LSE-net output $\hat{x}$ is derived separately in (10) because the BP through the measurement model is a specialized part of NNLSE. The inherent BP within LSE-net then proceeds from the gradient with respect to $\hat{x}$, as formulated in (11).

$$\begin{cases} \frac{\partial e_{Loss}}{\partial \hat{z}} = \frac{\partial \|z - \hat{z}\|_{2}}{\partial \hat{z}} \\ \frac{\partial e_{Loss}}{\partial \hat{x}} = \frac{\partial e_{Loss}}{\partial \hat{z}} \frac{\partial \hat{z}}{\partial \hat{x}} \\ \frac{\partial \hat{z}}{\partial \hat{x}} = {H'}^{+} \end{cases} \tag{10}$$

$$\begin{cases} \frac{\partial e_{Loss}}{\partial b_{o}} = \frac{\partial e_{Loss}}{\partial \hat{x}} \frac{\partial \hat{x}}{\partial b_{o}} \\ \frac{\partial e_{Loss}}{\partial w_{o}} = \frac{\partial e_{Loss}}{\partial \hat{x}} \frac{\partial \hat{x}}{\partial w_{o}} \\ \frac{\partial e_{Loss}}{\partial b_{h}} = \frac{\partial e_{Loss}}{\partial \hat{x}} \frac{\partial \hat{x}}{\partial u} \frac{\partial u}{\partial b_{h}} \\ \frac{\partial e_{Loss}}{\partial w_{h}} = \frac{\partial e_{Loss}}{\partial \hat{x}} \frac{\partial \hat{x}}{\partial u} \frac{\partial u}{\partial w_{h}} \end{cases} \tag{11}$$

where $u$ is the intermediate output of the hidden layer; and ${H'}^{+}$ is the pseudo inverse of the measurement model $H'$ calculated using the Moore-Penrose method [19].

With the loss function, as well as the gradient and learning rate $η$ determined, the NN can be updated in a gradient descent (GD) manner to minimize the loss as formulated in (12).

$$\begin{cases} b_{o} = b_{o} - η \frac{\partial e_{Loss}}{\partial b_{o}} \\ w_{o} = w_{o} - η \frac{\partial e_{Loss}}{\partial w_{o}} \\ b_{h} = b_{h} - η \frac{\partial e_{Loss}}{\partial b_{h}} \\ w_{h} = w_{h} - η \frac{\partial e_{Loss}}{\partial w_{h}} \end{cases} \tag{12}$$

where $η$ is the learning rate, whose value is usually tuned between 0.0001 and 0.01.
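The chain of (9)-(12) can be sketched as one training step in NumPy. This is a sketch under stated assumptions rather than the authors' implementation: the estimated measurement is taken as $\hat{z} = H \hat{x}$, and the factor $\partial \hat{z} / \partial \hat{x}$ is taken as the analytic Jacobian of that product so the result can be checked numerically, whereas (10) writes this factor as the pseudo inverse ${H'}^{+}$. The layer sizes, random model, and function names are hypothetical.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def init_params(n_in, n_hidden, n_out, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "w_h": rng.normal(scale=0.1, size=(n_hidden, n_in)),
        "b_h": np.zeros(n_hidden),
        "w_o": rng.normal(scale=0.1, size=(n_out, n_hidden)),
        "b_o": np.zeros(n_out),
    }

def loss_and_grads(p, z, h):
    """Loss of (9) and its gradients following the chain rule of (10)-(11)."""
    u = sigmoid(p["w_h"] @ z + p["b_h"])       # hidden layer output
    x_hat = p["w_o"] @ u + p["b_o"]            # linear output layer
    residual = z - h @ x_hat                   # z - z_hat
    loss = np.linalg.norm(residual)
    g_z = -residual / loss                     # d loss / d z_hat (loss assumed nonzero)
    g_x = h.T @ g_z                            # assumption: analytic Jacobian of H x_hat
    g_pre = (p["w_o"].T @ g_x) * u * (1.0 - u) # back through the sigmoid
    grads = {
        "b_o": g_x,
        "w_o": np.outer(g_x, u),
        "b_h": g_pre,
        "w_h": np.outer(g_pre, z),
    }
    return loss, grads

def sgd_step(p, z, h, eta=0.01):
    """One update of (12) on a single measurement vector (batch size 1)."""
    loss, grads = loss_and_grads(p, z, h)
    for name in p:
        p[name] = p[name] - eta * grads[name]
    return loss

rng = np.random.default_rng(0)
h = rng.normal(size=(8, 4))                    # hypothetical measurement model
z = h @ rng.normal(size=4)                     # one synthetic measurement vector
params = init_params(n_in=8, n_hidden=6, n_out=4)
losses = [sgd_step(params, z, h, eta=0.01) for _ in range(5)]
print(losses)
```

Note that no true state is needed anywhere in the step: the network is trained only from the mismatch between the measurements and the measurements reconstructed from its own estimate.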

2) SGD Training of NNLSE

In traditional GD optimization, the average gradient of all data points is used to update the estimation. In NN training, GD is still an efficient method for linear or quadratic cases. However, in non-linear and non-convex cases, the averaged gradient may lead the network toward a local minimum, where it stops updating. In some other cases, the training data may come in batches, and it becomes time-consuming to wait for the entire training data set to be available. SGD optimization is introduced to handle these issues. SGD updates the network parameters with the average gradient of a subset of all data points, and iterates through the data set until every subset is visited. SGD has been shown to have better performance than GD in both computational complexity and convergence speed [22].

The training process of the LSE-net is an SGD process. The estimation and update are performed on each data point, meaning the batch size is one, and the average gradient of the subset is the gradient itself. Moreover, the batch size is adjustable depending on the accuracy and speed requirement. When the batch size changes, the update step size is calculated upon the average gradient of the data in that batch.
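The role of the batch size can be illustrated with a generic sketch. The toy squared-residual gradient, the sample format, and the function names are assumptions made for illustration; the point is only that a batch of size one uses the sample gradient itself, while larger batches use the batch average.

```python
import numpy as np

def minibatches(samples, batch_size):
    """Yield consecutive subsets of the data set."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

def average_gradient(grad_fn, theta, batch):
    """Average the per-sample gradients over one batch."""
    return np.mean([grad_fn(theta, s) for s in batch], axis=0)

def sgd_epoch(grad_fn, theta, samples, batch_size=1, eta=0.01):
    """Visit every batch once, updating theta after each batch."""
    for batch in minibatches(samples, batch_size):
        theta = theta - eta * average_gradient(grad_fn, theta, batch)
    return theta

def toy_grad(theta, sample):
    """Gradient of (z_i - h_row . theta)^2 for one scalar measurement."""
    h_row, z_i = sample
    return -2.0 * (z_i - h_row @ theta) * h_row

rng = np.random.default_rng(0)
theta_true = np.array([1.0, -0.5])
samples = []
for _ in range(6):
    h_row = rng.normal(size=2)
    samples.append((h_row, float(h_row @ theta_true)))

theta_b1 = sgd_epoch(toy_grad, np.zeros(2), samples, batch_size=1)
theta_b3 = sgd_epoch(toy_grad, np.zeros(2), samples, batch_size=3)
print(theta_b1, theta_b3)
```

A batch size of one yields six updates per pass over the six samples, while a batch size of three yields two, which is the accuracy and speed trade-off mentioned above.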

B. Unobservable State Estimation

1) General NNUSE Approach

Figure 4 shows the schematic diagram of the NNUSE module. The USE-net is an off-line trained 3-layer feed-forward NN that estimates the unobservable states based on the estimated observable states. It is challenging to formulate the correlations between the observable and unobservable states analytically. The USE-net can be trained to learn this correlation from the data. The training data of the off-line training of the USE-net combine Monte Carlo simulation data and recent historical data. The simulation data set a baseline for the network, and the recent historical data help the network capture the slow dynamics of the system.

Fig. 4 Schematic diagram of NNUSE.

Off-line training is unavoidably time-consuming. With a large amount of data and large input and output dimensions, the network loss can take hours to reach the convergence tolerance. Also, the intermediate status of an estimator cannot be applied to online estimation; only a well-trained estimator can. Hence, the USE-net is unable to capture system dynamics with time constants shorter than hours. The USE-net parameters can, however, be updated during online estimation to adapt to slow dynamics. For instance, to capture the system dynamics to a certain extent, the USE-net parameters can be refreshed every few hours by the off-line re-trained network. The training data used for the re-training process are collected dynamically. In this way, the estimation of the USE-net is expected to be more accurate.

2) Distance-based Feature Selection for Distributed NNUSE in Large-scale Systems

An intrinsic challenge associated with NNs is their scalability. As the dimensions of the inputs increase, both the weight matrices and bias vectors grow accordingly. Therefore, the processing time and memory required by the training and estimation computation grow rapidly with the dimensions of the inputs and outputs. Since the dimensions of the measurement vectors, observable states, and measurement model are fixed, the computation complexity of the LSE model does not have much room for improvement in terms of time consumption. However, the NNUSE approach introduced in Section III-B can be further improved through decomposition and parallelism techniques.

The changes of the states are caused by load condition variation. We observe that load profiles tend to be similar among adjacent nodes. Inspired by the K-nearest neighbor (KNN) algorithm, we propose the distributed-NNUSE architecture that decouples the estimation of unobservable states into parallel processes. Figure 5 illustrates this architecture, where the superscript indicates the index of unobservable nodes to be estimated, and the subscript indicates the index of the nearest neighbor of the target node.

Fig. 5 Distributed-NNUSE architecture.

The total number of unobservable nodes is $S$ . Each USE-net only estimates the states of one unobservable node with the input of states from its $N$ nearest observable nodes in terms of electrical distance. The number of input substations is a hyperparameter that needs to be fine-tuned. With this architecture, not only the dimension is reduced, but also the unobservable states can be estimated in parallel to achieve higher time efficiency.
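The distance-based feature selection can be sketched as a nearest-neighbor lookup on an electrical distance matrix. The 5-node distance matrix, the split into observable and unobservable nodes, and the function names are hypothetical; how the electrical distance is computed is not specified here, so it is treated as a given input. The neighbor count is called `k` in the code to avoid a clash with the total node count.

```python
import numpy as np

def nearest_observable(distance, target, observable, k):
    """Return the k observable nodes closest to `target`."""
    observable = np.asarray(observable)
    order = np.argsort(distance[target, observable], kind="stable")
    return observable[order[:k]].tolist()

def build_use_net_inputs(distance, unobservable, observable, k):
    """Map each unobservable node to the inputs of its own small USE-net."""
    return {s: nearest_observable(distance, s, observable, k)
            for s in unobservable}

# Hypothetical symmetric electrical distance matrix for 5 nodes.
distance = np.array([
    [0.0, 1.0, 4.0, 2.0, 7.0],
    [1.0, 0.0, 3.0, 5.0, 6.0],
    [4.0, 3.0, 0.0, 1.0, 2.0],
    [2.0, 5.0, 1.0, 0.0, 3.0],
    [7.0, 6.0, 2.0, 3.0, 0.0],
])

inputs = build_use_net_inputs(distance, unobservable=[2],
                              observable=[0, 1, 3, 4], k=2)
print(inputs)  # node 2 is fed by observable nodes 3 and 4
```

Because each unobservable node gets its own input list, the corresponding estimators are independent of one another and can be trained and evaluated in parallel.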

C. Multi-thread Estimator Training and Updating

As mentioned in Sections III-A and III-B, online estimation, BP, and the updating of LSE-net and USE-net are performed on individual threads. To coordinate them so that they work together and minimize the risk of interrupting online estimation, a multi-thread NN training and updating architecture is proposed. Figure 6 shows the proposed architecture, illustrating how it coordinates each module. The LSE-net is updated online at the PMU reporting frequency. The USE-net is updated on two independent threads: one thread, running at the SCADA data reporting rate, updates the USE-net in a similar way to the LSE-net, and the off-line training thread updates the USE-net every few hours for calibration purposes.

Fig. 6 Training and updating architecture of multi-thread estimator.

In NN training, BP consumes the majority of computation time, and this can be volatile. If the BP is included in the online state estimation, not only is the average time efficiency compromised, but it is also difficult to guarantee the upper bound of time consumption at each step. With this multi-thread architecture, the time consumption and the unpredictable part of each training are removed from the online estimation. The updating of LSE-net and USE-net is decoupled as well. Both NN-based estimators can update at their own frequencies without interrupting the online estimation.
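The decoupling of online estimation from parameter updating can be sketched with Python's standard `threading` module. This is only an illustration of the idea: a linear map stands in for LSE-net, the class and function names are hypothetical, and the actual architecture in Fig. 6 involves more threads than shown here.

```python
import threading
import numpy as np

class SharedEstimator:
    """Linear stand-in for LSE-net whose weights are swapped under a lock."""

    def __init__(self, weights):
        self._weights = weights
        self._lock = threading.Lock()
        self.version = 0

    def snapshot(self):
        with self._lock:
            return self._weights

    def estimate(self, z):
        # Only a reference is read under the lock; published arrays are
        # never modified in place, so estimation does not wait on training.
        return self.snapshot() @ z

    def publish(self, new_weights):
        with self._lock:
            self._weights = new_weights
            self.version += 1

def trainer(est, h, samples, eta=1e-3):
    """Background thread: gradient steps on 0.5 * ||z - H W z||^2."""
    w = est.snapshot()
    for z in samples:
        residual = z - h @ (w @ z)
        w = w + eta * np.outer(h.T @ residual, z)  # creates a new array
        est.publish(w)

rng = np.random.default_rng(0)
h = rng.normal(size=(6, 3))                        # hypothetical model
samples = [h @ rng.normal(size=3) for _ in range(20)]

est = SharedEstimator(np.zeros((3, 6)))
worker = threading.Thread(target=trainer, args=(est, h, samples))
worker.start()
estimates = [est.estimate(z) for z in samples]     # online estimation loop
worker.join()
print(est.version, len(estimates))
```

The estimation loop always uses the most recently published weights, so a slow or irregular training step delays only the next parameter refresh and not the estimate itself.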

IV. Case Study

The proposed NNSE method is tested in two systems. The IEEE 118-bus system is used in the first case study to show the comprehensive performance in large-scale systems in terms of estimation accuracy, time efficiency, and robustness against noise. LSE is applied to this system as the benchmark. The data are generated in MATPOWER, a power system analysis toolbox run in MATLAB [23]. The PMU data reporting frequency is set to 50 Hz.

To justify the performance of the proposed method in real-world applications, we applied NNSE to a practical system using real PMU data in the second case. This test system is the high-voltage transmission system of the Jiangsu power grid in China. Both PMU data and SCADA results are collected and stored in time series. NNSE and LSE are performed based on PMU data, while SCADA acts as the reference to check the accuracy when their time stamps overlap. Note that the data are not collected in real time, but are read from a database in time-series order so that the proposed method acts as an online estimation in the simulation. The PMU reporting frequency in the Jiangsu system is 25 Hz.

A. IEEE 118-bus Test Case

Figure 7 shows the topology of the IEEE 118-bus system. 54 PMUs are placed at buses 2, 5, 9, 11, 12, 17, 21, 24, 25, 28, 34, 37, 40, 45, 49, 52, 56, 62, 63, 68, 73, 75, 77, 80, 85, 86, 90, 94, 101, 105, 110, and 114 to measure the voltage phasors and the current phasors on the 151 lines. There are 5 unobservable buses under such PMU placement, i.e., buses 21, 22, 44, 52, and 95. The simulation duration is a 1-second steady-state window, and the learning rate of LSE-net is set to 0.01. The batch size of SGD for the LSE-net update is 1. The measurement noise follows a normal distribution $N(0, 0.03^{2})$.

Fig. 7 IEEE 118-bus system topology.

1) Convergence

Figure 8 indicates that the loss of the LSE-net estimator and the root mean square error (RMSE) of estimation error on the observable states are well converged. The estimation error without USE-net converges only to 0.15 due to the non-convergence of the unobservable states. With the help of the USE-net, the total estimation error is greatly reduced and close to the convergence of the observable states. The convergence takes approximately 1 s, which equals 50 iterations.

Fig. 8 Convergence of network loss and estimation error. (a) NN loss. (b) Estimation error.

2) Time Consumption

One of the motivations of the proposed method is to improve the computation efficiency. The time consumption of the proposed method is compared with several versions of LSE, each solved with a different matrix handling algorithm. The base method, labeled as LSE-PI, computes the pseudo inverse of matrix $H$ using the Moore-Penrose algorithm [24]. The improved method using the Cholesky decomposition is labeled as LSE-LD [25]. Due to the sparsity of matrix $H$, the computation complexity can be further reduced with the Cholesky decomposition for sparse matrices, and this method is labeled as LSE-LDS [26]. To make a fair comparison, the USE-net is not included because the LSE-based methods are unable to estimate the unobservable states. The comparison of the step-wise time consumption curves of these four methods is shown in Fig. 9, where LSE-PI, LSE-LD, and LSE-LDS are referred to as pseudo inverse, left division, and left division (sparse), respectively. The pseudo inverse and left division methods are the two slowest. The left division (sparse) method is faster than the LSE-net in the first few steps, but slower for the rest of the time steps. The average time consumptions of pseudo inverse, left division, left division (sparse), and LSE-net are 8.2 ms, 2.1 ms, 0.52 ms, and 0.22 ms, respectively. Although the left division (sparse) method is fast, the LSE-net still outperforms it, taking less than half of its time. The LSE-net time is only 10.48% of that of the left division method and 2.68% of that of the pseudo inverse method.

Fig. 9 Step-wise time consumption comparison.
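The difference between the pseudo inverse and Cholesky-based solutions, together with the reported time ratios, can be sketched as follows. The random model is hypothetical, unit weights are assumed, and a generic dense solver is used in place of dedicated triangular or sparse routines, so this sketch says nothing about the timings in Fig. 9. The millisecond values are the averages quoted above.

```python
import numpy as np

def solve_pinv(h, z):
    """LSE-PI analogue: Moore-Penrose pseudo inverse of H."""
    return np.linalg.pinv(h) @ z

def solve_cholesky(h, z):
    """LSE-LD analogue: Cholesky factorization of the gain matrix H^T H."""
    gain = h.T @ h
    low = np.linalg.cholesky(gain)           # gain = low @ low.T
    y = np.linalg.solve(low, h.T @ z)        # forward substitution step
    return np.linalg.solve(low.T, y)         # backward substitution step

rng = np.random.default_rng(0)
h = rng.normal(size=(12, 5))                 # hypothetical full-rank model
z = h @ rng.normal(size=5) + rng.normal(scale=0.01, size=12)
x_pi = solve_pinv(h, z)
x_ld = solve_cholesky(h, z)

# Average step times reported in the text, in milliseconds.
avg_ms = {"pseudo inverse": 8.2, "left division": 2.1,
          "left division (sparse)": 0.52, "LSE-net": 0.22}
ratios = {name: round(100.0 * avg_ms["LSE-net"] / t, 2)
          for name, t in avg_ms.items() if name != "LSE-net"}
print(np.allclose(x_pi, x_ld), ratios)
```

Both solvers return the same least-squares estimate; they differ only in how the linear algebra is carried out, which is where the time difference among the LSE variants comes from.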

3) Robustness Against Noise

It is indisputable that LSE is the “optimal” state estimation solution in theory. However, measurement noise is unavoidable in real-world measurements. Therefore, the robustness against noise is important for online state estimation. In order to test the robustness of the proposed method, the estimation error and standard deviation (STD) are compared with LSE results at five different noise levels. To separate the impact from the warm-up stage of the LSE-net, the estimator, in this case, is pre-trained to a suboptimal solution. The comparison of the estimation performance is summarized in Table I and Fig. 10. The estimation error of NNSE is less sensitive to noise than LSE.

TABLE I Estimation Error Against Noise on All Measurements

Noise level	LSE RMSE	LSE STD	NNLSE RMSE	NNLSE STD
No noise	$3.13 \times 10^{- 15}$	$3.16 \times 10^{- 30}$	$2.28 \times 10^{- 6}$	$1.25 \times 10^{- 5}$
0.001	$6.88 \times 10^{- 5}$	$2.22 \times 10^{- 5}$	$1.86 \times 10^{- 5}$	$1.09 \times 10^{- 5}$
0.010	$7.00 \times 10^{- 4}$	$2.00 \times 10^{- 4}$	$2.00 \times 10^{- 4}$	$1.00 \times 10^{- 4}$
0.030	$2.10 \times 10^{- 3}$	$7.00 \times 10^{- 4}$	$5.00 \times 10^{- 4}$	$2.00 \times 10^{- 4}$
0.050	$3.50 \times 10^{- 3}$	$1.10 \times 10^{- 3}$	$9.00 \times 10^{- 4}$	$3.00 \times 10^{- 4}$

Fig. 10 Comparison of estimation error against different noise levels. (a) Comparison of RMSE. (b) Comparison of STD.
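A noise sweep of the kind behind the LSE columns of Table I can be sketched as below. Several points are assumptions rather than details from the paper: the RMSE is computed per step and then averaged, the STD is taken over the step-wise errors, unit weights are used, and the model and state are random placeholders instead of the IEEE 118-bus system. The sketch therefore does not reproduce the table values.

```python
import numpy as np

def rmse(x_hat, x_true):
    """Root mean square error between an estimate and the true state."""
    return float(np.sqrt(np.mean((x_hat - x_true) ** 2)))

def lse_error_stats(h, x_true, sigma, n_steps=50, seed=0):
    """Mean and STD of the step-wise LSE error at one noise level."""
    rng = np.random.default_rng(seed)
    h_pinv = np.linalg.pinv(h)
    errors = []
    for _ in range(n_steps):
        noise = rng.normal(0.0, sigma, size=h.shape[0])
        z = h @ x_true + noise               # noisy measurement vector
        errors.append(rmse(h_pinv @ z, x_true))
    return float(np.mean(errors)), float(np.std(errors))

rng = np.random.default_rng(1)
h = rng.normal(size=(20, 6))                 # hypothetical model
x_true = rng.normal(size=6)

levels = [0.001, 0.010, 0.030, 0.050]        # noise levels used in Table I
stats = {s: lse_error_stats(h, x_true, s) for s in levels}
for s in levels:
    print(s, stats[s])
```

Because LSE is a linear map of the measurements, its error in this sketch scales in proportion to the noise level, which mirrors the roughly linear growth of the LSE columns in Table I.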

Examples of step-wise estimation error curves are given in Fig. 11, where the noise level is 0.01. It shows that the estimation error of NNLSE is less than LSE at every step. The estimation results in polar coordinates are given in Fig. 12. The estimation of LSE contains a larger noise component than the NNLSE estimation.

Fig. 11 Estimation error trajectories at 0.01 noise level.

Fig. 12 Estimation of bus 19. (a) Estimation of voltage magnitude. (b) Estimation of voltage angle.

The leverage measurements have a higher impact on the performance of LSE, and therefore they are more likely to be attacked [27], [28]. Hence, the robustness against errors on the leverage measurements is important. In this test, we apply different noise magnitudes to the measurements. The noise magnitude added to the non-leverage measurements is 0.001, whereas that of the leverage measurements varies from 0.001 to 0.05. The results are summarized in Table II. NNLSE has a lower standard deviation than LSE at every noise level and a lower RMSE at all levels except 0.05, where its RMSE is marginally higher. It is therefore considered more robust against noise on the leverage measurements.

TABLE II Estimation Error Against Noise on Leverage Measurements

Noise level	LSE RMSE	LSE STD	NNLSE RMSE	NNLSE STD
0.001	$6.88 \times 10^{- 5}$	$2.22 \times 10^{- 5}$	$1.86 \times 10^{- 5}$	$1.09 \times 10^{- 5}$
0.010	$6.20 \times 10^{- 3}$	$6.00 \times 10^{- 4}$	$6.10 \times 10^{- 3}$	$5.00 \times 10^{- 4}$
0.030	$2.00 \times 10^{- 2}$	$1.80 \times 10^{- 3}$	$1.97 \times 10^{- 2}$	$1.50 \times 10^{- 3}$
0.050	$3.34 \times 10^{- 2}$	$3.00 \times 10^{- 3}$	$3.40 \times 10^{- 2}$	$2.40 \times 10^{- 3}$

4) Response to State Changes

In real-time power system operation, the states are always changing due to the volatility of load and generation. The state estimation algorithms are expected to capture the relatively slow dynamics of state changes. In this case, a load ramp-down scenario is designed to test the estimation performance under state changes. The average ramp-down rate is 1% per second, which is a steep change for transmission-level power systems. The average load factor decreases from 1 to 0.99 in the 1-second interval from the $3^{rd}$ to the $4^{th}$ second. However, the load and generation are not adjusted homogeneously: a random volatility factor is applied to each load and generator to make the scenario more realistic. The measurement noise level applied is 0.01. Therefore, the actual states fluctuate during the ramp-down transient.

The average estimation error and the corresponding STD during the 1-second ramp-down window are summarized in Table III. The results suggest that the NN-based estimator has a lower estimation error during the transient, because it tracks the dynamics of the load changes while remaining robust against noise.

TABLE III Estimation Accuracy Under Ramp State Changes

Method	RMSE	STD
LSE	0.0008	0.0002
NNLSE	0.0003	0.0001

The step-wise estimation error and the load profile curves are shown in Fig. 13. The NNLSE estimation error during the transient is higher than the steady-state results but still lower than the LSE results.

Fig. 13 Step-wise estimation error under ramp-down transient.

Figure 14 shows the estimation of bus 19 during the transient. The voltage angle deviates from the original steady state as the load ramps down, while the voltage magnitude is barely affected.

Fig. 14 Estimation of bus 19 in ramp case. (a) Estimation of voltage magnitude. (b) Estimation of voltage angle.

5) Response to Topology Change

Breaker status changes and line trips occur from time to time in power system operation. It is important that state estimation algorithms are capable of adapting to topology changes. This test case compares the performance of LSE and NNLSE in a topology change scenario, where the line from bus 3 to bus 5 is opened at the $3^{rd}$ second. The dimensions of the state vector and measurement vector are the same before and after the topology change; thus the dimensions of the NN-based estimator are compatible with the new measurement model. The fast transient of opening the breaker is neglected, and the system topology is known before and after the change.

Figure 15 shows the trajectories of the estimation from LSE and NNLSE. The actual states under equilibrium before and after the topology change are denoted by the dashed lines. The impact of the topology change on voltage magnitude is insignificant, while the change in voltage angle is more visible. When the topology changes, LSE responds instantly because its solution is non-iterative, whereas the parameters of the NN-based estimator need to be updated through iterations. The convergence of NNLSE to the new equilibrium point takes approximately 0.1 s (5 steps), which is short enough to be neglected for static state estimation. The estimation accuracy of NNLSE is compromised during this updating period, but NNLSE still outperforms LSE because of its better robustness against noise, as discussed in Section IV-A-3).

Fig. 15 Estimation of bus 19 in topology change case. (a) Estimation of voltage magnitude. (b) Estimation of voltage angle.

6) Data Transmission Failure Test

Data transmission failure is unavoidable in practice, and includes missing data, package inversion, and data displacement within a package. These three types of data failure are considered in this test to validate the performance of the proposed method. The load condition considered here is a slow ramp decreasing from 1.0 to 0.99 within the 10-second time window, so that the effect of the failed data can be distinguished from that of the correct data. The length of a data package under consideration is 1 s, which corresponds to 50 data points. The estimation results in the three scenarios are summarized in Table IV.

TABLE IV Estimation Error in Data Transmission Failure Scenarios

Scenario	LSE RMSE	LSE STD	NNLSE RMSE	NNLSE STD
Missing	$7.54 \times 10^{- 4}$	$2.06 \times 10^{- 4}$	$2.74 \times 10^{- 4}$	$6.73 \times 10^{- 5}$
Inversion	$7.33 \times 10^{- 4}$	$2.08 \times 10^{- 4}$	$2.31 \times 10^{- 4}$	$7.15 \times 10^{- 5}$
Inter-displacement	$7.32 \times 10^{- 4}$	$2.39 \times 10^{- 4}$	$2.08 \times 10^{- 4}$	$7.87 \times 10^{- 5}$
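
The three failure types can be injected into a measurement stream as sketched below. This is an illustration under assumptions rather than the authors' procedure: the paper does not define the failures precisely, so missing data is modeled as NaN samples, inversion as reversing the sample order inside one package, and displacement as a circular shift inside one package. All function names are hypothetical.

```python
import numpy as np

PACKAGE_LEN = 50  # 1 s of data, i.e. 50 data points, as stated in the text

def _package_slice(k):
    """Row slice covering package k of a (time, measurement) array."""
    return slice(k * PACKAGE_LEN, (k + 1) * PACKAGE_LEN)

def drop_package(z, k):
    """Missing data (assumed model): mark every sample of package k as NaN."""
    out = z.astype(float).copy()
    out[_package_slice(k)] = np.nan
    return out

def invert_package(z, k):
    """Package inversion (assumed meaning): reverse sample order in package k."""
    out = z.copy()
    out[_package_slice(k)] = z[_package_slice(k)][::-1]
    return out

def displace_within_package(z, k, shift):
    """Displacement (assumed meaning): circularly shift samples in package k."""
    out = z.copy()
    out[_package_slice(k)] = np.roll(z[_package_slice(k)], shift, axis=0)
    return out

# 10-second window at 50 samples per second, two toy measurement channels.
z = np.arange(500 * 2, dtype=float).reshape(500, 2)
missing = drop_package(z, 3)
inverted = invert_package(z, 3)
displaced = displace_within_package(z, 3, shift=5)
```

Each function leaves the other packages untouched, so the corrupted stream can be fed to either estimator and compared against the clean one.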

B. Jiangsu Power Grid

The Jiangsu power grid has the second highest provincial energy consumption in China. The power grid also has four HVDC terminals, which receive power from Shanxi, Sichuan, Hubei, and Inner Mongolia. The Jiangsu power grid consists of thousands of nodes at the transmission level. More importantly, it has the largest number of installed PMUs among provincial power grids in China. Its large scale and complexity, high requirement for system stability, and need for good observability, combined with its extensive PMU coverage, make the Jiangsu power grid suitable for LSE and NNSE study.

This numerical experiment covers the high-voltage (220 kV and above) transmission system of the Jiangsu power grid, which includes 763 substations of 230 kV, 525 kV, and 1000 kV. The numbers of substations of 1000 kV, 525 kV, and 230 kV are 4, 103, and 656, respectively. One hundred and thirty-two substations are equipped with high-quality and reliable PMUs, which makes 235 substations nominally observable. The states of the observable substations are estimated by NNLSE. The remaining 528 nominally unobservable substations are estimated via NNUSE. The SCADA results, which offer observability of the entire system, are collected to validate the estimation accuracy. The SCADA results used are essentially the state estimation solutions from the energy management system (EMS), which are generally considered to be accurate. We use the SCADA-based LSE results as the reference to validate the accuracy of the proposed NNSE method.

As discussed in Section III-B-2), applying NNUSE on all unobservable states together causes a scalability issue. Therefore, we implement a distributed NNUSE architecture to reduce training and estimation time through parallelism. First, a sensitivity study on the hyperparameter of the input dimension is performed. Table V shows the average error and time consumption of NNUSE with inputs of 1, 5, 10, 20, and 50 bus states. The average estimation time increases with the number of inputs, from 8.1 ms with 1 input to 9.1 ms with 50 inputs. In terms of estimation accuracy, 5-input estimation yields the lowest error. 1-input estimation is less accurate because the number of features is too small to approximate the behavior of the target substation. With too many inputs, however, the network could be misled by irrelevant features, which is why the errors are higher in the scenarios with 10 or more inputs. This sensitivity study serves to fine-tune the hyperparameter of the input number. Based on the results, we use 5-input NNUSE for the following tests.

TABLE V NNUSE Sensitivity to Number of Inputs

Number of inputs	Estimation error	Time (ms)
1	0.0567	8.1
5	0.0455	8.1
10	0.0588	8.3
20	0.0586	8.4
50	0.0645	9.1
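
The selection rule implied by the sensitivity study can be written down directly. The snippet below is a minimal sketch that takes the values from Table V and picks the input count with the lowest error, breaking ties by the lower time. The tie-breaking rule and the names are assumptions made for illustration.

```python
# Table V as reported: number of inputs -> (estimation error, time in ms).
table_v = {
    1: (0.0567, 8.1),
    5: (0.0455, 8.1),
    10: (0.0588, 8.3),
    20: (0.0586, 8.4),
    50: (0.0645, 9.1),
}

def select_input_count(table):
    """Pick the input count with the lowest error; break ties by lower time."""
    return min(table, key=lambda n: table[n])

best_n = select_input_count(table_v)
best_error, best_time = table_v[best_n]
print(f"selected {best_n} inputs: error {best_error}, time {best_time} ms")
```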

A comprehensive NNSE is performed on the Jiangsu power grid based on a 5-input NNUSE, and a comparison with LSE is summarized in Table VI. The RMSE and standard deviation of unobservable and overall states are not applicable for LSE and NNLSE because they can only estimate the nominally observable states. For the same reason, NNUSE does not have results for observable and overall states. Since NNSE is a combination of NNLSE and NNUSE, it has only the results for overall estimation error and standard deviation.

Both the estimation error and the standard deviation of NNLSE are lower than those of LSE. Therefore, NNLSE has higher estimation accuracy in this numerical experiment. For the unobservable states, there are no comparable results from LSE. The estimation error of NNUSE is still of the same order of magnitude as the observable state estimation error of LSE and NNLSE, although the standard deviation of the NNUSE estimation errors is higher than that of LSE (0.0120 versus 0.0010). Since NNUSE extends the estimation beyond the nominal observability, this accuracy is considered acceptable. The overall estimation error is the average of NNLSE and NNUSE, and its accuracy is also acceptable.

In terms of time consumption, the NN-based methods are considerably faster than LSE. The total time consumed by NNLSE and NNUSE together is 57.5% lower than that of LSE, while the estimation is expanded to the entire system. This analysis suggests that the proposed NNSE offers improved time efficiency, competitive accuracy, and broader observability compared with the traditional LSE method in a large-scale power system.

TABLE VI 5-input NNSE Performance Breakdown

Method	Time (ms)	Observable RMSE	Observable STD	Unobservable RMSE	Unobservable STD	Overall RMSE	Overall STD
LSE	22.10	0.0222	0.0010	N/A	N/A	N/A	N/A
NNLSE	1.30	0.0209	0.0009	N/A	N/A	N/A	N/A
NNUSE	8.10	N/A	N/A	0.0567	0.0120	N/A	N/A
NNSE	9.40	N/A	N/A	N/A	N/A	0.0486	0.0010
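
As a check on the timing figures in Table VI, the NNSE time is the sum of the NNLSE and NNUSE times, and the relative saving over LSE follows directly. The symbols $t_{\mathrm{LSE}}$, $t_{\mathrm{NNLSE}}$, $t_{\mathrm{NNUSE}}$, and $t_{\mathrm{NNSE}}$ are introduced here for illustration only:

$$t_{\mathrm{NNSE}} = t_{\mathrm{NNLSE}} + t_{\mathrm{NNUSE}} = 1.30 + 8.10 = 9.40\ \mathrm{ms}, \qquad \frac{t_{\mathrm{LSE}} - t_{\mathrm{NNSE}}}{t_{\mathrm{LSE}}} = \frac{22.10 - 9.40}{22.10} \approx 0.575$$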

V. Discussion

The simulation results above demonstrate the feasibility and advantages of the proposed NNSE method. The NNLSE for the observable state estimation is tested in the IEEE 118-bus system. Its convergence is fast (less than 1 s) and its time consumption is approximately 50% lower than that of the conventional LSE method. The reliability of NNLSE is also examined in various scenarios, including a noise sensitivity study on measurements, especially the leverage measurements, load condition change scenarios, topology change scenarios, and data transmission failure scenarios. The estimation error and standard deviation of NNLSE are lower than those of the traditional LSE method in almost all testing scenarios, indicating that it has better robustness and reliability than LSE in online estimation. The NNUSE method is mainly tested in the large-scale Jiangsu power grid. The optimal hyperparameter of the NNUSE input number is selected based on a sensitivity study. The resulting estimation error of the unobservable states, though higher than that of the observable states, is low enough to provide some insight into the operation conditions of the traditionally unobservable nodes. The time consumption of the entire NNSE method is less than 10 ms, so running the observable and unobservable state estimation at the PMU data reporting frequency (60 Hz and below) is well supported.
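
The timing claim can be checked with a short calculation. At a reporting frequency of 60 Hz, the interval between consecutive PMU samples is longer than the NNSE time of 9.40 ms reported in Table VI (the symbols $T_{\mathrm{PMU}}$ and $t_{\mathrm{NNSE}}$ are introduced here for illustration only):

$$T_{\mathrm{PMU}} = \frac{1}{60\ \mathrm{Hz}} \approx 16.7\ \mathrm{ms}, \qquad T_{\mathrm{PMU}} - t_{\mathrm{NNSE}} \approx 16.7 - 9.40 = 7.3\ \mathrm{ms}$$

Lower reporting frequencies give longer intervals and therefore a larger margin.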

VI. Conclusion

The proposed NNSE method provides a novel solution to online state estimation in large-scale power systems. The NN-based estimator achieves online state estimation at a higher frequency than traditional methods. The proposed method also shows its superiority in robustness against noise. The unobservable states are estimated using a data-driven approach. The multi-thread updating architecture improves the stability and time efficiency of the online estimation process. The proposed method is tested on the IEEE 118-bus system (small scale, with simulation results) and the Jiangsu power grid (large scale, with real PMU/SCADA data). The convergence, accuracy, time efficiency, and robustness of the proposed method have been validated through numerical experiments. Although this method improves the online state estimation performance, further improvements can be made in the future. In particular, the design of the unobservable state estimator could be improved. As the USE-net is trained off-line, there is great potential to explore more sophisticated models such as a recurrent neural network (RNN), which can take temporal information into account to further improve the accuracy.

References

[1] N. K. Saxena, “Voltage control by optimized participation of reactive power compensation using fixed capacitor and STATCOM,” in Optimization of Power System Problems, 1st ed. Cham, Switzerland: Springer, 2020, pp. 313-364.

[2] A. Abur and A. G. Exposito, “Network observability analysis,” in Power System State Estimation, 1st ed. New York: Marcel Dekker, 2004, pp. 74-113.

[3] K. R. Mestav, J. Luengo-Rozas, and L. Tong, “Bayesian state estimation for unobservable distribution systems via deep learning,” IEEE Transactions on Power Systems, vol. 34, no. 6, pp. 4910-4920, May 2019.

[4] A. Monticelli, “Real-time modeling of power networks,” in State Estimation in Electric Power Systems, 1st ed. Cham, Switzerland: Springer, 1999, pp. 1-13.

[5] J. R. Gracia, M. A. Young, D. T. Rizy et al. (2016, Mar.). Advancement of synchrophasor technology. [Online]. Available: https://www.smartgrid.gov/document/Synchrophasor_Report_201603.html

[6] L. Zhang, A. Bose, A. Jampala et al., “Design, testing, and implementation of a linear state estimator in a real power system,” IEEE Transactions on Smart Grid, vol. 8, no. 4, pp. 1782-1789, Jan. 2016.

[7] R. Raz, “On the complexity of matrix product,” in Proceedings of the Thirty-fourth Annual ACM Symposium on Theory of Computing, Montreal, Canada, May 2002, pp. 144-151.

[8] E. Klarreich, “Multiplication hits the speed limit,” Communications of the ACM, vol. 63, no. 1, pp. 11-13, Dec. 2019.

[9] A. Phadke, J. Thorp, R. Nuqui et al., “Recent developments in state estimation with phasor measurements,” in Proceedings of 2009 IEEE/PES Power Systems Conference and Exposition, Seattle, USA, Mar. 2009, pp. 1-7.

[10] M. Netto and L. Mili, “Robust data filtering for estimating electromechanical modes of oscillation via the multichannel Prony method,” IEEE Transactions on Power Systems, vol. 33, no. 4, pp. 4134-4143, Nov. 2017.

[11] J. Zhu and A. Abur, “Improvements in network parameter error identification via synchronized phasors,” IEEE Transactions on Power Systems, vol. 25, no. 1, pp. 44-50, Aug. 2009.

[12] J. Chen and A. Abur, “Placement of PMUs to enable bad data detection in state estimation,” IEEE Transactions on Power Systems, vol. 21, no. 4, pp. 1608-1615, Oct. 2006.

[13] L. Zhang, G. Wang, and G. B. Giannakis, “Real-time power system state estimation and forecasting via deep unrolled neural networks,” IEEE Transactions on Signal Processing, vol. 67, no. 15, pp. 4069-4077, Jul. 2019.

[14] E. Manitsas, R. Singh, B. C. Pal et al., “Distribution system state estimation using an artificial neural network approach for pseudo measurement modeling,” IEEE Transactions on Power Systems, vol. 27, no. 4, pp. 1888-1896, Apr. 2012.

[15] G. Tian, Q. Zhou, R. Birari et al., “A hybrid-learning algorithm for online dynamic state estimation in multimachine power systems,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 12, pp. 5497-5508, Dec. 2020.

[16] L. Wang, Q. Zhou, and S. Jin, “Physics-guided deep learning for power system state estimation,” Journal of Modern Power Systems and Clean Energy, vol. 8, no. 4, pp. 607-615, Jun. 2020.

[17] R. H. Park, “Two-reaction theory of synchronous machines generalized method of analysis: Part I,” Transactions of the American Institute of Electrical Engineers, vol. 48, no. 3, pp. 716-727, Jul. 1929.

[18] A. G. Phadke and J. S. Thorp, “Phasor estimation of nominal frequency inputs,” in Synchronized Phasor Measurements and Their Applications, 1st ed. Cham, Switzerland: Springer, 2008, pp. 29-48.

[19] E. H. Moore, “On the reciprocal of the general algebraic matrix,” Bulletin of the American Mathematical Society, vol. 26, pp. 394-395, Jun. 1920.

[20] G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of Control, Signals and Systems, vol. 2, no. 4, pp. 303-314, Dec. 1989.

[21] B. Karlik and A. V. Olgac, “Performance analysis of various activation functions in generalized MLP architectures of neural networks,” International Journal of Artificial Intelligence and Expert Systems, vol. 1, no. 4, pp. 111-122, Feb. 2011.

[22] O. Shamir and T. Zhang, “Stochastic gradient descent for non-smooth optimization: convergence results and optimal averaging schemes,” in Proceedings of International Conference on Machine Learning, Atlanta, USA, Feb. 2013, pp. 71-79.

[23] MATPOWER: A MATLAB Power System Simulation Package, 1st ed., Power Systems Engineering Research Center, Ithaca, USA, 1997, pp. 1-10.

[24] R. Penrose, “A generalized inverse for matrices,” Mathematical Proceedings of the Cambridge Philosophical Society, vol. 51, pp. 406-413, Jul. 1955.

[25] K. Tanabe and M. Sagae, “An exact Cholesky decomposition and the generalized inverse of the variance–covariance matrix of the multinomial distribution, with applications,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 54, no. 1, pp. 211-219, Sept. 1992.

[26] T. A. Davis and W. W. Hager, “Modifying a sparse Cholesky factorization,” SIAM Journal on Matrix Analysis and Applications, vol. 20, no. 3, pp. 606-627, Mar. 1999.

[27] A. Abur, F. Magnago, and F. Alvarado, “Elimination of leverage measurements via matrix stretching,” International Journal of Electrical Power & Energy Systems, vol. 19, no. 8, pp. 557-562, Nov. 1997.

[28] A. Majumdar and B. C. Pal, “Bad data detection in the context of leverage point attacks in modern power networks,” IEEE Transactions on Smart Grid, vol. 9, no. 3, pp. 2042-2054, Oct. 2016.

