Harmonic Data Recovery Method Based on Multivariate Norm Matrix

Ying Wang; Rui Xu; Shiqi Song; Xiaoyang Ma; Huaying Zhang; Xian Wu

网刊加载中。。。

使用Chrome浏览器效果最佳，继续浏览，你可能不会看到最佳的展示效果，

确定继续浏览么?

复制成功，请在其他浏览器进行阅读

Harmonic Data Recovery Method Based on Multivariate Norm Matrix PDF

- ORCID：
Ying Wang
✉
- ORCID：
Rui Xu
✉
- ORCID：
Shiqi Song
✉
- ORCID：
Xiaoyang Ma
✉
- ORCID：
Huaying Zhang
✉
- ORCID：
Xian Wu
✉

Sichuan University, Chengdu 610065, China； the Joint Laboratory for High Quality Power Supply in New Smart Cities of Southern Power Grid Company, Shenzhen 518020, China

Updated：2023-09-20

DOI：10.35833/MPCE.2022.000582

OUTLINE

Abstract

During state perception of a power system, fragments of harmonic data are inevitably lost owing to the loss of synchronization signals, transmission delays, instrument failures, or other factors. A harmonic data recovery method is proposed based on multivariate norm matrix in this paper. The proposed method involves dynamic time warping for correlation analysis of harmonic data, normalized cuts for correlation clustering of power-quality monitoring devices, and adaptive alternating direction method of multipliers for multivariable norm joint optimization. Compared with existing data recovery methods, our proposed method maintains excellent recovery accuracy without requiring prior information or optimization of the power-quality monitoring device. Simulation results on the IEEE 39-bus and IEEE 118-bus test systems demonstrate the low computational complexity of the proposed method and its robustness against noise. In addition, the application of the proposed method to field data from a real-world system provides consistent results with those obtained from simulations.

Keywords

Multivariate norm matrix; harmonic data; data recovery; power-quality monitoring device

I. Introduction

STATE perception is essential for the correct interconnection of all the components in a power system to guarantee safe and stable operation [

1]. Power-quality monitoring and wide-area measurement systems can provide representative data for state perception owing to their real-time operation, synchronization, and accuracy [2]. In those systems, power-quality monitoring devices (PQMDs) and phasor measurement units are critical components [3]. However, the monitoring and measurement systems inevitably suffer from data loss owing to transmission delays, instrument failures, or other factors [4], [5]. In China as well as in North America, the percentage of bad data exceeds 10% [6]. The quality of monitoring data affects the accuracy of applications such as state estimation [7], interference detection [8], and fault detection [9].

Missing data are generally represented with zeroes. In terms of recovery hierarchy, existing data recovery methods can be classified into local and distributed recovery methods. Local recovery is only applicable to historical data through the methods such as interpolation [

10]-[12]. In [10], a data interpolation algorithm was proposed by combining Kalman filtering, smoothing, and quadratic prediction. In [11], categories of data loss were analyzed, and missing data were recovered using Lagrangian interpolation. However, these algorithms are only suitable for mild data loss. As the volume of missing data increases, the error rate of such algorithms increases rapidly. To prevent this problem, improved cubic spline interpolation based on a priority assignment strategy was proposed in [12]. Local recovery relies only on historical data, without requiring parameters of the target power system. However, the limited information impedes recovering data with prolonged continuous losses, notwithstanding its computational simplicity. Moreover, the recovery performance depends on the data quality and noise.

Distributed recovery applies multi-bus data to recover missing data mainly using machine learning [

13]-[22] or low-rank matrix methods [23]-[32]. Machine learning methods such as artificial neural networks and integrated learning, have been widely used in recovering missing data. In [13], selected-pass recurrent neural networks have been used to predict the state of a system for addressing data loss. In [14] and [21], learning networks have been trained such that the data generated from the networks had the same distribution characteristics as the original data, thereby recovering missing data. The combination of graph theory [17]-[19] and deep learning algorithms has allowed embedding power system topologies into neural networks, and available data have been used to recover missing data [20]. Machine learning methods have high recovery performance but require extensive online or offline training [31]. Moreover, the generalization capability of trained models is usually low, thus failing to suitably handle diverse scenarios. Alternatively, low-rank matrix methods assume that the buses of a system are related by the system topology and that the measurements are sampled synchronously. Therefore, there is a high correlation (i.e., low-ranking property) in a high-dimensional data matrix composed of measurements from each bus. This property has been exploited in image analysis [23], collaborative filtering [24], and system identification [25], [26]. Similarly, missing data recovery was formulated as a low-rank matrix complementation problem in [7]. A low-rank matrix recovery algorithm was used for the first time in [7] to recover lost data from a communication blockage. Data recovery was accomplished by convex optimization of low-rank matrices [20]. In [27], the singular-value threshold algorithm was used to recover missing data. Based on this algorithm, online data recovery applicable to prolonged continuous data loss was proposed in [28], [32]. In addition, robust principal component analysis (PCA) was proposed in [29] for data recovery, but anomalous data tended to affect accuracy. To address this problem, kernel PCA was proposed in [30] to remove anomalous data before recovering missing data. In [31], measurements were represented as low-rank Hankel matrices, and their low-rank properties were verified. A mixture of dynamic and static metrics was used to accurately estimate missing data in [32]. The low-rank matrix methods transform complex system topological connections into the low-rank property of a data matrix. The main disadvantage of matrix methods is that they fail to deal with high-order data characteristics. Specifically, measurements across spatiotemporal dimensions increase the data complexity within the constructed matrix. If all the original data in a region are used directly for matrix completion, the curse of dimensionality may occur or the convergence of the method may be slow or even impossible.

To address the above-mentioned problems, we propose a harmonic data recovery (HDR) method based on multivariate norm matrix. The main contributions of this paper are summarized as follows.

1) Based on the dynamic time warping (DTW) and normalized cut algorithms, the proposed method can select appropriate PQMDs when the system topology information is unknown or the parameters change, thereby improving applicability to large-scale power systems.

2) An adaptive penalty factor reduces the number of iterations and ensures convergence stability for data recovery.

3) Pollution matrices (for noise and sparse outliers) are introduced to extract the features from original data and thus improve the anti-interference ability.

The remainder of this paper is organized as follows. Section II introduces clustering steps of PQMDs based on harmonic data analysis. Section III details the proposed HDR method for PQMDs based on multivariate norm matrix. Simulation analysis and verification on real data are reported in Section IV. Section V summarizes the paper.

II. Harmonic Data Analysis of PQMDs

Harmonic data from different PQMDs are sequences of harmonic measurements acquired at different physical locations throughout a power system. Thus, PQMD data at similar locations tend to have similar dynamic properties.

A. Low Rank of Harmonic Data

Before analyzing data, harmonic data with similar dynamic properties can be represented as a harmonic data matrix:

X = [\begin{matrix} x_{1,1} & x_{1,2} & \dots & x_{1, i} & \dots & x_{1, n} \\ x_{2,1} & x_{2,2} & \dots & x_{2, i} & \dots & x_{2, n} \\ ⋮ & ⋮ & ⋮ & ⋮ \\ x_{j, 1} & x_{j, 2} & \dots & x_{j, i} & \dots & x_{j, n} \\ ⋮ & ⋮ & ⋮ & ⋮ \\ x_{m, 1} & x_{m, 2} & \dots & x_{m, i} & \dots & x_{m, n} \end{matrix}]

(1)

where n is the number of PQMDs with similar dynamic properties; m is the number of measurements from a PQMD; and $x_{j, i}$ is a harmonic datapoint from the $i^{t h}$ monitoring device at instant j.

We perform singular value thresholding decomposition on X:

X = U S V^{Τ}

(2)

where U and V are the orthogonal matrices; and S is a diagonal matrix that can be expressed as:

\{\begin{array}{l} S = [\begin{matrix} s_{1} & 0 \\ 0 & 0 \end{matrix}] \\ s_{1} = d i a g {s_{1}, s_{2}, . . ., s_{r}} & r \leq m i n (m, n) \end{array}

(3)

where s₁ is a diagonal matrix with non-zero diagonal elements; and s₁, s₂, ..., s_r are the singular values of X that satisfy $s_{1} \geq s_{2} \geq . . . \geq$ $s_{r} > 0$ .

$X$ can be said to have a strict low-rank property if (4) holds.

r a n k (X) = m i n (m, n)

(4)

If the matrix does not satisfy (4) but satisfies (5), it can be considered to have an approximate low-rank property.

\begin{matrix} s_{i} ≫ s_{j} > 0 & 1 \leq i \leq j \leq r \end{matrix}

(5)

A strict low-rank property only occurs theoretically, whereas matrices constructed from measurements generally show an approximate low-rank property. Nevertheless, the approximate low-rank property still allows to perform matrix complementation [

33].

In practice, a matrix can be assumed to have a low-rank property when one or more of its singular values are considerably greater than the remaining singular values. To determine the low rank of data matrices, the following expression can be defined as [

34]:

\begin{matrix} β_{l} = \frac{\sqrt[]{s_{1}^{2} + s_{2}^{2} + . . . + s_{l}^{2}}}{\sqrt[]{s_{1}^{2} + s_{2}^{2} + . . . + s_{l}^{2} + . . . + s_{r}^{2}}} & 1 \leq l \leq r \end{matrix}

(6)

where $0 \leq β_{l} \leq 1$ . The low-rank property of a matrix can be determined based on the value of l: when $β_{l} \approx 1$ , a smaller l indicates a better low rankness.

Singular value thresholding decomposition can be performed on matrix X₁ constructed from random harmonic data. In addition, matrix X₂ is constructed from partially similar data, and matrix X₃ is constructed from completely similar data. The low-rank properties of the matrices are determined using (6).

As shown in Table I, $β_{4}$ of X₁ is notably below 1, while $β_{4}$ of X₂ is approximately 0.917, and $β_{1}$ of X₃ is closer to 1. Consequently, a matrix constructed from perfectly similar data has obvious low-rank properties, thus providing a theoretical basis for the analysis and processing of harmonic data acquired from PQMDs.

TABLE I LOW-RANK ANALYSIS OF MATRICES

Matrix	β₁	β₂	β₃	β₄
X₁	0.763	0.776	0.787	0.796
X₂	0.746	0.834	0.893	0.917
X₃	0.993	0.996	0.997	0.998

B. Correlation Analysis of Harmonic Data

The analysis above indicates that the recovery of harmonic data requires finding data with similar dynamic properties. Hence, we use the DTW algorithm to perform correlation analysis on harmonic data from different PQMDs [

35]. The DTW algorithm can assess correlations even when the compared datasets have different sizes (e.g., when data loss occurs), and it is less sensitive to the number of samples.

Let $A = {a_{1}, a_{2}, . . ., a_{n}}$ and $Β = {b_{1}, b_{2}, . . ., b_{m}}$ be two sets of harmonic data containing voltage or current amplitudes. Distance matrix D is constructed with elements $d (x, y)$ given by:

d (x, y) = \sqrt[]{(a_{x} - b_{y})^{2}} x = 1,2, . . ., n, y = 1,2, . . ., m

(7)

where x and y are the datapoints from datasets $A$ and $Β$ , respectively; a_x and b_y are the datapoints from the x and y elements, respectively; and n and m are the numbers of samples in the two datasets.

As shown in Fig. 1, the set of adjacent elements in D can be defined as a curved path that must satisfy the constraints of boundedness, continuity, and monotonicity [

36]. The set of curved paths is denoted by

R = {r_{1}, r_{2}, . . ., r_{s}, . . ., r_{k}}

, where k is the number of the required paths. Element r_s denotes the coordinates of point s along the path, i.e.,

r_{s} = (x, y)

Fig. 1 Diagram of DTW process.

Many curved paths satisfy the above-mentioned constraints [

36]. We use the DTW algorithm to find an optimal curved path (e.g., path depicted by solid blue squares in Fig. 1) that minimizes the total cost of fitting the two datasets as:

D T W (A, B) = \underset{}{m i n} \sum_{s = 1}^{k} R (r_{s}) = \frac{1}{k} \sqrt[]{\sum_{s = 1}^{k} r_{s}}

(8)

where $D T W (A, B)$ is used to calculate the distance between sets A and B. A smaller DTW value indicates a higher correlation between the two datasets. The correlation between any pair of PQMDs can be evaluated using (8), thus establishing a foundation for correlation clustering of PQMDs in power systems.

C. Correlation Clustering of PQMDs

The correlation between two PQMDs can be calculated using (8), which establishes correlation matrix C, i.e., $C (i, j) = D T W (i, j)$ , where i and j denote different monitoring devices. To facilitate correlation analysis, the elements of C can be normalized as:

C^{'} (i, j) = [\frac{C (i, j) - m i n C}{m i n C - m a x C}] + 1

(9)

We use data correlation to perform spectral clustering on $C^{'}$ , achieving reasonable clustering of different PQMDs, as illustrated in Fig. 2.

Fig. 2 Schematic of correlation subgroups for different PQMDs.

First, Laplacian matrix L can be constructed to map the measurements from a low-dimensional space onto a high-dimensional one.

L = K - C^{'}

(10)

The elements of degree matrix K can be defined as:

\{\begin{array}{l} K = d i a g {k_{1}, k_{2}, . . ., k_{t}} \\ k_{i} = \sum_{j = 1}^{t} C^{'} (i, j) i = 1,2, . . ., t \end{array}

(11)

where t denotes the number of PQMDs that require correlation subgroups.

As illustrated in Fig. 2, the subgroup of PQMDs should satisfy the following three principles.

1) The sum of data correlations (DTW values) within the subgraph should be maximized.

2) The data correlation between subgraphs should be minimized.

3) Each subgraph should contain as many measurements as possible.

Based on these principles, the normalized cut algorithm can be expressed as [

37]:

N C u t (O_{1}, O_{2}, . . ., O_{q}) = \frac{1}{2} \sum_{i = 1}^{q} \frac{C u t (O_{i}, {\bar{O}}_{i})}{v o l (O_{i})}

(12)

where O_i denotes different subgraphs that can be divided into q subgraphs; and $v o l (O_{i})$ is the sum of the weights for each subgraph. We introduce indicator vector h_j given by:

\{\begin{array}{l} h_{j} = [h_{1 j s} h_{2 j s} . . . h_{t j s}]^{Τ} j = 1,2, . . ., q \\ h_{i j} = \{\begin{array}{l} \frac{1}{\sqrt[]{v o l (O_{j})}} i \in O_{j}, i = 1,2, . . ., t \\ 0 i \notin O_{j}, i = 1,2, . . ., t \end{array} \end{array}

(13)

Based on (13), (12) can be rewritten as:

N C u t (O_{1}, O_{2}, . . ., O_{q}) = \sum_{j = 1}^{q} h_{j}^{Τ} L h_{j} = \sum_{j = 1}^{q} (H^{Τ} {L H)}_{j j} = t r (H^{Τ} L H)

(14)

where $t r (H^{Τ} L H)$ denotes the matrix trace. Thus, the normalized cut algorithm aims to minimize $t r (H^{Τ} L H)$ :

\{\begin{array}{l} \underset{H \in R^{t \times q}}{a r g m i n} t r (H^{Τ} L H) \\ s . t . H^{Τ} L H = I \end{array}

(15)

where $H = K^{- \frac{1}{2}} T$ , and T is the characteristic matrix. Equation (15) can be expressed as:

\{\begin{array}{l} \underset{T \in R^{t \times q}}{a r g m i n} t r (T^{Τ} K^{- \frac{1}{2}} L K^{\frac{1}{2}} H) \\ s . t . T^{Τ} T = I \end{array}

(16)

Because $q ≪ t$ causes partial information loss, T cannot be used directly to segment the subgraph. The row vector of $T \in R^{t \times q}$ can be used as a sample for k-means clustering to obtain the final subgroup results. Algorithm 1 summarizes PQMD clustering.

Algorithm 1 : PQMD clustering

Input: harmonic data of different PQMDs

Output: subgroup results of PQMDs

1: Construct distance matrix D using (7)

2: Construct correlation matrix $C^{'}$ using (8) and (9)

3: Construct Laplacian matrix L using (10) and (11)

4: Introduce indicator vector $h_{j}$ using (13)

5: Combined with h_j, convert (12) into (14)

6: Combining (15) and (16), obtain characteristic matrix T

7: Use row vectors of T as samples for k-means clustering

8: Obtain subgroup results of PQMDs

III. HDR for PQMDs Based on Multivariate Norm Matrix

A. HDR Model Based on Multivariate Norm Matrix

Noise and sparse outliers often pollute harmonic data. As noise has a high density and small amplitude, we use the Frobenius norm for mathematical description, while outliers with sparsity are described using the L₁-norm.

We formulate an HDR model based on multivariate norm matrices considering the low-rank matrix model as:

\{\begin{array}{l} m i n ({‖X^{'}‖}_{*} + λ {‖E‖}_{1} + ρ {‖G‖}_{F}^{2}) \\ s . t . X = X^{'} + E + G + N \end{array}

(17)

where $X^{'}$ is a matrix with recovered data; ${‖\cdot‖}_{*}$ denotes the kernel norm of the matrix; E is a sparse outlier matrix; ${‖\cdot‖}_{1}$ denotes the L₁-norm; G is a noise matrix; ${‖\cdot‖}_{F}^{2}$ denotes the Frobenius norm; N is an auxiliary matrix, which stores opposite values of missing data; and $λ$ and $ρ$ are the weight coefficients.

B. Solution of HDR Model

Equation (17) represents a complex equation-constrained multi-objective optimization problem, and the discontinuous derivability of various norms further increases complexity of (17) [

38]. If (17) is changed into an augmented Lagrangian form, the above problem can be transformed into the following unconstrained optimization problem:

\begin{array}{l} L (X^{'}, E, G, N, Y, μ) = {‖X^{'}‖}_{*} + λ {‖E‖}_{1} + ρ {‖G‖}_{F}^{2} + \\ <Y, X^{'} + E + G + N - X> + \frac{μ}{2} ‖X^{'} + {E + G + N - X‖}_{F}^{2} \end{array}

(18)

where Y is a Lagrangian multiplier; $μ$ is a penalty factor; and $<\cdot>$ denotes the internal product of the matrix.

Because (18) is a multidimensional multi-norm joint optimization problem, we solve it using an adaptive alternating direction method of multipliers; that is, alternating iterations are required to update $X^{'}, E, G, N, Y$ .

1) Update $X^{'}$ : fixing $E, G, N, Y$ , the objective function of $X^{'}$ can be expressed as:

\begin{array}{l} X_{k + 1}^{'} = \underset{X^{'}}{a r g m i n} ({‖X^{'}‖}_{*} + <Y_{k}, X^{'} + E_{k} + G_{k} + N_{k} - X> + \begin{matrix}  \end{matrix} \\ \frac{μ}{2} {‖X^{'} + E_{k} + G_{k} + N_{k} - X‖}_{F}^{2}) = \underset{X^{'}}{a r g m i n} ({‖X^{'}‖}_{*} + \frac{μ}{2} {‖X^{'} - C_{X^{'}}‖}_{F}^{2}) \end{array}

(19)

where $C_{X^{'}} = X - E_{k} - G_{k} - N_{k} - Y_{k} / μ$ . Equation (19) yields a closed-form solution [

39],

X_{k + 1}^{'} = U τ_{e} (Σ) V^{T}

, where

τ_{e} (\cdot)

denotes a soft threshold operator,

e = 2 / μ

; and

U, Σ

, and

V

are factor matrices that perform singular value thresholding decomposition on matrix

C_{X^{'}}

2) Update E: fixing $X^{'}, G, N, Y$ , the objective function of E can be expressed as:

\begin{array}{l} E_{k + 1} = \underset{E}{a r g m i n} λ {‖E‖}_{1} + <Y_{k}, X_{k + 1}^{'} + E + G_{k} + N_{k} - X> + \\ \frac{μ}{2} {‖X_{k + 1}^{'} + E + G_{k} + N_{k} - X‖}_{F}^{2} = \underset{X^{'}}{a r g m i n} (λ {‖E‖}_{1} + \frac{μ}{2} {‖E - C_{E}‖}_{F}^{2}) \end{array}

(20)

where $C_{E} = X - X_{k + 1}^{'} - G_{k} - N_{k} - Y_{k} / μ$ . Equation (20) also yields a closed-form solution [

40], and

E_{k + 1} = τ_{g} (C_{E})

. Here,

e = 2 λ / μ

for

τ_{e} (\cdot)

3) Update G: fixing $X^{'}, E, N, Y$ , the objective function of G can be expressed as:

\begin{array}{l} G_{k + 1} = \underset{G}{a r g m i n} (ρ {‖G‖}_{F}^{2} + <Y_{k}, X_{k + 1}^{'} + E_{k + 1} + G + N_{k} - X> + \\ {‖X_{k + 1}^{'} + E_{k + 1} + G + N_{k} - X‖}_{F}^{2}) = ρ {‖G‖}_{F}^{2} + \frac{μ}{2} {‖G + C_{G}‖}_{F}^{2} \end{array}

(21)

where $C_{G} = X_{k + 1}^{'} + E_{k + 1} + N_{k} - X + Y_{k} / μ$ .

The unconstrained optimization problem with a double Frobenius norm can be solved using the matching method. Equation (21) can be rewritten as:

\begin{array}{l} ρ {‖G‖}_{F}^{2} + \frac{μ}{2} {‖G + C_{G}‖}_{F}^{2} = \sum_{\begin{array}{l} i = 1,2, \dots, n \\ j = 1,2, \dots, m \end{array}} [ρ g_{i, j}^{2} + \frac{μ}{2} (g_{i, j} + c_{i, j})^{2}] = \\ \sum_{\begin{array}{l} i = 1,2, \dots, n \\ j = 1,2, \dots, m \end{array}} \{(ρ + \frac{μ}{2}) {(g_{i, j} + \frac{μ}{2 ρ + μ} c_{i, j})}^{2} + [\frac{μ}{2} - {(\frac{μ}{2 ρ + μ})}^{2}] c_{i, j}^{2}\} \end{array}

(22)

where $g_{i, j}$ and $c_{i, j}$ are the elements in the $i^{t h}$ row and $j^{t h}$ column of $G$ and $C_{G}$ , respectively.

From (22), $ρ + μ / 2$ is consistently greater than 0. Consequently, for $g_{i, j} = - \frac{μ}{2 ρ + μ} c_{i, j}$ , the optimal solution of (22) can be obtained as:

G_{k + 1} = - \frac{μ}{2 ρ + μ} C_{G}

(23)

4) Update N:

N_{k + 1} = \underset{N}{a r g m i n} {‖X_{k + 1}^{'} + E_{k + 1} + G_{k + 1} + N - X + \frac{Y_{k}}{μ}‖}_{F}^{2}

(24)

The solution of N can be obtained directly by letting the norm in (24) to be 0.

N_{k + 1} = X - X_{k + 1}^{'} - E_{k + 1} - G_{k + 1} - Y_{k} / μ

(25)

5) Update Y:

Y_{k + 1} = X [Y_{k} + μ (X_{k + 1}^{'} + E_{k + 1} + G_{k + 1} + N_{k + 1} - X)]

(26)

The convex functions above have closed-form solutions, and the convergence of their overall and individual parts has been proven [

20].

From (26), it is evident that $μ$ affects the iteration step length and convergence efficiency of the proposed method. Thus, we propose an adaptive strategy. A larger step is set when the solution is far from the optimal point to accelerate convergence, while a smaller step is set when it is close to optimality to avoid fluctuations and ensure convergence.

μ_{k + 1} = \{\begin{array}{l} (1 + τ l g η_{k + 1}) μ_{k} η_{k + 1} \geq β \\ μ_{k} α < η_{k + 1} < β \\ 0.8 μ_{k} η_{k + 1} \leq α \end{array}

(27)

η_{k + 1} = \frac{{‖X - X_{k - 1}^{'} - E_{k - 1} - G_{k - 1} - N_{k - 1}‖}_{F}}{{‖X - X_{k}^{'} - E_{k} - G_{k} - N_{k}‖}_{F}}

(28)

where $η_{k}$ is the gradient change rate at iteration k; $τ$ is a variation coefficient; and $α$ and $β$ are the thresholds of the gradient change rate. The variables can be updated alternately from (19) to (26) until the errors between the two calculations are below threshold $ε$ , which is set to be 0.001.

Algorithm 2 summarizes the steps for solving the HDR model.

Algorithm 2 : solving HDR model

Input: harmonic data matrix X; parameters $λ, ρ, α, β, ε$

Output: matrices $X^{'}$ , E, G

1: Construct HDR model based on multivariate norm matrix using (17)

2: Combining the augmented Lagrangian algorithm, transform (17) into (18)

3: for $i t e r = 1$ to k

According to (18)-(28), update $X^{'}, E, G, N, Y, μ$ in sequence

if ${‖X_{k + 1}^{'} - X_{k}^{'}‖}_{F}^{2} \leq 0.001$ , ${‖E_{k + 1} - E_{k}‖}_{F}^{2} \leq 0.001$ , and ${‖G_{k + 1} - G_{k}‖}_{F}^{2} \leq 0.001$

Output $X^{'}$ , E, and G

else

$i t e r = i t e r + 1$

end

4: Complete recovery of missing data

The flowchart of the proposed HDR method based on multivariate norm matrix is shown in Fig. 3.

Fig. 3 Flowchart of proposed HDR method based on multivariate norm matrix.

Step 1: correlation analysis of harmonic data. First, we construct distance matrix D using (7). Then, (8) is used to evaluate the data correlations of the PQMDs.

Step 2: correlation clustering of PQMDs. Correlation matrix $C^{'}$ is constructed using (8) and (9). Spectral clustering is then performed on $C^{'}$ via (10)-(16), and the subgroup results are obtained.

Step 3: construction of HDR model based on multivariate norm matrices. The characteristics of harmonic data are analyzed. Then, the HDR model based on multivariate norm matrices is applied based on the conventional low-rank matrix model, as expressed in (17).

Step 4: solution of HDR model. By combining with the augmented Lagrangian algorithm, (17) is transformed into (18). Then, an adaptive alternating direction method of multipliers is applied, as shown in (19)-(28). Finally, the errors of the two calculations are used to determine whether to terminate the algorithm and retrieve the recovered harmonic data.

IV. Simulation Analysis

Simulation analysis was performed for the verification of the HDR method. Subsequently, field data from a power system in southeastern China were used to verify the reliability of our proposal. The simulation results were compared with those of the Lagrange interpolation (LIP) [

10], PCA [41], and iterative propagation reconstruction (IPR) methods [18]. All the simulations were performed in the MATLAB 2018 software on a computer equipped with 16 GB RAM and a 2.90 GHz Intel Core i5-10400F processor, running the Windows 10 operating system.

A. Missing Data Models for PQMDs

Most data loss events in existing studies have involved at most three continuous datapoints [

2]. Additionally, most studies have not focused on less probable events such as large amounts of continuous or non-continuous data loss [10]. To simplify the harmonic data problem, data loss can be summarized in three basic models, as illustrated in Fig. 4 and described below.

Fig. 4 Models of data loss.

1) Model I: single datapoint loss. A datapoint is randomly lost in a dataset.

2) Model II: continuous data loss. A segment of datapoints is randomly lost in a dataset.

3) Model III: mixed data loss. Models I and II are combined.

The mean squared error (MSE) can be used to quantify the accuracy of HDR as:

M S E = \frac{1}{n} \sum_{i = 1}^{n} (x_{i} - x_{i}^{'})^{2}

(29)

where $x_{i}^{'}$ and $x_{i}$ are the restored and true values, respectively; and n is the number of recovered datapoints.

B. Adjustment of Simulation Parameters

In (17), weight coefficient $λ$ of E is generally set as $λ = 1 / \sqrt[]{m a x (m, n)}$ [

42]. Owing to the uncertainty of noise, weight coefficient

ρ

of G should be rectified using historical data. We selected historical data with different data loss ratios for recovery verification. The recovery accuracy curves shown in Fig. 5 were obtained by varying weight coefficient

ρ

. The proposed method has few recovery errors when

ρ \in (0.15,0.7)

. In this paper, empirical value

ρ = 0.6

has been taken uniformly.

Fig 5 Rectification of weight coefficient $ρ$ .

As shown in Fig. 6, the variation of penalty factor $τ$ , upper limit $β$ , and lower limit $α$ affect the performance of the proposed method. After experimental comparisons, the combination $τ = 1$ , $α = 0.8$ , and $β = 1.2$ provided the best results, and the proposed method converged to an accuracy of 0.001.

Fig. 6 Convergence of adaptive penalty factor. (a) Parameters $α$ and β. (b) Parameter $τ$ .

Figure 7 shows that the error of the adaptive method decreases rapidly initially. The error reduction slows down around the optimal point, and the error fluctuation does not diverge over the iterations.

Fig. 7 Convergence rate of penalty factors.

C. Evaluation on IEEE 39-bus Test System

We used the IEEE 39-bus test system (New England) for simulation analysis (Fig. 8) considering 18 PQMDs, which were divided into five correlation subgroups using Algorithm 1, as listed in Table II.

Fig. 8 Diagram of IEEE 39-bus test system (New England) [

25].

TABLE II CORRELATION SUBGROUPS FOR PQMDs IN IEEE 39-BUS SYSTEM

Subgroup	Bus
I	2, 3, 9
II	6, 7, 10, 12
III	4, 13, 14, 16, 18
IV	19, 21, 23
V	26, 27, 28

To make the simulation environment more realistic, we injected the 3^rd, 5^th, 15^th, 25^th, 30^th, 35^th, and 50^th harmonic currents into the system. The 3^rd, 15^th, 25^th, 30^th, and 35^th harmonics were injected at bus 25, the 3^rd, 5^th, 30^th, 35^th, and 50^th harmonics were injected at bus 19, and the 3^rd, 5^th, 15^th, 25^th, and 50^th harmonics were injected at bus 5. The output data rate of each PQMD was 1 over 3 s. Thus, in the simulation. There are 30 datapoints in a duration of 90 s. The single, continuous, and mixed data loss models were randomly imposed on the PQMD data. To avoid errors caused by the randomness in the simulations, all the results reported in tables from this subsection are the means across 50 repetitions.

To examine the applicability of the proposed method, we randomly performed single, continuous, and mixed data loss simulations on 30 datasets collected from buses 6, 14, and 19. The MSE values of the LIP, PCA, IPR, and proposed methods for the data recovery of the 3^rd, 5^th, 15^th, 25^th, 30^th, 35^th, and 50^th harmonics are listed in Tables III-V for the corresponding buses. Figure 9 shows data recovery for bus 19 under a definite 30% mixed data loss.

Fig. 9 Data recovery for bus 19 under a definite 30% mixed data loss. (a) 3^rd-order harmonic current amplitude. (b) 5^th-order harmonic data. (c) 15^th-order harmonic data. (d) 25^th-order harmonic data. (e) 30^th-order harmonic data. (f) 35^th-order harmonic data. (g) 50^th-order harmonic data.

1) Correctness Verification of HDR

Tables III-V show that the four evaluated methods work well for single datapoint loss. The errors of the four methods fluctuate by approximately 0.02% for the considered harmonics. However, the proposed method outperforms the LIP, PCA, and IPR methods regarding 30% continuous data loss and 30% mixed data loss. For bus 6, the MSE values of the proposed method are 0.39%, 0.42%, 0.36%, 0.35%, 0.39%, 0.41%, and 0.41% under 30% continuous data loss of the 3^rd, 5^th, 15^th, 25^th, 30^th, 35^th, and 50^th harmonics, respectively. The MSE values of the IPR method are 29.86%, 29.93%, 29.91%, 29.36%, 29.74%, 27.72%, and 30.11% under 30% continuous data loss of the 3^rd, 5^th, 15^th, 25^th, 30^th, 35^th, and 50^th harmonics, respectively. The error of the PCA method under 30% continuous data loss is approximately 31%, and that of the LIP method is approximately 42%. The MSE values of the proposed method are 0.48%, 0.51%, 0.46%, 0.48%, 0.49%, 0.50%, and 0.48% under 30% mixed data loss of the 3^rd, 5^th, 15^th, 25^th, 30^th, 35^th, and 50^th harmonics, respectively. Among the comparison methods, the IPR method achieves the best recovery performance (approximately 36%), while the LIP method has the worst recovery performance (approximately 50%). These results indicate that the harmonic order has a small effect on the proposed method.

TABLE III MSE VALUES OF LIP, PCA, IPR, AND PROPOSED METHODS FOR BUS 6

Order	Model	MSE (%)
Order	Model	LIP	PCA	IPR	Proposed
3^rd	I (single loss)	0.02	0.02	0.01	0.02
	II (30% loss)	42.26	30.22	29.86	0.39
	III (30% loss)	50.26	35.74	35.01	0.48
5^th	I (single loss)	0.02	0.02	0.01	0.02
	II (30% loss)	44.28	31.16	29.93	0.42
	III (30% loss)	51.36	37.27	36.82	0.51
15^th	I (single loss)	0.02	0.02	0.02	0.01
	II (30% loss)	39.28	30.37	29.91	0.36
	III (30% loss)	47.89	36.17	35.27	0.46
25^th	I (single loss)	0.02	0.02	0.02	0.02
	II (30% loss)	39.16	30.83	29.36	0.35
	III (30% loss)	48.92	37.16	34.90	0.48
30^th	I (single loss)	0.02	0.02	0.02	0.02
	II (30% loss)	40.88	31.75	29.74	0.39
	III (30% loss)	48.86	36.26	35.63	0.49
35^th	I (single loss)	0.02	0.02	0.01	0.02
	II (30% loss)	40.39	30.99	27.72	0.41
	III (30% loss)	50.62	37.18	34.26	0.50
50^th	I (single loss)	0.02	0.02	0.02	0.02
	II (30% loss)	41.64	31.26	30.11	0.41
	III (30% loss)	50.51	36.02	35.25	0.48

TABLE IV MSE VALUES OF LIP, PCA, IPR, AND PROPOSED METHODS FOR BUS 14

Order	Model	MSE (%)
Order	Model	LIP	PCA	IPR	Proposed
3^rd	I (single loss)	0.02	0.02	0.02	0.02
	II (30% loss)	40.91	30.45	27.65	0.41
	III (30% loss)	48.32	34.87	32.89	0.46
5^th	I (single loss)	0.02	0.02	0.02	0.01
	II (30% loss)	42.14	31.28	28.35	0.44
	III (30% loss)	48.73	36.38	34.51	0.52
15^th	I (single loss)	0.02	0.02	0.01	0.02
	II (30% loss)	40.26	32.27	30.26	0.39
	III (30% loss)	51.86	36.95	34.73	0.48
25^th	I (single loss)	0.02	0.02	0.02	0.02
	II (30% loss)	42.16	31.08	27.87	0.40
	III (30% loss)	52.19	35.98	33.94	0.49
30^th	I (single loss)	0.01	0.02	0.02	0.01
	II (30% loss)	40.02	29.86	28.52	0.37
	III (30% loss)	48.27	35.37	34.29	0.45
35^th	I (single loss)	0.02	0.02	0.02	0.01
	II (30% loss)	41.26	32.21	30.01	0.39
	III (30% loss)	47.48	37.18	36.49	0.48
50^th	I (single loss)	0.02	0.03	0.01	0.02
	II (30% loss)	40.67	31.22	27.33	0.42
	III (30% loss)	50.25	36.67	34.12	0.53

TABLE V MSE VALUES OF LIP, PCA, IPR, AND PROPOSED METHODS FOR BUS 19

Order	Model	MSE (%)
Order	Model	LIP	PCA	IPR	Proposed
3^rd	I (single loss)	0.02	0.02	0.02	0.02
	II (30% loss)	40.27	29.83	28.41	0.39
	III (30% loss)	49.55	36.24	34.19	0.47
5^th	I (single loss)	0.02	0.02	0.02	0.02
	II (30% loss)	42.85	32.24	28.87	0.41
	III (30% loss)	46.27	36.79	35.24	0.49
15^th	I (single loss)	0.02	0.02	0.02	0.01
	II (30% loss)	40.29	31.38	30.05	0.37
	III (30% loss)	49.96	35.98	34.98	0.42
25^th	I (single loss)	0.02	0.02	0.02	0.02
	II (30% loss)	42.09	31.19	28.63	0.39
	III (30% loss)	50.27	37.28	35.28	0.49
30^th	I (single loss)	0.02	0.03	0.01	0.01
	II (30% loss)	39.97	32.28	27.78	0.42
	III (30% loss)	49.24	35.94	35.29	0.51
35^th	I (single loss)	0.02	0.02	0.02	0.02
	II (30% loss)	41.28	30.08	29.28	0.39
	III (30% loss)	51.71	36.27	34.28	0.49
50^th	I (single loss)	0.02	0.02	0.02	0.01
	II (30% loss)	42.39	30.94	28.86	0.41
	III (30% loss)	49.91	36.24	35.22	0.45

When the PQMDs with data loss are located in different areas, the proposed HDR method still exhibits a good recovery performance and high stability. For bus 14, the errors of the proposed method are approximately 0.02%, 0.40%, and 0.48% for single, 30% continuous, and 30% mixed data loss, respectively. This is similar to the performance of the HDR method for the recovery in buses 6 and 19. Meanwhile, the recovery performances of the IPR, PCA, and LIP methods fluctuate. Under 30% mixed data loss, the recovery errors fluctuate in 32%-37% for the IPR method, 34%-38% for the PCA method, and 46%-52% for the LIP method.

Tables III-V reveal that the IPR method outperforms the LIP and PCA methods for the HDR but is still inferior to the proposed method. This is because the IPR method propagates the residual values to nearby buses and adds them to the results of the previous iteration to obtain the output of the current iteration. Using this method, reconstruction residuals obtained from insufficient local information cannot compensate for the fluctuations of unknown information. Consequently, the IPR method is vulnerable to the data loss ratio under continuous or mixed data loss.

The PCA method decomposes data into components using the direction of maximum variance as the main feature. However, the data loss ratio can affect decomposition. Moreover, harmonic data do not necessarily follow a Gaussian distribution, and the principal components obtained using the PCA method are not always optimal. Multiple factors can cause the ineffectiveness of the PCA method in recovering harmonic data. The LIP method uses known data to build a suitable interpolation function to estimate unknown data. With increasing continuous missing data, the error of the LIP method increases. Moreover, this method is more dependent on sample data, hindering the accurate estimation of data trends. Consequently, the LIP method is poor at recovering missing harmonic data with transient and disordered characteristics.

2) Running Time of HDR

The computational complexity of the four evaluated methods was investigated, obtaining the running time for single, 30% continuous, and 30% mixed data loss listed in Table VI. Considering the mixed data loss ratios in bus 14, the running time of the four evaluated methods are shown in Fig. 10.

TABLE VI RUNNING TIME OF FOUR EVALUATED METHODS

Bus (order)	Model	Running time (s)
Bus (order)	Model	LIP	PCA	IPR	Proposed
4 (3^rd)	I (single loss)	0.04	0.58	0.62	0.56
	II (30% loss)	1.17	2.05	2.31	1.89
	III (30% loss)	1.39	2.57	2.87	2.26

Fig. 10 Running time of four evaluated methods for different data loss ratios of model III.

Table VI and Fig. 10 show that the LIP method has the shortest running time, but it has an excessive recovery error. The proposed method is slightly better than the PCA method, and the IPR method has the highest computational complexity. This is because each iteration of the IPR method requires propagation to process the reconstruction residuals. As the ratio of data loss increases, the IPR method should process more residuals and propagate over more proximity buses, causing a sudden increase in running time. The HDR method uses the adaptive alternating direction method of multipliers to solve the model with an adaptive setting of penalty factor $μ$ . A larger step length can be set far from the optimal point to accelerate convergence, and a smaller step length can be set close to the optimal point to avoid oscillation and ensure convergence. The overall running time of the HDR method is slightly better than that of the PCA method, but the proposed method provides a substantially higher recovery performance. Thus, the proposed method can effectively recover mixed data loss with a reasonable computational complexity.

3) Applicability Verification of HDR

1) Scenario 1: different data loss ratios

The effectiveness of the four evaluated methods was compared by varying the data loss ratios of the 5^th harmonic at bus 6 for mixed data loss, obtaining the results listed in Table VII. The data not shown in Table VII indicate excessive errors (>80%).

TABLE VII EFFECT OF DATA LOSS RATIOS ON RECOVERY PERFORMANCE

Loss ratio (%)	MSE (%)
Loss ratio (%)	LIP	PCA	IPR	Proposed
10	20.94	9.72	11.29	0.29
20	42.18	26.29	20.35	0.42
30	51.36	37.27	36.82	0.51
40	67.20	46.03	42.15	1.26
50		58.14	49.27	2.35
60		72.93	61.27	4.75
70			74.84	9.51

The recovery performance of the four methods decreases with increasing data loss ratio. For high data loss ratios (e.g., 50%), the LIP and PCA methods are ineffective for recovery, losing their practical significance. For continuous data loss, the error of the LIP method is large. When the data loss ratio is less than 30%, the PCA and IPR methods are effective. When data loss ratios are above 40%, the errors of the PCA, IPR, and HDR methods increase sharply. The PCA and IPR methods stop recovering data, but the proposed method maintains a high recovery accuracy. When the data loss ratio is 70%, the MSE of the proposed method is 9.51%. Overall, under severe harmonic data loss, the proposed method retains an acceptable recovery accuracy.

To better explore the applicability of the proposed method, 18 PQMDs were sequentially subjected to mixed data loss ratios ranging from 30% to 70% (at 10% intervals). The MSE values of the proposed method at the PQMDs are shown in Fig. 11. The proposed method guarantees the accuracy of data recovery for all the PQMDs under different data loss ratios. Thus, the proposed method seems promising for practical applications.

Fig. 11 MSE values of proposed method under different ratios of model III.

2) Scenario 2: noise and outliers

Noise often interferes with measurements of harmonic data. To investigate the noise robustness of the HDR method, white Gaussian noise and sparse outliers were added to the data from bus 14. The signal-to-noise ratio (SNR) ranged from 30 dB to 80 dB (at 10 dB intervals). Sparse outliers were randomly added to the data. The MSE values at bus 14 with 30% mixed data loss are listed in Table VIII. The data not shown in Table VIII indicate excessive errors (>80%).

TABLE VIII MSE VALUES AT BUS 14 WITH 30% MIXED DATA LOSS WITH ADDED NOISE AND SPARSE OUTLIERS

Order	SNR	MSE (%)
Order	SNR	LIP	PCA	IPR	Proposed
3^rd	30			70.27	4.68
	40			57.29	3.29
	50		70.23	52.38	2.53
	60	75.27	56.25	45.28	1.83
	70	63.29	47.93	40.11	1.16
	80	52.26	39.26	35.27	0.72
5^th	30			75.26	4.99
	40			60.71	3.46
	50		75.92	53.01	2.68
	60	77.12	59.49	46.92	1.79
	70	67.26	48.79	41.23	1.19
	80	53.27	39.87	36.27	0.81
15^th	30			72.33	4.73
	40			61.23	3.06
	50		72.37	55.28	2.38
	60	82.14	55.26	47.06	1.72
	70	70.26	48.27	40.96	1.35
	80	55.26	40.15	37.28	0.76

Overall, the proposed method exhibits good noise robustness. The LIP and PCA methods are the most affected by noise because they neglect noise modeling, thus being sensitive to Gaussian noise and pulse noise. The IPR method provides better noise immunity by first propagating the residuals to the other buses in each iteration before projecting them to the low-pass filter subspace. The proposed method suitably separates harmonic data, Gaussian noise, and sparse outliers through the recovery model based on multivariate norm matrices. The data and noise spaces are separated by joint multidimensional multinorm optimization. This method weakens the influence of noise on the solution of low-rank data matrices. Thus, the proposed method achieves high immunity against noise.

D. Evaluation on IEEE 118-bus Test System

The IEEE 118-bus test system was used to further verify the effectiveness of the proposed method. The diagram of the IEEE 118-bus test system is shown in Fig. 12. We injected the 3^rd, 15^th, and 25^th harmonics at bus 5, the 3^rd, 5^th, and 35^th harmonics at bus 18, the 5^th, 15^th, and 50^th harmonics at bus 66, the 15^th, 25^th, and 50^th harmonics at bus 69, the 5^th, 25^th, and 35^th harmonics at bus 94, and the 3^rd, 5^th, 15^th, 25^th, 35^th, and 50^th harmonics at bus 99. The PQMDs were then divided into nine subgroups, as shown in Table IX. The rest of the simulation conditions were the same as those used for the IEEE 39-bus test system.

Fig. 12 Diagram of IEEE 118-bus test system.

TABLE IX CORRELATION SUBGROUPS FOR PQMDS IN IEEE 118-BUS SYSTEM

Subgroup	PQMD
I	3, 7, 9, 11, 12
II	17, 21, 25, 28, 32
III	24, 68, 71, 75
IV	34, 37, 38, 41
V	45, 49, 53, 56
VI	59, 62, 64, 65
VII	77, 79, 92, 96, 100
VIII	83, 85, 90
IX	103, 105, 106, 110

As subgroup VIII has the fewest PQMDs, its data recovery environment is the most hostile (fewer devices without data loss mean less recovery information available). Therefore, bus 83 was randomly selected for data recovery in subgroup VIII. Single, continuous, and mixed data loss models were randomly applied to bus 83.

1)　Correctness Verification of HDR

As shown in Table X, the LIP method has the worst recovery performance with the largest error, while the PCA and IPR methods have smaller recovery errors but still underperform the proposed method. Moreover, harmonic orders have less impact on the proposed method. For different harmonic orders, the MSE values of the LIP method fluctuate in 0.01%-0.02% under single data loss, 49%-45% under 30% continuous data loss, and 48%-51% under 30% mixed data loss. For the PCA method, the respective fluctuations are 0.02%-0.03%, 30%-33%, and 35%-38%, and for the IPR method, the fluctuations are 0.02%, 28%-33%, and 34%-36%. On the other hand, the MSE values of the proposed method fluctuate in 0.01%-0.02% under single data loss, 0.36%-0.42% under 30% continuous data loss, and 0.46%-0.52% under 30% mixed data loss.

TABLE X MSE VALUES OF DATA RECOVERY FOR BUS 83

Order	Model	MSE (%)
Order	Model	LIP	PCA	IPR	Proposed
3^rd	I (single loss)	0.01	0.02	0.02	0.02
	II (30% loss)	40.38	32.14	29.58	0.40
	III (30% loss)	51.22	36.77	34.05	0.49
5^th	I (single loss)	0.02	0.02	0.02	0.02
	II (30% loss)	42.93	32.53	31.23	0.38
	III (30% loss)	48.85	37.27	34.73	0.46
15^th	I (single loss)	0.02	0.02	0.02	0.02
	II (30% loss)	39.95	30.21	32.29	0.42
	III (30% loss)	48.37	36.18	35.76	0.51
25^th	I (single loss)	0.02	0.02	0.02	0.02
	II (30% loss)	43.10	30.48	29.64	0.38
	III (30% loss)	50.61	35.72	35.01	0.46
35^th	I (single loss)	0.02	0.03	0.02	0.01
	II (30% loss)	44.27	32.29	28.49	0.41
	III (30% loss)	49.58	37.55	34.19	0.52
50^th	I (single loss)	0.02	0.02	0.02	0.02
	II (30% loss)	40.22	30.99	30.26	0.36
	III (30% loss)	50.24	36.18	35.94	0.48

2)　Applicability Verification of HDR

As shown in Table XI, when the data loss ratio is 30%, the MSE values of the LIP method are approximately 50%, being out of practical significance. When the data loss ratio is greater than 40%, the MSE values for the LIP and PCA methods are greater than 48% and 42%, respectively, while for the proposed method, it is only about 1.13%. Hence, the PCA and IPR methods cannot accurately recover missing data, while the proposed method retains a high recovery accuracy. When the data loss ratio is 70%, the MSE of the proposed method is approximately 9.0%. Therefore, the proposed method has a high recovery accuracy when large amounts of data are lost.

TABLE XI MSE VALUES UNDER DIFFERENT DATA LOSS RATIOS FOR BUS 83

Order	Loss ratio (%)	MSE (%)
Order	Loss ratio (%)	LIP	PCA	IPR	Proposed
3^rd	10	20.16	11.38	10.32	0.29
	20	40.28	25.83	22.15	0.43
	30	49.95	37.49	34.91	0.51
	40	72.36	46.01	44.53	1.17
	50		60.37	53.36	2.29
	60		71.99	66.47	4.52
	70			78.72	9.25
5^th	10	22.05	12.26	10.35	0.30
	20	44.73	23.34	20.82	0.43
	30	54.24	37.38	36.35	0.49
	40	69.89	49.72	43.27	1.06
	50		60.30	51.83	2.16
	60		73.88	62.18	4.28
	70			74.49	8.66
15^th	10	26.53	12.45	10.75	0.21
	20	44.90	23.78	18.89	0.38
	30	53.07	39.84	35.47	0.45
	40	72.25	50.01	41.25	0.97
	50		64.97	48.32	1.95
	60			57.68	3.82
	70			69.93	7.55
25^th	10	23.80	11.84	9.95	0.28
	20	42.25	24.05	19.47	0.43
	30	50.27	35.82	33.85	0.55
	40	63.36	47.37	41.01	1.32
	50	78.83	59.26	55.34	2.46
	60		74.49	70.28	4.19
	70				8.28
35^th	10	20.14	10.24	12.23	0.24
	20	39.27	23.46	20.47	0.39
	30	49.93	38.51	35.59	0.51
	40	58.89	51.93	42.21	1.05
	50	73.21	64.28	51.04	2.27
	60			65.52	4.01
	70			74.49	8.94
50^th	10	25.53	10.23	11.14	0.32
	20	43.28	22.38	23.92	0.44
	30	51.94	36.04	35.14	0.52
	40	64.42	44.28	45.03	1.25
	50		57.70	53.22	2.29
	60		75.34	66.35	4.17
	70			72.34	9.01

Overall, the simulation analysis for the IEEE 118-bus system is consistent with that for the IEEE 39-bus system. The effect of harmonic orders on the proposed method is small, and the proposed method retains a high recovery accuracy even under severe data loss.

E. Validation on Real Data

We used field data from a power system in southeastern China to validate the proposed method. The PQMDs in the region were correlated into clusters to obtain a test platform, as illustrated in Fig. 13. We assumed that data loss occurred at PQMD 2, where single, continuous, and mixed data losses were performed randomly on 50 datapoints. The results reported in this subsection are the means across 50 simulation iterations.

Fig. 13 Diagram of 35 kV regional network for HDR validation.

1)　Correctness Verification of HDR

The MSE values of data recovery for PQMD 2 are listed in Table XII.

TABLE XII MSE VALUES OF DATA RECOVERY FOR PQMD 2

Order	Model	MSE (%)
Order	Model	LIP	PCA	IPR	Proposed
3^rd	I (single loss)	0.03	0.03	0.02	0.02
	II (40% loss)	69.20	50.27	44.38	1.61
	III (40% loss)	75.26	55.18	48.21	1.89
5^th	I (single loss)	0.02	0.03	0.02	0.02
	II (40% loss)	69.28	51.06	44.57	1.63
	III (40% loss)	76.12	55.26	47.73	1.82
7^th	I (single loss)	0.03	0.03	0.02	0.02
	II (40% loss)	70.05	52.91	42.95	1.72
	III (40% loss)	75.12	56.38	45.26	1.91
15^th	I (single loss)	0.03	0.03	0.02	0.02
	II (40% loss)	71.02	51.98	45.72	1.61
	III (40% loss)	77.28	56.84	50.97	1.84

The LIP method has the worst recovery performance with large errors for single, 40% continuous, and 40% mixed data loss of approximately 0.03%, 70%, and 76%, respectively. The errors of the PCA and IPR methods are lower but still considerably higher than those of the proposed method. In particular, the PCA method has errors of approximately 0.03%, 51%, and 56% for the three data loss models. The error of the IPR method is slightly lower than that of the PCA method but also reaches approximately 48% under 40% mixed data loss. In contrast, the errors of the proposed method are approximately 0.02%, 1.62%, and 1.90% under the three models, respectively, showing that it retains high recovery accuracy and stability for practical applications.

2)　Applicability Verification of HDR

The reliability and accuracy of the proposed method were verified by varying the data loss ratios of the 3^rd harmonic from PQMD 2 under mixed data loss, obtaining the results listed in Table XIII. The errors of the LIP, PCA, and IPA methods increase sharply with increasing data loss ratio. For field data, the LIP method cannot accurately recover data beyond 20% mixed data loss. The PCA method cannot accurately recover missing data under 40% mixed data loss. Meanwhile, for 50% mixed data loss, the MSE of the IPR method is 65.94%. In contrast, the error of the proposed method is only 12.28% under 70% mixed data loss, thus retaining a high recovery accuracy.

TABLE XIII EFFECT OF DATA LOSS RATIO ON HDR PERFORMANCE USING REAL DATA

Loss ratio (%)	MSE (%)
Loss ratio (%)	LIP	PCA	IPR	Proposed
10	36.97	18.42	20.64	0.43
20	53.89	32.28	29.20	0.51
30	60.56	40.19	40.81	1.18
40	75.26	55.18	48.21	1.89
50		79.02	65.94	3.76
60			74.63	6.86
70				12.28

3)　Running Time of HDR

Table XIV lists the running time of the four evaluated methods for the recovery of the 3^rd harmonic from PQMD 2. The LIP method has a short running time, but the high recovery errors undermine its performance. For single datapoint loss, the running time is 1.35 s for the PCA method, 1.04 s for the IPR method, and 1.07 s for the proposed method. Under 40% mixed data loss, the IPR method has the longest running time (approximately 5.01 s), followed by the PCA method (approximately 4.59 s) and the proposed method (approximately 4.47 s). These results are consistent with those of the simulation analysis. Hence, the proposed method can achieve excellent performance in terms of both recovery accuracy and computational complexity in practical applications.

TABLE XIV RUNNING TIME OF FOUR EVALUATED METHODS FOR RECOVERY OF 3^RD HARMONIC FROM PQMD 2

Model	Running time (s)
Model	LIP	PCA	IPR	Proposed
I (single loss)	0.04	1.35	1.04	1.07
II (40% loss)	2.75	3.42	4.26	3.25
III (40% loss)	3.36	4.59	5.01	4.47

V. Conclusion

We introduce an HDR method based on multivariate norm matrices. Through extensive simulation and field data verifications, the following conclusions are drawn.

1) Based on the DTW and normalized cut algorithms, the proposed method can select appropriate PQMDs when the system topology is unknown or parameters change, thereby improving the applicability to large-scale power systems.

2) The adaptive penalty factor can reduce the number of iterations and ensure the convergence stability.

3) Pollution matrices (noise and sparse outliers) are introduced to separate features from original data, thereby improving the anti-interference ability.

4) Comparison experiments at different power system scales and recovery scenarios demonstrate the advantages of the proposed method in terms of speed, accuracy, and applicability.

The proposed method leverages low-rank properties for HDR, being suitable for practical engineering applications. However, the method cannot fully explore system data relationships because the system models are simplified. In future research, we will aim to recover complete data loss (total data loss for all devices in an area) and compensate for systematic errors owing to power transformer deviations or device measurement algorithms.

REFERENCES

C. Zhuang, J. An, Z. Liu et al., “Data completion for power load analysis considering the low-rank property,” CSEE Journal of Power and Energy Systems, vol. 8, no. 6, pp. 1751-1759, Nov. 2022. [Baidu Scholar]

F. L. Grando, A. E. Lazzaretti, and M. Moreto, “The impact of PMU data precision and accuracy on event classification in distribution systems,” IEEE Transactions on Smart Grid, vol. 13, no. 2, pp. 1372-1382, Mar. 2022. [Baidu Scholar]

R. Xu, X. Ma, Y. Wang et al., “Spectral graph theory-based recovery method for missing harmonic data,” IEEE Transactions Power Delivery, vol. 37, no. 5, pp. 3688-3697, Oct. 2022. [Baidu Scholar]

W. Yao, Y. Liu, D. Zhou et al., “Impact of GPS signal loss and its mitigation in power system synchronized measurement devices,” IEEE Transactions on Smart Grid, vol. 9, no. 2, pp. 1141-1149, Mar. 2018. [Baidu Scholar]

M. Wu and L. Xie, “Online detection of low-quality synchrophasor measurements: a data driven approach,” IEEE Transactions on Power Systems, vol. 32, no. 4, pp. 2817-2827, Jul. 2017. [Baidu Scholar]

A. Sundararajan, T. Khan, A. Moghadasi et al., “Survey on synchrophasor data quality and cybersecurity challenges, and evaluation of their interdependencies,” Journal of Modern Power Systems and Clean Energy, vol. 7, no. 3, pp. 449-467, May 2019. [Baidu Scholar]

M. Netto, J. Zhao, and L. Mili, “A robust extended Kalman filter for power system dynamic state estimation using PMU measurements,” in Proceedings of IEEE PES General Meeting, Boston, USA, Jul. 2016, pp. 1-5. [Baidu Scholar]

J. Zhao, G. Zhang, K. Das et al., “Power system real-time monitoring by using PMU-based robust state estimation method,” IEEE Transactions on Smart Grid, vol. 7, no. 1, pp. 300-309, Jan. 2016. [Baidu Scholar]

Q. Cui and Y. Weng, “Enhance high impedance fault detection and location accuracy via μ-PMUs,” IEEE Transactions on Smart Grid, vol. 7, no. 1, pp. 797-809, Jan. 2020. [Baidu Scholar]

K. D. Jones, A. Pal, and J. S. Thorp, “Methodology for performing synchrophasor data conditioning and validation,” IEEE Transactions on Power Systems, vol. 30, no. 3, pp. 1121-1130, May 2015. [Baidu Scholar]

C. Huang, F. Li, L. Zhan et al., “Data quality issues for synchrophasor applications Part II: problem formulation and potential solutions,” Journal of Modern Power Systems and Clean Energy, vol. 4, no. 3, pp. 353-361, Jul. 2016. [Baidu Scholar]

Z. Yang, H. Liu, T. Bi et al., “An adaptive PMU missing data recovery method,” International Journal of Electrical Power & Energy Systems, vol. 116, p. 105577, Mar. 2020. [Baidu Scholar]

J. Yu, A. Y. S. Lam, D. J. Hill et al., “Delay aware power system synchrophasor recovery and prediction framework,” IEEE Transactions on Smart Grid, vol. 10, no. 4, pp. 3732-3742, Jul. 2019. [Baidu Scholar]

Y. Qin and J. G. Zhu, “TDMR with machine learning data detection channel,” IEEE Transactions on Magnetics, vol. 58, no. 2, pp. 1-5, Feb. 2022. [Baidu Scholar]

J. C. Fan and T. Chow, “Deep learning based matrix completion,” Neurocomputing, vol. 266, pp. 540-549, Nov. 2017. [Baidu Scholar]

C. Genes, I. Esnaola, S. M. Perlaza et al., “Robust recovery of missing data in electricity distribution systems,” IEEE Transactions on Smart Grid, vol. 10, no. 4, pp. 4057-4067, Jul. 2019. [Baidu Scholar]

X. Wang, M. Wang, and Y. Gu, “A distributed tracking algorithm for reconstruction of graph signals,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 4, pp. 728-740, Jun. 2015. [Baidu Scholar]

G. Lee, S. H. Kim, D. I. Kim et al., “Multi-channel recovery for distributed quality management of synchrophasor data,” IEEE Transactions on Power Systems, vol. 37, no. 3, pp. 2384-2396, May 2022. [Baidu Scholar]

T. Wu, Y. Zhang, Y. Liu et al., “Missing data recovery in large power systems using network embedding,” IEEE Transactions on Smart Grid, vol. 12, no. 1, pp. 680-691, Jan. 2021. [Baidu Scholar]

J. Yu, D. J. Hill, V. Li et al., “Synchrophasor recovery and prediction: a graph-based deep learning approach,” IEEE Internet of Things Journal, vol. 6, no. 5, pp. 7348-7359, Oct. 2019. [Baidu Scholar]

C. Ren and Y. Xu, “A fully data-driven method based on generative adversarial networks for power system dynamic security assessment with missing data,” IEEE Transactions on Smart Grid, Vol. 34, no. 6, pp. 5044-505, Nov. 2019. [Baidu Scholar]

T. Wu, W. Xue, H. Wang et al., “Extreme learning machine-based state reconstruction for automatic attack filtering in cyber physical power system,” IEEE Transactions on Industrial Informatics, vol. 17, no. 3, pp. 1892-1904, Mar. 2021. [Baidu Scholar]

J. Miao and K. I. Kou, “Color image recovery using low-rank quaternion matrix completion algorithm,” IEEE Transactions on Image Processing, vol. 31, pp. 190-201, Nov. 2022. [Baidu Scholar]

Z. Zhang, M. Dong, K. Ota et al., “Alleviating new user cold-start in user-based collaborative filtering via bipartite network,” IEEE Transactions on Computational Social Systems, vol. 7, no. 3, pp. 672-685, Jun. 2020. [Baidu Scholar]

J. Yi and W. Xu, “Necessary and sufficient null space condition for nuclear norm minimization in low-rank matrix recovery,” IEEE Transactions on Information Theory, vol. 66, no. 10, pp. 6597-6604, Oct. 2020. [Baidu Scholar]

J. Lin and G. Michailidis, “System identification of high-dimensional linear dynamical systems with serially correlated output noise components,” IEEE Transactions on Signal Processing, vol. 68, pp. 5573-5587, Aug. 2020. [Baidu Scholar]

P. Gao, M. Wang, S. G. Ghiocel et al., “Missing data recovery by exploiting low-dimensionality in power system synchrophasor measurements,” IEEE Transactions on Power Systems, vol. 31, no. 2, pp. 1006-1013, Mar. 2016. [Baidu Scholar]

S. Konstantinopoulos, G. M. De Mijolla, J. H. Chow et al., “Synchrophasor missing data recovery via data-driven filtering,” IEEE Transactions on Smart Grid, vol. 11, no. 5, pp. 4321-4330, Sept. 2020. [Baidu Scholar]

K. Mahapatra and N. R. Chaudhuri, “Online robust PCA for malicious attack-resilience in wide-area mode metering application,” IEEE Transactions on Power Systems, vol. 34, no. 4, pp. 2598-2610, Jul. 2019. [Baidu Scholar]

K. Chatterjee, K. Mahapatra, and N. R. Chaudhuri, “ Robust recovery of PMU signals with outlier characterization and stochastic subspace selection,” IEEE Transactions on Smart Grid, vol. 11, no. 4, pp. 3346-3358, Jul. 2020. [Baidu Scholar]

Y. Hao, M. Wang, and J. H. Chow, “Modeless streaming synchrophasor data recovery in nonlinear systems,” IEEE Transactions on Power Systems, vol. 35, no. 2, pp. 1166-1177, Mar. 2020. [Baidu Scholar]

B. Foggo and N. Yu, “Online PMU missing value replacement via event-participation decomposition,” IEEE Transactions on Power Systems, vol. 37, no. 1, pp. 488-496, Jan. 2022. [Baidu Scholar]

E. J. Candes and T. Tao, “The power of convex relaxation: near-optimal matrix completion,” IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2053-2080, May 2010. [Baidu Scholar]

M. Liao, D. Shi, Z. Yu et al., “An alternating direction method of multipliers based approach for PMU data recovery,” IEEE Transactions on Smart Grid, vol. 10, no. 4, pp. 4554-4565, Jul. 2019. [Baidu Scholar]

J. Mei, M. Liu, Y. Wang et al., “Learning a Mahalanobis distance-based dynamic time warping measure for multivariate time series classification,” IEEE Transactions on Cybernetics, vol. 46, no. 6, pp. 1363-1374, Jun. 2016. [Baidu Scholar]

D. J. Berndt and J. Clifford, “Using dynamic time warping to find patterns in time series,” in Proceedings of KDD Workshop, Seattle, USA, Jul. 1994, pp. 359-370. [Baidu Scholar]

U. V. Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395-416, Aug. 2007. [Baidu Scholar]

E. J. Candès and B. Recht, “Exact matrix completion via convex optimization,” Foundations of Computational Mathematics, vol. 9, no. 6, pp. 717-772, Apr. 2009. [Baidu Scholar]

K. W. Chen, C. C. Lai, Y. Huang et al., “An adaptive learning method for target tracking across multiple cameras,” in Proceedings of IEEE Conference on Computer Vision & Pattern Recognition, Anchorage, USA, Jun. 2008, pp. 1-8. [Baidu Scholar]

A. Bruckstein, D. Donoho, and M. Elad, “From sparse solutions of systems of equations to sparse modeling of signals and images,” SIAM Review, vol. 51, no. 1, pp. 34-81, Feb. 2009. [Baidu Scholar]

P. Jain, R. Meka, and I. S. Dhillon, “Guaranteed rank minimization via singular value projection,” in Proceedings of Advances in Neural Information Processing Systems, Red Hook, USA, Dec. 2010, pp. 937-945. [Baidu Scholar]

E. J. Candes, X. Li, Y. Ma et al., “Robust principal component analysis,” Journal of the ACM, vol. 58, no. 3, pp. 1-37, May 2011. [Baidu Scholar]

Address:No.19 Chengxin Avenue, Jiangning District, Nanjing 211106, China

E-mail: mpce@alljournals.cn

Tel:86-25-81093060

Fax:86-25-81093040

Home

Introduction

Editorial Board

For Author

Call For Papers

APC

Sponsor & Publisher