Abstract
In this study, a machine learning based method is proposed for creating synthetic eventful phasor measurement unit (PMU) data under time-varying load conditions. The proposed method leverages generative adversarial networks to create quasi-steady states for the power system under slowly-varying load conditions and incorporates a framework of neural ordinary differential equations (ODEs) to capture the transient behaviors of the system during voltage oscillation events. A numerical example of a large power grid suggests that this method can create realistic synthetic eventful PMU voltage measurements based on the associated real PMU data without any knowledge of the underlying nonlinear dynamic equations. The results demonstrate that the synthetic voltage measurements have the key characteristics of real system behavior on distinct time scales.
Over the past decade, thousands of phasor measurement units (PMUs) have been deployed in backbone transmission systems in North America and abroad. This enables improved monitoring and control of the power system dynamics at considerably higher resolutions than previously possible. Transient dynamic data recorded by PMUs are of particular value to the research community for distinct research interests such as real-time monitoring, control, and protection. Although machine learning (ML) based methods have been proposed for a wide range of tasks such as those in [
Therefore, it is critical for public researchers to create a massive amount of realistic eventful PMU data to train, test, and calibrate data-driven methods that can be applied to real cases. Although researchers have recently contributed to the creation of datasets based on large-scale realistic synthetic grid models [
To address these challenges, we propose a method for generating eventful PMU data based on limited real data that leverages generative adversarial networks (GANs) to create quasi-steady states for the power system under time-varying load conditions and utilizes neural ordinary differential equations (ODEs) to capture the transient behaviors of the power system during voltage oscillation events. This method is potentially generalizable to other real power systems. We separately validate the fidelity of the synthetic load and voltage oscillation data from various perspectives.
The contributions of this paper are summarized as follows.
1) Generation of data-driven eventful PMU measurements. The proposed method for generating eventful PMU voltage measurements can create realistic-looking PMU streams that capture the patterns of load changes and system oscillations over distinct time scales. Its fidelity and scalability are demonstrated on a large-scale real dataset.
2) Efficient data generation algorithm. The proposed method achieves an efficient learning process by decoupling distinct time scales and leveraging the low-rank property of high-dimensional datasets.
The remainder of this paper is organized as follows. Section II introduces the problem formulation for the task of creating synthetic PMU data using ML. Section III briefly reviews the basic concepts of the GAN and neural ODE models adopted in this study. Section IV proposes a method for creating eventful PMU data under time-varying load conditions. Section V presents a case study using a real dataset. Finally, Section VI draws conclusions and outlines future work.
In this section, we present mathematical formulations for the task of generating eventful PMU data. Here, we only have access to the power flow model of a large-scale real system and have no knowledge of the dynamic model. We assume that the created multi-time-scale PMU measurements are a linear combination of the steady-state voltage and voltage oscillation, which are determined by the pattern of changes in the load and the nature of the system dynamics, respectively. Therefore, the task is separated into two subtasks: ① the generation of steady-state voltage measurements; and ② the generation of voltage oscillation measurements. We further discuss the challenges and derive the corresponding principles for the method design.
Consider a set of historical PMU measurements including voltage and current measurements. We denote the voltage measurement matrix as:
(1) |
where is the voltage at PMU at time , and is the sampling period; is the number of PMUs; and is the number of time steps.
We assume that the voltage measurements are collected when the system is in a quasi-steady state. The task for generating steady-state voltage measurements aims to develop a data creation algorithm using the real samples such that the synthetic multichannel time-series data , containing measurement channels over arbitrary time steps, exhibit similar properties as those of the historical data, such as the slowly-varying pattern attributed to changes in the load.
We denote the voltage oscillation measurement matrix as with the same definition, which is collected under eventful system conditions. We assume that can be expressed by a linear combination of the equilibrium voltage and voltage oscillation .
(2) |
The task for generating voltage oscillation measurements aims to learn the pattern of the voltage oscillation using real samples such that the created synthetic time-series data , containing measurement channels over arbitrary time steps, exhibit realistic properties such as the decaying periodic oscillation determined by the dynamic characteristics of the system and the low rank due to the high coherency throughout the system.
Although we separate the task for generating PMU measurement data into two subtasks, two key challenges still need to be resolved for ML-based synthetic PMU data generation approaches: ① enabling an ML-based data generation method to efficiently learn from a high-dimensional dataset; and ② guaranteeing that the created PMU data are meaningful in terms of complying with physical laws. The remainder of this subsection discusses our method for addressing these challenges and describes the resulting algorithm design.
The dimensions of the time-series data and are nontrivial in the context of PMU data generation. A high dimensionality may render the training process intractable and degrade the performance of the generative algorithms. Therefore, the proposed method should address these challenges from both temporal and spatial perspectives. First, the proposed method can decompose a long time series into multiple time resolutions and separately learn the temporal correlations of distinct time scales. Second, the proposed method can reduce the order of high-dimensional measurements by utilizing existing low-rank characteristics, which are attributed to a strong spatial correlation.
As real PMU measurements comply with physical laws, data fidelity, one of the main criteria for synthetic data quality, is another challenge. It requires that the synthetic data satisfy Kirchhoff’s laws at each snapshot and that the evolving synthetic time series follow the characteristics of the power system dynamics. For the first requirement, the proposed method can create synthetic load profiles and calculate synthetic voltage measurements via power flow simulation, which automatically satisfies Kirchhoff’s laws. For the second requirement, the method can learn fast oscillation patterns using an ML model that embeds the ODE format.
The GAN model, first proposed in [
The two key models of a GAN, the generative model (generator) and the discriminative model (discriminator), are implemented by neural networks, which are iteratively updated by optimizing the objective function:
(3) |
where and are the real data samples and random noise sampled from a predefined distribution, respectively; and is an expectation function.
Additionally, another variant of a GAN model [
(4) |
where is a label representing the category of interest.
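For reference, if (3) and (4) follow the standard formulations of the original GAN and conditional GAN papers cited above, they would read as sketched below. The symbols $G$, $D$, $x$, $z$, and $y$ and the distributions $p_{\text{data}}$ and $p_{z}$ are assumed notation, and the objectives actually used in this study may differ in form (for example, a Wasserstein-type loss).

```latex
% Standard GAN objective (assumed form of (3))
\min_{G}\max_{D}\;
  \mathbb{E}_{x\sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z\sim p_{z}}\!\left[\log\left(1 - D(G(z))\right)\right]

% Standard conditional GAN objective with label y (assumed form of (4))
\min_{G}\max_{D}\;
  \mathbb{E}_{x\sim p_{\text{data}}}\!\left[\log D(x \mid y)\right]
  + \mathbb{E}_{z\sim p_{z}}\!\left[\log\left(1 - D(G(z \mid y) \mid y)\right)\right]
```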
The neural ODE model [
(5) |
where is the estimated state at time ; is the result of measurements at time ; and is the function representing a neural network parameterized by , which indicates how the measurements evolve along the timeline.
We assume that the multi-time-scale eventful PMU measurements are a linear combination of steady-state voltage measurements and voltage oscillation measurements, which are determined by the pattern of changes in the slowly-varying load and the nature of the fast-varying system dynamics, respectively. With this assumption, we separate the eventful PMU data generation task into two subtasks. The first aims to create realistic time-varying load profiles and then estimate the steady-state voltage measurements via a power flow simulation based on the obtained system model. The second subtask aims to synthesize realistic voltage oscillation profiles that follow the periodic patterns of the real transient dynamics of the system. With such an instructive principle, a novel algorithm incorporating GAN [

Fig. 1 Proposed method incorporating GAN and neural ODE models. (a) Training process of GAN model for synthetic load data. (b) Training process of neural ODE model for synthetic voltage oscillation measurements. (c) Generation of entire synthetic voltage measurements by trained GAN and neural ODE models.
In
The remainder of this section introduces the detailed algorithms for ① the generation of steady-state voltage measurements that consists of GAN-based load profile generation and simulation-based estimation of the steady-state voltage measurements; and ② the generation of voltage oscillation measurements that leverages neural-ODE-based time-series learning.
The task for generating steady-state voltage measurements consists of two steps: ① the generation of a GAN-based multiresolution load profile [
We use the algorithm for generating a multiresolution bus-level load profile proposed in [
1) Compute the power consumption of different load buses using PMU voltage and current measurements.
2) Down-sample the load data into multiple time scales and resolutions, including hour-long profiles at two samples per minute, week-long profiles at one sample per hour, and year-long profiles at one sample per week.
3) Train a generative model for the load profiles at each time scale and resolution, which is implemented by the conditional GAN in
Algorithm 1 : algorithm for generating GAN-based bus-level load profile |
---|
Require: historical load data at a certain time scale , associated labels , random noise data , learning rate , batch size , initial parameter for the model , and initial parameter for the model |
while and not converged |
Sample batch from and |
Sample batch from and |
#Update the model using gradient descent |
|
|
#Update the model using gradient descent |
|
|
end while |
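Step 2) of the procedure above (down-sampling the load data to a coarser resolution) can be sketched as follows. This is a minimal illustration that assumes block averaging as the down-sampling rule, which the text does not specify; the function name `downsample_mean` and the toy input signal are hypothetical.

```python
def downsample_mean(series, factor):
    """Average consecutive, non-overlapping blocks of `factor` samples."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_blocks = len(series) // factor  # an incomplete trailing block is dropped
    return [
        sum(series[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


PMU_RATE_HZ = 30  # reporting rate stated for the dataset

# One hour of hypothetical per-unit load sampled at the PMU rate.
raw_hour = [1.0 + 0.0001 * (k % 900) for k in range(PMU_RATE_HZ * 3600)]

# Two samples per minute means one output sample per 30 s of raw data.
hour_profile = downsample_mean(raw_hour, PMU_RATE_HZ * 30)
```

The same helper could be applied again to obtain the hourly and weekly resolutions, with the factor chosen from the ratio of the two sampling periods.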
Using the power flow simulation model that accompanies the dataset, we estimate the steady-state voltage measurements under certain load conditions by performing a power flow simulation at every time step. Given one synthetic load profile, the power flow simulation is performed at each time step, with all system loads and generation scaled by the per-unit value of the load profile at that snapshot. We acknowledge that this generation dispatch is simplified and does not incorporate factors such as power markets and planned outages; these require further investigation but are outside the scope of this paper.
In summary, we generate steady-state voltage measurements in two steps. Using a well-trained load profile generation model, we first generate a massive number of realistic load profiles for a certain time period that share a similar pattern but exhibit diversity. By assigning the synthetic load profiles to the load buses in the simulation model and proportionally scaling the generation dispatch, we then obtain a massive number of steady-state voltage measurements at different time scales and resolutions via power flow simulation.
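The proportional scaling described above can be sketched as follows. The bus names, base values, and the function `scale_snapshots` are hypothetical; each resulting snapshot would then be passed to a power flow solver, which is not shown here.

```python
def scale_snapshots(base_loads_mw, base_gen_mw, profile_pu):
    """Scale all loads and generation by the per-unit profile value per step."""
    snapshots = []
    for scale in profile_pu:
        loads = {bus: p * scale for bus, p in base_loads_mw.items()}
        gen = {unit: p * scale for unit, p in base_gen_mw.items()}
        snapshots.append({"scale": scale, "loads": loads, "gen": gen})
    return snapshots


# Hypothetical base case; generation slightly exceeds load to cover losses.
base_loads_mw = {"bus_12": 100.0, "bus_47": 50.0}
base_gen_mw = {"gen_1": 90.0, "gen_2": 63.0}
profile_pu = [0.8, 1.0, 1.1]  # three time steps of a per-unit load profile

snapshots = scale_snapshots(base_loads_mw, base_gen_mw, profile_pu)
```

Because loads and generation share the same factor, the generation-to-load ratio of the base case is preserved at every time step, which is the simplification acknowledged in the text.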
Inspired by the data-driven system identification method SINDy [

Fig. 2 Diagram of training neural ODE model for generating voltage oscillation measurements.
The details are summarized in the following steps and formally presented in
Algorithm 2 : algorithm for generating voltage oscillation measurements |
---|
Require: eventful voltage measurements , reduced PCA approximation rank , batch size , learning rate , loss function , and initial parameter for model |
#Decomposition |
|
#Dimension reduction |
|
|
#Train time-series learning model |
while not converged |
Sample batch that are segments from and |
Generate synthetic data |
, |
|
|
end while |
To decompose the original voltage measurements into the equilibrium voltage and voltage oscillation , the moving average method is first used to process the original voltage measurements, where the average voltage calculated in the moving window is defined as the equilibrium voltage and the residual is defined as the voltage oscillation.
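The moving average decomposition can be sketched for a single channel as follows. This is an illustration under assumptions: a centered window that shrinks near the edges (the text does not state the edge handling), a 10 s window at 30 samples per second, and a hypothetical test signal made of a slow drift plus a decaying 1 Hz oscillation.

```python
import math


def moving_average_split(v, window):
    """Split a series into a moving-average part and the residual."""
    n = len(v)
    half = window // 2
    prefix = [0.0]
    for x in v:
        prefix.append(prefix[-1] + x)  # running sums for O(1) window means
    equilibrium = []
    for k in range(n):
        lo = max(0, k - half)
        hi = min(n, k + half + 1)  # the window shrinks near the edges
        equilibrium.append((prefix[hi] - prefix[lo]) / (hi - lo))
    oscillation = [x - e for x, e in zip(v, equilibrium)]
    return equilibrium, oscillation


FS = 30  # samples per second
t = [k / FS for k in range(60 * FS)]
# Hypothetical signal: slow drift plus a decaying 1 Hz oscillation.
v = [
    1.0 + 0.001 * tk + 0.02 * math.exp(-0.05 * tk) * math.sin(2 * math.pi * tk)
    for tk in t
]
v_eq, v_osc = moving_average_split(v, window=10 * FS)
```

By construction, the two parts sum back to the original series, which matches the additive decomposition assumed in (2).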
To implement feature extraction, principal component analysis (PCA) is used to process the voltage oscillation and equilibrium voltage to obtain their reduced-order features. Here, the underlying assumption is that the reduced-order features have a one-to-one correspondence with the original measurements. The PCA method uses a rank parameter to determine the number of principal components to be retained, which also indicates the reduced rank of the approximated data after reconstruction. We select the principal components with the highest variances as the feature time series such that these components explain at least 95% of the variability in the original measurements.
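The rank selection by explained variance can be sketched with a singular value decomposition as follows. This is an illustrative sketch rather than the implementation used in this study; the function `pca_reduce` and the synthetic 20-channel data driven by two latent modes are assumptions.

```python
import numpy as np


def pca_reduce(X, var_threshold=0.95):
    """X has shape (time steps, channels). Returns features, map, mean, rank."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    # Smallest rank whose cumulative explained variance reaches the threshold.
    r = int(np.searchsorted(explained, var_threshold) + 1)
    W = Vt[:r].T   # (channels, r) PCA mapping matrix
    Z = Xc @ W     # (time steps, r) reduced-order features
    return Z, W, mean, r


# Hypothetical data: 20 channels driven by two latent oscillations plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 300)
latent = np.column_stack([np.sin(2 * np.pi * 0.5 * t),
                          np.cos(2 * np.pi * 1.2 * t)])
X = latent @ rng.normal(size=(2, 20)) + 1e-3 * rng.normal(size=(300, 20))

Z, W, mean, r = pca_reduce(X)
X_hat = Z @ W.T + mean  # rank-r reconstruction in the measurement space
```

The mapping matrix `W` is kept because it is needed later to transform generated features back to the measurement space.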
We assume that the equilibrium voltage is uniquely determined by the load conditions. The task of generating voltage oscillations under time-varying load conditions is thus equivalent to generating voltage oscillations when the equilibrium voltage varies. Therefore, we train a neural ODE model to learn the oscillation pattern of the low-dimensional features at the corresponding equilibrium .
In summary, given the voltage oscillation measurements calculated by the moving average method, we first perform order reduction to improve the computational efficiency and reduce the model complexity, and then leverage the neural ODE model to learn the underlying dynamic behavior of the extracted feature time series. As the synthetic steady-state voltage measurements provide the varying equilibrium, we can create a massive number of voltage oscillations using the well-trained model. This data creation process also requires the PCA mapping matrix for transformation.
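Once a model of the feature dynamics is available, generating an oscillation trajectory amounts to integrating an ODE whose right-hand side depends on the current features and the equilibrium. The sketch below shows only this forward pass. It assumes a fixed-step fourth-order Runge-Kutta solver and replaces the trained network with a hypothetical hand-written stand-in `f_standin` (a damped rotation whose damping depends on the equilibrium); training is not shown.

```python
import math


def rk4_integrate(f, z0, z_eq, ts):
    """Integrate dz/dt = f(z, z_eq) over the time grid ts with classic RK4."""
    traj = [list(z0)]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h = t1 - t0
        z = traj[-1]
        k1 = f(z, z_eq)
        k2 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k1)], z_eq)
        k3 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k2)], z_eq)
        k4 = f([zi + h * ki for zi, ki in zip(z, k3)], z_eq)
        traj.append([
            zi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)
        ])
    return traj


def f_standin(z, z_eq):
    """Hypothetical stand-in for the trained network (not the paper's model)."""
    sigma = 0.3 * z_eq[0]        # assumed: damping depends on the equilibrium
    omega = 2.0 * math.pi * 1.0  # assumed 1 Hz oscillation
    return [-sigma * z[0] - omega * z[1], omega * z[0] - sigma * z[1]]


FS = 30  # samples per second
ts = [k / FS for k in range(10 * FS + 1)]  # 10 s at the PMU rate
traj = rk4_integrate(f_standin, z0=[1.0, 0.0], z_eq=[1.0], ts=ts)
final_amplitude = math.hypot(*traj[-1])
```

For this stand-in, the amplitude decays as exp(-0.3 t), so the final amplitude after 10 s should be close to exp(-3).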
In this section, we demonstrate the proposed method using a large-scale real PMU dataset. We first show that the generated load profiles and steady-state voltage measurements are visually indistinguishable from the real samples and exhibit the same statistical properties. We also show the fidelity of the generated voltage oscillation measurements using a modal analysis.
In this study, we use a large-scale real PMU dataset obtained from a major United States electricity utility company. This dataset was collected at a rate of 30 samples per second for three consecutive years from approximately 400 PMUs throughout the utility’s territory and mainly contains voltage and current measurements. Furthermore, we have access to a large-scale power simulation model of the relevant network that contains more than 30000 buses and covers the utility’s territory. The dataset provides the unique identifiers of the PMU buses that are consistent with the simulation model, thereby enabling the localization of the PMUs in the simulation model.
On the basis of the system topology and placement of the PMUs, we identify 12 fully monitored load buses, whose load demand can be directly calculated from the positive-sequence complex current and voltage measurements. The load profiles reflect the periodic patterns of load changes at different time scales. The dataset also contains seven system-wide voltage oscillation events, of which only one weakly damped event lasted for approximately 2 hours, whereas the others quickly vanished. The weakly damped event shows the shifting dominant modes of the system oscillation.
In the remainder of this section, we demonstrate the proposed method by generating voltage equilibrium profiles based on real load profiles and creating voltage oscillation profiles based on quickly and weakly damped events.
The details of the data processing and model training for the two subtasks are introduced below. The configuration of the neural network model and the computational environment are presented in Appendix A.
Following
To separate the equilibrium voltage and voltage oscillation profiles, the moving average method is used to process the voltage measurements of each voltage oscillation event in the dataset, where the size of the moving window is set to be 10 s. The order of the processed high-dimensional voltage measurements is reduced to 4 by PCA, as these 4 dominant components can explain more than 95% of the variability. We train model on low-rank features, as instructed in
The GAN model for generating synthetic load profiles is trained with the power measurements at the fully monitored load buses in the real dataset as the training data, with the aim of producing realistic and diverse patterns. The fidelity of the synthetic profiles is validated by comparing their statistical characteristics with those of the real profiles.
The generative models for the time-series load data are validated with statistical comparisons. The following two metrics are used to verify that the synthetic data capture the characteristics of the real data.
1) Wasserstein distance. The goal of the generator is to learn a function that maps the known noise distribution to the distribution of real data. Training is successful when the distribution of the generated data matches that of the real data. The Wasserstein distance measures the distance between two distributions and can be used to quantitatively assess how close the distribution of the generated data is to that of the real data.
2) Power spectral density (PSD). An important characteristic of time-series load data is periodicity. Because loads are tied to the routines and behaviors of people, they have different recurring patterns. One approach to identify these periodicities is to examine the PSD of time-series data.
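The two metrics above can be sketched for one-dimensional data as follows. This is a minimal illustration under assumptions: the Wasserstein-1 distance is computed between equal-size empirical samples by sorting, the PSD is estimated with a plain periodogram from a direct discrete Fourier transform, and the week-long hourly profiles are hypothetical.

```python
import cmath
import math


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D samples."""
    if len(a) != len(b):
        raise ValueError("samples must have equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def periodogram(x, fs):
    """Periodogram via a direct DFT (adequate for short series).

    The factor of 2 for one-sided spectra is omitted because only the
    peak locations are compared here.
    """
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(xc))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / (fs * n))
    return freqs, power


HOURS = 168  # one week at one sample per hour
real = [1.0 + 0.2 * math.sin(2 * math.pi * h / 24) for h in range(HOURS)]
synthetic = [1.01 + 0.2 * math.sin(2 * math.pi * (h + 3) / 24)
             for h in range(HOURS)]

w1 = wasserstein_1d(real, synthetic)
freqs, psd_real = periodogram(real, fs=1.0)
_, psd_syn = periodogram(synthetic, fs=1.0)
peak_real = freqs[psd_real.index(max(psd_real))]  # cycles per hour
peak_syn = freqs[psd_syn.index(max(psd_syn))]
```

In this toy case, the two profiles differ only by a phase shift and a small offset, so the distance is small and both spectra peak at the daily frequency of one cycle per 24 hours.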

Fig. 3 Comparison of PSDs of real and synthetic load profiles.
Subsequently, we create 1000 1-hour-long minute-level (per-unit) load profiles that represent diverse load changes over different time periods such as daytime or nighttime, weekday or weekend, and seasons. Given one per-unit load profile as an input, we scale all loads and generation in the simulation model and solve for the power flow at every time step. Finally, we obtain the steady-state voltage measurements of 1000 different load conditions by repeating the simulation. To validate the synthetic voltage measurements, we compare the distributions of the real and synthetic 1-hour-long steady-state voltage angle measurements under different load conditions for a PMU, as shown in

Fig. 4 Comparison of distributions of real and synthetic 1-hour-long steady-state voltage angle measurements under different load conditions for a PMU.
The neural ODE model for the voltage oscillation measurements is trained according to the details introduced in Section V-B. To demonstrate the learning capacity, the results for synthetic voltage oscillation data for two events of distinct duration are presented: a 10-second quickly damped oscillation event and a 2-hour weakly damped oscillation event.
We first train the neural ODE model with the voltage measurements in a 10-second-long event as the training dataset. The visual comparison in

Fig. 5 Visual comparison between real and synthetic voltage angle measurements at the same selected buses for a quickly damped event that lasts for only 10 s. (a) Bus A. (b) Bus B. (c) Bus C. (d) Bus D.
We further train and test the proposed method with a 2-hour voltage oscillation event with the same settings as in Section V-B. In contrast to the quickly damped event, this weakly damped event shows more complex system dynamics, in which the voltage measurements have several changing dominant modes over time. Therefore, modal analysis is well suited to validating the fidelity of the synthetic voltage oscillation measurements. To this end, the Prony method [
(6) |
where is the energy of mode; is the amplitude; is the window size; is the mode frequency; is the sampling time of sample i; and is the time constant of the mode.
Considering that the total number of modes is large, we select the dominant modes such that the sum of their energies accounts for 95% of the total energy. A synthetic time series for a certain PMU is considered realistic if and only if its synthetic dominant mode is close to a real one.
We repeatedly perform random generation times, as shown in
(7) |
where is an indicator that shows whether the sample is realistic according to the criteria in (8).
(8) |
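The dominant-mode selection and the share of realistic samples can be sketched as follows. The exact closeness criterion in (8) is not reproduced here, so this sketch assumes a simple frequency tolerance `tol_hz` and requires every dominant synthetic mode to match some dominant real mode; the mode lists and all function names are hypothetical.

```python
def dominant_modes(modes, energy_share=0.95):
    """modes: list of (frequency_hz, energy). Keep the highest-energy modes
    until their cumulative energy reaches the requested share."""
    total = sum(e for _, e in modes)
    kept, acc = [], 0.0
    for f, e in sorted(modes, key=lambda m: m[1], reverse=True):
        kept.append((f, e))
        acc += e
        if acc >= energy_share * total:
            break
    return kept


def is_realistic(synth_modes, real_modes, tol_hz=0.05):
    """Assumed criterion: each dominant synthetic mode lies within tol_hz
    of some dominant real mode."""
    real_f = [f for f, _ in dominant_modes(real_modes)]
    return all(
        any(abs(f - rf) <= tol_hz for rf in real_f)
        for f, _ in dominant_modes(synth_modes)
    )


def realistic_ratio(samples, real_modes, tol_hz=0.05):
    """Fraction of synthetic samples flagged as realistic."""
    flags = [is_realistic(s, real_modes, tol_hz) for s in samples]
    return sum(flags) / len(flags)


# Hypothetical modal analysis results: (frequency in Hz, energy).
real_modes = [(0.25, 6.0), (0.80, 3.6), (1.50, 0.3), (2.10, 0.1)]
samples = [
    [(0.27, 5.0), (0.78, 4.8), (1.90, 0.2)],  # dominant modes near real ones
    [(0.45, 9.7), (0.80, 0.3)],               # dominant mode far from real
]
ratio = realistic_ratio(samples, real_modes)
```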
The statistics of the modal analysis of the synthetic voltage oscillation measurements for a weakly damped event that lasts for 2 hours are shown in

Fig. 6 Statistics of modal analysis of synthetic voltage oscillation measurements for a weakly damped event that lasts for 2 hours.
In summary, we demonstrate that the synthetic load profiles and steady-state PMU voltage measurements have realistic statistical properties and confirm that the generated voltage oscillation data have realistic oscillation modes. By combining Algorithms
In this study, we propose an ML-based method to create synthetic eventful PMU data under time-varying load conditions. Our method uses a GAN to generate load data and incorporates neural ODEs to capture the transient behavior of oscillation events that occur in a system. We utilize this method to synthetically create a massive amount of eventful PMU data under the generated time-varying load conditions and confirm that the synthetic data exhibit realistic characteristics across multiple time scales from statistical and modal analysis perspectives. The generated realistic synthetic data have the potential to alleviate the lack of real eventful PMU data and can be potentially used for the training, testing, and calibration of subsequent data-driven methods.
In general, the proposed method is feasible as long as the number of synthetic variables is less than the number of independent variables in the algebraic equations that are mainly derived from Kirchhoff’s laws. Future research will extend this study to synthesize arbitrary numbers of variables with conserved algebraic relationships.
Appendix
Table AI presents the model structure of the neural networks, where the generator and discriminator models account for the generation of synthetic load profiles based on a GAN (these neural network models are implemented in TensorFlow-Keras), whereas the neural ODE model is used to learn the voltage oscillation pattern (this neural network model is implemented in TensorFlow). MLP denotes a multilayer perceptron layer followed by the number of neurons, and Conv denotes a convolutional layer followed by the number of filters. The computational environment consists of an Intel Core i7-9700 central processing unit (CPU), 32 GB of memory, and an NVIDIA RTX 2060 graphics processing unit (GPU).
Layer | Generator | Discriminator | Neural ODE model |
---|---|---|---|
Input | 25 | 900 | 8 |
Layer 1 | MLP, 64 | Conv | MLP, 100 |
Layer 2 | MLP, 256 | MLP, 128 | MLP, 100 |
Layer 3 | MLP, 900 | MLP, 32 | MLP, 4 |
Layer 4 | Conv, 4 | MLP, 1 | |
Layer 5 | Conv, 1 | | |
References
L. Xie, Y. Chen, and P. Kumar, “Dimensionality reduction of synchrophasor data for early event detection: linearized analysis,” IEEE Transactions on Power Systems, vol. 29, no. 6, pp. 2784-2794, Apr. 2014.
R. E. Helou, D. Kalathil, and L. Xie. (2020, Aug.). Fully decentralized reinforcement learning-based control of photovoltaics in distribution grids for joint provision of real and reactive power. [Online]. Available: http://arxiv.org/abs/2008.1231
D. Wu, X. Zheng, D. Kalathil et al., “Nested reinforcement learning based control for protective relays in power distribution systems,” in Proceedings of 2019 IEEE 58th Conference on Decision and Control (CDC), Nice, France, Dec. 2019, pp. 1925-1930.
T. Huang, N. M. Freris, P. Kumar et al., “A synchrophasor data-driven method for forced oscillation localization under resonance conditions,” IEEE Transactions on Power Systems, vol. 35, no. 5, pp. 3927-3939, Mar. 2020.
A. B. Birchfield, T. Xu, K. M. Gegner et al., “Grid structural characteristics as validation criteria for synthetic networks,” IEEE Transactions on Power Systems, vol. 32, no. 4, pp. 3258-3265, Oct. 2016.
Y. Xu, N. Myhrvold, D. Sivam et al., “US test system with high spatial and temporal resolution for renewable integration studies,” in Proceedings of 2020 IEEE PES General Meeting, Montreal, Canada, Aug. 2020, pp. 1-5.
Breakthrough Energy Sciences. (2021, Aug.). A 2030 United States macro grid: unlocking geographical diversity to accomplish clean energy goals. [Online]. Available: https://science.breakthroughenergy.org/publications/MacroGridReport.pdf
D. Wu, X. Zheng, Y. Xu et al. (2021, Apr.). An open-source model for simulation and corrective measure assessment of the 2021 texas power outage. [Online]. Available: https://arxiv.org/abs/2104.04146v1
A. Pinceti, L. Sankar, and O. Kosut. (2021, Jul.). Generation of synthetic multi-resolution time series load data. [Online]. Available: https://arxiv.org/abs/2107.03547v1
A. Pinceti, L. Sankar, and O. Kosut. (2021, Jul.). Synthetic time-series load data via conditional generative adversarial networks. [Online]. Available: https://arxiv.org/abs/2107.03545
Y. Chen, Y. Wang, D. Kirschen et al., “Model-free renewable scenario generation using generative adversarial networks,” IEEE Transactions on Power Systems, vol. 33, no. 3, pp. 3265-3275, Jan. 2018.
X. Zheng, B. Wang, and L. Xie, “Synthetic dynamic PMU data generation: a generative adversarial network approach,” in Proceedings of 2019 International Conference on Smart Grid Synchronized Measurements and Analytics (SGSMA), College Station, USA, May 2019, pp. 1-6.
X. Zheng, B. Wang, D. Kalathil et al., “Generative adversarial networks-based synthetic PMU data creation for improved event classification,” IEEE Open Access Journal of Power and Energy, vol. 8, pp. 68-76, Feb. 2021.
X. Zheng, N. Xu, L. Trinh et al. (2021, Oct.). PSML: a multi-scale time-series dataset for machine learning in decarbonized energy grids. [Online]. Available: https://arxiv.org/abs/2110.06324
C. Esteban, S. L. Hyland, and G. Rätsch. (2017, Jun.). Real-valued (medical) time series generation with recurrent conditional GANs. [Online]. Available: https://arxiv.org/abs/1706.02633
T. Xu, L. K. Wenliang, M. Munn et al. (2020, Jun.). COT-GAN: generating sequential data via causal optimal transport. [Online]. Available: https://arxiv.org/abs/2006.08571
J. Yoon, D. Jarrett, and M. van der Schaar, “Time-series generative adversarial networks,” in Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, Canada, Dec. 2019, pp. 5508-5518.
Z. Lin, A. Jain, C. Wang et al., “Using GANs for sharing networked time series data: challenges, initial promise, and open questions,” in Proceedings of the ACM Internet Measurement Conference, Pittsburgh, USA, Oct. 2020, pp. 464-483.
I. Goodfellow, J. Pouget-Abadie, M. Mirza et al., “Generative adversarial nets,” in Proceedings of the 28th International Conference on Advances in Neural Information Processing Systems, Montreal, Canada, Dec. 2014, pp. 2672-2680.
L.-C. Yang, S.-Y. Chou, and Y.-H. Yang. (2017, Mar.). MidiNet: a convolutional generative adversarial network for symbolic-domain music generation. [Online]. Available: https://arxiv.org/abs/1703.10847
L. Yu, W. Zhang, J. Wang et al., “SeqGAN: sequence generative adversarial nets with policy gradient,” in Proceedings of Thirty-first AAAI Conference on Artificial Intelligence, San Francisco, USA, Feb. 2017, pp. 2852-2858.
R. Fu, J. Chen, S. Zeng et al. (2019, Apr.). Time series simulation by conditional generative adversarial net. [Online]. Available: https://arxiv.org/abs/1904.11419v1
M. Mirza and S. Osindero. (2014, Nov.). Conditional generative adversarial nets. [Online]. Available: https://arxiv.org/abs/1411.1784
R. Chen, Y. Rubanova, J. Bettencourt et al., “Neural ordinary differential equations,” in Proceedings of the 32nd International Conference on Advances in Neural Information Processing Systems, Montreal, Canada, Dec. 2018, pp. 6571-6583.
S. L. Brunton, J. L. Proctor, and J. N. Kutz, “Discovering governing equations from data by sparse identification of nonlinear dynamical systems,” Proceedings of the National Academy of Sciences, vol. 113, no. 15, pp. 3932-3937, Apr. 2016.
B. L. Thayer, Z. Mao, Y. Liu et al., “Easy SimAuto (ESA): a python package that simplifies interacting with PowerWorld simulator,” Journal of Open Source Software, vol. 5, no. 50, p. 2289, Jun. 2020.
P. J. Schmid, “Dynamic mode decomposition of numerical and experimental data,” Journal of Fluid Mechanics, vol. 656, pp. 5-28, Aug. 2010.