Abstract
This paper aims at developing a data-driven optimal control strategy for the virtual synchronous generator (VSG) in scenarios where neither expert knowledge nor a system model is available. Firstly, the optimal and adaptive control problem for the VSG is transformed into a reinforcement learning task. Specifically, the control variables, i.e., the virtual inertia and damping factor, are defined as the actions. Meanwhile, the active power output, angular frequency, and its derivative are considered as the observations. Moreover, the reward mechanism is designed based on three preset characteristic functions to quantify the control targets: ① maintaining the deviation of the angular frequency within specified limits; ② preserving well-damped oscillations for both the angular frequency and the active power output; ③ obtaining a slow frequency drop in the transient process. Next, to maximize the cumulative rewards, a decentralized deep policy gradient algorithm, which is model-free and features faster convergence, is developed and employed to find the optimal control policy. With this effort, a data-driven adaptive VSG controller can be obtained. By using the proposed controller, the inverter-based distributed generator (IBDG) can adaptively adjust its control variables based on current observations to fulfill the expected targets in a model-free fashion. Finally, simulation results validate the feasibility and effectiveness of the proposed approach.
THE increasing pressure from environmental protection has made it urgent to conduct research on accommodating high penetration levels of renewable energy [
It is notable that the control operation of the VSG is executed by software. As a result, the control parameters, i.e., the virtual inertia and damping factor, can be set arbitrarily without physical limits. To date, many control strategies for the VSG have been presented to achieve the desired dynamic performance, which can be roughly classified into two categories, i.e., the rule-based approach and the optimization-based approach. The rule-based approach determines the control behavior by using predefined operation rules. For instance, an adaptive-gain inertial control is proposed in [
Recently, there has been increasing interest in investigating the parameter setting for the VSG using the optimization-based approach, where the adjustment of parameters is driven by optimal solutions. For example, the stability of a microgrid with multiple VSGs is assessed based on the voltage angle deviations [
Thanks to the rapid development of artificial intelligence technology, reinforcement learning approaches make it possible to find the optimal control policy using only the data interaction between the agent and an unknown environment, which can be considered a promising way to deal with the aforementioned challenge. To date, many reinforcement learning algorithms have been proposed [
Motivated by the aforementioned works, this paper investigates the optimal and adaptive control problem for the VSG in a model-free scenario, where a decentralized deep policy gradient (DDPG) algorithm is developed and employed to solve this problem. The DDPG algorithm is obtained by using the decentralized stochastic gradient descent approach [
1) The optimal and adaptive control problem for the VSG is formulated and transformed into a reinforcement learning task. Therein, the expected performances for achieving multiple control targets in the angular frequency and active power regulations are simultaneously considered in the designed optimization target.
2) A data-driven optimal control policy is designed and embedded into the VSG controller based on the DDPG algorithm. It enables the IBDG to adaptively respond to system disturbances and obtain the expected performance with the maximum long-term return in a model-free fashion.
The remainder of this paper is organized as follows. Section II introduces the VSG control, identifies its control variables as well as observation variables, and presents the unknown system dynamics. In Section III, multiple characteristic functions are defined to formulate the expected control targets. Subsequently, the optimal control problem is transformed into a reinforcement learning task, which is further solved by introducing the DDPG algorithm. Several case studies are provided to verify the effectiveness of the proposed approach in Section IV. Finally, Section V concludes this paper.
A simplified diagram of power system is shown in the upper-right corner of

Fig. 1 Overall structure, control, decision, and learning process.
The emulated swing equation of the VSG controller is adopted as:

$J \dfrac{d\omega}{dt} = \dfrac{P_{in} - P_e}{\omega_0} - D(\omega - \omega_g)$ (1)

where $P_{in}$ is the emulated mechanical power; $P_e$ is the output active power after low-pass filtering; $\omega_0$ is the nominal system angular frequency; $\omega$ is the virtual angular frequency of the corresponding IBDG; $\omega_g$ is the angular frequency measured by the phase-locked loop (PLL); and $J$ and $D$ are the virtual inertia and damping factor, respectively.
According to the system frequency deviation, the governor is implemented to adjust the input power command $P_{in}$, which adopts the $\omega$-P droop control as follows:

$P_{in} = P_{ref} + k_p (\omega_0 - \omega_g)$ (2)

where $P_{ref}$ and $k_p$ are the reference active power and droop coefficient, respectively. The choice of $k_p$ is determined by a standard approach [
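To make the discrete-time behavior of (1) and (2) concrete, a forward-Euler sketch is given below; the parameter values, step size, and per-unit conventions are illustrative assumptions rather than settings from this paper.

```python
# Forward-Euler simulation of the emulated swing equation (1) with the
# droop governor (2); all numerical values are illustrative only.
W0 = 2 * 3.141592653589793 * 50     # nominal angular frequency (rad/s)
J, D = 2.0, 20.0                    # virtual inertia and damping factor
P_REF, K_P = 0.7, 10.0              # reference power and droop coefficient
DT = 1e-3                           # integration step (s)

def vsg_step(w, w_g, p_e):
    """One Euler step of J*dw/dt = (P_in - P_e)/w0 - D*(w - w_g)."""
    p_in = P_REF + K_P * (W0 - w_g)              # governor, eq. (2)
    dw = ((p_in - p_e) / W0 - D * (w - w_g)) / J # swing equation, eq. (1)
    return w + DT * dw

w = W0
for _ in range(1000):
    w = vsg_step(w, W0, 0.5)   # grid at nominal frequency, P_e = 0.5 p.u.
```

With $P_{in} > P_e$, the virtual frequency settles slightly above the grid frequency, at the equilibrium where the damping term balances the power mismatch.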
Unlike the droop coefficient, the choice of the virtual inertia and damping factor is more flexible and without special restrictions. Thus, we can adaptively adjust these two controllable parameters over time to obtain the expected performance. Note that increasing or decreasing the control parameters may have different influences on the dynamic characteristics of the active power output and angular frequency in different system environments.
As a grid-forming converter control, the inertial control performance of a VSG depends on both the control parameter design and the power system frequency response. Hence, in order to optimally design the VSG frequency control, the frequency response model of a complex power system should be considered. On one hand, accurate modeling of the power system frequency response requires global information on governor data and generator inertia constants from multiple stations, which is difficult to obtain for local converter control design. On the other hand, the conventional power system frequency response model can no longer describe the frequency trajectory of a power system with a high penetration level of renewable energy, which suffers from a more deteriorated system frequency profile. Various energy sources, including wind turbine generators, PV generators, and battery energy storage systems, have modified the electromechanical behavior of the original power system. Therefore, considering these two aspects, a data-driven control strategy needs to be developed to optimally adjust the VSG control design in the absence of a power system model.
For each IBDG, there are two control parameters to be adjusted at time $t$, denoted by the action $a_t$:

$a_t = [J_t, D_t]$ (3)
To show the dynamic performance, each IBDG is equipped with a VSG controller to observe its real-time states of the output active power, angular frequency, and derivative of the angular frequency, i.e., $P_e$, $\omega$, and $\dot{\omega}$. The set of all observations at time $t$ is defined as $s_t$:

$s_t = [P_{e,t}, \omega_t, \dot{\omega}_t]$ (4)
Note that the adaptive parameter adjustment is based on the control policy to be designed and the observed system state $s_t$. In this paper, a deterministic control policy $\mu$ is defined as the following function, which maps $s_t$ to $a_t$:

$a_t = \mu(s_t)$ (5)
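For concreteness, such a deterministic policy can be realized as a small neural network whose raw outputs are scaled into preset parameter ranges; the layer sizes, parameter bounds, and random weights below are illustrative assumptions, not values from this paper.

```python
import numpy as np

# Sketch of a deterministic policy network mapping s_t = [P_e, w, dw/dt]
# to a_t = [J, D]; bounds and layer sizes are assumed for illustration.
J_MIN, J_MAX = 0.1, 10.0    # assumed virtual inertia range
D_MIN, D_MAX = 1.0, 100.0   # assumed damping factor range

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(16, 3)), np.zeros(16)
W2, b2 = rng.normal(scale=0.1, size=(2, 16)), np.zeros(2)

def policy(s):
    """a_t = mu(s_t): forward pass with a scaling layer on the output."""
    h = np.tanh(W1 @ s + b1)     # hidden layer
    u = np.tanh(W2 @ h + b2)     # raw actions in (-1, 1)
    j = J_MIN + 0.5 * (u[0] + 1.0) * (J_MAX - J_MIN)  # scale to [J_MIN, J_MAX]
    d = D_MIN + 0.5 * (u[1] + 1.0) * (D_MAX - D_MIN)  # scale to [D_MIN, D_MAX]
    return np.array([j, d])

a = policy(np.array([0.7, 1.0, 0.0]))   # observation [P_e, w, dw/dt]
```

The final scaling guarantees that the emitted virtual inertia and damping factor always stay within their assumed admissible ranges.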
The nonlinear state-space equation of the whole system in an implicit form can be written as:

$f(\dot{x}, x, a_t, z) = 0$ (6)

where $x$ is the vector of all the state variables, e.g., $\omega$ and $P_e$, the output current and voltage of each IBDG, the output frequency and active power of each SG, etc.; and $z$ denotes the uncertain disturbances or variables, such as a sudden change of the active power reference or load demand.
It is worth noting that the studied problem in this paper satisfies the Markov property [
For a reinforcement learning task, three key elements need to be defined, i.e., the observation state, action, and reward. In this paper, the observation state and action correspond to $s_t$ and $a_t$ shown in (4) and (5), respectively. As shown in
With regard to the frequency regulation, the occurrence of poorly damped oscillations is not desired. Define $\Delta\omega$ as the absolute value of the angular frequency deviation and $\Delta\omega_{max}$ as the preset upper bound of $\Delta\omega$. Two cases, $\Delta\omega \le \Delta\omega_{max}$ and $\Delta\omega > \Delta\omega_{max}$, need to be considered separately. For the case $\Delta\omega \le \Delta\omega_{max}$, although the frequency deviation is within the allowable limits, we expect the frequency deviation to be as small as possible and the corresponding settling time to be as short as possible. To achieve this goal, we can set a small penalty item for $\Delta\omega$ to assess the immediate frequency deviation. Moreover, the larger $\Delta\omega$ becomes, the bigger the penalty is. For the other case, $\Delta\omega > \Delta\omega_{max}$, the system undergoes a huge security risk. Thus, to reduce the occurrence of this situation, a very big penalty should be added once $\Delta\omega > \Delta\omega_{max}$. Based on the aforementioned discussion, the characteristic function for the deviation of the angular frequency is defined as:

$f_1 = \begin{cases} -c_1 \Delta\omega, & \Delta\omega \le \Delta\omega_{max} \\ -c_2, & \Delta\omega > \Delta\omega_{max} \end{cases}$ (7)

where $c_1$ and $c_2$ are the small and big penalty coefficients, respectively.
Note that one major functionality of the VSG control is to obtain slow electromechanical dynamics like the SG. In other words, a better transient process should contribute to a reduced rate of change of frequency (ROCOF). To this end, the characteristic function for the change rate of the angular frequency is defined as:

$f_2 = -c_3 |\dot{\omega}|$ (8)

where $c_3$ is a small penalty coefficient.
For the characteristic of the active power output, well-damped oscillation is also expected. Similar to the functionality of the first part of (7), the characteristic function for the deviation of the active power output is defined as:

$f_3 = -c_4 \Delta P_e$ (9)

where $c_4$ is the corresponding penalty coefficient; and $\Delta P_e$ is the absolute value of the deviation of the active power output. Since $P_e$ may change greatly due to intermittent renewable energy resources, e.g., wind and solar, it is not necessary to limit the upper bound of $\Delta P_e$ during the transient process. Moreover, the capacity of the inverter is selected so that headroom is available for necessary inertial support.
According to the expected performance and the characteristic functions defined above, the reward at time $t$ is denoted by $r_t$:

$r_t = \lambda_1 f_1 + \lambda_2 f_2 + \lambda_3 f_3$ (10)

where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weight coefficients. By choosing different weight coefficients, different output characteristics can be obtained.
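The characteristic functions (7)-(9) and the reward (10) can be sketched as follows; every coefficient and the bound are illustrative placeholders, not the values used in the paper's simulations.

```python
# Sketch of the characteristic functions (7)-(9) and reward (10); all
# coefficients below are illustrative placeholders only.
C_SMALL, C_BIG = 1.0, 100.0       # c1, c2 in (7)
C_ROCOF = 0.5                     # c3 in (8)
C_POWER = 0.2                     # c4 in (9)
DW_MAX = 0.5                      # preset bound on the frequency deviation
LAM1, LAM2, LAM3 = 1.0, 1.0, 1.0  # weight coefficients in (10)

def reward(d_omega, rocof, d_power):
    """Immediate reward r_t from |dw|, |dw/dt|, and |dP_e|."""
    # (7): small proportional penalty inside the limit, big penalty beyond it
    f1 = -C_SMALL * abs(d_omega) if abs(d_omega) <= DW_MAX else -C_BIG
    f2 = -C_ROCOF * abs(rocof)    # (8): penalize the ROCOF
    f3 = -C_POWER * abs(d_power)  # (9): penalize the power deviation
    return LAM1 * f1 + LAM2 * f2 + LAM3 * f3  # (10): weighted sum
```

The discontinuity at the bound is what makes exceeding the allowed frequency deviation far more costly than any deviation inside it.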
Note that the dynamic performance of the active power and angular frequency regulation is measured by a relatively long-time reward. For example, consider a case where a sudden change in load occurs, resulting in a large frequency oscillation. The period from the beginning to the end of the frequency oscillation corresponds to a time interval. Whether the dynamic performance improves depends on the cumulative penalties over the whole response rather than at one moment only. To this end, the return from state $s_t$ is further defined as the cumulative future reward $R_t$, whose mathematical expression is given by:

$R_t = \sum_{k=t}^{T} \gamma^{k-t} r_k$ (11)

where $T$ is the total time; and $\gamma \in [0, 1]$ is the discount factor.
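As a minimal sketch, the return in (11) can be computed by a backward recursion over an episode's reward sequence; the default discount factor here is an assumption.

```python
# Sketch of the return (11): discounted sum of future rewards over an
# episode, computed backward so each reward is discounted exactly once.
def discounted_return(rewards, gamma=0.95):
    """R_t = sum_{k=t}^{T} gamma^(k-t) * r_k for t at the episode start."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, three unit rewards with a discount factor of 0.5 accumulate to 1 + 0.5 + 0.25 = 1.75.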
Then, after making an observation $s_t$ and executing an action $a_t$, the action-value function under the control policy $\mu$ is defined as the expected return $Q^{\mu}(s_t, a_t)$:

$Q^{\mu}(s_t, a_t) = \mathbb{E}[R_t \mid s_t, a_t]$ (12)

where $\mathbb{E}[\cdot]$ denotes the expected value. Our objective becomes finding the optimal control policy $\mu^{*}$ that maximizes the expected return from the start of the disturbance.
As stated in Section II, both the system observation state and action are continuous. To account for this attribute, the concept of the DPG algorithm based on the actor-critic architecture is adopted and further extended in this paper. More importantly, we focus on adopting the decentralized stochastic gradient descent approach to replace the stochastic gradient descent approach in the learning process of the traditional DPG algorithm, which is further referred to as the DDPG algorithm. By using the DDPG algorithm, the global computation process can be divided among individual computation units, resulting in a faster convergence process. It is assumed that there are $n$ computation units. The information sharing among the computation units is described by a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, A)$, where $\mathcal{V}$ is the set of nodes representing the computation units; $\mathcal{E}$ represents the available communication links; $A = [w_{jk}]$ is the associated adjacency matrix; and $k \in N_j$ denotes a neighbor node of $j$. It is assumed that graph $\mathcal{G}$ is undirected and connected. To achieve experience replay, the experiences at each time step will be stored in a data set $D$, which is accessible to every computation unit.
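As one concrete construction of the adjacency weights for an undirected, connected graph, the sketch below uses the Metropolis rule; both the four-node topology and the Metropolis choice are assumptions for illustration, not prescribed by the paper.

```python
import numpy as np

# Metropolis weights for a small undirected, connected communication
# graph given as an adjacency list; topology is illustrative only.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # a 4-node line graph
n = len(neighbors)

W = np.zeros((n, n))
for j, nbrs in neighbors.items():
    for k in nbrs:
        # weight between j and k based on the larger of the two degrees
        W[j, k] = 1.0 / (1 + max(len(neighbors[j]), len(neighbors[k])))
    W[j, j] = 1.0 - W[j].sum()    # self-weight makes each row sum to one
```

The resulting matrix is symmetric and doubly stochastic, which is the standard property needed for the neighbor-averaging updates to reach consensus on an undirected connected graph.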
The overall block diagram exhibiting the realization of the policy updating based on the DDPG algorithm is presented in

$a_t = \mu(s_t \mid \theta^{\mu}) + \epsilon^{t} \mathcal{N}_t$ (13)

where $\epsilon$ is the decay rate of the exploration noise $\mathcal{N}_t$ added to the action.
Define $\theta^{\mu}_j$ and $\theta^{Q}_j$ as the estimated actor and critic network parameters of the $j$-th computation unit, respectively. The critic parameters are trained by minimizing the following loss:

$L(\theta^{Q}) = \mathbb{E}\big[(y_t - Q(s_t, a_t \mid \theta^{Q}))^2\big]$ (14)

$y_t = r_t + \gamma Q(s_{t+1}, \mu(s_{t+1} \mid \theta^{\mu}) \mid \theta^{Q})$ (15)
In this paper, multiple computation units cooperate to train $\theta^{Q}$. At each step, to minimize (14), every computation unit samples a random mini-batch of $N$ experiences from the memory pool $D$ to compute the local stochastic gradient, denoted by $g_j$. $\theta^{\mu}$ is updated by applying the chain rule to maximize the expected return. Specifically, the mathematical expression of the action gradient using $N$ samples for approximating $\nabla_{\theta^{\mu}} J$ is given by:

$\nabla_{\theta^{\mu}} J \approx \dfrac{1}{N} \sum_{i=1}^{N} \nabla_{a} Q(s, a \mid \theta^{Q}) \big|_{s = s_i, a = \mu(s_i)} \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \big|_{s = s_i}$ (16)

where $J$ is the approximate value function. The parameters $\theta^{Q}_j$ and $\theta^{\mu}_j$ are further updated via local computation based on each unit's own information and that of its neighbors:

$\theta^{Q}_j \leftarrow \sum_{k \in N_j} w_{jk} \theta^{Q}_k - \alpha_Q g_j$ (17)

$\theta^{\mu}_j \leftarrow \sum_{k \in N_j} w_{jk} \theta^{\mu}_k + \alpha_{\mu} \nabla_{\theta^{\mu}_j} J$ (18)

where $\alpha_Q$ and $\alpha_{\mu}$ are the learning rates. Finally, we can obtain $\theta^{Q}$ and $\theta^{\mu}$ by using the averaged values of $\theta^{Q}_j$ and $\theta^{\mu}_j$ for all $j$.
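The neighbor-averaging update form of (17) and (18) can be illustrated on a toy problem where each computation unit minimizes a local quadratic loss standing in for the critic and actor objectives; the topology, targets, and step size below are all assumptions for illustration.

```python
import numpy as np

# Each unit j holds a local loss L_j(theta) = (theta - c_j)^2, so the
# global minimizer of the summed loss is the mean of the targets c_j.
# The update mirrors (17)-(18): average parameters over neighbors via a
# doubly stochastic weight matrix W, then step along the local gradient.
targets = np.array([1.0, 2.0, 3.0, 4.0])   # illustrative local optima c_j
theta = np.zeros(4)                        # one scalar parameter per unit
alpha = 0.1                                # learning rate

# Symmetric, doubly stochastic weights for a 4-node line graph.
W = np.array([[2/3, 1/3, 0.0, 0.0],
              [1/3, 1/3, 1/3, 0.0],
              [0.0, 1/3, 1/3, 1/3],
              [0.0, 0.0, 1/3, 2/3]])

for _ in range(500):
    grad = 2.0 * (theta - targets)     # local stochastic gradient g_j
    theta = W @ theta - alpha * grad   # consensus step + local descent

theta_final = theta.mean()             # averaged value across all units
```

With a constant step size, the individual units keep a small disagreement proportional to the step size, but their average converges to the global minimizer, matching the final averaging step described above.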
Based on the current action, the VSG controller changes its control parameters. Then, a new transition is generated and used to update the parameters $\theta^{Q}$ and $\theta^{\mu}$. Correspondingly, the control policy is updated. After that, the one-step learning process is finished. The detailed learning process based on the DDPG algorithm to find the optimal control strategy is presented in
(19)
(20)
(21)
(22)
Remark: Compared with the DPG algorithm, the decentralized stochastic gradient descent approach is embedded into the DDPG algorithm. With this effort, the DDPG algorithm can simultaneously employ multiple computation units to train the neural network parameters, as shown in (19)-(22), resulting in a faster convergence speed than the traditional DPG algorithm. In this paper, the reinforcement learning task is designed for the VSG controller of an individual IBDG. This also means that all the parallel computation units cooperate to train one VSG controller, as shown in
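The interaction and experience-replay portion of the learning process can be sketched as follows; the environment, policy, and all numbers are illustrative stand-ins, and the actor-critic parameter updates of (14)-(18) are omitted.

```python
import random
from collections import deque

# Sketch of the interaction and experience-replay loop; `ToyEnv` and the
# fixed policy `mu` are placeholders, not the paper's environment or code.
def mu(s):
    return [2.0, 20.0]                    # placeholder action [J, D]

class ToyEnv:
    def reset(self):
        return [0.7, 1.0, 0.0]            # observation [P_e, w, dw/dt]
    def step(self, a):
        s_next = [0.7, 1.0, 0.0]          # placeholder dynamics
        return s_next, -abs(s_next[1] - 1.0)  # toy reward in the spirit of (10)

random.seed(0)
env, D = ToyEnv(), deque(maxlen=10000)    # D: replay memory shared by all units
for episode in range(3):
    s = env.reset()
    for t in range(20):
        noise = random.gauss(0.0, 0.1)    # exploration noise on the action
        a = [x + noise for x in mu(s)]
        s_next, r = env.step(a)
        D.append((s, a, r, s_next))       # store transition for replay
        s = s_next

batch = random.sample(list(D), 8)         # mini-batch each unit would train on
```

Each stored transition is a tuple $(s_t, a_t, r_t, s_{t+1})$, and the uniformly sampled mini-batch is what every computation unit would use to form its local gradient.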
In this section, we focus on verifying the effectiveness and feasibility of the DDPG algorithm with simulations in a modified IEEE 14-bus test system [

Fig. 2 Modified IEEE 14-bus test system.
In this case study, the adopted structures of the actor and critic neural networks are shown in
(23)

Fig. 3 Structures of critic and actor networks.
Moreover, the fully connected layer multiplies the input by a weight matrix and then adds a bias vector. The scaling layer is used for scaling the input variables. The rest of the simulation parameters are listed in

Fig. 4 Cumulative reward for each episode. (a) DDPG algorithm. (b) DPG algorithm.
Next, the traditional DPG algorithm is employed to solve the same problem; it can be seen as a special case of the DDPG algorithm with one computation unit, i.e., $n = 1$. Meanwhile, the decentralized stochastic gradient descent approach is replaced by the stochastic gradient descent approach during back propagation. With the same neural network structures and parameters, the episode reward obtained by using the DPG algorithm is shown
In this case study, we aim at verifying the effectiveness of the well-trained VSG controller under load disturbance. At s, a 0.7 MW load disturbance is added in the test system. The simulation results are shown in Figs.

Fig. 5 Frequency response after load disturbance.

Fig. 6 Active power output of IBDG after load disturbance.
In this case study, the focus is on testing the effectiveness of the well-trained VSG controller after the change of the active power reference. At s, there is a step change of the active power reference from 0.7 p.u. to 0.5 p.u. The simulation results for the frequency response and active power output of the IBDG are shown in Figs.

Fig. 7 Frequency response after change of power reference.

Fig. 8 Active power output of IBDG after change of power reference.
As observed, both the frequency and active power output gradually converge to a new stable equilibrium with well-damped oscillations, and the system ROCOF is mitigated. Thus, the expected performance targets are fulfilled. This implies that the well-trained VSG controller exhibits better adaptability and works well after the change of power reference.
In this case study, the performance of the well-trained VSG controller obtained from the first case study is further tested in a new IEEE 14-bus test system, which is different from that used in offline training. Specifically, the SG at bus 2 is replaced with an IBDG and the IBDG at bus 14 is disconnected. Referring to the structure of IEEE 14-bus test system, three synchronous condensers are commissioned at bus 3, bus 8, and bus 6, respectively. By replacing the system SG with IBDG and integrating synchronous condensers, the equivalent inertial constant and frequency response model of the system are inevitably changed. At time s, a 0.4 MW load disturbance at bus 4 is added in the test system. The comparative system frequency responses and active power outputs with different converter controls after load disturbance are shown in Figs.

Fig. 9 Frequency responses with different converter controls after load disturbance.

Fig. 10 Active power responses with different converter controls after load disturbance.
Typically, the grid-following converter control approach does not participate in power system frequency regulation, where it simply follows the system frequency through PLL. Both droop converter control and VSG are able to participate in the power system frequency regulation and enhance the system small-signal stability due to their grid-forming nature. Furthermore, the proposed data-driven VSG control is able to better arrest the ROCOF of power system and provide necessary inertial control. Meanwhile, the oscillation of active power output is also well damped. Note that it is impossible for the data-driven VSG controller to be trained in all transient scenarios.
Next, we further test the performance of the proposed VSG controller after fault transient. The system dispatching scenario is the same as that presented in Figs.

Fig. 11 Frequency responses with different converter controls after fault transient.
The simulation results show that the VSG controller also works well in the new test system. However, better performance cannot always be ensured in every new system, since the controller is not trained in the new environment. In practical applications, the VSG controller requires re-training if used in a different system.
This paper investigates the adaptive and optimal control problem for the VSG. To achieve the expected control performance targets for frequency regulation and active power regulation, multiple characteristic functions are defined and further used to form the immediate reward. With this effort, the optimal control problem is finally formulated as a reinforcement learning task. To handle this task, the DDPG algorithm is employed to learn the optimal control policy with the objective of maximizing the long-term return. The implementation of the DDPG algorithm does not need any expert knowledge and does not rely on the system model. Thus, we can obtain the optimal control policy in a model-free fashion, which is the major advantage compared with the existing optimal control approaches used in VSGs. In the future, voltage stability and further applications of the DDPG algorithm will be considered.
REFERENCES
H. Zhang, Y. Li, D. W. Gao et al., “Distributed optimal energy management for energy internet,” IEEE Transactions on Industrial Informatics, vol. 13, no. 6, pp. 3081-3097, Dec. 2017.
J. Zhou, Y. Xu, and H. Sun, “Distributed power management for networked AC/DC microgrids with unbalanced microgrids,” IEEE Transactions on Industrial Informatics, vol. 16, no. 3, pp. 1655-1667, Mar. 2020.
Y. Li, H. Zhang, X. Liang et al., “Event-triggered based distributed cooperative energy management for multienergy systems,” IEEE Transactions on Industrial Informatics, vol. 15, no. 4, pp. 2008-2022, Apr. 2019.
Y. Li, D. W. Gao, W. Gao et al., “Double-mode energy management for multi-energy system via distributed dynamic event-triggered Newton-Raphson algorithm,” IEEE Transactions on Smart Grid, vol. 11, no. 6, pp. 5339-5356, Nov. 2020.
R. Wang, Q. Sun, D. Ma et al., “The small-signal stability analysis of the droop-controlled converter in electromagnetic timescale,” IEEE Transactions on Sustainable Energy, vol. 10, no. 3, pp. 1459-1469, Jul. 2019.
Z. Yi, Y. Xu, W. Gu et al., “A multi-time-scale economic scheduling strategy for virtual power plant based on deferrable loads aggregation and disaggregation,” IEEE Transactions on Sustainable Energy, vol. 11, no. 3, pp. 1332-1346, Jul. 2020.
J. Zhou, Y. Xu, H. Sun et al., “Distributed event-triggered H∞ consensus based current sharing control of DC microgrids considering uncertainties,” IEEE Transactions on Industrial Informatics, vol. 16, no. 12, pp. 7413-7425, Dec. 2020.
Y. Li, D. W. Gao, W. Gao et al., “A distributed double-Newton descent algorithm for cooperative energy management of multiple energy bodies in energy internet,” IEEE Transactions on Industrial Informatics, doi: 10.1109/TII.2020.3029974.
Q. Zhong and G. Weiss, “Synchronverters: inverters that mimic synchronous generators,” IEEE Transactions on Industrial Electronics, vol. 58, no. 4, pp. 1259-1267, Apr. 2011.
Q. Zhong, “Virtual synchronous machines: a unified interface for grid integration,” IEEE Power Electronics Magazine, vol. 3, no. 4, pp. 18-27, Dec. 2016.
J. Chen and T. O’Donnell, “Parameter constraints for virtual synchronous generator considering stability,” IEEE Transactions on Power Systems, vol. 34, no. 3, pp. 2479-2481, May 2019.
Z. Yi, Y. Xu, J. Zhou et al., “Bi-level programming for optimal operation of an active distribution network with multiple virtual power plants,” IEEE Transactions on Sustainable Energy, vol. 11, no. 4, pp. 2855-2869, Oct. 2020.
J. Lee, G. Jang, E. Muljadi et al., “Stable short-term frequency support using adaptive gains for a DFIG-based wind power plant,” IEEE Transactions on Energy Conversion, vol. 31, no. 3, pp. 6289-6297, Sep. 2016.
D. Li, Q. Zhu, S. Lin et al., “A self-adaptive inertia and damping combination control of VSG to support frequency stability,” IEEE Transactions on Energy Conversion, vol. 32, no. 1, pp. 397-398, Mar. 2017.
F. Wang, L. Zhang, X. Feng et al., “An adaptive control strategy for virtual synchronous generator,” IEEE Transactions on Industry Applications, vol. 54, no. 5, pp. 5124-5133, Sep. 2018.
J. Li, B. Wen, and H. Wang, “Adaptive virtual inertia control strategy of VSG for micro-grid based on improved bang-bang control strategy,” IEEE Access, vol. 7, pp. 39509-39514, Mar. 2019.
H. Wu, X. Ruan, D. Yang et al., “Small-signal modeling and parameters design for virtual synchronous generators,” IEEE Transactions on Industrial Electronics, vol. 63, no. 7, pp. 4292-4303, Jul. 2016.
M. Li, W. Huang, N. Tai et al., “A dual-adaptivity inertia control strategy for virtual synchronous generator,” IEEE Transactions on Power Systems, vol. 35, no. 1, pp. 594-604, Jan. 2020.
H. Wu and X. Wang, “A mode-adaptive power-angle control method for transient stability enhancement of virtual synchronous generators,” IEEE Journal of Emerging and Selected Topics in Power Electronics, vol. 8, no. 2, pp. 1034-1049, Jun. 2020.
J. Alipoor, Y. Miura, and T. Ise, “Stability assessment and optimization methods for microgrid with multiple VSG units,” IEEE Transactions on Smart Grid, vol. 9, no. 2, pp. 1463-1471, Mar. 2018.
W. Du, Q. Fu, and H. Wang, “Power system small-signal angular stability affected by virtual synchronous generators,” IEEE Transactions on Power Systems, vol. 34, no. 4, pp. 3209-3219, Jul. 2019.
U. Markovic, Z. Chu, P. Aristidou et al., “Fast frequency control scheme through adaptive virtual inertia emulation,” in Proceedings of 2018 IEEE Innovative Smart Grid Technologies–Asia, Singapore, Mar. 2018, pp. 787-792.
U. Markovic, Z. Chu, P. Aristidou et al., “LQR-based adaptive virtual synchronous machine for power systems with high inverter penetration,” IEEE Transactions on Sustainable Energy, vol. 10, no. 3, pp. 1501-1511, Jul. 2019.
C. J. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279-292, May 1992.
D. Silver, G. Lever, N. Heess et al., “Deterministic policy gradient algorithms,” in Proceedings of the 31st International Conference on Machine Learning, Beijing, China, Jun. 2014, pp. 387-395.
V. Mnih, K. Kavukcuoglu, D. Silver et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529-533, Feb. 2015.
T. P. Lillicrap, J. J. Hunt, A. Pritzel et al. (2019, Jul.). Continuous control with deep reinforcement learning. [Online]. Available: https://arxiv.org/abs/1509.02971v2
J. Schulman, P. Moritz, S. Levine et al. (2018, Oct.). High-dimensional continuous control using generalized advantage estimation. [Online]. Available: https://arxiv.org/abs/1506.02438
Y. Li. (2018, Nov.). Deep reinforcement learning: an overview. [Online]. Available: https://arxiv.org/abs/1701.07274
X. Lian, W. Zhang, C. Zhang et al. (2018, Sep.). Asynchronous decentralized parallel stochastic gradient descent. [Online]. Available: https://arxiv.org/abs/1710.06952
W. Du, Z. Chen, K. P. Schneider et al., “A comparative study of two widely used grid-forming droop controls on microgrid small-signal stability,” IEEE Journal of Emerging and Selected Topics in Power Electronics, vol. 8, no. 2, pp. 963-975, Jun. 2020.
M. I. Jordan and T. M. Mitchell, “Machine learning: trends, perspectives, and prospects,” Science, vol. 349, no. 6245, pp. 255-260, Jul. 2015.
R. Wang, Q. Sun, P. Zhang et al., “Reduced-order transfer function model of the droop-controlled inverter via Jordan continued-fraction expansion,” IEEE Transactions on Energy Conversion, vol. 35, no. 3, pp. 1585-1595, Sep. 2020.
W. Yan, L. Cheng, S. Yan et al., “Enabling and evaluation of inertial control for PMSG-WTG using synchronverter with multiple virtual rotating masses in microgrid,” IEEE Transactions on Sustainable Energy, vol. 11, no. 2, pp. 1078-1088, Apr. 2020.
P. Wawrzynski, “Control policy with autocorrelated noise in reinforcement learning for robotics,” International Journal of Machine Learning and Computing, vol. 5, no. 2, pp. 91-95, Apr. 2015.