Abstract
This study investigates a safe reinforcement learning algorithm for grid-forming (GFM) inverter based frequency regulation. To guarantee the stability of the inverter-based resource (IBR) system under the learned control policy, a model-based reinforcement learning (MBRL) algorithm is combined with a Lyapunov-based approach, which determines the safe region of states and actions. To obtain a near-optimal control policy, the control performance is safely improved by approximate dynamic programming (ADP) using data sampled from the region of attraction (ROA). Moreover, to enhance the control robustness against parameter uncertainty in the inverter, a Gaussian process (GP) model is adopted by the proposed algorithm to effectively learn the system dynamics from measurements. Numerical simulations validate the effectiveness of the proposed algorithm.
Power system frequency control is critical for maintaining grid stability when an imbalance between generation and load occurs. As the penetration of inverter-based resources (IBRs) such as renewable energy and battery storage continues to increase, modern power systems are facing significant challenges due to reduced mechanical inertia and increased disturbances. Therefore, power system stability control has recently spurred much interest from both academia and industry [
Various control methods have been proposed for IBRs to provide frequency regulation services [
To deal with these challenges, various advanced frequency controllers have been developed recently [
The primary contribution of this work is the development of a safe model-based reinforcement learning (MBRL) algorithm for grid-forming (GFM) inverter based frequency regulation. Inspired by [
This paper is organized as follows. Section II formulates the GFM inverter based frequency regulation problem. Section III designs the safe MBRL controller for GFM inverter based frequency regulation. Numerical simulations are presented in Section IV. Section V concludes the paper.
The diagram of GFM inverter based primary frequency control is depicted in Fig. 1.

Fig. 1 Diagram of GFM inverter based primary frequency control.
We assume the bus voltage magnitudes to be 1 p.u. and neglect the reactive power flows. The frequency dynamics of the virtual synchronous generator (VSG) based power control loop of the GFM inverter can be given by the swing equations [
$$M\dot{\omega} = P_{\mathrm{set}} - P_e - D(\omega - \omega_0) + u(\theta, \omega), \quad \dot{\theta} = \omega \tag{1}$$
where $M$ and $D$ are the virtual inertia and damping coefficients, respectively; $\theta$ and $\omega$ are the voltage phase and angular frequency of the inverter, respectively; $\omega_0$ is the nominal angular frequency; $P_{\mathrm{set}}$ is the active power set point; $P_e$ is the electrical active power given by (2); and $u(\theta, \omega)$ is the control action function of the battery energy storage systems (BESSs), which denotes the active charging power (i.e., the BESS charging power in Fig. 1).
$$P_e = \sum_{j=1}^{n} V_i V_j \big(G_{ij}\cos(\theta_i - \theta_j) + B_{ij}\sin(\theta_i - \theta_j)\big) \tag{2}$$
where $B_{ij}$ and $G_{ij}$ are the susceptance and conductance components of the $(i, j)$ element of the admittance matrix $Y$, respectively; $V_i$ and $\theta_i$ are the voltage magnitude and phase of node $i$, respectively; and $\theta_g$ is the voltage phase of the main grid. Note that a lossy power flow model is adopted in (2).
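To make the power flow model concrete, the following Python sketch evaluates the lossy injection in (2) for a two-bus configuration like that of Fig. 1; the function name, the line impedance, and the operating point are illustrative assumptions, not values from the paper.

```python
import numpy as np

def active_power_injection(theta, V, G, B, i):
    """Active power injection at node i under the lossy power flow model (2).

    Sketch only: theta (rad) and V (p.u.) are node phase/magnitude vectors;
    G and B are the conductance/susceptance parts of the admittance matrix.
    """
    dtheta = theta[i] - theta  # phase differences theta_i - theta_j
    return V[i] * np.sum(V * (G[i, :] * np.cos(dtheta) + B[i, :] * np.sin(dtheta)))

# Two-bus example: GFM inverter (node 0) tied to the main grid (node 1),
# with V = 1 p.u. as assumed in the paper and a hypothetical line impedance.
y = 1.0 / (0.1 + 1.0j)              # assumed line R + jX in p.u.
Y = np.array([[y, -y], [-y, y]])    # 2x2 admittance matrix
G, B = Y.real, Y.imag
theta = np.array([0.2, 0.0])        # inverter phase vs. grid phase (rad)
V = np.ones(2)
P_e = active_power_injection(theta, V, G, B, i=0)
```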
We aim to design a control policy that improves the dynamic performance of the VSG after disturbances at minimal cost. The optimal control problem can be formulated as:
$$\min_{u}\ \int_{0}^{\infty} \big(x^{\top} Q x + u^{\top} R u\big)\, \mathrm{d}t \quad \mathrm{s.t.}\ (1), (2),\ \underline{u} \le u \le \overline{u} \tag{3}$$
where $x$ is the state of the VSG, i.e., the vector of the phase and angular frequency deviations; $Q$ and $R$ are positive definite weighting matrices; and $\underline{u}$ and $\overline{u}$ are the lower and upper limits of the control action $u$, respectively, which are determined by the maximum charging and discharging capacities of the BESSs.
The primary objective of the controller is to safely learn the frequency dynamics of the VSG from measurements and adapt the control policy for optimal performance, without encountering unstable system states. This implies that the adjustment of the control policy throughout the learning process must be performed in such a way that the system state remains within the ROA. The parameter uncertainty and the nonlinearity of the AC power flow, as described in (2), make the design of controllers for (1) challenging. The proposed controller for GFM inverter based frequency regulation is depicted in Fig. 2.

Fig. 2 Proposed algorithm for GFM inverter based frequency regulation.
By discretizing the dynamic model shown in (1) and (2), the dynamics can be reformulated as the following nonlinear discrete-time system:
$$x_{k+1} = x_k + \Delta t\, \dot{x}\big|_{(x_k, u_k)} \triangleq h(x_k, u_k) \tag{4a}$$
where $\Delta t$ is the step size for the discrete simulation; and the subscript $k$ denotes the discrete time index.
$$h(x_k, u_k) = f(x_k, u_k) + g(x_k, u_k) \tag{4b}$$
where $h$ denotes the true dynamics of the VSG, comprising two components: a known model represented by $f$, and a priori unknown model errors denoted by $g$. In inverters, the parameters (e.g., $M$ and $D$ in (1)) can undergo dynamic changes, which introduces uncertainties. To ensure the stability and predictability of the system, we assume the dynamics of the VSG are $L_f$-Lipschitz continuous, which means that the dynamics do not change too rapidly between any two points in their domain. This assumption holds true for the VSG system as described in (1), with a supporting proof provided in Appendix A.
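The decomposition in (4b) can be illustrated with a minimal Python sketch: a nominal Euler-discretized swing model $f$ built from assumed per-unit parameters, a perturbed "true" plant $h$, and the residual $g = h - f$ that must be learned. All numerical values are hypothetical.

```python
import numpy as np

# Hypothetical per-unit parameters: nominal (used in the prior model f) vs. true.
M_NOM, D_NOM = 5.0, 1.0    # virtual inertia and damping of the prior model
M_TRUE, D_TRUE = 4.0, 1.2  # perturbed "true" plant parameters
DT = 0.01                  # discretization step size
P_SET = 0.0                # active power set point (deviation coordinates)

def swing_step(x, u, M, D, p_e):
    """One Euler step of the swing dynamics (1); x = [phase dev., freq. dev.]."""
    theta, omega = x
    domega = (P_SET - p_e(theta) - D * omega + u) / M
    return np.array([theta + DT * omega, omega + DT * domega])

p_e = lambda th: 0.99 * np.sin(th)  # illustrative electrical power from (2)

def f(x, u):  # known prior model in (4b)
    return swing_step(x, u, M_NOM, D_NOM, p_e)

def h(x, u):  # true dynamics in (4b)
    return swing_step(x, u, M_TRUE, D_TRUE, p_e)

def g(x, u):  # a priori unknown model error, g = h - f
    return h(x, u) - f(x, u)
```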
To enable safe learning, we adopt a GP model to learn a reliable statistical system model of (1) and (2). The GP is a powerful method in machine learning and statistical modeling: a GP is a collection of random variables, any finite subset of which follows a joint Gaussian distribution. In system modeling, GPs are often used to capture complex relationships in data [
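As a minimal illustration of this step, the following sketch fits a GP to the model residual $g$ from the previous snippet using scikit-learn; the kernel choice and sampling ranges are assumptions, and the posterior standard deviation is what later supplies the confidence bounds.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Training inputs are state-action triples z = (theta, omega, u); targets are
# the model errors g(x, u) observed on the frequency state (reusing g above).
rng = np.random.default_rng(0)
Z = rng.uniform([-0.5, -0.5, -0.3], [0.5, 0.5, 0.3], size=(50, 3))
y = np.array([g(z[:2], z[2])[1] for z in Z])  # residual on the omega component

kernel = RBF(length_scale=[0.5, 0.5, 0.5]) + WhiteKernel(noise_level=1e-6)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(Z, y)

# Posterior mean and std give the confidence intervals used for safe learning.
mu, sigma = gp.predict(Z[:5], return_std=True)
```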
After learning about the inverter dynamics from measurements, the goal is to safely adapt the optimal control policy without leading to unstable system conditions. The safety of the controller is characterized by the safe region of states and actions, commonly referred to as the ROA [
Theorem 1 If $v(h(x, \pi(x))) < v(x)$ holds for all $x$ within the level set $\mathcal{V}(c) = \{x \in \mathcal{X} \mid 0 < v(x) \le c\}$ ($\mathcal{X}$ is the state space, $c > 0$), then $\mathcal{V}(c)$ is an ROA, so that $x_0 \in \mathcal{V}(c)$ implies $x_k \in \mathcal{V}(c)$ for all $k > 0$ and $\lim_{k \to \infty} x_k = 0$.
The theorem indicates that when a fixed policy is employed, applying the dynamics to the state consistently decreases the value of the Lyapunov function. Consequently, the system state is guaranteed to converge to the equilibrium point. Further details can be found in [
The dynamics of the VSG are uncertain, leading to uncertainty in $v(h(x, \pi(x)))$. This introduces an additional challenge in determining the ROA using the above theorem. According to the GP model, $v(h(x, u))$ is contained in the confidence interval $[\mu_n(x, u) - \beta\sigma_n(x, u),\ \mu_n(x, u) + \beta\sigma_n(x, u)]$ with probability higher than $1 - \delta$. $L_v$ is the Lipschitz constant of the Lyapunov function $v$. To ensure that state-action pairs deemed safe are indeed safe, we define the upper bound of $v(h(x, u))$ as $u_n(x, u)$, where $u_n(x, u) = \mu_n(x, u) + \beta\sigma_n(x, u)$. Therefore, in accordance with the aforementioned theorem and considering this confidence bound, the system stability in (1) is assured if $u_n(x, \pi(x)) < v(x)$ is satisfied for all $x \in \mathcal{V}(c)$. Nevertheless, this becomes impractical when attempting to identify all states on the continuous domain that satisfy the condition. To address this challenge, we can discretize the state space into cells denoted as $\mathcal{X}_\tau$, such that $\|x - [x]_\tau\|_1 \le \tau$ holds for all $x \in \mathcal{X}$. In this context, $[x]_\tau$ represents the cell with the minimal distance to $x$. Considering that the system dynamics are $L_f$-Lipschitz continuous and the control policy is $L_\pi$-Lipschitz continuous, we can get the following theorem [
Theorem 2 If $u_n(x, \pi(x)) - v(x) < -L_{\Delta v}\tau$ holds for all $x \in \mathcal{V}(c) \cap \mathcal{X}_\tau$ and for some $c > 0$, then $v(h(x, \pi(x))) < v(x)$ holds for all $x \in \mathcal{V}(c)$ with probability at least $1 - \delta$, where $L_{\Delta v} = L_v L_f (1 + L_\pi) + L_v$. And $\mathcal{V}(c)$ is an ROA for the dynamics $h$ under policy $\pi$.
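A direct computational reading of Theorem 2 is sketched below: the decrease condition is checked at every cell of the discretized state space, with the GP-based upper bound, the Lyapunov function, and the constants $L_{\Delta v}$ and $\tau$ passed in as assumed callables and parameters.

```python
import numpy as np

def decrease_condition(x_grid, pi, v, gp_upper, L_dv, tau):
    """Check the high-probability Lyapunov decrease condition of Theorem 2.

    gp_upper(x, u) returns an upper confidence bound on v(h(x, u)); L_dv and
    tau are the Lipschitz constant and discretization size (both assumptions).
    Returns a boolean mask over the discretized states.
    """
    ok = np.empty(len(x_grid), dtype=bool)
    for i, x in enumerate(x_grid):
        u = pi(x)
        ok[i] = gp_upper(x, u) - v(x) < -L_dv * tau
    return ok
```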
In this way, under a fixed policy $\pi$, the safe set of state-action pairs within the discretized state space can be identified as follows:
$$\mathcal{S}_n = \big\{(x, u) \in \mathcal{X}_\tau \times \mathcal{U} \mid u_n(x, u) - v(x) < -L_{\Delta v}\tau\big\} \tag{5}$$
The ROA estimate under $\pi$ is then the largest level set $\mathcal{V}(c)$ such that $(x, \pi(x)) \in \mathcal{S}_n$ for all $x \in \mathcal{V}(c) \cap \mathcal{X}_\tau$. It should be noted that the ROA is dependent on the policy. To get the largest possible ROA, we can optimize the policy using (6). The corresponding optimal policy for the largest ROA is $\pi^*$.
$$\pi^* = \arg\max_{\pi \in \Pi_{L_\pi},\, c > 0} c \quad \mathrm{s.t.}\ u_n(x, \pi(x)) - v(x) < -L_{\Delta v}\tau,\ \forall x \in \mathcal{V}(c) \cap \mathcal{X}_\tau \tag{6}$$
where $\Pi_{L_\pi}$ is the set of safe ($L_\pi$-Lipschitz continuous) policies.
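Given the mask produced by the check above, extracting the ROA estimate in (5) and (6) for a fixed policy reduces to finding the largest level $c$ whose sublevel set contains only verified cells; a simple sweep over sorted Lyapunov values suffices, as in this sketch.

```python
import numpy as np

def largest_safe_level(x_grid, v, ok_mask):
    """Largest c such that every discretized state in {x : v(x) <= c}
    satisfies the decrease condition (mask from decrease_condition above).

    A sketch of how the ROA estimate V(c) can be extracted in practice.
    """
    vals = np.array([v(x) for x in x_grid])
    order = np.argsort(vals)
    c = 0.0
    for idx in order:              # grow the level set from the origin outward
        if not ok_mask[idx]:       # first unverified cell caps the level
            break
        c = vals[idx]
    return c                       # ROA estimate: {x : v(x) <= c}
```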
The ROA optimized by (6) is contained in the true ROA with probability at least $1 - \delta$ for all iterations $n > 0$. Precisely solving (6) is intractable; thus, we adopt the ADP [
$$\theta_\pi^* = \arg\min_{\theta_\pi} \sum_{x \in \mathcal{X}_\tau} \Big[\ell\big(x, \pi_{\theta_\pi}(x)\big) + \gamma V\big(f(x, \pi_{\theta_\pi}(x))\big) + \lambda\big(u_n(x, \pi_{\theta_\pi}(x)) - v(x) + L_{\Delta v}\tau\big)\Big] \tag{7}$$
where $\pi_{\theta_\pi}$ is the policy with parameters $\theta_\pi$; $\gamma$ is the discount factor; $\lambda$ is a Lagrange multiplier for the safety constraint; $\ell$ is the cost function; and $V$ is the value function of Bellman's equation, which is approximated using piecewise linear approximations [
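A minimal PyTorch sketch of one SGD step on a Lagrangian objective of the form (7) is given below; the bounded policy network, the multiplier value, and the callables standing in for the stage cost, the value upper bound, and the Lyapunov function are all assumptions for illustration (they must be differentiable torch functions).

```python
import torch

# Small tanh-bounded policy network; a scaling to the BESS limits in (3) is
# assumed to happen downstream.
policy = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.ReLU(),
    torch.nn.Linear(32, 1), torch.nn.Tanh(),
)
opt = torch.optim.SGD(policy.parameters(), lr=1e-3)
lam, gamma = 10.0, 0.98  # assumed Lagrange multiplier and discount factor

def policy_update(x_batch, stage_cost, v_next_ucb, v, L_dv, tau):
    """One SGD step: stage cost + discounted value-to-go + safety penalty."""
    u = policy(x_batch)
    loss = (stage_cost(x_batch, u)
            + gamma * v_next_ucb(x_batch, u)            # value of next state
            + lam * torch.relu(v_next_ucb(x_batch, u)   # constraint violation
                               - v(x_batch) + L_dv * tau)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```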
For the proposed algorithm, a safe initial point is essential for initiating the learning process. Consequently, an initial policy is required, ensuring the asymptotic stability of the system origin in (1) within a confined set of states. In this work, we utilize a linear-quadratic regulator (LQR) controller as our initial policy. In addition, to expand the ROA throughout the learning process, the agent strategically explores the state-action pairs for which the system dynamics are most uncertain. To achieve this, we meticulously choose measurement data points based on:
$$(x_n, u_n) = \arg\max_{(x, u) \in \mathcal{S}_n} \big(u_n(x, u) - l_n(x, u)\big) \tag{8}$$
where $l_n(x, u)$ is the lower bound of $v(h(x, u))$.
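In code, the sampling rule (8) amounts to restricting candidates to the current safe set and querying the GP posterior for its most uncertain point; the width $u_n - l_n$ of a Gaussian confidence interval is proportional to the predictive standard deviation, as in this sketch.

```python
import numpy as np

def select_sample(candidates, gp, safe_mask):
    """Pick the safe state-action pair where the GP model is most uncertain,
    mirroring criterion (8). `candidates` is an array of (theta, omega, u)
    triples and `safe_mask` flags membership in the current safe set."""
    Z_safe = candidates[safe_mask]
    _, sigma = gp.predict(Z_safe, return_std=True)
    return Z_safe[np.argmax(sigma)]
```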
The proposed algorithm is summarized in Algorithm 1.
Algorithm 1: safe MBRL algorithm for GFM inverter based frequency regulation

1: Load the power system simulation environment; initialize the LQR-based initial policy; initialize the parameters $\theta_\pi$ of the policy; initialize the GP model for the VSG dynamics and the ADP value functions; set the total number of episodes; and set the number of training steps per episode
2: Get the initial safe set based on the initial LQR controller and the corresponding initial Lyapunov function
3: for each episode do
4:  for each training step do
5:   Based on (8), select a new safe sample of the state-action pair $(x, u)$
6:   Update the GP model for the VSG dynamics based on the actively selected new data point
7:   Optimize the policy by solving (7) using the SGD-based optimization method
8:   Update the Lyapunov function (i.e., the value function $V$)
9:   Using the updated policy, calculate $c$ in (6) to ensure that $u_n(x, \pi(x)) - v(x) < -L_{\Delta v}\tau$, $\forall x \in \mathcal{V}(c) \cap \mathcal{X}_\tau$ holds
10:  Compute and update the safe set (i.e., the ROA)
11:  end for
12: end for
13: Return the well-trained policy $\pi^*$
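The following skeleton wires the earlier sketches into the loop structure of Algorithm 1. It reuses the illustrative objects defined above (h, g, gp, decrease_condition, largest_safe_level, policy_update) and treats the policy wrapper pi, the Lyapunov function v, the GP upper bounds gp_upper and v_next_ucb, the stage cost, and the constants L_dv and tau as placeholders assumed to be constructed as in the previous snippets; it is a sketch of the control flow, not the authors' implementation.

```python
import numpy as np
import torch

N_EPISODES, N_STEPS = 5, 10
x_grid = np.array([[t, w] for t in np.linspace(-0.5, 0.5, 21)
                          for w in np.linspace(-0.5, 0.5, 21)])
Z_data, y_data = list(Z), list(y)     # GP training set collected so far

for episode in range(N_EPISODES):
    for step in range(N_STEPS):
        # Safe set under the current policy and Lyapunov function (Theorem 2)
        ok = decrease_condition(x_grid, pi, v, gp_upper, L_dv, tau)
        Z_safe = np.array([np.append(x, pi(x)) for x in x_grid[ok]])
        # Most informative safe state-action pair, mirroring criterion (8)
        _, sigma = gp.predict(Z_safe, return_std=True)
        z_new = Z_safe[np.argmax(sigma)]
        Z_data.append(z_new)
        y_data.append(g(z_new[:2], z_new[2])[1])    # observe the model error
        gp.fit(np.array(Z_data), np.array(y_data))  # update the GP dynamics
        # One SGD step on the Lagrangian objective (7)
        policy_update(torch.as_tensor(x_grid, dtype=torch.float32),
                      stage_cost, v_next_ucb, v, L_dv, tau)
    c = largest_safe_level(x_grid, v, ok)           # updated ROA level set
```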
A case study was conducted on a GFM inverter system, as shown in Fig. 1.
The case study was conducted on an Intel Core i7-8650U @ 1.90 GHz Windows-based computer with 16 GB RAM. The convergence process of the training for the proposed algorithm is illustrated in Fig. 3.

Fig. 3 Convergence process of training for proposed algorithm.
The proposed algorithm exhibited remarkable convergence, typically requiring only a few tens of iterations. Under the obtained control policy, the ROA is shown in Fig. 4.

Fig. 4 ROA under safe MBRL-based control policy.
We investigated the frequency control performance of the proposed algorithm, as depicted in Fig. 5.

Fig. 5 BESS charging action of proposed algorithm and frequency control performance of proposed algorithm and comparison algorithms. (a) Hz. (b) Hz. (c) Hz.
To demonstrate the superiority of the proposed algorithm over traditional model-free DRL algorithms (e.g., the deep deterministic policy gradient (DDPG) algorithm outlined in [

Fig. 6 Frequency control performance of proposed algorithm and model-free DRL algorithms. (a) Hz. (b) Hz. (c) Hz.
The stable control performance of the proposed algorithm can be largely attributed to the integration of Lyapunov stability theory into the learning process, which provides a safety guarantee characteristic. More specifically, the proposed algorithm selects optimal control actions within the ROA, ensuring a level of safety that model-free DRL algorithms cannot guarantee for the learned policy.
Furthermore, to test the robustness of the proposed algorithm against inverter parameter variations, such as $M$ and $D$ in (1), we evaluated the performance of the well-trained safe MBRL controller under different parameter settings.

Fig. 7 Robustness of safe MBRL-based control policy against $M$ and $D$ uncertainties with Hz. (a) Robustness of safe MBRL-based control policy against $M$ uncertainty. (b) Robustness of safe MBRL-based control policy against $D$ uncertainty.
In this paper, we presented a novel safe MBRL algorithm for GFM inverter based frequency regulation with a stability guarantee. The proposed algorithm ensures stability by learning a Lyapunov function and utilizes ADP-based RL to enhance control performance. Additionally, GP modeling was employed to capture the VSG dynamics and enhance robustness to parameter uncertainty. The proposed algorithm offers a safe and robust controller for GFM inverter based frequency regulation. Simulation results demonstrated that the performance of the proposed algorithm surpasses that of traditional droop control and model-free DRL algorithms. Moreover, the proposed algorithm only requires measurements of the voltage phase and angular frequency of the inverter, which are easily accessible in modern power systems. Its ease of implementation enhances its potential for practical applications.
Appendix
Lemma 1 The control policy $\pi$ is Lipschitz continuous with Lipschitz constant $L_\pi$.
Proof 1 In this work, the control policy $\pi(x)$ is the output of a $K$-layer neural network, which is given by:
$$\pi(x) = \tanh\big(W_K\, \sigma_{K-1}(W_{K-1} \cdots \sigma_1(W_1 x))\big) \tag{A1}$$
In the hidden layers, ReLU activation functions $\sigma_k$ are used, which are 1-Lipschitz. For the $k$th layer, there exists a constant $c_k$ (e.g., the spectral norm $\|W_k\|_2$) such that $\|\sigma_k(W_k z_1) - \sigma_k(W_k z_2)\| \le c_k \|z_1 - z_2\|$ holds for all inputs $z_1$ and $z_2$. The output layer utilizes the tanh activation function, which is also 1-Lipschitz; thus the network satisfies $\|\pi(x_1) - \pi(x_2)\| \le L_\pi \|x_1 - x_2\|$, with $L_\pi = \prod_{k=1}^{K} c_k$. This means the control policy is Lipschitz continuous with Lipschitz constant $L_\pi$.
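The bound used in Proof 1 can be computed directly: with 1-Lipschitz activations, the product of the layers' spectral norms upper-bounds $L_\pi$. A small sketch follows, using the hypothetical policy architecture from the earlier snippet; this is a conservative bound, not the exact constant.

```python
import torch

def lipschitz_upper_bound(net: torch.nn.Sequential) -> float:
    """Upper bound on the Lipschitz constant of a feedforward network with
    1-Lipschitz activations (ReLU/tanh): product of layer spectral norms."""
    L = 1.0
    for layer in net:
        if isinstance(layer, torch.nn.Linear):
            L *= torch.linalg.matrix_norm(layer.weight, ord=2).item()
    return L

# Example with the (hypothetical) policy network used earlier.
net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.ReLU(),
                          torch.nn.Linear(32, 1), torch.nn.Tanh())
L_pi = lipschitz_upper_bound(net)
```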
Lemma 2 The closed-loop dynamics of the VSG given in (4b) are Lipschitz continuous with Lipschitz constant $L_f$.
Proof 2 From the dynamics given in (1) and Lemma 1, the dynamics function of the VSG is continuously differentiable. Any continuously differentiable function is locally Lipschitz. Therefore, the closed-loop dynamics of the VSG given in (4b) are Lipschitz continuous with Lipschitz constant $L_f$.
Lemma 3 The Lyapunov function $v$ is Lipschitz continuous with Lipschitz constant $L_v$.
Proof 3 In this work, the Lyapunov function is set as the value function of the ADP method. The value function is approximated using a piecewise linear function that is continuous. Given that the slopes of this piecewise linear function are bounded, the Lyapunov function exhibits Lipschitz continuity with a Lipschitz constant denoted by $L_v$.
Theorem 2 can be proved as follows. According to Lemma 1 of [
The LQR-based initial policy is designed based on the linearized VSG dynamics. According to formulas (1) and (2), the linearized dynamics can be expressed as:
$$\Delta\dot{x} = A\Delta x + B\Delta u, \quad A = \begin{bmatrix} 0 & 1 \\ -\dfrac{1}{M}\dfrac{\partial P_e}{\partial \theta} & -\dfrac{D}{M} \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ \dfrac{1}{M} \end{bmatrix} \tag{B1}$$
The eigenvalues of the system are:
$$\lambda_{1,2} = \frac{-D \pm \sqrt{D^2 - 4M\,\partial P_e/\partial\theta}}{2M} \tag{B2}$$
where the synchronizing coefficient $\partial P_e/\partial\theta$ depends on $Y_{1g}$, the mutual admittance between the IBR node and the main grid. As shown in Fig. 1, the mutual admittance can be calculated using the line parameters as [
$$Y_{1g} = \frac{1}{R + \mathrm{j}X} \tag{B3}$$
It can be found that the eigenvalues depend on the operating point, the virtual inertia and damping coefficients, and the line parameters $R$ and $X$. In this work, the per unit values of $M$ and $D$ are set to be 5 and 1, respectively.
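A brief sketch of how the LQR-based initial policy can be synthesized from the linearization (B1) with SciPy; the synchronizing coefficient $\partial P_e/\partial\theta$ at the operating point is an assumed illustrative value, while $M = 5$ and $D = 1$ follow the text.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

M, D = 5.0, 1.0
k_s = 0.99                   # assumed dPe/dtheta at the operating point
A = np.array([[0.0, 1.0],
              [-k_s / M, -D / M]])
B = np.array([[0.0],
              [1.0 / M]])
Q, R = np.eye(2), np.eye(1)  # weighting matrices as in (3)

P = solve_continuous_are(A, B, Q, R)   # Riccati solution
K = np.linalg.solve(R, B.T @ P)        # u = -K x is the initial policy

eigs = np.linalg.eigvals(A - B @ K)    # closed-loop eigenvalues (cf. (B2))
assert np.all(eigs.real < 0)           # stable initial controller
```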
References
A. Bidram, A. Davoudi, and F. L. Lewis, "A multiobjective distributed control framework for islanded AC microgrids," IEEE Transactions on Industrial Informatics, vol. 10, no. 3, pp. 1785-1798, Aug. 2014.
D. Chen, K. Chen, Z. Li et al., "PowerNet: multi-agent deep reinforcement learning for scalable power grid control," IEEE Transactions on Power Systems, vol. 37, no. 2, pp. 1007-1017, Mar. 2022.
Z. A. Obaid, L. M. Cipcigan, L. Abrahim et al., "Frequency control of future power systems: reviewing and evaluating challenges and new control methods," Journal of Modern Power Systems and Clean Energy, vol. 7, no. 1, pp. 9-25, Jan. 2019.
P. Verma, K. Seethalekshmi, and B. Dwivedi, "A cooperative approach of frequency regulation through virtual inertia control and enhancement of low voltage ride-through in DFIG-based wind farm," Journal of Modern Power Systems and Clean Energy, vol. 10, no. 6, pp. 1519-1530, Nov. 2022.
X. Meng, J. Liu, and Z. Liu, "A generalized droop control for grid-supporting inverter based on comparison between traditional droop control and virtual synchronous generator control," IEEE Transactions on Power Electronics, vol. 34, no. 6, pp. 5416-5438, Jun. 2019.
J. Liu, Y. Miura, H. Bevrani et al., "Enhanced virtual synchronous generator control for parallel inverters in microgrids," IEEE Transactions on Smart Grid, vol. 8, no. 5, pp. 2268-2277, Sept. 2017.
K. Sakimoto, Y. Miura, and T. Ise, "Stabilization of a power system with a distributed generator by a virtual synchronous generator function," in Proceedings of 8th International Conference on Power Electronics, Jeju, South Korea, Jun. 2011, pp. 1498-1505.
P. He, Z. Li, H. Jin et al., "An adaptive VSG control strategy of battery energy storage system for power system frequency stability enhancement," International Journal of Electrical Power & Energy Systems, vol. 149, p. 109039, Jul. 2023.
M. Li, W. Huang, N. Tai et al., "A dual-adaptivity inertia control strategy for virtual synchronous generator," IEEE Transactions on Power Systems, vol. 35, no. 1, pp. 594-604, Jan. 2020.
J. Alipoor, Y. Miura, and T. Ise, "Power system stabilization using virtual synchronous generator with alternating moment of inertia," IEEE Journal of Emerging and Selected Topics in Power Electronics, vol. 3, no. 2, pp. 451-458, Jun. 2015.
F. Wang, L. Zhang, X. Feng et al., "An adaptive control strategy for virtual synchronous generator," IEEE Transactions on Industry Applications, vol. 54, no. 5, pp. 5124-5133, Sept. 2018.
A. Ademola-Idowu and B. Zhang, "Frequency stability using MPC-based inverter power control in low-inertia power systems," IEEE Transactions on Power Systems, vol. 36, no. 2, pp. 1628-1637, Mar. 2021.
Z. Yan and Y. Xu, "Data-driven load frequency control for stochastic power systems: a deep reinforcement learning method with continuous action search," IEEE Transactions on Power Systems, vol. 34, no. 2, pp. 1653-1656, Mar. 2019.
Y. Li, W. Gao, W. Yan et al., "Data-driven optimal control strategy for virtual synchronous generator via deep reinforcement learning approach," Journal of Modern Power Systems and Clean Energy, vol. 9, no. 4, pp. 919-929, Aug. 2021.
W. Cui, Y. Jiang, and B. Zhang, "Reinforcement learning for optimal primary frequency control: a Lyapunov approach," IEEE Transactions on Power Systems, vol. 38, no. 2, pp. 1676-1688, Mar. 2023.
W. Cui and B. Zhang, "Lyapunov-regularized reinforcement learning for power system transient stability," IEEE Control Systems Letters, vol. 6, pp. 974-979, Jun. 2022.
F. Berkenkamp, M. Turchetta, A. Schoellig et al., "Safe model-based reinforcement learning with stability guarantees," in Proceedings of the 31st International Conference on Neural Information Processing Systems, California, USA, Dec. 2017, pp. 908-919.
H. Shuai, J. Fang, X. Ai et al., "Stochastic optimization of economic dispatch for microgrid based on approximate dynamic programming," IEEE Transactions on Smart Grid, vol. 10, no. 3, pp. 2440-2452, May 2019.
H. Shuai, J. Fang, X. Ai et al., "Optimal real-time operation strategy for microgrid: an ADP-based stochastic nonlinear optimization approach," IEEE Transactions on Sustainable Energy, vol. 10, no. 2, pp. 931-942, Apr. 2019.
D. Raisz, D. Deepak, F. Ponci et al., "Linear and uniform swing dynamics in multimachine converter-based power systems," International Journal of Electrical Power & Energy Systems, vol. 125, p. 106475, Feb. 2021.
V. Vittal, J. D. McCalley, P. M. Anderson et al., Power System Control and Stability. New York: John Wiley & Sons, 2019.
C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. Cambridge: The MIT Press, 2005.
F. Berkenkamp, R. Moriconi, A. P. Schoellig et al., "Safe learning of regions of attraction for uncertain, nonlinear systems with Gaussian processes," in Proceedings of 2016 IEEE 55th Conference on Decision and Control, Las Vegas, USA, Dec. 2016, pp. 4661-4666.
B. She, J. Liu, F. Qiu et al., "Systematic controller design for inverter-based microgrids with certified large-signal stability and domain of attraction," IEEE Transactions on Smart Grid, doi: 10.1109/TSG.2023.3330705.
H. K. Khalil, Nonlinear Systems. London: Prentice Hall, 1996.
D. Duvenaud. (2014, May). The kernel cookbook: advice on covariance functions. [Online]. Available: https://www.cs.toronto.edu/duvenaud/cookbook