Journal of Modern Power Systems and Clean Energy

ISSN 2196-5625 CN 32-1884/TK

网刊加载中。。。

使用Chrome浏览器效果最佳,继续浏览,你可能不会看到最佳的展示效果,

确定继续浏览么?

复制成功,请在其他浏览器进行阅读

Deep Reinforcement Learning Enabled Bi-level Robust Parameter Optimization of Hydropower-dominated Systems for Damping Ultra-low Frequency Oscillation  PDF

  • Guozhou Zhang 1
  • Junbo Zhao 2 (Senior Member, IEEE)
  • Weihao Hu 1
  • Di Cao 1 (Member, IEEE)
  • Nan Duan 3 (Senior Member, IEEE)
  • Zhe Chen 4 (Fellow, IEEE)
  • Frede Blaabjerg 4 (Fellow, IEEE)
1. School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China; 2. Department of Electrical and Computer Engineering, University of Connecticut, Storrs, CT, 06269, USA; 3. Lawrence Livermore National Laboratory, Livermore, USA; 4. Department of Energy Technology, Aalborg University, 9220 Aalborg, Denmark

Updated:2023-11-15

DOI:10.35833/MPCE.2022.000529

  • Full Text
  • Figs & Tabs
  • References
  • Authors
  • About
CITE
OUTLINE

Abstract

This paper proposes a robust and computationally efficient control method for damping ultra-low frequency oscillations (ULFOs) in hydropower-dominated systems. Unlike the existing robust optimization based control formulation that can only deal with a limited number of operating conditions, the proposed method reformulates the control problem into a bi-level robust parameter optimization model. This allows us to consider a wide range of system operating conditions. To speed up the bi-level optimization process, the deep deterministic policy gradient (DDPG) based deep reinforcement learning algorithm is developed to train an intelligent agent. This agent can provide very fast lower-level decision variables for the upper-level model, significantly enhancing its computational efficiency. Simulation results demonstrate that the proposed method can achieve much better damping control performance than other alternatives with slightly degraded dynamic response performance of the governor under various types of operating conditions.

I. Introduction

IN recent years, the negative effects of the fossil fuel based power system attract more and more attentions. In this context, increasing the penetration of renewable energy with low carbon and sustainable characteristics in the power system is a significant pathway to address this issue [

1]. Hydropower is one of the cheapest and reliable sources of renewable energy and it accounts for nearly 50% of renewable power generation [2]. However, recent studies indicate that the hydropower-dominanted systems are easily subject to the ultra-low frequency oscillation (ULFO) [3], which have been observed in China [4] as well as Nordic and Colombia [5], [6]. If ULFO is not properly controlled, there is a risk of system instability.

In [

7], the root cause of ULFO is investigated based on the vector margin method, where it is shown that the hydraulic turbine-governor creates a negative damping torque in the ultra-low frequency band. In [8], via the damping torque analysis, it is found that the unreasonable parameter settings of the hydraulic governor proportional-integral (PI) controller lead to ULFO. Furthermore, if the ratio of the proportional parameter to the integral parameter is too small for the hydropower governor, the ULFO can occur [9]. To damp out ULFO, various strategies have been proposed. In [10], a high-voltage direct current frequency limiting controller (HVDC-FLC) is developed. Note that for partial hydropower-dominated systems, only the AC transmission is used for the transmission of electric energy and thus it is difficult to adopt this method. In [11], the power system stabilizer (PSS) is configured on the governor to mitigate the impacts of the negative damping torque. By contrast, in [12], the PSS and proportion resonant (PR) are integrated to form the PR-PSS for ULFO control. A multi-band PSS, named PSS4B, is also proposed [13] with low-band, intermediate-band, and high-band controllers. The low-band controller provides the damping of the ULFO modes. However, these configurations are not always available in practice, limiting their applications in power systems.

Optimizing the proportional-integral-derivative (PID) parameters of the governor is another alternative for ULFO control. It is also shown that the re-tuning of the governor settings has high practical value [

14]. In [15], [16], a robust optimization method for tuning governor is proposed. A robust fixed order control design method is also developed in [2]. However, these methods are limited to several typical operating conditions. To this end, a bi-level optimization model (min-max or max-min) is proposed [17]. In particular, the performance of optimization variables under extreme operating conditions is taken as the objective to achieve generalizability under all operating conditions. Although the results are conservative, it is appropriate considering the importance of system stability unless other effective methods are available. To solve the bi-level model, it is advocated to replace the lower-level model using Karush-Kuhn-Tucker formulation [18], duality principle [19], and penalty function [20], yielding a single-level optimization model. However, for the decision variables provided by the upper-level model at each iteration, the lower-level model needs to repeat optimizations to find the corresponding extreme operating condition, which is usually time-consuming.

To this end, a novel bi-level PID parameter optimization model is proposed for ULFO control. It has the following contributions:

1) Based on the Routh-Hurwitz criterion, the mechanism of the ULFO is studied and the feasibility of formulating ULFO control as optimizing PID parameters is demonstrated. In particular, the problem of ULFO control is formulated as a bi-level robust PID optimization model. This is in contrast with the formulation in [

14], [15] and allows us to deal with a wider range of extreme system operating conditions. In addition, the optimized PID parameters retain the dynamic performance of the governor.

2) To improve the efficiency of solving bi-level model, this paper forms the lower-level model into the Markov decision process (MDP) solved by a deep deterministic policy gradient (DDPG) based algorithm. After that, the decision variables transferred by the upper-level model can be quickly addressed via the well-trained DDPG agent without the repeated optimization. This is a novel method to solve min-max optimization model and is different from previous min-max model [

18]-[20], yielding significantly improved computational efficiency.

The rest of this paper is organized as follows. Section II introduces the system model. In Section III, the formulation of governor parameter optimization is presented. The proposed bi-level robust parameter optimization model is presented in Section IV. The case study is provided in Section V. Section VI presents the actual hydropower-dominated system and conclusions are given in Section VII.

II. System Model

A. Two-machine System

Figure 1 shows a two-machine system, which is introduced to act as the studied system to investigate ULFO, including hydropower unit (G1), thermal power unit (G2), and load. The key parameter settings of this system are provided in Appendix A. The hydropower mainly consists of a generator, a governor, and a turbine, as shown in Fig. A1 of Appendix A. Among them, the linearized dynamic equation of generator can be described as:

TJdΔωdt=ΔPm-ΔPe-DsΔω (1)

Fig. 1  Schematic of two-machine system.

where TJ is the inertia time constant; Δω is the rotor speed deviation; ΔPm is the mechanical power deviation; ΔPe is the electromagnetic power deviation; and Ds is the damping coefficient.

Assuming frequency-dependent load as ΔPe=KLΔω, where KL is the load frequency sensitivity, and substituting ΔPe=KLΔω into (1), the transfer function of the generator Ggens can be described as [

15]:

Ggens=ΔωsΔPms=1TJs+Ds+KL=1TJs+D (2)

where D=Ds+KL. The transfer function of prime mover (governor and turbine) Gms can be defined as [

4]:

Gms=ΔPms-Δωs=GgovsGtsGgovs=KDs2+KPs+KIbpKI+s11+TGsGts=1-Tws1+0.5Tws (3)

where Ggovs and Gts are the transfer functions of governor and turbine, respectively; KP, KI, and KD are the proportional, integral, and differential coefficients of the governor, respectively; bp is the adjustment coefficient of the governor; TG is the response time of the governor; and Tw is the water hammer effect and it depends on the operating conditions of hydroturbine [

2]:

Tw=i=1nLiAiQrgHr (4)

where Li and Ai are the length and sectional area of the ith diversion pipeline, respectively; Qr is the rated water flow; g is the gravitational acceleration; Hr is the rated water head; and n is the number of diversion pipelines.

For the thermal power unit, this paper ignores the boiler dynamic process and mainly considers the steam turbine and governor [

8]:

ΔPmhs=-GhsΔωs=-KaΔωs1+sTg1+sTch (5)

where ΔPmhs is the damping torque of thermal generator; Ghs is the transfer function of thermal power unit; Ka is the gain; Tg and Tch are the time-steps of hydraulic system and high-pressure cylinder, respectively.

B. ULFO Mechanism Analysis

The dynamic responses of G1 and G2 under fault 1, i.e., a double-phase short-circuit fault at bus 3 from 2.0 s to 2.2 s, are shown in Fig. 2. It can be observed that, the angle speed of two generators varies with the same phase and amplitude. No oscillations between two units are observed. It is different from traditional low-frequecy oscillation (LFO). The Prony method [

21] is utilized to identify the oscillation model and the identification results show that a ULFO mode -0.0001+j0.403 exists in the system.

Fig. 2  Dynamic responses of G1 and G2 under fault 1.

To investigate the cause of ULFO in the studied system, we calculate the damping torque coefficient of two units’ primary frequency regulation (PFR). Among them, the detailed calculation process of damping torque coefficient of hydropower unit is provided in [

12] and can be defined as:

DHs=1-0.5TGTwωd2-ωd2TωTG+0.5Tw1-0.5TGTwωd2+ωd2TG+0.5Tw (6)

Submitting s=jωd into (3), the damping torque coefficient of stream power unit can be obtained as [

8]:

ΔPm=-DTΔω-jDIMΔω (7)
DTs=Ka1-TgTchωd21-TgTchωd22+Tg+Tch2ωd2 (8)

where DT denotes the damping torque coefficient; and DIM denotes the synchronous torque coefficient.

Submitting ωd=2πf into (6) and (8), the trajectory of damping torque coefficients DH and DT changing with frequency f can be obtained, as shown in Fig. 3.

Fig. 3  Trajectory of damping torque coefficients of hydropower and stream power units. (a) Hydropower unit. (b) Stream power unit.

It can be observed that the hydropower unit will produce a negative damping under the ultra-low frequency band (below 0.1 Hz). In contrast, the thermal power unit would produce a positive damping in this band. Therefore, the ULFO is strongly related with hydropower unit.

To further investigate the relationship of ULFO and hydropower unit, the characteristic equation of closed-loop transfer function of the hydropower unit is calculated based on (2) and (3) and can be written as:

TJs+DbpKI+s1+TGs1+0.5Tws+1-TwsKDs2+KPs+KI=0 (9)

After simplification, (9) can be rewritten as:

s4+a1s3+a2s2+a3s+a4=0 (10)

where a1, a2, a3, and a4 are the Routh-Hurwitz criterion coefficients, which are related to Tw. Specifically, the Routh-Hurwitz criterion coefficients and oscillation frequency changing with Tw are shown in Fig. 4. Note that, the detailed expression of these coefficients and the calculation process of oscillation frequency are provided in Appendix A.

Fig. 4  Routh-Hurwitz criterion coefficients and oscillation frequency changing with Tw. (a) Routh-Hurwitz criterion coefficients. (b) Oscillation frequency.

It can be observed from Fig. 4(a) that with the increase of Tw, all Routh-Hurwitz criterion coefficients decrease and in particular, the coefficient c1 becomes negative. Base on the Routh-Hurwitz criterion [

3], it can be concluded that the system is unstable, and there is a negative damping oscillation mode in the system. In addition, Fig. 4(b) shows that, with the increase of Tw, the oscillation frequency decreases to below 0.1 Hz. It means that this dominant negative damping oscillation mode is ULFO mode.

In summary, the ULFO is strongly related to the hydro-power units, and it is caused by the PFR of the hydropower units. More specifically, due to the water hammer effect, the hydraulic governor easily produces negative damping torque, as shown in Fig. 3, resulting in the negative ULFO mode.

Besides, Routh-Hurwitz criterion coefficients are related to the hydrogovernor PID parameters. In fact, we can adjust PID parameter settings to make these coefficients keep positive. Based on this consideration, we test the trajectory of c1 with different PID parameter settings and the results are shown in Fig. 5. It can be observed that the tuning of PID parameters can make the c1 positive. In this way, the system becomes stable, which means that optimizing PID parameters contributes to preventing ULFO.

Fig. 5  c1 changing with different PID parameter settings.

III. Formulation of Governor Parameter Optimization

Based on the system linearization technology, the state matrix A can be obtained [

22]. Then, (11) can be utilized to diagonalize matrix A to obtain eigenvalues:

A=VΛUVT=I (11)
Λ=diagλ1,λ2,,λULFO,,λmλULFO=σULFO+jβULFOξULFO=-σULFO/σULFO2+βULFO2 (12)

where U and V are the left and right feature vectors, respectively; Λ is the diagonal matrix; λm denotes the mth eigenvalue; λULFO is the ULFO mode; σULFO and βULFO are the real part and imaginary part of λULFO, respectively; and ξULFO is the damping ratio.

As mentioned above, optimizing PID parameter settings contributes to stable ULFO mode λULFO. However, there are some requirements: ① less effect on other oscillation modes: we should avoid weakening the damping of other modes when improving the damping of ULFO mode; ② good dynamic performance for the PFR: previous studies indicate that if without proper design, optimizing PID parameters may deteriorate PFR dynamic performance, and thus weakening the frequency adjustment ability of governor [

14], [15]; ③ robustness to the change of operating conditions: in actual operation, the operating condition of the turbine is time-varying. The optimized PID parameters that show good performance under one operating condition may not work well under other conditions. Therefore, during the optimization process, extreme operating conditions should be considered. Based on the above considerations, the suppression of ULFO is modeled as a bi-level min-max optimization model with constraints, which is described as:

minJupperKP,i,KI,i,KD,i,Tw,i*KP,iminKP,iKP,imaxKI,iminKI,iKI,imaxKD,iminKD,iKD,imaxTw,i*=argmaxyzlowerKP,i,KI,i,KD,iTw,iminTw,iTw,imaxξj'KP,i,KI,i,KD,i,Tw,i>ξ0    i=1,2,,N (13)

where Jupper is the objective function of the optimization problem; Tw,i* is the time constant of water hammer effect of the ith governor representing the operating conditions of the governor and limited between 0.5 and 3 [

15]; yzlower is the objective function of the lower model; ξj' is the damping of the jth oscillation mode except for the ULFO mode; ξ0 is the desired damping ratio of the oscillation mode that is typically set to be 0.05 [12]; and N is the number of the governors. Jupper can be reformulated as [15]:

JupperKP,i,KI,i,KD,i,Tw,i=JSTB+JITAEJSTB=ξULFO<ξsetξULFO-ξset+σULFO>σsetσULFO-σsetJITAE=i=1N0TtΔPmt-ΔPmdt (14)

where JSTB is the stability of the ULFO; JITAE is the primary frequency control performance of the prime mover; σset is the desired real part of the eigenvalues; ξset is the damping ratio of the eigenvalues; ΔPmt is the dynamic response of governor under disturbance; and ΔPm is the steady-state value of the prime mover under disturbance. In fact, JSTB can be further written as:

JSTB=ξset-ξULFO+σULFO-σset    ξULFO<ξset,σULFO>σset0                                             ξULFOξset,σULFOσset (15)

It can be observed from (15) that, when ξULFO<ξset,σULFO>σset, JSTB is defined as ξset-ξULFO+σULFO-σset, by minimizing this target, ξULFO and σULFO of ULFO mode would be pushed close to the predetermined damping and real part. Note that the damping of ULFO mode is set to be a bigger value. By optimizing PID parameter settings to make the damping of ULFO mode close to the predetermined damping, the damping of ULFO mode is improved. Once the damping and real parts of ULFO mode reach to the predetermined value, JSTB becomes 0.

By minimizing JSTB, the output deviation of the prime mover can be optimized to close to steady-state point. In this way, both the oscillation amplitude and oscillation time of the prime mover under disturbance can be optimized, which means that the dynamic performance of the primary frequency control can be improved.

IV. Proposed Bi-level Robust ParameterOptimization Model

For the bi-level robust parameter optimization model, the lower-level model is reformulated as the MDP and solved by the DDPG algorithm. Subsequently, the trained agent is employed to assist particle swarm optimization (PSO) to solve the upper-level model to obtain the optimal solution. The overall scheme of the optimization process is shown in Fig. 6.

Fig. 6  Overall scheme of optimization process.

A. Solving Lower-level Model via DDPG Algorithm

The core functions of lower-level model shown in Fig. 6 can be described as:

Tw,i*=argmaxyzlowerKP,i,KI,i,KD,iTw,iminTw,iTw,imaxi=1,2,,Nξj'KP,i,KI,i,KD,i,Tw,i>ξ0 (16)

In each episode, the upper-level model would provide a set of PID parameter setting KP,i,KI,i,KD,i to the lower-level model. The lower-level model searches the extreme operating condition Tw,i* with the worst performance under this PID setting condition. In this way, both the upper-level and lower-level variables are determined and the objective function (14) can be calculated and sent to the upper-level model.

In fact, the function of the lower-level model is to find the corresponding extreme operating condition under every PID setting condition. Therefore, the key to solving this lower-level model is to find the policy function Tw,i*=πKP,i,KI,i,KD,i that maximizes the objective function. It is a decision-making problem in an uncertain environment. This paper reformulates it as an MDP, and the key elements related to MDP are defined as a tuple, S, A, P, R, where S denotes the state and is composed of the PID parameter settings of each hydrogovernor; A denotes the action and is represented as the operation conditions of governor; P denotes the transition probability; and R denotes the reward and is used to evaluate the action taken by the agent at each time step. In this paper, the reward rsk,ak is defined as:

rsk,ak=JKP,ik,KI,ik,KD,ik,Tw,ik+j=1mξ0<ξj'ξ0-ξj'KP,ik,KI,ik,KD,ik,Tw,ik (17)

where the superscript k denotes the kth turbine governor.

The reinforcement learning algorithm is a common method to solve this MDP [

22]. The background of reinforcement learning is described as follows:

1) Q-learning: it is a widely used reinforcement learning method [

23], where an agent is trained to learn the optimal control policy to maximize the cumulative reward. In Q-learning, each state-action pair sk,ak is assigned with a Q-value, stored in the Q-table, and denoted as Qsk,ak. For a given state, the higher Q-value for an action denotes a higher potential cumulative reward and it can be utilized to update Q-value via Bellman equation [24]:

Qsk,akQsk,ak+αrsk,ak+γmaxQsk+1,ak+1-Qsk,ak (18)

where γ is the discount rate; and α is the learning rate. The ε-greedy policy is adopted in Q-learning, where the agent will choose the action with the highest Q-value during the training process. Note that Q-learning is only suitable for the case where state-action space is small. The increase of state-action space size would make the Q-table become too big, resulting in each Q-value being rarely updated. In this context, deep Q-network (DQN) is proposed to address it.

2) DQN: DQN is proposed for solving high dimension state-action problem via combining deep neural network (DNN) with Q-learning method. Specifically, a DNN is utilized to approximate Q-table, named Q-network, represented as Qsk,ak,θ, where θ denotes the parameters of DNN. The Q-network takes the state as input and outputs the Q-value for each state-action pair. It can be trained via minimizing the loss function:

Lθ=Ey-Qsk,ak,θ2 (19)

where E· is the expected function; y=rsk,ak+γmaxQsk+1,ak+1,θ' denotes the target Q-value, and θ' denotes the parameter of target Q-network and is updated via soft-update method [

25]. To stabilize training process, it can be calculated via the target Q-network Qsk+1,ak+1,θ'.

Moreover, a replay buffer is employed in DQN to break the identically distributed state samples and reduce the correlation between them, leading to improved data efficiency. Specifically, during the training process, all information is saved as an experience ek=st,at,rt,st+1 and stored in the memory D=e1,e2,,eM. After the buffer is full, the oldest experience will be replaced by the newly obtained one. Subsequently, at each iteration, the agent will sample a mini-batch of experience from the replay buffer.

3) DDPG: DDPG is the standard DQN method only working effectively for solving control problems with continuous states and low-dimension discrete action sets. It is not suitable to solve the optimization problem of PID parameter setting. To this end, DDPG is introduced and it can achieve better performance in solving control problems with continuous action space than DQN. Figure 6 shows the procedure of the DDPG and it consists of two eponymous ingredients: an actor network is utilized to fit the state sk to action ak, denoted as policy function μsk|θμ via adjusting the network parameter θμ=W1a,b1a,W2a,b2a,,Wna,bna; and a critic network is utilized to fit the action-value function Qsk,akθQ via adjusting network parameter θQ=W1c,b1c,W2c,b2c,,Wmc,bmc.

During the training process, these two networks are trained against each other. Among them, the critic network can be updated by the loss function [

26]:

LθQ=1Nmmi=1nQsk,akθQ-yk2yk=rsk,ak+γQsk+1,μskθμθQθk+1Q=θkQ+αQθμLθkQ (20)

where Nmm is the number of mini-batch; and αQ is the learning rate of the critic networks, Then, the parameters of the actor network can be updated by the gradient descent [

27]:

θμJθμ=             1Nmmi=1NαQs,aθQs=sk,a=μsk+𝒩θμμsθμs=skθk+1μ=θkμ+αμθμJθkμ (21)

where αμ is the learning rate of the actor network; and 𝒩 is the Gaussian noise.

Moreover, to stabilize the training process, both target actor-critic pair networks are added in DDPG, which can be parameterized by θμ' and θQ'. At each iteration, the soft updating (22) is utilized to synchronize its parameters to the target actor-critic networks [

26].

θQ'τθQ+1-τθQ'θμ'τθμ+1-τθμ' (22)

where τ is the soft-update rate.

B. Combining PSO and Well-trained DDPG-enabled Agent to Solve Upper-level Model

After off-line training, the well-trained DDPG-enabled agent can learn the optimal control policy and provide extreme operating conditions for each PID parameter setting of the system.

Tw,i*=μKP,i,KI,i,KD,iθμ    i=1,2,,N (23)

In this way, the proposed bi-level min-max optimization model (13) can be transformed into a single-level mathematical programming problem with constraints, i.e.,

minJKP,i,KI,i,KD,i,Tw,i*KP,iminKP,iKP,imaxKI,iminKI,iKI,imaxKD,iminKD,iKD,imaxTw,i*=μKP,i,KI,i,KD,iθμi=1,2,,N (24)

It can be observed from (23) that the bi-level model is converted into a nonlinear optimization problem. A heuristic algorithm is a good choice to solve that and this paper uses the PSO algorithm [

28]. The detailed solution processes of the optimization model are as follows.

Step 1:   define the solution space and fitness function. The PSO is used to find the optimal PID parameter settings for governor. In this context, the particle position is designed as the PID parameter settings (KP,i, KI,i, KD,i). The fitness function is applied to evaluate the training error and the goodness of a given solution, which is defined as the objective function and shown in (14).

Step 2:   initialize random swarm location and velocities. Before beginning to search the optimal position, each particle is initialized with the random PID parameter setting within the allowable ranges. Moreover, the direction and length of movement of the particle at each episode are named velocity, which is also initialized.

Step 3:   calculate the fitness of each particle. The PID parameter setting carried by each particle is transmitted to the well-trained DDPG agent. Then, the actor network in DDPG can provide the extreme operating condition Tw-i* with the worst performance for each particle under the corresponding PID parameter setting. Next, both PID parameter setting and operating condition data of each particle can be updated to the studied system to calculate the fitness via (14).

Step 4:   update the particle position and velocity. The velocity and position of each particle can be updated via:

vi+1d=ωvid+c1r1pid-xid+c2r2pg-xidxi+1d=xid+vid (25)

where vid is the velocity of the dth particle at the ith iteration; xid is the position of the dth particle at the ith iteration; pg is the best position among all particles in the population up to the nth iteration; pid is the optimal position of the dth particle up to the nth iteration; ω is the inertia factor; c1 and c2 are the acceleration coefficients; and r1 and r2 are the random numbers in [0,1]. In this paper, c1=c2=2; ω=0.6 [

15].

Step 5:   iterate to find the optimal solution. Repeat Steps 2 and 3 until the minimum error is met, or the maximum number of iterations is reached. Output the final result as the solution to the above optimization problem.

V. Case Study

In this section, the performance of the proposed method is tested on the IEEE 10-machine 39-bus system. All generators adopt a fifth-order model and a simplified excitation system [

14]. The eigenvalue analysis results show that a ULFO mode of 0.00045+j0.55 exists in the system. The generators strongly related with this mode include G1, G2, G5, and G8. To damp out that oscillation, we formulate the ULFO suppression problem as a bi-level PID parameter setting optimization model. Then, the DDPG and PSO algorithms are combined to solve this model. The simulation is carried out via Python and power system analysis software package (PSASP) [29]. Among them, the studied system is constructed in PSASP. PSO and DDPG algorithms are modelled in Python.

A. Performance Evaluation of Trained Agents

The numbers of layers and neurons for the networks in the DDPG algorithm are set as follows: both actor and critic networks adopt the same structure, which contains three hidden layers; and the numbers of neurons for the hidden layers are 128, 64, and 64, respectively. The hyperparameters of the DDPG algorithm can be found in Table I.

Table I  Hyperparameters of DDPG Algorithm
ParameterValue
Learning rate for actor network 0.001
Learning rate for critic network 0.002
Experience replay memory capacity 8000
Step size of each episode 12
Mini-batch size 40

To obtain the corresponding extreme operating conditions with different PID parameter settings, which maximize the objective function, numerous scenarios are constructed for training. Specifically, we generate scenarios by randomly sampling from the upper- and lower-limits of PID parameter settings of each hydroturbine (0KP,i25, 0KI,i0.5, 0KD,i3). Note that 95% of these PID parameter settings are used for training while the remaining 5% are used for testing. The cumulative reward changing with the episode during the training process is shown in Fig. 7, where the blue solid lines and the shaded areas represent the average accumulated reward and the standard deviation of the reward, respectively. At the beginning of the training process, as the parameters of DDPG are randomly initialized, the agent does not learn the optimal policy that satisfies all constraints, leading to low accumulated reward. As the training process goes further, the agent interacts with the system to obtain experiences to optimize the parameters of DDPG, and the cumulative reward increases gradually. After about 5000 episodes, the algorithm converges.

Fig. 7  Cumulative reward changes with episode during training process.

After training, the well-trained agent can provide each PID parameter setting to the corresponding extreme operating condition. After that, this agent can be applied to assist the PSO algorithm to solve the upper-level model. To show the computational efficiency of this agent, we compare the solution methods of the lower-level model in [

19] and [20] with our trained agent. The computational efficiency comparison of different methods is shown in Table II. It can be observed that our proposed method achieves the highest computational efficiency.

Table II  Computational Efficiency Comparison of Different Methods
MethodAverage solution time of lower-level model (s)
Optimization method in [19] 13.40
Optimization method in [20] 3.70
Proposed method 0.05

Moreover, we further investigate the performance of the PSO and differential evolution (DE) to test the performance of these two methods. To balance the training time and optimization effect, by careful trial and error, the population size of these two methods is set to be 30. Except the population size, the hyperparameters of the method have a great impact on the optimization performance. To deal with it, each hyperparameter in the method will first be discretized into several points in its feasible region. Then, these discrete points form a test table by the orthogonal method.

In this way, we can form nine sets of hyperparameter settings for DE and PSO algorithms, respectively. Then, we can train the PSO and DE algorithms with each set of hyperparameter settings, as shown in Fig. 8, where the solid lines and the shaded areas represent the average and the standard deviations of the objective function under different trials, respectively. Moreover, some quantitative indicators are introduced to evaluate the optimization performance of PSO and DE algorithms, as shown in Table III. It can be observed that, the performance of both PSO and DE algorithms is almost similar in terms of the best fitness value obtained in the 18 runs. Specifically, DE algorithm is slightly better than PSO algorithm. However, the PSO algorithm performs better compared with DE algorithm in terms of the average, the worst, and the standard deviations of the objective function.

Fig. 8  Convergence process of DE and PSO algorithms with different hyperparameter settings.

Table III  Quantitative Indicators of Optimization Performance of PSO and DE Algorithms with Different Hyperparameter Settings
AlgorithmValue of quantitative indicator
BestAverageWorstStandard deviation
PSO 26.42 33.18 53.30 9.71
DE 25.01 41.07 140.27 37.68

B. Control Performance Comparisons

To test the effectiveness of the proposed method, five cases are tested by setting different time constants of the hydro-turbine Tw, as shown in Table IV. Other PID parameter setting tuning methods are compared as well, including the optimization method shown in [

14] (called robust method I), and the one in [15] (called robust method II). The optimized parameters are used in the time-domain simulations to validate the damping control performances. Taking G1 and G2 as examples, the frequency deviations of units under fault 1, i.e., a double-phase short-circuit fault at bus 3 from 2.0 s to 2.2 s, are shown in Fig. 9. The damping of ULFO mode by different optimization methods is presented in Table V. It can be observed from Table V and Fig. 9 that ULFO exists in the system before PID parameter tuning. After PID parameter optimization, the damping of ULFO mode is enhanced and the oscillation is suppressed. However, the PID parameter settings optimized by different methods show different performances. Compared with the other two robust optimization methods, the proposed method achieves the best damping of ULFO mode and allows the generators to reach a steady state in the shortest time.

Table IV  Tw of Hydro-turbines in Different Cases
GeneratorTw of hydro-turbine
Case 1Case 2Case 3Case 4Case 5
G1 0.5 2.5 1.5 1.5 2.5
G2 1.5 1.5 2.5 0.5 0.5
G3 0.5 1.5 1.5 1.5 0.5
G4 2.5 2.5 2.5 1.5 1.5
G5 1.5 2.5 0.5 1.5 0.5
G6 1.5 0.5 0.5 1.5 2.5
G7 0.5 1.5 0.5 0.5 2.5
G8 0.5 0.5 1.5 1.5 1.5
G9 1.5 2.5 1.5 2.5 1.5
G10 2.5 0.5 1.5 0.5 2.5

Fig. 9  Frequency deviations of units under fault 1. (a) G1. (b) G2.

Table V  Damping of ULFO Mode by Different Optimization Methods
MethodReal partImaginary partDamping (%)
Original parameters 0.0005 0.54 -0.093
Robust method I -0.0410 0.59 6.930
Robust method II -0.0540 0.55 9.770
Proposed method -0.0920 0.52 17.420

To further elaborate on the results, the dynamic characteristic of G1 and step responses of PFR are shown in Figs. 10 and 11. It can be observed that the original parameters make the units achieve the best dynamic response but the worst damping performance of ULFO. The parameter settings re-tuned by the three optimization methods can improve the damping characteristics of the governor with the cost of reduced dynamic characteristics. Among them, the proposed method not only improves the damping characteristics of the governors but also retains a better dynamic response of PFR than the other two methods.

Fig. 10  Dynamic characteristic of G1. (a) Damping curve. (b) Step response.

Fig. 11  Step responses of PFR. (a) G1. (b) G2.

C. Robustness to Various Operating Conditions

To test the robustness of all three methods under different disturbances, four different two-phase short-circuit faults are applied with 100 ms fault duration and the faulty buses are 8, 16, 23, and 25, which are named as fault 2, fault 3, fault 4, and fault 5, respectively. The simulation results are shown in Fig. 12. It can be observed that all methods can damp out the oscillations. Among them, the proposed method takes the least time to damp out the oscillations as compared with the other two robust methods. The proposed method also has the smallest oscillation magnitude, demonstrating the best performance of damping out UFLO under different faults.

Fig. 12  Dynamic response of G1 under different disturbances. (a) Fault 2. (b) Fault 3. (c) Fault 4. (d) Fault 5.

Furthermore, different Tw settings for Cases 2-5 in Table IV are taken as comparative cases to test the robustness of the proposed method under different cases.

For each case, time-domain simulations are carried out with different PID parameter settings. The results are shown in Fig. 13.

Fig. 13  Dynamic response of G1 under different cases. (a) Case 2. (b) Case 3. (c) Case 4. (d) Case 5.

It can be concluded that the PID parameters optimized by the proposed method can achieve better performance in comparison with two other optimization methods. To further investigate the performance of the proposed method, we adopt the Monte Carlo method to sample time constant Tw of turbines between 0.5 s and 3 s to generate 300 cases. For each case, the ULFO mode is calculated and the probability density function (PDF) of ULFO nodes is also calculated, as shown in Fig. 14. Note that the red curve denotes the PDF distribution and is formed by fitting the eigenvalues produced via different PID parameter tuning method. It can be observed that before PID parameter optimization, the damping of the ULFO mode is less than 0 for some cases and not located in the secure region (ξ>5%). This means that there is a risk of ULFO induced instability during the operation of the system. Figure 14(b), (c), and (d) shows the distributions of ULFO modes under different PID parameter settings. The statistical hypothesis for damping ratio of ULFO mode under 300 operating conditions is listed in Table VI. Clearly, compared with two other robust methods, the proposed method can make the ULFO mode achieve better mean value and smaller standard deviation of the damping ratio, yielding a larger stability margin.

Fig. 14  Damping PDF of ULFO mode under different cases. (a) Original parameters. (b) Robust method I. (c) Robust method II. (d) Proposed method.

Table VI  Statistical Hypothesis for Damping Ratio of ULFO Mode Under 300 Operating Conditions
MethodMean value (%)Standard deviation
Original parameters 0.27 0.0075
Robust method I 6.18 0.0420
Robust method II 10.25 0.0536
Proposed method 13.78 0.0213

VI. Actual Hydropower-dominated System

In this section, an actual hydropower-dominated system in Sichuan Power Grid, China is introduced as a studied system, which is shown in Fig. 15. The abbreviations shown in Fig. 15 are the acronyms of the region names. This system consists of 53 buses, 26 lines, and 17 hydropower units and the installed capacity is 1100 MW. For such system, the eigenvalue analysis results show that a ULFO mode -0.0013+j0.597 exists in this system. To damp out that oscillation, the proposed method is employed to optimize the governor PID parameters, which are strongly related to the ULFO (CJB-1, CJB-2, CPQ-3, CTCH-2).

Fig. 15  Simplified system topology used for case.

A. Further Evaluation of Proposed Method

To test the effectiveness of the proposed method under different datasets, we generate three datasets via different distribution functions. Specifically, we generate three datasets by sampling from the upper- and lower-limits of PID parameter settings of each hydroturbine via the uniform distribution function, normal distribution function, and rayleigh distribution function, respectively. Each dataset is utilized to train a DDPG agent. The training process is shown in Fig. 16. The curves and shaded areas in Fig.16 represent the mean and variance of the reward function during the training process, respectively.

Fig. 16  Cumulative reward changes with episode under different datasets.

It can be observed from Fig. 16 that, with the increase of training episodes, the proposed method would converge under three different datasets. It means that the DDPG agent can learn the control policy that maximizes the cumulative reward. Then, these well-trained agents can be utilized to assist PSO to find the appropriate PID parameter settings to prevent ULFO. The effectiveness analysis of these obtained PID parameter settings is provided in the next part.

B. Control Performance Analysis and Comparison

To test the control performance of obtained PID parameter settings, we send the PID parameter settings to the system and compare these settings under a fault, where a two-phase short-circuit faults are applied with 150 ms fault duration and the faulty buses are CJL-220. The simulation is carried out under two different cases and the results are shown in Fig. 17. Clearly, each group of PID parameter settings obtained via different datasets can prevent ULFO phenomena under all three operating conditions, which means that the proposed method can work effectively under different dataset and the appropriate PID parameter settings can be found to prevent ULFO.

Fig. 17  Comparison results of different PID parameter settings obtained via different datasets. (a) Tw=2.5. (b) Tw=3.0.

Moreover, to further investigate the performance of the proposed method, the common PID parameter tuning method (Ziegler-Nichols) is applied to act as benchmark method. The details of this method can be found in [

30]. The comparison results of different PID tuning methods can be found in Fig. 18. It can be observed that both the Ziegler-Nichols method and the proposed method can damp out the frequency oscillations. Among them, the proposed method can make the frequency deviation curve recover to the steady point faster. It means that the proposed method can search better PID parameter settings and has better control performance.

Fig. 18  Comparison results of different PID tuning methods. (a) Tw=2.5. (b) Tw=3.0.

VII. Conclusion

In this paper, a novel bi-level robust parameter optimization model is proposed to re-tune the PID parameters to control ULFO. Different from the conventional robust optimization methods, the PID parameter optimization is reformulated into the form of a min-max optimization model to ensure the effectiveness of the optimized PID parameters under various extreme operating conditions. The DDPG is developed to train an agent for the fast decision making of lower-level model and ensure the effective interactions with the upper-level model, significantly improving the computational efficiency.

After the agent is trained, each iteration only needs 0.05 s to provide actions for the upper-level model. Simulation results are carried out in IEEE 10-machine 39-bus system and actual hydropower-dominated system in Sichuan Power Grid. Three hundred cases are generated by Monte Carlo method to act as test cases and the comparison results show that the proposed method can achieve better damping control performance. The mean value of the ULFO mode damping under 300 cases can reach 13.78%. The mean value of other two robust methods can only reach 6.18% and 10.25%, respectively. It means that the PID parameters optimized by the proposed method can achieve better damping control performance in comparison with two other robust optimization methods under different operating conditions.

Appendix

Appendix A

The standard PFR model of the hydropower unit is shown in Fig. A1.

Fig. A1  Structure diagram of standard PFR model of hydropower unit.

The parameter settings of hydropower unit, thermal power unit, and AC lines are listed in Tables AI-AIII, respectively. The parameter setting of generators in two-machine system is listed in Table AIV. The rated active and reactive power of generators and load are listed in Table V.

Table AI  Parameter Setting of Hydropower Unit
VariableValueVariableValue
KP 4.0 bp 0.05
KI 2.5 D 2.00
KD 0.5 TG 0.20
TJ 10.0 Tw 4.00
Table AII  Parameter Setting of Thermal Power Unit
VariableValue
Ka 1.0
Tch 0.2
Tg 0.3
Table AIII  Parameter Setting of AC Line
AC lineRXB/2
x1 0.0057 0.0625 0.5145
x2 0.0032 0.0323 0.2806
Table AIV  Parameter Setting of Generators in Two-machine System
ParameterG1G2
Xd 0.4400 1.0870
Xd' 0.2500 0.2890
Xd 0.2200 0.2020
Xq 1.7000 0.6840
Xq' 0.5000 0.6870
Xq 0.6000 0.2280
X2 0.0025 0.2150
Ra 0.0250 0
Td0' 8.0000 10.0300
Td0 0.0300 0.0400
Tq0' 0.4000 0.2590
Tq0 0.0500 0.0600
H 13.000 12.0000
D 0 0
Table AV  Rated Active and Reactive Power of Generators and Load
Generator and loadP (MW)Q (Mvar)
G1 700 185
G2 700 185
Load 1 600 100

The detailed expressions of Routh-Hurwitz criterion coefficients a1, a2, a3, and a4 are shown as follows:

a1=0.5TGTwTJbp+0.5TwTJN0+TGTJN0+0.5TGTwDN0-TwN10.5TGTwTJN0a2=TJbp0.5Tw+TG+Tw0.5TGDbp-N2+0.5TGTwN0TJ+0.5TwD+TGDTJN0a3=N2-Tw+bpTJ+0.5TwD+TGD+DN00.5TGTwTJN0a4=1+Dbp0.5TGTwTJN0No=1/KIN1=KD/KIN2=KP/KI (A1)

The calculation process of oscillation frequency of two-machine system can be given.

Considering two-machine system is unstable, a dominant oscillation mode exists in the system. In this context, (5) can be represented as:

s-λ1s-λ2s+σ+jωs+σ-jω=0 (A2)

That is further rewritten as:

s4+2σ-λ1-λ2s3+λ1λ2+σ2+ω2-2σλ1+λ2s2+σ2+ω2λ1+λ2+2σλ1λ2s+σ2+ω2λ1λ2=0 (A3)

where λ1, λ2, and σ±jω are the poles of the closed-loop transfer function. Note that λ1 and λ2 can be two real roots or a pair of the conjugate complex root. σ±jω is the dominant pole. When σ=0, the frequency response will oscillate critically. Meanwhile, we have:

a1=-λ1+λ2a3=-ω2λ1+λ2 (A4)

Based on (10), the oscillation frequency ω is defined as:

ω=a3/a1 (A5)

Combining (A5) and (A1), ω can be rewritten as:

ω=2N2-Tw+bpTJ+0.5TwD+TGD+DN0TGTwTJbp+TwTJN0+TGTJN0+TGTwDN0-2TwN1 (A6)

The obtained PID parameters of IEEE 10-machine 39-bus system is shown in Table AVI.

Table AVI  PID Parameter Setting
MethodGeneratorKPKIKD
Robust method I G1 8.36 0.34 2.33
G2 2.77 0.04 1.18
G5 17.90 0.09 2.66
G8 1.54 0.18 1.10
Robust method II G1 8.61 0.29 2.69
G2 2.81 0.26 1.74
G5 11.98 0.32 0.60
G8 6.59 0.05 1.63
Proposed method G1 6.73 0.33 2.07
G2 2.28 0.11 1.39
G5 13.58 0.18 1.52
G8 2.68 0.21 1.73

References

1

Z. Arthouros. (2019, Feb.). The renewables 2019 global status report. [Online]. Available: http://www.ren21.net/gsr-2019/pages/foreword/fore word/ [Baidu Scholar] 

2

C. Jiang, J. Zhou, P. Shi et al., “Ultra-low frequency oscillation analysis and robust fixed order control design,” International Journal of Electrical Power & Energy Systems, vol. 104, pp. 269-278, Jan. 2019. [Baidu Scholar] 

3

W. Yang, J. Yang, R. Tang et al., “Experimental investigation of theoretical stability regions for ultra-low frequency oscillations of hydropower generating systems,” Energy, vol. 186, pp. 1-11, Nov. 2019. [Baidu Scholar] 

4

Z. Liu, W. Yao, J. Wen et al., “Effect analysis of generator governor system and its frequency mode on inter-area oscillations in power systems,” International Journal of Electrical Power & Energy Systems, vol. 96, pp. 1-10, Mar. 2018. [Baidu Scholar] 

5

W. Mo, Y. Chen, Y. Liu et al., “Analysis and measures of ultralow-frequency oscillations in a large-scale hydropower transmission system,” IEEE Journal of Emerging & Selected Topics in Power Electronics, vol. 6, no. 3, pp. 1077-1085, Sept. 2018. [Baidu Scholar] 

6

H. Pico, J. D. McCalley, A. Angel et al., “Analysis of very low frequency oscillations in hydro-dominant power systems using multi-unit modeling,” IEEE Transactions on Power Systems, vol. 27, no. 4, pp. 1906-1915, Nov. 2012. [Baidu Scholar] 

7

R. Xie, I. Innocent, D. Rimorov et al., “Fundamental study of common mode small-signal frequency oscillations in power systems,” International Journal of Electrical Power & Energy Systems, vol. 106, pp. 201-209, Jan. 2019. [Baidu Scholar] 

8

Y. Chen, Y. Liu, Z. Tang et al., “Analysis of ultra-low frequency oscillation in yunnan asynchronous sending system,” in Proceedings of IEEE PES General Meeting, Chicago, USA, Jul. 2017, pp. 1-5. [Baidu Scholar] 

9

G. Wang, X. Zheng, X. Guo et al., “Mechanism analysis and suppression method of ultra-low-frequency oscillations caused by hydropower units,” International Journal of Electrical Power & Energy Systems, vol. 103, pp. 102-114, Dec. 2018. [Baidu Scholar] 

10

H. N. Villegas, “Electromechanical oscillations in hydro-dominant power systems: an application to the Colombian power system,” M.S. thesis, Department of Electrical and Computer Engineering, Iowa State University, Ames, USA, 2011. [Baidu Scholar] 

11

K. Sebaa, Y. Zhou, Y. Li et al., “Low-frequency oscillation damping control for large-scale power system with simplified virtual synchronous machine,” Journal of Modern Power Systems and Clean Energy, vol. 9, no. 6, pp. 1424-1435, Nov. 2021. [Baidu Scholar] 

12

G. Zhang, W. Hu, D. Cao et al., “Deep reinforcement learning based approach for proportional resonance power system stabilizer to prevent ultra-low-frequency oscillations,” IEEE Transactions on Smart Grid, vol. 11, no. 6, pp. 5260-5272, Nov. 2020. [Baidu Scholar] 

13

R. Grondin, I. Kamwa, L. Soulieres et al., “An approach to PSS design for transient stability improvement through supplementary damping of the common low-frequency,” IEEE Transactions on Power Systems, vol. 8, no. 3, pp. 954-963, Aug. 1993. [Baidu Scholar] 

14

P. Puri and S. Ghosh, “A hybrid optimization approach for PI controller tuning based on gain and phase margin specifications,” Swarm & Evolutionary Computation, vol. 8, pp. 69-78, Feb. 2013. [Baidu Scholar] 

15

G. Chen, F. Tang, H. Shi et al., “Optimization strategy of hydro-governors for eliminating ultralow-frequency oscillations in hydro-dominant power systems,” IEEE Journal of Emerging & Selected Topics in Power Electronics, vol. 6, no. 3, pp. 1086-1094, Sept. 2018. [Baidu Scholar] 

16

L. Chen, X. Lu, Y. Min et al., “Optimization of governor parameters to prevent frequency oscillations in power systems,” IEEE Transactions on Power Systems, vol. 33, no. 4, pp. 4466-4474, Jul. 2018. [Baidu Scholar] 

17

D. J. White and G. Anandalingam, “A penalty function approach for solving bi-level linear programs,” Journal of Global Optimization, vol. 3, no. 4, pp. 397-419, Jul. 1993. [Baidu Scholar] 

18

D. Bertsimas, E. Litvinov, X. Sun et al., “Adaptive robust optimization for the security constrained unit commitment problem,” IEEE Transactions on Power Systems, vol. 28, no. 1, pp. 52-63, Feb. 2013. [Baidu Scholar] 

19

Y. Du, L. Jiang, Y. Li et al., “A robust optimization approach for demand side scheduling considering uncertainty of manually operated appliances,” IEEE Transactions on Smart Grid, vol. 9, no. 2, pp. 743-755, Mar. 2018. [Baidu Scholar] 

20

S. Mei, W. Guo, Y. Wang et al., “A game model for robust optimization of power systems and its application,” Proceedings of the CSEE, vol. 33, no. 19, pp. 47-56, Jul. 2013. [Baidu Scholar] 

21

M. Netto and L. Mili, “Robust data filtering for estimating electromechanical modes of oscillation via the multichannel prony method,” IEEE Transactions on Power Systems, vol. 33, no. 4, pp. 4134-4143, Jul. 2018. [Baidu Scholar] 

22

G. Zhang, W. Hu, D. Cao et al., “A novel deep reinforcement learning enabled sparsity promoting adaptive control method to improve the stability of power systems with wind energy penetration,” Renewable Energy, vol. 178, pp. 363-376, Nov. 2021. [Baidu Scholar] 

23

L. Bo, L. Han, C. Xiang et al., “A Q-learning fuzzy inference system based online energy management strategy for off-road hybrid electric vehicles,” Energy, vol. 252, pp. 1-11. Aug. 2022. [Baidu Scholar] 

24

Z. Yan and Y. Xu, “Data-driven load frequency control for stochastic power systems: a deep reinforcement learning method with continuous action search,” IEEE Transactions on Power Systems, vol. 34, no. 2, pp. 1653-1656, Mar. 2019. [Baidu Scholar] 

25

J. Duan, D. Shi, R. Diao et al., “Deep-reinforcement-learning-based autonomous voltage control for power grid operations,” IEEE Transactions on Power Systems, vol. 35, no. 1, pp. 814-817, Jan. 2020. [Baidu Scholar] 

26

Y. Li, W. Gao, S. Huang et al., “Data-driven optimal control strategy for virtual synchronous generator via deep reinforcement learning approach,” Journal of Modern Power Systems and Clean Energy, vol. 9, no. 4, pp. 919-929, Jul. 2021. [Baidu Scholar] 

27

G. Zhang, W. Hu, D. Cao et al., “A data-driven approach for designing STATCOM additional damping controller for wind farms,” International Journal of Electrical Power & Energy Systems, vol. 117, p. 105620, May 2020. [Baidu Scholar] 

28

J. B. Park, K. S. Lee, J. R. Shin et al., “A particle swarm optimization for economic dispatch with non-smooth cost functions,” IEEE Transactions on Power Systems, vol. 20, no. 1, pp. 34-42, Feb. 2005. [Baidu Scholar] 

29

H. Chen and Z. Xu, “Comparison of mathematical models for transient stability calculation in PSASP and PSS/E and corresponding calculation results,” Power System Technology, vol. 28, no. 5, pp. 1-12, Mar. 2004. [Baidu Scholar] 

30

J. G. Ziegler and N. B. Nichols, “Optimum setting for automatic controllers,” Journal of Dynamic Systems Measurement and Control, vol. 64, pp. 759-768. Nov. 1942. [Baidu Scholar]