Abstract
With large-scale distributed generations (DGs) being connected to the distribution network (DN), traditional day-ahead reconfiguration methods based on physical models struggle to maintain robustness and avoid voltage limit violations. To address these problems, this paper develops a deep reinforcement learning method for sequential reconfiguration with soft open points (SOPs) based on real-time data. A state-based decision model is first proposed by constructing a Markov decision process-based reconfiguration and SOP joint optimization model, so that decisions can be obtained in milliseconds. Then, a deep reinforcement learning joint framework including a branching double deep Q network (BDDQN) and multi-policy soft actor-critic (MPSAC) is proposed, which significantly improves the learning efficiency of the decision model in a multi-dimensional mixed-integer action space. Moreover, the influence of DG and load uncertainty on the control results is minimized by using the real-time status of the DN to make control decisions. Numerical simulations on the IEEE 34-bus and 123-bus systems demonstrate that the proposed method can effectively reduce the operation cost and resolve the overvoltage problem caused by a high ratio of photovoltaic (PV) integration.
Index of phases A, B, and C
Indexes of nodes
Index of branch from node i to node j
o Index of soft open point (SOP)
s Index of state
t Index of time
Set of SOPs
A Set of actions
B Set of branches
P Set of transition probabilities
R Set of rewards
S Set of states
Set of periods
Optimization period
Number of switching actions during period t
Temperature coefficient
, Parameters of Q target and Q network
, Parameters of critic target and critic network
Parameters of policy network o
Attenuation factor
A large number
Penalty factor (Boolean variable)
Number of samples
Electricity price during period t
Per switch action cost
The maximum number of switches in loop l
The maximum current of branch k for phase
L Total number of loops
N Total number of buses
Total number of SOPs
, The maximum active and reactive output power of distributed generation (DG) connected to bus i for phase
, Resistance and reactance
Capacity limit of SOP
, The maximum and minimum voltages of bus i for phase
Total number of switches
Binary variable representing the opening action of branch ij during period t
Binary variable representing the closing action of branch ij during period t
Sequential network reconfiguration (SNR) action during period t
SOP control strategy during period t
Joint control strategy of SNR and SOP during period t
SNR action on loop l during period t
Current
, Active and reactive power of SOP
, Active and reactive power
, Active and reactive injection power
, Active and reactive output power of generator connected to the node
, Active and reactive output power of DG connected to node
, Active and reactive demand power
, Three-phase active and reactive injection power
Power factor of DG
Bus injection power
Bus injection power during period
Voltage
Binary variable denoting state
Distribution network reconfiguration (DNR) is an effective way to optimize distribution network (DN) operation. DNR optimizes the operation state of the DN by controlling sectionalizing switches or tie switches and ensures that the optimization results satisfy the operational constraints [
Considering factors such as the switching cost and the surge current when closing a loop, tie switches cannot be operated frequently. Therefore, traditional DNR can hardly realize real-time topology adjustment. In contrast, soft open points (SOPs) can change the transmitted power in real time, adjust the operating status, and realize flexible interconnection between feeders [
The SP is a typical mixed-integer nonlinear programming problem, and the main solution methods include the meta-heuristic algorithm (MHA) and mixed-integer programming (MIP). An MHA combines randomized search with local search, e.g., particle swarm optimization
Recently, with the development of artificial intelligence technology, power system dispatching methods based on historical data and deep reinforcement learning (DRL) have attracted researchers’ attention [
To address the challenges mentioned earlier, a DRL method that interacts with the DN is proposed to solve the SP problem, which formulates the SP as a decision-making problem with a multi-dimensional action space to minimize the operation cost. The SP model is first converted to an SP based on the Markov decision process (SP-MDP) model to construct a real-time decision model. The bus injection power is used as the state quantity, and the SP optimization strategy is taken as the action quantity. Then, a DRL framework including the branching double deep Q network (BDDQN) and multi-policy soft actor-critic (MPSAC) algorithm is proposed to learn the SP control strategy with the SP-MDP model. Furthermore, the proposed method is evaluated on the IEEE 34-bus system and the IEEE 123-bus system with high photovoltaic (PV) penetration. Numerical results show that the proposed DRL method can successfully learn the SP control strategy and reduce the system operation cost.
The significant contributions of this paper are listed below.
1) A DRL-based SNR and SOP joint optimization method is proposed, which constructs the state-based SP decision model with MDP theory, obtains decision results in milliseconds, and improves system operation economics compared with DNR.
2) A DRL framework is proposed, including BDDQN for learning the reconfiguration strategy via a multi-dimensional action-value function and MPSAC for learning the SOP control strategy through multi-policy network collaboration, which has better learning stability and performance than traditional DRL algorithms.
3) The proposed method uses the pre-trained BDDQN-MPSAC (BD-AC) agent and the real-time bus injection power collected by the SCADA system or phasor measurement unit (PMU) system to make optimization decisions. Thus, the influence of DG and load uncertainty on SP decision-making is reduced to the greatest extent.
The remainder of this paper is organized as follows. The reinforcement learning modeling for SNR is presented in Section II. A DRL-based SP-MDP solution model is formulated in Section III. The case study is presented in Section IV and the conclusions are shown in Section V.
According to different control methods, SOPs can be divided into three types: the unified power flow controller, the static synchronous series compensator, and the back-to-back voltage source converter (B2B-VSC). This paper takes the B2B-VSC as an example to explain the function and control mode of the SOP in the DN, as shown in Fig. 1.

Fig. 1 Function and control mode of SOP in DN.
The B2B-VSC can precisely regulate the active power transmitted between two feeders and provide reactive power support. The variables for SOPs consist of the three-phase active power outputs of converter VSC1 and the three-phase reactive power outputs of the two converters. The active power exchanged by the two converters is assumed to be balanced, i.e., the active power output of VSC2 is the negative of that of VSC1. The three-phase reactive power outputs of the two converters do not affect each other due to the DC isolation, so they only need to satisfy the capacity constraint of each converter.
1) Active Power Constraint for SOP
(1) |
2) Capacity Constraints for SOP
(2) |
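For reference, a common form of these two constraint groups, written with generic per-phase symbols rather than the paper's exact notation (converter losses neglected), is sketched below.

```latex
% Sketch of SOP constraints with generic symbols (not the paper's exact notation).
% (1) Active power balance of the back-to-back converters (losses neglected):
P^{\mathrm{SOP}}_{o,1,\varphi,t} + P^{\mathrm{SOP}}_{o,2,\varphi,t} = 0
% (2) Capacity limit of each converter m \in \{1, 2\}:
\sqrt{\big(P^{\mathrm{SOP}}_{o,m,\varphi,t}\big)^{2} + \big(Q^{\mathrm{SOP}}_{o,m,\varphi,t}\big)^{2}} \leq S^{\max}_{o}
```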
The operation efficiency of SOPs can reach 98%.
The objective function of the SP is to minimize the operation cost of the DN, including the energy loss cost and the switch action cost. Note that the optimization period is equal to 1 hour.
(3) |
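Based on the nomenclature (electricity price during period t, cost per switch action, number of switching actions, and the optimization period), a plausible sketch of this objective, with illustrative symbols, is:

```latex
% Illustrative operation-cost objective: energy-loss cost plus switching cost.
\min \; C = \sum_{t \in \mathcal{T}} \left( c_{t} \, P^{\mathrm{loss}}_{t} \, \Delta t + c^{\mathrm{sw}} N^{\mathrm{sw}}_{t} \right)
```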
The decision variables are network topology and the three-phase control strategy of SOP. While optimizing the objective function, the following constraints need to be met.
1) Power Balance Constraints
The distribution load flow equations [
(4) |
(5) |
(6) |
(7) |
2) Bus Voltage Constraints
(8) |
To ensure the power quality of DN, the bus voltage needs to be limited within a safe range.
3) Branch Power and Current Constraints
The branch power and current need to be limited within a safe range during DNR.
(9) |
4) Switch Action Constraints
The frequent action of a switch will shorten its life span. Therefore, limiting the number of switch actions is necessary to minimize the switching loss while reducing the network loss.
(10) |
5) Topological Constraint
The DN must have radial topology with all the buses energized.
(11) |
6) DG Constraints
The DG operation constraints can be expressed as:
(12) |
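Returning to the topological constraint in (11), one widely used way to impose radiality is a spanning-tree condition on a binary branch-status variable; the sketch below uses an illustrative variable $\alpha_{ij,t}$ and is not necessarily the paper's exact formulation.

```latex
% Illustrative radiality (spanning-tree) condition with binary branch status \alpha_{ij,t}:
\sum_{ij \in B} \alpha_{ij,t} = N - 1, \quad \text{with the closed branches forming a connected graph}
```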
Then, combining the objective function (3) with constraints (1), (2), and (4)-(12), we propose a state-based SP decision optimization model based on MDP theory.
RL aims to learn the optimal policy through the interaction between the agent and the environment. An RL problem can be modeled with an MDP, which is a standard formalism for solving sequential decision-making (SDM) problems based on Markov process theory.
(13) |
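Given the sets defined in the nomenclature, (13) presumably corresponds to the standard MDP tuple of states, actions, transition probabilities, rewards, and the attenuation (discount) factor:

```latex
% Standard MDP tuple assembled from the sets in the nomenclature (illustrative form):
\mathcal{M} = \langle S, A, P, R, \gamma \rangle
```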
The detailed introduction of MDP to solve SDM problems can be found in [
(14) |
Afterward, the mixed-integer action set can be defined as a combination of SNR and SOP control strategies.
(15) |
To accommodate the radial operating characteristics of the DN, the SNR action is coded according to the position of the action switch in its fundamental loop. Take the modified IEEE 34-bus system in Fig. 2 as an example.

Fig. 2 Modified IEEE 34-bus system.
If the action switches during period t are s5, s25, and s27, .
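A minimal sketch of this loop-based encoding is given below, assuming each fundamental loop is described by an ordered list of its switches and the SNR action stores, for each loop, the index of the switch to be opened; the loop membership lists are hypothetical and only illustrate the idea.

```python
# Sketch of loop-based SNR action encoding: each fundamental loop is an ordered
# list of the switches it contains, and the SNR action selects, per loop, the
# index of the switch to open. The loop contents below are illustrative only.
fundamental_loops = {
    "loop_1": ["s2", "s5", "s9", "s34"],      # hypothetical loop membership
    "loop_2": ["s20", "s25", "s26", "s35"],
    "loop_3": ["s27", "s30", "s31", "s36"],
}

def encode_snr_action(open_switches):
    """Map the set of opened switches to one index per fundamental loop."""
    action = []
    for loop_name, switches in fundamental_loops.items():
        opened = [s for s in open_switches if s in switches]
        assert len(opened) == 1, f"exactly one switch must be opened in {loop_name}"
        action.append(switches.index(opened[0]))
    return action

# Example from the text: opened switches s5, s25, and s27 -> one index per loop.
print(encode_snr_action(["s5", "s25", "s27"]))  # -> [1, 1, 0] for these hypothetical loops
```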
Then, floating-point variables are used to represent the control variables of an SOP to ensure that the SOP control strategies satisfy constraints (1) and (2). The relationship between the floating-point variables and the corresponding control strategies can be expressed as:
(16) |
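A minimal sketch of one possible mapping is shown below; it assumes the policy outputs lie in [-1, 1] and scales them so that the resulting setpoints respect the capacity limit, which is an assumption rather than the paper's exact relation (16).

```python
import math

def decode_sop_action(u_p, u_q1, u_q2, s_max):
    """Map normalized outputs in [-1, 1] to SOP setpoints (illustrative scaling).

    u_p: normalized active power transferred from VSC1 to VSC2.
    u_q1, u_q2: normalized reactive power outputs of the two converters.
    s_max: converter capacity (kVA).
    """
    p_vsc1 = u_p * s_max                 # active power of VSC1
    p_vsc2 = -p_vsc1                     # lossless back-to-back assumption
    # The remaining capacity bounds the reactive output of each converter.
    q_cap = math.sqrt(max(s_max**2 - p_vsc1**2, 0.0))
    q_vsc1 = u_q1 * q_cap
    q_vsc2 = u_q2 * q_cap
    return p_vsc1, q_vsc1, p_vsc2, q_vsc2

print(decode_sop_action(0.5, 0.2, -0.3, s_max=500.0))
```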
According to the state set and the action set, the state transition probability of the SP between adjacent states can be written as:
(17) |
The goal of the MDP is to find a series of optimal strategies that maximize the cumulative reward, as shown in (18). Note that the reward during period t is modeled based on the objective function (3).
(18) |
To maximize the cumulative reward while minimizing the operation cost, the reward is set as the penalty term divided by the operation cost.
(19) |
If the action satisfies the operation constraints (4)-(12), the penalty factor is positive; otherwise, it is negative. For example, if the action generated by the agent does not satisfy the voltage constraint of the system after it is transmitted to the environment, the environment gives the agent a negative reward, i.e., a punishment. Conversely, when the action satisfies the constraints, the environment gives the agent a positive reward, and the smaller the operation cost, the greater the reward.
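Following this description, a hedged sketch of the reward is given below: a large constant divided by the operation cost, with the sign controlled by a feasibility check. The constant M and the way feasibility is checked are placeholders.

```python
M = 1e4  # "a large number" from the nomenclature; the value here is a placeholder

def reward(operation_cost, constraints_satisfied):
    """Reward = penalty factor * M / operation cost (illustrative form).

    A feasible, cheaper action receives a larger positive reward; an infeasible
    action (e.g., one causing a voltage-limit violation) is punished.
    """
    sigma = 1.0 if constraints_satisfied else -1.0
    return sigma * M / operation_cost

print(reward(180.0, True))    # feasible and cheap -> large positive reward
print(reward(180.0, False))   # infeasible -> negative reward (punishment)
```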
In a given state with a given action, the expectation of the cumulative reward can be defined as the state-action value (Q-value). It can be expressed in a recursive form called the Bellman equation.
(20) |
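In its standard form, with γ denoting the attenuation (discount) factor, the Bellman recursion referred to here is:

```latex
% Standard Bellman (recursive) form of the state-action value:
Q(s_{t}, a_{t}) = \mathbb{E}\left[ r_{t} + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) \,\middle|\, s_{t}, a_{t} \right]
```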
Solving the SP-MDP amounts to finding a set of optimal sequential SP control strategies that maximize the Q-value. The above process transforms the SP problem into an SP-MDP, whose brief framework is shown in Fig. 3.

Fig. 3 MDP framework of sequential DN reconfiguration.
Under the SP-MDP framework, the agent generates the topology and SOP control action according to the state of the DN during period t. The action is transmitted to the DN for power flow calculation to obtain a reward. Then, the above operations are repeated at the next time step. Finally, a series of strategies that maximize the cumulative reward is learned through closed-loop iteration.
However, it is worth noting that in a realistic DN, the change of the bus injection power state between adjacent periods is an uncertain random process affected by the weather and users' electricity consumption behavior. Thus, it is difficult to give an explicit mathematical expression for the state transition probability in the SP-MDP. Therefore, a model-free DRL algorithm with neural networks (NNs) is used to solve the SP-MDP model.
In this section, a DRL joint optimization solution method based on the double deep Q network (DDQN) and soft actor-critic (SAC) frameworks is constructed to exploit the respective advantages of DRL methods for discrete-variable and continuous-variable control. Then, the BDDQN based on the fundamental loop matrix is proposed, which converts the reconfiguration decision into a multi-dimensional action space decision-making problem. Finally, the MPSAC based on multiple policy networks is proposed to learn the three-phase SOP control strategies.
DRL combines deep learning and RL. The perception capability of deep learning is used to solve the modeling problems of the policy and value functions, and the decision-making ability of RL is used to define problems and optimize goals. Popular DRL algorithms for solving MDP problems are DDQN, which controls discrete variables, and SAC, which controls continuous variables. A detailed introduction to DDQN and SAC can be found in [
However, the state and action sets become too large owing to the many combinations of the various control elements in the DN and their strong coupling, so the agent cannot explore and train effectively. Therefore, this paper proposes a solution method based on improved DDQN and SAC to realize the optimal joint control of DNR and SOP. The proposed method is divided into two stages: offline training and online execution. The joint optimization framework of DRL is shown in Fig. 4.

Fig. 4 Joint optimization framework of DRL.
In the offline training stage, BDDQN and MPSAC (BD-AC) agents learn the DN topology and SOP control strategy. Two agents share the reward and cooperate to learn the SP control strategies that maximize the cumulative reward. In the online execution stage, the DRL-based method can make decisions directly according to the real-time DN measurement data [
The DDQN uses two NNs with the same architecture, i.e., the Q network and the target Q network, to approximate the state-action value in (20). Assuming that the data are sampled from the experience pool, the state is input to the Q network, and the action can be selected based on the greedy strategy.
(21) |
where a random number is compared with the greedy selection factor; if the exploration condition holds, a random action is selected. Otherwise, the action with the largest Q-value is taken as the SNR strategy that minimizes the operation cost.
However, applying DDQN to DNR tasks requires addressing the combined growth of the number of possible actions and the number of action dimensions [
To solve this problem, we propose a BDDQN based on branching dueling Q-network (BDQ) [
The BDQ has the same number of sub-actions for each action dimension. However, the number of switches in each loop usually differs in the DN. Therefore, compared with BDQ, the significant advantage of BDDQN is that the length of the Q-value vector in each dimension can be adjusted adaptively according to the number of switches in each loop.
In

Fig. 5 Structure of Q network.
Moreover, the mean operator [
(22) |
Each element of the resulting Q-value matrix is therefore the value produced by the corresponding switch action in the given state. For example, one element is the action value of disconnecting branch 16 in the IEEE 34-bus system.
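In the BDQ architecture, this aggregation typically combines a shared state value with per-branch advantages through the mean operator; a standard form, which (22) presumably follows, is:

```latex
% Dueling aggregation with the mean operator for action branch (loop) d:
Q_{d}(s, a_{d}) = V(s) + \Big( A_{d}(s, a_{d}) - \frac{1}{|\mathcal{A}_{d}|} \sum_{a'_{d} \in \mathcal{A}_{d}} A_{d}(s, a'_{d}) \Big)
```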
Then, the reconfiguration result for each loop can be selected by greedy selection.
(23) |
According to (21), BDDQN selects the switch with the maximum value in each row of the Q-value matrix to constitute a complete reconfiguration strategy. In this way, the original complex one-dimensional decision-making process is transformed into multiple simple decision-making processes, one per dimension.
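A minimal sketch of this per-loop selection is given below, assuming the branching Q network returns one Q-value vector per fundamental loop (the vectors may have different lengths) and using ε-greedy exploration in the spirit of (21).

```python
import random

def select_snr_action(q_values_per_loop, epsilon=0.05):
    """Pick one switch index per loop: random with probability epsilon, else argmax.

    q_values_per_loop: list of lists, one (possibly different-length) Q-value
    vector per fundamental loop, as produced by the branching Q network.
    """
    action = []
    for q_values in q_values_per_loop:
        if random.random() < epsilon:
            action.append(random.randrange(len(q_values)))  # explore
        else:
            action.append(max(range(len(q_values)), key=q_values.__getitem__))  # exploit
    return action

# Example: three loops with 4, 5, and 3 candidate switches, respectively.
q = [[0.1, 0.7, 0.2, 0.0], [0.3, 0.9, 0.1, 0.2, 0.4], [0.5, 0.6, 0.4]]
print(select_snr_action(q, epsilon=0.0))  # -> [1, 1, 1]
```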
The target Q network is used to evaluate the Q-value of the SP strategy given by the agent. It can be expressed as:
(24) |
Then, the Q target-value and Q-value are input to the loss function to update the Q network parameters.
(25) |
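For each action branch, (24) and (25) presumably follow the standard double-Q form, written here with β for the Q network parameters and β⁻ for the target Q network parameters:

```latex
% Standard double-Q target and mean-squared loss (per action branch):
y_{t} = r_{t} + \gamma \, Q_{\beta^{-}}\!\big(s_{t+1}, \arg\max_{a'} Q_{\beta}(s_{t+1}, a')\big)
L(\beta) = \mathbb{E}\big[ (y_{t} - Q_{\beta}(s_{t}, a_{t}))^{2} \big]
```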
The BDDQN agent learns the optimal SNR control strategy by adjusting its Q network parameters toward minimizing the operation cost objective.
As shown in
Unlike BDDQN, the MPSAC algorithm learns policy networks and critic networks. The critic networks evaluate the SOP control actions generated by the policy networks to minimize the operation cost. Moreover, to improve the exploration efficiency of the algorithm, the optimization objective of MPSAC is to maximize the sum of the cumulative return and the action entropy.
(26) |
where the action entropy measures the randomness of the policy, which generates a distribution of SOP control strategies from which the action is sampled. The larger the entropy, the more random the SOP control action generated by the policy network. The action entropy can be expressed as [
(27) |
Assume that the SOP control action follows a Gaussian distribution whose mean and covariance are parameterized by the policy network [
(28) |
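A minimal sketch of this reparameterized sampling is shown below, assuming a PyTorch-style policy head that outputs the mean and log standard deviation and squashes the sample with tanh so the normalized SOP action stays in [-1, 1]; the network sizes are placeholders.

```python
import torch
import torch.nn as nn

class GaussianPolicyHead(nn.Module):
    """Reparameterized Gaussian policy head with tanh squashing (illustrative)."""

    def __init__(self, state_dim=33, action_dim=9, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, state):
        h = self.body(state)
        mean = self.mean(h)
        log_std = self.log_std(h).clamp(-20, 2)          # keep the std numerically sane
        dist = torch.distributions.Normal(mean, log_std.exp())
        pre_tanh = dist.rsample()                        # reparameterization trick
        action = torch.tanh(pre_tanh)                    # normalized SOP action in [-1, 1]
        # Log-probability with the tanh change-of-variables correction.
        log_prob = dist.log_prob(pre_tanh) - torch.log(1 - action.pow(2) + 1e-6)
        return action, log_prob.sum(dim=-1)

policy = GaussianPolicyHead()
a, logp = policy(torch.randn(4, 33))   # a batch of 4 illustrative states
print(a.shape, logp.shape)
```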
MPSAC is divided into two parts: policy evaluation and policy improvement. Assuming that the data are sampled from the experience pool, in the policy evaluation part, the action value is evaluated through the target critic network, which can be expressed as:
(29) |
Then, to learn the parameters of the critic networks, the target value and the critic output are input to the critic loss function, which can be expressed as:
(30) |
In the policy improvement part, the optimization of the policy network parameters is achieved by minimizing the policy loss function, which can be expressed as [
(31) |
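These two loss functions presumably follow the standard soft actor-critic form, sketched here with θ for the critic parameters, θ⁻ for the target critic, η for the policy parameters, and α for the temperature coefficient:

```latex
% Standard soft actor-critic losses (illustrative form):
L_{Q}(\theta) = \mathbb{E}\Big[ \big( Q_{\theta}(s_{t}, a_{t}) - \big( r_{t} + \gamma \, \mathbb{E}_{a_{t+1} \sim \pi_{\eta}}\big[ Q_{\theta^{-}}(s_{t+1}, a_{t+1}) - \alpha \log \pi_{\eta}(a_{t+1} \mid s_{t+1}) \big] \big) \big)^{2} \Big]
L_{\pi}(\eta) = \mathbb{E}\Big[ \alpha \log \pi_{\eta}(a_{t} \mid s_{t}) - Q_{\theta}(s_{t}, a_{t}) \Big]
```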
The MPSAC agent can thus learn the optimal SOP control strategy by adjusting its NN parameters.
The above process establishes the BD-AC algorithm in a multi-dimensional action space. Furthermore, the optimal SP strategy that reduces the operation cost can be found by iteratively training the NNs. The specific flow is summarized in Algorithm 1.
Algorithm 1: DRL-based SP control algorithm

    1) Offline training
    Input: historical dataset, discount factor, batch number |D|, action space dimension
    Initialize the experience pools and the parameters β, θ, and ƞ
    For each episode, do
        Initialize the sequence
        For each decision time step, do
            If the exploration condition in (21) is met, then
                Select a random action
            Else
                Select the greedy actions of the two agents
            End if
            Calculate the reward according to (19)
            Index the next state from the historical dataset of bus injection power
            Store the transitions in the two experience pools, respectively
            If both experience pools contain enough samples, then
                Sample |D| pairs of transitions from the BDDQN experience pool
                Calculate the Q-values by (22) and (24)
                Use (25) to calculate the Q network loss
                Update all the parameters β using the Adam optimizer [
                Every C steps, reset the target Q network
                Sample |D| pairs of transitions from the MPSAC experience pool
                Calculate the target value by (29)
                Use (30) to calculate the critic network loss
                Update the parameter θ using the Adam optimizer
                Use (31) to calculate the policy network loss
                Update the parameter ƞ using the Adam optimizer
                Every C steps, reset the target critic network
            End if
        End for
    End for
    2) Online execution
    For each day, do
        For each period t, do
            Collect the real-time bus injection power in period t
            Output the SNR action and the SOP control action
            Use (16) to generate the decision result
            Execute the SOP and topology control strategy
        End for
    End for
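A minimal sketch of the online execution stage described above is given below; the agent classes and the measurement/actuation interfaces are placeholders standing in for the trained BD-AC agent and the SCADA/PMU and control systems.

```python
import random

# Placeholders standing in for the trained agents and the SCADA/PMU interface;
# everything here is illustrative, not the paper's implementation.
def get_bus_injection_power(t):
    return [random.uniform(-1.0, 1.0) for _ in range(33)]   # fake real-time measurement

class TrainedBDDQN:
    def select_greedy(self, state):
        return [1, 1, 0]                 # opened-switch index per fundamental loop

class TrainedMPSAC:
    def select_mean(self, state):
        return (0.5, 0.2, -0.3)          # normalized SOP control outputs

def apply_to_network(snr_action, sop_action):
    print("execute topology", snr_action, "and SOP strategy", sop_action)

def online_execution(bddqn, mpsac, periods_per_day=24):
    """Online stage: measure, decide within milliseconds, execute, repeat each period."""
    for t in range(periods_per_day):
        state = get_bus_injection_power(t)        # real-time bus injection power
        snr_action = bddqn.select_greedy(state)   # reconfiguration strategy
        sop_action = mpsac.select_mean(state)     # SOP control strategy
        apply_to_network(snr_action, sop_action)

online_execution(TrainedBDDQN(), TrainedMPSAC(), periods_per_day=2)
```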
In summary, the difference between the proposed method and the traditional SNR method is as follows.
The traditional SNR method requires an optimization algorithm to find an offline solution that optimizes the objective function value. Moreover, the traditional SP method uses the predicted values of load and DG output to obtain the SP solution, which is why it must consider the uncertainty of DG output and load.
However, after the BD-AC agent is trained offline, the proposed method has learned the mapping relationship from the historical data. In the online execution stage, the real-time bus injection power collected by the SCADA system or PMU system [
In this section, comprehensive case studies on IEEE standard test systems are conducted to verify the performance of the proposed SP-MDP method. The experimental data and the algorithm setup are first presented. Then, the IEEE 34-bus system is used to verify the superiority of BD-AC. Furthermore, the modified IEEE 123-bus system is used for further verification.
In this paper, load and DG data from 2012 [
| Method | Hyperparameter | IEEE 34-bus system | IEEE 123-bus system |
|---|---|---|---|
| BDDQN | Minibatch size | 32 | 128 |
| | Discount factor | 0.99 | 0.99 |
| | Learning rate | | |
| | Number of hidden units | 64 | 128 |
| MPSAC | Minibatch size | 128 | 256 |
| | Discount factor | 0.99 | 0.99 |
| | Learning rate | | |
| | Number of hidden units | 128 | 256 |
The IEEE 34-bus system is shown in
| Bus number | Capacity of phase A (kVA) | Capacity of phase B (kVA) | Capacity of phase C (kVA) | Power factor | DG type |
|---|---|---|---|---|---|
| 5 | 250 | 250 | 250 | 0.90 | Wind power generation |
| 18 | 250 | 250 | 0 | 0.95 | Solar power generation |
| 22 | 200 | 0 | 0 | 0.90 | Wind power generation |
During the training process of the BD-AC, we record the weights of the NNs every 50 epochs and use them to evaluate the performance of the method on the testing data. The cumulative operation cost for different methods is shown in Fig. 6.

Fig. 6 Cumulative operation cost for different methods.
In
Then, two scenarios are compared in detail to further illustrate the superiority of the proposed method.
Scenario 1: the unbalanced optimal operation with static DNR and SOP control.
Scenario 2: the unbalanced optimal operation with SP.
In

Fig. 7 Operation cost of different scenarios.

Fig. 8 Action value matrix of switch.
It can be observed from
Moreover, to verify the adaptation of the proposed method against the load power mutation, the following three cases based on the testing week data are considered.
Case 1: the load demand and DG output are reduced by 15% and increased by 15%, respectively.
Case 2: the DG output is increased by 15%.
Case 3: the load demand and DG output are increased by 15% and reduced by 15%, respectively.
It can be observed from
| Case | Energy loss (kWh) | Operation cost ($) | Maximal bus voltage deviation (p.u.) | Operation cost reduction (%) |
|---|---|---|---|---|
| Case 1 | 8.05×1 | 1.04×1 | 0.054 | 15.18 |
| Case 2 | 9.21×1 | 1.20×1 | 0.059 | 15.39 |
| Case 3 | 1.12×1 | 1.45×1 | 0.064 | 16.01 |
| Method | Time and corresponding open switches | Switch action number | Energy loss (kWh) | Operation cost ($) | Operation cost reduction (%) |
|---|---|---|---|---|---|
| Static DNR with SOP | 00:00-06:00: 7, 25, 30; 07:00-09:00: 7, 25, 17; 10:00-14:00: 7, 27, 17; 15:00-16:00: 6, 25, 17; 17:00-18:00: 7, 25, 17; 19:00-23:00: 7, 25, 30 | 24 | 213.14 | | |
| BD-AC | 00:00-23:00: 7, 25, 30 | 6 | 183.21 | | 15.28 |
| MILP | 01:00-09:00: 7, 25, 30; 10:00-24:00: 7, 25, 17 | 8 | 181.97 | | 16.20 |
| HFWA [ | 01:00-09:00: 7, 27, 30; 10:00-15:00: 7, 27, 17; 16:00-24:00: 7, 13, 30 | 12 | 190.42 | | 13.38 |
It can be observed from
To illustrate the superiority of DRL-based method in computing efficiency, the comparison of computational efficiency of different methods is shown in
| Type | Method | Value |
|---|---|---|
| Training time | DQN-SAC | 10.36 hours |
| | BD-AC | 12.57 hours |
| Testing time | DQN-SAC | 2.41 ms |
| | BD-AC | 2.60 ms |
| | HFWA | 171.81 s |
| | MILP | 251.57 s |
In terms of training time, the proposed BD-AC increases the branching structure of the Q network and has multiple strategic networks. Therefore, the training time is longer than the traditional DQN-SAC algorithm. However, as shown in
According to the above results, it can be concluded that the proposed method can deal with the SP problem efficiently and accurately in the IEEE 34-bus system.
In this subsection, the modified IEEE 123-bus system in

Fig. 9 Modified IEEE 123-bus system.
Location | Phase | Capacity (kVA) | Location | Phase | Capacity (kVA) |
---|---|---|---|---|---|
15 | A | 355 | 117 | B | 266 |
99 | C | 355 | 118 | B | 533 |
111 | C | 533 | 65 | C | 355 |
113 | C | 266 | 123 | B | 266 |
57 | B | 533 | 126 | A | 266 |
In this subsection, four scenarios are compared in detail.
Scenario 1: the initial operation state without optimization.
Scenario 2: the unbalanced optimal operation with DNR.
Scenario 3: the unbalanced optimal operation with static DNR and SOP control.
Scenario 4: the unbalanced optimal operation with SP.
We compare the optimization results of different scenarios in detail on the testing week, as shown in
| Scenario | Energy loss (kW) | Switch action number | Operation cost ($) | Overvoltage rate (%) |
|---|---|---|---|---|
| 1 | 3.22×1 | 0 | 4.18×1 | 15.48 |
| 2 | 2.74×1 | 478 | 4.58×1 | 6.55 |
| 3 | 2.17×1 | 470 | 3.71×1 | 0 |
| 4 | 2.28×1 | 32 | 3.03×1 | 0 |
It can be observed from
Then, we select the day with the highest PV penetration in the historical data to verify the ability of BD-AC to alleviate system overvoltage. The daily operation curves of the three-phase total load and PV power are shown in Fig. 10.

Fig. 10 Daily operation curves of three-phase total load and PV power.

Fig. 11 The maximum bus voltage deviation for different scenarios.
As shown in Figs.

Fig. 12 Q-value output of switch action for test day. (a) Loop 1. (b) Loop 2.

Fig. 13 Action distribution of SOP in the 1
In
As shown in Figs.

Fig. 14 Three-phase power transmission for SOP in scenario 4. (a) Active power of SOP1. (b) Reactive power of SOP1. (c) Active power of SOP2. (d) Reactive power of SOP2.
As shown in

Fig. 15 Network loss and operation cost of different scenarios. (a) Network loss. (b) Operation cost.
It can be observed from
The topology reverts to the disconnection state of branches 17-18 and 48-56. Therefore, we can conclude that the proposed method adjusts the control strategy according to the changes of system load and PV power, ensuring the economy and safety of system operation.
According to the above results, it can be concluded that the proposed method can reduce the operation cost of the complex three-phase unbalanced system with high PV penetration and avoid system overvoltage.
This paper proposes a novel unbalanced DNR and SOP joint optimization control method, which translates the SP into an SP-MDP model based on MDP theory. Considering the large optimization decision space of the three-phase unbalanced system, the BDDQN and MPSAC algorithms are developed based on the structural characteristics of the DN. Furthermore, the DRL optimization method based on BDDQN and MPSAC uses the real-time system state to obtain the topology control strategy. Comprehensive test results on two unbalanced DNs show that the proposed BD-AC agent can effectively learn the reconfiguration and SOP joint control policy. Moreover, the data-driven SP method reduces the operation cost of the DN and relieves the overvoltage problem, with a much lower computation time than the model-based methods.
With the extensive application of SOPs in the future, SOP losses will inevitably be produced. Moreover, in practice, the line parameters of the DN are difficult to determine accurately. Therefore, our future research will focus on proposing a more accurate and robust DRL-based optimization method that considers SOP losses and the uncertainty of line parameters.
References
Y. Gao, W. Wang, J. Shi et al., “Batch-constrained reinforcement learning for dynamic distribution network reconfiguration,” IEEE Transactions on Smart Grid, vol. 11, no. 6, pp. 5357-5369, Nov. 2020.
M. Naguib, W. A. Omran, and H. E. A. Talaat, “Performance enhancement of distribution systems via distribution network reconfiguration and distributed generator allocation considering uncertain environment,” Journal of Modern Power Systems and Clean Energy, vol. 10, no. 3, pp. 647-655, May 2022.
A. M. Eldurssi and R. M. O’Connell, “A fast nondominated sorting guided genetic algorithm for multi-objective power distribution system reconfiguration problem,” IEEE Transactions on Power Systems, vol. 30, no. 2, pp. 593-601, Mar. 2015.
F. Keynia, S. Esmaeili, and F. Sayadi, “Feeder reconfiguration and capacitor allocation in the presence of non-linear loads using new P-PSO algorithm,” IET Generation, Transmission & Distribution, vol. 10, no. 10, pp. 2316-2326, Jul. 2016.
X. Ji, Q. Liu, Y. Yu et al., “Distribution network reconfiguration based on vector shift operation,” IET Generation, Transmission & Distribution, vol. 12, no. 13, pp. 3339-3345, Jul. 2018.
S. Zhang and M. Sridharan, “A survey of knowledge-based sequential decision making under uncertainty,” AI Magazine, vol. 43, no. 2, pp. 1-6, Jun. 2022.
Q. Zhang, Y. Kang, Y. Zhao et al., “Traded control of human-machine systems for sequential decision-making based on reinforcement learning,” IEEE Transactions on Artificial Intelligence, vol. 3, no. 4, pp. 553-566, Aug. 2022.
L. Bai, T. Jiang, F. Li et al., “Distributed energy storage planning in soft open point based active distribution networks incorporating network reconfiguration and DG reactive power capability,” Applied Energy, vol. 210, pp. 1082-1091, Jan. 2018.
R. You and X. Lu, “Voltage unbalance compensation in distribution feeders using soft open points,” Journal of Modern Power Systems and Clean Energy, vol. 10, no. 4, pp. 1000-1008, Jul. 2022.
X. Dong, Z. Wu, G. Song et al., “A hybrid optimization algorithm for distribution network coordinated operation with SNOP based on simulated annealing and conic programming,” in Proceedings of 2016 IEEE PES General Meeting (PESGM), Boston, USA, Jul. 2016, pp. 1-5.
M. B. Shafik, H. Chen, G. I. Rashed et al., “Adequate topology for efficient energy resources utilization of active distribution networks equipped with soft open points,” IEEE Access, vol. 7, pp. 99003-99016, Jun. 2019.
M. B. Shafik, G. I. Rashed, H. Chen et al., “Reconfiguration strategy for active distribution networks with soft open points,” in Proceedings of 2019 14th IEEE Conference on Industrial Electronics and Applications (ICIEA), Xi’an, China, Jun. 2019, pp. 330-334.
V. B. Pamshetti, S. Singh, and S. P. Singh, “Reduction of energy demand via conservation voltage reduction considering network reconfiguration and soft open point,” International Transactions on Electrical Energy Systems, vol. 30, no. 1, pp. 1-8, Jan. 2020.
I. Diaaeldin, S. Abdel Aleem, A. El-Rafei et al., “Optimal network reconfiguration in active distribution networks with soft open points and distributed generation,” Energies, vol. 12, no. 21, p. 4172, Nov. 2019.
I. Sarantakos, N.-M. Zografou-Barredo, D. Huo et al., “A reliability-based method to quantify the capacity value of soft open points in distribution networks,” IEEE Transactions on Power Systems, vol. 36, no. 6, pp. 5032-5043, Nov. 2021.
H. Ji, C. Wang, P. Li et al., “An enhanced SOCP-based method for feeder load balancing using the multi-terminal soft open point in active distribution networks,” Applied Energy, vol. 208, pp. 986-995, Dec. 2017.
T. Ding, Z. Wang, W. Jia et al., “Multiperiod distribution system restoration with routing repair crews, mobile electric vehicles, and soft-open-point networked microgrids,” IEEE Transactions on Smart Grid, vol. 11, no. 6, pp. 4795-4808, Nov. 2020.
L. H. Macedo, J. F. Franco, M. J. Rider et al., “Optimal operation of distribution networks considering energy storage devices,” IEEE Transactions on Smart Grid, vol. 6, no. 6, pp. 2825-2836, Nov. 2015.
J. Duan, D. Shi, R. Diao et al., “Deep-reinforcement-learning-based autonomous voltage control for power grid operations,” IEEE Transactions on Power Systems, vol. 35, no. 1, pp. 814-817, Jan. 2020.
D. Zhang, X. Han, and C. Deng, “Review on the research and practice of deep learning and reinforcement learning in smart grids,” CSEE Journal of Power and Energy Systems, vol. 4, no. 3, pp. 362-370, Sept. 2018.
Z. Yan and Y. Xu, “Data-driven load frequency control for stochastic power systems: a deep reinforcement learning method with continuous action search,” IEEE Transactions on Power Systems, vol. 34, no. 2, pp. 1653-1656, Mar. 2019.
J. Jin and Y. Xu, “Optimal policy characterization enhanced actor-critic approach for electric vehicle charging scheduling in a power distribution network,” IEEE Transactions on Smart Grid, vol. 12, no. 2, pp. 1416-1428, Mar. 2021.
C. Wang, S. Lei, P. Ju et al., “MDP-based distribution network reconfiguration with renewable distributed generation: approximate dynamic programming approach,” IEEE Transactions on Smart Grid, vol. 11, no. 4, pp. 3620-3631, Jul. 2020.
P. Li, H. Ji, C. Wang et al., “Optimal operation of soft open points in active distribution networks under three-phase unbalanced conditions,” IEEE Transactions on Smart Grid, vol. 10, no. 1, pp. 380-391, Jan. 2019.
X. Jiang, Y. Zhou, W. Ming et al., “An overview of soft open points in electricity distribution networks,” IEEE Transactions on Smart Grid, vol. 13, no. 3, pp. 1899-1910, May 2022.
G. Carpinelli, G. Celli, S. Mocci et al., “Optimal integration of distributed energy storage devices in smart grids,” IEEE Transactions on Smart Grid, vol. 4, no. 2, pp. 985-995, Jun. 2013.
T. Yang, Y. Guo, L. Deng et al., “A linear branch flow model for radial distribution networks and its application to reactive power optimization and network reconfiguration,” IEEE Transactions on Smart Grid, vol. 12, no. 3, pp. 2027-2036, May 2021.
M. Van Otterlo and M. Wiering, “Reinforcement learning and Markov decision processes,” in Reinforcement Learning. New York: Springer, 2012, pp. 3-42.
S. Wang, L. Du, X. Fan et al., “Deep reinforcement scheduling of energy storage systems for real-time voltage regulation in unbalanced LV networks with high PV penetration,” IEEE Transactions on Sustainable Energy, vol. 12, no. 4, pp. 2342-2352, Oct. 2021.
Y. Du and F. Li, “Intelligent multi-microgrid energy management based on deep neural network and model-free reinforcement learning,” IEEE Transactions on Smart Grid, vol. 11, no. 2, pp. 1066-1076, Mar. 2020.
S. Wang, R. Diao, C. Xu et al., “On multi-event co-calibration of dynamic model parameters using soft actor-critic,” IEEE Transactions on Power Systems, vol. 36, no. 1, pp. 521-524, Jan. 2021.
Q. Huang, X. Xu, F. Blaabjerg et al., “Deep reinforcement learning based approach for optimal power flow of distribution networks embedded with renewable energy and storage devices,” Journal of Modern Power Systems and Clean Energy, vol. 9, no. 5, pp. 1101-1110, Sept. 2021.
A. Tavakoli, F. Pardo, and P. Kormushev, “Action branching architectures for deep reinforcement learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, USA, Jul. 2018, pp. 4131-4138.
IEEE. (2021, Jun.). Resources | PES test feeder. [Online]. Available: https://site.ieee.org/pes-testfeeders/resources/
X. Ji, Z. Yin, Y. Zhang et al., “Real-time robust forecasting-aided state estimation of power system based on data-driven models,” International Journal of Electrical Power & Energy Systems, vol. 125, pp. 1-11, Feb. 2021.
T. Hong, P. Pinson, S. Fan et al., “Probabilistic energy forecasting: global energy forecasting competition 2014 and beyond,” International Journal of Forecasting, vol. 32, no. 3, pp. 896-913, Jul. 2016.
Y. Zhang, X. Ji, J. Xu et al., “Dynamic reconfiguration of distribution network based on temporal constrained hierarchical clustering and fireworks algorithm,” in Proceedings of 2020 IEEE/IAS Industrial and Commercial Power System Asia, Weihai, China, Jul. 2020, pp. 1702-1708.
H. Zhai, M. Yang, B. Chen et al., “Dynamic reconfiguration of three-phase unbalanced distribution networks,” International Journal of Electrical Power & Energy Systems, vol. 99, pp. 1-10, Jul. 2018.
J. Xiao, Y. Li, X. Qiao et al., “Enhancing hosting capacity of uncertain and correlated wind power in distribution network with ANM strategies,” IEEE Access, vol. 8, pp. 189115-189128, Jan. 2020.