Abstract
With the rapid growth of electric vehicles (EVs) across the world, their increasing charging demands pose challenges to urban distribution networks. In particular, owing to the further implementation of time-of-use (TOU) prices, the charging behaviors of household EVs are concentrated in low-cost periods, generating new load peaks and affecting the secure operation of medium- and low-voltage grids. This problem is particularly acute in old communities with relatively poor electricity infrastructure. In this paper, a novel two-stage charging scheduling scheme based on deep reinforcement learning is proposed to simultaneously improve the power quality and achieve optimal charging scheduling of household EVs in the active distribution network (ADN) during the valley period. In the first stage, the optimal charging profiles of charging stations (CSs) are determined by solving the optimal power flow with the objective of eliminating peak-valley load differences. In the second stage, an intelligent agent based on the proximal policy optimization algorithm is developed to dispatch the household EVs sequentially within the low-cost period, considering the discrete nature of their arrivals. Through the powerful approximation capability of neural networks, the challenge of imperfect knowledge is tackled effectively during the charging scheduling process. Finally, numerical results demonstrate that the proposed scheme significantly relieves peak-valley differences and improves voltage quality in the ADN.
In recent years, electric vehicles (EVs) have been widely adopted to reduce air pollution and greenhouse gas emissions in response to sustainable development goals [
EVs are regarded as ideal alternatives to provide various regulation services in the ADN [
Private charging is currently the dominant charging mode for household EVs in many countries [
Apart from the above regulation obstacles under TOU prices, previous approaches to the charging scheduling problem are not suitable for household EVs with private charging piles, given their sequential arrivals and uncertain charging demands. For instance, offline and online scheduling algorithms are proposed in [
Under these circumstances, the charging process of a household EV can only be regarded as an uninterruptible process, and the charging demands cannot be obtained in advance. The charging scheduling problem of household EVs can be formulated as a Markov decision process (MDP) [
With the rapid development of deep learning (DL) and reinforcement learning (RL), deep reinforcement learning (DRL), which combines the advantages of both DL and RL, has been proposed to overcome the curse of dimensionality and solve MDP problems with continuous action spaces [
In the field of charging scheduling problems, DRL has been implemented in various optimizations. Reference [
To address the above problems and make full use of substantial household EVs during the valley period under TOU prices, this paper proposes a two-stage charging scheduling scheme for household EVs in the ADN. In the first stage, to relieve power congestions and reduce the peak-valley differences, the optimal power flow (OPF) of the ADN is solved to determine the optimal charging profiles of charging stations (CSs) during the valley period. In the second stage, DRL based on the proximal policy optimization (PPO) algorithm is employed to dispatch the household EVs sequentially within the low-cost period according to the optimal charging profiles. The PPO algorithm was proposed by OpenAI in 2017 [
The main contributions of this paper are as follows.
1) A two-stage charging scheduling scheme for household EVs is proposed to improve the power quality of the ADN and achieve the optimal charging scheduling of EVs simultaneously during the valley period, which consists of the OPF of the ADN and charging dispatch of EVs. On this basis, the contributions of household EVs to power congestion management and peak-valley difference elimination are further exploited.
2) The realistic characteristics of household EVs are taken into consideration, including their limited controllability and uncertain charging demands. The charging process of an EV is regarded as an uninterruptible procedure with constant power, and the charging scheduling process is modelled as a sequential MDP problem, so that the owner can make a charging reservation to achieve charging scheduling without extra equipment investments.
3) An intelligent DRL agent based on the PPO algorithm is developed to schedule the charging processes of EVs. Through the remarkable approximation capability of neural networks, the agent accumulates rich experience by interacting with various environments repeatedly, thereby overcoming the limitation of imperfect information. Hence, numerous household EVs are dispatched effectively to form the optimal charging profile even without full knowledge of charging demands in advance.
The remainder of this paper is organized as follows. Section II establishes the two-stage charging scheduling scheme of household EVs. Section III introduces the MDP model of EV charging scheduling and the intelligent DRL agent based on the PPO algorithm. Case studies are conducted in Section IV using real-world residential and EV load data, which prove the effectiveness of the proposed scheme. Section V concludes this paper.
In this section, an overview of the charging scheduling scheme is introduced first to illustrate the coordination between the problems in the two stages. Then, the first-stage problem which considers the mutual impacts of different nodes is put forward to determine the optimal operation of the ADN with household EVs. On the basis of the optimal charging profiles provided by the first-stage problem, the detailed sequential decision problem in the second stage is formulated to describe the charging scheduling process.
As important flexible loads of the ADN, household EVs are not fully exploited for further regulation potential under TOU prices. Generally, the charging durations of household EVs are much shorter than their sojourn times [
At the same time, the ADN is suffering from power quality issues including dramatic peak-valley differences, power congestions, and voltage limit violations. Consequently, the managers of the ADN, i.e., the distribution system operator (DSO) and the energy supplier, are motivated to further dispatch household EVs to improve the power quality under TOU prices without extra equipment investments or charging costs, and may even earn profits by delivering ancillary services to power systems. Apart from the DSOs, estate or community administrators are also encouraged to implement such charging scheduling, so as to satisfy increasing charging demands given the limited carrying capacities of ADNs.
The schematic diagram of two-stage charging scheduling scheme of household EVs is shown in

Fig. 1 Schematic diagram of two-stage charging scheduling scheme of household EVs.
In the first stage, determining the optimal charging profiles of CSs is the key point. Because different ADNs have various operating characteristics, it is of great importance for the DSO to choose appropriate optimization objectives first. In this paper, the charging scheduling of household EVs is employed to flatten the tie-line power so as to provide ancillary services to power systems. Considering the mutual impacts between different nodes, the optimal charging profiles of CSs should not be determined simply according to their own electricity demands. Therefore, an OPF algorithm is used to solve the problem while ensuring secure and stable operation. The optimal charging power is calculated with the goal of reducing the peak-valley differences, based on historical and forecasted load data during the valley period.
In the second stage, to overcome the obstacle of limited charging information, the aggregator control center based on DRL is used to make decisions with imperfect knowledge and dispatch the charging processes of EVs in terms of the determined charging profile. Before the valley period, when the
The first-stage problem aims to involve EVs in reducing peak-valley differences and managing congestions of the ADN. Considering the fact that an ADN typically features a radial topology, as shown in
$P_{ij} = \sum_{k \in \delta(j)} P_{jk} + p_j + r_{ij}\dfrac{P_{ij}^2 + Q_{ij}^2}{V_i^2}$ (1)
$Q_{ij} = \sum_{k \in \delta(j)} Q_{jk} + q_j + x_{ij}\dfrac{P_{ij}^2 + Q_{ij}^2}{V_i^2}$ (2)
$V_j^2 = V_i^2 - 2\left(r_{ij}P_{ij} + x_{ij}Q_{ij}\right) + \left(r_{ij}^2 + x_{ij}^2\right)\dfrac{P_{ij}^2 + Q_{ij}^2}{V_i^2}$ (3)
$p_j = p_j^{\mathrm{D}} - p_j^{\mathrm{g}}, \quad q_j = q_j^{\mathrm{D}} - q_j^{\mathrm{g}}$ (4)

Fig. 2 Diagram of ADN with radial topology.
where $P_{ij}$ and $Q_{ij}$ are the active and reactive power flows from node $i$ to node $j$, respectively; $p_j$ and $q_j$ are the active and reactive power demands at node $j$, respectively, which are determined by the load demands (with superscript D) and generator outputs (with superscript g); $V_i$ is the voltage at node $i$; $\delta(j)$ is the set of downstream nodes of node $j$; and $r_{ij}$ and $x_{ij}$ are the resistance and reactance of the branch from node $i$ to node $j$, respectively.
The DistFlow equations above are nonlinear and difficult to solve. Ignoring network losses, the DistFlow equations can be converted to linearized power flow equations as (5), which have been widely used in distribution network analysis [
$P_{ij} = \sum_{k \in \delta(j)} P_{jk} + p_j, \quad Q_{ij} = \sum_{k \in \delta(j)} Q_{jk} + q_j, \quad V_j = V_i - \dfrac{r_{ij}P_{ij} + x_{ij}Q_{ij}}{V_0}$ (5), where $V_0$ is the nominal voltage.
The OPF model proposed in this subsection aims to flatten the tie-line power as well as maintain the nodal voltages within the acceptable range. The objective tie-line power should be determined in advance, and the power profiles of all nodes can then be calculated using the OPF model with the goal of minimizing the differences between the real tie-line power and the objective power. Considering the limited penetration of EVs in the ADN at present, it is difficult to eliminate the peak-valley differences completely without abundant regulation capacity. Hence, the objective tie-line power, which is related to the residential consumption and the charging electricity, can be computed as:
(6) |
(7) |
(8) |
where is the objective tie-line power of the ADN at time ; and are the power of EVs and residential loads at time , respectively; is the maximum power of residential loads during the valley period; is the total electricity consumption of EVs; is the electricity deviation between the residential loads and that of the maximum power; and and are the start time and end time of the valley period, respectively.
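Based on the quantities defined above, the objective tie-line power can be interpreted as a valley-filling target for the residential profile. The following Python sketch illustrates one way such a target could be computed; the function name, the bisection-based water-filling rule, and the uniform time step are illustrative assumptions rather than the exact formulation in (6)-(8).

```python
import numpy as np

def objective_tieline_power(p_res, e_ev, dt=1.0):
    """Sketch of a valley-filling tie-line target.
    p_res: residential load per valley timestep (MW); e_ev: total EV energy (MWh)."""
    p_res = np.asarray(p_res, dtype=float)
    p_max = p_res.max()                              # peak residential load in the valley
    delta_e = np.sum(p_max - p_res) * dt             # energy headroom below that peak (MWh)
    if e_ev >= delta_e:
        # enough EV energy to fill the valley completely: flat profile above the peak
        surplus = (e_ev - delta_e) / (len(p_res) * dt)
        return np.full_like(p_res, p_max + surplus)
    # otherwise search for the flat "water level" that absorbs exactly e_ev
    lo, hi = p_res.min(), p_max
    for _ in range(60):                              # bisection on the fill level
        mid = 0.5 * (lo + hi)
        filled = np.sum(np.clip(mid - p_res, 0.0, None)) * dt
        lo, hi = (mid, hi) if filled < e_ev else (lo, mid)
    return np.maximum(p_res, hi)                     # objective tie-line power per step
```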
Therefore, the objective function of the OPF model can be represented by:
(9) |
where is the real tie-line power at time . The OPF problem aims to optimize the power profiles of all nodes to minimize the differences between and .
Assume that two sets represent the EV nodes and the residential nodes, respectively. Considering the continuous characteristic of the charging process, it is difficult to change the charging power of a CS dramatically within a short period; therefore, the ramp rate of the CS needs to be limited as follows:
(10) |
In addition, the constraints of the ADN mainly include the nodal voltage and feeder ampacity as shown in (11) and (12), respectively.
(11) |
(12) |
where and are the minimum and maximum nodal voltages at node i, respectively; and and are the minimum and maximum ampacities of the branch from node to node , respectively.
After calculating the OPF of the ADN, the optimal charging profiles of CSs are determined. Then, the agent based on DRL will dispatch EVs to approach the optimal charging power.
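To make the first-stage formulation concrete, the sketch below assembles a linearized-DistFlow OPF in Python with cvxpy: the CS charging profiles are the decision variables, the objective tracks the target tie-line power, and nodal voltage and ramp limits are imposed. All function and variable names are illustrative, reactive power is neglected for brevity, and the node numbering is assumed to increase from the root toward the leaves; this is a sketch of the general approach rather than the exact implementation used in the paper.

```python
import cvxpy as cp
import numpy as np

def solve_stage1_opf(p_obj, p_load, parent, r_br, cs_nodes, e_cs, dp_max,
                     dt=0.25, v0=1.0, v_min=0.95, v_max=1.05):
    """First-stage OPF sketch: choose CS charging profiles so the tie-line power tracks
    p_obj, subject to lossless LinDistFlow voltage limits and CS ramp limits.

    p_obj: (T,) target tie-line power; p_load: (T, N) fixed nodal loads, node 0 is the root;
    parent[j]: upstream node of node j (parent[0] unused); r_br[j]: resistance (p.u.) of the
    branch parent[j] -> j (r_br[0] = 0); cs_nodes: CS node indices; e_cs: energy each CS must
    deliver over the horizon; dp_max: maximum CS power change per step."""
    p_load = np.asarray(p_load, dtype=float)
    r_br = np.asarray(r_br, dtype=float)
    T, N = p_load.shape

    # subtree[j, k] = 1 if node k lies in the subtree rooted at node j
    subtree = np.eye(N)
    for j in range(N - 1, 0, -1):          # assumes child indices are larger than parents'
        subtree[parent[j]] += subtree[j]
    # path[j, b] = 1 if the branch ending at node b lies on the root-to-j path
    path = np.zeros((N, N))
    for j in range(1, N):
        b = j
        while b != 0:
            path[j, b] = 1.0
            b = parent[b]

    scatter = np.zeros((N, len(cs_nodes)))  # maps CS variables onto their nodes
    scatter[np.asarray(cs_nodes), np.arange(len(cs_nodes))] = 1.0

    p_cs = cp.Variable((T, len(cs_nodes)), nonneg=True)      # CS charging power
    p_net = p_load + p_cs @ scatter.T                        # total nodal active demand
    flows = p_net @ subtree.T                                # branch flows; column 0 = tie-line
    voltages = v0 - flows @ (np.diag(r_br) @ path.T) / v0    # LinDistFlow voltage drop (no Q)

    constraints = [
        cp.sum(p_cs, axis=0) * dt == e_cs,                   # deliver the required CS energy
        cp.abs(p_cs[1:] - p_cs[:-1]) <= dp_max,              # ramp-rate limit of each CS
        voltages[:, 1:] >= v_min, voltages[:, 1:] <= v_max,  # nodal voltage limits
    ]
    objective = cp.Minimize(cp.sum_squares(flows[:, 0] - p_obj))
    cp.Problem(objective, constraints).solve()
    return p_cs.value, flows[:, 0].value
```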
The charging process of an EV can be divided into three stages, i.e., trickle charging, constant-current charging, and constant-voltage charging, where the constant-current charging stage accounts for about 80% of the duration and has a relatively constant power [
(13) |
where is the arrival time of the
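For a constant-power charging process, the required charging duration follows from a simple energy balance. The helper below is a minimal illustration; the function and argument names are assumptions consistent with the parameters introduced later (battery capacity, charging efficiency, and expected SOC).

```python
def charging_duration_hours(soc_arrival, soc_expected, capacity_kwh, power_kw, efficiency=0.9):
    """Illustrative estimate of the constant-power charging duration of one EV (in hours)."""
    energy_needed = (soc_expected - soc_arrival) * capacity_kwh   # kWh to be stored in the battery
    return energy_needed / (efficiency * power_kw)                # hours at (roughly) constant power
```

For example, a 50 kWh battery arriving at 40% SOC and charging at 7 kW with 90% efficiency would need roughly 4.8 hours.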
EVs arrive sequentially and the specific charging demands can only be obtained precisely when an EV is plugged in. The aggregator control center aims to transfer the charging demands to formulate a redistribution scheme of charging demands based on the objective charging power. Through the charging scheduling of EVs, not only power congestions at the prophase of valley period can be alleviated, but also the ancillary service for shortening the peak-valley differences can be delivered to power systems.
EVs can be divided into adjustable and non-adjustable groups. A non-adjustable EV, whose required charging duration is longer than its sojourn time, will not be regulated. The charging start time of non-adjustable EVs is set to their arrival time to satisfy their charging demands, and there is no need to involve them in the proposed charging scheduling. Therefore, the following charging scheduling focuses on adjustable EVs. When dispatching EVs to form the optimal charging profile, the charging demand of each EV can be described as a rectangle, as demonstrated in

Fig. 3 Diagram of charging scheduling process.
Denoting as the optimized charging start time of the
(14) |
(15) |
where is the rated battery capacity of the
These adjustable EVs have already decided to charge during the valley period with its lower electricity price even though they arrive earlier, and most of them instinctively choose to start charging at the beginning of the valley period due to the lack of effective guidance. Therefore, the range of the action space is set from to , which determines their optimal charging periods. Besides, to satisfy the charging demands and save the charging costs of EVs, is also constrained as:
(16) |
After dispatching the
(17) |
where is the scheduled power of the first EVs at time .
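Because each adjustable EV charges as an uninterruptible constant-power rectangle, the power already committed by the dispatched EVs is simply the superposition of those rectangles, as in (17). A small illustrative helper (names are assumptions) is given below.

```python
import numpy as np

def scheduled_power(start_steps, duration_steps, powers_kw, horizon_steps):
    """Aggregate the rectangular charging blocks of the already-dispatched EVs."""
    profile = np.zeros(horizon_steps)
    for start, dur, p in zip(start_steps, duration_steps, powers_kw):
        profile[start:start + dur] += p     # each EV charges at constant power, uninterrupted
    return profile
```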
In this section, the MDP model of EV charging scheduling is developed at first. Then, the intelligent DRL agent based on the PPO algorithm is introduced, followed by the training workflow of the PPO algorithm.
The charging scheduling of household EVs can be modelled as an MDP due to the discrete arrivals of EVs and the randomness of charging demands, which can be appropriately solved using the DRL algorithm. An MDP can be represented as a tuple [
1) The state space observed by the agent is represented as:
(18) |
(19) |
where is the charging demand of the
2) The action space is represented as because the actions are taken sequentially, which determines the specific charging profile of the
3) Every action taken by the agent obtains a reward, which describes the performance of the action and guides the agent toward the maximum cumulative reward. The reward function is defined as:
(20) |
(21) |
where is the reward gained by the agent after taking the action ; is the deviation between the optimal charging power and the scheduled charging power of k EVs; is the optimal power of node at time ; and is the coefficient of reward, which is used to normalize the reward between different nodes with various EVs.
Moreover, the reward function can also reveal the extent to which the charging demand is not satisfied or the charging costs are increased.

Fig. 4 Specific penalty when charging demands are satisfied or not satisfied. (a) Demands are satisfied. (b) Demands are not satisfied.
Theoretically, the total reward with optimal charging scheduling can be represented as:
(22) |
(23) |
where is a constant value for normalizing the reward.
From the perspective of the whole scheduling process, the agent schedules all EVs to approach the optimal charging profiles so as to maximize the total reward. Nevertheless, considering the indivisibility of EV charging processes, it is difficult to reach the global optimum by making the locally optimal decision at every single step. To be specific, the present decision has lasting effects on the later charging scheduling processes, which are difficult to incorporate into the optimization problem and solve using conventional methods. Based on the outstanding approximation ability of neural networks, DRL can take these subsequent effects into consideration. For example, the DRL agent may take an action that does not gain the maximum reward at present, but it contributes to obtaining more rewards in the future and achieving the maximum total reward.
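To make the MDP concrete, the following gym-style environment sketch simulates the sequential dispatch process described above: the state combines the remaining tracking error of the optimal profile with the demand of the EV being dispatched, the action is its charging start step, and the reward penalizes the deviation from the optimal profile. The state layout, the absolute-deviation reward, and the coefficient rho are simplifying assumptions rather than the exact definitions in (18)-(23).

```python
import numpy as np

class EVDispatchEnv:
    """Gym-style sketch of the sequential EV dispatch MDP (illustrative, simplified)."""

    def __init__(self, p_opt, durations, powers, rho=1e-3):
        self.p_opt = np.asarray(p_opt, float)     # optimal CS profile over the valley period (kW)
        self.durations = durations                # charging duration (steps) of each arriving EV
        self.powers = powers                      # constant charging power (kW) of each EV
        self.rho = rho                            # reward-normalization coefficient
        self.reset()

    def reset(self):
        self.p_sched = np.zeros_like(self.p_opt)  # power already committed by dispatched EVs
        self.k = 0                                # index of the next EV to be dispatched
        return self._state()

    def _state(self):
        # remaining tracking error plus the demand of the EV currently being dispatched
        return np.concatenate([self.p_opt - self.p_sched,
                               [self.durations[self.k], self.powers[self.k]]])

    def step(self, start):
        dur, p = self.durations[self.k], self.powers[self.k]
        # keep the whole charging block inside the valley period (EV assumed to fit the horizon)
        start = int(np.clip(start, 0, len(self.p_opt) - dur))
        self.p_sched[start:start + dur] += p
        # reward: negative (normalized) deviation between the optimal and scheduled profiles
        reward = -self.rho * np.sum(np.abs(self.p_opt - self.p_sched))
        self.k += 1
        done = self.k >= len(self.durations)
        next_state = None if done else self._state()
        return next_state, reward, done, {}
```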
Policy gradient is an essential method for training the DRL agent to maximize the cumulative reward, which works by computing an estimator of the policy gradient and plugging it into a stochastic gradient ascent algorithm [
$\hat{g} = \hat{\mathbb{E}}_t\left[\nabla_\theta \log \pi_\theta\left(a_t \mid s_t\right) \hat{A}_t\right]$ (24)
where $\pi_\theta$ is the stochastic policy function with parameter $\theta$; $\hat{A}_t$ is the estimator of the advantage function at time $t$; $a_t$ and $s_t$ are the action and state, respectively; and $\hat{\mathbb{E}}_t$ is the empirical average over a finite batch of samples.
As a result, the loss function is defined as:
$L^{PG}(\theta) = \hat{\mathbb{E}}_t\left[\log \pi_\theta\left(a_t \mid s_t\right) \hat{A}_t\right]$ (25)
However, traditional policy gradient methods use sampled data inefficiently and must spend considerable time collecting new samples whenever the policy is updated. Besides, it is difficult to determine an appropriate step size for the policy update that prevents large differences between the new policy and the old policy.
Therefore, the PPO algorithm was proposed in 2017 to address the above shortcomings. The detailed training workflow of the DRL agent with PPO algorithm is demonstrated in

Fig. 5 Training workflow of DRL agent with PPO algorithm.
To increase the sample efficiency, the old policy $\pi_{\theta_{\text{old}}}$ is used to interact with the environment and sample trajectory sets of $T$ timesteps, while $\pi_\theta$ is the actual network that needs to be trained according to the demonstrations of $\pi_{\theta_{\text{old}}}$. Utilizing importance sampling, the same trajectory sets can be used multiple times even though there are differences between $\pi_\theta$ and $\pi_{\theta_{\text{old}}}$. The probability ratio of the new policy and the old policy can be expressed as:
$r_t(\theta) = \dfrac{\pi_\theta\left(a_t \mid s_t\right)}{\pi_{\theta_{\text{old}}}\left(a_t \mid s_t\right)}$ (26)
Another key point of the PPO algorithm is that the new policy should avoid deviating significantly from the old policy after every update, so as to maintain the accuracy of importance sampling and avoid accidental performance collapse. Hence, a clipped surrogate function is used to remove the incentive for moving $r_t(\theta)$ outside the interval $[1-\varepsilon, 1+\varepsilon]$, so the loss function of the PPO algorithm can be represented as [
$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\varepsilon,\, 1+\varepsilon\right)\hat{A}_t\right)\right]$ (27)
where $\varepsilon$ is the clipping parameter, which clips the probability ratio. For instance, the objective increases if the advantage function is positive, but the increase is bounded because the ratio is limited to $1+\varepsilon$ by the clipping function.
Consequently, the network parameters of the new policy are updated using:
$\theta_{\text{new}} = \arg\max_{\theta}\, L^{CLIP}(\theta)$ (28)
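As a quick illustration of (26) and (27), the clipped surrogate can be written in a few lines of PyTorch; the function below computes it as a loss to be minimized (the function name and signature are illustrative).

```python
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective, averaged over a batch of samples."""
    ratio = torch.exp(log_prob_new - log_prob_old)            # r_t(theta) in (26)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.mean(torch.min(unclipped, clipped))         # maximize surrogate = minimize negative
```

When the advantage is positive, increasing the ratio beyond $1+\varepsilon$ no longer increases the objective, which is exactly the clipping behavior described above.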
Apart from the actor network, a critic network is used to estimate the state value function and the advantage function. The advantage function describes how much an action is better than other actions on average, which is defined as:
$A^{\pi}\left(s_t, a_t\right) = Q^{\pi}\left(s_t, a_t\right) - V^{\pi}\left(s_t\right)$ (29)
$V^{\pi}\left(s_t\right) = \mathbb{E}_{\pi}\left[\sum_{l=0}^{\infty} \gamma^{l} r_{t+l} \mid s_t\right]$ (30)
$Q^{\pi}\left(s_t, a_t\right) = \mathbb{E}_{\pi}\left[\sum_{l=0}^{\infty} \gamma^{l} r_{t+l} \mid s_t, a_t\right]$ (31)
where $\gamma$ is the discounting factor, which balances the importance between immediate and future rewards; and $V^{\pi}$ and $Q^{\pi}$ are the value function and the action-value function, respectively. Therefore, $V^{\pi}(s_t)$ is the expected return at state $s_t$ averaged over all optional actions, while $Q^{\pi}(s_t, a_t)$ is the expected return at state $s_t$ when taking action $a_t$.
The critic network is updated using regression to minimize a mean-squared-error objective [
$L^{VF}(\phi) = \hat{\mathbb{E}}_t\left[\left(V_{\phi}\left(s_t\right) - \hat{R}_t\right)^2\right]$ (32)
$\hat{R}_t = \sum_{t'=t}^{T} \gamma^{\,t'-t} r_{t'}$ (33)
where $\hat{R}_t$ is the reward-to-go, i.e., the sum of (discounted) rewards after time $t$ in the trajectory; and $V_{\phi}$ is the value estimated by the critic network with parameter $\phi$.
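The reward-to-go and a simple advantage estimate can be computed directly from a sampled trajectory, as sketched below (NumPy, illustrative names; PPO implementations often use generalized advantage estimation instead of this simple difference).

```python
import numpy as np

def rewards_to_go(rewards, gamma=0.99):
    """Discounted reward-to-go used as the regression target of the critic."""
    rtg = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

def advantages_from_values(rewards, values, gamma=0.99):
    """Simple advantage estimate: reward-to-go minus the critic's value prediction."""
    return rewards_to_go(rewards, gamma) - np.asarray(values, dtype=float)
```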
The DRL agent with the PPO algorithm schedules the EV charging processes according to the optimal charging profile, with the goal of maximizing the total expected reward. The training workflow of the PPO algorithm is summarized in
Parameter | Value | Parameter | Value |
---|---|---|---|
0.15 | 2048 | ||
0.2 | 10 | ||
0.99 | 1000 | ||
3×1 | 64 |
The discounting factor $\gamma$ and the clipping parameter $\varepsilon$ are important hyperparameters that observably influence the performance of the agent. The weight given to future rewards relative to the current action depends on the discounting factor $\gamma$, and a larger $\gamma$ means that the agent is more far-sighted and takes fuller account of future uncertainties to achieve the maximum cumulative rewards. Thus, $\gamma$ is set to be 0.99 [
Algorithm 1: Training workflow of PPO algorithm |
---|
1: Initialize the parameters $\theta$ of the policy network and $\phi$ of the value function network |
2: for each training iteration do |
3: Run policy $\pi_{\theta_{\text{old}}}$ to interact with the environment for $T$ timesteps and obtain the trajectory samples |
4: Calculate the reward-to-go $\hat{R}_t$ |
5: Use the value function network $V_{\phi}$ to estimate the advantage function $\hat{A}_t$ |
6: Compute the loss functions with regard to $\theta$ and $\phi$ and update them with several epochs of gradient descent |
7: $\theta_{\text{old}} \leftarrow \theta$ |
8: end for |
Both the convergence speed and performance stability depend on [
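To connect Algorithm 1 with the loss functions above, the following PyTorch sketch shows one update step for a small actor-critic pair with a one-dimensional continuous action (the charging start time). The network sizes, the learning rate, and the Gaussian policy are illustrative assumptions rather than the paper's exact implementation; a full training loop would repeatedly collect trajectories with the old policy, compute rewards-to-go and advantages, and then call this update.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Gaussian policy over the (continuous) charging start time."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.log_std = nn.Parameter(torch.zeros(1))
    def dist(self, obs):
        return torch.distributions.Normal(self.net(obs).squeeze(-1), self.log_std.exp())

class Critic(nn.Module):
    """State-value network used to estimate the advantage."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def ppo_update(actor, critic, obs, actions, log_prob_old, rtg, adv,
               epochs=10, clip_eps=0.2, lr=3e-4):
    """Several epochs of gradient descent on the clipped surrogate and the value loss.
    (In practice the optimizer would persist across updates instead of being recreated.)"""
    opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=lr)
    for _ in range(epochs):
        dist = actor.dist(obs)
        log_prob_new = dist.log_prob(actions)
        ratio = torch.exp(log_prob_new - log_prob_old)                 # probability ratio
        surrogate = torch.min(ratio * adv,
                              torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv)
        actor_loss = -surrogate.mean()                                 # clipped surrogate
        critic_loss = torch.mean((critic(obs) - rtg) ** 2)             # mean-squared value error
        loss = actor_loss + 0.5 * critic_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
```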
To evaluate the performance of the proposed two-stage DRL-based charging scheduling scheme, case studies are conducted in this section.
An ADN for simulation is established based on the IEEE 14-node test feeder, as shown in

Fig. 6 ADN based on IEEE 14-node test feeder.
In accordance with the realistic situations in Zhejiang, China, the valley period of TOU is set from 22:00 to 08:00 of the next day. Meanwhile, the residential load data during the valley period are obtained from a housing estate in Hangzhou, Zhejiang, as shown in

Fig. 7 Residential load data profiles during valley period.
The original and optimal charging profiles of EVs at different CSs are shown in

Fig. 8 Original and optimal charging profiles of EVs at different CSs.
The numbers of household EVs are set to be 179, 225, and 146 at node 5, node 9, and node 13, respectively. It is assumed that the charging power obeys a uniform distribution and the starting state of charge (SOC) obeys a normal distribution [
Parameter | Description | Value |
---|---|---|
(kW) | Charging power | |
(%) | Charging efficiency | 90 |
(kWh) | Battery capacity | 50 |
(%) | Starting SOC | |
(%) | Expected SOC | 100 |
(hour) | Arrival time | |
(hour) | Departure time | 08:00 |
Note: a normal distribution with a mean value of $\mu$ and a standard deviation of $\sigma$ is abbreviated as $N(\mu, \sigma)$; a uniform distribution with minimum and maximum values of $a$ and $b$, respectively, is abbreviated as $U(a, b)$.
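For reproducing such a case study, a synthetic fleet with the same distribution types can be sampled as below. The numeric distribution parameters in this snippet are placeholders chosen for illustration only, not the values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ev_fleet(n_ev, capacity_kwh=50.0, soc_expected=1.0, efficiency=0.9):
    """Draw one synthetic EV fleet following the table's distribution types (placeholder parameters)."""
    power_kw = rng.uniform(3.5, 7.0, n_ev)                            # charging power ~ U(a, b), assumed range
    soc_start = np.clip(rng.normal(0.4, 0.1, n_ev), 0.05, 0.95)       # starting SOC ~ N(mu, sigma), assumed
    arrival_h = np.clip(rng.normal(19.0, 2.0, n_ev), 12.0, 22.0)      # arrival time ~ N(mu, sigma), assumed
    duration_h = (soc_expected - soc_start) * capacity_kwh / (efficiency * power_kw)
    return power_kw, soc_start, arrival_h, duration_h

# e.g., the node-5 fleet size from the case study:
power_kw, soc_start, arrival_h, duration_h = sample_ev_fleet(179)
```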
TOU prices make great contributions to transferring charging demands from the peak period to the valley period. However, the charging processes cannot be dispatched effectively because TOU prices are unable to describe the varying demands of the ADN precisely during different time periods. Therefore, EV owners instinctively decide to start charging at the beginning of the valley period. As shown in
To alleviate the power congestions and schedule the charging demands according to distribution network operations, the DSO needs to determine the optimal charging profiles of EV CSs by solving the OPF. Utilizing the DistFlow model introduced in Section II, the OPF of the ADN is calculated with the goal of flattening tie-line power, and the optimal charging profiles are shown in
It can be observed that the main charging demands are transferred to 01:00-05:00, during which other electricity consumption is the lowest. Moreover, the regulation targets are not allocated simply according to the total electricity demands of the CSs; nodal voltages and the impacts from other nodes are also taken into account to realize a multidimensional optimum. Therefore, the CSs are coordinated and the optimal charging profiles differ across nodes, as shown in
On the basis of optimal charging profiles calculated in the first stage, the DRL agent needs to schedule the charging processes of EVs sequentially to approach the optimal profiles. The charging scheduling results of household EVs at different nodes are shown in

Fig. 9 Charging scheduling results of household EVs at different nodes. (a) Node 5. (b) Node 9. (c) Node 13.
As shown in
Node No. | Average deviation (MW) | Maximum deviation (MW) | Total reward
---|---|---|---|
5 | 0.101 | 0.246 | 939.5 |
9 | 0.067 | 0.277 | 923.1 |
13 | 0.044 | 0.127 | 946.6 |
During the whole charging scheduling process, the DRL agent strives to maximize the reward and finally obtains total rewards of 939.5, 923.1, and 946.6 for node 5, node 9, and node 13, respectively. Similar to the average deviation and the maximum deviation indexes, the total reward indicates that the agent performs better with a smoother objective charging profile.
Moreover, the average SOC and median SOC at specific hours are further analyzed, as shown in

Fig. 10 Average SOC and median SOC during valley period.
To verify the advantages of the PPO algorithm, the advantage actor-critic (A2C) and deep Q-network (DQN) algorithms are implemented as benchmarks. All training timesteps are set to be 512200 to analyze the total reward and the convergence speed. The cumulative reward is regarded as the index to evaluate the performance of the agents trained by different algorithms.

Fig. 11 Reward evolution curves of PPO, A2C, and DQN algorithms.
The start point of the reward curves reflects the performance of the random policy, which is around 600. The PPO algorithm achieves the highest reward of about 937. The PPO algorithm reaches a relatively stable state after 50 episodes (102400 timesteps). In the following 250 episodes, the PPO algorithm keeps exploring the optimal strategy and stabilizes its policy network. Finally, the agent converges with a lower reward variance.
The A2C algorithm shows a sharp increase at the beginning of the training process, which appears much faster than that of the PPO algorithm. The results show that the clipping function of the PPO algorithm works as intended and limits drastic changes of the policy network, so as to effectively avoid performance collapse and local optima. As illustrated in
The DQN algorithm converges to a total reward of around 825, and the PPO algorithm outperforms it by more than 13%, which proves the advantages of actor-critic networks. Besides, DQN spends much time collecting abundant samples and filling up its replay buffer, so there is no improvement at the beginning of the training process.
The total training time is 782.75 s, 586.83 s, and 542.21 s for the PPO, A2C, and DQN algorithms, respectively. Besides, the decision time of the proposed DRL agent with the PPO algorithm is also tested, and the results indicate that the average decision time per EV is about 2.5 ms. These tests have been carried out using Python 3.7 on an Inte
In terms of the test results, the PPO algorithm outperforms the A2C algorithm, the DQN algorithm, and the random policy, although it has the lowest training speed for the same number of timesteps. To be specific, the PPO algorithm obtains a total reward of 937 when scheduling the EV charging processes, which is 29, 112, and 337 more than those of the A2C algorithm, the DQN algorithm, and the random policy, respectively.
Then, the loss function performance of PPO algorithm is presented, as shown in

Fig. 12 Loss function performance of PPO algorithm during training process. (a) Value loss. (b) Loss.
Hence, the PPO algorithm is well suited to the charging scheduling problem and can be adopted to handle the uncertainty of the environment.
The original and optimized tie-line power profiles are demonstrated in

Fig. 13 Comparison between original and optimized tie-line power profiles.
Different from the transmission network, the distribution network has a much higher resistance, so the active power has more significant effects on voltages. As a result, apart from its great contribution to the elimination of peak-valley differences, scheduling household EVs during the valley period also improves the voltage quality of the ADN. As shown in

Fig. 14 Original nodal voltages.
Utilizing the proposed two-stage charging scheduling scheme, the voltage violation problem is addressed effectively, as shown in

Fig. 15 Nodal voltages after charging scheduling.
Besides, the fluctuations of nodal voltages are also limited with the smoother tie-line power, which is beneficial for reducing the operating times of voltage regulation equipment such as on-load tap changers. For example, the voltage variation of node 4 decreases from 0.0051 p.u. to 0.0028 p.u. during the whole valley period. Meanwhile, the contributions to voltage regulation are not restricted to the nodes with CSs. As shown in
Node No. | Original voltage nadir (p.u.) | Optimized voltage nadir (p.u.) | Voltage improvement (%) |
---|---|---|---|
2 | 0.982 | 0.984 | 0.17 |
3 | 0.979 | 0.981 | 0.19 |
4 | 0.981 | 0.983 | 0.14 |
5 | 0.973 | 0.975 | 0.26 |
6 | 0.975 | 0.976 | 0.11 |
7 | 0.971 | 0.974 | 0.28 |
8 | 0.972 | 0.976 | 0.39 |
9 | 0.969 | 0.973 | 0.35 |
10 | 0.974 | 0.976 | 0.21 |
11 | 0.976 | 0.978 | 0.23 |
12 | 0.976 | 0.978 | 0.23 |
13 | 0.974 | 0.977 | 0.29 |
14 | 0.973 | 0.975 | 0.28 |
Because OPF is involved in the first-stage optimization problem, all nodal voltages are taken into consideration when determining the optimal charging profiles of CSs. Specifically, the voltage nadir of node 8 is improved by 0.39%. For old communities with relatively poor electricity infrastructure, the proposed scheme can also satisfy residential power consumption and charging demands simultaneously within the limited carrying capacity.
Therefore, under the existing TOU price circumstances, the proposed two-stage charging scheduling scheme can make full use of the regulation potential of household EVs during the valley period to improve the power quality of the ADN without extra equipment investments or charging costs, including peak-valley difference elimination, congestion management, and nodal voltage regulation.
To make full use of the regulation potential of household EVs under TOU prices, this paper proposes a two-stage charging scheduling scheme to dispatch household EVs. The first-stage problem involves the charging scheduling of household EVs in the operation and optimization of the ADN, and the optimal charging power profiles of CSs are determined by calculating the OPF so as to relieve power congestions and reduce the peak-valley differences. Furthermore, a PPO-based DRL agent is developed to dispatch the charging processes of EVs according to the optimal charging power. Case studies with realistic data are conducted to illustrate the multidimensional performance of the proposed scheme. It is demonstrated that the PPO-based DRL agent can be adopted in different CSs with various objective charging profiles and EV amounts. Besides, the charging scheduling of EVs contributes to significant improvements in power quality, including decreasing the peak-valley differences and stabilizing the nodal voltages.
Moreover, the proposed scheme can be adopted in numerous distributed communities in combination with edge computing technology. On this basis, various flexible loads, e.g., thermostatically controlled loads, energy storage, and renewable energy sources, can be incorporated into the proposed scheme and managed efficiently, so as to activate their flexibility and enhance the regulation capacity of ADNs in the near future.
References
T. Chen, X.-P. Zhang, J. Wang et al., “A review on electric vehicle charging infrastructure development in the UK,” Journal of Modern Power Systems and Clean Energy, vol. 8, no. 2, pp. 193-205, Mar. 2020.
IEA. (2022, May). Global EV outlook 2022. [Online]. Available: https://www.iea.org/reports/global-ev-outlook-2022
H. Liu, P. Zeng, J. Guo et al., “An optimization strategy of controlled electric vehicle charging considering demand side response and regional wind and photovoltaic,” Journal of Modern Power Systems and Clean Energy, vol. 3, no. 2, pp. 232-239, Jun. 2015.
Fco. J. Zarco-Soto, J. L. Martínez-Ramos, and P. J. Zarco-Periñán, “A novel formulation to compute sensitivities to solve congestions and voltage problems in active distribution networks,” IEEE Access, vol. 9, pp. 60713-60723, Apr. 2021.
B. Wei, Z. Qiu, and G. Deconinck, “A mean-field voltage control approach for active distribution networks with uncertainties,” IEEE Transactions on Smart Grid, vol. 12, no. 2, pp. 1455-1466, Mar. 2021.
Y. Luo, Q. Nie, D. Yang et al., “Robust optimal operation of active distribution network based on minimum confidence interval of distributed energy beta distribution,” Journal of Modern Power Systems and Clean Energy, vol. 9, no. 2, pp. 423-430, Mar. 2021.
K. Xie, H. Hui, Y. Ding et al., “Modeling and control of central air conditionings for providing regulation services for power systems,” Applied Energy, vol. 315, p. 119035, Jun. 2022.
H. Wei, J. Liang, C. Li et al., “Real-time locally optimal schedule for electric vehicle load via diversity-maximization NSGA-II,” Journal of Modern Power Systems and Clean Energy, vol. 9, no. 4, pp. 940-950, Jul. 2021.
E. Hadian, H. Akbari, M. Farzinfar et al., “Optimal allocation of electric vehicle charging stations with adopted smart charging/discharging schedule,” IEEE Access, vol. 8, pp. 196908-196919, Oct. 2020.
H.-M. Chung, S. Maharjan, Y. Zhang et al., “Intelligent charging management of electric vehicles considering dynamic user behavior and renewable energy: a stochastic game approach,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 12, pp. 7760-7771, Jul. 2021.
J. Hu, C. Ye, Y. Ding et al., “A distributed MPC to exploit reactive power V2G for real-time voltage regulation in distribution networks,” IEEE Transactions on Smart Grid, vol. 13, no. 1, pp. 576-588, Sept. 2022.
S. Deb, A. K. Goswami, P. Harsh et al., “Charging coordination of plug-in electric vehicle for congestion management in distribution system integrated with renewable energy sources,” IEEE Transactions on Industry Applications, vol. 56, no. 5, pp. 5452-5462, Sept. 2020.
S. Das, P. Acharjee, and A. Bhattacharya, “Charging scheduling of electric vehicle incorporating grid-to-vehicle and vehicle-to-grid technology considering in smart grid,” IEEE Transactions on Industry Applications, vol. 57, no. 2, pp. 1688-1702, Mar. 2021.
L. Yan, X. Chen, J. Zhou et al., “Deep reinforcement learning for continuous electric vehicles charging control with dynamic user behaviors,” IEEE Transactions on Smart Grid, vol. 12, no. 6, pp. 5124-5134, Jul. 2021.
F. L. D. Silva, C. E. H. Nishida, D. M. Roijers et al., “Coordination of electric vehicle charging through multiagent reinforcement learning,” IEEE Transactions on Smart Grid, vol. 11, no. 3, pp. 2347-2356, May 2020.
L. Gan, X. Chen, K. Yu et al., “A probabilistic evaluation method of household EVs dispatching potential considering users’ multiple travel needs,” IEEE Transactions on Industry Applications, vol. 56, no. 5, pp. 5858-5867, Sept. 2020.
E. L. Karfopoulos and N. D. Hatziargyriou, “A multi-agent system for controlled charging of a large population of electric vehicles,” IEEE Transactions on Power Systems, vol. 28, no. 2, pp. 1196-1204, May 2013.
A.-M. Koufakis, E. S. Rigas, N. Bassiliades et al., “Offline and online electric vehicle charging scheduling with V2V energy transfer,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 5, pp. 2128-2138, May 2020.
Y. Li, M. Han, Z. Yang et al., “Coordinating flexible demand response and renewable uncertainties for scheduling of community integrated energy systems with an electric vehicle charging station: a bi-level approach,” IEEE Transactions on Sustainable Energy, vol. 12, no. 4, pp. 2321-2331, Oct. 2021.
S. Li, W. Hu, D. Cao et al., “Electric vehicle charging management based on deep reinforcement learning,” Journal of Modern Power Systems and Clean Energy, vol. 10, no. 3, pp. 719-730, May 2022.
D. Silver, A. Huang, C. J. Maddison et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484-489, Jan. 2016.
B. Huang and J. Wang, “Deep-reinforcement-learning-based capacity scheduling for PV-battery storage system,” IEEE Transactions on Smart Grid, vol. 12, no. 3, pp. 2272-2283, May 2021.
D. Qiu, Y. Ye, D. Papadaskalopoulos et al., “A deep reinforcement learning method for pricing electric vehicles with discrete charging levels,” IEEE Transactions on Industry Applications, vol. 56, no. 5, pp. 5901-5912, Sept. 2020.
M. Shin, D.-H. Choi, and J. Kim, “Cooperative management for PV/ESS-enabled electric vehicle charging stations: a multiagent deep reinforcement learning approach,” IEEE Transactions on Industrial Informatics, vol. 16, no. 5, pp. 3493-3503, May 2020.
J. Schulman, F. Wolski, P. Dhariwal et al. (2017, Jul.). Proximal policy optimization algorithms. [Online]. Available: https://arxiv.org/abs/1707.06347
S. Yoon and E. Hwang, “Load guided signal-based two-stage charging coordination of plug-in electric vehicles for smart buildings,” IEEE Access, vol. 7, pp. 144548-144560, Oct. 2019.
M. E. Baran and F. F. Wu, “Network reconfiguration in distribution systems for loss reduction and load balancing,” IEEE Transactions on Power Delivery, vol. 4, no. 2, pp. 1401-1407, Apr. 1989.
S. Tan, J.-X. Xu, and S. K. Panda, “Optimization of distribution network incorporating distributed generators: an integrated approach,” IEEE Transactions on Power Systems, vol. 28, no. 3, pp. 2421-2432, Aug. 2013.
C. Zhang, Y. Liu, F. Wu et al., “Effective charging planning based on deep reinforcement learning for electric vehicles,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 1, pp. 542-554, Jan. 2021.
S. Han, S. Han, and K. Sezaki, “Development of an optimal vehicle-to-grid aggregator for frequency regulation,” IEEE Transactions on Smart Grid, vol. 1, no. 1, pp. 65-72, Jun. 2010.
Z. Zhao and C. K. M. Lee, “Dynamic pricing for EV charging stations: a deep reinforcement learning approach,” IEEE Transactions on Transportation Electrification, vol. 8, no. 2, pp. 2456-2468, Jun. 2022.
W. Zhu and A. Rosendo, “A functional clipping approach for policy optimization algorithms,” IEEE Access, vol. 9, pp. 96056-96063, Jul. 2021.