Abstract
A time-varying time-of-use electricity price can be used to reduce the charging costs for electric vehicle (EV) owners. Considering the uncertainty of price fluctuation and the randomness of the EV owner’s commuting behavior, we propose a deep reinforcement learning based method for the minimization of the individual EV charging cost. The charging problem is first formulated as a Markov decision process (MDP) with unknown transition probability. A modified long short-term memory (LSTM) neural network is used as the representation layer to extract temporal features from the electricity price signal. The deep deterministic policy gradient (DDPG) algorithm, which handles continuous action spaces, is used to solve the MDP. The proposed method can automatically adjust the charging strategy according to the electricity price to reduce the charging cost of the EV owner. Several other methods for solving the charging problem are also implemented and quantitatively compared with the proposed method, which reduces the charging cost by up to 70.2% relative to the benchmark methods.
In recent years, the development of electric vehicles (EVs) has provided a means to reduce air pollution and the depletion of conventional carbon energy sources [
Various programming strategies have been proposed to optimize EV charging/discharging schedules, which can be divided into three categories: dynamic programming [
A stochastic dynamic programming based method for the scheduling of EV charging is proposed in [
Although programming-based methods capture the law of the interaction between electricity price and charging/discharging behavior to reduce the charging cost of the EV owner, these methods are not always scalable. For a given state, these programming methods require many iterations to obtain the optimal solution. However, the optimization of EV charging cost is a real-time optimization problem. Considering the computation time, programming-based methods are not well suited to this problem [
In recent years, different neural network (NN) based methods have been applied to the research of EV [
The application of NNs in energy management can be divided into two categories: ① NNs assist in making decisions [
As a rapidly developing branch of machine learning, reinforcement learning (RL) can learn an excellent control policy in the absence of prior environment information, and its application to decision-making is of great value. In the recent literature, RL has been applied to EV charging scheduling problems. Reference [
The core of Q-learning is an action-value matrix, which is indexed by the state and action variables and whose size determines the complexity of Q-learning. In some cases with a low-dimensional state space and a discrete action space, Q-learning can achieve good performance [
We consider an EV charging/discharging model with a continuous action space, which allows a flexible energy management policy, to minimize the charging cost for the EV owner. To overcome the shortcomings of [
1) A DRL-based charging/discharging strategy is proposed for the EV owner. Comparative tests are conducted with different benchmark methods to verify the effectiveness of the proposed method.
2) A novel recurrent neural network (RNN) architecture, JANET, is used, which is an improved version of LSTM that retains only the forget gate, to extract the temporal pattern of the electricity price. A comparative test among different RNN-based feature extraction methods is conducted to demonstrate the impact of the feature extraction ability on the proposed method and to verify the effectiveness of the feature extraction ability of the JANET architecture.
3) Considering the randomness of EV owner’s commuting behavior, the charging/discharging action is decided when the arrival time and departure time of the EV are unknown.
The remainder of this paper is organized in the following structure. In Section II, the single EV charging/discharging scenario is introduced and modeled as a Markov decision process (MDP). The DDPG algorithm and JANET NN are described in Section III. Section IV describes the NN architecture, experimental details, and training process. In Section V, experimental results are presented in detail to demonstrate the effectiveness of the proposed method. Section VI presents comparison results with similar methods and analysis of the simulation results, and Section VII presents the conclusions.
It is assumed that the EV can both deliver power to and receive power from the power grid. The EV arrives home and is plugged in at its arrival time on one day, and it departs at its departure time on the following day. An episode begins when the EV arrives home and ends when the EV leaves home the next day.
In this paper, the charging process is defined as an MDP with unknown transition probabilities due to the randomness of the EV owner’s commuting behavior and the electricity price. This method utilizes the fluctuation in electricity price to minimize the cost. For example, if the EV is charged when the electricity price is low and discharged when the electricity price is high, the charging cost for the EV owner can be reduced. The scenario of this model is shown in Fig. 1.

Fig. 1 Single EV charging management model.
The problem of the economic benefits of charging/discharging for the EV owner is modeled as an MDP with unknown transition probability and finite time steps. An MDP is a four-tuple $(S, A, R, P)$, where $S$ is the state space, $A$ is the action space, $R$ is the reward function, and $P$ is the state transition function.
At time step $t$, the ICD obtains the state $s_t$, which includes the remaining capacity of the battery and the previous $N$-hour electricity prices at time $t$. The action $a_t$ is then taken, which indicates the charging/discharging power of the battery. After the action is executed, the agent receives an immediate reward $r_t$, and the system transfers to a new state $s_{t+1}$. An episode of the MDP consists of a finite sequence of states, actions, rewards, and new states: at the first time step, the tuple is $(s_1, a_1, r_1, s_2)$; at the second, $(s_2, a_2, r_2, s_3)$; and at the last time step $T$, it is $(s_T, a_T, r_T)$. The details of the MDP formulation are defined as follows.
1) State: at time $t$, the state of the MDP is represented as $s_t = (E_t, \boldsymbol{p}_t)$, where $E_t$ is the remaining battery capacity of the EV, and $\boldsymbol{p}_t$ is the vector of the previous $N$-hour electricity prices at time $t$.
2) Action: at time $t$, the action is denoted as $a_t$. The action of the MDP is defined as the charging/discharging power, which can be selected continuously in the range $[-P_{\max}, P_{\max}]$, where $P_{\max}$ indicates the maximum charging power of the EV.
3) Reward function: the reward function can be expressed as:
(1)
where $E_{\max}$ is the maximum capacity of the EV battery; the reward takes different forms for the time when the EV is at home and the time when the EV leaves home; $p_t$ is the electricity price at time $t$; and four real-valued coefficients weight the terms of the reward. These coefficients are set to ensure that the power demand and economic benefits of the EV owner are satisfied and that the battery runs in a safe working mode.
During the V2G time of the EV, the product of the electricity price and the charging/discharging power indicates the charging cost at time $t$. Two penalty terms are added for the safe operation of the battery, and a further penalty term is applied when the EV leaves home without being fully charged. In a real-world scenario, different EV owners have different driving distance demands: some are more concerned with driving distance and others with economic benefits. The proposed method considers the EV owner’s demand and uses the coefficient of the not-fully-charged penalty to adjust the characteristics of the model to satisfy different demands; the detailed experiments are shown in Section V.
4) State transition function: the state transition function can be divided into a deterministic part and a stochastic part. In the deterministic part, only the action $a_t$ influences the battery capacity, and the relationship between $E_t$ and $E_{t+1}$ is $E_{t+1} = E_t + a_t\Delta t$, where $\Delta t$ is the length of one time step. In the stochastic part, the transition function, which has unknown transition probability, follows a stochastic conditional probability that is influenced by the randomness of the electricity price and the EV owner’s commuting behavior. In a model-based method, it is difficult to model an environment with such a stochastic conditional probability. This paper presents a model-free method to solve this problem by learning the state transition from unlabeled real-world data without designing an environmental dynamic model. A simplified code sketch of this MDP is given below.
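The following is a minimal Python sketch of the MDP described above. The class name, the coefficient names c1-c4, and the exact penalty forms are illustrative assumptions; only the overall structure (a state of battery energy plus recent prices, a continuous power action, a cost-plus-penalty reward, and the linear battery update) follows the description in this section.

```python
import numpy as np

class EVChargingEnv:
    """Illustrative sketch of the charging MDP; not the authors' exact implementation."""

    def __init__(self, prices, e_max=24.0, p_max=6.0, dt=1.0,
                 c1=1.0, c2=1.0, c3=1.0, c4=1.0, n_hist=24):
        self.prices = np.asarray(prices, dtype=float)        # hourly price series
        self.e_max, self.p_max, self.dt = e_max, p_max, dt
        self.c1, self.c2, self.c3, self.c4 = c1, c2, c3, c4   # assumed coefficients
        self.n_hist = n_hist                                  # N previous hourly prices

    def reset(self, t_arr, t_dep, e_arr):
        self.t, self.t_dep, self.e = t_arr, t_dep, e_arr
        return self._state()

    def _state(self):
        # state = (remaining battery capacity, previous N-hour prices)
        return np.concatenate(([self.e], self.prices[self.t - self.n_hist:self.t]))

    def step(self, a):
        a = float(np.clip(a, -self.p_max, self.p_max))   # continuous charging power
        e_next = self.e + a * self.dt                    # deterministic battery update
        r = -self.c1 * self.prices[self.t] * a * self.dt       # charging cost
        r -= self.c2 * max(0.0, e_next - self.e_max)           # over-charge penalty
        r -= self.c3 * max(0.0, -e_next)                       # over-discharge penalty
        self.e = float(np.clip(e_next, 0.0, self.e_max))
        self.t += 1
        done = self.t >= self.t_dep
        if done:                                         # not fully charged at departure
            r -= self.c4 * (self.e_max - self.e)
        return self._state(), r, done
```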
When an agent performs a task, it chooses an action according to a policy $\pi$ to interact with the environment. After it implements the action, a new state is reached and the environment returns a reward to the agent. This process repeats until the agent completes the task well. The objective of RL can be defined as maximizing the return $R_t=\sum_{i=t}^{T}\gamma^{\,i-t}r(s_i,a_i)$, where the policy $\pi$ creates a mapping between the current state and the action to be applied (the action is modeled as a probability distribution), $r(s_i,a_i)$ is the reward function, $T$ means one episode has $T$ steps, and $\gamma$ is the discount factor used to indicate the importance of future rewards relative to immediate rewards. However, the return may be stochastic, which leads to the objective being stochastic as well. In order to handle this stochasticity, the objective of RL is defined as the expected return $J=\mathbb{E}[R_1]$.
The action value function is used in RL to improve the policy so as to achieve the objective $J$. The action value function $Q^{\pi}(s_t,a_t)$ describes the cumulative expected reward obtained after taking action $a_t$ at state $s_t$, and thereafter following policy $\pi$ [
$Q^{\pi}(s_t,a_t)=\mathbb{E}_{\pi}\Big[\sum_{i=t}^{T}\gamma^{\,i-t}r(s_i,a_i)\,\Big|\,s_t,a_t\Big]$ (2)
Its Bellman equation [
$Q^{\pi}(s_t,a_t)=\mathbb{E}_{r_t,s_{t+1}\sim E}\Big[r(s_t,a_t)+\gamma\,\mathbb{E}_{a_{t+1}\sim\pi}\big[Q^{\pi}(s_{t+1},a_{t+1})\big]\Big]$ (3)
where $E$ denotes the environment.
In this paper, the goal of the ICD is to reduce the charging cost for the EV owner during the period from the arrival time to the departure time. The EV charging scheduling is a sequential decision problem: it is influenced not only by the economic benefits at the current time, but also by the economic benefits and the battery energy in the future. As illustrated in (3), the immediate reward of charging/discharging is $r(s_t,a_t)$, and the discounted action value of the next state represents the future reward.
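As a small illustration of the return that the agent seeks to maximize, the snippet below accumulates the discounted sum of rewards over one episode; the reward values and the discount factor are placeholders.

```python
def discounted_return(rewards, gamma=0.99):
    # R_t = sum_{i=t}^{T} gamma^(i-t) * r_i, accumulated backwards over the episode
    ret = 0.0
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret

# Example with three time steps of charging/discharging rewards
print(discounted_return([-1.2, 0.5, 2.0]))  # approximately 1.255
```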
The proposed method uses a feature analysis model (FAM) to determine the potential patterns from historical electricity price data. Then, RL performs the charging/discharging action based on the received features of the future electricity price and the $E_t$ information. Since the agent of the model has continuous action variables, the number of actions available in a given state is far larger than for an agent with discrete actions, which leads to a much larger Q-table dimension. In the training process, if the Q value of an agent performing continuous actions is calculated by tabular iteration, the computation increases dramatically, leading to a time-consuming training process that is difficult to converge [
$Q^{\mu}(s_t,a_t)=\mathbb{E}_{r_t,s_{t+1}\sim E}\big[r(s_t,a_t)+\gamma\,Q^{\mu}(s_{t+1},\mu(s_{t+1}))\big]$ (4)
The DDPG algorithm is a DRL algorithm based on (4). It consists of two parts, i.e., the critic part and the actor part. The critic part approximates the action value function, and the actor part approximates the strategy function. The connection between the two parts is as follows: the environment provides the state $s_t$ to the agent, and the actor part of the agent makes an action $a_t$ based on $s_t$. When the environment receives $a_t$, it gives the agent a reward $r_t$ and a new state $s_{t+1}$. The agent then updates the critic part according to the reward, and updates the actor part in the direction suggested by the critic part. The algorithm moves to the next step and the process continues until a good actor is achieved, which is reflected by a high total reward.
There are four networks included in the DDPG algorithm [
The DDPG algorithm learns a deterministic strategy. To explore better strategies, we add Gaussian noise to the output action to increase its randomness:
$a_t=\mu(s_t\mid\theta^{\mu})+\mathcal{N}_t$ (5)
In this algorithm, the loss function is defined as [
$y_i=r_i+\gamma\,Q'\big(s_{i+1},\mu'(s_{i+1}\mid\theta^{\mu'})\mid\theta^{Q'}\big)$ (6)
$L=\dfrac{1}{N}\sum_{i=1}^{N}\big(y_i-Q(s_i,a_i\mid\theta^{Q})\big)^2$ (7)
where N is the batch size.
In (6) and (7), the gradient descent method is used to update the critic parameter $\theta^{Q}$ in the direction that reduces the loss. To update the actor network, the gradient is defined as [
$\nabla_{\theta^{\mu}}J\approx\dfrac{1}{N}\sum_{i=1}^{N}\nabla_{a}Q(s,a\mid\theta^{Q})\big|_{s=s_i,a=\mu(s_i)}\nabla_{\theta^{\mu}}\mu(s\mid\theta^{\mu})\big|_{s=s_i}$ (8)
In (8), the parameter $\theta^{\mu}$ of the strategy is updated in the direction that increases the action value $Q$.
In the DDPG algorithm, a “soft” target network parameter updating method is adopted: the critic/actor target network slowly tracks the critic/actor network parameters. This parameter updating method can significantly increase the stability of learning [
$\theta^{Q'}\leftarrow\tau\theta^{Q}+(1-\tau)\theta^{Q'}$ (9)
$\theta^{\mu'}\leftarrow\tau\theta^{\mu}+(1-\tau)\theta^{\mu'}$ (10)
where $\tau \ll 1$.
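For reference, the following is a compact sketch of one DDPG update step corresponding to (5)-(10). It is written in PyTorch for brevity, whereas the experiments in this paper use TensorFlow; the network sizes, learning rates, noise level, and value of $\tau$ are illustrative assumptions rather than the settings used in the experiments.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

state_dim, action_dim, p_max = 25, 1, 6.0
actor, actor_targ = mlp(state_dim, action_dim), mlp(state_dim, action_dim)
critic, critic_targ = mlp(state_dim + action_dim, 1), mlp(state_dim + action_dim, 1)
actor_targ.load_state_dict(actor.state_dict())
critic_targ.load_state_dict(critic.state_dict())
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma, tau = 0.99, 0.005

def select_action(s, noise_std=0.5):
    # Eq. (5): deterministic policy output plus Gaussian exploration noise
    with torch.no_grad():
        a = p_max * torch.tanh(actor(s))
    return (a + noise_std * torch.randn_like(a)).clamp(-p_max, p_max)

def update(s, a, r, s_next, done):
    # Eqs. (6)-(7): critic regression towards the target value y
    with torch.no_grad():
        a_next = p_max * torch.tanh(actor_targ(s_next))
        y = r + gamma * (1 - done) * critic_targ(torch.cat([s_next, a_next], 1))
    critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], 1)), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Eq. (8): the actor ascends the critic's value of its own actions
    actor_loss = -critic(torch.cat([s, p_max * torch.tanh(actor(s))], 1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Eqs. (9)-(10): soft updates of the target networks
    for net, targ in ((critic, critic_targ), (actor, actor_targ)):
        for p, p_targ in zip(net.parameters(), targ.parameters()):
            p_targ.data.mul_(1 - tau).add_(tau * p.data)
```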
Reference [
$g_t=\phi(W_{g}x_t+U_{g}h_{t-1}+b_g)$ (11)
$i_t=\sigma(W_{i}x_t+U_{i}h_{t-1}+b_i)$ (12)
$f_t=\sigma(W_{f}x_t+U_{f}h_{t-1}+b_f)$ (13)
$o_t=\sigma(W_{o}x_t+U_{o}h_{t-1}+b_o)$ (14)
$c_t=f_t\odot c_{t-1}+i_t\odot g_t$ (15)
$h_t=o_t\odot\phi(c_t)$ (16)
where $g_t$, $i_t$, $f_t$, $o_t$, $c_t$, and $h_t$ are the input node, input gate, forget gate, output gate, cell state, and hidden state, respectively; $W$ and $U$ are the input and recurrent weight matrices; $b_g$, $b_i$, $b_f$, and $b_o$ are the vectors of biases; $\phi$ is the tanh function; $\sigma$ is the sigmoid function; and $\odot$ is the element-wise multiplication operation.
The LSTM NN has two key features. One is the cell state $c_t$, which has a recurrent self-connected edge with a constant weight of 1 to overcome gradient vanishing and gradient explosion [
The architecture of JANET retains the two features of LSTM but removes the input gate $i_t$ and the output gate $o_t$. In addition, although the output activation $\phi(c_t)$ in $h_t$ brings the same dynamic output range to each cell, it also causes training difficulties [
The proposed method has four JANET layers. The previous 24-hour electricity price data are processed by an input weight matrix, which is an optimization parameter, and fed into the first JANET layer. The features of the future electricity price are the outputs of the fourth JANET layer.
Electricity price data are processed before they are input into the JANET cell.
(17)
After the data flow into the JANET cell, the hidden state of the first layer is computed as:
$f_t^{1}=\sigma(W_f x_t+U_f h_{t-1}^{1}+b_f)$ (18)
$\tilde{c}_t^{1}=\phi(W_c x_t+U_c h_{t-1}^{1}+b_c)$ (19)
$c_t^{1}=f_t^{1}\odot c_{t-1}^{1}+(1-f_t^{1})\odot\tilde{c}_t^{1}$ (20)
$h_t^{1}=c_t^{1}$ (21)
where e indicates the
At the fourth layer, the features of future electricity price can be calculated as:
(22)
where the output weight matrix is an optimization parameter. Then, in order to update the parameters of JANET, the loss function can be defined as:
(23)
where $p_t$ is the electricity price at the current time $t$.
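The following NumPy sketch shows a single JANET cell step with only the forget gate retained and the hidden state equal to the cell state, consistent with the structure described above. The weight initialization, hidden size, and price values are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class JANETCell:
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        self.W_f = rng.uniform(-s, s, (hidden_size, input_size))
        self.U_f = rng.uniform(-s, s, (hidden_size, hidden_size))
        self.b_f = np.zeros(hidden_size)
        self.W_c = rng.uniform(-s, s, (hidden_size, input_size))
        self.U_c = rng.uniform(-s, s, (hidden_size, hidden_size))
        self.b_c = np.zeros(hidden_size)

    def step(self, x_t, h_prev, c_prev):
        f_t = sigmoid(self.W_f @ x_t + self.U_f @ h_prev + self.b_f)   # forget gate
        c_tilde = np.tanh(self.W_c @ x_t + self.U_c @ h_prev + self.b_c)
        c_t = f_t * c_prev + (1.0 - f_t) * c_tilde                     # cell state
        h_t = c_t                        # no output gate and no output activation
        return h_t, c_t

# Example: feed 24 hourly prices through one JANET layer
cell = JANETCell(input_size=1, hidden_size=16)
h = c = np.zeros(16)
for price in np.random.default_rng(1).uniform(20, 60, 24):
    h, c = cell.step(np.array([price]), h, c)
```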
As shown in Fig. 2, the proposed DRL method combines the DDPG algorithm and the JANET NN to perform real-time optimization of the EV charging management strategy.

Fig. 2 DRL method combining DDPG algorithm and JANET NN to perform real-time optimization of EV charging management strategy.
The training process of the FAM is performed in a supervised manner. The training data contain electricity prices for the first 200 days of 2017 [
After the training of FAM is completed, the training of DM can be implemented based on the FAM output. The training process and the main parameters of the DDPG are shown in
At the beginning of the DM training process, the replay buffer and four NNs are established. The purpose of establishing replay buffer [
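A minimal sketch of such a replay buffer is given below; the capacity and batch size are illustrative values rather than the settings used in the experiments.

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # store one transition (s_t, a_t, r_t, s_{t+1}, done)
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # uniform random sampling breaks the temporal correlation of transitions
        batch = random.sample(self.buffer, batch_size)
        return map(list, zip(*batch))

    def __len__(self):
        return len(self.buffer)
```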
The complete workflow of the proposed method is shown in Fig. 3.

Fig. 3 Complete workflow of proposed method.
To model the randomness of commuting behavior, the arrival time, the departure time, and the battery energy on arrival are generated randomly. The battery energy of the EV when arriving home follows a normal distribution. The arrival time and departure time follow uniform distributions and are sampled from the sets {15, 16, 17, 18, 19, 20} and {6, 7, 8, 9, 10, 11}, respectively.
We use a FIAT 500e with a battery storage of 24 kWh in the experiments. The maximum charging power and discharging power of the battery are assumed to be 6 kW and -6 kW, respectively.
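The commuting behavior can be sampled as in the sketch below. The arrival and departure hours are drawn from the sets given above; the mean and standard deviation of the battery energy on arrival are placeholders, since the exact distribution parameters are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_commute(e_max=24.0, mu=12.0, sigma=3.0):    # mu and sigma are assumed values
    t_arr = int(rng.choice([15, 16, 17, 18, 19, 20]))  # arrival hour on day n
    t_dep = int(rng.choice([6, 7, 8, 9, 10, 11]))      # departure hour on day n+1
    e_arr = float(np.clip(rng.normal(mu, sigma), 0.0, e_max))
    return t_arr, t_dep, e_arr

print(sample_commute())
```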
The experimental environment is implemented in Python using TensorFlow. The experimental workstation is a computer with an Intel Core i5-6300HQ CPU and an NVIDIA GTX 960M GPU.
1) Training results: the training data contain the data for the first 200 days of 2017 [

Fig. 4 Training results of FAM and DM. (a) Training accuracy of FAM. (b) Cumulative reward (average value) of each episode during DM training process.
2) Model performance: the test data are from days 201-300 of 2017 [

Fig. 5 Comparison of forecasted and actual electricity prices for days 201-230 of 2017.
The electricity price and charging/discharging behavior in four consecutive days are illustrated in Fig. 6.

Fig. 6 Electricity price and charging/discharging behavior in four consecutive days. (a)
In
The detailed calculation of the charging cost is presented in the next section. In addition, a trained model can make a decision within 3 ms, which fully meets the requirement of online operation.
In a real-world scenario, different people have different driving distances to their individual destinations. Those who drive a long distance pay more attention to the battery energy at departure than to the economic benefits. In contrast, those who drive a short distance pay more attention to the economic benefits than to the battery energy at departure. To measure the EV owner’s preference, a coefficient in (1) is introduced, and the two scenarios can be switched simply by adjusting this coefficient. Specifically, it is set to 2 for users with a long driving distance and 1 for users with a short driving distance. The detailed parameters mentioned in (1) are summarized in
The training data and test data mentioned in case 1 are used in this case to investigate the performances of different FAMs and the effect of the FAMs on the DM. To further investigate whether the combination of a convolutional NN (CNN) and an RNN has a stronger ability to extract the temporal pattern than a single RNN, an additional comparative model that combines the CNN and the RNN is added for each RNN [
The prediction accuracy of different models is tested first. All models have the same parameters and training episodes, as shown in
$\mathrm{MSE}=\dfrac{1}{|T_e|}\sum_{t\in T_e}\big(\hat{p}_t-p_t\big)^2$ (24)
where $T_e$ is the experiment time (the set of test time steps); MSE is the mean square error; $\hat{p}_t$ is the prediction value; $p_t$ is the real electricity price; and $|T_e|$ is the number of elements in the set.
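Equation (24) can be evaluated with a few lines of code; the sketch below simply computes the mean square error between the predicted and real prices over a test period, using placeholder values.

```python
import numpy as np

def mse(p_hat, p):
    p_hat, p = np.asarray(p_hat, dtype=float), np.asarray(p, dtype=float)
    return float(np.mean((p_hat - p) ** 2))

print(mse([31.0, 28.5, 40.2], [30.0, 29.0, 41.0]))  # 0.63
```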
The prediction errors of the eight models in the 100-day test set are shown in Fig. 7.

Fig. 7 Prediction errors of eight models in 100-day test set.
After studying the accuracy of the different FAMs in predicting the future electricity price trend, the effect of the different FAMs on the DM is investigated. To visualize the differences among the eight models, the cumulative charging cost of the JANET model in the 100-day test set is subtracted from that of each model. The cumulative cost data are obtained by combining the eight RNN models with the same DM, and the results are shown in Fig. 8.

Fig. 8 Differences in cumulative charging cost of each model in 100-day test set and JANET model.
Specifically, the differences in cumulative charging cost of the CNN + JANET, CNN + GRU, GRU, CNN + LSTM, LSTM, BiLSTM, and CNN + BiLSTM models relative to the JANET model are $0.6, $0.88, $1.15, $1.39, $1.56, $1.7, and $2.37 for the same DM, respectively.
The proposed method has two components, the FAM and the DM.
We propose a DRL-based method for the charging strategy to reduce the charging cost for the EV owner. The proposed method uses JANET, an improved version of LSTM, as the FAM to extract the variation regularity of electricity price, and applies a DRL algorithm to make decisions based on the extracted features. The proposed method combines the feature extraction ability of deep learning and the decision-making ability of RL, and provides better robustness for the uncertainty of electricity price and EV owner’s commuting behavior.
The research in this paper is similar to [
The results of case 1 show that the proposed method can learn an optimal charging strategy to manage the dynamics of electricity price.
In case 2, the effect of the FAMs on the DM is investigated. Figures
In order to further verify the effectiveness of the proposed method, different benchmark methods are investigated. The training data and test data of these methods are the same as those in case 1. The proposed method is compared with several baselines as follows.
1) RL-based methods: including DQN charging method in [
2) Unmanaged strategy: the unmanaged strategy charges the battery with the maximum power of 6 kW from the arrival time until the battery storage is full (a code sketch of this baseline is given after this list).
3) Theoretical limit: for the theoretical limit (computed with a MATLAB toolbox), the arrival time, the departure time, the battery energy on arrival, and the electricity price are known in advance, and a globally optimal decision can be made.
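As an illustration, the unmanaged baseline can be sketched as follows; the function name and the convention of hourly prices indexed by hour are assumptions.

```python
def unmanaged_cost(prices, t_arr, e_arr, e_max=24.0, p_max=6.0, dt=1.0):
    # charge at the maximum 6 kW from arrival until the battery is full,
    # paying the spot price for each hour
    e, cost, t = e_arr, 0.0, t_arr
    while e < e_max:
        power = min(p_max, (e_max - e) / dt)   # last step may be a partial hour
        cost += prices[t] * power * dt
        e += power * dt
        t += 1
    return cost
```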
Considering the probabilistic event that the battery is not full at 24 kWh at departure in the DQN, DQWJ, DWN, DWCB, and proposed methods, the cumulative charging cost of these methods can be calculated as:
(25)
where the energy still required to fill the battery at departure is assumed to be charged at the first price greater than 0 after the departure time. The same rules are also applied in Tables
(26)
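The sketch below illustrates one possible reading of this cost accounting: the cost paid while the EV is plugged in, plus the cost of charging any energy still missing at departure at the first price greater than zero afterwards. It is an interpretation of the surrounding text, not a verified reproduction of (25) and (26).

```python
def cumulative_cost(prices, actions, t_arr, t_dep, e_dep, e_max=24.0, dt=1.0):
    # cost of the charging/discharging actions while plugged in
    cost = sum(prices[t] * a * dt for t, a in zip(range(t_arr, t_dep), actions))
    if e_dep < e_max:                 # battery not full at departure
        t = t_dep
        while prices[t] <= 0:         # first price greater than 0 after departure
            t += 1
        cost += prices[t] * (e_max - e_dep)
    return cost
```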
The cumulative charging costs of all methods in the 100-day test set are shown in Fig. 9.

Fig. 9 Comparison of cumulative charging costs between proposed charging method and four other charging methods in 100-day test set.
By analyzing the results in
The DRL model of EV charging proposed in this paper can provide customized charging strategies for any specific EV to reduce charging costs. In addition, any deferrable load, not limited to EVs, can use a variant of the proposed method to produce certain economic benefits. However, if the proposed model is adopted by EV owners on a large scale, so that they all avoid the peak price and charge at relatively inexpensive times, the electricity price will rebound due to the economic regulation of the market, which will introduce new uncertainties. In order to avoid introducing the new uncertainties into large-scale optimization, the spatiotemporal pattern based system [
The EV is a leading product driving a new industrial revolution. To promote the transformation of the market from fuel vehicles to EVs, consumer choice is a critical factor. Therefore, it is necessary to develop a strategy for reducing the EV charging cost to encourage EV purchases.
In this context, we propose a DRL-based method that combines the feature extraction ability of deep learning and the decision-making ability of RL for an EV charging strategy that reduces the charging cost for the EV owner. The proposed method uses JANET, an improved version of LSTM, as the FAM to extract the variation regularity of electricity price, and applies a DRL algorithm to make decisions based on the extracted features. The simulation results show that the proposed method can reduce the charging cost by up to 70.2% compared with other methods.
References
Z. Liu, Q. Wu, L. Christensen et al., “Driving pattern analysis of Nordic region based on national travel surveys for electric vehicle integration,” Journal of Modern Power Systems and Clean Energy, vol. 3, no. 2, pp. 180-189, Jun. 2015.
T. Chen, X. Zhang, J. Wang et al., “A review on electric vehicle charging infrastructure development in the UK,” Journal of Modern Power Systems and Clean Energy, vol. 8, no. 2, pp. 193-205, Mar. 2020.
S. Li, W. Hu, D. Cao et al., “A multi-agent deep reinforcement learning-based approach for the optimization of transformer life using coordinated electric vehicles,” IEEE Transactions on Industrial Informatics. doi: 10.1109/TII.2021.3139650
R. Gough, C. Dickerson, P. Rowley et al., “Vehicle-to-grid feasibility: a techno-economic analysis of EV-based energy storage,” Applied Energy, vol. 192, pp. 12-23, Apr. 2017.
J. Hu, H. Zhou, Y. Li et al., “Multi-time scale energy management strategy of aggregator characterized by photovoltaic generation and electric vehicles,” Journal of Modern Power Systems and Clean Energy, vol. 8, no. 4, pp. 727-736, Jul. 2020.
H. Patil and V. N. Kalkhambkar, “Grid integration of electric vehicles for economic benefits: a review,” Journal of Modern Power Systems and Clean Energy, vol. 9, no. 1, pp. 13-26, Jan. 2021.
E. B. Iversen, J. M. Morales, and H. Madsen, “Optimal charging of an electric vehicle using a Markov decision process,” Applied Energy, vol. 123, pp. 1-12, Jun. 2014.
W. Hu, C. Su, Z. Chen et al., “Optimal operation of plug-in electric vehicles in power systems with high wind power penetrations,” IEEE Transactions on Sustainable Energy, vol. 4, no. 3, pp. 577-585, Jul. 2013.
C. Jin, J. Tang, and P. Ghosh, “Optimizing electric vehicle charging: a customer’s perspective,” IEEE Transactions on Vehicular Technology, vol. 62, no. 7, pp. 2919-2927, Sept. 2013.
A. Ravey, R. Roche, B. Blunier et al., “Combined optimal sizing and energy management of hybrid electric vehicles,” in Proceedings of 2012 IEEE Transportation Electrification Conference and Expo (ITEC), Dearborn, USA, Jun. 2012, pp. 1-6.
D. Cao, W. Hu, J. Zhao et al., “Reinforcement learning and its applications in modern power and energy systems: a review,” Journal of Modern Power Systems and Clean Energy, vol. 8, no. 6, pp. 1029-1042, Nov. 2020.
H. He, C. Wang, H. Jia et al., “An intelligent braking system composed single-pedal and multi-objective optimization neural network braking control strategies for electric vehicle,” Applied Energy, vol. 259, pp. 114-172, Feb. 2020.
H. Zhou, Y. Zhou, J. Hu et al., “LSTM-based energy management for electric vehicle charging in commercial-building prosumers,” Journal of Modern Power Systems and Clean Energy, vol. 9, no. 5, pp. 1205-1216, Sept. 2021.
Y. Wu, H. Tan, J. Peng et al., “Deep reinforcement learning of energy management with continuous control strategy and traffic information for a series-parallel plug-in hybrid electric bus,” Applied Energy, vol. 247, pp. 454-466, Aug. 2019.
Z. Wan, H. Li, H. He et al., “Model-free real-time EV charging scheduling based on deep reinforcement learning,” IEEE Transactions on Smart Grid, vol. 10, no. 5, pp. 5246-5257, Sept. 2019.
A. Chis, J. Lunden, and V. Koivunen, “Reinforcement learning-based plug-in electric vehicle charging with forecasted price,” IEEE Transactions on Vehicular Technology, vol. 66, no. 5, pp. 3674-3684, May 2017.
Z. Chen, C. Mi, J. Xu et al., “Energy management for a power-split plug-in hybrid electric vehicle based on dynamic programming and neural networks,” IEEE Transactions on Vehicular Technology, vol. 63, no. 4, pp. 1567-1580, May 2014.
J. Moreno, M. E. Ortuzar, and J. W. Dixon, “Energy-management system for a hybrid electric vehicle using ultracapacitors and neural networks,” IEEE Transactions on Industrial Electronics, vol. 53, no. 2, pp. 614-623, May 2006.
M. R. Shaarbaf and M. Ghayeni, “Identification of the best charging time of electric vehicles in fast charging stations connected to smart grid based on Q-learning,” in Proceedings of 2018 Electrical Power Distribution Conference (EPDC), Tehran, Iran, May 2018, pp. 78-83.
R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning. Cambridge: MIT Press, 1998.
S. Vandael, B. Claessens, D. Ernst et al., “Reinforcement learning of heuristic EV fleet charging in a day-ahead electricity market,” IEEE Transactions on Smart Grid, vol. 6, no. 4, pp. 1795-1805, Jul. 2015.
S. Dimitrov and R. Lguensat, “Reinforcement learning based algorithm for the maximization of EV charging station revenue,” in Proceedings of 2014 International Conference on Mathematics and Computers in Sciences and in Industry, Varna, Bulgaria, Sept. 2014, pp. 235-239.
D. O’Neill, M. Levorato, A. Goldsmith et al., “Residential demand response using reinforcement learning,” in Proceedings of IEEE SmartGrid Communications 2010, Gaithersburg, USA, Oct. 2010, pp. 409-414.
D. Osmankovic and S. Konjicija, “Implementation of Q-Learning algorithm for solving maze problem,” in Proceedings of the 34th International Convention MIPRO, Opatija, Croatia, May 2011, pp. 1619-1622.
V. Mnih, K. Kavukcuoglu, D. Silver et al. (2013, Dec.). Playing Atari with deep reinforcement learning. [Online]. Available: https://arxiv.org/abs/1312.5602
V. Mnih, K. Kavukcuoglu, D. Silver et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, pp. 529-533, Feb. 2015.
D. Silver, A. Huang, C. J. Maddison et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, pp. 484-489, Jan. 2016.
E. Oh and H. Wang, “Reinforcement-learning-based energy storage system operation strategies to manage wind power forecast uncertainty,” IEEE Access, vol. 8, pp. 20965-20976, Jan. 2020.
C. Chen, M. Cui, F. Li et al., “Model-free emergency frequency control based on reinforcement learning,” IEEE Transactions on Industrial Informatics, vol. 17, no. 4, pp. 2336-2346, Apr. 2021.
M. Gheisarnejad, H. Farsizadeh, and M. H. Khooban, “A novel non-linear deep reinforcement learning controller for DC/DC power buck converters,” IEEE Transactions on Industrial Electronics. doi: 10.1109/TIE.2020.3005071
Z. Wan, H. Li, H. He et al., “A data-driven approach for real-time residential EV charging management,” in Proceedings of 2018 IEEE PES General Meeting (PESGM’2018), Portland, USA, Jul. 2018, pp. 1-5.
T. P. Lillicrap, J. J. Hunt, A. Pritzel et al., “Continuous control with deep reinforcement learning,” in Proceedings of ICLR 2016: International Conference on Learning Representations 2016, San Juan, Puerto Rico, May 2016, p. 6.
J. van der Westhuizen and J. Lasenby. (2018, Jan.). The unreasonable effectiveness of the forget gate. [Online]. Available: https://arxiv.org/abs/1804.04849
R. Bellman, Dynamic Programming. New York: Dover Publications, 1957.
K. Cho, B. van Merrienboer, C. Gulcehre et al., “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, Oct. 2014, pp. 1724-1734.
W. Zaremba, I. Sutskever, and O. Vinyals. (2014, Jun.). Recurrent neural network regularization. [Online]. Available: https://arxiv.org/abs/1409.2329
Z. C. Lipton, J. Berkowitz, and C. Elkan. (2015, Jan.). A critical review of recurrent neural networks for sequence learning. [Online]. Available: https://arxiv.org/abs/1506.00019
S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, pp. 1735-1780, Sept. 1997.
A. Graves, A. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in Proceedings of 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, Canada, May 2013, pp. 6645-6649.
F. A. Gers, N. N. Schraudolph, and J. A. Schmidhuber, “Learning precise timing with LSTM recurrent networks,” Journal of Machine Learning Research, vol. 3, pp. 115-143, Mar. 2003.
F. A. Gers, J. A. Schmidhuber, and F. A. Cummins, “Learning to forget: continual prediction with LSTM,” Neural Computation, vol. 12, pp. 2451-2471, Dec. 2000.
X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS-11), Ft. Lauderdale, USA, Nov. 2011, pp. 315-323.
PJM. (2017, Mar.). Zone COMED. [Online]. Available: https://www.engieresources.com/
G. Lai, W. C. Chang, Y. Yang et al., “Modeling long- and short-term temporal patterns with deep neural networks,” in Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, USA, Jun. 2018, pp. 95-104.
R. Dubey, S. R. Samantaray, and B. K. Panigrahi, “A spatiotemporal information system based wide-area protection fault identification scheme,” International Journal of Electrical Power & Energy Systems, vol. 89, pp. 136-145, Dec. 2017.
M. Cui, J. Wang, and B. Chen, “Flexible machine learning-based cyberattack detection using spatiotemporal patterns for distribution systems,” IEEE Transactions on Smart Grid, vol. 11, no. 2, pp. 1805-1808, Mar. 2020.
S. Sun, Q. Yang, and W. Yan, “Optimal temporal-spatial PEV charging scheduling in active power distribution networks,” Protection and Control of Modern Power Systems, vol. 2, no. 1, pp. 1-10, Jan. 2017.
S. R. Etesami, W. Saad, N. B. Mandayam et al., “Smart routing of electric vehicles for load balancing in smart grids,” Automatica, vol. 120, p. 109148, Oct. 2020.
T. Ding, Z. Zeng, J. Bai et al., “Optimal electric vehicle charging strategy with Markov decision process and reinforcement learning technique,” IEEE Transactions on Industry Applications, vol. 56, no. 5, pp. 5811-5823, May 2020.