Abstract
The intermittency of renewable energy generation, the variability of load demand, and the stochasticity of market prices pose direct challenges to the optimal energy management of microgrids. To cope with these different forms of operation uncertainty, an imitation learning based real-time decision-making solution for microgrid economic dispatch is proposed. In this solution, the optimal dispatch trajectories, obtained by solving the optimization problem using historical deterministic operation patterns, serve as the expert samples for imitation learning. To improve the generalization performance of imitation learning and the expressive ability for uncertain variables, a hybrid model combining unsupervised and supervised learning is utilized. A denoising autoencoder based unsupervised learning model is adopted to enhance the feature extraction of operation patterns. Furthermore, a long short-term memory network based supervised learning model is used to efficiently characterize the mapping between the input space, composed of the extracted operation patterns and system state variables, and the output space, composed of the optimal dispatch trajectories. The numerical simulation results demonstrate that, under various operation uncertainties, the operation cost achieved by the proposed solution is close to the minimum theoretical value. Compared with the traditional model predictive control method and the basic clone imitation learning method, the operation cost of the proposed solution is reduced by 6.3% and 2.8%, respectively, over a test period of three months.
There is a broad consensus that a high proportion of renewable energy generation is key to achieving a low-carbon energy supply and solving environmental problems [
In practice, energy storage systems (ESSs) including the vehicle-to-everything mobile ESS [
In the literature, a number of microgrid economic dispatch solutions have been proposed to cope with the aforementioned multiple uncertainties, covering day-ahead scheduling, intra-day optimization, and real-time dispatch.
The optimal day-ahead scheduling solutions generally require precise predictions of system power production and demand [
Based on the day-ahead energy scheduling, the intra-day optimization strategy can be implemented according to the real-time operation and the latest forecast information for the energy scheduling in the coming hours. These rolling/receding horizon optimization solutions combine offline and online methods to mitigate the problem of variability and uncertainty through their predictive and self-correcting capabilities. Since the latest system states can be updated with more accurate information in the intra-day stage, two-stage optimization and closed-loop model predictive control (MPC) frameworks have been investigated. In [
To further reduce uncertainties and eliminate the effect of prediction errors, real-time dispatch has received increasing attention. These online solutions do not rely on cumbersome predictions of multiple random variables. This stochastic sequential decision problem is often modeled as a Markov decision process (MDP), which uses Bellman’s equation to decompose the temporal dependency and partition the large-scale optimization. However, the high-dimensional decision space may lead to the “curse of dimensionality” for MDP methodologies. To address these challenges, approximate dynamic programming (ADP) and reinforcement learning (RL) have been developed to solve Bellman’s equation through value function approximation (VFA) [
To overcome the above limitations of RL, imitation learning (IL) based economic dispatch methods have attracted increasing attention. IL can greatly enhance the efficiency of RL in decision-making by learning from demonstration samples with expert knowledge. IL methods generally fall into two categories: behavior clone learning (BCL) and inverse reinforcement learning (IRL). BCL methods imitate the expert-suggested demonstrations through supervised learning to realize the action decision under the corresponding state. IRL methods adopt a similar structure to RL, but the reward function in IRL is unknown; IRL recovers the optimal reward function by matching it with expert demonstrations. IRL tries to find the underlying intent of the expert policy so that it can provide a better generalization policy for unseen states or environments with slightly different dynamics. In contrast, the model parameters of BCL, which involves no policy learning process, are easier to train and optimize, and BCL is more convenient to deploy and reproduce.
Reference [
| Framework | Main technique | Reference |
| --- | --- | --- |
| Day-ahead scheduling | Stochastic optimization | [ |
| | Fuzzy optimization | [ |
| | Robust optimization | [ |
| Intra-day optimization | Two-stage approach | [ |
| | Rolling optimization | [ |
| Real-time dispatch | RL | [ |
| | IL | [ |
Compared with RL methods commonly used in a real-time fashion, IL-based economic dispatch methods offer the advantage of fully exploring the pattern distribution in historical data and making more efficient use of high-quality demonstration samples derived from expert experience. Following the IL process, the intelligent decision-making model can be deployed on edge computing platforms for extended periods of time, while minimizing latency and bandwidth requirements. Unlike existing MPC-based solutions, this direct inference approach without the need for iterative optimization saves significant field computing resources and reduces communication delays and congestion [
A successful clone IL model that can make accurate inferences to discriminate between different situations requires plentiful labeled training samples with sufficient diversity to maximize the pattern information in the data. However, the high cost of demonstration labeling or the inaccessibility of labeled samples is often the main cause of the “over-fitting” phenomenon in machine learning [
Thus, based on BCL, a hybrid model combining unsupervised learning and supervised learning is developed in this study to further enhance the learning accuracy and generalization capability of the decision-making solution for economic dispatch. The main ideas of this solution are as follows: the vast amount of historical data on the cloud platform are leveraged to analyze stochastic variables with inherent uncertainties of wind, photovoltaic (PV), load, and real-time price (RTP) through unsupervised learning to obtain the latent representation of the system operation patterns. Then, the decision-making sequences of the economic dispatch for certain historical days are recalculated by modeling the offline optimization problem. Since the optimization problem is solved after the fact and the conditions that have occurred are already known, there are no uncertainties involved, thus allowing for the attainment of an optimal dispatch. Afterward, the supervised learning model is applied to learn, remember, and understand the complex mapping between the optimal dispatch and the corresponding operation patterns. Finally, by utilizing sensing devices to obtain the latest system information, the well-trained model can be deployed to the edge and perform real-time economic dispatch based on actual operation conditions.
The main contributions are twofold.
1) An IL-based decision-making solution is developed to realize real-time economic dispatch, which substantially reduces the need for the precise forecasting of multiple stochastic variables and the development of sophisticated policies.
2) A hybrid model combining unsupervised learning and supervised learning is utilized to learn the optimal dispatch of different operation patterns using expert demonstrations, which improves the generalization ability of the proposed solution under multiple operation uncertainties.
The remainder of this paper is organized as follows. The system modeling and the formulation of the economic dispatch problem are presented in Section II. Section III presents the proposed IL-based real-time decision-making solution for microgrid economic dispatch. Section IV extensively evaluates the proposed solution and analyzes the numerical findings. Finally, conclusions are drawn in Section V.
A grid-connected microgrid with a cloud-edge architecture is examined in this study, which connects to the utility grid through the point of common coupling (PCC) and contains various distributed resources, i.e., PV sources, micro wind turbines (WTs), and a battery energy storage system (BESS), as illustrated in Fig. 1.

Fig. 1 Illustration of grid-connected microgrid with cloud-edge architecture.
In this study, the PV sources, WTs, and load demands are considered non-dispatchable. The ESS is a dispatchable unit that coordinates the renewable energy generation and demand during the economic dispatch. The decision-making of economic dispatch in the microgrid aims to optimize the use of RESs to reduce imbalances between the power generation and demand, while minimizing operation costs in a real-time pricing environment and improving the lifespan of storage devices. The economic dispatch of a microgrid can be formulated as an optimization problem that considers long-term economic objectives and operation constraints.
Since regional microgrids are often located within a limited geographical area, the power loss is negligible. The power balance constraint is formulated as:
$P_t^{\mathrm{WT}} + P_t^{\mathrm{PV}} + P_t^{\mathrm{g}} + P_t^{\mathrm{b}} = P_t^{\mathrm{L}}$ (1)

where $P_t^{\mathrm{WT}}$, $P_t^{\mathrm{PV}}$, and $P_t^{\mathrm{L}}$ are the power of WTs, PV sources, and loads in time slot $t$, respectively, which are the non-dispatchable variables; $P_t^{\mathrm{g}}$ is the exchanged power absorbed/injected by/to the utility grid, and a positive $P_t^{\mathrm{g}}$ means purchasing electricity from the utility grid; and $P_t^{\mathrm{b}}$ is the dispatched power of the BESS. When the BESS is discharging, $P_t^{\mathrm{b}}$ is positive, and when the BESS is charging, $P_t^{\mathrm{b}}$ is negative.
The total operation cost of a microgrid mainly includes two components: the electricity purchasing cost from the utility grid ($C_t^{\mathrm{g}}$) and the BESS deterioration cost due to charging and discharging ($C_t^{\mathrm{b}}$). The utility grid with sufficient capacity enables the power of the microgrid to be fed back at the same electricity price. Thus, the objective of economic dispatch over a long-term optimization horizon is:
$\min \sum_{t \in T} \left[ C_t^{\mathrm{g}} + C_t^{\mathrm{b}}\left( \mathrm{SOH}_t \right) \right]$ (2)

where $\mathrm{SOH}_t$ is the state of health (SOH) of the BESS; and $T$ is the set of time slots for the long-term objective (typically one day) in the global optimization. The itemized costs are shown as:
$C_t^{\mathrm{g}} = \lambda_t P_t^{\mathrm{g}} \Delta t$ (3)

$C_t^{\mathrm{b}} = k^{\mathrm{b}} \left( \mathrm{SOH}_t - \mathrm{SOH}_{t+1} \right)$ (4)

where $\lambda_t$ is the RTP of the electricity of the power grid in time slot $t$; and $k^{\mathrm{b}}$ is the degradation coefficient of the BESS.
The SOH degradation iteration of the BESS caused by charging and discharging cycles is formulated as [
$\mathrm{SOH}_{t+1} = \mathrm{SOH}_t - d_t$ (5)

where $d_t$ is the degradation factor associated with the change in the state of charge (SOC) of the BESS, which can be calculated as [

$d_t = a \left| \Delta \mathrm{SOC}_t \right|^{b} \mathrm{e}^{c \left| \Delta \mathrm{SOC}_t \right|}$ (6)

where $a$, $b$, and $c$ are the degradation parameters determined by the BESS characteristics from empirical tests.
The SOC change of the BESS, i.e., $\Delta \mathrm{SOC}_t = \mathrm{SOC}_{t+1} - \mathrm{SOC}_t$, is determined by the charging or discharging power:

$\Delta \mathrm{SOC}_t = \begin{cases} -\dfrac{P_t^{\mathrm{b}} \Delta t}{\eta_{\mathrm{d}} E^{\mathrm{cap}}} & P_t^{\mathrm{b}} \geq 0 \\ -\dfrac{\eta_{\mathrm{c}} P_t^{\mathrm{b}} \Delta t}{E^{\mathrm{cap}}} & P_t^{\mathrm{b}} < 0 \end{cases}$ (7)

where $\eta_{\mathrm{c}}$ and $\eta_{\mathrm{d}}$ are the efficiency coefficients of charging and discharging, respectively; $\Delta t$ is the time interval; and $E^{\mathrm{cap}}$ is the capacity of the BESS. The SOC is restricted to [0.2, 0.8] to prevent BESS deterioration caused by deep charging and discharging.
The dispatched power output constraint of the BESS satisfies:
$P_{\min}^{\mathrm{b}} \leq P_t^{\mathrm{b}} \leq P_{\max}^{\mathrm{b}}$ (8)

where $P_{\min}^{\mathrm{b}}$ and $P_{\max}^{\mathrm{b}}$ are the lower and upper limits of the dispatched power of the BESS, respectively.
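As a concrete illustration of the model in (1)-(8), the following sketch simulates one dispatch step: it computes the grid exchange from the power balance, applies the SOC update with charging/discharging efficiencies, clips the SOC to [0.2, 0.8], and evaluates the purchase cost. All parameter values (efficiencies, capacity, interval) are placeholders, not the paper's settings.

```python
import numpy as np

def step_microgrid(p_wt, p_pv, p_load, p_b, soc, rtp,
                   eta_c=0.95, eta_d=0.95, e_cap=500.0, dt=0.25):
    """One dispatch step of the microgrid model (hypothetical parameters).

    p_b > 0 means the BESS discharges; p_b < 0 means it charges (kW).
    Returns the grid exchange (kW, positive = purchasing), the updated
    SOC, and the electricity purchase cost for the slot ($).
    """
    # Power balance (1): WT + PV + grid + BESS = load
    p_grid = p_load - p_wt - p_pv - p_b
    # Purchase cost (3); power is fed back at the same price
    cost_g = rtp * p_grid * dt
    # SOC update (7): discharging draws SOC through eta_d, charging stores eta_c
    if p_b >= 0:
        soc_new = soc - p_b * dt / (eta_d * e_cap)
    else:
        soc_new = soc - eta_c * p_b * dt / e_cap
    # SOC limit [0.2, 0.8] to avoid deep cycling
    soc_new = float(np.clip(soc_new, 0.2, 0.8))
    return p_grid, soc_new, cost_g

# Example slot: 50 kW wind, 30 kW PV, 100 kW load, 10 kW discharge, RTP 0.4 $/kWh
p_grid, soc_new, cost_g = step_microgrid(50.0, 30.0, 100.0, 10.0, soc=0.5, rtp=0.4)
```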
Once the renewable energy generation, demand, and electricity price are known, the economic dispatch can be formulated as a deterministic optimization problem. By solving this problem, the optimal dispatch over a day under different operation patterns from the historical data can be obtained. Different types of optimization tools can be used for this purpose, such as the commercial solvers Gurobi and CPLEX. For the optimization problem established in Section II-A, heuristic algorithms are a powerful approach to such non-convex optimization problems. The solved results can be evaluated with expert experience to obtain a dispatch decision as close as possible to the optimal solution. Among the heuristic algorithms, the particle swarm optimization (PSO) algorithm is considered efficient with minimal implementation complexity [
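To illustrate how PSO can search for a daily dispatch trajectory, the minimal sketch below optimizes a BESS dispatch sequence against a toy net-load and RTP profile. All profile values and PSO coefficients are hypothetical, and the SOC limits are handled by a penalty term rather than the full model of Section II.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-day profiles, shortened to 4 slots for illustration
net_load = np.array([40.0, 60.0, 80.0, 50.0])  # load - WT - PV (kW)
price = np.array([0.10, 0.30, 0.50, 0.20])     # RTP ($/kWh)
dt, e_cap, p_max = 0.25, 100.0, 40.0           # interval (h), capacity (kWh), BESS limit (kW)

def cost(p_b):
    """Daily cost of a BESS dispatch sequence; SOC bounds enforced by penalty."""
    p_grid = net_load - p_b                      # grid covers the residual balance
    energy_cost = np.sum(price * p_grid * dt)    # power fed back at the same price
    soc = 0.5 - np.cumsum(p_b) * dt / e_cap      # simplified SOC trajectory (lossless)
    violation = np.sum(np.maximum(soc - 0.8, 0.0) + np.maximum(0.2 - soc, 0.0))
    return energy_cost + 1e3 * violation

# Standard PSO loop: inertia 0.7, cognitive/social coefficients 1.5
n, T, iters = 30, len(net_load), 200
x = rng.uniform(-p_max, p_max, (n, T))
v = np.zeros_like(x)
pbest = x.copy()
pbest_f = np.array([cost(p) for p in x])
g = pbest[pbest_f.argmin()].copy()               # global best dispatch
for _ in range(iters):
    r1, r2 = rng.random((2, n, T))
    v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
    x = np.clip(x + v, -p_max, p_max)
    f = np.array([cost(p) for p in x])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = x[improved], f[improved]
    g = pbest[pbest_f.argmin()].copy()
```

The resulting trajectory charges in cheap slots and discharges in expensive ones, which is the qualitative behavior expected of the expert samples.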
In this paper, a hybrid model combining unsupervised learning and supervised learning is proposed to construct the mapping relationship between complex operation patterns and optimal dispatch decisions considering multiple uncertain inputs and a real-time operation environment.
First, through the unsupervised learning model, the hidden representations of the time-series observations are extracted to reveal the potential knowledge in a variety of operation patterns. These observations are uncontrollable stochastic variables over a period, including wind power, PV power, load demand, and RTP. The matrix of the observed stochastic variables is formulated as:
$\boldsymbol{X}_t = \begin{bmatrix} P_{t-N_{\mathrm{o}}+1}^{\mathrm{WT}} & \cdots & P_t^{\mathrm{WT}} \\ P_{t-N_{\mathrm{o}}+1}^{\mathrm{PV}} & \cdots & P_t^{\mathrm{PV}} \\ P_{t-N_{\mathrm{o}}+1}^{\mathrm{L}} & \cdots & P_t^{\mathrm{L}} \\ \lambda_{t-N_{\mathrm{o}}+1} & \cdots & \lambda_t \end{bmatrix}$ (9)

where $N_{\mathrm{o}}$ is the length of the time series, which indicates the perception range of operation patterns.
Next, the supervised learning model is applied to memorize and learn the sophisticated inference from input space (constructed by the extracted features through the unsupervised learning model and the matrix of system state variables shown in (10)) to output space (labeled by the optimal dispatch decision at the corresponding time).
$\boldsymbol{S}_t = \left[ \mathrm{SOC}_t, \mathrm{SOH}_t \right]$ (10)
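Assembling the observation matrix of (9) amounts to a sliding window over the four stochastic series. A minimal sketch, where the function name and the window length `n_o` are illustrative assumptions:

```python
import numpy as np

def build_observation(wind, pv, load, rtp, t, n_o=8):
    """Stack the latest n_o samples of each stochastic series into the
    4 x n_o observation matrix of (9)."""
    rows = [np.asarray(s, dtype=float)[t - n_o + 1 : t + 1]
            for s in (wind, pv, load, rtp)]
    return np.stack(rows)  # rows: WT, PV, load, RTP
```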
The framework of the proposed solution is shown in Fig. 2.

Fig. 2 Framework of proposed solution.
It is worth noting that the training set can be supplemented by the scenario generation method when the sample data are insufficient. For example, [
The autoencoder is an unsupervised learning method that can explicitly learn the important hidden representations on the manifold [
$\hat{\boldsymbol{x}} = g\left( f\left( \tilde{\boldsymbol{x}} \right) \right)$ (11)

where $g(f(\cdot))$ represents the encoding-decoding process using the DAE network.

The DAE can recover the original data $\boldsymbol{x} \in \mathbb{R}^{D}$ from an encoded representation $\boldsymbol{y} \in \mathbb{R}^{N}$ on the manifold of the corrupted input data $\tilde{\boldsymbol{x}}$ via a decoding function $g(\cdot)$. $D$ is the original space dimension, and $N$ is the encoding space dimension. The DAE learns the reconstruction distribution $p\left( \boldsymbol{x} \mid \tilde{\boldsymbol{x}} \right)$ from the training data pairs through the following process [
1) Perturbation process adds stochastic noise into the original data $\boldsymbol{x}$ to generate a corrupted input $\tilde{\boldsymbol{x}}$.
2) Encoding function $f(\cdot)$ generates a hidden representation $\boldsymbol{y}$ of the input data.
3) Decoding function $g(\cdot)$ reconstructs the input data $\hat{\boldsymbol{x}}$ from the encoded representation $\boldsymbol{y}$.
4) Loss metric $L\left( \boldsymbol{x}, \hat{\boldsymbol{x}} \right)$ measures the dissimilarity between the original data and the reconstructed output.
The encoded representation $\boldsymbol{y}$ is generated from a corrupted input $\tilde{\boldsymbol{x}}$ with perturbations, which necessitates learning a sufficiently clever mapping on the manifold to extract useful features for denoising.
Generally, a conditional probabilistic distribution $q\left( \tilde{\boldsymbol{x}} \mid \boldsymbol{x} \right)$ is considered to independently perturb each dimension of the input data, i.e., $q\left( \tilde{\boldsymbol{x}} \mid \boldsymbol{x} \right) = \prod_{d=1}^{D} q\left( \tilde{x}_d \mid x_d \right)$.
In the encoding process, the corrupted input data $\tilde{\boldsymbol{x}}$ are transformed to an encoded representation $\boldsymbol{y}$ as [

$\boldsymbol{y} = s_f\left( \boldsymbol{W} \tilde{\boldsymbol{x}} + \boldsymbol{b}_h \right)$ (12)

where $\boldsymbol{W}$ is the weight coefficient matrix; $\boldsymbol{b}_h$ is the hidden bias vector; $\boldsymbol{W} \in \mathbb{R}^{N \times D}$; and $s_f(\cdot)$ is the non-linear activation function.
Then, the hidden representation $\boldsymbol{y}$ is reconstructed to $\hat{\boldsymbol{x}}$ by decoding function $g(\cdot)$ as [

$\hat{\boldsymbol{x}} = s_g\left( \boldsymbol{W}' \boldsymbol{y} + \boldsymbol{b}_x \right)$ (13)

where $\boldsymbol{b}_x$ is the input bias vector; $\boldsymbol{W}' \in \mathbb{R}^{D \times N}$; and $s_g(\cdot)$ is the non-linear mapping function at the decoder. The parameters $\left\{ \boldsymbol{W}, \boldsymbol{b}_h \right\}$ and $\left\{ \boldsymbol{W}', \boldsymbol{b}_x \right\}$ of the encoder and decoder functions are trained by minimizing the reconstruction error, measured by the loss metric $L\left( \boldsymbol{x}, \hat{\boldsymbol{x}} \right)$.
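The corrupt-encode-decode pipeline of (11)-(13) can be sketched as follows. Sigmoid activations, Gaussian corruption, a squared-error loss metric, and the dimensions are all assumed choices; the parameters shown are random and would in practice be trained by minimizing the reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(1)
D, N = 32, 8  # original and encoding space dimensions (assumed sizes)

# Randomly initialized parameters; {W, b_h} and {W2, b_x} are what training tunes
W = rng.normal(0.0, 0.1, (N, D))
b_h = np.zeros(N)
W2 = rng.normal(0.0, 0.1, (D, N))
b_x = np.zeros(D)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def corrupt(x, noise_std=0.1):
    """Perturbation q: independent Gaussian noise per dimension (assumed form)."""
    return x + rng.normal(0.0, noise_std, x.shape)

def encode(x_tilde):
    return sigmoid(W @ x_tilde + b_h)   # hidden representation y, as in (12)

def decode(y):
    return sigmoid(W2 @ y + b_x)        # reconstruction x_hat, as in (13)

def reconstruction_loss(x, x_hat):
    return float(np.mean((x - x_hat) ** 2))  # squared-error loss metric

x = rng.random(D)
x_hat = decode(encode(corrupt(x)))
```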
In this study, the supervised learning model is adopted to identify the complicated mapping between the input features from time sequences and the output dispatch decisions. The LSTM neural network [
$\boldsymbol{I}_t = \left[ \boldsymbol{y}_t, \boldsymbol{S}_t \right]$ (14)

where $\boldsymbol{I}_t$ denotes the input matrix of the LSTM neural network in time slot $t$, which includes time-series variables with different features.
The structure of the LSTM neural network is shown in Fig. 3.

Fig. 3 Structure of LSTM neural network.
The process of forward propagation can be expressed as [

$\boldsymbol{h}_t = \sigma\left( \boldsymbol{W}_{\mathrm{i}} \boldsymbol{I}_t + \boldsymbol{W}_{\mathrm{h}} \boldsymbol{h}_{t-1} \right)$ (15)

$\hat{y}_t = \sigma\left( \boldsymbol{W}_{\mathrm{o}} \boldsymbol{h}_t \right)$ (16)

where $\boldsymbol{h}_t$ is the state of the hidden layer in time slot $t$; $\sigma(\cdot)$ is the ReLU activation function; $\boldsymbol{W}_{\mathrm{i}}$ and $\boldsymbol{W}_{\mathrm{o}}$ are the weights between the input/hidden layer and the hidden/output layer, respectively; $\boldsymbol{W}_{\mathrm{h}}$ is the weight between the current hidden layer and the hidden layer in the next time slot; and $\hat{y}_t$ is the inference output of the LSTM neural network in time slot $t$. In this study, $\hat{y}_t$ represents the decision variable of the BESS dispatch in the next time slot.
The input gate controls which parts of the new information are added and stored in the long-term memory state. The value of the input gate in time slot t can be expressed as [
$\boldsymbol{i}_t = \sigma_{\mathrm{s}}\left( \boldsymbol{W}_{x\mathrm{i}} \boldsymbol{x}_t + \boldsymbol{W}_{h\mathrm{i}} \boldsymbol{h}_{t-1} + \boldsymbol{b}_{\mathrm{i}} \right)$ (17)

where $\boldsymbol{x}_t$ is the input information of the “memory” block in time slot $t$; $\boldsymbol{W}_{x\mathrm{i}}$ is the weight between the input layer and the input gate; $\boldsymbol{W}_{h\mathrm{i}}$ is the weight between the state of short-term memory in the previous time slot $\boldsymbol{h}_{t-1}$ and the input gate; $\boldsymbol{b}_{\mathrm{i}}$ is the bias vector of the input gate; and $\sigma_{\mathrm{s}}(\cdot)$ is the sigmoid activation function.
The forget gate controls which long-term memory state should be dropped. The value of the forget gate in time slot $t$ can be expressed as [

$\boldsymbol{f}_t = \sigma_{\mathrm{s}}\left( \boldsymbol{W}_{x\mathrm{f}} \boldsymbol{x}_t + \boldsymbol{W}_{h\mathrm{f}} \boldsymbol{h}_{t-1} + \boldsymbol{b}_{\mathrm{f}} \right)$ (18)

where $\boldsymbol{W}_{x\mathrm{f}}$ is the weight between the input layer and the forget gate; $\boldsymbol{W}_{h\mathrm{f}}$ is the weight between the state of short-term memory in the previous time slot and the forget gate; and $\boldsymbol{b}_{\mathrm{f}}$ is the bias vector of the forget gate.
The output gate controls which long-term memory state should be read and output in this time slot. The value of the output gate in time slot $t$ can be expressed as [

$\boldsymbol{o}_t = \sigma_{\mathrm{s}}\left( \boldsymbol{W}_{x\mathrm{o}} \boldsymbol{x}_t + \boldsymbol{W}_{h\mathrm{o}} \boldsymbol{h}_{t-1} + \boldsymbol{b}_{\mathrm{o}} \right)$ (19)

where $\boldsymbol{W}_{x\mathrm{o}}$ is the weight between the input layer and the output gate; $\boldsymbol{W}_{h\mathrm{o}}$ is the weight between the state of short-term memory in the previous time slot and the output gate; and $\boldsymbol{b}_{\mathrm{o}}$ is the bias vector of the output gate.
The outputs of the long-term memory state $\boldsymbol{c}_t$ and the short-term memory state $\boldsymbol{h}_t$ can be expressed as [

$\tilde{\boldsymbol{c}}_t = \tanh\left( \boldsymbol{W}_{x\mathrm{c}} \boldsymbol{x}_t + \boldsymbol{W}_{h\mathrm{c}} \boldsymbol{h}_{t-1} + \boldsymbol{b}_{\mathrm{c}} \right)$ (20)

$\boldsymbol{c}_t = \boldsymbol{f}_t \odot \boldsymbol{c}_{t-1} + \boldsymbol{i}_t \odot \tilde{\boldsymbol{c}}_t$ (21)

$\boldsymbol{h}_t = \boldsymbol{o}_t \odot \tanh\left( \boldsymbol{c}_t \right)$ (22)

where $\boldsymbol{W}_{x\mathrm{c}}$ is the weight between the input layer and the main layer of the memory block; $\boldsymbol{W}_{h\mathrm{c}}$ is the weight between the state of short-term memory in the previous time slot and the main layer of the memory block; $\boldsymbol{b}_{\mathrm{c}}$ is the bias vector; and $\odot$ is the element-wise product of the vectors.
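The gate and memory updates (17)-(22) can be condensed into a single step function; the dictionary key names and layer sizes below are assumptions for illustration.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM memory-block step following (17)-(22)."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    i = sig(p["Wxi"] @ x + p["Whi"] @ h_prev + p["bi"])            # input gate (17)
    f = sig(p["Wxf"] @ x + p["Whf"] @ h_prev + p["bf"])            # forget gate (18)
    o = sig(p["Wxo"] @ x + p["Who"] @ h_prev + p["bo"])            # output gate (19)
    c_cand = np.tanh(p["Wxc"] @ x + p["Whc"] @ h_prev + p["bc"])   # candidate state (20)
    c = f * c_prev + i * c_cand                                    # long-term memory (21)
    h = o * np.tanh(c)                                             # short-term memory (22)
    return h, c

rng = np.random.default_rng(2)
nx, nh = 6, 4  # input and hidden sizes (arbitrary for the sketch)
p = {k: rng.normal(0.0, 0.1, (nh, nx if k[1] == "x" else nh))
     for k in ("Wxi", "Whi", "Wxf", "Whf", "Wxo", "Who", "Wxc", "Whc")}
p.update({k: np.zeros(nh) for k in ("bi", "bf", "bo", "bc")})
h, c = lstm_step(rng.random(nx), np.zeros(nh), np.zeros(nh), p)
```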
The microgrid shown in Fig. 1 is adopted for the case study. For different microgrids, the optimal dispatch trajectories for IL are derived from the historical operation patterns of the corresponding microgrid. Therefore, the proposed solution can be applied to the economic dispatch of various grid-connected microgrids in a real-time electricity market environment.
The renewable energy generation and load profiles used in the simulations are taken from a microgrid testbed in 2015 [

Fig. 4 Pattern variation profiles of renewable energy generations, loads, and RTPs in training dataset. (a) WTs. (b) PV sources. (c) Loads. (d) RTPs.
In this study, the number of hidden layers of the LSTM neural network is set to be 2, and each hidden layer consists of 50 cell blocks. The number of layers of the DAE network is set to be 3. Both networks are trained via backpropagation using the Adam algorithm with the RMSE loss function [
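The configuration above (two hidden LSTM layers of 50 cell blocks, a 3-layer DAE, the Adam optimizer, and an RMSE loss) might be sketched in PyTorch as follows; the input widths, DAE layer sizes, and corruption level are assumptions, since the paper does not state them.

```python
import torch
import torch.nn as nn

class DAE(nn.Module):
    """3-layer denoising autoencoder; layer widths are assumptions."""
    def __init__(self, d_in=32, d_code=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 16), nn.ReLU(),
                                 nn.Linear(16, d_code))
        self.dec = nn.Linear(d_code, d_in)

    def forward(self, x, noise_std=0.1):
        x_tilde = x + noise_std * torch.randn_like(x)  # input corruption
        y = self.enc(x_tilde)                          # extracted features
        return self.dec(y), y

class DispatchLSTM(nn.Module):
    """Two hidden LSTM layers with 50 cell blocks each, as configured above."""
    def __init__(self, d_in=10):
        super().__init__()
        self.lstm = nn.LSTM(d_in, 50, num_layers=2, batch_first=True)
        self.head = nn.Linear(50, 1)   # BESS dispatch decision for the next slot

    def forward(self, seq):
        out, _ = self.lstm(seq)        # seq: (batch, time, features)
        return self.head(out[:, -1])   # decision from the last time step

model = DispatchLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def rmse_loss(pred, target):
    return torch.sqrt(nn.functional.mse_loss(pred, target))
```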
In the simulation experiments, the sensors used to monitor the real-time status of the system are assumed to be sufficiently reliable in practice, so the impact of field monitoring errors on the solutions is not considered.
To illustrate the performance of the proposed solution, two operation scenarios are selected: scenario 1 represents the normal operation scenario, while scenario 2 represents the worst operation scenario. In each scenario, the results calculated by the proposed solution and the optimal dispatch without uncertainties are compared, as shown in Figs. 5 and 6.

Fig. 5 Results calculated by proposed solution and optimal dispatch without uncertainties in scenario 1. (a) Results of proposed solution. (b) Results of optimal dispatch without uncertainties.

Fig. 6 Results calculated by proposed solution and optimal dispatch without uncertainties in scenario 2. (a) Results of proposed solution. (b) Results of optimal dispatch without uncertainties.
The result of
Compared with the results of the optimal dispatch without uncertainties given in
In scenario 2, as illustrated by
To assess and compare the computational complexity and the economic performance of the proposed solution, three different solutions are used as the benchmarks. The benchmark solutions are described as follows.
1) Solution 1: day-ahead stochastic scheduling [
2) Solution 2: MPC-based rolling optimization, which can be summarized as follows [
3) Solution 3: basic clone IL-based dispatch, which uses only the supervised learning model, with the input of the LSTM neural network being the original features rather than those extracted by the DAE network. This benchmark verifies the improvement that the unsupervised learning model brings to the proposed solution.
Each solution involves an offline process, an online process, or both during execution. The offline process can be executed on the cloud platform. The online process is generally performed on the edge device and needs to be performed once per decision period. The regulation resolution of the simulation is 15 min, resulting in an online execution frequency of 96 times per day. All solutions are implemented using Python 3.7 on a computer with a 3.00 GHz Intel Core i5-7400U CPU, an Nvidia GTX 1650 GPU, and 8 GB RAM. The computational complexity analysis of different solutions is shown in
| Solution | Computational process | Execution time (s) |
| --- | --- | --- |
| Solution 1 | Stochastic variable forecasting (offline) | 1.0 |
| | Optimization solving (offline) | 60.0 |
| Solution 2 | Stochastic variable forecasting (online) | 1.0 |
| | Rolling optimization (online) | 60.0 |
| Solution 3 | Optimal scheduling sample optimization solving (offline) | 5400.0 |
| | Supervised learning training (offline) | 1200.0 |
| | Real-time dispatch (online) | <0.1 |
| Proposed | Optimal scheduling sample optimization solving (offline) | 5400.0 |
| | Unsupervised learning training (offline) | 720.0 |
| | Supervised learning training (offline) | 1200.0 |
| | Real-time dispatch (online) | <0.1 |
The main computational complexity of the proposed solution lies in the acquisition of expert samples, which requires solving for the optimal scheduling trajectories of historical days. The other significant computational cost is the training of the learning models. These complex calculations can be performed offline. Economic dispatch decisions are then made using the real-time information obtained through the online process. At this stage, the learning model only needs to perform forward inference, which incurs relatively low computational cost and makes the solution highly suitable for meeting real-time computing requirements.
Compared with solution 1, the proposed solution needs to solve more optimization problems during the offline process to obtain IL sample trajectories, and its computational cost increases linearly with the number of samples. In contrast, solution 1 only needs to solve one optimization problem and does not require additional online computation during the day. The computational cost of the proposed solution during a single offline process is therefore much higher than that of solution 1. However, the proposed solution can deploy the decision model over a long term (three months in this simulation) using well-trained model parameters. Therefore, the frequency of the offline process for updating model parameters in the proposed solution can be very low, whereas solution 1 needs to execute the offline process every day.
Compared with solution 2, the proposed solution requires both offline optimization and online decision-making. Solution 2 requires continuous optimization for each control interval, with the computational cost primarily incurred during the online process. Each optimization task needs to be completed within a short period, which necessitates edge devices to possess adequate computational capability. In contrast, the proposed solution only requires forward inference during the online process of the decision model, which incurs relatively low computational costs. This makes it easier to meet real-time computing requirements.
Compared with solution 3, the proposed solution introduces the training process of an unsupervised learning model. Since the unsupervised learning model does not need expert trajectories obtained through optimization as training samples, the offline computational cost only slightly increases due to the training of the unsupervised learning model.
The operation costs in each testing month obtained from the proposed solution and the three benchmark solutions are shown in Fig. 7.

Fig. 7 Operation costs of different solutions over tested months.
The result presented in
As time progresses from September to November, a noticeable trend emerges: the cost savings of all solutions consistently decrease. A likely reason is the longer time elapsed between the training data and the test month, as renewable energy generation patterns are more similar within the same season. This finding suggests that while the models may perform well initially, their effectiveness may gradually decline over time due to factors such as seasonal variations and evolving patterns of energy generation and load. Thus, by rolling updates of the model training that incorporate newly collected pattern data, the performance of the solution can be maintained.
The performance evaluation is carried out for the proposed solution, and the numerical results in terms of the average cost against the benchmark solutions are presented in
| Test month | Solution | $C^{\mathrm{b}}$ ($) | $C^{\mathrm{g}}$ ($) | Total cost ($) | Inevitable cost ($) |
| --- | --- | --- | --- | --- | --- |
| Sept. | Solution 1 | 341.1 | 12653.5 | 12994.6 | 2691.0 |
| | Solution 2 | 263.7 | 12290.0 | 12553.7 | 2250.1 |
| | Solution 3 | 215.5 | 11717.3 | 11932.8 | 1629.2 |
| | Proposed | 247.8 | 11477.4 | 11725.2 | 1421.6 |
| Oct. | Solution 1 | 328.8 | 8902.7 | 9231.5 | 2764.1 |
| | Solution 2 | 222.7 | 8318.6 | 8541.3 | 2073.9 |
| | Solution 3 | 165.5 | 8181.9 | 8347.4 | 1880.0 |
| | Proposed | 194.7 | 7748.7 | 7943.4 | 1476.0 |
| Nov. | Solution 1 | 306.8 | 6045.1 | 6351.9 | 1137.7 |
| | Solution 2 | 207.5 | 5988.7 | 6196.2 | 982.0 |
| | Solution 3 | 164.6 | 5866.0 | 6030.6 | 816.4 |
| | Proposed | 173.5 | 5731.5 | 5905.0 | 690.8 |
According to the economic evaluation results, IL-based solutions are more competitive under conditions of the same available data. The proposed solution can achieve the greatest cost savings compared with other solutions during all test months.
| Test month | Cost savings vs. Solution 1 (%) | Cost savings vs. Solution 2 (%) | Cost savings vs. Solution 3 (%) |
| --- | --- | --- | --- |
| Sept. | 9.8 | 6.6 | 1.7 |
| Oct. | 14.0 | 7.0 | 4.8 |
| Nov. | 7.0 | 4.7 | 2.1 |
| Average | 10.5 | 6.3 | 2.8 |
For the proposed solution, the optimization results of economic dispatch under deterministic conditions are computed first, and then the machine learning model is used to learn the complex non-linear mapping between input patterns and optimal dispatch results in a high-dimensional space. The generalization errors manifest as deviations of the inference results from the optimal dispatch trajectories in new patterns. For the stochastic optimization framework based on prediction results, the cumulative prediction errors of multiple random variables cause the day-ahead stochastic optimization and the deterministic optimization under actual conditions to be inconsistent in the optimal solution space. Besides, the high-dimensional non-convex optimization may easily be affected by multiple saddle points, which leads to suboptimal solutions [
This paper proposes an IL-based decision-making solution to realize microgrid economic dispatch in a real-time fashion. The proposed solution is capable of effectively addressing the economic dispatch problem with high operation uncertainties caused by the intermittency of renewable energy generation and the stochasticity in market prices and loads. By learning the optimal dispatch of the historical operation patterns in a data-driven way, the proposed solution with good generalization performance can make intelligent decisions close to the optimal dispatch. The proposed solution is easy to deploy in practice and suitable for the cloud-edge collaborative communication and computing architecture of the future microgrid.
The proposed solution is evaluated through simulation tests subject to various uncertainties. Compared with the benchmark solutions of day-ahead stochastic optimization, MPC-based rolling optimization, and basic clone IL-based dispatch, the numerical results demonstrate that the total operation cost of the proposed solution is reduced by 10.5%, 6.3%, and 2.8%, respectively, for all the test months.
References
S. Eslami, Y. Noorollahi, M. Marzband et al., “District heating planning with focus on solar energy and heat pump using GIS and the supervised learning method: case study of Gaziantep, Turkey,” Energy Conversion and Management, vol. 269, p. 116131, Oct. 2022.
Z. Wu, J. Wang, H. Zhong et al., “Sharing economy in local energy markets,” Journal of Modern Power Systems and Clean Energy, vol. 11, no. 3, pp. 714-726, May 2023.
S. Islam, A. Iqbal, M. Marzband et al., “State-of-the-art vehicle-to-everything mode of operation of electric vehicles and its future perspectives,” Renewable and Sustainable Energy Reviews, vol. 166, p. 112574, Sept. 2022.
D. Sadeghi, N. Amiri, M. Marzband et al., “Optimal sizing of hybrid renewable energy systems by considering power sharing and electric vehicles,” International Journal of Energy Research, vol. 46, no. 6, pp. 8288-8312, May 2022.
A. Bharatee, P. K. Ray, and A. Ghosh, “A power management scheme for grid-connected PV integrated with hybrid energy storage system,” Journal of Modern Power Systems and Clean Energy, vol. 10, no. 4, pp. 954-963, Jul. 2022.
H. Shuai, J. Fang, X. Ai et al., “Stochastic optimization of economic dispatch for microgrid based on approximate dynamic programming,” IEEE Transactions on Smart Grid, vol. 10, no. 3, pp. 2440-2452, May 2019.
D. Prudhviraj, P. B. S. Kiran, and N. M. Pindoriya, “Stochastic energy management of microgrid with nodal pricing,” Journal of Modern Power Systems and Clean Energy, vol. 8, no. 1, pp. 102-110, Jan. 2020.
F. Conte, S. Massucco, M. Saviozzi et al., “A stochastic optimization method for planning and real-time control of integrated PV-storage systems: design and experimental validation,” IEEE Transactions on Sustainable Energy, vol. 9, no. 3, pp. 1188-1197, Jul. 2018.
S. E. Ahmadi, M. Marzband, A. Ikpehai et al., “Optimal stochastic scheduling of plug-in electric vehicles as mobile energy storage systems for resilience enhancement of multi-agent multi-energy networked microgrids,” Journal of Energy Storage, vol. 55, p. 105566, Nov. 2022.
D. Sadeghi, S. E. Ahmadi, N. Amiri et al., “Designing, optimizing and comparing distributed generation technologies as a substitute system for reducing life cycle costs, CO2 emissions, and power losses in residential buildings,” Energy, vol. 253, p. 123947, Aug. 2022.
M. Moafi, R. R. Ardeshiri, M. W. Mudiyanselage et al., “Optimal coalition formation and maximum profit allocation for distributed energy resources in smart grids based on cooperative game theory,” International Journal of Electrical Power & Energy Systems, vol. 144, p. 108492, Jan. 2023.
W. Dong, Q. Yang, X. Fang et al., “Adaptive optimal fuzzy logic based energy management in multi-energy microgrid considering operational uncertainties,” Applied Soft Computing, vol. 98, p. 106882, Jan. 2021.
J. Zhang, M. Cui, Y. He et al., “Multi-period two-stage robust optimization of radial distribution system with cables considering time-of-use price,” Journal of Modern Power Systems and Clean Energy, vol. 11, no. 1, pp. 312-323, Jan. 2023.
S. Sharma, A. Verma, Y. Xu et al., “Robustly coordinated bi-level energy management of a multi-energy building under multiple uncertainties,” IEEE Transactions on Sustainable Energy, vol. 12, no. 1, pp. 3-13, Jan. 2021.
L. Tian, L. Cheng, J. Guo et al., “System modeling and optimal dispatching of multi-energy microgrid with energy storage,” Journal of Modern Power Systems and Clean Energy, vol. 8, no. 5, pp. 809-819, Sept. 2020.
N. Nasiri, S. Zeynali, S. N. Ravadanegh et al., “A tactical scheduling framework for wind farm-integrated multi-energy systems to take part in natural gas and wholesale electricity markets as a price setter,” IET Generation, Transmission & Distribution, vol. 16, no. 9, pp. 1849-1864, May 2022.
W. Dong and Q. Yang, “Data-driven solution for optimal pumping units scheduling of smart water conservancy,” IEEE Internet of Things Journal, vol. 7, no. 3, pp. 1919-1926, Mar. 2020.
M. Daneshvar, B. Mohammadi-Ivatloo, K. Zare et al., “Two-stage robust stochastic model scheduling for transactive energy based renewable microgrids,” IEEE Transactions on Industrial Informatics, vol. 16, no. 11, pp. 6857-6867, Nov. 2020.
W. Hu, P. Wang, and H. B. Gooi, “Toward optimal energy management of microgrids via robust two-stage optimization,” IEEE Transactions on Smart Grid, vol. 9, no. 2, pp. 1161-1174, Mar. 2018.
M. A. Velasquez, J. Barreiro-Gomez, N. Quijano et al., “Intra-hour microgrid economic dispatch based on model predictive control,” IEEE Transactions on Smart Grid, vol. 11, no. 3, pp. 1968-1979, May 2020.
A. Parisio, E. Rikos, and L. Glielmo, “Stochastic model predictive control for economic/environmental operation management of microgrids: an experimental case study,” Journal of Process Control, vol. 43, pp. 24-37, Jul. 2016.
J. Sachs and O. Sawodny, “A two-stage model predictive control strategy for economic diesel-PV-battery island microgrid operation in rural areas,” IEEE Transactions on Sustainable Energy, vol. 7, no. 3, pp. 903-913, Jul. 2016.
Y. Yoldas, S. Goren, and A. Onen, “Optimal control of microgrids with multi-stage mixed-integer nonlinear programming guided Q-learning algorithm,” Journal of Modern Power Systems and Clean Energy, vol. 8, no. 6, pp. 1151-1159, Nov. 2020.
C. Keerthisinghe, A. C. Chapman, and G. Verbič, “Energy management of PV-storage systems: policy approximations using machine learning,” IEEE Transactions on Industrial Informatics, vol. 15, no. 1, pp. 257-265, Jan. 2019. [Baidu Scholar]
E. Foruzan, L. Soh, and S. Asgarpoor, “Reinforcement learning approach for optimal distributed energy management in a microgrid,” IEEE Transactions on Power Systems, vol. 33, no. 5, pp. 5749-5758, Sept. 2018. [Baidu Scholar]
J. Duan, Z. Yi, D. Shi et al., “Reinforcement-learning-based optimal control of hybrid energy storage systems in hybrid AC-DC microgrids,” IEEE Transactions on Industrial Informatics, vol. 15, no. 9, pp. 5355-5364, Sept. 2019. [Baidu Scholar]
W. Liu, P. Zhuang, H. Liang et al., “Distributed economic dispatch in microgrids based on cooperative reinforcement learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 6, pp. 2192-2203, Jun. 2018. [Baidu Scholar]
Y. Shang, W. Wu, J. Guo et al., “Stochastic dispatch of energy storage in microgrids: an augmented reinforcement learning approach,” Applied Energy, vol. 261, p. 114423, Mar. 2020. [Baidu Scholar]
S. Dey, T. Marzullo, and G. Henze, “Inverse reinforcement learning control for building energy management,” Energy and Buildings, vol. 286, p. 112941, May 2023. [Baidu Scholar]
Q. Tang, H. Guo, and Q. Chen, “Multi-market bidding behavior analysis of energy storage system based on inverse reinforcement learning,” IEEE Transactions on Power Systems, vol. 37, no. 6, pp. 4819-4831, Nov. 2022. [Baidu Scholar]
S. Gao, C. Xiang, M. Yu et al., “Online optimal power scheduling of a microgrid via imitation learning,” IEEE Transactions on Smart Grid, vol. 13, no. 2, pp. 861-876, Mar. 2022. [Baidu Scholar]
Y. Zhang, Q. Yang, D. Li et al., “A reinforcement and imitation learning method for pricing strategy of electricity retailer with customers’ flexibility,” Applied Energy, vol. 323, p. 119543, Oct. 2022. [Baidu Scholar]
W. Dong, Q. Yang, W. Li et al., “Machine learning-based real-time economic dispatch in islanding microgrids in a cloud-edge computing environment,” IEEE Internet of Things Journal, vol. 8, no. 17, pp. 13703-13711, Sept. 2021. [Baidu Scholar]
S. Kulkarni, Q. Gu, E. Myers et al., “Enabling a decentralized smart grid using autonomous edge control devices,” IEEE Internet of Things Journal, vol. 6, no. 5, pp. 7406-7419, Oct. 2019. [Baidu Scholar]
Z. Gong, P. Zhong, and W. Hu, “Diversity in machine learning,” IEEE Access, vol. 7, pp. 64323-64350, May 2019. [Baidu Scholar]
J. G. Vlachogiannis and K. Y. Lee, “A comparative study on particle swarm optimization for optimal steady-state performance of power systems,” IEEE Transactions on Power Systems, vol. 21, no. 4, pp. 1718-1728, Nov. 2006. [Baidu Scholar]
W. Dong and M. Zhou, “A supervised learning and control method to improve particle swarm optimization algorithms,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 7, pp. 1135-1148, Jul. 2017. [Baidu Scholar]
J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of International Conference on Neural Networks, Perth WA, Australia, Nov. 1995, pp. 1942-1948. [Baidu Scholar]
W. Dong, X. Chen, and Q. Yang, “Data-driven scenario generation of renewable energy production based on controllable generative adversarial networks with interpretability,” Applied Energy, vol. 308, p. 118387, Feb. 2022. [Baidu Scholar]
Y. Bengio, A. Courville, and P. Vincent, “Representation learning: a review and new perspectives,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, Aug. 2013. [Baidu Scholar]
P. Vincent, H. Larochelle, I. Lajoie et al., “Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research, vol. 11, pp. 3371-3408, Dec. 2010. [Baidu Scholar]
S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, pp. 1735-1780, Nov. 1997. [Baidu Scholar]
F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: continual prediction with LSTM,” Neural Computation, Vol. 12, pp. 2451-2471, Oct. 2000. [Baidu Scholar]
D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” in Proceedings of International Conference on Learning Representations, San Diego, USA, May 2015, pp. 1-13. [Baidu Scholar]
Y. Wang, W. Dong, and Q. Yang, “Multi-stage optimal energy management of multi-energy microgrid in deregulated electricity markets,” Applied Energy, vol. 310, p. 118528, Mar. 2022. [Baidu Scholar]
Y. Dauphin, R. Pascanu, C. Gulcehre et al., “Identifying and attacking the saddle point problem in high-dimensional non-convex optimization,” in Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, Canada, Dec. 2014, pp. 2933-2941. [Baidu Scholar]