Abstract
As the penetration of renewable energy continues to increase, stochastic and intermittent generation resources are gradually replacing conventional generators, which poses significant challenges for stabilizing power system frequency. Aggregating demand-side resources for frequency regulation has therefore attracted attention from both academia and industry. In practice, however, conventional aggregation approaches suffer from the random and uncertain behaviors of users, such as opting out of control signals. In this paper, a risk-averse multi-armed bandit learning approach is adopted to learn the behaviors of the users, and a novel aggregation strategy is developed for residential heating, ventilation, and air conditioning (HVAC) to provide reliable secondary frequency regulation. Simulation results show that, compared with the conventional approach, the risk-averse multi-armed bandit learning approach performs better in secondary frequency regulation while fewer users are selected and fewer opt out of the control. In addition, the proposed approach is more robust to random and changing user behaviors.
High penetration of wind and solar has brought great challenges in stabilizing the frequency of power grids. Conventionally, the generators track system loading levels to maintain the system frequency within a safe range [
With the rapid development of advanced metering infrastructure and communication technology, demand-side resources including electric vehicles [
Previous research efforts have been made on achieving load aggregation with faster response time, larger flexibility, higher economic efficiency and user-friendliness [
Demand response programs usually offer users the option to opt out when they receive a control signal. The reasons for opting out vary, for example, having an important event or feeling uncomfortable. Because of this uncertain opt-out behavior, the aggregated demand in practice may differ from the scheduled target and can hardly serve as a reliable resource for SFR. To tackle this issue, we propose a control strategy based on a risk-averse multi-armed bandit (MAB) learning approach. Through the online learning process, the load aggregator learns the opt-out behavior of the users and thereby mitigates the influence of uncertain user responses on load aggregation. The contributions of this work are summarized as follows. First, a risk-averse MAB learning approach is applied to learn the uncertain responses of users to the SFR commands of the load aggregator. Second, the proposed approach improves the reliability of aggregating residential HVACs for SFR while reducing the number of users who are called and who opt out. Third, the proposed approach is robust to random and changing user behaviors.
The rest of this paper is structured as follows. Section II presents the HVAC model used in demand aggregation. Section III presents the dynamic control strategy based on the MAB learning approach for SFR. Section IV illustrates the performance of the proposed approach with case studies. Section V concludes this paper.
The first-order differential equation is adopted to model the temperature variation process based on the assumption that the temperature change is almost linear within the narrow temperature deadband [

Fig. 1 On/off curve of HVAC and power consumption.
In
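$$C\frac{\mathrm{d}T(t)}{\mathrm{d}t} = \frac{T_{out}(t) - T(t)}{R} - s(t)Q \tag{1}$$
where $T(t)$ is the indoor temperature; $C$ and $R$ are the thermal capacitance and resistance, respectively; $T_{out}(t)$ is the outdoor temperature; $Q$ is the thermal power; and $s(t)$ is the on/off state of HVAC.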
The discrete form of (1) is:
(2) |
is governed by a switching law [
(3) |
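In this case, the total HVAC load profile can be obtained:
$$P_{agg}(t) = \sum_{i=1}^{N} \frac{s_i(t)\,Q_i}{\eta} \tag{4}$$
where $\eta$ is the coefficient of performance of an HVAC; $N$ is the total number of HVACs; $s_i(t)$ is the on/off state of HVAC $i$; and $Q_i$ is the heat output of HVAC $i$.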
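The thermal model, switching law, and aggregation step described above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes cooling-mode operation, a forward-Euler time step, and a symmetric deadband around the setpoint, and every parameter range below is a hypothetical placeholder rather than a value from this paper.

```python
import numpy as np

def simulate_hvac_fleet(n_units=1000, hours=2.0, dt_s=10.0, t_out=35.0,
                        t_set=24.0, deadband=1.0, seed=0):
    """Return the aggregated electric load (kW) of a fleet of cooling HVACs."""
    rng = np.random.default_rng(seed)
    # Hypothetical per-unit parameters (not taken from the paper).
    c = rng.uniform(1.5, 2.5, n_units)    # thermal capacitance, kWh/degC
    r = rng.uniform(1.5, 2.5, n_units)    # thermal resistance, degC/kW
    q = rng.uniform(12.0, 16.0, n_units)  # heat (cooling) output, kW
    cop = 2.5                             # coefficient of performance
    low, high = t_set - deadband / 2, t_set + deadband / 2
    temp = rng.uniform(low, high, n_units)  # initial indoor temperatures
    state = rng.integers(0, 2, n_units)     # 1 = on, 0 = off
    dt_h = dt_s / 3600.0
    steps = int(hours * 3600 / dt_s)
    load = np.empty(steps)
    for k in range(steps):
        # Assumed cooling-mode dynamics: C dT/dt = (T_out - T)/R - s*Q,
        # advanced with one forward-Euler step.
        temp = temp + dt_h / c * ((t_out - temp) / r - state * q)
        # Thermostat rule: switch on above the deadband, off below it,
        # otherwise keep the previous state.
        state = np.where(temp > high, 1, np.where(temp < low, 0, state))
        # Electric load is the heat output of the running units over the COP.
        load[k] = np.sum(state * q) / cop
    return load
```

Calling `simulate_hvac_fleet(n_units=500, hours=1.0)` returns one aggregated load value per time step, from which the reserve available for switching units on or off at a given time could be estimated.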
The parameters of HVAC are given in
On the basis of thermal model of 1 HVAC, the potential of 50000 HVACs in SFR on a typical hot summer day in Houston [

Fig. 2 Aggregated load profile of 50000 HVACs.
Based on the aggregated load profile of HVACs, the reserve capacity for SFR at different time can be easily estimated. Furthermore, when a disturbance occurs, the dynamic demand control strategy determines which HVAC to switch on/off according to the SFR requirement and reserve. An RS approach is carried out in the previous research, where the users are stochastically called with a probability of [
$$p^{RS}(t) = \frac{P^{SFR}(t)}{P^{res}(t)} \tag{5}$$
where $P^{SFR}(t)$ is the expected power aggregation for SFR; and $P^{res}(t)$ is the available reserve.
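A minimal sketch of such a random-selection rule is given below, assuming that the calling probability is the ratio of the expected power aggregation to the available reserve and that users are called independently. The function and argument names are hypothetical.

```python
import numpy as np

def random_selection(target_kw, reserve_kw, n_users, rng):
    """Call each user independently with probability target/reserve."""
    # Cap the probability at 1 in case the target exceeds the reserve.
    prob = min(1.0, target_kw / reserve_kw) if reserve_kw > 0 else 0.0
    # Boolean mask: True means the user receives the regulation command.
    return rng.random(n_users) < prob
```

For example, `random_selection(50.0, 100.0, 10000, np.random.default_rng(1))` calls roughly half of the users. The rule implicitly treats every called user as responsive.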
However, the above approach estimates user behavior optimistically, since it treats every called user as one who follows the command. In residential demand response contracts, users usually have the option to opt out of regulation commands. The complexity and randomness of their real-world behaviors make reliable load aggregation more difficult.
In this paper, we model the response of user $i$ to a frequency regulation command as a random variable $X_i$ that follows the Bernoulli distribution:
$$X_i \sim \mathrm{Bernoulli}(p_i) \tag{6}$$
where $X_i = 1$ means that user $i$ follows the command and $X_i = 0$ means that user $i$ opts out of the command; and $p_i$ is the participation probability that user $i$ follows the command. Thus, for a user whose HVAC power is $P_i$, the expected load change is $p_i P_i$, and the variance is $p_i (1 - p_i) P_i^2$.
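The mean and variance of a Bernoulli-distributed load change can be checked numerically. The sketch below assumes a single user with a hypothetical participation probability of 0.7 and a hypothetical HVAC power of 3 kW, and compares the closed-form moments with Monte Carlo samples.

```python
import numpy as np

def response_stats(p, power_kw):
    """Mean and variance of the load change power_kw * X, X ~ Bernoulli(p)."""
    mean = p * power_kw
    var = p * (1.0 - p) * power_kw ** 2
    return mean, var

rng = np.random.default_rng(0)
p, power_kw = 0.7, 3.0                        # hypothetical values
samples = power_kw * rng.binomial(1, p, size=200_000)
mean, var = response_stats(p, power_kw)
# samples.mean() and samples.var() should be close to mean and var.
```

The variance is largest at a participation probability of 0.5 and vanishes at 0 and 1, which is why users with intermediate probabilities contribute the most uncertainty to the aggregate.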
In practice, an accurate value of the participation probability can hardly be obtained, which affects the performance of SFR with HVACs. Therefore, we adopt an online learning approach to estimate it and to aggregate the optimal set of users, so as to improve the performance of their HVACs in SFR.
The flowchart of frequency regulation scheme is shown in

Fig. 3 Flowchart of frequency regulation scheme.
Suppose that a disturbance occurs. After PFR, the system frequency reaches a new steady state, and the present frequency deviation $\Delta f(t)$ from the rated value is calculated. Once the deviation exceeds the frequency threshold $\Delta f^{th}$, SFR is performed. Since the demand side is required to emulate the droop characteristic of the generation side to restore the system frequency to the normal range, the SFR droop coefficient $K^{SFR}$ is estimated by:
$$K^{SFR} = \frac{P^{res}_{max}}{\Delta f_{max}} \tag{7}$$
where $P^{res}_{max}$ is the maximum SFR reserve of HVAC; and $\Delta f_{max}$ is the maximum frequency deviation that the system can sustain for SFR. Then, at time $t$, the expected aggregated demand target for SFR can be calculated by:
$$P^{SFR}(t) = \min\left\{K^{SFR}\left|\Delta f(t)\right|,\; P^{res}(t)\right\} \tag{8}$$
where $P^{res}(t)$ is the actual reserve on the demand side for SFR at time $t$. The smaller of the theoretical expectation and the actual reserve is taken in case the HVAC capacity is insufficient. The same formula can be used for upward and downward frequency regulation.
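The droop-based target calculation can be sketched as follows. The sketch assumes that the deviation is handled by magnitude, so that one function serves both regulation directions, and that no target is issued while the deviation stays within the threshold. All argument names and the example numbers are hypothetical.

```python
def sfr_target(freq_dev_hz, reserve_mw, max_reserve_mw, max_dev_hz,
               threshold_hz):
    """Expected aggregated demand target (MW) for one SFR event."""
    # No secondary regulation while the deviation is within the threshold.
    if abs(freq_dev_hz) <= threshold_hz:
        return 0.0
    # Droop coefficient: maximum reserve over maximum sustainable deviation.
    droop_mw_per_hz = max_reserve_mw / max_dev_hz
    # Take the smaller of the droop-based expectation and the actual reserve.
    return min(droop_mw_per_hz * abs(freq_dev_hz), reserve_mw)
```

With a maximum reserve of 100 MW and a maximum deviation of 0.5 Hz, a deviation of 0.1 Hz gives a droop-based target of 20 MW, whereas a deviation of 0.4 Hz with only 50 MW of reserve is capped at 50 MW.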
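Next, the load aggregator selects the users to whom the frequency regulation commands are sent according to the target $P^{SFR}(t)$. If a user responds to the request, the corresponding HVAC is temporarily switched on/off; otherwise, no action is performed. To achieve reliable frequency regulation, our objective is to minimize the expected squared difference between the actual aggregated power of HVACs after user response and the target:
$$\min_{S_t}\; \mathbb{E}\left[\left(\sum_{i \in S_t} P_i X_i - P^{SFR}(t)\right)^2\right] \tag{9}$$
where $S_t$ is the set of users selected at time $t$; $P_i$ is the HVAC power of user $i$; and $X_i$ is the response of user $i$, which equals 1 if the user follows the command and 0 otherwise.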
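A short derivation shows why the variability of user responses matters for this objective. It is a sketch under an assumption not stated in the text, namely that user responses are mutually independent Bernoulli variables. The notation is chosen here for illustration: $S$ is the selected set, $P_i$ the HVAC power of user $i$, $p_i$ the participation probability, $X_i$ the response, and $D$ the target.

```latex
\begin{aligned}
\mathbb{E}\Big[\Big(\sum_{i\in S} P_i X_i - D\Big)^2\Big]
  &= \Big(\mathbb{E}\Big[\sum_{i\in S} P_i X_i\Big] - D\Big)^2
     + \operatorname{Var}\Big(\sum_{i\in S} P_i X_i\Big) \\
  &= \Big(\sum_{i\in S} p_i P_i - D\Big)^2
     + \sum_{i\in S} p_i (1 - p_i) P_i^2 .
\end{aligned}
```

The first term is the squared gap between the expected aggregated power and the target, and the second is the total response variance. Under these assumptions, of two selections with the same expected power, the one with the smaller total variance yields the smaller expected mismatch.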
During each frequency regulation event, the load aggregator receives the target from the dispatching center, and then selects a group of users with an unknown participation probability profile to minimize the deviation of the aggregated power from the target. This is similar to the combinatorial MAB (CMAB) problem, in which the decision-maker chooses several revenue-generating arms from a set in each round while the reward distribution of each arm is unknown. The objective of the decision-maker is to maximize the rewards after a limited number of trials. Instead of maximizing revenue, we aim to minimize the expected mismatch between the actual aggregated power and the target in order to achieve reliable frequency regulation.
Since the response probabilities of the users are unknown in reality, the aggregator should learn the response behavior of the users online and adjust the selection strategy according to the feedback from previous events. Because feedback can only be obtained when a user is selected, and the number of events is limited, there is a trade-off between exploiting known information and exploring for more information. Specifically, exploitation means selecting users based on the present estimated participation probability, and exploration means selecting users that have not been called often enough to gather more information.
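The feedback loop described above can be illustrated with a simple running estimate of each user's participation probability. The sketch assumes that the estimate is the sample mean of observed responses, that the initial guess is discarded after the first observation, and that each user appears at most once per event. The class and method names are hypothetical.

```python
import numpy as np

class ParticipationEstimator:
    """Running sample-mean estimate of each user's participation probability."""

    def __init__(self, n_users, initial_estimate=0.5):
        self.p_hat = np.full(n_users, float(initial_estimate))
        self.calls = np.zeros(n_users, dtype=int)  # times each user was called

    def update(self, called_idx, responded):
        """Record one event: responded[j] is 1 if user called_idx[j] followed."""
        called_idx = np.asarray(called_idx)
        responded = np.asarray(responded, dtype=float)
        n = self.calls[called_idx]
        # Incremental mean; the first observation replaces the initial guess.
        running = (self.p_hat[called_idx] * n + responded) / (n + 1)
        self.p_hat[called_idx] = np.where(n == 0, responded, running)
        self.calls[called_idx] = n + 1
```

Only called users are updated, which is the source of the exploration problem: a user who is never selected keeps the initial estimate indefinitely.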
For the CMAB problem, there exist many related algorithms such as $\epsilon$-greedy, Thompson sampling [
Meanwhile, the algorithms introduced above consider only the rewards and ignore the associated risk. However, the behaviors of the users are random and changing. The load aggregator should not only consider the expected aggregated power that the users can deliver but also avoid the risk posed by users with high expected power but erratic performance. Similar considerations often arise in financial investment. For example, in the stock market, investors making long-term investments consider not only whether stocks will bring high returns but also how much those returns vary. Therefore, to ensure reliable frequency regulation, load aggregators may prefer users that provide stable responses. In this paper, we adopt the risk-averse MAB learning approach [
In Algorithm 1, and are the constants; are the HVAC power of each user; n is the total number of users; and is the set of selected users.
The core of the above algorithm is to sort users in descending order as:
(10) |
Clearly, this formula gives priority to users with a high estimated participation probability and to users who have received fewer frequency regulation signals, which reflects the trade-off between exploitation and exploration. The second term of (10) lowers the priority of users whose responses exhibit great variability. Thus, the risk brought by uncertain user behaviors can be mitigated.
Note that the highest-order term of the time complexity of the risk-averse MAB learning approach is log-linear, so the approach scales well with the number of users.
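The ranking idea can be sketched as a generic mean-variance index with an exploration bonus. The code below is not a reproduction of (10) or of Algorithm 1: the bonus form, the Bernoulli variance penalty, the constants `alpha` and `rho`, and the stopping rule (add users until the expected power reaches the target) are all assumptions made for illustration.

```python
import numpy as np

def risk_averse_index(p_hat, calls, t, alpha=2.0, rho=1.0):
    """Estimated probability + exploration bonus - variance penalty."""
    calls = np.maximum(calls, 1)                 # avoid division by zero
    bonus = np.sqrt(alpha * np.log(max(t, 2)) / calls)
    variance = p_hat * (1.0 - p_hat)             # Bernoulli variance estimate
    return p_hat + bonus - rho * variance

def select_users(p_hat, calls, power_kw, target_kw, t, alpha=2.0, rho=1.0):
    """Pick top-ranked users until their expected power reaches the target."""
    index = risk_averse_index(p_hat, calls, t, alpha, rho)
    order = np.argsort(-index, kind="stable")    # descending; O(n log n)
    expected = np.cumsum(p_hat[order] * power_kw[order])
    # First position at which the cumulative expected power meets the target.
    k = int(np.searchsorted(expected, target_kw)) + 1
    return order[:min(k, len(order))]
```

The sort dominates the cost of each event in this sketch, which is consistent with a log-linear scaling in the number of users.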
The schematic of control framework is shown in

Fig. 4 Schematic of control framework.
In this section, the modified IEEE RTS 24-bus, 10-machine system [

Fig. 5 IEEE RTS 24-bus system.
We compare the risk-averse MAB based load aggregation approach with RS and offline approaches. Since the previous study [
(11) |
where is the expected mean power of HVACs based on the estimated participation probability of the user.
Meanwhile, the offline approach, in which the response probabilities of all users to the regulation commands are assumed to be known, is often used to verify the effectiveness of online learning approaches.
Assume that a disturbance of power supply shortage of 74 MW occurs at bus 2 at 15:00. There are 50000 users (one HVAC per user) who have signed up for providing frequency regulation service distributed at load buses 15, 19, and 20, and the average power consumption for each HVAC is . According to the load profile presented in
(12) |
Meanwhile, the number of HVAC available for SFR is calculated as:
(13) |
We consider that the actual participation probabilities of users obey the uniform distribution [0,1]. Since these values are unknown in practice, the initial estimates of them in RS and risk-averse MAB learning approaches are , and the mean is set to be 0.65.
When the system frequency falls below the threshold of 59.97 Hz, PFR is activated. And the responsive load capacity and frequency droop for PFR are 34.40 MW and 200.0 MW/Hz, respectively [

Fig. 6 System frequency after PFR.
The system frequency should be further brought back to the normal level ( Hz) through SFR by turning off the HVACs. According to (8), the target reduction of HVACs is 28.09 MW.

Fig. 7 Comparison of frequency regulation results.

Fig. 8 95% confidence intervals of load reduction results of different approaches. (a) Actual aggregated power. (b) Relative reduction deviations from target.
Besides,

Fig. 9 Number of called and opt-out users of different approaches. (a) Called users. (b) Opt-out users.
In practical applications, each selection comes with a cost and contributes to user fatigue. Therefore, the risk-averse MAB approach not only ensures the reliability of load aggregation but also improves economy and user-friendliness, which achieves a win-win result for both load aggregators and users.
The mean of initial estimate of is 0.65, which is higher than the actual average value . Since is one indispensable input of user-selection approaches, its impact on the results of different approaches under diverse settings is discussed. We consider two more cases: ① is about 0.5, so the guess about the behavior of the user is very close to the truth; ② , which underestimates the actual response of the user.

Fig. 10 Impacts of initial estimated probability to actual aggregated power.

Fig. 11 Impacts of initial estimated probability to number of called and opt-out users of different approaches. (a) Called users. (b) Opt-out users.
In practice, user behavior is highly uncertain and random. Hence, the response probability is not always constant. It may be affected by the level of satisfaction with the frequency regulation program, temperature tolerance, outdoor temperature, personal lifestyle, and neighborhood effects. To capture this, a variation ratio for user behavior is introduced: we assume that 10% of users change their response probability every 20 events. Other settings are consistent with Section Ⅳ-A.
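One way to simulate this changing-behavior setting is sketched below. The 10% ratio and the 20-event period follow the text; how the affected users are chosen and how their new probabilities are drawn are assumptions (here, a uniformly random subset redrawn from the uniform distribution on [0, 1]). The function name is hypothetical.

```python
import numpy as np

def drift_probabilities(p_true, event_idx, rng, ratio=0.10, period=20):
    """Return updated true participation probabilities for this event."""
    p_new = p_true.copy()
    # Behavior changes only at every `period`-th event.
    if event_idx > 0 and event_idx % period == 0:
        n_change = int(round(ratio * len(p_true)))
        changed = rng.choice(len(p_true), size=n_change, replace=False)
        # Assumed redraw rule: new probabilities are uniform on [0, 1].
        p_new[changed] = rng.uniform(0.0, 1.0, size=n_change)
    return p_new
```

Applying this function before each simulated event lets a learning approach be tested against probabilities that shift during the run.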

Fig. 12 95% confidence intervals of load reduction results of different approaches with changing behaviors of users. (a) Actual aggregated power. (b) Relative reduction deviations from target.

Fig. 13 Comparison of frequency regulation results with changing behaviors of users.
In terms of economy and user-friendliness,

Fig. 14 Number of called and opt-out users of different approaches with changing behaviors of users. (a) Called users. (b) Opt-out users.
This paper presents a control strategy based on the risk-averse MAB for aggregating residential HVACs to participate in SFR. Based on the thermal model of an individual HVAC, the aggregated load profile is estimated. Then, the frequency regulation reserve of HVACs for upward and downward regulation and the droop coefficient of SFR can be determined. In the aggregation process, the risk-averse MAB learning approach is implemented to learn the opt-out behavior of the users in response to frequency regulation commands. Through the online learning process of the risk-averse MAB, the load aggregator can mitigate the uncertainty of aggregated demand and provide better SFR service.
Compared with the conventional approach, the proposed MAB-based approach achieves better frequency regulation performance while fewer users are called and fewer opt out. The simulation results verify that the proposed approach is robust to random and changing user behaviors. These advantages help load aggregators to provide efficient and economical SFR service.
In the future, we plan to consider the impact of the fatigue effect of the users in responding to repeated demand aggregation control signals.
References
P. J. Douglass, R. Garcia-Valle, P. Nyeng et al., “Smart demand for frequency regulation: experimental results,” IEEE Transactions on Smart Grid, vol. 4, no. 3, pp. 1713-1720, Sept. 2013.
Y. Bian, H. Wyman-Pain, F. Li et al., “Demand side contributions for system inertia in the GB power system,” IEEE Transactions on Power Systems, vol. 33, no. 4, pp. 3521-3530, Jul. 2018.
H. Bevrani, A. Ghosh, and G. Ledwich, “Renewable energy sources and frequency regulation: survey and new perspectives,” IET Renewable Power Generation, vol. 4, no. 5, pp. 438-457, Sept. 2010.
A. Palomino and M. Parvania, “Data-driven risk analysis of joint electric vehicle and solar operation in distribution networks,” IEEE Open Access Journal of Power and Energy, vol. 7, pp. 141-150, Mar. 2020.
H. Liu, Z. Hu, Y. Song et al., “Vehicle-to-grid control for supplementary frequency regulation considering charging demands,” IEEE Transactions on Power Systems, vol. 30, no. 6, pp. 3110-3119, Nov. 2015.
Q. Zhai, K. Meng, Z. Dong et al., “Modeling and analysis of lithium battery operations in spot and frequency regulation service markets in Australia electricity market,” IEEE Transactions on Industrial Informatics, vol. 13, no. 5, pp. 2576-2586, Oct. 2017.
L. Zhao, W. Zhang, H. Hao et al., “A geometric approach to aggregate flexibility modeling of thermostatically controlled loads,” IEEE Transactions on Power Systems, vol. 32, no. 6, pp. 4721-4731, Nov. 2017.
R. D’hulst, W. Labeeuw, B. Beusen et al., “Demand response flexibility and flexibility potential of residential smart appliances: experiences from large pilot test in Belgium,” Applied Energy, vol. 155, no. 1, pp. 79-90, Oct. 2015.
S. Nistor, J. Wu, M. Sooriyabandara et al., “Capability of smart appliances to provide reserve services,” Applied Energy, vol. 138, no. 15, pp. 590-597, Jan. 2015.
M. Afzalan and F. Jazizadeh, “Residential loads flexibility potential for demand response using energy consumption patterns and user segments,” Applied Energy. doi: 10.1016/j.apenergy.2019.113693
T. Clarke, T. Slay, C. Eustis et al., “Aggregation of residential water heaters for peak shifting and frequency response services,” IEEE Open Access Journal of Power and Energy, vol. 7, pp. 22-30, Nov. 2020.
O. Erdinç, A. Taşcıkaraoğlu, N. G. Paterakis et al., “End-user comfort oriented day-ahead planning for responsive residential HVAC demand aggregation considering weather forecasts,” IEEE Transactions on Smart Grid, vol. 8, no. 1, pp. 362-372, Jan. 2017.
Q. Shi, F. Li, Q. Hu et al., “Dynamic demand control for system frequency regulation: concept review, algorithm comparison, and future vision,” Electric Power Systems Research, vol. 154, pp. 75-87, Jan. 2018.
F. Pallonetto, M. De Rosa, F. D’Ettorre et al., “On the assessment and control optimisation of demand response programs in residential buildings,” Renewable and Sustainable Energy Reviews. doi: 10.1016/j.rser.2020.109861
H. Hao, B. M. Sanandaji, K. Poolla et al., “Potentials and economics of residential thermal loads providing regulation reserve,” Energy Policy, vol. 79, pp. 115-126, Apr. 2015.
X. Wu, J. He, Y. Xu et al., “Hierarchical control of residential HVAC units for primary frequency regulation,” IEEE Transactions on Smart Grid, vol. 9, no. 4, pp. 3844-3856, Jul. 2018.
B. M. Sanandaji, T. L. Vincent, and K. Poolla, “Ramping rate flexibility of residential HVAC loads,” IEEE Transactions on Sustainable Energy, vol. 7, no. 2, pp. 865-874, Apr. 2016.
S. Weckx, R. D’Hulst, and J. Driesen, “Primary and secondary frequency support by a multi-agent demand control system,” IEEE Transactions on Power Systems, vol. 30, no. 3, pp. 1394-1404, May 2015.
S. Lin, D. Liu, F. Hu et al., “Grouping control strategy for aggregated thermostatically controlled loads,” Electric Power Systems Research, vol. 171, pp. 97-104, Jun. 2019.
Q. Shi, F. Li, G. Liu et al., “Thermostatic load control for system frequency regulation considering daily demand profile and progressive recovery,” IEEE Transactions on Smart Grid, vol. 10, no. 6, pp. 6259-6270, Nov. 2019.
Y. Shen, Y. Li, Q. Zhang et al., “State-shift priority based progressive load control of residential HVAC units for frequency regulation,” Electric Power Systems Research. doi: 10.1016/j.epsr.2020.106194
D. S. Callaway, “Tapping the energy storage potential in electric loads to deliver load following and regulation, with application to wind energy,” Energy Conversion and Management, vol. 50, no. 5, pp. 1389-1400, May 2009.
Q. Shi, C. Chen, A. Mammoli et al., “Estimating the profile of incentive-based demand response (IBDR) considering technical models and social-psychological factors,” IEEE Transactions on Smart Grid, vol. 11, no. 1, pp. 171-183, Jan. 2020.
J. L. Mathieu, M. Dyson, and D. S. Callaway, “Using residential electric loads for fast demand response: the potential resource and revenues, the costs, and policy recommendations,” in Proceedings of the 2012 ACEEE Summer Study on Energy Efficiency in Buildings, Pacific Grove, USA, Aug. 2012, pp. 189-203.
D. J. Russo, B. Van Roy, A. Kazerouni et al., “A tutorial on thompson sampling,” Foundations and Trends in Machine Learning, vol. 11, no. 1, pp. 1-6, Nov. 2017.
A. Mohamed, A. Lesage-Landry, and J. A. Taylor, “Dispatching thermostatically controlled loads for frequency regulation using adversarial multi-armed bandits,” in Proceedings of 2017 IEEE Electrical Power and Energy Conference (EPEC), Saskatoon, Canada, Oct. 2017, pp. 338-343.
W. Chen, Y. Wang, Y. Yuan et al., “Combinatorial multi-armed bandit and its extension to probabilistically triggered arms,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 1746-1778, Jan. 2016.
S. Li, B. Wang, S. Zhang et al., “Contextual combinatorial cascading bandits,” in Proceedings of the 33rd International Conference on Machine Learning, New York, USA, Jun. 2016, pp. 1245-1253.
Y. Li, Q. Hu, and N. Li, “Learning and selecting the right customers for reliability: a multi-armed bandit approach,” in Proceedings of the IEEE Conference on Decision and Control, Miami Beach, USA, Dec. 2018, pp. 4869-4874.
S. Vakili and Q. Zhao, “Risk-averse multi-armed bandit problems under mean-variance measure,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 6, pp. 1093-1111, Sept. 2016.
Y. Xu, L. Xie, and C. Singh, “Optimal scheduling and operation of load aggregators with electric energy storage facing price and demand uncertainties,” in Proceedings of 2011 North American Power Symposium, Boston, USA, Aug. 2011, pp. 1-7.
W. Lyu, J. Wu, L. Zhao et al., “Load aggregator-based integrated demand response for residential smart energy hubs,” Mathematical Problems in Engineering, vol. 2019, pp. 1-14, Apr. 2019.
F. Milano. (2006, Mar.). Power system analysis toolbox documentation for PSAT version 2.0.0_1. [Online]. Available: http://faraday1.ucd.ie/psat.html