Abstract
The energy consumption of buildings has risen steadily in recent years, so it is vital for building managers and owners to manage electricity demand. Forecasting the electric energy consumption of buildings is highly valuable, but consumption is influenced by many factors, which makes accurate forecasting difficult. Recently, deep learning techniques have been widely adopted to solve this problem. Deep neural networks are well suited to handling complex non-linear relationships and to exploring the regular patterns and uncertainties of consumption behavior at the building level. In this paper, we propose a deep convolutional neural network based on ResNet for hour-ahead building load forecasting. In addition, we design a branch that integrates the hourly temperature into the forecasting branch. To enhance the learning capability of the model, an innovative feature fusion method is presented. Finally, extensive ablation studies are conducted on point forecasting, probabilistic forecasting, the fusion method, and computation efficiency. The results show that the proposed model achieves state-of-the-art performance, which indicates promising prospects for application in the electricity market.
Increasing energy demands have attracted growing attention as the world population and economy expand. At the same time, reducing energy consumption is necessary to address pollution, carbon emissions, and the greenhouse effect. According to the U.S. Energy Information Administration Monthly Energy Review, 40% of the energy consumption comes from buildings [
Since the energy and building types are various, the energy system in buildings is quite complicated [
In recent years, a large number of forecasting models have been proposed and applied to solve practical problems. By forecasting horizon, load forecasting can be divided into four categories: very short-term load forecasting (VSTLF), short-term load forecasting (STLF), medium-term load forecasting (MTLF), and long-term load forecasting (LTLF). The main forecasting models fall into two categories: traditional methods and artificial intelligence methods. The advantages of traditional methods lie in their fast computing speeds and robustness [
Support vector machine (SVM) is a kernel-based machine learning algorithm, which can be used for both regression and classification. The algorithm reveals great competence in non-linear analysis. Support vector regression (SVR) [
Strategies based on conventional machine learning offer only a limited modeling capability in time series analysis. In recent years, neural networks have emerged as a powerful tool in areas such as image processing and data analysis. Deep learning techniques have been applied in load forecasting, and most of them rely on the recurrent neural network (RNN) or long short-term memory (LSTM) [
Convolutional neural networks (CNNs) have achieved excellent performance in the field of computer vision with well-known models such as AlexNet, ResNet, and DenseNet. ResNet was proposed in 2015 and is currently one of the most widely used CNN frameworks for feature extraction. Recently, ResNet models have also demonstrated strong performance in sequence processing, not only in speech synthesis, language modeling, and machine translation, but also in electricity load forecasting. A ResNet model contains a number of residual connections between blocks at different levels, which deliver the errors to earlier layers when the network is trained with back propagation. This mechanism makes it possible to increase the depth of the network and strengthens the ability to learn discriminative features. In other words, ResNet can effectively deal with the vanishing gradient problem. An example of the ResNet structure is illustrated in Fig. 1.

Fig. 1 Structure of ResNet.
With a deeper architecture and sophisticated operation, neural networks, especially deep CNNs (DCNNs), provide superior abilities of learning discriminative features and non-linear relationships, which benefits extracting uncertainty factors of building load forecasting [
Although the forecasting horizon may range from a few minutes to several years, horizons shorter than a day are especially critical for buildings because utility prices may vary with the season and the time of day. A more accurate estimate of the daily peak load and the load shape offers more opportunities to control utility costs and to make smart buildings more intelligent. In this paper, hourly electricity consumption data and temperature data are adopted to forecast hourly consumption one hour ahead (single-step forecasting) and 24 hours ahead (24-step forecasting).
Based on the literature outlined above, this paper proposes a novel CNN model for hourly building load forecasting. The key contributions are as follows.
1) We propose a novel DCNN for building load forecasting. The baseline of the network is built on ResNet containing a number of residual connections in order to increase the depth of the model and enhance the ability of learning nonlinear relationships in time-series analysis.
2) For integrating the information of external factors into the forecasting model, we design another branch responsible for extracting the pattern of external factors per hour with fully connected layers.
3) A novel mechanism of feature fusion between two branches is proposed, which is also interpreted as a superior feature selection process and leads to a remarkable improvement in accuracy.
4) The proposed network serves in an end-to-end manner during the training and inference process. Sufficient ablation studies are conducted to demonstrate the effectiveness of innovations and great generalization in building load forecasting.
The rest of this paper is organized as follows. Section II describes the details of the proposed model. Section III introduces the elements and details of the experiment process. Section IV reports the experimental results and discusses open questions about the proposed model. Finally, the conclusions are summarized in Section V.
This paper proposes a CNN model with two branches: the forecasting branch and the external factor integration branch. The input of the main network is the sequence of historical load, and the input of the second branch is the outdoor weather variables. The vector Y is the output forecasting. The main purpose is to build a mapping between the inputs and the future load sequence as follows.
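$$Y = \{y_{t+1}, \ldots, y_{t+m}\} = f(x_{t-n+1}, \ldots, x_t) + \varepsilon \tag{1}$$
where $x_{t-n+1}, \ldots, x_t$ are the historical loads; $y_{t+1}, \ldots, y_{t+m}$ are the output forecasting; $n$ and $m$ are the lengths of the input and output sequences, respectively; and $\varepsilon$ is the error. When $m$ is 1, it is a single-step forecasting problem, and if $m$ is over 1, it is a multi-step forecasting problem.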
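The mapping from an input sequence to an output sequence can be illustrated with a sliding-window sketch. This is a minimal illustration rather than the authors' data pipeline; the function name `make_windows` and the parameter names `n` (input length) and `m` (output length) are assumptions introduced here.

```python
from typing import List, Tuple


def make_windows(load: List[float], n: int, m: int) -> List[Tuple[List[float], List[float]]]:
    """Pair every n-hour history window with the m hours that follow it."""
    samples = []
    # The last valid start index leaves room for n inputs and m targets.
    for start in range(len(load) - n - m + 1):
        history = load[start:start + n]          # input sequence
        target = load[start + n:start + n + m]   # sequence to forecast
        samples.append((history, target))
    return samples


# Toy hourly series; real data would be metered building load.
hourly_load = [float(v) for v in range(10)]
single_step = make_windows(hourly_load, n=4, m=1)  # one target per sample
multi_step = make_windows(hourly_load, n=4, m=3)   # three targets per sample
```

With an output length of 1 each sample has a single target (single-step forecasting); a larger output length gives the multi-step case, such as 24 for day-ahead forecasting.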
In this paper, building load forecasting is treated as a sequence modeling problem. Deep networks based on DCNN automatically extract the key change features of the historical load sequence layer by layer, and generate the forecast at the end of the model. External factors such as temperature are also key features in building load forecasting and should be considered in the model. Changes in external factors affect the pattern of load sequences, and this effect is broad and comprehensive. For example, in hot weather, the load changes more drastically during the day because of air conditioning; the load curve also differs considerably between holidays and working days. The external factors used in this paper refer specifically to historical data, since the focus here is on the complex relationship between historical external variables and load data.
Therefore, we propose a novel deep network structure to fuse sequence change features and external factor features. In this network, the external factor features are explicitly modeled as constraints on the historical load, constraining both the original input and the change features of the historical load. Specifically, the branch that learns from the external variables generates indicator vectors, where each scalar is close to 0 or 1. These indicators are multiplied element-wise with different layers of the DCNN to control the expression of the key features. This is very different from traditional feature fusion (TFF), which generally concatenates sequence change features and external factor features and generates the forecast through fully connected layers. Our model explicitly models the relationship between sequence change features and external factor features, introducing strong priors to the model. Therefore, this network can converge to a better local minimum, which improves the forecasting accuracy.
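The contrast between indicator-based fusion and concatenation can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the use of a sigmoid to push each indicator toward 0 or 1 is an assumption, and the names `gated_fusion` and `concat_fusion` are hypothetical.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Squash a real value into (0, 1); large |z| gives values near 0 or 1."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(features: List[float], external_logits: List[float]) -> List[float]:
    """Scale each load feature by an indicator derived from external factors."""
    if len(features) != len(external_logits):
        raise ValueError("features and indicators must have the same length")
    gates = [sigmoid(z) for z in external_logits]  # assumed indicator mapping
    return [f * g for f, g in zip(features, gates)]


def concat_fusion(features: List[float], external_features: List[float]) -> List[float]:
    """Traditional fusion: append external features and let later layers mix them."""
    return list(features) + list(external_features)


load_features = [2.0, -3.0, 5.0]
gated = gated_fusion(load_features, [10.0, -10.0, 0.0])   # keep, suppress, halve
stacked = concat_fusion(load_features, [10.0, -10.0, 0.0])
```

In the gated form the output keeps the dimension of the load features, and an indicator near 0 removes a feature outright, whereas concatenation leaves all selection to the subsequent fully connected layers.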
The forecasting branch and the external factor integration branch run in parallel and are trained and used for inference in an end-to-end manner. The forecasting branch performs the building load forecasting, while the external factor integration branch supplies additional significant features and explores potential nonlinear relationships, making the forecasting more efficient and effective. Details of our proposed network are illustrated in Fig. 2.

Fig. 2 Architecture of our proposed model.
The forecasting branch is a CNN that aims to extract the features of electric load sequences. It consists of a series of stacked residual blocks. In this study, we employ 4 blocks, each of which is composed of four layers with different functions: dilated convolution, ReLU activation, normalization, and regular one-dimensional (1D) convolution, as shown in Fig. 3.

Fig. 3 Illustration of one residual block as baseline of proposed forecasting branch.
Dilated convolution extracts features over a larger scale within the local receptive field, and combining different dilation rates yields a more comprehensive feature abstraction, which improves performance in deep learning. Dilated convolution is widely used in image segmentation, where it greatly enlarges the receptive field without increasing the computation complexity or the number of parameters. Therefore, we introduce dilated convolution to extend the receptive field of a neuron by embedding zero-valued holes at various scales. Dilated convolution does not pad blank elements between the elements of the feature map. Instead, the kernel skips over input elements at a fixed interval, which is equivalent to leaving the input unchanged and inserting zero weights between the kernel parameters. Dilated convolution introduces a parameter called the dilation rate to indicate this interval. In this study, we adopt dilated convolutions with rates 1, 2, 4, and 8, respectively. We also introduce residual connections to ensure that the gradient of our model will not vanish or explode due to the depth. Leaky ReLU activation is responsible for filtering salient features for data analysis [
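The skipping behavior of dilated convolution and the resulting growth of the receptive field can be sketched as follows. This is an illustrative pure-Python version, not the layer used in the paper; the kernel size of 3 in the usage lines is an assumption, since the source does not state it, and the receptive field formula counts only the stacked dilated layers.

```python
from typing import List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """1D dilated convolution without padding (cross-correlation form).

    Each kernel weight reads the input `dilation` positions apart, which has
    the same effect as inserting zeros between the kernel weights.
    """
    span = (len(kernel) - 1) * dilation  # distance covered by one kernel
    return [
        sum(w * x[t + k * dilation] for k, w in enumerate(kernel))
        for t in range(len(x) - span)
    ]


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Input positions seen by one output after stacking stride-1 dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
skipped = dilated_conv1d(signal, kernel=[1.0, 1.0], dilation=2)  # sums x[t] + x[t+2]
field = receptive_field(kernel_size=3, dilations=[1, 2, 4, 8])   # assumed kernel size
```

Under the assumed kernel size of 3, four layers with rates 1, 2, 4, and 8 cover 31 input hours, compared with 9 hours for four undilated layers of the same size.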
Building load forecasting depends heavily on external factors, especially weather conditions, which have been shown to correlate strongly with building load. The external factor integration branch supplies additional significant features to improve the accuracy of building load forecasting. This branch contains multiple fully connected layers with different numbers of activation units in order to produce feature vectors with proper dimensions.
Specifically, the external factor integration branch takes a 24-dimensional vector as the input, where each element represents the historical external factor for one hour, covering 24 hours in total. Then, two fully connected layers of fixed size filter the input to extract coarse nonlinear features. Moreover, three hidden layers produce the outputs that are fused into the forecasting branch, as illustrated in
CNN extracts the changing features in the load sequence and generates feature maps. Although deeper convolution layers can simultaneously extract and select features in feature maps, this process does not consider external factors.
In this paper, we propose a novel feature fusion or feature selection process shown in
We adopt the dataset from the Building Data Genome Project [
In addition, more buildings are added to our experiments to verify the forecasting ability of the proposed model. After eliminating the abnormal data, we adopt the data of a total of 300 buildings from January 1, 2010 to December 30, 2015, which are located in New York, Los Angeles, Chicago, Phoenix, London, and Switzerland, respectively. The buildings are all used for education, including offices, dormitories, laboratories, and classrooms. The floor areas range from 399 to 155679
To choose appropriate weather variables as the input of the external factor integration branch, it is vital to select the features most relevant to building load before constructing the model. The candidate weather variables include outdoor temperature, humidity, and wind speed. The Pearson correlation coefficient, also called the simple correlation coefficient, describes the strength and direction of the linear relationship between two variables. Its value lies between $-1$ and 1. A value of 1 indicates that the variables are completely positively correlated, 0 indicates no linear correlation, and $-1$ indicates that they are completely negatively correlated. It is calculated as:
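$$\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} \tag{2}$$
where $\mathrm{cov}(X, Y)$ represents the covariance; and $\sigma_X$ and $\sigma_Y$ are the standard deviations of any weather variable $X$ and building load series $Y$, respectively.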
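As a worked illustration of this screening step, the Pearson correlation coefficient between a weather variable and a load series can be computed as the covariance divided by the product of the two standard deviations. The sketch below uses only the standard library; the function name `pearson` and the toy temperature and load values are illustrative and are not taken from the dataset.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if std_x == 0.0 or std_y == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (std_x * std_y)


# Toy values: load rises with temperature, so the coefficient is close to 1.
temperature = [18.0, 21.0, 25.0, 30.0, 33.0]
load = [110.0, 118.0, 131.0, 150.0, 158.0]
rho = pearson(temperature, load)
```

A candidate variable whose coefficient is close to 0 for a given building carries little linear information about its load and is a weaker choice as branch input.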
The data are sampled once an hour, so there are 24 data points per day. The raw data may contain noisy, missing, or redundant values. Therefore, the raw data are preprocessed to ensure that the dataset is usable before the experiments. The datasets are divided in chronological order into training, validation, and testing datasets with proportions of 80%, 10%, and 10%, respectively. Finally, in order to stabilize the learning process, the input variables of the training, validation, and testing datasets are normalized. Normalization helps avoid dramatic changes in the gradient, which smooths convergence.
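The chronological split and normalization can be sketched as below. The source does not state which normalization is used, so min-max scaling fitted on the training portion only is an assumption made here; the function names are hypothetical.

```python
from typing import List, Tuple


def chronological_split(
    series: List[float], train_frac: float = 0.8, val_frac: float = 0.1
) -> Tuple[List[float], List[float], List[float]]:
    """Split a time-ordered series into train/validation/test without shuffling."""
    n = len(series)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return series[:train_end], series[train_end:val_end], series[val_end:]


def fit_min_max(train: List[float]) -> Tuple[float, float]:
    """Take scaling statistics from the training data only (assumed choice)."""
    lo, hi = min(train), max(train)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return lo, hi


def apply_min_max(values: List[float], lo: float, hi: float) -> List[float]:
    """Scale values with previously fitted statistics."""
    return [(v - lo) / (hi - lo) for v in values]


series = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
train, val, test = chronological_split(series)
lo, hi = fit_min_max(train)
scaled_train = apply_min_max(train, lo, hi)
scaled_test = apply_min_max(test, lo, hi)  # may exceed 1 if test values are larger
```

Fitting the statistics on the training portion keeps information from the later validation and testing periods out of training, at the cost that scaled values outside the training range can fall outside [0, 1].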
After all datasets are properly preprocessed, model parameters, i.e., weights and bias, and hyper-parameters, i.e., layers and number of neurons, are tuned using training and validation datasets. Once the optimal model parameters and hyper-parameters are obtained, the testing dataset of each building would be fed into the optimized models to evaluate performances.
In this study, our model is evaluated in comparison with GRU, LSTM, GCNN, and ResNet. Details of parameters are shown in
For fair comparison, some setup and hyperparameters of all models should be as consistent as possible, such as the batch size, optimization method, and number of trainable parameters. On this basis, we adopt the grid search method to determine other hyperparameters, including the learning rate, network depth, convolution kernel, and so forth.
To ensure the fairness, historical temperature data are also considered in four benchmark models. Unlike the proposed model, which handles the temperature and load sequences in two branches, historical temperature and load data in the benchmark models are integrated as two dimensions of the input sequence. The input vector form of the benchmark models is:
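$$X = [(x_{t-n+1}, T_{t-n+1}), \ldots, (x_t, T_t)] \tag{3}$$
where $x_i$ and $T_i$ denote the load and the temperature at hour $i$, respectively; and $n$ is the length of the input sequence.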
We carry out point and probabilistic forecasting separately, and evaluate the point forecasting results of the three buildings with three metrics: mean absolute percentage error (MAPE), mean absolute error (MAE), and root mean square error (RMSE). Smaller values of these metrics mean that the model has lower errors and higher accuracy. The metrics are calculated based on (4)-(6).
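$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{\hat{y}_i - y_i}{y_i} \right| \times 100\% \tag{4}$$
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right| \tag{5}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2} \tag{6}$$
where $\hat{y}_i$ and $y_i$ are the forecasting and true values, respectively; and $N$ is the amount of data.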
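The three point forecasting metrics can be computed directly from paired true and forecast values. The following is a minimal sketch using the standard definitions of MAPE, MAE, and RMSE; the function names and the toy values are illustrative.

```python
import math
from typing import List


def mape(y_true: List[float], y_pred: List[float]) -> float:
    """Mean absolute percentage error in percent; true values must be non-zero."""
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: List[float], y_pred: List[float]) -> float:
    """Mean absolute error, in the units of the load."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean square error; penalizes large errors more than MAE."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
scores = (mape(actual, forecast), mae(actual, forecast), rmse(actual, forecast))
```

Because MAPE divides by the true value, it allows buildings with very different load levels to be compared, whereas MAE and RMSE stay in the units of the load.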
In order to estimate the quality of probabilistic forecasting, we adopt a comprehensive metric called pinball score (also called quantile score). It is widely recognized and can be interpreted as the accuracy of a quantile forecasting model. Pinball score for one quantile can be calculated by the following formula:
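$$L_q(\hat{y}_{t,q}, y_t) = \begin{cases} (1 - q)(\hat{y}_{t,q} - y_t), & y_t < \hat{y}_{t,q} \\ q\,(y_t - \hat{y}_{t,q}), & y_t \geq \hat{y}_{t,q} \end{cases} \tag{7}$$
where $q$ is the targeted quantile; and $\hat{y}_{t,q}$ and $y_t$ are the forecasting value in the quantile $q$ and the true value at time $t$, respectively.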
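The pinball score can be sketched as follows, assuming the standard definition of pinball loss: a forecast above the true value is weighted by one minus the quantile, and a forecast below it is weighted by the quantile. The function names and the dictionary layout for quantile forecasts are assumptions made for this illustration.

```python
from typing import Dict, List


def pinball_loss(y_true: float, y_pred: float, q: float) -> float:
    """Pinball loss of one quantile forecast y_pred for target quantile q."""
    if y_true < y_pred:
        return (1.0 - q) * (y_pred - y_true)  # forecast too high
    return q * (y_true - y_pred)              # forecast too low or exact


def average_pinball(y_true: List[float], quantile_preds: Dict[float, List[float]]) -> float:
    """Average the loss over all quantiles and all time steps."""
    total = 0.0
    count = 0
    for q, preds in quantile_preds.items():
        for t, p in zip(y_true, preds):
            total += pinball_loss(t, p, q)
            count += 1
    return total / count


truth = [10.0, 10.0]
forecasts = {0.1: [8.0, 8.0], 0.9: [12.0, 12.0]}  # hypothetical quantile forecasts
score = average_pinball(truth, forecasts)
```

The asymmetry is what makes the score quantile-specific: for a quantile of 0.9, a forecast that is too low costs nine times as much as one that is too high by the same amount.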
The point forecasting accuracies of the three buildings are presented in

Fig. 4 Scatter plot of five models for building A. (a) ResNet. (b) GCNN. (c) LSTM. (d) GRU. (e) Proposed model.
Building A has lower MAPE values than the other two buildings. As the load level decreases from building to building, the accuracies of all models also decline (as shown by the results for buildings B and C). One reason is that larger buildings usually accommodate more occupants. When there are more occupants in a building, the uncertainty of their aggregate behavior tends to be smaller, leading to more regular and predictable building load patterns. Another reason is that the Pearson correlation coefficient of building A is the highest, so its forecasting benefits more from the temperature.
The MAPE of a more extensive experiment on 300 buildings is shown in Fig. 5.

Fig. 5 MAPE heat map of single-step forecasting on 300 buildings.
In demand response (DR) practice, 24-step forecasting provides dispatchers with key information for pre-DR resource allocation for the next day. We evaluate the MAPE of 24-step forecasting on 300 buildings, as shown in Fig. 6.

Fig. 6 MAPE heat map of 24-step forecasting on 300 buildings.
Probabilistic forecasting provides more information than point forecasting and therefore supports more applications. Common probabilistic forecasting approaches include probability distribution parameter estimation, probability interval estimation, and quantile forecasting.
Among them, quantile forecasting does not depend on the distribution hypothesis, and the probability interval can be generated according to quantile forecasting. Therefore, quantile forecasting has a better generalization ability and forecasting accuracy. In this study, we use pinball loss as the loss function of each model and generate the forecasting of 9 quantiles from 0.1 to 0.9. By comparing the average pinball scores, we can observe the accuracy improvement brought by the proposed model for probabilistic forecasting.
Pinball scores of several models are presented in Fig. 7.

Fig. 7 Pinball scores of probabilistic load forecasting for three buildings.
The TFF network also contains two branches, one for learning historical load sequences and the other for learning external factors. The difference from our model is that the representative vectors output by the two branches are fused only by vector concatenation, so a fully connected network is also needed at the end to learn from the concatenated vector. In this subsection, we compare our fusion model with TFF, keeping the other structures and parameters of the two models consistent.
The comparison of the results is shown in
This is because our network explicitly models the relationship between the temperature and the load curve, making it easier to converge to a better optimum.

Fig. 8 Learning curves of training error for building A with five models.
In this paper, we propose a DCNN based on ResNet for building load forecasting. With dilated convolution in the forecasting branch, the CNN is better able to extract complex and significant features of load sequences. In addition, we propose an external factor integration branch that takes significant weather features as the input. The features extracted from external factors are fused effectively with the load features, which markedly enhances the ability to learn discriminative features. Therefore, the forecasting accuracy is greatly improved without increasing the parameters and operations. In this study, the performances of five different deep learning models, i.e., GRU, ResNet, LSTM, GCNN, and our proposed model, in single-step and 24-step building load forecasting are systematically compared. The results show that our model offers more accurate forecasting, higher computational efficiency, and stronger generalization across different buildings.
References
U.S. Energy Information Administration. (2020, Nov.). U.S. energy information administration monthly energy review. [Online]. Available: http://www.eia.gov/totalenergy/data/monthly/#consumption
K. Amber, R. Ahmad, M. Aslam et al., “Intelligent techniques for forecasting electricity consumption of buildings,” Energy, vol. 157, pp. 886-893, May 2018.
F. Wang, H. Xu, T. Xu et al., “The values of market-based demand response on improving power system reliability under extreme circumstances,” Applied Energy, vol. 193, pp. 220-231, Jan. 2017.
I. Atzeni, L. G. Ordonez, G. Scutari et al., “Demand-side management via distributed energy generation and storage optimization,” IEEE Transactions on Smart Grid, vol. 4, no. 2, pp. 866-876, Jun. 2013.
Q. Shi, F. Li, Q. Hu et al., “Dynamic demand control for system frequency regulation: concept review, algorithm comparison, and future vision,” Electric Power Systems Research, vol. 154, pp. 75-87, Jan. 2018.
F. Kienzle, P. Ahcin, G. Andersson et al., “Valuing investments in multi-energy conversion, storage, and demand-side management systems under uncertainty,” IEEE Transactions on Sustainable Energy, vol. 2, no. 2, pp. 194-202, Apr. 2011.
R. Jain, K. Smith, P. Culligan et al., “Forecasting energy consumption of multi-family residential buildings using support vector regression: investigating the impact of temporal and spatial monitoring granularity on performance accuracy,” Applied Energy, vol. 123, pp. 168-178, Jun. 2014.
A. Colmenar-Santos, L. N. T. de Lober, D. Borge-Diez et al., “Solutions to reduce energy consumption in the management of large buildings,” Energy & Buildings, vol. 56, pp. 66-77, Jan. 2013.
N. Somu, G. Rama, and K. Ramamritham, “A hybrid model for building energy consumption forecasting using long short term memory networks,” Applied Energy, vol. 261, p. 114131, Mar. 2020.
H. Zhao and F. Magoulès, “A review on the prediction of building energy consumption,” Renewable & Sustainable Energy Reviews, vol. 16, no. 6, pp. 3586-3592, Aug. 2012.
K. Song, Y.-S. Baek, D. Hong et al., “Short-term load forecasting for the holidays using fuzzy linear regression method,” IEEE Transactions on Power Systems, vol. 20, no. 1, pp. 96-101, Feb. 2005.
S. M. Moghaddastafreshi and M. Farhadi, “A linear regression-based study for temperature sensitivity analysis of Iran electrical load,” in Proceedings of IEEE International Conference on Industrial Technology, Chengdu, China, Apr. 2008, pp. 1-7.
H. Tao, G. Min, M. E. Baran et al., “Modeling and forecasting hourly electric load by multiple linear regression with interactions,” in Proceedings of IEEE PES General Meeting, Providence, USA, Jul. 2010, pp. 1-8.
L. Wei and Z. G. Zhang, “Based on time sequence of ARIMA model in the application of short-term electricity load forecasting,” in Proceedings of 2009 International Conference on Research Challenges in Computer Science, Shanghai, China, Dec. 2009, pp. 11-14.
C. M. Lee and C. N. Ko, “Short-term load forecasting using lifting scheme and ARIMA models,” Expert Systems with Applications, vol. 38, no. 5, pp. 5902-5911, May 2011.
W. Hong, “Electric load forecasting by seasonal recurrent SVR (support vector regression) with chaotic artificial bee colony algorithm,” Energy, vol. 36, no. 9, pp. 5568-5578, Sept. 2011.
W. Li, X. Yang, H. Li et al., “Hybrid forecasting approach based on GRNN neural network and SVR machine for electricity demand forecasting,” Energies, vol. 10, no. 1, p. 44, Jan. 2017.
D. Basak, P. Srimanta, and D. C. Patranbis, “Support vector regression,” Neural Information Processing Letters and Reviews, vol. 11, no. 10, pp. 203-224, Sept. 2007.
F. Cheng, X. Fu, and S. Wang, “Development of prediction models for next-day building energy consumption and peak power demand using data mining techniques,” Applied Energy, vol. 127, pp. 1-10, Aug. 2014.
F. Zhang, C. Deb, S. E. Lee et al., “Time series forecasting for building energy consumption using weighted support vector regression with differential evolution optimization technique,” Energy & Buildings, vol. 126, pp. 94-103, Aug. 2016.
S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735-1780, Nov. 1997.
T. Abel, P. V. Nguyen, M. Barad et al., “Genetic demonstration of a role for PKA in the late phase of LTP and in hippocampus-based long-term memory,” Cell, vol. 88, no. 5, pp. 615-626, Mar. 1997.
W. Kong, Z. Y. Dong, Y. Jia et al., “Short-term residential load forecasting based on LSTM recurrent neural network,” IEEE Transactions on Smart Grid, vol. 10, no. 1, pp. 841-851, Jan. 2019.
W. Kong, Z. Y. Dong, D. J. Hill et al., “Short-term residential load forecasting based on resident behaviour learning,” IEEE Transactions on Power Systems, vol. 33, no. 1, pp. 1087-1088, Mar. 2017.
H. Shi, M. Xu, and R. Li, “Deep learning for household load forecasting novel pooling deep RNN,” IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 5271-5280, Sept. 2017.
Z. Deng, B. Wang, Y. Xu et al., “Multi-scale convolutional neural network with time-cognition for multi-step short-term load forecasting,” IEEE Access, vol. 7, pp. 88058-88807, Jul. 2019.
Z. Deng, B. Wang, H. Guo et al., “Unified quantile regression deep neural network with time-cognition for probabilistic residential load forecasting,” Complexity, vol. 2020, pp. 1-18, Jan. 2020.
B. Xu, N. Wang, T. Chen et al. (2015, Nov.). Empirical evaluation of rectified activations in convolutional network. [Online]. Available: https://arxiv.org/abs/1505.00853
S. Ioffe and C. Szegedy. (2015, Mar.). Batch normalization: accelerating deep network training by reducing internal covariate shift. [Online]. Available: https://arxiv.org/abs/1502.03167
C. Miller and F. Meggers, “The building data genome project: an open, public data set from non-residential building electrical meters,” Energy Procedia, vol. 122, pp. 439-444, Sept. 2017.
H. Wang, Y. Wang, Q. Zhang et al., “Gated convolutional neural network for semantic segmentation in high-resolution images,” Remote Sensing, vol. 9, no. 5, p. 446, May 2017.