Abstract
Lately, the power demand of consumers has been increasing in distribution networks, while renewable power generation keeps penetrating into these networks. Insufficient data make it hard to accurately predict the load of new residential consumers, such as newly built apartments, whose time-series characteristics are volatile and change in both frequency and magnitude. Hence, this paper proposes a short-term probabilistic residential load forecasting scheme based on transfer learning and deep learning techniques. First, we formulate the short-term probabilistic residential load forecasting problem. Then, we propose a sequence-to-sequence (Seq2Seq) adversarial domain adaptation network and its joint training strategy to transfer generic features from the source domain (with massive consumption records of regular loads) to the target domain (with limited observations of new residential loads) and simultaneously minimize the domain difference and forecasting errors when solving the forecasting problem. For implementation, the dominant techniques or elements are used as the submodules of the Seq2Seq adversarial domain adaptation network, including the Seq2Seq recurrent neural networks (RNNs) composed of a long short-term memory (LSTM) encoder and an LSTM decoder, and quantile loss. Finally, case studies are conducted using multiple evaluation indices, comparative methods from classic machine learning and advanced deep learning, and various available data of the new residential loads and other regular loads. The experimental results validate the effectiveness and stability of the proposed scheme.
SHORT-TERM load forecasting of power demand is one of the main research areas in electrical engineering [
Numerous techniques have been applied to generate forecasts, which can be grouped into classic statistical techniques (e.g., stochastic time-series models [
Lately, the community has paid attention to residential load forecasting and probabilistic forecasting problems. On the one hand, the residential load is uncertain and random, depending on varying individual behavior patterns, regional layouts, and complicated factors such as weather conditions, exhibiting random time-series characteristics [
Furthermore, sequence-to-sequence (Seq2Seq) RNNs can describe the volatile temporal characteristics of residential consumption records in distribution networks [
On the other hand, the probabilistic load forecasting solutions that can describe the uncertainty of predictions are upsurging [
Moreover, current studies seldom address the limited applicability of the powerful Seq2Seq RNNs to forecasting new residential loads, which generally lack sufficient data records. In other words, insufficient data make it difficult to generate accurate forecasts for the new residential load. To our knowledge, no technical work has focused on new residential loads from real-life considerations, which motivates us to propose a probabilistic residential load forecasting scheme that applies the Seq2Seq RNN to limited data. The domain adaptation network is well known in computer vision and other artificial intelligence fields, but its application to residential load forecasting has received less attention than might be expected. Reference [
Hence, this paper proposes a short-term probabilistic residential load forecasting scheme. The aim of this study is to solve the load forecasting problem for new residential loads in a setting where many other regular loads have been connected to the power system for a relatively long time. The distribution of energy demand of the new residential loads, which is difficult to approximate with insufficient or limited data, can differ from that of other regular loads that have relatively sufficient data. The innovation of the proposed scheme is the Seq2Seq adversarial domain adaptation network, which aligns the distributions of the sufficient data of regular loads and the limited data of new residential loads. This alignment reasonably meets the assumption that the training data are independent and identically distributed, a necessary but often ignored condition, before the potential power of deep learning methods is leveraged to improve the forecasts. Thus, the proposed scheme can exploit the sufficient data of regular loads to maintain model performance even though the data of the new residential loads are limited.
The contributions of this paper are as follows.
1) A Seq2Seq adversarial domain adaptation network and its joint training strategy are proposed to align the distribution of sufficient data of regular loads and the limited data of the new residential loads.
2) A teacher-forcing Seq2Seq RNN based on LSTM cells and trained with quantile loss is used as the submodule of the Seq2Seq adversarial domain adaptation network to capture temporal dependencies between regular loads and the new residential loads.
3) The proposed Seq2Seq adversarial domain adaptation network is comprehensively compared with other classic methods in different horizons, criteria, and scarcity degrees of energy consumption observations.
The remainder of this paper is organized as follows. Section II formulates the short-term probabilistic residential load forecasting problem. Section III proposes the short-term probabilistic residential load forecasting scheme. Section IV performs a comprehensive case study to validate the proposed scheme using multiple evaluation indices, comparative techniques, and limited degrees. Section V draws the conclusions.
This section will clarify the short-term probabilistic residential load forecasting problem from the short-term deterministic load forecasting problem and the conventional multi-step (such as 24-point day-ahead) load forecasting problem.
Given the dataset at the time , we often process the chronological observations by identifying adequate models and optimizing the model assignments. Denote the current time as , the forecasting gap time as , the forecasting lead time (i.e., horizon) as , and the number of horizons as . The records at a certain time cover the elements that can be classified into the static covariate vector , the historical vector , and the future vector , where . The static covariate vector covers temporal-static factors such as the load location and the identity number of residents. The historical vector involves the information available from the historical time, i.e., the observed factors such as target loads and other public factors , where is denoted as the span of the last records for each horizon. The future vector involves the information for the future moments, i.e., the target vector and other pre-known factors such as calendar rules. In regression, the target vector is the dependent variable , and the input variable involves historical, future, and static covariate vectors, i.e., , , as expressed in (1).
(1) |
where reflects the mapping between the input and the output ; is the assignment space of the model , and is the number of candidate models; is the parameter vector or matrix of the model ; and is the number of parameters.
Given the dataset of influential factors and target variables, the experts estimate the optimal parameter assignments for the alternative models over varying optimal hyperparameters that are tuned by trial and error with the training data , as expressed in (2). Let be the mapping described via model . Experts then compare the performance of the alternative models to identify the adequate model to produce predictions over the testing inputs (a.k.a. features), assuming that true testing targets remain unknown before deploying it, as shown in (3).
(2) |
(3) |
where is the set of ; is the empirical loss; is the expectation of the loss function on the independent and identically distributed training data under an unknown distribution ; and is a cost function to evaluate the forecasting accuracy.
When the lead time is a time step, the model generates a single scalar (i.e., point) at each time, where the load forecasting problem is defined as a single-step load forecasting problem in (4). When the lead time covers more than one observed moment for the whole next day or week, the model outputs a specific profile composed of multiple points at a time in the multi-step load forecasting problem. There are three major solutions to the multi-step load forecasting problem, i.e., the day-ahead forecasting problem in this study: ① combining multiple single-step load forecasting models, each of which produces forecasts independently, as depicted in (5); ② using a multi-step load forecasting model to generate interval-ahead curves once and for all, as given in (6), which is adopted in this study by deep learning methods; and ③ replacing the inputs with the last predicted values to generate the forecasts iteratively, as given in (7).
(4) |
(5) |
(6) |
(7) |
where h is the index of horizon, .
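As a minimal sketch of strategies ② and ③, the following Python fragment frames a load series into paired input and output windows for a multi-output model and runs an iterative forecast with a one-step model. The function names `make_windows` and `recursive_forecast` and the toy one-step model are hypothetical and only serve as an illustration; they are not the implementation used in this study.

```python
from typing import Callable, List, Sequence, Tuple


def make_windows(
    series: Sequence[float], n_in: int, n_out: int
) -> List[Tuple[List[float], List[float]]]:
    """Strategy 2: pair each n_in-point input window with the next n_out points."""
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        x = list(series[start:start + n_in])
        y = list(series[start + n_in:start + n_in + n_out])
        pairs.append((x, y))
    return pairs


def recursive_forecast(
    history: Sequence[float],
    one_step: Callable[[List[float]], float],
    n_in: int,
    n_out: int,
) -> List[float]:
    """Strategy 3: feed each prediction back as the newest input value."""
    window = list(history[-n_in:])
    forecasts = []
    for _ in range(n_out):
        y_hat = one_step(window)
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]  # drop the oldest value, append the forecast
    return forecasts


# Toy usage: a hypothetical one-step model that adds 1 to the last value.
windows = make_windows(list(range(6)), n_in=3, n_out=2)
path = recursive_forecast([1, 2, 3], lambda w: w[-1] + 1, n_in=2, n_out=3)
```

With the 24-point input and output sequences listed in the experimental settings, `n_in` and `n_out` would both be 24. The recursive variant accumulates its own errors over the horizon, which is one reason a multi-output model can be preferable for day-ahead curves.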
The single-step load forecasting problem is defined as a deterministic forecasting problem when the model generates a deterministic value at each time, as given in (4). The deterministic forecasts are the most common form but cannot portray the uncertainty of the predictions. Thus, uncertain predictions in the forms of intervals, quantiles, and distribution densities are targeted in the probabilistic load forecasting problem. Each model depicts the probability distribution with the random variable of the target variable at the time , given the random variable of the input , i.e., . The random variable correlates to the conditional probability distribution for the target variable or the forecasting error given the input .
Moreover, the conditional probability distribution can be represented by the probability distribution density in (8), the quantile point in (9), or the random prediction interval in (10) [
(8) |
(9) |
(10) |
where is the nominal quantile level, and is the order of quantile prediction; is the parameter vector or matrix of model for the quantile ; and is the prediction interval composed of lateral quantiles, and and are the upper and lower bounds of nominal quantile level, respectively.

Fig. 1 Load forecasts at a specific moment in different forms. (a) Point, prediction interval, and probability density. (b) Point, quantile, and probability density.
The proposed scheme consists of the attention-based encoder-decoder network, adversarial domain adaptation network, LSTM RNNs, and quantile loss.

Fig. 2 Overall framework of proposed scheme.
The training set is also mixed with the data record related to regular loads and their known factors, i.e., .
Generally, sufficient samples are required to identify flexible NNs with numerous parameters. The Seq2Seq adversarial domain adaptation network is proposed based on a feature extractor, a demand predictor, and a domain classifier by realizing a joint training process with the gradient reversal layer. For implementation, we adopt the generic elements as the submodules of the adversarial domain adaptation network including: ① the feature extractor by connecting an LSTM layer and a dropout layer, ② the demand predictor by connecting another LSTM layer and dropout layer, ③ a fully-connected feedforward layer as the domain classifier, and ④ dense layers after the demand predictor and domain classifier. In addition, we use an attention-based layer and a skip connection to capture the longer temporal dependency while mitigating the vanishing gradient problem, enhancing feature reuse, and facilitating the learning of identity mappings, which is motivated by the Seq2Seq RNN.
Both the data of the new residential load and regular loads can be leveraged via the Seq2Seq adversarial domain adaptation network, where the Seq2Seq RNN is well-trained by quantile loss before generating the multi-step probabilistic load forecasts. The optimal parameter assignments of the Seq2Seq adversarial domain adaptation network are represented by (11) and (12), and then the estimated network generates the final load forecasts by (13).
(11) |
(12) |
(13) |
where is the dataset of the new load; is the vector of parameters for the proposed scheme; is the prediction of the new load; is the function of the proposed Seq2Seq adversarial domain adaptation network; and are the feature extractor and the vector of its parameters, respectively; and are the demand predictor and the vector of its parameters, respectively; and are the output layer and the vector of its parameters, respectively; is the vector of parameters for the domain classifier; and , , , and are the optimal parameter assignments of , , , and , respectively.
The parameters of the NN-based forecasting models are optimized by assuming the training data are independent and identically distributed. However, the accuracy of load forecasts could not be assured when the data distributions of the training and testing datasets vary, which is also known as a shift between data distributions of the training and test datasets. The concept of domain adaptation aims to learn a discriminative classifier or another predictor when there is a shift between data distributions of the training and test datasets, which is generally operated by matching the feature distributions in the source and target domains of synthetic or semi-synthetic image data. A dominant approach is to accomplish a feature space transformation that measures the similarity or dissimilarity between different distributions and maps the distributions of the source domain to the target domain [
On the one hand, the demand predictor predicts the load during both the training and testing processes. On the other hand, during the training process, the domain classifier discriminates between the source and the target domains. The feature extractor, which is connected to both the demand predictor and the domain classifier, learns deep features that are discriminative and domain-invariant. Specifically, the parameters of the demand predictor and the domain classifier are optimized to minimize their respective errors on the training set, whereas the parameters of the feature extractor are optimized to minimize the loss of the demand predictor and maximize the loss of the domain classifier. After the optimization, the feature extractor learned for the source domain can be implemented in the target domain. The error of the joint training processes of the feature extractor, demand predictor, and domain classifier on the adversarial domain adaptation network is calculated by:
(14) |
where is the domain classifier; is the number of samples; is the domain classification of observation, when the data at the time belong to the target domain, when the data at the time belong to the source domain; and are the loss functions; and is the trade-off weight between the losses of domain classifier and demand predictor .
As a result, the deep feature from the feature extractor represents a space transformation between the distribution of the output of the demand predictor in the source domain and that in the target domain , i.e.,
(15) |
where is the true distribution of the outputs , .
Moreover, the parameters , , and are orderly optimized to deliver the saddle point of (14):
(16) |
Specifically, the saddle point (16) is a stationary point of stochastic gradient descent (SGD) updates for the feedforward network model composed of a feature extractor , a domain classifier , and a demand predictor :
(17) |
(18) |
(19) |
where is the learning rate, which can vary over time.
However, we note that the minimization of the objective function (14), i.e., , includes a minimization optimization of prediction and a maximization of classification , and thus a “pseudo-function gradient” reversal layer only with the hyperparameter proposed in [
(20) |
(21) |
where is a gradient reversal layer; and is an identity matrix.
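The behaviour of a gradient reversal layer can be sketched in a few lines of plain Python. The sketch assumes the standard formulation from the adversarial domain adaptation literature: the forward pass is the identity, and the backward pass multiplies the incoming gradient by the negative trade-off weight. The class name `GradientReversal` and the attribute `lam` are hypothetical names chosen for illustration.

```python
from typing import List


class GradientReversal:
    """Identity in the forward pass; sign-flipped, scaled gradient in the backward pass."""

    def __init__(self, lam: float) -> None:
        self.lam = lam  # assumed trade-off weight between the two losses

    def forward(self, features: List[float]) -> List[float]:
        # Features pass through unchanged to the domain classifier.
        return list(features)

    def backward(self, grad_output: List[float]) -> List[float]:
        # The gradient flowing back to the feature extractor is reversed,
        # so a descent step on the extractor increases the domain loss.
        return [-self.lam * g for g in grad_output]


grl = GradientReversal(lam=0.5)
passed = grl.forward([1.0, -2.0])
reversed_grad = grl.backward([0.2, -0.4])
```

Because the reversal happens inside the backward pass, a single standard SGD step can update the feature extractor, the demand predictor, and the domain classifier together without a separate maximization loop.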
Based on , the modification of the objective function (14) with standard SGD forward propagation and backpropagation can be obtained as:
(22) |
The RNN is a promising approach for solving load forecasting problems because the record of energy consumption often exhibits temporal characteristics. RNNs proposed for processing sequential data (such as speech, multivariate time series, and text) can leverage the time interdependency in the chronological values through the ideas of parameter-sharing and graph-unrolling [
(23) |
(24) |
where is a hyperbolic tanh function; , and are the matrices of weights; and and are the vectors of biases.
The LSTM RNN adopts the gating mechanism to alleviate the gradient vanishing problem in conventional RNNs for modeling relatively long short-term dependency. It creates the path whose derivatives neither vanish nor explode to use the early temporal dependency (the hidden state) over the connection weights. The gradients flow via self-loops, where the weights are conditioned to the given data, as depicted in

Fig. 3 Diagram of RNN variants. (a) Recurrent graph, unrolled graph, and internal cell structure of RNNs. (b) Internal cell structure of LSTM RNNs.
Specifically, the LSTM RNN combines an outer recurrence, as in a generic RNN, with an internal recurrence inside the LSTM cell, as given in (25)-(30). In the outer recurrence, the element-wise intermediate variable correlates to the affine transformation of the input variable and the hidden layer vector from the last moment. In the internal recurrence, the internal state vector at the given time depends on the variable that relates to the input gate and the variable from the last moment that relates to the forget gate . The LSTM cell , i.e., the hidden layer vector , derives from the signal from the output gate . This process has been depicted in [
(25) |
(26) |
(27) |
(28) |
(29) |
(30) |
where is the Sigmoid function, which is another activation function besides ; and , , and are the specific parameters in the LSTM cell.
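To make the gating mechanism concrete, a single LSTM step can be sketched for scalar inputs and states. The sketch follows the widely used LSTM formulation with forget, input, and output gates and a tanh candidate; the function `lstm_step`, the gate names, and the parameter layout are assumptions made for illustration, and the notation may differ from that of (25)-(30).

```python
import math
from typing import Dict, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(
    x: float,
    h_prev: float,
    c_prev: float,
    params: Dict[str, Tuple[float, float, float]],
) -> Tuple[float, float]:
    """One scalar LSTM step; params maps a gate name to (input weight, recurrent weight, bias)."""

    def affine(name: str) -> float:
        w, u, b = params[name]
        return w * x + u * h_prev + b

    f = sigmoid(affine("forget"))        # how much of the old cell state is kept
    i = sigmoid(affine("input"))         # how much of the candidate is written
    o = sigmoid(affine("output"))        # how much of the cell state is exposed
    g = math.tanh(affine("candidate"))   # candidate update from the outer recurrence
    c = f * c_prev + i * g               # internal recurrence (self-loop)
    h = o * math.tanh(c)                 # hidden state passed to the next step
    return h, c


zero_params = {name: (0.0, 0.0, 0.0) for name in ("forget", "input", "output", "candidate")}
h_new, c_new = lstm_step(x=1.0, h_prev=0.5, c_prev=2.0, params=zero_params)
```

With all weights at zero, every gate equals 0.5 and the candidate equals 0, so the cell state is simply halved; this shows how the forget gate scales the self-loop that carries information across time steps.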
The Seq2Seq RNN exhibits superior capability in modeling temporal characteristics between the input and output sequences by leveraging the local context around the target. The Seq2Seq RNN was first designed and used in computer vision, speech recognition, and natural language processing [

Fig. 4 Encoder-decoder network with mixed inputs.
Existing works usually consider calendar-related information as categorical features such as the hours in a day, the days in a week, the months in a year, the distinction between holidays and non-holidays, the distinction between weekdays and weekends, and the distinction between varying seasons, which can be used as part of the inputs of the load forecasting model after encoding. The one-hot encoding approach is one of the most common encoding approaches for categorical features.

Fig. 5 Example of one-hot encoding approach on monthly calendar-related features.
To address the explosion of inputs by the one-hot encoding approach, we adopt the Bayesian target encoding to pre-process categorical features. Specifically, the values of the categorical features are compared with average observations of the load in the corresponding categorical values, as depicted in

Fig. 6 Example of Bayesian target encoding on monthly calendar features.
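The difference between the two encodings can be illustrated with a small Python sketch. One-hot encoding produces one input per category, whereas target encoding replaces each category with a single number derived from the average load in that category. The smoothed form below, which shrinks each category mean toward the global mean, is one common variant and is an assumption here; the function names and the `smoothing` parameter are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def one_hot(value: Hashable, categories: Sequence[Hashable]) -> List[int]:
    """Return one indicator per category, so 12 months give 12 inputs."""
    return [1 if value == c else 0 for c in categories]


def target_encode(
    categories: Sequence[Hashable],
    targets: Sequence[float],
    smoothing: float = 1.0,
) -> Dict[Hashable, float]:
    """Map each category to a smoothed mean of the target observed in that category."""
    global_mean = sum(targets) / len(targets)
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    # Categories with few observations are pulled toward the global mean.
    return {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }


months = ["Jan", "Jan", "Feb"]
loads = [1.0, 3.0, 5.0]
encoding = target_encode(months, loads, smoothing=1.0)
feb_vector = one_hot("Feb", ["Jan", "Feb", "Mar"])
```

The encoding should be fitted on training data only, since it uses the target values; applying statistics computed on the test period would leak information into the inputs.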
Moreover, we utilize a Z-score normalization method to map the values of the target load and its influential factors with different dimensions into a specific range [
(31) |
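A minimal sketch of Z-score normalization is given below, assuming the usual definition in which each value is centred by the mean and scaled by the standard deviation. The function names `fit_z_score` and `apply_z_score` are hypothetical; the statistics are fitted once and then reused so that training and testing data share the same scale.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_z_score(values: Sequence[float]) -> Tuple[float, float]:
    """Estimate the mean and population standard deviation from training values."""
    return statistics.fmean(values), statistics.pstdev(values)


def apply_z_score(values: Sequence[float], mean: float, std: float) -> List[float]:
    """Centre and scale values with previously fitted statistics."""
    if std == 0:
        raise ValueError("standard deviation is zero; the feature is constant")
    return [(v - mean) / std for v in values]


mean, std = fit_z_score([2.0, 4.0, 6.0])
scaled = apply_z_score([2.0, 4.0, 6.0], mean, std)
```

Forecasts produced on the normalized scale can be mapped back by multiplying by `std` and adding `mean`.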
To evaluate the model performance, we compare load forecasts and true values via quantile score (QS) and Winkler score (WS) indexes.
The QS index calculated by (32) and (33) represents the mean of pinball losses throughout the lead time and all quantiles, respectively. A lower QS result indicates more precise forecasts compared with the ground truth values.
(32) |
(33) |
where is the quantile loss function for the nominal quantile level at the time ; is the optimal hyperparameter; is the prediction for quantile at the time ; and is the average quantile loss.
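The pinball loss behind the QS index can be sketched as follows, assuming its standard definition: an under-forecast is penalized by the quantile level and an over-forecast by one minus the quantile level. The names `pinball_loss` and `quantile_score`, and the dictionary layout of the forecasts, are hypothetical.

```python
from typing import Dict, Sequence


def pinball_loss(y_true: float, y_quantile: float, tau: float) -> float:
    """Pinball (quantile) loss of one forecast at nominal quantile level tau."""
    diff = y_true - y_quantile
    return max(tau * diff, (tau - 1.0) * diff)


def quantile_score(
    y_true: Sequence[float],
    quantile_forecasts: Dict[float, Sequence[float]],
) -> float:
    """Average pinball loss over all time steps and all quantile levels."""
    losses = [
        pinball_loss(y, q, tau)
        for tau, forecasts in quantile_forecasts.items()
        for y, q in zip(y_true, forecasts)
    ]
    return sum(losses) / len(losses)


under = pinball_loss(10.0, 8.0, tau=0.9)    # forecast below the observation
over = pinball_loss(10.0, 12.0, tau=0.9)    # forecast above the observation
score = quantile_score([10.0, 10.0], {0.5: [8.0, 12.0]})
```

At the 0.9 quantile level, a forecast that is 2 units too low costs nine times as much as one that is 2 units too high, which is what pushes the learned quantile upward.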
Based on the quantile loss, the loss function of the dense layer after the feature extractor, the gradient update (with the impact on the loss ), and the optimization process is established by (34)-(36), respectively.
(34) |
(35) |
(36) |
where is the dense loss function for the nominal quantile level at the time .
With this loss, the optimization in (16) and the loss function (22) can be updated as:
(37) |
(38) |
(39) |
where is the loss function of the proposed frame for the nominal quantile level at the time ; is the prediction of the new load for the nominal quantile level at the time ; is the ground truth of load at the time ; is the prediction of the domain label at the time ; and is the ground truth label at the time .
The WS index evaluates the sharpness and reliability of the prediction intervals constrained to the quantile bounds by (40) and (41). A lower WS is generally desired because an overly wide interval is meaningless [
(40) |
(41) |
(42) |
where is the interval loss function at time ; is the average WS loss; and is the confidence level relating to and .
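A sketch of the Winkler score in its commonly used form is given below. It assumes a central prediction interval with nominal coverage of one minus `alpha`: the score equals the interval width when the observation falls inside, and adds a penalty proportional to the miss distance otherwise. The function names and the parameter `alpha` are assumptions for illustration and may not match the notation of (40)-(42).

```python
from typing import Sequence


def winkler_score(y_true: float, lower: float, upper: float, alpha: float) -> float:
    """Winkler score of one interval [lower, upper] with nominal coverage 1 - alpha."""
    width = upper - lower
    if y_true < lower:
        return width + 2.0 * (lower - y_true) / alpha
    if y_true > upper:
        return width + 2.0 * (y_true - upper) / alpha
    return width


def mean_winkler_score(
    y_true: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    alpha: float,
) -> float:
    """Average Winkler score over the lead time."""
    scores = [winkler_score(y, lo, up, alpha) for y, lo, up in zip(y_true, lower, upper)]
    return sum(scores) / len(scores)


inside = winkler_score(5.0, 4.0, 6.0, alpha=0.1)
below = winkler_score(3.0, 4.0, 6.0, alpha=0.1)
average = mean_winkler_score([5.0, 3.0], [4.0, 4.0], [6.0, 6.0], alpha=0.1)
```

The score rewards narrow intervals but only as long as they still cover the observation, which is why it reflects both sharpness and reliability.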
In addition, the categorical cross-entropy (CCE) index, which is popular for classification problems, is adopted in (43)-(46) to compute the domain classification error. The ratio between the scales of samples from the source domain and the target domain is considered. The criterion makes the gradients computable when training the classification model. A lower CCE index generally represents better classification capability. However, we require a relatively high classification error to achieve the distributional consistency of the output of the demand predictor from source domain inputs and target domain inputs , across the connection between the feature extractor , the gradient reversal layer , and the domain classifier , i.e., , in this study.
(43) |
(44) |
(45) |
(46) |
where is the CCE loss function at the time ; is the average CCE loss; is the number of the source domains; is the number of target domains; is the probability of the sample from the target domain at the time ; and represents the normalized exponential function.
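The softmax and categorical cross-entropy computations can be sketched as follows for a two-class domain label. This is the plain, unweighted form; the weighting by the ratio of source and target samples described above is not included, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Normalized exponential function, shifted by the maximum for numerical stability."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(
    one_hot_label: Sequence[float], probs: Sequence[float], eps: float = 1e-12
) -> float:
    """Cross-entropy between a one-hot domain label and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot_label, probs))


# A classifier that cannot separate the domains outputs equal probabilities.
undecided = softmax([0.0, 0.0])
loss_undecided = categorical_cross_entropy([0.0, 1.0], undecided)
# A confident and correct classifier yields a loss close to zero.
loss_confident = categorical_cross_entropy([0.0, 1.0], softmax([-10.0, 10.0]))
```

Under this unweighted form, a classifier that cannot distinguish the two domains incurs a loss of ln 2 per sample, which illustrates why a relatively high classification error indicates domain-invariant features.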
A widely accepted real-life dataset published in [
Generally, the similarity should be evaluated before adapting the knowledge in the source domain to the target domain. As a preliminary similarity assessment, we pick the New Hampshire load zone as the target domain and other load zones as potential source domains. A dominant K-shape clustering algorithm is used to process the eight data sources of energy consumption and calculate the similarity between them [
Load zone | Monthly SBD | Monthly No. of clusters | Half-year SBD | Half-year No. of clusters |
---|---|---|---|---|
West/Central Massachusetts | 0.01693278 | 0 | 0.01404867 | 0 |
Vermont | 0.01923366 | 1 | 0.02019146 | 1 |
Rhode Island | 0.00092255 | 2 | 0.00131910 | 2 |
New Hampshire | 0 | 2 | 0 | 2 |
Northeast Massachusetts and Boston | 0.00065996 | 2 | 0.00102957 | 2 |
Southeast Massachusetts | 0.00141926 | 2 | 0.00289324 | 2 |
Maine | 0.00068811 | 2 | 0.00220722 | 2 |
Connecticut | 0.00082693 | 2 | 0.00192924 | 2 |
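The shape-based distance (SBD) used by the k-Shape algorithm can be sketched in plain Python, assuming its usual definition as one minus the maximum normalized cross-correlation over all shifts. The function name `sbd` is hypothetical, and the z-normalization that k-Shape typically applies to each series beforehand is omitted for brevity.

```python
import math
from typing import Sequence


def sbd(x: Sequence[float], y: Sequence[float]) -> float:
    """Shape-based distance: 1 - max over shifts of the normalized cross-correlation."""
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        raise ValueError("SBD is undefined for an all-zero series")
    n, m = len(x), len(y)
    best = max(
        # Cross-correlation at shift s: overlap x[i] with y[i + s].
        sum(x[i] * y[i + s] for i in range(n) if 0 <= i + s < m)
        for s in range(-(n - 1), m)
    )
    return 1.0 - best / norm


same_shape = sbd([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # scaled copy
shifted = sbd([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])      # shifted copy
different = sbd([1.0, 0.0], [1.0, 1.0])
```

A value of 0 means the two profiles have the same shape up to scaling and shifting, so a series compared with itself always yields an SBD of 0.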
Before training the customized Seq2Seq RNN in adversarial domain adaptation, we manually determine the assignments of special parameters (i.e., hyperparameters) and tune them within specific ranges and numerical sets. This process involves experimental settings. In this study, these experimental settings are mainly divided into two groups: ① the training settings for the Seq2Seq RNN in adversarial domain adaptation in the proposed scheme, ablative analysis, and comparative models; and ② the model settings for the customized RNN itself. The experimental settings are summarized in
Classification | Hyperparameter | Value |
---|---|---|
Training setting | Length of input sequence | 24×1 |
Training setting | Length of output sequence | 24×1 |
Training setting | Epoch number | 10000 |
Training setting | Repetition number | 10 |
Training setting | Early stopping patience | 50 |
Training setting | Optimization algorithm | Adam |
Training setting | Number of random searching | 50 |
Model setting | Learning rate | 0.001, 0.0001, 0.00001 |
Model setting | EL | 1, 2, 3, 4, 5 |
Model setting | EN | 10, 20, 40, 80, 160 |
Model setting | DL | 1, 2, 3, 4, 5 |
Model setting | DN | 10, 20, 40, 80, 160 |
Model setting | Batch size | 64, 128, 256 |
Model setting | Activation function type | |
Model setting | Dropout rate | 0.1, 0.2, 0.3, 0.4, 0.5 |
Note: EL, EN, DL, and DN are the numbers of encoder layers, encoder neurons per layer, decoder layers, and decoder neurons per layer, respectively.
We identify the appropriate assignments of the model settings (e.g., the learning rate, the batch size, and the dropout rate [
With the empirically optimal experimental settings, we repeat the training process (i.e., optimizing model parameters) and the testing process (i.e., obtaining load forecasts and comparing them with ground truth values) ten times to ensure the reproducibility and reliability of the results.
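The random search over the model settings can be sketched as follows. The candidate values are copied from the experimental settings, and 50 trials match the listed number of random searches; the dictionary keys, the function `sample_configs`, and the fixed seed are hypothetical choices for illustration. The activation function type is omitted because its candidate values are not listed.

```python
import random
from typing import Dict, List, Sequence

SEARCH_SPACE: Dict[str, Sequence[float]] = {
    "learning_rate": [0.001, 0.0001, 0.00001],
    "encoder_layers": [1, 2, 3, 4, 5],          # EL
    "encoder_neurons": [10, 20, 40, 80, 160],   # EN
    "decoder_layers": [1, 2, 3, 4, 5],          # DL
    "decoder_neurons": [10, 20, 40, 80, 160],   # DN
    "batch_size": [64, 128, 256],
    "dropout_rate": [0.1, 0.2, 0.3, 0.4, 0.5],
}


def sample_configs(
    space: Dict[str, Sequence[float]], n_trials: int, seed: int = 0
) -> List[Dict[str, float]]:
    """Draw n_trials configurations by picking one candidate value per hyperparameter."""
    rng = random.Random(seed)  # fixed seed so the search can be repeated
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_trials)
    ]


configs = sample_configs(SEARCH_SPACE, n_trials=50)
```

Each sampled configuration would then be trained with early stopping and scored on validation data, and the configuration with the lowest validation loss would be retained.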
The case study is realized via the Python language (3.9.7), the TensorFlow wheel (2.7) [
To validate the feasibility of the proposed scheme, we conduct a detailed sensitivity analysis in terms of layers and neurons per layer in the encoder and decoder. Specifically, EL and DL are set to be 1, 2, 3, 4, and 5, and EN and DN range from 10 to 160. We then build varying Seq2Seq RNNs over the combinations of the two hyperparameter values. Similarly, the classification- and regression-related indexes are used to evaluate the difference in model performance under varying hyperparameter assignments. In this subsection, we assume that 100% of the samples from the source and target domains are available.
EL | EN | DL | DN | Average QS | Average WS |
---|---|---|---|---|---|
1 | 40 | 2 | 40 | 0.00278 (0.00001) | 0.02613 (0.0001) |
2 | 40 | 2 | 40 | 0.00266 (0.00001) | 0.02461 (0.0001) |
3 | 40 | 2 | 40 | 0.00267 (0.00001) | 0.02362 (0.0001) |
4 | 40 | 2 | 40 | 0.00312 (0.00002) | 0.02077 (0.0001) |
5 | 40 | 2 | 40 | 0.01071 (0.00001) | 0.06536 (0.0001) |
2 | 10 | 2 | 40 | 0.00316 (0.00002) | 0.02291 (0.0001) |
2 | 20 | 2 | 40 | 0.00267 (0.00001) | 0.03284 (0.0002) |
2 | 80 | 2 | 40 | 0.00344 (0.00001) | 0.02490 (0.0001) |
2 | 160 | 2 | 40 | 0.00353 (0.00001) | 0.03123 (0.0002) |
2 | 40 | 1 | 40 | 0.00316 (0.00002) | 0.02869 (0.0003) |
2 | 40 | 3 | 40 | 0.00348 (0.00002) | 0.02236 (0.0001) |
2 | 40 | 4 | 40 | 0.00501 (0.00004) | 0.08236 (0.0007) |
2 | 40 | 5 | 40 | 0.00597 (0.00005) | 0.07154 (0.0006) |
2 | 40 | 2 | 10 | 0.00299 (0.00001) | 0.03160 (0.0002) |
2 | 40 | 2 | 20 | 0.00269 (0.00001) | 0.03320 (0.0002) |
2 | 40 | 2 | 80 | 0.00313 (0.00002) | 0.02006 (0.0001) |
2 | 40 | 2 | 160 | 0.00344 (0.00002) | 0.02474 (0.0001) |
Note: the values in brackets are standard deviations.
Furthermore, the encoder exhibits more impact on the QS (from 0.00266 to 0.01071 in terms of EL, or from 0.00266 to 0.00353 in terms of EN) than the decoder (from 0.00266 to 0.00597 in terms of DL or from 0.00266 to 0.00344 in terms of DN). This empirically confirms that the encoder of the feature extractor plays an essential role in the proposed scheme. Similarly, it should be considered that the optimal assignments for the lowest QS could cause narrow prediction intervals because of the high accuracy of the quantile forecasts and the ground truth values. Although the default configuration does not exhibit the best average WS () and the standard deviation for ten trials, we recommend the suboptimal assignments of hyperparameters to balance QS and WS when implementing the proposed scheme.
To illustrate the performance of the proposed scheme under data scarcity, we simulate diverse scenarios related to the data availability of both source and target domains and compare their performances. In other words, we evaluate the errors between ground truth values and prediction intervals as well as quantiles when various proportions of samples from the two domains (10%, 20%, 40%, 60%, 80%, and 100%) are available, as summarized in
Proportion of samples from source domain (%) | Proportion of samples from target domain (%) | Average CCE | Average QS | Average WS |
---|---|---|---|---|
100 | 100 | 0.2500 (0.008) | 0.00266 (0.00001) | 0.02461 (0.0001) |
80 | 100 | 0.2111 (0.006) | 0.00331 (0.00002) | 0.02306 (0.0001) |
60 | 100 | 0.1687 (0.005) | 0.00267 (0.00001) | 0.03611 (0.0002) |
40 | 100 | 0.1214 (0.004) | 0.00351 (0.00003) | 0.02442 (0.0001) |
20 | 100 | 0.0667 (0.002) | 0.00286 (0.00002) | 0.03120 (0.0002) |
10 | 100 | 0.0352 (0.001) | 0.00319 (0.00002) | 0.03129 (0.0002) |
100 | 80 | 0.2361 (0.007) | 0.00298 (0.00001) | 0.02240 (0.0001) |
100 | 60 | 0.2187 (0.006) | 0.00312 (0.00002) | 0.02542 (0.0001) |
100 | 40 | 0.1964 (0.006) | 0.00509 (0.00004) | 0.02142 (0.0001) |
100 | 20 | 0.1667 (0.005) | 0.00350 (0.00002) | 0.02632 (0.0001) |
100 | 10 | 0.1477 (0.005) | 0.00471 (0.00004) | 0.02548 (0.0001) |
80 | 80 | 0.2500 (0.001) | 0.00338 (0.00002) | 0.02741 (0.0002) |
60 | 60 | 0.2500 (0.001) | 0.00363 (0.00002) | 0.02963 (0.0002) |
40 | 40 | 0.2500 (0.001) | 0.00402 (0.00003) | 0.02753 (0.0001) |
20 | 20 | 0.2500 (0.001) | 0.00414 (0.00003) | 0.02721 (0.0001) |
10 | 10 | 0.2500 (0.001) | 0.00862 (0.00006) | 0.06526 (0.0003) |
Note: the values in brackets are standard deviations.
From
Moreover,
To demonstrate the superiority of the proposed scheme, we compare probabilistic forecasts generated by machine learning and deep learning schemes. The machine learning schemes include random forests (RFs) and gradient boosting decision trees (GBDTs). Deep learning schemes include generic fully connected feedforward NN (gen-FFNN), residual FFNN (res-FFNN), gated recurrent unit (GRU) RNN, LSTM RNN, generic temporal convolutional network (gen-TCN), conditional TCN (con-TCN), and WaveNet. In addition, static and teacher forcing (TF) Seq2Seq RNNs without domain adaptation are applied as ablative models to compare with the proposed scheme. We also utilize the QS and WS indexes to evaluate the forecasting results obtained from these schemes with different proportions of samples from the two domains, as summarized in
Scheme | QS (100% samples from target domain) | WS (100% samples from target domain) | QS (10% samples from target domain) | WS (10% samples from target domain) |
---|---|---|---|---|
GBDT | 0.00253 (0.00001) | 0.02946 (0.0003) | 0.00666 (0.00001) | 0.02603 (0.0001) |
RF | 0.00269 (0.00001) | 0.07370 (0.0007) | 0.00648 (0.00003) | 0.05934 (0.0002) |
Gen-FFNN | 0.00317 (0.00002) | 0.08682 (0.0008) | 0.00636 (0.00003) | 0.22743 (0.0011) |
Res-FFNN | 0.00317 (0.00002) | 0.08645 (0.0008) | 0.00623 (0.00003) | 0.21977 (0.0010) |
LSTM RNN | 0.00266 (0.00001) | 0.04991 (0.0005) | 0.01715 (0.00060) | 0.05719 (0.0003) |
GRU RNN | 0.00291 (0.00001) | 0.05945 (0.0005) | 0.01419 (0.00050) | 0.06197 (0.0003) |
Gen-TCN | 0.01144 (0.00010) | 0.03771 (0.0003) | 0.03312 (0.00150) | 0.13722 (0.0006) |
Con-TCN | 0.00775 (0.00007) | 0.03559 (0.0003) | 0.02412 (0.00100) | 0.10000 (0.0004) |
WaveNet | 0.00845 (0.00007) | 0.04233 (0.0004) | 0.01082 (0.00050) | 0.17805 (0.0100) |
TF Seq2Seq RNN | 0.00258 (0.00001) | 0.01209 (0.0001) | 0.09575 (0.00041) | 0.43193 (0.0020) |
Static Seq2Seq RNN | 0.00323 (0.00002) | 0.01173 (0.0001) | 0.09575 (0.00040) | 0.43193 (0.0020) |
Proposed | 0.00266 (0.00001) | 0.02461 (0.0002) | 0.00471 (0.00002) | 0.02548 (0.0001) |
Note: the values in brackets are standard deviations.
From
Meanwhile, the proposed scheme exhibits its superiority in leveraging sufficient records of regular loads and supplementing the available dataset when training the adaptive Seq2Seq RNN. Therefore, the proposed scheme keeps generating accurate forecasts and can accomplish the best performance in terms of both the QS index (0.00471) and the WS index (0.02548). Given the entire target domain, we validate the performance through the proposed scheme with true profiles by vividly illustrating a group of day-ahead quantile predictions and the ground truth values, as shown in

Fig. 7 Day-ahead quantile predictions and ground truth values given 100% samples from target domain.

Fig. 8 Day-ahead quantile forecasts and ground truth values given 10% samples from target domain.
The proportion and scale of renewable power generation such as solar power in the distribution system keep increasing, so it is imperative to develop load forecasting technologies to obtain precise net load profiles for planning and dispatching the power system in the context of penetrating renewables. This paper focuses on the volatile residential load series and addresses the data scarcity problem as a significant branch in the field of probabilistic load forecasting. The proposed scheme includes a Seq2Seq RNN over two LSTM layers as the feature extractor and the demand predictor, respectively, and a fully connected feedforward layer as the domain classifier.
To implement the adversarial domain adaptation network, we mix historical records with newly collected residential load observations, train the Seq2Seq adversarial domain adaptation network on samples from both the source and target domains, and generate accurate forecasts.
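As an illustration of the data-mixing step, the sketch below builds one training set from source-domain and target-domain windows and attaches a domain label to each sample, so that a domain classifier could be trained alongside the forecaster. It is a minimal sketch under stated assumptions: the name mix_domains, the label convention (0 for source, 1 for target), and the target_fraction argument used to mimic a limited target domain are all hypothetical and do not come from the original implementation.

```python
import random
from typing import List, Sequence, Tuple

# One sample: (input history window, forecast horizon values).
Window = Tuple[Sequence[float], Sequence[float]]
# One labelled sample: (input window, forecast values, domain label).
Labelled = Tuple[Sequence[float], Sequence[float], int]


def mix_domains(source: Sequence[Window],
                target: Sequence[Window],
                target_fraction: float = 1.0,
                seed: int = 0) -> List[Labelled]:
    """Mix source and target windows into one shuffled, domain-labelled set.

    Assumed convention: domain label 0 = source (regular loads),
    domain label 1 = target (new residential loads).
    """
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    rng = random.Random(seed)
    # Keep only a fraction of the target windows to emulate scarce data.
    n_keep = max(1, round(len(target) * target_fraction))
    kept_target = rng.sample(list(target), n_keep)
    mixed: List[Labelled] = [(x, y, 0) for x, y in source]
    mixed += [(x, y, 1) for x, y in kept_target]
    rng.shuffle(mixed)  # interleave both domains within each epoch
    return mixed


if __name__ == "__main__":
    # Toy windows (hypothetical values): 5 source samples, 10 target samples.
    src = [([0.1 * i], [0.2 * i]) for i in range(5)]
    tgt = [([1.0 * i], [2.0 * i]) for i in range(10)]
    batch = mix_domains(src, tgt, target_fraction=0.1)
    print(len(batch), sum(label for _, _, label in batch))
```

In an adversarial setup of this kind, the forecast values would feed the quantile loss of the predictor, while the domain labels would feed the loss of the domain classifier.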
In the case study, we investigate the stability and feasibility of the proposed scheme for day-ahead probabilistic forecasting by limiting the scale of available data from the source or target domains. The results show that widely accepted methods may lose their advantage and become vulnerable when data resources are inevitably limited or insufficient. Meanwhile, although a Seq2Seq RNN is usually trained with massive data, the proposed scheme maintains robust performance and precise load forecasts as we gradually reduce the available scales of the source and target domains. This finding can inspire further discussion and investigation of new technologies to deal with data scarcity in this area. Future work will consider the attention mechanism when integrating domain adaptation into a Seq2Seq RNN.
References
W. Liao, S. Wang, B. Bak-Jensen et al., “Ultra-short-term interval prediction of wind power based on graph neural network and improved bootstrap technique,” Journal of Modern Power Systems and Clean Energy, vol. 11, no. 4, pp. 1100-1114, Jul. 2023.
J. Zhu, H. Dong, W. Zheng et al., “Review and prospect of data-driven techniques for load forecasting in integrated energy systems,” Applied Energy, vol. 321, p. 119269, Sept. 2022.
IEA. (2019, Dec.). Renewables 2019. [Online]. Available: https://www.iea.org/reports/renewables-2019/distributed-solar-pv
IEA. (2021, Dec.). Renewables 2021. [Online]. Available: https://www.iea.org/reports/renewables-2021
Q. Cui, J. Zhu, J. Shu et al., “Comprehensive evaluation of electric power prediction models based on D-S evidence theory combined with multiple accuracy indicators,” Journal of Modern Power Systems and Clean Energy, vol. 10, no. 3, pp. 597-605, May 2022.
L. Ghelardoni, A. Ghio, and D. Anguita, “Energy load forecasting using empirical mode decomposition and support vector regression,” IEEE Transactions on Smart Grid, vol. 4, no. 1, pp. 549-556, Mar. 2013.
H. Shi, M. Xu, and R. Li, “Deep learning for household load forecasting – a novel pooling deep RNN,” IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 5271-5280, Sept. 2018.
H. S. Hippert, C. E. Pedreira, and R. C. Souza, “Neural networks for short-term load forecasting: a review and evaluation,” IEEE Transactions on Power Systems, vol. 16, no. 1, pp. 44-55, Feb. 2001.
Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436-444, May 2015.
T. Elsken, J. H. Metzen, and F. Hutter. Automated Machine Learning. Cham: Springer, 2019: 63-77.
B. Stephen, X. Tang, P. R. Harvey et al., “Incorporating practice theory in sub-profile models for short term aggregated residential load forecasting,” IEEE Transactions on Smart Grid, vol. 8, no. 4, pp. 1591-1598, Jul. 2017.
W. Kong, Z. Y. Dong, D. J. Hill et al., “Short-term residential load forecasting based on resident behaviour learning,” IEEE Transactions on Power Systems, vol. 33, no. 1, pp. 1087-1088, Jan. 2018.
J. Ponoćko and J. V. Milanović, “Forecasting demand flexibility of aggregated residential load using smart meter data,” IEEE Transactions on Power Systems, vol. 33, no. 5, pp. 5446-5455, Sept. 2018.
W. Kong, Z. Y. Dong, Y. Jia et al., “Short-term residential load forecasting based on LSTM recurrent neural network,” IEEE Transactions on Smart Grid, vol. 10, no. 1, pp. 841-851, Jan. 2019.
K. Cho, B. van Merrienboer, C. Gulcehre et al., “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, Oct. 2014, pp. 1724-1734.
E. Skomski, J. Y. Lee, W. Kim et al., “Sequence-to-sequence neural networks for short-term electrical load forecasting in commercial office buildings,” Energy and Buildings, vol. 226, p. 110350, Nov. 2020.
N. Mughees, S. A. Mohsin, A. Mughees et al., “Deep sequence to sequence Bi-LSTM neural networks for day-ahead peak load forecasting,” Expert Systems with Applications, vol. 175, p. 114844, Aug. 2021.
Z. Masood, R. Gantassi, Ardiansyah et al., “A multi-step time-series clustering-based Seq2Seq LSTM learning for a single household electricity load forecasting,” Energies, vol. 15, no. 7, p. 2623, Apr. 2022.
M. Shepero, D. van der Meer, J. Munkhammar et al., “Residential probabilistic load forecasting: a method using Gaussian process designed for electric load data,” Applied Energy, vol. 218, pp. 159-172, May 2018.
L. Cheng, H. Zang, Y. Xu et al., “Probabilistic residential load forecasting based on micrometeorological data and customer consumption pattern,” IEEE Transactions on Power Systems, vol. 36, no. 4, pp. 3762-3775, Jul. 2021.
C. Li, Z. Dong, L. Ding et al., “Interpretable memristive LSTM network design for probabilistic residential load forecasting,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 69, no. 6, pp. 2297-2310, Jun. 2022.
S. Li, Y. Zhong, and J. Lin, “AWS-DAIE: incremental ensemble short-term electricity load forecasting based on sample domain adaptation,” Sustainability, vol. 14, no. 21, p. 14205, Oct. 2022.
M. Huang and J. Yin, “Research on adversarial domain adaptation method and its application in power load forecasting,” Mathematics, vol. 10, no. 18, p. 3223, Sept. 2022.
J. Wang, X. Xiong, Z. Li et al., “Wind forecast-based probabilistic early warning method of wind swing discharge for OHTLs,” IEEE Transactions on Power Delivery, vol. 31, no. 5, pp. 2169-2178, Oct. 2016.
M. Baktashmotlagh, M. T. Harandi, B. C. Lovell et al., “Unsupervised domain adaptation by domain invariant projection,” in Proceedings of 2013 IEEE International Conference on Computer Vision, Sydney, Australia, Dec. 2013, pp. 769-776.
Y. Ganin and V. Lempitsky, “Unsupervised domain adaptation by backpropagation,” in Proceedings of the 32nd International Conference on Machine Learning, Lille, France, Jul. 2015, pp. 1180-1189.
F. He, J. Zhou, Z. Feng et al., “A hybrid short-term load forecasting model based on variational mode decomposition and long short-term memory networks considering relevant factors with Bayesian optimization algorithm,” Applied Energy, vol. 237, pp. 103-116, Mar. 2019.
H. Dong, J. Zhu, S. Li et al., “Short-term residential household reactive power forecasting considering active power demand via deep Transformer sequence-to-sequence networks,” Applied Energy, vol. 329, p. 120281, Jan. 2023.
Q. Cui, J. Zhu, J. Shu et al., “Comprehensive evaluation of electric power prediction models based on D-S evidence theory combined with multiple accuracy indicators,” Journal of Modern Power Systems and Clean Energy, vol. 10, no. 3, pp. 597-605, May 2022.
W. Zhang, H. Quan, O. Gandhi et al., “Improving probabilistic load forecasting using quantile regression NN with skip connections,” IEEE Transactions on Smart Grid, vol. 11, no. 6, pp. 5442-5450, Nov. 2020.
ISO New England Inc. (2022, Dec.). Energy, load, and demand reports. [Online]. Available: https://www.iso-ne.com/
J. Paparrizos and L. Gravano, “k-shape: efficient and accurate clustering of time series,” in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Australia, Jan. 2015, pp. 1855-1870.
N. Srivastava, G. Hinton, A. Krizhevsky et al., “Dropout: a simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, Jan. 2014.
J. Bergstra and Y. Bengio, “Random search for hyper-parameter optimization,” Journal of Machine Learning Research, vol. 13, no. 2, Feb. 2012.
M. Abadi, A. Agarwal, P. Barham et al. (2016, Mar.). TensorFlow: large-scale machine learning on heterogeneous distributed systems. [Online]. Available: https://arxiv.org/abs/1603.04467
F. Pedregosa, G. Varoquaux, A. Gramfort et al., “Scikit-learn: machine learning in Python,” Journal of Machine Learning Research, vol. 12, no. 85, pp. 2825-2830, Nov. 2011.
V. Taquet, V. Blot, T. Morzadec et al. (2022, Jul.). MAPIE: an open-source library for distribution-free uncertainty quantification. [Online]. Available: https://arxiv.org/abs/2207.12274
A. Gasparin, S. Lukovic, and C. Alippi. (2019, Jul.). Deep learning for time series forecasting: the electric load case. [Online]. Available: https://arxiv.org/abs/1907.09207
K. Chen, K. Chen, Q. Wang et al., “Short-term load forecasting with deep residual networks,” IEEE Transactions on Smart Grid, vol. 10, no. 4, pp. 3943-3952, Jul. 2019.
S. Bai, J. Z. Kolter, and V. Koltun. (2018, Mar.). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. [Online]. Available: https://arxiv.org/abs/1803.01271
A. Borovykh, S. Bohte, and C. W. Oosterlee. (2017, Mar.). Conditional time series forecasting with convolutional neural networks. [Online]. Available: https://arxiv.org/abs/1703.04691
A. van den Oord, S. Dieleman, H. Zen et al. (2016, Sept.). WaveNet: a generative model for raw audio. [Online]. Available: https://arxiv.org/abs/1609.03499