A Robust Segmented Mixed Effect Regression Model for Baseline Electricity Consumption Forecasting

Xiaoyang Zhou; Yuanqi Gao; Weixin Yao; Nanpeng Yu

网刊加载中。。。

使用Chrome浏览器效果最佳，继续浏览，你可能不会看到最佳的展示效果，

确定继续浏览么?

复制成功，请在其他浏览器进行阅读

A Robust Segmented Mixed Effect Regression Model for Baseline Electricity Consumption Forecasting PDF

- ORCID：
Xiaoyang Zhou
✉
- ORCID：
Yuanqi Gao
✉
- ORCID：
Weixin Yao
✉
- ORCID：
Nanpeng Yu
✉

Department of Statistics, University of California, Riverside, CA, USA； Department of Electrical and Computer Engineering, University of California, Riverside, USA

Updated：2022-01-21

DOI：10.35833/MPCE.2020.000023

Abstract

Renewable energy production has been surging around the world in recent years. To mitigate the increasing uncertainty and intermittency of the renewable generation, proactive demand response algorithms and programs are proposed and developed to further improve the utilization of load flexibility and increase the efficiency of power system operation. One of the biggest challenges to efficient control and operation of demand response resources is how to forecast the baseline electricity consumption and estimate the load impact from demand response resources accurately. In this paper, we propose a mixed effect segmented regression model and a new robust estimate for forecasting the baseline electricity consumption in Southern California, USA, by combining the ideas of random effect regression model, segmented regression model, and the least trimmed squares estimate. Since the log-likelihood of the considered model is not differentiable at breakpoints, we propose a new backfitting algorithm to estimate the unknown parameters. The estimation performance of the new estimation procedure has been demonstrated with both simulation studies and the real data application for the electric load baseline forecasting in Southern California.

Keywords

Segmented regression model; mixed effects; trimmed maximum likelihood; demand response; electric load

I. Introduction

THE renewable energy sector has experienced exponential growth in the past five to ten years. The global annual growth rates of solar photovoltaic and wind energy are 42% and 17% from 2010 through 2015, respectively [

1]. The renewable penetration level in certain parts of the world is much higher than the global average penetration level. For example, the renewable energy penetration level in California reached 30% in 2017. The recently passed California Senate Bill No. 100 will further boost renewable penetration level up to 60% by 2030 and 100% by 2045. To mitigate the increasing uncertainty and intermittency of renewable generation, demand response resources are in critical need. In the past ten years, traditional and passive price-based and incentive-based demand response programs have been implemented throughout the USA. In recent years, proactive demand response algorithms and programs are proposed and developed to improve the utilization of load flexibility and dispatchability further [2]. Accurate load impact forecasting is needed to leverage the load flexibility from the demand response resources effectively. The load impact from a demand response resource is defined as the difference between load baselines and metered load when a demand response event is triggered. In practice, it is very challenging to develop a good estimation of the load baseline which represents the electric load that would have occurred without demand response event [3].

A sound baseline estimation methodology should represent an appropriate tradeoff between simplicity and accuracy. The existing baseline methodology can be categorized into two types: Type-I and Type-II. In Type-I methodology, the baseline is estimated by using a similar day-based algorithm, which depends on historical interval meter data and similarity metrics such as weather and calendar. Simplicity is the most significant advantage of Type-I methodology [

4], [5]. In Type-II methodology, more sophisticated statistical methods are adopted to estimate and forecast the baseline electricity consumption. Type-II methodology typically yields better forecasting accuracy and is undergoing rapid development. It can be further divided into three groups: statistical methods, machine learning/deep learning methods, and hybrid methods. In the first group, [6] proposes a refined multiple linear regression model. Reference [7] proposes a method to coherently convert a set of lower-level node forecasting to aggregate nodes using empirical copula and Monte-Carlo sampling. In the same vein, [8] proposes an aggregation of random forest load forecasting framework. The second group utilizes deep learning algorithms. Reference [9] proposes support vector regressions models to forecast the demand response baseline. In [10], an ensemble ResNet deep neural network model is proposed. The sequence-to-sequence recurrent neural network with attention mechanism is adopted in [11]. In the third group, hybrid methods have been developed, in which more than one forecasting algorithms serve as building blocks for an overall model. Reference [12] proposes a cooperative quantile regression forest and multivariate quantile regression framework. Reference [13] proposes a two-level hybrid ensemble of deep belief network model.

There are several limitations with the existing approaches. First, some of the methods do not exploit the structure of the forecasting problem effectively. For example, the segmented nature of calendar variables on the load profile is not well addressed. Second, deep learning based forecasting algorithms are typically computationally expensive to train. In addition, they yield un-interpretable results and can be sensitive to the selection of hyperparameters. Third, hybrid methods are generally complicated to build, thus can be error-prone to implement and benchmark. Lastly, most of the existing work build and train a separate model for each time series. This significantly limits the scalability of model, especially for large service territories operated by electric utilities.

In this paper, we propose a mixed effect segmented regression (MESR) model, which is a Type-II methodology, to forecast the hourly electric load baseline in Southern California, USA at the 220 kV transformer bank level. One commonly used method for electric power demand forecasting at each hour is the multiple linear regression with hour as a categorical variable and weather data as continuous covariates. An alternative model for hour is to include it as a linear predictor. However, it is expected that the linear effect of hour on electric demand does not hold in the whole range of time. To this end, we propose to model the hour effect by a segmented regression model [

14]-[17], which can be considered as a compromise between modelling the hour as a global linear predictor and a categorical variable. The nonlinear relationship with breakpoints is piece-wise, segmented, broken-line, or multi-phased. The breakpoints are also called change-points, transition-points or switch-points in some applications. Using a segmented regression model for the covariate hour, the effect on the electric consumption changes continuously across time so that we can borrow the information from other hours when estimating the impact of hour. The estimated breakpoints can also tell us how the linear effect of hour changes across different segmented areas. Segmented regression models have been widely used in many areas. In medication, segmented regression is a powerful statistical tool for estimating the intervention effects of interrupted time series studies [18]. The segmented regression is also used to identify the changes in the recent trend of cancer mortality and incidence data analysis [19]. In ecology area, the segmented regression is a widely used statistical tool to model ecological thresholds [20]. For the geometric purpose, the segmented regression statistically models the trends at groundwater levels [21]. Many other examples with piecewise linear terms have been studied in the literature including mortality studies [22], Stanford heart transplant data [23], and mouse leukemia [24]. Note that electric consumption data are essentially longitudinal/panel data. They exhibits very strong spatio-temporal dependencies [25]. To incorporate the correlation among the observations and the individual-specific heterogeneity from each transformer bank, we propose to use the random effect regression model [26], [27].

Note that it is not trivial to compute the maximum likelihood estimate (MLE) for the MESR, since its log-likelihood is not differentiable at breakpoints. Many standard computation algorithms such as the Newton-Raphson algorithm can not be used directly. In this paper, we propose a backfitting algorithm to combine the segmented regression estimation method proposed in [

28] and the mixed effect regression estimation method proposed in [29] to maximize the non-differentiable log-likelihood of the mixed effect segmented regression model. Note that the MLE is sensitive to outliers, which is the case of our electric consumption data collected in the Southern California. We further propose a robust estimation procedure for the considered model by extending the idea of the least trimmed squares (LTS) estimate [30]. Simulation studies demonstrate the effectiveness of the proposed estimation procedures. The LTS also provides much better forecasting performance than the standard MLE for the testing data when forecasting the hourly electric power consumption in Southern California.

The rest of the paper is organized as follows. Section II introduces the MESR and describes the proposed robust estimation algorithms. Section III illustrates the finite sample performance of the proposed method using a simulation study. In Section IV, we apply the new estimation procedure to forecast the hourly electric power demand in Southern California, USA. Section V concludes the paper with some discussions.

II. MESR and Proposed Robust Estimation Algorithm

Given a random sample ${y_{i j}, x_{i j}, s_{i j}, z_{i j}}, i = 1,2, \dots, n, j = 1,2, \dots, n_{i}$ , where $n$ is the number of subjects; $n_{i}$ is the number of observations collected for the i^th subject; $y_{i j}$ is the response variable; $x_{i j}$ is the $p$ -dimensional fixed-effect covariate; $s_{i j}$ is the $q$ -dimensional random-effect covariate; and $z_{i j}$ is the breakpoint variable with the breakpoints ${φ_{k}}, k = 1,2, \dots, l$ . The proposed MESR model for the load baseline estimation can be written as:

y_{i j} = x_{i j}^{T} ϕ + s_{i j}^{T} γ_{i} + β_{0} z_{i j} + \sum_{k = 1}^{l} β_{k} (z_{i j} - φ_{k})_{+} + ε_{i j}

(1)

where $ϕ$ is the regression coefficient for the random effect covariates; $β_{0}$ and $β_{k}$ are the regression coefficients for the breakpoint variables; and the quantities with a subscript “ $+$ ” means taking the positive part. For example, $t_{+}$ equals $t$ if $t \geq 0$ and 0 otherwise; $γ_{i} \sim N_{q} (0, Σ_{γ})$ ; $ε_{i} = (ε_{i 1}, ε_{i 2}, \dots, ε_{i n_{i}}) ~$ $N_{n_{i}} (0, Σ_{ε})$ . In this paper, we assume that $Σ_{ε} = σ^{2} I_{n_{i}}$ , where $σ$ is the standard deviation of the error variable; and $I_{n_{i}}$ is the $n_{i} \times n_{i}$ identity matrix. The MESR (1) consists of three parts: multiple linear regression $x_{i j}^{T} ϕ$ , random effects $s_{i j}^{T} γ_{i}$ , and segmented regression $β_{0} z_{i j} + \sum_{k = 1}^{l} β_{k} (z_{i j} - φ_{k})_{+}$ , which models the heterogeneous linear effect of $z_{i j}$ on $y_{i j}$ across different areas of the breakpoint variable. $β_{k}$ measures the difference of slopes (linear effect of $z_{i j}$ on $y_{i j}$ ) before and after the breakpoint $φ_{k}$ . We mainly focus on the situation where the segmented parts are fixed effects. But the proposed estimation procedure can be extended to the situation where the segmented parts also contain random effects [

31]-[33].

Suppose that $y_{i} = (y_{i 1}, y_{i 2}, \dots, y_{i n_{i}})^{T}, X_{i} = (x_{i 1}, x_{i 2}, \dots, x_{i n_{i}})^{T}, S_{i} = (s_{i 1}, s_{i 2}, \dots, s_{i n_{i}})^{T}$ , and $Z_{i} = (z_{i 1}^{*}, z_{i 2}^{*}, \dots, z_{i n_{i}}^{*})^{T}$ , where $z_{i j}^{*} = (z_{i j}, (z_{i j} - φ_{1})_{+}, \dots, (z_{i j} - φ_{l})_{+})^{T}$ . Then, (1) can be rewritten in matrix format as:

y_{i} = X_{i} ϕ + S_{i} γ_{i} + Z_{i} β + ε_{i}

(2)

where $β = (β_{0}, β_{1}, \dots, β_{l})^{T}$ . Based on (2), $E (y_{i} | X_{i}, Z_{i}, S_{i}) = X_{i} ϕ + Z_{i} β$ and $v a r (y_{i} | X_{i}, Z_{i}, S_{i}) = S_{i} Σ_{γ} S_{i}^{T} + σ_{ε}^{2} I_{n_{i}} ≜ Σ_{i}$ , where $E (\cdot)$ and $v a r (\cdot)$ denote conditional expectation and conditional variances, respectively. Therefore, the random effects $γ_{i}$ make the observations within each correlated subject. The log-likelihood function of ${y_{i j}, x_{i j}, s_{i j}, z_{i j}}, i = 1,2, \dots, n, j = 1,2, \dots, n_{i}$ is:

l (θ) \propto \sum_{i = 1}^{n} l n (| Σ_{i} |^{- 1 / 2}) - \frac{1}{2} \sum_{i = 1}^{n} (y_{i} - X_{i} ϕ - Z_{i} {β)}^{T} Σ_{i}^{- 1} (y_{i} - X_{i} ϕ - Z_{i} β)

(3)

where $θ$ collects all the unknown parameters ${ϕ, β, φ, σ, Σ_{γ}}$ in model (1). Unlike the traditional mixed effect model, maximizing (3) is not trivial since it is not differentiable at $φ_{k}$ . We propose a backfitting algorithm to maximize (3) by alternately updating the segmented regression part and the linear mixed effect part. Next, we discuss in detail how to perform such two estimation procedures.

A. Estimation of Breakpoints

Given the estimate ${ϕ, {\hat{Σ}}_{i}}$ , (1) will be a segmented regression model. The breakpoints and slopes in segmented regression can be estimated through many ways such as regression spline as well as Bayesian Markov chain Monte Carlo (MCMC) methods [

34], [35]. We will extend the linearization technique proposed in [28] to MESR (1) due to its simplicity of computation. According to the definition of breakpoints, the log-likelihood is not differentiable at

φ_{k}

. The breakpoint estimation can be performed via a first-order Taylor expansion of

(z_{i j} - φ_{k})_{+}

around an initial value

φ_{k}^{(0)}

(z_{i j} - φ_{k})_{+} \approx (z_{i j} - φ_{k}^{(0)})_{+} + (φ_{k} - φ_{k}^{(0)}) (- 1) I (z_{i j} > φ_{k}^{(0)})

(4)

where $I (\cdot)$ is the indicator function. It equals 1 if the condition inside the parenthesis is true and 0 otherwise; $(- 1) I (z_{i j} > φ_{k}^{(0)})$ is the first derivative of $(z_{i j} - φ_{k})_{+}$ assessed in $φ_{k}^{(0)}$ .

Let $v_{i j} = ((- 1) I (z_{i j} > φ_{1}^{(0)}), (- 1) I (z_{i j} > φ_{2}^{(0)}), \dots, (- 1) I (z_{i j} > φ_{l}^{(0)} {))}^{T}$ , ${\tilde{z}}_{i j} = (z_{i j}, (z_{i j} - φ_{1}^{(0)})_{+}, (z_{i j} - φ_{2}^{(0)})_{+}, \dots, (z_{i j} - φ_{l}^{(0)})_{+})^{T}$ , and $δ_{k} = β_{k} (φ_{k} - φ_{k}^{(0)})$ . Define $V_{i} = (v_{i 1}, v_{i 2}, \dots, v_{i n_{i}})^{T}$ , $δ = (δ_{1}, δ_{2}, \dots, δ_{l})^{T}$ , and ${\tilde{Z}}_{i} = ({\tilde{z}}_{i 1}, {\tilde{z}}_{i 2}, \dots, {\tilde{z}}_{i n_{i}})^{T}$ . Given the estimate ${ϕ, {\hat{Σ}}_{i}}$ , the log-likelihood (3) can be simplified as:

l_{1} (β, δ) \propto - \frac{1}{2} \sum_{i = 1}^{n} ({\tilde{y}}_{i} - {\tilde{Z}}_{i} β - V_{i} {δ)}^{T} {\hat{Σ}}_{i}^{- 1} ({\tilde{y}}_{i} - {\tilde{Z}}_{i} β - V_{i} δ)

(5)

where ${\tilde{y}}_{i} = y_{i} - X_{i} ϕ$ . Therefore, $β$ and $δ$ in (5) can be easily found by weighted least squares estimate. Note that $φ_{k} = (δ_{k} / β_{k}) + φ_{k}^{(0)}$ . The iterative algorithm will terminate at $δ_{k} = 0$ . Given the estimate ${ϕ, {\hat{Σ}}_{i}}$ , the algorithm to estimate the breakpoints is summarized in Algorithm 1.

Algorithm 1 : segmented regression estimation

1. Set initial values of all breakpoints $φ_{k}^{(0)}$ for $k = 1,2, . . ., l$ , and calculate the variable ${\tilde{Z}}_{i}$ and the variable $V_{i}$

2. Repeat

3. Fit the regression model of ${\tilde{y}}_{i}$ on ${\tilde{Z}}_{i}$ and $V_{i}$ by maximizing the log-like lihood (5). Update the breakpoint with equation $φ_{k}^{(s + 1)} = (δ_{k}^{(s)} / β_{k}^{(s)}) + φ_{k}^{(s)}$ , where $φ_{k}^{(s)}$ is the estimate of $φ_{k}$ at the $s$ ^th iteration

4. Until converge

B. Estimation of Mixed Effect Regression Models

In this sub-section, we discuss how to maximize (3) given the estimate $β$ and $φ$ , where $φ = (φ_{1}, φ_{2}, \dots, φ_{l})^{T}$ . Let ${\hat{Z}}_{i}$ be the estimate of $Z_{i}$ after replacing $φ_{k}$ by ${\hat{φ}}_{k}$ . Plugging in the estimate ${{\hat{Z}}_{i}, β}$ into the model (1), we can obtain:

y_{i}^{*} = X_{i} ϕ + S_{i} γ_{i} + ε_{i}

(6)

where $y_{i}^{*} = y_{i} - {\hat{Z}}_{i} β$ . Therefore, the model (6) is simply a traditional mixed effect regression model. We propose to employ the penalized weighted least square (PWLS) method to estimate the unknown parameters in (6). More details of computing the linear mixed effect regression model are given in [

29], which are also implemented in R package lme4.

C. Estimation of Mixed Effect Breakpoints

By combining the estimation procedures in Section II-A and II-B, we propose Algorithm 2 to maximize the log-likelihood (3) for the model (2).

Algorithm 2 : MLE

1. Set initial values of breakpoint $φ_{k}^{(0)}$ and $β^{(0)}$

2. Repeat

3. Given current breakpoint values $φ_{k}^{(s)}$ and slopes $β^{(s)}$ , calculate $y_{i}^{* (s)} = y_{i} - {\hat{Z}}_{i}^{(s)} β^{(s)}$

4. Fit mixed effect regression model by the PWLS estimation procedure introduced in Section II-B to obtain covariance matrix $Σ_{r}^{(s)}$ and the fixed effect regression estimate $ϕ^{(s)}$

5. Calculate ${\tilde{y}}_{i}^{(s)} = y_{i} - X_{i} ϕ^{(s)}$

6. Fit segmented regression model with ${\tilde{y}}_{i}^{(s)}$ and $Σ_{r}^{(s)}$ using Algorithm 1 and update segmented regression parameter estimate to $φ^{(s + 1)}$ and $β^{(s + 1)}$

7. Until converge

D. Estimation of Robust Segmented Mixed Effect Regression

It is well known that the MLE is sensitive to outliers and might give misleading results when there are outliers in the data, which is the case for our collected electric power demand data in Southern California. More details will be given in Section IV. The issue of outlier is well recognized in the field of load forecasting, and is typically solved using robust regression algorithms. For example, [

36] considers Huber’s robust regression; [37] advocates the use of

L_{1}

regression model. In the statistics literature, many robust regression methods have been proposed, although not all of them are investigated in the load forecasting literature, e.g., M-estimates [38], R-estimates [39], least median of squares (LMS) estimates [40], LTS estimates [30], S-estimates [41], MM-estimates [42], robust and efficient weighted least squares estimator (REWLSE) [43], mean shift method [44]-[46]. Reference [47] provides a good review of some commonly used robust regression estimation methods. Next, we propose the idea of least trimmed squares estimate [30] to provide a robust estimate of the model (1). Given an integer trimming parameter

h \leq N

, where

N

is the total number of training samples, the least trimmed squares minimize the sum of the smallest

h

squared residuals with objective function:

\sum_{k = 1}^{h} r_{(k)}^{2}

(7)

where $r_{(k)}^{2} \in [r_{(1)}^{2}, r_{(N)}^{2}]$ are the ordered squared residuals ${y_{i j} - {\hat{y}}_{i j}, i = 1,2, \dots, n, j = 1,2, \dots, n_{i}}$ with ${\hat{y}}_{i j} = x_{i j}^{T} ϕ + s_{i j}^{T} γ_{i} + {\hat{β}}_{0} z_{i j} + \sum_{k = 1}^{l} {\hat{β}}_{k} (z_{i j} - {\hat{φ}}_{k})_{+}$ . The robust MESR estimation based on LTS is described in Algorithm 3.

Algorithm 3 : LTS

1. A subsample of size $h^{*}$ is selected randomly from the data and then the model (1) is fitted to the subsample using Algorithm 2 of Section II- C. Let $θ^{(0)}$ be the initial parameter estimate

2. Repeat

3. Based on current model parameter estimate $θ^{(s)}$ , forecast $N$ responses ${\hat{y}}_{i j}^{(s)}$ , and calculate the residuals $r_{i j}^{(s)} = y_{i j} - {\hat{y}}_{i j}^{(s)}$ . Rank the squared residu als from smallest to largest and select the first $h$ observations that cor respond to the smallest $h$ squared residuals

4. Fit the model (1) to the subsample selected in Step 3 using Algorithm 2 and get the model parameter estimate $θ^{(s + 1)}$

5. Until converge

To increase the chance of finding the global minimum, one might run Algorithm 3 from many random subsamples and choose the solution which has the smallest trimmed squares. Let $r$ be the dimension of $θ$ . The initial sample size $h^{*}$ can be any small number larger than $r$ as long as the initial parameter estimate $θ^{(0)}$ can be computed based on the subsample. The maximum breakpoint of LTS [

48], i.e., the smallest fraction of contamination that can cause the estimator to take arbitrary large values, is

0.5

, which is attained when

h = [(N + r + 1) / 2]

, where

[\cdot]

means rounding to the nearest integer. If we have the prior that the proportion of outliers is no more than

α

, we can also set

h = [N (1 - α) + 1]

, where

α

is he trimming proportion. In practice, one might try several

α

values to evaluate LTS and check how the estimate behaves with different trimming proportions [49]-[51]. In our real data application, we use a validation data to choose the trimming proportion.

III. Simulation Study

In this section, we use a simulation study to illustrate the performance of the proposed estimation procedure for the MESR. All the computations are implemented in R. We use R package segmented::segmented [

52] for breakpoint estimation and lme4::lmer [53] for random-effect estimation. We generate observations

{y_{i j}, x_{i j}, s_{i j}, z_{i j}, i = 1,2, \dots, n, j = 1,2, \dots, n_{i}}

from the following model:

y_{i j} = ϕ_{0} + ϕ_{1} x_{i j} + γ_{i 0} + s_{i j} γ_{i 1} + β_{0} z_{i j} +

β_{1} (z_{i j} - φ_{1})_{+} + β_{2} (z_{i j} - φ_{2})_{+} + ε_{i j}

(8)

where $x_{i j} \sim P o i s (10)$ , the Poisson distribution with rate parameter 10; $s_{i j} \sim U n i f o r m (5,10)$ , the uniform distribution with lower and upper limits 5 and 10, respectively. The breakpoint variables $z_{i j}$ are $n_{i}$ arithmetic sequence ranging in $(0,20)$ , $ε_{i j} \sim N (0,0.5)$ , $[\begin{matrix} γ_{i 0} \\ γ_{i 1} \end{matrix}] \sim N ([\begin{matrix} 0 \\ 0 \end{matrix}], [\begin{matrix} σ_{r 1}^{2} & ρ σ_{r 1} σ_{r 2} \\ ρ σ_{r 1} σ_{r 2} & σ_{r 2}^{2} \end{matrix}])$ , with $σ_{r 1} = σ_{r 2} = 1, ρ = 0.5, i = 1,2, \dots, n, j = 1,2, \dots, n_{i}$ . The other parameters in (8) are set to be: $ϕ_{0} = - 2.5$ ; $ϕ_{1} = 1.5$ ; $β_{0} = 1.5$ ; $β_{1} = 1.5$ ; $β_{2} = - 2.5$ ; $φ_{1} = 6.67$ ; and $φ_{2} = 13.33$ .

We consider the following four simulation scenarios:

1) $n = 50,$ and $n_{i}$ is randomly chosen in $(90,110)$ .

2) $n = 50,$ and $n_{i}$ is randomly chosen in $(18,22)$ .

3) $n = 200,$ and $n_{i}$ is randomly chosen in $(450,550)$ .

4) $n = 200,$ and $n_{i}$ is randomly chosen in $(18,22)$ .

First, we utilize (8) to simulate the dataset without outliers. The model is estimated using MLE. In Tables I-IV, we report the mean, median, and standard deviation (SD) for the estimates of fixed effect regression parameters, breakpoints, segmented regression parameters, and random effect covariance matrix, respectively, based on 500 replications.

Table I Simulation Results of Fixed Effect Parameter Estimates by MLE for Situation Without Outliers

Parameter	$ϕ_{0} = - 2.5$			$ϕ_{1} = 1.5$
Parameter	Mean	Median	SD	Mean	Median	SD
$n = 50,$ $n_{i} \sim U (90,110)$	-2.505	-2.508	0.125	1.500	1.499	0.002
$n = 50,$ $n_{i} \sim U (18,22)$	-2.500	-2.502	0.169	1.500	1.500	0.005
$n = 200$ , $n_{i} \sim U (450,550)$	-2.498	-2.497	0.064	1.500	1.500	0.001
$n = 200,$ $n_{i} \sim U (18,22)$	-2.491	-2.498	0.125	1.500	1.500	0.004

Table II Simulation Results of Breakpoints Estimates by MLE for Situation Without Outliers

Parameter	$φ_{1} = 6.667$			$φ_{2} = 13.333$
Parameter	Mean	Median	SD	Mean	Median	SD
$n = 50,$ $n_{i} \sim U (90,110)$	6.667	6.666	0.022	13.334	13.332	0.012
$n = 50,$ $n_{i} \sim U (18,22)$	6.667	6.664	0.053	13.334	13.333	0.034
$n = 200,$ $n_{i} \sim U (450,550)$	6.667	6.667	0.006	13.333	13.333	0.003
$n = 200,$ $n_{i} \sim U (18,22)$	6.667	6.666	0.023	13.332	13.332	0.023

Table III Simulation Results of Breakpoint Slope Estimates by MLE for Situation Without Outliers

Parameter	$β_{0} = 1.5$			$β_{1} = 1.5$			$β_{2} = - 2.5$
Parameter	Mean	Median	SD	Mean	Median	SD	Mean	Median	SD
$n = 50$ , $n_{i} \sim U (90,110)$	1.499	1.500	0.006	1.499	1.499	0.008	-2.499	-2.499	0.008
$n = 50$ , $n_{i} \sim U (18,22)$	1.499	1.500	0.013	1.502	1.502	0.020	-2.502	-2.502	0.019
$n = 200$ , $n_{i} \sim U (450,550)$	1.500	1.500	0.001	1.500	1.500	0.001	-2.500	-2.500	0.002
$n = 200$ , $n_{i} \sim U (18,22)$	1.499	1.499	0.010	1.502	1.501	0.014	-2.501	-2.502	0.014

Table IV Simulation Results of Random Effect Estimates BY MLE for Situation Without Outliers

Parameter	$σ_{r 1} = 1$			$σ_{r 2} = 1$			$ρ = 0.5$
Parameter	Mean	Median	SD	Mean	Median	SD	Mean	Median	SD
$n = 50$ , $n_{i} \sim U (90,110)$	0.969	0.959	0.108	0.978	0.976	0.101	0.504	0.503	0.121
$n = 50$ , $n_{i} \sim U (18,22)$	0.988	0.988	0.141	0.993	0.990	0.096	0.490	0.501	0.150
$n = 200$ , $n_{i} \sim U (450,550)$	0.990	0.991	0.050	0999	0.999	0.056	0.499	0.499	0.001
$n = 200$ , $n_{i} \sim U (18,22)$	0.987	0.990	0.101	0.997	1.000	0.068	0.500	0.506	0.094

From Tables I-IV, we can see that the proposed MLE algorithm performs well when the dataset does not contain any outliers. Also, when the sample size increases, the SD of each parameter estimate decreases.

Next, we simulate the dataset with outliers based on model (8). The model parameters are estimated by both Algorithm 2 and Algorithm 3. In order to check how robust each estimate is against the outliers, we randomly choose 5% of each simulated data and add 30 to the response $Y$ (the range of $Y$ is $(15,69)$ ) and 10 to the value of $X$ (the range of $X$ is $(0,10)$ ). When applying LTS, we need to choose the trimming proportion $α$ , which has long been a difficult problem. However, LTS can provide a robust model estimate as long as the proportion of outliers is less than $α$ but with low efficiency if $α$ is too large. Usually, a conservative choice of $α$ is recommended in practice. In this paper, we report the results for both $α = 0.1$ and $α = 0.2$ . Note that the results of LTS will be better if $α = 0.05$ is used.

Tables V-VIII present the simulation results for the estimates of fixed effect regression parameters, breakpoints, segmented regression parameters, and random effect covariance matrix, respectively, based on 200 replications. From the tables, it is observed that the standard MLE fails to provide the reasonable estimates of fixed effect regression parameters and random effect covariance matrix when the data contain $5 %$ outliers while LTS can provide reasonable estimates for all parameters with both $α = 0.1$ and $α = 0.2$ .

Table V Simulation Results of Fixed Effect Estimates by MLE and LTS with Different

α

Levels for Situation with Outliers

Parameter	Method	$ϕ_{0} = - 2.5$			$ϕ_{1} = 1.5$
Parameter	Method	Mean	Median	SD	Mean	Median	SD
$n = 50$ , $n_{i} \sim U (90,110)$	MLE	3.332	3.334	0.663	1.019	1.017	0.026
	LTS $α = 0.2$	-2.535	-2.539	0.283	1.500	1.500	0.004
	LTS $α = 0.1$	-2.521	-2.522	0.210	1.500	1.500	0.003
$n = 50$ , $n_{i} \sim U (18,22)$	MLE	3.293	3.298	1.069	1.490	1.487	0.045
	LTS $α = 0.2$	-2.471	-2.438	0.616	1.500	1.500	0.015
	LTS $α = 0.1$	-2.496	-2.485	0.539	1.499	1.500	0.014
$n = 200$ , $n_{i} \sim U (450,550)$	MLE	3.310	3.314	0.180	1.017	1.017	0.006
	LTS $α = 0.2$	-2.502	-2.507	0.130	1.500	1.500	0.001
	LTS $α = 0.1$	-2.502	-2.505	0.089	1.500	1.500	0.001
$n = 200$ , $n_{i} \sim U (18,22)$	MLE	3.359	3.417	1.049	1.493	1.496	0.046
	LTS $α = 0.2$	-2.464	-2.437	0.290	1.500	1.500	0.007
	LTS $α = 0.1$	-2.488	-2.491	0.267	1.500	1.499	0.007

Table VI Simulation Results of Breakpoint Estimates by MLE and LTS with Different

α

Levels for Situation with Outliers

Parameter	Method	$φ_{1} = 6.667$			$φ_{2} = 13.333$
Parameter	Method	Mean	Median	SD	Mean	Median	SD
$n = 50$ , $n_{i} \sim U (90,110)$	MLE	6.647	6.679	0.546	13.322	13.317	0.324
	LTS $α = 0.2$	6.670	6.380	0.755	13.351	13.342	0.519
	LTS $α = 0.1$	6.670	6.677	0.415	13.323	13.332	0.283
$n = 50$ , $n_{i} \sim U (18,22)$	MLE	6.319	6.317	0.204	12.656	12.651	0.139
	LTS $α = 0.2$	6.425	6.445	0.185	13.027	13.037	0.237
	LTS $α = 0.1$	6.603	6.616	0.170	13.116	13.117	0.145
$n = 200$ , $n_{i} \sim U (450,550)$	MLE	6.671	6.673	0.172	13.333	13.337	0.098
	LTS $α = 0.2$	6.670	6.685	0.293	13.325	13.330	0.166
	LTS $α = 0.1$	6.670	6.677	0.166	13.328	13.330	0.094
$n = 200$ , $n_{i} \sim U (18,22)$	MLE	6.307	6.316	0.209	12.643	12.649	0.144
	LTS $α = 0.2$	6.427	6.433	0.082	13.030	13.039	0.118
	LTS $α = 0.1$	6.606	6.610	0.076	13.190	13.116	0.067

Table VII Simulation Results of Breakpoint Slope Estimates by MLE and LTS with Different

α

Levels for Situation with Outliers

Parameter	Method	$β_{0} = 1.5$			$β_{1} = 1.5$			$β_{2} = - 2.5$
Parameter	Method	Mean	Median	SD	Mean	Median	SD	Mean	Median	SD
$n = 50$ , $n_{i} \sim U (90,110)$	MLE	1.493	1.494	0.109	1.521	1.518	0.154	-2.516	-2.522	0.163
	LTS $α = 0.2$	1.501	1.502	0.123	1.510	1.500	0.166	-2.519	-2.505	0.168
	LTS $α = 0.1$	1.506	1.503	0.070	1.505	1.501	0.083	-2.509	-2.507	0.071
$n = 50$ , $n_{i} \sim U (18,22)$	MLE	1.590	1.590	0.039	1.544	1.536	0.074	-2.610	-2.602	0.135
	LTS $α = 0.2$	1.556	1.556	0.027	1.584	1.564	0.102	-2.602	-2.580	0.109
	LTS $α = 0.1$	1.575	1.576	0.029	1.557	1.557	0.069	-2.590	-2.588	0.070
$n = 200$ , $n_{i} \sim U (450,550)$	MLE	1.500	1.502	0.035	1.500	1.499	0.040	-2.500	-2.502	0.042
	LTS $α = 0.2$	1.500	1.502	0.054	1.502	1.500	0.051	-2.499	-2.502	0.054
	LTS $α = 0.1$	1.495	1.497	0.022	1.506	1.505	0.026	-2.506	-2.499	0.028
$n = 200$ , $n_{i} \sim U (18,22)$	MLE	1.587	1.586	0.037	1.549	1.544	0.075	-2.605	-2.611	0.138
	LTS $α = 0.2$	1.574	1.574	0.014	1.557	1.556	0.048	-2.499	-2.502	0.054
	LTS $α = 0.1$	1.553	1.553	0.014	1.571	1.566	0.035	-2.506	-2.499	0.028

Table VIII Simulation Results of Random Effect Estimates by MLE and LTS with Different

α

Levels for Situation with Outliers

Parameter	Method	$σ_{r 1} = 1$			$σ_{r 2} = 1$			$ρ = 0.5$
Parameter	Method	Mean	Median	SD	Mean	Median	SD	Mean	Median	SD
$n = 50$ , $n_{i} \sim U (90,110)$	MLE	0.518	0.379	0.582	1.019	1.026	0.117	0.843	0.999	0.612
	LTS $α = 0.2$	1.000	0.993	0.097	0.993	1.003	0.097	0.509	0.510	0.115
	LTS $α = 0.1$	0.992	0.998	0.111	0.994	0.999	0.097	0.511	0.519	0.107
$n = 50$ , $n_{i} \sim U (18,22)$	MLE	0.529	0.529	0.111	0.533	0.523	0.318	0.815	0.816	0.158
	LTS $α = 0.2$	0.823	0.825	0.121	0.779	0.779	0.066	0.498	0.485	0.066
	LTS $α = 0.1$	0.746	0.744	0.114	1.080	1.073	0.079	0.494	0.493	0.079
$n = 200$ , $n_{i} \sim U (450,550)$	MLE	0.769	0.897	0.435	1.013	1.015	0.056	0.619	0.592	0.272
	LTS $α = 0.2$	0.992	0.990	0.048	0.998	0.998	0.047	0.495	0.496	0.050
	LTS $α = 0.1$	0.992	0.989	0.048	0.998	0.997	0.047	0.496	0.496	0.049
$n = 200$ , $n_{i} \sim U (18,22)$	MLE	0.535	0.531	0.118	0.553	0.550	0.315	0.787	0.787	0.148
	LTS $α = 0.2$	0.884	0.848	0.060	0.781	0.782	0.033	0.476	0.473	0.085
	LTS $α = 0.1$	0.763	0.763	0.056	1.082	1.082	0.038	0.493	0.493	0.071

IV. Real Data Analysis

In this Section, we illustrate the application of the proposed estimation procedure of MESR to forecast the electric load in Southern California, USA.

A. Data

The electric consumption data are aggregated to fifty-two 220 kV transformer banks from December 31, 2012 to November 1, 2013 in Southern California Edison’s service territory. The task is to build a forecasting model for the total electricity consumption of residential customers at each 220 kV transformer bank on weekdays.

The data cleansing of the raw dataset is done in two steps. First, we exclude daily observations for commercial customers and remove zero-usage records from the electric consumption data file. Second, we add daily temperature and humidity information for each bank according to its zipcodes.

The response variable $U_{t}$ is the aggregated customers’ hourly electricity consumption recorded by the smart meters. We use the following transformation to make it comparative among 52 subgroups:

L_{t} = l n \frac{U_{t}}{A_{t o t a l}}

(9)

In (9), the transformed response variable is derived as follows. First, we divide the aggregated usage by the total air conditioning tonnage of a residential customer. The customer is in the air conditioning cycling program. Second, we apply the log-transformation. The electricity consumption is divided by the total AC tonnage because the latter determines the numerical magnitude of the load measurements. Since the new response variable represents the electricity consumption level per unit of air conditioning tonnage, the effects of other explanatory variables are comparable among different transformer banks, which allows to use common slopes to simplify the model. Figure 1 shows the transformed response variable for a few sample banks over 5 days.

Fig. 1 Data visualization of transformed response variable $L_{t}$ (shown in the figure as $l n U_{s a g e_{p e r, t}}$ ) versus transformer bank indicator variable $B$ (shown in the figure as A $_$ Bank) and time.

The explanatory variables are collected and listed in Table IX. A two-day lagged electricity consumption variable is selected rather than the one-day lagged variable, because the estimates of the load impact of the demand response resources need to be submitted to the independent system operator one day before the actual operations. The average temperature and humidity are included because they are highly correlated with the electricity consumption. The duty cycle option variable indicates the percentage of participation rate of air conditioning load in the program, and has a substantial influence over the load impact for air conditioning cycling demand response program. The transformer bank indicator variable $B$ is chosen as the random effect, because it contains information about the data from different geographic areas, which is thus expected to be heterogeneous with different baselines. A random effect model, assuming that $B$ are sampled from a larger population, is able to incorporate the individual-specific heterogeneity of $B$ while allowing to borrow information across $B$ with much smaller number of parameters (compared with one fixed effect parameter for each of 52 values of $B$ ). In addition, this allows to extend the model to additional transformer banks.

Table Ix Seven Explanatory Variables in Real Data Application

Notation	Explanatory variable
$L_{t}^{l a g}$	Base $e$ log of two-day lagged electricity consumption
$T_{t}$	Daily average ambient temperature
$H_{t}$	Humidity of the day
$H r_{t}$	Hour/time of the day
$A_{t}$	Duty cycle percentage
$A_{t o t a l}$	Total AC tonnage under the same transformer bank
$B$	Indicator variable of transformer bank

Note: B is the random effect variable; and $H r_{t}$ is the segmented variable.

In this paper, the training dataset is chosen as the samples in the first 205 observed weekdays for all transformer banks. The testing dataset consists of the samples from the 10 observed weekdays immediately following the training dataset. The total number of testing sample is 12480.

B. Model and Result

We apply the proposed estimation procedure of MESR to forecast the electricity consumption. Figure 2 shows the hourly trend for average electric consumption and its forecasting results for a typical day. Two breakpoints locate between 02:00- $3 : 00$ a.m. and 06:00-08:00 p.m., respectively.

Fig. 2 Hourly trend between average hourly electric consumption $L_{t}$ (shown in the figure as $l n U_{s a g e_{p e r, t}}$ ) with variable $H r_{t}$ (shown in the figure as t) averaged over all transformer bank $B$ .

It seems that the curve corresponding to the actual consumption (after the log transformation) indicates three segments with two breakpoints. The first breakpoint locates between 02:00 a.m. and 03:00 a.m., and the second breakpoint lies between 06:00 p.m. and 08:00 p.m.. We have also tried the model with three breakpoints (one more breakpoint in the middle segmented area), but the BIC for two breakpoints is smaller. The forecasting curve in Fig. 2 corresponds to the forecasted values across all transformer banks for the same forecasting day. As shown in Fig. 2, we can see that the proposed model can forecast the actual values well and the fitted values also match the breakpoint relationship.

The observations collected over time within the same transformer bank are correlated. The auto-correlation function of the observation time series of each transformer bank is shown in Fig. 3, where the time series demonstrates strong auto-correlation patterns.

Fig. 3 Auto-correlation function of observation time series of each transformer bank.

Ignoring such correlation by fixed effect model would result in inefficient estimates and lose forecasted power. To incorporate such correction, the transformer bank is treated as random effects. Using a random effect model can also drastically reduce the number of unknown parameters in the model, and thus lead to more efficient parameter estimates.

Next, we describe the construction of the fixed effects. The first six explanatory variables described in Table IX, along with their two-way and three-way interactions, are considered as potential explanatory variables. The model variables are selected via least absolute shrinkage and selection operator (LASSO) regression method [

54], which improves the forecasting accuracy and interpretability of the model. The final selected MESR is expressed as:

L_{t} = B + H r_{t} + (H r_{t} - φ_{1})_{+} + (H r_{t} - φ_{2})_{+} + T_{t} + H_{t} + A_{t} + L_{t}^{l a g} + [H r_{t} + (H r_{t} - φ_{1})_{+} + (H r_{t} - φ_{2})_{+}] T_{t} + [H r_{t} + (H r_{t} - φ_{2})_{+}] H_{t} + [H r_{t} + (H r_{t} - φ_{1})_{+} + (H r_{t} - φ_{2})_{+}] A_{t} + (T_{t} + H_{t}) A_{t} + T_{t} H_{t}

(10)

where $B \sim N (0, σ_{A_B a n k}^{2})$ is the normal distribution with mean 0 and variance $σ_{A_B a n k}^{2}$ . We apply both MLE and LTS algorithms to estimate the model and compare their forecasting performances. Since the true proportion of outliers is unknown, three proportions $α = 0.15,0.10,0.05$ are selected for LTS to fit the model (10). In addition, the proposed algorithms are compared with two benchmarks, i.e., the multiple linear regression model [

55] and the cooperative quantile regression forest (QRF)/multivariate quantile regression (MQR) method [12]. We set the training/testing data partitioning of these benchmarks to be the same as the setup discussed in Section IV-A. We compare their performances by mean absolute percentage error (MAPE) and root mean squared error (RMSE) on the testing dataset. The formula for MAPE and RMSE are given by

M A P E = \frac{1}{N} \sum \frac{|y_{i j} - {\hat{y}}_{i j}|}{y_{i j}}

and

R M S E = \sqrt[]{\frac{1}{N} \sum (y_{i j} - {\hat{y}}_{i j})^{2}}

, where

N

is the total number of testing samples. For better comparison, we also report three quartiles of absolute percentage error (APE) and absolute error (AE).

From Tables X and XI, the proposed robust segmented mixed effect regression model (LTS in the tables) outperforms the multiple linear regression model [

55] and the QRF/MQR model [55]. The improvements are more significant in terms of the RMSE and AE. The reason why the LTS has a slightly higher MAPE compared with the QRF/MQR baseline is that the LTS produces a bit larger estimation errors for some transformer banks with lighter loading. This results in a higher MAPE due to the small denominator. Within the LTS method, each evaluation criterion reaches the lowest value when

α = 0.1

and is much smaller than that of MLE. The breakpoint estimates shown in Table XII confirm the locations of the breakpoints shown in Fig. 2.

Table X Forecasting Results Evaluated by APE for Last 10 Days in October 2013 with MLE Compared with LTS with Different

α

Levels

Method	MAPE (%)	25% APE (%)	50% APE (%)	75% APE (%)
GEFcom2012 [55]	17.97	5.95	12.06	19.80
QRF/MQR [12]	10.15	2.36	5.25	10.70
MLE	13.94	4.55	8.48	13.66
LTS $α = 0.05$	11.08	2.78	5.45	9.37
LTS $α = 0.1$	10.75	2.46	4.95	8.77
LTS $α = 0.15$	10.88	2.55	5.10	9.01

Table XI Forecasting Results Evaluated by RMSE for Last 10 Days in October 2013 with MLE Compared with LTS with Different

α

Levels

Method	RMSE (kWh)	25% AE (kWh)	50% AE (kWh)	75% AE (kWh)
GEFcom2012 [55]	1305.65	26.92	168.96	684.95
QRF/MQR [12]	742.73	10.01	67.77	298.06
MLE	672.88	5.78	42.19	164.08
LTS $α = 0.05$	449.68	3.98	27.01	97.45
LTS $α = 0.1$	414.42	3.65	24.73	86.33
LTS $α = 0.15$	420.00	4.75	25.26	88.84

Table XII Breakpoint Estimation for Electric Power Demand Dataset via MLE and LTS with Different

α

Levels

Method	$φ_{1}$ (hour)	$φ_{2}$ (hour)
MLE	2.64	20.47
LTS $α = 0.05$	2.27	20.47
LTS $α = 0.1$	2.27	20.77
LTS $α = 0.15$	2.27	20.47

Table XIII displays the fixed effect and breakpoints slope estimates for LTS with $α = 0.1$ . The variance estimates of the random effects and the error term are 0.0015 and 0.0052, respectively.

Table XIII Parameter Estimates for Electric Power Demand Dataset Evaluated by LTS with

α = 0.1

and Significance Level of 0.05

Parameter	Estimate	p-value
Intercept	1.063×10⁰	<0.0001
$H r_{t}$	2.100×10^-2	0.0003
$(H r_{t} - φ_{1})_{+}$	-4.713×10^-2	<0.0001
$(H r_{t} - φ_{2})_{+}$	6.665×10^-2	<0.0001
$T_{t}$	-2.198×10^-2	<0.0001
$H_{t}$	2.556×10^-2	0.0010
$A_{t}$	8.838×10^-1	<0.0001
$L_{t}^{l a g}$	-2.223×10⁰	<0.0001
$H r_{t} H_{t}$	-2.493×10^-5	<0.0001
$(H r_{t} - φ_{2})_{+} H_{t}$	2.017×10^-4	<0.0001
$H r_{t} T_{t}$	-8.165×10^-4	<0.0001
$(H r_{t} - φ_{1})_{+} T_{t}$	1.018×10^-3	<0.0001
$(H r_{t} - φ_{2})_{+} T_{t}$	-1.606×10^-3	<0.0001
$H r_{t} A_{t}$	2.012×10^-2	0.0002
$T_{t} H_{t}$	-6.684×10^-5	<0.0001
$T_{t} A_{t}$	2.763×10^-2	<0.0001
$H_{t} A_{t}$	2.763×10^-2	<0.0001

According to Table XIII, all the parameters are significant with $α = 0.05$ . When calculating the p-values, Satterthwaite method is used for approximating degrees of freedom of the t-distribution for the t-test statistics. The variable $H r_{t}$ and its breakpoints have both positive and negative slopes, and the signs match the curves in Fig. 2. Also, there is a sensible positive relationship between $A_{t}$ and electric load $U_{t}$ .

V. Conclusion

In this paper, we propose a robust segmented mixed effect regression model to forecast the electric load baseline in Southern California. When estimating unknown parameters, we propose a backfitting algorithm by combining the ideas of the penalized least square method for random-effects regression model and the linearization technique [

28] for segmented regression. In addition, we extend the idea of LTS to MESR to provide a robust model estimate. Both simulation study and real data application demonstrate the effectiveness of the proposed new estimation procedures.

Since the model is built up with hourly data, we could also aggregate the data and construct a daily electric load model. In this paper, we assume that the number of breakpoints is known. If the number of breakpoints is unknown, one could apply the selection techniques proposed by [

56]-[59] to our model. In (1), all random effects are assumed to have a multivariate normal distribution. It will be interesting to extend the work of [60] to relax the normality assumption of the random effects in (1). In addition, for LTS, although an conservation

α

or serval

α

values can be used in practice, it requires some careful tuning of

α

so that LTS can have both robustness and high efficiency. Since the response variable by AC tonnage is normalized, it is expected that there is not too much heterogeneity for the effects of other variables after controlling the heterogeneity of different banks. But it is worthy of more research trying some more complicated models such as random slopes for all other variables and their interactions as well as nonparametric regression for hour, humidity, or temperature variables.

REFERENCES

R. Adib, H. Murdock, F. Appavou et al. (2016, Dec.). Renewables 2016 global status report. Global Status Report Renewable Energy Policy Network for the 21st Century (REN21). [Online]. Available: https://www.ren21.net/wp-content/uploads/2019/05/REN21_GSR2016_FullReport_en_11.pdf [Baidu Scholar]

Q. Wang, C. Zhang, Y. Ding et al., “Review of real-time electricity markets for integrating distributed energy resources and demand response,” Applied Energy, vol. 138, pp. 695-706, Jan. 2015. [Baidu Scholar]

S. Nolan and M. O’Malley, “Challenges and barriers to demand response deployment and evaluation,” Applied Energy, vol. 152, pp. 1-10, Aug. 2015. [Baidu Scholar]

T. Wei, Q. Zhu, and N. Yu, “Proactive demand participation of smart buildings in smart grid,” IEEE Transactions on Computers, vol. 65, no. 5, pp. 1392-1406, May 2016. [Baidu Scholar]

N. Yu, T. Wei, and Q. Zhu, “From passive demand response to proactive demand participation,” in Proceedings of 2015 IEEE International Conference on Automation Science and Engineering (CASE), Gothenburg, Sweden, pp. 1300-1306, Aug. 2015. [Baidu Scholar]

N. Charlton and C. Singleton, “A refined parametric model for short term load forecasting,” International Journal of Forecasting, vol. 30, no. 2, pp. 364-368, Apr. 2014. [Baidu Scholar]

S. B. Taieb, J. W. Taylor, and R. J. Hyndman, “Hierarchical probabilistic forecasting of electricity demand with smart meter data,” Journal of the American Statistical Association, vol. 6, pp. 1-17, Mar. 2020. [Baidu Scholar]

B. Goehry, Y. Goude, P. Massart et al., “Aggregation of multi-scale experts for bottom-up load forecasting,” IEEE Transactions on Smart Grid, vol. 11, no. 3, pp. 1895-1904, May 2020. [Baidu Scholar]

Y. Chen, P. Xu, Y. Chu et al., “Short-term electrical load forecasting using the support vector regression (SVR) model to calculate the demand response baseline for office buildings,” Applied Energy, vol. 195, pp. 659-670, Jun. 2017. [Baidu Scholar]

K. Chen, K. Chen, Q. Wang et al., “Short-term load forecasting with deep residual networks,” IEEE Transactions on Smart Grid, vol. 10, no. 4, pp. 3943-3952, Jul. 2019. [Baidu Scholar]

L. Sehovac and K. Grolinger, “Deep learning for load forecasting: sequence to sequence recurrent neural networks with attention,” IEEE Access, vol. 8, no. 8, pp. 36411-36426, Feb. 2020. [Baidu Scholar]

A. Bracale, P. Caramia, P. De Falco et al., “Multivariate quantile regression for short-term probabilistic load forecasting,” IEEE Transactions on Power Systems, vol. 35, no. 1, pp. 628-638, Jan. 2020. [Baidu Scholar]

Z. Cao, C. Wan, Z. Zhang et al., “Hybrid ensemble deep learning for deterministic and probabilistic low-voltage load forecasting,” IEEE Transactions on Power Systems, vol. 35, no. 3, pp. 1881-1897, May 2020. [Baidu Scholar]

P. I. Feder, “The log likelihood ratio in segmented regression,” The Annals of Statistics, vol. 3, no. 1, pp. 84-97, Jan. 1975. [Baidu Scholar]

R. Beckman and R. Cook, “Testing for two-phase regressions,” Technometrics, vol. 21, no. 1, pp. 65-69, Feb. 1979. [Baidu Scholar]

J. E. Ertel and E. B. Fowlkes, “Some algorithms for linear spline and piecewise multiple linear regression,” Journal of the American Statistical Association, vol. 71, no. 355, pp. 640-648, Apr. 1976. [Baidu Scholar]

A. Tishler and I. Zang, “A maximum likelihood method for piecewise regression models with a continuous dependent variable,” Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 30, no. 2, pp. 116-124, Jun. 1981. [Baidu Scholar]

A. K. Wagner, S. B. Soumerai, F. Zhang et al., “Segmented regression analysis of interrupted time series studies in medication use research,” Journal of Clinical Pharmacy and Therapeutics, vol. 27, no. 4, pp. 299-309, Aug. 2002. [Baidu Scholar]

H.-J. Kim, M. P. Fay, E. J. Feuer et al., “Permutation tests for joinpoint regression with applications to cancer rates,” Statistics in Medicine, vol. 19, no. 3, pp. 335-351, Jan. 2000. [Baidu Scholar]

J. D. Toms and M. L. Lesperance, “Piecewise regression: a tool for identifying ecological thresholds,” Ecology, vol. 84, no. 8, pp. 2034-2041, Aug. 2003. [Baidu Scholar]

Q. Shao and N. Campbell, “Applications: modelling trends in groundwater levels by segmented regression with constraints,” Australian & New Zealand Journal of Statistics, vol. 44, no. 2, pp. 129-141, Dec. 2002. [Baidu Scholar]

A. E. Kunst, C. W. Looman, and J. P. Mackenbach, “Outdoor air temperature and mortality in the netherlands: a time-series analysis,” American Journal of Epidemiology, vol. 137, no. 3, pp. 331-341, Feb. 1993. [Baidu Scholar]

N. Molinari, J.-P. Daurès, and J.-F. Durand, “Regression splines for threshold selection in survival data analysis,” Statistics in Medicine, vol. 20, no. 2, pp. 237-247, Jan. 2001. [Baidu Scholar]

R. Rigby and D. Stasinopoulos, “Detecting break points in the hazard function in survival analysis,” Statistical Modelling, vol. 1992, pp. 303-311, Dec. 1992. [Baidu Scholar]

J. Shi, Y. Liu, and N. Yu, “Spatio-temporal modeling of electric loads,” in Proceedings of North American Power Symposium (NAPS), Morgantown, USA, Sept. 2017, pp. 1-6. [Baidu Scholar]

N. M. Laird and J. H. Ware, “Random-effects models for longitudinal data,” Biometrics, vol. 38, no. 4, pp. 963-974, Dec. 1982. [Baidu Scholar]

P. Diggle, P. J. Diggle, P. Heagerty et al., Analysis of Longitudinal Data. Oxford: Oxford University Press, 2002. [Baidu Scholar]

V. M. Muggeo, “Estimating regression models with unknown break-points,” Statistics in Medicine, vol. 22, no. 19, pp. 3055-3071, Sept. 2003. [Baidu Scholar]

D. Bates. (2020, Nov.). Computational methods for mixed models. [Online]. Available: https://cran.r-project.org/web/packages/lme4/vignettes/Theory.pdf [Baidu Scholar]

P. J. Rousseeuw, “Least median of squares regression,” Journal of the American Statistical Association, vol. 79, no. 388, pp. 871-880, Jan. 1984. [Baidu Scholar]

V. M. Muggeo, D. C. Atkins, R. J. Gallop et al., “Segmented mixed models with random changepoints: a maximum likelihood approach with application to treatment for depression study,” Statistical Modelling, vol. 14, no. 4, pp. 293-313, May 2014. [Baidu Scholar]

M. Muggeo. (2016, Feb.). Segmented mixed models with random changepoints in R. [Online]. Available: https://www.researchgate.net/publication/292629179 [Baidu Scholar]

A. Dominicus, S. Ripatti, N. L. Pedersen et al., “A random change point model for assessing variability in repeated measures of cognitive function,” Statistics in Medicine, vol. 27, no. 27, pp. 5786-5798, Nov. 2008. [Baidu Scholar]

T. Hastie and R. Tibshirani, Generalized Additive Models. Virginia Beach: Chapman and Hall/CRC, 1990. [Baidu Scholar]

C. Gössl and H. Küchenhoff, “Bayesian analysis of logistic regression with an unknown change point and covariate measurement error,” Statistics in Medicine, vol. 20, no. 20, pp. 3109-3121, Oct. 2001. [Baidu Scholar]

J. Jiao, Z. Tang, P. Zhang et al., “Ensuring cyberattack-resilient load forecasting with a robust statistical method,” in Proceedings of 2019 IEEE PES General Meeting (PESGM), Atlanta, USA, pp. 1-5, Aug. 2019. [Baidu Scholar]

J. Luo, T. Hong, and S. Fang, “Robust regression models for load forecasting,” IEEE Transactions on Smart Grid, vol. 10, no. 5, pp. 5397-5404, Sept. 2019. [Baidu Scholar]

P. J. Huber, Robust Statistics. New York: John Wiley and Sons, 1981. [Baidu Scholar]

L. A. Jaeckel, “Estimating regression coefficients by minimizing the dispersion of the residuals,” The Annals of Mathematical Statistics, vol. 43, no. 5, pp. 1449-1458, Oct. 1972. [Baidu Scholar]

A. F. Siegel, “Robust regression using repeated medians,” Biometrika, vol. 69, no. 1, pp. 242-244, Apr. 1982. [Baidu Scholar]

P. Rousseeuw and V. Yohai, “Robust regression by means of s-estimators,” in Robust and Nonlinear Time Series Analysis. Berlin: Springer, 1984, pp. 256-272. [Baidu Scholar]

V. J. Yohai, “High breakdown-point and high efficiency robust estimates for regression,” The Annals of Statistics, vol. 15, no. 2, pp. 642-656, Jun. 1987. [Baidu Scholar]

D. Gervini and V. J. Yohai, “A class of robust and fully efficient regression estimators,” The Annals of Statistics, vol. 30, no. 2, pp. 583-616, Apr. 2002. [Baidu Scholar]

Y. She and A. B. Owen, “Outlier detection using nonconvex penalized regression,” Journal of the American Statistical Association, vol. 106, no. 494, pp. 626-639, Jun. 2011. [Baidu Scholar]

Y. Lee, S. N. MacEachern, and Y. Jung, “Regularization of case-specific parameters for robustness and efficiency,” Statistical Science, vol. 27, no. 3, pp. 350-372, Aug. 2012. [Baidu Scholar]

C. Yu, K. Chen, and W. Yao, “Outlier detection and robust mixture modeling using nonconvex penalized likelihood,” Journal of Statistical Planning and Inference, vol. 164, no. 1, pp. 27-38, Sept. 2015. [Baidu Scholar]

C. Yu and W. Yao, “Robust linear regression: a review and comparison,” Communications in Statistics-Simulation and Computation, vol. 46, no. 8, pp. 6261-6282, Mar. 2017. [Baidu Scholar]

D. L. Donoho and P. J. Huber, “The notion of breakdown point,” in A Festschrift for Erich L. Lehmann. Belmon: Wadsworth International Group, 1983. [Baidu Scholar]

N. Neykov, P. Filzmoser, R. Dimova et al., “Robust fitting of mixtures using the trimmed likelihood estimator,” Computational Statistics & Data Analysis, vol. 52, no. 1, pp. 299-308, Sept. 2007. [Baidu Scholar]

M. Li, S. Xiang, and W. Yao, “Robust estimation of the number of components for mixtures of linear regression models,” Computational Statistics, vol. 31, no. 4, pp. 1539-1555, Aug. 2016. [Baidu Scholar]

L. Yang, S. Xiang, and W. Yao, “Robust fitting of mixtures of factor analyzers using the trimmed likelihood estimator,” Communications in Statistics–Simulation and Computation, vol. 46, no. 2, pp. 1280-1291, Feb. 2017. [Baidu Scholar]

V. M. Muggeo, “Segmented: an R package to fit regression models with broken-line relationships,” R News, vol. 8, no. 1, pp. 20-25, Jan. 2008. [Baidu Scholar]

D. Bates, M. Mächler, B. Bolker et al., “Fitting linear mixed-effects models using lme4,” Journal of Statistical Software, vol. 67, no. 1, pp. 1-48, Oct. 2015. [Baidu Scholar]

R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 58, no. 1, pp. 267-288, Jan. 1996. [Baidu Scholar]

T. Hong, P. Pinson, and S. Fan, “Global energy forecasting competition 2012,” International Journal of Forecasting, vol. 30, no. 2, pp. 357-363, Apr. 2014. [Baidu Scholar]

J. Liu, S. Wu, and J. V. Zidek, “On segmented multivariate regression,” Statistica Sinica, vol. 7, no. 2, pp. 497-525, Apr. 1997. [Baidu Scholar]

M. S. Ben Aïssa, M. Boutahar, and J. Jouini, “Bai and Perron’s and spectral density methods for structural change detection in the US inflation process,” Applied Economics Letters, vol. 11, no. 2, pp. 109-115, Feb. 2004. [Baidu Scholar]

B. Strikholm and T. Teräsvirta, “Determing the number of regimes in a threshold autoregressive model using smooth transition autoregressions,” Tech. Rep. SSE/EFI Working Paper Series in Economics and Finance, Jan. 2005. [Baidu Scholar]

R. Prodan, “Potential pitfalls in determining multiple structural changes with an application to purchasing power parity,” Journal of Business & Economic Statistics, vol. 26, no. 1, pp. 50-65, Jan. 2008. [Baidu Scholar]

K. J. Lee and S. G. Thompson, “Flexible parametric models for random-effects distributions,” Statistics in Medicine, vol. 27, no. 3, pp. 418-434, May 2008. [Baidu Scholar]

Address:No.19 Chengxin Avenue, Jiangning District, Nanjing 211106, China

E-mail: mpce@alljournals.cn

Tel:86-25-81093060

Fax:86-25-81093040

Home

Introduction

Editorial Board

For Author

Call For Papers

APC

Sponsor & Publisher

A Robust Segmented Mixed Effect Regression Model for Baseline Electricity Consumption Forecasting PDF

Abstract

Keywords

I. Introduction

II. MESR and Proposed Robust Estimation Algorithm

A. Estimation of Breakpoints

B. Estimation of Mixed Effect Regression Models

C. Estimation of Mixed Effect Breakpoints

D. Estimation of Robust Segmented Mixed Effect Regression

III. Simulation Study

IV. Real Data Analysis

A. Data

B. Model and Result

V. Conclusion

REFERENCES