1 Introduction

The accuracy of electric load forecasts is crucial to the operations and planning activities in the electric power industry. During the past 30 years, most papers in the load forecasting literature have been devoted to investigating various forecasting techniques, such as regression analysis, time series analysis, and artificial neural networks [1,2,3,4]. Some of them, together with several emerging techniques, such as gradient boosting machines and random forests, have been recognized through notable load forecasting competitions [5,6,7,8,9]. Another track of research has focused on exploring various variables for load forecasting models. The load, or the log-transformed load, is used in almost every load forecasting paper as the dependent variable [10, 11]. The frequently used explanatory variables include weather and calendar variables [12,13,14,15]. For medium- and long-term load forecasting, economic variables are usually used to drive the trend [10].

A typical electricity demand series presents multiple seasonal patterns. For instance, the demand is high during summer and winter and low during spring and fall, in response to cooling and heating needs. The demand during workdays is typically higher than that during weekends. The demand during the daytime is mostly higher and more volatile than that during the sleeping hours at night. As a result, many load forecasting models use calendar variables, such as month of the year, day of the week, and hour of the day, as inputs to capture the multiple seasonal patterns in the demand series [1, 9, 10, 12].

When modeling the annual seasonality, load forecasting models usually rely on the Gregorian calendar, which dissects the days of a year into 12 months whose boundaries are historical conventions inherited from the Roman lunar calendar rather than markers of the Sun's position. Many studies directly use month as a class variable in the load forecasting model [7, 9, 10, 12]. Other studies report using season as a class variable, where the seasons are usually pre-defined by grouping the 12 months of a year into summer and winter periods, four seasons, or four seasons plus some transition seasons [15, 16].

In fact, the seasons, marked by changes in the weather, result from the Earth's yearly orbit around the Sun. Therefore, the widely-used Gregorian calendar, whose month boundaries are not aligned with the Sun's position, may not be an accurate indicator of seasonal change for load forecasting. Alternatively, the 24 solar terms, which originated in China to guide agricultural activities, were developed based on the Sun's position along the ecliptic. This leads to our research question: is the solar-term calendar better than the Gregorian calendar for load forecasting models?

This is the first formal study in the load forecasting literature that seeks a better means than the Gregorian calendar to categorize the days of a year. The case study is conducted for the six states of the U.S. served by ISO New England (ISONE). Two forecast evaluation settings, namely cross validation and sliding simulation, are used to evaluate the effectiveness of the proposed method.

The rest of the paper is organized as follows: Section 2 introduces the case study data, the benchmark model using 12 months, and the proposed model using 24 solar terms. Section 3 describes the setup of the case study and presents the results. Section 4 discusses some practical considerations for implementing the proposed method. The paper is then concluded in Section 5 with a discussion on future research directions.

2 Background

2.1 Data

In this paper, we use nine years of hourly load and temperature data published by ISONE, from 2008 to 2016, to conduct the case study [17]. ISONE serves six states in the northeast U.S.: Connecticut (CT), Massachusetts (MA), Maine (ME), New Hampshire (NH), Rhode Island (RI), and Vermont (VT). Each state except MA is considered as one load zone, while MA is dissected into three load zones, namely Northeastern MA and Boston (NEMASS), Southeastern MA (SEMASS), and Western Central MA (WCMASS). The system total (ISONE) is the sum of the load from these eight load zones. The load data is adjusted for daylight-saving time (DST) as follows: at the beginning hour of DST, the zero reading is replaced by the average of the two adjacent hours; at the ending hour of DST, the load is divided by two.
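As an illustration, the DST adjustment above takes only a few lines of code. The following is a minimal sketch assuming the hourly load is a pandas Series indexed by local timestamps; the helper name and the way the two affected timestamps are passed in are our own assumptions, not part of the ISONE data release.

```python
import pandas as pd

def adjust_dst(load: pd.Series, spring_hour: pd.Timestamp, fall_hour: pd.Timestamp) -> pd.Series:
    """Apply the DST adjustment described above (hypothetical helper)."""
    load = load.copy()
    # Beginning of DST: the skipped hour reads zero, so replace it with
    # the average of the two adjacent hours.
    load.loc[spring_hour] = (load.loc[spring_hour - pd.Timedelta(hours=1)] +
                             load.loc[spring_hour + pd.Timedelta(hours=1)]) / 2
    # End of DST: the repeated hour contains two hours of load, so halve it.
    load.loc[fall_hour] = load.loc[fall_hour] / 2
    return load
```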

Table 1 presents the summary statistics of the load and temperature data for each zone and the system total over these nine years. Figure 1 shows the time series plots of the ISONE hourly load and temperature from 2012 to 2016. The scatter plots in Fig. 2 depict the relation between load and temperature across the 12 months of a year. The time series plots and scatter plots at the zonal level are similar to those of the system total. In Section 3, we will present the experiments conducted for all eight load zones and the system total.

Table 1 Summary statistics of ISONE load zones and system total
Fig. 1 Time series plots of hourly load and temperature (Zone = ISONE)

Fig. 2 Scatter plot between load and temperature by month (Zone = ISONE; Year = 2012)

2.2 Benchmark model

Multiple linear regression (MLR) is a widely-deployed load forecasting technique in the field. One frequently cited MLR-based load forecasting model is Tao's Vanilla benchmark model, which was used as the benchmark in the load forecasting track of the Global Energy Forecasting Competition 2012 [7].

The same Vanilla model as defined in (1) is also used as the benchmark in this paper:

$$\begin{aligned} E_{Load} ={} & \beta_{0} + \beta_{1} Trend + \beta_{2} M + \beta_{3} W + \beta_{4} H + \beta_{5} WH \\ & + \beta_{6} T + \beta_{7} T^{2} + \beta_{8} T^{3} + \beta_{9} TM + \beta_{10} T^{2} M + \beta_{11} T^{3} M \\ & + \beta_{12} TH + \beta_{13} T^{2} H + \beta_{14} T^{3} H \end{aligned}$$
(1)

where $E_{Load}$ is the expected load; $\beta_{i}$ are the coefficients estimated using the ordinary least squares method; $Trend$ represents the chronological trend; $M$, $W$, and $H$ are class variables representing the month of a year, day of a week, and hour of a day, respectively; $T$ is the coincident temperature. The products of these variables represent cross effects, or interactions.
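To make the model concrete, the following sketch fits (1) with ordinary least squares using the statsmodels formula interface. The column names (load, trend, month, weekday, hour, temp) are assumptions about how the data frame has been prepared, not the paper's notation; the original study estimated the model with the GLM procedure in SAS®/STAT.

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_vanilla(df: pd.DataFrame):
    """Fit the Vanilla benchmark model (1).
    df is assumed to carry columns: load, trend, month, weekday, hour, temp."""
    formula = (
        "load ~ trend + C(month) + C(weekday) + C(hour)"
        " + C(weekday):C(hour)"                           # WH cross effect
        " + temp + I(temp**2) + I(temp**3)"               # T, T^2, T^3
        " + C(month):(temp + I(temp**2) + I(temp**3))"    # TM, T^2 M, T^3 M
        " + C(hour):(temp + I(temp**2) + I(temp**3))"     # TH, T^2 H, T^3 H
    )
    return smf.ols(formula, data=df).fit()
```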

2.3 Solar term model

The 24 solar terms divide the Sun's annual apparent motion along the ecliptic into 24 segments, each spanning 15 degrees. However, since the Sun's apparent speed along the ecliptic varies with the distance between the Earth and the Sun, the number of days the Sun takes to travel between two adjacent solar terms varies slightly throughout a year. As a result, the 24 solar terms are not exactly equal-sized pieces. Table 2 lists the starting dates of the 24 solar terms from 2008 to 2016.

Table 2 Starting dates of 24 solar terms (2008 to 2016) of a year [18]

The division of the solar terms reflects the variation of natural phenomena, such as seasons, climate, and phenology. Intuitively, it reflects the seasonal changes of weather more precisely than the Gregorian calendar does. We use the solar month, denoted by SM, to replace the variable month, denoted by M, in the benchmark model (1) to obtain our proposed model (2):

$$\begin{aligned} E_{Load} ={} & \beta_{0} + \beta_{1} Trend + \beta_{2} SM + \beta_{3} W + \beta_{4} H + \beta_{5} WH \\ & + \beta_{6} T + \beta_{7} T^{2} + \beta_{8} T^{3} + \beta_{9} T \cdot SM + \beta_{10} T^{2} \cdot SM + \beta_{11} T^{3} \cdot SM \\ & + \beta_{12} TH + \beta_{13} T^{2} H + \beta_{14} T^{3} H \end{aligned}$$
(2)

where SM is a class variable representing the 24 solar terms of a year.
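Constructing SM amounts to a date lookup against Table 2. The sketch below labels each date with a solar-term index via binary search; the list of starting dates is truncated for illustration, and in practice all 24 dates per year would come from an almanac such as [18].

```python
import bisect
from datetime import date

# Illustrative starting dates of the first three solar terms of 2016;
# the full 24 dates per year are taken from Table 2 / [18].
term_starts_2016 = [date(2016, 1, 6), date(2016, 1, 20), date(2016, 2, 4)]

def solar_term(d: date, term_starts: list) -> int:
    """Index (0-23) of the most recent solar term starting on or before d."""
    i = bisect.bisect_right(term_starts, d) - 1
    # A date before the year's first term belongs to the last term
    # of the previous year, hence the modulo.
    return i % 24
```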

3 Case study

The case study is built on the nine years of data introduced earlier. For each of the eight load zones and the system total, we implement two forecast evaluation settings: cross validation and sliding simulation.

The last six years (2011–2016) are used to conduct the 6-fold cross validation. The data is divided into six pieces by calendar year. Each time, we use five years of data to estimate the parameters of the model and then predict the remaining year. All nine years (2008–2016) are used to conduct the sliding simulation. Each time, three consecutive years of history are used for parameter estimation. The resulting model is then used for one-year-ahead ex post forecasting.

A widely-adopted error measure in the load forecasting literature, the Mean Absolute Percentage Error (MAPE) as defined in (3), is used to evaluate forecast performance in this paper. The lower the MAPE value, the better the forecasts:

$$MAPE = \frac{100}{N}\sum\limits_{i = 1}^{N} \left| \frac{A_{i} - P_{i}}{A_{i}} \right|$$
(3)

where $N$ is the number of observations; $A_{i}$ is the actual load; $P_{i}$ represents the predicted load. We evaluate the performance of the models based on the MAPE of each validation year and the average of the MAPEs across all validation years.
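For reference, (3) translates directly into code. A minimal sketch, assuming numpy arrays of actual and predicted loads with no zero actuals:

```python
import numpy as np

def mape(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Mean Absolute Percentage Error in percent, as defined in (3)."""
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))
```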

3.1 V-fold cross validation

Cross validation is a widely-used forecast evaluation technique that helps avoid potential overfitting issues [19]. In this study, we adopt the V-fold cross validation (VFCV) technique. It first partitions the data into V equally (or nearly equally) sized segments. One of the V segments is used as the validation data and the remaining V−1 segments are used as the training data. This process is repeated V times so that each segment serves as the validation data exactly once. The performance of the model is usually evaluated based on its average performance across these V segments. In this study, we divide six years (2011 to 2016) of data into six segments by calendar year. For example, when predicting the load for the year 2012, we use the data from 2011 to 2016, excluding 2012, as the training data to estimate the parameters of the model.
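The year-based 6-fold scheme can be sketched as the loop below, reusing the hypothetical fit_vanilla and mape helpers from the earlier sketches and assuming df carries a DatetimeIndex:

```python
def cross_validate(df, fit_fn, years=range(2011, 2017)):
    """6-fold cross validation by calendar year, as described above."""
    scores = {}
    for year in years:
        # Train on the other five years within 2011-2016, validate on `year`.
        train = df[(df.index.year >= 2011) & (df.index.year != year)]
        test = df[df.index.year == year]
        model = fit_fn(train)
        scores[year] = mape(test["load"].to_numpy(),
                            model.predict(test).to_numpy())
    return scores
```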

Table 3 lists the MAPEs (in %) of each validation year under the 6-fold cross validation setting. The two columns under each year list the MAPE values of the two models introduced in Sections 2.2 and 2.3, respectively. For each load zone, the bold MAPE values represent the model with the better performance. The proposed model returns lower MAPE values for most of the six validation years. On average, eight of the nine zones have a lower average MAPE value from the proposed model, with the relative MAPE improvement ranging from 0.20% to 5.10%.

Table 3 MAPEs (in %) of the proposed (Pro.) and the benchmark (Ben.) model (6-fold cross validation)

3.2 Sliding simulation

Another widely-used forecast evaluation technique is the sliding simulation [20]. Compared with cross validation, sliding simulation better mimics forecasting operations in the real world. Specifically, the sliding simulation technique estimates the coefficients of a model over a pre-defined historical window (e.g., three years) to forecast a period ahead (e.g., one year). For example, we first forecast the year 2011 using the data from 2008 to 2010 as the training data. We then advance the forecast origin by a year to forecast the year 2012 using the data from 2009 to 2011 as the training data. We repeat this rolling process until we complete the forecasts for all six years (2011–2016).
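Under the same assumptions as the cross-validation sketch, the sliding simulation differs only in how the training window is chosen:

```python
def sliding_simulation(df, fit_fn, window=3, test_years=range(2011, 2017)):
    """One-year-ahead ex post forecasting with a sliding 3-year window."""
    scores = {}
    for year in test_years:
        # Train on the `window` years immediately preceding the test year.
        train = df[(df.index.year >= year - window) & (df.index.year < year)]
        test = df[df.index.year == year]
        model = fit_fn(train)
        scores[year] = mape(test["load"].to_numpy(),
                            model.predict(test).to_numpy())
    return scores
```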

Table 4 lists the MAPEs (in %) under the sliding simulation setting. The results confirm what we have observed from the cross validation setting: for each zone, the proposed model outperforms the benchmark one for most of the validation years. On average, eight of the nine zones have a lower average MAPE value from the proposed model with the relative MAPE improvement ranging from 0.21% to 6.14%. For NEMASS, the difference in the average MAPE is negligible.

Table 4 MAPEs (in %) of the proposed (Pro.) and the benchmark (Ben.) model (sliding simulation)

Figure 3 is a time series plot of the actual load and the two forecasted load profiles, from the benchmark model and the proposed one, for a summer week of 2016. For most forecasted hours, the forecasts from the proposed model are closer to the actual values than those from the benchmark model. For this summer week, the MAPE is 3.50% for the proposed model and 4.36% for the benchmark one. Figure 3 also shows that both forecasts are much higher than the actual load during June 13 to June 15. These errors can be reduced by introducing the recency effect into the model, which was discussed in detail in [4].

Fig. 3 Actual vs. forecasted load in a summer week of 2016 (ISONE)

4 Discussion

4.1 Other potential studies

In this paper, the sliding simulation was conducted for one-year-ahead forecasting under the regression framework. The results from the experiment demonstrated the superior performance of using the 24 solar terms over the counterpart using the 12 Gregorian calendar months. Additional empirical case studies are encouraged to investigate the performance of including the proposed solar term variable in load forecasting models for other forecast horizons and with other forecasting techniques.

The 24 solar terms originated in ancient China and were then widely used in Southeast Asia to guide agricultural activities. In this paper, the same solar terms were used for load forecasting in six states in the northeast U.S. Although these six states are in a different climate zone than China or Southeast Asia, this case study showed that the solar terms can still effectively categorize the days of a year for load forecasting purposes. Following a similar analysis framework, studies can be conducted for other climate regions to gain a more comprehensive understanding of using solar terms for load forecasting.

4.2 Tradeoff

When a regression model includes class variables, the complexity of the model is affected by the number of levels of these class variables. For example, when the 24 solar terms are included in the Vanilla benchmark model to replace the original 12 months of a year, 23 instead of 11 parameters need to be estimated for the class variable SM. The interaction terms between SM and the temperature variables further increase the number of estimated parameters. Estimating these two models takes almost the same time using the GLM procedure in SAS®/STAT [21]: it takes 0.134 s to estimate the benchmark model (1) for one zone and one test year (i.e., one regression model) and 0.146 s to estimate the proposed model (2) for the same zone and year. Nevertheless, a more complex model may cause an overfitting issue, as discussed below.

When there are sufficient observations in the training data, it is probably fine to use a complicated model. When the training data is limited, we may encounter overfitting issues caused by estimating too many parameters from an insufficient number of observations. Avoiding overfitting is both an art and a science. The forecaster needs to strike a trade-off between the signal captured by the added variables and the noise overfitted by the same variables. A similar tradeoff analysis was presented in [4], where the authors showed that the inclusion of many (but not too many) variables helps improve forecast accuracy.

Given the same training data and the same model structure, using the solar terms with 24 levels is more likely to overfit the data than using the month with 12 levels. Meanwhile, the solar term model is also more likely to capture additional signal than the model based on 12 months. In the case study presented in this paper, the solar term model wins. Nevertheless, the performance of solar term models may degrade due to overfitting as the underlying model structure becomes even more complicated, for instance, by including lagged and moving-average temperature variables interacting with the solar terms. This points to another future research direction: how to group similar solar terms together to reduce the complexity of the model?

4.3 Data-driven models

In this paper, we have compared the use of 12 months and 24 solar terms. It is worth noting that other types of calendars are adopted in different parts of the world. An alternative to these calendar-based models is a data-driven approach. Instead of adopting a calendar to come up with the “month-equivalent” variable, we can simply group the days of a year via trial and error, such as enumerating the group size from 1 to 365 days. When the ultimate goal is to improve forecast accuracy, such a data-driven approach may be advantageous over its alternatives. However, real-world applications look for features of a solution beyond its accuracy, such as interpretability, simplicity, consistency, and defensibility. For these reasons, practitioners tend to be more receptive to conventional calendars, whether the Gregorian calendar or the Chinese solar terms.

5 Conclusion

In this paper, we used the solar terms as an alternative to the widely-used class variable month in load forecasting models. The case study was conducted for the eight zones and the system total of ISONE for one-year-ahead forecasting. The results from both V-fold cross validation and sliding simulation demonstrated the effectiveness of the proposed method in comparison to the Vanilla model, which uses month to categorize the days of a year. Additional empirical studies are encouraged to test the effectiveness of the proposed method for load forecasting problems with other forecast horizons, using different modeling techniques, or on datasets from different climate zones. In addition, the findings from this paper spark other interesting questions for future research. Recall that grouping the days of a week was discussed in [12] to reduce the complexity of the model. Following a similar idea, one may consider grouping the solar terms to overcome the potential overfitting issue as the underlying model grows with additional weather variables. Readers are welcome to test other calendars and data-driven methods as well.