Abstract
The increasing integration of renewable energy sources into current power systems has posed the challenge of adequately representing the statistical properties associated with their variable power generation. In this paper, a novel procedure is proposed to select a proper synthetic time series generation model for renewable energy sources to analyze power system problems. The procedure takes advantage of the objective of the specific analysis to be performed and the statistical characteristics of the available time series. The aim is to determine the suitable model to be used for generating synthetic time series of renewable energy sources. A set of indicators is proposed to verify that the statistical properties of synthetic time series fit the statistical properties of the original data. The proposal can be integrated into systematic tools available for data analysis without compromising the representation of the statistical properties of the original time series. The procedure is tested using real data from the New Zealand power system in a mid-term analysis on integrating wind power plants into the power system. The results show that the proposed procedure reduces the error obtained in analyzing power systems compared with reference models.
The contribution of renewable energy sources (RESs) to power systems, e.g., wind- and solar-based energy, has undergone an accelerated expansion, and today, new wind farms and solar power plants are under construction or at the planning stage in different countries of the world [
With regard to their application to power systems, synthetic time series (SS) have other possible applications, e.g., ① the spinning-reserve required for their secure operation [
The selection of a model to generate SS is not a trivial task [
In the literature, Markov chains are widely used to generate SS for the analysis of power systems. These models can represent time series, whose statistical properties differ from the normal distribution [
Furthermore, autoregressive integrated moving average (ARIMA) models are reported to generate SS for long-term analyses, where the energy is more important than the short-term power produced by the RES [
While Markov chains and ARIMA models can only represent temporal relationships in time series, vector autoregressive (VAR) models are used for the generation of SS in power systems, where the spatial dependencies among the time series of RES are important (for instance, expansion planning analyses) [
Significant efforts to develop new models for generating SS can be found in [
This paper proposes a novel procedure that selects the models needed to generate SS for a specific analysis of power systems. The procedure can be applied to study the operation and expansion of power grids with high penetration of renewable energy. The results indicate that the proposed procedure allows choosing, from several candidate models, the one that best fits the objective of the study and that better reconstructs the statistical characteristics of the original time series. The proposed procedure considers the following general steps.
1) The input of the specific analysis objective is considered.
2) With the information contained in the raw data, the procedure determines the models that best fit the requirements of the analysis.
3) Finally, the model is selected by applying different statistical tests and computing a set of proposed indicators that account for the appropriateness of each model to represent the statistical properties of the original data.
Thus, the main contributions of this paper include: ① definitions and propositions of criteria needed for a proper selection of models to generate SS that can be used in different studies related to the operation and expansion of power systems; ② a systematic procedure to be followed and the statistical analyses necessary to select a suitable model for the generation of SS; ③ a procedure to systematize and define transformations by estimating the order and adjusting the models appropriately.
The remainder of this paper is organized as follows. Section II presents the proposed procedure. In Section III, the results are obtained by applying the proposed procedure. Finally, Section IV sets out the concluding remarks and future work.
A suitable model for the generation of SS should produce independent, identically distributed samples, each having the same fundamental properties as the original time series without replicating it exactly [
Firstly, in the model specification (also referred to as model identification), the different time series models that may be suitable for a given observed series are selected. In this step, the selected model is tentative and subject to revision in the subsequent analysis. When choosing a model, in [
Secondly, the model fitting consists of finding the best possible estimates of the unknown parameters of a given model. For instance, the least-squares method can be considered.
Thirdly, the model verification assesses the quality of the model that has been specified and estimated. In this step, two fundamental aspects are assessed. Initially, it is necessary to determine whether the selected model fits the data. Next, it is necessary to assess whether the model assumptions are reasonably well satisfied. Thus, two situations may arise after the model verification, i.e., ① if no deficiencies are found, the modeling is considered complete and the model can be used; ② if deficiencies are found, another model is selected, i.e., we return to the model specification step. Finally, the three steps are repeated until, ideally, an acceptable model is found.
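The specification, fitting, and verification loop described above can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the candidate set is restricted, as an assumption, to zero-mean AR(1) and AR(2) models, the fitting uses ordinary least squares, and the verification uses a Ljung-Box statistic on the residuals compared against tabulated 95% chi-square critical values. All function names are hypothetical.

```python
# Assumed setting: zero-mean series, candidate models AR(1) and AR(2) only.
LAGS = 10
# 95% chi-square critical values for LAGS - p degrees of freedom (p = 2, 1).
CHI2_95 = {8: 15.507, 9: 16.919}

def fit_ar(x, p):
    """Least-squares estimate of AR(p) coefficients, p in {1, 2}."""
    if p == 1:
        num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
        den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
        return [num / den]
    # p == 2: solve the 2x2 normal equations with Cramer's rule.
    a11 = sum(x[t - 1] ** 2 for t in range(2, len(x)))
    a12 = sum(x[t - 1] * x[t - 2] for t in range(2, len(x)))
    a22 = sum(x[t - 2] ** 2 for t in range(2, len(x)))
    b1 = sum(x[t] * x[t - 1] for t in range(2, len(x)))
    b2 = sum(x[t] * x[t - 2] for t in range(2, len(x)))
    det = a11 * a22 - a12 * a12
    return [(b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det]

def residuals(x, phi):
    """One-step-ahead residuals of the fitted AR model."""
    p = len(phi)
    return [x[t] - sum(phi[k] * x[t - 1 - k] for k in range(p))
            for t in range(p, len(x))]

def ljung_box(e, lags=LAGS):
    """Ljung-Box Q statistic of a residual sequence."""
    n = len(e)
    m = sum(e) / n
    c0 = sum((v - m) ** 2 for v in e)
    q = 0.0
    for k in range(1, lags + 1):
        rk = sum((e[t] - m) * (e[t - k] - m) for t in range(k, n)) / c0
        q += rk * rk / (n - k)
    return n * (n + 2) * q

def select_ar(x):
    """Repeat specification, fitting, and verification until a model passes."""
    for p in (1, 2):                                  # 1) specification
        phi = fit_ar(x, p)                            # 2) fitting
        q = ljung_box(residuals(x, phi))              # 3) verification
        if q < CHI2_95[LAGS - p]:
            return p, phi
    return None, None                                 # no acceptable model found
```

If `select_ar` returns `None`, the candidate set would have to be enlarged, which corresponds to returning to the model specification step.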
The proposed procedure involves selecting and tuning the parameters of a model for generating SS for power system analysis. The procedure is based on an approach that classifies the most common models developed in the existing literature and simultaneously compares them using various indicators.

Fig. 1 Classification of models for generation of SS.
The proposed procedure focuses on modeling time series of wind speed, wind power, solar radiation, and solar power. This is because wind- and solar-based RESs are currently the most widely employed across power systems.
The proposed procedure takes the analysis and available time series (dataset) of the RES as inputs.

Fig. 2 Flow chart of model selection for generation of SS.
The steps shown in
1) Definition of Analysis to Be Performed
This step defines the specific type of power system analysis to be performed and some modeling requirements for the time series of the energy resources. The following aspects should be clarified in this step: ① time horizon of the analysis; ② time frame; ③ simulation type; ④ the role of RES in the analysis; ⑤ power system model. There are two types of analysis in this paper: ① system expansion planning; ② system operation planning.

Fig. 3 Types of analysis to be performed in power system and corresponding time horizon.
2) Selection of Time Series
In this step, the time series (raw data) of the RES are selected. The following aspects must be verified in this step: ① location; ② measured variables; ③ data length; ④ sampling time. The time series available to perform the analyses of power systems are often not generated at the locations of interest. In these cases, atmospheric models are used to generate time series at the locations of interest [

Fig. 4 General scheme for selection of time series.
In order to exemplify what is shown in
3) Analysis of Information to Assess Compliance with Requirements
In this step, the compliance of the features of the available time series of the energy resources with the requirements defined in the first step is assessed. If the time series meet all the requirements, there is enough information to perform the desired analysis. Otherwise, the time series selection step is revisited to obtain the missing information. Note that this is a checking step. It is verified that the following conditions are met, depending on the analysis to be performed.
1) For the system expansion planning analysis, time series are available for each energy development center; the time series span at least one year; and the sampling time of the time series is at least one hour.
2) For the system operation planning analysis, time series are available for each RES currently in operation; the sampling time of the time series is one hour at most; and the time series are synchronized (if the focus is on very-short-term system operation planning).
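The two sets of conditions above can be written as a simple check. The following Python sketch is only illustrative: the `SeriesInfo` structure and the function name are hypothetical, and the sampling-time condition for expansion planning is read literally from the text (sampling time of at least one hour), which is an interpretation.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760

@dataclass
class SeriesInfo:
    """Metadata of one available RES time series (hypothetical structure)."""
    site: str
    length_hours: float       # total time span covered by the series
    sampling_hours: float     # sampling time of the series
    synchronized: bool = True

def meets_requirements(series, required_sites, analysis, very_short_term=False):
    """Return the list of unmet conditions; an empty list means compliance."""
    available = {s.site for s in series}
    issues = [f"{site}: no time series available"
              for site in required_sites if site not in available]
    for s in series:
        if analysis == "expansion":
            if s.length_hours < HOURS_PER_YEAR:
                issues.append(f"{s.site}: series shorter than one year")
            # Literal reading of the text: sampling time of at least one hour.
            if s.sampling_hours < 1.0:
                issues.append(f"{s.site}: sampling time below one hour")
        elif analysis == "operation":
            if s.sampling_hours > 1.0:
                issues.append(f"{s.site}: sampling time above one hour")
            if very_short_term and not s.synchronized:
                issues.append(f"{s.site}: series not synchronized")
        else:
            raise ValueError("analysis must be 'expansion' or 'operation'")
    return issues
```

An empty result indicates that the analysis can proceed; otherwise, each returned entry points to the information that is still missing.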
If these conditions are not satisfied by the available time series, additional information about the RES centers should be gathered, or the focus of the analysis should be adapted to the information that the series provides. There is a dependency between the first and the second steps, which motivates the addition of a link between the first and the second steps in
4) Preprocessing of Raw Data and Statistical Analysis
In this step, a statistical characterization of the time series is carried out. This involves preprocessing the raw data so that the RES time series meet the requirements for a proper application of SS models.

Fig. 5 General scheme for preprocessing of raw data and statistical analysis.
1) Trends and seasonality are identified through an augmented Dickey-Fuller test [
2) Temporal and spatial correlations are identified using the ACF and the cross-correlation function (CCF), respectively. When there is more than one power plant based on RES, the cross-correlation matrix (CCM) is used to identify the spatial correlation. Further, the single- and multi-variable versions of the Ljung-Box test are applied as complements [
Furthermore, since several models require that the time series have a normal probability density function, the procedure in [
Next, we describe how the tests and functions are used in this step. Firstly, the trends and seasonality are identified with hourly, daily, weekly, and monthly resolutions [
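The temporal and spatial correlation checks used in this step rely on the sample ACF and CCF. A minimal Python sketch of both estimators is given next; the function names are hypothetical, the two series are assumed to have equal length, and only non-negative lags are computed.

```python
import math

def acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

def ccf(x, y, max_lag):
    """Sample cross-correlation between x[t] and y[t - k], k = 0..max_lag."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return [sum((x[t] - mx) * (y[t - k] - my) for t in range(k, n)) / (sx * sy)
            for k in range(max_lag + 1)]
```

For more than two sites, evaluating `ccf` at lag 0 for every pair of series gives the entries of a cross-correlation matrix.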
5) Definition of Candidate Models and Estimation of Model Parameters
In this step, a set of models suitable for the generation of SS is defined. The analysis requirements to be performed and the statistical features of the time series are considered for this purpose. Also, the assumptions/properties of the available models are checked. The set of models suitable for the generation of SS is formed by following a decision tree. In this step, the following three conditions are checked.
1) The need for spatial-temporal representation
Since just a few models satisfy this requirement (as shown in
2) Type of time series where the model is used
This includes solar radiation/power and/or wind speed/power in similar analyses. As a result, a set of available models is brought down to those previously used to represent the same energy source in similar power system analysis.
3) Compliance degree between statistical properties of time series and assumptions/requirements of models
The models that do not require the preprocessing of raw data are identified, thereby defining the set of suitable models, i.e., the second possible outcome of the decision tree. If all models require the preprocessing of the raw data, then they are classified into two categories: ① the models that only require a numerical derivative to comply with all their assumptions/requirements; ② the models that require the use of an integral transformation to comply with all their assumptions/requirements. The two sets of suitable models and their preprocessing methodologies constitute the third possible outcome of the decision tree.
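The three conditions checked in this step can be organized as successive filters. The Python sketch below is an illustration under stated assumptions: the `Model` attributes and the example catalog used to exercise it are hypothetical, and the mapping of "numerical derivative" to an unmet stationarity requirement and of "integral transformation" to an unmet normality requirement is an interpretation of the text.

```python
from dataclasses import dataclass

@dataclass
class Model:
    """Hypothetical description of one SS generation model."""
    name: str
    spatial: bool             # can represent spatial-temporal dependence
    sources: frozenset        # energy sources it has been applied to
    needs_stationary: bool    # assumes a stationary time series
    needs_normal: bool        # assumes normally distributed data

def candidate_models(models, need_spatial, source, is_stationary, is_normal):
    # Condition 1: need for spatial-temporal representation.
    pool = [m for m in models if m.spatial or not need_spatial]
    # Condition 2: models previously used for the same type of time series.
    pool = [m for m in pool if source in m.sources]
    # Condition 3: compliance between data properties and model assumptions.
    ready = [m for m in pool
             if (is_stationary or not m.needs_stationary)
             and (is_normal or not m.needs_normal)]
    if ready:
        return {"no_preprocessing": ready}
    # All remaining models need preprocessing: split them into two sets.
    integral = [m for m in pool if m.needs_normal and not is_normal]
    derivative = [m for m in pool if m not in integral]
    return {"derivative_only": derivative, "integral_transform": integral}
```

The returned dictionary carries either the set of models that can be applied directly or the two sets of models together with the kind of preprocessing each one needs.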
6) Selection of Final Model
In this step, the final model is selected from the set of suitable models defined in the previous step. The structures and parameters are also identified. In order to select the suitable model, a five-step procedure is proposed as follows: ① analysis of data from the previous processing; ② determination of model parameters; ③ generation of SS; ④ calculation of assessment indicators on model benefits; ⑤ analysis of indicators and selection of the final model. To determine the model parameters, we follow the instructions given in the works where the models are presented.
The best model is selected considering the performance indicators of each model, which are based on the deviation between the statistics obtained from the SS and those of the original time series. The statistics considered are the mean, variance, standard deviation, and the 10%, 25%, 50%, 75%, and 90% quantiles. In addition, the deviations from the probability density function, ACF, PACF, and CCM are also used as indicators. All these indicators are computed with the same resolution used in the statistical analysis, e.g., hourly, daily, weekly, and/or monthly, to prevent representation errors [
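1) Root mean square error (RMSE)
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} e_i^{2}} \tag{1}$$
2) Root mean squared relative error (RMSRE)
$$\mathrm{RMSRE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{e_i}{x_i}\right)^{2}} \tag{2}$$
3) Error measurement
$$E = \frac{\mathrm{RMSRE}_{m} + \mathrm{RMSRE}_{\mathrm{CCM}}}{2} \tag{3}$$
where $e_i$ is the difference between the model and the time series for the $i$th statistical feature; $N$ is the number of statistical features considered; $x_i$ is the $i$th statistical feature of the time series to be represented through the SS; $\mathrm{RMSRE}_{m}$ is the RMSRE calculated using monthly statistics; $\mathrm{RMSRE}_{\mathrm{CCM}}$ is the RMSRE of the CCM; and $E$ is the average of $\mathrm{RMSRE}_{m}$ and $\mathrm{RMSRE}_{\mathrm{CCM}}$.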
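The RMSE, the RMSRE, and the averaged error measurement defined above can be computed as in the following Python sketch. The function and argument names are hypothetical; each argument is a list of statistical features (e.g., mean, variance, and quantiles) computed from the SS and from the original time series, respectively.

```python
import math

def rmse(model_stats, series_stats):
    """Root mean square error between model and time series features."""
    n = len(series_stats)
    return math.sqrt(sum((m - s) ** 2
                         for m, s in zip(model_stats, series_stats)) / n)

def rmsre(model_stats, series_stats):
    """Root mean squared relative error; series features must be nonzero."""
    n = len(series_stats)
    return math.sqrt(sum(((m - s) / s) ** 2
                         for m, s in zip(model_stats, series_stats)) / n)

def combined_error(rmsre_monthly, rmsre_ccm):
    """Average of the monthly-statistics RMSRE and the CCM RMSRE."""
    return 0.5 * (rmsre_monthly + rmsre_ccm)
```

Because the RMSRE normalizes each deviation by the corresponding feature of the original series, features with different units and magnitudes contribute on a comparable scale.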
The model that gives the minimum error measurements is selected as the final model to generate SS for the power system analysis. If the error measurements are not significantly different from one another, an additional analysis is carried out based on the objective of the analysis to be performed. For instance, in very short-term analyses that include chronological simulations, the spatial-temporal relationships are essential. Hence, the model with the minimum error measurements in the CCM is selected to generate SS. In contrast, for the medium-, long-, and very-long-term analyses, wherein energy simulations are considered, the energy contribution of the RES is highly important. In this case, the model with the minimum error measurements in the statistical seasonality indicators is selected to generate SS. If chronological simulations of typical days are included, the SS must adequately represent the spatial-temporal relationships, the seasonality, and the energy contribution of the RES. In this case, the model with the minimum error measurements in both statistical seasonality indicators and CCM is selected.
Note that the proposed procedure for model selection takes into account all the assumptions made in the formulation of the models (e.g., the requirements of covariance stationarity and normal distribution of time series for ARMA models). The proposed procedure also attempts to prevent the use of models without prior statistical knowledge of the time series to be represented. Furthermore, by following the procedure, the model that best fits the requirements of the analysis and the features of the time series is selected. This avoids the repeated review and model selection that arise when only an exploratory model search is performed. Finally, by following the proposed procedure, we diminish the error in the analysis that could appear due to a poor representation of the statistical properties of the time series related to RES. In addition, the proposed procedure follows the logic of expert systems. Therefore, it could be integrated with other data analysis tools and/or used independently as an additional tool by system operators. Such an implementation is beyond the scope of this paper and is therefore not included.
A case study is presented in Section III, where the proposed procedure is applied to a mid-term system operation planning analysis.
In this section, the proposed procedure is applied to a mid-term operation planning analysis considering two wind power stations. The proposed procedure is implemented in R and MATLAB.
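The selection rule described in the previous paragraph can be sketched in Python as follows. The sketch rests on assumptions that the text does not fix: "not significantly different" is represented by a hypothetical tolerance `tol`, the error dictionary keys are illustrative, and for typical-day simulations the seasonality and CCM errors are simply added.

```python
def select_final_model(errors, analysis, tol=0.05):
    """Pick the final model from per-model error measurements.

    errors: {name: {"overall": float, "ccm": float, "seasonal": float}}
    analysis: "very_short_term", "energy", or "typical_days"
    tol: assumed threshold below which overall errors count as similar
    """
    best = min(errors, key=lambda m: errors[m]["overall"])
    close = [m for m in errors
             if errors[m]["overall"] - errors[best]["overall"] <= tol]
    if len(close) == 1:
        return best                      # a clear minimum exists
    if analysis == "very_short_term":    # spatial-temporal relations dominate
        return min(close, key=lambda m: errors[m]["ccm"])
    if analysis == "energy":             # seasonal energy contribution dominates
        return min(close, key=lambda m: errors[m]["seasonal"])
    if analysis == "typical_days":       # both aspects matter (summed here)
        return min(close,
                   key=lambda m: errors[m]["ccm"] + errors[m]["seasonal"])
    raise ValueError("unknown analysis type")
```

The tolerance only decides when the additional, objective-driven comparison is triggered; with `tol=0` the rule reduces to choosing the minimum overall error.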
Step 1: the case study considers the system operation planning with a two-year horizon. The system is mainly thermal, and chronological simulations are included in the analysis. The model for the simulations considers the main transmission lines and the short-term technical constraints of the system. The addition of new power plants is not included in the analysis. Two wind power plants already in operation in the system are modeled.
Step 2: for this specific case, the time series for these power plants have an hourly resolution and correspond to the measurements made in New Zealand from 2004 to 2008 at STH1 and CKS1 locations (STH1 and CKS1 mean one location in wind sites in the Southland and Otago, and Cook Strait, respectively) [
Step 3: the time series requirements for this case study are as follows: a time frame of two years, a resolution of at least an hour, and spatial-temporal correlation. The latter is required because a chronological simulation is considered. The available time series of wind speed meet the first two requirements. The spatial-temporal correlation is verified through the CCF of the series.
Step 4:

Fig. 6 Time series spatial-temporal correlation. (a) ACF of time series at STH1 location. (b) CCF of time series at STH1 and CKS1 locations.
The next step is to determine whether the time series are normally distributed.

Fig. 7 Histogram of wind speed time series at STH1 location.
Then, the seasonality in the time series is assessed. The box plot of the time series is depicted in

Fig. 8 Box plot for STH1 location with hourly resolution.
Step 5: the set of suitable models should be defined. Given the spatial-temporal representation constraint, only the models with this capability are considered. Since raw datasets are synchronized with hourly and monthly seasonalities, and are not normally distributed, VAR models with different preprocessing methods [
(4) |
(5) |
(6) |
(7) |
(8) |
where is the vector of each time series; is the vector of the means computed with a monthly resolution; is the vector of the means computed with an hourly resolution; is the vector of the transformed time series; is the vector of probability density functions; is the vector of inverse standard normal distributions; is the resulting stationary normally distributed time series; is the vector of seasonality means; is the vector of seasonality standard deviation; and is the seasonality pattern, which belongs to the set S composed of hourly and monthly seasonality.
Once the raw data are preprocessed, the resulting time series associated with each original series are used to identify the structure and parameters of each model. The identification procedure ends when the residuals of the model related to behave as uncorrelated white noise. Furthermore, the SS generated using the model is normally distributed. The SS of the renewable resources are obtained by applying the inverse process described in (4)-(7).
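To illustrate the idea of preprocessing and its inverse, the following Python sketch assumes a common two-stage scheme: standardization by seasonal mean and standard deviation, followed by a normal-score transformation through the empirical distribution. This scheme is an assumption made for illustration and is not a reproduction of (4)-(8); all function names are hypothetical, and a single seasonal label per sample (e.g., hour of day) is used.

```python
from statistics import NormalDist, mean, pstdev

def seasonal_standardize(y, season):
    """Remove seasonal mean and scale; season[t] labels the class of y[t]."""
    groups = {}
    for v, s in zip(y, season):
        groups.setdefault(s, []).append(v)
    mu = {s: mean(g) for s, g in groups.items()}
    sd = {s: pstdev(g) or 1.0 for s, g in groups.items()}  # guard zero spread
    z = [(v - mu[s]) / sd[s] for v, s in zip(y, season)]
    return z, mu, sd

def seasonal_restore(z, season, mu, sd):
    """Inverse of seasonal_standardize."""
    return [v * sd[s] + mu[s] for v, s in zip(z, season)]

def normal_scores(z):
    """Map z to standard normal values using empirical ranks."""
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i])
    nd = NormalDist()
    x = [0.0] * n
    for rank, i in enumerate(order, start=1):
        x[i] = nd.inv_cdf(rank / (n + 1))
    return x

def inverse_normal_scores(x, z_reference):
    """Map normal values back to the scale of z via its empirical quantiles."""
    ref = sorted(z_reference)
    n = len(ref)
    nd = NormalDist()
    out = []
    for v in x:
        idx = round(nd.cdf(v) * (n + 1)) - 1
        out.append(ref[min(n - 1, max(0, idx))])
    return out
```

In such a scheme, values simulated by the fitted model would be passed through `inverse_normal_scores` and then `seasonal_restore` to return to the scale of the original series.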
Step 6: considering the objectives of the case study, the statistical indicators used to measure the accuracy of the models are computed for both hourly and monthly resolutions. Since the chronological simulation requires that the seasonality, the spatial-temporal correlations, and the energy contribution of the wind power plants are adequately represented, (3) is used to select the model to generate SS. This error measurement consists of a combination of RMSRE computed with a monthly resolution and RMSRE of the CCM. This error measurement is selected since the error metrics (1) and (2) are similar for all models and all statistical features presented in Section II. Furthermore, (3) is used as error measurement since high-order models tend to have similar RMSRE values when computed with an hourly resolution.

Fig. 9 RMSRE of each available model considered.

Fig. 10 RMSRE of CCM and monthly statistics for each available model considered.

Fig. 11 RMSRE values for each available model computed by solving (3) considering wind speed time series at STH1 location.
Now, we assess the accuracy of VAR model 9, which is the selected model. The histogram, ACF, PACF, CCF, and box plot of the SS generated with this model are compared with those of the original wind speed time series (OTS). In this paper, only the results of the STH1 location are presented.

Fig. 12 Histograms of OTS and SS generated by VAR model 9.

Fig. 13 ACF and PACF of OTS and of SS generated using VAR model 9 at STH1 location. (a) ACF. (b) PACF.

Fig. 14 Comparison of CCF of OTS and CCF reconstructed through SS using VAR model 9.
Finally,

Fig. 15 Box plots of OTS and SS using VAR model 9. (a) OTS. (b) SS.
This paper studies the challenge of selecting the appropriate model for generating SS for operation, planning, and expansion studies of power systems considering RESs. To this end, a methodology to select a suitable model to generate SS is proposed.
It has been demonstrated that if an adequate analysis is not carried out and the available models are applied without verifying their assumptions and application conditions, seasonal energy contributions or dependency structures may not be adequately characterized. According to the obtained results, the proposed procedure for model selection allows choosing the model that best achieves the objective of the analysis and that can represent the statistical characteristics of the OTS.
Furthermore, the proposed approach is independent of the type of RES modeled, the nature of the time series (i.e., solar power, solar radiation, wind power, or wind speed), and the statistical features of the time series. Additionally, the results show that, if the proposed procedure is followed, it is possible to reduce the error in the analysis of power systems compared with a traditional approach. Future research will focus on scenario reduction strategies and the simplification of different steps of the proposed procedure. In addition, incorporating other types of modeling (e.g., machine learning, physics-based, and stochastic models) in the proposed framework is considered as future work.
References
IRENA. (2019). Renewable Energy Statistics 2019. [Online]. Available: https://www.irena.org/publications/2019/Jul/Renewable-energy-statistics-2019
I. Andres, “CVAR constrained planning of renewable generation with consideration of system inertial response, reserve services and demand participation,” M.S. dissertation, Pontificia Universidad Católica de Chile, Santiago, Chile, 2014.
P. Chen, “Generation, stochastic modeling and analysis of power system with renewable generation,” M.S. dissertation, Aalborg University, Aalborg, Denmark, 2010.
B. Brown, R. Katz, and A. Murphy, “Time series models to simulate and forecast wind speed and wind power,” Journal of Climate and Applied Meteorology, vol. 23, no. 8, pp. 1184-1195, Aug. 1984.
A. Soroudi and T. Amraee, “Decision making under uncertainty in energy systems: state of the art,” Renewable and Sustainable Energy Reviews, vol. 28, pp. 376-384, Dec. 2013.
F. Ezbakhe and A. Pérez-Foguet, “Decision analysis for sustainable development: the case of renewable energy planning under uncertainty,” European Journal of Operational Research, vol. 292, pp. 1-13, Mar. 2020.
V. Onishi, C. Antunes, E. Fraga et al., “Stochastic optimization of trigeneration systems for decision-making under long-term uncertainty in energy demands and prices,” Energy, vol. 175, pp. 781-797, May 2019.
A. Papavasiliou and S. Oren, “Multiarea stochastic unit commitment for high wind penetration in a transmission constrained network,” Operations Research, vol. 61, no. 3, pp. 578-592, May-Jun. 2013.
A. Papavasiliou, S. Oren, and R. O’Neill, “Reserve requirements for wind power integration: a scenario-based stochastic programming framework,” IEEE Transactions on Power Systems, vol. 26, no. 4, pp. 2197-2206, Nov. 2011.
R. Billinton, H. Chen, and R. Ghajar, “Time-series models for reliability evaluation of power systems including wind energy,” Microelectronics Reliability, vol. 36, no. 9, pp. 1253-1261, Sept. 1996.
R. Billinton and W. Wangdee, “Reliability-based transmission reinforcement planning associated with large-scale wind farms,” IEEE Transactions on Power Systems, vol. 22, no. 1, pp. 34-41, Feb. 2007.
H. Haghi and S. Lotfifard, “Spatiotemporal modeling of wind generation for optimal energy storage sizing,” IEEE Transactions on Sustainable Energy, vol. 6, no. 1, pp. 113-121, Jan. 2015.
L. Kotzur, P. Markewitz, M. Robinius et al., “Time series aggregation for energy system design: modeling seasonal storage,” Applied Energy, vol. 213, pp. 123-135, Mar. 2018.
X. Serrano-Guerrero, G. Escrivá-Escrivá, S. Luna-Romero et al., “A time-series treatment method to obtain electrical consumption patterns for anomalies detection improvement in electrical consumption profiles,” Energies, vol. 13, no. 5, pp. 1-23, Feb. 2020.
A. Conejo, M. Carrión, and J. Morales, Decision Making Under Uncertainty in Electricity Markets. New York: Springer, 2010.
J. Ekström, M. Koivisto, I. Mellin et al., “Assessment of large scale wind power generation with new generation locations without measurement data,” Renewable Energy, vol. 83, pp. 362-374, Nov. 2015.
M. Koivisto, J. Ekstrom, J. Seppanen et al., “A statistical model for comparing future wind power scenarios with varying geographical distribution of installed generation capacity,” Wind Energy, vol. 19, no. 4, pp. 665-679, May 2016.
H. Nfaoui, H. Essiarab, and A. Sayigh, “A stochastic Markov chain model for simulating wind speed time series at Tangiers, Morocco,” Renewable Energy, vol. 29, no. 8, pp. 1407-1418, Jul. 2004.
G. Papaefthymiou and B. Klockl, “MCMC for wind power simulation,” IEEE Transactions on Energy Conversion, vol. 23, no. 1, pp. 234-240, Mar. 2008.
S. Karatepe and K. Corscadden, “Wind speed estimation: incorporating seasonal data using Markov chain models,” ISRN Renewable Energy, vol. 2013, pp. 1-9, Dec. 2013.
S. Kennedy and P. Rogers, “A probabilistic model for simulating long-term wind-power output,” Wind Engineering, vol. 27, no. 3, pp. 167-181, May 2003.
R. Aguiar and M. Collares-Pereira, “TAG: a time-dependent, autoregressive, Gaussian model for generating synthetic hourly radiation,” Solar Energy, vol. 49, no. 3, pp. 167-174, Sept. 1992.
D. Hill, D. McMillan, K. Bell et al., “Application of auto-regressive models to U.K. wind speed data for power system impact studies,” IEEE Transactions on Sustainable Energy, vol. 3, no. 1, pp. 134-141, Jan. 2012.
Y. Ge, Y. Nan, and L. Bai, “A hybrid prediction model for solar radiation based on long short-term memory, empirical mode decomposition, and solar profiles for energy harvesting wireless sensor networks,” Energies, vol. 12, no. 24, pp. 1-21, Dec. 2019.
A. Lojowska, D. Kurowicka, G. Papaefthymiou et al., “Advantages of ARMA-GARCH wind speed time series modeling,” in Proceedings of IEEE 11th International Conference on Probabilistic Methods Applied to Power Systems (PMAPS), Singapore, Singapore, Jun. 2010, pp. 83-88.
V. Graham, K. Hollands, and T. Unny, “A time series model for Kt with application to global synthetic weather generation,” Solar Energy, vol. 40, no. 2, pp. 83-92, 1988.
B. Klockl and G. Papaefthymiou, “Multivariate time series models for studies on stochastic generators in power systems,” Electric Power Systems Research, vol. 80, no. 3, pp. 265-276, Mar. 2010.
J. Morales, R. Mínguez, and A. Conejo, “A methodology to generate statistically dependent wind speed scenarios,” Applied Energy, vol. 87, no. 3, pp. 843-855, Mar. 2010.
D. Le, G. Gross, and A. Berizzi, “Probabilistic modeling of multisite wind farm production for scenario-based applications,” IEEE Transactions on Sustainable Energy, vol. 6, no. 3, pp. 748-758, Jul. 2015.
J. Cryer and K.-S. Chan, Time Series Analysis with Applications in R. New York: Springer Science+Business Media LLC, 2008.
A. Papavasiliou and S. Oren, “Stochastic modeling of multi-area wind power production,” in Proceedings of 48th Hawaii International Conference on System Sciences, Hawaii, USA, Jan. 2015, pp. 1-10.
J. Dowds, P. Hines, T. Ryan et al., “A review of large-scale wind integration studies,” Renewable and Sustainable Energy Reviews, vol. 49, pp. 768-794, Sept. 2015.
G. Pritchard. (2016, Dec.). Code and data on wind power correlation. [Online]. Available: https://www.stat.auckland.ac.nz/~geoff/
R. Tsay, Multivariate Time Series Analysis with R and Financial Applications, New Jersey: John Wiley & Sons, 2013.
W. Martinez, A. Martinez, and J. Solka, Exploratory Data Analysis with MATLAB, Abingdon: Taylor & Francis Group, 2017.
G. Box, G. Jenkins, G. Reinsel et al., Time Series Analysis: Forecasting and Control, New Jersey: John Wiley & Sons, 2016.
K. Suomalainen, G. Pritchard, B. Sharp et al., “Correlation analysis on wind and hydro resources with electricity demand and prices in New Zealand,” Applied Energy, vol. 137, pp. 445-462, Jan. 2015.