1 Introduction

Mid-term hydro scheduling (MTHS) aims to manage hydropower generation as well as water release with maximum profit while satisfying various system constraints over a yearly horizon [1, 2]. This problem plays a crucial role in power system operation and a good solution can provide significant economic benefit. Due to uncertainty and probability distribution factors (e.g., natural inflow [3]), it is intractable to formulate and solve MTHS problems without approximation or simplification. Historically, the mean of annual natural inflow is formulated as input and the problem is described by deterministic models [4,5,6]. It is difficult to use these models to investigate the impact of natural inflow fluctuation on system operation and potential revenue.

To study the impact of uncertainty of natural flow on the MTHS, stochastic models are proposed in the literature. In [7], a stochastic optimization problem with Markov decision process models was developed allowing for the constraints of governmental regulations. In [8], a novel stochastic risk-constrained model was discussed. To consider the seasonal character and uncertainty of reservoir inflow, a composite optimization model was presented in [9]. A stochastic model with consideration of random water inflow and price errors was described by scenarios in [10]. And an energy equivalent reservoir model [11] was applied to the stochastic optimization of hydropower production decisions in the Brazilian power system. For cascaded hydropower systems, there are correlations among stations due to their dependence on a common river basin [12]. Typically, natural inflows of stations in the same river basin will increase or decrease synchronously under the influence of geographic factors. However, most reported literature focuses on the relationship between upstream and downstream reservoirs [13, 14], and considerations of correlation among natural inflows are limited. Therefore, it is necessary to consider the correlation of random natural inflows in MTHS problems.

The stochastic model can describe the mid-term hydro scheduling problem more accurately, while bringing great challenges of computation due to the fluctuation of variables and boundary conditions. Currently, stochastic dynamic programming (SDP) and the multi-scenario method are two major approaches to cope with this difficult issue.

SDP adopts discrete water inflow data and a transition probability matrix of adjacent periods to represent the randomness of water inflow [15,16,17], and maximizes the expected hydropower generation by finding the optimal solution for reservoir volume levels in each period. To reduce the computation time, stochastic dual dynamic programming (SDDP) [18,19,20] decomposes the multi-stage stochastic optimization problem into several one-stage sub-problems.

With the development of sampling and cutting technology, the multi-scenario method [21] is used to solve the MTHS problem. In this approach, uncertainty is formulated by random variables with known forecasting information, and many possible scenarios are generated. However, such a large number of scenarios will increase the computational burden. Scenario reduction methods, e.g., fast forward reduction [22], moment based reduction [23], and particle swarm algorithms [24], are employed to select representative discrete scenarios and bundle corresponding probabilities with the purpose of decreasing the computation load. However, most of these approaches can only obtain the mean value of the optimization objective, and cannot directly obtain other statistical information such as the standard deviation and probability distribution curves. One way to provide comprehensive statistical data is Monte Carlo simulation with a large enough sampling scale [21], but the computation time is too long. To overcome above mentioned deficiencies, it is imperative to develop an accurate and computationally efficient stochastic optimization algorithm to estimate the objective completely.

The main contributions of this paper include:

  1. 1)

    Latin hypercube sampling combined with Cholesky decomposition, enabling the detailed modeling of stochastic natural inflows with the consideration of correlations among cascaded hydro stations. Cholesky decomposition has been successfully applied to solve the correlation of wind farms [25]. In this paper, for the first time, it is introduced to deal with correlations in the MTHS problem.

  2. 2)

    Scenario bundling and sensitivity analysis to obtain rapidly and accurately the optimal mid-term scheduling decisions for a large number of sampling scenarios. The mean, standard deviation, and probability distribution of the objective are captured, which allows the probability analysis of generation profits.

The rest of this paper is structured as follows. Section 2 describes MTHS mathematical formulation for a cascaded hydro system. Section 3 presents the stochastic optimization approach. Numerical results from case studies are provided in Section 4, and conclusions are summarized in Section 5.

2 Mathematical model

2.1 Objective function

The proposed model aims to maximize the power generation profits of cascaded hydropower stations over a 1-year horizon. The objective is formulated as:

$$Y = \hbox{max} \sum\limits_{t = 1}^{T} {\sum\limits_{h = 1}^{{N_{H} }} {C_{t} w_{h,t} } }$$
(1)

where Y represents the total profit or generation for the cascaded hydro system; t and T are the index and number of months in the time horizon, respectively; Ct is the energy price in period t; h and NH are the index and number of hydropower units, respectively; \(w_{h,t}\) is the generation output of unit h in period t.

In the deregulated electricity market, the energy price during period t (Ct) is varying according to supply and demand [8]. However, this paper focuses on the vertically integrated utility in China, and the energy price is regulated and set by the government [7], hence, energy price parameters in the studied time horizon are set to be 1, and equivalently the objective of this problem is to maximize the total generation over the planning horizon.

2.2 Natural inflow probability model

Typically, natural inflow with correlation [26] is approximated as a normal distribution [27], so the probability density function for the natural inflow can be expressed as:

$$f(r_{{h_{p} ,t}} ) = \frac{1}{{\sqrt {2\uppi} \sigma_{{h_{p} ,t}} }}\exp \left[ { - \frac{{(r_{{h_{p} ,t}} - \mu_{{h_{p} ,t}} )^{2} }}{{2\sigma_{{h_{p} ,t}}^{2} }}} \right]$$
(2)

where hp is the index of reservoirs (or stations); \(r_{{h_{p} ,t}}\) is the natural inflow of reservoir \(h_{p}\) in period t, and \(f(r_{{h_{p} ,t}} )\) is the corresponding probability density function; \(\mu_{h_{p}, t}\) and \(\sigma_{h_{p}, t}\) are the mean and the standard deviation, respectively.

2.3 Constraints on reservoirs and hydro units

The following constraints provide a complete description of the problem of minimizing (1):

$$v_{{h_{p} ,t}} = v_{{h_{p} ,t - 1}} + \left[ {r_{{h_{p} ,t}} - \sum\limits_{{h \in h_{p} }} {q_{h,t} } - s_{{h_{p} ,t}} + \sum\limits_{{i \in U_{P} }} {\left( {\sum\limits_{{h_{1} }} {I_{{h_{1} ,i}} q_{{h_{1} ,t}} } + s_{i,t} } \right)} } \right]\Delta t$$
(3)
$$\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{w}_{h,t} \le w_{h,t} \le \bar{w}_{h,t}$$
(4)
$$\bar{w}_{h,t} = D_{{{\text{d}},h,t}} \bar{P}_{h,t} \Delta t$$
(5)
$$\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{w}_{h,t} = \underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{P}_{h,t} \Delta t$$
(6)
$$V_{{\hbox{min}},h_{p}} \le v_{{h_{p} ,t}} \le V_{{\hbox{max}},h_{p}}$$
(7)
$$Q_{{\hbox{min}},h} \le q_{h,t} \le Q_{{\hbox{max}},h}$$
(8)
$$Q_{{\hbox{min}},h_{p}}^{S} \le \sum\limits_{{h \in h_{p} }} {q_{h,t} + s_{{h_{p} ,t}} } \le Q_{{\hbox{max}},h_{p}}^{S}$$
(9)
$$\left\{ \begin{aligned} v_{{h_{p} ,0}} = V_{{{\text{ini}},h_{p} }}\\v_{{h_{p} ,T}} { = }V_{{{\text{term}},h_{p} }}\hfill \\ \end{aligned} \right.$$
(10)
$$\sum\limits_{r = 1}^{R} {V_{{h_{p} ,r - 1}} d_{{h,h_{p} ,t,r}} \le v_{{h_{p} ,t}} \le } \sum\limits_{r = 1}^{R} {V_{{h_{p} ,r}} d_{{h,h_{p} ,t,r}} }$$
(11)
$$\sum\limits_{r = 1}^{R} {d_{{h,h_{p} ,t,r}} } = 1$$
(12)
$$d_{{h,h_{p} ,t,r}} Q_{{\hbox{min}} ,h} \le q_{h,t,r} \le d_{{h,h_{p} ,t,r}} Q_{{\hbox{max}} ,h}$$
(13)
$$\sum\limits_{r = 1}^{R} {q_{h,t,r} } = q_{h,t}$$
(14)
$$w_{h,t} = \sum\limits_{r = 1}^{R} {K_{h.r} q_{h,t,r} \Delta t}$$
(15)

where \(v_{{h_{p} ,t}}\) is the water volume of reservoir \(h_{p}\) in period t; \(q_{h,t}\) is the water discharged from unit h in period t; \(s_{{h_{p} ,t}}\) is the spillage from reservoir \(h_{p}\) in period t; h1 is the index of upstream hydropower units, while i is the index of upstream reservoirs (or stations); \(I_{{h_{1} ,i}}\) is the correspondence relationship parameter of h1 and i, such as if unit h1 belongs to station i, the value is 1, otherwise the value is 0; UP denotes the set of upstream reservoirs for reservoir hp; \(q_{h_1,t}\) is the water discharged from upstream unit h1 in period t; \(s_{i,t}\) is the spillage from upstream reservoir i in period t; \(\bar{w}_{h,t}\), \(\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{w}_{h,t}\), \(\bar{P}_{h,t}\), \(\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{P}_{h,t}\) are the upper and lower bounds of generation and power for unit h in period t, respectively; \(D_{{{\text{d}},h,t}}\) is the load factor of hydropower unit h in period t; \(\Delta t\) is the length of each time period; \(V_{{\hbox{min}} ,h_{p} }\), \(V_{{\hbox{max}} ,h_{p} }\), \(q_{{\hbox{min}} ,h}\), \(q_{{\hbox{max}} ,h}\), \(Q_{{\hbox{min}} ,h_{p} }^{S}\), \(Q_{{\hbox{max} },h_{p} }^{S}\) are the minimum and maximum storage volumes, water discharge and total release, respectively; \(V_{{{\text{ini}},h_{p} }}\) and \(V_{{{\text{term}},h_{p} }}\) are the initial and expected final volume of reservoir \(h_{p}\); r and R are the index and number of operating zones, respectively; \(K_{h,r}\) is the energy generation efficiency and \(V_{{h_{p} ,r}}\) is the storage volume of unit h in operating zone r; and \(d_{{h,h_{p} ,t,r}}\) is the binary variable for unit h in station \(h_{p}\) at operating zone r during period t, such that if \(V_{{h_{p} ,r - 1}} \le v_{{h_{p} ,t}} \le V_{{h_{p} ,r}}\) the value is 1, otherwise the value is 0.

Constraint (3) describes the water balance of a reservoir between two consecutive time periods. Constraints (46) establish the upper and lower power generation limits. Constraints (79) impose the upper and lower bounds on storage, water discharge, and total release. Constraint (10) defines the initial volume and final volume, which are usually determined by long-term scheduling [27]. Constraints (1115) formulate the links of the energy production to the water volume and the water inflow [28], as shown in Fig. 1. According to (1113), only one of binary variables \(d_{{h,h_{p} ,t,r}}\) has the value of 1. If \(v_{{h_{p} ,t}}\) is being discharged in operating zone r, the binary variable \(d_{{h,h_{p} ,t,r}}\) is forced to be 1, otherwise it is 0. \(V_{{h_{p} ,0}}\) is assumed to be zero. Equation (15) indicates that if the water volume is in the operating zone r, the energy generation is the product of energy efficiency, water discharge in the operating zone r, and the time period.

Fig. 1
figure 1

Energy production curve

The above discussion, apart from probability expression of (2), establishes the mixed-integer linear programming (MILP) formulation for the MTHS problem. Since water inflows in (2) are modeled as random variables, the objective in (1) is stochastic. In order to estimate the profit and risk for the cascaded hydro system in the scheduling horizon, objectives for all of the possible natural inflow scenarios have to be calculated by the stochastic dispatching method.

3 Stochastic dispatching method

In order to solve MTHS problem with stochastic and correlated natural inflows, a new algorithm is developed. It consists of three parts, which are sampling, bundling and sensitivity analysis.

3.1 Sampling with correlated random variables

In the proposed algorithm, natural inflows are the input random variables with correlation. Since Latin hypercube sampling is more efficient and robust than simple random sampling [29, 30], and correlations between variables can be considered by using the correlation matrix during the pairing process [31], Latin hypercube sampling with the Cholesky decomposition method is used to generate samples according to their correlation. The detailed steps are as follows.

Step 1 :

Initialize t = 1, define the sample scale K, where K is a sufficiently large number of scenarios, such as 3000.

Step 2 :

Use Latin hypercube sampling to obtain the independent standardized normal distribution sampling matrix \(\varvec{R}_{0}^{t}\) with dimension NH×K.

Step 3 :

Based on analysis of historical statistics, calculate the correlation matrix of natural inflows \(\varvec{\rho}_{H,t}\). The matrix is symmetric and positive definite as:

$$\varvec{\rho}_{H,t} { = }\left[ {\begin{array}{*{20}l} 1 \hfill & {\rho_{1.2}^{t} } \hfill & \cdots \hfill & {\rho_{{1,N_{H} }}^{t} } \hfill \\ {\rho_{2,1}^{t} } \hfill & 1 \hfill & \cdots \hfill & {\rho_{{2,N_{H} }}^{t} } \hfill \\ \vdots \hfill & \vdots \hfill & \ddots \hfill & \vdots \hfill \\ {\rho_{{N_{H} ,1}}^{t} } \hfill & \cdots \hfill & {\rho_{{N_{H} ,N_{H} - 1}}^{t} } \hfill & 1 \hfill \\ \end{array} } \right]$$
(16)
Step 4 :

Decompose the correlation matrix using the Cholesky method:

$$\varvec{\rho}_{H,t} { = }\varvec{HH}^{\text{T}}$$
(17)

where \(\varvec{H}\) is a lower triangular matrix.

Step 5 :

Construct a new matrix \(\varvec{R}_{u}^{t}\) including standard normal distribution variables with correlation:

$$\varvec{R}_{u}^{t} = \varvec{HR}_{0}^{t}$$
(18)
Step 6 :

Update the matrix \(\varvec{R}_{u}^{t}\) to \(\varvec{R}_{\text{p}}^{t}\) with variables following the normal distribution [μhp,t, σhp,t], and save \(\varvec{R}_{\text{p}}^{t}\) to the matrix \(\varvec{R}_{\text{s}}\) from rows NH(t − 1) + 1 to NHt.

Step 7 :

If t < T, set t = t + 1, and go to Step 2, otherwise stop and output the natural inflow sampling matrix \(\varvec{R}_{\text{s}}\) with dimension (NHT)K.

3.2 Natural inflow scenario bundling

Various scenarios have been produced by the sampling matrix \(\varvec{R}_{\text{s}}\). Each column in the matrix represents a natural inflow input scenario, meaning that \(\varvec{R}_{\text{s}} = [R_{1} ,R_{2} , \ldots ,R_{k} , \ldots ,R_{K} ]\). Due to the large number of scenarios, if the objectives for all of the scenarios are optimized directly, it is very time consuming. In order to reduce calculation time, a vector quantization algorithm is used to bundle scenarios [32]. Based on the distance between the data vectors, vector quantization assign each vector to a cluster through an optimization process. The required steps can be summarized as follows:

Step 1 :

Determine the bundling distance parameter \(\Delta d\) based on the actual requirement.

Step 2 :

Initiate the index and number of clusters as j = J = 1, set the cluster core Qj to be the first scenario \(R_{1}\), let the number of scenarios contained in cluster Nj= 0, and set the scenario index k = 1.

Step 3 :

For scenario k, find out the minimum Euclidean distance dk from the scenario to all cluster cores Qj (\(j \in \{ 1,2, \ldots ,J\}\)), which may be expressed as:

$$d_{k} = \mathop {\hbox{min} }\limits_{{j \in \{ 1,2, \ldots ,J\} }} d(R_{k} ,Q_{j} )$$
(19)
Step 4 :

If dk is less than or equal to \(\Delta d\), bundle scenario k to cluster j (where \(j = \arg \mathop {\hbox{min} }\limits_{{j \in \{ 1,2, \ldots ,J\} }} d(R_{k} ,Q_{j} )\)), increase \(N_{j} = N_{j} + 1\), update the cluster core Qj as the average of all the included scenarios and go to Step 6; otherwise, go to Step 5.

Step 5 :

If dk is greater than \(\Delta d\), create a new cluster, set J = J + 1, assign the scenario k to cluster J, save the scenario Rk as the core of the new cluster QJ, and set NJ= 1.

Step 6 :

If k is less than K, update k = k + 1 and go to Step 3; otherwise, finish the bundling because all of the scenarios are distributed to clusters, and Qj (\(j \in \{ 1,2, \ldots ,J\}\)) are the corresponding clustering cores.

3.3 Fast stochastic scheduling based on sensitivity analysis

In the framework of stochastic optimization, the objective of all the scenarios should be solved. Firstly, we can take one of the scenario clustering cores Qj as the natural inflow input, hence the MTHS problem becomes single-scenario hydropower scheduling.

For brevity’s sake, the single-scenario hydropower scheduling problem can be formulated as follows:

$$\hbox{max} \;Y = f(x)$$
(20)
$${\text{s}} . {\text{t}} .{\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} AR + Bx = 0$$
(21)
$${\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} g(x,y) \le 0$$
(22)
$${\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} h(x,y) = 0$$
(23)
$$\left\{ \begin{aligned} {\kern 1pt} {\kern 1pt} x \ge 0 \hfill \\ y \in \{ 0,1\} \hfill \\ \end{aligned} \right.$$
(24)

where \(R\) represents the natural inflows in the scenario of the clustering core Qj; x is a set of positive variables; y is a set of binary variables; A and B are coefficients. Equation (21) is the water balance constraint as modeled in (3), and g(x, y) and h(x, y) are the other inequality and equality constraints. The above formulation can be solved by MILP using, for example, the CPLEX solver, and the acquired objective function is denoted as \(Y_{Q}\).

Thereafter, to acquire the objective functions for all of the scenarios in cluster j, sensitivity analysis is introduced. Figure 2 shows that all the scenarios in the same cluster are in close proximity to the clustering center. Due to sensitivity analysis theory [33], if the sensitivity of the objective function to the input random variable is \(S_{Y}\), the objective function \(Y_{R}\) of any scenario Rk in the cluster can be restored by:

$$Y_{R} = Y_{Q} + S_{Y} (R_{k} - Q_{j} )$$
(25)
Fig. 2
figure 2

Diagrammatic sketch of scenario cluster

In (25), the key task is to address the sensitivity. When there are binary variables in the proposed formulation, it is very difficult to acquire the sensitivity. In order to overcome this difficulty, the optimal binary variable \(\tilde{y}\) is used in the formula, and the scheduling problem is transformed into a linear programming problem. Some auxiliary variables \(\lambda , \, \mu , { }\gamma\) are introduced to convert (2024) into an unconstrained optimization problem as follows:

$$\hbox{max}\,L = f(x) - \lambda (AR + Bx) - \mu (g(x,\tilde{y})) - \gamma (h(x,\tilde{y}))$$
(26)

When the optimal solution is found, the gradient (partial derivative) of L for all variables will be equal to 0. So:

$$\left. {\nabla_{R} L} \right|_{\begin{subarray}{l} x = \tilde{x} \\ y = \tilde{y} \end{subarray} } = \left. {\nabla_{R} f(x)} \right|_{\begin{subarray}{l} x = \tilde{x} \\ y = \tilde{y} \end{subarray} } - \lambda A = 0$$
(27)

The sensitivity can be represented by the gradient of f(x) with respect to R, so according to (27) the sensitivity of the primitive objective function can be formulated as:

$$S_{Y} = \left. {\nabla_{R} f(x)} \right|_{\begin{subarray}{l} x = \tilde{x} \\ y = \tilde{y} \end{subarray} } = \lambda A$$
(28)

It can be seen that the physical interpretation of λ is the dual solution of the constraints (21). According to the dual principle, the sensitivity can be reached, and on the basis of (25) the objective functions for the scenarios included in cluster j may be obtained with high calculation speed.

Finally, if all of the clusters have been calculated by this method, the objectives for the original sampling scenarios (K scenarios) are captured, and then the expectation and probability distribution of the scheduling objective are able to be analyzed.

Consequently, by considering random and correlated natural inflows, the proposed approach can provide an operator a probability analysis of generation profits for cascaded hydro systems.

4 Case studies

The models and methods presented are applied to two actual cases in China, a two-station system and a ten-station system. The systems have been implemented on the GAMS and MATLAB platform. The tests are executed on an IBM dual processors computer operating at 2.5 GHz with 4 GB RAM under 32-bit Windows 7. The time horizon of the mid-term scheduling is 1 year, and the time step is 1 month.

4.1 Two-station system

The parameters of the hydropower stations are given in Table 1. The mean values of water inflow forecast information are listed in Table 2. The water inflow correlations of the two stations over the 12-month horizon are all set to be 0.6. In this system, there are 24 water inflow input random variables (multiply 2 stations by 12 months), and the sample scale K is 3000 scenarios.

Table 1 Characteristics of hydro stations in two-station system
Table 2 Predicted values of natural inflow in two-station system

In order to evaluate the performance of the proposed method, a series of studies of Latin hypercube sampling with Cholesky decomposition (LC), LC using scenario bundling (LC-SB) and LC combined with scenario bundling and sensitivity analysis (LC-SB-SA) are carried out on the test system. Since the sample scale of LC is 3000, the sampling generates 3000 natural inflow scenarios. In LC-SB, the bundling distance parameter is set to be 30 and the number of scenarios is reduced to 46. The mid-term cascaded hydropower scheduling problem is optimized by the three methods in turn. It is assumed that the results of LC are very accurate and can be treated as the reference case. Statistical analyses of the objective (Y) by the three methods are shown in Table 3. With respect to the result of LC with 3000 scenarios, the errors of LC-SB-SA are much smaller than that of LC-SB. Due to the reduction in scenarios, the solution of LC-SB is less accurate, and it gives a poorer estimate of the standard deviation than of the mean value.

Table 3 Comparisons of optimization results for total generation output with different methods in two-station system

In order to view the accuracy of all the optimization results, the optimized total generation output for 3000 scenarios by LC-SB-SA are compared with LC, and the relative errors are illustrated in Fig. 3. The errors are mainly caused by nonlinear constraints in the model. For scenarios further from the cluster center, the probability of a large error is higher. In Fig. 3, the maximum error is 4.81% and the average error is only 0.32%. It can be concluded LC-SB-SA has a high precision in solving the MTHS problem.

Fig. 3
figure 3

Relative errors for all the sampled scenarios in two-station system

Figure 4a, b compare the probability distribution of total generation obtained by LC-SB and LC-SB-SA against LC. The simulation results of LC-SB have a larger deviation. In contrast, the probability density curve of LC-SB-SA can better match that of LC, and the cumulative distribution curves of the two approaches are almost identical. This means that the proposed LC-SB-SA can accurately indicate the probability distribution of objectives in the optimal scheduling problem including uncertain factors.

Fig. 4
figure 4

Comparison of probability distribution curves

The cumulative probability distribution curves show the likelihood of achieving different levels of total generation. If only considering the predicted mean value of natural inflow, the objective function is 1220800.29 MWh, and the probability of achieving less than or equal to this target is 51.27%. This indicates the high risk of focusing only on the scenario with predicted mean value. Thus, being able to obtain the cumulative distribution curve quickly can help operators to evaluate the profit and risks of the cascaded hydropower system.

Table 4 summarizes the computing time for the three mentioned methods. The total computing time is composed of four parts, which are sampling, bundling, optimizing, and restoring. As shown in Table 4, the most time is consumed on the optimization calculation for each scenario, and the sampling, bundling, and other processes do not increase the time greatly. Since the size of scenario clusters for LC-SB is reduced, LC-SB consumes much less time than LC. Compared to the LC-SB, LC-SB-SA obtains more accurate dispatching results with a minor increment on computation time.

Table 4 Comparison of computing time with different methods for the two-station system

The sensitivity of probabilistic results and computing time for LC-SB-SA to different bundling distances is presented in Table 5. As the bundling distance increases, the size of scenario clusters tends to decrease, and the total computing time is also reduced. However, compared with LC, the relative errors of mean and standard deviation of the objective (εY) demonstrate an upward tendency. It can be found that, when 30 is selected as the bundling distance parameter, both of the errors and the computing time are moderate, which means that a reasonable bundling distance parameter can balance accuracy and computation time.

Table 5 Analysis of distance parameters for LC-SB-SA

To study the impact of inflow correlation on the simulation results, two-station test system has been optimized with different correlation coefficients. The correlation parameters are set to be 0.1, 0.3, 0.5, 0.7, 0.9, respectively, and corresponding probability distributions of the objective are shown in Fig. 5. As the water inflow correlation coefficient increases from 0.1 to 0.9, the range of variation of the total generation presents an increasing trend. The reason is that with the rising inflow correlation, the synchrony of the adjacent hydropower plants is enhanced, the probability of the generators being in the state of low output or high output at the same time is increased, and the total generation is more volatile.

Fig. 5
figure 5

Probability distributions of objective function with different correlation coefficients

4.2 Ten-station system

A ten-station system with 10 reservoirs and 34 hydropower units is used to test the proposed approach. The schematic layout and parameters of hydropower stations in [34] are adopted. The predicted mean values of natural inflow over a 12-month horizon are presented in Table 6. In the rainy season (June to October), standard deviations are assumed to be 6.67% of the mean, while in the dry season (November to May) standard deviations are assumed to be 3.33% of the mean. The water inflow correlation parameters among stations are shown in Table 7. The scenario bundling distance parameter is set to be 30 as for the two-station system. There are 120 natural inflow values (multiply 10 stations by 12 months) which are input random variables for the MTHS problem.

Table 6 Predicted mean values of natural inflow in ten-station system test
Table 7 Water inflow correlation parameters among stations

To check the effectiveness of the LC-SB-SA on large-scale stochastic scheduling, the LC with 3000 scenarios is selected as the baseline for comparison because of its high accuracy. The results of the two methods are given in Table 8. The relative error of mean, standard deviation, maximum, and minimum are 0.023%, 0.547%, 0.071%, 0.018%, respectively. Figure 6 shows probability distribution curves of total generation output corresponding to the dispatching schemes obtained by different methods, and the curves are almost coincident. Hence the LC-SB-SA method can yield the results with similar accuracy as LC method, and maintain its good performance when number random variables is increased to 120.

Table 8 Comparison of optimization results using different methods in ten-station system
Fig. 6
figure 6

Comparison of probability distribution of objective Y

The comparison of the computing time for the two approaches is studied in Table 9. The total computing time of LC-SB-SA is only 1.3% of that of LC. This is because LC-SB-SA uses scenario bundling to reduce the number of scenarios from the original 3000 scenarios to 37. Therefore, only a few scenarios need to be optimized. The optimization results of all the original 3000 scenarios can be restored through sensitivity analysis. The restoration step uses simple linear calculations and consequently takes a small amount of time. For a large-scale hydropower system, the optimizing step is more time consuming, so reducing the number of scenarios will lighten the computation burden significantly, with the restoring step to ensure precision of calcuations for each scenario. As a result, the proposed LC-SB-SA has clear advantages in providing an accurate solution with a small amount of calculation.

Table 9 Comparisons of computing time with different methods in ten-station system

Figure 7 presents the optimal total reservoir storage levels in each month, where the highlighted line is the result under the mean value scenario of natural inflow, and the other lines are solutions for all the scenario cluster centers. It illustrates that the proposed method can provide many candidate proposals corresponding to various operating circumstances.

Fig. 7
figure 7

Optimal reservoir volumes for all the scenario cluster centers

The probability density curves and cumulative probability distribution curves of the objective function are compared in Fig. 8 with and without consideration of water inflow correlation. It can be observed that water inflow correlation has little influence on the mean value of the generation. However, the correlation increases the range of variation of the total generation output, resulting in a higher probability of small or large total generation. The correlation of natural inflow correlation should therefore be considered in the assessment of total generation.

Fig. 8
figure 8

Comparison showing the impact of natural inflow correlation

5 Conclusion

This paper proposes a computationally efficient stochastic optimization algorithm to solve the MTHS problem for cascaded hydropower systems. Latin hypercube sampling and the Cholesky decomposition method are employed to formulate the random natural inflows accounting for correlation. Scenario bundling and sensitivity analysis are used to obtain the objective function solutions of all scenarios, and to accelerate the calculation. Two-station and ten-station cascaded hydropower systems are used to investigate the proposed method. Conclusions are summarized as follows:

  1. 1)

    The proposed Latin hypercube sampling with Cholesky decomposition combined with SB and LC-SB-SA to solve the stochastic MTHS problem has the merits both in calculation speed and precision compared with LC alone and LC-SB. Thus LC-SB-SA is able to yield the probability distribution of the objective function solution rapidly and reliably, and operators can make decisions in a better way to reduce risks.

  2. 2)

    The appropriate bundling distance parameter can be selected according to the requirements of actual operating conditions. Generally, if the parameter is smaller, it is possible to achieve higher solution accuracy with the expense of longer computing time.

  3. 3)

    Natural inflow correlation has a significant impact on the estimated generation. Scheduling without consideration of natural inflow correlation significantly misjudges the fluctuation of total energy production. Consequently, the solutions of MTHS problem become more realistic by adding such correlation to simulations.

  4. 4)

    While coping with many random and correlated variables as in a ten-station system, this approach can still provide an accurate probability curve of total power generation for assessment of mid-term hydro scheduling.

Further work may include pumped hydro storage facilities to adapt to grids with high penetration of variable renewable generation.