1 Introduction

Real-time transient stability assessment (TSA) is one of the most important means for power systems to prevent cascading failures and to avoid instability and large-area blackouts. It is also a necessary tool for power system security and stability analysis. However, TSA has not yet been widely used in practice, because each traditional analysis method has its own limitations and defects. The main research directions in TSA include the following. The first is improving simulation speed with high-performance computing techniques such as parallel computing. The second is the direct method, which determines the stability of a system directly from its state without numerical integration. Further directions include extending the direct method to large-scale grids, combining the direct method with simulation to form hybrid methods, and determining system stability directly from the response of the post-fault system. A wholly new direction is TSA based on data mining and big data technologies, built on huge amounts of data and intelligent algorithms. A great deal of research has been carried out on the first four directions, but some bottlenecks have not been broken. A time-parallel algorithm is proposed in [1], where the bottleneck is the contradiction between parallelism and convergence. Reference [2] identifies instability from the concavity and convexity of the plane trajectory of the post-fault equivalent system, but the recognition speed needs to be further improved.

In recent years, the value of big data technology has been widely recognized around the world. Analysing big data to extract useful information or knowledge for various applications brings both opportunities and challenges [3]. In the era of big data, big data technology is used to mine useful information from massive data and to support correct decisions [4]. TSA based on data mining usually consists of two main phases: offline training and online application. First, a data mining method is used to train a model on a large amount of simulation and historical data; the mapping from various physical variables to the steady states of the power system is then found, and the stability rules of the power system are extracted. In the online application phase, the pre-prepared stability assessment rules and the current state of the system are used to obtain an assessment result; the rules do not depend on system models, but rely on offline training. As early as 1989, [5] proposed an artificial neural network (ANN) based approach to predict the critical clearing time (CCT). It is one of the earliest attempts in this field, and relevant studies have been underway worldwide ever since. These studies fall into two types, pre-fault and post-fault, and this paper belongs to the latter. Generally speaking, the most commonly used methods are ANN [6,7,8], decision tree (DT) [9, 10] and support vector machine (SVM) [11, 12]. Comparative studies of these methods show that SVM performs excellently [12, 13]. Apart from the traditional SVM, multi-SVM has been applied to TSA [14], and SVM has also been used for the emergency control of power systems [15] and for wind power forecasting [16]. Since SVM is widely used and has many advantages, this paper applies the SVM algorithm to the TSA study.

This paper addresses problems of the SVM-based data mining method, which has two deficiencies. First, data mining based on correlation does not focus on a specific physical model, whereas traditional power system stability analysis, based on causal relationships, has mature physical models and theoretical methods; researchers of traditional methods have therefore questioned the reliability of data mining in TSA. Second, the classification rules established by data mining cannot avoid false alarms and false dismissals. For the first problem, this paper integrates the stability assessment rules with the traditional theory of power system stability analysis and establishes the rules under the stability domain concept. To deal with false alarms and false dismissals, this paper proposes a new strategy with a grey region (GR) and two novel SVMs, the aggressive support vector machine (ASVM) and the conservative support vector machine (CSVM), so that data-mining-based TSA can adapt to the special requirements of power systems. In this strategy, most cases can be classified as stable or unstable. For the small number of remaining cases, on the one hand they can be judged unstable to ensure the accuracy of classification; on the other hand, their instability probability can be obtained.

2 Theoretical basis

In this section, the theoretical basis for the following studies is prepared, including the basic concept of the stability domain, a brief introduction to the SVM algorithm, and the relationship between them.

2.1 Stability domain theory

A power system is a nonlinear autonomous system, as shown in (1).

$${\dot{\varvec{x}}} = {\varvec{f}}\left( {\varvec{x}} \right) \qquad{\varvec{x}} \in {\mathbf{R}}^{n} , \, {\varvec{f}} \in {\mathbf{R}}^{n}$$
(1)

where \({\varvec{x}} = \left[ {x_{1} ,x_{2} , \ldots ,x_{n} } \right]^{\text{T}}\) is the n-dimensional state vector; f(x) is a smooth function. If \({\varvec{x}}_{{\varvec{e}}} \in {\mathbf{R}}^{n}\) satisfies \(\varvec{f}\left( {{\varvec{x}}_{{\varvec{e}}} } \right) = {\mathbf{0}}\), then \({\varvec{x}}_{{\varvec{e}}}\) is an equilibrium point of the system. The corresponding linearized system at the equilibrium point is presented in (2).

$$\left\{ \begin{aligned} &\Delta {\dot{\varvec{x}}} = {\varvec{A}}\Delta {\varvec{x}} \hfill \\ &{\varvec{A}} = \left. {\frac{{\partial {\varvec{f}}\left( {\varvec{x}} \right)}}{{\partial {\varvec{x}}}}} \right|_{{{\varvec{x}}_{{\varvec{e}}} }} \hfill \\ \end{aligned} \right. \,$$
(2)

If \({\varvec{A}}\) has no eigenvalue with zero real part, \({\varvec{x}}_{{\varvec{e}}}\) is a hyperbolic equilibrium point, and the system structure is stable near it. In power system transient stability analysis, the equilibrium point is generally assumed to be hyperbolic. According to the Hartman-Grobman theorem [17], the system in (1) and the linear system in (2) are equivalent in a neighborhood of the hyperbolic equilibrium point \({\varvec{x}}_{{\varvec{e}}}\). Therefore, the stability analysis of (1) can be reduced to the stability of (2) near \({\varvec{x}}_{{\varvec{e}}}\). The stable manifold \(W_{s} \left( {{\varvec{x}}_{{\varvec{e}}} } \right)\) and the unstable manifold \(W_{u} \left( {{\varvec{x}}_{{\varvec{e}}} } \right)\) of the hyperbolic equilibrium point \({\varvec{x}}_{{\varvec{e}}}\) are defined in (3).

$$\left\{ \begin{aligned} W_{s} \left( {{\varvec{x}}_{{\varvec{e}}} } \right) = \left\{ {{\varvec{x}} \in {\mathbf{R}}^{n} :\mathop {\lim }\limits_{t \to + \infty }\varvec{\phi}\left( {{\varvec{x}},t} \right) = {\varvec{x}}_{{\varvec{e}}} } \right\} \hfill \\ W_{u} \left( {{\varvec{x}}_{{\varvec{e}}} } \right) = \left\{ {{\varvec{x}} \in {\mathbf{R}}^{n} :\mathop {\lim }\limits_{t \to - \infty }\varvec{\phi}\left( {{\varvec{x}},t} \right) = {\varvec{x}}_{{\varvec{e}}} } \right\} \hfill \\ \end{aligned} \right.$$
(3)

where \(\varvec{\phi}\left( {{\varvec{x}},t} \right)\) denotes the solution of (1) passing through \({\varvec{x}}\) at \(t = 0\). The trajectory of a point in the stable manifold tends to the equilibrium point \({\varvec{x}}_{{\varvec{e}}}\) as \(t\) tends to positive infinity, while the trajectory of a point in the unstable manifold tends to \({\varvec{x}}_{{\varvec{e}}}\) as \(t\) tends to negative infinity. There is a region around a stable equilibrium point \({\varvec{x}}_{s}\) such that the trajectories passing through points in the region are eventually attracted to \({\varvec{x}}_{s}\). This region is defined as the stability domain of the equilibrium point, as shown in (4).

$$A\left( {{\varvec{x}}_{s} } \right) = \left\{ {{\varvec{x}} \in {\mathbf{R}}^{n} :\mathop {\lim }\limits_{t \to + \infty }\varvec{\phi}\left( {{\varvec{x}},t} \right) = {\varvec{x}}_{s} } \right\}$$
(4)

The stability domain of a stable equilibrium point is its stable manifold and is an open, invariant, connected set. The boundary of \(A\left( {{\varvec{x}}_{s} } \right)\) is called the stability boundary of \({\varvec{x}}_{s}\), denoted by \(\partial A\left( {{\varvec{x}}_{s} } \right)\). The greatest difficulty in applying the stability domain is determining this boundary; for a high-dimensional system, the boundary of the actual system is very difficult to obtain.
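As a toy illustration of these definitions (not one of the paper's test systems), the following sketch numerically traces two trajectories of a damped pendulum, a classical single-machine analogue of (1): a low-energy initial state lies inside the stability domain of the equilibrium at the origin and is attracted to it, while a higher-energy state leaves that domain and settles at a neighbouring equilibrium. The model and all parameter values are assumptions for illustration only.

```python
import numpy as np
from scipy.integrate import solve_ivp

def pendulum(t, x, d=0.1):
    # x = [angle, angular velocity]; stable equilibria sit at (2*pi*k, 0)
    return [x[1], -np.sin(x[0]) - d * x[1]]

def final_state(x0, t_end=300.0):
    sol = solve_ivp(pendulum, (0.0, t_end), x0, rtol=1e-8, atol=1e-10)
    return sol.y[:, -1]

inside = final_state([0.5, 0.0])    # low energy: inside A((0, 0)), attracted to origin
outside = final_state([0.0, 2.5])   # high energy: leaves A((0, 0)), settles elsewhere
```

Enumerating trajectories like this is exactly what becomes infeasible for high-dimensional grids, which motivates fitting the boundary instead.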

2.2 SVM algorithm

TSA is a classification problem, and the stability boundary is the interface between the stable and unstable regions. SVM is a data mining method that solves classification problems by fitting the interface between two regions. The principle of SVM is to map the training samples into a high-dimensional space with a mapping function and to find a linear hyperplane in that space such that the distance from the hyperplane to the nearest samples on both sides is maximized [18], as shown in (5).

$$\left\{ \begin{aligned} &\mathop {\hbox{min} }\limits_{{{\varvec{w}},b,\zeta }} \left( {\frac{1}{2}{\varvec{w}}^{\text{T}} {\varvec{w}} + C\sum\limits_{i = 1}^{{n_{t} }} {\zeta_{i} } } \right) \hfill \\ &\text{s.t. } y_{i} \left( {{\varvec{w}}^{\text{T}} \varvec{\varphi }\left( {{\varvec{X}}_{i} } \right) + b} \right) \ge 1 - \zeta_{i} \hfill \\& \;\; \quad \zeta_{i} \ge 0 \qquad i = 1,2, \ldots ,n_{t} \hfill \\ \end{aligned} \right.$$
(5)

where nt is the number of training samples; \(\left( {{\varvec{X}}_{i} ,y_{i} } \right)\) is the ith training sample, with \({\varvec{X}}_{i}\) the input feature vector and \(y_{i}\) its class label; the support vectors are the decisive samples lying around the hyperplane; \(\varvec{\varphi }\left( \cdot \right)\) is the mapping function from the low-dimensional space to the high-dimensional space; \(\zeta_{i}\) is a slack variable; C is the penalty factor for the slack variables; b is the bias (threshold); and w is the weight vector of the hyperplane. The optimization variables are w, b and \(\zeta\). Once the optimization problem is solved, b can be recovered from any support vector that satisfies its constraint with equality.
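As a minimal sketch of (5), the following trains a soft-margin SVM on synthetic two-class data with scikit-learn's `SVC` (the toy data and parameter values are assumptions, not the paper's setup); a linear kernel makes w and b directly accessible.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# toy two-class data standing in for stable (+1) and unstable (-1) samples
X = np.vstack([rng.normal(2.0, 0.5, (40, 2)), rng.normal(-2.0, 0.5, (40, 2))])
y = np.hstack([np.ones(40), -np.ones(40)])

# SVC solves the soft-margin problem (5); the linear kernel exposes w and b
clf = SVC(kernel="linear", C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)    # width of the band between the two margins
```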

When solving the SVM problem, (5) is usually transformed into its dual problem, and the computation of the mapping function is replaced by the computation of a kernel function. To use the trained SVM classifier, calculate \(f_{SVM}\) by (6) and check its sign.

$$f_{SVM} \left( {\varvec{X}} \right) = \sum\limits_{i = 1}^{{n_{t} }} {\alpha_{i} y_{i} K\left( {{\varvec{X}}_{i} ,{\varvec{X}}} \right) + b}$$
(6)

where \(\alpha_{i}\) is the coefficient corresponding to the ith support vector, i.e., the non-zero Lagrangian multiplier of the optimization problem above. Positive and negative \(f_{SVM}\) indicate the two classes respectively. \(K\left( {{\varvec{X}}_{i} ,{\varvec{X}}} \right)\) is the kernel function, which is the inner product of the mapping functions, as presented in (7).

$$K\left( {{\varvec{X}}_{i} ,{\varvec{X}}} \right) = {\varvec{\varphi }}\left( {{\varvec{X}}_{i} } \right)^{\text{T}} {\varvec{\varphi }}\left( {\varvec{X}} \right)$$
(7)

Usually the radial basis kernel function, shown in (8), is chosen because of its good performance. It measures the similarity between two vectors, and the parameter \(\gamma\) controls how quickly the similarity decays with distance.

$$K\left( {{\varvec{X}}_{i} ,{\varvec{X}}} \right) = \text{e}^{{ - \gamma \left\| {{\varvec{X}}_{i} - {\varvec{X}}} \right\|^{2} }}$$
(8)
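To make the relationship between (6), (7) and (8) concrete, the sketch below trains an RBF-kernel SVM with scikit-learn on toy data (an assumption for illustration) and then re-evaluates the decision function by hand from the support vectors and dual coefficients; the manual sum reproduces `decision_function` because `dual_coef_` stores the products \(\alpha_i y_i\).

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, (60, 3))
y = np.where(X[:, 0] + X[:, 1] ** 2 > 1.0, 1.0, -1.0)   # a nonlinear toy boundary

clf = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)

def f_svm(x):
    # equation (6): sum over support vectors of alpha_i*y_i*K(X_i, x), plus b
    sv = clf.support_vectors_
    coef = clf.dual_coef_[0]                             # stores alpha_i * y_i
    k = np.exp(-0.5 * np.sum((sv - x) ** 2, axis=1))     # RBF kernel (8), gamma = 0.5
    return float(k @ coef + clf.intercept_[0])

x_new = np.array([2.0, 0.0, 0.0])
```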

2.3 Theoretical basis of fitting stability boundary with SVM

According to the relevant theory of power system stability, if the stability domain exists, the stable and unstable regions are nonlinearly separable, which means the stability boundary is linearly separable in some (possibly infinite-dimensional) feature space. This property provides the theoretical basis for fitting the stability boundary with an SVM. The SVM realizes the transformation from the low-dimensional space to the high-dimensional space through the mapping function, so that two regions that are nonlinearly separable in the low-dimensional space become linearly separable in the high-dimensional space. Therefore, with an appropriate mapping function and enough samples on both sides of the stability boundary, a stability assessment rule equivalent to the boundary can be established by training an SVM. On the one hand, stability domain theory provides theoretical support for applying data mining methods and clarifies the physical meaning of the stability assessment rules; on the other hand, data mining provides a new way to obtain the stability boundary of the stability domain.

Directly applying the fitted stability boundary as a TSA rule would be ideal for real-time TSA. However, an actual system faces the following problems:

1) In real-time TSA, the stable equilibrium point refers to that of the post-fault system. Different faults in the same operating mode result in different post-fault systems, and these systems have different stable equilibrium points and different stability boundaries. It is unrealistic to fit all possible stability boundaries with a single boundary as the stability assessment rule in the offline training phase.

2) Real-time TSA requires measuring the input features immediately after a fault and then substituting the measured values into the stability rules to obtain the result. Nevertheless, the states of an actual system are not easy to obtain through measurement.

To solve the above two problems, this study makes two approximations:

1) When the scale of the system is large enough, the stability domains of the post-fault systems obtained from different disturbances under the same pre-fault operating mode are considered to be approximately the same.

2) Only quantities that are easy to measure, including state quantities and non-state quantities such as the active and reactive power of lines and the phase angles of bus voltages, are taken as candidate input features.

The most important approximation rests on the system being large, which means the bigger the system, the better the result. This does not mean the algorithm cannot be used on a smaller system: the case study in this paper shows that its application to a provincial grid of China is satisfactory.

3 Real-time TSA strategy using ASVM and CSVM

Due to factors such as model error, the stability domain approximation, and the dimensionality constraints of the input space, a TSA rule cannot be exactly equivalent to the stability boundary of the system's stability domain. Therefore, the stable and unstable regions cannot be completely separated in the input space of the rule, and a GR exists. In this section, the GR is defined so that the assessment results outside the GR are accurate and credible, rather than trying to find the exact boundary between the two regions.

3.1 GR

The GR is the region of the input space of the stability assessment rule where the output state is not unique. Suppose there is a point X in the input space, and there exist two different post-fault systems both represented by X in the input space. If the stability of the two systems differs, then X lies in the GR. The main causes of the GR are as follows:

1) The mapping function corresponding to the kernel function cannot map the input space exactly to a high-dimensional space in which the stability boundary is linearly separable, so the mapping relationship error is one cause.

2) Because of the two approximations in Section 2.3, it is impossible to find an interface that accurately separates the two types of samples, so the accuracy of the stability assessment rules cannot reach 100%.

Once the kernel function, the pre-fault operating mode and the input features of the stability rules are determined, the shape and size of the GR are determined. In this case, misclassification cannot be avoided if only overall accuracy is pursued. Therefore, a new idea for stability assessment is needed.

3.2 ASVM and CSVM

From the perspective of the samples, the GR is a mixed area of stable and unstable samples, as shown in Fig. 1.

Fig. 1 Stability classification diagram

In a traditional SVM, samples that fall above L1 are classified as unstable and those below as stable; misclassification obviously exists. In this section, a new idea based on the GR is proposed: the region above L2 is judged unstable, the region below L3 is judged stable, and the stability assessment of samples falling into these two regions is guaranteed to be correct. For the cases in the GR, the instability probability is calculated.

3.2.1 CSVM

The CSVM does not allow unstable samples to be misclassified, thus defining the boundary of the GR near the stable region, shown as L3 in Fig. 1. Adjusting the slack variables in the constraints can restrict which type of sample may be misclassified. The CSVM is obtained by removing the slack variables from the constraints corresponding to the unstable samples, as shown in (9).

$$\left\{ \begin{aligned} &\mathop {\hbox{min} }\limits_{{{\varvec{w}},b,\zeta }} \left( {\frac{1}{2}{\varvec{w}}^{\text{T}} {\varvec{w}} + C\sum\limits_{i = 1}^{{n_{t} }} {\zeta_{i} } } \right) \hfill \\ &\text{s.t. } \, y_{i}^{st} \left( {{\varvec{w}}^{\text{T}} {\varvec{\varphi }}\left( {{\varvec{X}}_{i}^{st} } \right) + b} \right) \ge 1 - \zeta_{i} \quad i = 1,2, \ldots ,m \hfill \\ \,&\quad \,\,\,\, y_{i}^{un} \left( {{\varvec{w}}^{\text{T}} {\varvec{\varphi }}\left( {{\varvec{X}}_{i}^{un} } \right) + b} \right) \ge 1 \qquad\;\; i = 1,2, \ldots ,k \hfill \\ \end{aligned} \right. \,$$
(9)

where superscript st denotes stable samples; superscript un denotes unstable samples; m is the number of stable samples; k is the number of unstable samples. The hyperplane trained by the CSVM corresponds to L3 in Fig. 1: unstable samples are strictly limited to one side of the boundary, which ensures that samples on the other side are stable.

Following the solution procedure of the standard SVM, convert (9) into its dual problem and transform the maximization into a minimization; the classification rule of the CSVM is finally presented in (10).

$$f_{CSVM} \left( {\varvec{X}} \right) = \sum\limits_{i = 1}^{k} {\lambda_{i} y_{i}^{un} K\left( {{\varvec{X}}_{i}^{un} ,{\varvec{X}}} \right) + \sum\limits_{i = 1}^{m} {\beta_{i} y_{i}^{st} K\left( {{\varvec{X}}_{i}^{st} ,{\varvec{X}}} \right) + } b}$$
(10)

where \(\lambda_{i}\) and \(\beta_{i}\) are the Lagrangian multipliers \({\varvec{\uplambda}}\) and \({\varvec{\upbeta}}\). The \(\lambda_{i}\) corresponding to the unstable samples have no upper limit, while the \(\beta_{i}\) corresponding to the stable ones are bounded above by the penalty factor C. The CSVM is therefore more likely to classify samples as unstable.
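The CSVM of (9) removes the slack variables of the unstable samples, which is equivalent to giving those slacks an infinite penalty. As a hedged approximation (not the paper's implementation), the sketch below emulates this in scikit-learn by assigning the unstable class a very large `class_weight`, so that no unstable training sample is misclassified; the toy data and weights are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
# overlapping toy clouds: +1 plays the stable class, -1 the unstable class
X = np.vstack([rng.normal(1.0, 1.0, (100, 2)), rng.normal(-1.0, 1.0, (100, 2))])
y = np.hstack([np.ones(100), -np.ones(100)])

# ordinary SVM for comparison: it may misclassify samples of either class
plain = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)
plain_fd = int(np.sum((y == -1) & (plain.predict(X) == 1)))

# CSVM surrogate: a huge penalty on unstable-class slacks approximates
# deleting those slack variables, as in (9)
csvm = SVC(kernel="rbf", gamma=1.0, C=1.0, class_weight={-1: 1e5}).fit(X, y)
false_dismissals = int(np.sum((y == -1) & (csvm.predict(X) == 1)))
```

The same trick with the weight moved to the stable class gives an ASVM-like boundary.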

3.2.2 ASVM

In contrast with the CSVM, the ASVM is obtained by removing the slack variables from the constraints corresponding to the stable samples in the traditional SVM model, as shown in (11).

$$\left\{ \begin{aligned} &\mathop {\hbox{min} }\limits_{{{\varvec{w}},b,\zeta }} \left( {\frac{1}{2}{\varvec{w}}^{\text{T}} {\varvec{w}} + C\sum\limits_{i = 1}^{{n_{t} }} {\zeta_{i} } } \right) \hfill \\ &\text{s.t.} \, y_{i}^{st} \left( {{\varvec{w}}^{\text{T}} {\varvec{\varphi }}\left( {{\varvec{X}}_{i}^{st} } \right) + b} \right) \ge 1 \qquad \quad \; i = 1,2, \ldots ,m \hfill \\&\quad\,\,\, y_{i}^{un} \left( {{\varvec{w}}^{\text{T}} {\varvec{\varphi }}\left( {{\varvec{X}}_{i}^{un} } \right) + b} \right) \ge 1 - \zeta_{i} \quad i = 1,2, \ldots ,k \hfill \\ \end{aligned} \right.$$
(11)

The hyperplane trained by the ASVM corresponds to L2 in Fig. 1: stable samples are strictly limited to one side of the boundary, which ensures that samples on the other side are unstable. Similarly, the classification rule of the ASVM is presented in (12).

$$f_{ASVM} \left( { \varvec{X}} \right) = \sum\limits_{i = 1}^{k} {\lambda_{i} y_{i}^{un} K\left( {{\varvec{X}}_{i}^{un} ,{\varvec{X}}} \right) + \sum\limits_{i = 1}^{m} {\beta_{i} y_{i}^{st} K\left( {{\varvec{X}}_{i}^{st} ,{\varvec{X}}} \right) + } b}$$
(12)

It is worth noting that (12) has exactly the same form as (10), but the relative sizes of \({\varvec{\uplambda}}\) and \({\varvec{\upbeta}}\) differ: for the ASVM, the multipliers corresponding to the stable samples are the unbounded ones, whereas for the CSVM the opposite holds.

3.3 Stability assessment rules based on novel SVM

3.3.1 Basic process

The assessment rules are constructed by training two boundaries with the CSVM and the ASVM; the rules naturally divide the input space into three non-overlapping regions: the stable region, the unstable region and the GR.

For a post-fault system to be judged, the judgement rules are summarized in Table 1.

Table 1 Stability assessment rules

As shown in Table 1, if the system is judged stable, the output is \(y_{pg} = 1\); if it is judged unstable, \(y_{pg} = - 1\); and if the results of the CSVM and ASVM differ, \(y_{pg} = 0\), which means the sample falls into the GR. Combining (10) and (12), the mathematical description of the stability assessment rule, according to the value \({\varvec{X}}_{pg}\) of the system in the input space, is as follows:

1) When \(f_{CSVM} \left( {{\varvec{X}}_{pg} } \right) > 0\) and \(f_{ASVM} \left( {{\varvec{X}}_{pg} } \right) > 0\), the post-fault system is stable.

2) When \(f_{CSVM} \left( {{\varvec{X}}_{pg} } \right) < 0\) and \(f_{ASVM} \left( {{\varvec{X}}_{pg} } \right) < 0\), the post-fault system is unstable.

3) In other situations, the post-fault system falls into the GR, and other methods are needed for further judgement.
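The three rules above can be transcribed directly into code. The sketch below assumes the two decision values \(f_{CSVM}\) and \(f_{ASVM}\) have already been computed for a sample:

```python
def assess(f_csvm, f_asvm):
    """Table 1 rule: 1 = stable, -1 = unstable, 0 = grey region (GR)."""
    if f_csvm > 0 and f_asvm > 0:
        return 1                 # rule 1): both classifiers agree on stable
    if f_csvm < 0 and f_asvm < 0:
        return -1                # rule 2): both agree on unstable
    return 0                     # rule 3): disagreement, sample falls into the GR
```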

The three regions are shown in Fig. 2. The red circle is the CSVM margin and the green circle is the ASVM margin. The region outside both circles is the unstable region, while the intersection of the two circles is the stable region: when the post-fault system state lies in this region, the system is certain to be stable. The region inside the red circle but not the green one, together with the region inside the green circle but not the red one, is the GR.

Fig. 2 CSVM, ASVM and physical meaning of stability assessment rule

3.3.2 Evaluation index

The expressions of percentage of false dismissal (PFD), the percentage of false alarm (PFA) and the percentage of grey (PG) are presented in (13), (14) and (15).

$$I_{PFD} = \frac{{N_{fd} }}{{N_{testing} }} \times 100\%$$
(13)
$$I_{PFA} = \frac{{N_{fa} }}{{N_{testing} }} \times 100\%$$
(14)
$$I_{PG} = \frac{{N_{grey} }}{{N_{all} }} \times 100\%$$
(15)

where \(N_{testing}\) is the number of testing samples; \(N_{fd}\) is the number of false dismissals; \(N_{fa}\) is the number of false alarms; \(N_{all}\) is the number of training or testing samples; \(N_{grey}\) is the number of samples falling into the GR.
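A direct transcription of (13)-(15), with hypothetical label conventions: true labels in {1, −1} and predictions in {1, −1, 0}, where 0 marks the GR. Note that (13)-(14) normalize by the testing-set size and (15) by \(N_{all}\); when all indices are computed on one sample set, as here, the denominators coincide.

```python
def evaluation_indices(y_true, y_pred):
    """Indices (13)-(15); y_true in {1, -1}, y_pred in {1, -1, 0} with 0 = GR."""
    n = len(y_true)
    n_fd = sum(t == -1 and p == 1 for t, p in zip(y_true, y_pred))    # false dismissals
    n_fa = sum(t == 1 and p == -1 for t, p in zip(y_true, y_pred))    # false alarms
    n_grey = sum(p == 0 for p in y_pred)                              # samples in GR
    return {"PFD": 100.0 * n_fd / n,
            "PFA": 100.0 * n_fa / n,
            "PG": 100.0 * n_grey / n}
```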

3.3.3 Instability probability assessment

The assessment process in Section 3.3.1 uses only the signs of \(f_{CSVM} \left( {\varvec{X}} \right)\) and \(f_{ASVM} \left( {\varvec{X}} \right)\); the magnitudes of the two values are not used. To exploit this information and describe the distances from a sample in the GR to the CSVM and ASVM boundaries, the distance difference (DD) indicator is defined in (16).

$$\begin{aligned} I_{DD} \left( {\varvec{X}} \right) &= d_{ASVM} \left( {\varvec{X}} \right) - d_{CSVM} \left( {\varvec{X}} \right) \hfill \\ &= \frac{{\left| {f_{ASVM} \left( {\varvec{X}} \right)} \right|}}{{\left\| {{\varvec{w}}_{ASVM} } \right\|}} - \frac{{\left| {f_{CSVM} \left( {\varvec{X}} \right)} \right|}}{{\left\| {{\varvec{w}}_{CSVM} } \right\|}} \hfill \\ &= \frac{{\left| {f_{ASVM} \left( {\varvec{X}} \right)} \right|}}{{\left\| {{\varvec{w}}_{ASVM} } \right\|}} + \frac{{f_{CSVM} \left( {\varvec{X}} \right)}}{{\left\| {{\varvec{w}}_{CSVM} } \right\|}} \hfill \\ \end{aligned}$$
(16)

where \(\left\| {{\varvec{w}}_{CSVM} } \right\|^{2}\) can be written as (17) and \(\left\| {{\varvec{w}}_{ASVM} } \right\|^{2}\) is obtained similarly.

$$\begin{aligned} \left\| {{\varvec{w}}_{CSVM} } \right\|^{2} = \sum\limits_{{i = \text{1}}}^{k} {\sum\limits_{{j = \text{1}}}^{k} {\lambda_{CSVM,i} \lambda_{CSVM,j} y_{i}^{un} y_{j}^{un} K\left( {{\varvec{X}}_{i}^{un} ,{\varvec{X}}_{j}^{un} } \right)} } \hfill \\ + \sum\limits_{{i = \text{1}}}^{m} {\sum\limits_{{j = \text{1}}}^{k} {\beta_{CSVM,i} \lambda_{CSVM,j} y_{i}^{st} } } y_{j}^{un} K\left( {{\varvec{X}}_{i}^{st} ,{\varvec{X}}_{j}^{un} } \right) \hfill \\ + \sum\limits_{{i = \text{1}}}^{k} {\sum\limits_{{j = \text{1}}}^{m} {\lambda_{CSVM,i} \beta_{CSVM,j} y_{i}^{un} y_{j}^{st} K\left( {{\varvec{X}}_{i}^{un} ,{\varvec{X}}_{j}^{st} } \right)} } \hfill \\ + \sum\limits_{{i = \text{1}}}^{m} {\sum\limits_{{j = \text{1}}}^{m} {\beta_{CSVM,i} \beta_{CSVM,j} y_{i}^{st} y_{j}^{st} K\left( {{\varvec{X}}_{i}^{st} ,{\varvec{X}}_{j}^{st} } \right)} } \hfill \\ \end{aligned}$$
(17)

A larger IDD value indicates that the distance from the sample to the unstable region is greater than its distance to the stable region, so the IDD value characterizes the relative position of a sample in the GR with respect to the other regions. The instability probability curve can be fitted from the proportion of unstable samples at different IDD values. In real-time application, if the assessment result falls in the GR, the IDD value of the sample is calculated and the instability probability is evaluated from the instability probability curve.
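The computation of (16)-(17) can be sketched with scikit-learn objects standing in for the CSVM and ASVM (here approximated by class-weighted standard SVMs; the data and all parameters are assumptions): the squared norm in (17) is assembled from the dual coefficients and the kernel Gram matrix.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def w_norm(clf, gamma):
    # ||w|| from (17): sqrt of sum_ij (alpha_i y_i)(alpha_j y_j) K(X_i, X_j)
    coef = clf.dual_coef_[0]                       # alpha_i * y_i, support vectors only
    K = rbf_kernel(clf.support_vectors_, gamma=gamma)
    return float(np.sqrt(coef @ K @ coef))

def distance_difference(x, csvm, asvm, gamma):
    # equation (16): d_ASVM(x) - d_CSVM(x)
    f_a = asvm.decision_function([x])[0]
    f_c = csvm.decision_function([x])[0]
    return abs(f_a) / w_norm(asvm, gamma) - abs(f_c) / w_norm(csvm, gamma)

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(1.0, 1.0, (80, 2)), rng.normal(-1.0, 1.0, (80, 2))])
y = np.hstack([np.ones(80), -np.ones(80)])
gamma = 0.5
csvm = SVC(kernel="rbf", gamma=gamma, C=1.0, class_weight={-1: 1e4}).fit(X, y)
asvm = SVC(kernel="rbf", gamma=gamma, C=1.0, class_weight={1: 1e4}).fit(X, y)
idd = distance_difference(np.zeros(2), csvm, asvm, gamma)
```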

The process of generating the instability probability curve is: ① generate a large number of samples in the GR; ② calculate their IDD values; ③ calculate the proportion of unstable samples at different IDD values; ④ obtain the instability probability curve by Platt correction.

The principle of Platt correction [19] is to fit the stability probability curve with a sigmoid curve. After fitting, the stability probability p of a sample in the GR is given by (18), and the instability probability is 1 − p.

$$p = \frac{1}{{1 + \exp \left( {a_{1} I_{DD} + a_{2} } \right)}}$$
(18)

where \(a_{1}\) and \(a_{2}\) are the parameters of the linear function in (19), which is fitted to \(\text{ln}\left( {p/\left( {1 - p} \right)} \right)\).

$$- a_{1} I_{DD} - a_{2} = g\left( {I_{DD} } \right) = \ln \frac{p}{1 - p}$$
(19)

\(a_{1}\) and \(a_{2}\) can be obtained by solving the optimization problem in (20).

$$\left\{ \begin{aligned} &\mathop {\hbox{min} }\limits_{{a_{1} ,a_{2} }} \left\{ { - \sum\limits_{i = 1}^{N} {\left[ {t_{i} \ln \left( {p_{i} } \right) + \left( {1 - t_{i} } \right)\ln \left( {1 - p_{i} } \right)} \right]} } \right\} \hfill \\& p_{i} = \frac{1}{{1 + \exp \left( {a_{1} I_{DD,i} + a_{2} } \right)}} \hfill \\ \end{aligned} \right.$$
(20)
$$t_{i} = \left\{ \begin{aligned} &\frac{{N_{new + } + 1}}{{N_{new + } + 2}}\qquad y_{i} = 1 \hfill \\ &\frac{1}{{N_{new - } + 2}} \qquad y_{i} = - 1 \hfill \\ \end{aligned} \right.$$
(21)

where \(i = 1,2, \ldots ,N_{new}\); \(N_{new}\) is the number of samples in the GR; \(N_{new + }\) is the number of stable samples; \(N_{new - }\) is the number of unstable samples.

This optimization problem can be solved by maximum likelihood methods [20].
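A minimal sketch of the Platt correction (18)-(21), fitting a1 and a2 by minimizing the regularized negative log-likelihood of (20) with a general-purpose optimizer on synthetic GR samples (the data-generation details are assumptions, not the paper's samples):

```python
import numpy as np
from scipy.optimize import minimize

def platt_fit(idd, y):
    """Fit p = 1/(1 + exp(a1*IDD + a2)) via the regularized likelihood (20)-(21)."""
    y = np.asarray(y, dtype=float)
    n_pos = np.sum(y == 1)                      # N_new+ : stable samples
    n_neg = np.sum(y == -1)                     # N_new- : unstable samples
    t = np.where(y == 1, (n_pos + 1.0) / (n_pos + 2.0), 1.0 / (n_neg + 2.0))

    def nll(a):
        p = 1.0 / (1.0 + np.exp(a[0] * idd + a[1]))
        p = np.clip(p, 1e-12, 1.0 - 1e-12)      # guard the logarithms
        return -np.sum(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

    return minimize(nll, x0=np.zeros(2), method="BFGS").x

# synthetic GR samples whose stability probability rises with IDD, as in Fig. 5
rng = np.random.default_rng(3)
idd = rng.uniform(-0.1, 0.1, 500)
y = np.where(rng.random(500) < 1.0 / (1.0 + np.exp(-60.0 * idd)), 1, -1)
a1, a2 = platt_fit(idd, y)
```

Since larger IDD means the sample is more likely stable, the fitted a1 comes out negative.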

4 Case study

4.1 IEEE 39-bus system

This section validates the stability assessment strategy and demonstrates the TSA process on the IEEE 39-bus system, which has 10 generators, 19 loads and 46 lines. The pre-fault operating mode is fixed, and transient stability simulations are carried out on that basis. To avoid islanding, 35 of the 46 lines are involved in the fault scan, with fault locations at both ends of each line; there are therefore 70 kinds of faults, numbered from 1 to 70. Each fault is set as follows: a three-phase short circuit occurs at one end of the line, and after a period of time the faulted line is removed. The active power and reactive power of the lines, the phase angles of the bus voltages and the generator information at the fault clearing time are recorded as candidate input features. A total of 2500 samples are generated; 2000 of them are randomly selected as training samples, which are used to train the model, while the other 500 are test samples used to test the model.

1) Feature and parameter selection

The alternative feature types are shown in Table 2.

Table 2 Alternative feature types and their number

Forward search with wrapper evaluation is adopted, with IPG as the index. In each round, the feature that minimizes the IPG of the classifier is chosen from the candidate features. When the reduction in IPG at the end of a round falls below a threshold compared with the previous round, adding features can hardly reduce the GR further and feature selection is stopped. The result shows that IPG decreases rapidly up to 5 input features and very slowly after 10, as shown in Fig. 3.

Fig. 3 Relationship between IPG and number of selected features

Set the reduction threshold value of IPG to 0.2%, and 15 features are selected. The result is shown in Fig. 4.

Fig. 4 Result of feature selection in IEEE 39-bus system

Based on these 15 features, 400 combinations of C and γ are tested with 5-fold cross validation to balance classification performance against overfitting; the parameters are finally determined as C = 1.2, γ = 0.5.

2) Result of the proposed strategy

With the determined input features and parameters, a stability assessment rule is obtained from the 2000 training samples and tested on the 500 test samples. 81 of the training samples fall into the GR, so IPG is 4.05%; the proportion of test samples in the GR is close, at 4.8%. There are no false alarms or false dismissals among the test samples. Compared with stability assessment rules based on the traditional SVM, the rules based on the CSVM and ASVM better meet the requirements of power system TSA. The result is shown in Table 3, where N1 represents the number of false dismissals, N2 the number of false alarms, and N3 the number of samples in the GR.

Table 3 Training and testing results of TSA considering conservativeness

The instability probability curve of the samples in the GR is calculated next. First, samples in the GR are needed; in this case study, 4156 samples are generated, of which 2134 are stable and the rest unstable. Then the IDD of each sample is calculated: in the IDD formula, the denominators are \(\left\| {{\varvec{w}}_{CSVM} } \right\| = 18.02\) and \(\left\| {{\varvec{w}}_{ASVM} } \right\| = 18.31\). These two values are close because the stable and unstable samples are balanced. After the IDD values are obtained, the instability probability over different intervals is calculated, and Platt correction is finally used to fit the instability probability curve. The proportion of unstable samples and the instability probability curve are shown in Fig. 5.

Fig. 5 Proportion of unstable samples and instability probability curve in GR

The instability probability curve in Fig. 5 is a sigmoid curve. In this case the IDD value varies from −0.1 to 0.1. As IDD increases, the probability of instability decreases, which conforms with its physical meaning: the IDD value reflects the position of the sample point within the GR, and the larger it is, the closer the sample point is to the CSVM boundary and the stable region. When the IDD value is less than −0.1, the instability probability is close to 100%; when it is greater than 0.1, the instability probability is close to 0; and when it equals 0, the instability probability is about 50%. In real-time application, the instability probability of samples that fall into the GR can be calculated from their IDD values, which provides operators with a reference for decision making.

3) Result of traditional SVM

Forward search combined with the wrapper method is used on the 2000 training samples for feature selection. The SVM classifier's classification accuracy serves as the wrapper evaluation index, and the features selected in each round are those that yield the highest classifier accuracy. Feature selection stops when the accuracy improvement of one round over the previous round falls below a threshold. The feature selection result is shown in Fig. 6.

Fig. 6
figure 6

Result of feature selection based on traditional SVM in IEEE 39-bus system
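The forward search with wrapper (encapsulated) evaluation might look like the following sketch, using scikit-learn's SVC and cross-validated accuracy as the evaluation index; the stop threshold and demo data are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

def forward_select(X, y, threshold=0.002, cv=5):
    """Wrapper-style forward search: each round adds the single feature
    that maximizes the SVM's cross-validated accuracy, and stops once
    the gain over the previous round drops below `threshold`."""
    remaining = list(range(X.shape[1]))
    selected, best_acc = [], 0.0
    while remaining:
        # score every candidate feature appended to the current subset
        scores = {f: cross_val_score(SVC(kernel="rbf"),
                                     X[:, selected + [f]], y, cv=cv).mean()
                  for f in remaining}
        f_best = max(scores, key=scores.get)
        if scores[f_best] - best_acc < threshold:
            break                  # accuracy gain too small: stop
        selected.append(f_best)
        remaining.remove(f_best)
        best_acc = scores[f_best]
    return selected, best_acc

# illustrative data standing in for the simulation-derived samples
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=0)
feats, acc = forward_select(X, y)
print(feats, round(acc, 3))
```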

Based on these features, 400 combinations of C and γ are examined using 5-fold cross-validation. The overfit degree and the verification-set misclassification rate are calculated for each parameter combination, where the overfit degree is the verification set's misclassification rate divided by the training set's misclassification rate. With the overfit threshold set to 1.5, the final parameters are C = 1.4 and γ = 0.2. With these input features and parameters, a stability assessment rule is obtained from the 2000 training samples and tested on the 500 test samples. The result is shown in Table 4, where Ac represents classification accuracy.

Table 4 Training and test results of stability assessment rules based on traditional SVM in IEEE 39-bus system
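A rough sketch of this parameter search with the overfit-degree filter, using scikit-learn; the grid values and demo data are illustrative, and the fallback when no pair passes the filter is our assumption rather than something stated in the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.datasets import make_classification

def grid_search_overfit_filtered(X, y, Cs, gammas, max_overfit=1.5):
    """5-fold grid search over (C, gamma). The overfit degree is the
    validation-set misclassification rate divided by the training-set
    misclassification rate; pairs above the threshold are discarded,
    and the pair with the lowest validation error among the rest wins.
    Falls back to the overall best pair if the filter rejects all."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    eps = 1e-9                     # guards against zero training error
    results = []
    for C in Cs:
        for g in gammas:
            tr_err, va_err = [], []
            for tr, va in skf.split(X, y):
                clf = SVC(C=C, gamma=g).fit(X[tr], y[tr])
                tr_err.append(1.0 - clf.score(X[tr], y[tr]))
                va_err.append(1.0 - clf.score(X[va], y[va]))
            results.append(((C, g), np.mean(tr_err), np.mean(va_err)))
    passed = [(p, v) for p, t, v in results if v / (t + eps) <= max_overfit]
    pool = passed if passed else [(p, v) for p, _, v in results]
    return min(pool, key=lambda pv: pv[1])

# illustrative data; the paper's real grid has 400 (C, gamma) combinations
X, y = make_classification(n_samples=400, n_features=6, flip_y=0.05,
                           random_state=1)
params, err = grid_search_overfit_filtered(
    X, y, Cs=[0.5, 1.0, 1.4], gammas=[0.1, 0.2, 0.5])
print(params, round(err, 3))
```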

The training and test accuracy reach 97%, which means that in most cases the TSA method under the concept of stability region correctly judges post-fault system stability. Further analysis of the misclassified samples shows that during training, the stability assessment rules judge 7 unstable samples as stable and 37 stable samples as unstable, i.e., 7 false dismissals and 37 false alarms. In the test stage, the rules judge 16 stable samples as unstable, i.e., 16 false alarms. Owing to sample quality, the dimension of the input features, simulation model error and other factors, the two classes of samples are not completely separable. Therefore, the evaluation accuracy cannot reach 100%, and false dismissals and false alarms cannot be avoided. Although this stability assessment rule shows high accuracy in training and testing, these unavoidable misclassifications limit its use in real-time application.

4.2 Case study based on provincial power system

In this section, the case is based on a real provincial power grid in China, referred to as HBPS. By the end of 2016, the total installed capacity of HBPS was about 67000 MW. The grid has about 200 generators, 1700 buses and 1700 transmission lines. From the perspective of power balance, hydropower resources are abundant in the western part, while the load center lies in the eastern part. HBPS therefore exhibits power transmission from west to east and from north to south, as shown in Fig. 7.

Fig. 7
figure 7

HBPS 500 kV grid structure diagram

This case is based on the Hubei power grid in summer operation mode. The training and test samples are generated by scanning 426 faults. The faults are three-phase short circuits on 500 kV or 220 kV lines, with the faulted line cut off after a period of time. For each fault location, a range of fault clearing times is simulated. Generator power angles and rotor speeds, as well as line active and reactive powers, are recorded as candidate features, 3100 in total. Finally, 12780 samples are generated, of which 10176 are randomly selected as training samples and the remaining 2604 as test samples.

In this case study, forward search is used with the IPG index as the wrapper evaluation index for feature selection. As the number of features increases in each round, the IPG index gradually decreases, which means GR shrinks gradually. When the number of features is less than 5, IPG decreases rapidly; beyond 5 features the decrease slows, and beyond 15 features IPG almost stops decreasing. The relationship between the IPG index and the number of features is shown in Fig. 8.

Fig. 8
figure 8

Hubei power grid case: relationship between IPG and number of input features

Set the stop criterion of feature selection as \(- \Delta I_{PG} < 0.1\%\); 20 features are then selected, again the active and reactive powers of lines. It is worth noting that, compared with the IEEE 39-bus system, the number of input features needed by the Hubei power grid does not grow in proportion to the system scale. This is because the crucial factors affecting grid stability do not increase proportionally with scale.
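The stop criterion can be expressed as a one-line check; this is a minimal sketch, and the IPG values in the example are illustrative, not the case-study figures.

```python
def should_stop(ipg_prev, ipg_curr, tol=1e-3):
    """Stop adding features once the IPG reduction, -dIPG =
    ipg_prev - ipg_curr, drops below tol (0.1% as in the text)."""
    return (ipg_prev - ipg_curr) < tol

# a 0.05% drop in IPG no longer justifies another feature
print(should_stop(0.0950, 0.0945))  # -> True
# a 0.5% drop still does
print(should_stop(0.0950, 0.0900))  # -> False
```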

The parameters are set to C = 1 and γ = 0.3. With these features and parameters, the stability assessment rules are trained and the training and test results of the two assessment rules are compared. The results are shown in Tables 5 and 6.

Table 5 Training and test results of stability assessment rules based on CSVM and ASVM
Table 6 Training and test result of stability assessment rules based on traditional SVM in HBPS

Similar to the small system, there are no false dismissals or false alarms in the training and test results of the stability assessment considering conservativeness, while GR covers about 9% of the samples. Although the traditional SVM-based stability assessment rules reach 98% classification accuracy, false dismissals exist. The comparison between Tables 5 and 6 therefore shows that the proposed method is more in line with the actual operational requirements of the power system. Compared with the small-system case, the factors affecting the stability of the large system are more scattered, so the number of input features of the stability assessment rule needs to be increased to provide enough information. As the input features increase, GR gradually shrinks. The required size of GR should be determined according to the conservativeness requirements of actual operation.

5 Conclusion

This paper proposes a real-time TSA strategy using CSVM and ASVM. In this strategy, GR is defined to address the problem that traditional analysis methods cannot avoid false alarms and false dismissals, and the strategy ensures assessment accuracy outside GR. New indicators adapted to the TSA rules are also proposed, in particular quantitative indicators of GR. In addition, the IDD indicator is proposed to describe the instability probability of samples in GR, and an instability probability curve is fitted as a reference for operators. Although realizing online real-time TSA is very difficult, this paper makes useful explorations. There is still a long way to go before real engineering practice, but the work in this paper shows great application prospects.