1 Introduction

Power flow calculation, which refers to calculating the distribution of active power, reactive power and voltage in a power grid with given topology and component parameters, is the foundation of steady-state and transient analysis, power system planning, reliability analysis and optimization. In recent years, to improve energy efficiency, lower the carbon footprint, and meet growing power demand, large numbers of distributed generators, storage devices, etc. have been integrated into the power grid [1], which adds uncertainty to power flow calculation and requires it to be more scalable. Moreover, the construction of cross-regional interconnections and long-distance transmission systems has expanded the scale of power systems and made their structure more complicated. There is an urgent need for new power flow calculation approaches [2, 3] to cope with these changes and the issues they bring, such as the addition of a temporal dimension, insufficient memory, and slow computational speed and convergence [4,5,6,7,8].

In view of its good performance in improving computational efficiency, parallel computing is considered to be a primary solution for power flow calculation in large-scale power systems. Related research mainly focuses on parallel algorithms, parallel programming models and experimental platforms. The purpose of such research is to design efficient parallel algorithms with higher parallelism and lower data dependency, using methods such as decomposition and coordination, sparse vector techniques, Krylov subspace iterations, and so on [9,10,11,12]. Although these methods have made some progress, it is extremely difficult for them to overcome the internal temporal dependencies of the forward-backward sweeping process, and they also suffer from poor versatility and portability.

The parallel programming model is closely related to the specific hardware platform. Graphical processing units (GPUs), vector machines, symmetric shared-memory multiprocessors, Transputer systems and Beowulf clusters are commonly used. Reference [10] implemented power flow computing by constructing a peer-to-peer (P2P) platform, but its scalability is poor and it suffers from serious communication delay, so it is difficult to realize online computation of large-scale power flow on it. Reference [9] used pipeline technology with non-blocking communication based on the message passing interface (MPI) to reduce waiting times during data synchronization among nodes. However, according to Amdahl’s law, the parallel speedup gradually levels off as the number of computing nodes increases. Reference [12] proposed a parallel generalized minimal residual (GMRES) algorithm and evaluated it on a Beowulf cluster platform; the experimental results showed limited improvement in parallel efficiency, and the amount of inter-process communication increased when processors were added. A distributed heuristic approach for the optimal management of a smart power distribution grid was proposed in [13], and the reported results show its efficiency compared with a centralized heuristic approach; however, the test cases are only 9-bus and 25-bus systems, which are not of practical scale. References [14] and [15] implemented parallel power flow calculations on GPUs: [14] used particle swarm optimization (PSO) to calculate optimal power flow (OPF), while [15] proposed a Monte-Carlo (MC) based method for probabilistic power flow (PPF) analysis. They obtained speedups of approximately 16-20 over sequential execution on a single CPU for systems with 30-300 buses. These traditional parallel programming models usually have a high cost and are difficult to implement, and their hardware platforms are hard to extend, so they are usually more suitable for applications with a relatively small volume of data.

Cloud computing fuses parallel computing and distributed computing and has become a standard technology for many big data analysis tasks, including power system simulation and control. A cloud-based power system operational model called cloud grid (CG) is presented in [16], where the service model of cloud computing is integrated into the whole life cycle of the power system in China’s smart grid environment. Cloud computing mainly adopts MapReduce as its parallel programming model, allowing numerous cheap commercial computers to be used for computing and giving high reliability, low cost, the capability to process huge amounts of data, flexibility and scalability. Reference [17] designed and implemented a cloud computing platform for a wide area measurement system, which was used to process large amounts of power grid data. Its basic software is Hadoop, which uses the MapReduce model for parallel data extraction, transformation and access across multiple files, demonstrating efficient massive data processing. In addition, [18] proposed a technology for processing mass intermittent-energy data with the MapReduce model. Allocating a large number of tasks to different nodes of a loosely-coupled computer cluster brings many advantages for high-throughput processing of mass data; however, MapReduce-based cloud computing is most suitable for data-intensive applications and usually performs poorly when executing computation-intensive tasks such as large-scale power flow calculations.

There has been much research devoted to improving the performance of cloud computing frameworks. Spark is considered to be a more efficient model than MapReduce. Reference [19] indicated that the bottleneck of Hadoop is reading data from disk. In [20], it was noted that Hadoop stores input and output data on disk while Spark stores them in memory, and the results showed that the latter obtained a 40-fold improvement in speed over the former. Reference [21] described that, similarly to Spark’s storage strategy, caching input data could halve the average job completion time. Problems concerning the granularity of caching, and how Hippo uses a job dependency graph to determine memory retention and prefetching, were discussed in [22]. As the core technique of Spark, RDDs provide a distributed memory-sharing model that retains MapReduce’s fault tolerance and simple programming. RDDs have been successfully applied in other fields requiring big data processing and analysis [23, 24], but there is little research on RDD-based power system simulation.

In view of the above analysis, a parallel power flow calculation method that combines the Newton-Raphson method with RDDs and DAG optimization is proposed. The contributions of this method are mainly embodied in two aspects:

1) Deploying large-scale power flow calculation on a cloud computing platform.

2) Solving the problem of low performance of MapReduce model when applied to this task.

Considering the necessity of managing a large volume of data, the proposed method takes full advantage of cloud computing and makes up for the insufficiency of MapReduce by using RDDs and DAG optimization. To reduce the time spent reading, writing and transferring data, it forms datasets in memory. The computing performance is further improved by executing the sequential program with the maximum possible degree of parallelism.

2 Power flow calculation model for large-scale power systems

2.1 Power flow calculation model of power system

Power flow analysis is a fundamental calculation which is often used to evaluate the feasibility, reliability and efficiency of a power supply scheme, and in research on power system operation and regulation schemes [25,26,27,28]. It is also necessary for theoretical line loss analysis. An online power flow calculation with improved performance would therefore be a significant advance.

1) Bus analysis in power flow calculation

Taking various constraint conditions into account, buses in a power system are divided into different categories: ① PQ buses are the most prevalent in the power grid; their active and reactive power are given, so the power injection is known; ② PV buses have known active power and voltage amplitude, and usually they are few in number, if any; ③ the balance bus (Vθ bus, also called the slack bus) has a given voltage amplitude and phase angle, and generally there is only one. A power plant bus responsible for adjusting the system frequency is often selected as the balance bus.

New types of bus are appearing in power systems in increasing numbers due to the integration of distributed energy. Whether they use synchronous generators or doubly-fed induction generators, wind turbines can be treated as PQ buses in power flow calculations. Solar power connected to the grid through a current-source inverter can be treated as a PI bus (specified active power and current magnitude), while solar power connected through a voltage-source inverter can be treated as a PV bus. Micro gas turbines and fuel cells can be treated as PV buses in power flow calculation.

2) The mathematical model of power flow calculation

Power flow calculation aims to determine the power distribution and losses accurately, according to the power grid parameters and structure, within a reasonable time. This requires setting up and solving the correction equation (1) of the Newton-Raphson method, which is derived in [29].

$$\left[ {\begin{array}{*{20}c} {\Delta P_{1} } \\ {\Delta Q_{1} } \\ {\Delta P_{2} } \\ {\Delta Q_{2} } \\ \vdots \\ {\Delta P_{p} } \\ {\Delta U_{p}^{2} } \\ \vdots \\ {\Delta P_{n} } \\ {\Delta U_{n}^{2} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {H_{11} } & {N_{11} } & {H_{12} } & {N_{12} } & \cdots & {H_{1p} } & {N_{1p} } & \cdots & {H_{1n} } & {N_{1n} } \\ {J_{11} } & {L_{11} } & {J_{12} } & {L_{12} } & \cdots & {J_{1p} } & {L_{1p} } & \cdots & {J_{1n} } & {L_{1n} } \\ {H_{21} } & {N_{21} } & {H_{22} } & {N_{22} } & \cdots & {H_{2p} } & {N_{2p} } & \cdots & {H_{2n} } & {N_{2n} } \\ {J_{21} } & {L_{21} } & {J_{22} } & {L_{22} } & \cdots & {J_{2p} } & {L_{2p} } & \cdots & {J_{2n} } & {L_{2n} } \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots \\ {H_{p1} } & {N_{p1} } & {H_{p2} } & {N_{p2} } & \cdots & {H_{pp} } & {N_{pp} } & \cdots & {H_{pn} } & {N_{pn} } \\ {R_{p1} } & {S_{p1} } & {R_{p2} } & {S_{p2} } & \cdots & {R_{pp} } & {S_{pp} } & \cdots & {R_{pn} } & {S_{pn} } \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots \\ {H_{n1} } & {N_{n1} } & {H_{n2} } & {N_{n2} } & \cdots & {H_{np} } & {N_{np} } & \cdots & {H_{nn} } & {N_{nn} } \\ {R_{n1} } & {S_{n1} } & {R_{n2} } & {S_{n2} } & \cdots & {R_{np} } & {S_{np} } & \cdots & {R_{nn} } & {S_{nn} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {\Delta f_{1} } \\ {\Delta e_{1} } \\ {\Delta f_{2} } \\ {\Delta e_{2} } \\ \vdots \\ {\Delta f_{p} } \\ {\Delta e_{p} } \\ \vdots \\ {\Delta f_{n} } \\ {\Delta e_{n} } \\ \end{array} } \right]$$
(1)

where “\(\Delta\)” represents an unbalance (mismatch) value; \(e_{j}\) and \(f_{j}\) are the real and imaginary parts of the bus voltages, so \(\Delta e_{j}\) and \(\Delta f_{j}\) are their corrections; \(H_{ij}\), \(N_{ij}\), \(J_{ij}\), \(L_{ij}\), \(R_{ij}\), \(S_{ij}\) are elements of the Jacobian matrix; \(P_{i}\) and \(Q_{i}\) are the active and reactive power; \(U_{i}\) is the voltage magnitude. There are \((m - 1)\) PQ buses and \((n - m)\) PV buses, so the first \(2(m - 1)\) rows in (1) correspond to the PQ buses and the following \(2(n - m)\) rows correspond to the PV buses. A rectangular coordinate system is used.

2.2 Power flow calculation using Newton-Raphson method

The Gauss-Seidel method, the Newton-Raphson method and the P-Q decoupled method are the main methods for power flow analysis. Intelligent algorithms such as the genetic algorithm and particle swarm optimization have also been used; however, studies on improving power flow calculation methods are still mainly aimed at the traditional algorithms [30, 31].

The Newton-Raphson method (or Newton method) is a typical method for solving nonlinear equations in mathematics. It compares favorably with other methods in terms of memory requirement and computational speed, has good convergence, and is still widely used. Its basic idea is to expand the power flow equations \(F(x) = 0\) as a Taylor series and omit the second and higher order terms. This linearizes the nonlinear equations locally, and the core of the solution process is to form and solve the correction equation repeatedly.
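At iteration \(k\), this linearization gives the generic Newton step (written here in one common sign convention; in (1) the sign is absorbed into the unbalance values on the left-hand side):

$$F(x^{(k)} ) + J^{(k)} \Delta x^{(k)} = 0\;\; \Rightarrow \;\;J^{(k)} \Delta x^{(k)} = - F(x^{(k)} ),\quad x^{(k + 1)} = x^{(k)} + \Delta x^{(k)}$$

where \(J^{(k)}\) is the Jacobian matrix of \(F\) evaluated at \(x^{(k)}\), i.e. the coefficient matrix in (1).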

This paper selects the Newton-Raphson method for studying power flow calculation because of the following advantages: ① Fast convergence. With an appropriate initial value, the algorithm exhibits square-law (quadratic) convergence. Generally, accurate solutions can be obtained after 4-5 iterations, and the number of iterations is substantially independent of the grid size; ② Good convergence reliability. Compared with the Gauss-Seidel method, which has poor convergence and slow calculation, the Newton-Raphson method has good convergence reliability; ③ Extensive application. The Newton method is widely used and applicable to general power flow calculation because it places no specific requirements on the parameters of the power grid; ④ Easy to parallelize. Traditional direct methods for power flow calculation require forward-backward sweeping in the solving process, which is difficult to vectorize or parallelize, whereas the Newton-Raphson method is easier to parallelize.

3 Parallel power flow calculation method based on RDDs and DAG

3.1 Parallel programming model: MapReduce

MapReduce, a parallel programming model for processing large data sets, greatly simplifies the complexity of distributed parallel programming. The Map (mapping) function maps a set of key-value pairs to a new set of key-value pairs, and the Reduce (reduction) function performs a summary operation on each group of mapped key-value pairs that share the same key. Taking the calculation of the admittance matrix’s diagonal elements as an example, Fig. 1 explains the MapReduce operation.

Fig. 1 Calculation process of admittance matrix’s diagonal elements based on MapReduce

Figure 1 shows that it is necessary to read power grid parameters from different buses before the Map operation (each rectangle represents a node), and buffer the intermediate results to disks at various nodes. During the Reduce operation, it is necessary to read intermediate results from all child nodes. As a result, there are relatively many data reading, writing and transmission operations. A growing amount of power data and calculation makes transmission delays more and more obvious. Reference [32] applies MapReduce programming to parallel power flow computing using the component averaging (CAV) iterative method and Jacobi iterative method. However, there are still problems like data transmission delays in the computing process, long processing time, and poor real-time performance, which make the results not so satisfactory.
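To make the key-value flow of Fig. 1 concrete, the following minimal Scala sketch reproduces the Map and Reduce phases with plain local collections. The record layout (from bus, to bus, resistance, reactance) mirrors the branch impedance data described later (Table 2); the object and field names and the sample branch values are illustrative only.

object DiagonalMapReduceSketch {
  final case class Branch(from: Int, to: Int, r: Double, x: Double)

  def main(args: Array[String]): Unit = {
    val branches = Seq(Branch(1, 2, 0.02, 0.06), Branch(1, 3, 0.08, 0.24), Branch(2, 3, 0.06, 0.18))

    // Map phase: every branch emits (bus, partial diagonal value) for both of its terminal buses,
    // contributing r/(r^2+x^2) and x/(r^2+x^2) (cf. (2) and (3) in Section 3.3)
    val mapped: Seq[(Int, (Double, Double))] = branches.flatMap { b =>
      val d = b.r * b.r + b.x * b.x
      val contribution = (b.r / d, b.x / d)
      Seq(b.from -> contribution, b.to -> contribution)
    }

    // Reduce phase: values sharing the same key (bus number) are summed,
    // giving the diagonal elements Y[i][i] of the admittance matrix (shunt elements ignored)
    val diagonal: Map[Int, (Double, Double)] = mapped.groupBy(_._1).map {
      case (bus, kvs) => bus -> kvs.map(_._2).reduce((a, c) => (a._1 + c._1, a._2 + c._2))
    }

    diagonal.toSeq.sortBy(_._1).foreach { case (bus, (g, b)) =>
      println(f"Y[$bus][$bus] = $g%.4f + j($b%.4f)")
    }
  }
}

In the MapReduce model of Fig. 1, the intermediate pairs produced by the Map phase would be buffered to disk and then shuffled to the Reduce nodes, which is exactly the overhead discussed above.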

3.2 Distributed memory sharing model

RDDs and DAG optimization are the core technologies of Spark. As a distributed memory-sharing model, RDDs store data in memory as datasets, and the time spent reading and writing data is reduced because Spark accesses these in-memory datasets directly during analysis. Before calculation, the Spark kernel draws a DAG describing the computation path; the computation is divided into sets of tasks according to the DAG, and these task sets are submitted to the compute nodes for execution. DAG optimization enables sequential tasks to be executed in parallel to the greatest possible extent. In addition, transmission and execution time are reduced by implementing task sets with narrow dependencies within one stage. Figure 2 shows the calculation process of the admittance matrix’s diagonal elements based on RDDs and DAG optimization.

Fig. 2 Calculation process of admittance matrix’s diagonal elements based on RDDs and DAG optimization

It can be seen from Fig. 2 that the branch admittance value slices are first read from each node (each rectangle represents a node), and the intermediate key-value pairs are then stored in an RDD collection after the Map operation. The intermediate key-value pairs are read from all nodes when performing the Reduce operation. Reading and writing operations in the whole process are based on RDD datasets, so their speed is much higher. DAG optimization places the key-value pairs of the admittance data slices in a joint partition and stores intermediate results with the same key on the same node for processing. This avoids data being shuffled across the network, which reduces data transmission and improves computational efficiency.

The difficulty of parallel processing depends on the size of the power grid, the complexity of its structure, and how well the algorithm can be adapted to parallel computation. Several factors, such as the method of subdivision, optimal allocation across nodes, and interdependency, should be considered when implementing the parallel power flow calculation. Figure 2 illustrates the principle for calculating the diagonal elements of the admittance matrix based on RDDs and DAG optimization, including some steps that are indicated diagrammatically, such as the optimal allocation of tasks and node calling.
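As a concrete illustration of this flow, the sketch below computes the diagonal elements with Spark RDDs in Scala. The HDFS path, the record layout (following Table 2) and the partitioning choice are assumptions made for the example; it is a sketch of the idea rather than the exact code used in the experiments.

import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object DiagonalRddSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("admittance-diagonal"))

    // Branch impedance slices are read once and kept in memory as an RDD,
    // so later stages (and Newton iterations) can reuse them without re-reading HDFS
    val branches = sc.textFile("hdfs:///powerflow/branch_impedance.txt")
      .map(_.trim.split("\\s+"))
      .map(t => (t(0).toInt, t(1).toInt, t(2).toDouble, t(3).toDouble))
      .persist()

    // Map: every branch contributes a partial value to the diagonal entry of both terminal buses
    val contributions = branches.flatMap { case (from, to, r, x) =>
      val d = r * r + x * x
      Seq((from, (r / d, x / d)), (to, (r / d, x / d)))
    }

    // Reduce: pre-partitioning by bus number means the subsequent reduceByKey
    // is a narrow dependency and runs within the same stage, avoiding an extra shuffle
    val diagonal = contributions
      .partitionBy(new HashPartitioner(sc.defaultParallelism))
      .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))
      .persist()

    diagonal.take(10).foreach { case (bus, (g, b)) => println(s"Y[$bus][$bus] = $g + j($b)") }
    sc.stop()
  }
}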

3.3 Parallel power flow calculation method

The analysis above explains why a parallel power flow calculation using the Newton-Raphson method with RDDs and DAG optimization should be efficient. In a large-scale power system, the computational burden of power flow calculation mainly comes from solving the correction equations and updating the Jacobian matrix. The complete process, including the preparation for these steps, is outlined in Fig. 3.

Fig. 3 Complete process for parallel power flow calculation

In the calculation process, the n buses of the power grid are not required to be ordered by type into PQ buses, PV buses, and the balance bus. Rectangular coordinates are selected for the representation of voltage. The source and intermediate data are stored in RDD arrays. After DAG optimization, operations with narrow dependencies are implemented within one stage. The whole procedure is divided into seven stages and each stage can be completed at one node independently; different stages, however, may span several nodes. Stages 1, 2, and 3 can be executed in parallel, as can Stages 4 and 5.

The specific procedures of each stage are as follows, where the notation \(* * [\, ][\, ]\) or \(* * [\,]\) denotes an array variable formed in the process of the power flow calculation (“\(*\)” stands for one or more letters).

Step 1: Save the impedance data from the source file and the number of rows in an RDD array \(RR[l_{num} ][4]\) and an index variable \(l_{num}\), respectively. The columns of the array are the identities of the “from” and “to” buses of each branch, stored in \(RR[\,][0]\) and \(RR[\,][1]\), and the resistance and reactance of the branch, stored in \(RR[\,][2]\) and \(RR[\,][3]\), with ohms as their units. After array \(RR[\,][\,]\) has been obtained, the conductance (real part) \(YG[n + 1][n + 1]\) and susceptance (imaginary part) \(YB[n + 1][n + 1]\) of the admittance matrix can be calculated according to (2) and (3). \(YG[\,][\,]\) and \(YB[\,][\,]\) are both RDD arrays.

$$\left\{ \begin{aligned} YG[i][j] = - RR[\,][2]/(RR[\,][2]^{2} + RR[\,][3]^{2} ) \hfill \\ YB[i][j] = - RR[\,][3]/(RR[\,][2]^{2} + RR[\,][3]^{2} ) \hfill \\ \end{aligned} \right.$$
(2)
$$\left\{ \begin{aligned} YG[i][i] &= - \sum\limits_{k = 1,k \ne i}^{n} {YG[i][k]} \hfill \\ YB[i][i] &= - \sum\limits_{k = 1,k \ne i}^{n} {YB[i][k]} \hfill \\ \end{aligned} \right.$$
(3)

This procedure is shown in Fig. 4 below in terms of diagonal and off-diagonal elements separately. The off-diagonal elements can be computed directly from the branch resistance and reactance using (2), and the diagonal elements can then be calculated according to (3).

Fig. 4 Calculation process for admittance matrix
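A minimal local (non-distributed) Scala sketch of this step, implementing (2) and (3) as written above, is given below; the function name and the plain two-dimensional arrays standing in for the RDD arrays are illustrative assumptions.

// RR rows hold (fromBus, toBus, resistance, reactance); buses are numbered 1..n
def buildAdmittance(rr: Array[Array[Double]], n: Int): (Array[Array[Double]], Array[Array[Double]]) = {
  val yg = Array.ofDim[Double](n + 1, n + 1)
  val yb = Array.ofDim[Double](n + 1, n + 1)

  // Off-diagonal elements from (2)
  for (row <- rr) {
    val (i, j, r, x) = (row(0).toInt, row(1).toInt, row(2), row(3))
    val d = r * r + x * x
    yg(i)(j) = -r / d; yg(j)(i) = yg(i)(j)
    yb(i)(j) = -x / d; yb(j)(i) = yb(i)(j)
  }

  // Diagonal elements from (3): negative sum of the off-diagonal entries in each row
  for (i <- 1 to n) {
    yg(i)(i) = -(1 to n).filter(_ != i).map(k => yg(i)(k)).sum
    yb(i)(i) = -(1 to n).filter(_ != i).map(k => yb(i)(k)).sum
  }
  (yg, yb)
}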

Step 2: Read the information about bus voltage and bus power, and save the bus types, voltages, active power and reactive power in the RDD arrays \(T_{bus} [n + 1]\), \(U[n + 1]\), \(P[n + 1]\) and \(Q[n + 1]\), respectively. Set the values of unknown data to zero; for instance, the reactive power of a PV bus is unknown, so it is set to zero.

Step 3: Set the initial value of the voltage on each bus according to (4). Each voltage value consists of a real part \(e\) and an imaginary part \(f\), and their corrections are \(ee\) and \(ff\). In the first iteration, the initial values and correction values of the voltage are estimated; in subsequent iterations, they are set according to the correction equation.

$$\left\{ \begin{aligned} e_{i}^{(k)} = e_{i}^{(k - 1)} + ee \hfill \\ f_{i}^{(k)} = f_{i}^{(k - 1)} + ff \hfill \\ \end{aligned} \right.$$
(4)

Step 4: Calculate the unbalance value of the PQ buses’ active power (PP) and reactive power (QQ), and the PV buses’ active power (PP) and voltage (UU), according to (5)-(7) as follows:

$$PP[i] = P[i] - \sum\limits_{j = 1}^{j = n} {[e[i](YG[i][j]e[j] - YB[i][j]f[j]) + f[i](YG[i][j]f[j] - YB[i][j]e[j])]}$$
(5)
$$QQ[i] = Q[i] - \sum\limits_{j = 1}^{j = n} {[f[i](YG[i][j]e[j] - YB[i][j]f[j]) + e[i](YG[i][j]f[j] - YB[i][j]e[j])]}$$
(6)
$$UU[i]_{{}}^{2} = U[i]_{{}}^{2} - (e[i]_{{}}^{2} + f[i]_{{}}^{2} )$$
(7)

Figure 5 shows a procedure consisting of inner and outer loops. The outer loop traverses the first subscript (the bus index) of the unbalance values, and the inner loops compute the value of each element by accumulation. The first inner loop calculates the unbalance value of the active power; the bus type is then checked: if it is a PQ bus, the second inner loop calculates the unbalance value of the reactive power, otherwise the unbalance value of the voltage.

Fig. 5 Calculation process of unbalance value
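The following minimal Scala sketch evaluates (5)-(7) as written; for brevity it fuses the two inner loops described above into one, and the function and array names are illustrative.

// Arrays use indices 1..n; busType(i) distinguishes "PQ" and "PV" buses
def unbalance(n: Int, busType: Array[String],
              p: Array[Double], q: Array[Double], u: Array[Double],
              e: Array[Double], f: Array[Double],
              yg: Array[Array[Double]], yb: Array[Array[Double]]):
    (Array[Double], Array[Double], Array[Double]) = {

  val pp = Array.ofDim[Double](n + 1)   // active power mismatch, all buses
  val qq = Array.ofDim[Double](n + 1)   // reactive power mismatch, PQ buses
  val uu = Array.ofDim[Double](n + 1)   // squared voltage mismatch, PV buses

  for (i <- 1 to n) {                   // outer loop over buses
    var sp = 0.0; var sq = 0.0
    for (j <- 1 to n) {                 // inner loop: accumulate the sums in (5) and (6)
      val a = yg(i)(j) * e(j) - yb(i)(j) * f(j)
      val b = yg(i)(j) * f(j) - yb(i)(j) * e(j)
      sp += e(i) * a + f(i) * b
      sq += f(i) * a + e(i) * b
    }
    pp(i) = p(i) - sp                                        // (5)
    if (busType(i) == "PQ") qq(i) = q(i) - sq                // (6)
    else uu(i) = u(i) * u(i) - (e(i) * e(i) + f(i) * f(i))   // (7)
  }
  (pp, qq, uu)
}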

Step 5: Calculate the elements of the Jacobian matrix using (8) and (9), and save them in the related two-dimensional RDD arrays \(H[n + 1][n + 1]\), \(N[n + 1][n + 1]\), \(J[n + 1][n + 1]\), \(L[n + 1][n + 1]\), \(R[n + 1][n + 1]\) and \(S[n + 1][n + 1]\). PQ buses are represented by the elements \(H\), \(N\), \(J\), \(L\) of the Jacobian matrix, and PV buses by \(H\), \(N\), \(R\), \(S\).

$$\left\{ \begin{aligned} &H[i][j] = YG[i][j]f[i] - YB[i][j]e[i] \hfill \\ &N[i][j] = YG[i][j]e[i] + YB[i][j]f[i] \hfill \\ &J[i][j] = - N[i][j] \hfill \\ &L[i][j] = H[i][j] \hfill \\ &R[i][j] = 0 \hfill \\ &S[i][j] = 0 \hfill \\ \end{aligned} \right.$$
(8)

where \(i \ne j\); when \(i = j\), the diagonal elements are given by:

$$\left\{ \begin{array}{l} \dot{I}_{i} = Y[i][i]\dot{U}[i] + \sum\limits_{j = 1,j \ne i}^{n} {Y[i][j]\dot{U}[j]} = a[i][i] + {\text{j}}b[i][i] \hfill \\ H[i][i] = YG[i][i]f[i] - YB[i][i]e[i] + b[i][i] \hfill \\ N[i][i] = YG[i][i]e[i] + YB[i][i]f[i] + a[i][i] \hfill \\ J[i][i] = - YG[i][i]e[i] - YB[i][i]f[i] + a[i][i] \hfill \\ L[i][i] = YG[i][i]f[i] - YB[i][i]e[i] - b[i][i] \hfill \\ R[i][i] = 2f[i] \hfill \\ S[i][i] = 2e[i] \hfill \\ \end{array} \right.$$
(9)
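A minimal Scala sketch of this step, filling the Jacobian blocks according to (8) and (9), is given below; the local variables a and b stand for \(a[i][i]\) and \(b[i][i]\) in (9), and the function and array names are illustrative.

def jacobianBlocks(n: Int, e: Array[Double], f: Array[Double],
                   yg: Array[Array[Double]], yb: Array[Array[Double]]):
    Map[String, Array[Array[Double]]] = {
  val dim = n + 1
  val hM = Array.ofDim[Double](dim, dim); val nM = Array.ofDim[Double](dim, dim)
  val jM = Array.ofDim[Double](dim, dim); val lM = Array.ofDim[Double](dim, dim)
  val rM = Array.ofDim[Double](dim, dim); val sM = Array.ofDim[Double](dim, dim)

  for (i <- 1 to n) {
    // real and imaginary parts of the injected current at bus i (first line of (9))
    var a = 0.0; var b = 0.0
    for (j <- 1 to n) {
      a += yg(i)(j) * e(j) - yb(i)(j) * f(j)
      b += yg(i)(j) * f(j) + yb(i)(j) * e(j)
    }
    for (j <- 1 to n) {
      if (i != j) {                                      // off-diagonal elements, (8)
        hM(i)(j) = yg(i)(j) * f(i) - yb(i)(j) * e(i)
        nM(i)(j) = yg(i)(j) * e(i) + yb(i)(j) * f(i)
        jM(i)(j) = -nM(i)(j)
        lM(i)(j) = hM(i)(j)
      } else {                                           // diagonal elements, (9)
        hM(i)(i) = yg(i)(i) * f(i) - yb(i)(i) * e(i) + b
        nM(i)(i) = yg(i)(i) * e(i) + yb(i)(i) * f(i) + a
        jM(i)(i) = -yg(i)(i) * e(i) - yb(i)(i) * f(i) + a
        lM(i)(i) = yg(i)(i) * f(i) - yb(i)(i) * e(i) - b
        rM(i)(i) = 2 * f(i)
        sM(i)(i) = 2 * e(i)
      }
    }
  }
  Map("H" -> hM, "N" -> nM, "J" -> jM, "L" -> lM, "R" -> rM, "S" -> sM)
}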

Step 6: Sort the Jacobian matrix elements from Step 5 according to the order of PQ and PV buses and the patterns \(\left[ \begin{aligned} H,N \hfill \\ J,L \hfill \\ \end{aligned} \right]\) and \(\left[ \begin{aligned} H,N \hfill \\ R,S \hfill \\ \end{aligned} \right]\) respectively, to construct the Jacobian matrix RDD array \(Y[2n][2n]\). Similarly, construct the unbalance value vector RDD array \(PQU[2n][1]\) according to the order of PQ and PV buses and the patterns \(\left[ \begin{aligned} PP \hfill \\ QQ \hfill \\ \end{aligned} \right]\) and \(\left[ \begin{aligned} PP \hfill \\ UU \hfill \\ \end{aligned} \right]\) respectively. Following (1), multiply the inverse of the Jacobian matrix by \(PQU\) to obtain the RDD array \(ef[2n][1]\) containing the correction values, arranged in the order of PQ and PV buses with the pattern \(\left[ \begin{aligned} ee \hfill \\ ff \hfill \\ \end{aligned} \right]\).

The calculation process is shown in Fig. 6. There are two parts: generating the Jacobian matrix and inverting it. The generation of the Jacobian matrix consists of two loops: the outer loop traverses each row of the matrix and the inner loop calculates the elements. The inversion obtains each element’s algebraic cofactor and divides it by the determinant of the Jacobian matrix. Finally, the inverse matrix is multiplied by the unbalance value vector, thus solving the correction equation, and the results are saved in the RDD array \(ef[\,]\). The values of array \(ef[\,]\) are placed into \(ee[\,]\) and \(ff[\,]\) according to the order of the \(Y_{bus}\) values. For example, if \(Y_{bus} [1] = 2\), then \(ee[2] = ef[2]\) and \(ff[2] = ef[3]\).

Fig. 6 Process of generating and inverting Jacobian matrix
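For reference, the correction equation can also be solved without forming the inverse explicitly. The Scala sketch below uses Gaussian elimination with partial pivoting, a standard and equivalent alternative to the cofactor-based inversion described above; it is not the exact routine used in the paper, and the function name is illustrative.

// Solves Y * ef = PQU for the correction vector ef
def solveCorrection(yIn: Array[Array[Double]], pquIn: Array[Double]): Array[Double] = {
  val m = pquIn.length
  val a = yIn.map(_.clone())           // work on copies so the inputs stay intact
  val b = pquIn.clone()

  for (col <- 0 until m) {
    // partial pivoting: bring the largest remaining entry of this column to the diagonal
    val pivot = (col until m).maxBy(r => math.abs(a(r)(col)))
    val tmpRow = a(col); a(col) = a(pivot); a(pivot) = tmpRow
    val tmpVal = b(col); b(col) = b(pivot); b(pivot) = tmpVal

    // eliminate the entries below the pivot
    for (row <- col + 1 until m) {
      val factor = a(row)(col) / a(col)(col)
      for (k <- col until m) a(row)(k) -= factor * a(col)(k)
      b(row) -= factor * b(col)
    }
  }

  // back substitution yields the corrections ef = [ee, ff] ordered as in Step 6
  val x = Array.ofDim[Double](m)
  for (row <- m - 1 to 0 by -1) {
    var s = b(row)
    for (k <- row + 1 until m) s -= a(row)(k) * x(k)
    x(row) = s / a(row)(row)
  }
  x
}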

Referring to Fig. 3, the termination condition for the iteration compares the maximum correction with a specified threshold, and the iteration will finish and jump to Step 7 if this condition is met, otherwise, the iteration will continue from Step 3.

Step 7: To complete the power flow calculation, calculate the distribution of power and line loss in the power grid according to (10), (11) and (12).

$$\widetilde{S}_{s} = \dot{U}_{s} \sum\limits_{i = 1}^{n} {Y_{si}^{*} U_{i}^{*} } = P_{s} + {\text{j}}Q_{s}$$
(10)
$$\widetilde{S}_{ij} = \dot{U}_{i} I_{ij}^{*} = \dot{U}_{i} \left[ {U_{i}^{*} y_{i0}^{*} + (U_{i}^{*} - U_{j}^{*} )y_{ij}^{*} } \right] = P_{ij} + {\text{j}}Q_{ij}$$
(11)
$$\widetilde{S}_{ji} = \dot{U}_{j} I_{ji}^{*} = \dot{U}_{j} \left[ {U_{j}^{*} y_{j0}^{*} + (U_{j}^{*} - U_{i}^{*} )y_{ji}^{*} } \right] = P_{ji} + {\text{j}}Q_{ji}$$
(12)

4 Experimental results and analysis

4.1 Cluster-based computing environment

The experimental platform is made up of a 5-node cluster in a cloud computing platform, which includes a master and four slaves. Figure 7 shows the topological graph.

Fig. 7 Cluster nodes topology

One NameNode process and one ResourceManager process run on the master node, which is also the Master of the Spark cluster; four DataNode processes and four NodeManager processes run on the slave nodes, which act as Workers. The DataNode and NameNode processes are responsible for managing stored files, while the ResourceManager and NodeManager processes manage the cluster resources. The Master and Worker processes are started by Spark and then run the tasks submitted to Spark. The hardware configuration of each node includes a 20 GB disk, 1 processor and 1 GB of memory. The cluster uses Hadoop and Spark technology, with the Hadoop distributed file system (HDFS) as the storage system for large-scale power data, Yarn as the resource manager, Spark as the memory-based distributed parallel computing framework, and Scala as a light and convenient programming language. The details of the software configuration are given in Table 1.
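For illustration, the Spark context used on such a cluster might be configured as follows; the application name is a placeholder and the executor settings simply mirror the node description above.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("parallel-power-flow")      // placeholder application name
  .set("spark.executor.memory", "1g")     // each slave offers 1 GB of memory
  .set("spark.executor.cores", "1")       // one processor per node
val sc = new SparkContext(conf)           // the Yarn master URL is supplied when the job is submitted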

Table 1 Software information of nodes

4.2 Source data and test cases

The test data are stored in two files: a branch impedance file and a bus voltage and power file. The columns stored in each file are listed in Tables 2 and 3. According to (2) and (3), the branch admittance values \(YG\) (real part) and \(YB\) (imaginary part) are obtained and the admittance matrix can then be formed. Data are accessed according to the type of bus, and unused or non-existent data are set to 0 when reading the file.

Table 2 Arrangement of data in branch impedance file
Table 3 Arrangement of data in buses’ voltage and power file

In order to simulate large-scale power flow calculations, test cases are constructed from multiple instances of the standard IEEE 30-bus, IEEE 118-bus and IEEE 300-bus test systems. The instances are interconnected by tie-lines to form N×N network structures, so that larger-scale test systems are created. The number of buses and branches in each constructed test case is shown in Table 4.

Table 4 Information of cases

4.3 Experimental processing and results analysis

To analyze the advantages of a distributed memory cluster, three aspects are taken into account: the scale of simulated power system, the speedup of the power flow calculation and the scale of Spark cluster. Power flow calculations using the method outlined above were completed on the Spark cluster and some run-time features of the experiment are shown in Tables 5 and 6.

Table 5 Information about available nodes after starting Spark cluster successfully
Table 6 Information about finished tasks

Table 5 shows information about the available resources of each node in the Spark cluster after startup. The cluster has four slaves and they have all been started successfully. Each node with “active” status can perform tasks assigned by the master at any moment. Table 6 shows details of five completed tasks. The Spark cluster allocates an ID to each task when it is executed. Each slave can use 1 GB of memory, and each task is performed by all slaves at the same time using 4 cores.

Testing the performance of parallel computing requires benchmark testing tools such as SparkBench. However, SparkBench supports only four types of application: machine learning, graph computation, SQL querying, and streaming data computation, so it cannot be used to test power flow calculations directly. SparkBench performance indices include task execution time and resource consumption, but not the volumes of shuffle data or input and output data, etc. Consequently, it is difficult to give an accurate and detailed description of the I/O time and the volume of intermediate data. Following [7, 9], this article adopts the commonly used running time and speedup ratio to evaluate the performance of the parallel programs. Running time refers to the time the whole parallel hardware platform takes to execute a parallel program and solve a problem. Specifically, it includes: the arithmetic calculation time, measured by the number of basic operations executed by the algorithm; the communication time, measured by the amount of communication traffic; and the I/O time, measured by the amount of data read from or written to files and their transmission delay.

4.4 Experiment 1: comparison of running time for different models and cases

This work and the problems it addresses are based on a cloud computing platform, so the proposed method and the alternatives are compared within the same cloud computing architecture and experimental environment, so that the validity and advantages of the proposed method can be verified.

The first experiment compares the running time of the power flow calculation in the stand-alone model, using the MapReduce model, and using the proposed method, when the number of buses is 118, 1062, 2383, 2737, 3000, 7680, 161542, 331462, and 498550. For good convergence, the number of iterations of the correction equation is set to 3. In Fig. 8, the abscissa is the number of buses and the ordinate is the running time; the figure shows how the running time changes with the number of buses under the three computational models.

Fig. 8 Running time of power flow calculation for different numbers of buses using three computational models

It can be seen from Fig. 8 that, as the number of buses increases, the advantages of the proposed method gradually emerge; its performance is better than MapReduce and the stand-alone model when the number of buses is 2383 or more. It should also be noted that when the amount of data is small, the advantage of the cloud-based parallel computing model is not obvious. Under the cluster model, after a task is submitted, every node needs to establish a connection with the cluster during the initialization process; at the same time, HDFS must establish communication with Spark and MapReduce, and the cluster’s internal nodes need to establish communication with each other, which altogether takes about 10 s. During task execution, the master divides the task into subtasks and assigns them to slaves for execution. Because of these overheads, when the number of buses is 118 the running time in the stand-alone model is even less than for the other two models.

In all cases, the running time of the power flow calculation using the MapReduce model is higher than in the stand-alone model. When there is a small amount of data, the distribution achieved by the Map tasks is seriously uneven, so there is a great difference in the utilization of nodes and the computing capability of the cluster cannot be fully exploited, while considerable management overhead still exists, resulting in low performance. For large-scale power flow calculation, MapReduce tasks are executed on the multiple nodes where the data are stored, which produces temporary data and frequent read/write operations on HDFS. A large number of disk I/O operations are required and, as a result, its performance is still poor and has no advantage over the sequential stand-alone model.

Compared with traditional parallel computation models such as MPI, the advantages of the proposed method also depend on the data scale. A comparison of running times between the methods of [7, 9, 12, 33] and the proposed method is shown in Table 7. It should be noted that the single value quoted from [7, 9, 12, 33] in Table 7 is the lowest among them. Table 7 shows that when the number of buses is small to moderate (less than 106200), this method is slower than the best result in [7, 9, 12, 33]. At the maximum reported size of 106200 buses, [7, 9, 12, 33] achieve a lowest running time of 49.72 s; nevertheless, the proposed method reaches a lower running time of 38.376 s in the case of 161542 buses. As the number of buses continues to increase, the performance advantages of the proposed method become clearer, and this is determined by the structure and inherent characteristics of the cloud computing environment.

Table 7 Running time of methods proposed in this paper and in [7, 9, 12, 33] when using four processors

4.5 Experiment 2: change of time speedup based on the proposed method

Speedup refers to the ratio between two running times. This experiment measures the ratio of the running time of the stand-alone power flow calculation to that of the proposed method as the number of buses changes. The result is given in Fig. 9, where the abscissa is the number of buses and the ordinate is the time speedup of the power flow calculation.
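Formally, the speedup reported here is

$$S = \frac{{T_{1} }}{{T_{p} }}$$

where \(T_{1}\) is the running time of the sequential stand-alone calculation and \(T_{p}\) is the running time of the proposed method on the cluster.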

Fig. 9 Speedup ratio for proposed method compared to a stand-alone power flow calculation

The experimental results show that the speedup ratio increases with the number of buses, which indicates that the parallel performance of the proposed method is enhanced as the amount of data and calculation increases. Presumably, if the number of buses continues to increase, there is still room for the speedup to improve.

4.6 Experiment 3: efficiency of power flow calculation with different cluster scales

Due to limited experimental conditions, the cluster built with the Spark framework has only 4 slaves and 1 master, but in a practical environment the cluster would be far larger, and as the scale of the cluster expands the advantages become more obvious. This experiment compares the time efficiency of the power flow calculation using the proposed method on a 5-node cluster (4 slave nodes and one master node) and a 9-node cluster (8 slave nodes and one master node). The test data come from the synthesized systems with 3000, 161542, and 331462 buses. Figure 10 shows the time needed when using the 5-node and 9-node clusters for these systems. As can be seen, the time needed for the power flow calculation is reduced when the number of slave nodes increases from 4 to 8, but the reduction is not proportional to the number of added nodes, because additional time is needed for communicating with the new nodes.

Fig. 10 Time needed for power flow calculation when using different cluster scales

4.7 Experiment summary

Through the above three experiments, we can draw the following conclusions:

1) The time performance of the power flow calculation using the proposed method is superior to the stand-alone model and the MapReduce model, and the speedup increases with the number of buses.

2) The time performance of the power flow calculation using the proposed method improves as the Spark cluster expands. However, as the cluster grows, the cost of managing it also increases, so the running time of the proposed method is not reduced in proportion to the scale of the cluster.

Additionally, some shortcomings of the experimental process and results need to be pointed out and explained:

1) The power flow calculation achieved better performance in the stand-alone model when the number of buses was relatively small. This confirms that cloud-based parallel computation, including the proposed method, is more appropriate for large volumes of data.

2) In this paper, we reference the experimental setups of the power flow calculations in [7, 9, 12, 33] and compare against the best running time among those papers. Although the numbers of buses and branches are not exactly the same as in this paper, the running times of power flow calculations may be compared among similar configurations.

3) There is a big gap between the proposed method and mature commercial tools. This paper uses five cheap PCs to build an experimental cluster and deploys a virtual machine (CentOS operating system) on each node (Windows operating system) for the power flow calculation experiments. During the power flow calculation, each virtual machine occupies only a small amount of the CPU, memory (1 GB) and other resources available on its PC. The overall performance of this cluster, including its stability and reliability, is only suitable for experimental testing, and is not as good as that of commercial systems. By employing a reliable and stable basic computing platform, and carrying out practical improvement and optimization on it, the performance could be greatly improved. For example, the smart grid dispatching control system D5000 developed by the NARI Group, and the intelligent dispatching system solution for the Central China Power Grid designed by the Sugon corporation, employ traditional MPI and OpenMP respectively; the performance of each of these systems is far better than that reported in [7, 9, 12, 33].

In summary, on the same cloud computing experimental platform, the test results show that this method greatly improves computational performance compared with MapReduce, and its running time is better than that of [7, 33] for large numbers of buses, which verifies the advantages of the proposed method.

5 Conclusion

This paper proposed a parallel power flow calculation method based on RDDs, DAG optimization and the Newton-Raphson method to deal with the problems of large amounts of data transmission and low efficiency in the MapReduce model for parallel computation. Running time comparisons were performed in an experimental environment. There were some disparities between the simulation data and actual power grids, as well as between the experimental cluster configuration and commercial cluster computing tools. However, in a similar experimental environment, the results show that the proposed method is more suitable for large-scale power systems than the stand-alone model, the MapReduce model, and some methods proposed in the literature.

In the future, studies on the allocation of virtual machines, task scheduling and load balancing strategies are planned, to meet the need for higher computing performance and analysis efficiency in power flow calculation.