1 Introduction

In recent years, research into the broad field of smart grids has been extremely active. The exciting and powerful opportunities arising from new monitoring and control infrastructure have given rise to many new ideas and applications. However, the smart grid has also provided many new challenges, for example, the limited capacity of existing networks to accommodate new distributed generators (DG) [1, 2], and the added complexity from a wide range of heterogeneous components [3]. The research focus has included optimization of networks with high DG penetration [4, 5], optimization through direct control of storage and loads [6–11], the application of the extensive smart grid monitoring and control devices to fault and breach management [12, 13], communication, data management and smart meters [14–16], the use of grid-connected vehicles as both postponable loads and potential storage devices [17–22], and optimal control of smart buildings [23–26].

New methods are required to solve the range of optimization problems arising from this new and evolving environment. Many traditional methods employ centralized solutions, which are limited in their ability to solve some of the larger and more complex problems presented by the smart grid. Their scalability is of particular concern: as more DG and smart meters are rolled out, the demand on data transmission infrastructure and centralized computing resources increases [3, 27]. In such cases the volume of data and delays in asynchronous data transmission may adversely affect the timeliness of centralized optimization results. Distributed approaches are therefore often beneficial.

Distributed approaches can utilize local data by partitioning the network according to factors such as the electrical properties of the network and the forecast power flow [28–30]. For example, in [31] epsilon decomposition is used to determine the range of influence of the network's DG, which is then utilized to control voltages should they exceed operating limits. Distributed approaches also benefit from local optimization that is independent of a central bottleneck. For example, in [32] a distributed approach is taken to voltage regulation utilizing the smart grid's set of intelligent and cooperative smart entities. A distributed game-theoretic approach is taken to produce optimal day-ahead schedules for DG, storage and loads in [33–36]. In [37] the optimal generation schedule for DG is evaluated through particle swarm optimization.

In spite of their benefits, many distributed approaches must make approximations in order to operate with either complete or partial independence from a central controller, leading to varying levels of sub-optimality. Purely distributed solutions then aim to solve these sub-problems without the benefit of global state information or observation of changes in global state. Designers of distributed algorithms must therefore ensure that these approximations do not lead to instability due to competing controls between neighbouring regions, and must take into account communication delays between sensors and local controllers [38], as well as inaccuracies in local estimates of state information [39]. Distributed solutions can therefore benefit greatly from some form of coordination to improve these estimates.

The following list summarizes the key benefits of distributed, coordinated control.

1) Local data: Utilization of local data reduces data size and improves privacy by reducing requirements for data sharing.

2) Local optimization: Local controllers apply local data to their optimization routines, which can improve the timeliness of optimization results.

3) Reduced central burden: Computational requirements for the central controller are greatly reduced, even as the network dimension increases, since much of the burden is taken by the numerous local controllers.

In addition to these considerations, in the presence of controllable storage and postponable loads, optimization over only the current state is no longer sufficient: actions taken in the present affect future states and costs, changing the original optimization problem. In this case the optimization problem must consider the cost of operation into the future, so a timely model is desirable that accounts for the uncertainty introduced by the DGs and predicts future states from the present state and future controls. Dynamic programming (DP) offers benefits over other methods in solving this type of problem due to its ability to reduce the problem's computational complexity by combining instantaneous decision making along the state trajectory with the optimal cost-to-go function associated with each state. For stochastic optimization problems, in particular problems where the expectation of future costs is difficult or impossible to calculate, approximate dynamic programming (ADP) can be applied to estimate the future costs. In addition to its ability to handle difficult stochastic problems, ADP has the added benefit of reducing a problem's dimensionality by summarizing future states with a feature set.

ADP has been applied to many fields including control of the smart grid [40]. In [41] an optimal ADP algorithm is presented for the energy dispatch problem with grid-level storage, including a rigorous proof of the algorithm's convergence. The increased observability and controllability of the smart grid is utilized to apply a dual heuristic dynamic program to the dynamic stochastic optimal power flow (OPF) problem in [42]. Q-learning is applied to the optimal routing of shipboard power, storing discrete values for state-action value pairs, in [43]. In [44] the problem of optimizing DG output and storage is tackled by balancing supply and demand at the customer level through DP. In [45] operation of a micro-grid featuring DG, heat supply and storage is optimized through DP. In [46] DP is applied to the multi-objective problem of optimally allocating DG to an existing network.

Application of ADP by power system operators has largely focused on the economic dispatch of power [40]. A review of the economic dispatch literature since 1990 is presented in [47]. These works focus on resource allocation from the generation point of view rather than from the distribution system point of view. When applying ADP to the distribution system it is important to consider the added complexity of the network structure. Applying ADP to a distributed smart grid problem, while considering the implications of reduced and delayed global state information exchange, is the focus of this paper.

In this paper we present a coordinated, distributed algorithm based on distributed, local controllers and a central coordinator for exchanging summarized global state information, with the aim of optimizing resource allocation of DG and storage, and managing deterministic loads in the smart grid, while maintaining network operating constraints and allowing for delays in data exchange. The coordinated, distributed algorithm’s objectives are to:

1) Reduce the problem dimensionality compared to centralized methods.

2) Improve local state estimation over purely distributed approaches.

3) Be resistant to instability from competing local controllers.

4) Be robust in the presence of delayed information exchange.

This paper is organized as follows. Section 2 provides the network analysis and dynamic programming framework on which the study’s algorithms are built. Then in Sect. 3 our smart grid optimization problem is formulated as a distributed Optimal Power Flow problem. In Sect. 4 our proposed solution is presented through a centrally coordinated distributed approximate dynamic program with asynchronous information exchange between local and central nodes. Finally, a case study is presented illustrating the feasibility of this approach in Sect. 5, followed by the study’s conclusion in Sect. 6.

2 Preliminaries

In this section, relevant background is provided on power flow analysis and dynamic programming, and on the approximations to their solutions on which the distributed problem and solution of this paper are built.

2.1 Approximate power flow

In preparation for the distributed problem formulation of Sect. 3.2, approximations of the power flow equations are given. Power flow analysis of a network aims to find its steady-state operation, where network state is defined by bus power, bus voltage and line current. Newton–Raphson power flow analysis can calculate the network state given the bus admittance matrix and the power at all busses. However, this may not be possible if only a subset of the network's bus powers is known—as in the case of a distributed optimization problem. In this case an approximation can be made.

From the Jacobian matrix of the Newton–Raphson power flow analysis the sensitivity matrix can be calculated:

$$\begin{aligned} {\varvec{\varLambda}} = \left[ \begin{array}{ll} \frac{\partial \delta }{\partial P} & \frac{\partial \delta }{\partial Q}\\ \frac{\partial |v|}{\partial P}&\frac{\partial |v|}{\partial Q} \end{array} \right] \end{aligned}$$

The sensitivity matrix provides a linear approximation of the relationship between changes in nodal power and voltage as follows:

$$\begin{aligned} \left[\begin{array}{l} \delta _t\\ |v_t| \end{array} \right] = \left[\begin{array}{l} \delta _{0}\\ |v_{0}| \end{array} \right] + {\varvec{\varLambda}} \left[ \begin{array}{l} \Delta P_t\\ \Delta Q_t \end{array} \right] \end{aligned}$$
(1)

where \(|v_t|\angle \delta _t\) is the complex voltage at all busses at time t, and \(\Delta P_t\) and \(\Delta Q_t\) are vectors of the change in active and reactive power at all busses since time \(t=0\). Once these approximations are made, \({\varvec{\varLambda}}\) no longer needs to be recalculated for each change in network state considered by the optimization process. This greatly reduces the burden on the distributed controller.
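As a concrete illustration, the linear update of (1) reduces to a single matrix–vector product. The following minimal sketch (Python; the function name and toy numbers are illustrative assumptions, not from the study) applies a fixed sensitivity matrix to stacked power deviations:

```python
import numpy as np

def approx_voltage(delta0, vmag0, Lambda, dP, dQ):
    """Linear voltage update of (1): stack angles and magnitudes, stack the
    power deviations, and apply the constant sensitivity matrix Lambda."""
    x0 = np.concatenate([delta0, vmag0])   # [delta_0; |v_0|]
    du = np.concatenate([dP, dQ])          # [dP_t; dQ_t]
    x_t = x0 + Lambda @ du                 # eq. (1)
    n = len(delta0)
    return x_t[:n], x_t[n:]                # (delta_t, |v_t|)

# Toy usage on a 3-bus example with random but fixed sensitivities:
rng = np.random.default_rng(0)
Lambda = 0.01 * rng.standard_normal((6, 6))
delta_t, vmag_t = approx_voltage(np.zeros(3), np.ones(3), Lambda,
                                 np.array([0.1, 0.0, -0.05]), np.zeros(3))
```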

The period for which this approximation is appropriate will depend on the magnitude of any variations in network state. If significant changes occur in the network then the sensitivity matrix may require recalculation.

2.2 Dynamic smart grid problem: DP

A distributed approach to solving an optimization problem in the smart grid should aim to find the solution (or approximate solution) to the global problem. Here we present the global problem and dynamic programming approach that will be broken into a distributed problem in Sect. 3.1.

For an initial state \(x_0\), the optimization problem is defined as follows.

$$\begin{aligned}&\min _{u} J_u(x_0)\nonumber \\ {\text{s.t.}}&\ u=\{ u_0,u_1,\dots , u_{T-1}\}\nonumber \\&x_{t+1} = f_x(x_t, u_t, w_t)\nonumber \\&G(x_t,u_t) \le 0,\ \forall \ t \in [0,T) \end{aligned}$$
(2)

where \(J_u(\cdot )\) is the cost-to-go function to minimize; the state sequence \(x=\{x_0,x_1,\dots ,x_T\}\) results from control sequence u; the receding prediction horizon T can be chosen to end where the variance of the expected system state at \(t=T\) becomes large, for example when forecast loads and available intermittent energy supplies are uncertain.

Given the dynamic nature of this problem, we apply the principles of dynamic programming (DP). DP selects the best decisions recursively from the last step backwards, based on the cost of the present decision and the expected future cost. The cost-to-go is defined recursively for a given control sequence \(u=\{u_0,u_1,\dots \}\).

$$\begin{aligned} J_u(x_t) = g(x_t, u_t) + \underset{w}{E}\left[ J_u(x_{t+1}) | x_t, u_t \right] \end{aligned}$$
(3)

where \(g(x_t, u_t)\) is the cost of applying control \(u_t\) when in state \(x_t\); the expectation term \(E\left[ \cdot \right]\) is the expected future cost.

Dynamic programming aims to minimize \(J_u\), that is, to find the control sequence that solves

$$\begin{aligned} J(x_t) = \underset{u \in U_t(x_t)}{\min } \left\{ g(x_t, u) + \underset{w}{E}\left[ J(x_{t+1}) | x_t, u \right] \right\} \end{aligned}$$
(4)

where \(U_t(x_t)\) is the set of admissible controls when in state \(x_t\) and is governed by the inequality constraints \(G(x_t,u_t)\).
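To make the recursion of (4) concrete, the following sketch performs backward induction on a toy storage problem with a discretized state and a deterministic-equivalent expectation (the stage cost, transition and grids are illustrative assumptions, not the paper's model):

```python
import numpy as np

T, states, controls = 4, np.arange(5), np.array([-1, 0, 1])

def g(x, u):                         # illustrative stage cost
    return abs(u) + 0.1 * x

def f(x, u):                         # deterministic transition, clipped to grid
    return int(np.clip(x + u, states[0], states[-1]))

J = np.zeros(len(states))            # terminal cost J_T = 0
for t in reversed(range(T)):         # backward induction over (4)
    J = np.array([min(g(x, u) + J[f(x, u)] for u in controls)
                  for x in states])
# J now holds J(x_0) for every discretized initial state
```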

2.3 Dynamic smart grid solution: ADP

In preparation for the coordinated, distributed optimization approach of Sect. 4, we seek an approximation of the expectation term in (4). Since state transitions depend on the previous state, action and random variables, the smart grid optimization problem may present a large number of reachable states for which the expectation of the future cost-to-go must be calculated. Specifically, computational requirements grow exponentially with respect to the time horizon T. This is known as the "curse of dimensionality". In the complex environment of the smart grid it is therefore appropriate to make some approximations.

As such, we replace the expectation from (4) with an approximation defined as \({{\tilde{V}}_t}(x_t)\).

$$\begin{aligned} {\tilde{J}}(x_t) = \underset{u \in U_t(x_t)}{\min } \left\{ g(x_t, u) + {{\tilde{V}}_t}(x_t^{u}) \right\} \end{aligned}$$
(5)

where \(x_t^{u}\) is the post-decision state at time t (i.e. the state after applying controls u but before applying the stochastic variations \(w_t\) [48]); \({{\tilde{V}}_t}(\cdot )\) is the expectation approximation. We no longer need to calculate the cost an exponentially increasing number of times; however, we do need to find an appropriate approximation model for \({{\tilde{V}}_t}(x_t^u)\) and a process for training it. In Sect. 4.2 we present a distributed ADP algorithm that trains \({{\tilde{V}}_t}(\cdot )\) and approximates \({\tilde{J}}(x_t)\).

Below we discuss some considerations when choosing training sample paths and some convergence issues.

Policy Iteration When dealing with high-dimensional problem spaces it can be difficult or impossible to evaluate all control policies that visit each state. As such, a common solution to training the approximation (and the one used in this study) is to analyse a series of sample paths through Monte-Carlo simulation. Each sample path defines a control sequence \(u^{(k)} = [u_0^{(k)},\dots ,u_{T-1}^{(k)}]\) that is refined over a series of iterations (k). The sample paths can be chosen randomly, forming an exploration policy. However, this approach forms a good approximation only if an appropriately representative sample set is taken from the state space. In other cases it may be possible to exploit the structure of the problem and follow an exploitation policy. If the sample paths are chosen according to a pure exploitation policy, then

$$\begin{aligned} u_t^{(k+1)} = \underset{u \in U_t^{(k)}(x_t)}{\arg \min } \left\{ g(x_t^{(k)}, u) + {{\tilde{V}}_t}(x_t^{u(k)}) \right\} \end{aligned}$$
(6)

where the choice of control at iteration \(k+1\) is chosen according to the approximation of the optimum at iteration k. While some applications such as those studied in [40] can obtain optimal results from a pure exploitation policy, it is often required that a combination of exploration and exploitation be used to search for a broad approximation and then refine it.
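A minimal sketch of such a mixed policy follows (Python; the schedule rho and the helper cost_plus_vtilde, returning \(g(x_t,u)+{\tilde{V}}_t(x_t^u)\), are illustrative assumptions):

```python
import numpy as np

def choose_control(k, controls, x_t, cost_plus_vtilde, rng,
                   rho=lambda k: 1.0 - np.exp(-k / 10.0)):
    """With probability rho(k), exploit via the greedy choice of (6);
    otherwise explore by sampling a control uniformly at random."""
    if rng.random() < rho(k):
        costs = [cost_plus_vtilde(x_t, u) for u in controls]
        return controls[int(np.argmin(costs))]    # exploit
    return controls[rng.integers(len(controls))]  # explore

# Usage: u = choose_control(k, candidates, x, cost_fn, np.random.default_rng())
```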

ADP Convergence Issues Approximate dynamic programming has been successfully applied to many applications. It is developed with a heuristic belief that if the value function can be approximated with sufficient accuracy, and optimal policies with respect to the approximated value function can be learnt, then the true optimal policy can be approximated with sufficient accuracy. Even though ADP is developed in this intuitive way, numerous proofs of both convergence and optimality have been developed for specific applications. Generally the nature of the approximation will determine the convergence and optimality of the ADP algorithm. According to [48], experimental results have shown the importance of the approximation's form being capable of capturing the true value function, and of new samples being able to improve the estimate of not only the sample state but also a large number of other states. In [49] a number of convergence results are reviewed for various continuous function approximations, and in [40, 41] the concavity of resource allocations is exploited to form a convergent algorithm.

3 Distributed smart grid problem

Here we formulate the distributed smart grid optimization problem as a distributed dynamic OPF problem. To this end the distributed dynamic smart grid problem and distributed approximate power flow is presented. This section then concludes with the calculation of voltage estimation errors as a measure of the limitation of a distributed approach.

3.1 Distributed dynamic OPF

We consider a distribution network with sensitivities \(\Lambda\), and featuring controllable DG and storage. The goal of dynamic OPF is to minimize costs \(\sum _t g(x_t,u_t)\) over a time window [0, T], by changing the control sequence \(\{u_t\}\), subject to state transition \(x_{t+1} = f_x(x_t, u_t, w_t)\). We therefore define the cost-to-go according to (3) for control sequence \(u=\{u_t\}\) recursively as

$$\begin{aligned} J_u(x_t) = g(x_t, u_t) + E_{w_t}\left[ J_u(x_{t+1}) | x_t, u_t \right] \end{aligned}$$
(7)

where \(J_u(x_t)\) represents the cost of network operation and power generation and, in the case of a deregulated competitive market, includes power import from third parties, and may also include bias towards renewable and distributed generation. Sequence \(\{x_t\}\) and \(\{u_t\}\) define the real and reactive power output and consumption of DG, loads, storage and smart devices, and storage capacities.

The vector of bus powers corresponding to generator busses is defined as \({\varvec{S}}_{\text{DG}}\) and is constrained by minimum and maximum complex magnitudes

$$\begin{aligned} {\varvec{S}}_{\text{DG}}^- \le |{\varvec{S}}_{\text{DG}}| \le {\varvec{S}}_{\text{DG}}^+. \end{aligned}$$
(8)

Similarly, bus powers corresponding to storage are defined as \({\varvec{S}}_{\text{S}}\) and are constrained by

$$\begin{aligned} {\varvec{S}}_{\text{S}}^- \le |{\varvec{S}}_{\text{S}}| \le {\varvec{S}}_{\text{S}}^+. \end{aligned}$$
(9)

The control vector is then defined as \({\varvec{u}}_t=[{\varvec{S}}_{{\text{DG}},t}\ {\varvec{S}}_{{\text{S}},t}]\). The vector of bus powers corresponding to load busses is defined as \({\varvec{S}}_{\text{L}}\), and the vector of storage capacities is defined as \({\varvec{q}}\) and is subject to constraints

$$\begin{aligned} 0 \le {\varvec{q}} \le {\varvec{q}}^+. \end{aligned}$$
(10)

The state vector is then defined as \({\varvec{x}}_t=[{\varvec{S}}_{{\text{L}},t}\ {\varvec{q}}_t]\). Finally we define the noise vector as \({\varvec{w}}_t=[\Delta {\varvec{S}}_{{\text{L}},t}\ \Delta {\varvec{S}}_{{\text{DG}},t}]\), where \(\Delta {\varvec{S}}_{{\text{L}},t}\) is a random variation in load power, and \(\Delta {\varvec{S}}_{{\text{DG}},t}\) is a random variation in DG output.

The network must be operated within the regulatory voltage limits specified by

$$\begin{aligned} \delta ^-&\le {\tilde{\delta}}_t \le \delta ^+,\nonumber \\ |v_t^-|&\le | {\tilde{v}}_t| \le |v_t^+|, \end{aligned}$$
(11)

where the voltage approximations \([{\tilde{\delta}}_t\ |{\tilde{v}}_t|]^{\mathrm{T}}\) of (13) have been used. Constraints (8), (9), (10) and (11) together form the inequality constraints \(G(x_t,u_t)\).

To present the distributed dynamic OPF problem we assume that costs, controls and state are separable and can therefore be handled by local controllers. Then, given a subset of network busses B with a strong coupling to the local controller, the problem is formally presented as follows:

$$\begin{aligned}&\underset{u_B}{\min }\,J_{u_B}(x_{B,0})\nonumber \\ {\text{s.t.}} \quad &x_{B,t+1} = f_x(x_{B,t}, u_{B,t}, w_{B,t}),\nonumber \\&G_B(x_{B,t},u_{B,t}) \le 0,\ \forall \ t \in [0,T) \end{aligned}$$
(12)

where \(u_B=\{ u_{B,0},u_{B,1},\dots ,u_{B,T-1} \}\), \(u_{B,t}\in u_t\), \(x_{B,t}\in x_t\) and \(w_{B,t}\in w_t\). An illustration of a network's subset structure is given in Fig. 1. The solution to this distributed optimal power flow problem is given in Sect. 4, where approximate dynamic programming is applied.

3.2 Distributed power flow

We denote the subset of busses known to the local controller as B, and all other busses in the network as \(\sim B\). Changes in state at the busses of \(\sim B\) will impact the state at the busses of B, leading to estimation inaccuracies. We quantify this through the following linear approximation of (1).

$$\begin{aligned} {\left[\begin{array}{l} {\tilde{\delta}}_t\\ | {\tilde{v}}_t| \end{array}\right]}_B&= {\left[ \begin{array}{l} \delta _{0}\\ |v_{0}| \end{array}\right]}_B + {\varvec{\varLambda}}_{B,B} {\left[ \begin{array}{l} \Delta P_t\\ \Delta Q_t \end{array}\right]}_B + \Delta {\varvec{v}}_{B,\sim B}\nonumber \\ \Delta {\varvec{v}}_{B,\sim B}&= {\varvec{\varLambda}}_{B,\sim B} {\left[ \begin{array}{l} \Delta P_t\\ \Delta Q_t \end{array}\right]}_{\sim B} \end{aligned}$$
(13)

where \([{\tilde{\delta}}_t\ | {\tilde{v}}_t|]^{\mathrm{T}}_B\) are the approximate voltages in B, \({\varvec{\varLambda}}_{B,B}\) are the self-sensitivities within B, \({\varvec{\varLambda}}_{B,\sim B}\) are the sensitivities of busses in B with respect to external changes, \([\Delta P_t\ \Delta Q_t]^{\mathrm{T}}_B\) are the changes in active and reactive power since time \(t=0\) at busses in B, and \([\Delta P_t\ \Delta Q_t]^{\mathrm{T}}_{\sim B}\) are the corresponding changes at all busses in \(\sim B\).

The advantage of a distributed approach can be seen in (13). The subset of the global state that is weakly coupled to the local controller is reduced to a single term, \(\Delta {\varvec{v}}_{B,\sim B}\), which can be approximated as constant for the duration of the local controller's optimization. In Sect. 4.1 we present an algorithm based on \(\epsilon\) decomposition that defines the strongly coupled subset B based on the value of \(\Delta {\varvec{v}}_{B,\sim B}\).

3.3 Distributed voltage approximation error

The sensitivity matrix \({\varvec{\varLambda}}_{t}\) is time variant, and we approximate it as constant at its value at time \(t=0\). We denote the error introduced at time t as \(e_t^{{\varvec{\varLambda}}}\), which depends on the size of \([\Delta P_t\ \Delta Q_t]^{\mathrm{T}}\).

An error is also introduced due to the approximation of local voltages given in (13) due to a lack of real-time global state information. This error is quantified as the difference between the true change in voltage and the approximation given in (13):

$$\begin{aligned} e_t^{v}&= {\left[ \begin{array}{l} \delta _t\\ |v_t| \end{array} \right]}_B - {\left[ \begin{array}{l} {\tilde{\delta}}_t\\ | {\tilde{v}}_t| \end{array}\right]}_B\nonumber \\&= {\varvec{\varLambda}}_{B,\sim B} {\left[ \begin{array}{l} \Delta P_t\\ \Delta Q_t \end{array}\right]}_{\sim B} - \Delta {\tilde{\varvec{v}}}_{B,\sim B} \end{aligned}$$
(14)

where \(\Delta {\tilde{\varvec{v}}}_{B,\sim B}\) is the last known value of the change in voltages in B due to the network state external to B. Under normal operating conditions the sensitivities change slowly, justifying the linear approximation of (13). As such, typically \(e_t^{\varvec{\varLambda}} \ll e_t^v\), and so we concentrate on reducing \(e_t^v\). This error represents a limitation of the distributed approach, and it is reduced through the iterative process between the global and local controllers described in Sect. 4.3.

\(e_t^{\varvec{\varLambda}}\) remains acceptably small while the changes in injected power vary minimally; if the network state changes significantly, the central coordinator can recalculate and redistribute the relevant portions of the sensitivity matrix. This process is further discussed in Sect. 4.3.
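For concreteness, the error of (14) can be evaluated in a few lines once the external sensitivities and power deviations are known (a sketch under the assumption that these quantities are available; the function and argument names are illustrative):

```python
import numpy as np

def voltage_error(Lambda_B_ext, dP_ext, dQ_ext, dv_last_known):
    """e_t^v of (14): the true external influence on B's voltages minus the
    last value of Delta v_{B,~B} received from the coordinator."""
    true_influence = Lambda_B_ext @ np.concatenate([dP_ext, dQ_ext])
    return true_influence - dv_last_known
```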

Fig. 1 Distributed network structure

4 Proposed coordinated distributed solution

A solution to the distributed problem presented in Sect. 3.1 is now offered as a coordinated, distributed, iteratively refined approximate dynamic program. To achieve this, the global problem must first be reduced to a set of distributed problems. This is achieved through a power flow based \(\epsilon\) decomposition. The distributed problem is then solved through an ADP algorithm whose approximation of state and optimal control is refined through the introduction of a central coordinator.

4.1 Power flow based \(\epsilon\) decomposition

The following algorithm’s objective is to define a set of strongly coupled busses, B, while minimizing voltage estimation errors at the controlled bus, \(b\in B\), of a local controller. For the purpose of this study we assume that each controllable device in the smart grid has a local controller at its bus, which we designate as its controlled bus. The size of B is constrained such that both the communication and computation burdens at the local controller are reduced. This is achieved through observation of both the sensitivity matrix and forecast shifts in power and is based on \(\epsilon\) decomposition (see [31] for an example of \(\epsilon\) decomposition).

To minimize the impact of external state changes in the distributed power flow calculation of (13) and therefore reduce the error of (14) and improve state estimation, we must aim to minimize \(\Delta v_{B,\sim B}\). As such we apply \(\epsilon\) decomposition to the change in voltage at the controlled bus:

$$\begin{aligned} \Delta v_b = {\varvec{\varLambda}}_b \left[ \begin{array}{l} \Delta P\\ \Delta Q \end{array} \right] = {\varvec{\varLambda}}_{b,B} {\left[ \begin{array}{l} \Delta P\\ \Delta Q \end{array}\right]}_B + \epsilon R \end{aligned}$$
(15)

where R is a residual vector with all values less than 1, and \(\epsilon\) is a scalar that quantifies the level of decoupling of subset B. We refer to the value of B that minimizes \(\epsilon R\) as the \(\epsilon\)-tolerant subset.

Clearly \(\epsilon R = \Delta v_{b,\sim B}\) from (13) and must be minimized across the time window of the optimization in order to find the best subset B. To this end we define the largest likely shift in active and reactive power from forecast data up to time T to be

$$\begin{aligned} \left[ \begin{array}{l} \Delta P_\mathrm{max}\\ \Delta Q_\mathrm{max} \end{array}\right] = \arg \max \left\{ \left\| \begin{array}{l} \Delta P_t\\ \Delta Q_t \end{array} \right\| , t\in [0,T) \right\} \end{aligned}$$
(16)

where \(\Vert \cdot \Vert\) denotes the vector’s norm. Then the \(\epsilon\) decomposition can be performed as a constrained minimization of \(\Delta v_{B,\sim B}\):

$$\begin{aligned}&B = \underset{B \subset [1,n]}{\arg \min } \left\| {\varvec{\varLambda}}_{b,\sim B} {\left[ \begin{array}{l} \Delta P_\mathrm{max}\\ \Delta Q_\mathrm{max} \end{array}\right]}_{\sim B} \right\| \nonumber \\&C_B \le C_{\max } \end{aligned}$$
(17)

for an n-bus network, where \(C_B\) is the number of controllable units in B, and \(C_{\max }\) is the maximum number of controllable units allowed for any local subset.

On a practical note, this minimization can be achieved with relative ease if we define the product of the sensitivity matrix and changes in power as an ordered sum. That is

$$\begin{aligned} {\left[\begin{array}{l} \Delta \delta \\ \Delta |v| \end{array}\right]}_b = {\varvec{\varLambda}}_b \left[ \begin{array}{l} \Delta P_\mathrm{max}\\ \Delta Q_\mathrm{max} \end{array} \right] = \left[ \begin{array}{l} \sum \limits _{i=1}^{n} \left( \frac{\partial \delta _b}{\partial P_i}\Delta P_i + \frac{\partial \delta _b}{\partial Q_i}\Delta Q_i \right) \\ \sum \limits _{i=1}^{n} \left( \frac{\partial |v_b|}{\partial P_i}\Delta P_i + \frac{\partial |v_b|}{\partial Q_i}\Delta Q_i \right) \end{array}\right] \end{aligned}$$
(18)

where \(P_i\) and \(Q_i\) are the elements of \([\Delta P_{\max }\ \Delta Q_{\max }]^{\mathrm{T}}\). We can then take the \(C_{\max }\) most significant elements of the sum as our B; the remaining summands then make up the minimal \(\Vert {\varvec{\varLambda}}_{b,\sim B}[\Delta P_{\max }\ \Delta Q_{\max }]^{\mathrm{T}}_{\sim B}\Vert\).
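A sketch of this selection follows (Python), assuming the controlled bus's two sensitivity rows and the forecast maximum deviations of (16) are available; for simplicity it ranks whole busses, whereas (17) counts controllable units:

```python
import numpy as np

def epsilon_tolerant_subset(Lambda_b, dPQ_max, c_max):
    """Rank busses by their contribution to the ordered sum (18) and keep
    the c_max most significant as the strongly coupled subset B.
    Lambda_b: (2, 2n) rows d(delta_b)/d(P,Q) and d|v_b|/d(P,Q)."""
    n = len(dPQ_max) // 2
    contrib = np.zeros(n)
    for i in range(n):
        cols = [i, n + i]                    # P and Q columns of bus i
        contrib[i] = np.abs(Lambda_b[:, cols] @ dPQ_max[cols]).sum()
    B = np.sort(np.argsort(contrib)[::-1][:c_max])
    residual = np.delete(contrib, B).sum()   # proxy for the minimized norm
    return B, residual
```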

Through this process the size of \(\Delta v_{B,\sim B}\) is reduced and therefore the likely local impact of changes external to the local controller are also reduced. The value of \(\Delta v_{B,\sim B}\) is approximated as constant and further refined through information updates as described in Sect. 4.3.

Remark

The optimality of (12) depends on the error in state, which is determined by \(e^v\) from (14). \(\epsilon\) from (15) governs the size of \(\Delta v_{B,\sim B}\) and therefore the size of \(e^v\), and \(e^v\) determines the error in state since voltage \(v \subset x\). Consequently \(\epsilon\) indicates the deviation from optimality in (12). Moreover, if \(e^v\) can be reduced, the approximation of optimality may also be improved.

4.2 Distributed optimization through ADP with partial state information

To solve the problem of (12) we must first be able to calculate the cost-to-go from (3), which involves a difficult-to-calculate expectation term. As such, the expectation is replaced with an approximation defined as \({{\tilde{V}}_t}(x_t)\), and \(J_u(\cdot )\) is approximated as follows:

$$\begin{aligned} {\tilde{J}}_u(x_t) = g(x_t, u) + \gamma _t {{\tilde{V}}_t}(x_t^{u}), \end{aligned}$$
(19)

where \(x_t^{u}\) is the post-decision state, \({{\tilde{V}}_t}(\cdot )\) is the approximation of expected future costs, and \(\gamma _t\) is a discount weight applied to future costs. Assuming the estimator \({{\tilde{V}}_t}(\cdot )\) is available, the difficulty in applying (19) to solving (12) is then only due to the dimensionality of u, which has been reduced through the process described in Sect. 4.1.

Training of \({{\tilde{V}}_t}(\cdot )\) is performed according to the iterations of Algorithm 1 by the local controller with controlled bus b and \(\epsilon\)-tolerant subset B. The local controller of Fig. 2 provides a simplified view of the process. In Fig. 2, \({\varvec{\varLambda}}_{B,B}\) is the local sensitivities; \(Y_B\) is the local admittances; \(\Delta {\tilde{v}}_{B,\sim B}\) is the external influences on voltage; \(v_{B,0}\) is the local voltages at time \(t=0\); k is the local iteration counter from 1 to K; \(U_B\) is the local admissible controls; \(u_B^{(k)}\) is the sample local control path; \(u_b^{(K)}\) is the approximation of optimal control at b.

Analysis of the presented algorithm reveals that the complexity of the ADP training is independent of total network size. To see this, consider the three most significant steps: the minimization of (20), the next-state calculation of Step 2.5, and the sample calculations of Step 3.1. Assuming a quadratic cost function gives the complexity of \(g_B(\cdot )\) as \(O(|B|^2)\), where \(|B|=\max \{|x|,|u|\}\) is the dimensionality of the local network subset. Given k samples at iteration k, we assume that the complexity of the estimator is O(k|B|). Then the complexity of each minimization step is \(O(|B|^2)+O(k|B|)\), and the number of steps required is assumed to depend only on |B|. The next-state function \(f_x(\cdot )\) is assumed linear and therefore has complexity O(|B|). Finally, the sample calculations depend only on \(g_B(\cdot )\) and therefore have complexity \(O(|B|^2)\). The complexity of the algorithm therefore depends on the horizon T, the iteration limit K, and the network subset size |B|, which depends on the choice of \(C_{\max }\) in (17), and does not depend on the total network size.
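Since the pseudocode of Algorithm 1 appears only as a figure, the following is a hedged reconstruction of the training loop it describes, consistent with the forward sampling and value updates discussed above; the exploration step governed by \(\rho ^{(k)}\) (see the remark below) is omitted for brevity, and v_tilde stands for any estimator exposing predict and update:

```python
def train_local_adp(x0, admissible, step, cost_g, v_tilde, T=12, K=50):
    """Sketch of the local training loop: K sample paths, greedy controls
    per (6), then cost-to-go targets for the value estimator."""
    for k in range(1, K + 1):
        x, path = x0, []
        for t in range(T):                        # forward pass: one sample path
            u = min(admissible(t, x),
                    key=lambda u: cost_g(x, u) + v_tilde.predict(t, step(x, u)))
            path.append((t, x, u))
            x = step(x, u)                        # noise w_t folded into step here
        future = 0.0
        for t, x, u in reversed(path):            # backward pass: value targets
            v_tilde.update(t, step(x, u), future) # V~ estimates costs after t
            future += cost_g(x, u)
```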

Remark

The independence of Algorithm 1 from the total network size allows the algorithm to be scaled to large networks, while computational requirements can be tuned through the parameters T, K, and \(C_{\max }\).

Remark

At Step 2.2 in Algorithm 1, \(\rho ^{(k)}\) is close to 0 for small values of k and close to 1 for large values of k. The choice of \(\rho\) determines the rate of convergence, that is, how much the policy explores the state space before exploiting knowledge from previous sample paths.

Algorithm 1
Fig. 2 Central iterations

4.3 Refining distributed estimates through central coordination

The solution from the algorithm of Sect. 4.2 relies on the approximation of voltage, based on \(\Delta {\tilde{v}}_{B,\sim B}\). To improve this estimate, updated values of \(\Delta {\tilde{v}}_{B,\sim B}\) are sent to the local controllers over a series of iterations by a central coordinator. The central coordinator is responsible for improving the local controllers’ state estimation in order to reduce the likelihood of constraint breaches and to bring their solutions closer to optimal.

Such information exchange must be done with great care since it introduces a feedback loop in the global system. Specifically, oscillations may result from adjacent local controllers adjusting their controls in response to each other. To mitigate this problem the controls are aggregated to form \(u_t=\{ u_{b,t} | \forall b \}\), for all controlled busses b, and are dampened by the introduction of a control step size \(\alpha \in (0,1]\) in the following iterative stochastic approximation:

$$\begin{aligned} \hat{u}_t^{(j)} = \alpha u_t^{(j)} + (1 - \alpha )\hat{u}_t^{(j-1)} \end{aligned}$$
(21)

where \(\alpha\) is referred to as the step size since it dictates how far we move our jth approximation of the optimal control, \(\hat{u}_t^{(j)}\), in the direction of the new control policy, \(u_t^{(j)}\).

Algorithm 2 describes the process for exchanging updated external voltage approximations with the local controllers, utilizing the dampened control values specified by (21) (the central coordinator of Fig. 2 provides a simplified view of the process). The algorithm handles delayed information exchange by simply assigning \([P_t^{(j)}\ Q_t^{(j)}]_B:=[P_t^{(j-1)}\ Q_t^{(j-1)}]_B\) at Step 4 when new information is not available from subset B at central iteration (j). This slows convergence, but so long as new information is received regularly the convergence argument of Sect. 4.4 holds.
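The dampened update of (21), together with the reuse of stale values under delay, can be sketched as follows (Python; dictionary keys index the controlled busses, and all names are illustrative assumptions):

```python
def damped_update(u_new, u_prev, alpha=0.8):
    """Stochastic-approximation step of (21)."""
    return alpha * u_new + (1.0 - alpha) * u_prev

def central_iteration(u_hat, proposals, arrived, alpha=0.8):
    """One coordinator pass: controllers whose updates arrived are damped
    toward their proposals; delayed controllers keep their previous values,
    mirroring the reassignment at Step 4 of Algorithm 2."""
    return {b: damped_update(proposals[b], u_hat[b], alpha) if arrived[b]
            else u_hat[b]
            for b in u_hat}
```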

At each iteration of Algorithm 2 the network state is assessed (refer to Step 2) and, if required, the local approximations \({\varvec{\varLambda}}_{B,B}\) are updated (see Sect. 3.3 for a discussion of the error due to the constant \({\varvec{\varLambda}}\) approximation). This update to the sensitivity matrix is performed according to the aggregated network controls defined by (22) for the present time. The relevant portions of the sensitivity matrix, \({\varvec{\varLambda}}_{B,B}\), are then distributed to the local controllers, which use the updated matrix for subsequent calculations. In this way the linear approximation of power flow through a time-invariant \({\varvec{\varLambda}}\) can be adapted to state and model drift.

Remark

The control variables of Algorithm 1 are continuous with respect to time; as such, the algorithm approximates optimal control as constant for any given time step. However, each central iteration of Algorithm 2 triggers the local controller to update its optimal values, so the control update rate depends only on the central coordinator's update rate. This short update period allows regular corrections to the optimal control in response to system state changes, which are assumed minimal within the period.

Remark

At Step 5.3, in Algorithm 2, \(\Delta v_{B,\sim B}\) summarizes the state of the loosely coupled network busses with respect to local controller B and is therefore able to reduce the required information exchange between the central coordinator and local controllers.

Algorithm 2

4.4 Convergence of dampened information exchange

Here we provide a heuristic explanation of the convergence resulting from appropriate selection of the step size \(\alpha\). Consider a network under steady-state operation that experiences a change in controls by local controller B. Let us define the change in controls at B at iteration j as

$$\begin{aligned} \Delta u^{(j)}_B = u_B^{(j)} - u_B^{(j-1)} \end{aligned}$$
(24)

and the impact of this change on the voltage of the remaining busses in the network as

$$\begin{aligned} \Delta v_{\sim B,B}^{(j)} = {\varvec{\varLambda}}_{\sim B,B} \left[ \begin{array}{l} \Delta P^{(j)}\\ \Delta Q^{(j)} \end{array}\right]_B \end{aligned}$$
(25)

where \([\Delta P^{(j)}\ \Delta Q^{(j)}]_B^{\mathrm{T}} \in u^{(j)}_B\). From (25) we can see that the changes in voltage at external busses have a linear relationship with the change in power resulting from the change in control at B. As such we assume

$$\begin{aligned} \left| \Delta v_{\sim B,B}^{(j-1)}\right| < \theta \left| \Delta u_{B}^{(j-1)}\right| \quad \forall \ j > 1 \end{aligned}$$
(26)

for constant \(\theta \in (0,\infty )\). We further assume, based on (13), that the change in controls \(\Delta u_{\sim B,B}^{(j)}\) in response to the change in voltage, \(\Delta v^{(j-1)}_{\sim B,B}\), is also linearly constrained. As such we assume

$$\begin{aligned} \left| \Delta u_{\sim B,B}^{(j)}\right| < \phi \left| \Delta v_{\sim B,B}^{(j-1)}\right| \quad \forall \ j > 1 \end{aligned}$$
(27)

for constant \(\phi \in (0,\infty )\). Finally we assume that corresponding constraints exist for changes in control external to B influencing the voltage and control at B such that

$$\begin{aligned} \left| \Delta v_{B,\sim B}^{(j)}\right| < \theta \left| \Delta u_{\sim B,B}^{(j)}\right| \quad \forall \ j > 0 \end{aligned}$$
(28)

and

$$\begin{aligned} \left| \Delta u_{B,\sim B}^{(j+1)}\right| < \phi \left| \Delta v_{B,\sim B}^{(j)}\right| \quad \forall \ j > 0 \end{aligned}$$
(29)

The constant \(\theta\) represents the limit of the network's response to a local change in control, and \(\phi\) the limit of the local control adjustment to a local change in voltage. For assumptions (26) to (29) to hold we must assume that the variation of \({\varvec{\varLambda}}\) is bounded for all \(j>0\), since the constants \(\theta\) and \(\phi\) depend on \({\varvec{\varLambda}}\), which is in fact a function of voltage according to the partial derivatives of the power flow equations. This assumption is reasonable while the network operates within voltage constraints according to (12).

Given that the true changes in control are \(\alpha \Delta u\), and given assumptions (26) to (29), then

$$\begin{aligned} \left| \Delta u_{B,\sim B}^{(j+1)}\right|&< \phi \left| \Delta v_{B,\sim B}^{(j)}\right| \nonumber \\&< \alpha \phi \theta \left| \Delta u_{\sim B,B}^{(j)}\right| \\&< \alpha \phi ^2\theta \left| \Delta v_{\sim B,B}^{(j-1)}\right| \nonumber \\&< \alpha ^2\phi ^2\theta ^2\left| \Delta u_{B}^{(j-1)}\right| \nonumber \end{aligned}$$
(30)

Given that \(\Delta u\) is in fact a random variable, further assumptions must be placed on \(\alpha\). We assume that \(\alpha\) is non-negative and, when varied over iterations, satisfies \(\sum _{j=0}^\infty \alpha _j=\infty\) and \(\sum _{j=0}^\infty \alpha _j^2<\infty\). Then, given a choice of \(\alpha\) that satisfies \(\alpha ^2\phi ^2\theta ^2 < 1\), it follows that as \(j \rightarrow \infty\), \(u^{(j+1)} - u^{(j)} \rightarrow 0\) and the network will again return to steady-state operation.
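A small numeric illustration of this contraction follows; the constants are arbitrary assumptions chosen only to satisfy \(\alpha ^2\phi ^2\theta ^2 < 1\):

```python
# Round-trip gain from (30): each pair of central iterations scales |du|
# by at most alpha^2 * phi^2 * theta^2, which must be below 1.
alpha, phi, theta = 0.8, 1.5, 0.7
gain = (alpha * phi * theta) ** 2              # ~0.706 here
du = 1.0
for j in range(10):
    du *= gain                                 # |du| after each round trip
print(f"gain={gain:.3f}, |du| after 10 round trips = {du:.2e}")
```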

5 Case study

We consider the case of an operator controlling DG and storage in a distribution network with the aim of minimizing power import into the network from third-party suppliers. Tests were conducted on networks of a range of sizes, with optimization achieved through control of both DG and storage; for the sake of a simpler presentation, only the constraints of (8)–(10) and the voltage magnitude constraints of (11) were applied. Formally, we aimed to approximately solve (2) with \(J(\cdot )\) approximated by (19), and

$$\begin{aligned} g(x_t,u_t)=|S_{0,t}|{{{\mathrm{sgn}}}}({{\mathrm{Re}}}\{S_{0,t}\}), \end{aligned}$$
(31)

where \(S_{0,t}\) is the complex power at time t at bus 0, with the slack bus assumed to be at index 0 with respect to the voltage and power vectors \(v_t\) and \(S_t\) and the admittance matrix Y. The future cost-to-go approximation defined as \({\tilde{V}}_t(x_t)\) in (19) was implemented through kernel regression with a Gaussian kernel [50].
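One way to realize such an estimator is Nadaraya–Watson kernel regression; the sketch below is a minimal version of this idea, not necessarily the exact implementation of [50]:

```python
import numpy as np

class GaussianKernelRegressor:
    """Nadaraya-Watson regression with a Gaussian kernel: a simple form of
    the cost-to-go approximation V~_t(x) fitted from sampled states."""
    def __init__(self, bandwidth=1.0):
        self.h, self.X, self.y = bandwidth, [], []

    def update(self, x, target):
        self.X.append(np.asarray(x, dtype=float))
        self.y.append(float(target))

    def predict(self, x):
        if not self.X:
            return 0.0                        # prior before any samples
        d2 = np.sum((np.asarray(self.X) - np.asarray(x, float)) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / self.h ** 2)   # Gaussian kernel weights
        return float(w @ np.asarray(self.y) / max(w.sum(), 1e-12))
```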

5.1 Scenarios

Coordinated, distributed optimization was applied to both a small scale network and a series of randomly generated networks of varying sizes. Experiments on the small scale network were aimed at verifying the coordinated, distributed algorithm’s ability to perform comparably with centralized approaches in terms of optimality and state estimation, and to assess the algorithm’s convergence with information exchange delays. The larger network experiments aimed to assess the coordinated, distributed algorithm’s scalability in terms of convergence and processing time.

The small scale tests were performed on a network based on the IEEE 13 node test feeder [51], featuring both DG and storage. Distributed generators were connected to busses 611, 645, 646, 675, 680 and 684, with a total maximum output potential greater than the network's total peak load. Storage was connected to busses 632, 645, 671 and 684, each with a 1 MWh capacity.

The large scale tests used randomly generated networks with operating conditions similar to the IEEE 13 node test feeder, featuring DG, storage and stochastic loads at similar densities and capacities.

Forecast DG output and demand curves used by all scenarios are presented in Fig. 3.

Fig. 3 12 hour forecast

5.2 Localization

Applying the \(\epsilon\) decomposition of Sect. 4.1 resulted in the local controller \(\epsilon\)-tolerant subsets as described in Table 1.

Table 1 Network \(\epsilon\) decomposition

For the problem specified by (31) to be solved by each subset, the local version of the cost function must first be defined according to the distributed OPF problem of (12) and the distributed ADP problem of (20). The local cost contribution is derived by calculating the changes in power imported into the distribution network due to changes in the busses of subset B. The total power import can be given as

$$\begin{aligned} S^*_{0,t}&=v^*_{0,t}Y_0v_t,\nonumber \\&=v^*_{0,t}Y_0(v_{0}+\Delta v_t) \end{aligned}$$
(32)

where \(Y_0\) is the admittance matrix row corresponding to the slack bus, and \(\Delta v_t\) is derived from \({\varvec{\varLambda}} [\Delta P_t\ \Delta Q_t]^{\mathrm{T}}\) with \([P_t\ Q_t]\subset (u_t \cup w_t)\). Given that many terms in (32) are constant and assuming the slack voltage is 1 p.u., the minimization can be given as

$$\begin{aligned} \underset{u_t}{\min } |S_{0,t}|=\underset{u_t}{\min } |Y_0\Delta v_t| \end{aligned}$$
(33)

This can then be applied to the local changes in subset B to give the local cost function:

$$\begin{aligned} g_B(x_t,u_t)=|Y_0\Delta v_{\cdot ,B,t}|{{\mathrm{sgn}}}({{\mathrm{Re}}}\{Y_0\Delta v_{\cdot ,B,t}\}) \end{aligned}$$
(34)

where \(\Delta v_{\cdot ,B,t}\) are the changes in complex voltage due only to changes in control in subset B.
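A sketch of evaluating this local cost follows (Python; Y0_B is assumed to be the slack-bus admittance row restricted to B's columns and dv_B the complex voltage changes attributable to B, both obtained via (13)):

```python
import numpy as np

def local_cost(Y0_B, dv_B):
    """Local stage cost of (34): signed magnitude of the slack-bus power
    change attributable to subset B's controls."""
    s = Y0_B @ dv_B                     # complex change at the slack bus
    return float(np.abs(s) * np.sign(s.real))
```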

5.3 Results

Here we present the results of the simulations. The following demonstrations illustrate the coordinated, distributed optimization algorithm’s ability to maintain costs compared to a centralized approach, to maintain voltages without full network state information, to be stable under delayed information exchange, and to maintain performance with increasing numbers of local controllers. Numerous executions of the simulation were performed to ensure that the results presented here are a representative set for the average case.

Centralized, distributed and coordinated cost comparison

The scenario was deliberately selected such that a centralized comparison could be made. Here we present the minimized costs according to four optimization approaches:

1) A deterministic dynamic program using expected values for random variables;

2) The ADP algorithm of Sect. 4.2 applied in a centralized manner to the entire network;

3) The coordinated, distributed algorithm;

4) The average from a series of random control sequences, used for relative comparison.

The coordinated, distributed results were taken after a number of iterations, once the algorithm had stabilized. The results depicted in Fig. 4 show that although there has been a drastic reduction in state information (refer to Table 1), the coordinated, distributed algorithm is able to provide a good approximation of the optimal solution.

Fig. 4 Cost comparison of optimization methods

Voltages

As discussed in Sect. 3.3, because local controllers possess only a subset of the full network's state, voltage calculations are approximate only. This raises the possibility of underestimating voltages and subsequently approximating optimal controls that lead to voltage breaches according to \(G(x_t,u_t)\) in Sect. 3.3. Here we demonstrate the ability of the central coordination to reduce the chance of such breaches. Figure 5 shows the maximum network voltages at each iteration for times \(t=0,1,2,3,4\) (other times did not exhibit voltage breaches at any iteration). At iteration 1, when local controllers have no global state information, locally optimal controls result in voltage breaches. At iteration 2, after dampened information has been shared, each local controller overreacts, drastically reducing the voltage. Subsequent iterations result in stabilization of the voltages within the voltage magnitude constraints of (11).

Fig. 5 Maximum voltage at each iteration

Information exchange delays

We simulate the case of delayed data transfer between local controllers and the central coordinator by associating a probability, \(p_u\), with each local controller which determines whether updated local state information is made available. For example, a probability of \(p_u=0.5\) represents the case where, on average, each local controller has updated data available only every second central iteration. The coordinated, distributed optimization was performed for a range of data transfer probabilities; results are presented in Fig. 6. The information delay behaves as a type of dampening. For minor delays, such as when there is only a \(20\%\) chance of information delay (\(p_u=0.8\)), the controls and therefore the cost converge quickly. As the value of \(p_u\) decreases, the system takes longer to converge. However, even when updates are received from local controllers only 20% of the time, the algorithm is still able to converge. Another side effect of the dampening effect of the delay is that optimization with delayed information may be less prone to overshoot. For example, if the case of no delay is compared with the case of \(p_u=0.8\), the delayed case has less overshoot and in fact converges more quickly.

Fig. 6 Cost convergence with information exchange delays

Dampened updates

To illustrate the importance of dampening the selected controls between central iterations in (21), the centrally calculated costs are compared over iterations with no dampening and with dampening at a step size of \(\alpha =0.8\). Results can be seen in Fig. 7. The issues discussed in Sect. 4.4 can clearly be seen, with the undampened case exhibiting oscillations. In addition to the oscillation in control and cost in the undampened case, the voltage estimates are unable to stabilize and so switch between over- and under-estimating. This results in breaches on every second central iteration. The dampened control case, on the other hand, stabilizes quickly.

Fig. 7 Effects of dampening controls on cost

Scalability

To test the scalability of the coordinated, distributed algorithm, it was applied to a range of randomly generated networks of varying sizes. The simulations were performed on a quad-core Pentium i5 with 16 GB of RAM running Windows 7. Table 2 lists the algorithm's processes and average execution times. The \(\epsilon\) decomposition has the longest processing time; however, it is not performed frequently.

Table 2 Process times

Figure 8 presents the processing time required for ADP training and optimization by the local controllers, and for voltage change updates made by the central coordinator. The timing samples were taken across 20 central iterations and give the average time per task per iteration. Voltage change updates show a quadratic increase in processing time as the number of local controllers increases. This is due to the calculation of (23). However, the time taken for these updates is significantly shorter than the time taken for ADP training and optimization.

ADP training and optimization results in Fig. 8 show that the local optimizations exhibit constant time processing regardless of the number of local controllers. These results are consistent with the complexity analysis performed in Sect. 4.2, which showed that, since there is a bound on the number of controllable devices within each local controller’s network subset, network size does not impact the processing requirements of the ADP training and optimization algorithm as it would in the centralized case.

Fig. 8 Process duration for a single central iteration

The series of test cases presented in Fig. 8 was also assessed for convergence. A subset of results was selected for time \(t=0\) and is presented in Fig. 9. The cost-to-go curves show good convergence after fewer than 20 central iterations across the range of local-controller counts tested, and the presented cases are representative of all experimental results.

Fig. 9 Cost-to-go convergence for networks with n local controllers

These results suggest that the coordinated, distributed algorithm can easily handle many local controllers. This is an important feature of the algorithm when considering the case of a more powerful coordinating server and numerous low powered local controllers.

6 Conclusions

We have presented a coordinated, distributed, constrained optimization algorithm for regulating smart grid technologies that utilizes the well established methods of approximate dynamic programming and optimal power flow. Our algorithm carefully summarizes global state information through \(\epsilon\) decomposition such that local controllers can improve their approximation of optimal control without being overburdened by the high-dimensional state of the entire distribution network. Additionally, the reduced state information is updated over a series of iterations controlled by a central coordinator, providing local controllers with continually improved estimations and allowing for asynchronous global information exchange.

The proposed coordinated, distributed algorithm features reduced dimensionality, lowering calculation complexity, and as such can be applied to on-line optimization, even in the case of low-powered distributed controllers. Complexity analysis of the local optimization algorithm has shown that it is independent of total network size, and as such the proposed distributed optimization approach is scalable to large networks. The centralized nature of the algorithm's coordination allows it to operate in an asynchronous manner, making it robust to communication delays, and the flexibility of the algorithm allows it to be adapted to the costs and constraints specific to the needs of the smart grid operator.

Through our case study simulation we have demonstrated how the use of a subset of network information can lead to an approximately optimal solution. We have also demonstrated the coordinated, distributed algorithm's ability to improve local state estimation of voltages and to perform well with respect to cost minimization when compared to centralized solutions. We have further shown that even in the presence of asynchronous information exchange, the coordinated, distributed approach can converge to a near-optimal solution. Finally, we have demonstrated the algorithm's ability to scale well with respect to the number of local controllers.