Abstract
The high penetration and uncertainty of distributed energies force the upgrade of volt-var control (VVC) to smooth voltage and var fluctuations faster. Traditional mathematical or heuristic algorithms are increasingly inadequate for this task because of their slow online calculation speed. Deep reinforcement learning (DRL) has recently been recognized as an effective alternative, as it shifts the computational burden to offline training and reduces the online calculation time to milliseconds. However, its slow offline training still limits its application to VVC. To overcome this issue, this paper proposes a simplified DRL method that simplifies and improves the training operations in DRL, avoiding invalid explorations and slow reward calculations. Because the DRL network parameters trained for the original topology are not applicable to new topologies, side-tuning transfer learning (TL) is introduced to reduce the number of parameters that must be updated in the TL process. Test results on the IEEE 30-bus and 118-bus systems verify the correctness and rapidity of the proposed method, as well as its strong applicability to large-scale control variables.
THE proportions of wind, photovoltaic, and other distributed energies in the power system have increased dramatically in recent years. Because of their random output and high-density injection, the voltage and var of the local grid often vary widely within a short time, e.g., the average voltage fluctuation of the 220 kV bus of a wind farm within 10 s can reach 6 kV, and the maximum fluctuation within 2 s can exceed 5 kV. These rapid fluctuations undoubtedly spawn the need to upgrade the volt-var control (VVC) [
VVC is essentially a mixed-integer nonlinear optimization problem that coordinates discrete reactive power regulation equipment (such as capacitors and transformer taps) and continuous equipment (such as static var compensators (SVCs), static var generators (SVGs), and the reactive power of generators) to achieve globally optimal operation of the power system [
To realize real-time response to voltage and var fluctuations, many scholars have recently introduced DRL, which has been applied in robot control, autonomous driving, and other complex control fields, into VVC [
DRL algorithms fall mainly into two categories: value-based and policy-based. The “actor-critic” type essentially belongs to the policy-based category, as it realizes a direct mapping from state to action by establishing an actor network. Meanwhile, it absorbs the advantage of value-based DRL algorithms, which evaluate the action value by establishing a critic network, and replaces the iteration-level update used in early policy-based algorithms with a single-step update, which greatly improves training efficiency. Therefore, the existing literature on applying DRL to VVC mainly adopts “actor-critic” type DRL algorithms, such as deep deterministic policy gradient (DDPG) [
However, the “actor-critic” type also brings certain defects while combining the advantages of the above two types of DRL algorithms [
In addition, the existing research on applying DRL to VVC only considers power systems with a fixed topology. When the topology changes, the actor and critic networks trained for the original topology are no longer applicable. However, in actual operation the system topology changes frequently due to equipment failures, load transfer, and routine maintenance. If the network parameters suitable for the new topology can only be obtained by repeating all the DRL training operations, the timeliness of applying DRL to VVC will be significantly reduced.
To overcome the above shortcomings of DRL applied to VVC, this paper proposes a simplified DRL-based VVC of topologically variable power system. Compared with existing literature, the main contributions of this paper are as follows.
1) Simplification of critic network training. The Agent and the Environment (power system) are set to interact only once in each iteration, so that the reward function directly serves as the action value of the Agent. The critic network training is then simplified to fitting the nonlinear relationship between the power system state and the node voltages in a supervised manner, and the traditional power flow calculation (PFC) can be replaced by a single forward calculation of the critic network.
2) Simplification of actor network training. As training a good actor network depends heavily on the judgment quality of the critic network, the actor network training is set to start only after the critic network training is completed. Large amounts of invalid exploration in the early training stage, which would otherwise be needed to form a better critic network, can thus be avoided, and the training efficiency of the actor network is significantly improved under the guidance of the well-trained critic network from the very start of its training.
3) Fast training of DRL-based VVC for a topologically variable power system. Side-tuning transfer learning (TL) is adopted to quickly obtain the network parameters suitable for a new topology with only light training of a newly established small network. Compared with conventional fine-tuning TL, the TL rate can be greatly improved.
The remainder of this paper is organized as follows. Section II is mainly the formulation of DRL-based VVC. Section III proposes the simplified DRL applied to VVC. Side-tuning TL-based VVC of topologically variable power system is elaborated in Section IV. Section V shows the general flowchart of the proposed method. The results of numerical tests are demonstrated in Section VI. Section VII states the conclusions.
To ensure the safety of system operation, traditional VVC usually selects the voltage deviation as the optimization index. Taking node voltages exceeding their limits as a penalty, the VVC mathematical model is commonly constructed as:
$$\begin{aligned}
\min\; & F=\sum_{i=1}^{n}\left|V_i-V_i^{\mathrm{ref}}\right|+\lambda\sum_{i=1}^{n}f_{\mathrm{p}}(V_i),\quad f_{\mathrm{p}}(V_i)=\max\big(0,\,V_i-V_{\max}\big)+\max\big(0,\,V_{\min}-V_i\big)\\
\mathrm{s.t.}\; & P_{\mathrm{G}i}-P_{\mathrm{L}i}=V_i\sum_{j=1}^{n}V_j\left(G_{ij}\cos\theta_{ij}+B_{ij}\sin\theta_{ij}\right)\\
& Q_{\mathrm{G}i}+Q_{\mathrm{C}i}-Q_{\mathrm{L}i}=V_i\sum_{j=1}^{n}V_j\left(G_{ij}\sin\theta_{ij}-B_{ij}\cos\theta_{ij}\right)\\
& Q_k^{\min}\leq Q_k\leq Q_k^{\max}
\end{aligned}\tag{1}$$
where $F$ is the objective function; $n$ is the number of system nodes; $V_i$ and $V_i^{\mathrm{ref}}$ are the voltage and its target value, respectively; $f_{\mathrm{p}}(V_i)$ is the off-limit penalty function of the voltage; $\lambda$ is the corresponding penalty coefficient; $V_{\max}$ and $V_{\min}$ are the upper and lower voltage limits, respectively; $P_{\mathrm{G}i}$ and $Q_{\mathrm{G}i}$ are the active and reactive power outputs of generators, respectively; $Q_{\mathrm{C}i}$ is the reactive power compensation; $P_{\mathrm{L}i}$ and $Q_{\mathrm{L}i}$ are the active and reactive loads, respectively; $G_{ij}$ and $B_{ij}$ are the conductance and susceptance of the line, respectively; $\theta_{ij}$ is the phase angle difference between the head and tail nodes; and $Q_k^{\max}$ and $Q_k^{\min}$ are the upper and lower regulation limits of reactive power equipment $k$, respectively.
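As a concrete illustration of the objective in (1), the following is a minimal Python sketch of its evaluation for a given voltage profile. The target voltage of 1.0 p.u., the limits of 0.95/1.05 p.u., and the penalty coefficient are illustrative assumptions, not values taken from this paper.

```python
import numpy as np

def vvc_objective(v, v_ref=1.0, v_min=0.95, v_max=1.05, penalty_coeff=100.0):
    """Evaluate the VVC objective of (1): total voltage deviation plus an
    off-limit penalty. Voltages are in p.u.; the reference, limits, and
    penalty coefficient here are illustrative assumptions."""
    v = np.asarray(v, dtype=float)
    deviation = np.abs(v - v_ref).sum()              # sum_i |V_i - V_i^ref|
    over = np.clip(v - v_max, 0.0, None)             # amount above the upper limit
    under = np.clip(v_min - v, 0.0, None)            # amount below the lower limit
    penalty = penalty_coeff * (over + under).sum()   # off-limit penalty term
    return deviation + penalty
```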
The main concepts involved in DRL include Agent, Environment, Action, State, and Reward. In this paper, Action, State, and Reward are abbreviated as A, S, and R, respectively. The goal of DRL is to train a policy $\pi$ that directly establishes the mapping from S to A and maximizes the total expected discounted reward of the Agent, i.e., $J(\pi)=\mathbb{E}\big[\sum_{t=0}^{T}\gamma^{t}R_t\big]$, where $\mathbb{E}$ is the mathematical expectation; $T$ is the number of interactions between the Agent and the Environment; and $\gamma$ is the discount factor of R. The parameters of $\pi$ are updated based on the data samples stored during the iterations. Finally, the well-trained Agent can produce excellent control strategies via $\pi$ with the minimum number of interaction steps when facing an entirely new S.
As shown in Fig. 1, when DRL is applied to VVC, these concepts correspond to the following elements.

Fig. 1 Concise schematic diagram of DRL.
1) Agent and Environment: the system operator or control program is set as Agent, and the power system interacting with Agent is set as Environment.
2) State: the set of power system real-time operating state parameters is set as S, which usually contains the active and reactive loads, the active power output of generators, and the operating state of all reactive power equipment.
3) Action: the control strategy of reactive power equipment generated for S is set as A.
4) Reward: the function that characterizes the quality of A is set as R. In fact, the commonly-used R in DRL is the same as the objective function F of traditional VVC mathematical model.
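To make the mapping above concrete, the following sketch assembles S and computes R from a power flow solution using PYPOWER (the toolkit used for PFC in the case studies of this paper). The column indices come from PYPOWER's idx_bus/idx_gen constants; the exact feature layout of S, the choice of generator voltage set-points as the continuous action, and the non-convergence penalty are illustrative assumptions rather than the paper's exact construction.

```python
import copy
import numpy as np
from pypower.api import case30, runpf, ppoption
from pypower.idx_bus import PD, QD, VM
from pypower.idx_gen import PG, QG, VG

def build_state(ppc):
    """State S: active/reactive loads, generator active outputs, and the
    current reactive operating point of the generators (transformer taps
    are omitted in this sketch)."""
    return np.concatenate([ppc["bus"][:, PD], ppc["bus"][:, QD],
                           ppc["gen"][:, PG], ppc["gen"][:, QG]])

def apply_action_and_reward(ppc, vg_setpoints, v_ref=1.0):
    """Action A: new voltage set-points of the generators (one possible
    continuous control variable). Reward R: negative total voltage
    deviation obtained from a power flow calculation."""
    ppc = copy.deepcopy(ppc)                  # do not modify the base case
    ppc["gen"][:, VG] = vg_setpoints
    result, success = runpf(ppc, ppoption(VERBOSE=0, OUT_ALL=0))
    if not success:
        return None, -1e3                     # assumed penalty for non-convergence
    v = result["bus"][:, VM]                  # solved node voltage magnitudes (p.u.)
    return v, -np.abs(v - v_ref).sum()

ppc = case30()
s = build_state(ppc)
v, r = apply_action_and_reward(ppc, ppc["gen"][:, VG])  # keep current set-points
```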
The simplification strategy proposed in this paper targets actor-critic-type DRL algorithms, whose essence is to build a critic network to evaluate the different A generated by the actor network and to continuously update the parameters of the two networks based on the data samples obtained from the continuous interactions between the Agent and the Environment. Finally, the critic network can produce the most accurate estimate of the value of A (abbreviated as Q), and the actor network can generate the best A, i.e., the one with the highest Q, for different scenes.
As shown in Fig. 2, taking DDPG as an example, the interactive update consists of the following three steps.

Fig. 2 Interactive update of DDPG.
1) Generation of training samples. As shown in (2), in each iteration, the S randomly generated by the Environment (power system) is input into the actor network to generate A (the reactive power equipment control strategy). After the noise-added A is applied back to the Environment, a new state S′, the reward R, and the termination flag Done are obtained. The data sample formed in this interaction is then stored in the replay buffer D.
$$\begin{cases}
A=\pi(S)+\mathcal{N}(\mu,\sigma^{2})\\
R=f_{\mathrm{PFC}}(S,A)\\
TS=\big(S,A,R,S',\mathit{Done}\big)
\end{cases}\tag{2}$$
where $TS$ is the generated training sample; $\mathcal{N}$ is the Gaussian distribution with expectation $\mu$ and variance $\sigma^{2}$; and $f_{\mathrm{PFC}}(\cdot)$ indicates that R is commonly calculated by the PFC of the power system.
2) Training of critic network. The training goal of the critic network is to satisfy the Bellman equation of Q shown in (3); that is, the value $Q(S,A)$ of the current action judged by the critic network equals the sum of R and the discounted value $Q(S',A')$ of the following action. Therefore, as shown in (4), the deviation between the target value and the estimated $Q(S,A)$, called the TD-error, is taken as the loss function to train the critic network parameters based on training samples randomly chosen from D.
$$Q(S,A)=R+\gamma' Q(S',A')\tag{3}$$
$$\begin{aligned}
L_{\mathrm{C}}&=\frac{1}{M}\sum_{m=1}^{M}\left[R_m+\gamma' Q(S'_m,A'_m)-Q(S_m,A_m)\right]^{2}\\
\theta_{\mathrm{C}}&\leftarrow\theta_{\mathrm{C}}-\alpha_{\mathrm{C}}\nabla_{\theta_{\mathrm{C}}}L_{\mathrm{C}}
\end{aligned}\tag{4}$$
where $M$ is the number of chosen training samples; $\gamma'$ is the discount factor used to estimate $Q(S',A')$; $\theta_{\mathrm{C}}$ is the critic network parameter vector; and $L_{\mathrm{C}}$ and $\alpha_{\mathrm{C}}$ are the loss function and learning rate of the critic network, respectively.
3) Training of actor network. As the training goal of the actor network is to generate, for all scenarios, the A with the highest Q as judged by the critic network, the negative of Q is directly used as the loss function to train the actor network, as shown in (5); a combined sketch of steps 1)-3) is given after (5).
$$\begin{aligned}
L_{\mathrm{A}}&=-\frac{1}{M}\sum_{m=1}^{M}Q\big(S_m,\pi(S_m)\big)\\
\theta_{\mathrm{A}}&\leftarrow\theta_{\mathrm{A}}-\alpha_{\mathrm{A}}\nabla_{\theta_{\mathrm{A}}}L_{\mathrm{A}}
\end{aligned}\tag{5}$$
where $\theta_{\mathrm{A}}$ is the actor network parameter vector; and $L_{\mathrm{A}}$ and $\alpha_{\mathrm{A}}$ are the loss function and learning rate of the actor network, respectively.
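The following compact sketch makes the interactive update of (2)-(5) concrete. The deep-learning framework is not specified in this paper, so PyTorch is used here as an assumption; the Gaussian noise scale, target networks, and omission of the Done flag are illustrative simplifications of a standard DDPG-style update, not the exact implementation used in the tests.

```python
import torch

def explore(actor, s, sigma=0.1):
    """(2): noise-added action for exploration (zero-mean Gaussian noise)."""
    with torch.no_grad():
        return actor(s) + sigma * torch.randn_like(actor(s))

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99):
    """One DDPG-style update following (3)-(5). `batch` holds tensors
    (s, a, r, s_next) sampled from the replay buffer D, with r of shape (M, 1)."""
    s, a, r, s_next = batch
    # (3)-(4): critic trained on the TD-error between target and estimate.
    with torch.no_grad():
        a_next = target_actor(s_next)
        q_target = r + gamma * target_critic(torch.cat([s_next, a_next], dim=1))
    q = critic(torch.cat([s, a], dim=1))
    critic_loss = ((q_target - q) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    # (5): actor trained to maximize Q, i.e., to minimize -Q.
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```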
For “actor-critic” type DRL algorithms, the premise for the actor network to achieve good control effects is that the critic network can make accurate judgments on the value Q of different A. Since traditional DRL must consider the impact of the current A on the future power system, the value Q judged by the critic network has to satisfy the Bellman equation of Q, an equality constraint that couples the current and subsequent A. Compared with end-to-end supervised training, this kind of training undoubtedly increases the training difficulty and consumes more training samples. In addition, when DRL is applied to VVC, the value R contained in each sample is, in the existing literature, obtained by PFC without exception; but accurate PFC methods such as the Newton-Raphson method take a relatively long time per run, so the overall training time of DRL increases significantly. In summary, the key to improving the training speed of DRL-based VVC lies in adopting a simpler and faster way to train a critic network that can accurately judge the quality of different A.
The original intention of applying DRL to VVC is to replace the iterative calculation of traditional methods with a single swift calculation of DRL, so as to rapidly respond to the voltage and var fluctuations caused by the random output of distributed energies. However, unlike DRL applications in fixed Environments such as Go and computer games, when DRL is applied to VVC the variables in S other than the reactive power equipment are also uncertain, such as the active power output of distributed energies, the load level, and even the system topology. Therefore, this paper holds that when DRL is applied to VVC, it is unnecessary to consider the impact of the current A generated by the actor network on the future power system; the critic network only needs to evaluate the control effect of A on the current scenario.
Based on the above analysis, this paper proposes a simplified DRL method based on actor-critic architecture for VVC, which includes the following three core ideas.
1) To force the critic network to pay all its attention to the control effect of A in the current scenario, the multiple interactions between the Agent and the Environment in each iteration are simplified into a single interaction. The action value generated by the critic network reduces to the R obtained by this single interaction, and the Bellman equation of Q reduces directly to $Q=R$.
2) The method for calculating R is changed from the traditional PFC to the forward calculation of the critic network. The critic network training is thus simplified to the most common form of supervised training, whose input variables are the state parameters of the power system and whose output variables are the voltages of all nodes.
3) Since the critic network training is greatly simplified and the quality of actor network training rests on the critic network making accurate judgments of the value Q of different A, this paper abandons the parallel training of the two networks. The critic network is trained to completion first by supervised training; the actor network is then trained much more quickly with the help of the well-trained critic network.
Based on the above core ideas, as shown in Fig. 3, the update process of the simplified DRL-based VVC consists of the following two steps.

Fig. 3 Update of simplified DRL-based VVC.
1) Training of critic network. Different operating scenarios are formed by randomly adjusting the node loads within 0-1.2 times their normal level, the active power output of generators within 0-1 times the rated value, and the instructions of the reactive power equipment within their upper and lower limits; PFC is then performed to obtain the node voltages. In this way, massive training samples can be generated for the supervised training of the critic network in (6).
$$\begin{aligned}
L_{\mathrm{C}}&=\frac{1}{M}\sum_{m=1}^{M}\sum_{i=1}^{n}\big(\hat{V}_{i,m}-V_{i,m}^{*}\big)^{2}\\
\theta_{\mathrm{C}}&\leftarrow\theta_{\mathrm{C}}-\alpha_{\mathrm{C}}\nabla_{\theta_{\mathrm{C}}}L_{\mathrm{C}}
\end{aligned}\tag{6}$$
where $\hat{V}_{i,m}$ and $V_{i,m}^{*}$ are the node voltage predicted by the critic network and the label voltage obtained by PFC, respectively.
2) Training of actor network. Unlike traditional DRL, which uses the value Q evaluated by the critic network as the loss function for training the actor network, this paper trains the actor network parameters strictly following the chain derivative rule through the well-trained critic network, as shown in (7); a sketch of both training stages is given after (7).
$$\begin{aligned}
L_{\mathrm{A}}&=\frac{1}{M}\sum_{m=1}^{M}\left[\sum_{i=1}^{n}\big|\hat{V}_{i,m}-V_i^{\mathrm{ref}}\big|+\lambda\sum_{i=1}^{n}f_{\mathrm{p}}(\hat{V}_{i,m})\right]\\
\frac{\partial L_{\mathrm{A}}}{\partial\theta_{\mathrm{A}}}&=\frac{\partial L_{\mathrm{A}}}{\partial\hat{V}}\cdot\frac{\partial\hat{V}}{\partial A}\cdot\frac{\partial A}{\partial\theta_{\mathrm{A}}}\\
\theta_{\mathrm{A}}&\leftarrow\theta_{\mathrm{A}}-\alpha_{\mathrm{A}}\frac{\partial L_{\mathrm{A}}}{\partial\theta_{\mathrm{A}}}
\end{aligned}\tag{7}$$
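A minimal PyTorch-style sketch of the two training stages described by (6) and (7) is given below. The framework choice, the concatenation of the scenario state and the reactive-equipment action as the critic input, and the omission of the off-limit penalty term in the actor loss are assumptions for illustration; `critic` and `actor` are ordinary feed-forward networks, and the labelled voltages are produced offline by any PFC routine (e.g., PYPOWER's runpf).

```python
import torch

# Stage 1 -- supervised training of the critic per (6): it learns the mapping
# (state, action) -> node voltages, with labels produced offline by PFC.
def train_critic(critic, states, actions, label_voltages, epochs=200, lr=4e-3):
    opt = torch.optim.Adam(critic.parameters(), lr=lr)
    x = torch.cat([states, actions], dim=1)
    for _ in range(epochs):
        v_hat = critic(x)                                # predicted node voltages
        loss = ((v_hat - label_voltages) ** 2).mean()    # (6): MSE against PFC labels
        opt.zero_grad()
        loss.backward()
        opt.step()
    return critic

# Stage 2 -- actor training per (7): the voltage-deviation loss is propagated
# backwards through the frozen critic into the actor, following the chain rule.
def train_actor(actor, critic, states, v_ref=1.0, epochs=200, lr=4e-3):
    for p in critic.parameters():
        p.requires_grad_(False)                          # critic parameters stay fixed
    opt = torch.optim.Adam(actor.parameters(), lr=lr)
    for _ in range(epochs):
        a = actor(states)                                # control strategy per scenario
        v_hat = critic(torch.cat([states, a], dim=1))    # voltages predicted by the critic
        loss = torch.abs(v_hat - v_ref).mean()           # (7): minimize voltage deviation
        opt.zero_grad()
        loss.backward()                                  # dL/d(theta_A) via the chain rule
        opt.step()
    return actor
```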
In this paper, TL is introduced to quickly update the parameters of the DRL networks when the system topology changes. The definition of TL [
The most common TL method is fine-tuning. As shown in (8), its core idea is to take the network parameters trained for the original task directly as the initial values for the new task; the network parameters suitable for the new task can then be quickly trained with fewer training samples and iterations.
$$\begin{aligned}
\theta_{\mathrm{TN}}^{(0)}&=\theta_{\mathrm{BN}}\\
\theta_{\mathrm{TN}}^{(I_t+1)}&=\theta_{\mathrm{TN}}^{(I_t)}-\alpha\,\frac{1}{M_t}\sum_{m=1}^{M_t}\nabla_{\theta_{\mathrm{TN}}}L_m
\end{aligned}\tag{8}$$
where TN and BN denote the networks of the new and original tasks, respectively; $\theta_{\mathrm{TN}}$ and $\theta_{\mathrm{BN}}$ are their parameters; $M_t$ is the number of training samples; and $I_t$ is the current training iteration.
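The following short sketch illustrates the fine-tuning idea of (8) in PyTorch: the new-topology network starts from the original parameters and all of them are updated on a smaller sample set. The helper names, epoch count, and MSE loss are assumptions for illustration.

```python
import copy
import torch
import torch.nn.functional as F

def fine_tune(original_net, new_samples, epochs=50, lr=2e-3, loss_fn=F.mse_loss):
    """(8): start from the original-task parameters and update *all* of them
    on the (smaller) new-topology sample set."""
    tuned_net = copy.deepcopy(original_net)          # theta_TN(0) = theta_BN
    opt = torch.optim.Adam(tuned_net.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in new_samples:                     # e.g., (state, PFC-labelled voltages)
            loss = loss_fn(tuned_net(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return tuned_net
```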
However, when fine-tuning is applied to DRL-based VVC of large-scale power systems, the volume of DRL network parameters grows rapidly with the number of system nodes, and fine-tuning updates all of these parameters in every iteration, so the training speed for the new topology is still slow.
To this end, this paper introduces another TL method called side-tuning [
As shown in (9) and Fig. 4(b), the core idea of side-tuning is to freeze the parameters of the original network BN and train only a newly established small side network SN, whose output is weighted and combined with the output of BN to form the output for the new task.
$$y_{\mathrm{TN}}(x)=\alpha_{\mathrm{s}}\,y_{\mathrm{BN}}\big(x;\theta_{\mathrm{BN}}\big)+(1-\alpha_{\mathrm{s}})\,y_{\mathrm{SN}}\big(x;\theta_{\mathrm{SN}}\big)\tag{9}$$

Fig. 4 Comparison of fine-tuning and side-tuning. (a) Fine-tuning. (b) Side-tuning.
where $\alpha_{\mathrm{s}}$ is the weighting factor of side-tuning; $x$ is the network input; $y_{\mathrm{BN}}$, $y_{\mathrm{SN}}$, and $y_{\mathrm{TN}}$ are the outputs of the original network BN, the side network SN, and the combined network for the new task, respectively; and $\theta_{\mathrm{BN}}$ and $\theta_{\mathrm{SN}}$ are their parameters.
As shown in Fig. 5, when side-tuning TL is applied to the simplified DRL-based VVC, the critic and actor networks for the new topology are each composed of the frozen network trained on the original topology and a newly established small side network, and their outputs and parameter updates follow (10) and (11).
$$\hat{V}=\alpha_{\mathrm{s}}\,\hat{V}_{\mathrm{BN}}\big(S,A;\theta_{\mathrm{C,BN}}\big)+(1-\alpha_{\mathrm{s}})\,\hat{V}_{\mathrm{SN}}\big(S,A;\theta_{\mathrm{C,SN}}\big),\quad \theta_{\mathrm{C,SN}}\leftarrow\theta_{\mathrm{C,SN}}-\alpha_{\mathrm{C}}\nabla_{\theta_{\mathrm{C,SN}}}L_{\mathrm{C}}\tag{10}$$

$$A=\alpha_{\mathrm{s}}\,\pi_{\mathrm{BN}}\big(S;\theta_{\mathrm{A,BN}}\big)+(1-\alpha_{\mathrm{s}})\,\pi_{\mathrm{SN}}\big(S;\theta_{\mathrm{A,SN}}\big),\quad \theta_{\mathrm{A,SN}}\leftarrow\theta_{\mathrm{A,SN}}-\alpha_{\mathrm{A}}\nabla_{\theta_{\mathrm{A,SN}}}L_{\mathrm{A}}\tag{11}$$

Fig. 5 Update of side-tuning TL of simplified DRL-based VVC.
where the subscripts BN and SN correspond to the network of the original task and the newly established side network, respectively. In Fig. 5, only the SN parameters are updated during training, while the BN parameters trained on the original topology remain frozen.
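A minimal PyTorch-style sketch of the side-tuning structure used in (9)-(11) is shown below: the original network BN is frozen, only a much smaller side network SN is trainable, and their outputs are blended by the weighting factor. The fixed value of the weighting factor, the module form, and the helper names are assumptions consistent with Fig. 4(b) and Table II rather than the paper's exact code; the same wrapper would be applied separately to the critic and the actor.

```python
import torch
import torch.nn as nn

class SideTuned(nn.Module):
    """Side-tuning wrapper: y = alpha * BN(x) + (1 - alpha) * SN(x),
    where only the small side network SN is trainable."""
    def __init__(self, base_net, side_net, alpha=0.5):
        super().__init__()
        self.base_net, self.side_net, self.alpha = base_net, side_net, alpha
        for p in self.base_net.parameters():
            p.requires_grad_(False)          # BN (original topology) stays frozen
    def forward(self, x):
        with torch.no_grad():
            y_bn = self.base_net(x)
        return self.alpha * y_bn + (1.0 - self.alpha) * self.side_net(x)

# Only the SN parameters enter the optimizer, so each update of (10)-(11)
# touches a small fraction of the parameters that fine-tuning would update.
def make_optimizer(side_tuned_net, lr=2e-3):
    return torch.optim.Adam(side_tuned_net.side_net.parameters(), lr=lr)
```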
The general flowchart of the proposed method is shown in Fig. 6.

Fig. 6 General flowchart of proposed method.
The rapidity and correctness of the proposed method are verified on the IEEE 30-bus and 118-bus systems, whose detailed parameters are obtained from MATPOWER in MATLAB. The control objects of VVC mainly include the generators and transformer taps. Accordingly, the number of control variables is 9 for the IEEE 30-bus system and reaches 64 for the 118-bus system, which is relatively large among the existing literature on DRL applied to VVC. The target of VVC is to keep the voltage of all nodes close to the target value of 1 p.u. All programs are written in Python 3.7.5, and the PFC involved is performed with the PYPOWER toolkit.
All the tests are performed on a Windows PC equipped with Intel Core i5-12500H CPU @ 2.5 GHz and 16 GB RAM.
The methods used for comparison and their main parameter settings are listed in the two tables below.
| Method | Description |
| --- | --- |
| 1 | Simplified DRL (proposed method of this paper) |
| 2 | SAC (soft actor-critic, a state-of-the-art DRL method) |
| 3 | IPM (interior point method, mathematical algorithm) |
| 4 | PSO (particle swarm optimization, heuristic algorithm) |
| 5 | Side-tuning TL |
| 6 | Fine-tuning TL |
| Method | Parameter | 30-bus system | 118-bus system |
| --- | --- | --- | --- |
| Simplified DRL/SAC | Actor network: number of layers | 4 | 5 |
| | Actor network: node number of hidden layers | 128 | 512 |
| | Actor network: learning rate | 0.004 | 0.004 |
| | Critic network: number of layers | 4 | 5 |
| | Critic network: node number of hidden layers | 128 | 512 |
| | Critic network: learning rate | 0.004 | 0.004 |
| | Iteration number | 200 | 200 |
| | Number of training samples | 5000 | 10000 |
| | Number of test samples | 500 | 500 |
| IPM | Central coefficient | 0.1 | 0.1 |
| | Convergence precision | 1 | 1 |
| PSO | Number of particles | 30 | 50 |
| | Maximum speed coefficient | 0.05 | 0.05 |
| | Convergence precision | 1 | 1 |
| Side-tuning TL | SN of actor: number of layers | 3 | 4 |
| | SN of actor: node number of hidden layer | 32 | 128 |
| | SN of actor: learning rate | 0.002 | 0.002 |
| | SN of critic: number of layers | 3 | 4 |
| | SN of critic: node number of hidden layers | 32 | 128 |
| | SN of critic: learning rate | 0.002 | 0.002 |
| | Number of training samples | 1000 | 2000 |
The control strategies for the test scenarios are computed by the well-trained actor networks of methods 1 and 2 and by the iterative optimization of methods 3 and 4, respectively, and the control effect is evaluated by the voltage deviation index in (12).
$$\Delta\bar{V}=\frac{1}{n}\sum_{i=1}^{n}\big|V_i-V_i^{\mathrm{ref}}\big|\tag{12}$$
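Assuming the index in (12) is the mean absolute deviation of the node voltages from the 1 p.u. target, averaged over all nodes and test scenarios (an interpretation of the evaluation metric, not the paper's exact formula), it can be computed as:

```python
import numpy as np

def average_voltage_deviation(voltages, v_ref=1.0):
    """voltages: array of shape (num_scenarios, num_nodes) in p.u.
    Returns the deviation averaged over nodes and test scenarios, as in (12)."""
    voltages = np.asarray(voltages, dtype=float)
    return np.abs(voltages - v_ref).mean()
```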

Fig. 7 General control effect comparison of methods 1-4.
| Method | Average voltage deviation | Maximum voltage deviation | Minimum voltage deviation | Computing time (s) |
| --- | --- | --- | --- | --- |
| 1 | 3.43×1 | 4.78×1 | 2.07×1 | 3.8×1 |
| 2 | 4.40×1 | 6.19×1 | 2.16×1 | 3.8×1 |
| 3 | 3.46×1 | 4.81×1 | 2.04×1 | 9.0×1 |
| 4 | 3.47×1 | 4.86×1 | 2.10×1 | 4.1×1 |

Fig. 8 Detailed control effect comparison of methods 1-4 in scenarios 1-100.

Fig. 9 Training effect comparison of methods 1 and 2.

Fig. 10 Training speed comparison of methods 1 and 2.
Firstly, the online control effect and calculation speed of the four methods are compared. As can be observed in Fig. 7 and the table above, methods 1, 3, and 4 achieve almost the same voltage deviation, slightly better than method 2, while the online computing times of methods 1 and 2 are far shorter than those of the iterative methods 3 and 4.
It demonstrates that the simplification strategy proposed in this paper for actor-critic-type DRL algorithms works well, and that the gradient-descent optimization based on the backward derivation of R for training the actor network enables method 1 to achieve the same optimization accuracy as the mathematical and heuristic algorithms. The above conclusions are further illustrated in Fig. 8, which details the control effect of methods 1-4 in scenarios 1-100.
Secondly, the offline training effects of methods 1 and 2 are compared. As can be observed in Fig. 9, method 1 converges to a control effect no worse than that of method 2.
Thirdly, the training speeds of methods 1 and 2 are compared. As can be observed in Fig. 10, method 1 requires much less training time than method 2 to reach a comparable control effect.
Based on methods 1, 5, and 6, the test of the side-tuning TL-based VVC is carried out. Before the topology change, all lines in the system are in service. After the topology change, the line between nodes 10 and 21 and the line between nodes 8 and 28 of the IEEE 30-bus system are disconnected.
Figures 11 and 12 compare the training effect and training speed, respectively, of methods 1, 5, and 6 for the new topology.

Fig. 11 Training effect comparison of methods 1, 5, and 6.

Fig. 12 Training speed comparison of methods 1, 5, and 6.
As can be observed in Fig. 11, both TL methods reach a control effect for the new topology close to that of retraining method 1 from scratch.
As shown in Fig. 12, method 5 requires the least training time among the three methods, for the following two reasons.
1) By adopting TL, the number of samples required to train the critic for the new topology is much smaller, so the time spent generating samples with the traditional PFC is greatly reduced.
2) By adopting side-tuning TL, the actor and critic networks not only benefit from the guidance of the network parameters of the original topology, but also only the small-volume SN parameters need to be updated, so the computation in each iteration is significantly reduced and the training speed of method 5 is improved by about two times compared with method 6.
Based on the above simulation results and analysis, once method 1 has been used to complete the training of the actor and critic networks for a certain topology, the network parameters suitable for any new topology can be quickly obtained with method 5 based on side-tuning TL, with the training speed improved by about ten times compared with traditional DRL trained from scratch.
To verify the generality of the proposed methods and their strong applicability to large-scale control variables, simulations are carried out on the IEEE 118-bus system, which contains 64 control variables. The performances of the various methods are shown in the table below.
| Validation | Method | Average | Computing time (s) | Training time (s) |
| --- | --- | --- | --- | --- |
| Validation of simplified DRL | 1 | 3.07×1 | 6.40×1 | 597.1 |
| | 2 | 3.98×1 | 6.40×1 | 2320.6 |
| | 3 | 3.08×1 | 4.20×1 | |
| | 4 | 3.12×1 | 4.05×1 | |
| Validation of side-tuning TL | 1 | 3.57×1 | 6.40×1 | 589.0 |
| | 5 | 3.57×1 | 6.70×1 | 183.4 |
| | 6 | 3.57×1 | 6.40×1 | 306.4 |
This paper presents a simplified DRL-based VVC, which greatly simplifies the interaction and training processes in DRL and enables the Agent to obtain the control strategy that minimizes the voltage deviation through a single interaction when facing an entirely new operating scene. The test results show that the proposed method not only achieves the same calculation accuracy as traditional mathematical methods, but also significantly improves the training speed compared with traditional DRL.
This paper also introduces side-tuning TL into DRL-based VVC of topologically variable power systems to reduce the number of parameters that must be updated when the system topology changes, and derives the corresponding formulation for applying side-tuning TL to the simplified DRL. The test results show that the proposed method trains faster than traditional TL, which greatly improves the timeliness of applying the simplified DRL to VVC.
References
D. K. Molzahn, F. Dörfler, H. Sandberg et al., “A survey of distributed optimization and control algorithms for electric power systems,” IEEE Transactions on Smart Grid, vol. 8, no. 6, pp. 2941-2962, Nov. 2017.
M. H. J. Bollen, R. Das, S. Djokic et al., “Power quality concerns in implementing smart distribution-grid applications,” IEEE Transactions on Smart Grid, vol. 8, no. 1, pp. 391-399, Jan. 2017.
A. Bedawy, N. Yorino, K. Mahmoud et al., “Optimal voltage control strategy for voltage regulators in active unbalanced distribution systems using multi-agents,” IEEE Transactions on Power Systems, vol. 35, no. 2, pp. 1023-1035, Mar. 2020.
H. Ahmadi, J. R. Martí, and H. W. Dommel, “A framework for volt-var optimization in distribution systems,” IEEE Transactions on Smart Grid, vol. 6, no. 3, pp. 1473-1483, May 2015.
R. A. Jabr and I. Džafić, “Penalty-based volt/var optimization in complex coordinates,” IEEE Transactions on Power Systems, vol. 37, no. 3, pp. 2432-2440, May 2022.
M. B. Liu, C. A. Canizares, and W. Huang, “Voltage and var control in distribution systems with limited switching operations,” IEEE Transactions on Power Systems, vol. 24, no. 2, pp. 889-899, May 2009.
H.-Y. Su and T.-Y. Liu, “Enhanced worst-case design for robust secondary voltage control using maximum likelihood approach,” IEEE Transactions on Power Systems, vol. 33, no. 6, pp. 7324-7326, Nov. 2018.
Y.-Y. Hong, F.-J. Lin, Y.-C. Lin et al., “Chaotic PSO-based var control considering renewables using fast probabilistic power flow,” IEEE Transactions on Power Delivery, vol. 29, no. 4, pp. 1666-1674, Aug. 2014.
Y. Malachi and S. Singer, “A genetic algorithm for the corrective control of voltage and reactive power,” IEEE Transactions on Power Systems, vol. 21, no. 1, pp. 295-300, Feb. 2006.
K. Mahmoud, M. M. Hussein, M. Abdel-Nasser et al., “Optimal voltage control in distribution systems with intermittent PV using multi-objective grey-wolf-Lévy optimizer,” IEEE Systems Journal, vol. 14, no. 1, pp. 760-770, Mar. 2020.
J. Duan, D. Shi, R. Diao et al., “Deep-reinforcement-learning-based autonomous voltage control for power grid operations,” IEEE Transactions on Power Systems, vol. 35, no. 1, pp. 814-817, Jan. 2020.
X. Sun and J. Qiu, “A customized voltage control strategy for electric vehicles in distribution networks with reinforcement learning method,” IEEE Transactions on Industrial Informatics, vol. 17, no. 10, pp. 6852-6863, Oct. 2021.
P. Li, M. Wei, H. Ji et al., “Deep reinforcement learning-based adaptive voltage control of active distribution networks with multi-terminal soft open point,” International Journal of Electrical Power & Energy Systems, vol. 141, pp. 1-10, 2022.
H. Liu, C. Zhang, Q. Chai et al., “Robust regional coordination of inverter-based volt/var control via multi-agent deep reinforcement learning,” IEEE Transactions on Smart Grid, vol. 12, no. 6, pp. 5420-5433, Nov. 2021.
Y. Zhou, B. Zhang, C. Xu et al., “A data-driven method for fast AC optimal power flow solutions via deep reinforcement learning,” Journal of Modern Power Systems and Clean Energy, vol. 8, no. 6, pp. 1128-1139, Nov. 2020.
Y. Zhou, W. Lee, R. Diao et al., “Deep reinforcement learning based real-time AC optimal power flow considering uncertainties,” Journal of Modern Power Systems and Clean Energy, vol. 10, no. 5, pp. 1098-1109, Sept. 2022.
W. Wang, N. Yu, Y. Gao et al., “Safe off-policy deep reinforcement learning algorithm for volt-var control in power distribution systems,” IEEE Transactions on Smart Grid, vol. 11, no. 4, pp. 3008-3018, Jul. 2020.
H. Liu and W. Wu, “Two-stage deep reinforcement learning for inverter-based volt-var control in active distribution networks,” IEEE Transactions on Smart Grid, vol. 12, no. 3, pp. 2037-2047, May 2021.
D. Cao, W. Hu, X. Xu et al., “Deep reinforcement learning based approach for optimal power flow of distribution networks embedded with renewable energy and storage devices,” Journal of Modern Power Systems and Clean Energy, vol. 9, no. 5, pp. 1101-1110, Sept. 2021.
D. Cao, W. Hu, J. Zhao et al., “Reinforcement learning and its applications in modern power and energy systems: a review,” Journal of Modern Power Systems and Clean Energy, vol. 8, no. 6, pp. 1029-1042, Nov. 2020.
E. Liang, R. Liaw, R. Nishihara et al., “RLlib: abstractions for distributed reinforcement learning,” Proceedings of Machine Learning Research, vol. 80, pp. 3053-3062, Dec. 2017.
A. Irpan. (2018, Feb.). Deep reinforcement learning doesn’t work yet. [Online]. Available: https://www.alexirpan.com/2018/02/14/rl-hard.html
P. Henderson, R. Islam, P. Bachman et al., “Deep reinforcement learning that matters,” Proceedings of Thirty-Second AAAI Conference on Artificial Intelligence, vol. 32, no. 1, pp. 1-8, Mar. 2018.
R. Huang, Y. Chen, T. Yin et al., “Accelerated derivative-free deep reinforcement learning for large-scale grid emergency voltage control,” IEEE Transactions on Power Systems, vol. 37, no. 1, pp. 14-25, Jan. 2022.
A. Stooke and P. Abbeel. (2019, Jan.). Accelerated methods for deep reinforcement learning. [Online]. Available: https://arxiv.org/abs/1803.02811
Q. Huang, R. Huang, W. Hao et al., “Adaptive power system emergency control using deep reinforcement learning,” IEEE Transactions on Smart Grid, vol. 11, no. 2, pp. 1171-1182, Mar. 2020.
T. P. Lillicrap, J. J. Hunt, A. Pritzel et al. (2015, Sept.). Continuous control with deep reinforcement learning. [Online]. Available: https://arxiv.org/abs/1509.02971
S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345-1359, Oct. 2010.
F. Zhuang, Z. Qi, K. Duan et al., “A comprehensive survey on transfer learning,” Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021.
J. O. Zhang, A. Sax, A. Zamir et al. (2022, Jan.). Side-tuning: a baseline for network adaptation via additive side networks. [Online]. Available: https://arxiv.org/abs/1912.13503