Australian and New Zealand Control Conference (ANZCC), Auckland, New Zealand, 27-29 November 2019, pp. 255-257
Coordinating two or more dynamical systems, such as autonomous vehicles or satellites, in a distributed manner poses an important research challenge. Multiple approaches to this problem have been proposed, including Nonlinear Model Predictive Control (NMPC) and its model-free counterparts in the reinforcement learning (RL) literature, such as the Deep Q-Network (DQN). This initial study aims to compare and contrast the optimal control technique NMPC, in which the model is known, with the popular model-free RL method DQN. Simple distributed variants of both methods are investigated numerically for the specific problem of balancing and synchronising two highly unstable cart-pole systems. We find that both NMPC and the trained DQN perform optimally when the model is exact and communication delays are small. While NMPC performs sub-optimally under model mismatch, DQN, being model-free, is naturally unaffected. Distributed DQN requires a large amount of real-world experience to train, but once trained it does not need to search for the optimal action at every time step, as NMPC does. This illustrative comparison lays a foundation for hybrid approaches that can be applied to complex multi-agent scenarios.
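The contrast drawn above between per-step computation in the two methods can be sketched in a few lines of Python. The snippet below is purely illustrative and is not the paper's implementation: the linearised cart-pole matrices, the discrete force set, and the toy linear Q-function are placeholder assumptions. It shows that a (trained) DQN selects an action with a single forward pass and an argmax, whereas an NMPC-style controller must roll its model forward over a horizon and search over candidate action sequences at every time step.

```python
import numpy as np

# Hypothetical linearised cart-pole dynamics (state: [x, x_dot, theta, theta_dot]).
# Numerical values are illustrative placeholders, not the paper's parameters.
A = np.array([[1.0, 0.02, 0.0,  0.0 ],
              [0.0, 1.0,  0.07, 0.0 ],
              [0.0, 0.0,  1.0,  0.02],
              [0.0, 0.0,  0.31, 1.0 ]])
B = np.array([0.0, 0.02, 0.0, 0.03])
ACTIONS = np.array([-10.0, 10.0])       # discrete force set shared by both controllers

def dqn_action(state, W):
    """DQN-style choice: one forward pass through a (here, toy linear) Q-function."""
    q_values = W @ state                # q_values[i] approximates Q(state, ACTIONS[i])
    return ACTIONS[np.argmax(q_values)]

def nmpc_action(state, horizon=5):
    """NMPC-style choice: enumerate action sequences over a horizon at every step."""
    best_cost, best_u0 = np.inf, ACTIONS[0]
    for seq in np.stack(np.meshgrid(*[ACTIONS] * horizon), -1).reshape(-1, horizon):
        s, cost = state.copy(), 0.0
        for u in seq:                   # roll the model forward over the horizon
            s = A @ s + B * u
            cost += s @ s + 1e-3 * u * u
        if cost < best_cost:
            best_cost, best_u0 = cost, seq[0]
    return best_u0                      # apply only the first move (receding horizon)

state = np.array([0.0, 0.0, 0.05, 0.0])
W = np.random.default_rng(0).normal(size=(len(ACTIONS), 4))  # stand-in for a trained Q-network
print(dqn_action(state, W), nmpc_action(state))
```

In this sketch the NMPC cost grows exponentially with the horizon and the size of the action set, which is the per-step burden the abstract refers to; the DQN, by contrast, pays its cost up front during training.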