Provable reinforcement learning for multi-agent and robust control systems

Thursday, February 18, 2021, 11:00 to 12:00

Dynamic Games and Applications Seminar

Speaker: Kaiqing Zhang – University of Illinois at Urbana-Champaign, United States

Webinar link
Webinar ID: 962 7774 9870
Passcode: 285404

Abstract: Recent years have witnessed both tremendous empirical successes and fast-growing theoretical development of reinforcement learning (RL) for sequential decision-making and control tasks. However, many RL algorithms are still far from applicable to practical autonomous systems, which usually involve more complicated scenarios with multiple decision-makers and safety-critical concerns. In this talk, I will introduce our work on developing RL algorithms with provable guarantees, with a focus on multi-agent and safety-critical settings. I will first show that policy optimization, one of the main drivers of the empirical successes of RL, enjoys global convergence and sample-complexity guarantees for a class of robust control problems. More importantly, we show that certain policy optimization approaches automatically preserve a form of "robustness" throughout the iterations, a property we term "implicit regularization". Interestingly, this setting naturally unifies other important benchmark settings in control and game theory: risk-sensitive control design and linear quadratic zero-sum dynamic games, the latter being the benchmark multi-agent RL (MARL) setting that mirrors the role played by the linear quadratic regulator (LQR) in single-agent RL. Despite the nonconvexity and the fundamental challenges in the optimization landscape, our theory shows that policy optimization enjoys global convergence guarantees in these problems as well. These results provide theoretical justification for several basic robust RL and MARL settings that are popular in empirical RL. In addition, I will introduce several other works along this line of provable MARL and robust RL, including decentralized MARL with networked agents and the sample complexity of model-based MARL. Time permitting, I will also share several future directions, building on these results, towards large-scale and reliable autonomy.
