ANDREA PESARE

Research doctorate (PhD)

cycle: XXXIV


supervisor: E. Carlini
advisor: M. Falcone

Thesis title: An optimal control approach to Reinforcement Learning

Optimal control and Reinforcement Learning both deal with sequential decision-making problems, although they use different tools. In this thesis we investigate the connection between these two research areas; our contributions are twofold. In the first part of the thesis, we present and study an optimal control problem with uncertain dynamics. As a modeling assumption, we suppose that the knowledge an agent has about the current system is represented by a probability distribution π on the space of possible dynamics functions. The goal is to minimize an average cost functional, where the average is computed with respect to π. This framework captures the behavior of a class of model-based RL algorithms, which build a probabilistic model (here represented by π) of the dynamics and then compute the control by minimizing the expectation of the cost functional with respect to π. In this context, we establish convergence results for the value function and the optimal control. These results constitute an important step in the convergence analysis of this class of RL algorithms.

In the second part, we propose a new online algorithm for LQR problems in which the state matrix A is unknown. The algorithm approximates the dynamics and computes a suitable control at the same time, during a single simulation, by combining RL and optimal control techniques. A probabilistic model is updated at each iteration using Bayesian linear regression formulas, and the control is obtained in feedback form by solving a Riccati differential equation. Numerical tests show that the algorithm efficiently steers the system to the origin, even though it does not have full knowledge of the dynamics at the beginning of the simulation.
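To make the averaged formulation in the first part concrete, a schematic version of the problem could be written as follows; the notation (finite horizon T, running cost ℓ, terminal cost g, trajectories y_f, initial state x_0) is illustrative and may differ from the thesis's exact setup.

```latex
% Schematic averaged optimal control problem (illustrative notation):
% minimize, over controls u, the expected cost under the model distribution \pi.
\[
  \min_{u(\cdot)} \; J_\pi(u)
  \;=\; \mathbb{E}_{f \sim \pi}\!\left[ \int_0^T \ell\bigl(y_f(t), u(t)\bigr)\,dt
        \;+\; g\bigl(y_f(T)\bigr) \right],
  \qquad
  \dot{y}_f(t) = f\bigl(y_f(t), u(t)\bigr), \quad y_f(0) = x_0 .
\]
```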
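The second part lends itself to a short numerical illustration. The Python sketch below mimics the kind of loop described above for a toy two-dimensional system: a Bayesian linear-regression posterior over the unknown state matrix A is updated at every step, and the feedback control is recomputed by integrating a Riccati differential equation backward on the remaining horizon. All matrices, priors, noise levels, and step sizes are hypothetical choices made for illustration, not the ones used in the thesis or in the LSSC 2021 paper.

```python
# Minimal sketch (hypothetical data) of an online LQR loop with unknown A:
# a Bayesian linear-regression model of A is refined at each step, and the
# feedback gain comes from a Riccati differential equation on [t, T].
import numpy as np
from scipy.integrate import solve_ivp

# --- problem data (illustrative, not from the thesis) ---
A_true = np.array([[0.0, 1.0], [-2.0, -0.5]])   # unknown to the controller
B = np.array([[0.0], [1.0]])
Q, R, QT = np.eye(2), np.eye(1), np.eye(2)      # running and terminal cost weights
T, dt = 5.0, 0.01
sigma2 = 1e-2                                   # assumed noise variance in the regression

# --- Bayesian linear regression prior on A (Gaussian, row-wise, shared precision) ---
A_mean = np.zeros((2, 2))                       # prior mean of A
Lam = 1e-2 * np.eye(2)                          # prior precision
XtX, XtY = np.zeros((2, 2)), np.zeros((2, 2))   # sufficient statistics

def riccati_solution(A_hat, t):
    """Integrate the Riccati ODE backward from T to t and return P(t)."""
    def ode(s, p):
        P = p.reshape(2, 2)
        dP = -(A_hat.T @ P + P @ A_hat - P @ B @ np.linalg.inv(R) @ B.T @ P + Q)
        return dP.ravel()
    sol = solve_ivp(ode, [T, t], QT.ravel(), rtol=1e-6, atol=1e-8)
    return sol.y[:, -1].reshape(2, 2)

x = np.array([1.0, -1.0])
rng = np.random.default_rng(0)
for k in range(int(T / dt)):
    t = k * dt
    # control in feedback form, computed from the current model estimate
    P = riccati_solution(A_mean, t)
    u = -np.linalg.inv(R) @ B.T @ P @ x

    # the true (unknown) dynamics generate the next state (explicit Euler step)
    x_next = x + dt * (A_true @ x + (B @ u).ravel())

    # noisy derivative observation, regressed on x to learn A
    target = (x_next - x) / dt - (B @ u).ravel() + np.sqrt(sigma2) * rng.standard_normal(2)
    XtX += np.outer(x, x)
    XtY += np.outer(x, target)
    post_prec = Lam + XtX / sigma2
    A_mean = np.linalg.solve(post_prec, XtY / sigma2).T   # posterior mean of A
    x = x_next

print("final state:", x)
print("estimated A:\n", A_mean)
```

The design choice mirrored here is the one the abstract describes: the model (the posterior over A) and the control (the Riccati feedback gain) are improved simultaneously within a single simulated trajectory, rather than in separate identification and control phases.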

Scientific production

11573/1623273 - 2022 - A New Algorithm for the LQR Problem with Partially Unknown Dynamics
Pacifico, Agnese; Pesare, Andrea; Falcone, Maurizio - 04b Conference paper in proceedings volume
conference: 13th International Conference on Large-Scale Scientific Computations, LSSC 2021 (Sozopol, Bulgaria)
book: Large-Scale Scientific Computing - (978-3-030-97548-7; 978-3-030-97549-4)

11573/1604568 - 2021 - Convergence results for an averaged LQR problem with applications to reinforcement learning
Pesare, A.; Palladino, M.; Falcone, M. - 01a Journal article
journal: MATHEMATICS OF CONTROL SIGNALS AND SYSTEMS (Springer), pp. 379-411 - issn: 0932-4194 - wos: WOS:000670862800001 (3) - scopus: 2-s2.0-85109934033 (2)
