## ANDREA PESARE

PhD Graduate

**PhD program:** XXXIV

**supervisor**: E. Carlini

**advisor**: M. Falcone

**Thesis title:** An optimal control approach to Reinforcement Learning

Optimal control and Reinforcement Learning (RL) both deal with sequential decision-making problems, although they use different tools. In this thesis, we investigate the connection between these two research areas. Our contributions are twofold.
In the first part of the thesis, we present and study an optimal control problem with uncertain dynamics. As a modeling assumption, we suppose that the agent's knowledge of the current system is represented by a probability distribution $\pi$ on the space of possible dynamics functions. The goal is to minimize an average cost functional, where the average is computed with respect to $\pi$. This framework describes well the behavior of a class of model-based RL algorithms, which build a probabilistic model of the dynamics (here represented by $\pi$) and then compute the control by minimizing the expectation of the cost functional with respect to $\pi$. In this context, we establish convergence results for the value function and the optimal control. These results are an important step in the convergence analysis of this class of RL algorithms.
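
As an illustration of the averaged problem described above, one possible way to write it is sketched below. The notation is an assumption made here for readability and is not taken from the thesis: $\mathcal{F}$ denotes the space of dynamics functions, $x_f$ the trajectory generated by a dynamics $f$, $\ell$ and $g$ a running and a terminal cost, and $T$ a finite horizon.

```latex
% Trajectory generated by a candidate dynamics f under the control u
\dot{x}_f(t) = f\bigl(x_f(t), u(t)\bigr), \qquad x_f(0) = x_0,

% Cost of the control u if the true dynamics were f
J_f(u) = \int_0^T \ell\bigl(x_f(t), u(t)\bigr)\,dt + g\bigl(x_f(T)\bigr),

% Averaged cost: expectation of J_f with respect to the belief \pi
J_\pi(u) = \int_{\mathcal{F}} J_f(u)\,d\pi(f) = \mathbb{E}_{f \sim \pi}\bigl[J_f(u)\bigr].
```

In this reading, a single control $u$ is chosen to minimize $J_\pi$, so it has to perform well on average over all dynamics that $\pi$ considers plausible.
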
In the second part, we propose a new online algorithm for LQR problems in which the state matrix $A$ is unknown. The algorithm approximates the dynamics and finds a suitable control at the same time, during a single simulation. It integrates RL and optimal control techniques: a probabilistic model is updated at each iteration using Bayesian linear regression formulas, and the control is obtained in feedback form by solving a Riccati differential equation. Numerical tests show that the algorithm efficiently steers the system to the origin, even though the system is not fully known at the beginning of the simulation.
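
The loop described above (update a probabilistic model, then recompute a Riccati-based feedback) can be sketched in a scalar toy setting. This is a minimal illustration under stated assumptions, not the algorithm from the thesis: the system is $\dot{x} = a x + b u$ with unknown scalar $a$ and known $b$, the prior on $a$ is Gaussian, a noisy measurement of the drift $a x$ is assumed to be available at every step, the terminal cost is zero, and both the state and the Riccati equation are advanced with explicit Euler steps. The names `riccati_gain` and `run_online_lqr` and all parameter values are hypothetical.

```python
import random


def riccati_gain(a_hat, b, q, r, time_to_go, steps=200):
    """Feedback gain b*p/r from the scalar Riccati ODE, integrated backward.

    With s = time to go, p solves dp/ds = 2*a_hat*p - (b**2/r)*p**2 + q,
    p(0) = 0 (zero terminal cost is an assumption of this sketch).
    """
    p = 0.0
    h = time_to_go / steps
    for _ in range(steps):
        # Explicit Euler step of the Riccati ODE in the time-to-go variable
        p += h * (2.0 * a_hat * p - (b * b / r) * p * p + q)
    return b * p / r


def run_online_lqr(a_true=1.0, b=1.0, q=1.0, r=1.0, x0=1.0,
                   dt=0.01, n_steps=1000, noise_std=0.1, seed=0):
    """Single simulation that learns a and controls the system at once."""
    rng = random.Random(seed)
    mean, precision = 0.0, 1.0  # Gaussian prior: a ~ N(mean, 1/precision)
    x = x0
    horizon = n_steps * dt
    for k in range(n_steps):
        # Control: feedback computed from the current posterior mean of a
        gain = riccati_gain(mean, b, q, r, horizon - k * dt)
        u = -gain * x

        # Assumed observation: noisy measurement of the drift term a*x
        y = a_true * x + rng.gauss(0.0, noise_std)

        # Bayesian linear regression update for the model y = a*x + noise
        new_precision = precision + x * x / noise_std ** 2
        mean = (precision * mean + x * y / noise_std ** 2) / new_precision
        precision = new_precision

        # True system advances one explicit Euler step
        x += dt * (a_true * x + b * u)
    return x, mean, precision
```

Calling `x_final, a_mean, a_precision = run_online_lqr()` returns the final state together with the posterior mean and precision of the unknown coefficient. The true system in this toy example is unstable (`a_true = 1.0`), so the state only decays if the learned feedback is good enough.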

**Research products**

11573/1623273 - 2022 -

**A New Algorithm for the LQR Problem with Partially Unknown Dynamics** Pacifico, Agnese; Pesare, Andrea; Falcone, Maurizio - 04b Atto di convegno in volume (conference paper in proceedings)

**conference:** 13th International Conference on Large-Scale Scientific Computations, LSSC 2021 (Sozopol, Bulgaria)

**book:** Large-Scale Scientific Computing - (978-3-030-97548-7; 978-3-030-97549-4)

11573/1604568 - 2021 -

**Convergence results for an averaged LQR problem with applications to reinforcement learning** Pesare, A.; Palladino, M.; Falcone, M. - 01a Articolo in rivista (journal article)

**paper:** MATHEMATICS OF CONTROL SIGNALS AND SYSTEMS (Springer-Verlag London Limited) pp. 379-411 - issn: 0932-4194 - wos: WOS:000670862800001 (3) - scopus: 2-s2.0-85109934033 (3)