Thesis title: An optimal control approach to Reinforcement Learning
Optimal control and reinforcement learning (RL) both deal with sequential decision-making problems, although they use different tools. In this thesis, we investigate the connection between these two research areas. Our contributions are twofold.
In the first part of the thesis, we present and study an optimal control problem with uncertain dynamics. As a modeling assumption, we suppose that the agent's knowledge of the system is represented by a probability distribution \pi on the space of possible dynamics functions. The goal is to minimize an average cost functional, where the average is computed with respect to \pi. This framework captures the behavior of a class of model-based RL algorithms that build a probabilistic model of the dynamics (here represented by \pi) and then compute the control by minimizing the expectation of the cost functional with respect to \pi. In this context, we establish convergence results for the value function and the optimal control. These results constitute an important step in the convergence analysis of this class of RL algorithms.
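The average-cost idea can be illustrated with a minimal sketch. Here the distribution \pi is approximated by Monte Carlo samples of an unknown drift parameter in a hypothetical scalar system x' = theta*x + u, and the \pi-average cumulative cost is minimized over a grid of linear feedback gains; all names, dynamics, and cost weights are illustrative assumptions, not the thesis's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# pi: a Gaussian belief over the unknown drift parameter theta (assumed model)
theta_samples = rng.normal(loc=1.0, scale=0.3, size=200)

def rollout_cost(theta, k, x0=1.0, dt=0.05, steps=60):
    """Cumulative quadratic cost for dynamics x' = theta*x + u with feedback u = -k*x."""
    x, cost = x0, 0.0
    for _ in range(steps):
        u = -k * x
        cost += (x**2 + 0.1 * u**2) * dt
        x = x + (theta * x + u) * dt  # explicit Euler step of the sampled dynamics
    return cost

# Minimize the pi-average cost over a grid of candidate feedback gains
gains = np.linspace(0.0, 5.0, 51)
avg_costs = [np.mean([rollout_cost(th, k) for th in theta_samples]) for k in gains]
k_star = gains[int(np.argmin(avg_costs))]
```

In this toy setting the selected gain trades off stabilizing all likely dynamics under \pi against the control penalty, which is exactly the expectation-minimization step performed by the model-based RL algorithms described above.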
In the second part, we propose a new online algorithm for linear-quadratic regulator (LQR) problems in which the state matrix A is unknown. The algorithm builds an approximation of the dynamics and finds a suitable control at the same time, within a single simulation, by combining RL and optimal control techniques. At each iteration, a probabilistic model is updated via Bayesian linear regression formulas, and the control is obtained in feedback form by solving a Riccati differential equation. Numerical tests show that the algorithm efficiently steers the system to the origin, despite having only partial knowledge of the system at the beginning of the simulation.
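The two ingredients of this scheme can be sketched in a simplified scalar instance: a conjugate Gaussian (Bayesian linear regression) update of the unknown drift, alternated with a backward Euler integration of the scalar Riccati ODE to obtain the feedback gain. The system x' = a*x + u, the noise model, the fixed re-planning horizon, and all numerical values are assumptions for illustration, not the algorithm as stated in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical scalar plant: x' = a*x + b*u, with a unknown to the controller
a_true, b, q, r = 0.8, 1.0, 1.0, 0.1
dt, noise = 0.05, 0.05

# Gaussian prior on the unknown parameter a
mu, var = 0.0, 4.0

def riccati_gain(a_hat, T=2.0):
    """Integrate the scalar Riccati ODE -p' = 2*a_hat*p - p^2*b^2/r + q
    backward from p(T) = 0 with Euler steps, and return the gain p(0)*b/r."""
    p = 0.0
    for _ in range(int(T / dt)):
        p += dt * (2 * a_hat * p - p**2 * b**2 / r + q)
    return p * b / r

x = 2.0
for _ in range(120):
    k = riccati_gain(mu)            # plan with the current posterior mean
    u = -k * x                      # feedback control
    xdot_obs = a_true * x + b * u + noise * rng.normal()  # noisy observation
    # Bayesian linear regression update: regress (xdot - b*u) on x
    y = xdot_obs - b * u
    prec = 1.0 / var + x**2 / noise**2
    mu = (mu / var + x * y / noise**2) / prec
    var = 1.0 / prec
    x = x + xdot_obs * dt           # advance the simulated state
```

After the loop, the posterior mean is close to the true parameter and the state has been driven near the origin, mirroring the qualitative behavior reported in the numerical tests.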