Thesis title: Deep Reinforcement Learning for Robust Spacecraft Guidance and Control
This Ph.D. thesis aims to investigate new guidance and control algorithms based on deep neural networks and reinforcement learning, with application to next-generation space missions, which are expected to demand greater levels of autonomy and robustness. Unlike traditional optimal control methods, deep reinforcement learning provides a systematic framework for handling space trajectory optimization problems under a broad range of uncertainties, including unknown or unmodeled dynamics, inaccurate initial conditions, control execution errors, and measurement noise.

In deep learning approaches to spacecraft guidance, a deep neural network maps observations, that is, any combination of measurements of the spacecraft state, including raw images taken by on-board optical cameras, to corresponding control actions, which in space applications typically define the magnitude and direction of the thrust. In deep reinforcement learning, the optimal control problem is reformulated as a discrete-time Markov decision process, and the network, or agent, is trained by trial and error through repeated simulations of the mission scenario. The agent starts from a random control policy and progressively refines it during training, seeking to maximize the reward received from the environment as a measure of its current performance. The exploratory behavior typical of reinforcement learning algorithms, which learn from a large number of simulated episodes, underlies their inherent robustness to variations in the environment. At the end of the training process, the network yields, besides a robust reference trajectory, an optimal observation-feedback control law. The trained network can therefore be deployed on board the spacecraft to provide real-time, autonomous control capabilities during actual operations.

In this thesis, deep neural networks with different architectures are trained by a state-of-the-art reinforcement learning method and applied to selected real-world case studies, including interplanetary, multi-body, and proximity-operations space missions. The objective is to assess how the networks perform in terms of optimality, constraint handling, and robustness across different operational scenarios, which feature dispersed initial conditions, multiple terminal and path constraints, unmodeled dynamics, control and navigation errors, and partial observability of the spacecraft state.
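To make the observation-to-action mapping concrete, the following is a minimal sketch of a policy network in PyTorch. The observation dimension, layer widths, and the two output heads (a throttle in [0, 1] and a unit thrust direction) are illustrative assumptions, not the architectures studied in the thesis.

```python
# Minimal sketch of an observation-to-thrust policy network (illustrative only).
import torch
import torch.nn as nn

class ThrustPolicy(nn.Module):
    def __init__(self, obs_dim: int = 7, hidden: int = 64):
        super().__init__()
        # Shared trunk: maps raw observations (e.g., position, velocity, mass)
        # to a latent feature vector. Sizes are placeholder assumptions.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        # Two heads: thrust magnitude as a fraction of maximum thrust,
        # and an unnormalized 3-D thrust direction.
        self.magnitude = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())
        self.direction = nn.Linear(hidden, 3)

    def forward(self, obs: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        h = self.trunk(obs)
        throttle = self.magnitude(h)
        # Normalize so this head only defines the thrust direction.
        direction = nn.functional.normalize(self.direction(h), dim=-1)
        return throttle, direction

# Example: a single observation mapped to a thrust command.
policy = ThrustPolicy()
obs = torch.zeros(7)  # placeholder state measurement
throttle, u_hat = policy(obs)
```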
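Likewise, the discrete-time Markov decision process formulation amounts to a loop in which the agent observes a state, applies a thrust action, and receives a reward. The sketch below uses a hypothetical toy environment (a noisy double integrator with a quadratic-style penalty) purely to show the interaction structure; it does not reflect the thesis's actual dynamics models or reward functions.

```python
# Minimal sketch of the agent-environment MDP loop (toy stand-in environment).
import numpy as np

class DummySpacecraftEnv:
    """Hypothetical placeholder: a point mass with noisy dynamics, meant only
    to illustrate the MDP structure (state, action, reward, transition)."""
    def __init__(self, horizon: int = 50, noise: float = 1e-3):
        self.horizon = horizon
        self.noise = noise

    def reset(self, rng: np.random.Generator) -> np.ndarray:
        self.t = 0
        # Dispersed initial conditions: random perturbation of the state.
        self.state = rng.normal(0.0, 0.1, size=6)
        return self.state.copy()

    def step(self, action: np.ndarray, rng: np.random.Generator):
        # Double-integrator transition with control execution noise, standing
        # in for the true (possibly unmodeled) orbital dynamics.
        pos, vel = self.state[:3], self.state[3:]
        vel = vel + action + rng.normal(0.0, self.noise, size=3)
        pos = pos + vel
        self.state = np.concatenate([pos, vel])
        self.t += 1
        # Reward penalizes distance from the target (origin) and control effort.
        reward = -np.linalg.norm(pos) - 0.1 * np.linalg.norm(action)
        done = self.t >= self.horizon
        return self.state.copy(), reward, done

rng = np.random.default_rng(0)
env = DummySpacecraftEnv()
obs = env.reset(rng)
total_reward, done = 0.0, False
while not done:
    action = 0.01 * rng.normal(size=3)  # random policy; training would refine it
    obs, reward, done = env.step(action, rng)
    total_reward += reward
```

In an actual reinforcement learning setup, the random action above would be replaced by the output of the policy network, and the training algorithm would adjust the network parameters to increase the cumulative reward collected over many such simulated episodes.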