GABRIEL PALUDO LICKS

Doctor of Philosophy (Dottore di ricerca)

Cycle: XXXVI



Thesis title: From Histories to States: Practical Automata-Based Methods for Learning and Solving Regular Decision Processes

Reinforcement Learning (RL) commonly relies on the Markov property, which asserts that the current state contains all the information relevant to future decision-making. However, many real-world scenarios violate this property: agents often face observations that do not fully reveal the environment's true state, or that exhibit dependencies stretching across long histories. Partially Observable Markov Decision Processes (POMDPs) provide a comprehensive theoretical framework for such problems, yet exact solutions are computationally intractable, and existing learning-based methods largely depend on heuristic or approximate techniques.

In this thesis, we focus on a more specialised class of partially observable systems known as Regular Decision Processes (RDPs). RDPs capture a wide range of non-Markovian dependencies while remaining more tractable than general POMDPs. Specifically, they assume that any temporal dependencies in the environment can be encoded by a finite-state automaton, so that the problem becomes Markovian once the automaton's internal states are included in the agent's state representation. This structural assumption, though less general than a full POMDP, is satisfied in many practical cases where the relevant history can be summarised by a finite memory mechanism.

A principal contribution of this work is demonstrating how an RL agent can learn such an automaton-based representation autonomously, without explicit domain knowledge. We achieve this by leveraging automata learning, a field dedicated to inferring state machines from observed sequences, in combination with established RL methods. By examining how different histories lead to indistinguishable outcomes, the agent merges states that exhibit statistically equivalent future reward and observation distributions. The result is a compact, state-based model of the environment that effectively restores the Markov property.
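The merging idea above can be illustrated with a minimal sketch: histories whose empirical next-observation distributions are statistically close are grouped into the same candidate automaton state. This is only an illustrative simplification of the thesis's approach — the function name, the greedy merging strategy, and the total-variation threshold `tol` are assumptions introduced here for exposition, not the actual method.

```python
from collections import Counter, defaultdict

def merge_histories(traces, tol=0.1):
    """Group histories whose empirical next-observation distributions
    are within total-variation distance `tol` of each other.
    `traces` is a list of observation sequences; every proper prefix
    of a trace is a history, and the element that follows it is a
    sample of that history's next observation."""
    # Collect next-observation counts for every history (prefix).
    counts = defaultdict(Counter)
    for trace in traces:
        for i in range(len(trace)):
            counts[tuple(trace[:i])][trace[i]] += 1

    def dist(c):
        total = sum(c.values())
        return {o: n / total for o, n in c.items()}

    def tv(p, q):
        # Total-variation distance between two empirical distributions.
        support = set(p) | set(q)
        return 0.5 * sum(abs(p.get(o, 0) - q.get(o, 0)) for o in support)

    # Greedily assign each history to the first compatible state.
    states = []       # list of (representative distribution, members)
    assignment = {}   # history -> state index
    for history, c in counts.items():
        p = dist(c)
        for idx, (q, members) in enumerate(states):
            if tv(p, q) <= tol:
                members.append(history)
                assignment[history] = idx
                break
        else:
            assignment[history] = len(states)
            states.append((p, [history]))
    return assignment
```

On traces from an environment where the next observation depends only on the last one (e.g. "abab" and "baba"), all histories ending in the same symbol collapse into a single state, which is exactly the finite-memory summary an RDP posits.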
Building on this learned representation, we integrate it with classical RL algorithms. Empirical evaluations on benchmark tasks show that our approaches outperform baseline methods, particularly in domains featuring partial observability and extended temporal dependencies. Beyond improvements in sample efficiency, the explicit automaton structure also provides a degree of interpretability, potentially allowing for transfer to related tasks or examination by domain experts. Finally, we discuss directions for further development, including extensions to larger or noisier observation spaces, integration with function approximation, and potential applications in fields where partial observability and regular temporal dependencies are dominant.
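Once an automaton is available, plugging it into a classical algorithm amounts to letting the automaton state, rather than the raw observation, drive a tabular update. The sketch below shows this for Q-learning; every name in it (`env_step`, `env_reset`, `delta`) is a placeholder introduced here for illustration, not an interface from the thesis.

```python
import random
from collections import defaultdict

def q_learning_on_automaton(env_step, env_reset, delta, q0, actions,
                            episodes=500, alpha=0.1, gamma=0.9,
                            eps=0.1, horizon=20):
    """Tabular Q-learning where the agent's state is the current state
    of a learned automaton rather than the raw observation.
    `delta(q, obs)` advances the automaton on each observation,
    restoring the Markov property for the tabular update."""
    Q = defaultdict(float)
    for _ in range(episodes):
        q = delta(q0, env_reset())
        for _ in range(horizon):
            # Epsilon-greedy action selection over the automaton state.
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(q, a_)])
            obs, r, done = env_step(a)
            q2 = delta(q, obs)
            best = max(Q[(q2, a_)] for a_ in actions)
            target = r + gamma * (0.0 if done else best)
            Q[(q, a)] += alpha * (target - Q[(q, a)])
            q = q2
            if done:
                break
    return Q
```

The design choice is that the environment loop never sees the automaton: it is a pure pre-processing layer between observations and the learner, which is what makes the learned representation transferable to other tabular or function-approximation methods.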

Publications

11573/1727985 - 2024 - Enhancing Deep Sequence Generation with Logical Temporal Knowledge
Umili, E.; Paludo Licks, G.; Patrizi, F. - 04b Conference paper in proceedings volume
conference: the 3rd International Workshop on Process Management in the AI Era (PMAI 2024), co-located with the 27th European Conference on Artificial Intelligence (ECAI 2024), October 19, 2024 (Santiago de Compostela; Spain)
book: PMAI 2024. Proceedings of the 3rd International Workshop on Process Management in the AI Era (PMAI 2024) co-located with 27th European Conference on Artificial Intelligence (ECAI 2024)

11573/1670725 - 2022 - Markov abstractions for PAC reinforcement learning in non-Markov decision processes
Ronca, A.; Paludo Licks, G.; De Giacomo, G. - 04b Conference paper in proceedings volume
conference: 31st International Joint Conference on Artificial Intelligence, IJCAI 2022 (Vienna; Austria)
book: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22) - (ISBN 9781956792003)

11573/1728605 - 2022 - Markov Abstractions for PAC Reinforcement Learning in Non-Markov Decision Processes
Ronca, A.; Paludo Licks, G.; De Giacomo, G. - 04c Conference paper in journal
journal: IJCAI, pp. 3408-3415 - issn: 1045-0823 - wos: WOS:001202342303075 (2) - scopus: 2-s2.0-85137943705 (8)
conference: International Joint Conference on Artificial Intelligence (Vienna; Austria)

11573/1670727 - 2020 - Using self-attention LSTMs to enhance observations in goal recognition
Amado, L.; Paludo Licks, G.; Marcon, M.; Fraga Pereira, R.; Meneguzzi, F. - 04b Conference paper in proceedings volume
conference: 2020 International Joint Conference on Neural Networks, IJCNN 2020 (Glasgow; Scotland)
book: Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN 2020) - (ISBN 9781728169262; 9781728169279)

11573/1670710 - 2020 - SmartIX: A database indexing agent based on reinforcement learning
Paludo Licks, G.; Colleoni Couto, J.; De Fátima Miehe, P.; De Paris, R.; Dubugras Ruiz, D.; Meneguzzi, F. - 01a Journal article
journal: APPLIED INTELLIGENCE (Dordrecht: Kluwer), pp. 2575-2588 - issn: 1573-7497 - wos: WOS:000546382600018 (19) - scopus: 2-s2.0-85082686438 (21)

11573/1670729 - 2020 - Automated database indexing using model-free reinforcement learning
Paludo Licks, G.; Meneguzzi, F. - 04b Conference paper in proceedings volume
conference: arXiv preprint (Online)
book: arXiv preprint

11573/1670730 - 2018 - Smart makerspace: a web platform implementation
Paludo Licks, G.; Teixeira, A. C.; Luyten, K. - 04c Conference paper in journal
journal: INTERNATIONAL JOURNAL OF EMERGING TECHNOLOGIES IN LEARNING, volume 13, number 2 (Online), pp. 140-156 - issn: 1868-8799 - wos: WOS:000429617700010 (6) - scopus: 2-s2.0-85042591151 (8)

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma