ALESSANDRO TRAPASSO

PhD Graduate

PhD program: XXXVII



Thesis title: Integrating Multi-Agent Planning and Reinforcement Learning through Reward and Exploration Machines

Integrating automated planning with reinforcement learning (RL) is a longstanding goal in artificial intelligence, yet existing solutions struggle when rewards are non-Markovian, when agents must act concurrently, or when the state–action space explodes in multi-agent settings. This dissertation tackles these challenges by unifying symbolic planning techniques with model-based RL and automata-based reward representations. The key idea is to let formal planners supply the high-level temporal and concurrency structure of the task, while data-driven learners refine execution policies online. In doing so, the work bridges the complementary strengths of planning (foresight, structure, and explainability) and of RL (adaptation to unknown or stochastic dynamics).

Concretely, the thesis contributes:
(i) A multi-agent planning formalism with explicit agent representation, implemented in the Unified Planning library to provide clear semantics and seamless compilation to existing multi-agent planning solvers.
(ii) QR-Max, a PAC-MDP model-based RL algorithm for discrete-action Non-Markovian Reward Decision Processes that exploits reward-machine factorization.
(iii) An extension of QR-Max to cooperative multi-agent domains that shares learned dynamics while decoupling individual reward models.
(iv) MARL-RM, a framework that automatically converts partial-order multi-agent plans into reward machines, thereby injecting concurrency and synchronization constraints directly into decentralized training.
(v) A hierarchy of state abstractions, heuristic shaping, and a Global Exploration Machine that densify sparse rewards and orchestrate safe, coordinated exploration.
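
For readers unfamiliar with the automata-based reward representations mentioned above, the following is a minimal illustrative sketch of a reward machine in Python. It is not taken from the thesis: the class name `RewardMachine`, the helper `run`, the state labels, and the "key then door" task are all hypothetical and chosen only to show how a finite-state machine can encode a reward that depends on the history of events rather than on the current state alone.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Hypothetical minimal reward machine: transitions fire on
    high-level events and emit a scalar reward."""
    initial: str
    terminal: set
    # Maps (machine state, event) -> (next machine state, reward).
    transitions: dict = field(default_factory=dict)

    def step(self, u, event):
        # Assumption: an event with no matching transition leaves the
        # machine state unchanged and yields zero reward.
        return self.transitions.get((u, event), (u, 0.0))


# Hypothetical non-Markovian task: reward is given for reaching the
# door only if the key was collected earlier.
rm = RewardMachine(
    initial="u0",
    terminal={"u2"},
    transitions={
        ("u0", "key"): ("u1", 0.0),
        ("u1", "door"): ("u2", 1.0),
    },
)


def run(machine, events):
    """Feed a sequence of events to the machine and return the final
    machine state together with the accumulated reward."""
    u, total = machine.initial, 0.0
    for event in events:
        if u in machine.terminal:
            break
        u, reward = machine.step(u, event)
        total += reward
    return u, total
```

In this sketch, visiting the door before the key earns nothing, while the sequence key then door reaches the terminal state with reward 1. A learner that tracks the machine state alongside the environment state can treat the combined process as Markovian, which is the general property that reward-machine methods rely on.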

Research products

11573/1738691 - 2025 - Unified Planning: Modeling, manipulating and solving AI planning problems in Python
Micheli, A.; Bit-Monnot, A.; Roger, G.; Scala, E.; Valentini, A.; Framba, L.; Rovetta, A.; Trapasso, A.; Bonassi, L.; Gerevini, A. E.; Iocchi, L.; Ingrand, F.; Kockemann, U.; Patrizi, F.; Saetti, A.; Serina, I.; Stock, S. - 01a Articolo in rivista
paper: SOFTWAREX ([Amsterdam] : Elsevier B.V.) - issn: 2352-7110 - wos: WOS:001391993900001 (5) - scopus: 2-s2.0-85212576537 (16)

11573/1756085 - 2025 - Concurrent Multiagent Reinforcement Learning with Reward Machines
Trapasso, Alessandro; Jonsson, Anders - 04b Atto di convegno in volume
conference: 28th European Conference on Artificial Intelligence (ECAI 2025) (Bologna, Italy)
book: Proceedings of 28th European Conference on Artificial Intelligence (ECAI 2025) - (9781643686318)

11573/1685988 - 2023 - A formalization of multi-agent planning with explicit agent representation
Trapasso, Alessandro; Santilli, Sofia; Iocchi, Luca; Patrizi, Fabio - 04b Atto di convegno in volume
conference: 38th ACM/SIGAPP Symposium on Applied Computing (Tallinn, Estonia)
book: SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing - (9781450395175)
