ALESSANDRO TRAPASSO

PhD Graduate

PhD program: XXXVII



Thesis title: Integrating Multi-Agent Planning and Reinforcement Learning through Reward and Exploration Machines

Integrating automated planning with reinforcement learning (RL) is a longstanding goal in artificial intelligence, yet existing solutions struggle when rewards are non-Markovian, when agents must act concurrently, or when the state–action space explodes in multi-agent settings. This dissertation tackles these challenges by unifying symbolic planning techniques with model-based RL and automata-based reward representations. The key idea is to let formal planners supply the high-level temporal and concurrency structure of the task, while data-driven learners refine execution policies online. In doing so, the work bridges the complementary strengths of planning (foresight, structure and explainability) and of RL (adaptation to unknown or stochastic dynamics).

Concretely, the thesis contributes:

(i) A multi-agent planning formalism with explicit agent representation, implemented in the Unified Planning library to provide clear semantics and seamless compilation to existing multi-agent planning solvers.

(ii) QR-Max, a PAC-MDP model-based RL algorithm for discrete-action Non-Markovian Reward Decision Processes that exploits reward-machine factorization.

(iii) An extension of QR-Max to cooperative multi-agent domains that shares learned dynamics while decoupling individual reward models.

(iv) MARL-RM, a framework that automatically converts partial-order multi-agent plans into reward machines, thereby injecting concurrency and synchronization constraints directly into decentralized training.

(v) A hierarchy of state abstractions, heuristic shaping and a Global Exploration Machine that densify sparse rewards and orchestrate safe, coordinated exploration.
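To make the central concept concrete: a reward machine is a finite-state automaton whose transitions fire on high-level propositions observed in the environment and emit rewards, allowing non-Markovian reward specifications (such as "reach A, then B") to be expressed over automaton states. The sketch below is a generic, minimal illustration of this idea, not the implementation from the thesis; the task, state names (`u0`, `u1`, `u2`), and propositions (`at_A`, `at_B`) are hypothetical.

```python
class RewardMachine:
    """Minimal reward-machine sketch: an automaton over abstract
    propositions whose transitions emit (possibly non-Markovian) rewards."""

    def __init__(self, transitions, initial, terminal):
        # transitions: dict mapping (state, proposition) -> (next_state, reward)
        self.transitions = transitions
        self.terminal = terminal
        self.state = initial

    def step(self, propositions):
        """Advance on the set of propositions true at this environment step;
        return the reward of the transition taken (0 if none fires)."""
        for p in propositions:
            if (self.state, p) in self.transitions:
                self.state, reward = self.transitions[(self.state, p)]
                return reward
        return 0.0

    def done(self):
        return self.state in self.terminal


# Hypothetical task "visit A, then visit B": reward 1 only on completing
# the full sequence, which a Markovian reward on raw states cannot express.
rm = RewardMachine(
    transitions={
        ("u0", "at_A"): ("u1", 0.0),
        ("u1", "at_B"): ("u2", 1.0),
    },
    initial="u0",
    terminal={"u2"},
)

print(rm.step({"at_B"}))  # 0.0 (B before A does not progress the machine)
print(rm.step({"at_A"}))  # 0.0 (now in u1)
print(rm.step({"at_B"}))  # 1.0 (sequence completed)
print(rm.done())          # True
```

In an RL loop, the learner's effective state becomes the pair (environment state, machine state), which restores the Markov property and exposes the task's subgoal structure to factorized algorithms such as the QR-Max variants described above.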


© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma