LUIGI SIGILLO

PhD Graduate

PhD program: XXXVII


Advisor: Prof. Danilo Comminiello

Thesis title: High-Resolution Synthesis Across Domains: A Wavelet-Driven Approach to Generative Modeling

Deep generative modeling is revolutionizing visual data synthesis, from creative industry applications to tools for scientific discovery in fields such as medical imaging and remote sensing. In image synthesis, deep learning models such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Denoising Diffusion Probabilistic Models (DDPMs) help create novel, realistic, and controllable visual content. In specialized applications, these models can augment scarce data for medical analysis or enhance imagery for remote sensing tasks.
A crucial challenge in applying deep learning to image synthesis is overcoming the persistent trade-offs between resolution, fidelity, and computational efficiency. Although generative models have advanced significantly, they often struggle to generalize across diverse domains and data modalities. Conventional methods tend to scale poorly to ultra-high resolutions (UHR), leading to artifacts such as repeated structures or blurred textures. Furthermore, these models often apply uniform refinement processes across all spatial regions, disregarding local frequency variations and failing to allocate supervision according to the visual complexity of each area.
Wavelet transforms, particularly Discrete Wavelet Transforms (DWT) and their hypercomplex extension, Quaternion Wavelet Transforms (QWT), have shown promising results in multi-scale signal analysis. These methods decompose images into a hierarchy of frequency sub-bands, capturing both global structure and fine-grained details. This formulation allows models to leverage sparse representations and dimensionality reduction. However, their potential to reshape feature representation, inform conditioning, and adapt training objectives in a model-agnostic fashion has not been comprehensively exploited.
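As an illustration of the sub-band decomposition mentioned above, the following is a minimal NumPy sketch of a multi-level 2D Haar DWT. It is not the thesis implementation: the choice of the Haar wavelet, the function names (`haar_dwt2`, `haar_idwt2`, `haar_pyramid`), and the sub-band labels are assumptions made only for this example, and it does not cover the quaternion (QWT) extension.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt2(image):
    """One level of an orthonormal 2D Haar DWT (illustrative helper).

    Returns the approximation band and a tuple of three detail bands,
    each with half the height and half the width of the input.
    """
    x = np.asarray(image, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")
    # Horizontal step: combine adjacent column pairs.
    lo = (x[:, 0::2] + x[:, 1::2]) / SQRT2   # low-pass (scaled sum)
    hi = (x[:, 0::2] - x[:, 1::2]) / SQRT2   # high-pass (scaled difference)
    # Vertical step: combine adjacent row pairs of each result.
    ll = (lo[0::2, :] + lo[1::2, :]) / SQRT2  # coarse structure
    lh = (lo[0::2, :] - lo[1::2, :]) / SQRT2  # detail across rows
    hl = (hi[0::2, :] + hi[1::2, :]) / SQRT2  # detail across columns
    hh = (hi[0::2, :] - hi[1::2, :]) / SQRT2  # diagonal detail
    # Note: detail-band naming conventions differ between libraries.
    return ll, (lh, hl, hh)

def haar_idwt2(ll, details):
    """Inverse of haar_dwt2: rebuild the image from its four sub-bands."""
    lh, hl, hh = details
    h, w = ll.shape
    lo = np.empty((2 * h, w))
    hi = np.empty((2 * h, w))
    # Undo the vertical step.
    lo[0::2, :] = (ll + lh) / SQRT2
    lo[1::2, :] = (ll - lh) / SQRT2
    hi[0::2, :] = (hl + hh) / SQRT2
    hi[1::2, :] = (hl - hh) / SQRT2
    # Undo the horizontal step.
    x = np.empty((2 * h, 2 * w))
    x[:, 0::2] = (lo + hi) / SQRT2
    x[:, 1::2] = (lo - hi) / SQRT2
    return x

def haar_pyramid(image, levels):
    """Apply haar_dwt2 repeatedly to the approximation band.

    Returns the coarsest approximation and a list of detail tuples,
    ordered from the finest level to the coarsest.
    """
    approx = np.asarray(image, dtype=np.float64)
    bands = []
    for _ in range(levels):
        approx, details = haar_dwt2(approx)
        bands.append(details)
    return approx, bands
```

Calling `haar_pyramid(img, 2)` on an 8x8 array yields a 2x2 approximation plus detail bands of size 4x4 and 2x2. Because the transform is orthonormal, applying `haar_idwt2` from the coarsest level back to the finest recovers the input. In smooth image regions the detail coefficients are close to zero, which is the sparsity the abstract refers to.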
In this thesis, we exploit a wavelet-driven learning paradigm to overcome these shortcomings of traditional generative models, using multi-scale analysis to make models inherently aware of frequency and spatial information. We first design a structure-aware GAN, StawGAN, tailored for cross-domain infrared-to-RGB image translation. Building on this foundation, we develop specialized diffusion models for domain-specific tasks, including high-fidelity maritime image super-resolution and efficient EEG-to-image synthesis. Moving beyond these conventional generative approaches, we introduce a series of wavelet-driven architectures that explicitly incorporate multi-scale signal representations. Among them is QUAVE, a framework that leverages QWT to enhance feature extraction and improve generalization in medical imaging. Expanding on these insights, we propose several wavelet-based super-resolution models: a QWT-conditioned diffusion model, a metadata- and wavelet-aware architecture for satellite imagery, and a highly efficient hybrid framework, Wavelet Diffusion GAN, that combines the strengths of GANs and diffusion processes. Finally, we extend these works to high-fidelity synthesis with Latent Wavelet Diffusion (LWD), a general and lightweight framework that enables existing latent diffusion and flow matching models to achieve UHR (up to 4K) synthesis without architectural modifications or additional inference costs. Extensive experiments across generative tasks spanning different domains and data modalities explore the wavelet-driven paradigm while addressing scenario-specific challenges in fidelity, efficiency, and generalization.

Research products

11573/1741098 - 2025 - Gramian multimodal representation learning and alignment
Cicchetti, Giordano; Grassucci, Eleonora; Sigillo, Luigi; Comminiello, Danilo - 04b Atto di convegno in volume
conference: International Conference on Learning Representations (ICLR 2025) (Singapore; Republic of Singapore)
book: Proceedings of International Conference on Learning Representations (ICLR 2025)

11573/1742870 - 2025 - Guess What I Think: Streamlined EEG-to-Image Generation with Latent Diffusion Models
Lopez, Eleonora; Sigillo, Luigi; Colonnese, Federica; Panella, Massimo; Comminiello, Danilo - 04b Atto di convegno in volume
conference: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025) (Hyderabad; India)
book: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - (979-8-3503-6874-1; 979-8-3503-6875-8)

11573/1739492 - 2025 - Generalizing medical image representations via quaternion wavelet networks
Sigillo, Luigi; Grassucci, Eleonora; Uncini, Aurelio; Comminiello, Danilo - 01a Articolo in rivista
paper: NEUROCOMPUTING (Elsevier BV) - issn: 0925-2312 - wos: WOS:001469482100001 (2) - scopus: 2-s2.0-105002227923 (4)

11573/1723593 - 2024 - Ship in sight: diffusion models for ship-image super resolution
Sigillo, L.; Gramaccioni, R. F.; Nicolosi, A.; Comminiello, D. - 04b Atto di convegno in volume
conference: 2024 International Joint Conference on Neural Networks, IJCNN 2024 (Yokohama; Japan)
book: Proceedings of the International Joint Conference on Neural Networks - (9798350359312)

11573/1693469 - 2023 - GROUSE. A task and model agnostic wavelet-driven framework for medical imaging
Grassucci, Eleonora; Sigillo, Luigi; Uncini, Aurelio; Comminiello, Danilo - 01a Articolo in rivista
paper: IEEE SIGNAL PROCESSING LETTERS (IEEE / Institute of Electrical and Electronics Engineers Incorporated) pp. 1397-1401 - issn: 1070-9908 - wos: WOS:001086210700001 (8) - scopus: 2-s2.0-85174843341 (9)

11573/1693480 - 2023 - StawGAN: Structural-Aware Generative Adversarial Networks for Infrared Image Translation
Sigillo, L.; Grassucci, E.; Comminiello, D. - 04b Atto di convegno in volume
conference: 56th IEEE International Symposium on Circuits and Systems, ISCAS 2023 (Monterey; USA)
book: Proceedings - IEEE International Symposium on Circuits and Systems - (978-1-6654-5109-3)

11573/1693467 - 2023 - Sailing the SeaFormer. A transformer-based model for vessel route forecasting
Sigillo, L.; Marzilli, A.; Moretti, D.; Grassucci, E.; Greco, C.; Comminiello, D. - 04b Atto di convegno in volume
conference: 33rd IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2023 (Rome; Italy)
book: IEEE International Workshop on Machine Learning for Signal Processing, MLSP - (979-8-3503-2411-2)

11573/1669173 - 2022 - Hypercomplex image-to-image translation
Grassucci, Eleonora; Sigillo, Luigi; Uncini, Aurelio; Comminiello, Danilo - 04b Atto di convegno in volume
conference: 2022 International Joint Conference on Neural Networks, IJCNN 2022 (Padua; Italy)
book: Proceedings of the International Joint Conference on Neural Networks - (978-1-7281-8671-9)
