LORENZO DI ROCCO

PhD graduate

cycle: XXXVII

supervisor: Umberto Ferraro Petrillo

Thesis title: Scalable Solutions for Large-scale Bioinformatics Analysis: A Critical Study of Apache Spark Application in High-Performance Computational Genomics

Over time, the evolution of sequencing platforms has revolutionized the ability to unravel DNA complexity, enabling an increasing understanding of the genetic structure of organisms. However, these technological advancements have resulted in the generation of vast amounts of data that must be processed, stored, and interpreted. The growing volume of sequencing output has motivated a successful integration of computational genomics with supercomputing and artificial intelligence techniques to tackle computational challenges efficiently and extract meaningful insights from raw data, ultimately improving the speed and accuracy of genomic analysis. However, the potential of distributed computing in genomics has yet to be fully unlocked. While theoretically advantageous, the distribution of complex bioinformatics tasks is challenging, as it requires a deep understanding of distributed systems and advanced programming skills. This thesis leverages Apache Spark to propose distributed pipelines designed to address critical challenges in computational genomics that involve processing large datasets. Apache Spark is a high-level framework that simplifies and accelerates the development of distributed solutions by managing technical issues internally. This feature makes Apache Spark particularly appealing for developing user-friendly, cloud-compatible libraries that promote the adoption of distributed computing in computational genomics. However, while this abstraction has proven successful in various real-world applications, it may face limitations in addressing the highly complex, sequentially structured problems often encountered in computational genomics, where processing across non-shared memory systems poses unique challenges. Through extensive experimental evaluations, this thesis aims to assess the strengths and limitations of applying Apache Spark to large-scale problems in computational genomics.
Each chapter focuses on a specific genomics-related application that is known for its data-intensive nature and where distributed computing could serve as a strategic resource. For each case, a pipeline is proposed and thoroughly analyzed through experiments aimed at evaluating scalability and identifying potential bottlenecks.

Publications

11573/1748966 - 2026 - SapientIAGraph: An Open Knowledge Graph of University Degree Programs at Sapienza

Ceccaroni, Riccardo; Di Rocco, Lorenzo; Ferraro Petrillo, Umberto - 04b Conference paper in volume

conference: The 29th European Conference on Advances in Databases and Information Systems (Tampere)

book: New Trends in Database and Information Systems - (978-3-032-05727-3)

11573/1738095 - 2025 - Two-Phase Distributed Algorithm for Solving the Bi-Objective Minimum Spanning Tree Problem: A Preliminary Study

Amorosi, Lavinia; Cairo, Mariagrazia; Dell’Olmo, Paolo; Di Rocco, Lorenzo; Ferraro Petrillo, Umberto - 04b Conference paper in volume

conference: Parallel Processing and Applied Mathematics (PPAM 2024) (Ostrava)

book: Parallel Processing and Applied Mathematics - ()

11573/1748806 - 2025 - A Distributed Workflow for Long Reads Self-correction

Ceccaroni, Riccardo; Di Rocco, Lorenzo; Ferraro Petrillo, Umberto; Brutti, Pierpaolo - 04b Conference paper in volume

conference: Euro-Par 2024 International Workshops (Madrid; Spain)

book: Euro-Par 2024: Parallel Processing Workshops - (9783031902024; 9783031902031)

11573/1747320 - 2025 - A flexible parametric approach to synthetic patients generation using health data

Cipriani, Marta; Di Rocco, Lorenzo; Puopolo, Maria; Alfò, Marco - 01a Journal article

journal: STATISTICAL METHODS & APPLICATIONS (Physica-Verlag, Berlin) pp. 639-662 - issn: 1618-2510 - wos: WOS:001561231000001 (0) - scopus: 2-s2.0-105014897022 (0)

11573/1733746 - 2025 - A Flexible Parametric Approach to Synthetic Patients Generation in Clinical Trials

Cipriani, Marta; Di Rocco, Lorenzo; Alfò, Marco - 04b Conference paper in volume

conference: SIS 2024 (Bari)

book: Methodological and Applied Statistics and Demography III - (9783031644306; 9783031644313)

11573/1747366 - 2025 - PatientProfiler: A network-based approach to personalized medicine

Lombardi, Veronica; Di Rocco, Lorenzo; Meo, Eleonora; Venafra, Veronica; Di Nisio, Elena; Perticaroli, Valerio; Lorentz Nicolaeasa, Mihail; Cencioni, Chiara; Spallotta, Francesco; Negri, Rodolfo; Sacco, Francesca; Perfetto, Livia - 01a Journal article

journal: MOLECULAR SYSTEMS BIOLOGY (London : Nature Pub. Group) pp. - - issn: 1744-4292 - wos: (0) - scopus: (0)

11573/1752444 - 2025 - PatientProfiler: building patient-specific signaling models from proteogenomic data

journal: MOLECULAR SYSTEMS BIOLOGY (London : Nature Pub. Group) pp. - - issn: 1744-4292 - wos: WOS:001590695400001 (2) - scopus: 2-s2.0-105018590634 (2)

11573/1717606 - 2024 - A distributed approach for persistent homology computation on a large scale

Ceccaroni, Riccardo; Di Rocco, Lorenzo; Ferraro Petrillo, Umberto; Brutti, Pierpaolo - 01a Journal article

journal: THE JOURNAL OF SUPERCOMPUTING (Kluwer Academic Publishers / Massachusetts:PO Box 358, Accord Station:Hingham, MA 02018:(617)871-6600) pp. - - issn: 0920-8542 - wos: WOS:001289394700004 (1) - scopus: 2-s2.0-85200951050 (3)

11573/1729124 - 2024 - Exploiting mechanistic models toward personalised strategies in Breast Cancer

Lombardi, Veronica; Venafra, Veronica; Di Rocco, Lorenzo; Nicolaeasa, Lorentz; Ferraro Petrillo, Umberto; Alfò, Marco; Sacco, Francesca; Perfetto, Livia - 04f Poster

conference: 20th BITS Annual Meeting (Trento, Italy)

book: 20th BITS 2024 - ()

11573/1692306 - 2023 - A Distributed Alignment-free Pipeline for Human SNPs Genotyping

Di Rocco, L.; Ferraro Petrillo, U. - 04b Conference paper in volume

conference: 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics (Houston)

book: Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics - (9798400701269)

11573/1671078 - 2023 - Using software visualization to support the teaching of distributed programming

Di Rocco, L.; Ferraro Petrillo, U.; Palini, F. - 01a Journal article

journal: THE JOURNAL OF SUPERCOMPUTING (Kluwer Academic Publishers / Massachusetts:PO Box 358, Accord Station:Hingham, MA 02018:(617)871-6600) pp. 3974-3998 - issn: 0920-8542 - wos: WOS:000854426900002 (2) - scopus: 2-s2.0-85138168998 (2)

11573/1644217 - 2022 - Scheduling K-mers Counting in a Distributed Environment

Amorosi, L.; Di Rocco, L.; Ferraro Petrillo, U. - 04b Conference paper in volume

conference: International Conference on Optimization and Decision Sciences, ODS 2021 (Rome, Italy)

book: Optimization in Artificial Intelligence and Data Sciences - ()

11573/1657394 - 2022 - Community detection in networks: a heuristic version of Girvan Newman algorithm

Bombelli, Ilaria; Di Rocco, Lorenzo - 04b Conference paper in volume

conference: SIS2022 - 51ST SCIENTIFIC MEETING OF THE ITALIAN STATISTICAL SOCIETY (Caserta; Italy)

book: Book of the short papers - (9788891932310)

11573/1671074 - 2022 - DIAMIN: a software library for the distributed analysis of large-scale molecular interaction networks

Di Rocco, Lorenzo; Ferraro Petrillo, Umberto; Rombo, Simona E - 01a Journal article

journal: BMC BIOINFORMATICS ([London]: BioMed Central, [2000]-) pp. 1-18 - issn: 1471-2105 - wos: WOS:000881990300001 (8) - scopus: 2-s2.0-85141688064 (7)

11573/1556563 - 2021 - Large Scale Graph Based Network Forensics Analysis

Di Rocco, L.; Ferraro Petrillo, U.; Palini, F. - 04b Conference paper in volume

conference: 25th International Conference on Pattern Recognition Workshops, ICPR 2020 (Milan; Italy)

book: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) - ()