SERENA BAINI

PhD Graduate

PhD program: XXXVII



Thesis title: Georeferenced database for estimating the Italian genetic variability of red-listed insect species

Biodiversity loss is an urgent global issue, with insect populations experiencing unprecedented declines due to habitat destruction, climate change, pollution, and other anthropogenic pressures. Insects play crucial roles in ecosystems as pollinators, decomposers, and bioindicators, yet their conservation remains overlooked, and genetic diversity is rarely considered in conservation planning. This thesis addresses critical knowledge gaps in insect conservation genetics, focusing on species listed in the Italian IUCN Red Lists. Through a multi-faceted approach, we explore the availability, spatial distribution, and integration of genetic data to inform conservation priorities.
In the first chapter, we developed a semi-automated pipeline to retrieve genetic sequences from GenBank and BOLD, complemented by manual data extraction from relevant literature. The pipeline included automated filtering steps to clean and standardize metadata, covering taxonomic verification, sequence assignment validation, and georeferencing improvements. Taxonomic inconsistencies were resolved by aligning all species names with the taxonomy adopted by the IUCN Red List, while erroneous geographic coordinates were identified and filtered using the R/CoordinateCleaner package. This process flagged common georeferencing errors, including records mislocated in capital cities or biodiversity institutions, spatial outliers, and points erroneously placed in marine areas. To ensure accurate taxonomic assignment, we followed a widely used procedure on the QIIME2 platform: we trained a taxonomic classifier on a recently curated reference database and applied it to validate the taxonomic identity of all sequences included in our curated dataset.
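The kind of coordinate checks described above can be illustrated with a minimal Python sketch. It is not the R/CoordinateCleaner implementation used in the thesis: the function names, the 1 km radius, and the single reference point are assumptions made for illustration, and the marine and outlier tests are left out because they require external spatial layers.

```python
import math

# Hypothetical reference points (label -> (lat, lon)). A real check would use
# a full gazetteer of capitals and biodiversity institutions.
REFERENCE_POINTS = {
    "Rome (capital)": (41.9028, 12.4964),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def flag_record(lat, lon, radius_km=1.0):
    """Return a list of flags for one record; an empty list means no issue was found."""
    if lat is None or lon is None:
        return ["missing"]
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return ["out_of_range"]
    flags = []
    if lat == 0 and lon == 0:
        flags.append("zero_coordinates")
    for label, (ref_lat, ref_lon) in REFERENCE_POINTS.items():
        # Records sitting exactly on a capital or institution are often placeholders.
        if haversine_km(lat, lon, ref_lat, ref_lon) <= radius_km:
            flags.append(f"near:{label}")
    return flags
```

Flagged records would then be reviewed or re-georeferenced rather than silently discarded, in line with the recovery of coordinates described in the abstract.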
Following these procedures, we assembled a curated and georeferenced database of mitochondrial DNA (COI) sequences for four insect orders of conservation concern—Lepidoptera, Hymenoptera, Coleoptera, and Odonata—comprising approximately 37,000 sequences from 1,466 species. The database is available on Zenodo in SQL format. Our analysis revealed substantial taxonomic and spatial biases in genetic data availability. Only 41% of Hymenoptera and 53% of Coleoptera species listed in the IUCN Red List were represented in genetic repositories, with many species lacking molecular data. While genetic data coverage was higher for Lepidoptera (92%) and Odonata (99%), intraspecific representation remained uneven, with a mean of 24.3 sequences per species. Lepidoptera had the highest representation (89.5 sequences per species), while Hymenoptera and Coleoptera had significantly lower coverage (7.6 and 9.9 sequences per species, respectively). Data quality issues were prevalent, with 8% of sequences containing georeferencing errors, mainly due to incorrect coordinate placements (e.g., records in the sea or administrative centroids). Synonymy and taxonomic inconsistencies further complicated data retrieval, as 7% of Coleoptera, 6% of Hymenoptera, and 36% of Lepidoptera species were initially missing due to misclassification under outdated taxonomies. Additionally, after a sequence assignment check, we found that 17% of Odonata, 12% of Lepidoptera and Coleoptera, and 6% of Hymenoptera occurrences lacked species-level identification, limiting their applicability in conservation or other fields. Despite these challenges, our study highlights the potential of curated genetic datasets for insect conservation. One of the most significant results of our study is the successful assignment of geographic coordinates to nearly 70% of the collected data, recovering a substantial amount of otherwise unusable genetic records. 
This meticulous georeferencing process significantly enhances the usefulness of genetic data for spatial conservation analyses. By refining metadata, improving georeferencing accuracy, and systematically compiling genetic data, we provide a valuable resource for researchers and conservation initiatives. Furthermore, automating as many steps as possible enables the dataset to be updated continuously, while still allowing manual retrieval of additional data from existing scientific publications. Future efforts should prioritize expanding taxonomic coverage, standardizing metadata, and addressing geographic gaps to enhance the potential of genetic data.
The second chapter expands upon this dataset by assessing how effectively the Natura 2000 network and Italian national protected areas (PAs) preserve intraspecific genetic diversity in selected insect species from the Italian IUCN Red List. Using species with sufficient genetic data, we generated continuous maps of allelic richness (AR), nucleotide diversity (π), and haplotype diversity (h) through a moving window approach. Kriging interpolation was applied to generate smoothed genetic diversity surfaces, which were then masked using Area of Habitat (AOH) maps to refine spatial accuracy by excluding unsuitable habitats. To quantify the representation of genetic diversity within protected areas, genetic diversity surfaces were classified into three categories (Low, Medium, and High diversity) and analyzed within and outside Natura 2000 sites and national parks. Our findings reveal that protected areas capture lower diversity classes of AR and π compared to unprotected regions, while haplotype diversity (h) tends to be comparatively higher within them.
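For reference, two of the metrics mapped above, nucleotide diversity (π) and haplotype diversity (h), can be computed for the sequences falling in one window as in the following Python sketch. It uses the standard estimators (mean pairwise differences per site, and n/(n-1) · (1 - Σp²)); the abstract does not state which estimators or software were used, so the function names and the treatment of sequences as equal-length, gap-free strings are assumptions.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """h = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    counts = Counter(seqs)  # identical strings are treated as one haplotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def nucleotide_diversity(seqs):
    """pi = mean number of pairwise differences per site over all sequence pairs."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total_diffs / n_pairs / length


# Toy alignment (invented data, not from the thesis database).
window = ["ACGT", "ACGT", "ACGA", "TCGA"]
h = haplotype_diversity(window)
pi = nucleotide_diversity(window)
```

In a moving-window setting, these functions would be applied to the sequences within each window, and the resulting values interpolated across the landscape. Gaps and ambiguous bases are not handled here.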
Natura 2000 sites provide significantly broader coverage of genetic diversity than national protected areas, as confirmed by permutation-based analyses, which indicate that this difference is not solely explained by their larger spatial extent. Specifically, high genetic diversity classes were more prevalent in Natura 2000 than in national parks, with 19.6% vs 11.4% for AR, 18.9% vs 11.3% for π, and 21.5% vs 12.7% for h. The IPA (In Protected Areas) and INA (In Natura 2000 Areas) analyses were conducted to evaluate the distribution of genetic diversity within protected areas, focusing on three distinct diversity classes: Low, Medium, and High. IPA analysis assessed genetic diversity within the boundaries of Italian national protected areas, while INA analysis focused specifically on the Natura 2000 network. Results indicated that across genetic metrics, the Medium Class was generally the most prevalent within protected areas. However, nucleotide diversity (π) showed a predominance of Low Class values, suggesting that many protected areas capture relatively low levels of this metric. In contrast, haplotype diversity (h) exhibited a more balanced distribution, with a notably higher proportion in the High Class compared to the other metrics, indicating that protected areas may still retain a considerable breadth of genetic lineages. To further explore phylogenetic diversity, we reconstructed intraspecific phylogenies for selected species using Bayesian inference and applied a modified kriging method to interpolate lineage distributions across landscapes using the R/Phylin package. Spatial interpolation of genetic lineages revealed strong structuring in butterfly species such as Pyronia cecilia (R² = 0.93) and Lasiommata megera (R² = 0.84), emphasizing the role of protected areas in maintaining localized genetic variation. 
However, the representation of different lineages varied significantly, suggesting that conservation strategies should account for intraspecific genetic structures rather than treating species as homogeneous units. For example, in P. cecilia, the dominant genetic lineage had greater protected coverage (30.3%) than the secondary lineage (22.0%), highlighting the need for lineage-specific conservation measures. Similarly, the beetle Rosalia alpina exhibited notable lineage differentiation, with one lineage having 44.0% coverage in protected areas compared to 32.5% for the other, underscoring the importance of considering spatial lineage distributions in conservation planning. These findings emphasize the importance of integrating genetic diversity into conservation strategies and leveraging genetic data to refine spatial conservation priorities. The genetic maps generated in this study provide a valuable baseline for future monitoring, allowing researchers and conservationists to track trends in genetic diversity over time. By identifying areas of high genetic diversity, these maps can help guide conservation actions that incorporate this critical aspect of biodiversity. Given the limited genetic data for Red List insect species, leveraging public genetic repositories and spatial interpolation techniques can reveal genetic and phylogenetic patterns that inform conservation planning. As genetic repositories continue to expand, incorporating these data into future analyses will improve our ability to detect meaningful genetic diversity patterns and refine conservation priorities. Nevertheless, it is necessary to recognize that the current paucity of genetic data represents a major limitation. This study highlighted the need to supplement mitochondrial DNA (mtDNA) with nuclear genetic data, which could reveal distinct regions of high genetic diversity and enhance the patterns identified through mtDNA. 
By improving data sharing and promoting the broader use of genetic data in conservation strategies, this study provides a foundation for more effective and evidence-based conservation actions, particularly for overlooked insect taxa.
The third chapter integrates taxonomic and phylogenetic diversity metrics into systematic conservation planning for Italian Lepidoptera listed in the IUCN Red List. Using the Zonation conservation prioritization tool, we identified high-priority areas based on multiple biodiversity features, including species richness (SR), weighted endemism (WE), phylogenetic diversity (PD), evolutionary distinctiveness (ED), and phylogenetic endemism (PE). Additionally, EDGE scores were incorporated to account for species’ evolutionary uniqueness and extinction risk, ensuring a more comprehensive prioritization. The study focused on 257 Italian butterfly species from the IUCN Red List, including 9 endemics. Distribution data were obtained from the IUCN Red List Spatial Data repository and supplemented with species atlases and literature. A time-calibrated phylogeny covering all extant European butterfly species, reconstructed using Bayesian inference from mitochondrial (COI) and nuclear gene fragments, was used to derive phylogenetic diversity metrics. We computed maps of taxonomic (SR, WE) and phylogenetic (PD, ED, PE) diversity metrics, along with EDGE scores, which integrate ED with extinction risk (IUCN categories mapped to global endangerment values). Before conducting spatial conservation prioritization, we analyzed correlations between taxonomic (SR, WE) and phylogenetic (PD, ED, PE) diversity metrics. These analyses identified potential trade-offs between conservation approaches, thus informing the prioritization process. Spatial conservation prioritization was performed using Zonation 5 to investigate how different dimensions of biodiversity influence the identification of priority areas.
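The EDGE scores mentioned above combine evolutionary distinctiveness with extinction risk. A minimal Python sketch of the original EDGE formulation, EDGE = ln(1 + ED) + GE · ln(2), is given below. It is an assumption that the thesis used this formulation and this category-to-GE mapping; the ED value in the example is invented for illustration.

```python
import math

# Global endangerment (GE) weights for IUCN categories, as in the original
# EDGE formulation. Assumed here; the thesis may have used a different mapping.
GE_WEIGHTS = {"LC": 0, "NT": 1, "VU": 2, "EN": 3, "CR": 4}


def edge_score(ed, category):
    """EDGE = ln(1 + ED) + GE * ln(2), with ED in the tree's branch-length units."""
    if ed < 0:
        raise ValueError("ED must be non-negative")
    return math.log(1 + ed) + GE_WEIGHTS[category] * math.log(2)


# Hypothetical species with ED = 12.5, assessed as Vulnerable (VU).
example = edge_score(12.5, "VU")
```

Under this formulation, each step up in threat category doubles the expected loss of evolutionary history, which is why GE is multiplied by ln(2).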
We first conducted pairwise comparisons between species richness and each phylogenetic metric—phylogenetic diversity (PD), evolutionary distinctiveness (ED), and phylogenetic endemism (PE)—to assess spatial congruence and potential mismatches. These comparisons were then repeated using species richness combined with weighted endemism (SR + WE) to evaluate the effect of incorporating taxonomic rarity. Spatial overlap among outputs was quantified by comparing the top 20% priority areas, aligned with the current extent of the Italian PA and Natura 2000 networks. Finally, a comprehensive prioritization was carried out integrating PD, ED, PE, and species-specific EDGE scores, with existing protected areas (PAs) included as a hierarchical constraint. This allowed us to assess the effectiveness of the current PA network in capturing evolutionary diversity and to simulate a 10% expansion to identify complementary areas for protection. The results revealed a strong negative correlation between species richness (SR) and evolutionary distinctiveness (ED) (r = -0.895), suggesting that species-rich areas do not necessarily harbor evolutionarily unique species, highlighting a key limitation of using species richness alone in conservation planning. High species richness was concentrated in northern Italy, particularly in the Alps, while high ED values were found in Sardinia, southern Italy, and insular regions, indicating the presence of unique evolutionary lineages. Similarly, phylogenetic endemism (PE) showed peaks in Lampedusa and the Pontine Islands, reflecting localized evolutionary histories. When evaluating the effectiveness of Italy’s protected areas (PAs), we found that while PAs provide better coverage for high PD and PE regions, they fail to adequately protect high-ED regions. Only 2.9% of high-ED areas were covered by PAs, compared to 7.9% for PD and 6.3% for PE, indicating gaps in the conservation of evolutionarily distinct species. 
This result shows that current protected areas do not adequately cover evolutionarily distinct species and calls for strategic expansions that prioritize high-ED areas. Spatial prioritization with Zonation revealed limited spatial congruence between taxonomic and phylogenetic diversity metrics. Overlap between species richness (SR) and phylogenetic diversity (PD) was only 11.00%, with 9.00% of the area uniquely prioritized by either metric, highlighting the distinct spatial signals captured by each. A higher overlap (15.93%) was observed between SR and evolutionary distinctiveness (ED), despite their strong negative correlation, reflecting Zonation’s emphasis on complementary representation. Similar patterns were found in comparisons with phylogenetic endemism (PE), where shared areas ranged from 12.70% to 13.24%, and exclusive areas from 6.76% to 7.30%. Including weighted endemism (WE) slightly modified priority rankings but did not substantially increase congruence with phylogenetic metrics, suggesting limited added value when species are already treated as discrete features. Finally, under a protected area (PA)-constrained prioritization incorporating EDGE scores and phylogenetic metrics, current PAs showed low representation across most features, with 75.1% concentrated in the lowest coverage class. A simulated 10% expansion improved representation considerably, reducing underrepresentation and increasing the proportion of features reaching higher coverage thresholds, with only 5% of metrics remaining in the lowest coverage class. These findings emphasize the need to integrate phylogenetic diversity metrics into conservation planning, as traditional approaches focusing solely on species richness may overlook evolutionary uniqueness.
Additionally, these findings contribute to global initiatives, such as the IUCN SSC Phylogenetic Diversity Task Force, that increasingly promote the integration of phylogenetic diversity into conservation strategies, supporting the development of metrics like the IPBES PD indicator and the prioritization of EDGE species.
This thesis underscores the importance of incorporating genetic and phylogenetic diversity into conservation planning for insects. By integrating genetic data with spatial analyses, we demonstrate how conservation strategies can move beyond traditional species-based approaches to better capture evolutionary history and intraspecific variability. The construction of a curated genetic database provided an important resource for assessing taxonomic and geographic biases, emphasizing the need for standardized and accessible genetic data. Aligning with global biodiversity targets, this work advocates for the broader adoption of genetic and phylogenetic metrics to enhance conservation planning and long-term species resilience.

Research products

11573/1701429 - 2024 - Filling knowledge gaps in insect conservation by leveraging genetic data from public archives
Baini, Serena; De Biase, Alessio - 01a Journal article
paper: DATABASE (Oxford: Oxford Journals, 2009- Oxford,UK: Oxford University Press) pp. 1-13 - issn: 1758-0463 - wos: WOS:001151598200001 (0) - scopus: 2-s2.0-85183715141 (0)

11573/1643725 - 2022 - Genetic assessment reveals inbreeding, possible hybridization, and low levels of genetic structure in a declining goose population
Honka, J.; Baini, S.; Searle, J. B.; Kvist, L.; Aspi, J. - 01a Journal article
paper: ECOLOGY AND EVOLUTION (Oxford: Wiley-Blackwell) pp. 1-18 - issn: 2045-7758 - wos: WOS:000747845400078 (10) - scopus: 2-s2.0-85123804229 (10)
