Algorithms, Theoretical Computer Science and Epigenomics: Mining Mathematical Laws for Predicting Chromatin Organization and Nucleosome Occupancy in Eukaryotic Genomes

Epigenomics is the study of modifications on the genetic material of a cell that do not depend on changes in the DNA sequence, since those latter involve specific proteins around which DNA wraps. The end result is that epigenomic changes have a fundamental role in the proper working of each cell in Eukaryotic organisms. A particularly important part of Epigenomics concentrates on the study of chromatin, that is, a fiber composed of a DNA-protein complex and very characterizing of Eukaryotes. Understanding how chromatin is assembled and how it changes is fundamental for Biology. Starting from an open problem regarding nucleosomes and the so called 30 nm fiber, posed by R. Kornberg nearly 30 years ago, we show how the use of simple ideas from Theoretical Computer Science can surprisingly help in making progress in obtaining a mathematical characterization of nucleosome potioning in Eukaryotes, based only on knowledge of the DNA sequence. Such a characterization was conjectured not to exist. As a corollary, we obtain an algorithm for nucleosome occupancy prediction, optimal in time and guaranteeing a two orders of magnitudo improvement in time over Machine Learning counterparts. In addition, statistically sound data structures, i.e., Epigenomic Dictionaries, are also introduced in order to effectively mine epigenomic datasets for the discovery of further regularities. Again, those data structures are much more sensitive than the Machine Learning countderparts and they can profitably accomodate information from different sources. Joint Work with: Simona Ester Rombo, Dipartimento di Matematica ed Informatica, University of Palermo Italy; Filippo Utro, Computational Biology Center, IBM T.J. Watson Research, Yorktown Heights, NY, USA.

30/10/2019

3:00 pm
Place: viale Regina Elena 295/B, building F, last floor
Raffaele Giancarlo (Dipartimento di Matematica ed Informatica, University of Palermo)