Thesis title: Digital Palaeography as a matter of sign representation: from the reconstruction of the objective shape of the inscriptions to their analysis through Deep Neural Networks
Digital palaeography can be concisely defined as the combination of traditional palaeographic methods (which study historical writing systems and analyze, decipher, and date handwritten texts on the basis of very specific stylistic features) with techniques for digitizing inscriptions, tools and algorithms for data processing and (shared) analysis, and databases for data management. Part of the broader field of Digital Humanities, this discipline has benefited in recent years from two developments: increasingly advanced digitization techniques (such as Reflectance Transformation Imaging, Multispectral Imaging, X-ray Computed Tomography, and 3D modelling), and sophisticated algorithms for analyzing already digitized inscriptions, chiefly data-driven methods and, more precisely, machine and deep learning solutions.
This thesis belongs to this relatively recent research field and focuses on the fundamental problem of representing signs, inscriptions, and handwritten texts in general correctly and at high quality. In particular, the concept of representation of signs (that is, their re-elaboration through a simplifying model) is explored from two different perspectives.
On the one hand, from the standpoint of inscription digitization, a high-quality representation of an inscription is a replica of its signs that is as accurate as possible metrically and morphologically, so that it can be inspected closely even remotely and even the smallest details, including those not immediately visible to the naked eye, are brought out. On the other hand, the same concept has been studied from the standpoint of the automatic analysis of inscriptions already available in digital form: here, an optimal representation is a transformation of the original data (for example, from an image of a sign to a feature vector) that maximizes the performance of an algorithm solving a task on those data.
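The second notion of optimal representation can be written as an optimization problem. The notation is ours and purely illustrative: $f_\theta$ is an encoder with parameters $\theta$ mapping an input $x$ (for example, an image of a sign) to a feature vector, $g_\phi$ is a task-specific head (for example, a classifier), and $\mathcal{P}$ is a performance measure comparing the prediction with the ground truth $y$ over the data distribution $\mathcal{D}$.

```latex
(\theta^{*}, \phi^{*}) = \arg\max_{\theta,\,\phi}\;
  \mathbb{E}_{(x,\,y)\sim\mathcal{D}}
  \Big[\,\mathcal{P}\big(g_{\phi}(f_{\theta}(x)),\, y\big)\Big]
```

The optimal representation of an input $x$ is then $f_{\theta^{*}}(x)$. Because measures such as accuracy are not differentiable, in practice one usually minimizes a differentiable surrogate loss (for instance, cross-entropy) instead.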
The first interpretation of this concept was explored through the digitization of four undeciphered writing systems: three in use in the Aegean during the second millennium BCE (Cretan Hieroglyphic, Linear A, and Cypro-Minoan), plus Rongorongo, of uncertain date, used on Easter Island. This research was carried out within the ERC project INSCRIBE, with Prof. Silvia Ferrara as Principal Investigator. Since these scripts are generally “three-dimensional”, typically carved on very small objects (on the order of 1-5 cm), with signs only a few tenths of a millimeter deep, they were digitized mainly through accurate, high-resolution 3D modelling techniques, such as macro-photogrammetry (combined with focus stacking) and structured light scanning, which can achieve an accuracy on the order of 1-2 hundredths of a millimeter. During the Ph.D., a total of 166 3D models were acquired, on which assessments of accuracy, precision, texture quality, and legibility were carried out. The results clearly demonstrate the strong potential of the adopted methodologies (refined with specific measures) for producing representations as faithful as possible to the original inscriptions.
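The abstract does not spell out how accuracy was measured. A common approach, assumed here, compares a 3D model against a higher-accuracy reference of the same object through cloud-to-cloud nearest-neighbour distances and reports their mean and root mean square (RMS). The function names below are hypothetical, and the brute-force search only suits small point samples; for full-resolution models a k-d tree (for example, SciPy's `cKDTree`) would be the usual choice.

```python
import numpy as np

def cloud_to_cloud_distances(test_pts: np.ndarray, ref_pts: np.ndarray) -> np.ndarray:
    """For each point in test_pts (N x 3), return the Euclidean distance to its
    nearest neighbour in ref_pts (M x 3). Brute force: builds an N x M x 3 array."""
    diff = test_pts[:, None, :] - ref_pts[None, :, :]   # shape (N, M, 3)
    dists = np.linalg.norm(diff, axis=2)                 # shape (N, M)
    return dists.min(axis=1)                             # nearest distance per test point

def accuracy_summary(test_pts: np.ndarray, ref_pts: np.ndarray) -> dict:
    """Mean, RMS, and maximum cloud-to-cloud deviation (same units as the input)."""
    d = cloud_to_cloud_distances(test_pts, ref_pts)
    return {
        "mean": float(d.mean()),
        "rms": float(np.sqrt(np.mean(d ** 2))),
        "max": float(d.max()),
    }

# Usage: a flat 10 x 10 reference grid with 0.1 mm spacing, and a "scan" of the
# same grid offset by 0.01 mm along z (all units in millimetres).
xs, ys = np.meshgrid(np.arange(10) * 0.1, np.arange(10) * 0.1)
reference = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(xs.size)])
scan = reference + np.array([0.0, 0.0, 0.01])
summary = accuracy_summary(scan, reference)
print(summary)
```

In this toy case every scanned point lies 0.01 mm above its reference counterpart, so all three statistics are about 0.01 mm, the order of magnitude quoted for the acquisition techniques.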
The second interpretation, representation for the automatic analysis of inscriptions, was explored through two case studies. The first involves the signs of the undeciphered Cretan Hieroglyphic script: here, the goal is to train a sign encoder, based on a deep residual network, that produces representations (feature vectors) suitable for sign classification according to the Cretan Hieroglyphic sign repertoire. For the second case study, a group of digitized medieval and modern manuscripts from the Vatican Apostolic Library was selected, with the aim of training an encoder, again based on a deep residual network, to extract from the manuscript pages representations useful for handwriting identification, that is, for subdividing the manuscripts into parts written by distinct scribes on the basis of their handwriting style.
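The key ingredient of a residual network is the identity shortcut: each block outputs its input plus a learned correction, which makes very deep stacks trainable. The toy sketch below shows only that idea, assuming a fully connected, untrained encoder with random weights; the encoders described above are deep convolutional residual networks, and every name here (`ResidualBlock`, `ToyResidualEncoder`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

class ResidualBlock:
    """Toy fully connected residual block: out = relu(x + F(x)),
    with F(x) = W2 @ relu(W1 @ x + b1) + b2 (the learned correction)."""
    def __init__(self, dim, hidden, rng):
        self.W1 = rng.normal(0.0, 0.1, size=(hidden, dim))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, size=(dim, hidden))
        self.b2 = np.zeros(dim)

    def __call__(self, x):
        residual = self.W2 @ relu(self.W1 @ x + self.b1) + self.b2
        return relu(x + residual)  # identity shortcut added before the nonlinearity

class ToyResidualEncoder:
    """Flattened image -> linear projection -> residual blocks -> unit-length feature vector."""
    def __init__(self, in_dim, feat_dim, n_blocks, rng):
        self.proj = rng.normal(0.0, 0.1, size=(feat_dim, in_dim))
        self.blocks = [ResidualBlock(feat_dim, 2 * feat_dim, rng) for _ in range(n_blocks)]

    def __call__(self, image):
        h = self.proj @ image.ravel()
        for block in self.blocks:
            h = block(h)
        norm = np.linalg.norm(h)
        return h / norm if norm > 0 else h  # L2-normalise the representation

# Usage: encode a random 32 x 32 "sign image" into a 64-dimensional feature vector.
encoder = ToyResidualEncoder(in_dim=32 * 32, feat_dim=64, n_blocks=4, rng=rng)
features = encoder(rng.random((32, 32)))
print(features.shape)  # (64,)
```

In a real pipeline, a classification head over the sign repertoire (or a similarity measure between pages, for scribe identification) would operate on such feature vectors, and the weights would be learned rather than random.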
In both case studies, however, a sufficient amount of annotated (labeled) data was lacking, a problem typical of the palaeographic domain and known as “data scarcity”. This ruled out the direct use of supervised learning, even though until recently it had been the most effective strategy for extracting high-level representations from data. To address this, self-supervised pretraining was tested in both cases (using a method from the literature based on the reconstruction of Bags of Visual Words), which can leverage large amounts of unlabeled data to learn high-level representations. In both case studies, adding a self-supervised pretraining phase was shown to improve performance on the target task, and therefore the extraction of useful representations from the data.
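The abstract does not name the method; a widely cited one based on this idea is BoWNet (Gidaris et al., 2020). Its core can be sketched as follows, as a simplified illustration rather than the exact setup used in the thesis: local features of an image, taken from a pretrained "teacher" network, are quantized against a vocabulary of visual words (for example, k-means centroids) into a normalized histogram, and a "student" network that sees a perturbed version of the image is trained to predict that histogram via cross-entropy. No labels are needed. The sketch assumes hard assignment and given arrays in place of real networks; `bow_target` and `bow_reconstruction_loss` are hypothetical names.

```python
import numpy as np

def bow_target(local_feats: np.ndarray, vocab: np.ndarray) -> np.ndarray:
    """local_feats: (N, D) local features from one image (e.g. feature-map cells).
    vocab: (K, D) visual words. Returns a length-K histogram summing to 1."""
    # squared distance between every local feature and every visual word
    d2 = ((local_feats[:, None, :] - vocab[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # hard assignment to the nearest visual word
    hist = np.bincount(words, minlength=vocab.shape[0]).astype(float)
    return hist / hist.sum()

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def bow_reconstruction_loss(pred_logits: np.ndarray, target: np.ndarray) -> float:
    """Cross-entropy between the target BoW distribution and the student's
    predicted distribution (softmax over the K visual words)."""
    p = softmax(pred_logits)
    return float(-(target * np.log(p + 1e-12)).sum())

# Usage with a 3-word vocabulary in a 2-D feature space.
vocab = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
feats = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [4.9, 5.2]])
target = bow_target(feats, vocab)  # words 0, 1, 1, 2 -> histogram [0.25, 0.5, 0.25]

loss_match = bow_reconstruction_loss(np.log(target), target)  # prediction equals target
loss_uniform = bow_reconstruction_loss(np.zeros(3), target)   # uninformative prediction
print(loss_match, loss_uniform)
```

The loss is lowest when the predicted distribution matches the target histogram, which pushes the student to encode, from a perturbed view, which visual patterns an image contains.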
In summary, this thesis has focused on the concept of optimal representation of inscriptions and handwritten texts in the context of digital palaeography. Its main contributions include the definition of 3D modelling techniques and practices for the accurate, high-resolution reconstruction of inscriptions (plus some innovations in the processing and post-processing of the 3D models), and the identification of self-supervised learning as a powerful means of solving palaeographic problems through data-driven approaches, even though this research field is intrinsically affected, at least at present, by a lack of annotated data.