Titolo della tesi: Automata-Theoretic Techniques for Declarative Process Mining
This thesis investigates the use of finite-state automata for various process-related tasks.
Automata are a possible choice for a process modeling language that has gained increasing attention in recent years. The main reason for this is their relation to temporal logics, which makes automata easy to define and understand, while preserving all the advantages of a procedural representation. Indeed, if we think of a process as a set of process traces, that is, as event sequences constituting a formal language, finite-state automata are a natural choice for modeling processes.
As a first contribution, we propose a new method for Temporal Reasoning in Answer Set Programming (ASP) that takes advantage of the automata representation of temporal specifications expressed in Linear-Time Temporal Logic on finite traces (LTLf). This method is then employed to solve various problems of interest for the Declarative Process Mining community, in particular, Log Generation, Conformance Checking, and Query Checking. All of those problems are addressed from both a control-flow perspective and a data perspective. The experimental evaluation conducted, including a comparison with state-of-the-art tools, shows the feasibility of the approach. Notably, our method drastically outperformed the best tools for Log Generation from declarative specifications.
The thesis then moves on to investigate how to leverage Automata Learning algorithms for the automated discovery of process models from event logs. After an analysis of the performances of the Minimum Description Length (MDL) algorithm, which only takes as input a sample of positive words (i.e., the event log) we advocate for the importance and feasibility of including also negative examples. As a result, this makes other learning algorithms available. In particular, Regular Positive and Negative Inference (RPNI) and Evidence Driven State Merging (EDSM) algorithms are considered. We conducted an extensive evaluation on both real-life and synthetic logs, considering as quality metrics precision, fitness, generalization and simplicity (some of them requiring to be adapted to the new learning setting). The results pointed out that
MDL generates much simpler, and therefore more understandable, automata than the other algorithms, keeping similar values of precision and generalization. However, since RPNI and EDSM learn the DFAs from explicit negative behaviors, they
produce automata that are able to better discriminate between positive and negative behaviors. Regarding time performances, we observe that they decreases exponentially for logs including a large activity alphabet. Nonetheless, the algorithms seem to scale very well for logs including a large number of distinct traces and/or traces including many events.