Thesis title: Exploiting Process Mining and Interaction-Driven Techniques to Improve the Life-cycle of Big Data Pipelines
We first present a new reference model for representing event logs from big data pipelines in XES. We then describe a semi-automated technique for generating simulated event logs without placing a burden on organizations. Next, we present DIS-PIPE, a software tool that implements big data pipeline discovery (a specialized form of process discovery) and uncovers dark data through SQL queries. Finally, we explore an interaction-driven framework that exploits user logs to pinpoint latency issues in visualization systems and to identify their causes (e.g., rendering, a failing DBMS, or a poorly designed interaction). This framework can facilitate the optimization of visualization systems, thereby improving the interpretation phase of a big data pipeline.