Thesis title: Advanced Data Structures for efficient and precise Per-Flow Measurements
A vast scientific literature has been produced in the last decades that investigates
optimal ways to monitor network traffic. Due to the rise in interest, scale, and
complexity of telecommunication networks, this task has become more challenging
and exciting at the same time. The requirements to perform such a task often
collide with the limited amount of resources available on modern network devices,
like switches and routers, confronted with the amount of network flows to be
monitored. Part of the academic and industrial research then focused on the usage of
approximated data structures: instead of keeping per-flow reserved statistics, these
structures make their computational and memory resources shared among many
flows or they keep track of just a limited subset of the overall traffic, exchanging a
reduced memory footprint for the accuracy of their estimate. The recent blooming
of Machine Learning research has raised attention on the possibility of combining
machine learning models with approximate data structures, with new promising
solutions already proposed to the academy. In this thesis we try to analyze the
existing state of the art on these "learned" data structures for network monitoring;
we then propose a novel system that tries to employ a single ML model that, beyond
enhancing the performances of a flow monitoring system, attempts to optimize
other networking tasks. By distributing the burden of the ML model overhead over
multiple applications, we make users more willing to pay the price for the increased
cost and complexity due to ML introduction. Besides this research direction, the
analysis of the approximated data structure literature led to the conclusion that
they are, in practice, heavily underutilized, i.e., sparse, thus wasting a significant
amount of memory. We introduce a solution to such waste in the form of a data
structure representation that leverages sparsity to reduce the memory footprint of
per-flow monitoring systems while preserving their original accuracy. Ultimately,
part of our research effort dealt with classical data structures: as part of this work,
we show a novel low-error flow size estimate system based on a classical approximate
data structure.