Thesis title: Computational intelligence and granular information processing strategies for automatic communications security analysis
The evolution of communication networks has become the main pillar enabling new kinds of services in an increasingly connected world, driving the spread of cutting-edge technologies such as mobile ad-hoc networks, smart Internet of Things devices, and sensor networks. Supporting these services has required building a complex, layered mix of technologies with heterogeneous requirements, which makes integration challenging. This complexity has increased the attack surface of networks that span multiple kinds of environments, calling for new tools that can adapt to network conditions and provide insight into meaningful sets of packets.
In this scenario, research has pursued multiple approaches to network traffic analysis, for instance modeling the behavior of information sources. Conventional mathematical modeling can prove overly complex to employ because traffic is high-volume, sent and received at high speed, and shaped by unpredictable user behavior. For these reasons, Artificial Intelligence and data-driven techniques have emerged as viable candidates for building adaptable and scalable models able to recognize key patterns in network traffic.
Our work focuses on algorithms for automatic network traffic analysis and knowledge discovery, introducing new approaches built on established Granular Computing techniques. It is an introductory study investigating viable solutions that analyze network traffic and extract recurrent patterns by processing both single packets and aggregated sets of packets represented in suitable data structures. We first provide ways to pre-process individual captured instances, representing them in structured domains that highlight temporal and topological relationships between packets. We then provide computational intelligence-based processing techniques that automatically find recurrent substructures in both the sequence and graph domains. We also formally define non-exclusive supervised problems and ways to handle multi-labeled sequence and graph data structures throughout the processing pipeline. We show that the proposed solutions yield interesting results for data mining and automatic knowledge discovery over network captures by learning optimized, human-interpretable white-box models. Lastly, we provide a detailed overview of the results obtained in terms of performance and interpretability, for both real-time and post-processing analysis of network captures.
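As a purely illustrative sketch of the two structured representations mentioned above, the following Python fragment maps a handful of captured packets to a time-ordered symbol sequence (temporal view) and to a weighted directed graph (topological view), then counts recurrent n-grams in the sequence. The packet fields, the symbol encoding, and the function names `to_sequence`, `to_graph`, and `frequent_ngrams` are assumptions made for illustration; plain n-gram counting stands in here for the Granular Computing-based substructure mining developed in the thesis and is not the thesis's method.

```python
from collections import Counter

# Hypothetical packet records: (timestamp, source, destination, protocol).
packets = [
    (0.00, "A", "B", "TCP"),
    (0.01, "B", "A", "TCP"),
    (0.02, "A", "B", "TCP"),
    (0.03, "A", "C", "UDP"),
    (0.04, "B", "A", "TCP"),
    (0.05, "A", "B", "TCP"),
]


def to_sequence(packets):
    """Temporal view: time-ordered symbols of the form 'src>dst/proto'."""
    ordered = sorted(packets, key=lambda p: p[0])
    return [f"{src}>{dst}/{proto}" for _, src, dst, proto in ordered]


def to_graph(packets):
    """Topological view: directed edges weighted by packet count."""
    edges = Counter()
    for _, src, dst, _ in packets:
        edges[(src, dst)] += 1
    return dict(edges)


def frequent_ngrams(sequence, n, min_count):
    """Return the n-grams that occur at least min_count times."""
    counts = Counter(
        tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)
    )
    return {gram: c for gram, c in counts.items() if c >= min_count}


sequence = to_sequence(packets)
graph = to_graph(packets)
recurrent = frequent_ngrams(sequence, n=2, min_count=2)
```

On this toy capture, the only bigram that recurs is a reply from B to A followed by a new packet from A to B. In a granular approach, such recurrent substructures would be candidate building blocks for an interpretable model.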