COMPUTER SCIENCE

Seminars

2025

Prof. Marc Dacier - Gaming (in)security, an example of network security research at KAUST
30/06/2025
This seminar will start with a brief introduction to KAUST, the King Abdullah University of Sciences and Technology, a unique international place for STEM research in Saudi Arabia located on the coast of the red sea, 80 kms north of Jeddah. Then we will present some of the ongoing research within the SERBER group (Security Research Bearing Experimental Results) led by Prof. Dacier. In particular, we will present the challenging and booming area of online gaming which we see as the new wild west for cybersecurity researchers. We will discuss new cheating techniques that can be carried out without violating the integrity of the client or server software, simply by manipulating packets on the network. We will show real world examples of these attacks carried out in practice and discuss possible mitigation strategies.

Prof. Rob van Glabbeek - Mutual Exclusion: verification and impossibilities
28/05/2025
The mutual exclusion problem is one of the cornerstones of distributed computing. It deals with parallel processes that need to perform tasks that may not interfere with each other, such as updating records in a shared database. The sensitive parts of each process' code are collected in so-called critical sections, and the task of a mutual exclusion protocol is to make sure that two or more of the processes can never be in their critical section at the same time. Many mutual exclusion protocols have been proposed since this problem was posed in 1965, and it is generally believed that thereby the mutual exclusion problem has been solved. Yet, whether a mutual exclusion protocol operates correctly depends on the hardware on which it will be running. Today I will formulate 6 alternative assumptions that one could make on the way the hardware implements shared registers, and check to what extent classical solutions to the mutual exclusion problem actually yield correct mutual exclusion protocols under each of these assumptions. I report both on automated verification and on theoretical work. Regarding the latter, I will show that under those assumptions that were made in the original paper posing the mutual exclusion problem (namely, speed independence and atomicity) no correct mutual exclusion protocol is possible. This observation contradicts 50 years of research on mutual exclusion. I also show how mutual exclusion can be achieved when dropping any of these assumptions. Regarding the automated verification, we employed a model checker to exhaustively analyse the state space of more than a dozen of the most popular mutual exclusion protocols, and automatically verify their correctness under each of the six hardware assumptions discussed. It turns out that several well-known algorithms do not live up to their promises, and have subtle flaws, that never showed up during their behavioural analysis, yet were found by our model checker.

From Passwords to Behavioral Analysis: the evolution of user authentication
14/05/2025
While the tech industry often references a 'passwordless future,' the reality is that passwords are still the backbone of online security and aren't going away anytime soon. This seminar discusses how best practices for password management have evolved, with a crucial focus on the human side of cybersecurity. We'll explore how increasingly complex password rules impact user behavior, drawing on the latest research, and suggest practical tips for protecting sensitive data. Nonetheless, it is clear that, today, passwords alone are no longer sufficient to safeguard our information and identities. Among alternative and complementary approaches, the rise of behavioral analysis is establishing itself as a powerful tool for enhancing security. This technology offers a transparent way to bolster user authentication within a Multifactor Authentication (MFA) framework. Specifically, a potential approach is presented using the unique patterns of how individuals type free-text on a keyboard. This analysis provides a deeper layer of security without adding significant friction for the user, addressing the growing need for robust, yet less intrusive, protection in today's digital landscape.

Rafael Frongillo - Incentive problems in data science competitions, and how to fix them
27/01/2025
Machine learning and data science competitions, wherein contestants submit predictions about held-out data points, are an increasingly common way to gather information and identify experts. One of the most prominent platforms is Kaggle, which has run competitions with prizes up to 3 million USD. The traditional mechanism for selecting the winner is simple: score each prediction on each held-out data point, and the contestant with the highest total score wins. Perhaps surprisingly, this reasonable and popular mechanism can incentivize contestants to submit wildly inaccurate predictions. The talk will begin with intuition for the incentive issues and what sort of strategic behavior one would expect---and when. One takeaway is that, despite conventional wisdom, large held-out data sets do not always alleviate these incentive issues, and small ones do not necessarily suffer from them, as we confirm with formal results. We will then discuss a new mechanism which is approximately truthful, in the sense that rational contestants will submit predictions which are close to their best guess. If time permits, we will see how the same mechanism solves an open question for online learning from strategic experts.

2024

Alexandra Silva - The Frontiers of Active Learning in Network Verification
25/11/2024
Automata learning has become a popular technique to facilitate analysis software and hardware behavior. It is particularly suited for behaviors that can be described as finite state machines, as for example network protocols, but can be used in a surprisingly wide range of applications. In this talk, I want to explore the limits of using automata learning in the context of network verification. I will give an overview of recent work on the use of active automata learning to analyse QUIC, a network protocol that is bound to eventually replace TCP. I will then discuss current work on active learning for weighted and probabilistic automata, needed in order to analyse more expressive network behaviors, e.g. congestion and fault tolerance.

Shaodi You - Physics-Based Vision in the AI Era
22/07/2024
Light traveling in the 3D world interacts with the scene through intricate processes before being captured by a camera. These processes result in the dazzling effects like colour and shading, complex surface and material appearance, different weathering, just to name a few. Physics based vision aims to invert the processes to recover the scene properties, such as shape, reflectance, light distribution, medium properties, etc., from the images by modelling and analyzing the imaging process to extract desired features or information. When physics based vision meets deep learning, there will be mutual benefits. On one hand, classic physics based vision tasks can be implemented in a data-fashion way to handle complex scenes. On the other hand, high-level vision task can also be benefitted by awareness of the physics principles.

Marco Canini - Programmable Networks for Distributed Deep Learning: Advances and Perspectives
24/06/2024
Training large deep learning models is challenging due to high communication overheads that distributed training entails. Embracing the recent technological development of programmable network devices, this talk describes our efforts to rein in distributed deep learning's communication bottlenecks and offers an agenda for future work in this area. We demonstrate that an in-network aggregation primitive can accelerate distributed DL workloads, and can be implemented using modern programmable network devices. We discuss various designs for streaming aggregation and in-network data processing that lower memory requirements and exploit sparsity to maximize effective bandwidth use. We also touch on gradient compression methods, which contribute to lower communication volume and adapt to dynamic network conditions. Lastly, we consider how to continue our research in light of the enormous costs of training large models at scale, which make it quite hard for researchers to tackle this problem area. We will describe our ongoing work to create a new approach to emulate DL workloads at a fraction of the necessary resources.

Gualtiero Volpe - Automated analysis of human movement qualities
21/05/2024
Full-body movement is a major nonverbal channel human beings exploit to communicate and to interact with each other. Movement is involved, for example, in the expression of emotions, and plays a crucial role in group phenomena. This talk will address technologies for real-time automated analysis of movement qualities, i.e., it will present computational approaches to distinguish how a movement is performed, e.g., suddenly, fluently, hesitantly, and so on. The focus will especially be on systems for capturing and representing the motoric behavior of individuals, for extracting features of movement quality, and for segmenting a movement stream into meaningful units. Application areas include technologies for rehabilitation, education, and well-being.

Michael Swift - "Virtual Memory in the 21st century" [Mandatory]
08/05/2024
Page-based virtual memory, which carves memory into fixed-size pages, forms the basis for all modern operating systems by providing the illusion of a huge memory space for running programs. Yet, since its development in the early 1960s, computing has diversified and memory capacities have grown by orders of magnitudes. I will talk about why virtual memory endures today, the challenges facing virtual memory in modern systems and new directions in retaining its enduring benefits.

Riccardo Spolaor - "Audio Eavesdropping through Side-Channel Analyses"
21/03/2024
In recent years, mobile devices have become essential tools for daily communication activities and social network interactions for billions of people all around the world. As such devices contain a plethora of private and sensitive information, they have become popular targets of attacks to harm users' privacy, among which a passive attack called side-channel analysis. The side-channels are a physical phenomenon that can be measured from both inside or outside a device. They are mostly due to the user interaction with a mobile device, but also to the context in which the device is used, hence they can reveal sensitive user information among which audio content from private conversations. This talk focuses on eavesdropping attacks by measuring the sound-induced vibrations via two distinct approaches. The first attack focuses on the vibration produced by the built-in mobile device's loudspeaker via the accelerometer sensor. The second attack utilizes a mmWave radar to reconstruct the audio within a room from the minute vibrations on surrounding objects. Differently from other state-of-the-art methods, both these attacks can achieve eavesdropping with unconstrained vocabulary.

Monika Henzinger - "How Can Algorithms Help in Protecting our Privacy"
12/03/2024
Decisions are increasingly automated using rules that were learnt from personal data. Thus, it is important to guarantee that the privacy of the data is protected during the learning process. To formalize the notion of an algorithm that protects the privacy of its data, differential privacy was introduced by Dwork, McSherry, Nissim, and Dwork in 2006. It is a rigorous mathematical definition to analyze the privacy properties of an algorithm – or the lack thereof. In this talk I will give an introduction to differential privacy with an emphasis on differential private algorithms that can handle dynamically changing input data.

Muhammad Najib - "From Polyhedron to Verification: Geometrical Reasoning in Cooperative Multi-Player Mean-Payoff Games"
29/02/2024
Concurrent multi-player mean-payoff games serve as important models for multi-agent systems with quantitative individual goals. These games have been extensively studied in non-cooperative settings, for example, using Nash equilibrium. This talk, however, explores an alternative setting: what happens if cooperation is possible? This setting is particularly relevant for cooperative AI systems, as it allows for the modelling of cooperation among agents, even when their goals do not fully align. First, we present a characterisation of cooperative game solution concept called the core using discrete geometry techniques and establish a necessary and sufficient condition for its non-emptiness. Second, we utilise this characterisation to demonstrate the existence of polynomial witnesses in the core. Finally, we leverage this existence to address key decision problems in rational verification.

2023

Kamen Kanev - "Hand and Finger Motion Tracking Interfaces for Interactive Communication"
16/11/2023
This presentation will focus on application domains where high fidelity tracking of the subtle hand and finger motions is required. With respect to this, the design and development of the specialized Data Glove conceived at RIE and employed by Yamaha for musical instrument performance tracking will be discussed. Details will be provided about the specialized highly stretchable carbon nanotube (CNT) sensors developed at RIE and embedded in the Data Glove. A live demonstrations of the Data Glove and its capabilities for hand and finger motion based interactions and communication could also be carried out if the time permits. More specialized interfaces where accommodated hand and finger motion tracking is required for addressing the needs of people with disabilities will also be discussed. With respect to this some interaction models based on the Malossi alphabet commonly used for communications with deafblind persons will be considered. An extended version of the Malossi alphabet redesigned for single-hand operations and thus suitable for mobile communications will be presented and the related interface requirements will be clarified. Following this the design and implementation of a specialized data glove for Mobile Malossi communications will be discussed and some of the developed prototypes may be demonstrated if the time permits.

Lorenzo Cazzaro - "Data-Independent Verification of Robustness and Fairness of Tree-Based Classifiers"
9/11/2023
The increasing success of Machine Learning (ML) led to its massive deployment in various applications. Unfortunately, traditional methods for assessing the performance of ML models do not always provide a reliable picture of their effectiveness in practical scenarios, for instance when the robustness or the fairness of the algorithm matters. Previous research efforts have introduced definitions of properties, such as robustness and various fairness criteria, to characterize the trustworthy behavior of ML models. Additionally, verification algorithms based on formal methods have been proposed to provide formal guarantees regarding compliance with the desired properties. However, several popular robustness and fairness properties are local or data-dependent, in the sense that they are predicated solely on specific test instances rather than on arbitrary inputs. In this talk, I will discuss two contributions that attempt to go beyond the verification of local robustness and fairness properties. In particular, we design verification algorithms of more expressive properties for a specific class of ML classifiers, i.e., tree-based classifiers. The first contribution proposes a data-independent stability analysis to identify a subset of the feature space where the ML model does not change its predictions despite adversarial manipulations. We use this analysis to verify resilience, a new security property that generalizes the idea of robustness. The same analysis is then leveraged in our second contribution to verify a global fairness property, i.e., a property that predicates over all the possible inputs of the classifier. In particular, our analysis synthesizes sufficient conditions for fairness predicating over the entire feature space, thus providing global fairness guarantees.

Ronitt Rubinfeld - "Making use of locality in computation"
23/10/2023
Modern networks can be such that complete inputs and outputs for computational problems on them are *so large*, that there is not time to read or write them in their entirety. However, if one is only interested in small parts of the output at any given time, is it really necessary to solve the entire computational problem? Can one avoid the need to view the whole input? Or, is there a way to take advantage of certain "locality properties" of the problem? We describe recent work in the model of "local computation algorithms" which for a given input, supports queries by a user to values of specified bits of a legal output. The goal is to design local computation algorithms in such a way that very little of the input needs to be seen in order to determine the value of any single bit of the output. Though this model describes sequential computations, techniques from local distributed algorithms have been extremely important in designing efficient local computation algorithms. We describe results on a variety of problems for which sublinear time and space local computation algorithms have been developed.

Jan Arne Trelle - "Recognition of Linear and Star Variants of Leaf Powers is in P"
19/10/2023
A k-leaf power of a tree T is a graph G whose vertices are the leaves of T and whose edges connect pairs of leaves whose distance in T is at most k. A graph is a leaf power if it is a k-leaf power for some k. Over 20 years ago, Nishimura et al. [J. Algorithms, 2002] asked if recognition of leaf powers was in P. Recently, Lafond [SODA 2022] showed an XP algorithm when parameterized by k, while leaving the main question open. In this talk, I explore this question from the perspective of two alternative models of leaf powers, showing that both a linear and a star variant of leaf powers can be recognized in polynomial time.

Sicun Gao - "Reasoning Numerically"
4/10/2023
Highly-nonlinear continuous functions have become a pervasive model of computation. Despite newsworthy progress, the practical success of “intelligent” computing is still restricted by our ability to answer questions regarding their reliability and quality: How do we rigorously know that a system will do exactly what we want it to do and nothing else? For traditional software and hardware systems that primarily use rule-based designs, automated reasoning has provided the fundamental principles and widely-used tools for ensuring their quality. However, the rigid symbolic formulations of typical automated reasoning methods often make them unsuitable for dealing with computation units that are driven by numerical and data-driven approaches. I will overview some of our attempts in bridging this gap. I will highlight how the core challenge of NP-hardness is shared across discrete and continuous domains, and how it motivates us to seek the unification of symbolic, numerical, and statistical methods towards better understanding and handling of the curse of dimensionality.

Lorenzo Alvisi - "Beyond blockchains: a status report on the Quest for the Ideal Log"
13/07/2023
The shared log paradigm is at the heart of modern cloud-based distributed applications. Their appeal lies in the simplicity of the abstraction they offer: by adding new items to the log’s totally ordered sequence of records, clients contribute to building a shared ground truth for their system, which they can then leverage both immediately (e.g., to achieve Paxos-like fault tolerance or atomic transactions) and in the background (e.g., to support debugging, deterministic replay, analytics, intrusion detection, and failure recovery). A shared log implementation would ideally offer (i) Total order ; (ii) High throughput; (ii) High availability during reconfigurations and (iv) Low and predictable latency. No shared log today achieves all these properties — but not for lack of trying. For example, Scalog, a cool shared log implementation we developed a couple of years ago, offered what was then an unprecedented combination of features for continuous smooth delivery of service. It allowed applications to customize data placement, support reconfiguration with no loss in availability, and recover quickly from failures. At the same time, Scalog achieved high throughput and total order. Achieving low latency without giving up on any of these properties, however, has proved elusive — and, as we will see, the issue goes deeper than simple engineering. I will give you a sneak peak to the new ideas we are exploring to finally crack the problem.

Pierre Gaillard - "Strategy Repair in Reachability Games"
12/07/2023
We introduce Strategy Repair, the problem of finding a minimal amount of modifications to turn a strategy for a reachability game from losing into winning. The problem is relevant for a number of settings in Planning and Synthesis, where solutions essentially correspond to winning strategies in a suitably defined reachability game. We show, via reduction from Vertex Cover, that Strategy Repair is NP-complete and devise two algorithms, one optimal and exponential and one polynomial but sub-optimal, which we compare experimentally. The reported experimentation includes some heuristics for strategy modification, which proved crucial in dramatically improving performance.

Silvio Micali - "From Consensus to Agreement: The Evolution of Blockchain"
27/06/2023
Blockchain, its evolution, and novel applications.

Ankit Gangwal - "Modern Problems in Modern Mobile OSes"
31/05/2023
This talk focuses on two of the most frequently used technologies on modern mobile devices, i.e., Bluetooth and password managers. The first part of the talk explains how an attacker can exploit BLE advertisements to exfiltrate information from BLE-enabled devices. In particular, our BLEWhisperer (ESORICS '22) attack establishes a communication medium between two devices without requiring any prior authentication or pairing. The talk will elucidate a proof-of-concept attack framework for the Android ecosystem. The second part of the talk presents a novel attack, called AutoSpill (CODASPY '23), on Android password managers to leak users' saved credentials during an autofill operation. AutoSpill conveniently dodges Android's secure autofill process and allows the attacker to get user credentials for free, i.e., the attacker does not even need to write the code to steal/phish credentials. The majority of popular Android password managers we considered in our experiments were found vulnerable to AutoSpill. Finally, the talk concludes with various practical countermeasures for both of our attacks.

Mauro Sozio - "Fully-Dynamic Decision Trees"
31/05/2023
Historically, machine learning (ML) algorithms have been developed assuming the input datasets to be static. However, as new data is collected or removed because being noisy or obsolete, one soon faces the need to efficiently update an ML model without sacrificing its accuracy. Decision trees are a cornerstone of ML and an essential tool in any ML library. Moreover, decision-tree based algorithms such as XGBoost and random forests have been consistently embraced in real-world applications and data challenges. In our talk, we present the first fully-dynamic algorithm that maintains an ``accurate'' decision tree with ``low'' amortized cost, over an arbitrary sequence of insertions and deletions of labeled examples. We also argue that our algorithm is optimal both in terms of memory usage and running time, up to polylogarithmic factors. We conclude our talk with an extensive experimental evaluation on real-world data showing the effectiveness of our approach. Our work (presented at AAAI23) represents one of the first studies on fully-dynamic supervised ML, which has been mostly unexplored so far, to the best of our knowledge.

Roberto Dessì - "Cross-Domain Image Captioning with Discriminative Finetuning"
26/05/2023
Neural captioners are typically trained to mimic human image descriptions, without any awareness of the function that such descriptions might have, leading to biased and vague captions. In this talk, I’ll empirically show that adding a self-supervised objective to a neural captioner helps to recover a plain, visually descriptive language that is more informative about image contents. In particular, I’ll describe experiments where we take an out-of-the-box neural captioner, and we finetune it with a discriminative objective. Given a target image, the system must learn to produce a description that enables an out-of-the-box text-conditioned image retriever to identify the target image among a set of candidates. In terms of similarity to ground-truth human descriptions, the captions emerging from discriminative finetuning lag slightly behind those generated by the non-finetuned model, when the latter is trained and tested on the same caption dataset. However, when the model is used without further tuning to generate captions for out-of-domain datasets, the discriminatively-finetuned captioner generates descriptions that resemble human references more than those produced by the same captioner trained only with supervised learning and without finetuning. I’ll further show that, on the Conceptual Captions dataset, discriminatively finetuned captions are more helpful than either vanilla captions or ground-truth captions for human subjects tasked with an image discrimination task. If time allows, I’ll conclude the talk by drawing a connection between our work and reinforcement learning from human feedback (RLHF), a recently introduced method powering models like ChatGPT and InstructGPT.

Alessandro Brighente - "Security Issues of 5G Mobility and O-RAN"
23/05/2023
Cellular networks have gone a long road in the development of solutions able to address previous-generation security and privacy issues. Compared to the previous generations, 5G is the first to carefully consider a large variety of threat vectors and attack surfaces, proposing a solution able to handle different types of attacks. However, as technology evolves, so do attackers’ strategies and capabilities, exposing hence novel security threats against last-generation networks. With the development of 6G, novel technologies and methodologies commit exclusively to achieving security and privacy. In this talk, I will present the 5G security and privacy issues related to mobility and its new architectures. In particular, I will first explore the implications of fake base station attacks in the handover management in 5G and propose a solution. I will then show a vulnerability in the Open Radio Access Network (O-RAN) leading to a link fabrication attack.

Emanuele Marconato - "Representation Learning for Concept-based Models"
19/05/2023
Black-box models, like deep neural networks, are unreliable in safety-critical applications, where predictions must be based on the correct input-level attribution and have to adhere to the intended human explanations. Recent advances in Explainable AI have led to studying Concept-based models (CBMs), architectures that integrate high-level concepts to guarantee “ante-hoc” explainability. However, existing CBMs suffer from several limitations that compromise their interpretability.In this talk, I will discuss the current problems of SotA CBMs and present our solution, GlanceNets, new CBMs that align the concepts to the intended semantics by leveraging causal representation learning and out-of-distribution recognition. I will also shape possible future directions for improving Concept-based models.

Stefano Calzavara - "Formal security verification of tree-based classifiers"
09/05/2023
Decision trees and tree ensembles are popular classification models for tabular data. Similar to other machine learning models, however, they are susceptible to evasion attacks at test time, where adversarial perturbations might transform inputs so that they get misclassified. In this talk, I present some recent results on the security verification of tree-based classifiers. First, I introduce a new security measure called resilience, which mitigates some of the issues of the traditional robustness measure, and I discuss how resilience can be soundly estimated for tree-based classifiers. Then, I introduce a new paradigm called "verifiable learning", which advocates the adoption of new training algorithms designed to learn models which are easy to verify. In particular, I present a new class of tree ensembles admitting security verification in polynomial time, thus escaping from classic NP-hardness results, and I discuss how such models can be trained and efficiently verified.

Luigi Loreti - "Photonic processors for arithmetic computation and matrix manipulation"
04/05/2023
In this seminar, a novel optical computer design will be introduced, utilizing light phases as information carriers. This concept employs LCD spatial light modulators (SLMs) for phase shifting and integrated detectors for result analysis. It supports massive parallel computing and matrix-to-vector operations while maintaining low power consumption.

Wei Li - "Unlocking Generative AI with Ubiquitous Hardware and Open Software"
20/04/2023
The presentation will cover the major opportunities of and challenges posed by recent Generative AI technologies such as ChatGPT and Stable Diffusion along with the hardware and software considerations of the increasingly resource-intensive Large Language Models (LLMs) with AI transformer at their core. This talk will cover AI systems from the algorithm down to gates – spanning AI model development, the underlying software frameworks that fuel fast model innovation, and finally the hardware covering CPUs, GPUs and accelerators that provide the huge volume of required compute power.

Nicola Dragoni - "Demystifying and Demythologizing Blockchain Technology"
18/04/2023
This seminar will give a gentle, conceptual, and honest introduction to blockchain technology. Gentle means that the seminar is targeted at non-experts. Conceptual means the focus will be on key concepts instead of technicalities or one specific blockchain platform. Last but not least, honesty means that the seminar will not be the (usual) enthusiastic business talk about blockchain, but it will highlight misconceptions and dispel some common myths about blockchain.

Zorah Laehner - "Learning Functions on Manifolds and Correspondences via Quantum Annealing"
17/04/2023
In this talk I will present my recent research, especially representing functions on manifolds through neural networks (the equivalent of NeRF but on manifolds) and using quantum annealing to solve problems in computer vision. First, we will look into how the function approximation power of neural networks can be used to define a continuous and differentiable texture representation on 3D shapes, how to optimize this and the theory behind why it works so well. The next part will be a short introduction into quantum annealing and its unique properties, how it can be used to solve correspondence problems, and why combining it with learning approaches makes sense.

Avigdor Gal - "Loch Data and Other Monsters: on Creating Data Ecosystems, the Intelligent Way"
04/04/2023
A data ecosystem offers an alliance-driven infrastructure that enables the interaction of different stakeholders and the resolution of interoperability issues among shared data. Despite years of research in data governance and management, trustability is still affected by the absence of transparent and traceable data-driven pipelines. Data integration is the main facilitator of such data-driven pipelines and matching is a task at the heart of any data integration process, aimed at identifying correspondences among data elements. Matching problems were traditionally performed in a semi-automatic manner, with correspondences being generated by matching algorithms and outcomes subsequently validated by human experts. Human-in-the-loop data integration has been recently challenged by the introduction of big data and recent studies have analyzed obstacles to effective human matching and validation. In this talk, we focus on the tension between human and machine matching. We propose a novel data ecosystem architecture that relies on both human knowledge and machine learning and offer a concrete algorithmic solution for effective data integration within this architecture. In particular, we shall present the limitations of human matching and offer a method for learning to characterize reliable and valuable matching experts.

Keshav Pingali - "Fifty Years of Parallel Programming: Ieri, Oggi, Domani"
20/03/2023
Parallel programming started around 1970 so as a discipline, it is now more than 50 years old. What have we learned in the past 50 years about parallel programming? What problems have we solved and what problems remain to be solved? What can young researchers learn from the successes and failures of our discipline? This talk presents a personal point of view about these and other questions regarding the state of parallel programming. I will also discuss my experience in doing a start-up called Katana Graph, which is based on my work in parallel computing.

Mauro Conti - "Covert & Side Stories: Threats Evolution in Traditional and Modern Technologies"
15/03/2023
Alongside traditional Information and Communication Technologies, more recent ones like Smartphones and IoT devices also became pervasive. Furthermore, all technologies manage an increasing amount of confidential data. The concern of protecting these data is not only related to an adversary gaining physical or remote control of a victim device through traditional attacks but also to what extent an adversary without the above capabilities can infer or steal information through the side and covert channels! In this talk, we survey a corpus of representative research results published in the domain of side and covert channels, ranging from TIFS 2016 to more recent Usenix Security 2022, including several demonstrations at Black Hat Hacking Conferences. We discuss threats coming from contextual information and to which extent it is feasible to infer precise information. In particular, we discuss attacks like inferring actions that a user is doing on mobile apps by eavesdropping on their encrypted network traffic, identifying the presence of a specific user within a network through analysis of energy consumption, or inferring information (also key one like passwords and PINs) through timing, acoustic, or video information.

Luigi Gresele - "Learning Identifiable Representations: A Gentle Introduction"
06/03/2023
Representation learning is motivated by the fact that learning systems, both biological and artificial, receive unstructured information from the external world: artificial networks trained for object recognition take collections of pixels as inputs; visual information processing in biological systems starts in photoreceptors, where incoming light is converted into biological signals. To make certain aspects of this information explicit and easily accessible (e.g., the position, dimension and colour of objects in an image), complex processing is required. Central questions are what information should be made explicit in a representation, and how to do so. In this talk, we will approach these problems based on two metaphors. The first one is the cocktail-party problem, where a number of conversations happen in parallel in a room, and the task is to recover (or separate) the voices of the individual speakers from recorded mixtures—also termed blind source separation. The second one is what we call the independent-listeners problem: given two listeners in front of some loudspeakers, the question is whether, when processing what they hear, they will make the same information explicit, identifying similar constitutive elements. Rather than the reconstruction of a ground truth, in the second problem we are interested in comparing the representations extracted by the two listeners. These questions can be studied with the approach of independent component analysis (ICA). This entails establishing whether, under some technical assumptions, representations can be uniquely specified—up to some ambiguities deemed tolerable, and except for a small number of corner cases. In technical terms, this corresponds to characterizing identifiability of the model. We will explore the motivations and objectives of research on identifiability in representation learning, highlighting the challenges and the proposed solutions. Although the talk will be mostly based on literature on (nonlinear) ICA, we will also discuss the implications for unsupervised learning and probabilistic modelling in general.

2022

Vitaly Shmatikov - "Can We Trust Machine Learning Models?"
14/12/2022
Modern machine learning models achieve super-human accuracy on tasks such as image classification and natural-language generation, but accuracy does not tell the entire story of what these models are learning. In this talk, I will look at today's machine learning from a security and privacy perspective, and ask several fundamental questions. Could models trained on sensitive private data memorize and leak this data? When training involves crowd-sourced data, untrusted users, or third-party code, could models learn malicious functionality, causing them to produce incorrect or biased outputs? What damage could result from such compromised models? I will illustrate these vulnerabilities with concrete examples and discuss the benefits and tradeoffs of technologies (such as federated learning) that promise to protect the integrity and privacy of machine learning models and their training data. I will then outline practical approaches towards making trusted machine learning a reality.

Radia Perlman - "Network Protocols: Myths, Missteps, and Mysteries"
13/12/2022
So much of what “everyone knows” about network protocols is actually false. For instance, why do we need both Ethernet and IP? The “obvious” answer is that IP is “layer 3” and Ethernet is “layer 2”, but in fact, once Ethernet stopped being a way for a few hundred nodes to share a single wire, and instead became a network of point-to-point links with switches forwarding between the links, “Ethernet” today should really be called a layer 3 protocol. So, why can’t we get rid of switches and connect all links with IP? Or why can’t we replace IP routers with Ethernet switches? Another topic is whether IP was the best layer 3 protocol, and whether IPv6 is simply a new version of IP, whereas, if the world had converted to CLNP (a 20-byte layer 3 protocol) in 1992, it would have been a major disruption to the Internet.

Artem Polyvyanyy - "Understanding Quality of Process Models Discovered from Event Data"
29/11/2022
The quality of a discovered process model is commonly assessed in process mining against four quality criteria: precision, recall, generalization, and simplicity. In this talk, I give an overview of these four quality criteria and elaborate on the topics related to challenges associated with measuring the quality of discovered process models, desired properties for measures of the model quality, and results on evaluating the existing quality measures against the corresponding desired properties.

Shufang Zhu - "On the Power of LTLf in Assured Autonomy"
10/11/2022
Assured Autonomy is a novel area that merges Artificial Intelligence (AI) and Formal Methods (FM), concerning building AI agents that autonomously deliberate how to act in a changing, incompletely known, unpredictable environment under formal guarantees. A popular specification language for describing formal guarantees is Linear Temporal Logic (LTL) from FM. However, LTL is interpreted over infinite traces (relating to non-terminating systems). Since AI agents are not dedicated to a single task in their complete life cycle, but are supposed to accomplish one task after another, AI applications often employ a finite-trace variant of LTL, denoted as LTLf. In particular, the study of LTLf synthesis brings the intellectual merit of achieving Assured Autonomy by allowing agents to automatically construct programs with guarantees to meet their tasks specified in LTLf. In this talk, I will review an evolving journey toward Assured Autonomy through LTLf synthesis. Starting from an attempt to devise a symbolic backward LTLf synthesis framework, which has demonstrated its significant efficiency in various applicable scenarios, the journey evolves into a forward LTLf synthesis technique that highlights several interesting avenues of future work in the context of Markovian Decision Process.

Avi Widgerson - "Imitation Games"
28/10/2022
One of Alan Turing's most influential papers is his 1950 Computing machinery and intelligence, in which he introduces the famous "Turing test" for probing the nature of intelligence by evaluating the abilities of machines to behave as humans. In this test, which he calls the "Imitation Game," a (human) referee has to distinguish between two (remote and separate) entities, a human and a computer, only by observing answers to a sequence of arbitrary questions to each entity. This lecture will exposit, through examples from a surprisingly diverse array of settings, the remarkable power of this basic idea to understand many other concepts. I will discuss variations of the Imitation Game in which we change the nature of the referee, and of the objects to be distinguished, to yield different analogs of the Turing test. These new Imitation Games lead to novel, precise, and operative definitions of classical notions, including secret, knowledge, privacy, randomness, proof, fairness, and others. These definitions have in turn led to numerous results, applications, and understanding. Some, among many consequences of this fundamental paradigm, are the foundations of cryptography, the surprising discoveries on the power and limits of randomness, the recent influential notion of differential privacy, and breakthrough results on patterns in the prime numbers and navigation in networks. Central to each of these settings are computational and information theoretic limitations placed on the referee in the relevant Imitation Game. This lecture will survey some of these developments and speculate on future uses of this paradigm in science and society, in a way which is hopefully accessible without any specific background knowledge.

Danilo Francati - "Eluding Secure Aggregation in Federated Learning via Model Inconsistency"
27/10/2022
Secure aggregation is a cryptographic protocol that securely computes the aggregation of its inputs. It is pivotal in keeping model updates private in federated learning. Indeed, the use of secure aggregation prevents the server from learning the value and the source of the individual model updates provided by the users, hampering inference and data attribution attacks. In this talk, I will show that a malicious server can easily elude secure aggregation as if the latter were not in place. In particular, I will present an attack strategy that allows the server to infer information on individual private training datasets, independently of the number of users participating in the secure aggregation. This makes it a concrete threat in large-scale, real-world federated learning applications. The attack strategy is generic and equally effective regardless of the secure aggregation protocol used. It exploits a vulnerability of the federated learning protocol caused by incorrect usage of secure aggregation and lack of parameter validation. This demonstrates that current implementations of federated learning with secure aggregation offer only a “false sense of security”.

Alan Dix - "Designing User Interactions with AI: Servant, Master or Symbiosis"
14/10/2022
All AI ultimately affects people, in some cases deeply buried, in others interacting directly with users, whether physically, such as autonomous vehicles, or virtually, such as recommender systems. In these interactions, AI may be a servant, such as Alexa operating on command; or AI may be the master, such as gig-work platforms telling workers what to do. However, potentially the most productive interactions are a symbiosis, human and AI complementing one another. Designing human-in-the-loop systems changes the requirements of both AI algorithms and user interfaces. This talk will explore some design principles and examples in this exciting area.

Sher Muhammad Doudpota - "Natural Language Processing with Deep Learning"
27/09/2022
Natural Language Processing (NLP) is a sub field of computer science that deals with extracting meaningful patterns from large amounts of text. Traditionally, rule based approaches have dominated in the field, however recently Deep Neural Networks have shown promising results when applied on different NLP tasks. The biggest advantage of applying deep learning on text is eliminating the need of feature engineering which needs domain expertise. With the advent of deep learning, we no more need manual feature engineering, rather algorithms based on neural networks automate the process. This seminar will give an overview to the participants about different applications of deep learning on varying tasks of nlp including sentiment analysis, topic modeling and creative side of AI that is text generation.

Maxime Parmentier - "Statistical model checking: genetic algorithm, spatio-temporal logic and sequential analysis"
12/09/2022
Both computer programs and embedded systems become more and more complex with the progress of technological innovation. Cyber-physical systems, such as self-driving vehicles, communications satellites or medical robots, are not only expensive and difficult to produce, but they also have high requirements with respect to consistency, safety and efficiency. Making sure cyber-physical systems meet those requirements is therefore of the utmost importance and must be done as early as during the design stage of the production. While analyzing the design of a cyber-physical system or testing a prototype are valid (albeit potentially costly) strategies for small systems, they are completely unfeasible for most real-case examples. Thankfully, there exists another strategy: (statistical) model checking. With a model of the system, it is possible to simulate executions of the system and check if no execution violates a given property. The goal of this presentation is to present multiple new ideas and frameworks to solve important problems of statistical model checking: the local extremum problem, the spatiality problem, and the optimal stopping problem.

Mario Di Francesco - "Distributed Intelligence for Mobile Computing and the Internet of Things"
14/07/2022
Distributed intelligence is a collaborative process to carry out different types of learning and inference tasks. Such a process enables heterogeneous and resource-constrained devices to realize diverse applications and services. This talk specifically focuses on two representative use cases of distributed intelligence in the context of mobile computing and the Internet of Things (IoT). The first one is improving user experience in mobile augmented reality through edge computing. The second use case is partitioning a pre-trained deep neural network for distributed inference in a fog network.

Sasha Rubin - "Formula synthesis in propositional dynamic logic with shuffle"
13/06/2022
I will introduce a new type of problem in logic called the "formula synthesis problem": for a logic L, given a set of formulas represented by a context-free grammar G, and a structure M, find a formula F (or say none exists) generated by G that is true in M. This problem generalises the model-checking problem (in case the grammar generates just a single formula). I will instantiate this problem taking L to be propositional dynamic logic (PDL) extended with program shuffle. In this setting, the formula synthesis problem is undecidable, but I will consider some natural restrictions that are decidable using classic tools of tree automata theory. For instance, the problem is EXPTIME-complete if the logic is restricted to PDL. The choice of this logic was also motivated by rephrasing a hierarchical synthesis problem in AI known as Hierarchical Task Network Planning. This talk is based on joint work with Sophie Pinchinat and François Schwarzentruber, published in AAAI22.

Sasha Rubin - "Best-Effort Synthesis"
09/06/2022
In the classic formulation (Pnueli and Rosner 1989), the synthesis problem concerns finding an agent strategy that guarantees that a specification formula expressed in linear-time temporal logic (LTL) is satisfied irrespective of the actions of the environment (this can be seen as an analogue of Planning with temporally extended goals in AI). What should an agent do when it can't achieve its task in all circumstances? In the classic formulation the agent 'gives up' (i.e., the synthesis procedure returns 'unsynthesisable'). However, what if giving up is not acceptable? One idea might be to do as well as possible, i.e., do a 'best effort'. In this talk I will offer a formalisation of what it means that an agent strategy does a a best effort, and discuss how to algorithmically solve such synthesis problems using graph algorithms. This talk is based on joint works with Benjamin Aminof, Giuseppe De Giacomo, Alessio Lomuscio, and Aniello Murano, presented at IJCAI21 and KR21.

Dominik L. Michels - "Enabling Accurate and Efficient Simulation in Visual Computing"
01/06/2022
The overarching goal of Dominik Michels' Computational Sciences Group within KAUST's Visual Computing Center is enabling accurate and efficient simulations for applications in Scientific and Visual Computing. Towards this goal, the group develops new principled computational methods based on solid theoretical foundations. This talk covers a selection of previous and current work presenting a broad spectrum of research highlights ranging from simulating stiff phenomena such as the dynamics of fibers and textiles, over liquids containing magnetic particles, to the development of complex ecosystems and weather phenomena. Moreover, connection points to the growing field of machine learning are addressed and an outlook is provided with respect to selected technology transfer activities.

Viktor K. Prasanna - "FPGA Accelerators in Data Centers"
23/05/2022
With recent dramatic advances in Field Programmable Gate Arrays (FPGAs), these devices are being used along with multi-core and novel memory technologies to realize advanced platforms to accelerate complex applications in the Cloud. We will review advances in reconfigurable computing leading up to accelerators for data science. We will illustrate FPGA-based parallel architectures and algorithms for a variety of data analytics kernels in streaming graph processing and graph machine learning. While demonstrating algorithm-architecture co-design methodology to realize high performance accelerators for graphs and machine learning, we demonstrate the role of modeling and algorithmic optimizations to develop highly efficient Intellectual Property (IP) cores for FPGAs. We show improved performance for two broad classes of graph analytics: iterative graph algorithms with variable workload (e. g., graph traversal, shortest paths, etc.) and machine learning on graphs (e. g., graph embedding). For variable workload iterative graph algorithms, we illustrate dynamic algorithm adaptation to exploit heterogeneity in the architecture. For graph embedding, we develop a novel computationally efficient technique using graph sampling and demonstrate scalable performance. We conclude by identifying opportunities and challenges in exploiting emerging heterogeneous architectures composed of multi-core processors, FPGAs, GPUs and coherent memory.

Edlira Dushku - "Has my smart IoT device been hacked? Detecting compromised IoT devices with remote attestation protocols"
18/05/2022
The Internet of Things (IoT) devices are becoming pervasive in everyday life. Besides enormous opportunities, IoT devices pose serious security and privacy concerns mainly due to their poor security design. Moreover, traditional security solutions are unsuitable for IoT devices because of their low-cost design and deployment in remote environments with limited physical accessibility. Aimed at securing IoT devices, Remote attestation (RA) has recently gained wide attention in IoT systems as a viable security mechanism that remotely detects the malware presence on IoT devices. Various RA schemes have been proposed to optimize the effectiveness and efficiency of RA protocols of a single IoT device or a group of IoT devices. This talk will briefly introduce the state-of-the-art RA approaches and present the research works in this area.

Hajo A. Reijers - "Process Science: Past, Present, Future"
17/05/2022 and 18/05/2022
In 2021, a proposal was launched to establish "process science", a multi-disciplinary field for the study of processes [1]. Processes are coherent series of changes, both man-made and naturally occurring, which unfold over time and occur at various levels. Process science is concerned with discovering and understanding processes as well as designing interventions to shape them into desired directions. Computer science, engineering, and data science greatly contribute to this field, but there are also strong relations with economics, management science, organization science, and the social sciences. In this PhD seminar, I will give a personal reflection on the past, the present, and the future of the discipline. In the part of the seminar that is scheduled for May 17th, I will first sketch the historic development of process science and next provide a summary of its major accomplishments. In the follow-up talk on May 18th, I will outline some of the major challenges that we face, which hopefully can serve as in inspiration for researchers at all levels and from a variety of backgrounds to contribute to this emerging and exciting field. [1] vom Brocke, J., van der Aalst, W.M.P, Grisold, T., Kremser, W., Mendling, J., Pentland, B., Recker, J., Roeglinger, M., Rosemann, M. Weber, B. (2021). Process Science: The Interdisciplinary Study of Continuous Change. Working Paper, available at SSRN Electronic Library, 2021.

Hajo A. Reijers - "Action - Response - Effect: Mining Work Processes"
04/05/2022
Process mining techniques can be used to analyze work processes within organizations. Many of the existing techniques focus on the sequential order in which activities are performed. While useful and popular in practice, few of these techniques consider the statistical relations within processes. In particular, existing techniques do not allow insights into how responses to an event (action) result in desired or undesired outcomes (effects). In this talk, I will present the ARE miner, which can be used to analyze and understand these action-response-effect patterns. I will illustrate the use of the ARE miner with a real-world data set from a Dutch healthcare institution.

Alessandro Epasto - "Privacy and fairness in clustering"
24/03/2022
Clustering is a fundamental unsupervised machine learning problem that lies at the core of several applications. With the increasing use of clustering in real applications it is becoming apparent the need to include privacy and fairness considerations in the design of clustering algorithms. In this talk, I will review some work on the recent area of designing clustering algorithms that include constraints on protecting the privacy of the data and on ensuring a fair representation of the users. First, I will review recent work (SODA22, KDD21) on clustering enforcing size-constraints which has applications in real-world anonymization applications (such as the Chrome Privacy Sandbox project at Google). I will present an O(1)-approximate algorithm for the r-gather problem in O(log n) MPC rounds. This result is based on a connection between the r-gather problem and maximal independent set / ruling set of the power graph in the MPC model which I will discuss. I will also show evidence on the effectiveness of the algorithm on real advertisement applications. Then, I will briefly review work (WSDM22, Neurips20, AISTATS20) on clustering where fairness constraints require rethinking classical approaches to protect the interest of the users. References: M. Almanza, A. Epasto, A. Panconesi, G. Re. "k-Clustering with Fair Outliers" WSDM, 2022 A. Epasto, M. Mahdian, V. Mirrokni, P. Zhong "Massively Parallel and Dynamic Algorithms for Minimum Size Clustering". SODA, 2022 A. Epasto, A. Muñoz Medina, et al. "Clustering for Private Interest-based Advertising". KDD Ads Track, 2021 S. Ahmadian, A. Epasto, M. Knittel, R. Kumar, M. Mahdian, B. Moseley, P. P., S. Vassilvitskii, Y. Wang, "Fair Hierarchical Clustering". NeurIPS, 2020 S. Ahmadian, A. Epasto, M. Mahdian, R. Kumar, "Fair Correlation Clustering". AISTATS 2020

Lorenzo Orecchia - "Fast Diffusion-based Algorithms for Vertex-based and Hypergraph Partitioning"
17/03/2022
Classical spectral graph theory is centered around the study of the quadratic forms of the graph Laplacian operator, through its structural relation to isoperimetric properties of the graph and his algorithmic connection with the heat diffusion process over the graph. In this talk, I will describe a generalization of this theory that focuses instead on bilinear forms of incidence operators and allows for more general structural results and novel algorithmic primitives. In particular, I will showcase a simple, efficiently computable nonlinear analogue of the heat diffusion process that allows us to detect sparse vertex-based separators and hypergraph separators. I will also describe its connection to the fastest-mixing markov chain over an unweighted undirected graph. This is joint work with Antares Chen and Erasmo Tani.

Roberto Cipolla - "Computer Vision: Geometry, Uncertainty and Deep Learning"
09/03/2022
The last decade has seen a revolution in the theory and application of computer vision. I will begin with a brief review of some of the fundamentals with a few examples of the reconstruction, registration and recognition of three-dimensional objects from uncalibrated images and their translation into novel commercial applications. I will then introduce some recent results from real-time deep learning systems that fully exploit geometry, compute model uncertainty and require very small amounts of labelled data. In particular, we will explore the use of geometry to help design networks that can be trained with unlabelled data for human body pose and robot localisation.

2021

Multi-agent systems: from robotic to financial networks
08/02/2021
A team of drones looking for survivors amidst the rubbles of an earthquake, a squad of mobile robots inspecting a field of interest, and algorithmic traders in a financial network apparently have little in common. In all these scenarios, however, the interplay among the multiple actors involved can be abstracted as a complex network of multiple interacting agents, with cooperative, opportunistic, or even antagonistic behaviors. In drone networks, multiple agents, mostly unmanned mobile devices, cooperate with each other to achieve a common goal or share a common view of ongoing phenomena in a field of interest. They may act opportunistically, trading-off their application task with their communication needs. Open research challenges include trajectory management, task assignment, and routing issues. In financial systems, traders, banks, and the stock exchange can also be abstracted in a model of interacting agents, often exhibiting interdependencies that can be studied with the tools of network science and multi-agent systems. This short talk sets out to describe some of the research goals I have enthusiastically pursued in the recent years, with a focus on the funded projects I have been working on.

Chatbots for software modelling
14/12/2021
Chatbots are software services accessed via conversation in natural language. They are used to help in all kinds of procedures like booking flights, querying visa information or assigning tasks to developers. They can be embedded in webs and social networks, and be used from mobile devices without installing dedicated apps. In this seminar, we will see how to take advantage of chatbots and social networks to enable the collaborative creation of software models by groups of users. The process is assisted by modelling bots that orchestrate the collaboration and interpret the users' inputs (in natural language) to incrementally build a domain model. The advantages of this modelling approach include ubiquity of use, automation, assistance, natural user interaction, traceability of design decisions, possibility to incorporate coordination protocols, and seamless integration with the user's normal daily usage of social networks. We will showcase the tool SOCIO which supports this novel modelling paradigm.

Toward Data-Driven Self-Adaptive Spectrum-Aware Wireless Systems
06/12/2021
The massive scale and strict performance requirements of next-generation wireless networks will require embedded devices to perform real-time fine-grained optimization of their spectrum usage. Yet, today's networking protocols and architectures are deeply rooted in inflexible designs, and utilize optimization models and strategies that are either too complex or too oversimplified to be fully effective in today's crowded spectrum environment. In this talk, we are going to introduce and discuss our recent research toward the design of data-driven self-adaptive spectrum-aware wireless systems, where transmitters and receivers use real-time deep learning to infer and optimize their networking parameters based on ongoing spectrum conditions. We will conclude the talk by discussing existing technical challenges and possible research directions.

Order! A tale of money, intrigue, and specifications
02/12/2021
Mistrust over traditional financial institutions is motivating the development of decentralized financial infrastructures based on blockchains. In particular, Consortium blockchains (such as the Linux Foundation Hyperledger and Facebook’s diem) are emerging as the approach preferred by businesses. These systems allow only a well-known set of mutually distrustful parties to add blocks to the blockchain; in this way, they aim to retain the benefits of decentralization without embracing the cyberpunk philosophy that informed Nakamoto’s disruptive vision. At the core of consortium blockchains is State Machine Replication, a classic technique borrowed from fault tolerant distributed computing; to ensure the robustness of their infrastructure, consortium blockchains actually borrow the Byzantine-tolerant version of this technique, which guarantees that the blockchain will operate correctly even if as many as about a third of the contributing parties are bent on cheating. But, sometimes, "a borrowing is a sorrowing". I will discuss why Byzantine-tolerant state machine replication is fundamentally incapable of recognizing, never mind preventing, an ever present scourge of financial exchanges: the fraudulent manipulation of the order in which transactions are processed - and how its specification needs to be expanded to give it a fighting chance. But is it possible to completely eliminate the ability of Byzantine parties to engage in order manipulation? What meaningful ordering guarantees can be enforced? And at what cost?

Learning and accruing knowledge over time using modular architectures
07/10/2021
One of the hallmarks of human intelligence is the ability to learn new tasks despite the paucity of direct supervision. Machine learning models have recently achieved impressive performance in this setting by using the following protocol: i) Collect a massive dataset, ii) Train a very large model and iii) Adapt to downstream tasks using very little, if any, task-specific labeled data. While this has been working remarkably well, it is still dissatisfying because the information present in each downstream task is never transformed into actual knowledge that can be leveraged to improve the prediction of subsequent downstream tasks. As a result, once in a while even larger models need to be retrained from scratch to account for the ever increasing amount of data. This begs two basic questions. First, what learning settings are useful to study knowledge accrual? And second, what methods are effective and efficient at learning from never-ending streams of data? In this talk, I will present a preliminary investigation in our quest to answer these questions. I will present experiments using anytime and continual learning with metrics accounting for both error rate and efficiency of learning through time. I will also discuss how modular architectures can strike good trade-offs in this setting. These networks, whose computation is expressed as the composition of basic modules, can naturally grow over time to account for new incoming data by simply adding new modules to the existing set of modules, and they can retain efficiency as the number of modules grow if only a small and constant number of modules is used at inference time. While these are admittedly baby steps towards our original goal, we hope to stimulate discussion and interest in our community about the fundamental question of how to represent and accrue knowledge over time.

The Pit and the Pendulum - Part II
01/10/2021
In the second seminar, I will discuss the tension between providing strong isolation guarantees (which greatly simplify the task of programming concurrent applications) and trying to maximize these applications' performance. Since the elegant foundations of transaction processing were established in the mid 70's with the notion of serializability and the codification of the ACID (Atomicity, Consistency, Isolation, Durability) paradigm, performance has not been considered one of ACID's strong suits, especially for distributed data stores. Indeed, the NoSQL/BASE movement that started a decade ago with Amazon's Dynamo was born out of frustration with the limited scalability of traditional ACID solutions, only to become itself a source of frustration once the challenges of programming applications in this new paradigm began to sink in. But how fundamental is this dichotomy between performance and ease of programming? In my talk, I'll share with you the intellectual journey my students and embarked on trying to overcome the traditional terms of this classic tradeoff.

The Pit and the Pendulum - Part I
30/09/2021
The cloud datastores that support today's service economy offer applications the ability to program using a transactional interface. Transactions are groupings of operations that take effect atomically: either all operations take effect or none do. They simplify program development as they allow developers to group related operations into one single atomic unit. For performance, modern datastores allow multiple transactions to execute concurrently. Isolation then defines a contract that regulates the interaction between these concurrent transactions. Indeed, isolation is important also in many machine learning algorithms that iteratively transform some global state, such as model parameters or variable assignment. When these updates are structured as transactions, they can be executed concurrently to achieve greater scalability, relying on isolation to maintain the semantics and theoretical properties of the original serial algorithm. But what guarantees should isolation offer? And how expensive is it to enforce them? In my first seminar, I will discuss the fascinating history of our community's attempts at formalizing isolation. You'll meet giants like Jim Gray and Barbara Liskov, Turing award winners who wrestled with this challenge, and you'll see what you think about our recent attempt to venture where such giants have trod.

Knowledge Discovery from Graphs and Networks
21/09/2021
In this talk, I will present some recent research directions that I have been exploring. One common denominator is the notion of graph or network. I'll start by describing my activities in the field of knowledge graphs (KG), graphs organized as a set of triples of the form (subject, predicate, object), where the predicate denotes some semantic relationship between the subject and the object (e.g., Stanley Kubrick, director, A Clockwork Orange). I'll discuss why existing approaches to learning low-level representations (or embeddings) for subject/object and predicates are sub-optimal when it comes to learning representations of triples as a whole; I'll show how to transform a KG into its triple-centric version by taking into account the semantics edges. Hence, I will describe two triple embedding learning architectures useful for downstream tasks such as triple verification; one based on biased random walks and the other based on graph neural networks. Next, I will discuss how to improve any existing semantic-oblivion embedding approach based on random walks by superimposing an abstract notion of neighborhood, based on an arbitrary node similarity measure. Finally, in the landscape of networks, I will describe ongoing research activities on a new topic called community deception, which is about how to hide a community (set of nodes) from social network analysis tools. I'll discuss some techniques based on carefully selected edge updates and their extension to attributed networks.

Abstractions and Their Compilers
16/09/2021
An abstraction in computer science is a data model plus a "programming language"; the language is often far simpler than a general-purpose programming language. We shall consider four different ways that abstractions have been used. Especially important are "declarative abstractions," where you say what you want done but not how to do it. These abstractions require clever compilation, including some powerful optimization techniques, if they are to be used in practice. We shall talk about three such declarative abstractions: regular expressions, and their compilation into finite automata, context-free grammars and their compilation into shift-reduce parsers, and the relational model of data, and its compilation into executable code.

From Computational Argumentation to Explanation
09/06/2021
Computational argumentation is a well-established field in (mostly symbolic) AI focusing on defining argumentation frameworks comprising sets of arguments and dialectical relations between them (e.g. of attack and, in addition or instead, of support), as well as so-called semantics (e.g. amounting to definitions of dialectically acceptable sets of arguments or of dialectical strength of arguments, satisfying desirable dialectical properties such as that supports against an argument should strengthen it). In this talk I will overview our recent efforts towards deploying computational argumentation to obtain and deliver to users explanations of different formats for a variety of systems, including data-driven classifiers. I will also argue that explainable AI (XAI) , which has witnessed unprecedented growth in AI in recent years, can be ideally supported by computational argumentation models whose dialectical nature matches well some basic desirable features of explanatory activities.

Geometric Deep Learning: from Euclid to drug design
27/04/2021
For nearly two millennia, the word "geometry" was synonymous with Euclidean geometry, as no other types of geometry existed. Euclid's monopoly came to an end in the 19th century, where multiple examples of non-Euclidean geometries were shown. However, these studies quickly diverged into disparate fields, with mathematicians debating the relations between different geometries and what defines one. A way out of this pickle was shown by Felix Klein in his Erlangen Programme, which proposed approaching geometry as the study of invariants or symmetries using the language of group theory. In the 20th century, these ideas have been fundamental in developing modern physics, culminating in the Standard Model. The current state of deep learning somewhat resembles the situation in the field of geometry in the 19h century: On the one hand, in the past decade, deep learning has brought a revolution in data science and made possible many tasks previously thought to be beyond reach -- including computer vision, playing Go, or protein folding. At the same time, we have a zoo of neural network architectures for various kinds of data, but few unifying principles. As in times past, it is difficult to understand the relations between different methods, inevitably resulting in the reinvention and re-branding of the same concepts. Geometric Deep Learning aims to bring geometric unification to deep learning in the spirit of the Erlangen Programme. Such an endeavour serves a dual purpose: it provides a common mathematical framework to study the most successful neural network architectures, such as CNNs, RNNs, GNNs, and Transformers, and gives a constructive procedure to incorporate prior knowledge into neural networks and build future architectures in a principled way. In this talk, I will overview the mathematical principles underlying Geometric Deep Learning on grids, graphs, and manifolds, and show some of the exciting and groundbreaking applications of these methods in the domains of computer vision, social science, biology, and drug design.

From Zero Knowledge to Private Transactions
24/02/2021
The integrity of digital currencies such as Bitcoin rests on the fact that every payment is broadcast in plaintext. This fails to provide any meaningful privacy guarantees for users. In this talk I will explain how a beautiful cryptographic tool, zero knowledge proofs, can be used to design digital currencies that provide strong privacy guarantees for users. These ideas have been deployed in the wild, as part of the cryptocurrency Zcash and in several other settings.

2020

Landscape and Training Dynamics of DNNs: lessons from physics-inspired methods
13/02/2020
Despite their formidable success in recent years, a fundamental understanding of deep neural networks (DNNs) is still lacking. Open questions include the origin of the slowness of the training dynamics, and the relationship between the dimensionality of parameter space and number of training examples, since DNNs empirically generalize very well even when over-parametrized. A popular way to address these issues is to study the topology of the cost function (the loss landscape) and the properties of the algorithm used for training (usually stochastic gradient descent, SGD). Here, we use methods and results coming from the physics of disordered systems, in particular glasses and sphere packings. On one hand, we are able to understand to what extent DNNs resemble widely studied physical systems. On the other hand, we use this knowledge to identify properties of the learning dynamics and of the landscape. In particular, through the study of time correlation functions in weight space, we argue that the slow dynamics is not due to barrier crossing, but rather to an increasingly large number of null-gradient directions, and we show that, at the end of learning, the system is diffusing at the bottom of the landscape. We also find that DNNs exhibit a phase transition between over- and under-parametrized regimes, where perfect fitting can or cannot be achieved. We show that in this overparametrized phase there cannot be spurious local minima. In the vicinity of this transition, properties of the curvature of the loss function minima are critical. This kind of knowledge can be used both as a basis for a more grounded understanding of DNNs and for hands-on requirements such as hyperparameter optimization and model selection.

Fences and RMRs Required for Synchronization
24/02/2020
Compiler optimizations that execute memory accesses out of (program) order often lead to incorrect execution of concurrent programs. These re-orderings are prohibited by inserting costly fence (memory barrier) instructions. The inherent Fence Complexity is a good estimate of an algorithm's time complexity, as is its RMR complexity: the number of Remote Memory References the algorithm must issue. When write instructions are executed in order, as in the Total Store Order (TSO) model, it is possible to implement a lock (and other objects) using only one RAW fence and an optimal O(n log n) RMR complexity. However, when store instructions may be re-ordered, as in the Partial Store Order (PSO) model, we prove that there is an inherent tradeoff between fence and RMR complexities. The proof relies on an interesting encoding argument.

Network archeology: on revealing the past of random trees
10/02/2020
Networks are often naturally modeled by random processes in which nodes of the network are added one-by-one, according to some random rule. Uniform and preferential attachment trees are among the simplest examples of such dynamically growing networks. The statistical problems we address in this talk regard discovering the past of the network when a present-day snapshot is observed. Such problems are sometimes termed "network archeology". We present a few results that show that, even in gigantic networks, a lot of information is preserved from the very early days. Gabor Lugosi is an ICREA research professor at the Department of Economics, Pompeu Fabra University, Barcelona. He graduated in electrical engineering at the Technical University of Budapest in 1987, and received his Ph.D. from the Hungarian Academy of Sciences in 1991. His research main interests include the theory of machine learning, combinatorial statistics, inequalities in probability, random graphs and random structures, and information theory.

Model Based Design of Safety and Mission Critical Systems
19/02/2020
Many software based control systems are indeed safety or mission critical systems. Examples are: aerospace, transportation, medical devices, financial systems. In this talk I will outline my research activity as for model based synthesis and verification of software based control systems and show how the general purpose algorithms and tools developed have been used in specific application domains.

Safe and Efficient Exploration in Reinforcement Learning
05/02/2020
At the heart of Reinforcement Learning lies the challenge of trading exploration -- collecting data for identifying better models -- and exploitation -- using the estimate to make decisions. In simulated environments (e.g., games), exploration is primarily a computational concern. In real-world settings, exploration is costly, and a potentially dangerous proposition, as it requires experimenting with actions that have unknown consequences. In this talk, I will present our work towards rigorously reasoning about safety of exploration in reinforcement learning. I will discuss a model-free approach, where we seek to optimize an unknown reward function subject to unknown constraints. Both reward and constraints are revealed through noisy experiments, and safety requires that no infeasible action is chosen at any point. I will also discuss model-based approaches, where we learn about system dynamics through exploration, yet need to verify safety of the estimated policy. Our approaches use Bayesian inference over the objective, constraints and dynamics, and -- under some regularity conditions -- are guaranteed to be both safe and complete, i.e., converge to a natural notion of reachable optimum. I will also present recent results harnessing the model uncertainty for improving efficiency of exploration, and show experiments on safely and efficiently tuning cyber-physical systems in a data-driven manner. Andreas Krause is a Professor of Computer Science at ETH Zurich, where he leads the Learning & Adaptive Systems Group. He also serves as Academic Co-Director of the Swiss Data Science Center. Before that he was an Assistant Professor of Computer Science at Caltech. He received his Ph.D. in Computer Science from Carnegie Mellon University (2008) and his Diplom in Computer Science and Mathematics from the Technical University of Munich, Germany (2004). He is a Microsoft Research Faculty Fellow and a Kavli Frontiers Fellow of the US National Academy of Sciences. He received ERC Starting Investigator and ERC Consolidator grants, the Deutscher Mustererkennungspreis, an NSF CAREER award, the Okawa Foundation Research Grant recognizing top young researchers in telecommunications as well as the ETH Golden Owl teaching award. His research on machine learning and adaptive systems has received awards at several premier conferences and journals, including the ACM SIGKDD Test of Time award 2019. Andreas Krause served as Program Co-Chair for ICML 2018, and is regularly serving as Area Chair or Senior Program Committee member for ICML, NeurIPS, AAAI and IJCAI, and as Action Editor for the Journal of Machine Learning Research.

Automatic Synthesis of Controllers for Cyber-Physical Systems
28/01/2020
A Cyber-Physical System (sometimes also called hybrid system) is composed of physical subsystems and software subsystems. Many Cyber-Physical Systems are indeed control systems: the software part is designed to control the physical part, so that some desired behavior is achieved. Applications of such Cyber-Physical Control Systems are ubiquitous: smart grids, electrical engineering, aerospace, automotive, biology, and so on. Recently, many methodologies have been presented on automatically synthesizing controllers for Cyber-Physical Systems. Such methodologies take as input a description of the physical part (plant) of a Cyber-Physical Control System, a set of requirements for the software part (controller), a set of desired behaviors for the closed-loop system (controller + plant), and output the actual software for the controller, which is guaranteed to meet all given specifications. In this talk, I will present a selection of such methodologies, mainly focusing on my own contributions.

Adaptive Communication for Battery-Free IoT devices
28/01/2020
With the ever-growing usage of batteries in the IoT era, the need for more eco-friendly technologies is clear. RF-powered computing enables the redesign of personal computing devices in a battery-less manner. While there has been substantial work on the underlying methods for RF-powered computing, practical applications of this technology has largely been limited to scenarios that involve simple tasks. This talk discusses how RFID technology, typically used to implement object identification and counting, can be exploited to realize a battery-free smart home. In particular, this talk considers the coexistence of several battery-free devices, with different transmission requirements - periodic, event-based, and real-time - and proposes a new approach to dynamically collect information from devices without requiring any a priori knowledge of the environment.

Learning to rank results optimally in search and recommendation engines
17/01/2020
Consider the scenario where an algorithm is given a context, and then it must select a slate of relevant results to display. As four special cases, the context may be a search query, a slot for an advertisement, a social media user, or an opportunity to show recommendations. We want to compare many alternative ranking functions that select results in different ways. However, A/B testing with traffic from real users is expensive. This research provides a method to use traffic that was exposed to a past ranking function to obtain an unbiased estimate of the utility of a hypothetical new ranking function. The method is a purely offline computation, and relies on assumptions that are quite reasonable. We show further how to design a ranking function that is the best possible, given the same assumptions. Learning optimal rankings for search results given queries is a special case. Experimental findings on data logged by a real-world e-commerce web site are positive.

Corso di 24 ore di "Scrittura tecnico-scientifica" (course in italian language only)
28/01/2020

2019

Harnessing the Power of (Knowledge) Graphs
09/01/2019
The Web, traditionally viewed as a vast repository of documents, is being transformed into a huge database by the massive presence of structured data. New research challenges arise in this information space due to the intrinsic decentralized data creation and management, the lack of superimposed schema, and the availability of huge volumes of data covering diverse domains. In this talk, I will give an overview of my research activities in the areas of graph querying and explanation languages, knowledge discovery from graphs, and social networks. I will also outline some ongoing research activities centered around the marriage between machine learning and knowledge representation and reasoning.

Limits and Structure of Learnability and Provability
01/02/2019
I will describe my two main areas of research: Learning in the Limit and Reverse Mathematics. Learning in the limit is one of the main computational models of learnability. A learner is modeled by a computational device which inductively generates hypotheses about an input language and stabilizes in the limit on a correct guess. Contrary to other models of learning, this model allows to decide questions of the following type: Is it the case that some learning constraint or learning strategy is necessary for learning some language class? I will discuss the case of so-called "U-shaped learning", a prominent and as-of-yet not well-understood feature of human learning in many contexts. The study of the effective (or computable) content and relative strength of theorems is one of the main areas of recent research in Computable Mathematics and Reverse Mathematics. I will outline a framework in which the following questions can be addressed: Given two theorems, is one stronger than the other or are they equivalent? Is it the case that one theorem is reducible to the other by a computable reduction? Given a problem, what is the complexity of its solutions to computable instances? I will discuss the case of Hindman's Finite Sums Theorem, which is the subject of a number of long-standing open problems in the area.

Power Efficient Machine Learning in Silicon
20/02/2019
Power efficiency is a key aspect in computer systems both in high-end servers and in portable devices. The focus of this talk is to discuss how the numerical formats of data impacts the energy efficiency of the computation, and how to trade-off accuracy with power savings. Machine Learning and Deep Learning are used as case studies to present the key ideas and the benefits derived by a flexible format to represent numerical data.

Round Compression for Parallel Matching Algorithms
20/02/2019
For over a decade now we have been witnessing the success of massive parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context is though: can we leverage this additional power to obtain even faster parallel algorithms?A prominent example here is the fundamental graph problem of finding maximum matching. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in O(log n) rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. showed that if each machine has n^{1+Ω(1)} memory, this problem can also be solved 2-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow up work, seem though to get stuck in a fundamental way at roughly O(log n) rounds once we enter the near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power. In this talk, we finally show how to refute that perplexing possibility. That is, we break the above O(log n) round complexity bound even in the case of slightly sublinear memory per machine. This is a joint work with Jakub Łącki, Aleksander Mądry, Slobodan Mitrović, Krzysztof Onak, and Piotr Sankowski, which appeared at STOC'2018.

HOW BLOCKCHAINS WORK AND THEIR IMPLICATIONS ON CYBERSECURITY
04/03/2019
This talk will be divided in two parts. The first part will explain the main properties of blockchain applications. During the second part, we will show how blockchain properties can have an impact on security and show some examples of past vulnerabilities suffered by blockchain-based systems.

Dynamic graph algorithms and complexity (a survey)
08/03/2019
In this talk I will attempt to answer the following questions I have been asked quite often: What are the current challenges in dynamic graph algorithms? What are good starting points for people who want to try working in this field? The talk will focus on challenges for basic graph problems (e.g. connectivity, shortest paths, maximum matching), and will survey some existing upper and lower bound results and techniques.

THE PROTECTION OF SPACE MISSIONS: THREATS THAT ARE SPECIFIC TO CYBER SECURITY
18/03/2019
Space-based systems play an important role in our daily life and business. The trend is likely to rely on the use of space based systems in a growing number of services or applications that can be either safety-of-life critical or business and mission-critical. The security measures implemented in space-based systems may turn out to be insufficient to guarantee the information assurance properties, in particular confidentiality (if required by the data policy), availability and integrity of these services/applications. The various and possible cyber-attacks on space segments, ground stations and its control segments are meanwhile well known and experienced in many cases. We address the Cybersecurity specific aspects of space missions, the specific threats to space mission from cyberspace, and analyze the set of all the possible countermeasures.

Strong coresets for k-median and subspace clustering: Goodbye dimension
10/04/2019
I will start my talk with an introduction to the area of coresets for clustering problems, give basic definitions and explain how coresets can be used to obtain distributed and streaming algorithms. During my talk, I will mostly focus on a commonly used coreset definition that has been introduced by Har-Peled and Mazumdar: A coreset is a weighted set S of points such that for all sets C of k centers we have (1-eps) cost(P,C) <= cost(S,C) <= (1+eps) cost(P,C), where cost(P,C) is the sum of distances of the points in P to their closest center in C. Then I will present a new algorithm to compute coresets that have a similar for-all guarantee as in the above definition and that consist of a number of points that is independent of the size of the input point set P and the dimension of the input space d. Joint work with David Woodruff.

Stochastic Incremental Algorithms for Optimal Transport with SON Regularizer
18/04/2019
Optimal Transport (OT) is a classic area in probability and statistics for transferring mass from one probability distribution to another. Recently OT has been very successfully used for domain adaptation in many applications in computer vision, texture analysis, tomographic reconstruction and clustering. We introduce a new regularizer OT which is tailored to better preserve the class structure. We give the first theoretical guarantees for an OT scheme that respects class structure. We give an accelerated proximal--projection scheme for this formulation with the proximal operator in closed form to give a highly scalable algorithm for computing optimal transport plans. Our experiments show that the new regularizer preserves class structure better and is more robust compared to previous regularizers.

Distributed algorithms for hybrid networks
09/05/2019
I will introduce a new communication model, called hybrid network, in which the nodes have the choice between two communication modes: a local mode that allows them to exchange messages with nearby nodes, and a global mode where communication between any pair of nodes is possible, but the amount of communication is limited. This can be motivated, for instance, by wireless networks in which we combine direct device-to-device communication with communication via the cellular infrastructure. I will show how to quickly build up a low-diameter, low-degree network of global edges (i.e., connections established via the global communication mode) on top of any network of local edges (i.e., connections given by the local communication mode) so that various problems such as computing minimum spanning trees and shortest paths can be solved much more efficiently than by just using the local edges. This is joint work with various colleagues including John Augustine, Fabian Kuhn, and Mohsen Ghaffari.

Identifying Relatives (Almost) Privately
22/05/2019
The privacy of personal genomic data, and how much access to it is released to commercial and non commercial entities, is an important concern. In this work, we study a specific aspect of this problem: How to determine familial, relatives relations among a large group of individuals, while keeping genomic DNA data private (or almost private). There are several commercial companies that provide such service, albeit in completely non private manner: The individual customers supply their entire genomic DNA information to these companies. We describe the moderator model, where customers keep their DNA information locally, and a central moderator matches pairs of persons who are likely to be related. These persons then engage in a secure two party computation, which checks more accurately for familial relations, and does not leak any additional information. The entire protocol leaks only a bounded amount of information to the moderator or to any other party.

Turning Data into Knowledge
30/05/2019
In this talk, I will be presenting a selection of the most relevant research projects I have worked on during my past experience, both in academia and industry at Yahoo Research. Despite their differences, all those projects have the same common goal, which is to extract profitable knowledge from very large datasets using cutting-edge data mining and machine learning techniques. I will therefore show the validity of the solutions proposed by measuring their impact on key performance indicators of interest.

Multi-Covert Channel Exfiltration Techniques
14/06/2019
Millions of Dollars are lost yearly due to data exfiltration. However, in these days with the advanced defensive methods that are followed by network administrators classical exfiltration techniques are rendered useless. This talk presents two innovative exfiltration techniques that rely on Covert Channels and seems to be able to circumvent the major network administrators defences.

Better Guarantees for k-Means Clustering
25/06/2019
For quite some time the best approximation algorithm for k-means were a simple local search heuristic yielding an approximation guarantee of 9, a ratio that is known to be tight with respect to such methods. In this talk, we present an improved guarantee of 6.357 via a new primal-dual approach that allows us to exploit the geometric structure of k-means. Finally, we also consider the semi-supervised setting. Given an oracle that returns whether two given points belong to the same cluster in a fixed optimal clustering, we should that few queries are sufficient to recover a clustering that is arbitrarily close to the optimal one. This is based on joint works with Sara Ahmadian, Buddhima Gamlath, Ashkan Norouzi-Fard, and Justin Ward.

Spectral Graph Theory
20/06/2019

Affordable Security or Big Guy vs Small Guy. Does the depth of your pockets impact your protocols?
24/06/2019
When we design a security protocol we assume that the humans (or organizations) playing Alice and Bob do not make a difference. In particular, their financial capacity seems to be irrelevant. In the latest trend to guarantee that secure multi-party computation protocols are fair and not vulnerable to malicious aborts, a slate of protocols has been proposed based on penalty mechanisms. We look at two well-known penalty mechanisms, and show that the so-called see-saw mechanism (Kumaresan et al., CCS 15), is only fit for people with deep pockets, well beyond the stake in the multi-party computation itself. Depending on the scheme, fairness is not affordable by everyone which has several policy implications on protocol design. To explicitly capture the above issues, we introduce a new property called financial fairness.

Adversarial Program Synthesis: Malicious JavaScripts and Cross-site Scripting
09/07/2019
Uses of machine learning in security typically focus on building classifiers to determine if certain objects (behaviors, programs, network flows, etc.) are malicious, i.e., representative of an attack. Despite early successes, the rise of adversarial machine learning has called into question the robustness of many such approaches. In adversarial machine learning, an attacker learns how to generate a malicious object in such a way that it gets classified as benign by a target classifier. My talk will discuss adversarial machine learning in the domain of JavaScript exploits. I will review existing classifiers for the detection of JavaScript exploits, and outline techniques to obfuscate exploits automatically so that they no longer get identified. Our work shows that, in order to achieve JavaScript exploit detectors that are robust to adversarial samples, much still needs to be done.

A Topological Perspective on Distributed Network Computing
05/07/2019
This talk will survey the tight connections between distributed computing and combinatorial topology. Indeed, combinatorial topology has been identified as a universal framework in which many different models for distributed computing can be expressed, and using combinatorial topology has been shown to be quite useful for analyzing the complexity of problems in these models. The talk will also present some recent results about the application of combinatorial topology in the framework of distributed network computing.

Model-Based Autonomic Security Management for Distributed Systems
10/07/2019
The continuous increase in the quantity and sophistication of cyber-attacks is making it more difficult and error-prone for the system administrators to handle the alerts generated by Intrusion Detection Systems (IDSs). To deal with this problem, several Intrusion Response Systems (IRSs) have been proposed lately. IRSs extend the IDSs by providing an automatic response to the detected attack. Such a response is usually selected either with a static attack-response mapping or by quantitatively evaluating all the available responses, given a set of pre-defined criteria. In this presentation, we introduce a probabilistic model-based IRS built on the Markov Decision Process (MDP) framework. In contrast with most existing approaches to intrusion response, the proposed IRS effectively captures the dynamics of both the defended system and the attacker and is able to compose atomic response actions to plan optimal multi-objective long-term response policies to protect the system. We evaluate the effectiveness of the proposed IRS by showing that long-term response planning always outperforms short-term planning, and we conduct a thorough performance assessment to show that the proposed IRS can be adopted to protect large distributed systems at run-time.

The process and the blockchain: A story of miners
19/07/2019
A process describes the temporal evolution of a system. Capturing the rules that govern its control flow helps to understand the boundaries of its behaviour. That is the essence of the research field called process mining. A blockchain can be defined at large as an immutable distributed ledger on which transactions between peers are recorded. Transactions are cryptographically signed and are meant to transfer digital commodities between parties. The validation and publication of transactions is the job of the nodes called miners. Lately, the blockchains have undergone a paradigm shift from mere electronic cash systems to a universal platform endowed with internal programming languages, on top of which decentralised applications can be built. That has been the turning point enabling the execution of processes on blockchains. This talk revolves around recent advancements in research concerning the execution and mining of processes on the blockchain. The discourse will include a focus on the automated discovery of behavioural specifications from process data, and, in turn, on the extraction of process data from blockchains.

Computer Vision for Perception, Representation and Learning
11/03/2019
Computer vision and machine learning have largely contributed to understanding data and scenes and have achieved impressive results in detection and recognition of scenes and objects, mostly mature for industrial transfer. The research field is however still limited when it comes to modelling and learning from videos, to reasoning on objects, parts and scenes and to developing universal representations and memory which fit multiple tasks. I would present my research and interests in the field.

Online Scheduling via Learned Weights
18/12/2019
Online algorithms are a hallmark of worst case optimization under uncertainty. On the other hand, in practice, the input is often far from worst case, and has some predictable characteristics. A recent line of work has shown how to use machine learned predictions to circumvent strong lower bounds on competitive ratios in classic online problems such as ski rental and caching. We study how predictive techniques can be used to break through worst case barriers in online scheduling. The makespan minimization problem with restricted assignments is a classic problem in online scheduling theory. Worst case analysis of this problem gives Ω(log m) lower bounds on the competitive ratio in the online setting. We identify a robust quantity that can be predicted and then used to guide online algorithms to achieve better performance. Our predictions are compact in size, having dimension linear in the number of machines, and can be learned using standard off the shelf methods. The performance guarantees of our algorithms depend on the accuracy of the predictions, given predictions with error η, we show how to construct O(log η) competitive fractional assignments. We then give an online algorithm that rounds any fractional assignment into an integral schedule. Our algorithm is O((log log m)^3)-competitive and we give a nearly matching Ω(log log m) lower bound for online rounding algorithms. Altogether, we give algorithms that, equipped with predictions with error η, achieve O(log η (log log m)^3) competitive ratios, breaking the Ω(log m) lower bound even for moderately accurate predictions. Joint work with Thomas Lavastida, Benjamin Moseley and Sergei Vassilvitskii

Max Cut is Approximated by Subexponential-Size Linear Programs
02/12/2019
We show that, for every epsilon>0, there is a linear programming relaxation of the Max Cut problem with at most exp(n^epsilon) variables and constraints that achieves approximation ratio at least 1/2 + delta, for some delta(epsilon)>0. Specifically, this is achieved by linear programming relaxations in the Sherali-Adams hierarchy. Previously, the only sub-exponential time algorithms known to approximate Max Cut with an approximation ratio better than 1/2 were based on semidefinite programming or spectral methods. We also provide subexponential time approximation algorithms based on linear programming for Khot's Unique Games problem, which have a qualitatively similar performance to previously known subexponential time algorithms based on spectral methods and on semidefinite programming. Joint work with Sam Hopkins and Tselil Schramm.

Improving Social Bot Detection with an AML Approach
21/11/2019
Social bots are automated accounts often involved in unethical or illegal activities. Academia has shown how these accounts evolve over time, becoming increasingly smart at hiding their true nature by disguising themselves as genuine accounts. If they evade, bots hunters adapt their solutions to find them: the cat and mouse game. Inspired by adversarial machine learning and computer security, in this talk we'll see an adversarial and proactive approach to social bot detection. A practical example of the application of this approach will be presented introducing the Digital DNA framework, proposed to study groups' behaviors in social networks and for bot detection.

Algorithms, Theoretical Computer Science and Epigenomics: Mining Mathematical Laws for Predicting Chromatin Organization and Nucleosome Occupancy in Eukaryotic Genomes
30/10/2019
Epigenomics is the study of modifications on the genetic material of a cell that do not depend on changes in the DNA sequence, since those latter involve specific proteins around which DNA wraps. The end result is that epigenomic changes have a fundamental role in the proper working of each cell in Eukaryotic organisms. A particularly important part of Epigenomics concentrates on the study of chromatin, that is, a fiber composed of a DNA-protein complex and very characterizing of Eukaryotes. Understanding how chromatin is assembled and how it changes is fundamental for Biology. Starting from an open problem regarding nucleosomes and the so called 30 nm fiber, posed by R. Kornberg nearly 30 years ago, we show how the use of simple ideas from Theoretical Computer Science can surprisingly help in making progress in obtaining a mathematical characterization of nucleosome potioning in Eukaryotes, based only on knowledge of the DNA sequence. Such a characterization was conjectured not to exist. As a corollary, we obtain an algorithm for nucleosome occupancy prediction, optimal in time and guaranteeing a two orders of magnitudo improvement in time over Machine Learning counterparts. In addition, statistically sound data structures, i.e., Epigenomic Dictionaries, are also introduced in order to effectively mine epigenomic datasets for the discovery of further regularities. Again, those data structures are much more sensitive than the Machine Learning countderparts and they can profitably accomodate information from different sources. Joint Work with: Simona Ester Rombo, Dipartimento di Matematica ed Informatica, University of Palermo Italy; Filippo Utro, Computational Biology Center, IBM T.J. Watson Research, Yorktown Heights, NY, USA.

Artificial intelligence and model checking for simulation-based analysis of cyber-physical systems
23/09/2019
Cyber-Physical Systems, i.e., systems consisting of interconnected physical and software parts, are ubiquitous in application domains as diverse as aerospace, automotive, energy, biology, human physiology, and pharmacology, just to name a few. In this talk, I will review recent approaches to simulation-based design and formal verification of cyber-physical systems, with a particular emphasis on my own contributions to the field.

Is it the end of Word Sense Disambiguation as we know it?
12/09/2019
Word Sense Disambiguation (WSD) is a fundamental problem in Natural Language Processing and in particular in Natural Language Understanding. In this seminar, I will be discuss how recent advances in word and sense embedding have contributed to redefining the WSD task. In particular I will present a new framework (Tripodi and Navigli, 2019) for WSD, based on game-theoretic principles (Tripodi and Pelillo, 2017), that can be used with arbitrary word and sense representations and a new disambiguation technique (Loureiro and Jorge 2019; Hu, Li, and Liang 2019) that inspired the title of this talk.

Dino Distefano - Facebook Infer Static Analyser
15/05/2019
Infer is an open-source static analyser developed at Facebook. Originally based on Separation Logic, Infer has lately evolved from a specific tool for heap-manipulating programs to a general framework which facilitates the implementation of new static analyses. In this talk, I will report on the Infer team’s experience of applying our tool to Facebook mobile code, each day helping thousands of engineers to build more reliable and secure software. Moreover, I will discuss the team’s current effort to turn Infer into a static analysis platform for research and development useful both to academic researchers and industrial practitioners.

Challenges in the Blockchain Era
11/09/2019
Distributed Ledgers Technologies (DLTs) allow to create and maintain an immutable shared database (often called the blockchain) in a fully distributed environment. DLTs have recently emerged as a new tool enabling exciting applications in the setting of digital currencies and cryptographic protocols. In this talk, I will review some of these applications, highlighting the challenges in the adoption of DLTs within the financial sector, and with a particular emphasis on my own contributions to the field.

2018

Manuel Mazo Espinosa Jr. - Symbolic abstractions of control systems' timing behaviour
09/10/2018
With the advent of the Internet-of-things and Cyber-Physical Systems everywhere, there has arisen a renewed interest on the study of networked control systems (NCS). NCS are systems in which measurements from sensors are sent through a communications network to a control unit to compute corrective actions for the system being monitored, actions which are again relayed through a network to actuators. In the past decade a shift of perspective in the design and analysis of NCS has emerged in order to make the use of communications bandwidth as efficient as possible. The idea is to move away from the traditional approach employing pre-determined (usually periodic) update times, to designs in which the time instants at which the control loop is closed is determined by the sensors themselves. This has resulted in what has been named “event-based control” (EBC). Despite of the many promising results of EBC designs I will argue that the unpredictability of the communications traffic they generate is a critical bottleneck to exploit EBC’s potential benefits. To solve this critical problem, I will describe our recent results on the construction of abstractions (in the form of timed automata) capturing the communications traffic of EBC. I will then describe how such abstractions can help in the design of more efficient EBC systems and schedulers for them.