Harnessing the Power of (Knowledge) Graphs
09/01/2019
The Web, traditionally viewed as a vast repository of documents, is being transformed into a huge database by the massive presence of structured data. New research challenges arise in this information space due to the intrinsic decentralized data creation and management, the lack of superimposed schema, and the availability of huge volumes of data covering diverse domains. In this talk, I will give an overview of my research activities in the areas of graph querying and explanation languages, knowledge discovery from graphs, and social networks. I will also outline some ongoing research activities centered around the marriage between machine learning and knowledge representation and reasoning.
|
Limits and Structure of Learnability and Provability
01/02/2019
I will describe my two main areas of research: Learning in the Limit and Reverse Mathematics.
Learning in the limit is one of the main computational models of learnability. A learner is modeled by a computational device which inductively generates hypotheses about an input language and stabilizes in the limit on a correct guess. Contrary to other models of learning, this model allows to decide questions of the following type: Is it the case that some learning constraint or learning strategy is necessary for learning some language class? I will discuss the case of so-called "U-shaped learning", a prominent and as-of-yet not well-understood feature of human learning in many contexts.
The study of the effective (or computable) content and relative strength of theorems is one of the main areas of recent research in Computable Mathematics and Reverse Mathematics. I will outline a framework in which the following questions can be addressed: Given two theorems, is one stronger than the other or are they equivalent? Is it the case that one theorem is reducible to the other by a computable reduction? Given a problem, what is the complexity of its solutions to computable instances? I will discuss the case of Hindman's Finite Sums Theorem, which is the subject of a number of long-standing open problems in the area.
|
Power Efficient Machine Learning in Silicon
20/02/2019
Power efficiency is a key aspect in computer systems both in high-end servers and in portable devices.
The focus of this talk is to discuss how the numerical formats of data impacts the energy efficiency of the computation, and how to trade-off accuracy with power savings.
Machine Learning and Deep Learning are used as case studies to present the key ideas and the benefits derived by a flexible format to represent numerical data.
|
Round Compression for Parallel Matching Algorithms
20/02/2019
For over a decade now we have been witnessing the success of massive parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context is though: can we leverage this additional power to obtain even faster parallel algorithms?A prominent example here is the fundamental graph problem of finding maximum matching. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in O(log n) rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. showed that if each machine has n^{1+Ω(1)} memory, this problem can also be solved 2-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow up work, seem though to get stuck in a fundamental way at roughly O(log n) rounds once we enter the near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power.
In this talk, we finally show how to refute that perplexing possibility. That is, we break the above O(log n) round complexity bound even in the case of slightly sublinear memory per machine.
This is a joint work with Jakub Łącki, Aleksander Mądry, Slobodan Mitrović, Krzysztof Onak, and Piotr Sankowski, which appeared at STOC'2018.
|
HOW BLOCKCHAINS WORK AND THEIR IMPLICATIONS ON CYBERSECURITY
04/03/2019
This talk will be divided in two parts. The first part will explain the main properties of blockchain applications. During the second part, we will show how blockchain properties can have an impact on security and show some examples of past vulnerabilities suffered by blockchain-based systems.
|
Dynamic graph algorithms and complexity (a survey)
08/03/2019
In this talk I will attempt to answer the following questions I have been asked quite often: What are the current challenges in dynamic graph algorithms? What are good starting points for people who want to try working in this field? The talk will focus on challenges for basic graph problems (e.g. connectivity, shortest paths, maximum matching), and will survey some existing upper and lower bound results and techniques.
|
THE PROTECTION OF SPACE MISSIONS: THREATS THAT ARE SPECIFIC TO CYBER SECURITY
18/03/2019
Space-based systems play an important role in our daily life and business. The trend is likely to rely on the use of space based systems in a growing number of services or applications that can be either safety-of-life critical or business and mission-critical. The security measures implemented in space-based systems may turn out to be insufficient to guarantee the information assurance properties, in particular confidentiality (if required by the data policy), availability and integrity of these services/applications. The various and possible cyber-attacks on space segments, ground stations and its control segments are meanwhile well known and experienced in many cases. We address the Cybersecurity specific aspects of space missions, the specific threats to space mission from cyberspace, and analyze the set of all the possible countermeasures.
|
Strong coresets for k-median and subspace clustering: Goodbye dimension
10/04/2019
I will start my talk with an introduction to the area of coresets for clustering problems, give basic definitions and explain how coresets can be used to obtain distributed and streaming algorithms. During my talk, I will mostly focus on a commonly used coreset definition that has been introduced by Har-Peled and Mazumdar: A coreset is a weighted set S of points such that for all sets C of k centers we have (1-eps) cost(P,C) <= cost(S,C) <= (1+eps) cost(P,C), where cost(P,C) is the sum of distances of the points in P to their closest center in C. Then I will present a new algorithm to compute coresets that have a similar for-all guarantee as in the above definition and that consist of a number of points that is independent of the size of the input point set P and the dimension of the input space d. Joint work with David Woodruff.
|
Stochastic Incremental Algorithms for Optimal Transport with SON Regularizer
18/04/2019
Optimal Transport (OT) is a classic area in probability and statistics for transferring mass from one probability distribution to another. Recently OT has been very successfully used for domain adaptation in many applications in computer vision, texture analysis, tomographic reconstruction and clustering. We introduce a new regularizer OT which is tailored to better preserve the class structure. We give the first theoretical guarantees for an OT scheme that respects class structure. We give an accelerated proximal--projection scheme for this formulation with the proximal operator in closed form to give a highly scalable algorithm for computing optimal transport plans. Our experiments show that the new regularizer preserves class structure better and is more robust compared to previous regularizers.
|
Distributed algorithms for hybrid networks
09/05/2019
I will introduce a new communication model, called hybrid network, in which the nodes have the choice between two communication modes: a local mode that allows them to exchange messages with nearby nodes, and a global mode where communication between any pair of nodes is possible, but the amount of communication is limited. This can be motivated, for instance, by wireless networks in which we combine direct device-to-device communication with communication via the cellular infrastructure. I will show how to quickly build up a low-diameter, low-degree network of global edges (i.e., connections established via the global communication mode) on top of any network of local edges (i.e., connections given by the local communication mode) so that various problems such as computing minimum spanning trees and shortest paths can be solved much more efficiently than by just using the local edges.
This is joint work with various colleagues including John Augustine, Fabian Kuhn, and Mohsen Ghaffari.
|
Identifying Relatives (Almost) Privately
22/05/2019
The privacy of personal genomic data, and how much access to it is released to commercial and non commercial entities, is an important
concern. In this work, we study a specific aspect of this problem: How to determine familial, relatives relations among a large group of individuals, while keeping genomic DNA data private (or almost private). There are several commercial companies that provide such service, albeit in completely non private manner: The individual customers supply their entire genomic DNA information to these companies. We describe the moderator model, where customers keep their DNA information locally, and a central moderator matches pairs of persons who are likely to be related. These persons then engage in a secure two party computation, which checks more accurately for familial relations, and does not leak any additional information. The entire protocol leaks only a bounded amount of information to the moderator or to any other party.
|
Turning Data into Knowledge
30/05/2019
In this talk, I will be presenting a selection of the most relevant research projects I have worked on during my past experience, both in academia and industry at Yahoo Research. Despite their differences, all those projects have the same common goal, which is to extract profitable knowledge from very large datasets using cutting-edge data mining and machine learning techniques. I will therefore show the validity of the solutions proposed by measuring their impact on key performance indicators of interest.
|
Multi-Covert Channel Exfiltration Techniques
14/06/2019
Millions of Dollars are lost yearly due to data exfiltration. However, in these days with the advanced defensive methods that are followed by network administrators classical exfiltration techniques are rendered useless. This talk presents two innovative exfiltration techniques that rely on Covert Channels and seems to be able to circumvent the major network administrators defences.
|
Better Guarantees for k-Means Clustering
25/06/2019
For quite some time the best approximation algorithm for k-means were a simple local search heuristic yielding an approximation guarantee of 9, a ratio that is known to be tight with respect to such methods.
In this talk, we present an improved guarantee of 6.357 via a new primal-dual approach that allows us to exploit the geometric structure of k-means.
Finally, we also consider the semi-supervised setting. Given an oracle that returns whether two given points belong to the same cluster in a fixed optimal clustering, we should that few queries are sufficient to recover a clustering that is arbitrarily close to the optimal one.
This is based on joint works with Sara Ahmadian, Buddhima Gamlath, Ashkan Norouzi-Fard, and Justin Ward.
|
Spectral Graph Theory
20/06/2019
|
Affordable Security or Big Guy vs Small Guy. Does the depth of your pockets impact your protocols?
24/06/2019
When we design a security protocol we assume that the humans (or organizations) playing Alice and Bob do not make a difference. In particular, their financial capacity seems to be irrelevant. In the latest trend to guarantee that secure multi-party computation protocols are fair and not vulnerable to malicious aborts, a slate of protocols has been proposed based on penalty mechanisms. We look at two well-known penalty mechanisms, and show that the so-called see-saw mechanism (Kumaresan et al., CCS 15), is only fit for people with deep pockets, well beyond the stake in the multi-party computation itself. Depending on the scheme, fairness is not affordable by everyone which has several policy implications on protocol design. To explicitly capture the above issues, we introduce a new property called financial fairness.
|
Adversarial Program Synthesis: Malicious JavaScripts and Cross-site Scripting
09/07/2019
Uses of machine learning in security typically focus on building classifiers to determine if certain objects (behaviors, programs, network flows, etc.) are malicious, i.e., representative of an attack. Despite early successes, the rise of adversarial machine learning has called into question the robustness of many such approaches. In adversarial machine learning, an attacker learns how to generate a malicious object in such a way that it gets classified as benign by a target classifier.
My talk will discuss adversarial machine learning in the domain of JavaScript exploits. I will review existing classifiers for the detection of JavaScript exploits, and outline techniques to obfuscate exploits automatically so that they no longer get identified. Our work shows that, in order to achieve JavaScript exploit detectors that are robust to adversarial samples, much still needs to be done.
|
A Topological Perspective on Distributed Network Computing
05/07/2019
This talk will survey the tight connections between distributed computing and combinatorial topology. Indeed, combinatorial topology has been identified as a universal framework in which many different models for distributed computing can be expressed, and using combinatorial topology has been shown to be quite useful for analyzing the complexity of problems in these models. The talk will also present some recent results about the application of combinatorial topology in the framework of distributed network computing.
|
Model-Based Autonomic Security Management for Distributed Systems
10/07/2019
The continuous increase in the quantity and sophistication of cyber-attacks is making it more difficult and error-prone for the system administrators to handle the alerts generated by Intrusion Detection Systems (IDSs). To deal with this problem, several Intrusion Response Systems (IRSs) have been proposed lately. IRSs extend the IDSs by providing an automatic response to the detected attack. Such a response is usually selected either with a static attack-response mapping or by quantitatively evaluating all the available responses, given a set of pre-defined criteria. In this presentation, we introduce a probabilistic model-based IRS built on the Markov Decision Process
(MDP) framework. In contrast with most existing approaches to intrusion response, the proposed IRS effectively captures the dynamics of both the defended system and the attacker and is able to compose atomic response actions to plan optimal multi-objective long-term response policies to protect the system. We evaluate the effectiveness of the proposed IRS by showing that long-term response planning always outperforms short-term planning, and we conduct a thorough performance assessment to show that the proposed IRS can be adopted to protect large distributed systems at run-time.
|
The process and the blockchain: A story of miners
19/07/2019
A process describes the temporal evolution of a system. Capturing the rules that govern its control flow helps to understand the boundaries of its behaviour. That is the essence of the research field called process mining.
A blockchain can be defined at large as an immutable distributed ledger on which transactions between peers are recorded. Transactions are cryptographically signed and are meant to transfer digital commodities between parties. The validation and publication of transactions is the job of the nodes called miners.
Lately, the blockchains have undergone a paradigm shift from mere electronic cash systems to a universal platform endowed with internal programming languages, on top of which decentralised applications can be built. That has been the turning point enabling the execution of processes on blockchains.
This talk revolves around recent advancements in research concerning the execution and mining of processes on the blockchain. The discourse will include a focus on the automated discovery of behavioural specifications from process data, and, in turn, on the extraction of process data from blockchains.
|
Computer Vision for Perception, Representation and Learning
11/03/2019
Computer vision and machine learning have largely contributed to
understanding data and scenes and have achieved impressive results in
detection and recognition of scenes and objects, mostly mature for
industrial transfer. The research field is however still limited when
it comes to modelling and learning from videos, to reasoning on
objects, parts and scenes and to developing universal representations
and memory which fit multiple tasks. I would present my research and
interests in the field.
|
Online Scheduling via Learned Weights
18/12/2019
Online algorithms are a hallmark of worst case optimization under uncertainty. On the other hand, in practice, the input is often far from worst case, and has some predictable characteristics. A recent line of work has shown how to use machine learned predictions to circumvent strong lower bounds on competitive ratios in classic online problems such as ski rental and caching. We study how predictive techniques can be used to break through worst case barriers in online scheduling.
The makespan minimization problem with restricted assignments is a classic problem in online scheduling theory. Worst case analysis of this problem gives Ω(log m) lower bounds on the competitive ratio in the online setting. We identify a robust quantity that can be predicted and then used to guide online algorithms to achieve better performance. Our predictions are compact in size, having dimension linear in the number of machines, and can be learned using standard off the shelf methods. The performance guarantees of our algorithms depend on the accuracy of the predictions, given predictions with error η, we show how to construct O(log η) competitive fractional assignments.
We then give an online algorithm that rounds any fractional assignment into an integral schedule. Our algorithm is O((log log m)^3)-competitive and we give a nearly matching Ω(log log m) lower bound for online rounding algorithms. Altogether, we give algorithms that, equipped with predictions with error η, achieve O(log η (log log m)^3) competitive ratios, breaking the Ω(log m) lower bound even for moderately accurate predictions.
Joint work with Thomas Lavastida, Benjamin Moseley and Sergei Vassilvitskii
|
Max Cut is Approximated by Subexponential-Size Linear Programs
02/12/2019
We show that, for every epsilon>0, there is a linear programming relaxation of the Max Cut problem with at most exp(n^epsilon) variables and constraints that achieves approximation ratio at least 1/2 + delta, for some delta(epsilon)>0. Specifically, this is achieved by linear programming relaxations in the Sherali-Adams hierarchy. Previously, the only sub-exponential time algorithms known to approximate Max Cut with an approximation ratio better than 1/2 were based on semidefinite programming or spectral methods. We also provide subexponential time approximation algorithms based on linear programming for Khot's Unique Games problem, which have a qualitatively similar performance to previously known subexponential time algorithms based on spectral methods and on semidefinite programming. Joint work with Sam Hopkins and Tselil Schramm.
|
Improving Social Bot Detection with an AML Approach
21/11/2019
Social bots are automated accounts often involved in unethical or illegal activities. Academia has shown how these accounts evolve over time, becoming increasingly smart at hiding
their true nature by disguising themselves as genuine accounts. If they evade, bots hunters adapt their solutions to find them: the cat and mouse game. Inspired by adversarial machine learning and computer security, in this talk we'll see an adversarial and proactive approach to social bot detection. A practical example of the application of this approach will be presented introducing the Digital DNA framework, proposed to study groups' behaviors in social networks and for bot detection.
|
Algorithms, Theoretical Computer Science and Epigenomics: Mining Mathematical Laws for Predicting Chromatin Organization and Nucleosome Occupancy in Eukaryotic Genomes
30/10/2019
Epigenomics is the study of modifications on the genetic material of a cell that do not depend on changes in the DNA sequence, since those latter involve specific proteins around which DNA wraps. The end result is that epigenomic changes have a fundamental role in the proper working of each cell in Eukaryotic organisms. A particularly important part of Epigenomics concentrates on the study of chromatin, that is, a fiber composed of a DNA-protein complex and very characterizing of Eukaryotes. Understanding how chromatin is assembled and how it changes is fundamental for Biology. Starting from an open problem regarding nucleosomes and the so called 30 nm fiber, posed by R. Kornberg nearly 30 years ago, we show how the use of simple ideas from Theoretical Computer Science can surprisingly help in making progress in obtaining a mathematical characterization of nucleosome potioning in Eukaryotes, based only on knowledge of the DNA sequence. Such a characterization was conjectured not to exist. As a corollary, we obtain an algorithm for nucleosome occupancy prediction, optimal in time and guaranteeing a two orders of magnitudo improvement in time over Machine Learning counterparts. In addition, statistically sound data structures, i.e., Epigenomic Dictionaries, are also introduced in order to effectively mine epigenomic datasets for the discovery of further regularities. Again, those data structures are much more sensitive than the Machine Learning countderparts and they can profitably accomodate information from different sources.
Joint Work with: Simona Ester Rombo, Dipartimento di Matematica ed Informatica, University of Palermo Italy; Filippo Utro, Computational Biology Center, IBM T.J. Watson Research, Yorktown Heights, NY, USA.
|
Artificial intelligence and model checking for simulation-based analysis of cyber-physical systems
23/09/2019
Cyber-Physical Systems, i.e., systems consisting of interconnected physical and software parts, are ubiquitous in application domains as diverse as aerospace, automotive, energy, biology, human physiology, and pharmacology, just to name a few.
In this talk, I will review recent approaches to simulation-based design and formal verification of cyber-physical systems, with a particular emphasis on my own contributions to the field.
|
Is it the end of Word Sense Disambiguation as we know it?
12/09/2019
Word Sense Disambiguation (WSD) is a fundamental problem in Natural Language Processing and in particular in Natural Language Understanding. In this seminar, I will be discuss how recent advances in word and sense embedding have contributed to redefining the WSD task. In particular I will present a new framework (Tripodi and Navigli, 2019) for WSD, based on game-theoretic principles (Tripodi and Pelillo, 2017), that can be used with arbitrary word and sense representations and a new disambiguation technique (Loureiro and Jorge 2019; Hu, Li, and Liang 2019) that inspired the title of this talk.
|
Dino Distefano - Facebook Infer Static Analyser
15/05/2019
Infer is an open-source static analyser developed at Facebook. Originally based on Separation Logic, Infer has lately evolved from a specific tool for heap-manipulating programs to a general framework which facilitates the implementation of new static analyses. In this talk, I will report on the Infer team’s experience of applying our tool to Facebook mobile code, each day helping thousands of engineers to build more reliable and secure software. Moreover, I will discuss the team’s current effort to turn Infer into a static analysis platform for research and development useful both to academic researchers and industrial practitioners.
|
Challenges in the Blockchain Era
11/09/2019
Distributed Ledgers Technologies (DLTs) allow to create and maintain an immutable shared database (often called the blockchain) in a fully distributed environment.
DLTs have recently emerged as a new tool enabling exciting applications in the setting of digital currencies and cryptographic protocols.
In this talk, I will review some of these applications, highlighting the challenges in the adoption of DLTs within the financial sector, and with a particular emphasis on my own contributions to the field.
|