Thesis title: The Right to Explanation: Counterfactual Language in Algorithmic Recourse
While various post-hoc methods have been proposed for auditing and explaining black-box models, the practical implementation of the right to explanation in real-world scenarios has received less attention. This raises a crucial question: what does it mean to explain an AI output to laypeople?
In this direction, it is evident that individuals and society need not only a right to an explanation, but also a right to understand a relevant and easily comprehensible explanation. Moreover, to safeguard people's freedom to challenge the decisions of automated systems, the right to recourse is essential.
Counterfactuals, which are increasingly used to produce example-based post-hoc explanations, facilitate comprehension of black-box models by creating a counterfactual instance (for instance, in a binary model, one classified in the opposite class) in the closest possible scenario, thereby minimizing changes to the original instance. We propose that counterfactuals, borrowed from causality, can serve as the language to convey explanations, fostering both understanding and recourse for individuals.
In the first part of this work, we systematically formalize the problem of generating counterfactual instances that lead to meaningful counterfactual statements. We explore a range of counterfactual properties, such as counterfactual class and closeness, and further examine sparsity, minimality, and proximity, along with feasibility, plausibility, and actionability. We stress the importance of crafting explanations that are relevant to the user, for instance, to prevent overshadowing subtle aspects of the explanation. To this end, we address the challenge of identifying the optimal counterfactual instance by using a set of diverse logics underlying the generation methods, each underscoring different nuances in the explanation.
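To make these properties concrete, the following minimal sketch illustrates (it is not the generation method developed in this work) how a counterfactual search can trade off class validity, proximity, and sparsity. It assumes a binary classifier with a scikit-learn-style `predict` interface; the random perturbation strategy, thresholds, and weights are placeholders chosen only for illustration.

```python
import numpy as np

def counterfactual_search(model, x, n_samples=5000, sigma=0.5, sparsity_weight=0.1, seed=0):
    """Illustrative random-search sketch: look for an instance classified in the
    opposite class while trading off proximity (L2 distance) and sparsity
    (number of changed features) with respect to the original instance x."""
    rng = np.random.default_rng(seed)
    target = 1 - model.predict(x.reshape(1, -1))[0]           # opposite class of a binary model
    best, best_cost = None, np.inf
    for _ in range(n_samples):
        candidate = x + rng.normal(0.0, sigma, size=x.shape)  # perturb the original instance
        if model.predict(candidate.reshape(1, -1))[0] != target:
            continue                                          # class did not flip: not a counterfactual
        proximity = np.linalg.norm(candidate - x)             # closeness to the original
        sparsity = int(np.count_nonzero(np.abs(candidate - x) > 1e-3))  # changed features
        cost = proximity + sparsity_weight * sparsity
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost
```

A lower cost corresponds to a counterfactual statement of the form "had these few features been slightly different, the decision would have changed"; feasibility and actionability would require additional, domain-specific constraints not shown here.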
We focus on graph-type data to showcase different classes of methods that consider various units of change. Initially, we propose counterfactual graphs, defining the search problem and data-driven heuristics to generate optimal solutions, and measuring the error from the optimum in a ground-truth case. After considering the most fine-grained unit of modification for graphs, edges, we move on to establish a more comprehensive framework adaptable to multiple classes of structures or concepts for graph data. Using dense structures as the unit of change, we propose a density-based counterfactual search framework, which can be instantiated with different dense substructures, such as triangles, maximal cliques, or sub-regions of graphs based on taxonomies.
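As a simplified illustration of the density-based idea (not the framework itself), the sketch below uses triangles as the unit of change: instead of editing individual edges, it removes whole dense substructures until the prediction of an assumed graph-level classifier `classify` flips. NetworkX is used for graph handling; the exhaustive enumeration is purely illustrative.

```python
import itertools
import networkx as nx

def density_based_counterfactual(graph, classify, max_removals=3):
    """Illustrative sketch: search for a counterfactual graph by removing
    dense substructures (triangles) rather than single edges.
    `classify` is assumed to map a NetworkX graph to a class label."""
    original_class = classify(graph)
    # Candidate units of change: all triangles in the graph.
    triangles = [c for c in nx.enumerate_all_cliques(graph) if len(c) == 3]
    # Try removing combinations of up to `max_removals` triangles (brute force,
    # acceptable only for small graphs in this sketch).
    for k in range(1, max_removals + 1):
        for combo in itertools.combinations(triangles, k):
            candidate = graph.copy()
            for tri in combo:
                candidate.remove_edges_from(itertools.combinations(tri, 2))
            if classify(candidate) != original_class:
                return candidate   # fewest dense-structure changes that flip the class
    return None
```

The same skeleton applies when the unit of change is a maximal clique or a taxonomy-based sub-region: only the candidate-enumeration step changes.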
Considering the complexity of textual data and its heightened relevance in the era of Foundation Models, such as large language models, we analyze the characteristics of the text counterfactual search problem. Our focus is on developing methods to detoxify texts that exhibit sexist, gender-related, or discriminatory biases, aiming to identify them and offer alternative formulations as counterfactual texts. Generating "realistic" counterfactual texts, however, is not a straightforward task. Thus, our analysis concentrates on fundamental properties, such as minimality and sparsity, while also imposing additional constraints to ensure that the generated instances are fluent, grammatically correct, meaningful, and retain the original text's intent. We develop a method centered on the mask-and-infill paradigm, based on a novel relevance-based infilling strategy guided by a conditional masked language model's confidence in predicting subsequent tokens.
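The snippet below gives a rough feel for the mask-and-infill paradigm; it is a simplified sketch, not the relevance-based method described above. It assumes a Hugging Face fill-mask pipeline with a generic BERT model and a hypothetical list of flagged tokens, and it omits the relevance scoring, fluency checks, and intent-preservation constraints.

```python
from transformers import pipeline

# Illustrative setup: a masked language model proposes replacements for masked
# tokens, ranked by its own confidence. Model and token list are assumptions.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def infill_counterfactual(text, tokens_to_replace):
    """Replace each flagged token with the most confident infill that differs
    from the original word, keeping all other tokens fixed (sparsity)."""
    words = text.split()
    for i, word in enumerate(words):
        if word.lower() not in tokens_to_replace:
            continue
        masked = " ".join(words[:i] + [fill_mask.tokenizer.mask_token] + words[i + 1:])
        for candidate in fill_mask(masked):               # candidates sorted by model confidence
            if candidate["token_str"].strip().lower() != word.lower():
                words[i] = candidate["token_str"].strip()  # accept the first differing infill
                break
    return " ".join(words)
```

In the actual method, which tokens to mask and which infills to accept are driven by the relevance-based strategy and the conditional model's confidence, rather than by a fixed word list.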
Algorithmic recourse has become a conditio sine qua non for employing machine learning algorithms in decision-making scenarios. In our discussion, we introduce an environment to simulate one of the most prevalent settings of algorithmic recourse (e.g., credit scoring and student applications): one in which multiple agents compete over time for a limited resource. This environment enables us to analyze the reliability of recourse, aiming to prevent false expectations due to model drift or competition among agents with diverging characteristics.
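A toy version of such an environment is sketched below (our simplification, not the simulator developed in this work): agents compete for a fixed number of positive outcomes under a score threshold, rejected agents act on their recourse, and we track how often recourse that was valid when issued still succeeds later. All quantities and dynamics are placeholder assumptions.

```python
import numpy as np

def simulate_recourse_competition(n_agents=100, n_steps=10, capacity=10, effort=0.05, seed=0):
    """Toy sketch of a multi-agent recourse environment with a limited resource:
    only `capacity` agents are accepted per step, so the effective threshold
    drifts as rejected agents improve their scores."""
    rng = np.random.default_rng(seed)
    scores = rng.uniform(0.0, 1.0, n_agents)
    issued_threshold = np.sort(scores)[-capacity]      # threshold when recourse was issued
    kept, broken = 0, 0
    for _ in range(n_steps):
        threshold = np.sort(scores)[-capacity]         # competition: only the top `capacity` pass
        accepted = scores >= threshold
        # Agents who satisfied the originally issued threshold but are still rejected
        # experience a "false expectation" (broken recourse promise).
        broken += int(np.sum((scores >= issued_threshold) & ~accepted))
        kept += int(np.sum((scores >= issued_threshold) & accepted))
        scores[~accepted] += effort                    # rejected agents act on their recourse
    return kept / max(kept + broken, 1)                # reliability of the issued recourse
```

Even in this toy setting, reliability degrades as competing agents raise the effective acceptance threshold, which is exactly the failure mode the environment is designed to expose.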
Moving from the global to the individual, traditional algorithmic recourse techniques often overlook individual user preferences when adjusting features. Therefore, we define a framework for generating personalized algorithmic recourse with a human in the loop. Our approach frames recourse generation as a multi-objective optimization problem, combining conventional constraints with user preferences; to this end, we define a mathematical framework to represent and estimate user preferences over the complex space of counterfactual feature changes. Furthermore, we define Personal Validity as a measure of the effectiveness of recourse for individual users.
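As an illustration only, the sketch below encodes one possible reading of preference-weighted effort and Personal Validity; the weights, budget, and validity criterion are assumptions made for this example, not the formalization given in the thesis.

```python
import numpy as np

def preference_weighted_cost(x, x_cf, weights):
    """Effort of a feature change as perceived by the user: each feature's
    change is scaled by a user-specific preference weight (higher = harder)."""
    return float(np.sum(weights * np.abs(x_cf - x)))

def personal_validity(model, x, x_cf, weights, budget):
    """Assumed illustrative criterion: a recourse is personally valid if it
    flips the model's decision AND its preference-weighted effort stays
    within the user's budget."""
    flips = model.predict(x_cf.reshape(1, -1))[0] != model.predict(x.reshape(1, -1))[0]
    affordable = preference_weighted_cost(x, x_cf, weights) <= budget
    return bool(flips and affordable)
```

In the multi-objective view, the preference-weighted effort is one objective alongside conventional counterfactual constraints such as proximity and plausibility, and Personal Validity summarizes whether the resulting recourse is actually achievable for a specific user.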
In conclusion, future advancements ought to enhance Responsible AI by incorporating ethical, cultural, and legal elements, fostering interdisciplinary collaborations, and focusing on specific projects such as counterfactual recourse for social networks and explainability in large language models. Future work also includes using Information Theory to assess model interpretability, and it emphasizes the importance of public engagement in the development of AI policy and education.
Finally, we aspire for this work to highlight the importance of the right to explanation and the right to recourse through the use of counterfactual language.