Thesis title: Quantity and Quality in A.I. Inferred Knowledge: Symbolic and NeuroSymbolic Perspectives
The growing interest in Artificial Intelligence (AI) has led to important advances
that significantly improve the capabilities of machines in formal reasoning and in
automatic learning from data. This progress is fueling the expectation that AI
could eventually achieve human-level intelligence and be deployed in high-stakes,
safety-critical domains, effectively making people's lives easier.
This Thesis argues that, before this vision can be realized, it is of paramount
importance to recognize and address a hidden and largely unexplored tension between
two complementary, yet sometimes conflicting, requirements that AI is expected
to meet: providing abundant, non-trivial information while making sure that this
information is trustworthy. Historically, logic-based AI has generally succeeded in
producing high-quality knowledge, while AI based on statistical learning has shown
impressive results in many fields with respect to the quantity of knowledge it can infer. Recently,
the integration of these two approaches, known as Neuro-Symbolic (NeSy) AI, has
emerged as a promising direction to address this tension. Yet it still struggles to
fully reconcile the two requirements, because the knowledge it provides is of
insufficient quality. We argue that reconciling this dichotomy is crucial for the future of
AI, especially for its deployment in multidisciplinary, safety-critical
domains, where both aspects are non-negotiable.
In this work, we consider both pure logic-based and NeSy AI in two of their
most representative and widely studied applications: question answering over knowledge
bases and computer vision. We show how to improve each of them where it
currently falls short on the quantity-vs-quality spectrum. To this end, we address
one main challenge for each approach: incomplete information for logic-based AI and
reasoning shortcuts for NeSy AI. For logical
query answering, we provide formal tools to quantify the amount of knowledge that
AI systems convey and lose. Our framework sheds light on
the information loss of current logic-based AI systems; moreover, leveraging these
insights, we define novel techniques that enhance the ability of AI
to return a larger quantity of knowledge, and we study their computational properties.
Conversely, for NeSy computer vision models, we design a representation learning
method that enhances the semantic quality of the concepts a model learns from
data. We show that our method is sound and guided by formal logic, and we then
apply it in practice to achieve improved performance in a challenging, high-stakes
autonomous driving scenario with imbalanced data.