Thesis title: Interpretable Machine Learning: Leveraging SVMs to Construct Optimal Decision Trees
In recent years, with the widespread use of machine learning (ML) methods in real-world applications, there has been growing attention to interpretable ML models that can provide explanatory insights into their decision-making process. Thanks to their interpretability, decision trees are among the most widely used supervised ML tools for classification tasks. Owing to the remarkable advances in mixed-integer programming over the last thirty years, various approaches have been proposed to formulate the problem of training an Optimal Classification Tree (OCT) as a mixed-integer program.
As a first contribution, we present a novel mixed-integer quadratic programming (MIQP) formulation for the OCT problem, which exploits the generalization capabilities of Support Vector Machines (SVMs) for binary classification. The proposed model, denoted as Margin Optimal Classification Tree (MARGOT), defines the decision rules of the tree as maximum-margin multivariate hyperplanes, following the soft-margin SVM approach along the binary tree structure. MARGOT has been evaluated on benchmark datasets from the UCI repository against state-of-the-art OCT approaches. The MARGOT formulation turns out to be easier to solve than other OCT methods, and the resulting tree generalizes better to new observations. To enhance the interpretability of our method, we analyze two alternative versions of MARGOT that include feature selection techniques inducing sparsity in the hyperplanes' coefficients. Both interpretable versions effectively select the most relevant features while maintaining good prediction quality.
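To give a concrete intuition for the building block behind MARGOT, the sketch below (a simplified stand-in, not the thesis' MIQP formulation; all function names are illustrative) fits a single soft-margin linear hyperplane by subgradient descent on the regularized hinge loss and uses its sign as the branching rule of one tree node:

```python
# Illustrative stand-in (NOT the thesis' MIQP model): one branching node of a
# margin optimal tree, learned as a soft-margin linear SVM by subgradient
# descent on the objective 0.5*||w||^2 + C * sum(hinge_i).

def svm_split(X, y, C=1.0, lr=0.01, epochs=500):
    """Fit a hyperplane (w, b); labels y must be in {-1, +1}."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0  # gradient of the 0.5*||w||^2 term is w itself
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point violates the margin: hinge subgradient
                for j in range(d):
                    gw[j] -= C * yi * xi[j]
                gb -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def route(x, w, b):
    """Branching rule: -1 sends x to the left child, +1 to the right."""
    return -1 if sum(wj * xj for wj, xj in zip(w, x)) + b <= 0 else 1

# Toy separable data with labels in {-1, +1}
X = [[-2.0, -1.0], [-3.0, 0.0], [-2.5, 1.0],
     [2.0, 1.0], [3.0, 0.0], [2.5, -1.0]]
y = [-1, -1, -1, 1, 1, 1]
w, b = svm_split(X, y)
```

In MARGOT, such maximum-margin hyperplanes are not fitted greedily node by node as above, but determined jointly over the whole tree by the MIQP model.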
Along this research line, we propose a variant of MARGOT, referred to as the Hard Margin Optimal Classification Tree (MARGOThard), which employs margin-based hyperplane splits at the nodes of the tree, relying on either the hard or the soft SVM paradigm. Following the approach adopted for MARGOT, we developed a version with embedded feature selection that promotes local sparsity of the decision rules. To assess the main differences between the two margin optimal tree formulations, we provide a comprehensive set of results comparing MARGOT and MARGOThard, which show that both methods are valid approaches for learning robust classifiers. Further, for all the proposed optimization problems, we developed ad hoc heuristic algorithms that efficiently produce a feasible solution to be used as a warm start for the optimization procedure.
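As a rough illustration of how such warm-start heuristics can operate (the thesis' actual algorithms are tailored to each formulation and are not reproduced here; all names are hypothetical), a greedy top-down scheme solves a local split subproblem at each node and recurses, yielding a feasible tree that can be handed to the solver as an incumbent. In the sketch below a simple univariate threshold split stands in for the per-node subproblem:

```python
# Hypothetical warm-start sketch: grow a depth-limited tree greedily by solving
# a local split subproblem at each node (here a univariate threshold split,
# standing in for the per-node margin subproblem), producing a feasible tree.

def majority(labels):
    """Most frequent label (0 on an empty leaf, by convention)."""
    return max(set(labels), key=labels.count) if labels else 0

def best_threshold_split(X, y):
    """Pick the (feature, threshold) minimising misclassified points
    when each side of the split predicts its majority label."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [yi for x, yi in zip(X, y) if x[j] <= t]
            right = [yi for x, yi in zip(X, y) if x[j] > t]
            err = (sum(l != majority(left) for l in left)
                   + sum(r != majority(right) for r in right))
            if best is None or err < best[0]:
                best = (err, j, t)
    return best[1], best[2]

def grow(X, y, depth):
    """Return a nested (feature, threshold, left, right) tree or a leaf label."""
    if depth == 0 or len(set(y)) == 1:
        return majority(y)
    j, t = best_threshold_split(X, y)
    L = [(x, yi) for x, yi in zip(X, y) if x[j] <= t]
    R = [(x, yi) for x, yi in zip(X, y) if x[j] > t]
    if not L or not R:  # degenerate split: stop with a leaf
        return majority(y)
    return (j, t,
            grow([x for x, _ in L], [yi for _, yi in L], depth - 1),
            grow([x for x, _ in R], [yi for _, yi in R], depth - 1))

def predict(tree, x):
    """Route x down the tree until a leaf label is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree

# Toy 1-D data: a single split already separates the classes
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 1, 1, 1]
warm_start_tree = grow(X, y, depth=2)
```

The resulting feasible tree fixes an initial assignment of splits and leaf labels, which a MIP solver can accept as a starting incumbent to prune the search early.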
Lastly, motivated by the fact that the margin optimal tree models exhibit a particular decomposable structure, we lay the foundations for a future research line whose final aim is to develop a tailored decomposition algorithm that leverages the SVM nature of the proposed problems. Along this direction, we propose an initial version of a Benders-like method to handle the simpler case of margin optimal trees based on ℓ1-SVMs, which can be formulated as mixed-integer linear programs. This early-stage work offers insights into the design of a general decomposition procedure for the problem and paves the way for devising an ad hoc algorithm for the MIQP-based versions of margin optimal trees.
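For context, recall the standard ℓ1-regularized soft-margin SVM, which explains why this case admits a mixed-integer *linear* formulation:

```latex
\min_{w,\,b,\,\xi}\ \|w\|_1 + C \sum_{i=1}^{n} \xi_i
\qquad \text{s.t.}\quad y_i \left( w^\top x_i + b \right) \ge 1 - \xi_i,
\quad \xi_i \ge 0, \quad i = 1, \dots, n.
```

With the usual split $w = w^+ - w^-$, $w^+, w^- \ge 0$, so that $\|w\|_1 = \sum_j (w_j^+ + w_j^-)$, the objective and constraints are all linear; adding binary routing variables for the tree structure therefore yields a mixed-integer linear program, whereas the ℓ2-margin term of the original formulations leads to an MIQP.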