Thesis title: Symbolic Debugging of Optimized Code: Measuring, Testing, Tuning and Enhancing Debug Information Quality
Compiler optimizations often disrupt the correspondence between source code and generated binaries, undermining the effectiveness of symbolic debugging, which relies on debug information to encode this mapping. Preserving source-level constructs under aggressive transformations remains a formidable challenge: optimizations that restructure control flow, inline functions, or reschedule instructions frequently lead to missing or misleading metadata, causing debuggers to display incomplete or inaccurate program states. While recent work has exposed numerous bugs in compiler toolchains and proposed validation methodologies to detect inconsistencies in debug symbols, the research community still lacks a quantitative and systematic understanding of how much source-level information is preserved after optimization and how much of it could be recovered or retained through more comprehensive compiler testing and analysis.
This thesis addresses this gap through a set of complementary contributions that make the debugging quality of optimized code both measurable and improvable. First, it introduces a hybrid methodology to quantify debug information quality, combining dynamic traces and static analysis to accurately measure the completeness of preserved lines and variables. Second, it proposes a conjecture-based approach to detect completeness bugs in compiler toolchains (cases where compilers silently omit variables from the debug state) and reports 38 previously unknown issues in GCC and Clang, 24 of which were confirmed and fixed by developers. Third, it presents DebugTuner, a framework that tunes compiler optimizations to preserve debuggability by identifying individual passes that degrade debug information and synthesizing debug-friendly configurations with minimal performance loss. This work also led to the introduction of the -opt-disable flag in LLVM, now available to the community. Finally, the thesis introduces Reparo, a binary-level analysis tool that enhances existing debug information by reconstructing accurate variable lifetimes that reflect the computations in binary code. The same analysis methodology is further extended to API monitoring for malware analysis, where it improves completeness and correctness, demonstrating that the approach applies across domains.
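As a rough illustration of what a completeness measurement could look like, the sketch below computes the fraction of source lines and variable observations from an unoptimized build that remain visible in an optimized one. The ratio-against-baseline definition, the function name `completeness`, and all sample data are assumptions made for illustration; the thesis's actual metric, which combines dynamic traces with static analysis, may be defined differently.

```python
def completeness(baseline, optimized):
    """Fraction of baseline items still present in the optimized build.

    Assumed definition for illustration only, not the thesis's metric.
    """
    baseline = set(baseline)
    if not baseline:
        return 1.0  # nothing to preserve, so nothing was lost
    return len(baseline & set(optimized)) / len(baseline)


# Hypothetical data: source lines a debugger can stop at, and
# (line, variable) pairs whose value the debugger can show at that line.
lines_O0 = {10, 11, 12, 13, 14}
lines_O2 = {10, 12, 14}
vars_O0 = {(10, "i"), (11, "i"), (12, "i"), (12, "sum"), (14, "sum")}
vars_O2 = {(10, "i"), (12, "sum"), (14, "sum")}

line_completeness = completeness(lines_O0, lines_O2)
var_completeness = completeness(vars_O0, vars_O2)
print(f"lines: {line_completeness:.2f}, variables: {var_completeness:.2f}")
```

Using the unoptimized build as the reference is a design choice of this sketch: it treats everything visible without optimization as the ideal debugging experience, which may overestimate what an optimized build could ever preserve.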
Together, these contributions transform symbolic debugging from a best-effort, tool-specific capability into a measurable, improvable property of modern compilation. They provide a foundation for developing compilers and analysis tools that optimize not only for performance but also for debuggability.