Thesis title: Application of Language Models on Code Analysis
Code analysis is a key topic for improving software quality and efficiency. This analysis
becomes even more important for securing code against potential cyber-attacks.
However, manual analysis of code, especially binary code, is complicated and
error-prone. Therefore, the investigation of new automatic techniques for code analysis is
a research topic of great interest. According to the "naturalness hypothesis", code
exhibits statistical properties similar to those of natural languages. As a consequence,
techniques used for natural language processing can also be applied to analyze source and
binary code. For this reason, recent research applies neural language models to code
analysis, achieving significant results.
In line with this research trend, the two contributions of the thesis focus on
the application of deep learning to the analysis of code written in high-level and
low-level programming languages.
The first contribution of the thesis introduces a benchmark designed to evaluate models
for binary code representation. The benchmark can be used to test and compare the
performance of these models on various tasks involving binary functions.
The second contribution, on the other hand, focuses on the application of neural
networks to source code analysis. It investigates the use of neural language models
for detecting code smells, which represent poor design choices that can degrade
code quality.