FRANCESCA CONSOLE

PhD (Doctor of Research)

Cycle: XXXVI

Thesis title: Application of Language Models on Code Analysis

Code analysis is key to improving software quality and efficiency, and it is even more important for securing code against potential cyber-attacks. However, manual code analysis, especially of binary code, is complicated and error-prone. The investigation of new automatic techniques for code analysis is therefore a research topic of great interest. According to the "naturalness hypothesis", code exhibits statistical properties similar to those of natural language. As a consequence, techniques developed for natural language processing can also be applied to the analysis of source and binary code, and recent research has applied neural language models to code analysis with significant results. In line with this research trend, the two contributions of the thesis focus on applying deep learning to the analysis of code written in high-level and low-level programming languages. The first contribution introduces a benchmark designed to evaluate models for binary code representation; the benchmark can be used to test and compare the performance of these models on various tasks involving binary functions. The second contribution focuses instead on the application of neural networks to source code analysis. It investigates the use of neural language models to detect code smells, which are poor design choices that can degrade code quality.

Publications

11573/1685762 - 2023 - BinBench: a benchmark for x64 portable operating system interface binary function representations
Console, F.; D'aquanno, G.; Di Luna, G. A.; Querzoni, L. - 01a Journal article
Journal: PeerJ Computer Science (San Francisco, CA: PeerJ Inc.) - ISSN: 2376-5992 - WoS: WOS:001009615200002 (0) - Scopus: 2-s2.0-85162166082 (0)

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma