ALMEIDA, J. V. S.; http://lattes.cnpq.br/0668664022330187; ALMEIDA, João Victor Soares de.
Abstract:
Code review is a common and essential practice in open source software development, intended to ensure source code quality and to detect implementation issues. However, this manual practice can become costly and error-prone, especially in large, collaborative projects. In this context, we investigate how the Large Language Model Meta AI (LLaMA-2 13B) can contribute specifically to the review of code smells, seeking to
understand its capabilities and limitations within the development cycle. Our investigation was based
on data extracted from well-established open source projects such as Neovim, Keycloak, and
gRPC. Starting from 19,149 comments distributed across 6,365 Pull Requests, we applied
a hybrid approach consisting of systematic keyword filtering followed by manual analysis of
the comments, which resulted in a dataset of 3,023 comments focused on code smells. After developing a
specific prompt to guide the model’s reviews, we selected a stratified sample of 637 comments
(21.10% of the dataset) for detailed evaluation. The results revealed that 91.73% of the model’s
reviews showed low similarity to human reviews. Our qualitative analysis found that
in 72% of interventions the model diverged from the focus of human reviewers, although it provided
technically comprehensive analyses in 48.3% of cases. These findings suggest that, while LLaMA-2 13B is capable of producing relevant analyses, its context limitations result in reviews that
frequently diverge from the focus of human reviewers. Finally, we conclude that the model can be
more effective when used as a complement to human review rather than as a substitute for it.
Keywords: Code review; code smells; LLaMA-2 13B; Pull Requests; systematic analysis;
prompt.