BARBOSA, I. C.; http://lattes.cnpq.br/2396932829533767; BARBOSA, Iann Carvalho.
Abstract:
In the context of software development, bug reports (BRs) are fundamental for identifying
and describing flaws that impact the quality and stability of the final product. The growing
volume of BRs in large software projects makes manual identification of similar BRs
a time-consuming and error-prone task, leading to reduced efficiency in the development
process. To improve resource allocation and expedite the resolution of recurring problems,
we examined machine learning techniques for automatically identifying similar BRs. We used
the T5 (Text-to-Text Transfer Transformer) model, the TF-IDF (Term Frequency-Inverse
Document Frequency) method, and a hybrid approach that combines the two, leveraging the
effectiveness of T5 in Semantic Textual Similarity (STS) tasks and the versatility of
TF-IDF in lexical analyses to enhance
the identification of similar BRs. The pipeline is divided into data retrieval, preprocessing,
vectorization, normalization, neural network training, and evaluation of obtained results. We
evaluated the performance of 56 models, applying various modeling strategies. The
analysis shows that using complete vectors as features is more effective than using cosine
distance. The proposed hybrid approach shows promising results, often outperforming
the individual approaches. We also fine-tuned 14 promising models,
testing 168 hyperparameter combinations, with the Adam and RMSprop optimizers showing the
best performance. The contributions of this work include an evaluation of T5 and TF-IDF
performance in the context of BRs, the conception and validation of a hybrid approach, and
the exploration of various modeling strategies. The research offers suggestions for future
implementations that could improve the efficiency and effectiveness of development and facilitate
resource allocation. The findings on T5 performance and the effectiveness of the
hybrid approach can inform future research and applications in recommendation systems for bug
management and software development, and highlight the importance of continuously improving such systems.
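
The two feature strategies compared in the abstract (complete vectors versus cosine distance) and the idea of a hybrid representation can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the whitespace tokenizer, the smoothed IDF formula, the function names, and in particular the use of simple concatenation to build the hybrid vector. The abstract does not say how the T5 and TF-IDF representations were actually combined, and the semantic vector below is a placeholder rather than real T5 output.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one dense TF-IDF vector per document over a shared, sorted vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF keeps weights positive even for terms found in every document.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        length = max(len(toks), 1)  # guard against empty documents
        vectors.append([tf[t] / length * idf[t] for t in vocab])
    return vectors


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_features(vec_a, vec_b, mode="full"):
    """Build classifier input for a pair of bug reports.

    mode="cosine": a single scalar feature (the cosine similarity).
    mode="full":   both complete vectors, concatenated.
    """
    if mode == "cosine":
        return [cosine_similarity(vec_a, vec_b)]
    return list(vec_a) + list(vec_b)


def hybrid_vector(lexical_vec, semantic_vec):
    # Assumption: the hybrid representation is a plain concatenation of the
    # lexical (TF-IDF) vector and a semantic (e.g. T5-derived) vector.
    return list(lexical_vec) + list(semantic_vec)


reports = [
    "app crashes when saving file",
    "crash on saving a file",
    "button color is wrong on login page",
]
vecs = tfidf_vectors(reports)
similar = cosine_similarity(vecs[0], vecs[1])
unrelated = cosine_similarity(vecs[0], vecs[2])
```

In this toy example the first two reports share the terms "saving" and "file", so `similar` is greater than `unrelated`. With `mode="full"` a downstream classifier sees every vector component and can learn which dimensions matter, whereas `mode="cosine"` compresses each pair to one number; this is one plausible reading of why complete vectors might carry more signal.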