RIBEIRO, C. H. G.; http://lattes.cnpq.br/5968277195460296; RIBEIRO, Carlos Henrique Gonçalves.
Abstract:
Background: Code smells are patterns in source code that deviate from established design principles. During code review, developers have the opportunity to identify and correct these smells, thereby improving the overall quality of the codebase. Examining the discussions that take place within code reviews can reveal valuable insights into how code smells are discussed. Aim: To enable future research to better understand developers' behavior regarding code smells, we set out to build a dataset of code-smell-related discussions. In practice, we want to classify review comments into two categories: code smell comments and non-code-smell comments. Method: To do so, we conducted an experiment that used semantic search as a classification technique. The training data was scraped from three popular open-source GitHub repositories and consisted of over 100,000 entries. Results: We automatically classified 4,058 review comments as code-smell related. Despite employing a novel technique and working with limited resources, we achieved a precision of 0.41 on the classification task.
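As a rough illustration of the idea of using semantic search as a classifier, the sketch below labels a review comment as code-smell related when it is similar enough to any of a few seed queries, and then computes precision over a labeled sample. It is not the pipeline used in this work: the seed queries, the threshold, and all function names are hypothetical, and plain token-count vectors stand in for the sentence-embedding model that a real semantic search would use.

```python
import math
import re
from collections import Counter

# Hypothetical seed queries describing code smells (not taken from the study).
SMELL_QUERIES = [
    "this method is too long and should be split",
    "duplicated code, please extract a helper",
]


def embed(text):
    """Stand-in embedding: lowercase token counts.

    Assumption: a real setup would call a sentence-embedding model here.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_smell_comment(comment, threshold=0.4):
    """Label a comment as smell related if its best query match passes the threshold.

    The threshold value is an arbitrary choice for this sketch.
    """
    vec = embed(comment)
    return max(cosine(vec, embed(q)) for q in SMELL_QUERIES) >= threshold


def precision(predicted, actual):
    """Precision = TP / (TP + FP) over parallel lists of booleans."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    return tp / (tp + fp) if (tp + fp) else 0.0
```

Read this way, a precision of 0.41 means that roughly 41 out of every 100 comments flagged as code-smell related are confirmed as such on manual inspection.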