http://lattes.cnpq.br/4071050262331837; GADELHA, Guilherme Monteiro.
Abstract:
Automatic traceability recovery between software artifacts can improve the software development process by helping detect issues early in the life cycle. Approaches that apply Information Retrieval (IR) or Machine Learning (ML) techniques to textual data have been proposed, but those techniques differ considerably in their input parameters and results. Their benefits and drawbacks are difficult to assess when the techniques are applied in isolation, usually in small and medium-sized software projects. An overview would also be more comprehensive if a promising Deep Learning (DL) based technique were compared with traditional IR techniques. We propose an approach to recover traceability links between textual software artifacts, in particular bug reports and test cases, which can be instantiated with a set of IR and DL techniques. To apply and evaluate our solution, we used historical data from the Mozilla Firefox quality assurance (QA) team, on which we assessed the following IR techniques: Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA) and Best Match 25 (BM25). We also applied the approach with a DL technique called Word Vector. Since there are no trace matrices that directly link bug reports and test cases, we used system features as intermediate artifacts. In the context of traceability from bug reports to test cases, three of the four studied techniques performed poorly. Only the LSI technique presented satisfactory effectiveness, even outperforming the state-of-the-art BM25 technique. The Word Vector technique presented the lowest effectiveness in our study. The results show that the LSI technique, set up with an appropriate combination of thresholds to define whether a candidate trace is positive or not, is feasible for real-world, large software projects using a semi-automated traceability recovery process in which human analysts are aided by an appropriate software tool.
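As a rough illustration of how a combination of thresholds can decide whether a candidate trace is positive, the sketch below ranks test cases against each bug report and keeps only those that pass two cut-offs. It is a minimal sketch under stated assumptions, not the approach evaluated in this work: plain term-frequency cosine similarity stands in for LSI, the two thresholds (`top_n` and `min_sim`) are an assumed combination, and every name and sample text is hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption: no stemming or stop-word removal).
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two term-frequency vectors stored as Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recover_traces(bug_reports, test_cases, top_n=1, min_sim=0.3):
    # A candidate trace is positive only if the test case is among the
    # top_n most similar ones AND its similarity reaches min_sim.
    # Both thresholds are hypothetical parameters chosen for illustration.
    tc_vecs = {tc_id: Counter(tokenize(t)) for tc_id, t in test_cases.items()}
    traces = {}
    for br_id, text in bug_reports.items():
        br_vec = Counter(tokenize(text))
        ranked = sorted(
            ((cosine(br_vec, vec), tc_id) for tc_id, vec in tc_vecs.items()),
            reverse=True,
        )
        traces[br_id] = [tc_id for sim, tc_id in ranked[:top_n] if sim >= min_sim]
    return traces


# Hypothetical sample data, not taken from the Mozilla Firefox dataset.
bug_reports = {
    "BR1": "download panel does not open",
    "BR2": "zoom indicator missing in toolbar",
}
test_cases = {
    "TC1": "open the download panel and check files",
    "TC2": "zoom in and verify the zoom indicator",
    "TC3": "restore previous session tabs",
}

if __name__ == "__main__":
    print(recover_traces(bug_reports, test_cases))
```

Raising `min_sim` or lowering `top_n` trades recall for precision, which is why the two cut-offs would need to be tuned together; in a semi-automated process the surviving candidates would still be reviewed by an analyst.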