SANTOS, W. M. A.; http://lattes.cnpq.br/3961233355703842; SANTOS, Wesley Matteus Araújo dos.
Abstract:
In the last few years, the use of machine learning has spiked in several
industries, showing its remarkable potential for solving both
old and emergent problems on a scale never seen before. However,
despite the efforts to produce new and improved models, as well
as more reliable training methodologies, little is known about how
this software is being tested. In this paper, we investigate the
adoption of Python libraries for, or related to, automated testing
in more than 290 machine learning repositories on GitHub. We
also compare repositories that do and do not use those tools, in
terms of quality metrics, and study their code coverage. As a result,
we identified 28 libraries used to support testing, and
65.19% of all projects adopted at least one of them. We also found
that reinforcement learning and data analysis/visualization projects
have the highest adoption rates of automated testing, and that unittest,
pytest and doctest are the most used libraries in our corpus. Furthermore,
we found that half of the projects that use at least one
testing library have fewer code smells (48.28% in median) and, on
average, fewer vulnerabilities (71.42%).
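
As an illustration only, the adoption measurement described above could be approximated by scanning each repository's Python sources for imports of known testing libraries. The sketch below is not the study's actual pipeline: the function names, the input format (a mapping from repository name to a list of source strings) and the three-library set are assumptions made for the example. The abstract names only unittest, pytest and doctest out of the 28 libraries identified.

```python
import ast
from typing import Dict, List, Set

# Assumed subset for illustration; the study identifies 28 libraries,
# of which only these three are named in the abstract.
TESTING_LIBRARIES = {"unittest", "pytest", "doctest"}


def find_testing_imports(source: str) -> Set[str]:
    """Return the testing libraries imported by a Python source string."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b" counts as an import of top-level package "a"
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # absolute "from a.b import c" counts as an import of "a"
            names = [node.module.split(".")[0]]
        else:
            continue
        found.update(name for name in names if name in TESTING_LIBRARIES)
    return found


def adoption_rate(repos: Dict[str, List[str]]) -> float:
    """Percentage of repositories with at least one testing-library import."""
    if not repos:
        return 0.0
    adopters = sum(
        1
        for sources in repos.values()
        if any(find_testing_imports(src) for src in sources)
    )
    return 100.0 * adopters / len(repos)
```

Parsing imports with `ast` rather than matching text avoids counting library names that appear only in comments or strings. It still misses dynamic imports and libraries declared only in configuration files.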