Understanding the testing culture of machine learning projects on Github.

Undergraduate thesis, article format (Trabalho de Conclusão de Curso - Artigo), Curso de Bacharelado em Ciência da Computação, Centro de Engenharia Elétrica e Informática (CEEI), Campus Campina Grande.

Author: SANTOS, Wesley Matteus Araújo dos (SANTOS, W. M. A.); Lattes CV: http://lattes.cnpq.br/3961233355703842

URI: http://dspace.sti.ufcg.edu.br:8080/jspui/handle/riufcg/29359

Date: 2023-02-14

Abstract:

In the last few years, the use of machine learning has spiked in several industries, showing its remarkable potential for solving both old and emerging problems at a scale never seen before. However, despite the efforts to produce new and improved models, as well as more reliable training methodologies, little is known about how this software is being tested. In this paper, we investigate the adoption of Python libraries for, or related to, automated testing in more than 290 machine learning repositories on GitHub. We also compare repositories that do and do not use these tools in terms of quality metrics, and study their code coverage. As a result, we identified 28 libraries used for testing-support purposes, and 65.19% of all projects adopted at least one of them. We also found that reinforcement learning and data analysis/visualization projects have the highest adoption of automated testing, and that unittest, pytest, and doctest are the most used libraries in our corpus. Furthermore, we found that half of the projects that use at least one testing library have fewer code smells (48.28% in median) and, on average, fewer vulnerabilities (71.42%).
