Investigação sobre uso de vocabulário de código-fonte para identificação de especialistas (Investigation into the use of source-code vocabulary for identifying experts).

SANTOS, Katyusco de Farias; http://lattes.cnpq.br/1246085373474860

URI: http://dspace.sti.ufcg.edu.br:8080/jspui/handle/riufcg/606

Date: 2015-02-28

Abstract:

Identifiers and comments in source code make up the software vocabulary. Research points to vocabularies as a valuable source of information about a project. To understand them, we developed a tool that extracts them from source code. Exploring the data statistically, we identified two vocabulary properties: vocabulary size, which is a power function of LOC (lines of code), and the repetition of vocabulary terms, which fits a log-normal distribution. Vocabularies, along with their properties and operations, were formalized based on the concept of multisets. The extraction tool and the formalization made scientific cooperation possible on the use of vocabulary in maintenance activities. This accumulated knowledge showed that vocabulary had not been explored as an input to code knowledge. We therefore developed an approach for identifying code experts in which knowledge is defined by the similarity between the vocabularies of entities and developers. We compared its precision and recall with two baseline approaches: one based on commits and one based on the percentage of modified LOC. The results show that, when indicating a single specialist (top-1), our approach has lower precision than the baseline approaches, by between 10% and 29.9%. When indicating more than one specialist developer, up to top-3, our approach is more accurate than the baselines by up to 18.7%. We also found that the knowledge defined by similarity, when combined with an authorship model, improves the ability to identify experts, measured by the model's R², by more than 4 points. We conclude that vocabulary alone can be used to characterize expertise and thus to identify experts. In addition, vocabulary can be an additional component of models based on authorship and ownership, since it captures aspects different from those already present in these models.
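The idea of treating vocabularies as multisets and ranking developers by vocabulary similarity can be illustrated with a minimal sketch. The abstract does not say which tokenization or similarity measure the thesis uses, so everything below is an assumption made for illustration: the function names (`extract_vocabulary`, `cosine_similarity`, `rank_experts`), the camelCase splitting rule, the choice of cosine similarity, and the sample data are all hypothetical.

```python
import re
from collections import Counter
from math import sqrt


def extract_vocabulary(text):
    """Return the vocabulary of a text as a multiset (term -> count).

    Assumption: terms are obtained by splitting identifiers and comments
    on non-letters and on camelCase boundaries, then lowercasing.
    """
    terms = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", text)
    return Counter(term.lower() for term in terms)


def cosine_similarity(a, b):
    """Cosine similarity between two multisets (an assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_experts(entity_vocab, developer_vocabs, k=3):
    """Return the k developers whose vocabulary is most similar to the entity's."""
    scores = {
        dev: cosine_similarity(entity_vocab, vocab)
        for dev, vocab in developer_vocabs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data: one code entity and the text each developer has written.
entity = extract_vocabulary(
    "class InvoiceParser { parseInvoice(); // parse invoice lines }"
)
developers = {
    "alice": extract_vocabulary("parseInvoice invoiceTotal parse invoice"),
    "bob": extract_vocabulary("renderButton clickHandler"),
    "carol": extract_vocabulary("invoice report"),
}

print(rank_experts(entity, developers, k=1))  # top-1 candidate
print(rank_experts(entity, developers, k=3))  # top-3 candidates
```

Python's `Counter` is used here because it behaves like a multiset: it supports sum (`+`), intersection (`&`), and union (`|`) with multiplicities, which loosely mirrors the kind of operations the abstract says were formalized.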
