MARQUES JUNIOR, A. R.; http://lattes.cnpq.br/4426213995601363; MARQUES JUNIOR, Antonio Ricardo.
Abstract:
The Brazilian Federal Police (PF) is responsible, among its many duties, for investigating cases through federal agents assigned to its departments. One of the most recurrent tasks for investigators arises when an investigation is opened: the person in charge must verify whether a criminal investigation procedure already exists for the fact in question. However, because this check is subjective and depends on who performs it, more than one investigation may be opened into the same fact, which hinders the investigation process. This study compares classic and state-of-the-art information retrieval (IR) models, namely Cosine Distance, Jaccard Similarity, Doc2Vec, and Word Mover's Distance (WMD), in the search for relevant inquiries from structured and unstructured data (textual documents). The aim is to detect duplicate inquiries and to find similar cases that can assist decision-making in investigations or help train new delegates through similar crimes.
To build the IR models, we used non-confidential data from ePol, the web platform that manages investigation activities and interconnects the Federal Police Stations of Brazil.
Each model returns the 4 inquiries most similar to a previous inquiry selected as input. A total of 55 inquiries were used as queries for each model, and the responses were submitted for evaluation. Because the problem involves unlabeled (unsupervised) data, the evaluation was carried out by domain experts, namely PF delegates and clerks, who answered daily surveys comparing inquiries. The results show that classical methods such as Jaccard similarity and cosine distance achieve good results in detecting similar inquiries, with NDCGs of 0.8812 and 0.8371, respectively. The WMD model has an NDCG close to these (0.8037), and Doc2Vec achieves the worst result (0.6743). The study suggests that the models based on neural networks perform below the others because the training base is not large enough for a deep neural network model, which can make learning more difficult for this type of approach. For the detection of duplicity and of relationships between inquiries, the results were not satisfactory according to the NDCG metric. However, it should be noted that, unlike similarity between inquiries, duplicity and relationships between inquiries are uncommon events in this context. The models suggested in this study can be used as a feature of the ePol platform, identifying duplicity between inquiries and thereby optimizing the PF's work by reducing the waste of corporate resources, as well as suggesting similar inquiries to new delegates and helping them decide what actions to take in a police investigation.
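As an illustration of the retrieval-and-evaluation loop described above, the following is a minimal sketch, not the study's implementation. It ranks documents by Jaccard similarity over token sets, returns the top 4 for a query, and scores a list of expert relevance grades with NDCG. The toy corpus, the identifiers (`inq-1`, ...), the function names, the whitespace tokenization, and the graded relevance scale are all hypothetical. The NDCG variant shown, with a log2 discount and an ideal ranking built from the returned grades only, is an assumption, since the abstract does not specify the exact formulation used.

```python
import math


def jaccard_similarity(a, b):
    """Jaccard similarity of two token collections: |A & B| / |A | B|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def top_k_similar(query_id, corpus, k=4):
    """Return the ids of the k documents most similar to the query, best first."""
    query_tokens = corpus[query_id]
    scored = [
        (jaccard_similarity(query_tokens, tokens), doc_id)
        for doc_id, tokens in corpus.items()
        if doc_id != query_id  # never return the query itself
    ]
    # Sort by descending similarity; break ties by id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def dcg(relevances):
    """Discounted cumulative gain with a log2 rank discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances):
    """NDCG of a ranked list of relevance grades (assumed variant:
    the ideal ranking is the same grades sorted in descending order)."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical toy corpus: inquiry id -> whitespace tokens of its text.
corpus = {
    "inq-1": "drug trafficking at the border".split(),
    "inq-2": "drug trafficking at the airport".split(),
    "inq-3": "counterfeit currency seized at the border".split(),
    "inq-4": "tax fraud by a company".split(),
    "inq-5": "trafficking of drugs at the border post".split(),
    "inq-6": "illegal logging in a reserve".split(),
}

ranking = top_k_similar("inq-1", corpus, k=4)
print(ranking)

# Hypothetical expert grades for the 4 returned inquiries, in rank order.
print(round(ndcg([3, 1, 2, 0]), 4))
```

In a setting like the one described, each of the 55 queries would yield one such graded list, and the per-query NDCG values would then be averaged per model.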