Identifying product matches from short textual descriptions (Identificação de correspondências entre produtos a partir de descrições textuais curtas)

ALVES, André Luiz Firmino; http://lattes.cnpq.br/5729800124276465

URI: http://dspace.sti.ufcg.edu.br:8080/jspui/handle/riufcg/41569

Date: 2025-04-10

Abstract:

Decision-making in organizations increasingly depends on data, so data quality problems such as incomplete, inconsistent, and redundant information pose significant challenges. Data integration is a critical research area concerned with combining and unifying information from different sources and formats, even in heterogeneous and autonomous environments, to provide a comprehensive and consistent view of the data. In commercial transactions, companies issue invoices to document sales and purchases. However, the product data in these invoices often lacks standardization, and the descriptions can be short, varied, and inconsistent. This research addresses the technical challenges of data integration and Product Matching in scenarios with limited or incomplete data, such as invoices. Our proposed approach, STEPMatch, uses Information Retrieval and Natural Language Processing techniques to match short texts such as invoice product descriptions. STEPMatch achieved an accuracy of 98.11% in a test scenario. Additionally, we present a novel approach that adopts cross-lingual learning techniques within the Product Matching field, improving the generalization of machine learning models when labeled data is limited and yielding promising results in cross-lingual and cross-domain adaptation. Our primary contribution lies in adopting machine learning techniques for Product Matching, training models for scenarios that target low-resource language data, and demonstrating the feasibility of improving Product Matching quality on large volumes of data in distinct languages.
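
To make the idea of matching short, inconsistent product descriptions concrete, here is a minimal sketch. It is not the STEPMatch method, whose details this abstract does not give. It assumes a simple stand-in technique (Jaccard similarity over character trigrams), and the function names and sample catalog are hypothetical.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse non-alphanumeric runs to single spaces."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", no_accents).strip()


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of the normalized, space-padded text."""
    padded = f" {normalize(text)} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the trigram sets of two descriptions."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def best_match(query: str, catalog: list[str]) -> tuple[str, float]:
    """Return the catalog entry most similar to the query, with its score."""
    return max(((item, jaccard(query, item)) for item in catalog), key=lambda p: p[1])


# Hypothetical invoice-style descriptions, for illustration only.
catalog = ["Coca-Cola 2L bottle", "Guarana Antarctica 350ml can", "Rice type 1 5kg"]
match, score = best_match("COCA COLA 2 L", catalog)
print(match, round(score, 2))
```

Character n-grams are used here because they tolerate abbreviations, missing spaces, and accent differences that are common in short descriptions; a real system would likely add a decision threshold and learned features.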
