An approach to hate speech detection using cross-lingual machine learning.

Collection: Campus Campina Grande, Centro de Engenharia Elétrica e Informática (CEEI), Graduate Program in Computer Science, Doctorate in Computer Science

Author: FIRMINO, Anderson Almeida (FIRMINO, A. A.); http://lattes.cnpq.br/6042902332948785

URI: http://dspace.sti.ufcg.edu.br:8080/jspui/handle/riufcg/27501

Date: 2022-05-18

Abstract:

The growth of social media around the world has brought both benefits and challenges to society. Among the challenges, we highlight the proliferation of hate speech on social networks, whose detection has become an arduous task. About 22.5 million posts containing hate speech were removed from social networks between April and June 2020. It is therefore necessary to develop research that seeks automated solutions to identify and remove hate speech on social networks. In this thesis, we propose a new methodology for detecting hate speech in Portuguese texts. The methodology uses Cross-Lingual Learning, which applies transfer learning to Pre-Trained Language Models (PTLMs), using a language with large corpora available (the source language) to solve problems in a language with less annotated data (the target language). The proposed methodology comprises four stages: corpora acquisition, definition of the PTLM, training strategies, and evaluation. We carried out experiments using Pre-Trained Language Models in different languages, namely English, Italian and Portuguese (BERT and XLM-R), to verify which one best suited the proposed method. Corpora in English (WH) and Italian (Evalita 2018) were used as source-language data, and two Portuguese corpora were used as target-language data: OffComBr-2 and the Hate Speech Dataset (HSD). The experimental results showed that the proposed methodology is promising: for the OffComBr-2 corpus, the best result among state-of-the-art approaches was obtained (F1 score = 92%); and for the HSD corpus, the second-best result was obtained (F1 score = 90%).
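
The results above are reported as F1 scores. As a reminder of what that metric measures, the following is a minimal sketch of a binary F1 computation, assuming that label 1 marks hate speech and label 0 marks everything else. The function name `f1_score`, the labels, and the sample data are illustrative assumptions and do not come from the thesis, and the abstract does not state which averaging scheme (binary, macro or weighted) was used.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for the `positive` class: harmonic mean of precision and recall."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive prediction: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels and predictions (1 = hate speech, 0 = not hate speech).
gold = [1, 0, 1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = f1_score(gold, pred)
print(f"F1 = {score:.2f}")
```

Because F1 combines precision and recall, it is less misleading than accuracy on corpora where hateful posts are a minority of the examples.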
