MARTINS, A. D.; http://lattes.cnpq.br/4028898456588241; MARTINS, André Dieb.
Abstract:
Here we present Kamal Paul Nigam's semi-supervised training technique for improving the Naive Bayes classifier's performance through the use of unlabeled samples. This methodology is motivated by the cost reduction achieved when building a classifier with fewer labeled samples (which are costlier to obtain) and more unlabeled samples.
It is shown that combining the Expectation-Maximization (EM) algorithm with Naive Bayes learning surpasses traditional Naive Bayes learning alone. By introducing unlabeled samples into the learning process, a reduction is observed in the number of labeled samples needed to reach several performance levels. The experiments were performed on the 20Newsgroups-18828 corpus and show results similar to Nigam's, even when some of the imposed conditions are relaxed. More specifically, we relaxed the chronological condition (using older documents for training and newer ones for testing), obtaining similarly positive results with fewer than 15 classes.
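The EM-plus-Naive-Bayes combination described above can be sketched roughly as follows: train a multinomial Naive Bayes model on the labeled documents, then alternate an E-step (compute class posteriors for the unlabeled documents) and an M-step (re-estimate the model from labeled and probabilistically-labeled data together). This is a minimal illustration under assumed details, not Nigam's exact implementation; all function names (`nb_em`, `m_step`, `predict`) and the toy word-count matrices are hypothetical.

```python
import numpy as np

def nb_em(X_l, y_l, X_u, n_classes, n_iter=10, alpha=1.0):
    """Semi-supervised multinomial Naive Bayes via EM (illustrative sketch).

    X_l, X_u: word-count matrices for labeled/unlabeled docs (docs x vocab).
    y_l: class labels for the labeled docs. alpha: Laplace smoothing.
    """
    R_l = np.eye(n_classes)[y_l]  # one-hot "responsibilities" for labeled docs

    def m_step(R, X):
        # Re-estimate class priors and per-class word distributions
        prior = R.sum(axis=0) + alpha
        prior /= prior.sum()
        counts = R.T @ X + alpha            # (classes x vocab) expected counts
        cond = counts / counts.sum(axis=1, keepdims=True)
        return np.log(prior), np.log(cond)

    # Initialize from the labeled data only
    log_prior, log_cond = m_step(R_l, X_l)

    for _ in range(n_iter):
        # E-step: class posteriors for the unlabeled documents
        logp = X_u @ log_cond.T + log_prior
        logp -= logp.max(axis=1, keepdims=True)  # numerical stability
        R_u = np.exp(logp)
        R_u /= R_u.sum(axis=1, keepdims=True)
        # M-step: refit on labeled + probabilistically labeled data
        log_prior, log_cond = m_step(np.vstack([R_l, R_u]),
                                     np.vstack([X_l, X_u]))
    return log_prior, log_cond

def predict(X, log_prior, log_cond):
    return np.argmax(X @ log_cond.T + log_prior, axis=1)

# Hypothetical toy data: class 0 docs use word 0, class 1 docs use word 1
X_l = np.array([[5, 0, 1], [0, 5, 1]], dtype=float)
y_l = np.array([0, 1])
X_u = np.array([[4, 1, 0], [1, 4, 0], [6, 0, 2], [0, 6, 2]], dtype=float)

log_prior, log_cond = nb_em(X_l, y_l, X_u, n_classes=2)
preds = predict(X_u, log_prior, log_cond)
```

In this sketch the unlabeled documents contribute fractional (soft) counts during the M-step, which is how they reduce the number of labeled samples needed.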