AZEVEDO, G. A.; http://lattes.cnpq.br/7397197962569354; AZEVEDO, Gabriel Almeida.
Abstract:
The voice is one of the most important means of human communication. Through speech, a message
can be easily transmitted. Like any part of the human body, the phonatory system can be affected
by diseases, which are commonly called voice pathologies, among which are vocal tract disorders
(also called voice disorders), which include dysphonia, laryngitis, polyps and paralysis, the focus
of the research. In most cases, early diagnosis is essential to contain the worsening of the patient’s
clinical condition. However, detecting and classifying these disorders can be time-consuming
and requires specialized medical expertise. In addition, some of the tests are invasive, causing
discomfort to the patient. In view of the above, and aiming to assist and accelerate medical
diagnosis, the research described here investigates the use of deep neural networks for the
automatic classification of voice signals into healthy and pathological (or disordered)
categories, and for distinguishing among dysphonia, laryngitis, polyps and paralysis, using
non-invasive techniques for acquiring information. Data such as mel
spectrograms, zero crossing rate (ZCR), root mean square energy (RMSE) and MFCC coefficients
were used as sources of information for pre-trained CNN networks and hybrid CNN-RNN LSTM
networks. Techniques for data augmentation, such as time stretching, time shifting and white noise
injection were applied to data extracted from the database used (Saarbruecken Voice Database,
SVD) to overcome the problem of insufficient data. Each of the proposed approaches was
built in two versions, one for female voices and another for male voices, and their performance
was evaluated using the metrics accuracy, loss, precision, sensitivity (recall) and F1-score.
The binary classification networks achieved accuracy rates of 99.33% (male voices) and
99.50% (female voices), and the multi-class classification networks achieved accuracy rates of 96.40%
(female voices) and 89.20% (male voices), representing an important advance and contribution to
the area of automatic detection and classification of vocal tract disorders, with potential for clinical use.
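The abstract names hand-crafted features (zero crossing rate, RMS energy) and augmentation techniques (time stretching, time shifting, white noise injection). The sketch below illustrates, in plain NumPy, one way these quantities could be computed; function names, frame sizes, and SNR parameters are illustrative assumptions, not the thesis's actual pipeline, and a real system would typically use a library such as librosa (note the naive interpolation stretch shown here also alters pitch, unlike a phase-vocoder stretch).

```python
import numpy as np

def zero_crossing_rate(y):
    """Fraction of consecutive samples whose sign changes."""
    return np.mean(np.abs(np.diff(np.signbit(y).astype(int))))

def rms_energy(y, frame=1024):
    """Root mean square energy per non-overlapping frame."""
    n = len(y) // frame
    frames = y[:n * frame].reshape(n, frame)
    return np.sqrt(np.mean(frames ** 2, axis=1))

def time_shift(y, shift):
    """Augmentation: circularly shift the signal by `shift` samples."""
    return np.roll(y, shift)

def add_white_noise(y, snr_db=20.0, rng=None):
    """Augmentation: inject Gaussian noise at a given signal-to-noise ratio."""
    if rng is None:
        rng = np.random.default_rng(0)
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return y + rng.normal(0.0, np.sqrt(noise_power), size=len(y))

def time_stretch(y, rate):
    """Augmentation: naive stretch by linear-interpolation resampling
    (changes pitch; phase-vocoder stretching would preserve it)."""
    n_out = int(len(y) / rate)
    x_old = np.linspace(0.0, 1.0, num=len(y))
    x_new = np.linspace(0.0, 1.0, num=n_out)
    return np.interp(x_new, x_old, y)
```

For example, a 440 Hz sine sampled at 16 kHz has an RMS energy near 1/sqrt(2) per frame, and `time_stretch(y, 2.0)` halves its length; augmented copies produced this way can then be fed to the spectrogram/MFCC extraction stage described in the abstract.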