MARINUS, J. V. M. L.; http://lattes.cnpq.br/9632762751005388; MARINUS, João Vilian de Moraes Lima.
Abstract:
In recent years, several studies in the area of Digital Voice Processing have sought to assess the quality of patients' voices and to assist specialists in diagnosing vocal fold pathologies. Acoustic analysis of the voice can be an efficient diagnostic tool and has the advantage of being non-invasive. In this context, the main objective of this thesis was to investigate and develop methods for classifying voices affected by vocal fold pathologies. Specifically, the aim was to verify whether nonlinear analysis of the voice signal, based on images obtained with Chaos Theory techniques, can characterize vocal fold pathologies. To this end, five classes of vocal fold pathology were studied: Paralysis, Edema, Nodule, Polyp and Keratosis. A further class, called Benign Injury, was also studied; it groups voice signals affected by nodule, polyp and cyst. Two databases were used: the Massachusetts Eye and Ear Infirmary (MEEI) database and the Saarbruecken Voice Database (SVD). The pre-processing step consisted of enlarging the data set with the Time Stretching method and then segmenting and windowing the signals. In the feature extraction phase, an image of each signal segment was obtained from the trajectories in the segment's reconstructed phase space. These images were used to train two Convolutional Neural Networks (CNNs), one with and one without a bottleneck layer. Feature vectors extracted from the bottleneck layer were used to train a Support Vector Machine (SVM), whose results were compared with those of the CNN without the bottleneck layer. Fourteen classifications were performed: Normal versus Pathology; the 10 pairwise classifications among the 5 pathology classes; and 3 classifications of Paralysis, Edema and Keratosis, each versus Benign Injury.
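The segmentation, windowing and phase-space image steps can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes a Hann window, a two-dimensional time-delay embedding with points (x[n], x[n + tau]), and an image built by counting how many trajectory points fall in each cell of a square grid. The function names, the delay `tau`, the hop size and the grid resolution `bins` are hypothetical choices; the abstract does not specify them.

```python
import math


def hann(n):
    """Hann window of length n (window type is an assumption)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def segment(signal, size, hop):
    """Split a signal into frames of `size` samples, advancing `hop` samples,
    and apply the window to each frame. A trailing partial frame is discarded."""
    w = hann(size)
    frames = []
    for start in range(0, len(signal) - size + 1, hop):
        frame = signal[start:start + size]
        frames.append([x * wi for x, wi in zip(frame, w)])
    return frames


def delay_embed(frame, tau):
    """2-D time-delay embedding (assumed dimension 2): points (x[n], x[n + tau])."""
    return [(frame[n], frame[n + tau]) for n in range(len(frame) - tau)]


def trajectory_image(points, bins):
    """Count trajectory points per cell of a bins x bins grid.
    Both axes share one range so the trajectory's shape is not distorted."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    lo = min(min(xs), min(ys))
    hi = max(max(xs), max(ys))
    span = (hi - lo) or 1.0  # avoid division by zero for a constant frame
    img = [[0] * bins for _ in range(bins)]
    for x, y in points:
        col = min(int((x - lo) / span * bins), bins - 1)
        row = min(int((y - lo) / span * bins), bins - 1)
        img[bins - 1 - row][col] += 1  # row 0 is the top of the image
    return img


if __name__ == "__main__":
    # Hypothetical test signal: a 200 Hz tone sampled at 8 kHz.
    fs, f0 = 8000, 200
    sig = [math.sin(2 * math.pi * f0 * n / fs) for n in range(1600)]
    frames = segment(sig, size=400, hop=200)  # 400 samples = 10 cycles here
    pts = delay_embed(frames[0], tau=10)      # tau = quarter period of 200 Hz
    img = trajectory_image(pts, bins=32)
    print(len(frames), len(img), sum(map(sum, img)))  # prints: 7 32 390
```

With tau equal to a quarter period, the sinusoid maps to a roughly circular trajectory that the window pulls toward the origin at the frame edges. In a full pipeline each `img` would be one single-channel input to the CNN; an irregular (pathological) voice would spread its trajectory over more cells.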
The Normal versus Pathology classification achieved 100% accuracy with both the CNN and the SVM. The Nodule versus Polyp classification achieved accuracy above 90%, and the remaining classifications achieved accuracies between 70% and 90%. In general, classifications whose training sets were enlarged by data augmentation performed better than those without augmentation, except for classifications involving the Polyp class. In most cases, forming the images from segments of 10 pitch cycles gave better results than the classic 20 ms segment. Overall, classification with bottleneck features and an SVM outperformed classification with the CNN alone. The proposed approach proved promising for recognizing vocal fold pathologies from the voice, since it performed well when distinguishing different types of pathology, an arduous task given the noisy character of pathological voice signals.
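The difference between the two segment sizes follows from simple arithmetic: a segment of 10 pitch cycles lasts 10/F0 seconds, so its length adapts to the speaker's fundamental frequency F0, whereas a 20 ms segment has a fixed length. The sketch below assumes a 50 kHz sampling rate and a few illustrative F0 values; in practice F0 would have to be estimated for each recording, and the abstract does not say which pitch estimator was used. The helper names are hypothetical.

```python
def samples_for_duration(fs_hz: float, duration_s: float) -> int:
    """Number of samples in a fixed-duration segment (e.g. 20 ms)."""
    return round(fs_hz * duration_s)


def samples_for_pitch_cycles(fs_hz: float, f0_hz: float, n_cycles: int = 10) -> int:
    """Number of samples spanning n_cycles pitch periods (one period = 1/f0 s)."""
    return round(n_cycles * fs_hz / f0_hz)


if __name__ == "__main__":
    FS = 50_000  # assumed sampling rate in Hz
    print(f"20 ms segment: {samples_for_duration(FS, 0.020)} samples")
    for f0 in (100.0, 200.0, 250.0):  # illustrative F0 values, not from the thesis
        n = samples_for_pitch_cycles(FS, f0)
        print(f"F0 = {f0:.0f} Hz: 10 cycles = {n} samples ({1000 * n / FS:.0f} ms)")
```

This prints 1000 samples for the 20 ms segment and 5000, 2500 and 2000 samples (100, 50 and 40 ms) for 10 cycles at 100, 200 and 250 Hz. For any F0 below 500 Hz, 10 pitch cycles therefore span more than 20 ms: a 20 ms frame holds only 0.02 × F0 cycles (two cycles at 100 Hz), whereas the pitch-based segment always holds ten.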