Development of a low-bit-rate personal voice coder based on hidden Markov models.

Author: ROCHA, Raíssa Bezerra (ROCHA, R. B.); http://lattes.cnpq.br/0884771058471411

URI: http://dspace.sti.ufcg.edu.br:8080/jspui/handle/riufcg/8165

Date: 2012-07-27

Abstract:

This dissertation presents the development of a voice encoder whose main feature is the transmission of voice signals at low bit rates. Designed mainly for mobile cellular systems, the proposed encoder uses phonetic coding, the technique that provides the lowest transmission rates. The implementation is divided into a transmitter and a receiver. In the transmitter, the speech signal is segmented by a phoneme recognizer that uses hidden Markov models (HMMs) to model the voice signal. A pre-established index is assigned to each phoneme, and its duration and energy are estimated. The information transmitted to the receiver consists of the index, energy, and duration of each phoneme, which is how the encoder reduces the transmission rate. The receiver is built in two steps. First, each user of the encoder builds a bank of acoustic units by pronouncing pre-established phrases. Second, speech is synthesized by concatenating segments such as syllables, phonemes, and vowel clusters. The encoder's performance was evaluated with an informal subjective test based on the ACR (Absolute Category Rating) test, in two evaluations. The first used automatic segmentation in both the transmitter and the receiver, and the encoder transmitted the voice signal at rates of up to 150 bits/s; the evaluators rated the quality of most samples as average to good. In the second evaluation, the segmentation used to form the acoustic unit bank was done manually. Sixty-two listeners were asked about the intelligibility and quality of the speech signals, which were coded at 125 bits/s; most samples showed good intelligibility and reasonable quality.
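
The per-phoneme payload described in the abstract (index, energy, duration) can be illustrated with a minimal sketch. The field widths, the `PhonemeFrame` name, and the phoneme rate used below are hypothetical assumptions chosen only for illustration; the record does not state the bit allocation actually used in the dissertation.

```python
from dataclasses import dataclass

# Hypothetical field widths (assumptions, not taken from the dissertation).
INDEX_BITS = 6      # room for up to 64 phoneme indices
ENERGY_BITS = 4     # 16 quantized energy levels
DURATION_BITS = 5   # 32 quantized duration steps
FRAME_BITS = INDEX_BITS + ENERGY_BITS + DURATION_BITS


@dataclass(frozen=True)
class PhonemeFrame:
    """One transmitted unit: phoneme index plus quantized energy and duration."""
    index: int
    energy: int
    duration: int


def pack(frame: PhonemeFrame) -> int:
    """Pack the three fields into a single FRAME_BITS-wide integer."""
    fields = (
        (frame.index, INDEX_BITS),
        (frame.energy, ENERGY_BITS),
        (frame.duration, DURATION_BITS),
    )
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
    return (
        (frame.index << (ENERGY_BITS + DURATION_BITS))
        | (frame.energy << DURATION_BITS)
        | frame.duration
    )


def unpack(word: int) -> PhonemeFrame:
    """Recover the three fields from a packed integer."""
    duration = word & ((1 << DURATION_BITS) - 1)
    energy = (word >> DURATION_BITS) & ((1 << ENERGY_BITS) - 1)
    index = word >> (ENERGY_BITS + DURATION_BITS)
    return PhonemeFrame(index, energy, duration)


def bitrate(phonemes_per_second: int) -> int:
    """Bits per second for a given (assumed) phoneme rate."""
    return FRAME_BITS * phonemes_per_second
```

Under these assumed widths each phoneme costs 15 bits, so an assumed speaking rate of 10 phonemes per second would give 150 bits/s. This only shows the order of magnitude that phonetic coding makes possible and should not be read as the encoder's real frame layout.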
