Contributions in the context of information theory to genomic signal processing (original title: Contribuições no contexto da teoria da Informação para o processamento de sinal genômico).

Campus Campina Grande | Center for Electrical Engineering and Informatics (Centro de Engenharia Elétrica e Informática, CEEI) | Graduate Program in Electrical Engineering (Pós-Graduação em Engenharia Elétrica) | Doctorate in Electrical Engineering


Author: ARRUDA, Milena Marinho (also cited as ARRUDA, M. M. and ARRUDA, MILENA M.); Lattes CV: http://lattes.cnpq.br/3299838657781132

URI: http://dspace.sti.ufcg.edu.br:8080/jspui/handle/riufcg/28317

Date: 2022-10-07

Abstract:

The growth of biological databases, together with the need to understand how the many components of a living cell interact to perform cellular functions, justifies the interdisciplinary application of mathematical, statistical, and computational theories to the analysis and processing of genomic information. The genetic information of an organism is encoded in deoxyribonucleic acid (DNA) molecules by means of units called bases. This work addresses the analysis and processing of DNA sequences to obtain biological knowledge. The research aims to integrate the theory and methods of signal processing and information theory to extract genomic information. One of the main challenges, therefore, is to define a mapping rule that takes DNA sequences from their original symbolic domain to a numerical domain. The first result considers a bijective one-dimensional mapping of the bases onto the elements of a finite field in order to analyze the hypothesis that DNA acts as a linear code in the transmission of stored information; if this holds, an error-correcting code would underlie DNA sequences. In this context, a new algorithm is proposed to search for BCH codes containing a codeword at Hamming distance at most one from the numerical vector obtained by mapping a given DNA sequence. Furthermore, it is shown that DNA sequences are approximately uniformly distributed, under the Hamming metric, in a vector space of dimension n. Consequently, the generator polynomials of the codes that identify collections of taxonomically close sequences do not provide enough biological information to group and classify them. The second result is based on the hypothesis that, when a single fixed mapping is used for all DNA sequences, it is not possible to guarantee that the intrinsic characteristics of each sequence will be properly extracted.
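As a minimal sketch of the kind of test the first result relies on, the Python below maps bases bijectively onto a four-element finite field, assumed here to be GF(4), and checks whether a mapped sequence lies within Hamming distance one of some codeword of the cyclic code generated by a given polynomial g(x). The base-to-field assignment (`BASE_TO_GF4`) and all helper names are hypothetical, and the code does not search over BCH codes; it is not the algorithm proposed in the thesis. It assumes g(x) divides x^n - 1 for the sequence length n, so that "g(x) divides v(x)" is equivalent to "v is a codeword".

```python
# GF(4) = {0, 1, a, a^2} encoded as 0, 1, 2, 3, with a^2 = a + 1.
# Addition in characteristic 2 is bitwise XOR; multiplication uses a table.
GF4_MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]
GF4_INV = [None, 1, 3, 2]  # multiplicative inverses of 1, a, a^2

# Hypothetical bijective mapping of bases onto GF(4); the thesis may use another.
BASE_TO_GF4 = {"A": 0, "C": 1, "G": 2, "T": 3}


def map_dna(seq):
    """Map a DNA string to a vector over GF(4), one symbol per base."""
    return [BASE_TO_GF4[b] for b in seq.upper()]


def poly_mod(a, g):
    """Remainder of a(x) divided by g(x) over GF(4); index i holds the x^i coefficient."""
    r = list(a)
    dg = len(g) - 1
    lead_inv = GF4_INV[g[-1]]
    for i in range(len(r) - 1, dg - 1, -1):
        if r[i]:
            factor = GF4_MUL[r[i]][lead_inv]
            # Subtract (= add, in characteristic 2) factor * x^(i - dg) * g(x).
            for j in range(dg + 1):
                r[i - dg + j] ^= GF4_MUL[factor][g[j]]
    return r[:dg]


def is_codeword(v, g):
    """v is in the cyclic code generated by g iff g(x) divides v(x)."""
    return not any(poly_mod(v, g))


def within_distance_one(v, g):
    """True if some codeword is at Hamming distance <= 1 from v (brute force)."""
    if is_codeword(v, g):
        return True
    for i in range(len(v)):
        for e in (1, 2, 3):  # every nonzero error value at position i
            w = list(v)
            w[i] ^= e
            if is_codeword(w, g):
                return True
    return False


if __name__ == "__main__":
    v = map_dna("ACGTA")                            # [0, 1, 2, 3, 0]
    print(within_distance_one(v, [1, 2, 1]))        # True: g = x^2 + a x + 1, n = 5
    print(within_distance_one(v, [1, 1, 1, 1, 1]))  # False: length-5 repetition code
```

Both example generators divide x^5 - 1 over GF(4). The first generates the [5, 3, 3] quaternary Hamming code, which is perfect, so every length-5 vector passes; the repetition code shows a failing case. The brute-force loop costs about 3n polynomial remainders; a precomputed syndrome table is the usual faster choice for long sequences.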
To address this limitation of fixed mappings, two new algorithms are proposed, SNR-SE and TBP-SE, both of which use spectral envelope theory to compute mappings tailored to each sequence. Their applicability to spectral analysis for discriminating between protein-coding and non-coding sequences is analyzed and compared with that of mappings already established in the literature. In this scenario, the proposed TBP-SE algorithm achieved the highest accuracy and sensitivity among all the methods evaluated. This is noteworthy because sensitivity is especially important in this application: high sensitivity means a low probability that a coding sequence goes unidentified. In addition, TBP-SE remained accurate even when detecting regions with shorter coding sequences.
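To make the spectral-envelope idea concrete, here is a minimal sketch of the classical spectral envelope for categorical sequences, evaluated at the single frequency ω = 1/3, where protein-coding regions are known to show period-3 behavior. It finds the real-valued scaling of the bases that maximizes periodogram power at ω relative to the variance of the scaled sequence. This is not SNR-SE or TBP-SE, whose details are not described here; the function name, the use of a raw (unsmoothed) single-frequency periodogram, and the choice of T as the zero-scaled reference base are assumptions. A practical estimator would smooth the spectral matrix and slide a window along the sequence.

```python
import numpy as np

BASES = "ACGT"  # assumed ordering; "T" is the reference base with scaling 0


def spectral_envelope_at(seq, omega=1 / 3):
    """Raw single-frequency spectral envelope of a DNA sequence (sketch).

    Returns (lam, scalings): lam is the largest ratio of periodogram power at
    `omega` to variance over all real scalings of the bases, and `scalings`
    maps each base to the numerical value that attains it.
    """
    seq = seq.upper()
    n = len(seq)
    # Indicator vectors Y_t for A, C, G (T is implied when all three are 0).
    y = np.array([[1.0 if ch == b else 0.0 for b in BASES[:-1]] for ch in seq])
    yc = y - y.mean(axis=0)  # center each indicator column
    v = yc.T @ yc / n        # 3x3 sample covariance matrix V
    w, u = np.linalg.eigh(v)
    if np.any(w <= 1e-12):
        raise ValueError("all four bases must occur in the sequence")
    v_inv_sqrt = u @ np.diag(1.0 / np.sqrt(w)) @ u.T

    # DFT of the centered indicators at omega, scaled so |d|^2 is a periodogram.
    t = np.arange(n)
    d = (np.exp(-2j * np.pi * omega * t) @ yc) / np.sqrt(n)
    f_re = np.real(np.outer(d, d.conj()))  # real part of the raw spectral matrix

    # Maximize b' f_re b / b' V b via the whitened symmetric eigenproblem.
    evals, evecs = np.linalg.eigh(v_inv_sqrt @ f_re @ v_inv_sqrt)
    lam = float(evals[-1])              # eigh sorts eigenvalues in ascending order
    beta = v_inv_sqrt @ evecs[:, -1]    # optimal scaling (overall sign is arbitrary)
    scalings = {b: float(x) for b, x in zip(BASES[:-1], beta)}
    scalings["T"] = 0.0
    return lam, scalings


if __name__ == "__main__":
    lam_p3, s = spectral_envelope_at("ACG" * 39 + "TTT")  # strong period-3 pattern
    lam_p4, _ = spectral_envelope_at("ACGT" * 30)         # period 4, no period-3 power
    print(lam_p3)  # large: bounded above by n/2 = 60 for n = 120
    print(lam_p4)  # zero up to rounding error
    print(s)       # numerical value assigned to each base
```

Whitening by V^(-1/2) turns the ratio maximization into an ordinary symmetric eigenvalue problem: the largest eigenvalue is the envelope value at ω, and its eigenvector, mapped back through V^(-1/2), gives the sequence-specific numerical mapping of the bases.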
