http://lattes.cnpq.br/5714569690420143; ALVES, George Marcelo Rodrigues.
Abstract:
Due to the constant growth of the number of texts available on the Web, there is a need to catalog that information which appear at every moment. However, it is an arduous task in which humans are unable to perform this task manually, given the increased amount of data available at every second. Numerous studies have been conducted in order to automate the cataloging process. A research line with utility for various areas of human knowledge is the indexing of documents based on temporal contexts present in these documents. This is not a trivial task, as it involves the analysis of unstructured information present in natural language, available in several languages, among other difficulties. The main objective of this work is to create a model to allow indexing of documents, creating topic maps enriched with the concepts in text and their related temporal information. This approach led to the RISO-GCT (Temporal Contexts Generation), a part of RISO Project (Semantic Information Retrieval on Text Objects), which aims to create a semantic indexing environment and retrieval of documents, enabling a more accurate recovery. RISO-GCT uses the results of a preliminary module, the RISO-TT (Temporal Tagger) responsible the labeling temporal information contained in documents and carrying out the process of normalization of temporal expressions. Found. In this module the normalization of temporal expressions has been improved, in order allow a richer temporal context determination. Experiments were conducted to evaluate the effectiveness of the approach proposed a in this research. The first, in order to verify that the topic map previously created by RISO-IC has been correctly enriched with temporal information related to the concepts correctly, and the second, to analyze the effectiveness of the normalization of expressions extracted from documents. The experiments concluded that both the RISO-GCT, as the RISO-TT, which was evolved during this work, obtained better results than similar tools.