SANTOS, A. A.; http://lattes.cnpq.br/2454224571306182; SANTOS, Adriano Araújo.
Abstract:
The need to manage the large volume of digital documents available today,
combined with the human inability to analyze all this information quickly, has led
to growing research on the development of systems that automate the
information management process. Nevertheless, this is not a trivial task. Most of the
available documents lack a standardized structure, which hinders the
development of computational schemes that automate information analysis and
therefore requires converting information from natural language into
structured form. To that end, syntactic, temporal and spatial pattern recognition
tasks are needed. The main objective of this study is to create an
advanced temporal pattern recognition mechanism. We heuristically built a
dictionary of temporal-pattern rules and developed a module with an extensible and
flexible architecture for retrieving and tagging temporal expressions. This module,
called RISO-TT, implements this pattern recognition mechanism and is part of the
RISO project (Retrieval of Semantic
Information from Textual Objects). Two experiments were carried out to
evaluate this approach. The first verified the extensibility and flexibility of the
RISO-TT architecture, and the second assessed the efficiency of the proposed
approach by comparing the developed module with two tools well established in the
academic community (HeidelTime and SUTime). RISO-TT outperformed its rivals in
temporal expression tagging, a result confirmed by statistical tests.
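The abstract does not specify how RISO-TT stores or applies its rules. As a rough illustration of the general idea of an extensible rule dictionary for temporal expression tagging, the following minimal Python sketch assumes regular-expression rules, TIMEX3-style inline tags, and an "earliest start, then longest match" policy for overlapping matches. The class, rule names and tag format are hypothetical and are not taken from RISO-TT, HeidelTime or SUTime.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TimexMatch:
    start: int
    end: int
    text: str
    timex_type: str  # e.g. "DATE" or "DURATION" (TIMEX3-style types, assumed)
    rule: str        # name of the hypothetical rule that produced the match


class TemporalTagger:
    """Hypothetical rule-dictionary tagger; an illustration, not the RISO-TT code."""

    def __init__(self) -> None:
        # Rule dictionary: rule name -> (compiled pattern, TIMEX3-style type).
        self.rules: dict[str, tuple[re.Pattern[str], str]] = {}

    def add_rule(self, name: str, pattern: str, timex_type: str) -> None:
        # New rules are registered without touching the matching code,
        # which is one simple way to keep the dictionary extensible.
        self.rules[name] = (re.compile(pattern, re.IGNORECASE), timex_type)

    def find(self, text: str) -> list[TimexMatch]:
        # Collect every match of every rule.
        candidates = [
            TimexMatch(m.start(), m.end(), m.group(0), ttype, name)
            for name, (rx, ttype) in self.rules.items()
            for m in rx.finditer(text)
        ]
        # Resolve overlaps: earliest start first, and among equal starts
        # the longest span wins; later overlapping candidates are dropped.
        candidates.sort(key=lambda c: (c.start, -(c.end - c.start)))
        selected: list[TimexMatch] = []
        last_end = -1
        for c in candidates:
            if c.start >= last_end:
                selected.append(c)
                last_end = c.end
        return selected

    def tag(self, text: str) -> str:
        # Wrap each selected match in an inline TIMEX3-like element.
        out, pos = [], 0
        for m in self.find(text):
            out.append(text[pos:m.start])
            out.append(f'<TIMEX3 type="{m.timex_type}">{m.text}</TIMEX3>')
            pos = m.end
        out.append(text[pos:])
        return "".join(out)


MONTHS = r"(?:january|february|march|april|may|june|july|august|september|october|november|december)"

tagger = TemporalTagger()
tagger.add_rule("month_day_year", rf"\b{MONTHS} \d{{1,2}}, \d{{4}}\b", "DATE")
tagger.add_rule("iso_date", r"\b\d{4}-\d{2}-\d{2}\b", "DATE")
tagger.add_rule("year", r"\b\d{4}\b", "DATE")
tagger.add_rule("relative_day", r"\b(?:yesterday|today|tomorrow)\b", "DATE")
tagger.add_rule("duration", r"\b\d+ (?:days?|weeks?|months?|years?)\b", "DURATION")

if __name__ == "__main__":
    print(tagger.tag("The report was filed on March 3, 2015 and reviewed 2 weeks later, on 2015-03-17."))
```

In the example sentence, the `year` rule also matches "2015" inside the longer date expressions, but the overlap policy keeps only the longer `month_day_year` and `iso_date` matches. Keeping rules as data in a dictionary, separate from the matching loop, is one plausible reading of an "extensible and flexible" design; a real system would also need normalization of the tagged values, which this sketch omits.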