SILVA, W. J.; http://lattes.cnpq.br/0360011804148231; SILVA, Welmisson Jammesson da.
Abstract:
Information Extraction (IE) is a collection of methods and techniques whose objective is to extract relevant information from semi-structured or unstructured data sources. An IE system extracts from textual sources only the information that is of interest to its users; the parts that are not of interest are left out. This work proposes a new supervised IE method in which the extracted information, consisting of parts of the text, is unstructured. This is an advance over 'traditional' IE, where the extracted information is structured according to a user-defined template. Because the method is supervised, extraction from new documents is induced from a previously assembled collection of documents whose relevant parts have been marked (the training set). The method innovates, however, in that the training set can be very small in absolute terms, which keeps the cost of preparing it low. Another innovation is its extraction technique, which is an appropriate combination of existing techniques. Independence of domain and of document format are two other important characteristics of the method. To validate the method, the system TIES (Textual Information Extraction System) was developed and tested on two disparate domains, one on electric power systems and the other on legislation for public administration. The results of the tests were promising for both domains.