RAMALHO, R. E. C.; http://lattes.cnpq.br/7686727918516258; RAMALHO, Rich Elton Carvalho.
Abstract:
Information Extraction Systems assist humans in searching for specific information in documents.
However, most of these systems do not support documents in the Portable Document Format (PDF),
which is widely used. In a PDF document, the text content is mixed with metadata or semi-structured
data, which makes it difficult for Natural Language Processing (NLP) algorithms to extract the required
information. The Court of Auditors of the State of Acre (TCE-AC) is the body that supervises and controls
the use of public money and the budgetary and financial administration of the state of Acre, and it is
responsible for analyzing and judging the public accounts of the entities under its jurisdiction. These entities must
publish information related to bids both in the TCE-AC bid management system and in the Official
Gazette of the State of Acre (DOE), which uses the PDF format. It is the responsibility of the TCE-AC to
verify that the bidding information appears in both places, which generates a great deal of manual work. In this
work, we present an NLP solution that extracts the acts from the DOE and automatically
categorizes each act as bidding-related or not. For bidding acts, advanced NLP techniques are then used to process and
extract the entities and information from the bid, so that the TCE-AC can be assisted in verifying
that the bid is also in the bid management system.