SOUSA, Ianna M. S. F.; http://lattes.cnpq.br/8545006395210271; SOUSA, Ianna Maria Sodré Ferreira de.
Abstract:
With the spread of digital libraries and the Internet, more and
more electronic texts, written in several languages, are becoming available to a wide and
geographically dispersed public. This makes it necessary to develop tools that
facilitate the indexing, representation and retrieval of multilingual documents.
This thesis presents a method for the semiautomatic construction of a
multilingual thesaurus, based on the indexing of electronic documents, in order to
support adequate information retrieval, independent of the language of the
documents.
The method consists of extracting the terms of a document and using
an analysis of term co-occurrence to determine their relevance. Using
special monolingual dictionaries, abstract, language-independent terms are determined.
Relevant concepts are represented as binary relations and, using Gammoudi's method of
rectangular decomposition, rectangles of concept/document pairs are
determined and added incrementally to the existing thesaurus.
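As an illustration only, the steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure used in the thesis: the scoring rule (counting the co-occurrence pairs a term takes part in) is an assumed stand-in for the co-occurrence analysis, and `rectangle_for` computes a single maximal rectangle by a simple closure, which is a simplification and not Gammoudi's rectangular decomposition itself. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_relevance(sentences):
    """Score each term by the number of co-occurrence pairs it takes part in.

    `sentences` is a list of term lists. The scoring rule is an assumption
    made for illustration; the abstract does not specify the measure.
    """
    score = Counter()
    for terms in sentences:
        for a, b in combinations(sorted(set(terms)), 2):
            score[a] += 1
            score[b] += 1
    return score


def binary_relation(doc_concepts):
    """Represent concept/document membership as a set of (concept, document) pairs."""
    return {(c, d) for d, concepts in doc_concepts.items() for c in concepts}


def rectangle_for(relation, concept):
    """Return one maximal rectangle (concepts, documents) containing `concept`.

    A rectangle is a pair of sets A, B such that every (a, b) with a in A and
    b in B belongs to the relation. Simplified stand-in, not Gammoudi's method.
    """
    docs = {d for c, d in relation if c == concept}
    if not docs:
        return set(), set()
    concepts = {c for c, _ in relation
                if all((c, d) in relation for d in docs)}
    return concepts, docs
```

Each rectangle groups concepts that appear together in the same set of documents, which is the kind of concept/document block that could be merged into an existing thesaurus when a new document is indexed.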
Special dictionaries and interaction with the user determine the
correct contexts for ambiguous terms; they are also used to eliminate inflections and
to determine the abstract concepts.
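A minimal sketch of this step, assuming each monolingual dictionary maps an inflected surface form to a list of candidate concept identifiers. The dictionary layout, the concept identifiers and the `choose` callback (which stands in for the interaction with the user) are all hypothetical and are not taken from the thesis.

```python
def to_concepts(terms, lang, dictionaries, choose):
    """Map surface terms of one language to language-independent concept IDs.

    dictionaries[lang] maps a lowercase (possibly inflected) form to a list of
    candidate concept IDs; this layout is an assumption for illustration.
    choose(term, candidates) resolves ambiguous terms and stands in for the
    user interaction. Terms missing from the dictionary are skipped.
    """
    concepts = []
    for term in terms:
        candidates = dictionaries[lang].get(term.lower(), [])
        if not candidates:
            continue
        if len(candidates) == 1:
            concepts.append(candidates[0])
        else:
            concepts.append(choose(term, candidates))
    return concepts


# Hypothetical dictionaries: inflected forms in two languages share concept IDs.
SAMPLE_DICTIONARIES = {
    "en": {"libraries": ["LIBRARY"], "banks": ["BANK_FINANCE", "BANK_RIVER"]},
    "pt": {"bibliotecas": ["LIBRARY"], "banco": ["BANK_FINANCE", "BENCH"]},
}
```

Because "libraries" and "bibliotecas" resolve to the same identifier, documents in either language end up indexed under one concept, which is what makes retrieval independent of the document language.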
A prototype has been developed that allows continuous updating of
the existing thesaurus by indexing new documents in several languages. It also
supports multilingual queries and the addition of new languages.