GOMES, J. I. F.; http://lattes.cnpq.br/8540079502222271; GOMES, José Igor de Farias.
Abstract:
Representing geographic regions has been a major research focus in recent years, as it underpins tasks such as searching for similar regions. Such representation is not trivial, however, because it may involve numerous variables. The current trend is to represent regions as high-dimensional vectors, known as embeddings. Search operations over these vectors, however, tend to be resource-intensive in terms of processing time and disk usage. In this article, we experimented with different manipulations of these vectors in order to reduce the computational resources consumed during search without significantly affecting the relevance of the results. We applied dimensionality reduction and quantization of the vector elements, and compared exact nearest-neighbor search with approximate nearest-neighbor search. We observed that approximate nearest-neighbor search reduces search time by approximately 42.6% while still closely approximating the baseline results. Embedding quantization showed the second-best intersection with the baseline results and significantly reduced the disk space used by the indexes. Dimensionality reduction did not significantly change search time and had very low intersection with the baseline results.
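
As an illustration of the kind of comparison summarized above, the following minimal sketch runs an exact nearest-neighbor search on float32 embeddings, repeats it on an 8-bit quantized copy, and measures how much the two result lists intersect. It is not the implementation used in the work: the function names, the random data, the min-max int8 quantization scheme, and the choice of Euclidean distance are all assumptions made only for this example.

```python
import numpy as np

def exact_top_k(index, query, k):
    """Exact (brute-force) k nearest neighbors by Euclidean distance."""
    dists = np.linalg.norm(index - query, axis=1)
    return np.argsort(dists)[:k]

def quantize_int8(vectors):
    """Min-max scalar quantization of every element to an int8 code (assumed scheme)."""
    lo, hi = float(vectors.min()), float(vectors.max())
    scale = (hi - lo) / 255.0
    codes = np.clip(np.round((vectors - lo) / scale), 0, 255) - 128
    return codes.astype(np.int8), scale, lo

def dequantize(codes, scale, lo):
    """Map int8 codes back to approximate float32 values."""
    return ((codes.astype(np.float32) + 128.0) * scale + lo).astype(np.float32)

def intersection_at_k(baseline_ids, candidate_ids):
    """Fraction of baseline results that also appear in the candidate results."""
    shared = set(baseline_ids.tolist()) & set(candidate_ids.tolist())
    return len(shared) / len(baseline_ids)

# Hypothetical data: 1000 region embeddings with 64 dimensions each.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 64)).astype(np.float32)
query = rng.standard_normal(64).astype(np.float32)

baseline = exact_top_k(embeddings, query, k=10)

codes, scale, lo = quantize_int8(embeddings)
quantized_hits = exact_top_k(dequantize(codes, scale, lo), query, k=10)

overlap = intersection_at_k(baseline, quantized_hits)
print(f"index size: {embeddings.nbytes} bytes -> {codes.nbytes} bytes")
print(f"intersection with baseline at k=10: {overlap:.2f}")
```

Storing one byte per element instead of four shrinks the stored vectors to a quarter of their original size, while the intersection score indicates how far the quantized results drift from the exact baseline.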