MACENA, A. A.; MACENA, Arthur de Amorim.
Abstract:
This course completion work presents the development of a web tool for the data annotation
project at the Applied Intelligent Computing Laboratory (LACINA). The tool focuses on annotating
dialogue audio, with the aim of facilitating the creation of datasets for training dialogue
systems based on machine learning. The tool was designed to provide an efficient and intuitive
platform, allowing the identification of speakers and the marking of intentions and entities in audio
dialogues. The work highlights the importance of dialogue audio annotation for the
advancement of dialogue systems and discusses the advantages of annotated datasets, as well
as the ethical and privacy considerations associated with the annotation process. The development section
covers the technical aspects of the tool, including the technologies and frameworks
used, the challenges faced during development, and the solutions adopted to overcome them. Desirable
features are highlighted, such as a friendly user interface, advanced playback controls, and
synchronized visualization of audio and annotated text. Practical applications of the tool in
machine learning and natural language processing projects are then discussed. The annotated
datasets created with the tool can be used to train speech recognition and language understanding
models, contributing to the development of dialogue systems. This work aims to
provide a comprehensive overview of this evolving area, highlighting its impact and its potential
to drive research and development of audio-based machine learning applications. The web tool for
annotating dialogue audio represents a contribution to the fields of natural language
processing and dialogue systems, facilitating the creation of high-quality annotated datasets and supporting
the development of more efficient, natural, and accurate dialogue systems.
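
To make the kind of annotation described above more concrete, the following is a minimal sketch of how one annotated dialogue segment (speaker, time span, transcript, intention, and entities) might be represented and checked. The class names, field names, and the `validate` helper are illustrative assumptions and are not taken from the tool itself.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Entity:
    # Hypothetical entity annotation: a label over a character span of the transcript.
    label: str
    start_char: int
    end_char: int


@dataclass
class Segment:
    # Hypothetical annotated audio segment: who spoke, when, what was said, and why.
    speaker: str
    start_s: float  # segment start, in seconds from the beginning of the audio
    end_s: float    # segment end, in seconds
    text: str       # transcript of the segment
    intent: str     # intention label assigned by the annotator
    entities: List[Entity] = field(default_factory=list)

    def entity_text(self, entity: Entity) -> str:
        """Return the transcript substring covered by an entity."""
        return self.text[entity.start_char:entity.end_char]


def validate(segments: List[Segment]) -> List[str]:
    """Return human-readable problems found in the annotations (empty if none)."""
    problems = []
    for i, seg in enumerate(segments):
        if seg.end_s <= seg.start_s:
            problems.append(f"segment {i}: end time must be after start time")
        for ent in seg.entities:
            if not (0 <= ent.start_char < ent.end_char <= len(seg.text)):
                problems.append(f"segment {i}: entity '{ent.label}' is outside the transcript")
    return problems


def to_json(segments: List[Segment]) -> str:
    """Serialize annotations to JSON, for example to export a dataset."""
    return json.dumps([asdict(s) for s in segments], ensure_ascii=False)


if __name__ == "__main__":
    text = "I want to book a flight to Recife"
    start = text.index("Recife")
    seg = Segment(
        speaker="customer",
        start_s=0.0,
        end_s=2.4,
        text=text,
        intent="book_flight",
        entities=[Entity("destination", start, start + len("Recife"))],
    )
    print(validate([seg]))
    print(to_json([seg]))
```

Keeping entities as character offsets into the transcript, rather than copies of the text, is one possible design choice: it lets the same record drive a synchronized view of audio and annotated text, at the cost of requiring a consistency check such as `validate` whenever the transcript is edited.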