LEANDRO, C. R. V.; http://lattes.cnpq.br/8611845611254864; LEANDRO, Caroliny Regina Valença.
Abstract:
Observability plays an important role in software development and maintenance. A system is observable when any state it can enter, whether routine or entirely new, can be understood and explained. Along with metrics and traces, logs are one of the pillars of observability and play a vital role in debugging system states. This highlights their importance as a data source and the need to process and store them. In this context, OpenTelemetry emerges as a framework and set of tools that aims to facilitate the collection and management of observability data in systems. Because it is independent of vendors and tools and follows an open-source model, OpenTelemetry is highly versatile and adaptable to the individual needs of its users, which makes it well suited to implementing observability in systems. This work focuses on improving a module used in an OpenTelemetry collector whose main function is to receive logs on an e-commerce platform. The module comprises two components: the WAL, responsible for detecting failures in sending logs to OpenSearch and storing unsent logs in an object storage service; and a log Replayer, which later attempts to resend the stored logs to OpenSearch. However, the log Replayer faces challenges related to the availability of hardware resources, instability in variable environments, and limited configurability, all of which reduce its effectiveness in sending logs to OpenSearch. In addition, the absence of data on the health and performance of the WAL can make this component difficult to maintain and debug. Given this scenario, this work aims to improve the log Replayer by improving the availability and utilization of hardware resources, increasing its reliability in sending logs to OpenSearch, and making it more flexible in terms of configuration.
Additionally, the work aims to add observability capabilities to the WAL mechanism, providing greater visibility into its operation and making it easier to debug.