SAMPAIO, L. R.; http://lattes.cnpq.br/2334194004546171; SAMPAIO, Lília Rodrigues.
Abstract:
The ability to efficiently process large amounts of data, such as readings from IoT sensors, is a desired goal for many systems, especially since the value of this data can drop quickly after it is collected. Processing demands of this kind led to the development of the Data Stream Processing (DSP) paradigm, in which data arrives continuously and must be processed in real time. DSP applications are subject to varying operating conditions, and they must adapt to different scenarios while maintaining Quality of Service (QoS) goals. Traditional approaches rely on automatic scaling of resources, which raises challenges such as defining good metrics for the QoS objectives, determining the interval at which these metrics are collected, and estimating the amount of resources to provision. Although new techniques for monitoring and adapting DSP systems continue to emerge, many of the proposed solutions lack the theoretical basis needed to guarantee high levels of accuracy in their execution. Control Theory, given its analytical approach, can be a good alternative for this purpose. However, applying control techniques to computer systems remains a challenge, mainly because of the difficulty of abstracting the complex behavior of software into a mathematical form suitable for designing a controller that reduces system delay, generates appropriate corrective actions, and minimizes steady-state error. With this in mind, this work proposes to apply and evaluate control theory methodologies in micro-batch DSP systems. System identification methods are used to generate a model of Asperathos, a framework for automating different data processing applications while maintaining customizable QoS goals. Based on this model, a Proportional-Integral (PI) controller that tracks performance metrics is proposed, along with a demonstration of its tuning. A multi-objective controller of the SIMO type, based on performance and cost metrics, is also proposed.
To validate the solution, energy data disaggregation tasks are run in a Kubernetes cluster orchestrated by Asperathos.
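
As a rough illustration of the kind of Proportional-Integral control loop the abstract refers to, the sketch below shows a generic discrete-time PI controller with a clamped output. It is not the controller proposed in this work: the class name, the gains, the choice of a normalized progress metric as the measured signal, and the use of a replica count as the control action are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Generic discrete-time PI controller (hypothetical, for illustration only)."""

    kp: float          # proportional gain (placeholder value, not a tuned gain)
    ki: float          # integral gain (placeholder value, not a tuned gain)
    setpoint: float    # desired value of the tracked metric
    out_min: float     # lower bound of the control action
    out_max: float     # upper bound of the control action
    integral: float = 0.0

    def update(self, measurement: float, dt: float) -> float:
        """Return the control action for one sampling interval of length dt."""
        error = self.setpoint - measurement
        candidate = self.integral + error * dt
        output = self.kp * error + self.ki * candidate

        if self.out_min <= output <= self.out_max:
            # Only accumulate the integral term while the actuator is not saturated.
            self.integral = candidate
            return output

        # Saturated: clamp the action and keep the previous integral (anti-windup).
        return min(max(output, self.out_min), self.out_max)


if __name__ == "__main__":
    # Assumed scenario: the metric is a normalized job progress ratio (1.0 = on
    # schedule) and the action is interpreted as a number of worker replicas.
    controller = PIController(kp=4.0, ki=1.0, setpoint=1.0, out_min=1.0, out_max=10.0)
    for progress in (0.6, 0.8, 0.95, 1.0):
        replicas = controller.update(progress, dt=1.0)
        print(f"progress={progress:.2f} -> replicas={round(replicas)}")
```

Holding the integral term while the output is saturated is one common way to limit windup when the actuator has hard bounds, as a replica count would; other anti-windup schemes would work equally well in this sketch.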