SOUZA, J. J. L.; http://lattes.cnpq.br/7746442450855672; SOUZA, José Joedson Lima de.
Abstract:
When data are not normally distributed, some statistical analyses, such as regression analysis,
can only be performed reliably if the data are first transformed so that, in particular, the
assumption of normality is met. To assist us
in transforming real data extracted from the portal of the Department of Informatics of the
Unified Health System (DATASUS) (MINISTÉRIO DA SAÚDE, 2022) and in subsequently fitting a linear
model in R (R CORE TEAM, 2023), we chose to use the Box-Cox transformation
technique. Beyond visual inspection of plots, formal tests are essential for assessing whether
the residuals of the fitted model satisfy this assumption. Therefore, in this
research we used one of the most widely used normality tests, the Shapiro-Wilk
test. The objective of this study was to validate the assumptions for fitting a linear model to
the variables age and treatment time of patients diagnosed with malignant neoplasms. In this
scenario, the Shapiro-Wilk test applied to the residuals of the fitted model
(p-value = 1.107e-08) led to the rejection of the null hypothesis (H0); that is, the residuals
did not follow a normal distribution. The Box-Cox transformation was therefore applied to
these residuals. When the test was applied again to
the transformed data, H0 was still rejected, since the p-value was equal
to 0.001, below the 5% significance level suggested by Fisher. It can thus be concluded that
this transformation is not ideal for the data in question. For future work in which the residuals
of fitted linear regression models do not follow a normal distribution, we
suggest applying Generalized Linear Models (GLMs), whose basic idea is to widen the
range of options for the response variable by allowing its distribution to belong to the
one-parameter exponential family of distributions.
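
The workflow summarized above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the analysis actually performed: the data are simulated (the DATASUS records are not reproduced here), the names `dat`, `age`, `treatment_time` and `lambda_hat` are hypothetical, the Box-Cox parameter is estimated with `MASS::boxcox` and applied to the response variable (the usual way the transformation is used with a linear model), and the Gamma family in the final step is only one possible choice of GLM.

```r
library(MASS)  # provides boxcox(); shipped with standard R installations

set.seed(42)

# Simulated stand-in for the real data (hypothetical values, not DATASUS records)
n <- 200
age <- round(runif(n, min = 20, max = 90))
treatment_time <- exp(1 + 0.02 * age + rnorm(n, sd = 0.6))  # strictly positive, right-skewed
dat <- data.frame(age = age, treatment_time = treatment_time)

# 1. Fit the linear model and test the normality of its residuals
fit <- lm(treatment_time ~ age, data = dat, y = TRUE, qr = TRUE)
sw_raw <- shapiro.test(residuals(fit))
print(sw_raw)

# 2. Estimate the Box-Cox parameter by profile likelihood over a grid of lambda values
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# 3. Transform the response (log transform in the limiting case lambda = 0)
if (abs(lambda_hat) < 1e-8) {
  dat$y_bc <- log(dat$treatment_time)
} else {
  dat$y_bc <- (dat$treatment_time^lambda_hat - 1) / lambda_hat
}

# 4. Refit on the transformed response and test the new residuals
fit_bc <- lm(y_bc ~ age, data = dat)
sw_bc <- shapiro.test(residuals(fit_bc))
print(sw_bc)

# 5. Alternative suggested for future work: a GLM (Gamma with log link is an assumed choice)
fit_glm <- glm(treatment_time ~ age, family = Gamma(link = "log"), data = dat)
summary(fit_glm)
```

Comparing `sw_raw$p.value` and `sw_bc$p.value` with the chosen significance level shows whether the transformation was enough to keep H0 from being rejected; if it was not, as reported above for the real data, the GLM in step 5 is the kind of alternative suggested.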