http://lattes.cnpq.br/9747902267969441; AMORIM, Brunna de Sousa Pereira.
Abstract:
To reduce the number of road accidents, researchers around the world are investigating ways to identify the factors that influence accidents and the areas where accident risk is high. However, road accident studies depend on the location studied. This study therefore uses supervised machine learning techniques and automated machine learning to classify accident risk sections of Brazilian federal roads as severe or not-severe, using several features. The accident data was analyzed and pre-processed, and its features were selected using different techniques, resulting in a data set containing the day of the week and time of the accident, the road type, the road route, the road orientation, the weather condition at the time of the accident, and the accident type. Machine learning models were trained and evaluated in four scenarios: scenario A used an imbalanced database with the "accident frequency" feature; scenario B used an imbalanced database without it; scenario C used a balanced database with the "accident frequency" feature; and scenario D used a balanced database without it. The models were validated using the accuracy, precision, recall and F-measure metrics. The results of scenarios A and B were disregarded because all models predicted only one class: not-severe. The best result in scenario C was an MLP neural network model with 85% accuracy, 87% precision, 85% recall and 84% F-measure. The best results in scenario D came from two combinations of classifiers: first, Random Forest combined with BernoulliNB; second, Logistic Regression combined with ExtraTreesClassifier. Both achieved 84.58% accuracy, 88.14% precision, 84.58% recall and 84.06% F-measure.
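The following is a minimal sketch, not the study's code, of how the four validation metrics can be computed. It also illustrates why results on an imbalanced database can mislead when a model predicts only one class. It assumes support-weighted averaging across the two classes. The abstract does not state which averaging was used, although weighted recall always equals accuracy, which is consistent with the reported figures. The function name `weighted_metrics` and the ten-sample data set are hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall and F-measure.

    Assumption: per-class scores are averaged with weights proportional
    to each class's share of the true labels.
    """
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f_measure = 0.0
    for label in sorted(support):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        # Treat precision as 0 when the class is never predicted.
        p_l = tp / predicted if predicted else 0.0
        r_l = tp / support[label]
        f_l = 2 * p_l * r_l / (p_l + r_l) if (p_l + r_l) else 0.0
        weight = support[label] / n
        precision += weight * p_l
        recall += weight * r_l
        f_measure += weight * f_l
    return accuracy, precision, recall, f_measure


# Hypothetical imbalanced sample: 9 not-severe sections and 1 severe section.
y_true = ["not-severe"] * 9 + ["severe"]
# A degenerate model that always predicts the majority class.
y_pred = ["not-severe"] * 10

acc, prec, rec, f1 = weighted_metrics(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} F-measure={f1:.2f}")
```

In this hypothetical sample the always-not-severe model scores high accuracy while never identifying a severe section, which is the kind of behavior that makes single-class predictions on an imbalanced database uninformative.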