COSTA, D. A.; COSTA, Dante de Araújo.
Abstract:
Predictive models from machine learning and knowledge discovery in databases are valuable for gaining insight into player performance in sports such as basketball. This study compares supervised machine learning approaches (white-box, black-box, and ensemble models) on statistical data from college basketball (NCAA) players. We aim to identify NCAA players with high potential for NBA success, to determine which player characteristics most influence selection decisions, and to examine how the models reach their conclusions, so that both their predictive performance and their explainability can be compared. The task is challenging because selection also depends on factors beyond statistics, such as player context and team roster considerations. The main objective is to give decision-makers insights that support player selection, improve player assessment, and help develop young talent by emphasizing key aspects of the game. To balance interpretability and predictive accuracy, we seek interpretable models that still reach satisfactory accuracy, and we employ white-box, black-box, and ensemble classification methods: Decision Trees, Logistic Regression, Support Vector Machine, Multi-Layer Perceptron, Random Forest, and XGBoost. Additionally, genetic algorithms were used to reduce each model's feature set, retaining only the most impactful features. All models performed better with this feature selection than with the standard procedure without it. We found minimal differences in predictive accuracy between the best white-box and black-box models. The combination of genetic algorithms and logistic regression outperformed the other models in predictive accuracy while significantly reducing the number of features and making the results easier to interpret. The analysis also highlights the most influential features in each model and how the models arrived at their predictions.
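The feature-selection idea can be illustrated with a minimal sketch that pairs a genetic algorithm with logistic regression. It is not the study's implementation: the synthetic data, the fitness function (validation accuracy minus a small per-feature penalty), the operators (tournament selection, uniform crossover, bit-flip mutation, elitism), and all names and hyperparameters are assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random

def make_data(rng, n=200, d=8):
    """Synthetic stand-in for player statistics (assumption): only features 0 and 1 drive the label."""
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(d)]
        score = 2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.5)
        X.append(row)
        y.append(1 if score > 0 else 0)
    return X, y

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp in a safe range
    return 1.0 / (1.0 + math.exp(-z))

def fit_logreg(X, y, cols, epochs=20, lr=0.1):
    """Logistic regression on the selected columns, trained by plain stochastic gradient descent."""
    w, b = [0.0] * len(cols), 0.0
    for _ in range(epochs):
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wj * row[c] for wj, c in zip(w, cols))) - target
            b -= lr * err
            w = [wj - lr * err * row[c] for wj, c in zip(w, cols)]
    return w, b

def accuracy(X, y, cols, w, b):
    hits = 0
    for row, target in zip(X, y):
        p = sigmoid(b + sum(wj * row[c] for wj, c in zip(w, cols)))
        hits += int((p >= 0.5) == (target == 1))
    return hits / len(y)

def fitness(mask, train, valid, penalty=0.01):
    """Validation accuracy minus a small cost per kept feature (hypothetical choice)."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    w, b = fit_logreg(train[0], train[1], cols)
    return accuracy(valid[0], valid[1], cols, w, b) - penalty * len(cols)

def run_ga(train, valid, d, rng, pop_size=12, generations=8, p_mut=0.1):
    """Evolve 0/1 feature masks; the best mask of each generation is copied unchanged (elitism)."""
    pop = [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(((fitness(m, train, valid), m) for m in pop), reverse=True)
        history.append(scored[0][0])
        nxt = [scored[0][1][:]]
        while len(nxt) < pop_size:
            # tournament selection of two parents, each the best of 3 random candidates
            pa = max(rng.sample(scored, 3))[1]
            pb = max(rng.sample(scored, 3))[1]
            child = [x if rng.random() < 0.5 else y for x, y in zip(pa, pb)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]     # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best_fit, best_mask = max((fitness(m, train, valid), m) for m in pop)
    history.append(best_fit)
    return best_mask, best_fit, history

rng = random.Random(0)
X, y = make_data(rng)
train, valid = (X[:150], y[:150]), (X[150:], y[150:])
best_mask, best_fit, history = run_ga(train, valid, 8, rng)
print("selected features:", [i for i, bit in enumerate(best_mask) if bit])
print("fitness:", round(best_fit, 3))
```

Because the penalty term rewards smaller masks, the search tends toward compact feature sets, and the coefficients of the final logistic regression can then be read directly as feature influences. In practice the fitness would more likely be a cross-validated score on the real NCAA data than a single hold-out split.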