RAPOSO, L. B.; http://lattes.cnpq.br/7921625824750417; RAPOSO, Lucas Brasileiro.
Abstract:
Large Language Models (LLMs) emerged as a paradigm shift in the use of Artificial Intelligence (AI) and are widely used across different areas. One of the most popular models of this kind is ChatGPT, developed by OpenAI. Since its rise, other companies, such as Meta and Google, have developed their own models as alternatives to GPT. These models are presented as problem-solving tools in a wide variety of contexts. However, little attention has been paid to measuring the correctness and efficiency of their responses. In addition, most studies in this area are limited to the English-language context, without effectively testing the models in globalized scenarios. Therefore, this study proposes to subject the Meta, OpenAI, and Google systems to objective multiple-choice assessments on high-school-level content, using the tests of the Brazilian National High School Exam (ENEM). After collecting the responses from the models, analyses were performed comparing the performance of each model with the averages of Brazilian students, considering the number of correct answers per test. Surprisingly, this work showed that all three models performed better in more “subjective” areas than in objective areas, going against common sense.