Can artificial intelligence-based large language models pass the National Dentistry Examination in Peru?

DOI:

https://doi.org/10.20453/reh.v35i4.6253

Keywords:

artificial intelligence, dental education, educational assessment, large language models

Abstract

Objective: To determine which artificial intelligence (AI) large language model answers the 2023 National Dentistry Examination (ENAO, by its acronym in Spanish) in Peru most accurately, as measured against the official answer key. Materials and methods: The 100 multiple-choice questions from the 2023 ENAO were submitted to ChatGPT-3.5, ChatGPT-4, Gemini, and Copilot. Responses were categorized by subject area and scored as correct or incorrect. Data were analyzed using the chi-square test (α = 0.05). Results: ChatGPT-4 achieved the highest overall accuracy (90.00%), followed by Gemini (82.00%), Copilot (79.00%), and ChatGPT-3.5 (76.00%). Across most models, accuracy was highest in Public Health, Research, Health Services Management, and Ethics, and lowest in Anatomy and in Oral Medicine and Pathology. In pairwise comparisons, ChatGPT-4 performed significantly better than ChatGPT-3.5 (difference: 14 percentage points; p = 0.0084) and Copilot (difference: 11 percentage points; p = 0.0316); none of the remaining pairwise comparisons was significant (p > 0.05). Conclusion: All four AI language models answered at least 76% of the 2023 ENAO questions correctly, with ChatGPT-4 achieving the highest accuracy.
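The pairwise comparisons reported in the abstract can be illustrated with a short Python sketch. It is not the authors' analysis script: it assumes a Pearson chi-square test on a 2×2 table of correct and incorrect counts with no continuity correction (the abstract does not say whether one was applied), and it derives the counts from the reported accuracies on 100 questions. The names `CORRECT` and `chi_square_2x2` are hypothetical.

```python
import math
from itertools import combinations

# Correct answers out of 100 questions, derived from the reported accuracies.
CORRECT = {"ChatGPT-4": 90, "Gemini": 82, "Copilot": 79, "ChatGPT-3.5": 76}
N_QUESTIONS = 100


def chi_square_2x2(correct_a, correct_b, n=N_QUESTIONS):
    """Pearson chi-square for two models' correct/incorrect counts.

    Assumption: no continuity correction. Returns (statistic, p_value)
    for one degree of freedom.
    """
    observed = [
        [correct_a, n - correct_a],
        [correct_b, n - correct_b],
    ]
    col_totals = [correct_a + correct_b, 2 * n - correct_a - correct_b]
    grand_total = 2 * n

    stat = 0.0
    for row in observed:
        for obs, col_total in zip(row, col_totals):
            # Each row (model) has n questions, so the row total is n.
            expected = n * col_total / grand_total
            stat += (obs - expected) ** 2 / expected

    # Upper tail of a chi-square distribution with df = 1.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    alpha = 0.05
    for model_a, model_b in combinations(CORRECT, 2):
        stat, p = chi_square_2x2(CORRECT[model_a], CORRECT[model_b])
        verdict = "significant" if p < alpha else "not significant"
        print(f"{model_a} vs {model_b}: chi2 = {stat:.3f}, p = {p:.4f} ({verdict})")
```

Under these assumptions the two comparisons flagged as significant in the abstract should come out close to the reported p-values, and the remaining pairs should stay above α = 0.05. A continuity-corrected test would give somewhat larger p-values.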

Author Biographies

Miguel Á. Saravia-Rojas, Universidad Peruana Cayetano Heredia, School of Stomatology. Lima, Peru.

Master in Stomatology with a Specialization in Orofacial Harmonization from the Faculdade de Tecnologia IPPEO, Brazil, and Doctor in Stomatology from the Universidad Peruana Cayetano Heredia (UPCH). Lecturer in the Professional Program of Stomatology and currently Head of the Academic Department of Stomatological Clinic (DACE).

Carlos Mendiola-Aquino, Universidad Peruana Cayetano Heredia, School of Stomatology. Lima, Peru.

Master in Stomatology with a Specialization in Endodontics from the Universidad Peruana Cayetano Heredia (UPCH). Currently serving as Vice Dean of the School of Stomatology. Lecturer in the Professional Stomatology Program and the Second Professional Specialization Program in Endodontics.

Francisco Orejuela-Ramirez, Universidad Peruana Cayetano Heredia, School of Stomatology. Lima, Peru.

Dental surgeon. Master in Public Health with a specialization in Epidemiology. Lecturer in the Academic Department of Social Dentistry at the School of Stomatology "Roberto Beltrán." Course coordinator for Social Dentistry and professor of postgraduate courses.

Wanderley Tunquipa-Chacón, Universidad Peruana Cayetano Heredia, School of Stomatology. Lima, Peru.

Graduate of the School of Stomatology at Universidad Peruana Cayetano Heredia (UPCH). Practice Supervisor in the Academic Department of Stomatological Clinic – UPCH.

Rocio Geng-Vivanco, Universidad Peruana Cayetano Heredia, School of Stomatology. Lima, Peru.

Dentist, graduated from the School of Stomatology at Universidad Peruana Cayetano Heredia (Lima, Peru) in 2014. Internship at the University of Maryland, School of Dentistry (USA, 2014). Master's degree in Sciences (Oral Rehabilitation) from the School of Dentistry of Ribeirão Preto, University of São Paulo (FORP-USP). Currently pursuing a PhD in the Department of Dental Materials and Prosthodontics at FORP-USP. Research Internship Abroad at the University of Illinois Chicago, College of Dentistry (USA, 2023-2024).

Published

2025-12-30

How to Cite

1. Saravia-Rojas MÁ, Mendiola-Aquino C, Orejuela-Ramirez F, Tunquipa-Chacón W, Geng-Vivanco R. Can artificial intelligence-based large language models pass the National Dentistry Examination in Peru? Rev Estomatol Herediana [Internet]. 2025 Dec. 30 [cited 2025 Dec. 31];35(4):305-11. Available from: http://44.198.254.164/index.php/REH/article/view/6253

Section

ORIGINAL ARTICLES
