DOI

10.1016/j.jds.2025.07.010

First Page

2176

Last Page

2180

Abstract

Background/purpose: ChatGPT has been utilized in medical and dental education, but its performance is potentially influenced by factors such as language, question type, and content complexity. This study aimed to assess how English translation and question type affect ChatGPT-4o's accuracy in answering English-translated oral pathology (OP) multiple-choice questions (MCQs).

Materials and methods: A total of 280 OP MCQs were collected from Taiwan National Dental Licensing Examinations and translated into English as a testing set for ChatGPT-4o. The mean overall accuracy rates (ARs) for English-translated and non-translated MCQs were compared by the dependent t-test. The difference in ARs between English-translated and non-translated OP MCQs within each of three question types (image-based, case-based, and odd-one-out questions) was assessed by the chi-square test. Binary logistic regression was used to determine which type of question was more likely to be answered incorrectly.

Results: ChatGPT-4o showed a significantly higher mean overall AR for English-translated MCQs (93.2 ± 5.7%) than for non-translated MCQs (88.6 ± 6.5%, P < 0.001). There were no significant differences in ARs between English-translated and non-translated MCQs within each question type. Binary logistic regression revealed that, within the English-translated condition, image-based questions were significantly more likely to be answered incorrectly (odds ratio = 9.085, P = 0.001).

Conclusion: Translation of exam questions into English significantly improved ChatGPT-4o's overall performance. Error-pattern analysis confirmed that image-based questions were more likely to result in incorrect answers, reflecting the model's current limitations in visual reasoning. Nevertheless, ChatGPT-4o still demonstrated strong potential as an educational support tool.
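The abstract names three analyses: a dependent (paired) t-test on overall accuracy rates, a chi-square test within each question type, and a binary logistic regression for error patterns. The sketch below is purely illustrative of that workflow, not the authors' code; the accuracy values, contingency counts, and per-question indicators are hypothetical placeholders, and the SciPy/statsmodels calls are one plausible way to run these tests.

```python
# Illustrative sketch of the analyses named in the abstract (hypothetical data).
import numpy as np
from scipy import stats
import statsmodels.api as sm

# Hypothetical paired overall accuracy rates (%) per run: translated vs. original
ar_translated = np.array([93.0, 94.5, 92.1, 93.8, 92.6])
ar_original   = np.array([88.0, 89.2, 87.5, 90.1, 88.3])

# Dependent (paired) t-test on overall accuracy rates
t_stat, p_paired = stats.ttest_rel(ar_translated, ar_original)

# Chi-square test within one question type
# (rows: translated vs. original; columns: correct vs. incorrect counts)
contingency = np.array([[65, 5],
                        [61, 9]])
chi2, p_chi2, dof, _ = stats.chi2_contingency(contingency)

# Binary logistic regression: odds of an incorrect answer by question type
# (1 = image-based, 0 = other); toy data chosen to avoid separation
incorrect   = np.array([1, 0, 0, 1, 0, 0, 1, 0, 1, 0])
image_based = np.array([1, 0, 0, 1, 0, 0, 1, 1, 0, 0])
X = sm.add_constant(image_based)
logit_fit = sm.Logit(incorrect, X).fit(disp=False)
odds_ratio = np.exp(logit_fit.params[1])  # OR for image-based questions

print(f"paired t-test p = {p_paired:.3f}")
print(f"chi-square p    = {p_chi2:.3f}")
print(f"odds ratio (image-based) = {odds_ratio:.2f}")
```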
