DOI

10.1016/j.jds.2025.07.010

First Page

2176

Last Page

2180

Abstract

Background/purpose: ChatGPT has been utilized in medical and dental education, but its performance is potentially influenced by factors such as language, question type, and content complexity. This study aimed to assess how English translation and question type affect ChatGPT-4o's accuracy in answering English-translated oral pathology (OP) multiple-choice questions (MCQs).

Materials and methods: A total of 280 OP MCQs were collected from Taiwan National Dental Licensing Examinations and translated into English as a testing set for ChatGPT-4o. The mean overall accuracy rates (ARs) for English-translated and non-translated MCQs were compared by the dependent t-test. The difference in ARs between English-translated and non-translated OP MCQs within each of three question types (image-based, case-based, and odd-one-out questions) was assessed by the chi-square test. Binary logistic regression was used to determine which type of question was more likely to be answered incorrectly.

Results: ChatGPT-4o showed a significantly higher mean overall AR for English-translated MCQs (93.2 ± 5.7%) than for non-translated MCQs (88.6 ± 6.5%, P < 0.001). There were no significant differences in ARs between English-translated and non-translated MCQs within each question type. Binary logistic regression revealed that, within the English-translated condition, image-based questions were significantly more likely to be answered incorrectly (odds ratio = 9.085, P = 0.001).

Conclusion: Translation of exam questions into English significantly improved ChatGPT-4o's overall performance. Error-pattern analysis confirmed that image-based questions were more likely to result in incorrect answers, reflecting the model's current limitations in visual reasoning. Nevertheless, ChatGPT-4o still demonstrated strong potential as an educational support tool.
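The abstract names three analyses: a dependent (paired) t-test on overall accuracy rates, a chi-square test within each question type, and a binary logistic regression for error patterns. The sketch below is purely illustrative of that workflow, not the authors' code; the accuracy values, contingency counts, and per-question indicators are hypothetical placeholders, and the SciPy/statsmodels calls are one plausible way to run these tests.

```python
# Illustrative sketch of the analyses named in the abstract (hypothetical data).
import numpy as np
from scipy import stats
import statsmodels.api as sm

# Hypothetical paired overall accuracy rates (%) per run: translated vs. original
ar_translated = np.array([93.0, 94.5, 92.1, 93.8, 92.6])
ar_original   = np.array([88.0, 89.2, 87.5, 90.1, 88.3])

# Dependent (paired) t-test on overall accuracy rates
t_stat, p_paired = stats.ttest_rel(ar_translated, ar_original)

# Chi-square test within one question type
# (rows: translated vs. original; columns: correct vs. incorrect counts)
contingency = np.array([[65, 5],
                        [61, 9]])
chi2, p_chi2, dof, _ = stats.chi2_contingency(contingency)

# Binary logistic regression: odds of an incorrect answer by question type
# (1 = image-based, 0 = other); toy data chosen to avoid separation
incorrect   = np.array([1, 0, 0, 1, 0, 0, 1, 0, 1, 0])
image_based = np.array([1, 0, 0, 1, 0, 0, 1, 1, 0, 0])
X = sm.add_constant(image_based)
logit_fit = sm.Logit(incorrect, X).fit(disp=False)
odds_ratio = np.exp(logit_fit.params[1])  # OR for image-based questions

print(f"paired t-test p = {p_paired:.3f}")
print(f"chi-square p    = {p_chi2:.3f}")
print(f"odds ratio (image-based) = {odds_ratio:.2f}")
```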
