
DOI

10.1016/j.jds.2025.02.022

First Page

2460

Last Page

2466

Abstract

Background/purpose: Large language models (LLMs) offer promising applications in dentistry, but their performance in specialized, image-rich contexts such as dental technology examinations remains uncertain. The purpose of this study was to evaluate the accuracy of three multimodal LLMs, ChatGPT-4o (4o), OpenAI o1 (o1), and Claude 3.5 Sonnet (Sonnet), when presented with questions from the Japanese National Examination for Dental Technicians.

Materials and methods: A total of 240 multiple-choice questions from 2022 to 2024 theory sections of the exam were used. Each question, including its accompanying figures or images where applicable, was presented to the three LLMs in a zero-shot manner without specialized prompting. Correct response rates were calculated overall, as well as by question type (text-only vs. visually-based) and subject area. Statistical comparisons were performed using Cochran's Q test, followed by McNemar's test with Bonferroni correction where indicated.

Results: Overall correct response rates were 58.3 % (4o), 67.5 % (o1), and 64.6 % (Claude 3.5 Sonnet). For text-only questions, o1 achieved the highest accuracy (79.1 %), significantly outperforming 4o (68.3 %; P = 0.017). In contrast, all models showed reduced accuracy on visually-based questions (44.6–55.4 %), with no significant difference among them.

Conclusion: These results suggest that multimodal LLMs can supplement theoretical dental technology education, although their limited performance on visual tasks indicates the need for traditional hands-on training. Enhanced image interpretation skills may help address workforce challenges in dental technology.
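
The statistical procedure named in the methods (Cochran's Q test across the three models, then pairwise McNemar's tests with Bonferroni correction) can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function names, the choice of the exact (binomial) form of McNemar's test, and the eight rows of demo scores are all assumptions made for illustration; the abstract does not state which variant of McNemar's test was used, and the per-question study data are not given here.

```python
import math
from itertools import combinations


def cochran_q(rows):
    """Cochran's Q for three paired binary outcomes.

    rows: one tuple per question, one 0/1 entry per model (1 = correct).
    Returns (Q, p). The p-value uses the chi-square survival function for
    2 degrees of freedom, exp(-Q/2), so exactly three models are required.
    """
    k = len(rows[0])
    if k != 3:
        raise ValueError("this sketch only handles three models (df = 2)")
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    n = sum(row_totals)
    denom = k * n - sum(t * t for t in row_totals)
    if denom == 0:  # every question was answered identically by all models
        return 0.0, 1.0
    q = (k - 1) * (k * sum(c * c for c in col_totals) - n * n) / denom
    return q, math.exp(-q / 2)


def mcnemar_exact(a, b):
    """Two-sided exact McNemar p-value from two paired 0/1 sequences."""
    only_a = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    only_b = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n = only_a + only_b  # discordant pairs carry all the information
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def pairwise_bonferroni(rows, names):
    """Bonferroni-adjusted exact McNemar p-values for every pair of models."""
    pairs = list(combinations(range(len(names)), 2))
    adjusted = {}
    for i, j in pairs:
        p = mcnemar_exact([r[i] for r in rows], [r[j] for r in rows])
        adjusted[(names[i], names[j])] = min(1.0, p * len(pairs))
    return adjusted


# Made-up scores for eight questions (columns: 4o, o1, Sonnet); NOT study data.
demo_rows = [
    (1, 1, 1), (0, 1, 1), (0, 1, 0), (0, 1, 1),
    (1, 1, 0), (0, 0, 0), (0, 1, 1), (0, 1, 1),
]
q_stat, q_p = cochran_q(demo_rows)
adjusted = pairwise_bonferroni(demo_rows, ["4o", "o1", "Sonnet"])
```

In this layout the pairwise tests would only be consulted when the omnibus Cochran's Q p-value falls below the chosen significance level, which matches the "where indicated" wording in the methods.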
