DOI
10.1016/j.jds.2025.03.018
First Page
2427
Last Page
2435
Abstract
Background/purpose: Large language models (LLMs) have been studied in text-based healthcare tasks, but their performance in multimodal dental applications has not yet been fully explored. This study evaluated the performance of four multimodal LLMs on dental licensing examination questions with both text-only and visually-based components.
Materials and methods: Four multimodal LLMs, ChatGPT-4o (4o), OpenAI o1 (o1), Claude 3.5 Sonnet (Sonnet), and Gemini 2.0 Flash Thinking Experimental (Gemini), were tested on 353 questions from the 2024 Japanese National Dental Examination, including 204 text-only and 149 visually-based questions spanning 17 dental specialties. A zero-shot approach was used without prompt engineering. Performance was analyzed using Cochran's Q test and McNemar's test with Bonferroni correction.
Results: o1 achieved the highest overall correct response rate (81.9 %), followed by Sonnet (71.7 %), Gemini (66.6 %), and 4o (65.7 %). All models performed significantly better on text-only questions (79.9–92.2 %) than on visually-based questions (45.6–67.8 %). Performance varied by specialty, with the highest scores in basic medical sciences (Dental pharmacology: 100 %; Oral physiology: 86.7–100 %) and lower scores in clinical specialties requiring visual interpretation (Orthodontics: 36.4–66.7 %).
Conclusion: Multimodal LLMs demonstrate promising performance on dental examination questions, particularly in text-based scenarios, but significant challenges remain in complex visual interpretation. The remarkable zero-shot performance of newer models such as o1 suggests potential applications in dental education and certain aspects of clinical decision support, although further advances are needed before reliable application in visually complex diagnostic workflows.
Recommended Citation
Mine, Yuichi; Okazaki, Shota; Taji, Tsuyoshi; Kawaguchi, Hiroyuki; Kakimoto, Naoya; and Murayama, Takeshi (2025) "Benchmarking multimodal large language models on the dental licensing examination: Challenges with clinical image interpretation," Journal of Dental Sciences: Vol. 20: Iss. 4, Article 64.
DOI: 10.1016/j.jds.2025.03.018
Available at: https://jds.ads.org.tw/journal/vol20/iss4/64