Dens invaginatus as a diagnostic challenge: evaluating large language models against expert endodontic reasoning


Creative Commons License

Erkal D., Felek T., Butean O., Er K.

BMC ORAL HEALTH, vol.25, pp.1-6, 2025 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 25
  • Publication Date: 2025
  • DOI: 10.1186/s12903-025-06987-z
  • Journal Name: BMC ORAL HEALTH
  • Journal Indexes: Scopus, Science Citation Index Expanded (SCI-EXPANDED), CINAHL, EMBASE, MEDLINE, Directory of Open Access Journals
  • Page Numbers: pp.1-6
  • Open Archive Collection: AVESIS Open Access Collection
  • Akdeniz University Affiliated: Yes

Abstract

Introduction

This study hypothesized that large language models (LLMs) would underperform expert clinicians in diagnosing and managing complex endodontic anomalies, such as dens invaginatus, when given periapical radiographs. Although LLMs have shown promise in dental education and basic diagnostics, their effectiveness in nuanced clinical reasoning remains unclear.

Methods

Nineteen anonymized periapical radiographs depicting challenging endodontic conditions were each paired with a clinical vignette. Six advanced LLMs and one expert endodontist independently answered the same six structured clinical questions for each case, and every response was scored against a reference key. Accuracy rates were compared using Kruskal-Wallis and Mann-Whitney U tests, and chi-square tests were used to evaluate model performance across question types.
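The statistical workflow described in the Methods can be illustrated with a short standard-library Python sketch. It covers three steps: scoring responses against a reference key, an omnibus Kruskal-Wallis comparison of accuracy, and a Pearson chi-square test of independence across question types. This is not the authors' analysis code. The function names, the per-case scoring unit and all example numbers are hypothetical. The Kruskal-Wallis p-value uses the usual chi-square approximation, the chi-square test omits Yates' continuity correction, and the Mann-Whitney U comparisons (presumably pairwise follow-ups) are not shown.

```python
import math
from collections import Counter


def score(responses, key):
    """Score each response 1 if it matches the reference key, else 0."""
    return [int(r == k) for r, k in zip(responses, key)]


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    # Closed forms for df = 1 and df = 2, then the recurrence
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    k = 2 if df % 2 == 0 else 1
    q = math.exp(-x / 2) if k == 2 else math.erfc(math.sqrt(x / 2))
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H test; returns (H, df, approximate p)."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = rank_with_ties(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are tied; H is undefined")
    h /= correction
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)


def chi_square_independence(table):
    """Pearson chi-square test of independence (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    # Invented per-case correct-answer counts (out of six questions) for three
    # hypothetical responders; these are NOT data from the study.
    expert = [6, 6, 6, 6, 6]
    model_a = [5, 4, 6, 3, 5]
    model_b = [2, 3, 1, 4, 2]
    print(kruskal_wallis([expert, model_a, model_b]))
    # Invented correct/incorrect counts (rows) for two question types.
    print(chi_square_independence([[15, 4], [8, 11]]))
```

The tests are implemented by hand only to keep the sketch dependency-free. A real analysis would more likely rely on an established statistics package, where defaults such as the continuity correction for 2×2 tables can differ from this sketch.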

Results

The expert achieved 100% accuracy, whereas all LLMs scored significantly lower (P < 0.05). Copilot had the lowest scores on every question. The largest performance drop occurred in anomaly classification tasks, particularly in identifying and categorizing dens invaginatus. No significant performance differences were found among the top-performing LLMs.

Conclusions

While LLMs showed competence in basic diagnostic tasks, they failed to replicate expert-level decision-making in complex endodontic scenarios. Their current capabilities remain insufficient for unsupervised clinical use. This study is among the first to assess LLMs using real radiographic data in endodontics and highlights the need for further multimodal model development.