Evaluating vision transformers and convolutional neural networks in the context of dental image processing: a systematic review


Creative Commons License

Felek T., TERCANLI H., Gök R. Ş.

BMC Oral Health, vol.25, no.1, 2025 (SCI-Expanded)

  • Publication Type: Article / Review
  • Volume: 25, Issue: 1
  • Publication Date: 2025
  • DOI: 10.1186/s12903-025-07036-5
  • Journal Name: BMC Oral Health
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, CINAHL, EMBASE, MEDLINE, Directory of Open Access Journals
  • Keywords: Convolutional neural networks, Dental imaging, Image classification, Image segmentation, Vision transformers
  • Affiliated with Akdeniz University: Yes

Abstract

Background: This systematic review compares the efficacy of convolutional neural networks (CNNs) and Vision Transformers (ViTs) in dental imaging, examining in depth the potential, advantages, and limitations of both model families in this domain. Methods: The search string used in the study was ((“Vision Transformer” OR ViT OR “Transformer architecture”) AND (“Convolutional Neural Network” OR CNN OR ConvNet) AND (Dental OR Dentistry OR “Maxillofacial” OR “Oral Radiology”) AND (Image OR Imaging OR Radiograph)). The search was conducted in January 2025. Two investigators independently evaluated the full texts of all eligible articles and excluded those that did not meet the eligibility criteria. Results: Of 2596 articles, 21 met the inclusion criteria. By task category, 14 (66.7%) of the 21 reviewed studies addressed classification and 7 (33.3%) addressed segmentation. Panoramic radiography was the most commonly used imaging modality (52.3%), and ViT-based models were observed to have the highest performance (58%). Conclusion: ViT-based deep learning models tend to outperform traditional convolutional neural networks in many dental image analysis scenarios. In practice, however, CNN and ViT approaches can be used in a complementary manner.
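
The Boolean search string and the task-category percentages reported in the abstract can be reproduced with a short Python sketch. This is only an illustration: the names `TERM_GROUPS`, `build_query`, and `share` are hypothetical helpers, not part of the review's methodology, and the typographic quotation marks of the abstract are replaced with straight quotes on the assumption that this is what a database search field would accept.

```python
# Term groups copied from the search string reported in the abstract.
# Straight quotes are an assumption; the abstract prints curly quotes.
TERM_GROUPS = [
    ['"Vision Transformer"', "ViT", '"Transformer architecture"'],
    ['"Convolutional Neural Network"', "CNN", "ConvNet"],
    ["Dental", "Dentistry", '"Maxillofacial"', '"Oral Radiology"'],
    ["Image", "Imaging", "Radiograph"],
]


def build_query(groups):
    """Join terms with OR inside each group, then join the groups with AND."""
    clauses = ["(" + " OR ".join(terms) + ")" for terms in groups]
    return "(" + " AND ".join(clauses) + ")"


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


query = build_query(TERM_GROUPS)
print(query)

# Task categories among the 21 included studies, as stated in the abstract.
print(share(14, 21))  # classification studies
print(share(7, 21))   # segmentation studies
```

Keeping each concept (architecture, comparator, domain, modality) as its own group makes it easy to see that a record must match at least one term from every group to be retrieved.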