Evaluation of free-access artificial intelligence chatbots in preoperative patient education about general anesthesia: A comparative study of ChatGPT, Gemini, and Copilot

Oluş, Fatih; Babun, Hüseyin

doi:10.1177/20552076261450741

Digital Health, no. 12, pp. 1-9, 2026 (SCI-Expanded, SSCI, Scopus)

Publication Type: Article / Full Article
Publication Date: 2026
DOI: 10.1177/20552076261450741
Journal Name: Digital Health
Journal Indexes: Scopus, Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Directory of Open Access Journals
Pages: pp. 1-9
Akdeniz University Affiliated: Yes

Abstract

Objective

Artificial intelligence (AI) chatbots are increasingly used by patients seeking medical information. However, the accuracy and educational quality of such tools in the context of anesthesia remain unclear. This study aimed to evaluate and compare the appropriateness of responses generated by three widely accessible AI platforms—ChatGPT, Gemini, and Copilot—regarding frequently asked questions about general anesthesia.

Methods

Fifty anesthesia-related questions were developed by two anesthesiologists and categorized into four domains: (1) General Information and Process; (2) Safety and Risks; (3) Pain, Comfort, and Recovery; and (4) Preoperative Preparation. Each question was entered in English into the free, publicly available versions of ChatGPT, Gemini, and Copilot. Ten blinded anesthesiologists rated the responses using a 5-point Likert scale (1 = very inappropriate to 5 = very appropriate). Mean scores were compared using one-way ANOVA with Tukey’s post hoc tests, and inter-rater reliability was assessed using Cronbach’s α.
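
The two statistics named above can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names and the toy scores are hypothetical and are not data from the study. The abstract also does not say how the rating matrix was arranged for Cronbach's α, so treating each rater as an "item" is an assumption made here. Tukey's post hoc comparison is left out because it needs the studentized range distribution, which the standard library does not provide.

```python
from statistics import mean, variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a matrix with one row per rated response
    and one column per rater (raters treated as items: an assumption)."""
    k = len(ratings[0])  # number of raters
    # Sample variance of each rater's scores across all responses
    rater_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Sample variance of the summed score each response received
    total_var = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(rater_vars) / total_var)


def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA,
    where each group is a list of scores for one platform."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy data, NOT taken from the study:
# five responses scored by three raters on the 1-5 Likert scale
toy_ratings = [[5, 4, 5], [4, 4, 4], [3, 2, 3], [5, 5, 4], [2, 3, 2]]
# four scores for each of three unnamed platforms
toy_groups = [[5, 5, 4, 5], [4, 4, 5, 4], [3, 3, 4, 3]]

alpha = cronbach_alpha(toy_ratings)
f_stat, df_between, df_within = one_way_anova_f(toy_groups)
print(f"alpha = {alpha:.3f}")
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```

A large F statistic indicates that the platform means differ by more than the within-platform spread would explain. A post hoc test such as Tukey's is then used to identify which pairs of platforms differ.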

Results

ChatGPT achieved the highest overall mean score (4.68 ± 0.50), followed by Gemini (4.22 ± 0.63) and Copilot (3.28 ± 0.50), with significant differences among all platforms (p < 0.001). ChatGPT consistently outperformed the others across all four domains. Qualitative observations from evaluator comments suggested that ChatGPT’s concise summaries improved readability, Gemini provided more structured responses with more scholarly-style references, and Copilot was clear but often less detailed. Inter-rater reliability was high (Cronbach’s α = 0.89).

Conclusion

Among free-access AI chatbots, ChatGPT provided the most accurate and comprehensive explanations regarding general anesthesia. While Gemini and Copilot offered partial value, professional oversight remains essential to ensure safe and contextually accurate patient education in preoperative care.