Abstract

Background: Gastrointestinal (GI) tract perforation is a surgical emergency requiring rapid diagnosis, often via chest radiography. Artificial intelligence (AI), including large language models such as ChatGPT, has the potential to enhance medical image interpretation, but its efficacy in detecting GI perforation is unclear. We compared the diagnostic accuracy of ChatGPT 3.5 and 4 with that of human experts in interpreting chest radiographs for GI perforation.

Methods: This retrospective study, approved by the Arel University Hospital Ethics Committee (E-52857131-050.06.04-455896), analyzed 504 chest radiographs from patients diagnosed with GI perforation between 2010 and 2021. Radiographs were classified into three groups: definite GI perforation, suspicious findings requiring further imaging, or no perforation. Two clinicians (an emergency medicine specialist and a general surgeon) independently evaluated the radiographs, followed by ChatGPT 3.5 and 4 using a standardized prompt. Diagnostic accuracy was compared with chi-square tests and decision-making times with Student’s t-test; p<0.05 was considered significant.

Results: The 504 patients were 11.1% female, with a mean age of 45.4 years. Human evaluators correctly classified 80.1% of radiographs, compared with 3.9% for ChatGPT 3.5 and 5.9% for ChatGPT 4 (p<0.001). The ChatGPT models were faster (p<0.001) but failed to interpret 94.1–96.1% of radiographs, often recommending clinical consultation.

Conclusion: General-purpose ChatGPT models lack the accuracy for reliable GI perforation diagnosis on chest radiographs. Specialized AI models, trained on medical imaging datasets, are needed to improve diagnostic precision and support clinical workflows.
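
The chi-square comparison named in the Methods can be illustrated with a minimal sketch. It is not the authors' analysis: the abstract reports only percentages, so the counts below are approximations reconstructed by rounding those percentages against 504 radiographs, and the three-group classification is collapsed here into correct versus incorrect. The function name chi_square_2x2 and the variable names are hypothetical, and the sketch assumes a Pearson chi-square without continuity correction.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # With 1 degree of freedom the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# ASSUMPTION: counts are reconstructed from the reported percentages
# (80.1% and 5.9% of 504); they are not the study's actual tallies.
N_RADIOGRAPHS = 504
human_correct = round(0.801 * N_RADIOGRAPHS)
gpt4_correct = round(0.059 * N_RADIOGRAPHS)

table = [
    [human_correct, N_RADIOGRAPHS - human_correct],  # human evaluators
    [gpt4_correct, N_RADIOGRAPHS - gpt4_correct],    # ChatGPT 4
]

chi2, p_value = chi_square_2x2(table)
print(f"chi-square = {chi2:.1f}, significant at 0.001: {p_value < 0.001}")
```

Under these approximate counts the difference between human evaluators and ChatGPT 4 is far beyond the p<0.001 threshold, in line with the reported result. The same function could be applied to the ChatGPT 3.5 figures by substituting 3.9%.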

Keywords: AI in healthcare, artificial intelligence, emergency radiology, radiology AI

How to Cite

Tokoçin M, Tokoçin O. Diagnostic performance of ChatGPT in detecting gastrointestinal tract perforation on chest radiographs: a comparative study. J Trends Med Invest. 2025;1(2):49-53. https://doi.org/10.64512/JTMI.2025.11
