
Original Article



Accuracy and Error Analysis of ChatGPT-4o, DeepSeek-R1, and Gemini 2.0 in the European Board of Hand Surgery Examination

Numan Mercan, Ahmet Yurteri, Ebubekir Eravşar, Ahmet Yıldırım.



Abstract

Objectives: This study examined the accuracy of three large language models (LLMs), ChatGPT-4o, DeepSeek-R1, and Gemini 2.0, on the multiple-choice questions (MCQs) of the European Board of Hand Surgery (EBHS) exam, together with the reasons for their wrong answers. It was hypothesized that DeepSeek-R1 would be more accurate than the other two models, based on reported differences in training datasets.
Materials and Methods: Ten exams published in The Journal of Hand Surgery (European Volume) between 2022 and 2024 and 150 true/false MCQs were examined. The MCQs were divided into five subheadings according to their content: anatomy, trauma, systemic-chronic diseases, microsurgery, and congenital disorders. The reasons for the models' wrong answers were divided into four groups: data-related, semantic, algorithmic, and logical errors.
Results: The correct-answer rates were 74% for ChatGPT-4o, 76.7% for DeepSeek-R1, and 73.3% for Gemini 2.0, with no significant difference between them (p = 0.572). The models gave the same answer to 103 of the 150 MCQs, and 84.5% of these answers were correct. Among the wrong answers, the most frequent type of error was data-related.
Conclusion: There was no significant difference in accuracy rates, content-based subcategories, or error types among the three LLMs. Data-related errors indicate gaps in training, but approximately 75% accuracy in this exam suggests that further error analysis could enhance future model performance.

Key words: artificial intelligence; board exam; ChatGPT; DeepSeek; error analysis; Gemini; hand surgery; large language models
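
As a reading aid, the percentages in the Results can be converted back into question counts. The sketch below is illustrative only: the counts are inferred by rounding the reported percentages against 150 MCQs, not taken from the article's data. The abstract does not name the statistical test behind p = 0.572, so the Pearson chi-square shown here is an assumption. It treats the three models as independent samples, although all three answered the same questions, and it is not expected to reproduce the reported p-value.

```python
import math

N_QUESTIONS = 150  # total MCQs reported in the abstract

# Accuracy rates as reported in the Results paragraph.
reported_accuracy = {
    "ChatGPT-4o": 0.74,
    "DeepSeek-R1": 0.767,
    "Gemini 2.0": 0.733,
}

# Inferred counts (assumption: the percentages are rounded from whole counts).
correct = {m: round(a * N_QUESTIONS) for m, a in reported_accuracy.items()}
wrong = {m: N_QUESTIONS - c for m, c in correct.items()}

# Questions on which all three models gave the same answer.
unanimous = 103
unanimous_correct = round(0.845 * unanimous)  # inferred from the reported 84.5%

# Illustrative Pearson chi-square on the 3x2 table (model x correct/wrong).
# Every model answered the same number of questions, so the expected
# count of correct answers is the same for each model.
expected_correct = sum(correct.values()) / len(correct)
expected_wrong = N_QUESTIONS - expected_correct
chi2 = sum(
    (correct[m] - expected_correct) ** 2 / expected_correct
    + (wrong[m] - expected_wrong) ** 2 / expected_wrong
    for m in correct
)

# With 2 degrees of freedom the chi-square survival function is exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(correct)
print(unanimous_correct, "of", unanimous, "unanimous answers correct")
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

The gap between this independent-samples p-value and the reported one is consistent with the authors having used a different test, for example one suited to paired answers on the same questions, but the abstract does not say which.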









The articles in Bibliomed are open access articles licensed under Creative Commons Attribution 4.0 International License (CC BY), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.