This study compared the responses of ChatGPT-5 and DeepSeek Chat Space to frequently asked questions (FAQs) about tibial plateau fractures in terms of quality, reliability, and readability. Fifteen FAQs were identified based on clinical expertise and patient-oriented online sources. Each question was submitted to ChatGPT-5 and DeepSeek in separate chat sessions, and the responses were recorded without follow-up prompts. Quality was evaluated with the Ensuring Quality Information for Patients (EQIP-20) tool and the Global Quality Score (GQS), reliability with the Modified DISCERN Scale, and readability with nine established readability indices. DeepSeek scored significantly higher than ChatGPT-5 on both EQIP-20 (median 87.5 vs. 80.0, p=0.006) and GQS (median 5.0 vs. 4.0, p=0.001). For reliability, both models had the same median Modified DISCERN score (3.0), but DeepSeek’s scores were more consistent, whereas ChatGPT-5’s were more variable (p=0.041). Readability analyses showed that both models generated text requiring high school to university-level comprehension, well above the recommended sixth-grade reading level. Significant differences between the models were observed in the Coleman–Liau Index, Simple Measure of Gobbledygook, Overall Language Weight, and Long Word Grade metrics. Both ChatGPT-5 and DeepSeek provided generally satisfactory responses regarding tibial plateau fractures. However, DeepSeek performed better in quality and gave more consistent reliability scores, while ChatGPT-5’s responses were more variable. Despite these strengths, the outputs of both models were too complex for the average patient, underscoring the continued need for physician oversight in patient education.
Key words: Artificial intelligence, ChatGPT, DeepSeek, tibial plateau fractures, quality assessment, reliability, readability
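
As a rough illustration of how one of the readability indices named in the abstract is computed, the following Python sketch implements the standard Coleman–Liau formula, CLI = 0.0588 L - 0.296 S - 15.8, where L is the average number of letters per 100 words and S is the average number of sentences per 100 words. The function name and the simple regex-based word and sentence splitting are assumptions made for illustration; the tool actually used in the study is not stated here, and its tokenization may differ.

```python
import re


def coleman_liau_index(text: str) -> float:
    """Approximate Coleman-Liau grade level of `text` (illustrative sketch)."""
    # Words: runs of letters, optionally joined by hyphens or apostrophes.
    words = re.findall(r"[A-Za-z]+(?:[-'][A-Za-z]+)*", text)
    # Sentences: crude split on ., ! or ?; abbreviations and decimal
    # numbers will be over-split, so treat the result as an estimate.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")

    letters = sum(ch.isalpha() for word in words for ch in word)
    letters_per_100_words = letters / len(words) * 100
    sentences_per_100_words = len(sentences) / len(words) * 100
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8


if __name__ == "__main__":
    simple = "The cat sat. The dog ran."
    clinical = "Tibial plateau fractures frequently require operative stabilization."
    print(round(coleman_liau_index(simple), 2))
    print(round(coleman_liau_index(clinical), 2))
```

Longer words and longer sentences both raise the score, which is why dense clinical wording tends to land far above a sixth-grade target.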