If you believe the GenAI hype, then you should not be surprised by the viral New York Times article, A.I. Chatbots Defeated Doctors at Diagnosing Illness. It reports that doctors who were given ChatGPT access along with conventional resources did only slightly better than doctors who did not have access to the bot. But ChatGPT alone outperformed the doctors.
You may want to wish away Generative AI, thinking it's not relevant to global health. However, we are already using GenAI in global health, and its use will only grow. It's time we get ahead of our constituents and help them understand the tool they are already using.
4 Lessons Learned from ChatGPT Diagnosis
Yet you should not rush to the judgement that Generative AI is somehow better than humans. Instead, we should draw one very clear conclusion from this study: We need ChatGPT prompt training for frontline health workers in low- and middle-income countries.
1. Common Issues Require Judgement Calls
We should not be using ChatGPT for common issues, because it will sometimes over-evaluate things: is that an infected splinter or early septic shock!? Ultimately, we need a person to pass judgement and triage the issues.
For example, in the USA, 40% of patients with belly pain in the ER leave with a final diagnosis of belly pain, even after a full suite of expensive and insured lab tests. A cause is not found, it is not treated as an emergency, and they’re told the pain will go away eventually. Doctors make a judgement call to send them home. ChatGPT is never going to make that call effectively.
2. Rare Disease Cases Require Consultations
The cases used in the study were from a series of uncommon presentations of disease. These are situations in which a physician should seek diagnostic help from a colleague to better understand the issue and explore different diagnoses.
For example, cholesterol embolism syndrome is a rare situation, seen only a few times in most doctors’ entire careers. Every doctor should be asking for, and receiving, help for these cases. Obviously, a specialist is the best option, but in LMIC settings, ChatGPT can be a wonderful assistant to double-check a diagnosis, for the reasons highlighted in the study.
3. Even Doctors Are Clueless on ChatGPT Prompts
Patients don’t arrive in medical clinics with a written summary of their symptoms and signs described in medical terms. The human doctor turns the patient’s issue into a cohesive medical summary. In this study, the cases were not written in medical jargon. Doctors had to develop them. We should not expect frontline health workers to create well-written descriptions for Generative AI.
Doctors were given access to the chatbot without explicit training in prompt engineering techniques. Even tech-friendly physicians who volunteered to be a part of this study could have improved the quality of their interactions with more prompt engineering training. Hence, even if doctors live in the Bay Area or went to Stanford, they should not be assumed to have GenAI expertise. We need to be training health professionals in every location.
4. ChatGPT Can Excel with Well-Written Prompts
When ChatGPT was given well-written prompts, and not second-guessed by doctors, it gave better diagnoses than doctors alone or doctors working with ChatGPT. This could indicate that some doctors did not want to ask for support from others, and therefore reached erroneous diagnoses. It could also indicate that even when ChatGPT was correct, doctors second-guessed the algorithms.
Finally, it again suggests that health practitioners should be trained in prompt engineering.
The tiny sample set of 50 doctors and the highly controlled study design keep us from extrapolating too much from this research. We don’t really know if ChatGPT alone is better than well-trained doctors constructively working with GenAI. We only know that we should be doing a better job of alerting health practitioners to the promise and opportunity of generative AI in healthcare.