What do tuberculosis and breast cancer have in common? We’ve spent decades paying for artificial intelligence-based solutions to solve both problems without any discernible sense of whether we are actually helping people.
Astounding to imagine, and yet here we are. The hype around AI has successfully papered over a giant gap in the evidence base, the very evidence that might have justified the millions, if not billions, we’ve thrown at this technology. In other words, in more situations than we care to admit, we still can’t definitively answer the question of whether this massive investment has improved or saved lives.
Birthplace of AI4health: Breast Cancer Screening
One of the first true examples of artificial intelligence (AI) in health care was computer-aided detection (CAD) in the assessment of mammograms for breast cancer screening.
The first U.S. Food and Drug Administration (FDA)-approved AI-infused product hit the market in the late 1990s, and by 2008 three-quarters of all mammograms conducted under the Medicare programme were assessed using this technology. The FDA approved these tools because they allowed non-specialists to achieve a degree of diagnostic accuracy that was at least as good as, if not better than, that of specialist breast radiologists.
Just short of a decade after the first CAD-based system was approved, there began a deluge of high-profile studies that one by one illustrated some flaw in this logic. These included increased false-positive recall rates with CAD-assisted single reading, and the finding that the increased biopsy rates driven by CAD bore no relationship to the detection of invasive breast cancer. It finally culminated in a landmark retrospective study of over half a million mammograms from the Breast Cancer Surveillance Consortium which found that:
“Computer-aided detection does not improve diagnostic accuracy of mammography. These results suggest that insurers pay more for CAD with no established benefit to women.”
Honestly, one of the simplest—and yet most effective—sentences I’ve ever read in an academic paper. But breast cancer CAD hasn’t gone away.
Today we have better deep-learning-based diagnostic systems and, at best, paltry evidence that we’re moving the needle on women’s health in a cost-effective way. Finally, after more than 20 years, a Swedish team of researchers decided to do what we all knew was necessary and run a randomised controlled trial.
The interim results were published in August 2023 and concluded that AI-assisted reading of mammograms was, in fact, safe and reduced screen-reading workload by 44%. In a couple of years, we’ll see whether using the AI-based tool reduces the number of ‘interval’ cancers (those that surface between screening rounds), the holy grail of screening.
It will only have taken us three decades to finally have some robust evidence of whether integrating AI into breast cancer screening is worth it. Had it been anything other than AI, I’m almost certain we’d have asked the question earlier.
Same Problem, Different Context: TB Screening
Tuberculosis (TB) claims millions of lives annually, even today. In Africa, it is amongst the leading causes of death. A brilliant team at Malawi-Liverpool-Wellcome ran a trial several years ago that looked at the cost-effectiveness of using an AI tool to read chest radiographs for TB alongside HIV testing for people with undifferentiated coughs at acute primary care facilities. These tools are generically referred to as TB computer-aided diagnostics or TB-CAD.
Three times the proportion of people were identified as having TB and initiated on treatment with TB-CAD compared with standard diagnostic processes. Awesome, right?
Well, it’s complicated.
There are lots of effective diagnostic tools out there that would increase the number of people we identify with a disease. But everything comes at a price, and the reality is that we can’t afford to buy everything everywhere. We need to move beyond just asking whether something is effective to asking whether it is cost-effective. In other words, asking:
- Do I need this new technology for the marginal benefit it provides?
- Can I serve more patients by using financial resources in other ways?
In that same paper, when the authors report on the cost-effectiveness of TB-CAD in the community setting, they show an extremely high cost relative to the World Health Organization-recommended cost-per-quality-adjusted-life-year (QALY) thresholds. At the time of the study, each QALY gained cost $4,800, roughly 12 times Malawi’s GDP per capita!
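For readers who want that threshold arithmetic spelled out, below is a minimal sketch in Python. It assumes the commonly cited WHO-CHOICE heuristic (an intervention is cost-effective if a QALY gained costs no more than roughly 1–3 times GDP per capita) and an approximate GDP-per-capita figure; neither number comes from the trial’s own economic model.

```python
# Minimal, illustrative sketch of the threshold logic in the text above.
# The cost-per-QALY figure is the one quoted in this article; the GDP per
# capita is an assumed approximation, not a value from the trial itself.

COST_PER_QALY = 4_800   # US$ per QALY gained with TB-CAD (community setting)
GDP_PER_CAPITA = 400    # assumed approximate Malawi GDP per capita, US$

# WHO-CHOICE guidance is often summarised as: cost-effective if a QALY
# costs less than 1-3x GDP per capita. Use the generous end of that range.
threshold = 3 * GDP_PER_CAPITA
multiple = COST_PER_QALY / GDP_PER_CAPITA

print(f"Cost per QALY is {multiple:.0f}x GDP per capita; "
      f"cost-effective at the 3x threshold: {COST_PER_QALY <= threshold}")
# -> Cost per QALY is 12x GDP per capita; cost-effective at the 3x threshold: False
```

At 12 times GDP per capita, TB-CAD overshoots even the generous end of that heuristic by a factor of four.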
More than 70 countries have TB-CAD in their TB programmes, and many of them are screening millions of chest x-rays. Based on what you’ve just read, do you think it’s worth asking the question of whether this relatively expensive technology is the best use of health care resources?
Granted, that’s just one example, so let’s look at another. Fortunately, the same research team just published another trial, this time focused on diagnosing TB in people with HIV admitted to the hospital.
The differences in treatment rates between the group diagnosed via standard processes and those diagnosed using more recent innovations were driven by a new urine-based test, not the computer-assisted tool.
In other words, the AI tool wasn’t the intervention driving the value. Different setting, remarkably similar results…
Why AI Diagnosis Isn’t a Game Changer
Even if AI-assisted tools can diagnose people more accurately, it doesn’t mean those people will be able to access high-quality treatment and thus live longer, healthier lives. A review comparing diagnostic tests found no relationship between how much more accurate a new test was and the downstream benefit to patients.
This is because high-quality health care service delivery requires many complex pieces to work together effectively. Putting significant resources into tools such as AI-assisted diagnostics is only one piece of the puzzle and doesn’t guarantee success.
What is the Best Way Forward?
If you’ve read this far, you’re probably imagining I’m an AI-sceptic. Far from it, I’m a techno-optimist. I genuinely believe AI could play a profound role in improving the operational efficiency of health systems or facilitating access at the last mile.
But I also spent several years at one of the world’s largest philanthropies, running a team with a nine-figure budget dedicated to funding research on digital technology for science and health. Thus, I know a very inconvenient truth… the amount of philanthropic, catalytic capital is limited. We only get to invest in so many ‘game changing ideas’, and honestly, we’ve gotten too many big bets wrong.
So next time someone boldly declares that they’ve got a new AI-based tool “with superhuman diagnostic ability”, take a moment to think: is there any actual evidence that we’re improving people’s lives as a result, and is it the most effective use of the limited resources we have? In a depressing number of cases, the answers to those two questions are enough to kill the conversation.
And for those of you shouting at your screens as you read this: “But Bilal, we all know the cost of the technology is going to be a fraction of what it is today in a year!” My response to you is simple: keep innovating, keep pushing the envelope, and let’s get the cost down to a point where all of this technology is affordable to implement at the last mile.
But for now, when we’re rolling out a national TB or breast cancer screening programme, we aren’t buying at tomorrow’s price; it’s the cost today that matters. And in a lot of cases, that price (for what it yields) is still way too high to be justified.
By Dr. Bilal Mateen, Chief AI Officer, PATH
Absolutely fascinating. This was an excellent and insightful read!
Would you agree that, considering the incredible potential of AI, we need better up-front investment in the science and implementation science of each use case? Do we need global donors to think more deeply about LMIC use cases and deployments in terms of learning and evidence-generation agendas up front, so that scaled implementation is guided by science and delivers quality, affordable AI in the right place at the right time, with proven impact?
The two use cases described, for example, are both highly capital intensive, not necessarily due to the ongoing cost of the AI but because of the equipment needed to deliver it. The AI in your mammography use case demonstrated a 44% efficiency gain. The question is how this translates in terms of the value-to-cost ratio when costs are high. Again, we need more investment in these types of cost-benefit analysis for AI earlier on.
Briefly, on investing in improved diagnostics: this should be done in tandem with treatment access. Agreed. However, diagnostics help scope the size and degree of the system challenge, and often apply pressure downstream on treatment investments and considerations. Still useful, perhaps.
So refreshing to see some evidence-based skepticism (or rather, skepticism based on lack of evidence) on the effectiveness, and cost-effectiveness of AI in health.
My only contention is with the author’s point that “Had it been anything other than AI, I’m almost certain we’d have asked the question earlier.” In my experience, we’re in exactly the same position when it comes to our continued investments in many digital areas that were originally super-hyped – including mobile web, social media, etc.
This is as much a problem of consistent, reliable data generation as data transparency – I’ll say it until I’m blue in the face: we need to get better at sharing our digital intervention KPIs openly and honestly, even when they seem terrible, or inconclusive.
Let’s keep sharing more writing from techno-pragmatists like this – those of us who see the potential but continue to ask hard questions of AI and digital as a whole.