I am Ed Gaible of Natoma Group and I am here to report that RCT fetishism is alive and well in development.
Sometime in the early part of this millennium, Randomized Controlled Trials (RCTs) as support for decision making began their migration from health care, among other fields, to the field of development.
Early and influential proponents, such as Esther Duflo, Abhijit Banerjee and Dean Karlan, associated with the Abdul Latif Jameel Poverty Action Lab (J-PAL), advocated widespread-but-appropriate use of RCTs to guide decisions about issues that included incentives for teachers, provision of mosquito nets, appropriate cropping methods and so on.
RCTs, in their view and in mine, can help development practitioners span the yawning gap between intention—the outcomes that we intend to achieve—and impact—what our actions in support of those intentions make happen. That gap was, and perhaps still is, yawning.
At about the time RCTs emerged as an evaluative method in development, there was also new emphasis on outcomes-based evaluation, results-based management, impact and a host of other processes/concepts/terms, all increasing the focus on whether an initiative achieved a result rather than whether the money was spent. This shift in emphasis, in my view, was intended to address the fact that ~50 years of development activity had produced very limited evidence that we had even a clue about what works, how it works, and why.
RCTs from this perspective arrived as a fine-grained means of addressing questions that were emerging into view. These were questions that should, perhaps, have been addressed from the get go but, suddenly, as is frequent in a donor-driven field, news of the lack of outcomes was released and pressure for change was brought. Rapidly.
RCTs as a symptom of an emphasis on outcomes
Emphasis on outcomes was a response to the maturation of development as a field and the (OK, we can’t say its “de-politicization,” but perhaps we can say) modulation of its political character and its increased emphasis on transnational, national and local enterprise. Early development actors (the Marshall Plan, the UN, USAID) were able to act independent of private-sector interests. But do not overlook the complex meaning of “to act.”
Outcomes and impact, then, themselves arose as outcomes of a shift in emphasis from donor/government/NGO activity to enterprise. This shift corresponds to a more general increase in emphasis on productivity that emerged in all or almost all arenas of human endeavor. (See Jean-François Lyotard, “The Postmodern Condition,” for predictive analysis of this trend.)
Unsurprisingly, emphasis on outcomes was accompanied by new focus on microloans and microfinance, support for innovation, and emphasis on the power of FDI, etc., perhaps starting around 2001 and continuing to about 2010. Don’t miss the fact that the ideas of the brilliant and reasonable William Easterly in this model become foreseeable outcomes. In part as a result of merit, in part as a result of trends, RCTs pull focus as a means of achieving “development.”
By the way, “development” should appear always under erasure or in quotations: My “developed” homeland might elect a rich bigot as its leader this year or, for the first time in more than 50 contests, a woman; the UK might vote to leave the first and greatest institution to ensure peace on the European continent. “Development” can’t be written without its quotation marks.
RCTs are a hammer looking for a nail
One problem, of course, is that RCTs are not always feasible, appropriate, or accurate—especially in development contexts. (Nor are PPPs, nor is innovation, nor is FDI, btw.) One of the first times that someone proposed “We should do a lot of RCTs to figure out what’s really working,” I was in a country that has a lousy government and that has recently had earthquakes, floods, strikes. And it’s post-conflict to boot.
The person’s organization was doing good work: adolescent girls in some schools were playing football/soccer and volleyball, enjoying access to toilets, getting trained in menstrual hygiene and studying after school, as opposed to going home to do housework. These outcomes resulted from a few complementary programs, with each program implemented differently in each school. But, sure, let’s run some RCTs.
(In one school, the girls and boys played football and volleyball together; in the inter-school competition, when only girls played, that school was champion. Their approach was not emulated.)
My look at those initiatives was pretty challenging for the persons involved. I wanted answers for questions such as:
- Which were the components that were funded by the project I was to look at?
- What were the stated objectives?
- What was the cost per school per year?
Answers weren’t always available, or reliable, or applicable in all contexts. Getting answers took staff time, reducing time spent on other activities of value. My look, such as it was, was to combine a mid-point and an end-point evaluation—not an approach that I love, but I understand that confronted with real problems impacting the lives of real people, my evaluation might not be uppermost on the agenda. And the project had no baseline or mid-line information, and time was running out. But sure, let’s run some RCTs…
The decision to design, set up, implement and assess RCTs should perhaps be made when:
- Staff and the organization have the capacity to support them;
- Local organizations or other factors make consistent implementation possible;
- An initiative focuses on a single intervention (as opposed to a suite of complementary interventions);
- Internal validity (what the RCT shows) has clear links to external validity (other contexts where the RCT results can be confidently replicated); and
- Costs fall within current budgeting.
Internal and external validity is a dichotomy that many critiques of RCTs latch onto. I wrestle, however, with a different dichotomy, between simple interventions and complex systems, and to what extent this dichotomy has bearing on the utility of RCTs. After all, RCTs are designed and promoted because they enable researchers to assess causality of single actions in complex contexts. But don’t complex systems combine factors in ways that defy analysis via any one of the combined factors?
RCTs in education: a successful failure
The Education For All (EFA) idea is not complex, and it pre-dates the push for results-based management and outcomes/impact. EFA could, I suppose, be considered one of the prior programs that led to the emergence of an enabling environment for RCTs.
Let’s say we decide to increase participation in schooling by increasing access (build schools, provide transportation, hire more teachers, support free and compulsory basic education, and so on) while letting issues of education quality hang fire. Get ’em in there, learning is good and they will learn.
(You’ll note, also, that EFA, dating from the 1990 conference in Jomtien, Thailand, originates in an era that still emphasized donor-supported government action; whatever backlash moment took place in the early part of this century turned to social enterprise, micro-finance, PPPs and innovation, all approaches that minimize the direct involvement of governments.)
What are the observed outcomes?
- Very large classes guided by teachers who lack skills, education and certification.
- Kids who complete primary school but can’t read or add.
- Older kids who complete secondary school but need to move back to their villages or out of the country because there are no jobs.
- Increasing numbers of low-cost private schools (mostly attended by boys).
- SDG 4, which increases the emphasis on the quality of education, as opposed to the quantity of children served.
In one of the first influential articles demonstrating the power of RCTs, Esther Duflo and Rema Hanna report on a reduction in teacher absenteeism in non-formal schools resulting from a combination of financial incentives and cameras with “tamper-proof date and time function(s),” with which students documented their teachers’ attendance. Teachers’ attendance improved nearly 50 percent; students’ test scores improved 0.17 standard deviations, with students more likely to be matriculated into regular schools. Teacher capacity and the students’ characteristics didn’t matter: teachers showed up more frequently, and test scores improved.
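To make the “0.17 standard deviations” figure concrete, here is a minimal Python sketch of how a standardized effect size can be computed from a randomized comparison. It is not Duflo and Hanna’s analysis: the function names, the simulated scores, the sample size and the choice to scale by the control group’s standard deviation are all assumptions made for illustration.

```python
import random
import statistics


def standardized_effect(treatment, control):
    """Difference in mean scores, expressed in control-group standard deviations."""
    gap = statistics.mean(treatment) - statistics.mean(control)
    return gap / statistics.stdev(control)


def simulate_trial(n_schools=20000, true_effect_sd=0.17, seed=0):
    """Randomly assign half of the (simulated) schools to treatment.

    Scores are invented: a standard-normal baseline for every school,
    plus a fixed bump of `true_effect_sd` for treated schools.
    """
    rng = random.Random(seed)
    schools = list(range(n_schools))
    rng.shuffle(schools)  # random assignment is what makes this an RCT
    treated = set(schools[: n_schools // 2])

    treatment_scores, control_scores = [], []
    for school in range(n_schools):
        baseline = rng.gauss(0.0, 1.0)
        if school in treated:
            treatment_scores.append(baseline + true_effect_sd)
        else:
            control_scores.append(baseline)
    return standardized_effect(treatment_scores, control_scores)


if __name__ == "__main__":
    print(round(simulate_trial(), 2))
```

Because assignment is random, the two groups differ on average only in the treatment, so the gap in mean scores can be read as the intervention’s effect. What the number cannot say is whether the same gap would appear in another school system, which is the external-validity question raised above.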
It’s a brilliant article, comprehensively crafted to raise and dispatch potential confounding factors. But…
Teacher absenteeism remains a huge problem in 2016. And this is despite the fact that, as stated by Duflo and Hanna, teachers “as a group may have strong intrinsic motivation because of the value they place on interacting with children and in seeing the children succeed.” If this description of intrinsic motivation is true (and they aver that it’s true for teachers in developing countries), why is photographic evidence needed? And if it’s needed, why hasn’t photographic evidence been marshaled to address this common problem on a much larger scale? I’m unaware (although my awareness is incomplete) of any efforts to scale the intervention as described by Duflo and Hanna systemwide.
Duflo and Hanna point to a few possible answers for the limited adoption of the approach they validate. First, the intervention in 2005 was very expensive, about US $60 per student per year. Cited costs include the purchase of film cameras, and the attendant costs of film and film development. Next, oversight activities, such as convening teachers to transfer film rolls, reviewing photos, and calculating payments (remember, this intervention combined verification with monetary incentives), are almost impossible to imagine at scale in a non-digital environment. Expanding these activities to cover, say, the 17,000 schools in Afghanistan or the more than 200,000 schools in Indonesia would be infeasible.
These challenges present a huge opportunity for the use of ICT. And in response, smartphones are being deployed to record teacher attendance in Uganda and, at least via pilot projects, in India and perhaps Afghanistan. Several of these projects are testing the open-source teacher-attendance solution developed by Ustad Mobile, which includes:
- the time-stamped photo,
- face-recognition [to ensure that the right person is in the photo],
- eye-blink recognition [to ensure that the photo is of a person and not of a photo],
- and mobile broadband to transmit the photo to a central database.
(Word to the wise: I’ve worked with Mike Dawson of Ustad; he’s a colleague and friend.)
One smartphone per school is enough to account for the attendance of all teachers; mobile-broadband internet ensures that the information gets stored; machine-readable verification data (in the photo, in the database) automates recording of teacher attendance.
But what about teachers’ intrinsic motivation? What about those incentives? What about learning something that’s relevant to my life?
RCTs cannot control for complex systems
Let’s agree for the moment that complex adaptive systems, including school systems, are characterized by:
- The interdependence of variables (leading to unintended consequences),
- Feedback loops (which positively or negatively amplify inputs),
- Learning and agency within the system (we could call snapping a photo of a teacher as proof of attendance an adaptive response by agents),
- Self-similarity (combining hierarchical structures with replication of nodes in that hierarchy), and
- Metastability (meaning that a stage of organization is robust and durable until a specific input produces change).
What might this swarm of characteristics mean for our efforts to hold teachers accountable? And for our efforts to improve education? And for our interest in RCTs? Well, who could know?
But I expect that if we acknowledge that our school systems are self-similar, with high levels of feedback, and therefore resistant to low-grade change but susceptible to transformation, we might guess that a one-prong attack focusing on teacher attendance is going to pierce the Jell-O of the education system but ultimately do very little.
- Teachers will show up more, because they are made to show up more, but a system that fails to deliver textbooks and paychecks and diplomas will continue to fail to bring these things.
- A system that promotes a stale-dated colonial or post-colonial curriculum will continue to promote mastery of irrelevant content.
- Kids will learn more of what they’re going to be tested on, but they still won’t have extensible, usable, transformative skills or knowledge. (“Love to read? Love to learn? No, thank you, I do not.”)
If we counter these systems’ tendencies to keep on keeping on with comprehensive, even holistic interventions—enabling girls to play football, to focus on their homework, to use clean and private toilets, to learn about hygiene and sanitation and citizenship (and reading and writing and addition and subtraction)—we stand a better chance of cost-effective system-wide improvement than we do with a single, measurable intervention.
In this model of change RCTs have a role to play, in helping us assess the value of specific interventions—football or homework or toilets or curricula, and so on—and so does big data, perhaps enabling us to track the impact of “suites” of interventions, perhaps enabling us to assess the external validity of interventions that RCTs show to be effective.
But to the extent that RCTs support our fetishizing of single-focus interventions, to the extent that they lead us to emphasize the transactional nature of education (accountability + incentives = outcomes), they mire us in models of production, and productivity, that are themselves “outcomes” of our failed approach to “development,” and they incline us toward actions that fall far short of the visionary, transformative efforts that our education systems deserve and need.
Thanks Ed for this great article. I have been working in this field for a few years now, with no education or research background, so I have been observing with a “candide” eye. From the outside, it seems to me that the reality is often set aside to create a perfect research scenario, which is absolutely not replicable.
I understand various criticisms against how RCTs are used (and mis-used/abused) in international development. What I see as awfully lacking in an article like this, however, is any account of how we might gain insight into what works and what doesn’t. How can we build an evidence base so that we can sort through all the fads and pilotitis in international development and effectively inform policy making and resource allocation? What are the alternatives to RCTs? Or better ways to utilize RCTs? I’d love to hear more of these conversations rather than simply what is wrong with RCTs.
With regards to those “visionary, transformative efforts that our education systems deserve and need”
The Revolution will not be randomized.
It will not be evaluated in a double-blind, placebo-controlled trial.
It will not be required to pass human subjects review.
No one will be programming tablets for data collection.
There will be no review or preregistration of a statistical analysis plan.
No one will be concerning themselves with sample frames, attrition from the study, or non-response.
The Revolution will not be randomized.
It is funny. I have been arguing the opposite – that Big Data and Systems Thinking have distracted everybody from the very important, but difficult task of testing predictions about the observable consequences of a discrete manipulation of a single component in a complex social system.
Hi Isabelle,
Yes IMHO it’s difficult to create the proper research scenario in a development context. But clearly the people at J-PAL and elsewhere (Matthew Kam comes instantly to mind) are able to manage it. My interest is in some sense more closely tied to seeing RCTs as a contextually situated result of a push for outcomes and measurability. Better than the previous near-absolute void, in terms of measurement of impact, but still not a mature approach in a mature field.
@kevin: Perhaps I’ve misspoken, or mis-written. I’m not trying to slam RCTs, but I do think it’s important to see them in context as something other than a bandolier of magic bullets. My experience is that success in development comes from “bundles” of coherently related interventions, rather than from single-strand projects more easily assessed. I was about to suggest that a combination of big data, looking at those bundles, and controlled assessments looking at individual interventions might be fruitful. But @SamField has flanked me. While I agree the revolution won’t be randomized, I confess I don’t think we’re going to see a revolution. We’re going to see plodding steps forward and backward, and we’re going to hope for more forward than back. I would say we’re absolutely unlikely to see those forward steps by “testing predictions about the observable consequences of a discrete manipulation of a single component in a complex social system,” for reasons I’ve outlined. Chiefly, it’s a complex system.
@Sam Field @Isabelle Duston @Kevin Hong One of the more interesting questions IMHO from the potential convergence of RCTs and big data surrounds the overlay of areas where RCTs and regular, massive data collection are both available. Afghanistan comes to mind, as the EMIS situation there is a cluster of Babel, and given the power-sharing situation (as described by a USAID guy to me) RCTs aren’t possible in +/- 85% of the country. Both avenues of assessment are shut off, leaving… what?
Ed, first I want to thank you for putting your thoughts out there and inviting feedback/criticism. Your argument reminds me of the “Streetlight effect”.
A policeman sees a drunk man searching for something under a streetlight and asks what the drunk has lost. He says he lost his keys and they both look under the streetlight together. After a few minutes the policeman asks if he is sure he lost them here, and the drunk replies, no, and that he lost them in the park. The policeman asks why he is searching here, and the drunk replies, “this is where the light is.”
Dear Ed – Thank you soooo much for writing this article. I have been working in international development for 25 years, and have watched, astounded and confused, and occasionally annoyed, as the RCT folks hijacked my work. What people seem to forget is that many of the interventions we implement HAVE been subject to RCTs before they are released – improved crop varieties go through years of testing, ORS, mosquito nets, vitamin A supplementation. All of these are well tested and proven to work before they even arrive on our radar.
Kevin, your question about what the alternative is, and how we can know what works otherwise, is well founded, but honestly, the best way to know if something is working is to ASK the people who are impacted by it, involved in it, and living it. I have repeatedly argued with my statistician colleagues that the best an RCT can do is tell you that there is a high degree of correlation between variables and outcomes – it cannot ascertain causality, or explain any causal mechanism. We have to come up with and test hypotheses for that ourselves. Best way to know if a farmer’s yield increased because of adopting a new technology is to ask her – did your yield increase? Why? Easy peasy lemon squeezy.
Thanks also to Sam for his poem.