Why Do RCT in ICT4D When You Can A/B Test for Faster Results?

By Wayan Vota on April 11, 2014

Recently, D. Jerome Martin tweeted that he was happy that 50% of USAID’s Development Innovation Ventures grantees were conducting randomized control trials in their interventions. He felt it was a move in the right direction, that big data drives big impact. I disagree.

I think it’s great we are measuring more of our activities in development – we need to end Oscar Night Syndrome – but I don’t think RCTs are necessarily the best way to go. There are many critiques of RCTs (here, here, and here are a few), but my main issue is timing. RCTs take years. In ICT4D, technology changes each quarter, if not faster. An RCT started with BlackBerry phones but not tablets in 2010 and published today? Useless.

Learning From A/B Testing in Website Design

In web design, we have A/B testing, where you develop two (or more) versions of a page and test to see which one has the better response rate. Google is a big fan.

What if interventions were A/B tested? Say the top two ideas were awarded pilot funding and the one with the best intervention result received full funding to scale? Or if USAID required A/B testing of interventions throughout the program life, and we were all honest about funding the effective one, killing HiPPOs in the process?

Wouldn’t that make A/B testing a faster, better way to do aid, especially technology-infused aid, than randomized control trials?

My point here is not to rule out RCTs in development, only that we often do not have the luxury of years for an RCT experiment to run its course, neither on the timescales of development nor on those of technology. So let’s not slow down innovation by stifling ICT4D with RCTs.

A/B Testing Results in Educational Technology

Youth Impact, an organization committed to improving education outcomes, has implemented A/B testing over the past seven years in Botswana, focusing on interventions that improve student attendance and engagement.

One of their key findings has been the ability to implement low-cost, high-impact interventions at scale. For example, they found that sending text message reminders to parents about their children’s school attendance significantly improved student attendance.

This intervention, which was tested against a control group that did not receive the messages, provided immediate and actionable insights into what works, helping program designers refine their approach quickly and cost-effectively.
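As a rough illustration of how a comparison like this could be checked, here is a minimal sketch of a two-proportion z-test in Python. The function name and the attendance counts are hypothetical, made up for illustration only; they are not Youth Impact’s data or method.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Compare two groups' success rates.

    Returns (difference B minus A, z statistic, two-sided p-value),
    using the pooled standard error under the null of equal rates.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


if __name__ == "__main__":
    # Hypothetical counts: students attending in the control group (A)
    # versus the group whose parents received reminders (B).
    diff, z, p = two_proportion_z_test(400, 500, 430, 500)
    print(f"difference={diff:.3f}, z={z:.2f}, p={p:.4f}")
```

A test like this only makes sense if participants were randomly assigned to the two groups, and it assumes each student’s outcome is independent of the others.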

Benefits of A/B Testing in ICT4Edu Programs

The flexibility and scalability of A/B testing are major advantages in educational settings. Programs can be adjusted in real-time, meaning that ineffective methods can be quickly abandoned or improved, while successful interventions can be scaled up.

Additionally, A/B testing allows for the exploration of multiple variables simultaneously, offering a granular understanding of which specific elements of an intervention are driving its success or failure. This level of detail helps policymakers and educators make data-driven decisions that optimize both student outcomes and program budgets.
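One common way to test several variables at once is a factorial design, where each participant is randomly assigned to one combination of options. The sketch below shows a balanced random assignment; the factor names (message timing and tone) and the function name are hypothetical examples, not taken from any program described here.

```python
import itertools
import random


def assign_factorial(participant_ids, factors, seed=0):
    """Randomly assign participants to every combination of factor levels.

    `factors` maps a factor name to its list of levels. Participants are
    shuffled with a seeded generator and then dealt round-robin across
    the cells, so cell sizes differ by at most one.
    """
    rng = random.Random(seed)
    names = list(factors)
    cells = list(itertools.product(*(factors[name] for name in names)))
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {
        pid: dict(zip(names, cells[i % len(cells)]))
        for i, pid in enumerate(ids)
    }


if __name__ == "__main__":
    # Hypothetical 2x2 design for an SMS reminder program.
    design = {
        "timing": ["morning", "evening"],
        "tone": ["neutral", "encouraging"],
    }
    for pid, cell in sorted(assign_factorial(range(8), design, seed=1).items()):
        print(pid, cell)
```

Fixing the seed makes the assignment reproducible, which helps when the allocation needs to be audited later.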

Another important benefit of A/B testing in education is its potential for resource allocation. Educational systems in low- and middle-income countries often face significant budget constraints, so understanding which interventions deliver the best return on investment is crucial.

By using A/B testing, policymakers can focus their limited resources on the interventions that have the most impact, thus improving overall program cost-effectiveness.

Use A/B Testing in Humanitarian Programs

The use of A/B testing in international development holds promise for expanding access to quality services in low-resource settings. It offers a pragmatic, data-driven approach to improving development outcomes, enabling programs to be both impactful and efficient.

By fostering a culture of experimentation and rapid learning, humanitarian digital systems can become more adaptive and responsive to the needs of constituents. A/B testing is not a silver bullet, but it is an important tool that, when used effectively, can complement other research methods and lead to meaningful, scalable improvements in education.

Filed Under: Management

Written by
Wayan Vota co-founded ICTworks. He also co-founded Technology Salon, MERL Tech, ICTforAg, ICT4Djobs, ICT4Drinks, JadedAid, Kurante, OLPC News and a few other things. Opinions expressed here are his own and do not reflect the position of his employer, any of its entities, or any ICTWorks sponsor.

8 Comments to “Why Do RCT in ICT4D When You Can A/B Test for Faster Results?”

  1. Thanks for the post Wayan.

    Your point of view since I tweeted from #DIL2014 has helped to grow my understanding of how and when data is productive in development. I’m taking a course, Evaluating Social Programs, through JPAL edX to better understand the role of data in development. I will say in general that for tech solutions, evaluation should be quick and low friction to reach validation as quickly as possible.

  2. Linda Raftree says:

    I don’t think they are exactly the same, but I also wonder why there isn’t more A/B in development. Marketers do A/B testing all the time with direct mail – have been for years and years.

    Now I see that some folks are testing different SMS nudges (for behavior change comms) to see which ones work better — and I see people lauding it as if it’s a new and innovative concept.

  3. Richard Hoffbeck says:

    I think you’re misunderstanding the specifics of RCTs. The A/B testing you mentioned at Google is just an RCT with two intervention groups. Google can do that fairly quickly because they get a lot of page views and they can immediately measure the conversion rate. RCTs are expensive when the outcome is realized over long time periods or when the measured effect is weak. Most of my experience with RCTs is in cancer screen studies that can take decades to come to a conclusion.

    Second, just because the technology changes doesn’t mean that the trial is meaningless. In cancer screen studies we often know that the technology will be obsolete by the time the study is completed. Usually that is OK. We aren’t studying the technology, we’re studying the screening. In your example, the fact that the study started with Blackberries and now we have iPhone & Android phones isn’t likely to be a problem. Most likely you were using the BB to deliver some information or service and the results would still be valid.

    Third, we know that the results from non-experimental studies are often misleading. An RCT helps to mask out the unobserved differences between participants. If you try to do a simple comparison between two programs it may be that Program A is more effective than Program B, or it may be that the differences were differences in Population A vs Population B, or more likely a combination of both. If you want to do A/B at the program level you effectively have two observations and almost no statistical power.

    As an example, a friend of mine was tasked with evaluating the effectiveness of using assigned contact people to increase follow-up compliance rates for a cancer screening program in a medium-sized hospital in SE Africa. When test results indicated the need for a follow-up visit, patients would be randomly assigned either to get the standard follow-up contact or to be assigned to an individual who’d provide a more personal and persistent approach to getting them back in the clinic. After a relatively short period of time, think months, not decades, you could make an informed decision whether it was worth the expense of the second approach.

    RCTs aren’t a silver bullet but they are one of the better ways we have of measuring effects in a complex environment with large amounts of important, but unmeasured, variables.

  4. I did engineering and economics at Cambridge. Later I became a Chartered Accountant and had a US based career in international corporate management before moving on to consulting in socio-economic development for organizations like the World Bank, the United Nations and others.

    I have always been interested in getting the most (good) results from the least use of resources, and this includes the collection and use of data to improve performance and decision making. Using these criteria Randomized Control Trials (RCTs) in the development assistance arena are of quite low utility and very expensive.

    It is true that international development is complex, and the idea of the RCT is to measure relative performance in a complex environment, but RCTs are not the only way to assess performance. I have been very effective at getting improved performance in many different situations and have made advanced use of ‘common sense’ and ‘management by walking around’. Huge improvements in performance can be achieved by very simple oversight and good basic accounting, and in much of the international development arena, all of this is missing.

    Peter Burgess – TrueValueMetrics
    Multi Dimension Impact Accounting

  5. Thanks for providing this forum and for the provocative post. As the manager of the project Linda mentioned, I’d like to weigh in. RCTs and A/B testing should both be pursued by the development community, but for different aims.

    As Richard correctly outlined, RCTs are usually intended to measure long-term outcomes that are difficult to measure using only administrative data: composition of household finances, responses to shocks, propensity for risky behavior, etc. A/B testing is limited to narrower outcomes that are administrative and immediate: conversions, clickthroughs, logins, sales. Welfare impacts take longer to measure, a point to which I’ll return later.

    Another shortcoming of A/B testing is that it often lacks theoretical rigor, though not always. Companies will vary minute details in seemingly arbitrary ways and learn results that rarely contribute to a deeper understanding of human behavior. Sites like whichtestwon.com are fun and interesting, but are neither derived from nor contribute to a complex model of human behavior. In many cases this is impossible because we simply don’t know enough about the types of people who click. The great benefits of A/B testing – large sample sizes, rapid prototyping and instantaneous results – have contributed to its greatest shortcoming, since with thousands of available tests per day, there is little time or need to choose carefully among theoretically plausible options.

    The SMS messaging project that Linda refers to is an example of a hybrid project – one that approaches the rapid A/B testing model while drawing from deeper insight in behavioral economics. We are testing a series of nudges that are supported by previous evidence and theory, using a more rapid timeframe than most full-scale RCTs allow and with more of a focus on administrative outcomes. The methodology itself is not what’s innovative about the project. The result—a deeper understanding of how messaging impacts financial behavior, and shared publicly—is the real innovation here.

    A/B testing is completely appropriate—and underutilized—when it comes to prototyping tech-based interventions, and can certainly be used for great effect when paired with qualitative and quantitative field techniques for designing a fantastic product. But for evaluating the impact of the product itself, an RCT is still the preferred method.
    Here’s one clear illustration of why.

    An SMS-based mHealth campaign in Uganda was evaluated in 2009-2010. Its aim was to increase knowledge about sexual health and decrease risky sexual behavior in youth aged 18-35. The mixed-methods evaluation found that, surprisingly, the service led to an increase in promiscuity and infidelity, and no change in the perception of norms. Had the service relied only on A/B testing for prototype design without the benefit of measuring impact, we would be left with an easy-to-use service that ran counter to its mission.

    The working paper for that study can be found here: https://www.poverty-action.org/sites/default/files/mixed-method-evaluation-passive-mhealth-sexual-information-texting-service-uganda.pdf

    • Linda says:

      Hi Aaron – just to clarify – I wasn’t referring to your project but I’m glad to know about it! I was referring to some work being done by government to test messages on encouraging citizens to pay taxes. There are some nice write ups on what was and wasn’t effective. But my comment was more about how I had overheard others talking about this as a “new idea” when I didn’t think it was a “new idea” at all, really!

    • Richard Hoffbeck says:

      Hi Aaron,

      I’m afraid that I only had time to skim the paper but my impression is that you randomized at the village level and then analyzed the data at the individual level. Is that correct? I wonder if you used n=60 rather than n=2,424 whether you’d still have contradictory results?

      I’m getting the impression that a lot of people think an RCT has to be some large scale grand affair with a lot of management and overhead. Anytime you randomly assign people to different ‘treatments’ you are doing a randomized trial. When Google randomly decides to show you page layout 1 instead of layout 2 that is an RCT. When Target gives you a bunch of coupons at the checkout I’m pretty sure they’re running an RCT.

      Businesses are doing short-term RCTs all the time. I think it was Steve Wynn who said there were only two things that would get you fired from one of his casinos – stealing, or choosing a control group that wasn’t sufficiently random.

      You’re right that you can’t do an RCT on most retrospective data like admin data. And ethics makes RCTs a no-go for a lot of things we want to study so we have to let people self-select into different risk groups and then do observational studies. But it seems like there are plenty of opportunities to use experimental methods in evaluating interventions, even if they’re only a small part of the overall project.

  6. Alex Rutto says:

    In my opinion, each approach has its own strengths and weaknesses depending on the evaluation purpose/objective. There are instances when you only need rapid appraisal methods to measure impact, and some situations when you require approaches that can give you the counterfactual rigour needed to measure impact. In my practice, I have always gone for mixed methods approaches depending on what the evaluation objective is. With RCTs or non-RCTs, you can still measure impact given the constraints of time and resources, but the question is always how confident we are in the results from such measurement approaches.