We know that students receiving one-on-one tutoring outperform peers who only have traditional classroom instruction. However, individualized learning programs are cost-prohibitive in most low- and middle-income countries. New educational technologies can potentially provide cost-effective solutions.
Can we use artificial intelligence to provide customized mentoring experiences for learners?
That was the question posed by World Bank researchers in Edo State, Nigeria, when they designed an educational technology intervention that sought to enhance educational outcomes using free generative AI tools.
In mid-2024, 800 first-year senior secondary students attended after-school English classes in which teachers led them through interactions with Microsoft Copilot, a generative AI tool built on OpenAI's GPT models, to master selected topics spanning both grammar and writing tasks.
The teachers acted as “orchestra conductors” to guide students in using GenAI, starting each session with a suggested prompt. As the students interacted with Microsoft Copilot, the teachers mentored them, offering guidance and additional prompts. They also led brief reflection exercises at the end of each session.
4 Impressive Results from EduTech RCT
This is the first study to assess the impact of generative AI as a virtual tutor. Now the World Bank researchers are sharing the results of their randomized evaluation. The results are overwhelmingly positive and demand that we look at GenAI for improving learning outcomes in many other situations.
1. Generative AI Students Significantly Outperformed Peers
The empirical findings demonstrated positive impacts on educational attainment. Following the six-week intervention period, participants underwent a comprehensive written assessment evaluating three critical domains: English language proficiency – the intervention’s primary focus – artificial intelligence comprehension, and digital competency.
Statistical analysis revealed that the GenAI treatment group participants significantly outperformed their non-AI control group counterparts across all measured domains, with particularly robust effects in English language acquisition, the program’s core objective. These results show that generative AI can be a powerful supplementary educational tool when implemented within a structured pedagogical framework and with teacher oversight.
2. Higher Performance Across Subjects and Genders
Perhaps most striking were the intervention’s spillover effects beyond the program’s immediate scope.
Treatment group participants demonstrated superior performance on their standardized curricular assessments, which evaluated content substantially broader than the intervention’s focused curriculum.
This suggests that English language proficiency or AI interaction competencies may improve learning across diverse subject areas.
In addition, the treatment group showed a reduction in gender-based achievement disparities. Female students, who initially demonstrated lower baseline performance relative to their male counterparts, exhibited accelerated gains, indicating the program’s potential as a mechanism for reducing educational gender inequities.
3. Deeper Engagement Brought Greater Gains
Analysis of attendance data revealed a strong positive correlation between program participation frequency and academic achievement gains. The researchers quantified the relationship between participation and performance, noting several exogenous factors that affected attendance, including seasonal flooding, industrial action by educators, and students’ economic obligations.
The data demonstrated a consistent positive relationship between attendance frequency and learning outcomes, with each additional day of participation yielding statistically significant improvements in measured competencies.
Notably, the marginal benefits of attendance showed no evidence of diminishing returns as the program progressed, suggesting that the intervention had not reached its point of optimal duration. In fact, extending the intervention period beyond its six-week duration could potentially yield proportionally greater educational benefits.
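To make the attendance relationship concrete, here is a minimal sketch of how a per-day gain could be estimated with an ordinary least squares line. The data, the variable names, and the choice of a simple one-variable fit are all illustrative assumptions; the article does not describe the researchers' actual estimation method, and a plain slope like this does not account for why some students attended more than others.

```python
from statistics import mean


def ols_slope_intercept(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical, deliberately tidy data (NOT from the study):
# days of attendance and the corresponding gain in test score points.
days_attended = [2, 4, 6, 8, 10, 12]
score_gain = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

slope, intercept = ols_slope_intercept(days_attended, score_gain)
print(f"Estimated gain per additional day: {slope:.2f} points")
```

A constant slope across the whole attendance range is what "no diminishing returns" would look like in such a fit; a flattening curve at higher attendance would suggest the opposite.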
4. Generative AI Learning Gains Were Remarkable
The magnitude of observed learning gains was remarkable, with effect sizes approximating 0.3 standard deviations. To contextualize this impact:
- The intervention accomplished in six weeks what typically requires two academic years of traditional instruction.
- This program’s efficacy surpassed 80% of documented initiatives, including highly regarded pedagogical approaches.
The significance of these results is further amplified when considering both the abbreviated duration of the intervention and the conservative nature of the evaluation methodology, which likely produced downward-biased estimates of the program’s true impact.
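For readers unfamiliar with effect sizes expressed in standard deviations, the sketch below shows one common way such a figure is computed: the difference in group means divided by a pooled standard deviation (often called Cohen's d). The scores are invented for illustration, and the pooled-SD formula is an assumption; the study may have standardized its estimates differently, for example against the control group's standard deviation.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treatment, control):
    """Difference in means divided by the pooled sample standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical test scores (NOT from the study).
control_scores = [50, 55, 60, 65, 70]
treatment_scores = [53, 58, 63, 68, 73]

effect_size = standardized_mean_difference(treatment_scores, control_scores)
print(f"Effect size: {effect_size:.2f} standard deviations")
```

Expressing the gain in standard deviations rather than raw points is what allows the comparison with other interventions that use different tests and scales.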
PIONEER Generative AI Program Framework
The PIONEER framework deployed in the Edo State pilot initiative is a useful guide for educational technology practitioners seeking to replicate its success.
You’ll note that the framework builds a comprehensive ecosystem comprising engaged educators, robust infrastructure, rigorous monitoring protocols, and systematic evaluation mechanisms. At the same time, the minimal marginal costs and flexible design enable broader deployment across diverse educational settings.
The PIONEER framework includes:
1. Prioritizing students.
Student engagement is key, with participants exhibiting marked enthusiasm for AI-enabled learning environments. The response transcended mere novelty, as students organically discovered innovative approaches to leverage the large language models.
2. Inspiring teachers.
Teachers are key, and teacher sentiment moved from initial skepticism about AI integration to recognition of its pedagogical potential and its role in enhancing student learning outcomes. A particularly promising development was the spontaneous formation of professional learning communities focused on sharing effective practices. Teachers increasingly view the technology as an asset that augments rather than threatens their professional roles.
3. Optimizing immersion.
While the six-week implementation yielded valuable results, a longer duration would have enhanced program effectiveness. Initial weeks were necessarily devoted to fundamental digital literacy skills, including email setup, Microsoft Copilot account creation, and basic computer familiarization, given many students’ limited prior exposure to technology. An extended timeframe would allow for deeper engagement with core learning objectives.
4. Nurturing necessary infrastructure.
The program highlighted the critical importance of robust technological infrastructure. Intermittent power supply and connectivity issues, particularly prevalent during the rainy season, created significant impediments to student-AI interaction. Redundant power systems and reliable internet connectivity emerged as essential prerequisites for maintaining pedagogical continuity.
5. Empowering participants with relevant materials.
The success of innovative educational initiatives hinges on comprehensive support systems. The program developed specialized toolkits for both educators and students, incorporating structured guidance for productive AI interaction. The carefully crafted prompt engineering protocols enhanced the relevance of AI responses, incorporating contextually appropriate examples while maintaining pedagogical coherence.
6. Enhancing implementation.
The program acknowledged the inherent challenges in translating theoretical design into practical application. A dedicated monitoring team provided continuous oversight and real-time feedback, ensuring fidelity to program objectives while allowing for necessary adaptations.
7. Reducing AI risks.
While embracing AI’s potential for educational innovation, educators identified several critical concerns and educational risk areas, including overdependence, algorithmic hallucinations, and potential misuse. The development of targeted mitigation strategies proved essential in helping students navigate this novel educational technology.