
3 Advantages of a Modern Data Stack for Digital Development

By Guest Writer on August 17, 2023


Managing data in the development sector can be messy. Development organizations want reliable data to assess the effectiveness of their interventions, aiming for continuous improvement and evidence-based decision-making. They strive to expand beyond traditional surveys and Monitoring and Evaluation (M&E) by digitizing operations and taking advantage of innovations like mobile data collection to gain better insights.

This leads to an increase in both the amount of data and the diversity of sources (e.g. real-time, static, database, API) organizations need to manage and draw insights from. We’ve seen firsthand how organizations struggle to manage this complexity, which results in fragmented data systems instead of helpful insights. Data management becomes a burden instead of an enabler.

The data stack transition

Luckily, these data challenges are not unique to development. Best practices and analytics technologies from other sectors have emerged recently, forming the basis of the modern data stack. The improved tooling has enabled data-savvy organizations to adopt a more agile, scalable, cost-effective, and simpler approach to data management.

Software engineers use the term technology stack to describe the internal workings of an application or service by referring to the various libraries and programs that comprise the solution. In the data world, a data stack is a way of articulating all the systems needed to deliver an analytics solution.


In the development sector, a traditional data stack usually consists of some of the following components, sometimes called layers.

  • The first layer is data collection: tools to gather static and real-time data from multiple sources, such as ODK forms, custom applications, spreadsheets, or internal and external databases.
  • The second layer is data ingestion: scripts or programs transferring data from the different data sources to a centralized database (e.g. a data warehouse) using the Extract, Transform, and Load (ETL) process.
  • The third layer is data storage: where data is housed in a centralized repository. Since the data format is fixed during the ETL process, the data is generally optimized for specific, predetermined analyses.
  • The last layer is data analysis and visualization: reporting tools connected to the data warehouse to run reports and/or power dashboards, such as Superset, Power BI, or even Excel spreadsheets.

Sometimes the functions for multiple layers are performed by a single application, but the logical grouping is the same.


The critical change between a traditional and a modern data stack is in the approach to data ingestion and storage: from ETL to ELT. The introduction of cloud-based data warehouse technologies like BigQuery, Redshift, and Snowflake brought down the cost of running a data warehouse by orders of magnitude, while expanding speed and processing power.

As the capabilities of these databases rapidly improved, organizations began dumping raw data in their data warehouses, where it is transformed afterwards. The process of ETL (Extract Transform and Load) has become ELT (Extract Load and Transform), with dedicated tooling making this step easier than ever.
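To make the shift concrete, here is a minimal sketch of the ELT pattern in Postgres-style SQL: raw records are landed untouched in the warehouse, and the analysis-ready shape is derived later with a query that can be re-run or changed at will. The table and field names (raw_odk_submissions, household_visits, and the payload keys) are hypothetical, not taken from any particular project.

    -- Load: land submissions exactly as received, one JSON payload per row
    CREATE TABLE raw_odk_submissions (
        submission_id TEXT PRIMARY KEY,
        received_at   TIMESTAMPTZ DEFAULT now(),
        payload       JSONB  -- the untouched source record
    );

    -- Transform (done later, inside the warehouse): derive an analysis-ready table
    CREATE TABLE household_visits AS
    SELECT
        submission_id,
        payload ->> 'district'                    AS district,
        (payload ->> 'visit_date')::date          AS visit_date,
        (payload ->> 'children_vaccinated')::int  AS children_vaccinated
    FROM raw_odk_submissions;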

With data in a centralized place, data transformation is easier and more flexible for inquisitive analysts, especially when using dedicated tools.

One open-source tool in particular, DBT (the data build tool), has revolutionized the approach to data transformation. With DBT, analysts write transformation scripts in the well-known SQL language and use templates to relate them to one another. Indicators or metrics specific to the organization are defined once and can be governed like software: in a repeatable, testable, and documented code repository.
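For readers new to DBT, a model is simply a SQL SELECT statement saved to a file; the {{ ref() }} template declares which other models it depends on, and DBT builds everything in dependency order, materializing each result as a table or view. The model and column names below are illustrative only and continue the hypothetical example above.

    -- models/monthly_vaccination_summary.sql (illustrative DBT model, Postgres-style SQL)
    -- DBT materializes this SELECT as a table or view in the warehouse
    SELECT
        district,
        date_trunc('month', visit_date)  AS month,
        count(*)                         AS visits,
        sum(children_vaccinated)         AS children_vaccinated
    FROM {{ ref('household_visits') }}   -- dependency on another model
    GROUP BY 1, 2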

The tool was born out of large organizations working on massive cloud-based data warehouses, but it can be used effectively on Postgres and other conventional databases, and even on premises. Data transformation is now its own layer in the stack.

Modern Data Stack Advantages

Embracing the modern data stack can have tremendous benefits for organizations in the development sector. In our work at Ona, we use these principles to design data management systems for purposes as varied as national immunization campaigns, M&E for tree planting efforts, cash distribution programs, and ed-tech interventions.

Each system is different in the business logic and program objectives, but all require a data stack capable of integrating information from different sources and creating a set of metrics that are well understood by the program stakeholders.

Key lessons we have seen our clients learn from the switch in tooling include:

1. Simplify data ingestion.

Development projects rely on many data collection sources using systems that are less standard in the business world, such as DHIS2, Primero, RapidPro, ODK, CommCare, or custom applications. ELT-specific software can help create and maintain connectors to such sources. Information system players like OpenFN and Ona have invested in libraries of connectors for these tools that can be used to quickly support integrations.

2. Transformation layer combines data.

Many M&E activities require merging data from different sources, for example when collecting campaign data using ODK and comparing progress against targets in a planning spreadsheet. Using a tool like DBT, the merging logic is written in code: data from raw tables is combined into transformed tables that are used for reporting (see the sketch after this list), bringing three main advantages.

  • First, the data transformation steps can be tested and versioned following software engineering best-practices, leading to a more rigorous and maintainable process.
  • Second, the data can be processed closer to real time, meaning that dashboards and reports can be updated automatically without manual work once the system is set up.
  • Third, as reporting needs evolve, creating new transformations or indicators does not require any changes to the data connectors: the connectors are generic and reusable while the transformations are project-specific.
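As a rough sketch of this pattern, the hypothetical DBT model below joins raw ODK campaign submissions (declared as a DBT source) with planning targets from a spreadsheet loaded as a DBT seed. Every table and column name here is invented for illustration.

    -- models/campaign_progress.sql (hypothetical merging model, Postgres-style SQL)
    WITH collected AS (
        SELECT district, count(*) AS households_reached
        FROM {{ source('odk', 'campaign_submissions') }}  -- raw ODK data
        GROUP BY district
    )
    SELECT
        t.district,
        t.target_households,
        coalesce(c.households_reached, 0) AS households_reached,
        round(100.0 * coalesce(c.households_reached, 0) / t.target_households, 1) AS pct_of_target
    FROM {{ ref('planning_targets') }} AS t  -- spreadsheet loaded as a DBT seed
    LEFT JOIN collected AS c USING (district)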

3. Design for modularity and flexibility.

Because tools in a modern data stack are easy to integrate but independent from one another, it is also possible to swap similar components over time to meet new needs. If a new data ingestion technology or BI tool becomes more relevant for a project, an organization can use the new tool while maintaining all other elements of the stack.

Given the limits of budgets and tech skills in many development organizations, being smart and flexible with tool selection can allow entities to better manage their IT spend and occasionally invest in new tooling that will be more accessible to staff in the field and/or to donors.

Thinking about data as a stack and embracing new technologies can seem daunting, but organizations willing to adopt these best practices will benefit greatly. Tools for ELT and data transformation are mature enough for the development sector, enabling organizations to build an easier relationship with data in their M&E activities and beyond.

Knowing about these trends is the first step toward taking action, but some expert help can also be useful to understand the needs specific to any organization and how to manage an infrastructure with several technology layers. Building on our experience, Ona has a dedicated service to help organizations manage their data by setting up and hosting a modern data stack. We call it Canopy.

By Alessandro Pietrobon at Ona

Filed Under: Data, Featured

Written by
This Guest Post is an ICTworks community knowledge-sharing effort. We actively solicit original content and search for and re-publish quality ICT-related posts we find online. Please suggest a post (even your own) to add to our collective insight.

9 Comments to “3 Advantages of a Modern Data Stack for Digital Development”

  1. Njoni Philippe says:

    No to the discrimination and stigmatization of the LGBT community anywhere in the world. Let the community live according to their sexual orientation, and please ensure the safety of this community.
    Thank you

  2. Samuel Johnson-Scott says:

    I’m always interested in new approaches and architectures, and am keeping an open mind about the potential advantages of an ELT vs ETL approach, but I can also see significant challenges to applying this approach in resource-constrained contexts.

    Most of the systems we work with have very weak system governance mechanisms, and tend to operate in silos (something exacerbated by different donors each funding and supporting different systems). For example, a subsystem might be changed without proper consultation or collaboration with other subsystems. With an ETL approach, this will usually break interoperability or data warehousing solutions, so ingestion doesn’t occur – which is not a bad thing, as it then triggers the discussions that should have happened prior to the change (eg confirming the specific changes in scope and definition of the data).

    The risk I can see with an ELT approach is that data is still ingested, but no longer meets the format or definitions that are assumed by the target DW / storage system. Since responsibility for transformation has moved from the source system to the target system, the staff managing it are unlikely to fully understand the implications of these changes, and as a result, transformations and analyses can end up compromised.

    In an environment of weak system governance and silo management of subsystems, I feel that solutions which force collaboration between the teams managing source and target systems – eg a health information exchange using traditional ETL approaches – could be a safer option.

    But I’d be very interested to hear others’ thoughts on this (especially if you’ve already worked with ELT solutions in resource-constrained contexts).

  3. Matt Berg says:

    Hi Samuel,

    Thanks for the thoughtful comment. I’m Matt, a colleague of Alessandro at Ona.

    You raise a good and valid point. A lot of this comes down to how the system is governed. With an ELT approach, a change in the underlying ingested data would break the downstream views in the data warehouse managed by tools like dbt. So if different parties are involved you still need coordination.

    What we’ve found with ELT is that since you have access to the data in a more raw state, it’s much easier to undo mistakes or adapt to different types of analysis after the fact. If you get your ETL wrong, you may be dropping data that’s hard to get back later. Overall, we found ETL was a lot more fragile. With ELT you are just building views, so it’s a much safer and friendlier way to learn.

    The ability to manage the data transformations in code that we can version, unit test, and reuse has been a real game changer for us in terms of reproducing work and greatly improving reliability. Having an approach based primarily on SQL is, we feel, really important when it comes to supporting capacity building and transparency/clarity in how a system works, both of which ultimately help with handover.

    There is no magic bullet, as health programs around the world share similar needs when it comes to the sophistication required to do analysis that drives impact, as well as reliability and fidelity. We feel that helping to share the best practices and technologies that have democratized access to and use of data in enterprises around the world is a good first step. I definitely appreciate and understand the perspective and challenges you are flagging, though.

    Matt

    • Samuel Johnson-Scott says:

      Hi Matt,

      You make a good point about being able to easily correct mistakes if you’ve got the raw data – that’s definitely an important advantage. However, your point about being able to undertake additional types of analysis after the fact is actually my concern – this analysis should ideally be designed and directed by the teams managing the source system, who understand the idiosyncrasies of the data, but in my experience (with deadlines and resource constraints), the team working on the central stack can end up pumping out reams of analysis without fully understanding the very diverse data that they’re hosting.

      I do take your point that this all comes down to governance – if your data stack is managed collaboratively, and the source system teams retain ownership over the data and analysis in the central data stack, then that addresses the problem I’m raising. But putting that governance in place, sustainably, is always a challenge.

      And yes, I totally agree that being able to version-control configurations (which is unfortunately not always the norm) is a huge advantage for all sorts of reasons – quality assurance, sustainability, audit trails, etc. So kudos for making this central to your stack!

  4. Max Richman says:

    Good post! As someone who used to work in global development and now works on US climate change, I agree with the stated benefits of the ELT framework. This tool stack matches what I’ve been using at a few startups here in the US. Once you have transformed data, then begins the fun of driving insights and value.

    I’m curious how the development sector has been navigating the proliferation of BI tools (Tableau, Looker, Hex, Mode, Power BI, etc.) and the trade-offs between self-service versus analyst-driven approaches. Seems there’s a new tool popping up every week. Perhaps a topic for a future post 🙂

  5. Matt Berg says:

    Max – on the BI side there is a wealth of options, and I don’t think that’s necessarily a bad thing. We are actually contributing to the “problem” with Akuko (https://akuko.io), which has a map and data publishing focus we see missing a bit in the existing tools. The key thing we recommend is to decouple your data warehouse from your BI tool. Some tools allow you to manage data within the tool itself; we found this typically leads to issues in the long term. The DW approach allows you to analyze and view your data with whatever tool best meets your needs.

  6. Donald Lobo says:

    We’ve released an open source platform that combines many of the ideas above, called Dalgo; website coming soon.

    The previous internal name was Development Data Platform, all the information is here:

    https://projecttech4dev.org/ddp/

    All the code is on GitHub: https://github.com/devdataplatform/