As machine learning and data science applications grow ever more prevalent, there is an increased focus on data sharing and open data initiatives, particularly in the context of the African continent.
Many argue that data sharing can support research and policy design to alleviate poverty, inequality, and derivative effects in Africa. Despite the fact that the datasets in question are often extracted from African communities, conversations around the challenges of accessing and sharing African data are too often driven by non-African stakeholders.
These perspectives frequently employ deficit narratives, often focusing on lack of education, training, and technological resources in the continent as the leading causes of friction in the data ecosystem.
We argue in Narratives and Counternarratives on Data Sharing in Africa that these narratives obfuscate and distort the full complexity of the African data sharing landscape. Obstacles, issues, and challenges of data sharing concerning the African continent are multifaceted. Questions such as, ‘Is data sharing beneficial? Valuable?’ need to be contextualized by sub-community: We need to also ask ‘for whom?’
Throughout this work, we have examined such questions in a manner that is informed by the perspective of the stakeholders at the bottom of the data ecology chain – the hidden part of the iceberg.
There are numerous causes for challenges in this ecosystem. We focus on three broad and interrelated challenges that surfaced from our interviews and research. In particular, we explore: power asymmetries, issues of trust, and the need for contextual knowledge.
1. Data Sharing Power Asymmetries
Historically, traditional colonial powers sought unilateral domination over colonized people through control of socio-economic affairs and the reinvention of social orders for their own benefit.
In the current day and age, when data and digital technologies are powerful instruments, colonial-era oppression has been reincarnated in various data practices, including data collection, sharing, and analysis. The pursuit of data accumulation, especially by Western technology monopolies – both the scale and the manner in which data is being collected – has raised questions regarding unprecedented wealth accumulation and power struggles.
Currently, a significant proportion of Africa’s digital infrastructure is controlled by Western technology powers such as Amazon, Google, Facebook, and Uber. Traditional colonial powers pursued colonial invasion through justifications, such as “educating the uneducated.” Data accumulation processes are accompanied by similar colonial rhetoric, such as “liberating the bottom billion,” “helping the unbanked,” “connecting the unconnected,” and using data to “leapfrog poverty.”
However, this rhetoric not only preserves historical coloniality dressed in data, but also perpetuates deficit narratives. These objections have been articulated by scholars and technologists within the continent.
For instance, Michael Kimani writes: “I find it hard to reconcile a group of American corporations, far removed from the realities of Africans, machinating a grand plan on how to save the unbanked women of Africa. Especially when you consider their recent history of data privacy breaches (Facebook) and worker exploitation (Uber).”
Colonial legacies and power imbalances embedded in data practices may appear less obvious and more nuanced within scientific research settings. Within the context of global health research, there has been growing concern about power imbalances in authorship, which has negatively affected how research undertaken by local researchers in low-income and middle-income countries (especially in Africa) is perceived.
Academics in the Global North are bestowed with the power to define what constitutes “legitimate knowledge,” “good research,” “standard method,” or a relevant and worthwhile problem. As a result, African researchers are left little room, if any at all, to compete on the global stage, even in matters concerning the African continent.
Following extensive analysis of interviews with senior university research managers in Zimbabwe, and of a public roundtable on Structural Inequalities in Global Academic Publishing, Jeater finds that “When we ask who gets to represent the ‘African perspective,’ we find it is decreasingly unAfrican.”
Resource inequalities, hegemonic academic standards that undervalue Southern research traditions, and the unilateral power Northerners hold to validate research, all contribute to structural obstacles that amount to systemic exclusion of African scholarship from global health research. In a data ecosystem built on such firm yet invisible power asymmetries, stakeholders already in a position of power not only benefit the most, but also make data accessibility inequitable.
Imbalances in authorship and power asymmetries constitute a continuation of the colonial project in global health research, creating fundamental trust issues concerning data sharing. Underneath the power asymmetries in global health research partnerships between researchers from the North and South lies what Seye Abimbola calls “the foreign gaze.”
In his analysis of this concept, Abimbola asks questions such as “who we are as authors, who we imagine we write for (i.e., gaze), and the position or standpoint from which we write (i.e., pose).” Furthermore, power asymmetries which occupy an important space around trust and data sharing often operate in invisible ways and take many forms.
In a similar study that explores power imbalances between the global North and South, using Zambia as a case study, Walsh et al. report that power imbalances and inequalities manifest at all stages of research. This includes everything from funding to agenda setting, data collection, analysis, interpretation, and reporting of results.
In underfunded Zambian health research, where up to 90% of the funding comes from external funders, the bargaining power rests with the funders, leaving Zambian scholars little room for negotiation. Power asymmetries are also observed in the personas above.
In “On Good and Harm”, we find the European NGO at the top of the power hierarchy making key decisions regarding the data concerning yetet’ebek’e communities. Similarly, in the “Livestocks and Livelihoods” persona, the foundation from the Global North which provided grants dedicated to research in Wolonda holds much greater authority and power compared to the villagers who are the source of the data.
Power asymmetries, historically inherited from the colonial era, often get carried over into data practices and manifest themselves in various forms, from imbalanced authorship to uneven bargaining powers that come with funding. Having said that, power asymmetries are not limited to historical contexts only.
Within a given research project, for example, one can observe that power asymmetries exist between project managers and data analysts; data analysts and data collectors; data collectors and research participants. Each of these relationships carries its own caveats and implications for trust, affecting data processes from data quality to data sharing.
Examining the data production and consumption process in the context of Malawian demographic surveys, Crystal Biruk makes these power asymmetries, hierarchies, and structural inequalities visible. Although obscured by partnership rhetoric, Malawian demographic mapping in fact embodies structures with an unequal division of labor.
Biruk explains: “[B]eing on the ground in the field has the largest effect on data but—from the perspective of researchers—the activities of fieldworkers are framed as menial labor performed by easily replaceable and interchangeable individuals. The local expertise they offer, then, is not in designing research or writing proposals but comes as an additive to a project conceived in a distant office. These hierarchies are embedded in political-economic structures that privilege the knowledge work that is the purview of Western academic researchers over the so-called unskilled labor performed by field workers. Meanwhile, Malawian research collaborators occupy a middle space that is both constructed by and fraught with power and economic inequalities.”
2. Issues of Trust in Data Sharing
“Data often move at the speed of trust.” – Hamilton and Hopkins. Sharing data between different stakeholders hinges on trust. Trust is the fundamental component of all relationships in a data sharing ecosystem.
While trust, or lack thereof, has been identified as a key challenge that hampers data sharing, there remains much to be examined about the role of trust and how it manifests in relationships between various stakeholders in the African data sharing ecosystem.
Data sharing practices which operate in the absence of knowledge of local norms and contexts contribute – albeit indirectly – to the erosion of trust among stakeholders in the data sharing ecosystem. Initiatives coming from outside, with their own assumptions, interests, and objectives, tend to be met with suspicion by local communities.
The persona on “Soil and Apartheid” captures this in a stark manner; due to the apartheid regime, the doctoral researcher finds that Black farmers in Nova Africa suffered unjust land grabs. Such historical injustice plays out in the farmers’ reluctance to share soil data due to lack of trust. Resource inequalities and colonial oppressive histories instill deep mistrust towards open data and data sharing initiatives.
Although African researchers are generally supportive of data sharing, they are considerably less enthusiastic about open data, expressing concerns that open data compromises national ownership and reopens the gates for “parachute-research” (i.e., Global Northern researchers absconding with data to their home countries). Such concerns are not unwarranted.
In fact, the findings from a recent study from Mbaye et al. affirm this fear. Mbaye et al. performed a systematic review examining African author proportions in the biomedical literature published between 1980 and 2016, in which research was originally done in Africa. The authors found that African researchers are significantly under-represented in the global health community, even when the data originates from Africa.
A common threat is parachute-research, in which non-African researchers benefit from data sharing and open data, are afforded the opportunity to narrate African stories (in some cases also contributing to deficit narratives), and publish scientific work using African-generated data available through open access initiatives – all while ignoring the contributions of African communities and scholars. Recent work within the medical sciences published on the Ebola outbreak in the Democratic Republic of the Congo can be cited as a prime example in this regard.
Ideal data sharing initiatives, policies, and principles weigh the benefits and potential risks, and strive to find a reasonable balance. Benefits and risks also vary depending on the types of data being shared. For example, the use of genomics data poses the potential for far more detrimental risk to the individual providing the data, to the researcher, to the institution, and to the community compared to soil data.
The issues of trust as a challenge to data sharing, likewise, vary in degree depending on the data in question. Trust is a relatively significant challenge to sharing sensitive biomedical or health data and less significant when it comes to, for example, environmental sciences data.
Noting this concern, Walport and Brest emphasize, “people often agree to provide sensitive data because they trust the researcher and believe the researcher would not use the data in a way that would be harmful towards them. However, there is a concern that the trust may not carry over when the data are shared”.
3. Data Contexts and Local Knowledge
Contexts are a crucial element in making sense of data and data sharing. Yet data – within data science and machine learning, particularly – are often stripped of their contexts. In the process of data cleaning, for example, information that provides context about the specific background from which data are collected and how datasets are structured can be lost.
The importance of context for datasets has been explored by scholars such as Loukissas, who argues that we should shift to thinking in terms of data settings instead of datasets. Contexts are crucial to understanding data fully; data sharing practices that discard contexts risk becoming irrelevant and potentially harmful to local communities, as we see in each of the above personas.
Consequently, the context of the data itself – which provides a complete image – and awareness of local norms, cultures, and histories constitute crucial elements in a responsible data sharing practice. Thus, for data sharing practices to benefit the underserved, such groups’ welfare and interests need to be placed at the center stage.
However, our interviewees nearly unanimously agree that there remains a lot to be done to acknowledge and incorporate the interests, norms, and contexts of those at the bottom of the iceberg. In fact, oftentimes, certain groups such as data subjects are hardly recognized as stakeholders at all.
We see various levels of disregard for context and local norms displayed in all the personas portrayed in the previous section. In “On Good and Harm”, the failure to ground data collection in a communal understanding of privacy, for example, resulted in the NGO exposing the yetet’ebek’e community to various risks in the process of data sharing.
Lack of common language and understanding of local norms creates a challenge for intra-continental data practice, as shown in “The Journey of African Scholars.” In both cases, the doctoral candidate faced difficulties understanding the local norms, which played a role in the community’s lack of trust towards her.
Similarly, in the “Livestocks and Livelihoods” persona, we observe that the funding foundation (from the Global North) failed to consider the context in which the data was collected. In summary, data sharing calls that are not aware of local norms, contexts, and culture, when imposed from the outside, constitute a form of Western-centrism and colonialism.
Is African Data Sharing Beneficial?
Returning to the question “is data sharing good/beneficial?” we argue that responsible data sharing practices must, first and foremost, benefit local communities and experts, with a focus on those at the bottom of the iceberg. In recent years, the African continent as a whole has been considered a frontier opportunity for building data collection infrastructures.
The enthusiasm around data sharing, and especially in machine learning or data science for development/social good settings, has ranged from tempered discussions around new research avenues to proclamations that “the AI invasion is coming to Africa (and it’s a good thing)”. In this work, we echo previous discussions that this can lead to data colonialism and significant, irreparable harm to communities.
As we learned from the rich body of previous work, our experiences, interviews, and the personas, data sharing practices are divergent, ad hoc, and at times contradictory, and they can violate community values for data use.
As machine learning and data science move to focus on the Global South and especially the African continent, the need to understand what challenges exist in data sharing, and how we can improve data practices, becomes more pressing.
By Rediet Abebe, University of California, Berkeley; Kehinde Aruleba, University of the Witwatersrand; Abeba Birhane, University College Dublin & Lero; Sara Kingsley, Carnegie Mellon University; George Obaido, University of the Witwatersrand; Sekou L. Remy, IBM Research – Africa; Swathi Sadagopan, Deloitte