We all want to be more responsible with the data that we collect and manage. This can be data that we create within our organizations, data we collect from our constituents, or data we manage that’s from other stakeholders we interact with.
However, “responsible data” is a vague term that needs more refinement to be useful. Let’s make it more concrete with three steps you can take now to start on your responsible data journey.
Start with Acknowledging Responsible Data Tensions
I always define being responsible with data as balancing the tensions between three different objectives:
- Protecting privacy and security needs
- Using data to benefit our work and our constituents
- Supporting transparency and openness
We need to start by acknowledging that these tensions exist and that there are rarely simple answers on how to manage them. These tensions are also context-specific: they depend on the topic, existing inequalities and marginalization, the presence of conflict, and other factors.
The following are three quick examples of tensions that come from our own experiences and issues we’ve read about.
Granularity vs. Privacy
Person- or household-level data is very useful for complex and nuanced analysis now and over time. But even if we remove names, national ID numbers, and/or addresses, the likelihood of re-identifying an individual can be very high, especially if the sample sizes are small and/or we cross-reference the data with other individual-level data (including informal knowledge from the community).
Example: A web app is given to pregnant employees to help them manage their pregnancies. A de-identified, aggregated version (disaggregated by pregnancy stage and employer) is shared with employers, who use this data to improve their support to pregnant employees.
One employer asks the HR department to cross-reference the aggregated data against their HR records and their own knowledge. Due to the small sample size, the employer is able to determine which employees are represented, and is therefore privy to private information about their pregnancies and health status. This information is then used when making decisions about promotions and salary increases.
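One common safeguard against this kind of re-identification is small-cell suppression: withholding any aggregated count that falls below a minimum group size before the data is shared. The sketch below is illustrative only. The field names (`employer`, `stage`), the threshold of 5, the function name `suppress_small_cells`, and the sample records are all assumptions made for this example, not part of any real app, and suppression on its own does not guarantee anonymity.

```python
from collections import Counter

def suppress_small_cells(records, keys, min_count=5):
    """Count records per group and withhold counts below min_count.

    records: iterable of dicts; keys: the fields to disaggregate by.
    Returns {group_tuple: count or None}, where None means suppressed.
    """
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {group: (n if n >= min_count else None) for group, n in counts.items()}

# Hypothetical, made-up records for illustration only.
records = (
    [{"employer": "A", "stage": "first trimester"}] * 12
    + [{"employer": "A", "stage": "third trimester"}] * 2
    + [{"employer": "B", "stage": "first trimester"}] * 7
)

report = suppress_small_cells(records, keys=("employer", "stage"))
for group, count in sorted(report.items()):
    print(group, count if count is not None else "suppressed")
```

Here the two-person group at employer A is withheld, because a count that small could be matched against HR records. The right threshold is a context-specific judgment, not a fixed rule.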
Not collecting PII vs Security
Not collecting real names or identifying information (like IP addresses) increases privacy protection, but not having this information can make it impossible to compare individual results over time, make sure that data subjects are who they say they are, and/or follow up in the case of bad data or misuse.
Example: A project creates a web forum focused on discussing LGBTQ rights and challenges in a country where there is a lot of stigma. Because of strong concerns about privacy protection, it is decided that users can create their own usernames/passwords in ways that anonymize their identity, and no other identifying information (phone number, email address, IP address) is captured by the system. Users feel free to share highly personal information online because only other users of the forum have access and they are told their posts cannot be linked back to them.
One of the users misuses their access, copies these private posts, and shares them publicly. Because there is no way to re-identify individuals, there is no way to find the culprit, alert innocent users outside of the website that the breach occurred, or put in place redress for potentially vulnerable posters.
Transparency vs Privacy
Publishing land ownership, government services, and taxation records of individuals is important to fight against corruption and improve government accountability, but this information can be misused by bad actors and embarrass people in power. What must be published vs redacted can be misunderstood in ways that hurt the already marginalized and help the already powerful.
Example: A government passes an open data/government law. This results in land ownership records being published online, prompting important conversations in parliament and the press about land reform.
Powerful landowners are unhappy that their ownership is so visible, so they put pressure on the government to redact the records for “privacy” concerns. Criminals also use this information to target wealthy landowners for robbery and extortion.
The health ministry interprets this law as requiring it to publish the names of recipients of government-provided pre-natal care, resulting in thousands of young women’s pregnancy status being made publicly available.
In the above examples, it is important to note three main elements:
- In no case is there only harm; there is often a lot of benefit from this data.
- The harmful actors are not always criminals or hackers; some of the worst harm can come from well-meaning people.
- The above are very context-specific. What may be dangerous in one country (or to one community) may be completely fine in other circumstances.
Develop Context-based Responsible Data Approaches
Our approach to being responsible with data is based on using core principles as there are no easy “one size fits all” answers. Using core principles helps your team figure out the nuances and approaches needed by different contexts. Below is a core principle we used in the Considerations for Using Data Responsibly at USAID and examples of what that looks like.
Core Principle
Prioritizing usage of data to benefit beneficiaries, by proactively analyzing benefits and risks of that data by different groups, and promoting transparency and accountability through that data.
These principles are about reframing and making explicit that the purpose of the data is benefiting beneficiaries, and that part of that process means explicitly analyzing and stating how these benefits will occur and what potential risks need to be managed. Finally, proactively planning for sharing and publication of some of that data to promote transparency and accountability is another core principle.
As part of the overall plan for the data, being responsible includes:
- Identifying the purpose and usage of the data;
- Based on the above, determining who benefits and who is put at risk, and maximizing the benefits and minimizing the risks to the most vulnerable; and
- Identifying the legal issues, including ownership, sovereignty, contractual agreements, etc., especially, but not limited to, publication and sharing of the data. Being explicit is important.
Often data is collected because the donor or government partners ask us to collect it, or because we need it for our own M&E plans. Sharing and publication plans are often an afterthought. There is often no holistic discussion across the entire activity about this data and how it will be used, collected, stored, and analyzed. Rather, data is seen as the purview of the M&E and/or ICT team.
However, we have found that the entire team needs to understand the data approach of an activity or organization, as data will impact everyone. By proactively discussing the potential benefits and risks of data for our activity before we start work, we can both prioritize and maximize high-impact benefits, and discuss tensions ahead of time. Documenting the decision-making process helps the team manage the data over a longer period of time and answer any emerging issues or questions about why decisions were made.
For example, a project focused on advocating for the rights of women affected by sexual and gender-based violence (SGBV) will likely have strong elements of openness and transparency (to advocate for more attention and resources to fight against SGBV) as well as commitments to using the data to help individual women who are potential and actual victims. This commitment to helping specific women includes making sure that the data is well protected and kept from potential or actual abusers.
Being responsible also means thinking about the various types of misuse of the data – for example, by providing public “hot zones” of sexual assault locations in a community, we may be intending to put pressure on communities to improve security in those areas. But are we normalizing abuse (“Oh there is a woman walking alone in a hot zone – clearly she is a sex worker”) or possibly giving abusers a way to more easily target women and girls? How do we make sure the benefits are occurring and not the misuse? How will we know?
Commit to Ongoing Responsible Data Usage
Principles are great, but insufficient on their own to ensure responsible data use, especially when the team is diverse and spread across different locations. Concrete processes that are normally required to make sure that teams are being responsible include:
Develop Concrete Data Management Plans
Each type of data to be used as part of an activity or organization should have its own data management plan which outlines the following elements:
- Collection/capture processes and tools
- Validation/cleaning processes and tools
- Storage location and tools including access rights
- Analysis/usage processes and tools (broken out by different users)
- Publication/sharing timelines, processes and tools (especially for Open Data requirements and post period of performance)
- Archival/deletion timelines, processes and tools
- Security level and protection requirements
- M&E indicators on quality, security, and usage.
For example, a project may plan on performing qualitative research with young women in the community on their perceptions of risk of SGBV as part of the above project. The data management plan will answer questions such as:
- Will the data be collected by interviews, focus groups, or digital surveys, and how will the non-digital approaches capture responses?
- How will the data be reviewed, validated, and cleaned? By whom and in what tools? Will the data be coded? By whom? Using what methods?
- Where will the responses be stored (hard copy? Emails/Excel/soft copies on a shared drive? Digital data collection tools?), and who can access the data and why?
- How will the data be analyzed, and how will it be used by the activity to create positive impacts? Types of analysis may include statistics, keyword counts, content analysis, and comparative analysis based on demographic information.
- What are the publication and sharing obligations and plans? For example, is the project in compliance with ADS 579 or with host government requirements? How will this data comply with open data requirements while also protecting the privacy of the individual?
- What are the security and protection requirements? These are based on elements such as subject matter, demographic information on data subjects (including their vulnerability levels), and sample size, as well as the value of the data to different audiences for misuse. For example, data about criminal activity is highly attractive to criminals. Sexual life data is very attractive to those concerned with sexually policing members of their community, especially if they can identify individuals. All data, however, should meet basic standards for confidentiality (data is only seen by those authorized to see it), integrity (meets expectations for accuracy, completeness, and timeliness), and availability (can be found for usage, in the right format, by those who are authorized to use it).
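The elements of a data management plan listed earlier can also be kept as a simple structured record, so that gaps are easy to spot before data collection starts. This is a minimal sketch, not a prescribed format: the class name `DataManagementPlan`, its field names, and the sample values are all hypothetical and chosen only to mirror the elements described in this section.

```python
from dataclasses import dataclass, fields

@dataclass
class DataManagementPlan:
    """One plan per type of data; each field mirrors one plan element."""
    dataset: str
    collection: str = ""
    validation: str = ""
    storage_and_access: str = ""
    analysis_and_usage: str = ""
    publication_and_sharing: str = ""
    archival_and_deletion: str = ""
    security_level: str = ""
    me_indicators: str = ""

    def missing_elements(self):
        """Return the names of elements the team has not yet filled in."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

# Hypothetical values for illustration only.
plan = DataManagementPlan(
    dataset="Qualitative SGBV risk perception research",
    collection="Focus groups; notes transcribed by facilitators",
    storage_and_access="Encrypted shared drive; research team only",
    security_level="High: small sample, vulnerable data subjects",
)
print(plan.missing_elements())
```

A team could review the list of missing elements at kickoff and again whenever a new type of data is added to the activity.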
Establish Data Governance Systems
Across all data collected, there should be a coherent data governance system that provides the oversight and guidance teams usually need when implementing data-rich programming. Data governance includes but goes beyond IT governance: it includes representation from across the activity or organization, and helps provide vision and insight into how data can be used for improved programmatic performance.
Because this data often includes “offline” sources, and governance involves questions about what data to collect and who should use it, IT will be a core, but not the sole, member of any data governance system.
Having broad representation by different stakeholders, including programs, M&E, IT, and operations, will lead to more collaborative decision-making around new data requirements, whether existing data is effective at meeting existing needs, and whether there are sufficient resources to implement the vision for the data. Each part of an organization or activity may have different perspectives on the need for and usage of the data, which is why having their input is important.
Elements to be included in data governance include:
- A clear vision of the purpose of the data and how it serves the mission of the activity and/or organization.
- Specific activity or organizational policies, processes, and principles around how that data will be managed to achieve the vision.
- Identification of and advocacy for sufficient resources (money, time, skilled labor, etc.) needed to manage the data to meet the vision.
- M&E indicators for data management and compliance with legal requirements.
- Setting expectations and policies on what happens if an incident is suspected – a clear chain of command and authorization for action so that if there is harm, it can be stopped and responded to quickly.
Join #5DaysofData Twitter Chat on May 16
Join @USAID_Digital, @mSTAR_Project, @DIAL_community, @GlobalDevLab, @ICT_Works and the responsible data community for a #5DaysofData Twitter Chat to learn how you can use data responsibly and ensure that data risk is regularly considered and addressed.
#5DaysofData Twitter Chat
May 16, 2019 – 12pm EST – 16:00 GMT
We’ll spend a hectic hour tweeting questions, answers, ideas, and concerns. Join us or just follow along using the #5DaysofData hashtag on Twitter.