As promised, here is an overview of the steps needed to create an open data plan that complies with USAID’s new Open Data Policy. It is really easy to get lost in the weeds on this stuff, so I am only outlining the top-level steps; please note that each step may involve many questions, decision points, and additional tasks.
Remember, ALL projects and cooperative agreements will need to submit the data they capture to the Development Data Library (DDL) during the period of performance of their award.
And of course, as this is a brand-new policy that still raises many questions, the following steps are suggestions based on current understanding of USAID’s requirements. Here is USAID’s ADS 579 Fact Sheet.
During the Proposal Stage
- Budget: Make sure your budget includes (either implicitly or explicitly) the time, effort, and expertise to create and implement an open data plan, as well as the IT systems required to generate and store open data and submit it to the Development Data Library. You may have to choose between an API and manual upload (which I will describe in an upcoming article).
- Legal Requirements: If the proposal involves collecting or generating data from host governments or other partners, make sure that the Memorandum of Understanding (MOU) that you sign clearly outlines which data will be shared with USAID and what data needs to be protected (recognizing that many governments have their own laws and policies about data related to their citizens).
- Security/Privacy: You may also want to think about any other privacy or security issues related to the data you plan to collect, and think through the impact of intellectual property protection or issues related to working in conflict zones.
- Reuse/Recycle: In some cases, the data may already be out there (such as on http://usaid.gov/data, http://data.gov, http://aiddata.org or http://data.worldbank.org). More and more governments are publishing their data, such as the Ghana Open Data Initiative (http://data.gov.gh). You may be able to save a ton of time and effort by using someone else’s dataset instead of collecting the data yourself!
During Negotiations/Project Kick Off
- Open Data Lead: Identify someone on the implementing partner team responsible for open data in the project (usually someone involved in data collection such as the M&E person or the technology person).
- Identify All Data: Write down ALL the data that is being collected by the project (regardless of whether it will be submitted or not). Examples include:
- Research data (results of household surveys, interviews with stakeholders)
- M&E data
- Project activity information (such as trainings held, locations, participants, agenda)
- Maps generated using GIS
- Prioritize: Determine which data constitute “intellectual work,” and are therefore a high priority to share, and which are “incidental to award implementation.” This determination tells you which datasets need to be submitted and which do not.
- Validate: Share that list with the COR/AOR to make sure s/he agrees. You will probably have to come back later with a more detailed plan, but it is a good idea to reach at least a basic agreement on what will and won’t be shared, and to raise any concerns about budget, legal, security, or privacy issues, at a very early stage.
- Plan: Make sure that data submission is part of your overall work plan so it doesn’t become an afterthought. Generating and submitting open data is a heck of a lot easier (and cheaper) when planned from the very start than when done retroactively during project close-out, five years in.
Remember, the Open Data policy is new to USAID as well as to the implementing partner community, so there are still many questions to be answered, and some things will be trial and error in these early days.
During the Data Design Phase (early!)
Once you have the list of datasets you plan to share with USAID’s DDL, you will need to identify or create the following items for each dataset. Most of this information is currently required as part of a DDL submission, but it is also good practice for any dataset you capture, even if you don’t know whether it will go to the DDL.
Summary Information
- Title and description of the data (such as “results of household surveys of attitudes towards climate change, raw and aggregated”), including its purpose (you would be surprised how many datasets lack this basic information)
- Relevant dates of the dataset (“captured March 2014”, “updated February 2015”)
- How is it going to be captured and where is it going to be housed?
Dataset Details
- Structure/data dictionary used (especially if you are using an international standard like IATI)
- Data quality approach (in particular, how the data will meet USAID data quality standards under ADS 203)
- Privacy and security issues (e.g., does the data involve human research subjects, does the raw data include national ID numbers, are there potential security issues from working in a conflict zone?) and the plan for protection
- Other proprietary information that may need special permission to share (such as material covered by copyright or other IP protections)
- Links to other related resources, such as documents in the DEC (Development Experience Clearinghouse) or relevant websites
- Possible uses of the data beyond the project (for international development, local partners, USAID, and/or your organization)
Submission Information
- Name and contact info of the person submitting data
- Name of the prime organization
- Mechanism information (award number, operating unit, COR/AOR, contact info)
- Proposed access level for the data (public, restricted public, non-public, or other), the reason for it, and any other restrictions, such as embargo dates
- Publication plan/schedule (available on demand or submitted to the DDL on a periodic basis), plus details such as the URL and API instructions
Classification
- Program code (the foreign assistance categories), cross-cutting themes, initiatives
- Keywords
- Language of the data
- Country or region it applies to
- The overarching program it belongs to
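To keep these items together for each dataset, a project could hold them in a simple structured record and check it for gaps before each submission. The Python sketch below is only an illustration: the `DatasetPlan` class, its field names, and the `missing_items` helper are hypothetical and are not the DDL’s actual metadata schema. The sample values echo the household-survey example above and are made up.

```python
from dataclasses import dataclass, field, fields

# Access levels listed in the plan above (hypothetical encoding as strings).
ACCESS_LEVELS = {"public", "restricted public", "non-public", "other"}


@dataclass
class DatasetPlan:
    """One dataset entry in a hypothetical open data plan (not the DDL schema)."""
    # Summary information
    title: str
    description: str
    purpose: str
    dates: list[str]
    capture_and_storage: str
    # Dataset details
    data_dictionary: str
    quality_approach: str
    privacy_security_plan: str
    # Submission information
    submitter_contact: str
    prime_organization: str
    award_number: str
    access_level: str
    access_reason: str
    publication_schedule: str
    # Classification
    program_codes: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    language: str = ""
    geography: str = ""


def missing_items(plan: DatasetPlan) -> list[str]:
    """List empty fields in declaration order, plus an unrecognized access level."""
    problems = [f.name for f in fields(plan) if not getattr(plan, f.name)]
    if plan.access_level and plan.access_level not in ACCESS_LEVELS:
        problems.append("access_level (unrecognized value)")
    return problems


# Example: a partly completed entry (illustrative values only).
plan = DatasetPlan(
    title="Household survey of attitudes towards climate change",
    description="Raw and aggregated survey results",
    purpose="Baseline for project M&E",
    dates=["captured March 2014", "updated February 2015"],
    capture_and_storage="Survey app; project database",
    data_dictionary="Project codebook",
    quality_approach="Checked against ADS 203 data quality standards",
    privacy_security_plan="National ID numbers removed before sharing",
    submitter_contact="",
    prime_organization="",
    award_number="",
    access_level="public",
    access_reason="No PII after redaction",
    publication_schedule="Periodic DDL submission",
    keywords=["climate change", "household survey"],
)
print(missing_items(plan))
# ['submitter_contact', 'prime_organization', 'award_number',
#  'program_codes', 'language', 'geography']
```

A plain checklist or spreadsheet does the same job; the point is simply that every dataset gets the same set of items filled in before it goes anywhere.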
Notes on Submission to the USAID DDL
Submitting and sharing your data is not an all-or-nothing situation. It is a good idea to identify the items above for ALL data captured by your project, even if only part of it is eligible to be shared with USAID’s DDL, because you may want to use some of this data for internal performance improvement or project analysis.
You can always decide to submit only a subset of the data, such as aggregated information rather than the raw data, especially when the raw data contains personally identifiable information. You can also submit multiple datasets, one for public access and one for restricted or non-public access. That way, some of the data can be reused by others in the development community, USAID staff get a fuller dataset, and privacy is protected.
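A minimal sketch of producing the two versions described above from one set of raw records: a restricted version with direct identifiers dropped, and a public version that only reports counts. The column names, the `DIRECT_IDENTIFIERS` set, and the `min_count` threshold for suppressing small counts are illustrative assumptions, not USAID rules. Dropping names and ID numbers is not full anonymization (combinations of the remaining fields can still identify people), so treat this as a starting point for the conversation with your COR/AOR and data steward.

```python
from collections import Counter

# Hypothetical columns that directly identify a respondent.
DIRECT_IDENTIFIERS = {"name", "national_id", "phone"}


def redact(records: list[dict]) -> list[dict]:
    """Restricted version: keep every row, but drop direct identifiers."""
    return [{k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
            for row in records]


def aggregate(records: list[dict], group_by: str, answer: str,
              min_count: int = 5) -> dict[tuple[str, str], int]:
    """Public version: count each (group, answer) pair and suppress small
    counts, which could otherwise point to individual respondents."""
    counts = Counter((row[group_by], row[answer]) for row in records)
    return {key: n for key, n in counts.items() if n >= min_count}


# Toy records (illustrative only).
raw = [
    {"name": "Respondent 1", "national_id": "ID-001", "district": "North", "concerned": "yes"},
    {"name": "Respondent 2", "national_id": "ID-002", "district": "North", "concerned": "yes"},
    {"name": "Respondent 3", "national_id": "ID-003", "district": "North", "concerned": "no"},
    {"name": "Respondent 4", "national_id": "ID-004", "district": "South", "concerned": "yes"},
]
restricted = redact(raw)
public = aggregate(raw, group_by="district", answer="concerned", min_count=2)
print(restricted[0])  # {'district': 'North', 'concerned': 'yes'}
print(public)         # {('North', 'yes'): 2}
```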
This is very useful, Siobhan. I recently published internal DDL guidance for our projects and realized I may have omitted a few key points you identify here. I did have two clarifications though:
- You said ALL projects must report, but our understanding is that the DDL requirement has to be included in the original award or in a modification to be enforceable. A USAID representative once stated that voluntary compliance was encouraged, but given the cost implications I’m not sure this is reasonable. There are also concerns about applying this requirement retroactively in cases where we didn’t anticipate this use when we obtained informed consent. Several in the community have anticipated this issue, and when we submitted our first dataset, the USAID Data Steward actually acknowledged it and told us they were struggling with the implications on their side as well; we’re still waiting to see how they will handle it.
- In your final paragraph, you mention the option to submit aggregated data. I was under the impression that this was not an option, as sharing raw data was the explicit goal of this policy. We certainly wish we could submit aggregate data in sensitive situations, but so far we’ve been told the dataset has to be complete and granular, with only personally identifiable information redacted. I’m also unsure about the option to submit one complete dataset and one partial dataset for public access, as USAID, not the IP, makes the final determination about the level of access. Perhaps this can be negotiated with individual AORs/CORs, but are you aware of a general statement by USAID to this effect?
Thanks again for this very helpful post. I’ll be updating my own internal guidance with several of your points.
Thanks Reid,
To respond to your comments, note that I do not represent USAID, so this is based on conversations with USAID staff and my own observations, mainly in Washington. Your mileage may vary.
1. USAID has told me that by now, 90% of all awards (CAs and contracts) have been modified to include this language, and their intention is to reach 100%, so it is better to assume you need to comply and be pleasantly surprised than the opposite.
2. I have also heard from USAID that they do understand the challenges of submitting data retroactively, including not having the systems, metadata, or informed consent. My understanding is that CORs and data stewards are very open to working with IPs on these issues, and that senior-level staff know this is a long-term process. The priority is to get IPs to start putting processes in place now, and to ask these clarifying questions, so that any data collected from now on will meet the requirements. Progress, not perfection, is the goal.
3. Clarification on aggregated vs. raw data: this is a point of confusion that needs extra clarity as reality meets policy.
a. You can submit multiple versions of the same data to USAID for different purposes (e.g., internal use vs. publication). In fact, this makes life at USAID easier, because all data is reviewed by many different offices looking at different issues (PII, FOIA compliance, security). If you submit information on how you treat the data along with the data itself, the entire process goes much more smoothly.
b. The debate over anonymized vs. obfuscated data is alive and well and will need a lot of discussion. I don’t think there is any one answer.
c. The reality is that most CORs/AORs are not deeply familiar with these issues, and the data stewards are doing this work on top of their other assignments. So what one COR says may not reflect USAID’s overall position.
4. The implementing partners MUST be part of this conversation to set the standards, as we are key users of this data as well as creators. What standard taxonomies do we use? What expectations are there for privacy and security? How can we collaborate to build better datasets as a community? How do we deal with unanticipated negative uses?
Helpful, thanks again.
To point #4, I know several groups are working on this issue from the IP side: InterAction collected various concerns and requested a moratorium several months ago, there was a healthy round-table discussion at the recent SID-Washington conference, etc. I’ve kept in touch with a few of the participants from the round-table group, and we talked about developing some type of working group that engages USAID more collaboratively to improve the policy, platform, and procedures. I’ve floated this idea to a couple of USAID Open Data people, and so far they’ve all responded by pointing me to the GitHub/StackExchange portals. Perhaps we need to do a bit more organizing on our end.
To that end, are you aware of any other groups, formal or informal, that are having dialogues with USAID HQ, Missions, Open Data Team, etc. regarding ADS 579?
I am aware of the InterAction work, as well as work coming from the other direction on aid transparency through IATI (more on that another time).
Also, at the Open Data Conference in Ottawa in May this year there was more conversation, but it is still early days. I held a session on Open Data for Ag at the ICTforAg conference (managed by FHI 360), which produced some promising conversations. And of course, the digital design principles folks have open/interoperable data for development on their agenda.
But I think this is very early days. I am noticing that KM people (or even behavior change people!), as well as academia, are largely missing from this conversation, even though this data is meant to actually improve development. This is a resource we all want to build, protect, and expand upon.
I also think the conversation needs to be more granular by sector – what standards and expectations for data interoperability need to occur in Water, for example, or in reproductive health.
Very very early days, I think.
Well, at least we’re not attempting anything difficult or complicated ;)
Thanks again.
Great post and very insightful
Hi Siobhan Green,
I am Mireille Nsimire, and I hold the ICT Specialist position at IITA (International Institute of Tropical Agriculture). I would like to know what USAID open data is about. From reading your post, I understand that USAID open data makes it possible to share not only government data (personal information, country information, and more) but also data about smallholders.
Can an international agricultural research institute like ours share information on the USAID open data platform? Right now we are developing an open data platform for all the CGIAR centers, and we would like to know whether some of our research results can be shared on your platform.