Accurate and comprehensive measurements of a range of sustainable development outcomes are fundamental inputs into both research and policy. One way to measure these outcomes is with satellite imagery; here we focus on approaches that combine imagery with machine learning.
As of early 2020, there were an estimated 713 active nonmilitary earth observation satellites in orbit, 75% of which had been launched within the previous five years. These satellites now capture imagery of Earth with unprecedented temporal, spatial, and spectral resolution. Advances in deep learning make it possible to combine this imagery with other sources of geospatial big data to construct high-resolution poverty maps.
These techniques work by “learning” to predict poverty from a large training dataset that matches ground-truth poverty labels (from geolocated household surveys) to imagery and other geospatial data. Intuitively, the algorithms learn which visible features are predictive of poverty, such as road quality, building density, and land topology.
However, the growing abundance and improving resolution (spatial, temporal, and spectral) of satellite imagery now stand in contrast to the paucity of ground data on key human-related outcomes, and the ground data that do exist are often noisy; as we discuss below, this noise can lead to incorrect assessments of model performance. The binding constraint on model development is now training data rather than imagery.
In “Using satellite imagery to understand and promote sustainable development,” we show that while imagery has become abundant, the scarcity and, in many settings, unreliability of ground data make both training and validation of satellite-based models difficult.
Data are scarce where needed most
Household- or field-level surveys remain the main data collection tool for key development-related outcomes. Methodologies for such data collection are well developed and are implemented by national statistical agencies and other organizations in nearly all countries of the world. But their implementation and use also face a number of important challenges.
First, nationally representative surveys are expensive and time-consuming to conduct. Conducting a Demographic and Health Survey (DHS) or Living Standards Measurement Study (LSMS) in one country for one year typically costs US$1.5 million to US$2 million, with the entire survey operation taking multiple years and requiring the training and deployment of enumerators to often remote and insecure locations. Population censuses are substantially more expensive, costing tens to hundreds of millions of US dollars in a typical African country.
An implication of this expense is that many countries conduct surveys infrequently, if at all. In half of African nations, at least 6.5 years pass between nationally representative livelihood surveys, as compared with subannual frequency in most wealthy countries. Survey frequency is on average substantially lower in less wealthy countries, meaning that data on livelihood outcomes are often lacking where they are arguably most needed.
Surveys are also much less common in less democratic societies, which could at least partly reflect the desire and ability of some autocrats to limit awareness of poor economic progress. The frequency of agricultural and population censuses also varies widely around the world. For instance, 24% of the world’s countries (49 out of 206) have gone more than 15 years since their last agricultural census, and 6% (13 out of 206) have gone more than 15 years since their last population census.
A second challenge is that survey samples are typically only representative at the national or (sometimes) regional level, meaning that they often cannot be used to generate accurate summary statistics at a state, county, or more local level. This poses a challenge for a range of research or policy applications that require individual- or local-level information, for example, targeting an antipoverty program or studying its impact.
Third, for many surveys the underlying data are not made publicly available, including nearly all the surveys that contribute to official poverty statistics, and no geographic information on where data were collected is publicly provided. These factors further deepen the challenge of using such data to conduct local research or policy evaluation, or to train models to predict local outcomes. Even when anonymized, georeferenced local-level data are made public in some form, they are typically released more than a year after survey completion, hampering real-time knowledge of livelihood conditions on the ground.
Finally, as explored below, ground data can have multiple sources of noise or bias, further limiting their reliability and utility in research and decision-making. This noise has important implications for how satellite-based models trained on these data are validated and interpreted.
Existing ground data can be unreliable
Even where ground data are present, several key sources of error can limit their utility.
First, most outcomes are not measured directly but rather are inferred from responses to surveys, and these responses can contain large amounts of both random and systematic measurement error. For instance, in household consumption expenditure surveys, changing the recall period or the list of items households are asked about can yield expenditure estimates that are >25% too low relative to gold-standard household diaries.
In agriculture, the World Bank noted that the “practice of ‘eye observations’ or ‘desk-based estimation’ is commonly used by agricultural officers,” leading to often-conflicting estimates of key agricultural outcomes by different government ministries and to variation over time in published statistics that cannot easily be reconciled with events on the ground. Current practices are likely to have a bias toward overestimation, further weakening the quality of food security assessments.
An additional key source of noise is sampling variability. Surveys are typically designed to be representative at very large scales (e.g., nationally), and this representativeness is typically achieved by taking small random samples of households or fields across many cluster locations. Because agricultural and economic outcomes of interest often exhibit substantial variation even at very local levels (e.g., coefficients of variation >1 at the village level), these small samples represent an unbiased but potentially very noisy measure of average outcomes in a given locality, as the simulation below illustrates.
A third common source of error is noise purposefully introduced to protect the privacy of surveyed households. Most publicly released georeferenced household survey datasets add random jitter to village coordinates, for instance up to 2 km of displacement in urban areas and up to 5 km in rural areas.
Model evaluation with noisy data
The performance of satellite-based models, particularly in settings beyond those where they were trained, is perhaps the most important concern for researchers and policy-makers interested in applications to sustainable development. Noisy ground data can degrade model performance in two ways:
- It can diminish the ability of a model to learn predictive features.
- The model might learn relevant features but appear to perform poorly on test data, precisely because the test data themselves are noisy.
The latter outcome leads researchers to understate a model's true performance. As noisy datasets are increasingly used for model development, researchers must contend with the dual challenges of not overfitting to noise and not underestimating model performance. While existing work mainly highlights the former challenge, we believe the latter is perhaps more fundamental, and underappreciated.
A lightly edited version of “Using satellite imagery to understand and promote sustainable development” by Marshall Burke, Anne Driscoll, David B. Lobell, and Stefano Ermon.
I will be happy to hear about different experiences and to learn from them in order to improve my work.