Designing Quasi-Experimental Impact Studies of Agricultural Research at Scale

J.V. Meenakshi, Nancy Johnson, and Mina Karasalo

Technical Note N. 10
June 2021

Providing credible and quantifiable measures of the impact of CGIAR innovations at scale has always been a challenge. Recent SPIA-supported research has attempted to address this challenge using quasi-experimental approaches to assessing impact. This technical note details some of these completed and ongoing studies and approaches, focusing on the identification strategies used to infer causal impact, and the kinds of diverse data sets that may be brought to bear in making these inferences. A key message is also to integrate impact evaluation designs as part and parcel of monitoring and evaluation efforts.

Acknowledgements: We are grateful for comments received from SPIA colleagues and the teams whose studies are reviewed in this technical note.

Citation: Meenakshi, J.V., Johnson, N., & Karasalo, M. (2021). Designing Quasi-Experimental Impact Studies of Agricultural Research at Scale. Technical Note N.10. Rome. SPIA

Cover image: Tilapia harvesting in Hapas. Credit: Habibul Haque/WorldFish Bangladesh

Design and layout: Luca Pierotti, Macaroni Bros, and Abhilasha Vaid

Contents

1 Introduction
2 The Challenges of Measuring Impacts at Scale
3 Global and National Assessments of the Long-Term Impacts of the Green Revolution
4 Impacts of Recent CGIAR Innovations
5 Using Dissemination/Diffusion Data in Designing Impact Studies
6 Discussion and Conclusions
References

1 Introduction

To address the call for greater rigor in impact assessment, the credibility of causal claims is under careful examination (Stevenson et al., 2018). Experimental approaches may not be feasible for documenting the long-term and large-scale impact of agricultural research on human welfare, the environment, and other development outcomes (SPIA, 2020). In many such cases, quasi-experimental approaches, which aim to demonstrate causal evidence of the impact of an intervention on a set of outcomes, are the best option. While many estimation approaches exist, their causal claims rest on convincing and well-supported identification strategies.

In this technical note, we review several recent and ongoing quasi-experimental impact studies that offer promising approaches for measuring impact at scale. The studies cover different types of CGIAR-related innovations (ranging from crop varieties and fish strains to conservation agriculture and approaches to strengthening collective action) and different types of development outcomes (such as changes in national income, infant mortality, forest cover, and atmospheric pollution). These studies also employ innovative ways of combining a multiplicity of nationally-representative and global datasets to measure impact, thereby reducing the need for special-purpose surveys. Moreover, the studies offer detailed explanations of empirical strategies that rely on information and data on innovation rollout and diffusion over time and space to address the sources of endogeneity that traditionally plague ex post impact assessments.
The purpose of this technical note is to help CGIAR impact assessment specialists and practitioners elsewhere plan for and design high-quality impact studies at scale. With a focus on methodologies and data sources, we seek to collect and summarize attempts by the wider SPIA community to establish a credible basis for claiming causal impact of research on development outcomes. We do not emphasize impact results, and in most cases, these are not available yet as the studies are still ongoing. The goal is to familiarize readers with different approaches that are being used, and to facilitate learning that can help strengthen current and future impact assessments.

2 The Challenges of Measuring Impacts at Scale

While scale is a complex term in agricultural research for development (AR4D), in this note scale refers to the number of people or households reached by, and potentially benefiting from, an innovation. In the case of environmental outcomes, scale may be measured in units of land or other resources. Because the definition of scale is context specific and varies by innovation, it is difficult to establish standard thresholds for determining when an innovation has scaled. A threshold of, say, one million beneficiaries, as used for example by Kremer et al. (2019) in a different context, may be of little use to define large scale in the context of many CGIAR innovations. For example, one of the studies we review below considers environmental and health impacts of exposure to sustainable agricultural practices across 250,000 plots spread across Mexico. Another focuses on the impacts of stress-tolerant rice varieties in regions of Bangladesh that are prone to flooding in the aman rice-growing season (which is one of three). Other examples are national or global in scope. The phrase "at scale" therefore can refer not only to the number of beneficiaries but also to the share of total expected beneficiaries or to aspects of the duration of dissemination efforts or use.
What is common across the definitions of scale is that there is some degree of spontaneous, rather than incentivized, spread.

Duration can be important for certain outcomes that can only be achieved as a result of use over time. Environmental outcomes such as improved soil fertility may begin to manifest only after years of improved management practices. The opposite is also true; unintended negative consequences take time to appear. Impacts on social or demographic variables may result not only from long-term use but also from the general equilibrium effects of, for example, improved agricultural productivity on the broader economy. Thus, scale also refers to the distance down the impact pathway from immediate outcomes such as agricultural productivity, to more distant outcomes related to human, social, and/or environmental welfare.

However they define scale, in all cases the studies reviewed here attempt to document the impacts of big wins. A big win is thought to have happened when anecdotal evidence suggests not only that research has been successful in generating new knowledge and technologies but also that these innovations have achieved widespread use and are generating benefits. The value of benefits from big win cases is expected to far outweigh the research and development costs of the entire portfolio (SPIA, 2020). Documenting the benefits of and returns to these big wins is important from an accountability perspective. The results are nonetheless also useful to broader audiences of policymakers, practitioners, and academics, and the standards of rigor are no lower than for other types of impact evaluations.

One of the main reasons that measuring impact at scale is difficult relates to the endogeneity of adoption that results from self-selection (de Janvry et al., 2011).
Outcomes for adopters cannot be directly compared with those of non-adopters unless access to the innovation is randomly assigned. While this is often the case in pilots, it is not a common way of rolling out agricultural innovations. This means that having good-quality data on adoption at scale, while needed, is not enough to measure impact. It is also necessary to account for both the observable and unobservable differences between adopters and non-adopters. Quasi-experimental identification strategies can help address endogeneity either by exploiting variation in variables that are correlated with adoption but are arguably exogenous so they can be used as instruments, or by using methods such as difference-in-difference or matching that are able to account for systematic differences between adopters and non-adopters. Balance tests and tests of parallel pre-intervention trends can be used to assess whether evidence is consistent with the identifying assumptions, based on what is known about the sources or factors explaining variation in the rollout and/or access to the innovation. What constitutes sufficient evidence of support for identifying assumptions will necessarily be context specific and vary from case to case.

Identifying which innovations will ultimately become big wins, when, and where takes time and is hard to predict. As a consequence, planning rigorous impact studies in advance can be a challenge. In many cases it will be necessary to use quasi-experimental approaches to measure impacts at scale, and these approaches will depend on being able to identify appropriate datasets after the fact, often repurposing data that were collected to meet a different objective. Doing this well will always require some creativity, and determining whether a causal claim is convincing enough is ultimately a judgment call. A growing number of good examples are becoming available. This note attempts to document some of them in one place.
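As a minimal sketch of the difference-in-difference logic and the pre-trend check discussed in this section, the Python snippet below computes a two-group estimate from group means. The yield figures and the names did_estimate, treated, and control are hypothetical and serve only as illustration; an actual study would typically use a regression with controls and appropriate standard errors.

```python
from statistics import mean


def did_estimate(treated_before, treated_after, control_before, control_after):
    """Difference-in-difference: change among treated minus change among controls."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical yields (t/ha) for three periods, keyed by period number.
# Assumption for illustration: the innovation reaches treated areas
# between period 1 and period 2.
treated = {0: [2.0, 2.2, 2.4], 1: [2.3, 2.5, 2.7], 2: [3.1, 3.3, 3.5]}
control = {0: [1.8, 2.0, 2.2], 1: [2.1, 2.3, 2.5], 2: [2.4, 2.6, 2.8]}

# Placebo check on the two pre-intervention periods: a value near zero is
# consistent with (but does not prove) parallel trends.
placebo = did_estimate(treated[0], treated[1], control[0], control[1])

# Impact estimate comparing the last pre-period with the post-period.
impact = did_estimate(treated[1], treated[2], control[1], control[2])

print(f"placebo (pre-trend) estimate: {placebo:.2f}")
print(f"difference-in-difference estimate: {impact:.2f}")
```

The same function is used for the placebo and the impact estimate on purpose: the only thing that changes is which pair of periods is compared, which is what makes the pre-trend check informative about the identifying assumption.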
3 Global and National Assessments of the Long-Term Impacts of the Green Revolution

Gollin et al. (2021), von der Goltz et al. (2020), and Bharadwaj et al. (2020) are good examples of the overall approach of exploiting past diffusion patterns and using appropriate strategies to quantify the impacts of CGIAR interventions on various outcomes. They consider the set and stream of adoption of several high-yielding crop varieties (HYVs) across several countries (von der Goltz et al., 2020; Gollin et al., 2021) or across districts in India (Bharadwaj et al., 2020) over several decades. However, they each use a somewhat different strategy to quantify the causal impact of the Green Revolution on outcomes that are realized only following a relatively long impact pathway. These outcomes include national incomes, population growth (Gollin et al., 2021), and child health (von der Goltz et al., 2020).

Gollin et al. (2021) first make the case that the pattern of spread of high-yielding varieties across 80 countries and over time (as reported in Evenson & Gollin, 2003) was determined largely by factors that could plausibly be exogenous to each country, relating to the establishment of institutions that were to become CGIAR centers and the crop-specificity of available technologies. They demonstrate, using an event study [1], that the yield growth induced by the introduction of HYVs was not driven by preexisting differential trends in yields. This finding enables them to establish the counterfactual increase in yield growth that would have occurred in the absence of the Green Revolution, which forms the basis of the instrumental variable they employ. In other words, the instrument represents the increase in yields that would have occurred across crops, keeping land and labor unchanged, aggregated using pre–Green Revolution area shares of various crops as weights.
They conclude that the adoption and diffusion [2] of HYVs led to substantial increases in national incomes (relative to scenarios where there had been no Green Revolution or it had been delayed by a decade), and to a consequent reduction in population growth.

Von der Goltz et al. (2020) consider the extent to which the Green Revolution enabled increases in food supply that in turn prevented infant mortality, using three distinct types of data. They calculate infant mortality rates using mother’s recall of child deaths from the Demographic and Health Surveys, and exploit the fact that these data are georeferenced. For diffusion, they combine national-level adoption data (from Evenson & Gollin, 2003) with gridded crop maps (with 10-km by 10-km cells) to derive spatially and temporally disaggregated information on the spread of modern varieties. The overlap between the availability of adoption and georeferenced infant mortality information yielded an assessment that encompassed 37 countries. The key to their identification strategy is their use of the agroecological specificity of crop cultivation. In particular, their argument is that HYV share in a particular area and time is driven by crop suitability, which in turn is determined by exogenous agroclimatic characteristics. The modern variety diffusion indicator they use is a weighted average of the national-level diffusion of various crops, with weights given by area shares derived from crop maps. It is this computed, disaggregated, cell-specific diffusion that serves as a source of exogenous variation. They repeat the exercise using different crop maps as a check on robustness. Their results show that an increase in the share of HYVs to 50 percent is associated with a decrease in infant mortality of about 33 deaths per 1,000 births. This estimation strategy also enables them to provide supportive evidence showing that increased food supplies are the causal mechanism at work.
They are unable, however, to discern the effects on childhood stunting (if anything, the signs are perverse).

[1] An event study can be thought of as a generalization of a difference-in-difference framework, with different units (say households) being treated at different periods of time. Thus there are a series of (say m) ‘post’ dummy variables (rather than a single post dummy variable in the basic DiD), representing households that began to be treated j (j = 1,…,m) periods ago. The identification stems from a comparison of outcomes of treated households with (a) pure control households (that is, those that will never receive the treatment) and (b) units that are not yet treated.

[2] In keeping with the literature, we distinguish between diffusion, implying the natural (and often unmonitored) spread of adoption of an innovation, and dissemination, connoting organized and structured attempts to introduce new agricultural innovations. Diffusion is typically preceded by dissemination efforts, especially for agricultural innovations.

Bharadwaj et al. (2020) address the same question (that is, how the Green Revolution led to reductions in infant mortality) but focus only on India. They use districts (subunits of states) as the unit of analysis and rely on official statistical records on the adoption of HYVs. They then compare two children from the same district and with the same mother who were exposed to different levels of HYV adoption depending on the year of their birth, over and above any unobserved shocks to infant mortality that differ by year of birth and any long-term changes in the region of birth. They use an event-study specification to provide support for their key identifying assumption, by interacting eventual HYV adoption with year fixed effects and showing that there were no differential time trends in mortality before the Green Revolution.
They show substantial decreases in infant mortality that can be attributed to the adoption of Green Revolution varieties and are able to rule out factors such as investments in child health as the mechanism by which these decreases were achieved. Other work, notably by Brainerd & Menon (2014), points to the adverse consequences of the intensive use of fertilizers associated with the Green Revolution on seasonal child health and mortality outcomes.

4 Impacts of Recent CGIAR Innovations

Similar strategies can be used to assess the impacts of particular CGIAR innovations (in contrast to the aggregate of the Green Revolution), including innovations that are more recent or are from research areas outside of crop breeding. We provide summaries below of selected ongoing studies that are attempting to use past dissemination data, in combination with data from other sources, to assess ex post impact [3]. For more detailed information about these studies and any updates on estimation strategy, please refer to the 2021 SPIA webinar series page, where recordings from presentations by all three study teams are available.

• Conservation agriculture, Mexico. The impact study of the Sustainable Modernization of Traditional Agriculture program (MasAgro) is notable for at least two reasons: it is one of the few assessments of conservation agriculture at scale, and it successfully integrates monitoring and evaluation data into its empirical strategy. This administrative database, designed and compiled with technical support from CIMMYT, was integral to the scale-up strategy in Mexico. The authors note that the rapid scale-up of the extension services meant that it was unanticipated at a grid cell level, yielding, in turn, a substantial variation in the timing of exposure to treatment (extension services about conservation agriculture) across plots. They employ an event-study strategy to identify causal impacts.
By combining these data with satellite imagery on fires on agricultural land (as distinct from unrelated urban or forest fires), they are able to establish that conservation agriculture led to reduced burning of crop residue. In particular, there was a reduction in the incidence of crop fires in cells that received greater extension services. They use another administrative dataset on registered infant deaths and map them back to the region of residence to produce an outcome measure. Using natural exogenous variation in wind direction affecting exposure to air pollution resulting from fires, they are able to show impacts on infant deaths as a consequence of lower exposure to air pollution from fires. To the extent that this study uses information on dissemination rather than diffusion, these results can be given an intent-to-treat interpretation.

[3] We highlight these studies because of the way they use dissemination data to look at impacts on particular outcomes. In some cases, the studies include other components looking at additional outcomes that we do not review here. SPIA is also supporting other studies using quasi-experimental approaches. For more information please visit: cas.cgiar.org/spia/impact-evidence

• Stress-tolerant rice varieties (STRVs), Bangladesh. Because large parts of Bangladesh are prone to flooding in the post-monsoon season, popular rice cultivars were bred to tolerate submergence for longer periods of time. The target area for the STRVs is therefore defined as all aman (post-monsoon) rice susceptible to floods. Using administrative data from national partners, researchers have undertaken a retrospective exercise to track the location and quantity of STRV seed produced and disseminated across all districts in Bangladesh, including in areas that are relatively more flood prone, over time.
They can then map this information on the spatial and temporal variation in dissemination of STRVs onto the incidence of floods as estimated by satellite imagery. The location and duration of floods act as a natural experiment that allows researchers to quantify the extent to which regions that were exposed to sustained sales and dissemination of STRVs were able to insulate against losses in rice yields relative to those that saw relatively little or no dissemination. They use a proxy measure for yield loss based on the normalized difference vegetation index (NDVI). Subsequent efforts will focus on further ground-truthing yield losses as derived from the NDVI.

• Approaches to restoring degraded common lands, India. To address land degradation in many of India’s ecologically fragile ecosystems, the Foundation for Ecological Security (FES) drew on CGIAR research to design a location-specific intervention model founded on the tenets of collective action and property rights to restore degraded common land. Since the early 2000s, FES has scaled out its intervention model to approximately 20,000 villages, comprising over 5.5 million acres of common land and 6.25 million people. To assess the extent to which the interventions contributed to the restoration of common lands, the researchers obtained secondary data on the variables FES used as selection criteria, both for villages in which it has intervened, and for villages in which it will intervene in the future. The researchers plan to use information on the phased rollout to determine whether areas that had been treated for longer were more likely to see larger impacts through a difference-in-difference strategy. Their outcome measures rely both on remote-sensing data on vegetative cover in georeferenced locations over time and on information canvassed through a survey.
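The event-study logic that several of the studies above rely on, with different units first treated in different periods, can be sketched in a few lines of Python. This is a simplified, regression-free variant under stated assumptions: it uses only never-treated units as the comparison group and noise-free simulated data, whereas applied studies typically estimate such effects in a regression framework with fixed effects. The names event_study and simulate, and all numbers, are hypothetical.

```python
from collections import defaultdict


def event_study(outcomes, first_treated, window=2):
    """Average treated-minus-control gap by event time, relative to event time -1.

    outcomes:      {unit: [y_0, ..., y_T-1]}, one value per calendar period
    first_treated: {unit: first treated period, or None if never treated}
    """
    never = [u for u, g in first_treated.items() if g is None]
    if not never:
        raise ValueError("need at least one never-treated unit")
    n_periods = len(next(iter(outcomes.values())))
    # Mean outcome of never-treated units in each calendar period.
    control_mean = [
        sum(outcomes[u][t] for u in never) / len(never) for t in range(n_periods)
    ]
    gaps = defaultdict(list)
    for unit, g in first_treated.items():
        if g is None or g < 1:
            continue  # skip controls and units with no pre-treatment period
        base = outcomes[unit][g - 1] - control_mean[g - 1]
        for k in range(-window, window + 1):
            t = g + k
            if 0 <= t < n_periods:
                gaps[k].append((outcomes[unit][t] - control_mean[t]) - base)
    return {k: sum(v) / len(v) for k, v in sorted(gaps.items())}


def simulate(effect=2.0):
    """Noise-free toy panel: unit effect + common trend + constant treatment effect."""
    first_treated = {"a": 2, "b": 3, "c": None, "d": None}
    unit_effect = {"a": 1.0, "b": 0.5, "c": -0.5, "d": 0.0}
    trend = [0.0, 0.3, 0.1, 0.6, 0.9, 1.0]
    outcomes = {
        u: [
            unit_effect[u] + trend[t] + (effect if g is not None and t >= g else 0.0)
            for t in range(len(trend))
        ]
        for u, g in first_treated.items()
    }
    return outcomes, first_treated


outcomes, first_treated = simulate(effect=2.0)
for k, gap in event_study(outcomes, first_treated).items():
    print(f"event time {k:+d}: {gap:.2f}")
```

Estimates at negative event times play the role of the pre-trend check: values close to zero before treatment are consistent with the identifying assumption, while the values at event time zero and later trace out the estimated effect.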
5 Using Dissemination/Diffusion Data in Designing Impact Studies

The examples reviewed in the preceding sections relied on information and data about the spread of innovations to construct identification strategies. The studies in Section 3 use, to different degrees, estimates of the timing and diffusion of HYVs to claim exogenous variation at the national and subnational levels. Gollin et al. (2021) use the differences in the timing of development and introduction of HYVs to propose an instrument that can be considered exogenous to the individual country. Similarly, von der Goltz et al. (2020) argue that the timing of the onset of the Green Revolution was exogenous to individual countries, as it was the result of international crop research programs. They exploit this variation in conjunction with differences in the agroecological suitability of crop cultivation (again exogenously given) as measured by crop maps to generate variation. Another approach is pursued by Bharadwaj et al. (2020), who use an event study to show there were no differing trends in infant mortality before the introduction of new technologies across districts. To assess the impacts of HYVs, they exploit the variation in the year of birth of children who were born in the same area (or even had the same mother) but were, as a consequence of year of birth, exposed to different levels of HYV diffusion.

The studies in Section 4 use information and data on dissemination efforts over time and space (regions/villages/localities) to build identification strategies. For the FES program, dissemination happened according to a specific set of targeting criteria: 11 specific and observable criteria were selected to identify areas that were ecologically fragile, degraded, and marginalized and had high proportions of poor, tribal, landless, and smallholder populations.
The researchers then used these data to compute propensity scores and undertook a matching exercise based on these scores to obtain a set of treated and untreated villages. They were able to exploit village-level information from the national census of 2001 and 2011 to identify sites similar to the selected sites, at the time of selection. Using these same historical data, they also confirmed that the targeted sites did indeed match the original targeting criteria. It is possible that selection bias can still operate at the community level. This is less likely to be a problem at the more aggregated level of a district. In the FES case the assumption is that any remaining unobservable features that governed site selection would be differenced out. The researchers plan to undertake a second round of matching following primary data collection to generate single-difference estimates of impact on household-level variables (such as income).

The STRV study relies on administrative data on temporal and spatial variation in seed production and sales/distribution. The authors were able to track STRV seeds from regions where they were produced (which are typically not flood-prone) to the flood-prone districts in which they were distributed. This involved collating data across both public and private seed sectors, and over time from 2010 to 2018, although the public sector is the predominant seed supplier. Some flood-prone districts had different diffusion patterns over time, but may have been exposed to similar exogenous flooding events. Both sets of information are therefore necessary for identification.

The MasAgro study uses dissemination data from CIMMYT’s monitoring and evaluation system.
In particular, it employs an administrative dataset, Bitacora Electronica MasAgro, which contains georeferenced information on plot, farmer, and commodity characteristics for nearly 250,000 plots over a period of eight years; data from before the implementation of MasAgro go back 19 years. Since these data were plot-specific and georeferenced, the authors could link this monitoring and evaluation database to remotely sensed information on fires, account for wind direction, and relate extension efforts to infant deaths via lower crop burning. Key to the ability to identify impact was CIMMYT’s investment in the monitoring and evaluation system, which made it possible to collate micro-level data and provide real-time feedback that incentivized farmers and extension agents to actively engage with it.

In designing a rigorous impact assessment study using dissemination data, one key piece of information is why dissemination occurred the way it did. The FES study provides a clear example of how explicit information about the "why", namely the targeting criteria, can be used to build a strong identification strategy. Even in cases when the "why" is not explicitly addressed, it is important to consider. A strong identification strategy ensures that, if there is incomplete information about the "why", there is at least enough information to not need to know it. Empirical tests such as pre-trend analysis or placebo tests can be used in support of the identification strategy, so that any unknown factors influencing rollout can plausibly be assumed to be consistent with it.

Unlike adoption/diffusion data, dissemination data have not been a priority for collection by impact evaluation specialists. Some of the approaches reviewed here suggest that dissemination data, while clearly not a substitute for adoption/diffusion data, can potentially be valuable for designing rigorous impact evaluations. Collecting these data may cost relatively little.
Even where data are not regularly compiled by public agencies or by large-scale development programs, it may be possible to reconstruct the data for cases of potential big wins (Box 1). Going forward, collection of dissemination data can be built into the monitoring and evaluation systems for large-scale dissemination programs (Box 2). This is also an opportunity to think about monitoring, evaluation, learning, and impact assessment (MELIA) in an integrated manner.

Box 1. Supporting the collation of dissemination data for an ex post assessment

SPIA is working with several study teams that have identified potential big wins but do not have the necessary data to justify and design a rigorous impact study.

Improved short-season lentil varieties developed by ICARDA and partners in Bangladesh are thought to have enhanced the diversity of the country’s predominantly rice-based cropping system, improving diets and the environment. Following the example of the STRV study, ICARDA set out to gather historical data on dissemination of improved lentil varieties from the Bangladesh Agricultural Development Corporation (BADC). As there was no systematic documentation of historical data, the ICARDA team asked regional offices to compile data from paper files and in some cases visited the regions and districts, to help compile data at subdistrict level. A similar process was followed with the two other major lentil seed distributors in Bangladesh.

Genetically improved farmed tilapia (GIFT) was first introduced in Bangladesh by WorldFish and the Bangladesh Fisheries Research Institute in 1994. There is widespread adoption of GIFT (Hamilton et al., 2020), and WorldFish is keen to estimate the impacts on a range of outcomes. No data are publicly available on GIFT dissemination through private hatcheries. Further, private hatcheries do not limit sales to certain geographic areas.
Thus, the challenge is not only to obtain dissemination data but also to determine whether it is possible to link sales of GIFT from specific hatcheries to specific geographies. To identify hatchery catchment areas, the WorldFish team compiled a list of tilapia hatcheries and surveyed them to elicit information on their customers and on location and price of their GIFT and non-GIFT sales. They also asked where hatcheries sourced brood stock. The information was collated to define a set of potential catchment areas at the subdistrict level, which were then validated through farmer surveys. This triangulation exercise confirmed the catchment areas and documented widespread GIFT diffusion.

Orange-fleshed sweet potato and high-iron beans, developed by CIP and CIAT, have been widely disseminated in Uganda. Several large-scale adoption studies have been conducted and one is planned for 2021 (SPIA, 2021). HarvestPlus tracks seed production and dissemination at the national level, but sub-national data are kept by a range of national and local actors. To get a more detailed picture of where, when, and why planting material and trainings were disseminated across Uganda, SPIA led efforts in 2019 to reconstruct data at the subcounty level over the period 2009–2019. Existing data from monitoring and evaluation (M&E) systems were compiled, standardized, and assessed for quality and completeness. To fill identified gaps, a series of regional workshops was undertaken with stakeholders during 2019–2020. The result was both a complete mapping of the dissemination and sensitization efforts in the region and an approach that could be adapted for other innovations and/or locations.

The AfricaRice-SAED-ISRA (ASI) rice thresher was developed and released in 1997 with the aim of reducing postharvest losses, alleviating postharvest labor bottlenecks, and ensuring that the threshed rice was free of the dirt and debris associated with manual processes.
The thresher has been disseminated in many countries in West Africa. Evidence exists on its effectiveness, and there is anecdotal information on spread. AfricaRice would like to estimate impact on postharvest losses, income, and women’s labor at scale. Researchers are compiling data on the production and dissemination of threshers in two key countries, Nigeria and Senegal. They are collecting data on the number of ASI threshers manufactured per year since introduction (1997 and 2015) and the total number of ASI threshers available in each district and year. Data will be collected from the Ministry of Agriculture, the extension service, local manufacturers, and service providers.

Box 2. Planning for future impacts

Aflasafe, a set of biocontrol products for mitigating aflatoxin contamination, is in the process of going to scale in Africa for use on maize (10 countries), groundnuts (9 countries), and sorghum (in Ghana) (Konlambigue et al., 2020). Each Aflasafe product contains four atoxigenic isolates (of the fungus Aspergillus flavus, which causes aflatoxin contamination) that are native to the target country. Aflasafe products have been registered with pesticide regulatory authorities for use in each country, beginning with Nigeria in 2014 and most recently in Malawi in 2020 (Bandyopadhyay et al., 2019). Products are also being developed for another 10 countries.

To date, Aflasafe has been widely promoted at scale only in Nigeria, as part of the AgResults Nigeria Aflasafe project (2014–2019). In other countries, IITA is in the process of implementing a scaling strategy that involves identifying and contracting licensing and distribution partners and developing a commercialization strategy (Konlambigue et al., 2020).
With support from SPIA, IITA’s socioeconomic and M&E teams are working with the Aflasafe technical team to develop and put in place a system for tracking the spatial and temporal rollout of Aflasafe across countries. The multicountry nature of the rollout, the stage it is currently at, and the key role that IITA plays in providing technical support to public and private sector partners in each country are some of the factors that make this case both feasible (in terms of access to data) and promising in terms of measuring potential future impact.

6 Discussion and Conclusions

The studies reviewed here use rigorous quasi-experimental approaches to support causal claims about the impacts of agricultural research on development outcomes. The examples confirm the value of collecting diffusion data at scale. They also highlight the potential of dissemination data to strengthen study design.

There has been little systematic effort to compile data on dissemination of CGIAR innovations. Given the potential value of these data and, at least in some cases, their relatively low cost, it is worth exploring how this collection can be done better, especially moving forward. As more CGIAR research programs partner with large-scale development efforts to increase the likelihood of achieving outcomes in line with their theories of change, opportunities for building dissemination data collection into program monitoring should increase.

The studies and approaches reviewed here not only provide examples of identification strategies but also show how a multiplicity of data sources, ranging from remote sensing and administrative and municipal records to the Demographic and Health Surveys, can be effectively intertwined to build a convincing case. In each case, the choice of data sources was driven by the theory of change, especially when the outcomes are further along the causal chain, as described in Section 2.
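As a purely illustrative sketch of how such sources might be combined, the Python snippet below joins a dissemination dataset to an outcome dataset on a shared subdistrict code and year. The codes, values, field names, and the `link_by_area_year` helper are all hypothetical and are not drawn from any of the studies reviewed here. Treating area-years with no dissemination record as zero exposure is also an assumption; in practice a missing record may simply mean that the data are incomplete.

```python
from collections import defaultdict

# Hypothetical dissemination records: (subdistrict_code, year, units_disseminated).
dissemination = [
    ("SD01", 2015, 120),
    ("SD01", 2016, 80),
    ("SD02", 2016, 50),
]

# Hypothetical outcome records from a separate survey:
# (subdistrict_code, year, outcome_value).
outcomes = [
    ("SD01", 2015, 0.42),
    ("SD01", 2016, 0.47),
    ("SD02", 2015, 0.39),
    ("SD02", 2016, 0.40),
    ("SD03", 2016, 0.35),
]


def link_by_area_year(dissemination, outcomes):
    """Attach dissemination totals to each outcome record by (area, year)."""
    # Sum units per (area, year) in case several sources report the same cell.
    totals = defaultdict(int)
    for area, year, units in dissemination:
        totals[(area, year)] += units

    linked = []
    for area, year, outcome in outcomes:
        key = (area, year)
        linked.append({
            "area": area,
            "year": year,
            "outcome": outcome,
            # Assumption: no dissemination record is treated as zero units.
            "units": totals.get(key, 0),
            # Flag whether any dissemination record exists for this cell.
            "exposed": key in totals,
        })
    return linked


linked = link_by_area_year(dissemination, outcomes)
for row in linked:
    print(row)
```

Keeping a separate flag for whether any dissemination record exists makes it easier to revisit the zero-exposure assumption later, for example once gaps in the records have been filled.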
The fact that the variables within these diverse datasets (for both the interventions and the outcomes) are georeferenced is key to being able to link them. This feature also removes some of the data collection burden from research teams, who can focus on assembling existing datasets rather than collecting new data.

Most of the completed and ongoing studies mentioned here were designed after the impacts were hypothesized to have occurred. However, there is also a case for considering impacts prospectively, as is the case, for example, with several large-scale dissemination initiatives being planned by CGIAR and its partners. Approaching impact in these situations in a prospective manner has the advantage that the theories of change can be used to inform the selection of potential outcome variables, for which appropriate secondary data sources could be identified from the growing number of available survey and geospatial datasets. The Aflasafe intervention described above is a good example that suggests that an integrated and prospective-looking MELIA approach is promising, and likely achievable at a relatively low cost.

It is worth reiterating that a focus on documenting the why of dissemination, and its contribution to identification, does not substitute for collecting high-quality diffusion data. Supporting the collection of adoption data at scale is a core part of SPIA’s mandate (Evenson & Gollin, 2003; Walker & Alwang, 2015; Stevenson & Vlek, 2018; Kosmowski et al., 2020). SPIA has made important contributions to methods for collecting adoption data, focusing on measurement error (Stevenson, Macours, & Gollin, 2018), and making data available in the public domain for users (see footnote 4).
While diffusion data are valuable on their own as evidence of uptake and of likely benefits for adopters, the possibility of using these datasets in rigorous impact assessments enhances their value, especially over time. The methods outlined above are useful ways of establishing average treatment effects at scale on welfare metrics such as incomes and health, as well as environmental outcomes in the form of reduced air pollution or arrested land degradation. These methods may be extended to look at impact heterogeneity where suitable levels of disaggregation and statistical power are available.

4. Datasets from several SPIA-supported studies are available on the SPIA dataverse. SPIA has also awarded 14 small grants to early-career researchers for studies of agricultural innovations in Ethiopia making use of the datasets of the Ethiopia Socioeconomic Survey (ESS), where SPIA has worked in partnership with the Central Statistics Agency of Ethiopia (CSA) and the LSMS-ISA team at the World Bank to integrate data collection protocols for scaled CGIAR-related innovations.

References

Bandyopadhyay, R., Atehnkeng, J., Ortega-Beltran, A., Akande, A., Falade, T.D.O., & Cotty, P.J. (2019). “Ground-Truthing” Efficacy of Biological Control for Aflatoxin Mitigation in Farmers’ Fields in Nigeria: From Field Trials to Commercial Usage, a 10-Year Study. Frontiers in Microbiology 10, 2528.

Bharadwaj, P., Fenske, J., Kala, N., & Mirza, R. A. (2020). The Green Revolution and infant mortality in India. Journal of Health Economics, 71, 102314.

Brainerd, E., & Menon, N. (2014). Seasonal effects of water quality: The hidden costs of the Green Revolution to infant and child health in India. Journal of Development Economics, 107, 49–64.

de Janvry, A., Dunstan, A., & Sadoulet, E. (2011). Recent advances in impact analysis methods for ex-post impact assessments of agricultural technology: Options for the CGIAR.
Report prepared for the workshop “Increasing the rigor of ex-post impact assessment of agricultural research,” organized by the CGIAR Standing Panel on Impact Assessment (SPIA), 2 October 2010, Berkeley, CA. Rome: Independent Science and Partnership Council Secretariat.

Evenson, R. E., & Gollin, D. (2003). Assessing the impact of the Green Revolution, 1960 to 2000. Science, 300(5620), 758–762.

Gollin, D., Hansen, C. W., & Wingender, A. (2021). Two blades of grass: The impact of the Green Revolution. Journal of Political Economy.

Hamilton, M.G., Lind, C.E., Barman, B.K., Velasco, R.R., Danting, M.J.C., & Benzie, J.A.H. (2020). Distinguishing between Nile tilapia strains using a low-density Single Nucleotide Polymorphism panel. Frontiers in Genetics.

ISPC (Independent Science and Partnership Council). (2018). Adoption and impact of improved lentil varieties in Bangladesh, 1996–2015. Brief no. 63. Rome: ISPC.

Konlambigue, M., Ortega-Beltran, A., Bandyopadhyay, R., Shanks, T., Landreth, E., & Jacob, O. (2020). Lessons learned on scaling Aflasafe® through commercialization in Sub-Saharan Africa. A4NH Strategic Brief. Washington, DC: International Food Policy Research Institute.

Kosmowski, F., Alemu, S., Mallia, P., Stevenson, J., & Macours, K. (2020). Shining a brighter light: Comprehensive evidence on adoption and diffusion of CGIAR-related innovations in Ethiopia. Rome: Standing Panel on Impact Assessment (SPIA).

Kremer, M., Gallant, S., Rostapshova, O., & Thomas, M. (2019). Is development innovation a good investment? Which innovations scale? Evidence on social investing from USAID’s Development Innovation Ventures. Research working paper. https://scholar.harvard.edu/files/kremer/files/sror_div_19.12.13.pdf.

Rashid, S., & Zhang, X., eds. (2019). The making of a Blue Revolution in Bangladesh: Enablers, impacts, and the path ahead for aquaculture. Washington, DC: International Food Policy Research Institute. https://doi.org/10.2499/9780896293618.
SPIA (Standing Panel on Impact Assessment). (2018). Adoption and impact of improved lentil varieties in Bangladesh, 1996–2015. Brief no. 63. Rome: SPIA.

SPIA. (2020). SPIA’s approach to impact assessment for CGIAR. Technical Note no. 8. Rome: SPIA.

SPIA. (2021). Uganda AGDPG virtual meeting: Measurement of innovations in national surveys in Uganda. https://cas.cgiar.org/spia/events/uganda-agdpg-virtual-meeting-measurement-agricultural-innovations-national-surveys.

Stevenson, J., Macours, K., & Gollin, D. (2018). The rigor revolution in impact assessment: Implications for CGIAR. Rome: Independent Science and Partnership Council (ISPC).

Stevenson, J. R., & Vlek, P. (2018). Assessing the adoption and diffusion of natural resource management practices: Synthesis of a new set of empirical studies. Rome: Independent Science and Partnership Council (ISPC).

von der Goltz, J., Dar, A., Fishman, R., Mueller, N. D., Barnwal, P., & McCord, G. C. (2020). Health impacts of the Green Revolution: Evidence from 600,000 births across the developing world. Journal of Health Economics, 74, 102373.

Walker, T. S., & Alwang, J., eds. (2015). Crop improvement, adoption and impact of improved varieties in food crops in sub-Saharan Africa. Wallingford, UK: CABI.

CGIAR Advisory Services – SPIA
Via di San Domenico 1, 00153 Rome, Italy
Email: spia@cgiar.org
URL: https://cas.cgiar.org/spia