IFPRI Discussion Paper 02356

September 2025

Displacement and Development
Evidence from a Graduation Program for Somalia’s Ultra-Poor

Jessica Leight
Kalle Hirvonen
Naureen Karachiwalla
Deboleena Rakshit

Poverty, Gender, and Inclusion Unit

INTERNATIONAL FOOD POLICY RESEARCH INSTITUTE

The International Food Policy Research Institute (IFPRI), a CGIAR Research Center established in 1975, provides research-based policy solutions to sustainably reduce poverty and end hunger and malnutrition. IFPRI’s strategic research aims to foster a climate-resilient and sustainable food supply; promote healthy diets and nutrition for all; build inclusive and efficient markets, trade systems, and food industries; transform agricultural and rural economies; and strengthen institutions and governance. Gender is integrated in all the Institute’s work. Partnerships, communications, capacity strengthening, and data and knowledge management are essential components to translate IFPRI’s research from action to impact. The Institute’s regional and country programs play a critical role in responding to demand for food policy research and in delivering holistic support for country-led development. IFPRI collaborates with partners around the world.

AUTHORS

Jessica Leight (j.leight@cgiar.org) is a Senior Research Fellow in the Poverty, Gender, and Inclusion (PGI) Unit at the International Food Policy Research Institute (IFPRI), Washington, DC.

Kalle Hirvonen (k.hirvonen@cgiar.org) is a Senior Research Fellow in IFPRI’s PGI Unit, Washington, DC.

Naureen Karachiwalla (n.karachiwalla@cgiar.org) is a Research Fellow in IFPRI’s PGI Unit, Nairobi, Kenya.

Deboleena Rakshit (d.rakshit@cgiar.og) is a Research Analyst in IFPRI’s PGI Unit, Washington, DC.

Notices

1. IFPRI Discussion Papers contain preliminary material and research results and are circulated in order to stimulate discussion and critical comment. They have not been subject to a formal external review via IFPRI’s Publications Review Committee.
Any opinions stated herein are those of the author(s) and are not necessarily representative of or endorsed by IFPRI.

2. The boundaries and names shown and the designations used on the map(s) herein do not imply official endorsement or acceptance by the International Food Policy Research Institute (IFPRI) or its partners and contributors.

3. Copyright remains with the authors. The authors are free to proceed, without further IFPRI permission, to publish this paper, or any revised version of it, in outlets such as journals, books, and other publications.

Abstract

While the population of internally displaced people (IDPs) around the world continues to grow, evidence on strategies to sustainably enhance livelihoods among IDPs remains extremely limited. We present findings from a randomized trial of an ultra-poor graduation program targeting IDPs in urban Baidoa, Somalia; the intervention provided cash transfers, an asset transfer or technical training program, and facilitated savings groups. Our findings suggest that two years following program launch, the intervention has led to significant increases in consumption, assets, and savings; however, these effects seem to be driven almost exclusively by increased livestock production. An exploration of heterogeneous effects using generalized random forest methods further suggests that the positive effects of the treatment are dramatically larger for smaller households characterized by lower dependency ratios.

Keywords: Somalia, internally displaced people, ultra-poor graduation

Acknowledgments

Funding for this work was provided by the U.S. Agency for International Development (USAID), the CGIAR Initiative on Fragility, Conflict, and Migration as well as the CGIAR Policy Innovations Program.
We thank the UPG team at World Vision and ACTED who have facilitated this ongoing collaboration with IFPRI, particularly Caitlin Whittemore, Andrew Mugobo, and Asrat Bekele Balcha. We have benefited from discussions with Asrat Bekele Balcha, Daniel O. Gilligan, Colton Parks, and Caitlin Whittemore. IRB approval for this study was granted by the International Food Policy Research Institute (IFPRI), protocol #00007490. The trial was registered with the AEA RCT Registry (AEARCTR-0009452).

1 Introduction

The global population of forcibly displaced people has doubled over the past decade to reach a record 120 million by 2024, and more than half of this population remain in their countries of origin as internally displaced persons (IDPs) (UNHCR, 2023a, 2024b). This sharp increase reflects the surging incidence of both violent conflict and climate-induced disasters (IDMC, 2024; Rustad, 2024; WMO, 2024). Many forcibly displaced people, 90% of whom reside in low- and middle-income countries, then remain trapped in protracted displacement for years or even decades (UNHCR, 2024a), often living side by side with host communities facing their own economic and social challenges (UNHCR, 2023b).

Aid in forced displacement contexts has traditionally focused on providing regular transfers of cash, food, or vouchers to support household consumption (Aker, 2017; Hidrobo et al., 2014; Altındağ and O’Connell, 2023), without addressing deeper, interconnected drivers of poverty among the displaced, including lost assets, inadequate shelter, poor physical and mental health, uncertain prospects for future residence, and overlapping market failures. Effectively targeting these multiple constraints requires a multifaceted intervention, suggesting that graduation model programs (a sequenced set of interventions including consumption support, training, access to savings or credit, and an asset transfer) may be a promising strategy.
There is strong evidence about the effects of these programs in reducing poverty and increasing investments in more stable settings (Banerjee et al., 2015; Bandiera et al., 2017), but very limited evidence in forced displacement contexts (Rozo and Grossman, 2025), where complementary public services are often lacking and high levels of ongoing uncertainty and sometimes violence could render households unwilling to invest.

This study presents findings from a randomized controlled trial in the city of Baidoa, Somalia, evaluating an ultra-poor graduation (UPG) program targeted to IDPs. Baidoa is one of Somalia’s largest displacement hubs, hosting more than 600,000 IDPs who have fled protracted violence or droughts; broadly, in Somalia, over 70% of IDPs live below the extreme poverty line (Pape, 2017). Within this context, UPG provided six months of cash transfers, followed by a choice between an asset transfer (such as livestock or agricultural inputs) or enrollment in technical and vocational education training (TVET) to support livelihoods development. Additional components included the formation of savings groups and group-level coaching (focused on financial literacy, business skills, and social capital).

The trial included a sample of 4,116 households identified as eligible for the intervention due to their baseline vulnerability to hunger and their residence in the targeted IDP sites; because the number of eligible households exceeded the number that could be served under resource constraints, households were randomly selected to enter the UPG program.[1] Using baseline data from 2022 and two follow-up surveys conducted in 2023 and 2024 and characterized by minimal attrition (less than 4% of households), we exploit the randomized design to estimate the program’s causal impact on consumption, food security, and livelihood outcomes.
Our primary findings suggest that two years following its launch, the UPG intervention had substantial effects on a range of consumption and livelihoods outcomes, including a 30% increase in consumption (consistent across both food and non-food consumption), a 300% increase in the value of assets (driven almost entirely by goats), and a nearly 50 percentage point increase in the probability of reporting any savings. Treated households are more likely to report income from a range of sources, though the largest increase is observed in livestock. There are also positive effects on respondent locus of control, and no adverse effects on local social cohesion despite the household-level randomization. Longitudinal data suggest that positive effects were evident even one year post-launch, and generally widened over the second year of implementation.

Attrition in this trial was extremely low but concentrated in the control arm: accordingly, we also construct bounds on the treatment effects of interest following Lee (2005) and Kling et al. (2007) and find the estimated effects are generally robust to both Lee trimming and allowing attritors to deviate from the treatment arm-specific mean by up to two standard deviations.

[1] There were 6,323 eligible households and only 5,000 households could be served by the intervention; a subset of these intervention households entered the trial. Appendix A3 provides a detailed discussion of the ethical aspects of this trial, based on the structured ethics appendix suggested by Asiedu et al. (2021).

We also explore heterogeneity in the estimated treatment effects following the generalized random forest method proposed by Athey et al. (2019). We find that there is substantial heterogeneity in the estimated conditional average treatment effects, primarily predicted by variation in household composition at baseline (the average household size is nearly seven, with a standard deviation of over two).
Households that are smaller, and characterized by a lower ratio of dependents to prime-age adults, show significantly larger treatment effects. In fact, households with five or fewer members and a dependency ratio in the lowest quartile exhibit a gain in consumption that is around 30% larger than the average treatment effect, suggesting that households with more care responsibilities may not be able to effectively take advantage of new livelihoods opportunities. We further demonstrate that a comparable pattern of heterogeneity is not observed in replication data from Banerjee et al. (2015), an evaluation of a graduation model intervention implemented in six stable, non-displacement sites, suggesting that this pattern may be somewhat specific to displacement settings and warrants consideration in program design in such settings.

Our paper contributes to the growing literature evaluating the effectiveness of graduation model programs (Banerjee et al., 2015, 2022, 2021; Bandiera et al., 2017; Balboni et al., 2022). While most previous studies have been conducted in stable, rural contexts, ours is among the first to evaluate a graduation program in a setting marked by forced displacement. Table A1 provides an overview of this literature and key characteristics of the setting, empirical design, and interventions, highlighting the absence of evidence from conflict-affected contexts. Moreover, whereas previous research primarily evaluates the performance of these programs in rural areas, this trial contributes evidence from an urban setting. This trial, together with a study from post-conflict Afghanistan (Bedoya et al., 2019), indicates that graduation models may be particularly effective in fragile or humanitarian settings, demonstrating relatively large impacts on household economic outcomes.

This pattern of relatively large impacts aligns with findings from a parallel literature on economic interventions targeting forcibly displaced populations.
Despite the scale of global displacement, there remains a striking lack of rigorous evidence on the effectiveness of livelihood and poverty alleviation programs for these populations, particularly IDPs (Rozo and Grossman, 2025; Schuettler and Caron, 2020). Our study provides one of the first experimental evaluations of a multifaceted livelihoods program for IDPs. Related studies focusing on cash and cash+ interventions in refugee settings also report promising results. In Uganda, Gupta et al. (2024) conduct a randomized trial evaluating the economic impacts of a one-off unconditional $1,000 USD transfer directed at South Sudanese refugees residing in refugee camps, showing notable improvements in household consumption (11% relative to control group), asset accumulation (30%), and business revenue (64%) after 18 months. Also in Uganda, Baseler et al. (2024) experimentally evaluate a program combining $540 USD cash grants with mentorship for young urban microentrepreneurs, including both refugees and host community members. They find that the cash grant alone significantly boosts business profits and household earnings one year after implementation, but the mentorship component provides no additional benefits. Finally, in Kenya, MacPherson and Sterck (2021) use a regression discontinuity approach to compare a traditional humanitarian model of refugee assistance with a development-oriented approach, finding improvements in nutrition and food security after 16 months, likely driven by greater participation in small-scale agriculture.[2]

2 Context and Intervention

2.1 Context

Baidoa is a city in southwestern Somalia, serving as the capital of the South West State and located approximately 250 kilometers from the capital Mogadishu (see Figure A1a).
It is one of Somalia’s largest urban centers and an important hub for agriculture and livestock trade (UN-Habitat, 2021). While population estimates vary widely due to the lack of recent census data and the large-scale influx of IDPs over the past decade, imputed estimates based on a 2014 survey suggest a population of around 300,000 in that year (UN-Habitat, 2021). However, ongoing conflict and recurring droughts have led to large-scale displacement, substantially increasing the city’s population. In 2022, the number of IDPs in Baidoa was estimated to be nearly 600,000 (UNHCR, 2022), up from 169,000 in 2017 (UNHCR, 2017) and constituting one of the largest IDP populations within Somalia. Though a number of humanitarian and development organizations operate in the city, security remains fragile, with Al-Shabaab, a designated terrorist group, maintaining influence in surrounding rural areas (UN-Habitat, 2021) from which households are often displaced.

[2] Another closely related strand of work focuses on assessing the impact of programs and legislation that facilitate refugee integration into local labor markets: see for example Sarvimäki and Hämäläinen (2016); Battisti et al. (2019); Fasani et al. (2021); Hussam et al. (2022).

2.2 Intervention

The project evaluated here was the Ultra-poor Graduation (UPG) Program, implemented by World Vision over three years and funded by the U.S. Agency for International Development (USAID) Bureau for Humanitarian Assistance (BHA).[3] UPG supported ultra-poor and vulnerable households in graduating from extreme poverty and moving toward self-reliance, targeting primarily IDPs as well as a small number of households from vulnerable host communities, returnees, and refugees.[4] Figure A1b maps the Baidoa IDP sites targeted for this program and thus included in the evaluation.

Launched in June 2022 and running until December 2024, the intervention consisted of four main components.
First, UPG households received six monthly unconditional cash transfers of $42.50 to provide consumption support. Second, they participated in savings groups designed to encourage savings; these regular meetings also served as a platform for training on topics such as financial literacy and business management. Third, households received either a one-time asset transfer or funding to enroll in a six-month technical training course, based on their preference. Households opting for the asset transfer could choose from goats, chickens, sheep, cows, crop seeds, or tools; the technical training was provided through a local institute. Fourth, participants attended regular group-based coaching sessions focused on life skills and social integration. A summary of the intervention components compared to other similar interventions is provided in Table A1.

[3] The formal title was “Building Pathways Out of Poverty for Ultra-Poor IDPs and Vulnerable Host Communities in Baidoa.”

[4] Returnees are Somali households who had been refugees in other countries and then returned; Baidoa is not necessarily their home city, however. The very small number of international refugees in Somalia are generally from Ethiopia and Yemen.

Program eligibility was determined based on household characteristics assessed in an initial vulnerability assessment, and eligible households had to meet two criteria: they were classified as experiencing moderate or severe hunger according to the Household Hunger Scale (Ballard et al., 2011), and they had resided in the IDP site for at least one month.[5] The initial assessment identified 6,323 eligible households.

3 Methods

3.1 Experimental Design

We employ a randomized controlled trial using randomization at the household level, given that UPG by design had resources to serve only 5,000 households among a larger number of eligible households.
The control arm was expected to include 1,500 households, but was reduced to 1,323 households (the 6,323 eligible households less the 5,000 to be served) to ensure that the intervention target of 5,000 was met. Randomization was conducted by the research team in Stata prior to the baseline survey using data from the initial vulnerability assessment.[6]

[5] These criteria were identified as important for program eligibility by the program team and validated through focus group discussions with participants and community members. Households experiencing hunger were prioritized due to their higher level of need, while requiring at least one month of residency ensured greater stability, increasing the likelihood that participants would remain at the site.

[6] Randomization was stratified by four groups based on two binary variables: a binary variable for a household being above or below the median of an asset index constructed using vulnerability assessment data, and a binary variable equal to one if the household had resided in the IDP site for more than a year. The initial design also conducted random assignment to two treatment arms, to facilitate an analysis of two alternate household coaching strategies. However, ultimately only one coaching strategy (group-based coaching) was utilized, and thus the two treatment arms are pooled in analysis.

3.2 Ethics

Ethical approval for this study was granted by the International Food Policy Research Institute (IFPRI) Institutional Review Board (IRB), under protocol #00007490. Further ethical considerations, including policy equipoise, risks, and informed consent, are detailed in the structured ethics appendix (Appendix A3), following Asiedu et al. (2021).

3.3 Surveys

The baseline survey was conducted between May 18 and June 12, 2022 and was targeted to include 3,000 treatment households and all 1,323 control households; the remaining 2,000 treatment households were not targeted for inclusion in the survey or evaluation.
The realized sample included 4,116 households (2,872 treatment and 1,244 control).[7] The first follow-up survey was conducted 14 months post-baseline and one year after the launch of the UPG intervention and resurveyed 4,089 (99.3%) sample households. The second follow-up survey was conducted 28 months post-baseline and two years following UPG launch, and resurveyed 3,982 (96.7%) sample households. All interviews were conducted in person. Due to security risks and the limited survey capacity in the study area, all surveys were carried out by the implementing partner, World Vision, which was responsible for hiring and training the enumerators. As is common in insecure forced displacement settings (Pape and Mistiaen, 2020; Pape and Verme, 2023), interview durations had to be restricted (with a target of around 60 minutes per household), limiting the scope of the questionnaires. Figure A2 in the Appendix summarizes the study timeline.

[7] The sampling frame fielded included 2,980 treatment households and 1,323 control households (the latter comprising all households in the target IDP sites not served by the intervention); the sampling frame of treatment households itself shrank slightly from the original planned 3,000 given that 20 households in the sample list were duplicates. Within the target sample, both treatment and control, 187 households were not interviewed because they either could not be reached during the designated survey period or declined to participate.
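
The sample accounting in Section 3.3 and footnote 7 can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the variable names and the `resurvey_rate` helper are illustrative and are not part of the study's own code.

```python
# Counts reported in Section 3.3 and footnote 7.
frame_treatment = 2980   # treatment households fielded (3,000 planned, 20 duplicates)
frame_control = 1323     # all eligible households not served by the intervention
not_interviewed = 187    # unreachable or declined at baseline

baseline_treatment = 2872
baseline_control = 1244
baseline_total = baseline_treatment + baseline_control

followup_1 = 4089        # resurveyed about one year after program launch
followup_2 = 3982        # resurveyed about two years after program launch


def resurvey_rate(resurveyed, baseline):
    """Share of baseline households found again, in percent."""
    return 100 * resurveyed / baseline


# The fielded frame minus non-interviews should equal the realized sample.
realized_from_frame = frame_treatment + frame_control - not_interviewed

print(realized_from_frame, baseline_total)
print(round(resurvey_rate(followup_1, baseline_total), 1))
print(round(resurvey_rate(followup_2, baseline_total), 1))
```

The two routes to the realized sample agree at 4,116 households, and the implied attrition at the second follow-up (about 3.3%) is consistent with the "less than 4%" figure cited in the introduction.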
3.4 Outcomes

Table A10 in the Appendix provides definitions of all outcome variables analyzed, as prespecified in a registered analysis plan.[8] The primary outcomes include the share of households characterized by moderate or severe hunger over the last 30 days based on the Household Hunger Scale; household per capita consumption in 2017 Purchasing Power Parity (PPP) dollars; and the estimated value of household assets, also valued in 2017 PPP dollars.[9] Table A4 reports the estimated minimum detectable effect sizes for the primary outcomes based on the mean and standard deviations observed in the main follow-up survey. The experiment is adequately powered to detect even relatively small treatment effects: a 0.05 change in the share of households with moderate or severe hunger (11.5% of the control mean), a 0.15 PPP-dollar change in per capita consumption (5.7% of the control mean), and a 35 PPP-dollar change in the total value of assets (14% of the control mean).

Secondary outcomes include additional, more detailed measures of assets and financial inclusion; income data capturing whether households report income from any one of six sources, and the amount of income; the livelihood coping strategies score, capturing strategies households may need to use to adapt to shortages of food or money; and measures of social cohesion and locus of control. The index of social cohesion was constructed using a series of questions about the individual’s perception of the broader community (Humble et al., 2023; Catholic Relief Services, 2019); and locus of control was measured following Rotter (1966) and Malacarne (2024).

[8] The trial registration number is AEARCTR-0009452.

[9] Consumption was measured using modules adapted from the 2017 Somali High-Frequency Survey (Pape, 2017) conducted by the World Bank, but shortened to manage interview length; additional details on the construction of the consumption measure are provided in Appendix A4.2.
The 2017 base year is chosen because, at the time of the baseline survey, it was the latest available International Comparison Program (ICP) benchmark year. PPP conversion factors are most reliable in benchmark years, when the ICP conducts comprehensive price surveys (e.g., 2011, 2017, 2021). PPPs for non-benchmark years are typically interpolated using domestic and U.S. CPIs and may be subject to greater uncertainty.

3.5 Econometric Specification

We estimate the following specification:

$$ y_{i,t} = \alpha + \beta \, \mathrm{Treatment}_i + \gamma \, y_{i,0} + \lambda_i + \epsilon_{i,t}, $$

where $y_{i,t}$ is an outcome for household $i$ in year $t$, $\mathrm{Treatment}_i$ is an indicator for assignment to the UPG program, $y_{i,0}$ is the baseline measure of the outcome (if available), and $\lambda_i$ are fixed effects for randomization strata. For outcomes that were not measured at baseline, we estimate the same model excluding $y_{i,0}$.[10]

Since assignment to treatment was not clustered, standard errors are adjusted for heteroskedasticity following White (1980). We also report q-values corrected for multiple hypothesis testing (MHT) following Anderson (2008). We conduct MHT corrections within the set of primary outcomes, and within each family of secondary outcomes. We also report average standardized treatment effects for broader outcome families that pool across primary and secondary outcomes (consumption and food security, assets and savings, and income), following Kling et al. (2007). Although our prespecified main specification does not include additional baseline covariates, we also report an alternative specification that uses double lasso for covariate selection (Belloni et al., 2013; Cilliers et al., 2024).

4 Empirical Findings

4.1 Baseline Characteristics and Balance

To characterize the sample, Table 1 summarizes key demographic characteristics at baseline and reports balance across the control and treatment arms.
Eighty-three percent of households are IDPs (with the remainder including 7% refugees, 1% returnees, and 9% host community members), and the average household includes nearly seven members, of whom four are children. Sixty percent of households report having a pregnant or lactating woman, and 20% report the presence of an individual with a disability. Unsurprisingly, sampled households were characterized by a high level of food insecurity and a high level of deprivation at baseline: the average HHS score was nearly four, consistent with the eligibility criteria of moderate to severe hunger; less than 1% of households reported any savings, and households owned around 0.2 tropical livestock units on average. Total baseline asset value was estimated to be around $350.[11]

[10] Baseline values are reported only for the Livelihoods Coping Scale, tropical livestock units, and any savings; baseline HHS was also measured, but there is no variation in this measure at baseline since all eligible households were characterized by moderate or severe hunger.

Out of the 11 t-tests comparing baseline characteristics across the two study arms, only one shows a statistically significant difference: household size is modestly larger in the control group, by approximately 0.3 members (or 5%), and this difference is highly statistically significant. When we estimate a joint test of balance across covariates, the p-value corresponding to the null hypothesis of no imbalance is 0.053 when using conventional p-values or 0.079 using randomization inference following Kerwin et al. (2024); in the robustness checks below, we explore further the possibility of bias in the estimated treatment effects due to the imbalance detected in baseline household size.

4.2 Implementation Fidelity

Table A2 summarizes findings around implementation fidelity that suggest that in general, the program was carefully implemented in line with the randomized design.
Among treated households, 88% reported receiving cash transfers from a non-governmental organization in the past three years, compared to 14% of control households (who may plausibly also be reporting transfers received from other NGO programs in the same recall period). Those who do report receiving transfers reported six transfers of $42 each, for a total transfer of around $250. Similarly, 87% of households assigned to treatment reported receipt of either assets or TVET training over the past three years, compared to only 3% of control households. The assets track seems to have been slightly more popular than the TVET track, based on households’ self-reports: 41% report receipt of assets, and 30% report receipt of TVET (9% report receipt of both, an allocation that was not generally allowed under program guidelines and suggests that households may be mis-identifying another service as one provided by UPG). For households reporting assets, goats were the dominant choice (reported by nearly 80% of households) as evident in Figure A3a, while for households reporting training, the most popular courses were tailoring, tie-dyeing, and beauty salon services (Figure A3b). Reported participation in savings groups and coaching is, however, somewhat lower: 63% of treatment households report participation in savings groups and 49% in coaching, compared to minimal participation in the control arm.

[11] Note that as asset price data was not collected at baseline, assets are valued at prices collected in the second follow-up survey, using the median price measured within the sample.

We also separately assessed whether there were any cross-household spillovers from households in the treatment arm to those in the control arm, and observed these were very rare. Fewer than 2% of control households reported receiving cash remittances from any other household (whether a program beneficiary or not).
Fewer than 20% of treatment households reported that they had transferred or loaned the asset they received to any other household. While informational spillovers may have been more common (a majority of treatment households stated they shared information received in training, though this is only vaguely defined), other direct forms of spillovers seem to be infrequent.

4.3 Primary Findings

The primary treatment effects are reported in Table 2; Panel A reports effects on consumption and food security, Panel B reports effects on assets and financial inclusion, Panel C reports effects on income, and Panel D reports effects on social cohesion and locus of control.[12] The findings in Panel A suggest the intervention led to large positive shifts in consumption, with an increase in per capita consumption of around 30% ($0.81 in absolute terms); the relative magnitude of this effect is consistent across both food and non-food consumption. There is also a dramatic decline in the probability of households being characterized by moderate or severe food insecurity according to the Household Hunger Scale: recall that at baseline, all households were identified as eligible based on this criterion. Two years later, 42% of households in the control arm continue to experience this high level of food insecurity, while this has declined to only 12% in the treatment arm. The livelihoods coping score also shows a decline of about a third, consistent with the previous findings and suggesting that many households exposed to UPG no longer have to resort to adverse measures (such as selling assets, borrowing, or withdrawing children from school) to obtain food.

[12] Again, the prespecified primary outcomes are per capita consumption, household hunger scale status, and asset value, or Columns (1) and (4) of Panel A and Column (1) of Panel B.

Panel B documents effects on assets and financial inclusion.
The average asset value in dollars has roughly tripled in the treatment arm (reaching nearly $900, compared to under $300 in the control arm) and this is substantially driven by livestock, where the average number of tropical livestock units (TLUs) increases nearly fourfold to around 0.49 TLUs. (In practice, this corresponds to a gain of roughly three goats at the second follow-up; given reported market prices, this is an increase in asset value of nearly $500.) Panel A of Table A5 in the Appendix reports effects on counts of a wide range of assets, and it is evident that there are significant treatment effects on a large number of asset categories (mobile phones, various productive tools, and other livestock). These estimates are generally quite small in absolute terms (none exceeds 0.5 other than the estimate for goats, and most are under 0.1), though in multiple categories the increase is proportionately large: treatment households increase their reported inventory of spades, tarpaulins, solar panels, and sheep by around 70%; the number of donkey carts and poultry owned increases by around 50%; and there is a sixfold increase in the number of sewing machines. Returning to the main table, Column (3) in Panel B in Table 2 suggests there are also substantial effects on savings, as virtually no households (4%) in the control arm report cash savings, compared to nearly half of treatment households. The effects on credit access, while still positive, are less dramatic in magnitude (eight percentage points relative to a mean of 57%).

Panel C reports effects on income. There is a large effect on income from livestock: the probability of any reported income from cropping or livestock production increases by 16 percentage points relative to a mean of only 5% in the control arm, and treatment households report an average of $34 of income from cropping and livestock in the reference period of one month, relative to only $6 for control households.
The effects for non-farm businesses are, however, modest: the probability of having a non-farm business increases by five percentage points relative to a mean of 15% in the control arm, but the increase in the continuous measure of income is extremely small (around 3% of the control mean) and statistically insignificant. There is similarly no overall shift in the probability of wage income, though a further decomposition shows that there is some shift away from informal and toward formal wage labor.

Panel D then reports two variables capturing shifts in social cohesion and locus of control. Because the intervention was individually randomized in an urban setting, there was an increased risk of adverse effects on social cohesion, particularly if some households were aware that others were receiving support while they were not. However, there is no evidence of this phenomenon here; we cannot reject the null that the treatment had no effect on social cohesion. The estimated treatment effect on locus of control is positive and significant, suggesting some shifts in the psychological outlook of households linked to their enhanced economic status.

To capture some key effects graphically, Figure 1 shows group means and treatment effect estimates for the three primary outcomes as well as tropical livestock units across the two follow-up surveys, allowing us to track how impacts evolved over time. Panel A shows a steady decline in moderate or severe hunger, with the largest gains in the first year. Households in the control arm saw some early gains but little shift thereafter, leading to a widening gap between the two groups. Panel B presents total per capita consumption (not measured at baseline), which shows modest gains for the treatment group by the first follow-up and a much larger gap in the following year.
Growth in asset value can be assessed by employing asset prices as measured in the second follow-up survey: while control households saw flat or declining asset values, treated households accumulated assets steadily, and the pattern is similar for tropical livestock units.

We report average standard treatment effects following Kling et al. (2007) to facilitate interpretation of general treatment effects across four outcome families: consumption and food security, assets and financial inclusion, income, and social cohesion and locus of control.13 These findings are reported in Table 3 and highlight the wide variation in effect sizes across outcome families. The positive effect on consumption and food security is fairly large at around 0.29 standard deviations, but the effect on income is only around 0.05 standard deviations and statistically insignificant; both are dwarfed by the positive effect on assets and financial inclusion, nearly 1.5 standard deviations. The effects on attitudinal variables are around 0.04 standard deviations and statistically insignificant.

We also report two additional exploratory analyses. The first examines treatment effects on household size and composition. Although this analysis was not pre-specified, we consider it important given the evidence of baseline imbalance in household size, and because differential shifts in household composition have been documented in prior empirical evaluations of cash transfer programs for displaced households (Özler et al., 2021). As shown in Panel B of Table A5 in the Appendix, there is a small treatment effect on overall household size of 0.068 members, corresponding to less than a one percent increase in household size relative to the mean in the control arm at follow-up; the only (weakly) statistically significant shifts were in the number of young children (0-4 years of age) and adolescents (15-19 years of age). There was no significant change in the number of prime-age or older adults.
Overall, these results suggest that the intervention did not meaningfully alter household composition, and it is unlikely that selective household entry or exit accounts for the main treatment effects.

Given the evidence of baseline imbalance in household composition, we also re-estimate our main specification using a double lasso routine to select baseline covariates as controls: the lasso uniformly selects a single variable, household size. (Baseline values of the outcome variable are still uniformly included as control variables, when available.)14 Table A3 reports these specifications. While some treatment effects are slightly smaller (e.g., the coefficient on consumption), in general the differences are extremely minor. There is very little evidence that baseline imbalance led to any bias in the primary estimated effects.

13 The variable capturing any moderate or severe food insecurity is reverse-coded in this analysis.

4.4 Attrition

As previously noted, attrition was on average strikingly low in this trial, particularly for an IDP sample: fewer than 4% of households were lost to follow-up. However, despite this low rate, there is a meaningful difference across treatment arms as reported in Table A6: in Column (1), we observe that the rate of attrition is six percentage points lower among treatment households (1.4%, compared to 7.6% among control households). In Columns (2) and (3) of the same table, we then regress attrition on a set of baseline covariates (the same covariates previously reported in the balance table) and the interaction of these covariates with treatment. Only a few baseline characteristics predict attrition: households characterized by a higher HHS score at baseline (more intense hunger) are more likely to attrit, while households characterized by a higher LCS score (indicative of more intense use of coping strategies) and a higher dependency ratio are less likely to attrit.
However, the interaction effects between baseline covariates and the treatment dummy are generally insignificant, implying that the characteristics of attrited households do not appear to differ across arms, again with the exception of the dependency ratio: households with a higher dependency ratio are less likely to attrit in the control arm, but this relationship is absent in the treatment arm.

14 The fact that this routine selects only a single variable is by no means unusual, as extensively documented in Cilliers et al. (2014).

Nonetheless, to further explore the potential for bias due to attrition, we also estimate bounds on the primary treatment effects using various strategies. For the continuous variables of interest, we first measure the attrition gap and construct bounds following Lee (2005): we estimate the difference in the proportion of non-missing observations between the treated and control groups and then create two counterfactual treated samples. To estimate the lower bound, we drop the treated units characterized by the highest outcome values until the attrition gap is exhausted, and to estimate the upper bound, we drop those characterized by the lowest outcome values. As an alternate strategy, we also follow Kling et al. (2007) to generate bounds assuming that attrited units in the treatment or control arm are characterized by outcomes N standard deviations above or below the arm-specific mean, where N in this case is set to two. The upper bound is estimated by setting attrited units in the treatment arm to two standard deviations above the treatment arm mean and attrited units in the control arm to two standard deviations below the control arm mean, and vice versa for the lower bound. This allows for wide disparities in outcomes between attrited and non-attrited individuals. For binary variables, we estimate simple Manski bounds. The findings are presented in Table A7 and are generally consistent with our primary results.
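The trimming and worst-case bounds described above can be illustrated with a minimal Python sketch. This is not the estimation code used for Table A7: the function names and arguments are hypothetical, no covariates or strata are included, and the trimming count is simply rounded to the nearest unit. It assumes, as in this trial, that the response rate is higher in the treatment arm.

```python
from statistics import mean


def lee_bounds(treated, control, n_treated_assigned, n_control_assigned):
    """Trimming bounds on a difference in means for a continuous outcome.

    `treated` and `control` hold outcomes for households found at follow-up;
    the two counts give the number originally assigned to each arm.
    Assumption of this sketch: the treatment arm has the higher response rate.
    """
    resp_t = len(treated) / n_treated_assigned
    resp_c = len(control) / n_control_assigned
    if resp_t < resp_c:
        raise ValueError("this sketch only trims the treatment arm")
    # Share of observed treated units to drop so that response rates are equal.
    trim_share = (resp_t - resp_c) / resp_t
    n_trim = round(trim_share * len(treated))
    ordered = sorted(treated)
    keep = len(ordered) - n_trim
    control_mean = mean(control)
    lower = mean(ordered[:keep]) - control_mean    # drop the highest outcomes
    upper = mean(ordered[n_trim:]) - control_mean  # drop the lowest outcomes
    return lower, upper


def manski_bounds(treated, control, n_treated_assigned, n_control_assigned):
    """Worst-case bounds on a difference in proportions for a 0/1 outcome."""
    miss_t = n_treated_assigned - len(treated)
    miss_c = n_control_assigned - len(control)
    # Lower bound: every missing treated unit is 0, every missing control unit is 1.
    lower = (sum(treated) / n_treated_assigned
             - (sum(control) + miss_c) / n_control_assigned)
    # Upper bound: every missing treated unit is 1, every missing control unit is 0.
    upper = ((sum(treated) + miss_t) / n_treated_assigned
             - sum(control) / n_control_assigned)
    return lower, upper
```

With attrition as low as in this trial, the trimmed share is small, which is one reason bounds of this kind can remain tight.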
In Panel A, we can observe that the estimated treatment effects on consumption, assets, and the livelihoods coping score can be bounded away from zero, with relatively tight variation in the estimated magnitude: for example, the estimated effect for consumption ranges between 20% and 40%. In Panel B, we can see that even quite conservative Manski bounds allow us to reject the hypothesis of a null effect on moderate or severe hunger status, any savings, and any credit; we can also reject the hypothesis of a null effect for any agricultural or livestock income, though for the other income variables, the bounds cross zero. (Similarly, the estimated bounds cross zero for the continuous income variables, where the primary estimates were all somewhat noisy, likely due to the large number of zeroes, and bounds also cross zero for locus of control; for concision, these are not reported in the table.)

4.5 Heterogeneous Effects

The previous graduation literature has generally used quantile regression to explore variation in treatment effects (Banerjee et al., 2015; Bandiera et al., 2017; Bedoya et al., 2019). However, this strategy does not identify the underlying drivers of heterogeneity, and simpler linear analyses focusing on specific baseline characteristics as predictors of heterogeneity (e.g., wealth, education) have so far provided limited evidence of consistent or interpretable moderators (Bedoya et al., 2019; Bossuroy et al., 2022). To address this, we explore treatment effect heterogeneity using a relatively recent machine learning method, the generalized random forest (GRF) (Athey et al., 2019), which allows for a data-driven exploration of heterogeneity across a rich set of baseline covariates. The GRF algorithm builds a causal random forest (CRF) that allows for the estimation of average treatment effects conditional on observable baseline characteristics.
This method is arguably well suited to our trial, which is characterized by a large sample and individual-level randomization, both important conditions for the effective application of causal forest methods (Wager and Athey, 2018; Davis and Heller, 2017).15

15 We use the GRF algorithm in R, and draw on the useful replication code and discussion of applications of GRF in an RCT context provided in Sylvia et al. (2021).

The first step is simply to assess how much heterogeneity in treatment effects is evident, focusing on per capita consumption as the primary outcome variable of interest. We estimate what is known as the “out-of-bag” conditional average treatment effect (CATE), in which the treatment effect for each observation is predicted using only the trees for which that observation was not used in the training set, and present the distribution in Figure A4a. It is evident that there is substantial mass for an effect on consumption between around $0.60 and $0.85 (relative to the estimated mean effect of $0.81), but there is also a right tail of consumption effects of more than a dollar. The cumulative distribution function shown in Figure A4b suggests that the top quartile of consumption effects is above $0.77.

We then probe the variable importance estimated by the GRF algorithm: this captures the percentage of importance for each baseline covariate in the forest, as measured by the frequency with which this variable is used as a splitting variable. The findings reported in Table A8 in the Appendix suggest that the most important variables predicting heterogeneity are both linked to household composition: the number of members in the household, and the dependency ratio (defined as the ratio of the number of children under 14 and elderly over 55 to the number of adults and adolescents). Other, weaker, predictors of heterogeneity include baseline asset value and tropical livestock units.
Figure A5 then captures how the estimated out-of-bag CATEs vary with respect to the three most predictive characteristics. The first two scatter plots capture a notable pattern of treatment effect heterogeneity in which the largest effects are observed for smaller households (four or fewer members) and those characterized by lower dependency ratios (under around two dependents per productive adult). There is also some weak evidence of heterogeneity in which households that were worse off at baseline as measured by asset value are characterized by somewhat larger treatment effects, but this relationship is relatively flat.

To then test heterogeneity using a simpler specification, we follow Sylvia et al. (2021) and estimate a standard heterogeneous effects regression using these key indicators of interest. Given the graphical pattern suggesting that larger treatment effects are concentrated in the left tail of household size and dependency ratio, we generate binary variables equal to one if the household is characterized by household size, dependency ratio, or baseline asset value below the 25th percentile at baseline and estimate heterogeneous effects, reported in Table 4. We can observe that households characterized by a smaller size (under five members) and a lower dependency ratio (under 2.5) show dramatically larger treatment effects, but for baseline asset value, the heterogeneity is lower in magnitude and not statistically significant. Column (4) presents the joint specification including all three variables, and both household size and the dependency ratio remain strongly significant: a household characterized by both small size and a low dependency ratio would have a predicted treatment effect that is more than double that of a household characterized by large size and a high dependency ratio.
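The "more than double" comparison can be checked directly from the interaction coefficients in Column (4) of Table 4. The short Python sketch below simply adds the relevant coefficients; the variable and function names are illustrative, and the calculation ignores sampling uncertainty in the estimates.

```python
# Coefficients copied from Column (4) of Table 4
# (outcome: daily per capita consumption).
UPG_MAIN = 0.493
UPG_X_SMALL_HOUSEHOLD = 0.296
UPG_X_LOW_DEPENDENCY = 0.240
UPG_X_LOW_ASSETS = 0.126


def predicted_effect(small_household, low_dependency, low_assets=False):
    """Treatment effect implied by the joint interaction specification."""
    return (UPG_MAIN
            + UPG_X_SMALL_HOUSEHOLD * small_household
            + UPG_X_LOW_DEPENDENCY * low_dependency
            + UPG_X_LOW_ASSETS * low_assets)


# Small household with a low dependency ratio (assets above the cutoff).
small_low = predicted_effect(True, True)
# Large household with a high dependency ratio (assets above the cutoff).
large_high = predicted_effect(False, False)
ratio = small_low / large_high
```

Holding the asset indicator at zero, the implied effects are roughly $1.03 and $0.49 respectively, a ratio of about 2.1.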
Overall, this analysis suggests that in predicting treatment effects of this intervention, demographics is truly destiny, dominating other observable characteristics at baseline. Importantly, the transfers and other material support provided by UPG were fixed in size at the household level and did not scale with respect to family size; this is a common feature of graduation model interventions, as evident (for example) in the BRAC program handbook or in the detailed program descriptions provided in Banerjee et al. (2015).16 That being said, the magnitude of the treatment effect heterogeneity, the fact that it is observed both for overall household size and for the dependency ratio, and the fact that this heterogeneity is observed around a year following the conclusion of transfers together suggest that this is not solely a mechanical effect of greater intervention intensity, and may also reflect the fact that large households with many dependents were less able to exploit new livelihoods opportunities.

To further explore the potential generalizability of this pattern of heterogeneity, we also conduct an exploratory analysis of the data from the original six-site graduation trial reported in Banerjee et al. (2015) using the same generalized random forest method; we focus primarily on the question of what baseline variables predict heterogeneity in the treatment effect for per capita consumption in the longer-term (three-year) follow-up.17 Our goal is to understand whether the pattern in which household size and demographics dominate in predicting conditional average treatment effects in our Baidoa sample also holds in a sample drawn from stable, non-displacement, rural contexts.
16 Only one site had a program feature that was adjusted for household size: the consumption support provided in Ghana.

17 We follow the primary specification described in the paper, including binary variables for country and randomization strata and clustering at the randomization unit level.

We first conduct the GRF analysis using a large set of covariates available at baseline, including a number of variables (baseline consumption and income) that were not collected at baseline in the Baidoa trial; we then restrict to the set of covariates that are also available in the Baidoa baseline data, and report findings from the “full” and “restricted” model, respectively. The findings suggest that there is also meaningful heterogeneity in the estimated treatment effects in the Banerjee et al. trial: the simple intent-to-treat estimate on long-term consumption is $3.36 per person per month, but the estimated CATEs range from $1 to nearly $6 in 2014 PPP terms, as captured in Figure A6, which presents the density functions of the conditional average treatment effect for the full and restricted model, respectively.
Table A9 then documents patterns of variable importance in the two models, focusing on the ten most important variables in each and sorted in accordance with their rank in the full model.18 In the full model, baseline consumption, asset value, and income from revenue and agriculture are most predictive of variation in the conditional average treatment effect (weights of roughly 0.1 each); in the restricted model, asset value remains the most predictive (0.24), with similar predictive power then from household size, food security, and perceived economic welfare (weights of 0.2 each).19 The observed patterns of heterogeneity in the CATE with respect to these baseline characteristics seem somewhat nonlinear, as summarized in Figure A7, but for household size, a similar (though flatter) pattern is observed in the restricted model in which smaller households are characterized by larger treatment effects.20 Overall, though, household size is not a dominant predictor of variation in the CATE; while it has some predictive power, baseline economic characteristics seem to be somewhat more predictive.

18 Binary variables corresponding to country and randomization strata are included in the random forest method, but parallel to the Baidoa analysis, the variable importance weights are re-normalized to exclude the weights estimated on randomization strata. “NA” in the final column denotes that the variable is not included in the restricted model. The weights in the full model do not sum to one because other variables outside the top ten are predictive.

19 Data on household composition that would allow us to identify the number of dependents is not reported.

20 The top 5% of outliers for each explanatory variable (consumption, assets, etc.) are truncated from the graph for clarity.

This exploration suggests that the pattern observed in our data, in which demographic characteristics are dramatically more predictive of variation in the conditional average treatment effect than baseline economic variables, is distinct from the pattern observed in Banerjee et al. (2015) for a graduation program in non-displacement settings. There are several potential explanations for this that are not mutually exclusive.

First, household size may be uniquely important in a displacement setting, where households are large and have agglomerated or altered in composition either by necessity or as a coping mechanism in the face of shocks. Average household size at baseline in our sample is 6.7, not dramatically different from the mean (5.8) in the sample in Banerjee et al. (2015), though the dependency ratio is somewhat high: only 2.6 of these members, on average, are between the ages of 15 and 55. Dependency ratios even in sub-Saharan Africa are typically below 100% (Cleland and Machiyama, 2017), though that figure often uses a different minimum age for the elderly (65) and is constructed using macro-level data that may not be directly comparable. In the Somali context, restrictive gender norms may pose a further challenge by limiting the economic engagement of women even of prime working age.

A second hypothesis is that baseline economic status (level of assets, food security) has little predictive power in a displacement context in which households have experienced a series of adverse shocks and thus where their current ownership of assets may be effectively random. Interestingly, the probability of owning any assets is estimated to be much higher in this sample compared to the Banerjee et al. sample (98% versus 42%), though the mean level of assets is higher in the Banerjee et al. sample due to higher asset ownership at the 75th percentile and above; Figure A8 summarizes kernel densities in both samples, with the important caveat that the specific questionnaire modules used to measure assets are not the same.21 Accordingly, it is not the case that assets are non-predictive of treatment effects because no household in the Baidoa sample owns any assets; rather, it may be that there is simply little predictive information captured in the existing distribution.

21 The continuous variables in the Banerjee et al. sample are expressed in 2014 purchasing power parity terms, while our variables are captured in 2017 purchasing power parity terms. We employ a simple adjustment using the U.S. GDP deflator and thus apply an adjustment factor of 1.05 to the Banerjee et al. estimates, bearing in mind this is only indicative.

A third hypothesis is that constraints on economic activities linked to household size (and the number of dependents) may be more acute in an urban displacement context such as Baidoa, also characterized by some ongoing security risks. In a more typical, non-fragile, rural setting, the active care burden of both young and old dependents may be lower due to reduced security risk for unsupervised dependents and the existence of a typical informal network of care and support. That being said, it is somewhat surprising that this would pose a meaningful constraint given that the most important livelihoods activity for households benefiting from the UPG intervention in Baidoa is raising livestock, an activity typically undertaken at home and viewed as compatible with domestic and care responsibilities; one hypothesis is that marketing livestock for actual income generation is challenging for those constrained by the care of more dependents.

4.6 Cost-effectiveness

Given the evidence around the substantial positive effects of the UPG intervention in Baidoa, it is informative to consider its costs and explore cost-effectiveness.
Here, we find that the intervention is in fact relatively expensive: the all-in cost per household is estimated at approximately $7,770 in 2017 PPP terms or $2,930 in 2017 nominal USD (to enable cross-country comparability, we express costs in both PPP and nominal terms).22 This suggests the Baidoa UPG program is one of the highest-cost programs assessed in the literature (Table A1), though our cost estimate plausibly reflects an upper bound given that it is based on the total program budget including overhead, monitoring, and management costs often excluded from cost analyses.23 For comparison, Banerjee et al. (2015) report 2017 PPP costs of approximately $5,600 in Ghana, $3,960 in Pakistan, and $4,420 in Peru. In Afghanistan, Bedoya et al. (2019) estimate the intervention cost at around $7,470 per household, a cost level comparable to Somalia (and perhaps not coincidentally, in a similarly challenging, conflict-affected setting).

22 The original budgeted cost was $3,200 in 2022 nominal USD. We convert this estimate to Somali shillings, and adjust to the 2017 base year using IMF CPI data and World Bank exchange and PPP conversion factors.

23 We were unable to conduct a detailed costing analysis, as this analysis was interrupted when many staff at the implementing partner were terminated following loss of USAID funding. Accordingly, the estimate here is based on the total program budget awarded to the World Vision-led consortium (with the IFPRI subaward removed) divided by the 5,000 households served. It includes all management, oversight, and monitoring costs, which are expenses often excluded from published cost estimates. Moreover, it assumes the entire budget was spent in Somalia, though in practice, some share likely supported international staff or external oversight.
In terms of effectiveness, the effect size here for consumption (30% relative to the control mean) is among the largest observed in this literature as summarized in Table A1, larger than the effects observed in Banerjee et al. (2015) and comparable to Bedoya et al. (2019).24 The effect on assets and financial inclusion (nearly 1.5 standard deviations) is much larger than the effect on comparable variables observed in Banerjee et al. (around 0.2 to 0.4 standard deviations). However, the effects in this trial are observed at relatively short duration (only two years post-program launch, rather than three or four), again as summarized in the same table. While this analysis should be interpreted cautiously given the absence of detailed cost data and thus our inability to verify that costs are measured comparably across these different trials, it suggests that UPG may be in a broadly similar (or slightly lower) range of cost-effectiveness vis-a-vis other parallel interventions, but only if the large effects observed in the short term persist.25

24 Relative to standard deviations in the control arm, however, the effect of 0.1 standard deviations observed here is roughly similar to the pooled effect observed in the original Science trial (Banerjee et al., 2015).

25 There are, of course, other examples of parallel interventions implemented at dramatically lower cost with high levels of effectiveness: for example, recent work in Niger by Bossuroy et al. (2022) assessing an intervention costing less than $600.

5 Conclusions

Recognizing the inadequacy of regular consumption support in addressing the multiple challenges faced by displaced households, the 2016 New York Declaration for Refugees and Migrants called for a shift toward sustainable, development-oriented responses to forced displacement that promote economic self-reliance among refugees and IDPs (United Nations, 2016). Our study is among the first to experimentally assess an intervention closely aligned with this declaration, and our findings suggest that an integrated ultra-poor graduation model does lead to significant increases in consumption, assets, and financial inclusion for a vulnerable IDP sample in urban Somalia.

These effects are primarily driven by enhanced income from livestock, particularly goats, with little evidence of household livelihoods diversifying into non-agricultural activities. While such gains offer a tangible path to improved welfare, they raise important questions about the sustainability of these effects. On the one hand, reliance on livestock may reflect a natural livelihood strategy in Baidoa, a major livestock trade hub in Somalia. On the other hand, it is unlikely that one can goat one’s way out of poverty in the long run, to paraphrase Lant Pritchett (The New York Times, 2007), as sustained poverty reduction typically requires movement out of agriculture.

In addition, we demonstrate using a generalized random forest analysis that the positive effects of the treatment are notably larger for smaller households characterized by a smaller number of dependents; household size is the primary baseline variable predicting variation in the conditional average treatment effect. This is in contrast to the findings from an exploratory re-analysis of the data from Banerjee et al. (2015), a graduation program implemented in multiple stable settings, where baseline values of economic outcomes are more predictive of varying treatment effects. These findings suggest that livelihoods interventions implemented in displacement contexts should more carefully consider how household composition and dependent responsibilities may shape households’ economic choices, and ultimately influence households’ capacity to benefit from such programs.
This empirical pattern has implications for both targeting (some households might show significantly more positive treatment effects) and intervention design, in that it might be desirable to tailor any available intervention to households that have more care responsibilities. Given that large households do not show very positive effects from an integrated livelihoods intervention in this context, these households might alternatively be preferentially targeted for ongoing cash transfers to sustain consumption and human capital investment. At least in a fragile or displacement-affected context, a livelihoods intervention that has the explicit objective of generating a longer-term income stream might consider targeting households that are smaller and have reduced care responsibilities.

Table 1: Baseline balance

                                        (1)          (2)          (1)-(2)
                                        Control      Treatment    Pairwise t-test
Variable                                Mean/(SE)    Mean/(SE)    Mean difference
IDP household                           0.833        0.838        -0.005
                                        (0.011)      (0.007)
Household size                          6.901        6.573        0.328***
                                        (0.068)      (0.043)
Dependency ratio                        2.398        2.373        0.025
                                        (0.096)      (0.060)
Any pregnant / lactating woman          0.589        0.601        -0.012
                                        (0.014)      (0.009)
Any disabled member                     0.204        0.192        0.013
                                        (0.011)      (0.007)
Any adult male                          0.871        0.859        0.011
                                        (0.010)      (0.006)
Any cash savings                        0.007        0.009        -0.002
                                        (0.002)      (0.002)
Baseline asset value                    358.784      337.792      20.992
                                        (19.428)     (11.100)
Tropical livestock units                0.198        0.182        0.016
                                        (0.015)      (0.009)
Household Hunger Scale                  3.477        3.469        0.008
                                        (0.033)      (0.022)
Livelihoods Coping Strategies index     1.657        1.587        0.069
                                        (0.061)      (0.038)
Number of observations                  1244         2872         4116
F-test of joint significance                                      0.053
F-test using randomization inference                              0.079

Notes: All pair-wise regressions and F-tests are based on specifications including strata fixed effects and robust standard errors. Randomization inference p-values are based on 1,000 repetitions. Asterisks indicate significance at the ten, five, and one percent level.
Table 2: Primary experimental effects

Panel A: Consumption and food security
                 (1)              (2)              (3)              (4)              (5)
                 Total            Food             Non-food         Moderate or      Livelihood
                 consumption      consumption      consumption      severe HHS       coping score
UPG              0.81***          0.59***          0.22***          -0.30***         -0.57***
                 (0.06)           (0.04)           (0.02)           (0.02)           (0.09)
q-value          0.001***         0.001***         0.001***         0.001***         0.001***
Control mean     2.63             2.03             0.61             0.42             1.71
Observations     3964             3964             3964             3982             3982

Panel B: Assets and financial inclusion
                 (1)              (2)              (3)              (4)
                 Asset value      TLUs             Any savings      Any credit
UPG              603.14***        0.35***          0.42***          0.08***
                 (21.25)          (0.02)           (0.01)           (0.02)
q-value          0.001***         0.001***         0.001***         0.001***
Control mean     263.32           0.14             0.04             0.57
Observations     3982             3982             3966             3974

Panel C: Income
                 (1)              (2)              (3)              (4)              (5)              (6)
                 Any ag. +        Ag. + livestock  Any non-farm     Non-farm         Any wage         Wage income
                 livestock        income           business         income
UPG              0.16***          28.11***         0.05***          3.81             -0.00            21.62
                 (0.01)           (8.47)           (0.01)           (25.63)          (0.02)           (36.51)
q-value          0.001***         0.001***         0.001***         0.317            0.317            0.206
Control mean     0.05             6.08             0.15             125.84           0.42             586.59
Observations     3982             3982             3982             3982             3982             3982

Panel D: Social cohesion and locus of control
                 (1)              (2)
                 Social cohesion  Locus of control
UPG              -0.04            0.10***
                 (0.03)           (0.03)
q-value          0.085*           0.001***
Control mean     0.03             -0.07
Observations     3982             3982

Notes: The primary outcomes are per capita total consumption (col 1); moderate or severe household hunger scale (HHS) (col 4); and value of assets (col 1, Panel B). All regressions control for strata fixed effects and for the baseline outcome variable, if available (baseline values are available for the livelihood coping score, tropical livestock units, and any savings; there is no baseline variation in the household hunger score). Robust standard errors are reported in parentheses and the endline control mean is reported. Asterisks indicate significance at the ten, five, and one percent level. The reported q-values are p-values adjusted for multiple inference based on the false discovery rate correction procedure outlined in Anderson (2008).
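As an illustration of how false discovery rate q-values of this kind can be computed, the Python sketch below implements a two-stage "sharpened" procedure over a grid of candidate q levels. It rests on assumptions: the notes above do not state which variant from Anderson (2008) was applied or how outcomes were grouped, and the function names and the 0.001 grid step are choices made for this sketch (the step is consistent with the smallest q-value reported in Table 2, but that is not confirmation).

```python
def bh_rejections(sorted_pvals, level):
    """Number of hypotheses rejected by a Benjamini-Hochberg step-up rule."""
    m = len(sorted_pvals)
    rejected = 0
    for rank, p in enumerate(sorted_pvals, start=1):
        if p <= level * rank / m:
            rejected = rank  # reject every hypothesis up to this rank
    return rejected


def sharpened_qvalues(pvals, step=0.001):
    """Smallest grid level q at which each hypothesis is rejected in two stages.

    Hypotheses never rejected keep a q-value of 1.0.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    sorted_p = [pvals[i] for i in order]
    q_sorted = [1.0] * m
    for k in range(round(1 / step), 0, -1):  # q = 1.000, 0.999, ..., step
        q = round(k * step, 6)
        q_adj = q / (1 + q)
        r1 = bh_rejections(sorted_p, q_adj)           # first stage
        if r1 == m:
            r2 = m                                    # everything already rejected
        else:
            r2 = bh_rejections(sorted_p, q_adj * m / (m - r1))  # second stage
        for rank in range(r2):
            q_sorted[rank] = q
    qvals = [0.0] * m
    for rank, i in enumerate(order):
        qvals[i] = q_sorted[rank]
    return qvals
```

Because the grid stops at the step size, very small p-values all map to the same floor value rather than to zero.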
Figure 1: Primary outcomes: Longitudinal effects

[Figure: four panels plotting control and UPG group means with 95% confidence intervals at baseline, 1st follow-up, and 2nd follow-up. Treatment effects shown in each panel: A, Share of households with moderate or severe hunger (0.00, -0.12***, -0.30***); B, Daily per capita total consumption in $ PPP (0.19***, 0.81***); C, Total value of assets in $ PPP (-12.93, 207.89***, 531.68***); D, Number of tropical livestock units (-0.01, 0.19***, 0.35***).]

Notes: These graphs report treatment effects at both 1st follow-up (one year following program enrollment) and 2nd follow-up (two years). Solid dots show the control and treatment (UPG) group means, and capped bars represent the corresponding 95% confidence intervals. The reported numbers are treatment effects, estimated using an ANCOVA at first and second follow-up rounds that controls for strata dummies and baseline values of the variable, if available. Statistical significance is indicated as * 0.1, ** 0.05, and *** 0.01. Daily per capita consumption was not measured at baseline.

Table 3: Average standard treatment effects

                 (1)              (2)              (3)              (4)
                 Consumption      Assets and       Income           Social cohesion
                 and food         financial                         and locus
                 security         inclusion                         of control
UPG              0.287***         1.436***         0.047            0.041
                 (0.035)          (0.043)          (0.037)          (0.035)
Constant         0.000            0.000            0.001            0.000
                 (0.030)          (0.030)          (0.029)          (0.029)
Observations     3964             3961             3982             3982

Notes: This table reports the average standard treatment effect for each family of outcomes following Kling et al. (2007), normalized with respect to standard deviations in the control arm. (The variable corresponding to any moderate or severe food insecurity is reverse-coded in this analysis.) Robust standard errors in parentheses. Asterisks indicate significance at the ten, five, and one percent level.
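To make the construction behind Table 3 concrete, the following Python sketch computes a simplified version of an average standardized effect: each outcome is converted to a z-score using the control-arm mean and standard deviation, outcomes where lower is better are sign-flipped, and the treatment-control difference is taken on the equally weighted index. Without covariates or missing data this equals the average of the per-outcome standardized effects. The function name and toy data are hypothetical, and the sketch omits the strata controls and standard errors used in the paper.

```python
from statistics import mean, stdev


def standardized_index_effect(outcomes, treated, reverse=()):
    """Treatment-control difference in an equally weighted z-score index.

    outcomes: dict mapping outcome name to a list of values, one per household.
    treated:  list of 0/1 treatment indicators in the same order.
    reverse:  names of outcomes that are sign-flipped (lower is better).
    """
    n = len(treated)
    z_columns = []
    for name, values in outcomes.items():
        control = [v for v, t in zip(values, treated) if t == 0]
        mu, sd = mean(control), stdev(control)  # control-arm normalization
        sign = -1.0 if name in reverse else 1.0
        z_columns.append([sign * (v - mu) / sd for v in values])
    index = [mean(col[i] for col in z_columns) for i in range(n)]
    treated_mean = mean(x for x, t in zip(index, treated) if t == 1)
    control_mean = mean(x for x, t in zip(index, treated) if t == 0)
    return treated_mean - control_mean


# Hypothetical toy data: three control and three treated households.
treated = [0, 0, 0, 1, 1, 1]
toy = {
    "consumption": [1.0, 2.0, 3.0, 3.0, 4.0, 5.0],
    "coping_score": [4.0, 5.0, 6.0, 3.0, 4.0, 5.0],  # lower is better
}
effect = standardized_index_effect(toy, treated, reverse=("coping_score",))
```

In the toy data the consumption effect is 2 control-arm standard deviations and the reverse-coded coping effect is 1, so the index effect is their average, 1.5.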
Table 4: Heterogeneous effects

Dependent variable: Per capita consumption

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| UPG | 0.625*** | 0.680*** | 0.703*** | 0.493*** |
| | (0.054) | (0.062) | (0.092) | (0.090) |
| UPG X Household under 5 members | 0.369*** | | | 0.296** |
| | (0.130) | | | (0.133) |
| Household under 5 members | 1.229*** | | | 1.178*** |
| | (0.110) | | | (0.113) |
| UPG X Dependency ratio under 2.5 | | 0.367*** | | 0.240** |
| | | (0.127) | | (0.120) |
| Dependency ratio under 2.5 | | 0.491*** | | 0.168* |
| | | (0.104) | | (0.099) |
| UPG X Low baseline assets | | | 0.171 | 0.126 |
| | | | (0.116) | (0.106) |
| Low baseline assets | | | -0.181* | -0.143 |
| | | | (0.101) | (0.093) |
| Observations | 3964 | 3964 | 3964 | 3964 |

Notes: This table reports heterogeneous effects for the primary outcome of per capita daily consumption with respect to baseline covariates identified using the generalized random forest method. We define three binary variables using cutoffs derived from the 25th percentile of the baseline distribution: household size under five members, a dependency ratio (defined as the ratio of individuals under 15 or over 55 to those aged 16–54) under one, and baseline asset value below the 25th percentile or $74. Asterisks indicate significance at the ten, five, and one percent level.

References

Aker, Jenny C., “Comparing Cash and Voucher Transfers in a Humanitarian Context: Evidence from the Democratic Republic of Congo,” The World Bank Economic Review, 2017, 31 (1), 44–70.
Altındağ, O. and S. D. O’Connell, “The short-lived effects of unconditional cash transfers to refugees,” Journal of Development Economics, 2023, 160, 102942.
Anderson, Michael L, “Multiple inference and gender differences in the effects of early intervention: A reevaluation of the Abecedarian, Perry Preschool, and Early Training Projects,” Journal of the American Statistical Association, 2008, 103 (484), 1481–1495.
Asiedu, Edward, Dean Karlan, Monica Lambon-Quayefio, and Christopher Udry, “A call for structured ethics appendices in social science papers,” Proceedings of the National Academy of Sciences, 2021, 118 (29), e2024570118.
Athey, Susan, Julie Tibshirani, and Stefan Wager, “Generalized random forests,” The Annals of Statistics, 2019, 47 (2), 1148–1178.
Balboni, Clare, Oriana Bandiera, Robin Burgess, Maitreesh Ghatak, and Anton Heil, “Why do people stay poor?,” The Quarterly Journal of Economics, 2022, 137 (2), 785–844.
Ballard, Terri, Jennifer Coates, Anne Swindale, and Megan Deitchler, “Household hunger scale: indicator definition and measurement guide,” Washington, DC: Food and nutrition technical assistance II project, FHI, 2011, 360, 23.
Bandiera, Oriana, Robin Burgess, Narayan Das, Selim Gulesci, Imran Rasul, and Munshi Sulaiman, “Labor markets and poverty in village economies,” The Quarterly Journal of Economics, 2017, 132 (2), 811–870.
Banerjee, Abhijit, Dean Karlan, Robert Osei, Hannah Trachtman, and Christopher Udry, “Unpacking a multi-faceted program to build sustainable income for the very poor,” Journal of Development Economics, 2022, 155, 102781.
Banerjee, Abhijit, Esther Duflo, and Garima Sharma, “Long-term effects of the targeting the ultra poor program,” American Economic Review: Insights, 2021, 3 (4), 471–486.
Banerjee, Abhijit, Esther Duflo, Nathanael Goldberg, Dean Karlan, Robert Osei, William Parienté, Jeremy Shapiro, Bram Thuysbaert, and Christopher Udry, “A multifaceted program causes lasting progress for the very poor: Evidence from six countries,” Science, 2015, 348 (6236), 1260799.
Baseler, Travis, Thomas Ginn, Ibrahim Kasirye, Belinda Muya, and Andrew Zeitlin, “Mentoring Small Businesses: Evidence from Uganda,” mimeo, 2024.
Battisti, Michele, Yvonne Giesing, and Nadzeya Laurentsyeva, “Can job search assistance improve the labour market integration of refugees? Evidence from a field experiment,” Labour Economics, 2019, 61, 101745.
Bedoya, G. et al., “No household left behind: Afghanistan targeting the ultra-poor impact evaluation,” Technical Report w25981, National Bureau of Economic Research 2019.
Belloni, Alexandre, Victor Chernozhukov, and Christian Hansen, “Inference on Treatment Effects after Selection among High-Dimensional Controls,” The Review of Economic Studies, 2013, 81 (2), 608–650.
Bossuroy, Thomas, Markus Goldstein, Bassirou Karimou, Dean Karlan, Harounan Kazianga, William Parienté, Patrick Premand, Catherine C Thomas, Christopher Udry, Julia Vaillant et al., “Tackling psychosocial and capital constraints to alleviate poverty,” Nature, 2022, 605 (7909), 291–297.
Brune, L. et al., “Social protection amidst social upheaval: Examining the impact of a multi-faceted program for ultra-poor households in Yemen,” Journal of Development Economics, 2022, 155, 102780.
Carletto, Gero, Marco Tiberti, and Alberto Zezza, “Measure for measure: Comparing survey based estimates of income and consumption for rural households,” The World Bank Research Observer, 2022, 37 (1), 1–38.
Catholic Relief Services, “Social Cohesion Indicators Bank,” https://www.crs.org/our-work-overseas/research-publications/social-cohesion-indicators-bank 2019. Accessed: 2024-03-07.
Cilliers, Jacobus, Nour Elashmawy, and David McKenzie, “Using post-double selection Lasso in field experiments,” Technical Report, World Bank 2024.
Cleland, John and Kazuyo Machiyama, “The challenges posed by demographic change in sub-Saharan Africa: A concise overview,” Population and Development Review, 2017, 43, 264–286.
Davis, Jonathan MV and Sara B Heller, “Using causal forests to predict treatment heterogeneity: An application to summer jobs,” American Economic Review, 2017, 107 (5), 546–550.
Deaton, Angus and Salman Zaidi, Guidelines for constructing consumption aggregates for welfare analysis, Vol. 135, World Bank Publications, 2002.
Fasani, Francesco, Tommaso Frattini, and Luigi Minale, “Lift the ban? Initial employment restrictions and refugee labour market outcomes,” Journal of the European Economic Association, 2021, 19 (5), 2803–2854.
Gibson, John and Scott Rozelle, “Prices and unit values in poverty measurement and tax reform analysis,” The World Bank Economic Review, 2005, 19 (1), 69–97.
Gupta, Prankur, Daniel Stein, Kyla Longman, Heather Lanthorn, Rico Bergmann, Emmanuel Nshakira-Rukundo, Noel Rutto, Christine Kahura, Winfred Kananu, Gabrielle Posner et al., “Cash transfers amid shocks: A large, one-time, unconditional cash transfer to refugees in Uganda has multidimensional benefits after 19 months,” World Development, 2024, 173, 106339.
Hidrobo, Melissa, John Hoddinott, Amber Peterman, Amy Margolies, and Vanessa Moreira, “Cash, food, or vouchers? Evidence from a randomized experiment in northern Ecuador,” Journal of Development Economics, 2014, 107, 144–156.
Humble, Steve, Aditya Sharma, Baladevan Rangaraju, Pauline Dixon, and Mark Pennington, “Associations between neighbourhood social cohesion and subjective well-being in two different informal settlement types in Delhi, India: a quantitative cross-sectional study,” BMJ open, 2023, 13 (4), e067680.
Hussam, R. et al., “The psychosocial value of employment: Evidence from a refugee camp,” American Economic Review, 2022, 112 (11), 3694–3724.
IDMC, “2024 Global Report on Internal Displacement,” 2024.
Kaplan, Lennart, Utz Pape, and James Walsh, “Eliciting Accurate Consumption Responses from Vulnerable Populations,” Data Collection in Fragile States: Innovations from Africa and Beyond, 2020, pp. 193–206.
Kerwin, Jason, Nada Rostom, and Olivier Sterck, “Striking the Right Balance: Why Standard Balance Tests Over-Reject the Null, and How to Fix It,” Technical Report, IZA Discussion Papers 2024.
Kling, Jeffrey R, Jeffrey B Liebman, and Lawrence F Katz, “Experimental analysis of neighborhood effects,” Econometrica, 2007, 75 (1), 83–119.
Lee, David S, “Training, wages, and sample selection: Estimating sharp bounds on treatment effects,” 2005.
Leight, Jessica, Harold Alderman, Daniel Gilligan, Melissa Hidrobo, and Michael Mulford, Can a light-touch graduation model enhance livelihood outcomes? Evidence from Ethiopia, Intl Food Policy Res Inst, 2023.
MacPherson, C. and O. Sterck, “Empowering refugees through cash and agriculture: A regression discontinuity design,” Journal of Development Economics, 2021, 149, 102614.
Malacarne, Jonathan G, “The farmer and the fates: Locus of control and investment in rainfed agriculture,” Applied Economic Perspectives and Policy, 2024, 46 (2), 534–552.
Mancini, Giulia and Giovanni Vecchi, “On the construction of a consumption aggregate for inequality and poverty analysis,” World Bank Group, Washington, DC, 2022.
Özler, Berk, Çiğdem Çelik, Scott Cunningham, P Facundo Cuevas, and Luca Parisotto, “Children on the move: Progressive redistribution of humanitarian cash transfers among refugees,” Journal of Development Economics, 2021, 153, 102733.
Pape, Utz, “Somali Poverty Profile: Findings from Wave 1 of the Somali High Frequency Survey,” Technical Report, World Bank Group, Washington, D.C. 2017.
Pape, Utz and Johan Mistiaen, “Household expenditure and poverty measures in 60 minutes: a new approach with results from Mogadishu,” World Bank Policy Research Working Paper, 2018, (8430).
Pape, Utz and Johan Mistiaen, “Rapid Consumption Surveys,” in Johannes Hoogeveen and Utz Pape, eds., Data Collection in Fragile States: Innovations from Africa and Beyond, Palgrave Macmillan, 2020, chapter 9, pp. 153–171.
Pape, Utz and Paolo Verme, “Measuring Poverty in Forced Displacement Contexts,” GLO Discussion Paper, 2023.
Pape, Utz and Philip Randolph Wollburg, “Estimation of poverty in Somalia using innovative methodologies,” World Bank Policy Research Working Paper, 2019, (8735).
Rotter, Julian B, “Generalized expectancies for internal versus external control of reinforcement,” Psychological monographs: General and applied, 1966, 80 (1), 1.
Rozo, Sandra V and Guy Grossman, “Refugees and Other Forcibly Displaced Populations,” VoxDevLit, 2025, 14 (1).
Rustad, Siri Aas, Conflict Trends: A Global Overview, 1946–2023, Peace Research Institute Oslo (PRIO), 2024.
Sarvimäki, Matti and Kari Hämäläinen, “Integrating immigrants: The impact of restructuring active labor market programs,” Journal of Labor Economics, 2016, 34 (2), 479–508.
Schuettler, Kirsten and Laura Caron, Jobs interventions for refugees and internally displaced persons, World Bank Group, 2020.
Sylvia, Sean, Nele Warrinnier, Renfu Luo, Ai Yue, Orazio Attanasio, Alexis Medina, and Scott Rozelle, “From quantity to quality: Delivering a home-based parenting intervention through China’s family planning cadres,” The Economic Journal, 2021, 131 (635), 1365–1400.
The New York Times, “Should We Globalize Labor Too?,” The New York Times Magazine, June 10, 2007.
UN-Habitat, “Baidoa Urban Profile: Working Paper and Spatial Analyses for Urban Planning and Durable Solutions,” https://reliefweb.int/report/somalia/baidoa-urban-profile-working-paper-and-spatial-analyses-urban-planning-and-durable 2021. Accessed: 2024-03-07.
UNHCR, “Location and populations of IDP sites in Baidoa as at 28 April 2017,” https://data.unhcr.org/en/documents/details/56361 2017. Accessed: 2024-03-07.
UNHCR, “Location and populations of IDP sites in Baidoa as at July 2022,” https://data.unhcr.org/en/documents/details/94414 2022. Accessed: 2024-03-07.
UNHCR, “Global Trends: Forced Displacement,” https://www.unhcr.org/global-trends 2023. Accessed: 2024-03-07.
UNHCR, “Mid-Year Trends,” https://www.unhcr.org/mid-year-trends 2023. Accessed: 2024-03-07.
UNHCR, “Internally Displaced People,” 2024. Accessed: 2024-03-07.
UNHCR, “Refugee Statistics,” https://www.unhcr.org/refugee-statistics 2024. Accessed: 2024-03-07.
United Nations, “New York Declaration for Refugees and Migrants (A/RES/71/1),” 2016. Resolution adopted by the General Assembly on 19 September 2016.
Wager, Stefan and Susan Athey, “Estimation and Inference of Heterogeneous Treatment Effects using Random Forests,” Journal of the American Statistical Association, 2018, 113 (523), 1228–1242.
White, Halbert, “A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity,” Econometrica: Journal of the Econometric Society, 1980, pp. 817–838.
WMO, “Extreme Weather,” 2024. Accessed: 2024-03-07.

Appendix

A1 Graduation literature

Table A1 provides an overview of key studies evaluating graduation-style interventions, summarizing characteristics of the trial design, intervention components, and estimated impacts on consumption. The final two rows of the table report the estimated per-household or per-participant intervention costs in both 2017 Purchasing Power Parity (PPP) dollars and 2017 nominal U.S. dollars, allowing for cross-country and cross-study comparison. These cost estimates have been harmonized using a standardized approach described in the following paragraphs.

The original intervention costs are reported in U.S. dollar terms in Banerjee et al. (2015) (Ethiopia, Ghana, Honduras, India, Pakistan, and Peru), Bedoya et al. (2019) (Afghanistan), and in this study (Somalia). In this case, the procedure follows three steps: first, we recover the local currency value by applying the USD exchange rate for the base year of the original cost data. Second, we adjust the local currency values to 2017 levels using changes in domestic consumer price indices (CPI). Finally, we convert the 2017 local currency values into 2017 PPP dollars using PPP conversion factors for individual consumption. All CPI, exchange rate, and PPP data are sourced from the World Bank, except in the case of Somalia, where CPI data were drawn from the IMF due to gaps in the World Bank series.

Bandiera et al. (2017) (Bangladesh), Bossuroy et al. (2022) (Niger), and Brune et al. (2022) (Yemen) report costs in PPP dollar terms.
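The three-step conversion described above for costs originally reported in U.S. dollars can be sketched as follows. This is only an illustration under stated assumptions: the function name, parameter names, and all numeric inputs are hypothetical, and the PPP conversion factor is assumed to be expressed in local currency units per international dollar.

```python
def usd_cost_to_ppp_2017(cost_usd, fx_base, cpi_base, cpi_2017, ppp_factor_2017):
    """Convert a base-year nominal USD cost into 2017 PPP dollars.

    cost_usd:        cost in nominal USD in the base year
    fx_base:         local currency units (LCU) per USD in the base year
    cpi_base:        domestic CPI in the base year
    cpi_2017:        domestic CPI in 2017
    ppp_factor_2017: LCU per international dollar in 2017 (assumed convention)
    """
    # Step 1: recover the local currency value in the base year.
    local_base = cost_usd * fx_base
    # Step 2: inflate the local currency value to 2017 using domestic CPI.
    local_2017 = local_base * (cpi_2017 / cpi_base)
    # Step 3: convert 2017 local currency into 2017 PPP dollars.
    return local_2017 / ppp_factor_2017


if __name__ == "__main__":
    # Hypothetical inputs, not taken from any study in Table A1.
    print(usd_cost_to_ppp_2017(1000.0, fx_base=20.0, cpi_base=100.0,
                               cpi_2017=150.0, ppp_factor_2017=10.0))
```

With these made-up inputs, 1,000 USD becomes 20,000 LCU in the base year, 30,000 LCU in 2017 prices, and 3,000 in 2017 PPP dollars.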
To convert these PPP-denominated costs to 2017 PPP and nominal U.S. dollars, we follow a two-step process: first, we recover the local currency value by multiplying the original PPP figure by the relevant PPP exchange rate in the base year. Then, we follow the inflation and conversion steps described above starting from step two.

Yemen is a special case. The intervention cost was reported in 2010 PPP dollars, but no reliable CPI data exist for Yemen beyond 2014. To address this, we estimate an inflation adjustment factor based on the change in the nominal exchange rate between 2010 and 2017, scaled by U.S. inflation over the same period. This approach assumes that relative movements in the exchange rate reflect changes in local price levels in the absence of reliable domestic inflation data.

Table A1: Overview of graduation literature

| Study | (a) | (b) | (b) | (b) | (b) | (b) | (b) | (c) | (d) | (e) | (f) | (g) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Trial characteristics | | | | | | | | | | | | |
| Country | BGD | ETH | GHA | HND | IND | PAK | PER | AFG | NER | YEM | ETH | SOM |
| Year launched | 2007 | 2010 | 2011 | 2009 | 2007 | 2008 | 2011 | 2016 | 2016 | 2010 | 2019 | 2022 |
| Years elapsed | 4 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 1.5 | 4 | 3 | 2 |
| Context | Rural | Rural | Rural | Rural | Rural | Rural | Rural | Rural | Rural | Both | Rural | Urban |
| Forced displacement | No | No | No | No | No | No | No | No | No | No | No | IDP |
| Ongoing conflict | No | No | No | No | No | No | No | Yes | No | Yes | No | Yes |
| Consumption ITT | 10.9% | 18.2% | 10.6% | -5.8% | 10.7% | 7.0% | 5.2% | 29.7% | 14.7% | -2.4% | 0.7% | 30.8% |
| p-value | 0.003 | 0.000 | 0.007 | 0.152 | 0.001 | 0.079 | 0.085 | 0.000 | 0.000 | 0.644 | >.2 | 0.000 |
| Intervention characteristics | | | | | | | | | | | | |
| Asset (A) or grant (G) | A | A | A | A | A | A | A | A | G | A | A/G | A |
| Consumption support | Cash | Food* | Cash | Food | Cash | Cash | Cash* | Cash | Cash* | Cash* | Cash* | Cash |
| Mentoring | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes |
| Skills training | No | No | No | No | No | No | No | Yes | Yes | Yes | Yes | No |
| Savings groups | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Access to finance | No | Yes | Yes | Yes | Yes | No | Yes | Yes | No | No | No | No |
| Psychosocial support | No | No | No | No | No | No | No | No | Yes | No | Yes† | No |
| Social/behavioral support | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Cost ($ PPP 2017) | 1,489 | 3,251 | 5,600 | 3,046 | 1,234 | 3,959 | 4,422 | 7,474 | 595 | 1,101 | NR | 7,768 |
| Cost ($ USD 2017) | 541 | 1,118 | 2,201 | 1,385 | 382 | 1,231 | 2,591 | 1,768 | 250 | 332 | NR | 2,927 |

Notes: (a) Bandiera et al. (2017), (b) Banerjee et al. (2015), (c) Bedoya et al. (2019), (d) Bossuroy et al. (2022), (e) Brune et al. (2022), (f) Leight et al. (2023), (g) This study. BGD = Bangladesh, ETH = Ethiopia, GHA = Ghana, HND = Honduras, IND = India, PAK = Pakistan, PER = Peru, AFG = Afghanistan, NER = Niger, YEM = Yemen, SOM = Somalia. Years elapsed refers to the years elapsed between program launch and the collection of follow-up data. Costs are converted to purchasing power parity terms in the base year employed and then converted to PPP terms in 2017. For Leight et al. in Ethiopia, the reported coefficient is the mean effect on consumption across three arms examined; none were statistically significant, hence the notation of the p-value.
* Consumption support was provided to all individuals in the sample (i.e., there was no experimental variation).
† Psychosocial support was provided only to those identified as eligible based on symptoms of depression.

A2 Appendix Exhibits

Figure A1: Map of Somalia and the IDP survey sites in Baidoa
(a) Somalia (b) Baidoa
Notes: Figure (a) shows the location of Baidoa in Somalia and with respect to the capital, Mogadishu. Shapefile from UNOCHA Somalia. Figure (b) shows the locations of IDP households at the second follow-up survey. Basemap from OpenStreetMap.

Figure A2: Study timeline

Figure A3: Asset and TVET Choices
(a) Asset transfers (b) TVET training
Notes: The figures capture the reported choices of productive assets and Technical and Vocational Education and Training (TVET) in the treatment arm.

Figure A4: Out-of-bag CATE estimates on consumption from GRF algorithm
(a) Kernel density function (b) Cumulative distribution function
Notes: These figures report the out-of-bag conditional average treatment effects (CATE) from the generalized random forest (GRF) method following Athey et al. (2019). Figure A4a reports the kernel density function, and Figure A4b reports the cumulative distribution function.

Figure A5: Scatter plots of out-of-bag CATE estimates and observable baseline characteristics
Notes: This graph reports the correlation between the conditional average treatment effect (CATE) and the three baseline covariates identified as most predictive of variation in the conditional average treatment effect, as reported in Table A8.

Figure A6: Out-of-bag CATE estimates on consumption: Banerjee et al. (2015)
(a) Model including full set of baseline covariates (b) Model including restricted set of baseline covariates
Notes: These figures report the out-of-bag conditional average treatment effects (CATE) from the generalized random forest method following Athey et al. (2019), estimated using the replication data from the Banerjee et al. (2015) trial for the outcome variable of consumption in the long-term follow-up.

Figure A7: Scatter plots of out-of-bag CATE estimates and observable baseline characteristics: Banerjee et al. (2015)
(a) Model including full set of baseline covariates (b) Model including restricted set of baseline covariates
Notes: This graph reports the correlation between the conditional average treatment effect (CATE) and the three baseline covariates identified as most predictive of variation in the conditional average treatment effect, as reported in Table A8.

Figure A8: Baseline assets across samples
Notes: This graph presents the kernel density of baseline asset value in both the Baidoa sample analyzed in this paper and the sample in Banerjee et al. (2015); the density figure is truncated at the 95th percentile of asset value in the combined sample. All value estimates are in 2017 purchasing power parity-adjusted dollars.

Table A2: Program exposure

| | (1) Cash transfers | (2) Asset transfer or TVET training | (3) Savings groups | (4) Coaching |
|---|---|---|---|---|
| UPG | 0.738*** | 0.844*** | 0.569*** | 0.310*** |
| | (0.012) | (0.008) | (0.011) | (0.015) |
| Constant | 0.143*** | 0.024*** | 0.056*** | 0.182*** |
| | (0.010) | (0.005) | (0.007) | (0.011) |
| Observations | 3982 | 3982 | 3982 | 3982 |

Notes: This table reports treatment effects on reported intervention receipt for the four main intervention elements: cash transfers, asset transfers or Technical and Vocational Education and Training (TVET), savings groups, and coaching. All regressions include strata fixed effects; asterisks indicate significance at the ten, five, and one percent level.
Table A3: Primary experimental effects: Additional baseline controls

Panel A: Consumption and food security

| | (1) Total consumption | (2) Food consumption | (3) Non-food consumption | (4) Moderate or severe HHS | (5) Livelihood coping score |
|---|---|---|---|---|---|
| UPG | 0.71*** | 0.51*** | 0.19*** | -0.30*** | -0.57*** |
| | (0.05) | (0.04) | (0.02) | (0.02) | (0.09) |
| q-value | 0.001*** | 0.001*** | 0.001*** | 0.001*** | 0.001*** |
| Control mean | 2.63 | 2.03 | 0.61 | 0.42 | 1.71 |
| Observations | 3964 | 3964 | 3964 | 3982 | 3982 |

Panel B: Assets and financial inclusion

| | Asset value | TLUs | Any savings | Any credit |
|---|---|---|---|---|
| UPG | 603.97*** | 0.35*** | 0.42*** | -0.00 |
| | (21.34) | (0.02) | (0.01) | (0.01) |
| q-value | 0.001*** | 0.001*** | 0.001*** | 0.372 |
| Control mean | 263.32 | 0.14 | 0.04 | 0.18 |
| Observations | 3982 | 3982 | 3966 | 3982 |

Panel C: Income

| | Any livestock | Livestock income | Any non-farm business | Non-farm income | Any wage | Wage income |
|---|---|---|---|---|---|---|
| UPG | 0.15*** | 34.11*** | 0.05*** | 7.99 | -0.00 | 22.37 |
| | (0.01) | (7.84) | (0.01) | (25.59) | (0.02) | (36.57) |
| q-value | 0.001*** | 0.001*** | 0.001*** | 0.372 | 0.396 | 0.291 |
| Control mean | 0.02 | -1.05 | 0.15 | 125.84 | 0.42 | 586.59 |
| Observations | 3982 | 3982 | 3982 | 3982 | 3982 | 3982 |

Panel D: Social cohesion and locus of control

| | Social cohesion | Locus of control |
|---|---|---|
| UPG | -0.04 | 0.10*** |
| | (0.03) | (0.03) |
| q-value | 0.111 | 0.001*** |
| Control mean | 0.03 | -0.07 |
| Observations | 3982 | 3982 |

Notes: The primary outcomes are per capita total consumption (col 1); moderate or severe household hunger scale (HHS) (col 4); and value of assets (col 1, Panel B). All regressions control for strata fixed effects and for the baseline outcome variable, if available (baseline values are available for the livelihood coping score, tropical livestock units, and any savings; there is no baseline variation in the household hunger score); a control variable for household size, selected by a double lasso, is also included. Robust standard errors are reported in parentheses and the endline control mean is reported. Statistical significance is indicated as follows: * p < 0.10, ** p < 0.05, *** p < 0.01. The reported q-values are p-values adjusted for multiple inference based on the false discovery rate correction procedure outlined in Anderson (2008).

Table A4: Minimum detectable effects

| | Share of households moderate or severe HHS | Per capita consumption in USD-PPP | Asset value in USD-PPP |
|---|---|---|---|
| N, Control | 1,150 | 1,146 | 1,150 |
| N, Treatment | 2,832 | 2,818 | 2,832 |
| Mean, Control | 0.42 | 2.63 | 263 |
| SD, Control | 0.49 | 1.54 | 482 |
| Adjusted SD, Control | 0.49 | 1.53 | 481 |
| MDES | 0.05 | 0.15 | 47.15 |
| MDES relative to mean (%) | 11.50 | 5.70 | 17.90 |
| MDES relative to SD (%) | 9.80 | 9.80 | 9.80 |

Notes: HHS = Household hunger score, USD = United States dollar, PPP = Purchasing power parity, N = Number of observations, SD = Standard deviation, MDES = Minimum detectable effect size.

Table A5: Additional outcomes

| Variable | (1) Control mean: binary owned | (2) Control mean | (3) Treatment effect | (4) Std. error | (5) N |
|---|---|---|---|---|---|
| Panel A: Assets | | | | | |
| Mobile phones | 0.908 | 1.013 | 0.068*** | 0.016 | 3954 |
| Axe | 0.742 | 0.802 | -0.014 | 0.019 | 3872 |
| Hammer | 0.233 | 0.260 | 0.021 | 0.018 | 3752 |
| Grain bag | 0.182 | 0.596 | 0.329*** | 0.064 | 3745 |
| Panga | 0.167 | 0.188 | 0.026 | 0.016 | 3761 |
| Pick axe | 0.159 | 0.181 | 0.030* | 0.016 | 3776 |
| Spade or shovel | 0.138 | 0.143 | 0.098*** | 0.014 | 3718 |
| Plough (oxen-pulled) | 0.132 | 0.183 | 0.074*** | 0.020 | 3679 |
| Hoe | 0.130 | 0.160 | 0.048*** | 0.017 | 3711 |
| Hand mattock | 0.125 | 0.157 | 0.075*** | 0.019 | 3793 |
| Goats | 0.091 | 0.264 | 2.879*** | 0.076 | 3912 |
| Sickle | 0.089 | 0.116 | 0.026* | 0.016 | 3692 |
| Donkey | 0.076 | 0.076 | 0.020** | 0.010 | 3763 |
| Poultry | 0.070 | 0.185 | 0.102*** | 0.034 | 3899 |
| Hand saw | 0.068 | 0.100 | 0.011 | 0.015 | 3729 |
| Tarpaulin | 0.065 | 0.083 | 0.064*** | 0.014 | 3702 |
| Wheelbarrow | 0.062 | 0.066 | 0.063*** | 0.011 | 3751 |
| Donkey cart | 0.060 | 0.064 | 0.034*** | 0.010 | 3758 |
| Solar panels | 0.060 | 0.060 | 0.042*** | 0.009 | 3896 |
| Sheep | 0.057 | 0.093 | 0.073*** | 0.018 | 3893 |
| Panel B: Household composition | | | | | |
| Household size | . | 7.152 | 0.068** | 0.032 | 3982 |
| Ages 0–4 | . | 0.925 | 0.049* | 0.026 | 3982 |
| Ages 5–14 | . | 2.768 | 0.052 | 0.034 | 3982 |
| Ages 15–19 | . | 0.964 | -0.047* | 0.024 | 3982 |
| Ages 20–54 | . | 2.057 | 0.008 | 0.021 | 3982 |
| Ages 55+ | . | 0.438 | -0.011 | 0.008 | 3982 |

Notes: This table presents supplementary regression findings for ownership by asset category, in Panel A, and household composition, in Panel B. Column (1) reports the mean of a binary variable for any asset in that category owned, in the control arm. Asterisks indicate significance at the ten, five, and one percent level.

Table A6: Attrition

| | (1) | (2) | (3) |
|---|---|---|---|
| Treated household | -0.062*** | -0.062*** | -0.061 |
| | (0.008) | (0.008) | (0.047) |
| IDP household | | -0.009 | -0.030 |
| | | (0.008) | (0.022) |
| IDP X Treat | | | 0.030 |
| | | | (0.023) |
| Household size | | 0.000 | 0.004 |
| | | (0.001) | (0.004) |
| Size X Treat | | | -0.005 |
| | | | (0.004) |
| Dependency ratio at baseline | | -0.002*** | -0.005*** |
| | | (0.001) | (0.001) |
| Dependency X Treat | | | 0.005*** |
| | | | (0.001) |
| Any pregnant or lactating woman | | 0.004 | 0.013 |
| | | (0.006) | (0.015) |
| PLW X Treat | | | -0.011 |
| | | | (0.016) |
| Any disabled member | | 0.002 | -0.006 |
| | | (0.007) | (0.018) |
| Disability X Treat | | | 0.011 |
| | | | (0.020) |
| Any adult male | | -0.005 | -0.021 |
| | | (0.009) | (0.025) |
| Male X Treat | | | 0.022 |
| | | | (0.026) |
| Any cash savings | | -0.000 | 0.000 |
| | | (0.000) | (0.001) |
| Any savings X Treat | | | -0.000 |
| | | | (0.001) |
| Baseline asset value | | -0.001 | -0.000 |
| | | (0.001) | (0.003) |
| Asset value X Treat | | | -0.001 |
| | | | (0.003) |
| Tropical livestock units | | 0.019 | 0.017 |
| | | (0.017) | (0.037) |
| TLUs X Treat | | | 0.001 |
| | | | (0.041) |
| Household Hunger Scale | | 0.006** | 0.010 |
| | | (0.003) | (0.007) |
| HHS X Treat | | | -0.006 |
| | | | (0.007) |
| Livelihoods Coping Strategies index | | -0.002** | -0.006** |
| | | (0.001) | (0.003) |
| LCS X Treat | | | 0.005 |
| | | | (0.003) |
| Observations | 4116 | 4116 | 4116 |

Notes: Each column regresses a binary variable equal to one for households that attrite at the 2nd follow-up on a binary variable for treated; baseline characteristics; and the interaction between the two. Baseline asset value is expressed in hundreds of dollars. All regressions include strata fixed effects; asterisks indicate significance at the ten, five, and one percent level.

Table A7: Treatment effects bounds correcting for attrition

Panel A: Lee and Kling-Liebman bounds

| | Lee lower | Lee upper | Kling-Liebman lower | Kling-Liebman upper |
|---|---|---|---|---|
| Total cons. | .523 (.053) | .97 (.056) | .526 (.058) | 1.09 (.058) |
| Food cons. | .366 (.042) | .719 (.044) | .37 (.046) | .814 (.046) |
| Non-food cons. | .119 (.018) | .263 (.019) | .12 (.02) | .311 (.02) |
| Asset value | 603.137 (21.252) | 651.061 (21.558) | 507.03 (21.938) | 699.675 (21.938) |
| LCS score | -.948 (.079) | -.57 (.085) | -1.019 (.089) | -.123 (.089) |
| TLUs | .204 (.017) | .346 (.022) | .247 (.023) | .447 (.023) |

Panel B: Manski bounds

| | Lower | Upper |
|---|---|---|
| Moderate / severe HHS | -.377 (.015) | -.228 (.015) |
| Any savings | .32 (.011) | .456 (.013) |
| Any credit | .008 (.017) | .163 (.017) |
| Ag. / livestock inc. | .058 (.01) | .19 (.012) |
| Non-farm bus. inc. | -.043 (.012) | .098 (.014) |
| Wage inc. | -.085 (.017) | .07 (.017) |

Notes: This table reports bounds correcting for attrition for the outcomes of interest. For the continuous variables in Panel A, we report Lee bounds and Kling-Liebman bounds, allowing for attrited individuals to be characterized by outcomes two standard deviations above (below) the mean observed in the relevant treatment arm. For the binary variables in Panel B, we report Manski bounds. Standard errors are in parentheses.

Table A8: Baseline characteristics used in GRF analysis and variable importance

| Baseline characteristics | Variable importance |
|---|---|
| Household size | 0.31 |
| Dependency ratio | 0.18 |
| Baseline asset value | 0.17 |
| Tropical Livestock Units | 0.06 |
| Livelihoods Coping Score | 0.05 |
| Strata 3 | 0.04 |
| Strata 2 | 0.03 |
| Any pregnant or lactating woman | 0.03 |
| Household Hunger Score | 0.02 |
| Strata 1 | 0.02 |
| Engaged in farming | 0.02 |
| Any disabled member | 0.02 |
| Any adult male | 0.01 |
| Receiving transfers | 0.01 |
| IDP household | 0.01 |
| Self-employed | 0.01 |
| Other income | 0.01 |
| Any primary education | 0.01 |
| Wage labor | 0.01 |
| Any savings | 0 |
| Asset income | 0 |

Notes: This table reports the variables employed in the generalized random forest (GRF) algorithm and their estimated importance, defined as the frequency with which each observable characteristic is used as a splitting variable. The weights are re-scaled such that the total weights for baseline characteristics excluding binary variables for randomization strata add up to one.

Table A9: Baseline characteristics used in GRF analysis and variable importance: Banerjee et al. (2015)

| Characteristic | Importance (Full) | Importance (Restricted) |
|---|---|---|
| Consumption per capita | 0.1 | NA |
| Total asset value | 0.1 | 0.24 |
| Income from agriculture | 0.09 | NA |
| Revenue from animals | 0.09 | NA |
| Income from paid labor | 0.08 | NA |
| Household size | 0.07 | 0.2 |
| Food security | 0.06 | 0.2 |
| Income from business | 0.05 | NA |
| Perception of economic welfare | 0.05 | 0.2 |
| Any savings | 0.04 | 0.16 |

Notes: This table reports the variables employed in the generalized random forest (GRF) algorithm and their estimated importance, defined as the frequency with which each observable characteristic is used as a splitting variable, in the analysis conducted using the replication data from Banerjee et al. (2015). We report the analysis conducted for a full set of available baseline characteristics, and a restricted set of available baseline characteristics, corresponding to those also available in the Baidoa sample; only the top ten most predictive variables in the “full” analysis are reported in the table, and thus for this model, the reported weights do not sum to one. “NA” denotes a variable not included in the restricted model. The weights are re-scaled such that the total weights for baseline characteristics excluding binary variables for randomization strata and country add up to one.

A3 Structured Ethics Appendix

The structured ethics appendix is based on Asiedu et al. (2021).

Policy Equipoise

Is there policy equipoise? That is, is there uncertainty regarding participants’ net benefits from each arm of the study relative to the other arms and to the best possible policy to which participants could have access?
If not, ethical randomization requires two conditions related to scarcity: (1) Was there scarcity, i.e., did the inclusion of multiple arms change the expected aggregate value of the programs delivered? (2) Do all ex-ante identifiable participants have equal moral or legal claims to the scarce programs?

While evidence on the effectiveness of graduation programs in forced displacement settings remains limited, prior academic literature from other contexts suggests that receiving the graduation package, which includes cash grants and either a TVET opportunity or an asset transfer, is likely to yield greater benefits than not receiving it. Given this, and despite the scarcity of evidence in displacement settings, the study may not fully meet the condition of policy equipoise.

However, the randomization is ethically justified under conditions of scarcity. First, the available resources were insufficient to provide the package to all eligible households in this context. Second, eligibility was determined using pre-specified household characteristics assessed ex ante: eligible households were classified as experiencing moderate or severe hunger based on the Household Hunger Scale and had resided in the IDP site for at least one month. These criteria were applied consistently, and no ex-ante identifiable group with a stronger moral or legal claim to the program was excluded. Third, randomization did not shift the aggregate benefit of the intervention (provided to 5,000 households). On this basis, the study satisfies commonly accepted ethical conditions for randomization in settings where policy equipoise may be uncertain or absent.

Role of researchers with respect to implementation

Are researchers “active” researchers, i.e., did the researchers have direct decision-making power over whether and how to implement the program? If YES, what was the disclosure to participants and informed consent process for participation in the program?
Providing IRB approval details may be sufficient but further clarification of any important issues should be discussed here. If NO, i.e., implementation was separate, explain the separation. The researchers did not play an “active” role in the implementation of this project. Implementation was carried out independently by World