IFPRI Discussion Paper 02356 

September 2025 

Displacement and Development 

Evidence from a Graduation Program for Somalia’s Ultra-Poor 

Jessica Leight 
Kalle Hirvonen 

Naureen Karachiwalla 
Deboleena Rakshit 

Poverty, Gender, and Inclusion Unit 


INTERNATIONAL FOOD POLICY RESEARCH INSTITUTE 
The International Food Policy Research Institute (IFPRI), a CGIAR Research Center established in 1975, 
provides research-based policy solutions to sustainably reduce poverty and end hunger and malnutrition. 
IFPRI’s strategic research aims to foster a climate-resilient and sustainable food supply; promote healthy 
diets and nutrition for all; build inclusive and efficient markets, trade systems, and food industries; 
transform agricultural and rural economies; and strengthen institutions and governance. Gender is 
integrated in all the Institute’s work. Partnerships, communications, capacity strengthening, and data and 
knowledge management are essential components to translate IFPRI’s research from action to impact. 
The Institute’s regional and country programs play a critical role in responding to demand for food policy 
research and in delivering holistic support for country-led development. IFPRI collaborates with partners 
around the world.  

AUTHORS 
Jessica Leight (j.leight@cgiar.org) is a Senior Research Fellow in the Poverty, Gender, and Inclusion 
(PGI) Unit at the International Food Policy Research Institute (IFPRI), Washington, DC. 

Kalle Hirvonen (k.hirvonen@cgiar.org) is a Senior Research Fellow in IFPRI’s PGI Unit, Washington, 
DC. 

Naureen Karachiwalla (n.karachiwalla@cgiar.org) is a Research Fellow in IFPRI’s PGI Unit, Nairobi, 
Kenya. 

Deboleena Rakshit (d.rakshit@cgiar.org) is a Research Analyst in IFPRI’s PGI Unit, Washington, DC. 

Notices 
1 IFPRI Discussion Papers contain preliminary material and research results and are circulated in order to stimulate discussion and 
critical comment. They have not been subject to a formal external review via IFPRI’s Publications Review Committee. Any opinions 
stated herein are those of the author(s) and are not necessarily representative of or endorsed by IFPRI.  
2 The boundaries and names shown and the designations used on the map(s) herein do not imply official endorsement or 
acceptance by the International Food Policy Research Institute (IFPRI) or its partners and contributors. 
3 Copyright remains with the authors. The authors are free to proceed, without further IFPRI permission, to publish this paper, or any 
revised version of it, in outlets such as journals, books, and other publications.




Abstract 

While the population of internally displaced people around the world continues to grow, evidence 
around strategies to sustainably enhance livelihoods among IDPs remains extremely limited. We 
present findings from a randomized trial of an ultra-poor graduation program targeting IDPs in 
urban Baidoa, Somalia; the intervention provided cash transfers, an asset transfer or technical 
training program, and facilitated savings groups. Our findings suggest that two years following 
program launch, the intervention has led to significant increases in consumption, assets, and 
savings; however, these effects seem to be driven almost exclusively by increased livestock 
production. An exploration of heterogeneous effects using generalized random forest methods 
further suggests that the positive effects of the treatment are dramatically larger for smaller 
households characterized by lower dependency ratios. 

Keywords: Somalia, internally displaced people, ultra-poor graduation 



Acknowledgments 

Funding for this work was provided by the U.S. Agency for International Development (USAID), the 
CGIAR Initiative on Fragility, Conflict, and Migration as well as the CGIAR Policy Innovations 
Program. We thank the UPG team at World Vision and ACTED who have facilitated this ongoing 
collaboration with IFPRI, particularly Caitlin Whittemore, Andrew Mugobo, and Asrat Bekele Balcha. 
We have benefited from discussions with Asrat Bekele Balcha, Daniel O. Gilligan, Colton Parks, and 
Caitlin Whittemore. IRB approval for this study was granted by the International Food Policy Research 
Institute (IFPRI), protocol #00007490. The trial was registered with the AEA RCT Registry 
(AEARCTR-0009452). 


1 Introduction

The global population of forcibly displaced people has doubled over the past decade to reach

a record 120 million by 2024, and more than half of this population remain in their countries

of origin as internally displaced persons (IDPs) (UNHCR, 2023a, 2024b). This sharp increase

reflects the surging incidence of both violent conflict and climate-induced disasters (IDMC,

2024; Rustad, 2024; WMO, 2024). Many forcibly displaced people, 90% of whom reside

in low- and middle-income countries, then remain trapped in protracted displacement

for years or even decades (UNHCR, 2024a), often living side by side with host communities

facing their own economic and social challenges (UNHCR, 2023b).

Aid in forced displacement contexts has traditionally focused on providing regular transfers

of cash, food, or vouchers to support household consumption (Aker, 2017; Hidrobo et

al., 2014; Altındağ and O’Connell, 2023), without addressing deeper, interconnected drivers

of poverty among the displaced, including lost assets, inadequate shelter, poor physical and

mental health, uncertain prospects for future residence, and overlapping market failures.

Effectively targeting these multiple constraints requires a multifaceted intervention, suggesting

that graduation model programs — a sequenced set of interventions including consumption

support, training, access to savings or credit, and an asset transfer — may be a promising

strategy. There is strong evidence about the effects of these programs in reducing poverty

and increasing investments in more stable settings (Banerjee et al., 2015; Bandiera et al.,

2017), but very limited evidence in forced displacement contexts (Rozo and Grossman, 2025),

where complementary public services are often lacking and high levels of ongoing uncertainty

and sometimes violence could render households unwilling to invest.

This study presents findings from a randomized controlled trial in the city of Baidoa,

Somalia, evaluating an ultra-poor graduation (UPG) program targeted to IDPs. Baidoa

is one of Somalia’s largest displacement hubs, hosting more than 600,000 IDPs who have

fled protracted violence or droughts; broadly, in Somalia, over 70% of IDPs live below the

extreme poverty line (Pape, 2017). Within this context, UPG provided six months of cash



transfers, followed by a choice between an asset transfer (such as livestock or agricultural

inputs) or enrollment in technical and vocational education training (TVET) to support

livelihoods development. Additional components included the formation of savings groups

and group-level coaching (focused on financial literacy, business skills, and social capital).

The trial included a sample of 4,116 households identified as eligible for the intervention

due to their baseline vulnerability to hunger and their residence in the targeted IDP

sites; given that the number of eligible households exceeded those who could be served given

resource constraints, households were randomly selected to enter the UPG program.1

Using baseline data from 2022 and two follow-up surveys conducted in 2023 and 2024 and

characterized by minimal attrition (less than 4% of households), we exploit the randomized

design to estimate the program’s causal impact on consumption, food security and livelihood

outcomes.

Our primary findings suggest that two years following its launch, the UPG intervention

had substantial effects on a range of consumption and livelihoods activities, including a 30%

increase in consumption (consistent across both food and non-food consumption), a 300%

increase in the value of assets (driven almost entirely by goats), and a nearly 50 percentage

point increase in the probability of reporting any savings. Treated households are more likely

to report income from a range of sources, though the largest increase is observed in livestock.

There are also positive effects on respondent locus of control, and no adverse effects on local

social cohesion despite the household-level randomization. Longitudinal data suggest that

positive effects were evident even one year post-launch, and generally widened over the

second year of implementation. Attrition in this trial was extremely low but concentrated in

the control arm: accordingly, we also construct bounds on the treatment effects of interest

following Lee (2005) and Kling et al. (2007) and find the estimated effects are generally robust

to both Lee trimming and allowing attritors to deviate from the treatment arm-specific mean

1There were 6,323 eligible households and only 5,000 households could be served by the intervention; a
subset of these intervention households entered the trial. Appendix A3 provides a detailed discussion of the
ethical aspects of this trial, based on the structured ethics appendix suggested by Asiedu et al. (2021).



by up to two standard deviations.
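As a rough illustration of the trimming logic behind Lee-style bounds, the sketch below trims the arm with the higher response rate until response rates are equal, dropping either its highest or its lowest outcomes. It is a simplified sketch in Python rather than the authors' code: the function name, the toy data, and the rounding of the trimmed count are illustrative assumptions, and the usual monotonicity assumption (treatment affects response in only one direction) is taken as given.

```python
def lee_bounds(y_treat, n_treat, y_control, n_control):
    """Lee-style trimming bounds on a treatment effect (illustrative sketch).

    y_treat, y_control: outcomes observed among respondents in each arm.
    n_treat, n_control: number of households originally assigned to each arm.
    Returns (lower_bound, upper_bound).
    """
    def mean(values):
        return sum(values) / len(values)

    rate_t = len(y_treat) / n_treat
    rate_c = len(y_control) / n_control

    if rate_t >= rate_c:
        # Treatment arm responds more often (as when attrition is concentrated
        # in the control arm): trim the excess share from the treatment arm.
        share = (rate_t - rate_c) / rate_t
        k = round(share * len(y_treat))          # number of observations trimmed
        s = sorted(y_treat)
        lower = mean(s[: len(s) - k]) - mean(y_control)  # drop the k highest
        upper = mean(s[k:]) - mean(y_control)            # drop the k lowest
    else:
        # Control arm responds more often: trim the control arm instead.
        share = (rate_c - rate_t) / rate_c
        k = round(share * len(y_control))
        s = sorted(y_control)
        lower = mean(y_treat) - mean(s[k:])              # drop the k lowest
        upper = mean(y_treat) - mean(s[: len(s) - k])    # drop the k highest
    return lower, upper


# Toy data (hypothetical): full response in treatment, 80% response in control.
toy_treat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_control = [1, 2, 3, 4, 5, 6, 7, 8]
lower, upper = lee_bounds(toy_treat, 10, toy_control, 10)
```

When attrition is very low, as in this trial, the trimmed share is small and the two bounds lie close to the untrimmed estimate.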

We also explore heterogeneity in the estimated treatment effects following the generalized

random forest method proposed by Athey et al. (2019). We find that there is substantial

heterogeneity in the estimated conditional average treatment effects, primarily predicted by

variation in household composition at baseline (the average household size is nearly seven,

with a standard deviation of over two). Households that are smaller, and characterized

by a lower ratio of dependents to prime aged adults, show significantly larger treatment

effects. In fact, households with five or fewer members and a dependency ratio in the

lowest quartile exhibit a gain in consumption that is around 30% larger than the average

treatment effect, suggesting that households with more care responsibilities may not be able

to effectively take advantage of new livelihoods opportunities. We further demonstrate that

a comparable pattern of heterogeneity is not observed in replication data from Banerjee et

al. (2015), an evaluation of a graduation model intervention implemented in six stable,

non-displacement sites, suggesting that this pattern may be somewhat distinct to displacement

settings, requiring consideration in program design in such settings.

Our paper contributes to the growing literature evaluating the effectiveness of graduation

model programs (Banerjee et al., 2015, 2022, 2021; Bandiera et al., 2017; Balboni et al., 2022).

While most previous studies have been conducted in stable, rural contexts, ours is among

the first to evaluate a graduation program in a setting marked by forced displacement. Table

A1 provides an overview of this literature and key characteristics of the setting, empirical design,

and interventions, highlighting the absence of evidence from conflict-affected contexts.

Moreover, whereas previous research primarily evaluates the performance of these programs

in rural areas, this trial contributes evidence from an urban setting. This trial, together with

a study from post-conflict Afghanistan (Bedoya et al., 2019), indicates that graduation models

may be particularly effective in fragile or humanitarian settings, demonstrating relatively

large impacts on household economic outcomes.

This pattern of relatively large impacts aligns with findings from a parallel literature



on economic interventions targeting forcibly displaced populations. Despite the scale of

global displacement, there remains a striking lack of rigorous evidence on the effectiveness of

livelihood and poverty alleviation programs for these populations—particularly IDPs (Rozo

and Grossman, 2025; Schuettler and Caron, 2020). Our study provides one of the first

experimental evaluations of a multifaceted livelihoods program for IDPs. Related studies

focusing on cash and cash+ interventions in refugee settings also report promising results.

In Uganda, Gupta et al. (2024) conduct a randomized trial evaluating the economic impacts

of a one-off unconditional $1,000 USD transfer directed at South Sudanese refugees residing

in refugee camps, showing notable improvements in household consumption (11% relative to

control group), asset accumulation (30%), and business revenue (64%) after 18 months. Also

in Uganda, Baseler et al. (2024) experimentally evaluate a program combining $540 USD cash

grants with mentorship for young urban microentrepreneurs, including both refugees and

host community members. They find that the cash grant alone significantly boosts business

profits and household earnings one year after implementation, but the mentorship component

provides no additional benefits. Finally, in Kenya, MacPherson and Sterck (2021) use a

regression discontinuity approach to compare a traditional humanitarian model of refugee

assistance with a development-oriented approach, finding improvements in nutrition and food

security after 16 months, likely driven by greater participation in small-scale agriculture.2

2 Context and Intervention

2.1 Context

Baidoa is a city in southwestern Somalia, serving as the capital of the South West State and

located approximately 250 kilometers from the capital Mogadishu (see Figure A1a). It is

one of Somalia’s largest urban centers and an important hub for agriculture and livestock

2Another closely related strand of work focuses on assessing the impact of programs and legislation that
facilitate refugee integration into local labor markets: see for example Sarvimäki and Hämäläinen (2016);
Battisti et al. (2019); Fasani et al. (2021); Hussam et al. (2022).



trade (UN-Habitat, 2021). While population estimates vary widely due to the lack of recent

census data and the large-scale influx of IDPs over the past decade, imputed estimates

based on a 2014 survey suggest a population of around 300,000 in that year (UN-Habitat,

2021). However, ongoing conflict and recurring droughts have led to large-scale displacement,

substantially increasing the city’s population. In 2022, the number of IDPs in Baidoa was

estimated to be nearly 600,000 (UNHCR, 2022), up from 169,000 in 2017 (UNHCR, 2017)

and constituting one of the largest IDP populations within Somalia. Though a number of

humanitarian and development organizations operate in the city, security remains fragile,

with Al-Shabaab, a designated terrorist group, maintaining influence in surrounding rural

areas (UN-Habitat, 2021) from which households are often displaced.

2.2 Intervention

The project evaluated here was the Ultra-poor Graduation (UPG) Program, implemented

by World Vision over three years and funded by the U.S. Agency for International Development

(USAID) Bureau for Humanitarian Assistance (BHA).3 UPG supported ultra-poor and

vulnerable households in graduating from extreme poverty and moving toward self-reliance,

targeting primarily IDPs as well as a small number of households from vulnerable host communities,

returnees, and refugees.4 Figure A1b maps the Baidoa IDP sites targeted for this

program and thus included in the evaluation.

Launched in June 2022 and running until December 2024, the intervention consisted

of four main components. First, UPG households received six monthly unconditional cash

transfers of $42.50 to provide consumption support. Second, they participated in savings

groups designed to encourage savings; these regular meetings also served as a platform for

training on topics such as financial literacy and business management. Third, households

3The formal title was Building Pathways Out of Poverty for Ultra-Poor IDPs and Vulnerable Host
Communities in Baidoa.

4Returnees are Somali households who had been refugees in other countries and then returned; Baidoa
is not necessarily their home city, however. The very small number of international refugees in Somalia are
generally from Ethiopia and Yemen.



received either a one-time asset transfer or funding to enroll in a six-month technical training

course, based on their preference. Households opting for the asset transfer could choose from

goats, chickens, sheep, cows, crop seeds, or tools; the technical training was provided through

a local institute. Fourth, participants attended regular group-based coaching sessions focused

on life skills and social integration. A summary of the intervention components compared

to other similar interventions is provided in Table A1.

Program eligibility was determined based on household characteristics assessed in an

initial vulnerability assessment, and eligible households had to meet two criteria: they were

classified as experiencing moderate or severe hunger according to the Household Hunger

Scale (Ballard et al., 2011), and they had resided in the IDP site for at least one month.5

The initial assessment identified 6,323 eligible households.

3 Methods

3.1 Experimental Design

We employ a randomized controlled trial using randomization at the household level, given

that UPG by design had resources to serve only 5,000 households among a larger number

of eligible households. The control arm was expected to include 1,500 households, but was

reduced to 1,323 households given the number of eligible households to ensure that the

intervention target of 5,000 was met. Randomization was conducted by the research team

in Stata prior to the baseline survey using data from the initial vulnerability assessment.6

5These criteria were identified as important for program eligibility by the program team and validated
through focus group discussions with participants and community members. Households experiencing hunger
were prioritized due to their higher level of need, while requiring at least one month of residency ensured
greater stability, increasing the likelihood that participants would remain at the site.

6Randomization was stratified by four groups based on two binary variables: a binary variable for a
household being above or below the median of an asset index constructed using vulnerability assessment
data, and a binary variable equal to one if the household had resided in the IDP site for more than a year.
The initial design also conducted random assignment to two treatment arms, to facilitate an analysis of
two alternate household coaching strategies. However, ultimately only one coaching strategy — group-based
coaching — was utilized, and thus the two treatment arms are pooled in analysis.
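The stratified assignment described in footnote 6 can be sketched as follows. This is a minimal illustration in Python (the study itself used Stata), and the dictionary keys, the seed, the toy data, and the 70% treatment share are hypothetical choices made for the example rather than the study's actual parameters.

```python
import random
from collections import defaultdict


def stratified_assignment(households, treat_share, seed=0):
    """Randomly assign treatment within strata (illustrative sketch).

    households: list of dicts with hypothetical keys
        'id', 'high_assets' (0/1: above-median asset index),
        'long_resident' (0/1: in the IDP site for more than a year).
    Returns a dict mapping household id -> 1 (treatment) or 0 (control).
    """
    rng = random.Random(seed)

    # Two binary variables define four strata.
    strata = defaultdict(list)
    for hh in households:
        strata[(hh["high_assets"], hh["long_resident"])].append(hh["id"])

    assignment = {}
    for key in sorted(strata):
        ids = sorted(strata[key])
        rng.shuffle(ids)                         # random order within the stratum
        n_treat = round(treat_share * len(ids))  # treated count in this stratum
        for rank, hh_id in enumerate(ids):
            assignment[hh_id] = 1 if rank < n_treat else 0
    return assignment


# Toy data: 40 hypothetical households, 10 in each of the four strata.
toy = [
    {"id": i, "high_assets": i % 2, "long_resident": (i // 2) % 2}
    for i in range(40)
]
assignment = stratified_assignment(toy, treat_share=0.7)
```

Assigning within strata guarantees that the treatment share is (nearly) identical across the four cells, which is why the estimating equation later includes strata fixed effects.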



3.2 Ethics

Ethical approval for this study was granted by the International Food Policy Research Institute

(IFPRI) Institutional Review Board (IRB), under protocol #00007490. Further ethical

considerations, including policy equipoise, risks, and informed consent, are detailed in the

structured ethics appendix (Appendix A3), following Asiedu et al. (2021).

3.3 Surveys

The baseline survey was conducted between May 18 and June 12, 2022 and was targeted

to include 3,000 treatment households and all 1,323 control households; the remaining 2,000

treatment households were not targeted for inclusion in the survey or evaluation.

The realized sample included 4,116 households (2,872 treatment and 1,244 control).7 The

first follow-up survey was conducted 14 months post-baseline and one year after the launch of

the UPG intervention and resurveyed 4,089 (99.3%) sample households. The second follow-

up survey was conducted 28 months post-baseline and two years following UPG launch,

and resurveyed 3,982 (96.7%) sample households. All interviews were conducted in person.

Due to security risks and the limited survey capacity in the study area, all surveys were

carried out by the implementing partner, World Vision, which was responsible for hiring and

training the enumerators. As is common in insecure forced displacement settings (Pape and

Mistiaen, 2020; Pape and Verme, 2023), interview durations had to be restricted (with a

target of around 60 minutes per household), limiting the scope of the questionnaires. Figure

A2 in the Appendix summarizes the study timeline.
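As a quick arithmetic check, the sample counts reported in this section and in footnote 7 can be reconciled in a few lines. All figures below are taken from the text; only the variable names are introduced for illustration.

```python
# Sampling frame fielded at baseline (footnote 7)
frame_treatment = 2980
frame_control = 1323
not_interviewed = 187

# Realized baseline sample
baseline_sample = frame_treatment + frame_control - not_interviewed
assert baseline_sample == 2872 + 1244  # treatment + control, as reported

# Follow-up response rates relative to the baseline sample
followup_1 = 4089
followup_2 = 3982
rate_1 = 100 * followup_1 / baseline_sample
rate_2 = 100 * followup_2 / baseline_sample
attrition_2 = 100 - rate_2

print(baseline_sample, round(rate_1, 1), round(rate_2, 1), round(attrition_2, 1))
# prints: 4116 99.3 96.7 3.3
```

The implied attrition of about 3.3% at the second follow-up is consistent with the "less than 4%" figure cited in the introduction.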

7The sampling frame fielded included 2,980 treatment households and 1,323 control households (the latter
comprised of all households in the target IDP sites not served by the intervention); the sampling frame of
treatment households itself shrank slightly from the original planned 3,000 given that 20 households in the
sample list were duplicates. Within the target sample, both treatment and control, 187 households were
not interviewed because they either could not be reached during the designated survey period or declined to
participate.



3.4 Outcomes

Table A10 in the Appendix provides definitions of all outcome variables analyzed, as prespecified

in a registered analysis plan.8 The primary outcomes include the share of households

characterized by moderate or severe hunger over the last 30 days based on the Household

Hunger Scale; household per capita consumption in 2017 Purchasing Power Parity (PPP)

dollars; and the estimated value of household assets, also valued in 2017 PPP dollars.9 Table

A4 reports the estimated minimum detectable effect sizes for the primary outcomes based

on the mean and standard deviations observed in the main follow-up survey. The experiment

is adequately powered to detect even relatively small treatment effects: a 0.05 change

in the share of households with moderate or severe hunger (11.5% of the control mean), a

0.15 $PPP change in per capita consumption (5.7% of the control mean), and a 35 $PPP

change in the total value of assets (14% of the control mean).
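To give a sense of how a minimum detectable effect of this size arises, the sketch below applies the textbook formula for an individually randomized trial without covariate adjustment. The 5% two-sided significance level and 80% power are conventional assumptions rather than values stated in the text, and the inputs (the second follow-up sample size, the realized treatment share at baseline, and a control-arm hunger share of 0.42) are drawn from elsewhere in the paper for illustration; the calculation underlying Table A4 may differ.

```python
from math import sqrt
from statistics import NormalDist


def mde(sd, n, share_treated, alpha=0.05, power=0.80):
    """Minimum detectable effect for a two-arm, individually randomized trial.

    sd: standard deviation of the outcome
    n: total sample size
    share_treated: fraction of the sample assigned to treatment
    alpha, power: assumed test size (two-sided) and power
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * sd * sqrt(1.0 / (n * share_treated * (1 - share_treated)))


# Binary outcome: share of households with moderate or severe hunger.
p_control = 0.42                                # control-arm share at follow-up
sd_hunger = sqrt(p_control * (1 - p_control))   # standard deviation of a binary outcome
mde_hunger = mde(sd_hunger, n=3982, share_treated=2872 / 4116)
```

Under these assumptions the result comes out a little under 0.05, in line with the 0.05 change in the hunger share reported above.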

Secondary outcomes include additional, more detailed measures of assets and financial

inclusion; income data capturing whether households report income from any one of six

sources, and the amount of income; the livelihood coping strategies score, capturing strategies

households may need to use to adapt to shortages of food or money; and measures of social

cohesion and locus of control. The index of social cohesion was constructed using a series of

questions about the individual’s perception of the broader community (Humble et al., 2023;

Catholic Relief Services, 2019); and locus of control was measured following Rotter (1966)

and Malacarne (2024).

8The trial registration number is AEARCTR-0009452.
9Consumption was measured using modules adapted from the 2017 Somali High-Frequency Survey (Pape,
2017) conducted by the World Bank, but shortened to manage interview length; additional details on the
construction of the consumption measure are provided in Appendix A4.2. The 2017 base year is chosen
because, at the time of the baseline survey, it was the latest available International Comparison Program
(ICP) benchmark year. PPP conversion factors are most reliable in benchmark years when the ICP
conducts comprehensive price surveys (e.g., 2011, 2017, 2021). PPPs for non-benchmark years are typically
interpolated using domestic and U.S. CPIs and may be subject to greater uncertainty.



3.5 Econometric Specification

We estimate the following specification:

$$y_{i,t} = \alpha + \beta\,\mathrm{Treatment}_i + \gamma\, y_{i,0} + \lambda_i + \epsilon_{i,t},$$

where $y_{i,t}$ is an outcome for household $i$ in year $t$, $\mathrm{Treatment}_i$ is an indicator for assignment

to the UPG program, $y_{i,0}$ is the baseline measure of the outcome (if available), and $\lambda_i$ are

fixed effects for randomization strata. For outcomes that were not measured at baseline, we

estimate the same model excluding $y_{i,0}$.10
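The specification above can be illustrated with a short simulation. The sketch below is not the authors' estimation code: it uses Python with NumPy, simulated data with made-up coefficients, and simple (unstratified) random assignment. Standard errors use the heteroskedasticity-robust sandwich form with the n/(n-k) scaling applied by common software (often labeled HC1); White's original estimator omits that scaling factor.

```python
import numpy as np


def ancova_robust(y, treat, y0, strata):
    """OLS of y on treatment, baseline outcome, and strata dummies.

    Returns the treatment coefficient and its heteroskedasticity-robust
    (HC1) standard error.
    """
    y = np.asarray(y, dtype=float)
    strata = np.asarray(strata)
    levels = np.unique(strata)

    # A full set of strata dummies absorbs the intercept (alpha).
    dummies = (strata[:, None] == levels[None, :]).astype(float)
    X = np.column_stack([treat, y0, dummies])
    n, k = X.shape

    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    resid = y - X @ coef

    # Sandwich estimator: (X'X)^-1 [X' diag(e^2) X] (X'X)^-1, scaled by n/(n-k).
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = XtX_inv @ meat @ XtX_inv * n / (n - k)
    se = np.sqrt(np.diag(cov))
    return coef[0], se[0]


# Simulated data with a true treatment effect of 0.5 (hypothetical values).
rng = np.random.default_rng(0)
n = 4000
strata = rng.integers(0, 4, n)
treat = (rng.random(n) < 0.7).astype(float)
y0 = rng.normal(size=n)
y = 1.0 + 0.5 * treat + 0.3 * y0 + 0.2 * strata + rng.normal(size=n)

beta_hat, se_beta = ancova_robust(y, treat, y0, strata)
```

For outcomes without a baseline measure, the same function could be run with the `y0` column dropped from the design matrix.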

Since assignment to treatment was not clustered, standard errors are adjusted for heteroskedasticity

following White (1980). We also report q-values corrected for multiple hypothesis

testing (MHT) following Anderson (2008). We conduct MHT corrections within

the set of primary outcomes, and within each family of secondary outcomes. We

also report average standard treatment effects for broader outcome families that pool across

primary and secondary outcomes (consumption and food security, assets and savings, and

income), following Kling et al. (2007). Although our prespecified main specification does not

include additional baseline covariates, we also report an alternative specification that uses

double lasso for covariate selection (Belloni et al., 2013; Cilliers et al., 2024).
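To illustrate what a false discovery rate q-value is, the sketch below computes plain Benjamini-Hochberg step-up q-values for a small family of p-values. This is a simpler relative of the procedure referenced above and may not match the variant the authors use; the function name and the example p-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg step-up q-values for one family of hypotheses.

    For the p-value with ascending rank r among m tests, the q-value is the
    minimum of p_(j) * m / j over all ranks j >= r.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum enforces
    # monotonicity of q-values in the p-value ranking.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical p-values for a family of four outcomes.
example_q = bh_qvalues([0.01, 0.04, 0.03, 0.20])
```

Applying the correction separately within each outcome family, as described above, means that `m` is the number of outcomes in that family rather than the total number of outcomes tested.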

4 Empirical Findings

4.1 Baseline Characteristics and Balance

To characterize the sample, Table 1 summarizes key demographic characteristics at baseline

and reports balance across the control and treatment arms. Eighty-three percent of

households are IDPs (with the remainder including 7% refugees, 1% returnees, and 9% host

10Baseline values are reported only for the Livelihoods Coping Scale, tropical livestock units, and any
savings; baseline HHS was also measured, but there is no variation in this measure at baseline since all
eligible households were characterized by moderate or severe hunger.



community members), and the average household includes nearly seven members, of whom

four are children. Sixty percent of households report having a pregnant or lactating woman,

and 20% report the presence of an individual with a disability. Unsurprisingly, sampled

households were characterized by a high level of food insecurity and a high level of deprivation

at baseline: the average HHS score was nearly four, consistent with the eligibility

criteria of moderate to severe hunger; less than 1% of households reported any savings, and

households owned around 0.2 tropical livestock units on average. Total baseline asset value

was estimated at around $350.11

Out of the 11 t-tests comparing baseline characteristics across the two study arms, only

one shows a statistically significant difference: household size is modestly larger in the control

group, by approximately 0.3 members (or 5%), and this difference is highly statistically

significant. When we estimate a joint test of balance across covariates, the p-value corresponding

to the null hypothesis of no significant imbalance is 0.053 when using conventional

p-values or 0.079 using randomization inference following Kerwin et al. (2024); we will explore

further in the robustness checks below the possibility of any bias in the estimated

treatment effects due to the imbalance detected in baseline household size.

4.2 Implementation Fidelity

Table A2 summarizes findings around implementation fidelity that suggest that in general,

the program was carefully implemented in line with the randomized design. 88% of treated

households reported receiving cash transfers from a non-governmental organization in the

past three years, compared to 14% of control households (who may plausibly also be reporting

transfers received from other NGO programs in the same recall period). Those who do report

receiving transfers reported six transfers of $42 each, for a total transfer of around $250.

Similarly, 87% of households assigned to treatment reported receipt of either assets or

TVET training over the past three years, compared to only 3% of control households. The

11Note that as asset price data was not collected at baseline, assets are valued at prices collected in the
second follow-up survey, using the median price measured within the sample.



assets track seems to have been slightly more popular than the TVET track, based on

households’ self-reports: 41% report receipt of assets, and 30% report receipt of TVET (9%

report receipt of both, an allocation that was not generally allowed under program guidelines

and suggests that households may be mis-identifying another service as one provided by

UPG). For households reporting assets, goats were the dominant choice (reported by nearly

80% of households) as evident in Figure A3a, while for households reporting training, the

most popular courses were tailoring, tie-dyeing, and beauty salon services (Figure A3b).

Reported participation in savings groups and coaching is, however, somewhat lower: 63% of

treatment households report participation in savings groups and 49% in coaching, compared

to minimal participation in the control arm.

We also separately assessed whether there were any cross-household spillovers from households

in the treatment arm to the control arm, and observed these were very rare. Fewer than 2%

of control households reported receiving cash remittances from any other household (whether

a program beneficiary or not). Fewer than 20% of treatment households reported that they

had transferred or loaned the asset they received to any other household. While informational

spillovers may have been more common (a majority of treatment households stated

they shared information received in training, though this is only vaguely defined), other

direct forms of spillovers seem to be infrequent.

4.3 Primary Findings

The primary treatment effects are reported in Table 2; Panel A reports effects on consumption

and food security, Panel B reports effects on assets and financial inclusion, Panel

C reports effects on income, and Panel D reports effects on social cohesion and locus of

control.12 The findings in Panel A suggest the intervention led to large positive shifts in

consumption, with an increase in per capita consumption of around 30% (or $0.81 in absolute

terms); the relative magnitude of this effect is consistent across both food and

12Again, the prespecified primary outcomes are per capita consumption, household hunger scale status,
and asset value, or Columns (1) and (4) of Panel A and Column (1) of Panel B.



non-food consumption. There is also a dramatic decline in the probability of households being

characterized by moderate or severe food insecurity according to the household hunger

scale: recall that at baseline, all households were identified as eligible based on this criterion.

Two years later, 42% of households in the control arm continue to experience this high level

of food insecurity, while this has declined to only 12% in the treatment arm. The livelihoods

coping score also shows a decline of about a third, consistent with the previous findings

and suggesting that many households exposed to UPG are not having to resort to adverse

measures (such as selling assets, borrowing, or withdrawing children from school) to obtain

food.

Panel B documents effects on assets and financial inclusion. The average asset value in

dollars has roughly tripled in the treatment arm (reaching nearly $900, compared to under

$300 in the control arm) and this is substantially driven by livestock, where the average

number of tropical livestock units (TLUs) increases nearly fourfold to around 0.49 TLUs.

(In practice, this corresponds to a gain of roughly three goats at the second follow-up; given

reported market prices, this is an increase in asset value of nearly $500.) Panel A of Table

A5 in the Appendix reports effects on the count estimates for a whole range of assets, and

it is evident that there are significant treatment effects on a large number of asset categories

(mobile phones, various productive tools, and other livestock). These estimates are generally

quite small in absolute terms (none exceeds 0.5 other than the estimate for goats, and most are

under 0.1), though in multiple categories the increase is proportionately large: for example, treatment

households increase their reported inventory of spades, tarpaulins, solar panels, and sheep

by around 70%; the number of donkey carts and poultry owned increases by around 50%;

and there is a sixfold increase in the number of sewing machines.

Returning to the main table, Column (3) in Panel B in Table 2 suggests there are also

substantial effects on savings, as virtually no households (4%) in the control arm report cash

savings, compared to nearly half of treatment households. The effects on credit access, while

still positive, are less dramatic in magnitude (eight percentage points relative to a mean of

57%).

Panel C reports effects on income. There is a large effect on income from livestock: the

probability of any reported income from cropping or livestock production increases by 16

percentage points relative to a mean of only 5% in the control arm, and treatment households

report an average of $34 of income from cropping and livestock in the reference period of

one month, relative to only $6 for control households. The effects for non-farm businesses

are, however, modest: the probability of having a non-farm business increases by

five percentage points relative to a mean in the control arm of 15%, but the

increase in the continuous measure of income is extremely small (around 3% of the control mean)

and statistically insignificant. There is similarly no overall shift in the probability of wage

income, though a further decomposition shows that there is some shift from informal

to formal wage labor.

Panel D then reports two variables capturing shifts in social cohesion and locus of control.

Because the intervention was individually randomized in an urban setting, there was an in-

creased risk of adverse effects on social cohesion—particularly if some households were aware

that others were receiving support while they were not. However, there is no evidence of this

phenomenon here; we cannot reject the null that the treatment had no effect on social cohesion. The

estimated treatment effect on locus of control is notably positive and significant, suggesting

some shifts in the psychological outlook of households linked to their enhanced economic

status.

To capture some key effects graphically, Figure 1 shows group means and treatment effect

estimates for the three primary outcomes as well as tropical livestock units across the two

follow-up surveys, allowing us to track how impacts evolved over time. Panel A shows a

steady decline in moderate or severe hunger, with the largest gains in the first year. House-

holds in the control arm saw some early gains but little shift thereafter, leading to a widening

gap between the two groups. Panel B presents total per capita consumption (not measured

at baseline), which shows modest gains for the treatment group by the first follow-up and a

much larger gap in the following year. Growth in asset value can be assessed by employing

asset prices as measured in the second follow-up survey: while control households saw flat

or declining asset values, treated households accumulated assets steadily, and similarly for

tropical livestock units.

We report average standard treatment effects following Kling et al. (2007) to facilitate

interpretation of general treatment effects across categories: consumption and food security,

assets and financial inclusion, income, and social cohesion and locus of control.13 These

findings are reported in Table 3 and highlight the wide variation in effect sizes across outcome

families. The positive effect on consumption and food security is fairly large at around .29

standard deviations, but the effect on income is only around .05 standard deviations and

statistically insignificant; both are dwarfed by the positive effect on assets and financial

inclusion, nearly 1.5 standard deviations. The effects on attitudinal variables are around

0.04 standard deviations and statistically insignificant.
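
To make the idea of an average standardized effect concrete, the following is a minimal sketch, assuming the simplest possible version: each outcome's difference in means is divided by the control-arm standard deviation, and the standardized differences are averaged within a family. It omits the controls, the reverse-coding of adverse outcomes, and the standard errors that the reported estimates rely on, and the toy data and function name are hypothetical.

```python
from statistics import mean, stdev


def average_standardized_effect(treated, control):
    """Average of per-outcome standardized mean differences.

    `treated` and `control` map outcome names to lists of values.
    Each raw difference in means is divided by the control-arm
    standard deviation, then the standardized effects are averaged.
    """
    effects = []
    for name in control:
        diff = mean(treated[name]) - mean(control[name])
        effects.append(diff / stdev(control[name]))
    return mean(effects)


# Hypothetical toy data for one outcome family (not from the trial).
control = {"total_cons": [2.0, 3.0, 4.0], "food_cons": [1.0, 2.0, 3.0]}
treated = {"total_cons": [3.0, 4.0, 5.0], "food_cons": [1.5, 2.5, 3.5]}

aste = average_standardized_effect(treated, control)
print(round(aste, 3))
```

Expressing every outcome in control-arm standard deviations is what allows effects on dollars, counts, and binary indicators to be averaged into a single family-level number.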

We also report two additional exploratory analyses: the first is treatment effects on

household size and composition. Although this analysis was not pre-specified, we consider it

important given the evidence of baseline imbalance in household size, and because differential

shifts in household composition have been documented in prior empirical evaluations of cash

transfer programs for displaced households (Özler et al., 2021). As shown in Panel B of

Table A5 in the Appendix, there is a small treatment effect for overall household size of

.068 members, corresponding to less than a one percent increase in household size relative

to the mean in the control arm at follow-up: the only (weakly) statistically significant shifts

were in the number of young children (0-4 years of age) and adolescents (15-19 years of age).

There was no significant change in the number of prime-age or older adults. Overall, these

results suggest that the intervention did not meaningfully alter household composition, and

it is unlikely that selective household entry or exit accounts for the main treatment effects.

Given the evidence of baseline imbalance in household composition, we also re-estimate

13The variable capturing any moderate or severe food insecurity is reverse-coded in this analysis.

our main specification using a double lasso routine to select baseline covariates as controls:

the lasso uniformly selects a single variable, household size. (Baseline values of the outcome

variable are still uniformly included as control variables, when available.)14 Table A3 reports

these specifications. While some treatment effects are slightly smaller (e.g., the coefficient on

consumption), in general the differences are extremely minor. There is very little evidence

that baseline imbalance led to any bias in the primary estimated effects.

4.4 Attrition

As previously noted, attrition was on average strikingly low in this trial, particularly for

an IDP sample: fewer than 4% of households were lost to follow-up. However, despite this

low rate, there is a meaningful difference across treatment arms as reported in Table A6:

in Column (1), we observe that the rate of attrition is six percentage points lower among

treatment households (1.4%, compared to 7.6% among control households). In Columns

(2) and (3) of the same table, we then regress attrition on a set of baseline covariates

(the same covariates previously reported in the balance table) and the interaction of these

covariates with treatment. Only a few baseline characteristics predict attrition: households

characterized by a higher HHS score at baseline (more intense hunger) are more likely to

attrit, while households characterized by a higher LCS score (indicative of more intense

use of coping strategies) and a higher dependency ratio are less likely to attrit. However,

the interaction effects between baseline covariates and the treatment dummy are generally

insignificant, implying that characteristics of the attrited do not appear to differ across arms,

again with the exception of the dependency ratio: households with a higher dependency ratio

are less likely to attrit in the control arm, but this relationship is zero in the treatment arm.

Nonetheless, to further explore the potential for bias due to attrition, we also estimate

bounds on the primary treatment effects using various strategies. For the continuous

variables of interest, we first measure the attrition gap and construct bounds following Lee

14The fact that this routine selects only a single variable is by no means unusual, as extensively documented
in Cilliers et al. (2014).

(2005): we estimate the difference in the proportion of non-missing observations between the

treated and control groups and then create two counterfactual treated samples. To estimate

the lower bound, we drop the treated units characterized by the highest outcome values until

the attrition gap is exhausted, and to estimate the upper bound, we drop those characterized

by the lowest outcome values. As an alternate strategy, we also follow Kling et al. (2007) to

generate bounds assuming that attrited units in the treatment or control arm are characterized

by outcomes N standard deviations above or below the arm-specific mean,

where N in this case is set to two: the upper bound is estimated by setting attrited units in

the treatment arm to two standard deviations above the treatment arm mean and attrited

units in the control arm to two standard deviations below the control arm mean, and vice

versa for the lower bound. This allows for wide disparities in outcomes comparing attrited

and non-attrited individuals. For binary variables, we estimate simple Manski bounds.
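
The trimming and worst-case bounds described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' estimation code: it handles only the case observed here (lower attrition in the treatment arm), compares raw means without strata fixed effects or baseline controls, and uses hypothetical function names and toy data.

```python
from statistics import mean


def lee_bounds(y_treat, n_treat, y_ctrl, n_ctrl):
    """Trimming bounds when the treatment arm has the higher response rate.

    y_treat / y_ctrl: observed (non-missing) outcomes in each arm.
    n_treat / n_ctrl: number of units originally assigned to each arm.
    """
    rate_t = len(y_treat) / n_treat
    rate_c = len(y_ctrl) / n_ctrl
    if rate_t < rate_c:
        raise ValueError("this sketch only handles lower attrition in treatment")
    # Share of observed treated units to trim so that response rates match.
    trim_share = (rate_t - rate_c) / rate_t
    k = round(trim_share * len(y_treat))
    ordered = sorted(y_treat)
    # Lower bound: drop the k highest treated outcomes.
    lower = mean(ordered[: len(ordered) - k]) - mean(y_ctrl)
    # Upper bound: drop the k lowest treated outcomes.
    upper = mean(ordered[k:]) - mean(y_ctrl)
    return lower, upper


def manski_bounds(y_treat, n_treat, y_ctrl, n_ctrl):
    """Worst-case bounds for a binary (0/1) outcome."""
    miss_t = n_treat - len(y_treat)
    miss_c = n_ctrl - len(y_ctrl)
    # Lower bound: missing treated set to 0, missing control set to 1.
    lower = sum(y_treat) / n_treat - (sum(y_ctrl) + miss_c) / n_ctrl
    # Upper bound: missing treated set to 1, missing control set to 0.
    upper = (sum(y_treat) + miss_t) / n_treat - sum(y_ctrl) / n_ctrl
    return lower, upper


# Hypothetical toy data: full response in treatment, 80% response in control.
lee_lo, lee_hi = lee_bounds(list(range(1, 11)), 10, list(range(1, 9)), 10)
man_lo, man_hi = manski_bounds([1, 1, 1, 0], 5, [0, 0, 1], 5)
print(lee_lo, lee_hi, man_lo, man_hi)
```

The trimming bounds tighten as the attrition gap shrinks, whereas the worst-case bounds widen with the total share of missing observations in either arm.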

The findings are presented in Table A7 and are generally consistent with our primary

results. In Panel A, we can observe that the estimated treatment effects on consumption,

assets, and the livelihoods coping score can be bounded away from zero, and with relatively

tight variation in the estimated magnitude: for example, the magnitude of the estimated effect for

consumption ranges between 20% and 40%. In Panel B, we can see that even quite conser-

vative Manski bounds allow us to reject the hypothesis of a null effect on moderate or severe

hunger status, any savings, and any credit; we can also reject the hypothesis of a null effect

for any agricultural or livestock income, though for the other income variables, the bounds

cross zero. (Similarly, the estimated bounds cross zero for the continuous income variables,

where the primary estimates were all somewhat noisy, likely due to the large number of zeroes,

and bounds also cross zero for locus of control; for concision, these are not reported in

the table.)

4.5 Heterogeneous Effects

The previous graduation literature has generally used quantile regression to explore varia-

tion in treatment effects (Banerjee et al., 2015; Bandiera et al., 2017; Bedoya et al., 2019).

However, this strategy does not identify the underlying drivers of heterogeneity, and sim-

pler linear analyses focusing on specific baseline characteristics as predictors of heterogeneity

(e.g., wealth, education) have so far provided limited evidence of consistent or interpretable

moderators (Bedoya et al., 2019; Bossuroy et al., 2022). To address this, we explore

treatment effect heterogeneity using a relatively recent machine learning method, the

generalized random forest (GRF) (Athey et al., 2019), which allows for a data-driven exploration of

heterogeneity across a rich set of baseline covariates.

The GRF algorithm builds a causal random forest (CRF) that allows for the estimation of

conditional average treatment effects, that is, treatment effects conditional on observable baseline characteristics. This

method is arguably well suited to our trial, characterized by a large sample and individual-level

randomization, both important conditions for the effective application of causal

forest methods (Wager and Athey, 2018; Davis and Heller, 2017).15

The first step is simply to assess how much heterogeneity in treatment effects is evident,

focusing on per capita consumption as the primary outcome variable of interest. We estimate

what is known as the “out-of-bag” conditional average treatment effect (CATE) — in which

the treatment effect for each observation is predicted using only the trees for which that

observation was not used in the training set — and present the distribution in Figure A4a.

It is evident that there is substantial mass for an effect on consumption between around $0.6

and $0.85 (relative to the estimated mean effect of $0.81), but there is also a right tail of

consumption effects of more than a dollar. The cumulative distribution function shown in

Figure A4b suggests that the top quartile of consumption effects is above $0.77.
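
For reference, the quantities discussed in this paragraph can be written in potential-outcomes notation. The symbols are introduced here purely for illustration and are an assumption about notation rather than the paper's own: $Y_i(1)$ and $Y_i(0)$ denote household $i$'s potential consumption with and without UPG, $X_i$ its baseline covariates, and $\mathcal{S}_b$ the subsample used to grow tree $b$.

```latex
\tau(x) = \mathbb{E}\left[\, Y_i(1) - Y_i(0) \mid X_i = x \,\right],
\qquad
\hat{\tau}^{\mathrm{oob}}(X_i) = \hat{\tau}_{\mathcal{B}_{-i}}(X_i),
\quad
\mathcal{B}_{-i} = \{\, b : i \notin \mathcal{S}_b \,\}
```

The out-of-bag prediction for household $i$ therefore uses only the trees in $\mathcal{B}_{-i}$, so no household contributes to the estimate of its own treatment effect.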

We then probe the variable importance estimated by the GRF algorithm: this captures

15We use the GRF algorithm in R, and draw on the useful replication code and discussion of applications
of GRF in an RCT context provided in Sylvia et al. (2021).

the percentage of importance for each baseline covariate in the forest, as measured by the

frequency with which this variable is used as a splitting variable. The findings reported in

Table A8 in the Appendix suggest that the most important variables predicting heterogene-

ity are both linked to household composition: the number of members in the household, and

the dependency ratio (defined as the ratio of the number of children under 14 and elderly

over 55 to the number of adults and adolescents). Other, weaker, predictors of heterogeneity

include baseline asset value and tropical livestock units. Figure A5 then captures how the

estimated out-of-bag CATEs vary with respect to the three most predictive characteristics.

The first two scatter plots capture a notable pattern of treatment effect heterogeneity in

which the largest effects are observed for smaller households (four or fewer members) and

those characterized by lower dependency ratios (under around two dependents per produc-

tive adult). There is also some weak evidence of heterogeneity in which households that

were worse-off at baseline as measured by asset value are characterized by somewhat larger

treatment effects, but this effect is relatively flat.

To then test heterogeneity using a simpler specification, we follow Sylvia et al. (2021) and

estimate a standard heterogeneous effects regression using these key indicators of interest.

Given the graphical pattern suggesting that larger treatment effects are concentrated in the

left tail of household size and dependency ratio, we generate binary variables equal to one if

the household is characterized by household size, dependency ratio, or baseline asset value

below the 25th percentile at baseline and estimate heterogeneous effects, reported in Table

4. We can observe that households characterized by a smaller size (under five members)

and a lower dependency ratio (under 2.5) show dramatically larger treatment effects, but for

baseline asset value, the heterogeneity is lower in magnitude and not statistically significant.

Column (4) presents the joint specification including all three variables and both household

size and the dependency ratio remain strongly significant: a household characterized by both

small size and a low dependency ratio would have a predicted treatment effect that is more

than double that of a household characterized by large size and a high dependency ratio.
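
The construction of the bottom-quartile indicators and the interaction specification can be illustrated with a minimal sketch. It relies on the fact that, with a binary treatment, a binary moderator, and no other controls, the coefficients of the saturated regression equal differences in cell means; the reported estimates additionally include strata fixed effects and baseline controls, which are omitted here. All names and data below are hypothetical.

```python
from statistics import mean, quantiles


def below_p25(values):
    """Indicator equal to 1 if a value is below the sample 25th percentile."""
    cutoff = quantiles(values, n=4)[0]
    return [1 if v < cutoff else 0 for v in values]


def interaction_effect(y, treat, low):
    """Coefficients from the saturated regression y ~ treat + low + treat*low.

    With two binary regressors and no other controls, the OLS coefficients
    equal differences in cell means, so no matrix algebra is needed.
    """
    def cell(t, l):
        return mean(yi for yi, ti, li in zip(y, treat, low) if ti == t and li == l)

    effect_high = cell(1, 0) - cell(0, 0)  # treatment effect when low == 0
    effect_low = cell(1, 1) - cell(0, 1)   # treatment effect when low == 1
    return effect_high, effect_low - effect_high  # (main effect, interaction)


# Hypothetical toy data: household size, treatment status, consumption.
hh_size = [2, 3, 4, 5, 6, 7, 8, 9]
treat = [1, 0, 1, 0, 1, 0, 1, 0]
cons = [4.0, 2.0, 3.0, 2.5, 3.5, 2.5, 3.0, 3.0]

low = below_p25(hh_size)
main, interaction = interaction_effect(cons, treat, low)
print(low, round(main, 3), round(interaction, 3))
```

In this toy example the interaction term is positive, meaning the treatment effect is larger for households in the bottom quartile of size, which mirrors the direction of the pattern described above.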

Overall, this analysis suggests that in predicting treatment effects of this intervention,

demographics is truly destiny — and dominates other observable characteristics at baseline.

Importantly, the transfers and other material support provided by UPG were fixed in size at

the household level and did not scale with respect to family size; this is a common feature of

graduation model interventions, as evident (for example) in the BRAC program handbook

or in the detailed program descriptions provided in Banerjee et al. (2015).16 That being said,

the magnitude of the treatment effect heterogeneity, the fact that it is observed both for

overall household size and for the dependency ratio, and the fact that this heterogeneity is

observed around a year following the conclusion of transfers together suggest that this is not solely

a mechanical effect of greater per capita intervention intensity in smaller households, and may also reflect the fact that large

households with many dependents were less able to exploit new livelihoods opportunities.

To further explore the potential generalizability of this pattern of heterogeneity, we also

conduct an exploratory analysis of the data from the original six-site graduation trial re-

ported in Banerjee et al. (2015) using the same generalized random forest method; we focus

primarily on the question of what baseline variables predict heterogeneity in the treatment

effect for per capita consumption in the longer-term (three-year) follow-up.17 Our goal is

to understand whether the pattern in which household size and demographics dominate in

predicting conditional average treatment effects in our Baidoa sample also holds in a

sample drawn from stable, non-displacement, rural contexts.

We first conduct the GRF analysis using a large set of covariates available at baseline,

including a number of variables (baseline consumption and income) that were not collected

at baseline in the Baidoa trial; we then restrict to the set of covariates that are also available

in the Baidoa baseline data, and report findings from the “full” and “restricted” models,

respectively. The findings suggest that there is also meaningful heterogeneity in the esti-

mated treatment effects in the Banerjee et al. trial: the simple intent-to-treat estimate on

16Only one site had a program feature that was adjusted for household size, consumption support as
provided in Ghana.

17We follow the primary specification described in the paper, using both binary variables for country and
randomization strata and clustering at the randomization unit level.

long-term consumption is $3.36 per person per month, but the estimated CATEs range from

$1 to nearly $6 in 2014 PPP terms, as captured in Figure A6, which presents the density

functions of the conditional average treatment effect for the full and restricted models,

respectively.

Table A9 then documents patterns of variable importance in the two models, focusing

on the ten most important variables in each and sorted in accordance with their rank in the

full model.18 In the full model, baseline consumption, asset value, and income from revenue

and agriculture are most predictive of variation in the conditional average treatment effect

(weights of roughly 0.1 each); in the restricted model, asset value remains the most

predictive (0.24), followed closely by household size, food security, and

perceived economic welfare (weights of 0.2 each).19 The observed patterns of heterogeneity

in the CATE with respect to these baseline characteristics seem somewhat nonlinear, as sum-

marized in Figure A7, but for household size, a similar (though flatter) pattern is observed

in the restricted model in which smaller households are characterized by larger treatment

effects.20 Overall, though, household size is not a dominant predictor of variation in the

CATE; while it has some predictive power, baseline economic characteristics seem to be somewhat

more predictive.

This exploration suggests that the pattern observed in our data in which demographic

characteristics are dramatically more predictive of variation in the conditional average treat-

ment effect vis-a-vis baseline economic variables is distinct from the pattern observed in

Banerjee et al. (2015) for a graduation program in non-displacement settings. There are

several potential explanations for this that are not mutually exclusive. First, household size

may be uniquely important in a displacement setting, where households are large and have

18Binary variables corresponding to country and randomization strata are included in the random forest
method, but parallel to the Baidoa analysis, the variable importance weights are re-normalized to exclude
the weights estimated on randomization strata. “NA” in the final column denotes that that variable is not
included in the restricted model. The weights in the full model do not sum to one because other variables
outside the top ten are predictive.

19Data on household composition that would allow us to identify the number of dependents is not reported.
20The top 5% of outliers for each explanatory variable (consumption, assets, etc.) are truncated from the
graph for clarity.

agglomerated or altered in composition either by necessity or as a coping mechanism in the

face of shocks. Average household size at baseline in our sample is 6.7, not dramatically

different from the mean (5.8) in the sample in Banerjee et al. (2015), though the dependency

ratio is somewhat high: only 2.6 of these members, on average, are between the ages of 15

and 55.

Dependency ratios even in sub-Saharan Africa are typically below 100% (Cleland and

Machiyama, 2017), though such estimates often use a different minimum age for the elderly (65) and

are constructed using macro-level data that may not be directly comparable. In the Somali

context, restrictive gender norms may pose a further challenge by limiting the economic

engagement of women even of prime working age.

A second hypothesis is that baseline economic status (level of assets, food security) has

little predictive power in a displacement context in which households have experienced a

series of adverse shocks and thus where their current ownership of assets may be effectively

random. Interestingly, the probability of owning any assets is estimated to be much higher

in this sample compared to the Banerjee et al. sample (98% versus 42%), though the mean

level of assets is higher in the Banerjee et al. sample due to higher asset ownership at the

75th percentile and above; Figure A8 summarizes kernel densities in both samples, with the

important caveat that the specific questionnaire modules used to measure assets are not the

same.21 Accordingly, it is not the case that assets are non-predictive of treatment effects

because no household in the Baidoa sample owns any assets; but it may be that there is simply

little predictive information captured in the existing distribution.

A third hypothesis is that constraints on economic activities linked to household size

(and the number of dependents) may be more acute in an urban displacement context such

as Baidoa, also characterized by some ongoing security risks. In a more typical, non-fragile,

rural setting, the active care burden of both young and old dependents may be lower due to

21The continuous variables in the Banerjee et al. sample are expressed in 2014 purchasing power parity
terms, while our variables are captured in 2017 purchasing power parity terms. We employ a simple ad-
justment using the U.S. GDP deflator and thus apply an adjustment factor of 1.05 to the Banerjee et al.
estimates, bearing in mind this is only indicative.

reduced security risk for unsupervised dependents, and the existence of a typical informal

network of care and support. That being said, it is somewhat surprising that this would pose

a meaningful constraint given that the most important livelihoods activity for households

benefiting from the UPG intervention in Baidoa is raising livestock, an activity typically

undertaken at home and viewed as compatible with domestic and care responsibilities; one

hypothesis is that marketing livestock for actual income generation is challenging for those

constrained by the care of more dependents.

4.6 Cost-effectiveness

Given the evidence around the substantial positive effects of the UPG intervention in Baidoa,

it is informative to consider its costs and explore cost-effectiveness. Here, we find that the

intervention is in fact relatively expensive: the all-in cost per household is estimated at

approximately $7,770 in 2017 PPP terms or $2,930 in 2017 nominal USD (to enable cross-

country comparability, we express costs in both PPP and nominal terms).22 This suggests

the Baidoa UPG program is one of the highest-cost programs assessed in the literature (Table

A1), though our cost estimate plausibly reflects an upper bound given that it is based on the

total program budget including overhead, monitoring, and management costs often excluded

from cost analyses.23 For comparison, Banerjee et al. (2015) report 2017 PPP costs of

approximately $5,600 in Ghana, $3,960 in Pakistan, and $4,420 in Peru. In Afghanistan,

Bedoya et al. (2019) estimate the intervention cost at around $7,470 per household, a cost

level comparable to Somalia (and perhaps not coincidentally, in a similarly challenging,

conflict-affected setting).

22The original budgeted cost was $3,200 in 2022 nominal USD. We convert this estimate to Somali shillings,
and adjust to the 2017 base year using IMF CPI data and World Bank exchange and PPP conversion factors.

23We were unable to conduct a detailed costing analysis, as this analysis was interrupted when many staff
at the implementing partner were terminated following loss of USAID funding. Accordingly, the estimate
here is based on the total program budget awarded to the World Vision-led consortium (with the IFPRI
subaward removed) divided by the 5,000 households served. It includes all management, oversight, and
monitoring costs—expenses often excluded from published cost estimates. Moreover, it assumes the entire
budget was spent in Somalia, though in practice, some share likely supported international staff or external
oversight.

In terms of effectiveness, the intervention effect size here for consumption (30% relative

to the control mean) is among the largest effects observed in this literature as summarized in

Table A1, larger than the effects observed in Banerjee et al. (2015) and comparable to Bedoya

et al. (2019).24 The effect on assets and financial inclusion — nearly 1.5 standard deviations

— is much larger than the effect on comparable variables observed in Banerjee et al. (around

0.2 – 0.4 standard deviations). However, the effects in this trial are observed over a relatively

short horizon (only two years post-program launch, rather than three or four), again as

summarized in the same table. While this analysis should be interpreted cautiously given

the absence of detailed cost data and thus our inability to verify that costs are measured

comparably across these different trials, it suggests that UPG may be in a broadly similar

(or slightly lower) range of cost-effectiveness vis-a-vis other parallel interventions, but only

if the large effects observed in the short-term persist.25

5 Conclusions

Recognizing the inadequacy of regular consumption support in addressing the multiple chal-

lenges faced by displaced households, the 2016 New York Declaration for Refugees and

Migrants called for a shift toward sustainable, development-oriented responses to forced dis-

placement that promote economic self-reliance among refugees and IDPs (United Nations,

2016). Our study is among the first to experimentally assess an intervention closely aligned

with this declaration, and our findings suggest that an integrated ultra-poor graduation

model does lead to significant increases in consumption, assets, and financial inclusion for a

vulnerable IDP sample in urban Somalia.

These effects are primarily driven by enhanced income from livestock—particularly goats—with

little evidence of household livelihoods diversifying into non-agricultural activities. While

24Relative to standard deviations in the control arm, however, the effect of .1 standard deviations observed
here is roughly similar to the pooled effect observed in the original Science trial (Banerjee et al., 2015).

25There are, of course, other examples of parallel interventions implemented at dramatically lower cost
with high levels of effectiveness: for example, recent work in Niger by Bossuroy et al. (2022) assessing an
intervention costing less than $600.

such gains offer a tangible path to improved welfare, they raise important questions about

the sustainability of these effects. On the one hand, reliance on livestock may reflect a natu-

ral livelihood strategy in Baidoa, a major livestock trade hub in Somalia. On the other hand,

it is unlikely that one can goat one’s way out of poverty in the long run, to paraphrase Lant

Pritchett (The New York Times, 2007), as sustained poverty reduction typically requires

movement out of agriculture.

In addition, we demonstrate using a generalized random forest analysis that the positive

effects of the treatment are notably larger for smaller households characterized by a smaller

number of dependents; household size is the primary baseline variable predicting variation

in the conditional average treatment effect. This is in contrast to the findings from an

exploratory re-analysis of the data from Banerjee et al. (2015), a graduation program im-

plemented in multiple stable settings, where baseline values of economic outcomes are more

predictive of varying treatment effects. These findings suggest that livelihoods interven-

tions implemented in displacement contexts should more carefully consider how household

composition and dependent responsibilities may shape households’ economic choices, and

ultimately influence households’ capacity to benefit from such programs.

This empirical pattern has implications for both targeting — some households might

show significantly more positive treatment effects — and intervention design, in that it

might be desirable to tailor any available intervention to households that have more care

responsibilities. Given that large households do not show very positive effects from an

integrated livelihoods intervention in this context, these households might alternatively be

preferentially targeted for ongoing cash transfers to sustain consumption and human capital

investment. At least in a fragile or displacement-affected context, a livelihoods intervention

that has the explicit objective of generating a longer-term income stream might consider

targeting households that are smaller and have reduced care responsibilities.

Table 1: Baseline balance

                                        (1)          (2)          (1)-(2)
                                        Control      Treatment    Pairwise t-test
Variable                                Mean/(SE)    Mean/(SE)    Mean difference

IDP household                           0.833        0.838        -0.005
                                        (0.011)      (0.007)
Household size                          6.901        6.573        0.328***
                                        (0.068)      (0.043)
Dependency ratio                        2.398        2.373        0.025
                                        (0.096)      (0.060)
Any pregnant / lactating woman          0.589        0.601        -0.012
                                        (0.014)      (0.009)
Any disabled member                     0.204        0.192        0.013
                                        (0.011)      (0.007)
Any adult male                          0.871        0.859        0.011
                                        (0.010)      (0.006)
Any cash savings                        0.007        0.009        -0.002
                                        (0.002)      (0.002)
Baseline asset value                    358.784      337.792      20.992
                                        (19.428)     (11.100)
Tropical livestock units                0.198        0.182        0.016
                                        (0.015)      (0.009)
Household Hunger Scale                  3.477        3.469        0.008
                                        (0.033)      (0.022)
Livelihoods Coping Strategies index     1.657        1.587        0.069
                                        (0.061)      (0.038)

Number of observations                  1244         2872         4116
F-test of joint significance                                      0.053
F-test using randomization inference                              0.079
Notes: All pair-wise regressions and F-tests are based on specifications including strata fixed effects and robust standard errors.
Randomization inference p-values are based on 1,000 repetitions. Asterisks indicate significance at the ten, five, and one percent
level.

Table 2: Primary experimental effects

                (1)               (2)               (3)               (4)               (5)

Panel A: Consumption and food security

                Total             Food              Non-food          Moderate          Livelihood
                consumption       consumption       consumption       or severe HHS     coping score

UPG             0.81***           0.59***           0.22***           -0.30***          -0.57***
                (0.06)            (0.04)            (0.02)            (0.02)            (0.09)

q-value         0.001***          0.001***          0.001***          0.001***          0.001***
Control mean    2.63              2.03              0.61              0.42              1.71
Observations    3964              3964              3964              3982              3982

Panel B: Assets and financial inclusion

                Asset value       TLUs              Any savings       Any credit

UPG             603.14***         0.35***           0.42***           0.08***
                (21.25)           (0.02)            (0.01)            (0.02)

q-value         0.001***          0.001***          0.001***          0.001***
Control mean    263.32            0.14              0.04              0.57
Observations    3982              3982              3966              3974

Panel C: Income

                Any ag. +         Ag. + livestock   Any non-farm      Non-farm          Any               Wage
                livestock         income            business          income            wage              income

UPG             0.16***           28.11***          0.05***           3.81              -0.00             21.62
                (0.01)            (8.47)            (0.01)            (25.63)           (0.02)            (36.51)

q-value         0.001***          0.001***          0.001***          0.317             0.317             0.206
Control mean    0.05              6.08              0.15              125.84            0.42              586.59
Observations    3982              3982              3982              3982              3982              3982

Panel D: Social cohesion and locus of control

                Social cohesion   Locus of control

UPG             -0.04             0.10***
                (0.03)            (0.03)

q-value         0.085*            0.001***
Control mean    0.03              -0.07
Observations    3982              3982

Notes: The primary outcomes are per capita total consumption (col 1); moderate or severe household hunger scale (HHS) (col
4); and value of assets (col 1, Panel B). All regressions control for strata fixed effects and for the baseline outcome variable, if
available (baseline values are available for the livelihood coping score, tropical livestock units, and any savings; there is no
baseline variation in the household hunger score). Robust standard errors are reported in parentheses and the endline control
mean is reported. Asterisks indicate significance at the ten, five, and one percent level. The reported q-values are p-values
adjusted for multiple inference based on the false discovery rate correction procedure outlined in Anderson (2008).
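
To make the idea of a false discovery rate adjustment concrete, the sketch below computes basic Benjamini-Hochberg step-up q-values from a list of p-values. This is an illustration only: Anderson (2008) also describes sharpened two-stage q-values, and the sketch does not claim to reproduce the exact variant used for the table. The function name is hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

Each q-value is the smallest false discovery rate at which the corresponding hypothesis would be rejected within its family of outcomes.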



Figure 1: Primary outcomes: Longitudinal effects

[Figure 1 has four panels. Each plots control and UPG group means with 95% confidence intervals at baseline, 1st follow-up, and 2nd follow-up; the numbers listed below are the treatment effects printed in each panel, in survey-round order.]

A: Share of households with moderate or severe hunger (y-axis: share of households). Treatment effects: 0.00, -0.12***, -0.30***

B: Daily per capita total consumption (y-axis: $ PPP). Treatment effects: 0.19***, 0.81***

C: Total value of assets (y-axis: $ PPP). Treatment effects: -12.93, 207.89***, 531.68***

D: Tropical livestock units (y-axis: number of tropical livestock units). Treatment effects: -0.01, 0.19***, 0.35***

Notes: These graphs report treatment effects at both 1st follow-up (one year following program enrollment) and 2nd follow-up
(two years). Solid dots show the control and treatment (UPG) group means, and capped bars represent the corresponding
95% confidence intervals. The reported numbers are treatment effects, estimated using an ANCOVA at first and second
follow-up rounds that controls for strata dummies and baseline values of the variable, if available. Statistical significance is
indicated as * 0.1, ** 0.05, and *** 0.01. Daily per capita consumption was not measured at baseline.
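
One way to write the ANCOVA specification described in these notes is sketched below. The notation is illustrative and is an assumption, not taken from the paper: $Y_{i,t}$ is the outcome for household $i$ at follow-up round $t$, $\mathrm{UPG}_i$ is the treatment indicator, $Y_{i,0}$ is the baseline value of the outcome (omitted when not available), and $\delta_{s(i)}$ are strata dummies.

```latex
Y_{i,t} = \beta_t \,\mathrm{UPG}_i + \gamma_t \, Y_{i,0} + \delta_{s(i)} + \varepsilon_{i,t},
\qquad t \in \{1, 2\}
```

Under this reading, the numbers printed in each panel correspond to $\hat{\beta}_1$ and $\hat{\beta}_2$, estimated separately for each follow-up round.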



Table 3: Average standard treatment effects

(1) (2) (3) (4)
Consumption and food security   Assets and financial inclusion   Income   Social cohesion and locus of control

UPG 0.287*** 1.436*** 0.047 0.041
(0.035) (0.043) (0.037) (0.035)

Constant 0.000 0.000 0.001 0.000
(0.030) (0.030) (0.029) (0.029)

Observations 3964 3961 3982 3982

Notes: This table reports the average standard treatment effect for each family of outcomes following Kling et al. (2007),
normalized with respect to standard deviations in the control arm. (The variable corresponding to any moderate or severe
food insecurity is reverse-coded in this analysis.) Robust standard errors in parentheses. Asterisks indicate significance at the
ten, five, and one percent level.
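
The normalization described in these notes can be illustrated with a minimal sketch: each outcome is converted to a z-score using the control-arm mean and standard deviation, the z-scores are averaged within a family, and the treatment effect is the difference in the mean index between arms. This is a simplification under stated assumptions (no strata fixed effects, no standard errors, equal weights), not the estimation procedure of Kling et al. (2007) or of the table; the function names are hypothetical.

```python
from statistics import mean, stdev


def standardized_index(outcomes, treat):
    """Average of outcome z-scores, standardized against the control arm.

    outcomes: dict mapping outcome name -> list of values (one per household),
              already signed so that higher is better.
    treat:    list of 0/1 treatment indicators, same length as each outcome.
    """
    n = len(treat)
    index = [0.0] * n
    for values in outcomes.values():
        control = [v for v, d in zip(values, treat) if d == 0]
        mu, sd = mean(control), stdev(control)
        for i, v in enumerate(values):
            index[i] += (v - mu) / sd / len(outcomes)
    return index


def index_effect(index, treat):
    """Difference in mean index between treated and control households."""
    treated = [z for z, d in zip(index, treat) if d == 1]
    control = [z for z, d in zip(index, treat) if d == 0]
    return mean(treated) - mean(control)
```

Because every outcome is centered on the control arm, the control-arm mean of the index is zero by construction, which is consistent with the near-zero constants in the table.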



Table 4: Heterogeneous effects

(1) (2) (3) (4)
Per capita consumption

UPG 0.625*** 0.680*** 0.703*** 0.493***
(0.054) (0.062) (0.092) (0.090)

UPG X Household under 5 members 0.369*** 0.296**
(0.130) (0.133)

Household under 5 members 1.229*** 1.178***
(0.110) (0.113)

UPG X Dependency ratio under 2.5 0.367*** 0.240**
(0.127) (0.120)

Dependency ratio under 2.5 0.491*** 0.168*
(0.104) (0.099)

UPG X Low baseline assets 0.171 0.126
(0.116) (0.106)

Low baseline assets -0.181* -0.143
(0.101) (0.093)

Observations 3964 3964 3964 3964

Notes: This table reports heterogeneous effects for the primary outcome of per capita daily consumption with respect to
baseline covariates identified using the generalized random forest method. We define three binary variables using cutoffs
derived from the 25th percentile of the baseline distribution: household size under five members, a dependency ratio (defined
as the ratio of individuals under 15 or over 55 to those aged 16–54) under one, and baseline asset value below the 25th
percentile or $74. Asterisks indicate significance at the ten, five, and one percent level.



References

Aker, Jenny C., “Comparing Cash and Voucher Transfers in a Humanitarian Context:
Evidence from the Democratic Republic of Congo,” The World Bank Economic Review,
2017, 31 (1), 44–70.

Altındağ, O. and S. D. O’Connell, “The short-lived effects of unconditional cash transfers
to refugees,” Journal of Development Economics, 2023, 160, 102942.

Anderson, Michael L, “Multiple inference and gender differences in the effects of early
intervention: A reevaluation of the Abecedarian, Perry Preschool, and Early Training
Projects,” Journal of the American Statistical Association, 2008, 103 (484), 1481–1495.

Asiedu, Edward, Dean Karlan, Monica Lambon-Quayefio, and Christopher Udry,
“A call for structured ethics appendices in social science papers,” Proceedings of the
National Academy of Sciences, 2021, 118 (29), e2024570118.

Athey, Susan, Julie Tibshirani, and Stefan Wager, “Generalized random forests,”
The Annals of Statistics, 2019, 47 (2), 1148 – 1178.

Balboni, Clare, Oriana Bandiera, Robin Burgess, Maitreesh Ghatak, and Anton
Heil, “Why do people stay poor?,” The Quarterly Journal of Economics, 2022, 137 (2),
785–844.

Ballard, Terri, Jennifer Coates, Anne Swindale, and Megan Deitchler, “Household
hunger scale: indicator definition and measurement guide,” Washington, DC: Food and
Nutrition Technical Assistance II Project, FHI 360, 2011.

Bandiera, Oriana, Robin Burgess, Narayan Das, Selim Gulesci, Imran Rasul, and
Munshi Sulaiman, “Labor markets and poverty in village economies,” The Quarterly
Journal of Economics, 2017, 132 (2), 811–870.

Banerjee, Abhijit, Dean Karlan, Robert Osei, Hannah Trachtman, and Christopher Udry,
“Unpacking a multi-faceted program to build sustainable income for the very
poor,” Journal of Development Economics, 2022, 155, 102781.

Banerjee, Abhijit, Esther Duflo, and Garima Sharma, “Long-term effects of the targeting
the ultra poor program,” American Economic Review: Insights, 2021, 3 (4), 471–486.

Banerjee, Abhijit, Esther Duflo, Nathanael Goldberg, Dean Karlan, Robert Osei,
William Parienté, Jeremy Shapiro, Bram Thuysbaert, and Christopher Udry, “A multifaceted
program causes lasting progress for the very poor: Evidence from six countries,” Science,
2015, 348 (6236), 1260799.

Baseler, Travis, Thomas Ginn, Ibrahim Kasirye, Belinda Muya, and Andrew
Zeitlin, “Mentoring Small Businesses: Evidence from Uganda,” mimeo, 2024.

Battisti, Michele, Yvonne Giesing, and Nadzeya Laurentsyeva, “Can job search
assistance improve the labour market integration of refugees? Evidence from a field
experiment,” Labour Economics, 2019, 61, 101745.



Bedoya, G. et al., “No household left behind: Afghanistan targeting the ultra-poor impact
evaluation,” Technical Report w25981, National Bureau of Economic Research 2019.

Belloni, Alexandre, Victor Chernozhukov, and Christian Hansen, “Inference on
Treatment Effects after Selection among High-Dimensional Controls,” The Review of
Economic Studies, 2013, 81 (2), 608–650.

Bossuroy, Thomas, Markus Goldstein, Bassirou Karimou, Dean Karlan,
Harounan Kazianga, William Parienté, Patrick Premand, Catherine C
Thomas, Christopher Udry, Julia Vaillant et al., “Tackling psychosocial and capital
constraints to alleviate poverty,” Nature, 2022, 605 (7909), 291–297.

Brune, L. et al., “Social protection amidst social upheaval: Examining the impact of
a multi-faceted program for ultra-poor households in Yemen,” Journal of Development
Economics, 2022, 155, 102780.

Carletto, Gero, Marco Tiberti, and Alberto Zezza, “Measure for measure: Comparing
survey based estimates of income and consumption for rural households,” The World Bank
Research Observer, 2022, 37 (1), 1–38.

Catholic Relief Services, “Social Cohesion Indicators Bank,”
https://www.crs.org/our-work-overseas/research-publications/social-cohesion-indicators-bank
2019. Accessed: 2024-03-07.

Cilliers, Jacobus, Nour Elashmawy, and David McKenzie, “Using post-double
selection Lasso in field experiments,” Technical Report, World Bank 2024.

Cleland, John and Kazuyo Machiyama, “The challenges posed by demographic change
in sub-Saharan Africa: A concise overview,” Population and Development Review, 2017,
43, 264–286.

Davis, Jonathan MV and Sara B Heller, “Using causal forests to predict treatment
heterogeneity: An application to summer jobs,” American Economic Review, 2017, 107
(5), 546–550.

Deaton, Angus and Salman Zaidi, Guidelines for constructing consumption aggregates
for welfare analysis, Vol. 135, World Bank Publications, 2002.

Fasani, Francesco, Tommaso Frattini, and Luigi Minale, “Lift the ban? Initial
employment restrictions and refugee labour market outcomes,” Journal of the European
Economic Association, 2021, 19 (5), 2803–2854.

Gibson, John and Scott Rozelle, “Prices and unit values in poverty measurement and
tax reform analysis,” The World Bank Economic Review, 2005, 19 (1), 69–97.

Gupta, Prankur, Daniel Stein, Kyla Longman, Heather Lanthorn, Rico
Bergmann, Emmanuel Nshakira-Rukundo, Noel Rutto, Christine Kahura,
Winfred Kananu, Gabrielle Posner et al., “Cash transfers amid shocks: A large,
one-time, unconditional cash transfer to refugees in Uganda has multidimensional benefits
after 19 months,” World Development, 2024, 173, 106339.



Hidrobo, Melissa, John Hoddinott, Amber Peterman, Amy Margolies, and
Vanessa Moreira, “Cash, food, or vouchers? Evidence from a randomized experiment
in northern Ecuador,” Journal of Development Economics, 2014, 107, 144–156.

Humble, Steve, Aditya Sharma, Baladevan Rangaraju, Pauline Dixon, and Mark
Pennington, “Associations between neighbourhood social cohesion and subjective
well-being in two different informal settlement types in Delhi, India: a quantitative
cross-sectional study,” BMJ open, 2023, 13 (4), e067680.

Hussam, R. et al., “The psychosocial value of employment: Evidence from a refugee
camp,” American Economic Review, 2022, 112 (11), 3694–3724.

IDMC, “2024 Global Report on Internal Displacement,” 2024.

Kaplan, Lennart, Utz Pape, and James Walsh, “Eliciting Accurate Consumption
Responses from Vulnerable Populations,” Data Collection in Fragile States: Innovations
from Africa and Beyond, 2020, pp. 193–206.

Kerwin, Jason, Nada Rostom, and Olivier Sterck, “Striking the Right Balance: Why
Standard Balance Tests Over-Reject the Null, and How to Fix It,” Technical Report, IZA
Discussion Papers 2024.

Kling, Jeffrey R, Jeffrey B Liebman, and Lawrence F Katz, “Experimental analysis
of neighborhood effects,” Econometrica, 2007, 75 (1), 83–119.

Lee, David S, “Training, wages, and sample selection: Estimating sharp bounds on
treatment effects,” 2005.

Leight, Jessica, Harold Alderman, Daniel Gilligan, Melissa Hidrobo, and Michael
Mulford, Can a light-touch graduation model enhance livelihood outcomes? Evidence from
Ethiopia, Intl Food Policy Res Inst, 2023.

MacPherson, C. and O. Sterck, “Empowering refugees through cash and agriculture: A
regression discontinuity design,” Journal of Development Economics, 2021, 149, 102614.

Malacarne, Jonathan G, “The farmer and the fates: Locus of control and investment in
rainfed agriculture,” Applied Economic Perspectives and Policy, 2024, 46 (2), 534–552.

Mancini, Giulia and Giovanni Vecchi, “On the construction of a consumption aggregate
for inequality and poverty analysis,” World Bank Group, Washington, DC, 2022.

Özler, Berk, Çiğdem Çelik, Scott Cunningham, P Facundo Cuevas, and Luca
Parisotto, “Children on the move: Progressive redistribution of humanitarian cash
transfers among refugees,” Journal of Development Economics, 2021, 153, 102733.

Pape, Utz, “Somali Poverty Profile: Findings from Wave 1 of the Somali High Frequency
Survey,” Technical Report, World Bank Group, Washington, D.C. 2017.



Pape, Utz and Johan Mistiaen, “Household expenditure and poverty measures in 60 minutes: a
new approach with results from Mogadishu,” World Bank Policy Research Working Paper,
2018, (8430).

Pape, Utz and Johan Mistiaen, “Rapid Consumption Surveys,” in Johannes Hoogeveen and Utz
Pape, eds., Data Collection in Fragile States: Innovations from Africa and Beyond,
Palgrave Macmillan, 2020, chapter 9, pp. 153–171.

Pape, Utz and Paolo Verme, “Measuring Poverty in Forced Displacement Contexts,” GLO
Discussion Paper, 2023.

Pape, Utz and Philip Randolph Wollburg, “Estimation of poverty in Somalia using innovative
methodologies,” World Bank Policy Research Working Paper, 2019, (8735).

Rotter, Julian B, “Generalized expectancies for internal versus external control of
reinforcement,” Psychological monographs: General and applied, 1966, 80 (1), 1.

Rozo, Sandra V and Guy Grossman, “Refugees and Other Forcibly Displaced Popula-
tions,” VoxDevLit, 2025, 14 (1).

Rustad, Siri Aas, Conflict Trends: A Global Overview, 1946–2023, Peace Research
Institute Oslo (PRIO), 2024.

Sarvimäki, Matti and Kari Hämäläinen, “Integrating immigrants: The impact of
restructuring active labor market programs,” Journal of Labor Economics, 2016, 34 (2),
479–508.

Schuettler, Kirsten and Laura Caron, Jobs interventions for refugees and internally
displaced persons, World Bank Group, 2020.

Sylvia, Sean, Nele Warrinnier, Renfu Luo, Ai Yue, Orazio Attanasio, Alexis
Medina, and Scott Rozelle, “From quantity to quality: Delivering a home-based
parenting intervention through China’s family planning cadres,” The Economic Journal, 2021,
131 (635), 1365–1400.

The New York Times, “Should We Globalize Labor Too?,” The New York Times Magazine,
June 10, 2007.

UN-Habitat, “Baidoa Urban Profile: Working Paper and Spatial Analyses for Urban
Planning and Durable Solutions,”
https://reliefweb.int/report/somalia/baidoa-urban-profile-working-paper-and-spatial-analyses-urban-planning-and-durable
2021. Accessed: 2024-03-07.

UNHCR, “Location and populations of IDP sites in Baidoa as at 28 April 2017,”
https://data.unhcr.org/en/documents/details/56361 2017. Accessed: 2024-03-07.

UNHCR, “Location and populations of IDP sites in Baidoa as at July 2022,”
https://data.unhcr.org/en/documents/details/94414 2022. Accessed: 2024-03-07.



UNHCR, “Global Trends: Forced Displacement,” https://www.unhcr.org/global-trends 2023.
Accessed: 2024-03-07.

UNHCR, “Mid-Year Trends,” https://www.unhcr.org/mid-year-trends 2023.
Accessed: 2024-03-07.

UNHCR, “Internally Displaced People,” 2024. Accessed: 2024-03-07.

UNHCR, “Refugee Statistics,” https://www.unhcr.org/refugee-statistics 2024. Accessed:
2024-03-07.

United Nations, “New York Declaration for Refugees and Migrants (A/RES/71/1),” 2016.
Resolution adopted by the General Assembly on 19 September 2016.

Wager, Stefan and Susan Athey, “Estimation and Inference of Heterogeneous Treatment
Effects using Random Forests,” Journal of the American Statistical Association, 2018, 113
(523), 1228–1242.

White, Halbert, “A heteroskedasticity-consistent covariance matrix estimator and a
direct test for heteroskedasticity,” Econometrica: Journal of the Econometric Society, 1980,
pp. 817–838.

WMO, “Extreme Weather,” 2024. Accessed: 2024-03-07.



Appendix

A1 Graduation literature

Table A1 provides an overview of key studies evaluating graduation-style interventions, summarizing characteristics of the trial design, intervention components, and estimated impacts on consumption. The final two rows of the table report the estimated per-household or per-participant intervention costs in both 2017 Purchasing Power Parity (PPP) dollars and 2017 nominal U.S. dollars, allowing for cross-country and cross-study comparison. These cost estimates have been harmonized using a standardized approach described in the following paragraphs.

The original intervention costs are reported in U.S. dollar terms in Banerjee et al. (2015) (Ethiopia, Ghana, Honduras, India, Pakistan, and Peru), Bedoya et al. (2019) (Afghanistan), and in this study (Somalia). For these studies, the procedure follows three steps. First, we recover the local currency value by applying the USD exchange rate for the base year of the original cost data. Second, we adjust the local currency values to 2017 levels using changes in domestic consumer price indices (CPI). Third, we convert the 2017 local currency values into 2017 PPP dollars using PPP conversion factors for individual consumption. All CPI, exchange rate, and PPP data are sourced from the World Bank, except in the case of Somalia, where CPI data were drawn from the IMF due to gaps in the World Bank series.
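
The three steps for costs reported in U.S. dollars can be sketched as a short function. The function and parameter names are hypothetical, and the example inputs are made up for illustration; they are not the exchange rates, CPI values, or PPP factors used in the paper.

```python
def usd_to_ppp_2017(cost_usd, fx_base, cpi_base, cpi_2017, ppp_2017):
    """Convert a cost reported in base-year U.S. dollars to 2017 PPP dollars.

    fx_base:  local currency units (LCU) per USD in the base year
    cpi_base: domestic CPI in the base year
    cpi_2017: domestic CPI in 2017 (same index base as cpi_base)
    ppp_2017: PPP conversion factor for 2017, in LCU per international dollar
    """
    cost_lcu_base = cost_usd * fx_base                    # step 1: USD -> LCU
    cost_lcu_2017 = cost_lcu_base * cpi_2017 / cpi_base   # step 2: inflate to 2017
    return cost_lcu_2017 / ppp_2017                       # step 3: LCU -> PPP dollars


# Hypothetical inputs, not taken from the paper or from World Bank data.
example = usd_to_ppp_2017(cost_usd=1000, fx_base=20.0, cpi_base=100.0,
                          cpi_2017=150.0, ppp_2017=10.0)
print(example)  # 3000.0
```

Keeping each step as its own line makes it easy to swap in a different CPI source for a single country, as the text describes for Somalia.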

Bandiera et al. (2017) (Bangladesh), Bossuroy et al. (2022) (Niger), and Brune et al. (2022) (Yemen) report costs in PPP dollar terms. To convert these costs to 2017 PPP and nominal U.S. dollars, we follow a two-step process: first, we recover the local currency value by multiplying the original PPP figure by the relevant PPP exchange rate in the base year. Then, we follow the inflation and conversion steps described above, starting from step two.

Yemen is a special case. The intervention cost was reported in 2010 PPP dollars, but no reliable CPI data exist for Yemen beyond 2014. To address this, we estimate an inflation adjustment factor based on the change in the nominal exchange rate between 2010 and 2017, scaled by U.S. inflation over the same period. This approach assumes that relative movements in the exchange rate reflect changes in local price levels in the absence of reliable domestic inflation data.
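
One possible reading of this adjustment is written out below. The symbols are introduced here for illustration only, and the exact way the two terms are combined is an assumption: $e_t$ denotes the nominal exchange rate in local currency per U.S. dollar in year $t$, and $P^{US}_t$ denotes the U.S. price level.

```latex
\frac{P^{YEM}_{2017}}{P^{YEM}_{2010}} \;\approx\;
\frac{e_{2017}}{e_{2010}} \times \frac{P^{US}_{2017}}{P^{US}_{2010}}
```

This is the relative purchasing power parity relationship: if the real exchange rate is roughly constant, depreciation of the local currency combined with U.S. inflation approximates the change in local prices.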



Table A1: Overview of graduation literature

Study                      (a)     (b)     (b)     (b)     (b)     (b)     (b)     (c)     (d)     (e)     (f)     (g)

Trial characteristics
Country                    BGD     ETH     GHA     HND     IND     PAK     PER     AFG     NER     YEM     ETH     SOM
Year launched              2007    2010    2011    2009    2007    2008    2011    2016    2016    2010    2019    2022
Years elapsed              4       3       3       3       3       3       3       2       1.5     4       3       2
Context                    Rural   Rural   Rural   Rural   Rural   Rural   Rural   Rural   Rural   Both    Rural   Urban
Forced displacement        No      No      No      No      No      No      No      No      No      No      No      IDP
Ongoing conflict           No      No      No      No      No      No      No      Yes     No      Yes     No      Yes
Consumption ITT            10.9%   18.2%   10.6%   -5.8%   10.7%   7.0%    5.2%    29.7%   14.7%   -2.4%   0.7%    30.8%
p-value                    0.003   0.000   0.007   0.152   0.001   0.079   0.085   0.000   0.000   0.644   >.2     0.000

Intervention characteristics
Asset (A) or grant (G)     A       A       A       A       A       A       A       A       G       A       A/G     A
Consumption support        Cash    Food*   Cash    Food    Cash    Cash    Cash*   Cash    Cash*   Cash*   Cash*   Cash
Mentoring                  Yes     Yes     Yes     Yes     Yes     Yes     Yes     Yes     Yes     Yes     No      Yes
Skills training            No      No      No      No      No      No      No      Yes     Yes     Yes     Yes     No
Savings groups             Yes     Yes     Yes     Yes     Yes     Yes     Yes     Yes     Yes     Yes     Yes     Yes
Access to finance          No      Yes     Yes     Yes     Yes     No      Yes     Yes     No      No      No      No
Psychosocial support       No      No      No      No      No      No      No      No      Yes     No      Yes†    No
Social/behavioral support  Yes     Yes     Yes     Yes     Yes     Yes     Yes     Yes     Yes     Yes     Yes     Yes
Cost ($PPP 2017)           1,489   3,251   5,600   3,046   1,234   3,959   4,422   7,474   595     1,101   NR      7,768
Cost ($USD 2017)           541     1,118   2,201   1,385   382     1,231   2,591   1,768   250     332     NR      2,927

Notes: a Bandiera et al. (2017), b Banerjee et al. (2015), c Bedoya et al. (2019), d Bossuroy et al. (2022), e Brune et al. (2022),
f Leight et al. (2023), g This study. BGD = Bangladesh, ETH = Ethiopia, GHA = Ghana, HND = Honduras, IND = India, PAK = Pakistan,
PER = Peru, AFG = Afghanistan, NER = Niger, YEM = Yemen, SOM = Somalia. Years elapsed refer to the years elapsed between program
launch and the collection of follow-up data. Costs are converted to purchasing power parity terms in the base year employed and
then converted to PPP terms in 2017. For Leight et al. in Ethiopia, the reported coefficient is the mean effect on consumption
across three arms examined; none were statistically significant, hence the notation of the p-value.
* Consumption support was provided to all individuals in the sample (i.e., there was no experimental variation).
† Psychosocial support was provided only to those identified as eligible based on symptoms of depression.


A2 Appendix Exhibits

Figure A1: Map of Somalia and the IDP survey sites in Baidoa

(a) Somalia (b) Baidoa

Notes: Figure (a) shows the location of Baidoa in Somalia and with respect to the capital, Mogadishu. Shapefile from
UNOCHA Somalia. Figure (b) shows the locations of IDP households at the second follow-up survey. Basemap from
OpenStreetMap.



Figure A2: Study timeline

Figure A3: Asset and TVET Choices

(a) Asset transfers (b) TVET training

Notes: The figures capture the reported choices of productive assets and Technical and Vocational Education and Training
(TVET) in the treatment arm.



Figure A4: Out-of-bag CATE estimates on consumption from GRF algorithm

(a) Kernel density function (b) Cumulative distribution function

Notes: These figures report the out-of-bag conditional average treatment effects (CATE) from the generalized random forest
(GRF) method following Athey et al. (2019). Figure A4a reports the kernel density function, and Figure A4b reports the
cumulative distribution function.



Figure A5: Scatter plots of out-of-bag CATE estimates and observable baseline characteristics

Notes: This graph reports the correlation between the conditional average treatment effect (CATE) and the three baseline
covariates identified as most predictive of variation in the conditional average treatment effect, as reported in Table A8.



Figure A6: Out-of-bag CATE estimates on consumption: Banerjee et al. (2015)

(a) Model including full set of baseline covariates
(b) Model including restricted set of baseline covariates

Notes: These figures report the out-of-bag conditional average treatment effects (CATE) from the generalized random forest
method following Athey et al. (2019), estimated using the replication data from the Banerjee et al. (2015) trial for the
outcome variable of consumption in the long-term follow-up.



Figure A7: Scatter plots of out-of-bag CATE estimates and observable baseline characteristics: Banerjee et al. (2015)

(a) Model including full set of baseline covariates
(b) Model including restricted set of baseline covariates

Notes: This graph reports the correlation between the conditional average treatment effect (CATE) and the three baseline
covariates identified as most predictive of variation in the conditional average treatment effect, as reported in Table A9.



Figure A8: Baseline assets across samples

Notes: This graph presents the kernel density of baseline asset value in both the Baidoa sample analyzed in this paper and the
sample in Banerjee et al. (2015); the density figure is truncated at the 95th percentile of asset value in the combined sample.
All value estimates are in 2017 purchasing power parity-adjusted dollars.



Table A2: Program exposure

(1) (2) (3) (4)

Cash transfers   Asset transfer or TVET training   Savings groups   Coaching

UPG 0.738*** 0.844*** 0.569*** 0.310***
(0.012) (0.008) (0.011) (0.015)

Constant 0.143*** 0.024*** 0.056*** 0.182***
(0.010) (0.005) (0.007) (0.011)

Observations 3982 3982 3982 3982

Notes: This table reports treatment effects on reported intervention receipt for the four main intervention elements: cash
transfers, asset transfers or Technical and Vocational Education and Training (TVET), savings groups, and coaching. All
regressions include strata fixed effects; asterisks indicate significance at the ten, five, and one percent level.



Table A3: Primary experimental effects: Additional baseline controls

(1) (2) (3) (4) (5)

Panel A: Consumption and food security

Total consumption   Food consumption   Non-food consumption   Moderate or severe HHS   Livelihood coping score

UPG 0.71*** 0.51*** 0.19*** -0.30*** -0.57***

( 0.05) ( 0.04) ( 0.02) ( 0.02) ( 0.09)

q-value 0.001*** 0.001*** 0.001*** 0.001*** 0.001***

Control mean 2.63 2.03 0.61 0.42 1.71

Observations 3964 3964 3964 3982 3982

Panel B: Assets and financial inclusion

Asset value TLUs Any savings Any credit

UPG 603.97*** 0.35*** 0.42*** -0.00

(21.34) ( 0.02) ( 0.01) ( 0.01)

q-value 0.001*** 0.001*** 0.001*** 0.372

Control mean 263.32 0.14 0.04 0.18

Observations 3982 3982 3966 3982

Panel C: Income

Any livestock   Livestock income   Any non-farm business   Non-farm income   Any wage   Wage income

UPG 0.15*** 34.11*** 0.05*** 7.99 -0.00 22.37

( 0.01) ( 7.84) ( 0.01) (25.59) ( 0.02) (36.57)

q-value 0.001*** 0.001*** 0.001*** 0.372 0.396 0.291

Control mean 0.02 -1.05 0.15 125.84 0.42 586.59

Observations 3982 3982 3982 3982 3982 3982

Panel D: Social cohesion and locus of control

Social cohesion Locus of control

UPG -0.04 0.10***

( 0.03) ( 0.03)

q-value 0.111 0.001***

Control mean 0.03 -0.07

Observations 3982 3982

Notes: The primary outcomes are per capita total consumption (col 1); moderate or severe household hunger scale (HHS) (col
4); and value of assets (col 1, Panel B). All regressions control for strata fixed effects and for the baseline outcome variable, if
available (baseline values are available for the livelihood coping score, tropical livestock units, and any savings; there is no
baseline variation in the household hunger score); a control variable for household size, selected by a double lasso, is also
included. Robust standard errors are reported in parentheses and the endline control mean is reported. Statistical significance
is indicated as follows: * p < 0.10, ** p < 0.05, *** p < 0.01. The reported q-values are p-values adjusted for multiple
inference based on the false discovery rate correction procedure outlined in Anderson (2008).



Table A4: Minimum detectable effects

Share of households moderate or severe HHS   Per capita consumption in USD-PPP   Asset value in USD-PPP

N, Control 1,150 1,146 1,150
N, Treatment 2,832 2,818 2,832
Mean, Control 0.42 2.63 263
SD, Control 0.49 1.54 482
Adjusted SD, Control 0.49 1.53 481
MDES 0.05 0.15 47.15
MDES relative to mean (%) 11.50 5.70 17.90
MDES relative to SD (%) 9.80 9.80 9.80

Notes: HHS = Household hunger score, USD = United States dollar, PPP = Purchasing power parity, N = Number of
observations, SD = Standard deviation, MDES = Minimum detectable effect size.
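
The MDES figures can be approximated with the textbook formula for a two-arm comparison of means. The sketch below assumes a two-sided 5 percent test and 80 percent power; neither value is stated in the table, so both are assumptions, and the function name is hypothetical. With the adjusted standard deviation and arm sizes from the consumption column, it returns a value close to the reported 0.15.

```python
from math import sqrt
from statistics import NormalDist


def mdes(sd, n_control, n_treatment, alpha=0.05, power=0.80):
    """Minimum detectable effect for a two-arm comparison of means."""
    # Sum of the critical value for the test and the quantile for the power.
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * sd * sqrt(1 / n_control + 1 / n_treatment)


# Inputs from the per capita consumption column of Table A4.
consumption_mdes = mdes(sd=1.53, n_control=1146, n_treatment=2818)
print(round(consumption_mdes, 2))               # 0.15
print(round(100 * consumption_mdes / 1.53, 1))  # 9.8
```

Because the arm sizes are nearly identical across the three outcomes, the MDES expressed as a share of the standard deviation is the same in every column.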



Table A5: Additional outcomes

(1) (2) (3) (4) (5)

Variable   Control mean: binary owned   Control mean   Treatment effect   Std. error   N

Panel A: Assets

Mobile phones 0.908 1.013 0.068*** 0.016 3954
Axe 0.742 0.802 -0.014 0.019 3872
Hammer 0.233 0.260 0.021 0.018 3752
Grain bag 0.182 0.596 0.329*** 0.064 3745
Panga 0.167 0.188 0.026 0.016 3761
Pick axe 0.159 0.181 0.030* 0.016 3776
Spade or shovel 0.138 0.143 0.098*** 0.014 3718
Plough (oxen-pulled) 0.132 0.183 0.074*** 0.020 3679
Hoe 0.130 0.160 0.048*** 0.017 3711
Hand mattock 0.125 0.157 0.075*** 0.019 3793
Goats 0.091 0.264 2.879*** 0.076 3912
Sickle 0.089 0.116 0.026* 0.016 3692
Donkey 0.076 0.076 0.020** 0.010 3763
Poultry 0.070 0.185 0.102*** 0.034 3899
Hand saw 0.068 0.100 0.011 0.015 3729
Tarpaulin 0.065 0.083 0.064*** 0.014 3702
Wheelbarrow 0.062 0.066 0.063*** 0.011 3751
Donkey cart 0.060 0.064 0.034*** 0.010 3758
Solar panels 0.060 0.060 0.042*** 0.009 3896
Sheep 0.057 0.093 0.073*** 0.018 3893

Panel B: Household composition

Household size . 7.152 0.068** 0.032 3982
Ages 0–4 . 0.925 0.049* 0.026 3982
Ages 5–14 . 2.768 0.052 0.034 3982
Ages 15–19 . 0.964 -0.047* 0.024 3982
Ages 20–54 . 2.057 0.008 0.021 3982
Ages 55+ . 0.438 -0.011 0.008 3982

Notes: This table presents supplementary regression findings for ownership by asset category, in Panel A, and household
composition, in Panel B. Column (1) reports the mean of a binary variable for any asset in that category owned, in the
control arm. Asterisks indicate significance at the ten, five, and one percent level.



Table A6: Attrition

(1) (2) (3)

Treated household -0.062*** -0.062*** -0.061

(0.008) (0.008) (0.047)

IDP household -0.009 -0.030

(0.008) (0.022)

IDP X Treat 0.030

(0.023)

Household size 0.000 0.004

(0.001) (0.004)

Size X Treat -0.005

(0.004)

Dependency ratio at baseline -0.002*** -0.005***

(0.001) (0.001)

Dependency X Treat 0.005***

(0.001)

Any pregnant or lactating woman 0.004 0.013

(0.006) (0.015)

PLW X Treat -0.011

(0.016)

Any disabled member 0.002 -0.006

(0.007) (0.018)

Disability X Treat 0.011

(0.020)

Any adult male -0.005 -0.021

(0.009) (0.025)

Male X Treat 0.022

(0.026)

Any cash savings -0.000 0.000

(0.000) (0.001)



Any savings X Treat -0.000

(0.001)

Baseline asset value -0.001 -0.000

(0.001) (0.003)

Asset value X Treat -0.001

(0.003)

Tropical livestock units 0.019 0.017

(0.017) (0.037)

TLUs X Treat 0.001

(0.041)

Household Hunger Scale 0.006** 0.010

(0.003) (0.007)

HHS X Treat -0.006

(0.007)

Livelihoods Coping Strategies index -0.002** -0.006**

(0.001) (0.003)

LCS X Treat 0.005

(0.003)

Observations 4116 4116 4116

Notes: Each column regresses a binary variable equal to one for households that attrite at the 2nd follow-up on a binary
variable for treated; baseline characteristics; and the interaction between the two. Baseline asset value is expressed in hundreds
of dollars. All regressions include strata fixed effects; asterisks indicate significance at the ten, five, and one percent level.



Table A7: Treatment effects bounds correcting for attrition

Panel A: Lee and Kling-Liebman bounds

Lee lower   Lee upper   Kling-Liebman lower   Kling-Liebman upper

Total cons. .523 (.053) .97 (.056) .526 (.058) 1.09 (.058)

Food cons. .366 (.042) .719 (.044) .37 (.046) .814 (.046)

Non-food cons. .119 (.018) .263 (.019) .12 (.02) .311 (.02)

Asset value 603.137 (21.252) 651.061 (21.558) 507.03 (21.938) 699.675 (21.938)

LCS score -.948 (.079) -.57 (.085) -1.019 (.089) -.123 (.089)

TLUs .204 (.017) .346 (.022) .247 (.023) .447 (.023)

Panel B: Manski bounds

Lower Upper

Moderate / severe HHS -.377 (.015) -.228 (.015)

Any savings .32 (.011) .456 (.013)

Any credit .008 (.017) .163 (.017)

Ag. / livestock inc. .058 (.01) .19 (.012)

Non-farm bus. inc. -.043 (.012) .098 (.014)

Wage inc. -.085 (.017) .07 (.017)

Notes: This table reports bounds correcting for attrition for the outcomes of interest. For the continuous variables in Panel A,
we report Lee bounds and Kling-Liebman bounds, allowing for attrited individuals to be characterized by outcomes two
standard deviations above (below) the mean observed in the relevant treatment arm. For the binary variables in Panel B, we
report Manski bounds. Standard errors are in parentheses.
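
For binary outcomes, worst-case bounds of the Manski type can be illustrated by filling in every missing outcome with its most and least favorable value. The sketch below uses simple differences in means, with no strata fixed effects and no standard errors, so it shows the logic only and does not reproduce the estimates in Panel B; the function name and inputs are hypothetical.

```python
def manski_bounds(treated_obs, control_obs, treated_missing, control_missing):
    """Worst-case bounds on the treatment effect for a 0/1 outcome.

    treated_obs, control_obs: lists of observed 0/1 outcomes by arm
    treated_missing, control_missing: number of attrited households by arm
    """
    def arm_mean(observed, n_missing, fill):
        # Impute every missing outcome with `fill` (0 or 1).
        return (sum(observed) + fill * n_missing) / (len(observed) + n_missing)

    # Lower bound: attriters are 0 in the treatment arm and 1 in the control arm.
    lower = (arm_mean(treated_obs, treated_missing, 0)
             - arm_mean(control_obs, control_missing, 1))
    # Upper bound: attriters are 1 in the treatment arm and 0 in the control arm.
    upper = (arm_mean(treated_obs, treated_missing, 1)
             - arm_mean(control_obs, control_missing, 0))
    return lower, upper
```

The width of the interval grows with the attrition rate in each arm, which is why these bounds are informative only when attrition is modest.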



Table A8: Baseline characteristics used in GRF analysis and variable importance

Baseline characteristics Variable importance

Household size 0.31

Dependency ratio 0.18

Baseline asset value 0.17

Tropical Livestock Units 0.06

Livelihoods Coping Score 0.05

Strata 3 0.04

Strata 2 0.03

Any pregnant or lactating woman 0.03

Household Hunger Score 0.02

Strata 1 0.02

Engaged in farming 0.02

Any disabled member 0.02

Any adult male 0.01

Receiving transfers 0.01

IDP household 0.01

Self-employed 0.01

Other income 0.01

Any primary education 0.01

Wage labor 0.01

Any savings 0

Asset income 0

Notes: This table reports the variables employed in the generalized random forest (GRF) algorithm and their estimated
importance, defined as the frequency with which each observable characteristic is used as a splitting variable. The weights are
re-scaled such that the total weights for baseline characteristics excluding binary variables for randomization strata add up to
one.



Table A9: Baseline characteristics used in GRF analysis and variable importance: Banerjee et al. (2015)

Characteristic Importance (Full) Importance (Restricted)

Consumption per capita 0.1 NA

Total asset value 0.1 0.24

Income from agriculture 0.09 NA

Revenue from animals 0.09 NA

Income from paid labor 0.08 NA

Household size 0.07 0.2

Food security 0.06 0.2

Income from business 0.05 NA

Perception of economic welfare 0.05 0.2

Any savings 0.04 0.16

Notes: This table reports the variables employed in the generalized random forest (GRF) algorithm and their estimated
importance, defined as the frequency with which each observable characteristic is used as a splitting variable, in the analysis
conducted using the replication data from Banerjee et al. (2015). We report the analysis conducted for a full set of available
baseline characteristics, and a restricted set of available baseline characteristics, corresponding to those also available in the
Baidoa sample; only the top ten most predictive variables in the “full” analysis are reported in the table, and thus for this
model, the reported weights do not sum to one. “NA” denotes a variable not included in the restricted model. The weights
are re-scaled such that the total weights for baseline characteristics excluding binary variables for randomization strata and
country add up to one.



A3 Structured Ethics Appendix

The structured ethics appendix is based on Asiedu et al. (2021).

Policy Equipoise

Is there policy equipoise? That is, is there uncertainty regarding participants’ net benefits from each arm of the study relative to the other arms and to the best possible policy to which participants could have access? If not, ethical randomization requires two conditions related to scarcity: (1) Was there scarcity, i.e., did the inclusion of multiple arms change the expected aggregate value of the programs delivered? (2) Do all ex-ante identifiable participants have equal moral or legal claims to the scarce programs?

While evidence on the effectiveness of graduation programs in forced displacement settings remains limited, prior academic literature from other contexts suggests that receiving the graduation package (which includes cash grants and either a TVET opportunity or an asset transfer) is likely to yield greater benefits than not receiving it. Given this, and despite the scarcity of evidence in displacement settings, the study may not fully meet the condition of policy equipoise. However, the randomization is ethically justified under conditions of scarcity. First, the available resources were insufficient to provide the package to all eligible households in this context. Second, eligibility was determined using pre-specified household characteristics assessed ex ante: eligible households were those classified as experiencing moderate or severe hunger based on the Household Hunger Scale that had resided in the IDP site for at least one month. These criteria were applied consistently, and no ex-ante identifiable group with a stronger moral or legal claim to the program was excluded. Third, randomization did not shift the aggregate benefit of the intervention (provided to 5,000 households). On this basis, the study satisfies commonly accepted ethical conditions for randomization in settings where policy equipoise may be uncertain or absent.



Role of researchers with respect to implementation

Are researchers “active” researchers, i.e., did the researchers have direct decision-making power over whether and how to implement the program? If YES, what was the disclosure to participants and informed consent process for participation in the program? Providing IRB approval details may be sufficient but further clarification of any important issues should be discussed here. If NO, i.e., implementation was separate, explain the separation.

The researchers did not play an “active” role in the implementation of this project.

Implementation was carried out independently by World