Refugee and Asylum Seeker Flow Data
The lab is grateful to have had the opportunity to work with the United Nations Refugee Agency (UNHCR) on its release of (directed) dyadic, yearly refugee and asylum seeker flow data, released ahead of World Refugee Day 2022. Please see:
Shaver, A., Krick, B., Blancaflor, J., Liu, X., Samara, G., Ku, S., Hu, S., Carreon, M., Lim, T., Raps, R., Velasquez, A., Angelo, J., de Melo, S. & Zuo, Z. “The Causes and Consequences of Refugee Flows: A Contemporary Re-Analysis.” American Political Science Review (2024).
For additional details about the data see also the associated APSR appendix:
https://static.cambridge.org/content/id/urn%3Acambridge.org%3Aid%3Aarticle%3AS0003055424000285/resource/name/S0003055424000285sup001.pdf
The raw refugee flow data may be accessed directly from the UNHCR at the following address:
https://www.unhcr.org/refugee-statistics/insights/explainers/forcibly-displaced-flow-data.html
For panel datasets at the origin country-year; asylum country-year; and directed dyad-year, please visit the Harvard Dataverse replication materials:
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/JADOZL
Within the Harvard Dataverse files, users will find the following three files:
state_year_final.csv
dyad_year_final.csv
flows_imputed.csv
state_year_final contains four variables:
flows — This variable captures yearly outflows from the origin state with years preceding country reporting timelines set to NA.* (Though, see below for an important caveat when using origin-level data.)
flows_zero — This variable captures yearly outflows from the origin state with years preceding country reporting timelines set to 0. (We do not recommend using this variable given that many 0s likely represent positive unobserved values. This variable is included to sake of comparisons carried out in the analyses presented in the APSR article.)
inflow — This variable captures yearly inflows to the asylum state with years preceding country reporting timelines set to NA.
infows_zero — This variable captures yearly inflows to the asylum state with years preceding country reporting timelines set to 0. (We do not recommend using this variable given that many 0s likely represent positive unobserved values. This variable is included for comparison purposes.)
dyad_year_final contains two variables:
flows — This variable captures yearly directed flows between state pairs, with years preceding country reporting timelines set to NA.
flows_zero — This variable captures yearly directed flows between state pairs, with years preceding country reporting timelines set to 0. (We do not recommend using this variable given that many 0s likely represent positive unobserved values. This variable is included for comparison purposes.)
flows_imputed.csv contains imputed observations for major cases of displacement like Afghanistan, Colombia, and Iraq.
Count — This variable updated the raw UNHCR data of yearly directed flows between state pairs to include imputations for various origin country year cases (please https://static.cambridge.org/content/id/urn%3Acambridge.org%3Aid%3Aarticle%3AS0003055424000285/resource/name/S0003055424000285sup001.pdf for specifics). Please note that this data, like the raw UNHCR data, is not in balanced panel format. Thus, users interested in using this data for panel data analyses would need to construct balanced panels from this data.
With inputs from the PVL, The Economist includes findings of this new data here: “How the war in Ukraine compares to other refugee crises”
If using this data, please cite:
1) UNHCR Refugee Data Finder
*Using data corresponding to the year on which the UNHCR began tracking refugees and asylum seekers by country , we construct asylum country- and directed dyad-year panel datasets in which observations of NA are entered for years preceding the initiation of country-specific collection efforts. Because data on flows were typically secured at the level of the asylum (and not origin) country, for the origin country-year panel dataset, we are able to identify only a subset of those observations for which data was incomplete or missing. For those cases, we similarly set values to NA but urge caution when using origin country-year data as some historical values may only partially reflect total outflows.