Refugee and Asylum Seeker Flow Data

Dyad-year refugee and asylum seeker outflows (between country centroids for all flows greater than or equal to 500 persons for the period 2000 through 2021). Source: Andrew Shaver; UNHCR

The lab is grateful to have had the opportunity to work with the United Nations Refugee Agency (UNHCR) on its release of (directed) dyadic, yearly refugee and asylum seeker flow data, released ahead of World Refugee Day 2022. Please see:

Shaver, A., Krick, B., Blancaflor, J., Liu, X., Samara, G., Ku, S., Hu, S., Carreon, M., Lim, T., Raps, R., Velasquez, A., Angelo, J., de Melo, S. & Zuo, Z. “The Causes and Consequences of Refugee Flows: A Contemporary Re-Analysis.” American Political Science Review (2024).

For additional details about the data see also the associated APSR appendix:

https://static.cambridge.org/content/id/urn%3Acambridge.org%3Aid%3Aarticle%3AS0003055424000285/resource/name/S0003055424000285sup001.pdf

The raw refugee flow data may be accessed directly from the UNHCR at the following address:

https://www.unhcr.org/refugee-statistics/insights/explainers/forcibly-displaced-flow-data.html

For panel datasets at the origin country-year; asylum country-year; and directed dyad-year, please visit the Harvard Dataverse replication materials:

https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/JADOZL

Within the Harvard Dataverse files, users will find the following three files:

  1. state_year_final.csv

  2. dyad_year_final.csv

  3. flows_imputed.csv

state_year_final contains four variables:

  • flows — This variable captures yearly outflows from the origin state with years preceding country reporting timelines set to NA.* (Though, see below for an important caveat when using origin-level data.)

  • flows_zero — This variable captures yearly outflows from the origin state with years preceding country reporting timelines set to 0. (We do not recommend using this variable given that many 0s likely represent positive unobserved values. This variable is included to sake of comparisons carried out in the analyses presented in the APSR article.)

  • inflow — This variable captures yearly inflows to the asylum state with years preceding country reporting timelines set to NA.

  • infows_zero — This variable captures yearly inflows to the asylum state with years preceding country reporting timelines set to 0. (We do not recommend using this variable given that many 0s likely represent positive unobserved values. This variable is included for comparison purposes.)

dyad_year_final contains two variables:

  • flows — This variable captures yearly directed flows between state pairs, with years preceding country reporting timelines set to NA.

  • flows_zero — This variable captures yearly directed flows between state pairs, with years preceding country reporting timelines set to 0. (We do not recommend using this variable given that many 0s likely represent positive unobserved values. This variable is included for comparison purposes.)

flows_imputed.csv contains imputed observations for major cases of displacement like Afghanistan, Colombia, and Iraq.

With inputs from the PVL, The Economist includes findings of this new data here: “How the war in Ukraine compares to other refugee crises”

If using this data, please cite:

1) UNHCR Refugee Data Finder

2) Shaver, A., Krick, B., Blancaflor, J., Liu, X., Samara, G., Ku, S., Hu, S., Carreon, M., Lim, T., Raps, R., Velasquez, A., Angelo, J., de Melo, S. & Zuo, Z. “The Causes and Consequences of Refugee Flows: A Contemporary Re-Analysis.” American Political Science Review (2024).

3) Fearon, J. & Shaver, A. “Civil War Violence and Refugee Outflows.” Empirical Studies of Conflict Project. Working Paper No. 25 (2021).

*Using data corresponding to the year on which the UNHCR began tracking refugees and asylum seekers by country , we construct asylum country- and directed dyad-year panel datasets in which observations of NA are entered for years preceding the initiation of country-specific collection efforts. Because data on flows were typically secured at the level of the asylum (and not origin) country, for the origin country-year panel dataset, we are able to identify only a subset of those observations for which data was incomplete or missing. For those cases, we similarly set values to NA but urge caution when using origin country-year data as some historical values may only partially reflect total outflows.