Replication Package for: Migration and the Value of Social Networks

Oct 15, 2023

This replication package accompanies Blumenstock, J.E., Chi, G., Tan, X. (forthcoming). "Migration and the Value of Social Networks". Review of Economic Studies.

Authors

Joshua Blumenstock
Guanghua Chi
Xu Tan

License

The code in this repository is provided under a GNU GPL v3.0 License.

Data availability and provenance statements

Statement about rights

The author(s) of the manuscript have legitimate access to and permission to use the data used in this manuscript.

Data availability

The paper uses a mix of public survey data and private mobile phone data. The survey data were obtained from the National Institute of Statistics of Rwanda (2012, 2014).

National Institute of Statistics of Rwanda, “The Evolution of Poverty in Rwanda from 2000 to 2011: Results from the Household Surveys (EICV),” Technical Report, Kigali, Rwanda February 2012. Accessed from https://catalog.ihsn.org/index.php/catalog/3142/download/46398
National Institute of Statistics of Rwanda, “Migration and Spatial Mobility,” Technical Report, Kigali, Rwanda January 2014. Accessed from https://www.statistics.gov.rw/publication/rphc4-thematic-report-migration-and-spatial-mobility

The private data used in the paper were derived from mobile phone metadata obtained from a mobile phone operator in Rwanda. Due to privacy and confidentiality restrictions, these raw data cannot be shared publicly. The data were obtained by, and used with the permission of, Nathan Eagle ([email protected]); requests to access such data should be directed to Nathan.

The data folder contains all of the sample input and output data required to generate the figures and tables.

Computational Requirements

Our analysis was run on a 56-core Intel-based Linux server with 512 GB of memory. Most of the code can be run using the software listed below. In a few instances, noted in the tables later in this README, specific scripts require additional software.

Python 2.7.16
- pandas 0.24.2
- numpy 1.16.5
- matplotlib 2.2.3
- seaborn 0.9.0
- dateutil 2.8.0
- statsmodels 0.10.1
- GraphLab-Create 2.1
- pyspark 2.1.1
R 4.3.1
- fixest 0.11.1

When performing this analysis, we installed GraphLab under Python 2.7 with pip install GraphLab-Create. GraphLab has since been deprecated and replaced by Turi Create, which can be installed with pip install turicreate. However, our code has not been tested with Turi Create.
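
One hedged way to let a script start under either package is a guarded import. This is a minimal sketch, not part of the replication code: the helper name load_graph_backend is hypothetical, and it only selects whichever module is installed. Whether the rest of the code runs unchanged under Turi Create remains untested.

```python
import importlib


def load_graph_backend(candidates=("graphlab", "turicreate")):
    """Return the first importable module from `candidates`, or None.

    Hypothetical helper for illustration; the replication scripts
    import graphlab directly.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Module not installed; try the next candidate.
            continue
    return None


gl = load_graph_backend()
if gl is None:
    print("Neither GraphLab-Create nor Turi Create is installed.")
else:
    print("Using graph backend: " + gl.__name__)
```

A script would then refer to the selected module through the gl alias instead of importing graphlab by name.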

Description of programs/code

The repository contains three folders:

data: Contains all of the sample input and output data required to generate the figures and tables
figures: Contains the code needed to generate the figures
tables: Contains the code needed to generate the tables

The top-level directory also contains three scripts that are used to detect migrations and calculate network statistics for each mobile subscriber. The outputs of these scripts provide the input for most of the tables and figures in the paper.

Name	Step	Script	input data	output	Notes
Detecting migrants	step 1: calculate modal districts for each month.	modal_district.py	XXXX_mobility.txt	XXXX_modal_district.txt	pyspark is required to run this script. Run this script within Spark Shell (./bin/spark-shell)
	step 2: detect migration type of each person (urban to rural, rural to urban, rural to rural, remains, roamers)	migration_type.py	XXXX_modal_district.txt	XXXX_migration.txt XXXX_migration_XXmonth.txt	pyspark is required to run this script. Run this script within Spark Shell (./bin/spark-shell). You can modify the script to use 3-, 6-, and 12-month definitions of migration.
Network structure	calculate network structures for each person, such as degree, support, information	network_structure.py	XXXX_call.txt XXXX_modal_district.txt XXXX_migration.txt	XXXX_user_result.csv	graphlab and snap are required to run the script
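
The two migrant-detection steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the logic of modal_district.py or migration_type.py: the record layout (user, month index, district), the tie-breaking rule, and the k-month rule (k identical months followed by k identical months in a different district) are all assumptions made for the example. The function names are hypothetical, and the sketch does not classify moves as urban or rural.

```python
from collections import Counter, defaultdict


def modal_districts(records):
    """Map (user, month) -> most frequently observed district.

    `records` is an iterable of (user_id, month_index, district) tuples.
    This layout is an assumption, not the format of XXXX_mobility.txt.
    """
    counts = defaultdict(Counter)
    for user, month, district in records:
        counts[(user, month)][district] += 1
    modal = {}
    for key, counter in counts.items():
        # Highest count wins; ties go to the lexicographically largest
        # district name (an arbitrary but deterministic choice).
        best = max(counter.items(), key=lambda kv: (kv[1], kv[0]))
        modal[key] = best[0]
    return modal


def detect_migrations(monthly, k=2):
    """Return (month, origin, destination) moves for one user.

    `monthly` lists modal districts ordered by month, with None for
    months without data. A move is flagged at month t when the k months
    before t share one district and the k months from t on share another.
    """
    moves = []
    for t in range(k, len(monthly) - k + 1):
        before = monthly[t - k:t]
        after = monthly[t:t + k]
        if None in before or None in after:
            continue
        if len(set(before)) == 1 and len(set(after)) == 1 and before[0] != after[0]:
            moves.append((t, before[0], after[0]))
    return moves


# Made-up observations for one hypothetical user "u1".
records = [
    ("u1", 0, "A"), ("u1", 0, "A"), ("u1", 0, "B"),
    ("u1", 1, "A"),
    ("u1", 2, "B"), ("u1", 2, "B"),
    ("u1", 3, "B"),
]
modal = modal_districts(records)
series = [modal.get(("u1", m)) for m in range(4)]
print(series)
print(detect_migrations(series, k=2))
```

Changing k in this sketch mirrors the idea of switching between the 3-, 6-, and 12-month definitions mentioned above.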

List of tables and figures

The provided code reproduces all tables and figures in the paper.

Figure	Name	Step	Script	input data	output	Note
Figure 1	Schematic diagrams of the social networks of three migrants	N/A	N/A	N/A	N/A	It's just a schematic diagram. No code or data is required.
Figure 2	The social network of a single migrant	N/A	N/A	N/A	N/A	Gephi is used to visualize this network.
Figure 3	Location of all mobile phone towers in Rwanda, circa 2008	N/A	N/A	tower_district.csv	N/A	QGIS is used to map the tower location.
Figure 4	Changes in network structure over time		figure_4_A2_A3.py	XXXX_migration.txt XXXX_user_result.csv	Figure 4 Figure A2 Figure A3
Figure 5	Migration and degree centrality		figure_5_6ac_7ac.py	XXXX_migration.txt XXXX_user_result.csv	Figure 5
Figure 6	Migration and network “interconnectedness”	step 1: generate rate for the regression	figure_6bd_7bd_A6bd_step1_generate_data_for_regression.py	XXXX_migration.txt XXXX_user_result.csv	cluster_dest_for_regression.csv information_dest_for_regression.csv support_dest_for_regression.csv
		step 2: convert the data format for regression	figure_6bd_7bd_A6bd_step2_convert_data.py	cluster_dest_for_regression.csv information_dest_for_regression.csv support_dest_for_regression.csv	cluster_dest_for_regression_22degree.csv information_dest_for_regression_22degree.csv support_dest_for_regression_22degree.csv	graphlab is required to run the script
		step 3: draw a 10% sample because of computing resource constraints	figure_6bd_7bd_A6bd_step3_sampling.py	cluster_dest_for_regression_22degree.csv information_dest_for_regression_22degree.csv support_dest_for_regression_22degree.csv	cluster_dest_for_regression_22degree_sampled_10pct.csv information_dest_for_regression_22degree_sampled_10pct.csv support_dest_for_regression_22degree_sampled_10pct.csv
		step 4: calculate coefficient and confidence interval	figure_6bd_7bd_A6bd_step4_regression.R	cluster_dest_for_regression_22degree_sampled_10pct.csv information_dest_for_regression_22degree_sampled_10pct.csv support_dest_for_regression_22degree_sampled_10pct.csv	inset_XXXXX_coef_XXXX.csv inset_XXXXX_se_XXX_XXXX.csv (the repository includes only the destination information files; the other files have the same format)
		step 5: plot the figure	figure_5_6ac_7ac.py, figure_6bd_7bd_A6bd_step5_plot.py	inset_XXXXX_coef_XXXX.csv inset_XXXXX_se_XXX_XXXX.csv	Figure 6
Figure 7	Relationship between migration and “extensiveness”	same as Figure 6	same as Figure 6	inset_XXXXX_coef_XXXX.csv inset_XXXXX_se_XXX_XXXX.csv	Figure 7
Figure 8	The role of (higher order) strong and weak ties in a migrant’s network	N/A	N/A	N/A	N/A
Figure A1	Validation of Migration Data	step 1: calculate the proportion of migrants to/from each district	figure_A1_A13_step1.py	XXXX_migration.txt XXXX_migration_12month.txt	cdr_move_to_district_proportion_2month.csv cdr_move_from_district_proportion_2month.csv cdr_move_to_district_proportion_12month.csv cdr_move_from_district_proportion_12month.csv
		step 2: plot the distribution	figure_A1_step2.py	cdr_move_to_district_proportion_2month.csv cdr_move_from_district_proportion_2month.csv census_destination_simple.csv census_origin.csv (the last two files are from the internal migrants reported in the 2012 Rwandan census data. see the details in the paper)	Figure A1
Figure A2	Changes in number of contacts over time		figure_4_A2_A3.py	XXXX_migration.txt XXXX_user_result.csv	Figure 4 Figure A2 Figure A3
Figure A3	Changes in number of calls over time		figure_4_A2_A3.py	XXXX_migration.txt XXXX_user_result.csv	Figure 4 Figure A2 Figure A3
Figure A4	Number of friends of friends, before and after migration (migrants)		figure_A4_A5.py	XXXX_migration.txt XXXX_user_result.csv	Figure A4 Figure A5
Figure A5	Percent of friends with common support, before and after migration (migrants)		figure_A4_A5.py	XXXX_migration.txt XXXX_user_result.csv	Figure A4 Figure A5
Figure A6	Relationship between migration rate and clustering		same as Figure 6 and 7	same as Figure 6 and 7	Figure A6
Figure A7	Migrants have fewer friends of friends than non-migrants		figure_A7.py	dest_home_d_s_l.csv	Figure A7
Figure A8	Number of friends of friends, before and after migration, shift-share approach		figure_A8_A9.py	XXXX_call.txt XXXX_modal_district.txt XXXX_migration.txt	Figure A8 Figure A9
Figure A9	Percent of friends with common support, before and after migration, shift-share approach		figure_A8_A9.py	XXXX_call.txt XXXX_modal_district.txt XXXX_migration.txt	Figure A8 Figure A9
Figure A10	Calibration results: marginal plots	step 1: calculate utility	figure_A10_A11_A12_step1_calcualte_utility.py	XXXX_call.txt XXXX_modal_district.txt XXXX_migration.txt XXXX_user_result.csv	all_add_feature dataframe (not saved to a file; run step 2 directly)	The scripts for the three steps were originally intended as a single file; we split them into three parts to make them easier to follow.
		step 2: simulate	figure_A10_step2_simulate.py	all_add_feature dataframe	a dataframe (not saved to a file; run step 3 directly)
		step 3: plot	figure_A10_A12_step3_plot.py	a dataframe	Figure A10
Figure A11	Calibration results: ‘information’ and ‘cooperation’ utility	step 1: calculate utility	figure_A10_A11_A12_step1_calcualte_utility.py	XXXX_call.txt XXXX_modal_district.txt XXXX_migration.txt XXXX_user_result.csv	all_add_feature dataframe
		step 2: plot utility	figure_A11_step2_plot.py	all_add_feature dataframe	Figure A11
Figure A12	Calibration results (with τ): marginal plots	step 1: calculate utility	figure_A10_A11_A12_step1_calcualte_utility.py	XXXX_call.txt XXXX_modal_district.txt XXXX_migration.txt XXXX_user_result.csv	all_add_feature dataframe
		step 2: simulate	figure_A12_step2_simulate.py	all_add_feature dataframe	a dataframe (not saved to a file; run step 3 directly)
		step 3: plot	figure_A10_A12_step3_plot.py	a dataframe	Figure A12
Figure A13	Validation of Migration Data - Varying Definition of Migration	step 1: calculate the proportion of migrants to/from each district	figure_A1_A13_step1.py	XXXX_migration.txt XXXX_migration_6month.txt XXXX_migration_12month.txt	cdr_move_to_district_proportion_2month.csv cdr_move_from_district_proportion_2month.csv cdr_move_to_district_proportion_6month.csv cdr_move_from_district_proportion_6month.csv cdr_move_to_district_proportion_12month.csv cdr_move_from_district_proportion_12month.csv
		step 2: plot the distribution	figure_A13_step2.py	cdr_move_to_district_proportion_6month.csv cdr_move_to_district_proportion_12month.csv census_destination_simple.csv census_origin.csv (the last two files are from the internal migrants reported in the 2012 Rwandan census data. see the details in the paper)	Figure A13
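
Step 1 of Figures A1 and A13 computes the proportion of migrants moving to and from each district. A minimal sketch of that calculation is given here, assuming each migration is an (origin, destination) pair and that a proportion means a district's share of all moves. Both points are assumptions for illustration, as is the function name district_proportions; the actual layout of XXXX_migration.txt and the logic of figure_A1_A13_step1.py may differ.

```python
from collections import Counter


def district_proportions(moves):
    """Return (share_from, share_to), each mapping district -> share of moves.

    `moves` is an iterable of (origin, destination) pairs. This layout is
    an assumption, not the format of XXXX_migration.txt.
    """
    moves = list(moves)
    total = len(moves)
    if total == 0:
        return {}, {}
    from_counts = Counter(origin for origin, _ in moves)
    to_counts = Counter(dest for _, dest in moves)
    # float() keeps the division exact under Python 2 as well as Python 3.
    share_from = {d: n / float(total) for d, n in from_counts.items()}
    share_to = {d: n / float(total) for d, n in to_counts.items()}
    return share_from, share_to


# Made-up moves between hypothetical districts A, B, and C.
moves = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
share_from, share_to = district_proportions(moves)
for district in sorted(share_to):
    print("{0}: from={1:.2f} to={2:.2f}".format(
        district, share_from.get(district, 0.0), share_to[district]))
```

Shares of this kind are what step 2 would compare against the census distributions of origins and destinations.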

The tables folder contains the code for generating every table in the paper. The table below lists the script, input data, output, and notes for each table.

Table	Name	Step	Script	input data	output	Note
Table 1	Summary statistics of mobile phone metadata		table1.py	XXXX_call.txt XXXX_migration.txt XXXX_user_result.csv	Table 1
Table 2	Effects of home & destination network structure on migration	step 1: generate data for regressions	table2_step1_generate_data_for_regression.py	XXXX_migration.txt XXXX_user_result.csv	dest_home_d_s_l.csv
		step 2: regression	table_2_step2_regression.R	dest_home_d_s_l.csv	Table 2
Table A1	Detailed migration statistics derived from phone data, for different definitions of ‘migration’		table_A1.py	XXXX_migration.txt XXXX_migration_1month.txt XXXX_migration_3month.txt XXXX_migration_6month.txt	Table A1
Table A2	Migration and destination network structure - Migrants only	step 1: generate data for regressions	table2_step1_generate_data_for_regression.py	XXXX_migration.txt XXXX_user_result.csv	dest_home_d_s_l_migrants_only.csv	Use the same script as Table 2, but uncomment line 112 to keep migrants only.
		step 2: regression	table_A2.R	dest_home_d_s_l_migrants_only.csv	Table A2
Table A3	Heterogeneity by Migration Frequency (Repeat and First-time)	step 1: generate data for regressions	table_A3_step1_firsttime_repeat.py	XXXX_migration.txt XXXX_user_result.csv dest_home_d_s_l.csv	dest_home_d_s_l_firsttime_repeat.csv
		step 2: regression	table_A3_step2.R	dest_home_d_s_l_firsttime_repeat.csv	Table A3
Table A4	Heterogeneity by Migration Duration (Long-term vs. Short-term)	step 1: calculate longterm and shortterm migrants	table_A4_step1_longterm_migrants.py		XXXX_migration_longterm.csv XXXX_migration_shortterm.csv
		step 2: generate data for regressions	table_A4_step2_shorttime_longtime.py	XXXX_user_result.csv XXXX_migration.txt XXXX_migration_longterm.csv XXXX_migration_shortterm.csv	dest_home_d_s_l_shorttime_longtime.csv
		step 3: regression	table_A4_step3.R	dest_home_d_s_l_shorttime_longtime.csv dest_home_d_s_l.csv	Table A4
Table A5	Heterogeneity by Distance (Adjacent districts vs. Non-adjacent districts)	step 1: generate data for regressions	table_A5_step1_adjacent.py	neighbor_district.csv dest_home_d_s_l.csv	dest_home_d_s_l_adjacent.csv
		step 2: regression	table_A5_step2.R	dest_home_d_s_l_adjacent.csv	Table A5
Table A6	The role of recent migrants and co-migrants	step 1: calculate recent migrants and co-migrants	table_A6_step1_network_feature_recent_migrant.py	XXXX_call.txt XXXX_modal_district.txt XXXX_migration.txt	XXXX_user_result_recent_migrant.csv
		step 2: generate data for regressions	table_A6_step2_recent_migrant.py	XXXX_migration.txt XXXX_user_result.csv XXXX_user_result_recent_migrant.csv dest_home_d_s_l.csv	dest_home_d_s_l_recent_migrant.csv
		step 3: regression	table_A6_step3.R	dest_home_d_s_l_recent_migrant.csv	Table A6
Table A7	Migration and networks, controlling for prior visits to the destination	step 1: calculate if a migrant visited the destination before	table_A7_step1_migration_visit_before.py	XXXX_call.txt XXXX_migration.txt tower_district.csv	XXXX_migrant_if_visit_before.csv
		step 2: generate data for regressions	table_A7_step2_visit_before.py	XXXX_migration.txt XXXX_user_result.csv XXXX_migrant_if_visit_before.csv	dest_home_d_s_l_visit_before.csv
		step 3: regression	table_A7_step3.R	dest_home_d_s_l_visit_before.csv	Table A7
Table A8	The role of strong ties and weak ties	step 1: calculate strong/weak ties	table_A8_step1_user_feature_strongtie.py	XXXX_call.txt XXXX_migration.txt tower_district.csv	XXXX_user_result_strongtie.csv
		step 2: generate data for regressions	table_A8_step2_strongtie.py	XXXX_migration.txt XXXX_user_result.csv XXXX_user_result_strongtie.csv	dest_home_d_s_l_strongtie.csv
		step 3: regression	table_A8_step3.R	dest_home_d_s_l_strongtie.csv	Table A8
Table A9	Disaggregating the friend of friend effect by the strength of the 2nd-degree tie	step 1: calculate strong/weak ties for information and support	table_A9_A10_step1_strongweak.py	XXXX_call.txt XXXX_modal_district.txt XXXX_migration.txt	XXXX_user_result_infor_support_strong.csv
		step 2: generate data for regressions	table_A9_step2_infor_strong.py	XXXX_migration.txt XXXX_user_result.csv XXXX_user_result_infor_support_strong.csv	dest_home_d_s_l_infor_strong.csv
		step 3: regression	table_A9_step3.R	dest_home_d_s_l_infor_strong.csv	Table A9
Table A10	Disaggregating the network support effect by the strength of supported ties	step 1: calculate strong/weak ties for information and support	table_A9_A10_step1_strongweak.py	XXXX_call.txt XXXX_modal_district.txt XXXX_migration.txt	XXXX_user_result_infor_support_strong.csv
		step 2: generate data for regressions	table_A10_step2_support_strong.py	XXXX_migration.txt XXXX_user_result.csv XXXX_user_result_infor_support_strong.csv XXXX_user_result_strongtie.csv	dest_home_d_s_l_support_strong.csv
		step 3: regression	table_A10_step3.R	dest_home_d_s_l_support_strong.csv dest_home_d_s_l.csv	Table A10
Table A11	Beyond location-specific subnetworks	step 1: calculate overall information	table_A11_step1_information_overall.py	XXXX_call.txt XXXX_modal_district.txt XXXX_migration.txt	XXXX_information_overall_network.csv
		step 2: generate data for regressions	table_A11_step2_beyond_location.py	XXXX_migration.txt XXXX_user_result.csv XXXX_information_overall_network.csv	dest_home_d_s_l_information_overall_network.csv
		step 3: regression	table_A11_step3.R	dest_home_d_s_l_information_overall_network.csv	Table A11
Table A12	Jointly estimated effects (6 month network lag)		table_A12.R	same input files as Table 2.	Table A12	Preprocessing steps are the same as for Table 2, except that line 119 in network_structure.py must be changed to network_date = start_date + relativedelta(months=i - pre_month_n + 1)
Table A13	“Shift share” regression results	step 1: calculate shift share network features	table_A13_step1_network_feature.py	XXXX_call.txt XXXX_modal_district.txt XXXX_migration.txt	user_result_information_support_diff_if_remained_True_XXXX_between_XX_to_XX_months.csv
		step 2: generate data for regressions	table_A13_step2_shift_share_data.py	XXXX_migration.txt user_result_information_support_diff_if_remained_True_XXXX_between_XX_to_XX_months.csv	dest_home_for_regression_information_support_diff_between_XX_to_XX_month.csv
		step 3: regression	table_A13_step3.R	dest_home_for_regression_information_support_diff_between_XX_to_XX_month.csv	Table A13
Table A14	Robustness to alternative fixed effect specifications		table_A14.R	dest_home_d_s_l.csv	Table A14
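
The Table A12 note replaces one line with a month offset of i - pre_month_n + 1. The sketch below only illustrates that arithmetic. It uses a small stdlib helper (add_months, hypothetical) as a stand-in for dateutil's relativedelta so that it runs without dependencies, and the values of start_date, pre_month_n, and the range of i are made up rather than taken from network_structure.py.

```python
from datetime import date


def add_months(d, months):
    """Shift a date by whole months (may be negative), returning day 1.

    Stand-in for relativedelta(months=...); assumes `d` is the first of
    a month, so no day-of-month clamping is needed.
    """
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def network_dates(start_date, n_months, pre_month_n):
    """Evaluate start_date + (i - pre_month_n + 1) months for i in 0..n_months-1."""
    return [add_months(start_date, i - pre_month_n + 1) for i in range(n_months)]


# Hypothetical values, chosen only to show the offsets.
start = date(2008, 1, 1)
for i, d in enumerate(network_dates(start, n_months=3, pre_month_n=6)):
    print("i={0}: network_date={1}".format(i, d.isoformat()))
```

With these made-up values, i = 0 maps to five months before start_date, and each later i moves the network date forward by one month.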
