This replication package accompanies Blumenstock, J. E., Chi, G., and Tan, X. (forthcoming). "Migration and the Value of Social Networks." *Review of Economic Studies*.
- Joshua Blumenstock
- Guanghua Chi
- Xu Tan
The code in this repository is licensed under the GNU General Public License v3.0 (GPL-3.0).
The author(s) of the manuscript have legitimate access to and permission to use the data used in this manuscript.
The paper uses a mix of public survey data and private mobile phone data. The survey data were obtained from the National Institute of Statistics of Rwanda (2012, 2014).
- National Institute of Statistics of Rwanda, “The Evolution of Poverty in Rwanda from 2000 to 2011: Results from the Household Surveys (EICV),” Technical Report, Kigali, Rwanda, February 2012. Accessed from https://catalog.ihsn.org/index.php/catalog/3142/download/46398
- National Institute of Statistics of Rwanda, “Migration and Spatial Mobility,” Technical Report, Kigali, Rwanda, January 2014. Accessed from https://www.statistics.gov.rw/publication/rphc4-thematic-report-migration-and-spatial-mobility
The private data used in the paper were derived from mobile phone metadata obtained from a mobile phone operator in Rwanda. Due to privacy and confidentiality restrictions, these raw data cannot be shared publicly. The data were obtained by, and used with the permission of, Nathan Eagle; requests for access to these data should be directed to him.
The data folder contains all of the sample input and output data required to generate the figures and tables.
Our analysis was run on a 56-core Intel-based Linux server with 512 GB of memory. A few scripts, noted in the tables later in this README, require additional software (e.g., SNAP, Gephi, and QGIS); most of the code can be run with the software versions listed below:
- Python 2.7.16
- pandas 0.24.2
- numpy 1.16.5
- matplotlib 2.2.3
- seaborn 0.9.0
- dateutil 2.8.0
- statsmodels 0.10.1
- GraphLab-Create 2.1
- pyspark 2.1.1
- R 4.3.1
- fixest 0.11.1
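Before running the scripts, it can help to confirm that the Python environment matches the versions listed above. The sketch below is not part of the replication package: `check_versions` and `EXPECTED_VERSIONS` are hypothetical names, the check assumes each package exposes a `__version__` attribute, and GraphLab Create is omitted because we have not verified its version attribute. It is written to run under both Python 2.7 and Python 3. R package versions can be checked separately in R with `packageVersion("fixest")`.

```python
import importlib

# Python package versions listed in this README, keyed by import name.
# GraphLab Create is omitted; "dateutil" is installed from PyPI as python-dateutil.
EXPECTED_VERSIONS = {
    "pandas": "0.24.2",
    "numpy": "1.16.5",
    "matplotlib": "2.2.3",
    "seaborn": "0.9.0",
    "dateutil": "2.8.0",
    "statsmodels": "0.10.1",
    "pyspark": "2.1.1",
}


def check_versions(expected):
    """Return {module: (status, installed_version)} for each expected module.

    status is "ok", "mismatch", "missing" (ImportError), or "error" (any other
    exception raised on import). A module without a __version__ attribute is
    reported as "mismatch" with installed_version None.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = ("missing", None)
            continue
        except Exception as exc:  # e.g. a broken or incompatible install
            report[name] = ("error", repr(exc))
            continue
        installed = getattr(module, "__version__", None)
        report[name] = ("ok" if installed == wanted else "mismatch", installed)
    return report
```

Calling `check_versions(EXPECTED_VERSIONS)` returns a dictionary that can be printed or inspected; any entry other than `"ok"` points to a package that differs from the tested setup.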
When performing this analysis, we installed GraphLab under Python 2.7 with `pip install GraphLab-Create`. GraphLab has since been deprecated and replaced by Turi Create, which can be installed with `pip install turicreate`. However, our code has not been tested with Turi Create.
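If you want to try Turi Create anyway, one hedged approach is to make the import configurable instead of editing every call site. The sketch below uses a hypothetical helper, `import_first_available`, that returns the first installed module from a list. Turi Create was derived from GraphLab Create and keeps many of the same names (for example `SFrame`), but compatibility with these scripts is an assumption and remains untested.

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in `module_names` that is installed.

    Raises ImportError if none of the modules can be imported.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate
    raise ImportError("none of these modules could be imported: " + ", ".join(module_names))
```

For example, `gl = import_first_available(["graphlab", "turicreate"])` binds `gl` to GraphLab Create when it is installed and falls back to Turi Create otherwise.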
The repository contains three folders:
- data: Contains all of the sample input and output data required to generate the figures and tables
- figures: Contains the code needed to generate the figures
- tables: Contains the code needed to generate the tables
The top-level directory also contains three scripts that are used to detect migrations and calculate network statistics for each mobile subscriber. The outputs of these scripts provide the input for most of the tables and figures in the paper.
Name | Step | Script | Input data | Output | Notes |
---|---|---|---|---|---|
Detecting migrants | Step 1: calculate the modal district of each subscriber for each month | modal_district.py | XXXX_mobility.txt | XXXX_modal_district.txt | PySpark is required. Run this script within the PySpark shell (`./bin/pyspark`). |
Detecting migrants | Step 2: detect the migration type of each person (urban to rural, rural to urban, rural to rural, remains, roamers) | migration_type.py | XXXX_modal_district.txt | XXXX_migration.txt, XXXX_migration_XXmonth.txt | PySpark is required. Run this script within the PySpark shell (`./bin/pyspark`). The script can be modified to use 3-, 6-, or 12-month definitions of migration. |
Network structure | Calculate network structure measures for each person, such as degree, support, and information | network_structure.py | XXXX_call.txt, XXXX_modal_district.txt, XXXX_migration.txt | XXXX_user_result.csv | GraphLab and SNAP are required. |
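The two steps of migrant detection can be illustrated in plain Python. This sketch is not the PySpark implementation in modal_district.py and migration_type.py: the input layout (one `(user, month, district)` tuple per observed transaction), the alphabetical tie-break, and the move rule (the k months before month t share one modal district and the k months starting at t share a different one) are simplifying assumptions for illustration, and the urban/rural, "remains", and "roamers" labels are not reproduced.

```python
from collections import Counter, defaultdict


def modal_districts(records):
    """Map (user, month) -> the district where the user was observed most often.

    `records` is an iterable of (user, month, district) tuples, one per
    observed transaction. Ties go to the alphabetically first district
    (an arbitrary choice for this sketch).
    """
    counts = defaultdict(Counter)
    for user, month, district in records:
        counts[(user, month)][district] += 1
    modal = {}
    for key, counter in counts.items():
        top = max(counter.values())
        modal[key] = min(d for d, c in counter.items() if c == top)
    return modal


def detect_migrations(monthly_districts, k=2):
    """Return (month_index, origin, destination) for each move in one user's history.

    `monthly_districts` lists the user's modal district for consecutive months.
    A move is recorded at index t when the k months before t all share one
    district and the k months starting at t all share a different district.
    This simplified rule is an assumption, not necessarily the one in migration_type.py.
    """
    moves = []
    for t in range(k, len(monthly_districts) - k + 1):
        before = set(monthly_districts[t - k:t])
        after = set(monthly_districts[t:t + k])
        if len(before) == 1 and len(after) == 1 and before != after:
            moves.append((t, before.pop(), after.pop()))
    return moves
```

To apply `detect_migrations` to every subscriber, group the output of `modal_districts` by user and pass each user's districts in month order. Months without any observations break the "consecutive months" assumption, so a full pipeline would need to handle such gaps explicitly.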
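The kinds of measures computed by network_structure.py (degree, support, information) can be illustrated on a small graph. This is a hypothetical sketch, not the script's method: it treats the call graph as undirected, counts "information" as the number of friends of friends (contacts of contacts who are neither the user nor a direct contact, following the wording of Figure A4), and counts "support" as the number of contacts who share at least one common contact with the user (following Figure A5). The optional `district` filter restricts the counted nodes to one location, in the spirit of the location-specific measures; exactly which nodes the original script filters is an assumption here.

```python
def ego_network_stats(adjacency, user, district_of=None, district=None):
    """Simple ego-network measures for `user` in an undirected contact graph.

    adjacency:   dict mapping each user to the set of users they called or texted
                 (assumed symmetric, with no self-loops)
    district_of: optional dict mapping user -> district; required if `district`
                 is given, in which case only nodes in that district are counted
    """
    if district is not None and district_of is None:
        raise ValueError("district_of is required when district is given")

    def in_scope(node):
        return district is None or district_of.get(node) == district

    friends = adjacency.get(user, set())
    counted_friends = set(f for f in friends if in_scope(f))

    # Friends of friends: contacts of contacts, excluding the user and direct contacts.
    second_degree = set()
    for f in friends:
        second_degree.update(adjacency.get(f, set()))
    second_degree -= friends
    second_degree.discard(user)
    friends_of_friends = set(n for n in second_degree if in_scope(n))

    # Supported friends: contacts who share at least one common contact with the user.
    supported = set(f for f in counted_friends if adjacency.get(f, set()) & friends)

    degree = len(counted_friends)
    return {
        "degree": degree,
        "friends_of_friends": len(friends_of_friends),
        "supported_friends": len(supported),
        "pct_supported": 100.0 * len(supported) / degree if degree else 0.0,
    }
```

On a full call graph the same quantities would be computed for every subscriber and month, which is why the original script relies on GraphLab and SNAP rather than plain Python sets.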
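The provided code reproduces all tables and figures in the paper, except Figures 1, 2, 3, and 8, which are not generated by scripts in this repository (see the notes below). The table below lists the script(s), input data, and output for each figure.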
Figure | Name | Step | Script | Input data | Output | Notes |
---|---|---|---|---|---|---|
Figure 1 | Schematic diagrams of the social networks of three migrants | N/A | N/A | N/A | N/A | Schematic diagram only; no code or data are required. |
Figure 2 | The social network of a single migrant | N/A | N/A | N/A | N/A | The network is visualized with Gephi. |
Figure 3 | Location of all mobile phone towers in Rwanda, circa 2008 | N/A | N/A | tower_district.csv | N/A | Tower locations are mapped with QGIS. |
Figure 4 | Changes in network structure over time | | figure_4_A2_A3.py | XXXX_migration.txt, XXXX_user_result.csv | Figure 4, Figure A2, Figure A3 | |
Figure 5 | Migration and degree centrality | | figure_5_6ac_7ac.py | XXXX_migration.txt, XXXX_user_result.csv | Figure 5 | |
Figure 6 | Migration and network “interconnectedness” | Step 1: generate rates for the regression | figure_6bd_7bd_A6bd_step1_generate_data_for_regression.py | XXXX_migration.txt, XXXX_user_result.csv | cluster_dest_for_regression.csv, information_dest_for_regression.csv, support_dest_for_regression.csv | |
Figure 6 | | Step 2: convert the data format for the regression | figure_6bd_7bd_A6bd_step2_convert_data.py | cluster_dest_for_regression.csv, information_dest_for_regression.csv, support_dest_for_regression.csv | cluster_dest_for_regression_22degree.csv, information_dest_for_regression_22degree.csv, support_dest_for_regression_22degree.csv | GraphLab is required. |
Figure 6 | | Step 3: draw a subsample (due to limited computing resources) | figure_6bd_7bd_A6bd_step3_sampling.py | cluster_dest_for_regression_22degree.csv, information_dest_for_regression_22degree.csv, support_dest_for_regression_22degree.csv | cluster_dest_for_regression_22degree_sampled_10pct.csv, information_dest_for_regression_22degree_sampled_10pct.csv, support_dest_for_regression_22degree_sampled_10pct.csv | |
Figure 6 | | Step 4: estimate coefficients and confidence intervals | figure_6bd_7bd_A6bd_step4_regression.R | cluster_dest_for_regression_22degree_sampled_10pct.csv, information_dest_for_regression_22degree_sampled_10pct.csv, support_dest_for_regression_22degree_sampled_10pct.csv | inset_XXXXX_coef_XXXX.csv, inset_XXXXX_se_XXX_XXXX.csv | Only the destination-information files are included in the repository; the other files have the same format. |
Figure 6 | | Step 5: plot the figure | figure_5_6ac_7ac.py, figure_6bd_7bd_A6bd_step5_plot.py | inset_XXXXX_coef_XXXX.csv, inset_XXXXX_se_XXX_XXXX.csv | Figure 6 | |
Figure 7 | Relationship between migration and “extensiveness” | Same as Figure 6 | Same as Figure 6 | inset_XXXXX_coef_XXXX.csv, inset_XXXXX_se_XXX_XXXX.csv | Figure 7 | |
Figure 8 | The role of (higher order) strong and weak ties in a migrant’s network | N/A | N/A | N/A | N/A | |
Figure A1 | Validation of Migration Data | Step 1: calculate the proportion of migrants to/from each district | figure_A1_A13_step1.py | XXXX_migration.txt, XXXX_migration_12month.txt | cdr_move_to_district_proportion_2month.csv, cdr_move_from_district_proportion_2month.csv, cdr_move_to_district_proportion_12month.csv, cdr_move_from_district_proportion_12month.csv | |
Figure A1 | | Step 2: plot the distribution | figure_A1_step2.py | cdr_move_to_district_proportion_2month.csv, cdr_move_from_district_proportion_2month.csv, census_destination_simple.csv, census_origin.csv | Figure A1 | census_destination_simple.csv and census_origin.csv are based on internal migrants reported in the 2012 Rwandan census data; see the paper for details. |
Figure A2 | Changes in number of contacts over time | | figure_4_A2_A3.py | XXXX_migration.txt, XXXX_user_result.csv | Figure 4, Figure A2, Figure A3 | |
Figure A3 | Changes in number of calls over time | | figure_4_A2_A3.py | XXXX_migration.txt, XXXX_user_result.csv | Figure 4, Figure A2, Figure A3 | |
Figure A4 | Number of friends of friends, before and after migration (migrants) | | figure_A4_A5.py | XXXX_migration.txt, XXXX_user_result.csv | Figure A4, Figure A5 | |
Figure A5 | Percent of friends with common support, before and after migration (migrants) | | figure_A4_A5.py | XXXX_migration.txt, XXXX_user_result.csv | Figure A4, Figure A5 | |
Figure A6 | Relationship between migration rate and clustering | Same as Figures 6 and 7 | Same as Figures 6 and 7 | Same as Figures 6 and 7 | Figure A6 | |
Figure A7 | Migrants have fewer friends of friends than non-migrants | | figure_A7.py | dest_home_d_s_l.csv | Figure A7 | |
Figure A8 | Number of friends of friends, before and after migration, shift-share approach | | figure_A8_A9.py | XXXX_call.txt, XXXX_modal_district.txt, XXXX_migration.txt | Figure A8, Figure A9 | |
Figure A9 | Percent of friends with common support, before and after migration, shift-share approach | | figure_A8_A9.py | XXXX_call.txt, XXXX_modal_district.txt, XXXX_migration.txt | Figure A8, Figure A9 | |
Figure A10 | Calibration results: marginal plots | Step 1: calculate utility | figure_A10_A11_A12_step1_calcualte_utility.py | XXXX_call.txt, XXXX_modal_district.txt, XXXX_migration.txt, XXXX_user_result.csv | all_add_feature DataFrame (not saved to a file; run step 2 directly) | The three scripts for these steps were originally intended as a single file; we split them into three parts for readability. |
Figure A10 | | Step 2: simulate | figure_A10_step2_simulate.py | all_add_feature DataFrame | A DataFrame (not saved to a file; run step 3 directly) | |
Figure A10 | | Step 3: plot | figure_A10_A12_step3_plot.py | DataFrame from step 2 | Figure A10 | |
Figure A11 | Calibration results: ‘information’ and ‘cooperation’ utility | Step 1: calculate utility | figure_A10_A11_A12_step1_calcualte_utility.py | XXXX_call.txt, XXXX_modal_district.txt, XXXX_migration.txt, XXXX_user_result.csv | all_add_feature DataFrame | |
Figure A11 | | Step 2: plot utility | figure_A11_step2_plot.py | all_add_feature DataFrame | Figure A11 | |
Figure A12 | Calibration results (with τ): marginal plots | Step 1: calculate utility | figure_A10_A11_A12_step1_calcualte_utility.py | XXXX_call.txt, XXXX_modal_district.txt, XXXX_migration.txt, XXXX_user_result.csv | all_add_feature DataFrame | |
Figure A12 | | Step 2: simulate | figure_A12_step2_simulate.py | all_add_feature DataFrame | A DataFrame (not saved to a file; run step 3 directly) | |
Figure A12 | | Step 3: plot | figure_A10_A12_step3_plot.py | DataFrame from step 2 | Figure A12 | |
Figure A13 | Validation of Migration Data - Varying Definition of Migration | Step 1: calculate the proportion of migrants to/from each district | figure_A1_A13_step1.py | XXXX_migration.txt, XXXX_migration_6month.txt, XXXX_migration_12month.txt | cdr_move_to_district_proportion_2month.csv, cdr_move_from_district_proportion_2month.csv, cdr_move_to_district_proportion_6month.csv, cdr_move_from_district_proportion_6month.csv, cdr_move_to_district_proportion_12month.csv, cdr_move_from_district_proportion_12month.csv | |
Figure A13 | | Step 2: plot the distribution | figure_A13_step2.py | cdr_move_to_district_proportion_6month.csv, cdr_move_to_district_proportion_12month.csv, census_destination_simple.csv, census_origin.csv | Figure A13 | census_destination_simple.csv and census_origin.csv are based on internal migrants reported in the 2012 Rwandan census data; see the paper for details. |
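Step 1 for Figures A1 and A13 computes, for each district, the share of migrants moving to it and the share moving from it. A minimal sketch of that computation follows; the `(user, origin, destination)` input tuples and the function name `district_proportions` are assumptions, and each move is counted once, so a subscriber who migrates twice contributes twice.

```python
from collections import Counter


def district_proportions(moves):
    """Return (to_share, from_share): each maps district -> share of all moves.

    `moves` is an iterable of (user, origin_district, destination_district).
    """
    moves = list(moves)
    if not moves:
        return {}, {}
    total = float(len(moves))  # float() keeps true division under Python 2.7
    to_counts = Counter(dest for _, _, dest in moves)
    from_counts = Counter(orig for _, orig, _ in moves)
    to_share = dict((d, c / total) for d, c in to_counts.items())
    from_share = dict((d, c / total) for d, c in from_counts.items())
    return to_share, from_share
```

Because both dictionaries sum to one, they can be compared district by district with the corresponding census shares.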
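Step 3 for Figure 6 keeps a 10% subsample to fit within computing limits. Fixing the random seed makes such a subsample reproducible across runs. The sketch below illustrates this with the standard library; the function name `sample_rows` and the seed value are hypothetical and not taken from figure_6bd_7bd_A6bd_step3_sampling.py.

```python
import random


def sample_rows(rows, fraction=0.10, seed=0):
    """Return a reproducible random subset of round(fraction * n) rows.

    Rows keep their original order. The default seed is arbitrary.
    """
    rows = list(rows)
    k = int(round(fraction * len(rows)))
    rng = random.Random(seed)  # private generator, independent of global state
    keep = set(rng.sample(range(len(rows)), k))
    return [row for i, row in enumerate(rows) if i in keep]
```

With pandas, `df.sample(frac=0.1, random_state=0)` draws a comparable seeded 10% sample of a DataFrame.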
The `tables` folder contains the code for generating all tables in the paper. The table below lists the script(s), input data, output, and notes for each table.
Table | Name | Step | Script | Input data | Output | Notes |
---|---|---|---|---|---|---|
Table 1 | Summary statistics of mobile phone metadata | | table1.py | XXXX_call.txt, XXXX_migration.txt, XXXX_user_result.csv | Table 1 | |
Table 2 | Effects of home & destination network structure on migration | Step 1: generate data for regressions | table2_step1_generate_data_for_regression.py | XXXX_migration.txt, XXXX_user_result.csv | dest_home_d_s_l.csv | |
Table 2 | | Step 2: regression | table_2_step2_regression.R | dest_home_d_s_l.csv | Table 2 | |
Table A1 | Detailed migration statistics derived from phone data, for different definitions of ‘migration’ | | table_A1.py | XXXX_migration.txt, XXXX_migration_1month.txt, XXXX_migration_3month.txt, XXXX_migration_6month.txt | Table A1 | |
Table A2 | Migration and destination network structure - Migrants only | Step 1: generate data for regressions | table2_step1_generate_data_for_regression.py | XXXX_migration.txt, XXXX_user_result.csv | dest_home_d_s_l_migrants_only.csv | Uses the same script as Table 2, with line 112 uncommented to restrict the sample to migrants. |
Table A2 | | Step 2: regression | table_A2.R | dest_home_d_s_l_migrants_only.csv | Table A2 | |
Table A3 | Heterogeneity by Migration Frequency (Repeat and First-time) | Step 1: generate data for regressions | table_A3_step1_firsttime_repeat.py | XXXX_migration.txt, XXXX_user_result.csv, dest_home_d_s_l.csv | dest_home_d_s_l_firsttime_repeat.csv | |
Table A3 | | Step 2: regression | table_A3_step2.R | dest_home_d_s_l_firsttime_repeat.csv | Table A3 | |
Table A4 | Heterogeneity by Migration Duration (Long-term vs. Short-term) | Step 1: identify long-term and short-term migrants | table_A4_step1_longterm_migrants.py | | XXXX_migration_longterm.csv, XXXX_migration_shortterm.csv | |
Table A4 | | Step 2: generate data for regressions | table_A4_step2_shorttime_longtime.py | XXXX_user_result.csv, XXXX_migration.txt, XXXX_migration_longterm.csv, XXXX_migration_shortterm.csv | dest_home_d_s_l_shorttime_longtime.csv | |
Table A4 | | Step 3: regression | table_A4_step3.R | dest_home_d_s_l_shorttime_longtime.csv, dest_home_d_s_l.csv | Table A4 | |
Table A5 | Heterogeneity by Distance (Adjacent districts vs. Non-adjacent districts) | Step 1: generate data for regressions | table_A5_step1_adjacent.py | neighbor_district.csv, dest_home_d_s_l.csv | dest_home_d_s_l_adjacent.csv | |
Table A5 | | Step 2: regression | table_A5_step2.R | dest_home_d_s_l_adjacent.csv | Table A5 | |
Table A6 | The role of recent migrants and co-migrants | Step 1: calculate recent migrants and co-migrants | table_A6_step1_network_feature_recent_migrant.py | XXXX_call.txt, XXXX_modal_district.txt, XXXX_migration.txt | XXXX_user_result_recent_migrant.csv | |
Table A6 | | Step 2: generate data for regressions | table_A6_step2_recent_migrant.py | XXXX_migration.txt, XXXX_user_result.csv, XXXX_user_result_recent_migrant.csv, dest_home_d_s_l.csv | dest_home_d_s_l_recent_migrant.csv | |
Table A6 | | Step 3: regression | table_A6_step3.R | dest_home_d_s_l_recent_migrant.csv | Table A6 | |
Table A7 | Migration and networks, controlling for prior visits to the destination | Step 1: determine whether each migrant visited the destination before migrating | table_A7_step1_migration_visit_before.py | XXXX_call.txt, XXXX_migration.txt, tower_district.csv | XXXX_migrant_if_visit_before.csv | |
Table A7 | | Step 2: generate data for regressions | table_A7_step2_visit_before.py | XXXX_migration.txt, XXXX_user_result.csv, XXXX_migrant_if_visit_before.csv | dest_home_d_s_l_visit_before.csv | |
Table A7 | | Step 3: regression | table_A7_step3.R | dest_home_d_s_l_visit_before.csv | Table A7 | |
Table A8 | The role of strong ties and weak ties | Step 1: calculate strong/weak ties | table_A8_step1_user_feature_strongtie.py | XXXX_call.txt, XXXX_migration.txt, tower_district.csv | XXXX_user_result_strongtie.csv | |
Table A8 | | Step 2: generate data for regressions | table_A8_step2_strongtie.py | XXXX_migration.txt, XXXX_user_result.csv, XXXX_user_result_strongtie.csv | dest_home_d_s_l_strongtie.csv | |
Table A8 | | Step 3: regression | table_A8_step3.R | dest_home_d_s_l_strongtie.csv | Table A8 | |
Table A9 | Disaggregating the friend of friend effect by the strength of the 2nd-degree tie | Step 1: calculate strong/weak ties for information and support | table_A9_A10_step1_strongweak.py | XXXX_call.txt, XXXX_modal_district.txt, XXXX_migration.txt | XXXX_user_result_infor_support_strong.csv | |
Table A9 | | Step 2: generate data for regressions | table_A9_step2_infor_strong.py | XXXX_migration.txt, XXXX_user_result.csv, XXXX_user_result_infor_support_strong.csv | dest_home_d_s_l_infor_strong.csv | |
Table A9 | | Step 3: regression | table_A9_step3.R | dest_home_d_s_l_infor_strong.csv | Table A9 | |
Table A10 | Disaggregating the network support effect by the strength of supported ties | Step 1: calculate strong/weak ties for information and support | table_A9_A10_step1_strongweak.py | XXXX_call.txt, XXXX_modal_district.txt, XXXX_migration.txt | XXXX_user_result_infor_support_strong.csv | |
Table A10 | | Step 2: generate data for regressions | table_A10_step2_support_strong.py | XXXX_migration.txt, XXXX_user_result.csv, XXXX_user_result_infor_support_strong.csv, XXXX_user_result_strongtie.csv | dest_home_d_s_l_support_strong.csv | XXXX_user_result_strongtie.csv is produced by step 1 of Table A8. |
Table A10 | | Step 3: regression | table_A10_step3.R | dest_home_d_s_l_support_strong.csv, dest_home_d_s_l.csv | Table A10 | |
Table A11 | Beyond location-specific subnetworks | Step 1: calculate overall information | table_A11_step1_information_overall.py | XXXX_call.txt, XXXX_modal_district.txt, XXXX_migration.txt | XXXX_information_overall_network.csv | |
Table A11 | | Step 2: generate data for regressions | table_A11_step2_beyond_location.py | XXXX_migration.txt, XXXX_user_result.csv, XXXX_information_overall_network.csv | dest_home_d_s_l_information_overall_network.csv | |
Table A11 | | Step 3: regression | table_A11_step3.R | dest_home_d_s_l_information_overall_network.csv | Table A11 | |
Table A12 | Jointly estimated effects (6 month network lag) | | table_A12.R | Same input files as Table 2 | Table A12 | Preprocessing is the same as for Table 2, except that line 119 of network_structure.py must be changed to `network_date = start_date + relativedelta(months=i - pre_month_n + 1)`. |
Table A13 | “Shift share” regression results | Step 1: calculate shift-share network features | table_A13_step1_network_feature.py | XXXX_call.txt, XXXX_modal_district.txt, XXXX_migration.txt | user_result_information_support_diff_if_remained_True_XXXX_between_XX_to_XX_months.csv | |
Table A13 | | Step 2: generate data for regressions | table_A13_step2_shift_share_data.py | XXXX_migration.txt, user_result_information_support_diff_if_remained_True_XXXX_between_XX_to_XX_months.csv | dest_home_for_regression_information_support_diff_between_XX_to_XX_month.csv | |
Table A13 | | Step 3: regression | table_A13_step3.R | dest_home_for_regression_information_support_diff_between_XX_to_XX_month.csv | Table A13 | |
Table A14 | Robustness to alternative fixed effect specifications | | table_A14.R | dest_home_d_s_l.csv | Table A14 | |
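Step 1 of Table A7 flags whether a migrant was observed in the destination district before migrating. The check can be sketched as below. The record layout, the function name `visited_before`, the dictionary standing in for tower_district.csv, and the rule "any call from a destination tower in an earlier month" are assumptions made for illustration, not the logic of table_A7_step1_migration_visit_before.py.

```python
def visited_before(observations, tower_district, migrations):
    """Flag migrants observed in their destination district before they moved.

    observations:   iterable of (user, month, tower_id) records; months must be
                    comparable (e.g. integers or "YYYY-MM" strings)
    tower_district: dict mapping tower_id -> district (a stand-in for tower_district.csv)
    migrations:     dict mapping user -> (migration_month, destination_district)
    Returns a dict mapping every migrant to True or False.
    """
    flags = dict((user, False) for user in migrations)
    for user, month, tower in observations:
        if user not in migrations or flags[user]:
            continue  # not a migrant, or already flagged
        move_month, destination = migrations[user]
        if month < move_month and tower_district.get(tower) == destination:
            flags[user] = True
    return flags
```

The loop makes a single pass over the call records, which matters when those records are large.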
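Table A5 compares migrations between adjacent and non-adjacent districts, using neighbor_district.csv. A minimal sketch of the adjacency lookup follows. The two-column layout (`district`, `neighbor`) and the function names are hypothetical, so the column names would need to be adapted to the actual file.

```python
import csv


def load_adjacent_pairs(lines):
    """Return a symmetric set of (district, neighbor) pairs.

    `lines` is any iterable of CSV text lines with a header row containing the
    hypothetical columns `district` and `neighbor` (an open file works).
    """
    pairs = set()
    for row in csv.DictReader(lines):
        a, b = row["district"].strip(), row["neighbor"].strip()
        pairs.add((a, b))
        pairs.add((b, a))  # adjacency is symmetric
    return pairs


def is_adjacent(home, destination, pairs):
    """True if the home and destination districts are listed as neighbors."""
    return (home, destination) in pairs
```

Storing both orientations of each pair means the file only needs to list each border once.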