jblumenstock/migration_networks_replication

Replication Package for: Migration and the Value of Social Networks

Oct 15, 2023

This replication package accompanies Blumenstock, J.E., Chi, G., Tan, X. (forthcoming). "Migration and the Value of Social Networks". Review of Economic Studies.

Authors

  • Joshua Blumenstock
  • Guanghua Chi
  • Xu Tan

License

The code in this repository is provided under the GNU GPL v3.0 license.

Data availability and provenance statements

Statement about rights

The author(s) of the manuscript have legitimate access to and permission to use the data used in this manuscript.

Data availability

The paper uses a mix of public survey data and private mobile phone data. The survey data were obtained from the National Institute of Statistics of Rwanda (2012, 2014).

The private data used in the paper were derived from mobile phone metadata obtained from a mobile phone operator in Rwanda. Due to privacy and confidentiality restrictions, these raw data cannot be shared publicly. The data were obtained by, and used with the permission of, Nathan Eagle ([email protected]); requests to access such data should be directed to Nathan.

The data folder contains all of the sample input and output data required to generate the figures and tables.

Computational Requirements

Our analysis was run on a 56-core Intel-based Linux server with 512 GB of memory. Most of the code can be run using the software listed below. In a few instances, noted in the tables at the end of this README, specific scripts require additional software.

  • Python 2.7.16
    • pandas 0.24.2
    • numpy 1.16.5
    • matplotlib 2.2.3
    • seaborn 0.9.0
    • dateutil 2.8.0
    • statsmodels 0.10.1
    • GraphLab-Create 2.1
    • pyspark 2.1.1
  • R 4.3.1
    • fixest 0.11.1

When performing this analysis, we installed GraphLab under Python 2.7 by running pip install GraphLab-Create. GraphLab has since been deprecated and replaced by Turi Create, which can be installed by running pip install turicreate. However, our code has not been tested on Turi Create.
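Because the analysis depends on old, pinned library versions, it can help to compare an environment against the list above before running anything. The following is a minimal sketch and not part of the replication package: PINNED copies the Python package versions listed above, while find_mismatches and the installed mapping are hypothetical names introduced for illustration.

```python
# Pinned versions copied from the list above (Python packages only).
PINNED = {
    "pandas": "0.24.2",
    "numpy": "1.16.5",
    "matplotlib": "2.2.3",
    "seaborn": "0.9.0",
    "dateutil": "2.8.0",
    "statsmodels": "0.10.1",
    "GraphLab-Create": "2.1",
    "pyspark": "2.1.1",
}


def find_mismatches(installed, pinned=PINNED):
    """Return {package: (pinned_version, installed_version_or_None)} for every
    package whose installed version differs from the pinned one."""
    mismatches = {}
    for name, wanted in sorted(pinned.items()):
        found = installed.get(name)  # None means "not installed"
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # Hypothetical environment: one matching package, one newer, the rest missing.
    example = {"pandas": "0.24.2", "numpy": "1.17.0"}
    for name, (wanted, found) in sorted(find_mismatches(example).items()):
        print("%s: expected %s, found %s" % (name, wanted, found))
```

The installed mapping could be built by hand or from the output of pip freeze. Note that dateutil is distributed on PyPI under the name python-dateutil, so a mapping built that way would need the key renamed.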

Description of programs/code

The repository contains three folders:

  • data: Contains all of the sample input and output data required to generate the figures and tables
  • figures: Contains the code needed to generate the figures
  • tables: Contains the code needed to generate the tables

The top-level directory also contains three scripts that are used to detect migrations and calculate network statistics for each mobile subscriber. The outputs of these scripts provide the input for most of the tables and figures in the paper.

| Name | Step | Script | Input data | Output | Notes |
|---|---|---|---|---|---|
| Detecting migrants | step 1: calculate modal districts for each month | modal_district.py | XXXX_mobility.txt | XXXX_modal_district.txt | pyspark is required to run this script. Run this script within Spark Shell (./bin/spark-shell) |
| | step 2: detect migration type of each person (urban to rural, rural to urban, rural to rural, remains, roamers) | migration_type.py | XXXX_modal_district.txt | XXXX_migration.txt; XXXX_migration_XXmonth.txt | pyspark is required to run this script. Run this script within Spark Shell (./bin/spark-shell). You can modify the script to use 3-, 6-, and 12-month definitions of migration. |
| Network structure | calculate network structures for each person, such as degree, support, information | network_structure.py | XXXX_call.txt; XXXX_modal_district.txt; XXXX_migration.txt | XXXX_user_result.csv | graphlab and snap are required to run the script |
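To make the two migrant-detection steps concrete, here is a minimal pure-Python sketch of the idea. It is not the authors' pyspark code: the record layout (user, 'YYYY-MM-DD', district), the function names, and the k-month rule (k consecutive months in one district followed by k consecutive months in a different one) are assumptions for illustration, and the urban/rural classification of each move is omitted.

```python
from collections import Counter, defaultdict


def modal_districts(records):
    """Map each (user, 'YYYY-MM') pair to the district observed most often.

    `records` is an iterable of (user, 'YYYY-MM-DD', district) tuples; this
    layout is assumed, not taken from XXXX_mobility.txt.
    """
    counts = defaultdict(Counter)
    for user, date, district in records:
        counts[(user, date[:7])][district] += 1
    # Ties are broken by the smallest district name so results are deterministic.
    return {
        key: min(c, key=lambda d: (-c[d], d))
        for key, c in counts.items()
    }


def detect_migrations(monthly, k=2):
    """Find moves under an assumed k-month rule.

    `monthly` lists one user's modal districts, one entry per consecutive
    month. A move is reported at index i when months i-k..i-1 all share one
    district and months i..i+k-1 all share a different one.
    Returns a list of (month_index, origin, destination) tuples.
    """
    moves = []
    for i in range(k, len(monthly) - k + 1):
        before = set(monthly[i - k:i])
        after = set(monthly[i:i + k])
        if len(before) == 1 and len(after) == 1 and before != after:
            moves.append((i, monthly[i - 1], monthly[i]))
    return moves
```

In this sketch, changing k plays the role of switching between the 3-, 6-, and 12-month definitions of migration mentioned in the notes for step 2.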

List of tables and figures

The provided code reproduces all tables and figures in the paper.

| Figure | Name | Step | Script | Input data | Output | Note |
|---|---|---|---|---|---|---|
| Figure 1 | Schematic diagrams of the social networks of three migrants | N/A | N/A | N/A | N/A | It's just a schematic diagram. No code or data is required. |
| Figure 2 | The social network of a single migrant | N/A | N/A | N/A | N/A | Gephi is used to visualize this network. |
| Figure 3 | Location of all mobile phone towers in Rwanda, circa 2008 | N/A | N/A | tower_district.csv | N/A | QGIS is used to map the tower location. |
| Figure 4 | Changes in network structure over time | | figure_4_A2_A3.py | XXXX_migration.txt; XXXX_user_result.csv | Figure 4; Figure A2; Figure A3 | |
| Figure 5 | Migration and degree centrality | | figure_5_6ac_7ac.py | XXXX_migration.txt; XXXX_user_result.csv | Figure 5 | |
| Figure 6 | Migration and network “interconnectedness” | step 1: generate rate for the regression | figure_6bd_7bd_A6bd_step1_generate_data_for_regression.py | XXXX_migration.txt; XXXX_user_result.csv | cluster_dest_for_regression.csv; information_dest_for_regression.csv; support_dest_for_regression.csv | |
| | | step 2: convert the data format for regression | figure_6bd_7bd_A6bd_step2_convert_data.py | cluster_dest_for_regression.csv; information_dest_for_regression.csv; support_dest_for_regression.csv | cluster_dest_for_regression_22degree.csv; information_dest_for_regression_22degree.csv; support_dest_for_regression_22degree.csv | graphlab is required to run the script |
| | | step 3: sample the data because of computing resource constraints | figure_6bd_7bd_A6bd_step3_sampling.py | cluster_dest_for_regression_22degree.csv; information_dest_for_regression_22degree.csv; support_dest_for_regression_22degree.csv | cluster_dest_for_regression_22degree_sampled_10pct.csv; information_dest_for_regression_22degree_sampled_10pct.csv; support_dest_for_regression_22degree_sampled_10pct.csv | |
| | | step 4: calculate coefficient and confidence interval | figure_6bd_7bd_A6bd_step4_regression.R | cluster_dest_for_regression_22degree_sampled_10pct.csv; information_dest_for_regression_22degree_sampled_10pct.csv; support_dest_for_regression_22degree_sampled_10pct.csv | inset_XXXXX_coef_XXXX.csv; inset_XXXXX_se_XXX_XXXX.csv | Only the destination information files are included in the repository. The other files have the same format. |
| | | step 5: plot the figure | figure_5_6ac_7ac.py, figure_6bd_7bd_A6bd_step5_plot.py | inset_XXXXX_coef_XXXX.csv; inset_XXXXX_se_XXX_XXXX.csv | Figure 6 | |
| Figure 7 | Relationship between migration and “extensiveness” | same as Figure 6 | same as Figure 6 | inset_XXXXX_coef_XXXX.csv; inset_XXXXX_se_XXX_XXXX.csv | Figure 7 | |
| Figure 8 | The role of (higher order) strong and weak ties in a migrant’s network | N/A | N/A | N/A | N/A | |
| Figure A1 | Validation of Migration Data | step 1: calculate the proportion of migrants to/from each district | figure_A1_A13_step1.py | XXXX_migration.txt; XXXX_migration_12month.txt | cdr_move_to_district_proportion_2month.csv; cdr_move_from_district_proportion_2month.csv; cdr_move_to_district_proportion_12month.csv; cdr_move_from_district_proportion_12month.csv | |
| | | step 2: plot the distribution | figure_A1_step2.py | cdr_move_to_district_proportion_2month.csv; cdr_move_from_district_proportion_2month.csv; census_destination_simple.csv; census_origin.csv | Figure A1 | The last two input files come from the internal migrants reported in the 2012 Rwandan census data; see the paper for details. |
| Figure A2 | Changes in number of contacts over time | | figure_4_A2_A3.py | XXXX_migration.txt; XXXX_user_result.csv | Figure 4; Figure A2; Figure A3 | |
| Figure A3 | Changes in number of calls over time | | figure_4_A2_A3.py | XXXX_migration.txt; XXXX_user_result.csv | Figure 4; Figure A2; Figure A3 | |
| Figure A4 | Number of friends of friends, before and after migration (migrants) | | figure_A4_A5.py | XXXX_migration.txt; XXXX_user_result.csv | Figure A4; Figure A5 | |
| Figure A5 | Percent of friends with common support, before and after migration (migrants) | | figure_A4_A5.py | XXXX_migration.txt; XXXX_user_result.csv | Figure A4; Figure A5 | |
| Figure A6 | Relationship between migration rate and clustering | same as Figure 6 and 7 | same as Figure 6 and 7 | | Figure A6 | |
| Figure A7 | Migrants have fewer friends of friends than non-migrants | | figure_A7.py | dest_home_d_s_l.csv | Figure A7 | |
| Figure A8 | Number of friends of friends, before and after migration, shift-share approach | | figure_A8_A9.py | XXXX_call.txt; XXXX_modal_district.txt; XXXX_migration.txt | Figure A8; Figure A9 | |
| Figure A9 | Percent of friends with common support, before and after migration, shift-share approach | | figure_A8_A9.py | XXXX_call.txt; XXXX_modal_district.txt; XXXX_migration.txt | Figure A8; Figure A9 | |
| Figure A10 | Calibration results: marginal plots | step 1: calculate utility | figure_A10_A11_A12_step1_calcualte_utility.py | XXXX_call.txt; XXXX_modal_district.txt; XXXX_migration.txt; XXXX_user_result.csv | all_add_feature dataframe (not saved into a file, directly run step 2) | The scripts for these three steps were originally a single file. We split it into three parts to make it easier to understand. |
| | | step 2: simulate | figure_A10_step2_simulate.py | all_add_feature dataframe | a dataframe (not saved into a file, directly run step 3) | |
| | | step 3: plot | figure_A10_A12_step3_plot.py | a dataframe | Figure A10 | |
| Figure A11 | Calibration results: ‘information’ and ‘cooperation’ utility | step 1: calculate utility | figure_A10_A11_A12_step1_calcualte_utility.py | XXXX_call.txt; XXXX_modal_district.txt; XXXX_migration.txt; XXXX_user_result.csv | all_add_feature dataframe | |
| | | step 2: plot utility | figure_A11_step2_plot.py | all_add_feature dataframe | Figure A11 | |
| Figure A12 | Calibration results (with τ): marginal plots | step 1: calculate utility | figure_A10_A11_A12_step1_calcualte_utility.py | XXXX_call.txt; XXXX_modal_district.txt; XXXX_migration.txt; XXXX_user_result.csv | all_add_feature dataframe | |
| | | step 2: simulate | figure_A12_step2_simulate.py | all_add_feature dataframe | a dataframe (not saved into a file, directly run step 3) | |
| | | step 3: plot | figure_A10_A12_step3_plot.py | a dataframe | Figure A12 | |
| Figure A13 | Validation of Migration Data - Varying Definition of Migration | step 1: calculate the proportion of migrants to/from each district | figure_A1_A13_step1.py | XXXX_migration.txt; XXXX_migration_6month.txt; XXXX_migration_12month.txt | cdr_move_to_district_proportion_2month.csv; cdr_move_from_district_proportion_2month.csv; cdr_move_to_district_proportion_6month.csv; cdr_move_from_district_proportion_6month.csv; cdr_move_to_district_proportion_12month.csv; cdr_move_from_district_proportion_12month.csv | |
| | | step 2: plot the distribution | figure_A13_step2.py | cdr_move_to_district_proportion_6month.csv; cdr_move_to_district_proportion_12month.csv; census_destination_simple.csv; census_origin.csv | Figure A13 | The last two input files come from the internal migrants reported in the 2012 Rwandan census data; see the paper for details. |
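Step 3 of Figure 6 draws a 10% sample of each regression file so that step 4 stays tractable. A minimal sketch of such a draw follows, using only the standard library. The function name sample_rows, the fixed seed, and the choice of simple random sampling are assumptions; the authors' script may sample differently.

```python
import random


def sample_rows(rows, fraction=0.10, seed=0):
    """Return a reproducible simple random sample of `rows`, in original order.

    The sample size is round(fraction * len(rows)); a fixed seed makes
    repeated runs select the same rows.
    """
    rows = list(rows)
    n = int(round(fraction * len(rows)))
    rng = random.Random(seed)
    # Sample positions rather than rows so the original order can be restored.
    keep = sorted(rng.sample(range(len(rows)), n))
    return [rows[i] for i in keep]
```

Fixing the seed matters for replication: without it, each run of the sampling step would hand a different subset to the regression in step 4, and the plotted coefficients would vary slightly between runs.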

The tables folder contains the code that generates the tables in the paper. The table below lists the script, input data, output, and notes for each table.

| Table | Name | Step | Script | Input data | Output | Note |
|---|---|---|---|---|---|---|
| Table 1 | Summary statistics of mobile phone metadata | | table1.py | XXXX_call.txt; XXXX_migration.txt; XXXX_user_result.csv | Table 1 | |
| Table 2 | Effects of home & destination network structure on migration | step 1: generate data for regressions | table2_step1_generate_data_for_regression.py | XXXX_migration.txt; XXXX_user_result.csv | dest_home_d_s_l.csv | |
| | | step 2: regression | table_2_step2_regression.R | dest_home_d_s_l.csv | Table 2 | |
| Table A1 | Detailed migration statistics derived from phone data, for different definitions of ‘migration’ | | table_A1.py | XXXX_migration.txt; XXXX_migration_1month.txt; XXXX_migration_3month.txt; XXXX_migration_6month.txt | Table A1 | |
| Table A2 | Migration and destination network structure - Migrants only | step 1: generate data for regressions | table2_step1_generate_data_for_regression.py | XXXX_migration.txt; XXXX_user_result.csv | dest_home_d_s_l_migrants_only.csv | Use the same script as Table 2, but uncomment line 112 to get migrants only. |
| | | step 2: regression | table_A2.R | dest_home_d_s_l_migrants_only.csv | Table A2 | |
| Table A3 | Heterogeneity by Migration Frequency (Repeat and First-time) | step 1: generate data for regressions | table_A3_step1_firsttime_repeat.py | XXXX_migration.txt; XXXX_user_result.csv; dest_home_d_s_l.csv | dest_home_d_s_l_firsttime_repeat.csv | |
| | | step 2: regression | table_A3_step2.R | dest_home_d_s_l_firsttime_repeat.csv | Table A3 | |
| Table A4 | Heterogeneity by Migration Duration (Long-term vs. Short-term) | step 1: calculate longterm and shortterm migrants | table_A4_step1_longterm_migrants.py | | XXXX_migration_longterm.csv; XXXX_migration_shortterm.csv | |
| | | step 2: generate data for regressions | table_A4_step2_shorttime_longtime.py | XXXX_user_result.csv; XXXX_migration.txt; XXXX_migration_longterm.csv; XXXX_migration_shortterm.csv | dest_home_d_s_l_shorttime_longtime.csv | |
| | | step 3: regression | table_A4_step3.R | dest_home_d_s_l_shorttime_longtime.csv; dest_home_d_s_l.csv | Table A4 | |
| Table A5 | Heterogeneity by Distance (Adjacent districts vs. Non-adjacent districts) | step 1: generate data for regressions | table_A5_step1_adjacent.py | neighbor_district.csv; dest_home_d_s_l.csv | dest_home_d_s_l_adjacent.csv | |
| | | step 2: regression | table_A5_step2.R | dest_home_d_s_l_adjacent.csv | Table A5 | |
| Table A6 | The role of recent migrants and co-migrants | step 1: calculate recent migrants and co-migrants | table_A6_step1_network_feature_recent_migrant.py | XXXX_call.txt; XXXX_modal_district.txt; XXXX_migration.txt | XXXX_user_result_recent_migrant.csv | |
| | | step 2: generate data for regressions | table_A6_step2_recent_migrant.py | XXXX_migration.txt; XXXX_user_result.csv; XXXX_user_result_recent_migrant.csv; dest_home_d_s_l.csv | dest_home_d_s_l_recent_migrant.csv | |
| | | step 3: regression | table_A6_step3.R | dest_home_d_s_l_recent_migrant.csv | Table A6 | |
| Table A7 | Migration and networks, controlling for prior visits to the destination | step 1: calculate if a migrant visited the destination before | table_A7_step1_migration_visit_before.py | XXXX_call.txt; XXXX_migration.txt; tower_district.csv | XXXX_migrant_if_visit_before.csv | |
| | | step 2: generate data for regressions | table_A7_step2_visit_before.py | XXXX_migration.txt; XXXX_user_result.csv; XXXX_migrant_if_visit_before.csv | dest_home_d_s_l_visit_before.csv | |
| | | step 3: regression | table_A7_step3.R | dest_home_d_s_l_visit_before.csv | Table A7 | |
| Table A8 | The role of strong ties and weak ties | step 1: calculate strong/weak ties | table_A8_step1_user_feature_strongtie.py | XXXX_call.txt; XXXX_migration.txt; tower_district.csv | XXXX_user_result_strongtie.csv | |
| | | step 2: generate data for regressions | table_A8_step2_strongtie.py | XXXX_migration.txt; XXXX_user_result.csv; XXXX_user_result_strongtie.csv | dest_home_d_s_l_strongtie.csv | |
| | | step 3: regression | table_A8_step3.R | dest_home_d_s_l_strongtie.csv | Table A8 | |
| Table A9 | Disaggregating the friend of friend effect by the strength of the 2nd-degree tie | step 1: calculate strong/weak ties for information and support | table_A9_A10_step1_strongweak.py | XXXX_call.txt; XXXX_modal_district.txt; XXXX_migration.txt | XXXX_user_result_infor_support_strong.csv | |
| | | step 2: generate data for regressions | table_A9_step2_infor_strong.py | XXXX_migration.txt; XXXX_user_result.csv; XXXX_user_result_infor_support_strong.csv | dest_home_d_s_l_infor_strong.csv | |
| | | step 3: regression | table_A9_step3.R | dest_home_d_s_l_infor_strong.csv | Table A9 | |
| Table A10 | Disaggregating the network support effect by the strength of supported ties | step 1: calculate strong/weak ties for information and support | table_A9_A10_step1_strongweak.py | XXXX_call.txt; XXXX_modal_district.txt; XXXX_migration.txt | XXXX_user_result_infor_support_strong.csv | |
| | | step 2: generate data for regressions | table_A10_step2_support_strong.py | XXXX_migration.txt; XXXX_user_result.csv; XXXX_user_result_infor_support_strong.csv; XXXX_user_result_strongtie.csv | dest_home_d_s_l_support_strong.csv | |
| | | step 3: regression | table_A10_step3.R | dest_home_d_s_l_support_strong.csv; dest_home_d_s_l.csv | Table A10 | |
| Table A11 | Beyond location-specific subnetworks | step 1: calculate overall information | table_A11_step1_information_overall.py | XXXX_call.txt; XXXX_modal_district.txt; XXXX_migration.txt | XXXX_information_overall_network.csv | |
| | | step 2: generate data for regressions | table_A11_step2_beyond_location.py | XXXX_migration.txt; XXXX_user_result.csv; XXXX_information_overall_network.csv | dest_home_d_s_l_information_overall_network.csv | |
| | | step 3: regression | table_A11_step3.R | dest_home_d_s_l_information_overall_network.csv | Table A11 | |
| Table A12 | Jointly estimated effects (6 month network lag) | | table_A12.R | same input files as Table 2 | Table A12 | Preprocessing steps are the same as Table 2. Only need to change line 119 in network_structure.py to network_date = start_date + relativedelta(months=i - pre_month_n + 1) |
| Table A13 | “Shift share” regression results | step 1: calculate shift share network features | table_A13_step1_network_feature.py | XXXX_call.txt; XXXX_modal_district.txt; XXXX_migration.txt | user_result_information_support_diff_if_remained_True_XXXX_between_XX_to_XX_months.csv | |
| | | step 2: generate data for regressions | table_A13_step2_shift_share_data.py | XXXX_migration.txt; user_result_information_support_diff_if_remained_True_XXXX_between_XX_to_XX_months.csv | dest_home_for_regression_information_support_diff_between_XX_to_XX_month.csv | |
| | | step 3: regression | table_A13_step3.R | dest_home_for_regression_information_support_diff_between_XX_to_XX_month.csv | Table A13 | |
| Table A14 | Robustness to alternative fixed effect specifications | | table_A14.R | dest_home_d_s_l.csv | Table A14 | |
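Several tables are produced by multi-step pipelines in which each step consumes files written by earlier steps. A small check like the following can confirm that the steps are run in a workable order. It is a sketch only: STEPS_A6 transcribes the Table A6 rows above under the reading that the last CSV listed for each step is its output, and first_missing_input is a hypothetical helper, not part of the package.

```python
# (script, inputs, outputs) for Table A6, transcribed from the rows above.
STEPS_A6 = [
    ("table_A6_step1_network_feature_recent_migrant.py",
     ["XXXX_call.txt", "XXXX_modal_district.txt", "XXXX_migration.txt"],
     ["XXXX_user_result_recent_migrant.csv"]),
    ("table_A6_step2_recent_migrant.py",
     ["XXXX_migration.txt", "XXXX_user_result.csv",
      "XXXX_user_result_recent_migrant.csv", "dest_home_d_s_l.csv"],
     ["dest_home_d_s_l_recent_migrant.csv"]),
    ("table_A6_step3.R",
     ["dest_home_d_s_l_recent_migrant.csv"],
     []),  # the output of the last step is Table A6 itself, not a data file
]


def first_missing_input(steps, available):
    """Walk `steps` in order and return (script, missing_file) for the first
    input that is neither in `available` nor produced by an earlier step.
    Returns None when every input is accounted for."""
    have = set(available)
    for script, inputs, outputs in steps:
        for name in inputs:
            if name not in have:
                return (script, name)
        have.update(outputs)
    return None
```

For Table A6, the starting set would need to include XXXX_user_result.csv (written by network_structure.py) and dest_home_d_s_l.csv (written by step 1 of Table 2), so those scripts have to run first.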
