Code accompanying the paper "What Drives Online Popularity: Author, Content or Sharers? Estimating Spread Dynamics with Bayesian Mixture Hawkes" [Calderon P, Rizoiu MA].
This repo contains data and the Stan and Python implementation of the Bayesian Mixture Hawkes (BMH) model in the paper. The repo is divided into two parts: model_inference/ and headline_style_profiling/.
model_inference/ contains the Stan and Python implementation of the BMH-P (popularity) and BMH-K (kernel) models. There are three subdirectories:
- bmhp/ contains the BMH-P implementation and RNIX/CNIX cascade sizes.
- bmhk_rnix/ contains the BMH-K implementation and RNIX inter-arrival distribution data.
- bmhk_cnix/ contains the BMH-K implementation and CNIX inter-arrival distribution data.
Each model_inference/bmh*/ folder consists of the following folders:
- data/ contains the pertinent data files for fitting.
- stan_files/ contains the Stan code for the BMH model variants.
- log/ holds the log files written during runs.
- output/ holds (fitting) output files from running run_bayesianfit.sh.
- preds/ holds (prediction) output files from running run_bayesianpred.sh.
- tracker/ stores tracker files when running run_bayesianfit.sh on PBS.
- tracker_pred/ stores tracker files when running run_bayesianpred.sh on PBS.
- metrics_mean/ and preds_mean/ contain holdout likelihood and cascade size predictions for BMH-K and BMH-P, respectively.
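
Before launching a run, it can help to confirm that a model folder has the layout listed above. The following is a minimal sketch, not part of the repo: the function name `missing_dirs` and the constant `COMMON_DIRS` are hypothetical, and metrics_mean/ and preds_mean/ are left out on the assumption that their presence may depend on the model variant.

```python
from pathlib import Path

# Subfolders listed above for each model_inference/bmh*/ folder.
# metrics_mean/ and preds_mean/ are omitted here (assumption: they may
# only exist for the variant they apply to).
COMMON_DIRS = [
    "data",
    "stan_files",
    "log",
    "output",
    "preds",
    "tracker",
    "tracker_pred",
]


def missing_dirs(model_dir):
    """Return the names in COMMON_DIRS that are not directories under model_dir."""
    root = Path(model_dir)
    return [name for name in COMMON_DIRS if not (root / name).is_dir()]
```

For example, `missing_dirs("model_inference/bmhp")` returns an empty list when every listed folder is present, and the names of the absent folders otherwise.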
Note that the data files in bayesian-mixture-hawkes/model_inference/bmhk_cnix/data/, bayesian-mixture-hawkes/model_inference/bmhk_rnix/data/ and bayesian-mixture-hawkes/model_inference/bmhp/data/ are clipped due to storage issues. Please adjust accordingly. Full datasets can be provided upon request.
Pipeline to run inference and prediction:
- Compile the Stan models using compile_models.py.
- Run BMH fitting on data with multiple_run_bayesianfit.sh.
- Obtain BMH predictions with multiple_run_bayesianpred.sh.
- Run collect_mean.py to get mean metrics.
- (BMH-P) Run collect_pred.py to get cascade size predictions.
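
The order of the steps above can be sketched as a small Python driver. This is an illustration under stated assumptions, not the repo's own tooling: it assumes each script sits in the model folder and takes no arguments, the names `PIPELINE`, `plan` and `run_pipeline` are hypothetical, and it defaults to a dry run that only prints the commands. If the shell scripts submit PBS jobs, each step still has to finish before the next one is started, which this sketch does not check.

```python
import subprocess
import sys

# Steps in the order listed above. Calling each script with no arguments
# is an assumption; check the scripts for the options they expect.
PIPELINE = [
    ("compile", [sys.executable, "compile_models.py"]),
    ("fit", ["bash", "multiple_run_bayesianfit.sh"]),
    ("predict", ["bash", "multiple_run_bayesianpred.sh"]),
    ("mean_metrics", [sys.executable, "collect_mean.py"]),
    ("size_predictions", [sys.executable, "collect_pred.py"]),  # BMH-P only
]


def plan(is_bmhp):
    """Return the (name, command) steps, dropping the BMH-P-only step for BMH-K."""
    return [(name, cmd) for name, cmd in PIPELINE
            if is_bmhp or name != "size_predictions"]


def run_pipeline(model_dir, is_bmhp, dry_run=True):
    """Print each step and, unless dry_run is set, run it inside model_dir."""
    done = []
    for name, cmd in plan(is_bmhp):
        print(f"[{name}] {' '.join(cmd)} (cwd={model_dir})")
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, cwd=model_dir, check=True)
        done.append(name)
    return done
```

Calling `run_pipeline("model_inference/bmhp", is_bmhp=True)` prints the five commands without executing them; passing `dry_run=False` runs them in sequence.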
headline_style_profiling/ contains the Python resources for the headline style profiling case study in the paper. There are two subdirectories:
- inflammatory_fakenix/ contains resources for the CNIX analysis.
- inflammatory_rnix/ contains resources for the RNIX analysis.
Raw headline data is in inflammatory_fakenix/data/.
Code to generate the graphs in the case study is in inflammatory_rnix/theta/combined_analysis.ipynb.
Both the dataset and the code are distributed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. If you require a different license, please contact us at [email protected] or [email protected].