This repository contains code and data accompanying the publication "Linking User Opinion Dynamics and Online Discussions" [Largeron et al, '21].
[Largeron et al, '21] Largeron, C., Mardale, A., & Rizoiu, M.-A. Linking User Opinion Dynamics and Online Discussions. In Proceedings of the Symposium on Intelligent Data Analysis, 2021.
This repository contains the following data – the Reddit discussions around Brexit (submissions and comments):
– contains the Reddit submissions (posts) that initiate discussion threads around Brexit (CSV compressed using LZMA). Lines are individual submissions, columns are features of the submissions (e.g., author, text, URL, etc.). The function readSubmissions() in the file utils.R reads submissions into an R object.
Data/diffusions_comments_extra.csv.xz – contains the comments to each of the Reddit submissions contained in the above file (CSV compressed using LZMA). Lines are individual comments, columns are features similar to those of the submissions. The function readComments() in the file utils.R reads comments into an R object. Comments and submissions can be merged using the function mergeSubmissionsAndComments().
– contains the Brexit stance detector: a trained Naive Bayes model (trained on Twitter data) for labeling whether a text is Pro- or Against-Brexit.
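For readers who want to inspect the data outside R, the read-and-merge steps described above can be sketched in Python with the standard library only. This is a minimal sketch, not a port of the functions in utils.R: the function names `read_xz_csv` and `merge_submissions_and_comments`, and the column names `id` and `submission_id`, are assumptions made for illustration and must be adapted to the actual CSV headers.

```python
import csv
import lzma
from collections import defaultdict


def read_xz_csv(path):
    """Read an LZMA-compressed CSV file into a list of dicts, one per row."""
    # lzma.open in text mode decompresses on the fly;
    # newline="" is the setting the csv module expects for file objects.
    with lzma.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


def merge_submissions_and_comments(submissions, comments,
                                   sub_key="id", com_key="submission_id"):
    """Attach to each submission the list of comments that reference it.

    sub_key and com_key are hypothetical column names, not taken from
    the repository; replace them with the real header names.
    """
    # Group comments by the submission they belong to.
    by_submission = defaultdict(list)
    for comment in comments:
        by_submission[comment[com_key]].append(comment)

    # Return new dicts so the input rows are left unmodified.
    return [
        {**submission, "comments": by_submission.get(submission[sub_key], [])}
        for submission in submissions
    ]
```

Under these assumptions, `read_xz_csv("Data/diffusions_comments_extra.csv.xz")` would return the comments as a list of rows, and the merge yields one record per submission with its comments nested under a `comments` key. Submissions without comments keep an empty list.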
We also provide the following code scripts:
– Python script to crawl the r/brexit subreddit. Creates the submissions and comments files described above.
– R script that starts from data/all_users_data.csv.xz and builds the profession profiles (stored in the file data/profession-profiles.csv).
– R scripts to build the textual description (FS0) and activity descriptors (FS1 to FS3) to predict the future Brexit stance (see paper for details). These scripts generate the files Data/feature-sets/FX_improved_data.csv (where X is 0-3), which are the datasets used to train the next-stance classifiers.
scripts/library_loader.R – R script to load all required libraries for execution and check their versions.
scripts/utils.R – additional functions for reading, writing data and plotting.
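Because the feature-set files follow the naming pattern FX_improved_data.csv with X from 0 to 3, it can be handy to check which of them have already been generated before training. The following is a minimal sketch under stated assumptions: it takes the repository root as its starting directory, and the helper names `feature_set_paths` and `missing_feature_sets` are hypothetical and not part of the repository.

```python
from pathlib import Path

# X = 0..3, i.e. the feature sets FS0 to FS3 named in the list above.
FEATURE_SET_IDS = range(4)


def feature_set_paths(root="."):
    """Map each feature-set id X to its expected CSV path under root."""
    base = Path(root) / "Data" / "feature-sets"
    return {x: base / f"F{x}_improved_data.csv" for x in FEATURE_SET_IDS}


def missing_feature_sets(root="."):
    """Return the ids of feature sets whose CSV file does not exist yet."""
    return [x for x, path in feature_set_paths(root).items()
            if not path.is_file()]
```

Calling `missing_feature_sets()` from the repository root returns an empty list once all four files are present. Any id it does return points at a feature-set script that still needs to be run.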
Both the data set and the code are distributed under the GNU General Public License v3 (GPLv3), a copy of which is included in this repository in the LICENSE file. If you require a different license, or for any other questions, please contact us at [email protected]