Predicting Citation Counts with a Neural Network

This is the code used for this paper. The results from the paper can be reproduced as follows:

1. Download the data files. We plan to make these available on a webserver in the future; for now, you can ask us for them. Save the following files in data/arxiv/keywords-backend/:

   ```
   papers
   paper_topics
   all_lengths.json
   broadness_lda
   ```

   and these in data/arxiv/thomsonreuters/:

   ```
   JournalHomeGrid-2001.csv
   ...
   JournalHomeGrid-2009.csv
   ```
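   If the directories do not exist yet, they can be created up front; a minimal shell sketch (paths as listed above):

   ```bash
   # Create the expected data directories, then copy the downloaded files in.
   mkdir -p data/arxiv/keywords-backend data/arxiv/thomsonreuters
   ```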
2. Set up a MySQL database and save the connection data in settings_private.py:

   ```python
   DB_PASS = '...'
   DB_USER = '...'
   DB_HOST = '...'
   DB_NAME = '...'
   ```
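   For example, the file can be written from the shell like this (all values are placeholders; fill in your own connection data):

   ```bash
   # Example only: create settings_private.py with placeholder connection data.
   printf "%s\n" \
       "DB_PASS = '...'" \
       "DB_USER = '...'" \
       "DB_HOST = 'localhost'" \
       "DB_NAME = '...'" > settings_private.py
   ```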
3. Create the database structure and import the arXiv, Paperscape, and JIF data:

   ```
   mysql < database_structure.sql
   python arxiv_importer.py
   python paperscape_importer.py
   python jif_importer.py
   ```
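   Note that the mysql client may need the connection data passed explicitly, for example:

   ```bash
   # Load the database schema; -p prompts for the password.
   # Substitute the host, user, and database name from settings_private.py.
   mysql -h DB_HOST -u DB_USER -p DB_NAME < database_structure.sql
   ```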
4. Run the pre-processing:

   ```
   python analysis.py
   python net.py
   ```
5. Run the following SQL command:

   ```sql
   UPDATE analysissingle512_authors SET train_real = train;
   ```
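   This can also be run non-interactively with the mysql client's -e flag:

   ```bash
   # Execute the UPDATE directly from the shell; substitute your connection data.
   mysql -h DB_HOST -u DB_USER -p DB_NAME \
       -e "UPDATE analysissingle512_authors SET train_real = train"
   ```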
6. Generate the cross-validation groups and prepare the x and y data:

   ```
   python run_local.py prepare
   ```
7. Train the neural network and random forest models for each cross-validation round $i (0 to 19):

   ```
   python run_cluster.py train-rf $i
   python run_cluster.py train-net $i
   ```
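   One way to run all 20 rounds sequentially on a single machine (the script is presumably meant to be submitted as one cluster job per round):

   ```bash
   # Train the random forest and neural network for every cross-validation round.
   for i in $(seq 0 19); do
       python run_cluster.py train-rf "$i"
       python run_cluster.py train-net "$i"
   done
   ```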
8. Evaluate the trained models as well as some naive baseline models for each $i, then summarize the results:

   ```
   python run_local.py evaluate-rf --i $i
   python run_local.py evaluate-net --i $i
   python run_local.py evaluate-linear-naive --i $i
   python run_local.py summarize
   ```
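   Again, the per-round evaluations can be run in a simple loop, with the summary step once at the end:

   ```bash
   # Evaluate all models for each cross-validation round, then summarize.
   for i in $(seq 0 19); do
       python run_local.py evaluate-rf --i "$i"
       python run_local.py evaluate-net --i "$i"
       python run_local.py evaluate-linear-naive --i "$i"
   done
   python run_local.py summarize
   ```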

The summary files will be placed in data/analysissingle512/evaluate/no-max-hindex; the results for each individual trained model will be placed in data/analysissingle512/evaluate/no-max-hindex/task-results.
