This repository contains the crawling scripts used for the paper "You Call This Archaeology? Evaluating Web Archives for Reproducible Web Security Measurements"

cispa/internet-archive-study

You Call This Archaeology? Evaluating Web Archives for Reproducible Web Security Measurements

This repository contains the crawling scripts used for the paper "You Call This Archaeology? Evaluating Web Archives for Reproducible Web Security Measurements". It bundles the various scripts we used to collect the data. To run them, first set up a database following the structure defined in db_scheme.sql, then use main.py to start each script.
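
The database setup step can be sketched as follows. This is a minimal illustration, not the repository's own setup routine: it assumes that db_scheme.sql is SQLite-compatible, which may not hold (if the file targets another engine such as PostgreSQL, load it with that engine's client instead). The helper names init_database and list_tables are hypothetical.

```python
import sqlite3
from pathlib import Path


def init_database(db_path: str, schema_path: str = "db_scheme.sql") -> sqlite3.Connection:
    """Open (or create) a database and apply the table definitions in schema_path.

    Assumption: the schema file is SQLite-compatible. Hypothetical helper,
    not part of the repository.
    """
    schema_sql = Path(schema_path).read_text(encoding="utf-8")
    conn = sqlite3.connect(db_path)
    # executescript runs every SQL statement in the file in a single call
    conn.executescript(schema_sql)
    conn.commit()
    return conn


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all user tables, to check the schema was applied."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

For example, init_database("study.db") would create a local database file (the file name is hypothetical), and list_tables(conn) lets you confirm the expected tables exist before starting any script through main.py.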

Collection

The collection directory contains all scripts we used to collect the data on which our analysis is based. The one exception is Common Crawl, which is handled separately by the scripts in cc-scripts.

Updater

The updater directory contains all scripts we used to add further information to the existing database tables.

Utils

The utils directory contains the remaining helper scripts and supporting data.
