digitalmethodsinitiative/4cat_web_studies_extensions

Web Studies: a 4CAT Extension

Web Studies is a companion extension to the 4CAT Capture and Analysis Toolkit. It adds functionality to 4CAT by using Selenium with a Firefox browser to collect data from web sources.

Features

New datasources

General web studies

  • Selenium URL Collector
    • Collect HTML, text, and links from a list of URLs
  • Web Archive Collector
  • Screenshot Generator
    • Take screenshots of web pages

App store studies

  • Apple Store
  • Google Store

Cloud app store studies

  • Microsoft Azure App Store
  • Amazon Web Services (AWS) Marketplace
    • Collect data on AWS applications

New analysis processors

  • Take screenshots of any column containing URLs
  • Detect trackers
    • Provide a list of source code snippets to search for in collected HTML
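
The tracker detection idea above can be illustrated with a minimal sketch. This is not the extension's own implementation: the function name `detect_trackers`, the `SIGNATURES` mapping, and the example snippets are assumptions chosen for illustration, and the matching is a plain case-insensitive substring search.

```python
def detect_trackers(html, signatures):
    """Return, per tracker name, the signature snippets found in the HTML.

    `signatures` maps a tracker name to a list of source code snippets.
    Matching is a case-insensitive substring search (an assumption made
    for this sketch, not necessarily how the processor matches).
    """
    lowered = html.lower()
    found = {}
    for name, snippets in signatures.items():
        hits = [snippet for snippet in snippets if snippet.lower() in lowered]
        if hits:
            found[name] = hits
    return found


# Hypothetical signature list; supply your own snippets in practice.
SIGNATURES = {
    "Google Analytics": [
        "google-analytics.com/analytics.js",
        "googletagmanager.com/gtag/js",
    ],
    "Meta Pixel": ["connect.facebook.net"],
}

if __name__ == "__main__":
    page = '<script async src="https://www.googletagmanager.com/gtag/js?id=G-XXXX"></script>'
    print(detect_trackers(page, SIGNATURES))
```

Substring matching keeps the sketch simple; regular expressions would be a natural next step if snippets need wildcards.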

Installation

These extensions are designed to work with 4CAT v1.46 or later.

For instructions on adding the "I do not care about cookies" browser extension, see "Browser extensions" below.

Docker installation

  1. Download/clone the extensions into both the 4CAT backend and frontend containers
    • docker exec 4cat_backend git clone https://github.com/digitalmethodsinitiative/4cat_web_studies_extensions.git extensions/web_studies/
    • docker exec 4cat_frontend git clone https://github.com/digitalmethodsinitiative/4cat_web_studies_extensions.git extensions/web_studies/
  2. Restart the 4CAT containers
    • Run docker compose restart from the 4CAT directory where the docker-compose.yml and .env files were previously downloaded
    • This automatically installs the necessary dependencies, Firefox, and Geckodriver
  3. Activate the desired new datasources from the 4CAT Control Panel
    • Control Panel -> Settings -> Data sources

Direct/manual installation

  1. Download or clone this repository into the extensions folder in your 4CAT directory
    • git clone https://github.com/digitalmethodsinitiative/4cat_web_studies_extensions.git extensions/web_studies/
  2. Run 4CAT's migrate script to install the necessary packages
    • python helper-scripts/migrate.py
    • Note: fourcat_install.py is only designed to run on Linux systems. On other systems you will need to set up the following yourself:
      • Install the Python packages from requirements.txt
      • Download Firefox
      • Download the Geckodriver release compatible with that version of Firefox (https://github.com/mozilla/geckodriver/releases/)
      • Adjust the settings in the 4CAT interface via Control Panel -> Settings -> selenium to point to the Firefox and Geckodriver programs
  3. Activate the desired datasources from the 4CAT Control Panel
    • Control Panel -> Settings -> Data sources
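
For a manual setup on a non-Linux system, it can help to confirm where Firefox and Geckodriver live before entering their paths in the selenium settings. The following is a small sketch using only the Python standard library; the helper name `find_browser_tools` and the executable names `firefox` and `geckodriver` are assumptions (on some systems the binaries are named or located differently and will not be on PATH).

```python
import shutil


def find_browser_tools(names=("firefox", "geckodriver")):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_browser_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

Any path reported here is a candidate value for the corresponding selenium setting; a missing entry only means the program is not on PATH, not that it is not installed.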

Browser extensions

Some datasources/processors can make use of a Firefox extension that dismisses cookie consent pop-ups. To install it:

Docker

  1. Download the extension into the backend container (you can find the most recent version on the same addons.mozilla.org site)
    • docker exec 4cat_backend wget https://addons.mozilla.org/firefox/downloads/file/4216095/istilldontcareaboutcookies-1.1.4.xpi
  2. Enable the extension in the 4CAT Control Panel
    • Control Panel -> Settings -> selenium
    • Update "Firefox Extensions" by adding the filename to the path section
      • e.g. {"i_dont_care_about_cookies": {"path": "istilldontcareaboutcookies-1.1.4.xpi", "always_enabled": false}}
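
Because the "Firefox Extensions" setting is JSON, a stray quote or a Python-style `False` will make it invalid. A short sketch for generating the value shown above follows; the helper name `firefox_extension_setting` is hypothetical, and only the key, path, and always_enabled fields from the example are assumed.

```python
import json


def firefox_extension_setting(key, xpi_path, always_enabled=False):
    """Build the JSON text for one entry of the "Firefox Extensions" setting."""
    return json.dumps({key: {"path": xpi_path, "always_enabled": always_enabled}})


if __name__ == "__main__":
    print(firefox_extension_setting(
        "i_dont_care_about_cookies",
        "istilldontcareaboutcookies-1.1.4.xpi",
    ))
```

The printed text can be pasted into the setting; update the filename if you downloaded a different version of the extension.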
