DSCC PDF Pipeline

A Python-based pipeline that prepares scanned PDFs in the DSCC collection for publication.

Pipeline description

Image correction -> OCR -> PDF resizing -> Cover page addition -> Metadata embedding -> Final PDF output
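
The stage order above can be sketched as a chain of functions, each taking the previous stage's output. This is only an illustration of the flow, not the code in src/: `make_stage`, `STAGES`, and `run_pipeline` are hypothetical names, and each placeholder stage only tags the filename instead of modifying the PDF.

```python
from functools import reduce
from typing import Callable, List

# Hypothetical stage type: takes the path of a PDF, returns the path of the result.
Stage = Callable[[str], str]


def make_stage(suffix: str) -> Stage:
    """Build a placeholder stage that only tags the filename.

    A real stage would transform the PDF and write a new file.
    """
    def stage(path: str) -> str:
        stem = path[:-4] if path.lower().endswith(".pdf") else path
        return f"{stem}.{suffix}.pdf"
    return stage


# Order follows the pipeline description; the suffixes are illustrative.
STAGES: List[Stage] = [
    make_stage("corrected"),  # image correction
    make_stage("ocr"),        # OCR
    make_stage("resized"),    # PDF resizing
    make_stage("cover"),      # cover page addition
    make_stage("meta"),       # metadata embedding
]


def run_pipeline(path: str, stages: List[Stage] = STAGES) -> str:
    """Feed the output of each stage into the next, left to right."""
    return reduce(lambda current, stage: stage(current), stages, path)


print(run_pipeline("scan.pdf"))
```

Keeping the stages in a list makes the order explicit and lets a single stage be skipped or swapped while debugging.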

Usage

  • Place the PDF in data/input
  • Add its metadata to data/metadata.csv
  • Run sh src/pipeline.sh
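
Before running the pipeline, it may help to confirm that every PDF in data/input has a row in data/metadata.csv. The check below is a minimal sketch and not part of the repository: `missing_metadata` is a hypothetical helper, and the `filename` column name is an assumption, since the actual CSV header is not documented here.

```python
import csv
from pathlib import Path
from typing import List


def missing_metadata(input_dir: str, metadata_csv: str, key: str = "filename") -> List[str]:
    """Return names of PDFs in input_dir that have no row in metadata_csv.

    The `key` column name is an assumption; change it to match the real header.
    """
    with open(metadata_csv, newline="", encoding="utf-8") as handle:
        described = {
            row[key].strip()
            for row in csv.DictReader(handle)
            if row.get(key)  # skip rows where the key column is empty or absent
        }
    pdfs = sorted(p.name for p in Path(input_dir).glob("*.pdf"))
    return [name for name in pdfs if name not in described]
```

Calling `missing_metadata("data/input", "data/metadata.csv")` from the repository root would return an empty list when every input file is described.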

Written by Patrick J. Burns, ISAW Library; 2022-2023.