Skip to content

crappy script to convert TEI files to ODF. Requires grobid

License

Notifications You must be signed in to change notification settings

bozo32/tei-to-odf

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

TEI to ODF Converter

This project provides a Python script to convert PDFs to TEI (Text Encoding Initiative) format using GROBID and then generate ODF (OpenDocument) files from the TEI content with crappy formatting. The script handles headings, lists, tables, and references.

Prerequisites

Anaconda or Miniconda installed. Java 11 or higher (required for GROBID). GROBID installed and running on http://localhost:8070. Installation

Clone the Repository git clone https://github.com/yourusername/your-repository.git cd your-repository Set Up the Conda Environment Create and activate the conda environment:

conda create -n tei_to_odfenv python=3.8 conda activate tei_to_odf_env

Install Dependencies pip install lxml odfpy requests

Ensure GROBID is installed and running.

Usage

Prepare Directories: create source/ tei/ output/ in the folder holding convert.py

Place your PDF files in the source/ directory. The script will generate TEI files in the tei/ directory and ODF files in the output/ directory.

Run the Script: python your_script_name.py The script will:

Convert PDFs to TEI (if TEI files don't already exist). Parse TEI files to extract content. Generate ODF files with proper formatting.

Notes

GROBID Server URL: If GROBID is running on a different URL or port, update the convert_pdf_to_tei function in the script. Dependencies: Ensure all dependencies are installed in your conda environment.

About

crappy script to convert TEI files to ODF. Requires grobid

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages