Enakshi-1998 edited this page Jul 26, 2022 · 6 revisions

To process an IPCC report and extract useful information from it, follow these steps:

Step 1:

Clone the semanticClimate repository to your system:

git clone https://github.com/petermr/semanticClimate.git

Step 2:

Convert the report PDF to HTML.

Step 3:

Extract images using pyamiimage. See: https://github.com/petermr/pyamiimage/wiki

Step 4:

Extract abbreviations from the HTML. See: https://github.com/petermr/semanticClimate/blob/main/abbreviations/Codes/commands_abbreviation_extraction.ipynb
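To give a rough idea of what abbreviation extraction from HTML can look like, here is a minimal sketch using only the Python standard library. It is not the method used in the linked notebook. The helper names (TextExtractor, html_to_text, extract_abbreviations) and the initials-matching heuristic are assumptions made for illustration only.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace normalised."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# A parenthesised token of 2 to 10 capitals/digits, e.g. "(CDR)".
ABBREV_PATTERN = re.compile(r"\(([A-Z][A-Z0-9]{1,9})\)")


def extract_abbreviations(text):
    """Map each abbreviation to a candidate long form.

    Hypothetical heuristic: take as many words before the parenthesis as the
    abbreviation has characters, and accept them only if their initials
    spell the abbreviation.
    """
    found = {}
    for match in ABBREV_PATTERN.finditer(text):
        abbrev = match.group(1)
        words = text[:match.start()].split()
        candidate = words[-len(abbrev):]
        initials = "".join(word[0].upper() for word in candidate)
        if initials == abbrev:
            found.setdefault(abbrev, " ".join(candidate))
    return found
```

For example, extract_abbreviations(html_to_text(page_html)) could be run on the HTML produced in Step 2, where page_html is a placeholder for the file contents. The heuristic is deliberately simple: long forms that contain small words not reflected in the abbreviation (such as "Intergovernmental Panel on Climate Change (IPCC)") are missed, so treat the result as a starting point.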

Step 5:

Analyze your text using docanalysis. See the tutorial: https://github.com/petermr/docanalysis/wiki/docanalysis-Tutorial

Pre-requisites

General

Python (3.10 was the latest version at the time of writing, but 3.8 works best): download and install Python from https://www.python.org/downloads/

Git Bash: download and install Git from https://git-scm.com/downloads

For running pyamiimage

Tesseract (https://tesseract-ocr.github.io/tessdoc/Home.html#binaries)

EasyOCR (https://www.jaided.ai/easyocr/)
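Before starting, it can help to confirm that the prerequisites above are installed. The following is a minimal sketch using only the Python standard library. The function name check_prerequisites is hypothetical, and the sketch assumes the executables are named git and tesseract and that EasyOCR is importable as the module easyocr.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(commands=("git", "tesseract"), modules=("easyocr",)):
    """Report the running Python version and whether each tool can be found.

    Assumption: command-line tools are looked up on PATH by the names given,
    and Python packages are looked up by import name.
    """
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for command in commands:
        # shutil.which returns None when the executable is not on PATH.
        report[command] = shutil.which(command) is not None
    for module in modules:
        # find_spec returns None when a top-level module is not installed.
        report[module] = importlib.util.find_spec(module) is not None
    return report


if __name__ == "__main__":
    for name, status in check_prerequisites().items():
        print(f"{name}: {status}")
```

A value of False means the tool was not found on this machine. A tool installed outside PATH will also show as False, so the check is only a first hint.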