COMET-NLP-Group/RRCMP-NLP

Reducing Redundancy in Coastal Management Using Natural Language Processing

A Python natural language processing program for identifying key words and phrases in conservation management plans.

The Driving Questions

  1. Can we identify common themes / conservation measures among management plans?
  2. Can we capture values and interests of plans’ authors?

Approach

The PdfScrape program converts PDFs hosted online into several analysis-ready data formats and produces initial, exploratory visualizations of common words and their connections.

Project Status

Initial development focused on coastal management plans for the state of Washington.

Contained within this repository are:

  • A compiled list of URLs for the management plans.

  • Various analysis-ready versions of the Fish & Wildlife Species Recovery Plan PDFs.

  • Exploratory visualizations of common words and their connections (see below).

  • The PdfScrape program and a video tutorial explaining how to use it with your own list of PDF URLs.

Example visualizations

Frequency plots of most common verbs and nouns

[Figure: FrequencyPlotNounsVerbs]
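
The counting behind a noun and verb frequency plot can be sketched roughly as follows. This is a minimal illustration, not the PdfScrape implementation: it assumes the text has already been split into (word, tag) pairs by some part-of-speech tagger, and the tag names `NOUN` and `VERB` and the helper `top_words_by_pos` are hypothetical.

```python
from collections import Counter


def top_words_by_pos(tagged_tokens, pos, n=10):
    """Return the n most frequent words carrying the given part-of-speech tag.

    tagged_tokens: iterable of (word, tag) pairs; assumed to come from an
    upstream tagger (which tagger PdfScrape uses is not stated here).
    pos: the tag to keep, e.g. "NOUN" or "VERB" (hypothetical tag names).
    """
    counts = Counter(word.lower() for word, tag in tagged_tokens if tag == pos)
    return counts.most_common(n)


# Tiny hand-tagged sample standing in for text extracted from a plan.
sample = [
    ("plan", "NOUN"), ("protect", "VERB"), ("habitat", "NOUN"),
    ("restore", "VERB"), ("habitat", "NOUN"), ("protect", "VERB"),
    ("protect", "VERB"),
]

top_nouns = top_words_by_pos(sample, "NOUN", n=5)
top_verbs = top_words_by_pos(sample, "VERB", n=5)
```

The resulting (word, count) pairs could then be passed to any bar-chart routine to draw the plot.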

Pseudo-clustering

For user-specified key words, the pseudo-clustering plot shows the relationship between the count of each key word (KeyCount) and the count of the words that surround or co-occur with it (WordCount). Further details are provided in the PdfScrape README.

[Figure: pseudocluster]
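
One way the KeyCount and WordCount values could be computed is sketched below. This is an assumption-laden illustration rather than the PdfScrape code: it treats "surrounding" as a fixed window of tokens on each side of the key word, and the function names, the `window` parameter, and the sample text are all hypothetical. The PdfScrape README remains the reference for the actual definitions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cooccurrence_counts(tokens, key_words, window=2):
    """Map each key word to (KeyCount, Counter of neighbouring words).

    A neighbour is any token within `window` positions of an occurrence
    of the key word (an assumed definition of "surrounding").
    """
    keys = set(key_words)
    key_counts = Counter()
    neighbours = {key: Counter() for key in keys}
    for i, tok in enumerate(tokens):
        if tok in keys:
            key_counts[tok] += 1
            before = tokens[max(0, i - window):i]
            after = tokens[i + 1:i + 1 + window]
            neighbours[tok].update(before + after)
    return {key: (key_counts[key], neighbours[key]) for key in keys}


def pseudo_cluster_rows(tokens, key_words, window=2):
    """Flatten the counts into (key, KeyCount, word, WordCount) rows."""
    rows = []
    counts = cooccurrence_counts(tokens, key_words, window)
    for key in sorted(counts):
        key_count, words = counts[key]
        for word in sorted(words):
            rows.append((key, key_count, word, words[word]))
    return rows


# Made-up sample text standing in for an extracted management plan.
text = "Protect salmon habitat. Restore salmon habitat and protect eelgrass habitat."
rows = pseudo_cluster_rows(tokenize(text), ["salmon"], window=1)
```

Each row pairs one key word with one co-occurring word, so plotting KeyCount against WordCount for every row gives a scatter of the kind described above.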

TSNE (t-distributed stochastic neighbor embedding)

A t-SNE plot for visually analyzing text clustering patterns in the input PDFs.

[Figure: tsne]
