This repository has been archived by the owner on Mar 9, 2023. It is now read-only.

Data cleaning

chris s edited this page May 2, 2018 · 2 revisions

dwclean

dwclean is a command-line tool that cleans, validates, and enhances Darwin Core CSV/TSV files.

The original version was tightly tailored to the needs of the Norwegian GBIF node at the time. As source data quality improved, we were able to gradually remove the code that dealt with extreme edge cases. In 2018 the large dwclean script was cleaned up and split into a command-line tool, a library, and plugins that do the actual work of cleaning and validating data. Some of the edge-case handling (such as guessing MGRS grid zone designators based on square identifiers) still lives on in the MUSIT plugin.
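To illustrate the general idea of splitting the work into a small driver plus plugins, here is a minimal sketch in Python. It is not dwclean's code or API: the function names (`trim_whitespace`, `check_coordinates`, `clean`) and the plugin signature are hypothetical, chosen only for this example. The only things taken from Darwin Core itself are the term names `decimalLatitude` and `decimalLongitude`.

```python
import csv
import io


def trim_whitespace(record):
    """Hypothetical cleaning plugin: strip surrounding whitespace in place."""
    for term, value in record.items():
        if isinstance(value, str):
            record[term] = value.strip()
    return []  # this plugin never reports warnings


def check_coordinates(record):
    """Hypothetical validation plugin: flag non-numeric or out-of-range coordinates."""
    warnings = []
    for term, limit in (("decimalLatitude", 90.0), ("decimalLongitude", 180.0)):
        value = record.get(term, "")
        if value == "":
            continue  # a missing coordinate is not treated as an error here
        try:
            number = float(value)
        except ValueError:
            warnings.append(f"{term} is not a number: {value!r}")
            continue
        if abs(number) > limit:
            warnings.append(f"{term} out of range: {value}")
    return warnings


# Plugins run in order; each one receives a record (dict keyed by term name),
# may modify it in place, and returns a list of warning strings.
PLUGINS = [trim_whitespace, check_coordinates]


def clean(text, delimiter="\t"):
    """Run every plugin over every row and collect (line number, warning) pairs."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter, restval="")
    records, report = [], []
    # Line 1 is the header row, so data rows start at line 2.
    for line_number, record in enumerate(reader, start=2):
        for plugin in PLUGINS:
            for warning in plugin(record):
                report.append((line_number, warning))
        records.append(record)
    return records, report
```

Calling `clean()` on the contents of a TSV file returns the cleaned records together with a list of warnings keyed by line number. Keeping each check in its own function is what makes it possible to move institution-specific handling into a separate plugin.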
