sidatasciencelab/mercury_sheets

Mercury sheets

Pulling barcodes from FigShare.ipynb is a notebook that shows how to download the image sets from FigShare and extract their barcodes, which are then saved to barcodes_from_figshare.tsv.
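
The notebook's final step, writing decoded barcodes to barcodes_from_figshare.tsv, can be sketched as below. This is a minimal illustration, not the notebook's code. The function write_barcodes_tsv, the decode_fn hook, and the image/barcode column names are all hypothetical, and the FigShare download itself is left out.

```python
import csv
from pathlib import Path
from typing import Callable, Iterable, List


def write_barcodes_tsv(
    image_paths: Iterable[Path],
    decode_fn: Callable[[Path], List[str]],
    out_path: Path,
) -> int:
    """Decode barcodes from each image and write one TSV row per barcode.

    decode_fn is a hypothetical hook: it takes an image path and returns
    the barcode strings found in that image (possibly none).
    Returns the number of barcode rows written (header excluded).
    """
    rows = 0
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["image", "barcode"])  # assumed column names
        for path in image_paths:
            # An image may contain zero, one, or several barcodes.
            for code in decode_fn(path):
                writer.writerow([path.name, code])
                rows += 1
    return rows
```

One possible decode_fn wraps pyzbar: open the image with PIL's Image.open, pass it to pyzbar.pyzbar.decode, and return each result's data field decoded from bytes. Keeping the decoder as a parameter lets the TSV-writing logic be tested without any image files.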

Checking Botany scan dates.ipynb is a notebook containing a Dask processing function that extracts every image id from records that have more than one image.
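
A processing function of this kind might look like the sketch below. It assumes Smithsonian Open Access records are parsed JSON dicts whose images sit under content.descriptiveNonRepeating.online_media.media, with each entry carrying an idsId, and that each record has a top-level id. Check these field names against real records. The function names and output keys are hypothetical.

```python
from typing import Any, Dict, List


def extract_media_ids(record: Dict[str, Any]) -> List[str]:
    """Return every image id attached to one Open Access record.

    Assumes the nested layout content.descriptiveNonRepeating.online_media.media,
    where each media entry may carry an 'idsId'. Missing levels yield [].
    """
    media = (
        record.get("content", {})
        .get("descriptiveNonRepeating", {})
        .get("online_media", {})
        .get("media", [])
    )
    return [m["idsId"] for m in media if "idsId" in m]


def rows_for_multi_image(record: Dict[str, Any]) -> List[Dict[str, str]]:
    """Emit one (record id, image id) row per image, only for records with 2+ images."""
    ids = extract_media_ids(record)
    if len(ids) < 2:
        return []
    return [{"record_id": record.get("id", ""), "image_id": i} for i in ids]
```

Because the function takes one record and returns a list of rows, it fits Dask's bag model. If the metadata is stored as newline-delimited JSON, something like dask.bag.read_text(pattern).map(json.loads).map(rows_for_multi_image).flatten() would apply it in parallel, where pattern is a hypothetical file glob.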

media_list.txt contains a list of AWS media ids used by the download_images.py script. The file was created with the command aws s3 ls s3://smithsonian-open-access/media/nmnh/ > media_list.txt. To reduce its size, the version in this repository was filtered to include only ids ending in .jpg.
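
The .jpg filter can be reproduced from the raw aws s3 ls listing with a few lines of Python. This sketch assumes the usual listing format (date, time, size, then the object name, with PRE lines for sub-prefixes) and that object names contain no spaces. jpg_ids_from_ls is a hypothetical name, and whether the committed media_list.txt keeps the full listing columns or only the names is not stated here.

```python
from typing import Iterable, Iterator


def jpg_ids_from_ls(lines: Iterable[str]) -> Iterator[str]:
    """Yield object names ending in .jpg from `aws s3 ls` output lines.

    Object lines look like 'DATE TIME SIZE NAME'; sub-prefix lines look like
    'PRE name/' and never end in .jpg, so they are skipped. The name is taken
    as the last whitespace-separated field, assuming names contain no spaces.
    """
    for line in lines:
        line = line.strip()
        if not line.endswith(".jpg"):
            continue
        yield line.split()[-1]
```

For example, piping the output of the aws s3 ls command above into a small script that prints each yielded name would produce a .jpg-only list in a single pass, without first saving the full listing.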

download_images.py is a script that downloads botany images and metadata from Smithsonian Institution (SI) Open Access on AWS. It simplifies the AWS metadata with its extract_ids function and saves the result to metadata.tsv, then uses the media ids in media_list.txt to download 2,292,004 images to a thumbnails directory. Each step is explained further in comments in the script.
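
The download step can be sketched as a resumable loop over the ids in media_list.txt. This is a sketch under assumptions, not the script's implementation: BASE_URL assumes the public HTTPS endpoint of the smithsonian-open-access bucket in us-west-2 and that each id is an object key under media/nmnh/, and all helper names are hypothetical. The actual script may fetch resized thumbnails from a different source.

```python
import urllib.request
from pathlib import Path
from typing import Iterable, List

# ASSUMPTION: public HTTPS endpoint for the smithsonian-open-access bucket.
# Verify the region and path before relying on it.
BASE_URL = "https://smithsonian-open-access.s3.us-west-2.amazonaws.com/media/nmnh/"


def media_url(media_id: str) -> str:
    """Build the download URL for one media id from media_list.txt."""
    return BASE_URL + media_id


def pending_downloads(media_ids: Iterable[str], out_dir: Path) -> List[str]:
    """Return non-empty ids whose files are not yet in out_dir, so reruns resume."""
    return [m for m in media_ids if m and not (out_dir / m).exists()]


def download_all(media_ids: Iterable[str], out_dir: Path) -> None:
    """Download each pending id into out_dir (e.g. a thumbnails directory)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for media_id in pending_downloads(media_ids, out_dir):
        target = out_dir / media_id
        tmp = target.with_name(target.name + ".part")
        urllib.request.urlretrieve(media_url(media_id), tmp)
        # Rename only after the download completed, so partial files never
        # look finished to pending_downloads on the next run.
        tmp.replace(target)
```

Skipping files that already exist, and renaming from a .part file only after a completed download, lets an interrupted run over roughly 2.3 million images restart without refetching finished files or keeping truncated ones. At that volume, the loop could also be spread across threads with concurrent.futures.ThreadPoolExecutor.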
