Commit

Adding catalogue number
rukayaj committed May 15, 2024
1 parent f9768b7 commit 1ddb579
Showing 2 changed files with 1 addition and 1 deletion.
1 change: 1 addition & 0 deletions app/helpers/process_file.py
@@ -26,6 +26,7 @@ def create_export_df(dwcs, ocrs, base_url):
df.rename(columns={'index': 'filename'}, inplace=True)
df = df.astype(str)
df['id'] = df['filename'].str.extract(r'([^\.]+)')
+df['catalogNumber'] = df['id']
df['associatedMedia'] = base_url + df['filename']
del df['filename']
df['occurrenceRemarks'] = 'Record originally derived from Google Cloud Vision OCR using GPT-4-turbo'
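
The added line copies `id` into `catalogNumber`, and `id` is whatever the pattern `([^\.]+)` captures from the filename, which is the text before the first dot. A minimal standard-library sketch of that capture follows. It assumes `re.search` as a stand-in for pandas' `str.extract` (which also takes the first match); the helper name and the filenames are hypothetical, not taken from the repository.

```python
import re

# Same pattern as in create_export_df: one or more characters that are not a dot.
PATTERN = re.compile(r'([^\.]+)')


def catalog_number_from_filename(filename):
    """Return the first run of non-dot characters, or None if there is none.

    Hypothetical helper for illustration only; the commit does this
    column-wise with df['filename'].str.extract(...).
    """
    match = PATTERN.search(filename)
    return match.group(1) if match else None


# Hypothetical filenames, used only to show what the pattern captures.
for name in ["E00123456.jpg", "sheet.v2.jpg", ".hidden.jpg"]:
    print(name, "->", catalog_number_from_filename(name))
```

A filename with extra dots is cut at the first one, so the resulting catalogue number is only as unique as that leading segment of the filename.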
1 change: 0 additions & 1 deletion app/helpers/prompt.txt
@@ -3,7 +3,6 @@ Important Note! Label OCR text contains rulers (i.e. incremental cm counts from
First you correct any obvious OCR errors, and then you extract ONLY the following Darwin Core terms. Be sure to translate into English and when that's not possible convert from Cyrillic to latin alphabet:

- scientificName: Full scientific name, not containing identification qualifications.
-- catalogNumber: Unique identifier for the record in the dataset or collection.
- recordNumber: Identifier given during recording, often linking field notes and Occurrence record.
- recordedBy: List of people, groups, or organizations responsible for recording the original Occurrence.
- year: Four-digit year of the Event.