Scenarios for the acquisition of data files and file versioning #238

Closed
tagtuna opened this issue Apr 17, 2023 · 8 comments
Labels
help wanted Extra attention is needed

@tagtuna
Contributor

tagtuna commented Apr 17, 2023

Use case 1: Tag returned logged data via orbiting satellites
File: iccat_gbyp0008_ArgosTrans_eTUFF0.txt
Date: April 4
instrument_name = "iccat_gbyp0008"
What happened: Received satellite messages were decoded by Wildlife Computers software (via an online backend platform, with a particular firmware version). A geolocation algorithm was run and a track was generated. This track was deemed the best possible at this time (reference track). A set of output .csv files was then downloaded and converted into an eTUFF file by the client.

Use case 2: Client ran the geolocation algorithm to generate two additional tracks
Files: iccat_gbyp0008_ArgosTrans_eTUFF1.txt & iccat_gbyp0008_ArgosTrans_eTUFF2.txt
Date: April 17
instrument_name = "iccat_gbyp0008"
What happened: The client re-did the geolocation processing and generated two new track solutions using different speed filters (captured by the metadata attribute geolocation_parameters). Separate eTUFFs were generated containing the track data only. The client did not believe either solution was better than the original track, and therefore just wants to append these for future use and further evaluation. Likewise, the eTUFFs did not include the original logged water column data, because the client considered it a waste of space to repeat data that had already been submitted.

Use case 3: Hardware was physically recovered and the client was able to download the complete archive via a USB cable
No file example
Date: June 30
instrument_name = "iccat_gbyp0008"
What happened: The downloaded data represent the complete record. The data available from use case 1 are a subset of this archive. A new eTUFF (with a much bigger file size) was generated. The client believes this “version” provides the best representation of the logged data and sees limited value in retaining earlier versions. Tracks were re-run, but the solutions did not differ substantially from the previous ones, so no changes were required there.
eTUFF_examples.zip
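The use cases above hinge on a few header attributes (instrument_name, geolocation_parameters, referencetrack_included). A minimal Python sketch of reading them follows. It assumes each global attribute sits on its own line in the form `:name = "value"` (leading colon optional); that line format, the function name, and the sample lines are hypothetical, not the project's actual eTUFF parser.

```python
import re

# Assumed header layout: one attribute per line, e.g. :instrument_name = "iccat_gbyp0008"
# The leading colon is optional so that plain `name = "value"` lines also match.
ATTR_LINE = re.compile(r'^\s*:?\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*"(.*)"\s*$')


def read_global_attributes(lines):
    """Collect name/value pairs from every line that matches ATTR_LINE."""
    attrs = {}
    for line in lines:
        match = ATTR_LINE.match(line)
        if match:
            attrs[match.group(1)] = match.group(2)
    return attrs


# Hypothetical sample: two attribute lines and one line without an `=`, which is ignored.
sample = [
    ':instrument_name = "iccat_gbyp0008"',
    ':referencetrack_included = "1"',
    'DateTime,VariableID,VariableValue',
]
attrs = read_global_attributes(sample)
```

Keying the later lookups on instrument_name (rather than on the file name) is what lets files from different use cases be tied back to the same deployment.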

Things to consider

  1. All the above use cases provide eTUFF files containing data (and additional data) for the same instrument in the same tag deployment on the same animal. Therefore this piece of metadata, instrument_name, remains the same throughout. However, my initial thought for our internal (database) tag_id is to keep it the same for use cases 1 and 2, but to assign a different one for use case 3. This would allow us to distinguish the satellite-transmitted dataset from the physically downloaded dataset (because of the different tag_id), and because they share the same instrument_name, we can use it to look up all the events that have happened, as illustrated by the use cases.
  2. I emphasize the instrument + deployment combination because the same hardware could be reused or redeployed on another animal. That means we can't rely on a combination of serial_number and ptt, or even platform (mentioned in issue #164, Address tag identification integrity issues). That's also why we have asked the client to make sure instrument_name is unique.
  3. submission_id will be most useful for keeping tabs on the events that have happened. For use cases 1 and 2, submission_id + tag_id should yield 3 different combinations ("combos") because they are ingested from 3 separate files. This allows us to identify the original logger data (use case 1, combo 1), the reference track (use case 1, combo 1), alternative track solution one (use case 2, combo 2) and alternative track solution two (use case 2, combo 3).
  4. For the multiple-track implementation, my first assessment is that all the current ingest and migration steps should remain essentially the same. The data table data_position will house the reference tracks and alternative solutions, distinguishable by their different submission_id and tag_id combos.
  5. The only remaining step is to update the metadata_position table whenever a newly ingested track should be flagged as a "reference track", which is determined by checking the metadata attribute referencetrack_included in the eTUFF.
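Points 1, 3 and 5 can be sketched as a small in-memory model. This is a minimal Python sketch, not the project's ingest code: the Submission class, the helper functions, the numeric submission_id/tag_id values, and the rule that the most recently submitted reference-flagged file wins are all assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """One ingested eTUFF file (hypothetical model, not the real database schema)."""
    submission_id: int
    tag_id: int
    instrument_name: str
    source_file: str
    reference_track: bool  # assumed to mirror referencetrack_included


def events_by_instrument(submissions):
    """Group submissions by instrument_name, ordered by submission_id."""
    grouped = defaultdict(list)
    for sub in sorted(submissions, key=lambda s: s.submission_id):
        grouped[sub.instrument_name].append(sub)
    return dict(grouped)


def current_reference_track(submissions):
    """Return the (submission_id, tag_id) combo of the latest reference-flagged submission.

    Assumption: a newer submission flagged as reference supersedes older ones.
    Returns None when no submission is flagged.
    """
    flagged = [s for s in submissions if s.reference_track]
    if not flagged:
        return None
    latest = max(flagged, key=lambda s: s.submission_id)
    return (latest.submission_id, latest.tag_id)


# Use cases 1 and 2: three files, same instrument_name and tag_id (IDs are made up).
submissions = [
    Submission(1, 100, "iccat_gbyp0008", "iccat_gbyp0008_ArgosTrans_eTUFF0.txt", True),
    Submission(2, 100, "iccat_gbyp0008", "iccat_gbyp0008_ArgosTrans_eTUFF1.txt", False),
    Submission(3, 100, "iccat_gbyp0008", "iccat_gbyp0008_ArgosTrans_eTUFF2.txt", False),
]
combos = {(s.submission_id, s.tag_id) for s in submissions}  # three distinct combos
```

Under point 1, the use case 3 download would be ingested with a new tag_id (say 101, again hypothetical) but the same instrument_name, so events_by_instrument would still return it alongside the three files above.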
@lewismc lewismc self-assigned this Apr 17, 2023
@lewismc lewismc added the help wanted Extra attention is needed label Apr 17, 2023
@lewismc lewismc added this to the 0.11.0 milestone Apr 17, 2023
@lewismc
Member

lewismc commented Apr 17, 2023

@tagtuna I downloaded these and will try to ingest tonight.

@lewismc
Member

lewismc commented Apr 17, 2023

@tagtuna can you please summarize what the overall desired outcome is of the above? Thanks

@tagtuna
Contributor Author

tagtuna commented Apr 18, 2023

@lewismc Thanks for the query - I have added more on the issue. Hopefully this helps.

@lewismc
Member

lewismc commented Apr 19, 2023

@tagtuna the attached .zip file is somewhat troublesome. If I unzip it, I get the following:

unzip eTUFF_examples.zip
...
Archive:  eTUFF_examples.zip
  inflating: iccat_gbyp0008_ArgosTrans_eTUFF1.txt
   creating: __MACOSX/
  inflating: __MACOSX/._iccat_gbyp0008_ArgosTrans_eTUFF1.txt
  inflating: iccat_gbyp0008_ArgosTrans_eTUFF0.txt
  inflating: __MACOSX/._iccat_gbyp0008_ArgosTrans_eTUFF0.txt
  inflating: iccat_gbyp0008_ArgosTrans_eTUFF2.txt
  inflating: __MACOSX/._iccat_gbyp0008_ArgosTrans_eTUFF2.txt

As you can see, it appears that there are nested directories in the .zip archive. This is not a major problem, as we can simply skip any directories and only process files in the root directory; however, please confirm what the behavior should be. Thanks
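The "root directory only" behavior could be sketched in Python with the standard-library zipfile module. This assumes that any entry whose name contains a "/" is a directory or a nested file (zip archives use forward slashes as path separators) and that AppleDouble stubs starting with "._" should also be ignored; the function names are hypothetical.

```python
import zipfile


def select_root_files(names):
    """Keep only root-level file names, dropping directories, nested entries and '._' stubs."""
    return [
        name
        for name in names
        if "/" not in name and not name.startswith("._")  # e.g. skips __MACOSX/...
    ]


def root_level_files(archive_path):
    """Open a zip archive and return the root-level file names it contains."""
    with zipfile.ZipFile(archive_path) as archive:
        return select_root_files(archive.namelist())
```

Applied to the listing above, only the three iccat_gbyp0008_ArgosTrans_eTUFF*.txt files would be processed and everything under __MACOSX/ would be skipped.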

@tagtuna
Contributor Author

tagtuna commented Apr 19, 2023

@lewismc Sorry, that zip wasn't what I intended to make. I guess it must be something odd about the way I zipped the files!

@lewismc
Member

lewismc commented Apr 19, 2023

OK, thanks for confirming. If we need to augment this aspect of the ingestion logic in the future, at least we can come back to this thread. Thanks

@lewismc
Member

lewismc commented Apr 20, 2023

@tagtuna what is a combo? You refer to various combos and I don't see the details here. Thanks for explaining.

@tagtuna
Contributor Author

tagtuna commented Apr 20, 2023 via email
