Scenarios for the acquisition of data files and file versioning #238

tagtuna · 2023-04-17T06:08:11Z

Use case 1: Tag transmitted logged data via orbiting satellites
File: iccat_gbyp0008_ArgosTrans_eTUFF0.txt
Date: April 4
instrument_name = "iccat_gbyp0008"
What happened: Received satellite messages were decoded by Wildlife Computers software (via an online backend platform, with a particular firmware version). A geolocation algorithm was run and a track was generated. This track was deemed the best possible at this time (the reference track). A set of output .csv files was then downloaded and converted into an eTUFF file by the client.

Use case 2: Client ran the geolocation algorithm to generate two additional tracks
Files: iccat_gbyp0008_ArgosTrans_eTUFF1.txt & iccat_gbyp0008_ArgosTrans_eTUFF2.txt
Date: April 17
instrument_name = "iccat_gbyp0008"
What happened: The client redid the geolocation processing and generated two new track solutions using different speed filters (captured by the metadata attribute geolocation_parameters). Separate eTUFFs were generated containing the track data only. The client did not believe either solution was better than the original track and therefore just wants to append them for future use or further evaluation. Likewise, the eTUFFs did not include the originally logged water column data, because the client considered it a waste of space to repeat data that had already been submitted.

Use case 3: Hardware was physically recovered and the client was able to download the complete archive via a USB cable
No file example
Date: June 30
instrument_name = "iccat_gbyp0008"
What happened: The downloaded data represent the complete records. The data available from use case 1 are a subset of this archive. A new eTUFF (with a much bigger file size) was generated. The client believes this "version" provides the best representation of the logged data and finds limited value in retaining earlier versions. Tracks were re-run, but the solutions did not differ substantially from the previous ones, so no changes were required there.
eTUFF_examples.zip

Things to consider

All the above use cases provide eTUFF files containing data (and additional data) for the same instrument in the same tag deployment on the same animal. Therefore, this piece of metadata, instrument_name, remains the same throughout. However, my initial thought is to keep our internal (database) tag_id the same for use cases 1 & 2, but different for use case 3. This would allow us to distinguish the satellite-transmitted dataset from the physically downloaded dataset (because they have different tag_ids), but as they share the same instrument_name, we can use that to look up all the events that have happened, as illustrated by the use cases.
I emphasize the instrument+deployment combination because the same hardware could be reused or redeployed on another animal. That means we can't rely on a combination of serial_number and ptt, or even platform (mentioned in issue Address tag identification integrity issues #164). That's also why we have asked the client to make sure instrument_name is unique.
submission_id will be most useful for keeping track of the events that have happened. For use cases 1 & 2, submission_id + tag_id should yield 3 different combinations because they are ingested from 3 separate files. This allows us to work out which is the original logger data (use case 1, combo 1), the reference track (use case 1, combo 1), alternative track solution one (use case 2, combo 2) and alternative track solution two (use case 2, combo 3).
For the multiple-track implementation, my first assessment is that all the current ingest and migration steps should remain pretty much the same. The data_position table will house both the reference tracks and the alternative solutions, distinguishable by their different submission_id and tag_id combos.
The only remaining step is to update the metadata_position table whenever a newly ingested track should be flagged as a "reference track", which we can determine by checking the metadata attribute referencetrack_included in the eTUFF.
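A minimal Python sketch of the tag_id scheme proposed above, assuming a hypothetical in-memory TagRegistry standing in for the database table and a hypothetical acquisition_mode label ("satellite" or "recovered") that the ingest step would somehow have to know. tag_id is keyed on (instrument_name, acquisition_mode), so use cases 1 & 2 share a tag_id, use case 3 gets a new one, and instrument_name still finds both.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class TagRegistry:
    """Hypothetical stand-in for the database tag table.

    One tag_id per (instrument_name, acquisition_mode) pair: satellite-transmitted
    and physically recovered data for the same instrument get different tag_ids
    while sharing the same instrument_name.
    """
    _ids: dict = field(default_factory=dict)
    _next: count = field(default_factory=lambda: count(1))

    def assign_tag_id(self, instrument_name: str, acquisition_mode: str) -> int:
        # Reuse the tag_id if this pair was seen before, otherwise allocate a new one.
        key = (instrument_name, acquisition_mode)
        if key not in self._ids:
            self._ids[key] = next(self._next)
        return self._ids[key]

    def tag_ids_for_instrument(self, instrument_name: str) -> list[int]:
        # Look up every tag_id (i.e. every acquisition route) for one instrument.
        return sorted(tid for (name, _), tid in self._ids.items() if name == instrument_name)


registry = TagRegistry()
uc1 = registry.assign_tag_id("iccat_gbyp0008", "satellite")  # use case 1
uc2 = registry.assign_tag_id("iccat_gbyp0008", "satellite")  # use case 2
uc3 = registry.assign_tag_id("iccat_gbyp0008", "recovered")  # use case 3
print(uc1, uc2, uc3, registry.tag_ids_for_instrument("iccat_gbyp0008"))
```

This prints `1 1 2 [1, 2]`. Keying on the pair rather than on serial_number/ptt reflects the point that the same hardware may be redeployed.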
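A small Python sketch of how the combos for use cases 1 & 2 could fall out, assuming (hypothetically) that every ingested file receives a fresh, sequential submission_id and that all three files share tag_id 1. The role labels simply restate the paragraph above.

```python
# Hypothetical: tag_id shared by the satellite-transmitted files (use cases 1 & 2).
SATELLITE_TAG_ID = 1

# What each ingested file carried, per the use cases above.
FILES = [
    ("iccat_gbyp0008_ArgosTrans_eTUFF0.txt", ["original logger data", "reference track"]),
    ("iccat_gbyp0008_ArgosTrans_eTUFF1.txt", ["alternative track solution one"]),
    ("iccat_gbyp0008_ArgosTrans_eTUFF2.txt", ["alternative track solution two"]),
]


def build_combos(files, tag_id, first_submission_id=1):
    """Assign one new submission_id per ingested file and pair it with tag_id."""
    combos = {}
    for submission_id, (_filename, contents) in enumerate(files, start=first_submission_id):
        for item in contents:
            combos[item] = (submission_id, tag_id)
    return combos


combos = build_combos(FILES, SATELLITE_TAG_ID)
for item, (submission_id, tag_id) in combos.items():
    print(f"{item}: submission_id={submission_id}, tag_id={tag_id}")
```

The original logger data and the reference track share combo (1, 1) because they arrived in the same file; each alternative solution gets its own combo.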
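To illustrate how a single data_position table could hold several track solutions side by side, here is a sketch using Python's built-in sqlite3 with a simplified, hypothetical schema (the real table has more columns). The coordinates are placeholders, not real track data.

```python
import sqlite3

# Simplified, hypothetical data_position table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE data_position (
           submission_id INTEGER NOT NULL,
           tag_id        INTEGER NOT NULL,
           date_time     TEXT    NOT NULL,
           lat           REAL,
           lon           REAL
       )"""
)

# Placeholder rows (not real data): one position per track solution.
rows = [
    (1, 1, "2023-04-01T00:00:00Z", 0.0, 0.0),  # reference track (use case 1)
    (2, 1, "2023-04-01T00:00:00Z", 0.1, 0.1),  # alternative solution one (use case 2)
    (3, 1, "2023-04-01T00:00:00Z", 0.2, 0.2),  # alternative solution two (use case 2)
]
conn.executemany("INSERT INTO data_position VALUES (?, ?, ?, ?, ?)", rows)

# List every track solution for one tag, one row per (submission_id, tag_id) combo.
solutions = conn.execute(
    """SELECT submission_id, tag_id, COUNT(*) AS n_positions
         FROM data_position
        WHERE tag_id = ?
        GROUP BY submission_id, tag_id
        ORDER BY submission_id""",
    (1,),
).fetchall()
print(solutions)
```

No new table is needed for alternative tracks in this sketch; the combo alone separates them, which matches the expectation that the existing ingest steps stay largely unchanged.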
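A hedged sketch of the reference-track check in Python. It assumes eTUFF global attributes appear as `name = value` lines (optionally prefixed with `:` and with the value optionally in double quotes), that values such as 1/true/yes mean "included", and that metadata_position has a hypothetical reference_track column. All three are assumptions to verify against the actual eTUFF spec and tagbase schema.

```python
import re
import sqlite3

# Assumed header line shape: 'name = value', optional leading ':' and optional
# double quotes around the value, e.g. ':referencetrack_included = "1"'.
_ATTR = re.compile(r'^\s*:?\s*(\w+)\s*=\s*"?([^"]*)"?\s*$')


def read_global_attributes(lines):
    """Collect 'name = value' header lines into a dict; other lines are ignored."""
    attrs = {}
    for line in lines:
        match = _ATTR.match(line)
        if match:
            attrs[match.group(1)] = match.group(2).strip()
    return attrs


def is_reference_track(attrs):
    """Hypothetical rule: '1', 'true' or 'yes' means a reference track is included."""
    return attrs.get("referencetrack_included", "").lower() in {"1", "true", "yes"}


def flag_reference_track(conn, submission_id, tag_id, attrs):
    """Set the hypothetical reference_track flag for one submission_id + tag_id combo."""
    conn.execute(
        "UPDATE metadata_position SET reference_track = ? "
        "WHERE submission_id = ? AND tag_id = ?",
        (int(is_reference_track(attrs)), submission_id, tag_id),
    )


# Demo against a simplified, hypothetical metadata_position table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata_position ("
    "submission_id INTEGER, tag_id INTEGER, reference_track INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO metadata_position (submission_id, tag_id) VALUES (?, ?)",
    [(1, 1), (2, 1)],
)
flag_reference_track(conn, 1, 1, read_global_attributes(
    [':instrument_name = "iccat_gbyp0008"', ':referencetrack_included = "1"']))
flag_reference_track(conn, 2, 1, read_global_attributes(
    ["referencetrack_included = 0"]))
flags = conn.execute(
    "SELECT submission_id, reference_track FROM metadata_position ORDER BY submission_id"
).fetchall()
print(flags)
```

This prints `[(1, 1), (2, 0)]`: only the combo whose file declared an included reference track is flagged.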


lewismc · 2023-04-17T18:27:03Z

@tagtuna I downloaded these and will try to ingest tonight.

lewismc · 2023-04-17T18:27:42Z

@tagtuna can you please summarize what the overall desired outcome is of the above? Thanks

tagtuna · 2023-04-18T02:10:23Z

@lewismc Thanks for the query - I have added more on the issue. Hopefully this helps.

lewismc · 2023-04-19T03:23:21Z

@tagtuna the attached .zip file is somewhat troublesome. If I unzip it, I get the following:

unzip eTUFF_examples.zip
...
Archive:  eTUFF_examples.zip
  inflating: iccat_gbyp0008_ArgosTrans_eTUFF1.txt
   creating: __MACOSX/
  inflating: __MACOSX/._iccat_gbyp0008_ArgosTrans_eTUFF1.txt
  inflating: iccat_gbyp0008_ArgosTrans_eTUFF0.txt
  inflating: __MACOSX/._iccat_gbyp0008_ArgosTrans_eTUFF0.txt
  inflating: iccat_gbyp0008_ArgosTrans_eTUFF2.txt
  inflating: __MACOSX/._iccat_gbyp0008_ArgosTrans_eTUFF2.txt

As you can see, the .zip archive contains a nested directory (__MACOSX/) holding macOS resource-fork entries (._*) for each file. This is not a major problem, as we can simply skip any directories and only process files in the root directory; however, please confirm what the behavior should be. Thanks
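One possible version of the "only process files in the root directory" behavior, sketched with Python's standard zipfile module. The .txt filter and the decision to also drop root-level `._*` files are assumptions, not confirmed ingest rules.

```python
import zipfile


def root_etuff_members(zip_source):
    """Return sorted names of root-level .txt files in a zip archive.

    zip_source may be a path or a binary file-like object. Directory entries and
    anything nested (e.g. __MACOSX/._*) are skipped; zip member names always use
    '/' as the path separator.
    """
    names = []
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            name = info.filename
            if info.is_dir() or "/" in name:
                continue  # nested entries, including the __MACOSX/ directory
            if name.startswith("._"):
                continue  # stray macOS resource-fork files at the root (assumed rule)
            if name.lower().endswith(".txt"):
                names.append(name)
    return sorted(names)
```

For the archive listed above, this would return only the three iccat_gbyp0008_ArgosTrans_eTUFF*.txt files.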

tagtuna · 2023-04-19T03:54:25Z

@lewismc Sorry - that zip wasn't what I intended to make. I guess it must be a quirk of the way I zipped up the files!

lewismc · 2023-04-19T03:56:07Z

OK, thanks for confirming. If we need to augment this aspect of the ingestion logic in the future, at least we can come back to this thread. Thanks

lewismc · 2023-04-20T05:28:14Z

@tagtuna what is a combo? You refer to various combos and I don't see the details here. Thanks for explaining.

tagtuna · 2023-04-20T05:33:21Z

Lewis, by combo I meant the submission_id plus tag_id combination.


lewismc self-assigned this Apr 17, 2023

lewismc added this to ICCAT Product Drive Phase 2 (2022-10-15 --> 2023-05-27) Apr 17, 2023

lewismc added the help wanted Extra attention is needed label Apr 17, 2023

lewismc added this to the 0.11.0 milestone Apr 17, 2023

lewismc mentioned this issue May 19, 2023

ISSUE-238 Scenarios for the acquisition of data files and file versioning #265

Merged

lewismc modified the milestones: 0.11.0, 0.13.0 May 19, 2023

lewismc closed this as completed Jun 12, 2023

github-project-automation bot moved this to ✅ Done in ICCAT Product Drive Phase 2 (2022-10-15 --> 2023-05-27) Jun 12, 2023

lewismc mentioned this issue Jun 13, 2023

Create granular checksum logic representing metadata, data, profile and global eTUFF characteristics #270

Closed
