Scenarios for the acquisition of data files and file versioning #238
Comments
@tagtuna I downloaded these and will try to ingest tonight. |
@tagtuna can you please summarize what the overall desired outcome is of the above? Thanks |
@lewismc Thanks for the query - I have added more on the issue. Hopefully this helps. |
@tagtuna the attached .zip file is somewhat troublesome. If I unzip it I get the following
As you can see, it appears that there are nested directories in the .zip archive. This is not a major problem as we can simply avoid any directories and only process files in the root directory, however please confirm what the behavior should be. Thanks |
@lewismc Sorry - that zip wasn't what I intended to make. I guess it must be some weird way I get the files zipped up! |
OK, thanks for confirming. If we need to augment this aspect of the ingestion logic in the future, at least we can come back to this thread. Thanks |
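For reference, the "only process files in the root directory" behaviour mentioned above could be sketched roughly as follows. This is a minimal illustration using Python's standard zipfile module, not the project's actual ingestion code; the function name root_level_files and the demo archive contents are assumptions made up for this example.

```python
import io
import zipfile
from typing import List


def root_level_files(archive: zipfile.ZipFile) -> List[str]:
    """Return the names of regular files stored at the top level of the archive."""
    names = []
    for info in archive.infolist():
        if info.is_dir():
            continue  # skip explicit directory entries such as "nested/"
        if "/" in info.filename:
            continue  # skip files that live inside a sub-directory
        names.append(info.filename)
    return names


# Build a small in-memory archive (hypothetical layout) to exercise the helper.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("iccat_gbyp0008_ArgosTrans_eTUFF1.txt", "placeholder")
    zf.writestr("nested/iccat_gbyp0008_ArgosTrans_eTUFF2.txt", "placeholder")

with zipfile.ZipFile(demo) as zf:
    print(root_level_files(zf))
```

Zip member names always use forward slashes, so checking for "/" is enough to detect nesting regardless of the platform that created the archive.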
@tagtuna what is a combo? You refer to various combos and I don't see the details here. Thanks for explaining. |
Lewis, by combo I meant: submission_id plus tag_id combination |
Use case 1: Tag returned logged data by orbiting satellites
File: iccat_gbyp0008_ArgosTrans_eTUFF0.txt
Date: April 4
instrument_name = "iccat_gbyp0008"
What happened: Received satellite messages were decoded by Wildlife Computers software (via an online backend platform, with a particular firmware version). A geolocation algorithm was run and a track was generated. This track was deemed the best possible at this time (reference track). A set of output .csv files was then downloaded and converted into an eTUFF file by the client.
Use case 2: Client ran the geolocation algorithm to generate two additional tracks
Files: iccat_gbyp0008_ArgosTrans_eTUFF1.txt & iccat_gbyp0008_ArgosTrans_eTUFF2.txt
Date: April 17
instrument_name = "iccat_gbyp0008"
What happened: The client redid the geolocation processing and generated two new track solutions using different speed filters (captured by the metadata attribute geolocation_parameters). Separate eTUFFs were generated with the track data only. The client did not believe either solution was better than the original track; therefore the client just wants to append these for future use and further evaluation. By the same token, the eTUFFs did not include the original logged water column data, because the client thought it a waste of space to repeat data that had already been submitted.
Use case 3: Hardware was physically recovered and the client was able to download the complete archive via a USB cable
No file example
Date: June 30
instrument_name = "iccat_gbyp0008"
What happened: The downloaded data represent the complete records. Data available from Use Case 1 is a subset of this archive. A new eTUFF (much bigger file size) was generated. Client believes this “version” provides the best representation of the logged data, and finds limited value in retaining earlier versions. Tracks were re-run but the solutions were not that different from the previous ones, therefore no changes were required there.
eTUFF_examples.zip
Things to consider
- instrument_name: this piece of metadata remains the same throughout. However, my initial thought for our internal/database tag_id is to keep tag_id the same for use cases 1 and 2, but different for use case 3. This would allow us to distinguish the satellite-transmitted dataset from the physically downloaded dataset (because of the different tag_id), but as they share the same instrument_name, we can use that to look up all the events that have happened, as illustrated by the use cases.
- serial_number and ptt or even platform (mentioned in issue Address tag identification integrity issues #164). That's also why we have asked the client to make sure instrument_name is unique.
- submission_id will be most useful for keeping tabs on the events that have happened. For use cases 1 and 2, submission_id + tag_id should yield 3 different combinations because they are ingested from 3 separate files. This allows us to work out which is the original logger data (use case 1, combo 1), reference track (use case 1, combo 1), alternative track solution one (use case 2, combo 2) and alternative track solution two (use case 2, combo 3).
- data_position will house the reference tracks and alternative solutions, distinguishable by the different submission_id and tag_id combos.
- metadata_position table: whether any newly ingested track should be flagged as a "reference track" is decided by checking the metadata attribute referencetrack_included in the eTUFF.
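To make the identifier scheme above concrete, here is a rough in-memory sketch in Python. It is only an illustration of the idea, not the actual database logic: the Registry and Submission classes, the "satellite"/"recovered" source labels, the rule that tag_id is keyed on (instrument_name, source), and the placeholder filename for use case 3 are all assumptions invented for this example. The reference_track flag simply mirrors whatever referencetrack_included value is passed in.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Tuple


@dataclass
class Submission:
    submission_id: int      # new value for every ingested file
    tag_id: int             # shared by files from the same acquisition route
    instrument_name: str    # stays the same across all use cases
    filename: str
    source: str             # hypothetical label: "satellite" or "recovered"
    reference_track: bool   # copied from referencetrack_included


class Registry:
    def __init__(self) -> None:
        self.submissions: List[Submission] = []
        self._tag_ids: Dict[Tuple[str, str], int] = {}
        self._next_submission = count(1)
        self._next_tag = count(1)

    def ingest(self, instrument_name: str, filename: str, source: str,
               referencetrack_included: bool) -> Submission:
        # Assumption: one tag_id per (instrument_name, source) pair, so that
        # satellite-transmitted and physically recovered data stay distinct.
        key = (instrument_name, source)
        if key not in self._tag_ids:
            self._tag_ids[key] = next(self._next_tag)
        sub = Submission(next(self._next_submission), self._tag_ids[key],
                         instrument_name, filename, source,
                         referencetrack_included)
        self.submissions.append(sub)
        return sub

    def history(self, instrument_name: str) -> List[Submission]:
        # Look up every event recorded for one instrument_name.
        return [s for s in self.submissions
                if s.instrument_name == instrument_name]


registry = Registry()
name = "iccat_gbyp0008"
# Use case 1: original logger data plus reference track.
registry.ingest(name, "iccat_gbyp0008_ArgosTrans_eTUFF0.txt", "satellite", True)
# Use case 2: two alternative track solutions.
registry.ingest(name, "iccat_gbyp0008_ArgosTrans_eTUFF1.txt", "satellite", False)
registry.ingest(name, "iccat_gbyp0008_ArgosTrans_eTUFF2.txt", "satellite", False)
# Use case 3: recovered archive (placeholder filename, no example file exists).
registry.ingest(name, "recovered_archive_placeholder.txt", "recovered", False)

for s in registry.history(name):
    print(s.submission_id, s.tag_id, s.source, s.reference_track, s.filename)
```

In this sketch the three satellite files share one tag_id but get three distinct submission_id values, the recovered archive gets a second tag_id, and a single lookup on instrument_name returns all four events.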