Address tag identification integrity issues #164
@tagtuna can you please suggest some lookup logic we can use to determine file semantics? Thank you
Currently
I am going to propose that we assign a
This also means that we always have a filename, regardless of whether the submitted filename is different. This will be used when we export raw data.
This matter also relates to the issue of dataset versioning (the same tag dataset potentially reprocessed with data and/or metadata changes) and to multi-track data scenarios. As stated in the notes above, the filename is currently the sole criterion that decides whether a dataset is ingested with a new tag_id. We should consider expanding this to include some key/limited eTUFF metadata elements, and should additionally suggest a filename convention that includes an indication of the version number.
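To make the versioning idea concrete, here is a minimal sketch assuming a purely hypothetical convention of `<stem>_v<version>.<ext>` (for example `159922_v2.txt`). No such convention has been agreed in this thread; the pattern, `parse_versioned_filename`, and `is_newer_version` are illustrative names only.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical convention: <stem>_v<version>.<ext>, e.g. "159922_v2.txt".
# This is NOT an agreed project standard, only an illustration.
VERSIONED_NAME = re.compile(r"^(?P<stem>.+)_v(?P<version>\d+)\.(?P<ext>[A-Za-z0-9]+)$")


class ParsedName(NamedTuple):
    stem: str
    version: int
    ext: str


def parse_versioned_filename(filename: str) -> Optional[ParsedName]:
    """Return (stem, version, ext) if the filename follows the convention, else None."""
    match = VERSIONED_NAME.match(filename)
    if match is None:
        return None
    return ParsedName(match["stem"], int(match["version"]), match["ext"])


def is_newer_version(candidate: str, existing: str) -> bool:
    """True if both names share a stem and the candidate has a higher version number."""
    new = parse_versioned_filename(candidate)
    old = parse_versioned_filename(existing)
    if new is None or old is None:
        return False  # unversioned names cannot be compared this way
    return new.stem == old.stem and new.version > old.version
```

With this sketch, `is_newer_version("159922_v2.txt", "159922_v1.txt")` is true, so the upload could be treated as a reprocessed version of an existing dataset rather than a new tag. A filename convention alone still relies on submitters following it, which is why the thread also considers metadata elements.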
@vtsontos can you please outline exactly what the key/limited eTUFF metadata elements are? If you want to combine defining metadata characteristics with a filename convention as well, please advise. Thanks
Including Tim, as he is probably best positioned to define these.
Thanks
Vardis
(In reply to Lewis John McGibbney, Sun, Apr 16, 2023, 1:24 PM.)
@lewismc @vtsontos A little late to the party, but this is my take
I understand that some form of file checksum, combined with the filename and the four metadata attributes above, can be a first-cut way to determine whether this is a new file warranting a new
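A first-cut fingerprint along those lines could look like the sketch below. The four attributes referred to above are not preserved in this thread, so `KEY_ATTRIBUTES` uses placeholder names (only `instrument_name` and `serial_number` are mentioned later in the discussion); the function names are likewise hypothetical.

```python
import hashlib
import json
from typing import Mapping, Set

# Placeholder attribute names: the four attributes proposed in the thread are
# not recorded here, so "species" and "deployment_date" are assumptions.
KEY_ATTRIBUTES = ("instrument_name", "serial_number", "species", "deployment_date")


def file_checksum(content: bytes) -> str:
    """SHA-256 of the raw file bytes, independent of the filename."""
    return hashlib.sha256(content).hexdigest()


def submission_fingerprint(filename: str, content: bytes, metadata: Mapping[str, str]) -> str:
    """Combine filename, content checksum, and key metadata into one stable hash."""
    record = {
        "filename": filename,
        "checksum": file_checksum(content),
        # Missing attributes become None (JSON null), so absence is explicit.
        "metadata": {key: metadata.get(key) for key in KEY_ATTRIBUTES},
    }
    # sort_keys makes the serialization deterministic regardless of dict order.
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_duplicate(filename: str, content: bytes, metadata: Mapping[str, str],
                 known_fingerprints: Set[str]) -> bool:
    """True if an identical submission (name, bytes, and key metadata) was already seen."""
    return submission_fingerprint(filename, content, metadata) in known_fingerprints
```

Keeping the content checksum separate from the combined fingerprint also lets the ingestion logic distinguish "same name, changed bytes" (a likely reprocessed version) from "same bytes, new name" (a likely duplicate upload).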
Yes @tagtuna, I will work on this right now. Thanks
@tagtuna we have noticed that in some of the older files we have, serial_number is not present. Moving forward, can we rely on serial_number always being present, or should the ingestion logic always check for it potentially being absent? If it is absent, does this mean that we could incorrectly correlate data (a file) with an existing dataset? Please advise. Thanks
@tagtuna can you specify exactly what features define a dataset? Thank you
It's definitely an oversight on my part that some files missed the serial number. We do require the serial number as a must-have metadata attribute.
However, only instrument_name is unique, and it is the reliable way to connect any files to a particular instrument in a given deployment.
Our discussion on including additional metadata attributes is meant to provide other clues. Take this example: the same hardware was first used on a tuna (recovered) and then reused on a shark. Let's say the client failed to provide a unique instrument_name for these two deployments. We may be able to distinguish them by seeing that the tag was deployed on a shark in the second event.
(In reply to Lewis John McGibbney, Thu, Apr 20, 2023, 12:42.)
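The matching rule described here (instrument_name as the primary link, other metadata as tie-breaking clues, serial_number possibly absent in older files) can be sketched as follows. `DatasetRecord`, `find_matching_dataset`, and the `species` field are hypothetical names for illustration; they are not the project's schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetRecord:
    instrument_name: str
    serial_number: Optional[str]  # older files may lack it
    species: Optional[str]        # hypothetical "additional metadata" clue
    files: List[str] = field(default_factory=list)


def _conflicts(existing: Optional[str], incoming: Optional[str]) -> bool:
    """Values conflict only if both are present and differ; absence is not evidence."""
    return existing is not None and incoming is not None and existing != incoming


def find_matching_dataset(meta: Dict[str, Optional[str]],
                          datasets: List[DatasetRecord]) -> Optional[DatasetRecord]:
    """Return the dataset a file belongs to, or None if it looks like a new deployment."""
    name = meta.get("instrument_name")
    if not name:
        raise ValueError("instrument_name is required to link a file to a deployment")
    for ds in datasets:
        if ds.instrument_name != name:
            continue
        # A reused instrument_name with conflicting clues (e.g. tuna vs. shark)
        # is treated as a separate deployment rather than merged.
        if _conflicts(ds.serial_number, meta.get("serial_number")):
            continue
        if _conflicts(ds.species, meta.get("species")):
            continue
        return ds
    return None
```

Note the design choice in `_conflicts`: a file missing serial_number or species is still attached on instrument_name alone. That is exactly the mis-correlation risk raised above, so a stricter variant might flag such matches for manual review instead of accepting them silently.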
👍
I see ... although the instrument_name is expected to be unique across deployments, it "might" not be the case due to humans in the loop. One thing that is not clear to me, @tagtuna, is what we consider a "dataset". I guess it is a set of instrument_names (with their corresponding files) plus additional metadata (species, location, etc.), or are we considering a dataset to be a single instrument_name plus its corresponding files?
A dataset is a set of 1) instrument_name + 2) additional metadata + 3) "returned" data files. This corresponds to a particular deployment of the hardware (instrument_name) on a studied animal (additional metadata).
By "returned" data files, I mean recorded observations that were retrieved via satellite transmission or downloaded via computer cables.
A track made up of (lat, lon) pairs is available either when it is (a) recorded directly and returned as a specific data file (e.g., GPS fixes) or (b) calculated after the end of a deployment using different bits of the returned data files (referred to as "geolocation"). If a track is calculated, multiple methods or input parameters can be used to drive that estimation process. In any case, having different/new tracks does not constitute a new "dataset". New tracks only add to an existing dataset or update the track information that already exists.
Does that make sense?
(In reply to Renato Marroquin, Mon, Apr 24, 2023, 2:22 AM.)
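The dataset definition given here (instrument_name + additional metadata + returned data files, with tracks attached rather than creating new datasets) maps naturally onto a small data model. This is a minimal sketch; the class names and the choice to key tracks by estimation method are assumptions for illustration, not the project's actual model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]


@dataclass
class Track:
    method: str           # e.g. "GPS" or a geolocation method label (hypothetical)
    points: List[LatLon]  # sequence of (lat, lon) pairs


@dataclass
class Dataset:
    """One deployment: instrument_name + animal metadata + returned data files."""
    instrument_name: str
    metadata: Dict[str, str]
    data_files: List[str] = field(default_factory=list)
    tracks: Dict[str, Track] = field(default_factory=dict)

    def add_or_update_track(self, track: Track) -> None:
        """Attach a track to this dataset; tracks never create a new dataset.

        Assumption: a track estimated with an already-used method replaces the
        old one (update), while a new method adds a track alongside it (add).
        """
        self.tracks[track.method] = track
```

For example, adding a GPS track, then a geolocation track, then a re-estimated GPS track leaves one dataset holding two tracks, with the GPS entry updated in place.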
Currently we use the filename as the primary identifier to determine 1) an initial tag submission and 2) a new submission for an existing tag. If we encounter the same file (by name), we simply increment the submission number.
This logic is prone to error because it ignores all file semantics apart from the filename. This task will determine additional criteria that prevent duplicate file entries and ensure the integrity of tag submissions.
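The filename-only behavior described above can be modeled in a few lines to show where it goes wrong. This is a simplified sketch: `ingest_by_filename` and the in-memory `registry` dict are hypothetical stand-ins for the real ingestion code and tag store.

```python
from typing import Dict


def ingest_by_filename(filename: str, registry: Dict[str, int]) -> int:
    """Model of the current rule: the filename alone decides new tag vs. resubmission.

    Returns the submission number assigned to this upload.
    """
    if filename in registry:
        registry[filename] += 1  # same name: treated as a resubmission of an existing tag
    else:
        registry[filename] = 1   # unseen name: treated as a brand-new tag
    return registry[filename]
```

Two failure modes follow directly: identical content uploaded under a different name (e.g. `tag_001_copy.txt`) is registered as a new tag, and an unrelated deployment that happens to reuse a name is counted as a resubmission of the wrong tag. Neither case looks at the file's content or metadata.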