Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create mica from example dataset in Camtrap DP #220

Closed
2 of 6 tasks
peterdesmet opened this issue Jun 20, 2023 · 9 comments
Closed
2 of 6 tasks

Create mica from example dataset in Camtrap DP #220

peterdesmet opened this issue Jun 20, 2023 · 9 comments
Assignees
Milestone

Comments

@peterdesmet
Copy link
Member

peterdesmet commented Jun 20, 2023

Rather than maintaining our own example dataset in inst/extdata, I suggest to create mica from the example dataset in Camtrap DP (also derived from the mica project), which lives https://github.com/tdwg/camtrap-dp/tree/main/example. It is versioned and valid with the format, so https://github.com/tdwg/camtrap-dp/tree/1.0-rc.1/example follows the 1.0-rc.1 specs. It is also complete and covers a number of use cases.

@peterdesmet peterdesmet added this to the v1.0 milestone Jun 20, 2023
@damianooldoni damianooldoni self-assigned this Sep 4, 2023
@damianooldoni
Copy link
Member

damianooldoni commented Sep 5, 2023

In general I agree with @peterdesmet 👍
Small downsides about testing:

  • to avoid downloading files over and over from URLs in tests of read_camtrap_dp(), I would download them once and save them in tmp directory. In this way we can still test that read function works with local paths. After discussion with @PietrH: the nicest option is to use setup.R as described in testthat documentation. Worth to be checked. Probably overshooting as the tmp files are needed only by test-read_camtrap_dp.R. So, probably we can set it in test-read_camtrap_dp.R
  • we can modify the temporary files to create artificial parsing issues. In this way we can still check the correctness of the parsing issues returned by camtraptor.

mica_zenodo_5590881 is not needed and indeed is too big. I have used it while writing some functions as I needed bigger and more complex datapackages. But this doesn't justify its presence in inst/extdata

@peterdesmet
Copy link
Member Author

peterdesmet commented Sep 6, 2023

Tests to verify

  • test-calc_animal_pos
  • test-check_package
  • test-check_species
  • test-deployments
  • test-filter_predicates
  • test-get_cam_op: 4 fails, 2 warns
  • test-get_custom_effort
  • test-get_effort
  • test-get_n_individuals: 1 fail -> SOLVED (c587b2f)
  • test-get_n_obs: 1 fail
  • test-get_n_species: 1 fail -> SOLVED (
    7fc616b)
  • test-get_prefixes
  • test-get_rai_individuals
  • test-get_rai
  • test-get_record_table: 2 fails, 1 warn
  • test-get_scientific_name
  • test-get_species
  • test-map_dep
  • test-media
  • test-observations
  • test-read_camtrap_dp
  • test-read_wi
  • test-round_coordinates
  • test-transform_effort_to_common_units
  • test-write_dwc
  • test-write_eml: 1 fail

Example to verify

  • calc_animal_pos.R
  • check_package.R
  • check_species.R
  • data.R
  • deployments.R
  • filter_predicates.R
  • get_cam_op.R
  • get_custom_effort.R
  • get_effort.R
  • get_n_individuals.R
  • get_n_obs.R
  • get_n_species.R
  • get_prefixes.R
  • get_rai.R
  • get_record_table.R
  • get_scientific_name.R
  • get_species.R
  • map_dep.R
  • media.R
  • observations.R
  • read_camtrap_dp.R: examples still references old date in inst
  • read_wi.R
  • round_coordinates.R
  • write_dwc.R
  • write_eml.R
  • zzz.R

@peterdesmet
Copy link
Member Author

For reference, the deployments were changed as follows from old to new:

old new
- 00a2c20d
29b7d356-4bb4-4ec4-b792-2af5cc32efa8 29b7d356
577b543a-2cf1-4b23-b6d2-cda7e2eac372 577b543a
62c200a9-0e03-4495-bcd8-032944f6f5a1 62c200a9
7ca633fa-64f8-4cfc-a628-6b0c419056d7 -

@peterdesmet
Copy link
Member Author

@PietrH @damianooldoni I have reviewed all examples and tests

  • @damianooldoni see above for a couple of tests that still fail. It might have something to do with sex = NULL. Please pull the mica branch before debugging
  • @PietrH can you have a look at the failing write_eml test
  • @damianooldoni should we store something in inst/extdata now?

@damianooldoni
Copy link
Member

damianooldoni commented Sep 7, 2023

Thanks @peterdesmet. I will give a look to the failing tests asap. About the need to store something in inst/extdata, following your first comment we don't need it. And my answer confirms such choice as the downsides of this choice can be easily solved.

damianooldoni added a commit that referenced this issue Sep 26, 2023
@damianooldoni
Copy link
Member

About failure in test-get_obs, the problem seems that I count number of distinct sequenceID per species and deployment in get_obs(), see https://github.com/inbo/camtraptor/blob/main/R/get_n_obs.R#L137.
@peterdesmet: is this still the right way to count obs?
In the test, I was just counting the number of rows (after filtering) as the sequence IDs were always unique.
So, if answer to my previous question is YES, then I have to make the test more robust, if the answer is NO, then I have to improve get_n_obs().

damianooldoni added a commit that referenced this issue Sep 26, 2023
This fixes failure in test-get_n_species, related to #220
@peterdesmet
Copy link
Member Author

I don't know what meaning users subscribe to "number of observations". See this example, which has 1 Ardea cinerea, 1 female Anas platyrhynchos, 1 male Anas platyrhynchos. If you group by deployment and scientificName, you will get (for this sequence alone):

  • 2 observations (1 for Ardea, 1 for Anas) if you count the unique sequenceID/eventID (which is the same for all)
  • 3 observations (1 for Ardea, 2 for Anas) if you count the unique observationID (which is different for all)

Note that this assumes you only use event-based observations. If you count media-based observations, you will get (for this sequence alone):

  • 60 observations (1 * 30 for Ardea, 1 * 30 for Anas) if you count the unique sequenceID/eventID (which is the same for all)
  • 90 observations (1 * 30 for Ardea, 2 * 30 for Anas) if you count the unique observationID (which is different for all)

In my opinion:

  1. I would count the number of unique observationID (which should be the same as the number of rows). Counting by sequenceID/eventID would be a get_n_events().
  2. I'm surprised the number of unique sequenceID was ever the same as the number of rows, it is not intended to.

@PietrH PietrH linked a pull request Nov 3, 2023 that will close this issue
@peterdesmet peterdesmet added the camtrapdp/camtraptor To be decided if this is related to camtrapdp or camtraptor label Mar 6, 2024
@peterdesmet
Copy link
Member Author

camtrapdp now uses the Camtrap DP example dataset via example_dataset(). Up to @damianooldoni to decide what is done for camtraptor. Work in #285 was abandoned.

@peterdesmet peterdesmet removed the camtrapdp/camtraptor To be decided if this is related to camtrapdp or camtraptor label May 31, 2024
@damianooldoni
Copy link
Member

See #312. We close this issue as not relevant anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants