Integrate option to deduplicate data before tiling #55

Open
julietcohen opened this issue Jun 10, 2024 · 1 comment
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@julietcohen
Collaborator

Currently, deduplication in the visualization workflow begins only after the input data has been staged and tiled. If deduplication is configured for any step in the workflow (staging, rasterization, and/or 3D tiling), duplicate rows are first flagged with a boolean attribute, and the polygons for which that attribute is True are then removed at the specified step.
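
To make the flag-then-remove pattern concrete, here is a minimal sketch in plain Python. It is not the viz-staging implementation: the attribute name `is_duplicate`, the helper functions, and the use of an exact geometry match as the duplicate test are all assumptions for illustration. Plain dicts stand in for rows of a geodataframe, and the real workflow would use a spatial comparison rather than string equality.

```python
# Hypothetical sketch: dicts stand in for rows of a geodataframe.
DUPLICATE_FLAG = "is_duplicate"  # hypothetical attribute name


def flag_duplicates(rows, key):
    """Return copies of the rows with a boolean flag marking repeats.

    The first row seen for each key is kept (flag False); every later
    row with the same key is flagged True. Nothing is removed here.
    """
    seen = set()
    flagged = []
    for row in rows:
        k = key(row)
        flagged.append({**row, DUPLICATE_FLAG: k in seen})
        seen.add(k)
    return flagged


def remove_flagged(rows):
    """Drop the rows flagged as duplicates, at whichever step is configured."""
    return [row for row in rows if not row[DUPLICATE_FLAG]]


rows = [
    {"id": 1, "geometry": "POLYGON A"},
    {"id": 2, "geometry": "POLYGON B"},
    {"id": 3, "geometry": "POLYGON A"},  # repeat of id 1
]

# Step 1: flag. Exact match on geometry is a stand-in for a spatial test.
flagged = flag_duplicates(rows, key=lambda r: r["geometry"])

# Step 2: remove, which may happen at a later step than the flagging.
kept = remove_flagged(flagged)
```

Keeping the flagging and the removal as separate steps is what allows the removal to be deferred to a later stage of the workflow.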

For some datasets, deduplicating the data before it is tiled could be beneficial. For example, Ingmar Nitze's Arctic lake change dataset is composed of UTM zones that overlap at the edges, and he prefers to have the data deduplicated before it is input into the viz-workflow. That way, whether users are interested in the viz output (tilesets of lakes) or the input data, they can have access to only the deduplicated data.

This functionality is in the exploratory phase. An example of applying the neighbor deduplication approach to non-tiled data can be found in this issue. One way to integrate this functionality into the viz-staging package is to accept additional values for the deduplication options in the config. For example, deduplicate_at could accept a new option such as "before_tiling". Beyond this added config flexibility, certain pre-deduplication steps would be needed, such as adding a source_file attribute to the input data.
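
A rough sketch of what this could look like, under stated assumptions: the step names `"staging"`, `"raster"`, and `"3dtiles"` are my guess at the existing accepted values, and `validate_deduplicate_at` and `add_source_file` are hypothetical helpers, not functions in viz-staging. Only `deduplicate_at`, `"before_tiling"`, and `source_file` come from the proposal above.

```python
# Assumed existing values for deduplicate_at (not confirmed in this issue).
EXISTING_STEPS = {"staging", "raster", "3dtiles"}
# The new value proposed in this issue.
PROPOSED_STEPS = {"before_tiling"}


def validate_deduplicate_at(config):
    """Return the configured deduplication steps as a list.

    Accepts None, a single string, or a list of strings, and raises
    ValueError for any step that is not recognized.
    """
    value = config.get("deduplicate_at")
    if value is None:
        return []
    steps = [value] if isinstance(value, str) else list(value)
    allowed = EXISTING_STEPS | PROPOSED_STEPS
    unknown = [s for s in steps if s not in allowed]
    if unknown:
        raise ValueError(f"Unknown deduplicate_at value(s): {unknown}")
    return steps


def add_source_file(rows, path):
    """Tag each row with the file it came from (a pre-deduplication step)."""
    return [{**row, "source_file": path} for row in rows]


config = {"deduplicate_at": ["before_tiling", "raster"]}
steps = validate_deduplicate_at(config)

# Dicts stand in for rows read from one input file; the path is made up.
tagged = add_source_file([{"id": 1}, {"id": 2}], "utm_zone_a.gpkg")
```

Tagging rows with `source_file` before any deduplication means that overlapping polygons from different input files can still be told apart once the files are combined.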

@julietcohen julietcohen added the enhancement New feature or request label Jun 10, 2024
@julietcohen
Collaborator Author

julietcohen commented Aug 14, 2024

One more consideration for this feature: any polygons in the input data that intersect the antimeridian will need to be split prior to deduplication. This is consistent with the need to split them before we stage the files anyway. This was identified with the lake change data (see here). I included an example of how to do this in R here, and an example in Python is here.
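
As a small illustration of the first half of that step, here is a hedged sketch for detecting which polygons would need splitting. It is not the approach from the linked R or Python examples. It assumes the coordinates are longitude/latitude in degrees within [-180, 180] and that no real polygon spans more than 180 degrees of longitude, and `crosses_antimeridian` is a hypothetical helper. The actual splitting would be done with a geometry library.

```python
def crosses_antimeridian(ring):
    """Heuristic check on a ring of (lon, lat) vertices in degrees.

    A jump of more than 180 degrees in longitude between consecutive
    vertices suggests the edge wraps across the antimeridian.
    """
    return any(
        abs(lon2 - lon1) > 180
        for (lon1, _), (lon2, _) in zip(ring, ring[1:])
    )


# A small box straddling +/-180 degrees longitude (closed ring).
crossing = [
    (179.0, 70.0), (-179.0, 70.0), (-179.0, 71.0),
    (179.0, 71.0), (179.0, 70.0),
]

# A box far from the antimeridian.
local = [
    (10.0, 70.0), (11.0, 70.0), (11.0, 71.0),
    (10.0, 71.0), (10.0, 70.0),
]

needs_split = [crosses_antimeridian(r) for r in (crossing, local)]
```

Only polygons flagged this way would then need to be split before deduplication and staging.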

@julietcohen julietcohen added the good first issue Good for newcomers label Aug 14, 2024