Add an option to check uniqueness before copy #2

Open
thislg opened this issue Nov 27, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@thislg
Member

thislg commented Nov 27, 2023

A lot of the time we have to add custom code to validate loaded data before copying it. A common use case is to ignore duplicate lines but still continue the import.

It could be set like this:

resources:
    my_resource_name:
        load:
            extra_fields:
                valid:
                    type: boolean
                    options:
                        default: true
        post_load:
            validate:
                -
                    columns:
                        - code
                    constraint_type: unique
                    label: 'Unique code'
                    on_invalid: ignore # abort|ignore
        copy:
            strategy_options:
                copy_condition: valid IS TRUE

A subscriber on ImportEvents::POST_LOAD would then execute an UPDATE on the temporary table to set the "valid" field to false on failing rows. On a validation error with on_invalid set to "ignore", it would log a message such as "Unique code validation constraint failed. Skipping duplicate my_resource_name (code: 12345) at lines 4, 5, 6", and the import would continue without copying the invalid lines. With on_invalid set to "abort", it would stop the import without copying the data.
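To make the intended behaviour concrete, here is a minimal sketch of the UPDATE and the log message, written in Python against an in-memory SQLite table rather than as a real bundle subscriber. Everything in it is an assumption for illustration: the `validate_unique` function, the `ImportAborted` exception, the `line` column, and the choice to flag every row sharing a duplicated code (keeping the first occurrence would be an equally valid reading).

```python
import sqlite3


class ImportAborted(Exception):
    """Raised when a constraint fails and on_invalid is "abort"."""


def validate_unique(conn, table, column, label, on_invalid="abort"):
    """Flag rows with a duplicated `column` value; return one log line per value."""
    if on_invalid not in ("abort", "ignore"):
        raise ValueError(f"unsupported on_invalid value: {on_invalid!r}")

    # `table` and `column` are interpolated into the SQL, so they must come
    # from trusted configuration, never from imported data.
    duplicated = f"SELECT {column} FROM {table} GROUP BY {column} HAVING COUNT(*) > 1"
    rows = conn.execute(
        f"SELECT {column}, line FROM {table} "
        f"WHERE {column} IN ({duplicated}) ORDER BY {column}, line"
    ).fetchall()

    lines_by_value = {}
    for value, line in rows:
        lines_by_value.setdefault(value, []).append(str(line))
    if not lines_by_value:
        return []

    if on_invalid == "abort":
        raise ImportAborted(f"{label} validation constraint failed")

    # on_invalid == "ignore": mark every offending row so that the copy step
    # can filter them out with its copy condition.
    conn.execute(f"UPDATE {table} SET valid = 0 WHERE {column} IN ({duplicated})")
    return [
        f"{label} validation constraint failed. Skipping duplicate {table} "
        f"({column}: {value}) at lines {', '.join(lines)}"
        for value, lines in lines_by_value.items()
    ]


# Hypothetical temporary table: `line` is the source line number,
# `valid` is the extra field declared under load.extra_fields.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_resource_name (line INTEGER, code TEXT, valid INTEGER DEFAULT 1)"
)
conn.executemany(
    "INSERT INTO my_resource_name (line, code) VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "C"), (4, "12345"), (5, "12345"), (6, "12345")],
)
for message in validate_unique(conn, "my_resource_name", "code", "Unique code", "ignore"):
    print(message)
```

With "ignore", the three rows sharing code 12345 end up with valid set to false and the other rows stay copyable; with "abort", the exception is raised before any row is touched.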

Other validation constraints could be added later, such as format validation (regex).
A simpler alternative would be to skip the validation config entirely and instead add an option to run an arbitrary SQL query on post_load to set the "valid" flag.
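The simpler alternative could look roughly like the sketch below: a list of SQL statements is run after load, and each one is free to flip the "valid" flag. This is only an illustration in Python with SQLite, not bundle code; `run_post_load_queries`, the table layout, and the five-digit code format are all assumptions. SQLite has no built-in REGEXP implementation, so the sketch registers one; a PostgreSQL-backed import would use its own regex operator instead.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Back SQLite's `value REGEXP pattern` operator with Python's re module."""
    return int(value is not None and re.search(pattern, value) is not None)


def run_post_load_queries(conn, queries):
    """Execute each configured SQL statement against the loaded data."""
    for sql in queries:
        conn.execute(sql)


conn = sqlite3.connect(":memory:")
# SQLite evaluates `X REGEXP Y` as a call to regexp(Y, X).
conn.create_function("REGEXP", 2, regexp)
conn.execute(
    "CREATE TABLE my_resource_name (line INTEGER, code TEXT, valid INTEGER DEFAULT 1)"
)
conn.executemany(
    "INSERT INTO my_resource_name (line, code) VALUES (?, ?)",
    [(1, "12345"), (2, "12a45"), (3, "67890")],
)

# Hypothetical post_load query: a format check expressed as plain SQL.
run_post_load_queries(
    conn,
    ["UPDATE my_resource_name SET valid = 0 WHERE NOT (code REGEXP '^[0-9]{5}$')"],
)
valid_codes = [
    row[0]
    for row in conn.execute(
        "SELECT code FROM my_resource_name WHERE valid = 1 ORDER BY line"
    )
]
print(valid_codes)
```

The trade-off is that arbitrary SQL gives no structured log message or abort/ignore handling; those would remain the responsibility of whoever writes the query.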

@thislg thislg added the enhancement New feature or request label Nov 27, 2023
@pierreboissinot
Member

@thislg

Suggestions:

  • Make abort the default value to avoid a BC break: on_invalid: abort # abort|ignore
  • Should the on_invalid setting be set on copy instead? The copy_condition: valid IS TRUE setting on copy would then be redundant.
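One possible reading of these two suggestions, sketched in Python purely for illustration: on_invalid falls back to "abort" when omitted, and the copy filter is derived from the constraints so it no longer has to be configured by hand. The function names and the dictionary shape are hypothetical and do not reflect how the bundle resolves its options.

```python
ALLOWED_ON_INVALID = ("abort", "ignore")


def resolve_constraint(constraint):
    """Return a copy of one `validate` entry with on_invalid defaulted to "abort"."""
    resolved = {"on_invalid": "abort", **constraint}
    if resolved["on_invalid"] not in ALLOWED_ON_INVALID:
        raise ValueError(f"on_invalid must be one of {ALLOWED_ON_INVALID}")
    return resolved


def implied_copy_condition(constraints):
    """Derive the copy filter: only needed when some constraint ignores bad rows."""
    if any(resolve_constraint(c)["on_invalid"] == "ignore" for c in constraints):
        return "valid IS TRUE"
    return None


unique_code = {"columns": ["code"], "constraint_type": "unique", "label": "Unique code"}
print(resolve_constraint(unique_code)["on_invalid"])
print(implied_copy_condition([{**unique_code, "on_invalid": "ignore"}]))
```

Existing configurations without on_invalid would then keep aborting on failure, which is the backward-compatible behaviour the first suggestion asks for.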

@saami783

Hello, is this piece of code from the documentation? Because there is no indication that we can produce a custom configuration especially with "post_load".

@thislg
Member Author

thislg commented Jun 24, 2024

> Hello, is this piece of code from the documentation? Because there is no indication that we can produce a custom configuration especially with "post_load".

The load and copy options are documented (see https://github.com/le-phare/import-bundle/blob/master/docs/configure/load.md and https://github.com/le-phare/import-bundle/blob/master/docs/configure/copy.md). You can't add arbitrary options, so the post_load option does not exist yet; in this issue I suggest adding it so that we can define validation constraints.

3 participants