Feature: Customizable column names and extra config placeholder #127
@hrfmartins thank you for the contribution. Really appreciated. I have a few requests. Can you please sign your commits? We require all commits to be signed with a GPG key. Please also run the lint and formatting checks. We currently have an issue with running integration tests triggered from forks, so your PR may be blocked at the moment.
dbc3dd2 to a335d2a
@mwojtyczka Signing with GPG done and lint + fmt ran and issues fixed :) Sorry for the inconvenience. Is there anything I can/need to do about the fork issue? Thank you
Thank you! We are working on fixing the fork issue. Will keep you posted.
…r other future configurations
ed300d9 to a09b0fe
@hrfmartins
Can you please extend the existing guide to show how to customize the reporting columns? Perhaps add a new section, something like "Additional configuration", before the custom checks:
https://github.com/databrickslabs/dqx/blob/main/docs/dqx/docs/guide.mdx#quality-rules-and-creation-of-custom-checks
Can you please also extend the demos, probably with a new cell here:
https://github.com/databrickslabs/dqx/blob/main/demos/dqx_demo_library.py#L286
Co-authored-by: Marcin Wojtyczka <[email protected]>
LGTM
* Provided option to customize reporting column names ([#127](#127)). In this release, the DQEngine library has been enhanced to allow for customizable reporting column names. A new constructor has been added to DQEngine, which accepts an optional ExtraParams object for extra configurations. A new Enum class, DefaultColumnNames, has been added to represent the columns used for error and warning reporting. New tests have been added to verify the application of checks with custom column naming. These changes aim to improve the customizability, flexibility, and user experience of DQEngine by providing more control over the reporting columns and resolving issue [#46](#46).
* Fixed parsing error when loading checks from a file ([#165](#165)). In this release, we have addressed a parsing error that occurred when loading checks (data quality rules) from a file, fixing issue [#162](#162). The specific issue being resolved is a SQL expression parsing error. The changes include refactoring tests to eliminate code duplication and improve maintainability, as well as updating method and variable names to use `filepath` instead of "path". Additionally, new unit and integration tests have been added and manually tested to ensure the correct functionality of the updated code.
* Removed usage of try_cast spark function from the checks to make sure DQX can be run on more runtimes ([#163](#163)). In this release, we have refactored the code to remove the usage of the `try_cast` Spark function and replace it with `cast` and `isNull` checks to improve code compatibility, particularly for runtimes where `try_cast` is not available. The affected functionality includes null and empty column checks, checking if a column value is in a list, and checking if a column value is a valid date or timestamp. We have added unit and integration tests to ensure functionality is working as intended.
* Added filter to rules so that you can make conditional checks ([#141](#141)). The filter serves as a condition that data must meet to be evaluated by the check function. The filters restrict the evaluation of checks to only apply to rows that meet the specified conditions. This feature enhances the flexibility and customizability of data quality checks in the DQEngine.
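To make the filter semantics described above concrete, here is a minimal plain-Python sketch. It is not the DQX implementation: the `Rule` class, the `failed_rules` helper, and the use of Python callables for checks and filters are all hypothetical stand-ins chosen for illustration (in DQX the checks run on Spark DataFrames). It only shows the idea that a check is evaluated for a row when that row satisfies the rule's filter.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Row = dict  # hypothetical row representation, used only in this sketch


@dataclass
class Rule:
    """Hypothetical rule: a named check with an optional row filter."""
    name: str
    check: Callable[[Row], bool]                    # True when the row passes
    filter: Optional[Callable[[Row], bool]] = None  # None means "apply to every row"


def failed_rules(row: Row, rules: list) -> list:
    """Return the names of the rules the row fails, skipping filtered-out rules."""
    failures = []
    for rule in rules:
        # A rule with a filter is evaluated only for rows that meet the filter.
        if rule.filter is not None and not rule.filter(row):
            continue
        if not rule.check(row):
            failures.append(rule.name)
    return failures


rules = [
    Rule(
        name="email_not_null",
        check=lambda r: r.get("email") is not None,
        filter=lambda r: r.get("country") == "PT",
    )
]

print(failed_rules({"country": "PT", "email": None}, rules))
print(failed_rules({"country": "DE", "email": None}, rules))
```

In this sketch the second row has no email either, but it is never evaluated because it does not meet the filter, so it reports no failures.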
In this PR I implemented a placeholder for extra configurations for DQEngine. I also added customizable column names to replace the default names.
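A rough, self-contained sketch of the idea, for readers who want to see the shape of the change without opening the diff. This is not the code from the PR: the class names `ExtraParams` and `DefaultColumnNames` come from the description in this thread, but the field name `column_names`, the default values `_errors` and `_warnings`, and the helper `resolve_column_names` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class DefaultColumnNames(Enum):
    """Reporting columns; the values here are assumed defaults, not confirmed."""
    ERRORS = "_errors"
    WARNINGS = "_warnings"


@dataclass(frozen=True)
class ExtraParams:
    """Placeholder for extra engine configuration (hypothetical field layout)."""
    # Maps a reporting role ("errors" or "warnings") to a custom column name.
    column_names: Dict[str, str] = field(default_factory=dict)


def resolve_column_names(extra_params: Optional[ExtraParams] = None) -> Dict[str, str]:
    """Return the reporting column names, falling back to the defaults."""
    overrides = extra_params.column_names if extra_params else {}
    return {
        member.name.lower(): overrides.get(member.name.lower(), member.value)
        for member in DefaultColumnNames
    }


# Override only the errors column; the warnings column keeps its default.
custom = ExtraParams(column_names={"errors": "dq_errors"})
print(resolve_column_names(custom))
print(resolve_column_names())
```

Keeping the overrides in an optional parameters object means existing callers that pass nothing keep the default column names.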
Changes
Linked issues
Resolves #46
Tests