Is your feature request related to a problem? Please describe.
Currently, when elementary.generate_schema_baseline_test is used to generate the required YAML and a model and a source share the same name, the command prioritizes the source. This happens because generate_schema_baseline_test internally uses the macro get_node_by_name, which defaults to the source when both exist. There is no way to specify which resource type (source or model) should be used, so the generated output can be incorrect or unintended.
This issue is particularly relevant in scenarios where sources are used for internal purposes (e.g., raw data ingestion) while views are created to enable users to query them. In such cases, the source and the view can exist in different catalogs but still share the same name. Since get_node_by_name does not distinguish between these cases, the generated YAML may not reflect the intended resource.
Describe the solution you'd like
Introduce an option in elementary.generate_schema_baseline_test that allows users to explicitly specify the resource type (e.g., model or source). This will ensure that the command generates the YAML for the correct resource, even when names are duplicated for source and model.
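To make the ambiguity and the requested option concrete, here is a minimal Python sketch. It is not Elementary's implementation (the real macros are written in Jinja): the GRAPH dictionary, the project and source names, the function find_node_by_name, and the source-first lookup order are all assumptions made for illustration, chosen only to mimic the behavior described in this issue.

```python
from typing import Optional

# Hypothetical stand-in for a dbt graph: unique_id -> node.
# The project name "my_project" and source name "raw" are made up.
GRAPH = {
    "source.my_project.raw.orders": {"resource_type": "source", "name": "orders"},
    "model.my_project.orders": {"resource_type": "model", "name": "orders"},
}


def find_node_by_name(
    graph: dict, name: str, resource_type: Optional[str] = None
) -> Optional[dict]:
    """Return the first node called `name`, optionally limited to one resource type.

    Sources are checked before models to mimic the prioritization described
    in this issue. That ordering is an assumption about the behavior, not a
    copy of the get_node_by_name macro.
    """
    for candidate_type in ("source", "model"):
        # Skip resource types the caller did not ask for.
        if resource_type is not None and candidate_type != resource_type:
            continue
        for node in graph.values():
            if node["resource_type"] == candidate_type and node["name"] == name:
                return node
    return None


# Name-only lookup: the source wins because it is checked first.
ambiguous = find_node_by_name(GRAPH, "orders")
# Explicit lookup: the caller picks the model.
explicit = find_node_by_name(GRAPH, "orders", resource_type="model")
print(ambiguous["resource_type"])
print(explicit["resource_type"])
```

Keeping resource_type optional means existing name-only calls would behave as before, while callers who hit a name collision can opt in to an explicit choice.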
For example, the command could be extended to accept a resource_type argument:
dbt run-operation elementary.generate_schema_baseline_test --args '{"name": "theName", "resource_type": "model"}'
This way, users can avoid unintended prioritization of sources over models when generating schema baseline tests.
Describe alternatives you've considered
Changing the call from get_node_by_name to get_node.sql so that the full unique ID (including project and type) is used, preventing ambiguity. However, this approach is less user-friendly since users do not always know the unique ID of their model or source and simply want to use the name.
Manually renaming models or sources to ensure uniqueness, which is not always feasible in large projects.
Running separate commands for models and sources, but this does not resolve the ambiguity when both exist with the same name.
Additional context
This feature will be particularly useful in projects where sources and models sometimes share the same name, preventing misalignment in generated YAML files.
Would you be willing to contribute this feature?
Yes, I have already submitted a pull request that adds this functionality by introducing an option to specify the resource type in generate_schema_baseline_test.
Link to the PR: elementary-data/dbt-data-reliability#788