-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add validation for missing metric_time
when querying to/through SCD
#1451
Conversation
metricflow-semantics/metricflow_semantics/model/semantics/linkable_spec_resolver.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good!! I left a bunch of small notes inline, and one bigger question.
metricflow-semantics/metricflow_semantics/model/semantics/linkable_spec_resolver.py
Outdated
Show resolved
Hide resolved
metricflow-semantics/metricflow_semantics/query/issues/parsing/scd_requires_metric_time.py
Outdated
Show resolved
Hide resolved
@@ -221,3 +222,12 @@ def get_min_queryable_time_granularity(self, metric_reference: MetricReference) | |||
minimum_queryable_granularity = defined_time_granularity | |||
|
|||
return minimum_queryable_granularity | |||
|
|||
def get_joinable_scd_specs_for_metric(self, metric_reference: MetricReference) -> Sequence[LinkableInstanceSpec]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, my main question about this PR - why did you decide to put this method on MetricLookup
instead of SemanticModelLookup
? (I think we discussed putting it on SemanticModelLookup.)
Is there a reason you ended up needing to specifically get the SCDs that are joinable for the metric, instead of all SCDs that exist in the manifest? There is other logic in the group by resolver that will ensure the group bys are valid for the metric, so we shouldn't need that for this task.
Unless I'm missing some context there, I think we can simplify this PR if we do the following:
- Remove this method and instead just store all possible SCD specs on
SemanticModelLookup
. You can use that for the spec pattern matching. You should be able to build that as a sequence ofLinkableInstanceSpecs
in the__init__
logic. - Then we can avoid dealing with
LinkableElements
at all, so you could remove the newSCD_HOP
LinkableElementProperty
and all the logic associated with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this in DMs but I wanna document it here: the reason I'm using the LinkableElements
is because a dimension being an SCD_HOP
or not depends on the join path.
Let's say we have 4 models: bookings
, agents
, users
and agencies
. Each booking
is booked by an user
with the help of an agent
which is part of an agency
. Each user
's bookings are always handled by the same agency
but not necessarily the same agent
, so there is a direct link between user
and agency
as well. Now suppose agents
is an SCD table.
-
If we query for
bookings_revenue
grouped byuser__agency__name
, we don't needmetric_time
since neitheruser
noragency
are SCDs. -
If we now query for
bookings_revenue
grouped byagent__agency__name
, we do needmetric_time
since we join toagency
viaagent
, andagent
is an SCD.
Since the same exact dimension (agency__name
in the example above) might or might not be treated as an SCD depending on the join path, we need to analyze this in the path resolver based on the source metric, so it can't go in SemanticModelLookup
.
metricflow-semantics/metricflow_semantics/query/validation_rules/base_validation_rule.py
Outdated
Show resolved
Hide resolved
metricflow-semantics/metricflow_semantics/query/validation_rules/metric_time_requirements.py
Outdated
Show resolved
Hide resolved
metricflow-semantics/metricflow_semantics/query/validation_rules/metric_time_requirements.py
Outdated
Show resolved
Hide resolved
metricflow-semantics/tests_metricflow_semantics/query/test_query_parser.py
Outdated
Show resolved
Hide resolved
metricflow-semantics/metricflow_semantics/model/semantics/linkable_spec_resolver.py
Outdated
Show resolved
Hide resolved
metricflow-semantics/metricflow_semantics/model/semantics/linkable_spec_resolver.py
Outdated
Show resolved
Hide resolved
metricflow-semantics/metricflow_semantics/model/semantics/linkable_spec_resolver.py
Outdated
Show resolved
Hide resolved
d3bbf19
to
53eae93
Compare
..._spec_resolver.py/str/test_create_linkable_element_set_from_join_path_multi_hop__result0.txt
Show resolved
Hide resolved
metricflow-semantics/tests_metricflow_semantics/model/semantics/test_linkable_spec_resolver.py
Show resolved
Hide resolved
@plypaul I'm asking for your review since you're the one doing most of the profiling and I refactored the I'll wait for this answer before I rebase from main :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!! I'll wait until you've rebased and added the caching logic to approve.
Also, can you add one more test? Something like this test, which will give us a snapshot of what the new error looks like to users.
..._spec_resolver.py/str/test_create_linkable_element_set_from_join_path_multi_hop__result0.txt
Show resolved
Hide resolved
03ce571
to
afee199
Compare
`MetricTimeQueryValidationRule` had some methods that were only called once and would loop multiple times over the group by items. I consolidated that into a single `_get_query_items_analysis()`. `PostResolutionQueryValidationRule` had a `_get_metric()` method that had only one callsite and was just a proxy to `_manifest_lookup.metric_lookup.get_metric()`. I removed that and added a direct call to the `ManifestLookup`.
afee199
to
3ccc016
Compare
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
Since the last review, I:
|
scd_query_parser: MetricFlowQueryParser, | ||
) -> None: | ||
"""Test that queries that join through SCDs semantic models fail if no time dimensions are selected.""" | ||
with pytest.raises(InvalidQueryException) as exc_info: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@courtneyholcomb I changed these tests from using the match
regex to use assert_str_snapshot_equal
, and I created the 2 new .txt
files with the full error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left 2 small notes inline, but this looks great!! Awesome work!
...shots/test_query_parser.py/str/test_join_through_scd_no_time_dimension_validation__error.txt
Outdated
Show resolved
Hide resolved
) | ||
|
||
def _group_by_items_include_agg_time_dimension( | ||
@lru_cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking - I think Paul said it was fine to use this lru_cache
becasue the number of items is small enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I asked him during standup and he said that was OK!
This commit adds a new step to `MetricTimeQueryValidationRule` which does the following: 1. Selects all the existing SCDs for the queried metric 2. Match them against the spec pattern of the group by input 3. Raise a `SCDRequiresMetricTimeIssue` if no `metric_time` was provided and there were matches To accomplish step 1, I had to create a new `LinkableElementProperty` called `SCD_HOP`. This new property indicates that the join path to the linkable element goes through an SCD at some point. I changed the `ValidLinkableSpecResolver` to add `SCD_HOP` to the properties of all the elements it finds whenever that element belongs to an SCD or if the path to it contains an SCD.
This commit adds the changelog entry about the SCD in join path validation.
7d515e6
to
2c8c770
Compare
Summary
This PR changes
MetricQueryValidationRule
to raiseScdRequiresMetricTimeIssue
in case the user submits a query that uses an SCD but does not group by time.Implementation
To make this happen, I:
MetricQueryValidationRule
works to return a newQueryAnalysisResult
and avoid creating a bunch of separate methods that return booleansMetricQueryValidationRule
to make it get the specs for all the joinable SCDs for the given metric, match the group by input against it and raise an issue if there's nometric_time
in the queryLinkableElementProperty
calledSCD_HOP
, which indicates that you need to join to an SCD somewhere along the path to join to that elementValidLinkableSpecResolver
to addSCD_HOP
to all the linkable elements which go to/through SCDs