You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing metricflow functionality, rather than a Big Idea better suited to a discussion
Describe the feature
Currently, we render components of the data flow plan as sub-queries. For example, to query a simple metric called revenue, we'll first create an inner query to select the measure and dimension columns, then create an outer query to perform the metric aggregation.
SELECT
metric_time__day
, SUM(revenue) AS revenue
FROM (
SELECT
DATE_TRUNC('day', cast(ordered_at as DATETIME)) AS metric_time__day
, product_price AS revenue
FROMANALYTICS.dbt_jstein.order_items order_item_src_78
) subq_119
GROUP BY
metric_time__day
This SQL can get more complex when additional joins are needed - for example, to include a dimension, when multiple metrics are requested in a query, or when constructing derived metrics. Issue #422 has an example of a derived metric query with additional layers of nesting. We could instead use CTEs in the SQL to calculate revenue:
with subq_119 as (
SELECT
DATE_TRUNC('day', cast(ordered_at as DATETIME)) AS metric_time__day
, product_price AS revenue
FROMANALYTICS.dbt_jstein.order_items order_item_src_78
) subq_119
SELECT
metric_time__day
, COALESCE(revenue, 0) AS revenue
FROM subq_119
GROUP BY
metric_time__day
This provides a few key benefits.
Increases legibility of the SQL generated, especially for complex cases
(
with base_metrics as (
SELECT
DATE_TRUNC('month', ordered_at) AS metric_time__month
, location_name
, SUM(order_item_total) AS revenue
, SUM(product_cost) AS costs
, SUM(order_item_tax_paid) AS taxes
FROM dbt_test.dbt_metrics.order_items order_items_src_0
GROUP BY
DATE_TRUNC('month', ordered_at)
, location_name
)
select
revenue
, costs
, taxes
revenue - (costs + taxes) AS net_profit
from base_metrics
Risks:
We would need to test how the query optimizers on different data platforms handle CTEs. There could be cases where we see query performance decline. However, in the case of derived metrics and multiple metrics queries, we should see performance gains.
Describe alternatives you've considered
Keep the current behavior. This comes with a real performance hit for companies querying multiple metrics and derived metrics. We will also be generating less legible SQL.
Who will this benefit?
This will make it easier for analytics engineers to reason about the SQL we're generating, and improve performance for complex queries.
Jstein77
changed the title
[Feature] Use CTEs instead of sub queries in generated SQL
[SL-1755] [Feature] Use CTEs instead of sub queries in generated SQL
Feb 21, 2024
Jstein77
changed the title
[SL-1755] [Feature] Use CTEs instead of sub queries in generated SQL
[SL-1756] [SL-1755] [Feature] Use CTEs instead of sub queries in generated SQL
Feb 21, 2024
Is this your first time submitting a feature request?
Describe the feature
Currently, we render components of the data flow plan as sub-queries. For example, to query a simple metric called
revenue
, we'll first create an inner query to select the measure and dimension columns, then create an outer query to perform the metric aggregation.This SQL can get more complex when additional joins are needed - for example, to include a dimension, when multiple metrics are requested in a query, or when constructing derived metrics. Issue #422 has an example of a derived metric query with additional layers of nesting. We could instead use CTEs in the SQL to calculate
revenue
:This provides a few key benefits.
order_items
:Risks:
Describe alternatives you've considered
Keep the current behavior. This comes with a real performance hit for companies querying multiple metrics and derived metrics. We will also be generating less legible SQL.
Who will this benefit?
This will make it easier for analytics engineers to reason about the SQL we're generating, and improve performance for complex queries.
Are you interested in contributing this feature?
No response
Anything else?
No response
SL-1755
SL-1756
The text was updated successfully, but these errors were encountered: