Derived & component metrics in same mf query results in complex/unnecessary joins #422

callum-mcdata · 2023-01-27T18:14:09Z

Apologies on the bad title, not sure how exactly to describe this one?

Describe the bug
I believe that the new derived metric type does not recognize the component metrics included as metrics during the query generation. When a user queries both the derived and the component metrics, the SQL generated appears to include two different queries that are then full outer joined, similar to the pattern executed for metrics that don't share the same base table.

This assumption of what the issue is is partially derived by a comment by @QMalcolm in #416 where he mentioned our measures (which is how we treat component metrics) and then re-enforced when comparing the SQL generated by querying derived+component metrics versus just derived metrics.

Here is the SQL produced when querying both the derived & component metrics:

Derived metric + component metrics

MF Command: mf query --metrics revenue,costs,taxes,net_profit --dimensions metric_time__month,location_name

SELECT
  COALESCE(subq_4.metric_time__month, subq_10.metric_time__month) AS metric_time__month
  , COALESCE(subq_4.location_name, subq_10.location_name) AS location_name
  , MAX(subq_4.revenue) AS revenue
  , MAX(subq_4.costs) AS costs
  , MAX(subq_4.taxes) AS taxes
  , MAX(subq_10.net_profit) AS net_profit
FROM (
  ### This appears to be the subquery selecting the base metrics 
  SELECT
    DATE_TRUNC('month', ordered_at) AS metric_time__month
    , location_name
    , SUM(order_item_total) AS revenue
    , SUM(product_cost) AS costs
    , SUM(order_item_tax_paid) AS taxes
  FROM dbt_test.dbt_metrics.order_items order_items_src_0
  GROUP BY
    DATE_TRUNC('month', ordered_at)
    , location_name
) subq_4
FULL OUTER JOIN (
  ### This appears to be the subquery selecting the derived metric and ignoring the component metrics calculated in the same subquery
  SELECT
    metric_time__month
    , location_name
    , revenue - (costs + taxes) AS net_profit
  FROM (
    SELECT
      DATE_TRUNC('month', ordered_at) AS metric_time__month
      , location_name
      , SUM(order_item_total) AS revenue
      , SUM(product_cost) AS costs
      , SUM(order_item_tax_paid) AS taxes
    FROM dbt_test.dbt_metrics.order_items order_items_src_0
    GROUP BY
      DATE_TRUNC('month', ordered_at)
      , location_name
  ) subq_9
) subq_10
ON
  (
    subq_4.location_name = subq_10.location_name
  ) AND (
    subq_4.metric_time__month = subq_10.metric_time__month
  )
GROUP BY
  COALESCE(subq_4.metric_time__month, subq_10.metric_time__month)
  , COALESCE(subq_4.location_name, subq_10.location_name)

Here is the SQL produced when just querying the derived metric

Just derived metric

MF Command: mf query --metrics net_profit --dimensions metric_time__month --order metric_time__month

SELECT
  metric_time__month
  , revenue - (costs + taxes) AS net_profit
FROM (
  SELECT
    DATE_TRUNC('month', ordered_at) AS metric_time__month
    , SUM(order_item_total) AS revenue
    , SUM(product_cost) AS costs
    , SUM(order_item_tax_paid) AS taxes
  FROM dbt_test.dbt_metrics.order_items order_items_src_2
  GROUP BY
    DATE_TRUNC('month', ordered_at)
) subq_4
ORDER BY metric_time__month

Steps To Reproduce

Create two metrics based on table A.
Create a derived metric based on table A.
Query both the metrics and the derived metric.

Expected behavior
I would suspect that the query generated in the second example would be used but the component metrics being calculated in the subquery would be used.

Environment (please complete the following information):

MetricFlow Version 0.14.0

Note:

The query works just fine but wanted to 🎏 because I was digging into the new implementation of derived metrics to test out the dbt integration

The text was updated successfully, but these errors were encountered:

giuseppecg · 2024-01-09T23:52:56Z

Hi, guys!

I would like to second @callum-mcdata problem. Here we intend to use metric flow to generate a more generic datasets based on our data warehouse. However, as of right now, we can't use metricflow directly.

The problem for us is that it generates a more complicated query just to calculate the ratio and derived metrics on different subqueries, which wouldn't be a complete problem, but it locks the dimensions by which is calculated. For example, if a have mf query --metrics simple_metric, ratio_metric --group-by categorical_group, another_categorical_group,some_sk I can't get the correct values for the ratio_metric if I take some_sk afterwards outside of metricflow. So I have to always generate a new dataset 😕 and governance-wise this became a problem for us.

I get that the current for facilitates the filter option as well as other features, but it really doesn't work when you want less datasets for your business stakeholder to make sense.

Environment (please complete the following information):

MetricFlow Version 0.202.0

Note:

The query does indeed work fine, it's just a question of generalization.

Jstein77 · 2024-02-21T23:06:18Z

@giuseppecg thanks for adding more context to this issue!

I created another issue with a proposed approach to solving the complex SQL rending. Would love your eyes on that.

but it locks the dimensions by which is calculated. For example, if a have mf query --metrics simple_metric, ratio_metric --group-by categorical_group, another_categorical_group,some_sk I can't get the correct values for the ratio_metric if I take some_sk afterwards outside of metricflow. So I have to always generate a new dataset 😕 and governance-wise this became a problem for us.

I wanted to dig into this a bit more because I think it may be a separate issue. Even if we generate less complex SQL for this case, you still won't be able to re-aggregate ratio metrics across different dimensions. This is one of the drawbacks to using static data sets. What's the reason you decided to go with this approach? Is it mainly as an integration path for a BI tool we don't yet support?

soham-dasgupta · 2024-09-26T19:08:24Z

Generation of queries with FULL OUTER JOIN is much less efficient that reusing metrics from Subqueries. This is really an anti pattern. Do we have eyes on this ?

callum-mcdata added bug Something isn't working triage Tasks that need to be triaged labels Jan 27, 2023

Jstein77 added backlog and removed triage Tasks that need to be triaged labels Aug 30, 2023

Jstein77 mentioned this issue Feb 21, 2024

[SL-1756] [SL-1755] [Feature] Use CTEs instead of sub queries in generated SQL #1040

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Derived & component metrics in same mf query results in complex/unnecessary joins #422

Derived & component metrics in same mf query results in complex/unnecessary joins #422

callum-mcdata commented Jan 27, 2023

giuseppecg commented Jan 9, 2024

Jstein77 commented Feb 21, 2024

soham-dasgupta commented Sep 26, 2024

Derived & component metrics in same mf query results in complex/unnecessary joins #422

Derived & component metrics in same mf query results in complex/unnecessary joins #422

Comments

callum-mcdata commented Jan 27, 2023

Note:

giuseppecg commented Jan 9, 2024

Jstein77 commented Feb 21, 2024

soham-dasgupta commented Sep 26, 2024