Nov 2024 schema #32
base: main
distinct *
from reporting_grain_combined
-- pre-reporting grain: unions all unique dimension values
pre_reporting_grain as (
For context, I know we discussed doing full outer joins, but after looking into it I realized they could risk losing records depending on which table is used in the join, as explained here. The alternative is to coalesce the join key, but that may be clunky, especially with more than 3 CTEs being joined. I therefore decided on this union all, then dedupe method. This was also the method used in the old version of these models.
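To make the approach concrete, here is a minimal sketch of the union all, then dedupe method, written with Python's built-in sqlite3 so it runs standalone. The table contents and the helper names (`make_demo_conn`, `build_daily_report`) are made up for illustration, and the grain is cut down to `date_day` and `app_id` for brevity. It is a sketch under those assumptions, not the package's actual model code.

```python
import sqlite3


def make_demo_conn():
    """Create an in-memory database with three small, made-up metric tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table app_crashes (date_day text, app_id integer, crashes integer);
        create table install_deletions (date_day text, app_id integer, deletions integer);
        create table sessions_activity (date_day text, app_id integer, sessions integer);
        """
    )
    conn.executemany(
        "insert into app_crashes values (?, ?, ?)",
        [("2024-11-01", 1, 3)],
    )
    conn.executemany(
        "insert into install_deletions values (?, ?, ?)",
        [("2024-11-01", 1, 2), ("2024-11-02", 2, 5)],
    )
    conn.executemany(
        "insert into sessions_activity values (?, ?, ?)",
        [("2024-11-02", 2, 7), ("2024-11-03", 1, 4)],
    )
    return conn


def build_daily_report(conn):
    """Build the grain with union all + distinct, then left join each metric."""
    query = """
    with pre_reporting_grain as (
        -- every dimension combination seen in any source, duplicates included
        select date_day, app_id from app_crashes
        union all
        select date_day, app_id from install_deletions
        union all
        select date_day, app_id from sessions_activity
    ),
    reporting_grain as (
        -- dedupe so each combination appears exactly once
        select distinct * from pre_reporting_grain
    )
    select
        rg.date_day,
        rg.app_id,
        coalesce(ac.crashes, 0) as crashes,
        coalesce(id.deletions, 0) as deletions,
        coalesce(sa.sessions, 0) as sessions
    from reporting_grain as rg
    left join app_crashes as ac
        on rg.date_day = ac.date_day
        and rg.app_id = ac.app_id
    left join install_deletions as id
        on rg.date_day = id.date_day
        and rg.app_id = id.app_id
    left join sessions_activity as sa
        on rg.date_day = sa.date_day
        and rg.app_id = sa.app_id
    order by rg.date_day, rg.app_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in build_daily_report(make_demo_conn()):
        print(row)
```

Because every left join hangs off the deduped grain, a combination that exists in only one source still gets a row, and the join conditions stay simple no matter how many CTEs are added. One caveat for the real models: a null dimension such as `source_type` dedupes fine under `distinct` but will not match in an equality join, so those keys need separate handling.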
…sistency test, update changelog
@fivetran-reneeli thanks for this PR. A few comments below following this review.
select date_day, app_id, app_version, source_type, source_relation from app_crashes
union all
select date_day, app_id, app_version, source_type, source_relation from install_deletions
union all
select date_day, app_id, app_version, source_type, source_relation from sessions_activity
Request to format this in our usual manner instead of having all fields on one line.
Same request wherever this format is used in this PR
coalesce(id.deletions, 0) as deletions,
coalesce(id.installations, 0) as installations,
coalesce(sa.sessions, 0) as sessions
from reporting_grain rg
Request to use the format below, which we typically use in our other data models, for consistency.
- from reporting_grain rg
+ from reporting_grain as rg
Please make these same updates in the other models where this format is used.
app_crashes as (
    select
        app_id,
        app_version,
        date_day,
        cast(null as {{ dbt.type_string() }}) as source_type,
        source_relation,
        sum(crashes) as crashes
    from {{ var('app_crash_daily') }}
    group by 1,2,3,4,5
What was the reasoning for making these CTEs rather than ephemeral models, as they were in the previous version?
Same question for all the other cases of this in this PR
As discussed live, since these CTEs are unique per model due to the differences in grains, there's no advantage to modularizing them into their own separate ephemeral models.
{% set first_date_query %}

select min(date_day) as min_date_day
If this makes sense, I will add the subscription staging models too. Just wanted to have a proof of concept first before spending time adding the subscription logic.
PR Overview
This PR will address the following Issue/Feature: #28
This PR will result in the following new package version: v0.5.0
Schema changes from Nov 2024
Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:
to be completed
PR Checklist
Basic Validation
Please acknowledge that you have successfully performed the following commands locally:
Before marking this PR as "ready for review" the following have been applied:
Detailed Validation
Please share any and all of your validation steps:
If you had to summarize this PR in an emoji, which would it be?
💃