Nov 2024 schema #32

Open: wants to merge 34 commits into main


fivetran-reneeli (Contributor) commented Jan 31, 2025

PR Overview

This PR will address the following Issue/Feature: #28

This PR will result in the following new package version: v0.5.0

Schema changes from Nov 2024

Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:

to be completed

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

  • dbt run --full-refresh && dbt test
  • dbt run (if incremental models are present) && dbt test

Before marking this PR as "ready for review" the following have been applied:

  • The appropriate issue has been linked, tagged, and properly assigned
  • All necessary documentation and version upgrades have been applied
  • Docs were regenerated (unless this PR does not include any code or yml updates)
  • Buildkite integration tests are passing
  • Detailed validation steps have been provided below

Detailed Validation

Please share any and all of your validation steps:

If you had to summarize this PR in an emoji, which would it be?

💃

```sql
distinct *
from reporting_grain_combined
-- pre-reporting grain: unions all unique dimension values
pre_reporting_grain as (
```
fivetran-reneeli (Contributor Author), Jan 31, 2025:

For context, I know we discussed doing full outer joins, but after looking into it, I realized they could risk losing records depending on which table is used in the join, as explained here. The alternative is to coalesce the join keys, but that gets clunky, especially with more than three CTEs being joined. I therefore went with this union all, then dedupe method, which is also what the previous version of these models used.
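The union all, then dedupe method can be sketched in plain SQL. This is a minimal sketch, not the PR's actual model: it assumes an in-memory SQLite database driven from Python's `sqlite3` module instead of dbt and a warehouse, and all sample rows are hypothetical. Only the CTE and column names come from the PR. It shows that a grain combination present in just one source (here `app_version` 1.1, which only appears in `sessions_activity`) still survives into the reporting grain.

```python
import sqlite3

# Hypothetical stand-ins for the metric CTEs in this PR. Each table only has rows
# for the grain combinations where that metric was actually recorded.
SETUP_SQL = """
create table app_crashes (
    date_day text, app_id text, app_version text,
    source_type text, source_relation text, crashes integer
);
create table install_deletions (
    date_day text, app_id text, app_version text,
    source_type text, source_relation text, installations integer
);
create table sessions_activity (
    date_day text, app_id text, app_version text,
    source_type text, source_relation text, sessions integer
);
insert into app_crashes values ('2024-11-01', 'app_a', '1.0', null, 'src', 3);
insert into install_deletions values
    ('2024-11-01', 'app_a', '1.0', null, 'src', 5),
    ('2024-11-02', 'app_a', '1.0', null, 'src', 2);
insert into sessions_activity values ('2024-11-02', 'app_a', '1.1', null, 'src', 7);
"""

# Union every source's dimension columns, then dedupe into the reporting grain.
GRAIN_SQL = """
with pre_reporting_grain as (
    select
        date_day,
        app_id,
        app_version,
        source_type,
        source_relation
    from app_crashes
    union all
    select
        date_day,
        app_id,
        app_version,
        source_type,
        source_relation
    from install_deletions
    union all
    select
        date_day,
        app_id,
        app_version,
        source_type,
        source_relation
    from sessions_activity
),

reporting_grain as (
    -- select distinct treats null values as equal, so rows with a null
    -- source_type still collapse into a single grain row
    select distinct *
    from pre_reporting_grain
)

select *
from reporting_grain
order by date_day, app_id, app_version
"""


def build_reporting_grain():
    """Return one row per unique (date_day, app_id, app_version, source_type, source_relation)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP_SQL)
        return conn.execute(GRAIN_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in build_reporting_grain():
        print(row)
```

The four unioned rows collapse to three grain rows, and no source has to act as the "driving" table the way it would in a chain of joins.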

fivetran-joemarkiewicz (Collaborator) left a comment:

@fivetran-reneeli thanks for this PR. I've left a few comments below.

Comment on lines 50 to 54
```sql
select date_day, app_id, app_version, source_type, source_relation from app_crashes
union all
select date_day, app_id, app_version, source_type, source_relation from install_deletions
union all
select date_day, app_id, app_version, source_type, source_relation from sessions_activity
```
fivetran-joemarkiewicz (Collaborator):

Request to format this in our usual manner instead of having all fields on one line.

fivetran-joemarkiewicz (Collaborator):

Same request wherever this format is used in this PR.

```sql
coalesce(id.deletions, 0) as deletions,
coalesce(id.installations, 0) as installations,
coalesce(sa.sessions, 0) as sessions
from reporting_grain rg
```
fivetran-joemarkiewicz (Collaborator):

Request to use the format below, which we typically use in our other data models, for consistency.

Suggested change:

```diff
- from reporting_grain rg
+ from reporting_grain as rg
```

fivetran-joemarkiewicz (Collaborator):

Please make these same updates in the other models where this format is used.
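The snippet under review left-joins each metric CTE back onto `reporting_grain` and wraps every metric in `coalesce(..., 0)`. A minimal sketch of that pattern, written in the `from ... as alias` style requested above. It assumes an in-memory SQLite database via Python's `sqlite3`, hypothetical sample rows, and a grain reduced to three key columns for brevity; the join conditions are also assumptions, since the PR excerpt does not show them.

```python
import sqlite3

# Hypothetical sample data. The grain is reduced to three key columns for brevity;
# reporting_grain would normally come from the union all, then dedupe step.
SETUP_SQL = """
create table reporting_grain (date_day text, app_id text, app_version text);
create table install_deletions (
    date_day text, app_id text, app_version text,
    installations integer, deletions integer
);
create table sessions_activity (
    date_day text, app_id text, app_version text, sessions integer
);
insert into reporting_grain values
    ('2024-11-01', 'app_a', '1.0'),
    ('2024-11-02', 'app_a', '1.1');
insert into install_deletions values ('2024-11-01', 'app_a', '1.0', 5, 1);
insert into sessions_activity values ('2024-11-02', 'app_a', '1.1', 7);
"""

# Left join each metric back onto the full grain. Grain rows with no match in a
# metric table get null for that metric, which coalesce turns into 0.
# The join conditions are assumed; the PR excerpt does not show them.
REPORT_SQL = """
select
    rg.date_day,
    rg.app_id,
    rg.app_version,
    coalesce(id.deletions, 0) as deletions,
    coalesce(id.installations, 0) as installations,
    coalesce(sa.sessions, 0) as sessions
from reporting_grain as rg
left join install_deletions as id
    on rg.date_day = id.date_day
    and rg.app_id = id.app_id
    and rg.app_version = id.app_version
left join sessions_activity as sa
    on rg.date_day = sa.date_day
    and rg.app_id = sa.app_id
    and rg.app_version = sa.app_version
order by rg.date_day, rg.app_id, rg.app_version
"""


def build_report():
    """Return one row per grain combination with every metric filled in."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP_SQL)
        return conn.execute(REPORT_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in build_report():
        print(row)
```

Because every output row starts from `reporting_grain`, no metric source can drop a grain row, and `coalesce` reports a missing metric as 0 rather than null.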

Comment on lines 9 to 18
```sql
app_crashes as (
    select
        app_id,
        app_version,
        date_day,
        cast(null as {{ dbt.type_string() }}) as source_type,
        source_relation,
        sum(crashes) as crashes
    from {{ var('app_crash_daily') }}
    group by 1,2,3,4,5
```
fivetran-joemarkiewicz (Collaborator):

What was the reasoning for making these CTEs rather than ephemeral models, as they were in the previous version?

fivetran-joemarkiewicz (Collaborator):

Same question for all other instances of this in the PR.

fivetran-reneeli (Contributor Author):

As discussed live, since these CTEs are unique to each model because of the differences in grain, there's no advantage to modularizing them into their own separate ephemeral models.


```sql
{% set first_date_query %}

select min(date_day) as min_date_day
```
fivetran-reneeli (Contributor Author):

If this makes sense, I will add the subscription staging models too. I just wanted to have a proof of concept first before spending time on the subscription logic.
