Refactor visit and cost models, plus other small changes #111

Open
wants to merge 5 commits into
base: main

@katy-sadowski katy-sadowski commented Jan 25, 2025

My initial intention was to knock out some of the minor issues in our backlog, but one thing led to another and I ended up refactoring our visit and cost models. They're now simpler and more performant, and we have a proper visit_detail table rather than a placeholder.

This PR includes the following changes. See my inline comments for more detail. Sorry the PR is so huge; I know it makes it harder to review!

  • Refactored visit models (resolves "Refactor visit ID assignment" #108 and "Properly populate visit_detail" #110)
    • Simplified the assignment of visit_occurrence_id - it now happens in a single model rather than being spread across 3
    • Changed some assumptions in the rollup/collapse of encounters into visit occurrences
    • Preserved the connection of individual encounters to the visits they were rolled up into - and used these encounters to populate the visit_detail table
  • Refactored cost model
    • Instead of creating separate int models per cost type, we now just add cost columns to the int tables for drug, procedure, and visit. This is a simpler and more efficient approach
    • Separated out encounter costs from drug/procedure costs. The previous models were adding in the cost of the associated encounter to the cost of each drug and procedure. It seems this could lead to duplication of costs for encounters with multiple drugs/procedures, so now the encounter costs are put in their own cost rows
    • Removed assumptions around total_paid and paid_by_patient columns - it wasn't clear to me from Synthea docs that we could make this leap. So I nulled these columns out instead
    • Nulled out non-required DRG and revenue code columns rather than putting placeholder values
    • Removed condition costs - previous logic was assigning the cost from a claim to a condition based on that condition showing up on a claim and in the diagnosis table on the same date. This feels like a stretch (and the query was very slow) - instead, I believe that we should more holistically model all the data contained in claims and claims_transactions. I will file an issue for this work
  • The visit and cost refactors resolve "Investigate use of dates vs timestamps for intermediate entity derivation / de-duping logic" #74
  • Added datatypes to the vocabulary staging model config (final change needed to resolve "Ensure data types are specified correctly in all model configs" #36)
  • Added back and resolved SQLFluff checks on column prefixes (resolves "Add back ignored SQLFluff checks" #46 and "reference issue in int__assign_all_visit_ids for visit_occurrence_id" #99)
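
A minimal sketch of the cost duplication that the cost refactor above is meant to avoid. The table and column names here are hypothetical and simplified (they are not the project's actual models), and plain SQL is run through Python's sqlite3 only so the example is self-contained:

```python
import sqlite3

# Hypothetical, simplified tables: one encounter with two procedures.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE encounters (encounter_id TEXT, encounter_cost REAL);
    CREATE TABLE procedures (procedure_id TEXT, encounter_id TEXT, procedure_cost REAL);
    INSERT INTO encounters VALUES ('e1', 100.0);
    INSERT INTO procedures VALUES ('p1', 'e1', 50.0), ('p2', 'e1', 70.0);
""")

# Folding the encounter cost into every procedure row counts it once per procedure.
folded_total = conn.execute("""
    SELECT SUM(p.procedure_cost + e.encounter_cost)
    FROM procedures AS p
    INNER JOIN encounters AS e ON p.encounter_id = e.encounter_id
""").fetchone()[0]

# Keeping encounter costs in their own rows counts each cost exactly once.
separated_total = conn.execute("""
    SELECT SUM(cost)
    FROM (
        SELECT procedure_cost AS cost FROM procedures
        UNION ALL
        SELECT encounter_cost AS cost FROM encounters
    )
""").fetchone()[0]

print(folded_total, separated_total)  # 320.0 220.0
```

With two procedures on one encounter, the folded total includes the 100.0 encounter cost twice, while the separated rows sum to the true 220.0.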

@@ -1,33 +1,15 @@
/* emergency visits */
/* collapse er claim lines with no days between them into one visit */

i chose to remove this bit of logic as it complicates the modeling, and i'm not sure we should implement this as a blanket assumption (i.e., it's plausible that someone has separate visits to the ER or urgent care 2 days in a row)

FROM {{ ref( 'stg_synthea__encounters') }}
WHERE
encounter_class = 'inpatient'
OR (encounter_class IN ('ambulatory', 'wellness', 'outpatient', 'emergency', 'urgentcare') AND encounter_start_date != encounter_stop_date)

@katy-sadowski katy-sadowski Jan 25, 2025

any visit not labeled inpatient but that lasts more than 1 day is considered an inpatient visit. i've seen this done in other ETLs.

i could see the argument that multiday OP visits are actually bad data, though, so i was on the fence about making this assumption. we could also just keep them as OP visits and let the analyst figure it out. it probably depends on the data source. (here, since it's fake data, there is prob no right answer).
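
A small sketch of how the filter above classifies rows, assuming a hypothetical four-row encounters table (the sample data is invented for illustration; the WHERE clause mirrors the diff):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE encounters (
        encounter_id TEXT,
        encounter_class TEXT,
        encounter_start_date TEXT,
        encounter_stop_date TEXT
    );
    INSERT INTO encounters VALUES
        ('e1', 'inpatient',  '2020-01-01', '2020-01-03'),
        ('e2', 'outpatient', '2020-02-01', '2020-02-01'),
        ('e3', 'outpatient', '2020-03-01', '2020-03-02'),
        ('e4', 'emergency',  '2020-04-01', '2020-04-01');
""")

# Inpatient encounters, plus any other listed class whose start and stop dates differ.
inpatient_ids = [row[0] for row in conn.execute("""
    SELECT encounter_id
    FROM encounters
    WHERE
        encounter_class = 'inpatient'
        OR (
            encounter_class IN ('ambulatory', 'wellness', 'outpatient', 'emergency', 'urgentcare')
            AND encounter_start_date != encounter_stop_date
        )
    ORDER BY encounter_id
""")]

print(inpatient_ids)  # ['e1', 'e3']
```

The multi-day outpatient encounter e3 is treated as inpatient, while the same-day outpatient and emergency encounters are left out.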

, min(e.end_datetime) AS visit_end_datetime
FROM {{ ref( 'stg_synthea__encounters') }} AS v
min(a.encounter_id) OVER (PARTITION BY a.patient_id, a.encounter_start_date) AS encounter_id
, a.encounter_id AS original_encounter_id

most of the code in this model is unchanged (rows just got shifted around). this line is an exception. i'm retaining the original encounter ID so we can map all the rolled up encounters during an IP stay back to the visit occurrence for that stay.
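
A sketch of what retaining the original ID looks like, using a hypothetical three-row table. It assumes a SQLite build with window function support (3.25 or later); the two selected columns mirror the diff, and everything else is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE encounters (encounter_id TEXT, patient_id TEXT, encounter_start_date TEXT);
    INSERT INTO encounters VALUES
        ('e1', 'p1', '2020-01-01'),
        ('e2', 'p1', '2020-01-01'),
        ('e3', 'p1', '2020-02-01');
""")

# Encounters sharing a patient and start date get one rolled-up ID,
# while original_encounter_id keeps the link back to each source row.
rows = conn.execute("""
    SELECT
        MIN(a.encounter_id) OVER (PARTITION BY a.patient_id, a.encounter_start_date) AS encounter_id
        , a.encounter_id AS original_encounter_id
    FROM encounters AS a
    ORDER BY original_encounter_id
""").fetchall()

print(rows)  # [('e1', 'e1'), ('e1', 'e2'), ('e3', 'e3')]
```

Both e1 and e2 roll up to e1, and the second column is what allows each rolled-up encounter to be mapped back to its visit occurrence.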

, {{ dbt.cast("null", api.Column.translate_type("integer")) }} AS country_concept_id
, {{ dbt.cast("null", api.Column.translate_type("varchar")) }} AS country_source_value
, p.patient_latitude AS latitude
, p.patient_longitude AS longitude

I ran into fan-out issues because we were including lat/long in the location model, while location_source_value (our join key from person to location) was based only on the address. I felt it best just to exclude these columns, as most sources will not include this data.
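
A minimal illustration of this kind of fan-out, assuming (hypothetically) that one address appears with two slightly different coordinate pairs. The tables and values are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER, location_source_value TEXT);
    CREATE TABLE location (location_source_value TEXT, latitude REAL, longitude REAL);
    INSERT INTO person VALUES (1, '1 Main St');
    INSERT INTO location VALUES
        ('1 Main St', 42.10, -71.20),
        ('1 Main St', 42.11, -71.21);
""")

# Location rows are distinct on (address, lat, long), but the join key is the address alone,
# so one person matches two location rows.
fanned_rows = conn.execute("""
    SELECT COUNT(*)
    FROM person AS p
    INNER JOIN location AS l ON p.location_source_value = l.location_source_value
""").fetchone()[0]

# Dropping lat/long makes the location rows unique on the join key again.
deduped_rows = conn.execute("""
    SELECT COUNT(*)
    FROM person AS p
    INNER JOIN (SELECT DISTINCT location_source_value FROM location) AS l
        ON p.location_source_value = l.location_source_value
""").fetchone()[0]

print(fanned_rows, deduped_rows)  # 2 1
```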

@katy-sadowski

@burrowse in case it's of interest, this PR includes changes which allow us to properly populate visit_detail (rather than duplicating VO with different IDs, as is currently done in ETL-Synthea).
