From 90c74924923c11c733d1bcc0c793d1bc03ced2c5 Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Wed, 18 Sep 2024 13:16:46 +0200 Subject: [PATCH 01/28] Fleshed out and updated include_gld_purchases() flow documentation --- vignettes/function-flow.Rmd | 49 +++++++++++++++++++++++-------------- 1 file changed, 31 insertions(+), 18 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index bbc6139..73d3800 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -121,14 +121,37 @@ input to extract the dates of all diabetes-specific podiatrist services. +AAI: By date + #### GLD purchases The function `include_gld_purchases()` uses `lmdb` to extract the dates -of all GLD purchases (from 1997 onwards). - - - - +of all GLD purchases. + +These dates are extracted by filtering values beginning with "A10" in +the `atc` column of the `lmdb` register. In addition to the identifier +variable (`pnr`) and date (`eksd`), additional information needed for +censoring or for classification of diabetes type are also extracted: the +type of drug (`atc`), the amount purchased (`volume` and `apk`), the +indication code (`indo`), and its brand name or vnr-number (`name` or +`vnr`). These events are then passed to a chain of exclusion functions: +`exclude_wld_purchases()`, `exclude_potential_pcos()`, +`exclude_pregnancy()` described in the sections below. + +After these exclusion functions have been applied, the output serves as +inputs to two sets of functions: + +1. the `get_diagnosis_date()` function for the final step of the + inclusion process. +2. the `get_only_insulin_purchases()`, + `get_insulin_purchases_within_180_days()`, and + `get_insulin_is_two_thirds_of_gld_doses()` helper functions for the + classification of diabetes type. 
+ +Since the diagnosis code data on pregnancies is insufficient to perform +censoring prior to 1997, `include_gld_purchases()` only extracts dates +from 1997 onward by default (if Medical Birth Register data is available +to use for censoring, the extraction window can be extended). ### Exclusion events @@ -136,23 +159,12 @@ of all GLD purchases (from 1997 onwards). The function `exclude_pregnancy()` uses diagnoses from LPR2 or LPR3 as input and is used to exclude both HbA1c tests and GLD purchases during -pregnancy. +pregnancy, as these may be due to gestational diabetes, rather than type +1 or type 2 diabetes. Internally, this relies on the function `get_pregnancy_dates()` that contains the following three helper functions: -- `calculate_pregnancy_index_date_for_mc_visits_wo_end_date()` (this - might be removed with the inclusion of the birth register) -- `get_pregnancy_end_dates()`: Keep maternal care visits with an end - date and drop visits between 40 weeks before end date and 12 weeks - after end date. -- `get_maternal_care_visit_dates_without_end_date()`: Uses the output - from `get_pregnancy_end_dates()` which identifies maternal care - visits *with* end dates to derive maternal care visits *without* end - dates. below. - - - #### Glucose-lowering brand drugs for weight loss @@ -284,3 +296,4 @@ is within a time-period of insufficient data coverage, contains the inclusion date of this individual. 
+ From 0504532e9464c931070306f4b15b1515bad7619d Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Wed, 18 Sep 2024 14:00:31 +0200 Subject: [PATCH 02/28] Added description of podiatrist services function flow --- vignettes/function-flow.Rmd | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index 73d3800..e471fd6 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -119,9 +119,19 @@ This function contains two helper functions: The function `include_podiatrist_services()` uses `sysi` or `sssy` as input to extract the dates of all diabetes-specific podiatrist services. - - -AAI: By date +These dates are extracted by filtering values beginning with "54" in the +`spec` variable of the `sssy` and `sysi` registers by default +(alternatively, the function can take the `spec2` variable as input +instead, if that is the data available to the user). In addition, +services provided to a child of the individual (`barnmak` != 0) are +excluded using the `barnmak` variable. An internal helper function +`get_unique_honuge_dates()` is applied to generate a date variable +(`regdate`) based on the year-week (wwyy-formatted) variable (`honuge`) +in the raw data, and de-duplicates multiple services registered on the +same date. Ultimately, `include_podiatrist_services()` outputs only the +identifier variable (`pnr`) and date of the service (`regdate`) to the +`get_diagnosis_date()` function for the final step of the inclusion +process. #### GLD purchases @@ -129,7 +139,7 @@ The function `include_gld_purchases()` uses `lmdb` to extract the dates of all GLD purchases. These dates are extracted by filtering values beginning with "A10" in -the `atc` column of the `lmdb` register. In addition to the identifier +the `atc` variable of the `lmdb` register. 
In addition to the identifier variable (`pnr`) and date (`eksd`), additional information needed for censoring or for classification of diabetes type are also extracted: the type of drug (`atc`), the amount purchased (`volume` and `apk`), the @@ -296,4 +306,3 @@ is within a time-period of insufficient data coverage, contains the inclusion date of this individual. - From 7777f367bc53d521282777e0bec41e4e8c6a0817 Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Wed, 18 Sep 2024 15:32:26 +0200 Subject: [PATCH 03/28] Reformatted some GLD text, added HbA1c and started on pregnancy dates --- vignettes/function-flow.Rmd | 62 ++++++++++++++++++++++++------------- 1 file changed, 41 insertions(+), 21 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index e471fd6..98fb6d1 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -95,9 +95,16 @@ outputs.](images/function-flow-population.png) #### HbA1c tests above 48 mmol/mol The function `include_hba1c()` uses `lab_forsker` as the input data to -extract all events of tests above 48 mmol/mol. +extract the dates of all elevated HbA1c test results: $\geq$ 48 mmol/mol +(or $\geq$ 6.5% in DCCT units). To support DCCT units, the function +converts the value of these to IFCC units internally before including +all rows with `value` $\geq$ 48 and deduplicating multiple elevated +results on the same day within each individual. - +`include_hba1c()` passes a 3-column data frame containing the identifier +variable (`pnr`) and the dates of all elevated HbA1c test results. This +is passed to the `exclude_pregnancy()` function for censoring of +elevated results due to potential gestational diabetes (see below). #### Hospital diagnosis of diabetes @@ -125,13 +132,16 @@ These dates are extracted by filtering values beginning with "54" in the instead, if that is the data available to the user). 
In addition, services provided to a child of the individual (`barnmak` != 0) are excluded using the `barnmak` variable. An internal helper function -`get_unique_honuge_dates()` is applied to generate a date variable -(`regdate`) based on the year-week (wwyy-formatted) variable (`honuge`) -in the raw data, and de-duplicates multiple services registered on the -same date. Ultimately, `include_podiatrist_services()` outputs only the -identifier variable (`pnr`) and date of the service (`regdate`) to the -`get_diagnosis_date()` function for the final step of the inclusion -process. +`get_unique_honuge_dates()` is applied to generate a proper date +variable based on the year-week (wwyy-formatted) variable (`honuge`) +found in the raw data, and de-duplicates multiple services registered on +the same date. + +`include_podiatrist_services()` outputs a 3-column data frame containing +the identifier variable (`pnr`) and the date of the two earliest records +of diabetes-specific podiatrist services for each individual. This is +passed to the `get_diagnosis_date()` function for the final step of the +inclusion process. #### GLD purchases @@ -139,12 +149,23 @@ The function `include_gld_purchases()` uses `lmdb` to extract the dates of all GLD purchases. These dates are extracted by filtering values beginning with "A10" in -the `atc` variable of the `lmdb` register. In addition to the identifier -variable (`pnr`) and date (`eksd`), additional information needed for -censoring or for classification of diabetes type are also extracted: the -type of drug (`atc`), the amount purchased (`volume` and `apk`), the -indication code (`indo`), and its brand name or vnr-number (`name` or -`vnr`). These events are then passed to a chain of exclusion functions: +the `atc` variable of the `lmdb` register. 
Since the diagnosis code data +on pregnancies (see below) is insufficient to perform censoring prior to +1997, `include_gld_purchases()` only extracts dates from 1997 onward by +default (if Medical Birth Register data is available to use for +censoring, the extraction window can be extended). + +This function outputs a `data.frame` with the following variables needed +later in the classification part of the function flow: + +- identifier variable (`pnr`) +- date (`eksd`) +- type of drug (`atc`) +- amount purchased (`volume` and `apk`) +- indication code (`indo`) +- brand name or vnr-number (`name` or `vnr`) + +These events are then passed to a chain of exclusion functions: `exclude_wld_purchases()`, `exclude_potential_pcos()`, `exclude_pregnancy()` described in the sections below. @@ -158,11 +179,6 @@ inputs to two sets of functions: `get_insulin_is_two_thirds_of_gld_doses()` helper functions for the classification of diabetes type. -Since the diagnosis code data on pregnancies is insufficient to perform -censoring prior to 1997, `include_gld_purchases()` only extracts dates -from 1997 onward by default (if Medical Birth Register data is available -to use for censoring, the extraction window can be extended). - ### Exclusion events #### HbA1c tests and GLD purchases during pregnancy @@ -173,7 +189,11 @@ pregnancy, as these may be due to gestational diabetes, rather than type 1 or type 2 diabetes. Internally, this relies on the function `get_pregnancy_dates()` that -contains the following three helper functions: +uses diagnoses registered in the National Patient Register to extract +the dates of all pregnancy ending (live births or miscarriages). These +are identified by filtering values beginning with "DO0[0-6]", "DO8[0-4]" +or "DZ3[37]" in the `c_diag` variable in the LPR2 data (`diagnosekode` +in LPR3 data). 
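
Note on the patch above: the pregnancy-code filter it describes can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy data and the names `lpr2_toy`, `pregnancy_pattern`, and `pregnancy_events` are hypothetical, and the actual `get_pregnancy_dates()` may be implemented differently. Only the code prefixes and the `c_diag` column name are taken from the patch text.

```r
# Hypothetical toy data; `c_diag` follows the LPR2 column name used in the patch.
lpr2_toy <- data.frame(
  pnr = c("1", "2", "3", "4"),
  c_diag = c("DO045", "DO821", "DZ371", "DE110"),
  stringsAsFactors = FALSE
)

# Prefixes quoted in the patch: "DO0[0-6]", "DO8[0-4]" or "DZ3[37]".
# The leading "^" anchors the match to the start of the code.
pregnancy_pattern <- "^(DO0[0-6]|DO8[0-4]|DZ3[37])"

# Keep only rows whose diagnosis code starts with one of the prefixes.
pregnancy_events <- lpr2_toy[grepl(pregnancy_pattern, lpr2_toy$c_diag), ]
```

For LPR3-formatted data the same pattern would presumably be applied to `diagnosekode` instead of `c_diag`.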
From 6382624f9702d54711d51744d5502d6a2cd0d8ce Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Thu, 19 Sep 2024 10:40:37 +0200 Subject: [PATCH 04/28] Reworded include_hba1c section --- vignettes/function-flow.Rmd | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index 98fb6d1..500b4c8 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -96,10 +96,10 @@ outputs.](images/function-flow-population.png) The function `include_hba1c()` uses `lab_forsker` as the input data to extract the dates of all elevated HbA1c test results: $\geq$ 48 mmol/mol -(or $\geq$ 6.5% in DCCT units). To support DCCT units, the function -converts the value of these to IFCC units internally before including -all rows with `value` $\geq$ 48 and deduplicating multiple elevated -results on the same day within each individual. +(IFCC units, `analysiscode` NPU27300) or $\geq$ 6.5% (DCCT units, +`analysiscode` NPU03835). Multiple elevated results on the same day +within each individual are deduplicated, to account for the same test +result being reported as two rows (one for IFCC, one for DCCT units). `include_hba1c()` passes a 3-column data frame containing the identifier variable (`pnr`) and the dates of all elevated HbA1c test results. 
This From 4fd5903e201ef11650bfb735ab47008eaaefea2b Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Thu, 19 Sep 2024 12:46:40 +0200 Subject: [PATCH 05/28] Added lpr-joins, started on describing lpr processing --- vignettes/function-flow.Rmd | 99 ++++++++++++++++++++++++++++++------- 1 file changed, 82 insertions(+), 17 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index 500b4c8..e685680 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -101,25 +101,83 @@ extract the dates of all elevated HbA1c test results: $\geq$ 48 mmol/mol within each individual are deduplicated, to account for the same test result being reported as two rows (one for IFCC, one for DCCT units). -`include_hba1c()` passes a 3-column data frame containing the identifier -variable (`pnr`) and the dates of all elevated HbA1c test results. This -is passed to the `exclude_pregnancy()` function for censoring of -elevated results due to potential gestational diabetes (see below). +`include_hba1c()` outputs a 2-column data frame containing the following +variables: + +- identifier variable (`pnr`) +- the dates of all elevated HbA1c test results (`do_pos_hba1c`). + +The output is passed to the `exclude_pregnancy()` function for censoring +of elevated results due to potential gestational diabetes (see below). #### Hospital diagnosis of diabetes -The function `include_diabetes_diagnoses()` uses the hospital contacts -from LPR2 and 3 to include all dates of diabetes diagnoses. Diabetes -diagnoses from both ICD 8 and ICD 10 are included. +**Joining LPR2 and LPR3 data** -This function contains two helper functions: +The helper functions `join_lpr2()` and `join_lpr3()` join records of +diagnoses to administrative information in LPR2-formatted and +LPR3-formatted data, respectively. 
+ +`join_lpr2()` takes `lpr_diag` and `lpr_adm` as inputs, joins the +required information by record number (`recnum`), and outputs a +`data.frame` with the following variables: + +- identifier variable (`pnr`) +- date (`d_inddto`) +- department specialty (`c_spec`) +- diagnosis code (`c_diag`) +- diagnosis type (`c_diagtype`) + +`join_lpr3()` takes `diagnoser` and `kontakter` as inputs, joins the +required information by record number (`dw_ek_kontakt`), and outputs a +`data.frame` with the following variables: -- `keep_diabetes_icd10()` -- `keep_diabetes_icd8()` +- identifier variable (`cpr`) +- date (`dato_start`) +- department specialty (`hovedspeciale_ans`) +- diagnosis code (`diagnosekode`) +- diagnosis type (`diagnosetype`) +- diagnosis retracted (`senere_afkraeftet`) - +These outputs are passed to `include_diabetes_diagnoses()` for further +processing, see below. - +**Processing of diagnoses** + +The function `include_diabetes_diagnoses()` uses the hospital contacts +from LPR2 and LPR3 to include all dates of diabetes diagnoses to use for +inclusion, as well as additional information needed to classify diabetes +type. Diabetes diagnoses from both ICD-8 and ICD-10 are included. 
+ +The function takes the outputs of `join_lpr2()` and `join_lpr3()` as +inputs and processes each input separately: + +- LPR2-data: + - Include all diabetes diagnoses, registered as primary (A) or + secondary (B) diagnoses, regardless of type or department: + `c_diag` starts with "DE1[0-4]", "249", or "250" and + `c_diagtype` either "A" or "B" + - Define whether the diagnosis was made made by an + endocrinological (`c_spec` = 8) or other medical department + (`c_spec` \< 8 or 9-30) +- LPR3: + - remove retracted diagnoses (LPR3) + +Internally, these intermediate results are joined, so +`include_diabetes_diagnoses()` outputs a single `data.frame` with the +following variables (one row for each individual): + +- identifier variable (`pnr`) +- date of the first diabetes diagnosis (`do_diagnosis_1`) +- date of the second diabetes diagnosis (`do_diagnosis_2`) +- number of type 1 diabetes-specific diagnosis codes from + endocrinological departments (`n_t1d_endo`) +- number of type 2 diabetes-specific diagnosis codes from + endocrinological departments (`n_t2d_endo`) +- number of type 1 diabetes-specific diagnosis codes from medical + departments (`n_t1d_medical`) +- number of type 2 diabetes-specific diagnosis codes from medical + departments (`n_t2d_medical`) #### Diabetes-specific podiatrist services @@ -137,11 +195,17 @@ variable based on the year-week (wwyy-formatted) variable (`honuge`) found in the raw data, and de-duplicates multiple services registered on the same date. -`include_podiatrist_services()` outputs a 3-column data frame containing -the identifier variable (`pnr`) and the date of the two earliest records -of diabetes-specific podiatrist services for each individual. This is -passed to the `get_diagnosis_date()` function for the final step of the -inclusion process. 
+`include_podiatrist_services()` outputs a 3-column data frame with one +row for each individual, containing the following variables: + +- identifier variable (`pnr`) +- the date of the first diabetes-specific podiatrist record + (`do_podiatrist_1`) +- the date of the second diabetes-specific podiatrist record + (`do_podiatrist_2`) + +The output is passed to the `get_diagnosis_date()` function for the +final step of the inclusion process. #### GLD purchases @@ -326,3 +390,4 @@ is within a time-period of insufficient data coverage, contains the inclusion date of this individual. + From 29fea86708df5597fd24cd54e32cd68cd7e87943 Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Thu, 19 Sep 2024 13:39:01 +0200 Subject: [PATCH 06/28] Finished LPR/diagnosis part of function flow --- vignettes/function-flow.Rmd | 46 +++++++++++++++++++++++++++++-------- 1 file changed, 36 insertions(+), 10 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index e685680..b819067 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -150,20 +150,42 @@ inclusion, as well as additional information needed to classify diabetes type. Diabetes diagnoses from both ICD-8 and ICD-10 are included. 
The function takes the outputs of `join_lpr2()` and `join_lpr3()` as -inputs and processes each input separately: +inputs and processes each input separately to generate the following +internal variables: - LPR2-data: - - Include all diabetes diagnoses, registered as primary (A) or - secondary (B) diagnoses, regardless of type or department: - `c_diag` starts with "DE1[0-4]", "249", or "250" and - `c_diagtype` either "A" or "B" - - Define whether the diagnosis was made made by an - endocrinological (`c_spec` = 8) or other medical department + - `pnr`: identifier variable + - `do_diagnosis`: include all diabetes diagnoses, registered as + primary (A) or secondary (B) diagnoses, regardless of type or + department: `c_diag` starts with "DE1[0-4]", "249", or "250" and + `c_diagtype` is either "A" or "B" + - `is_primary`: Define whether the diagnosis was a primary + diagnosis (`c_diagtype` == "A") + - `is_t1d`: Define whether the diagnosis was T1D-specific + (`c_diag` starts with "DE10" or "249") + - `is_t2d`: Define whether the diagnosis was T2D-specific + (`c_diag` starts with "DE11" or "250") + - `department`: Define whether the diagnosis was made made by an + endocrinological (`c_spec` == 8) or other medical department (`c_spec` \< 8 or 9-30) - LPR3: - - remove retracted diagnoses (LPR3) - -Internally, these intermediate results are joined, so + - `pnr`: identifier variable + - `do_diagnosis`: include all diabetes diagnoses, registered as + primary (A) or secondary (B) diagnoses, regardless of type or + department: `diagnosekode` starts with "DE1[0-4]" and + `diagnosetype` is either "A" or "B", but exclude retracted + diagnoses (`senere_afkraeftet` == "Ja") + - `is_primary`: Define whether the diagnosis was a primary + diagnosis (`diagnosetype` == "A") + - `is_t1d`: Define whether the diagnosis was T1D-specific + (`c_diag` starts with "DE10") + - `is_t2d`: Define whether the diagnosis was T2D-specific + (`c_diag` starts with "DE11") + - `department`: Define whether 
the diagnosis was made made by an + endocrinological (`c_spec` == 8) or other medical department + (`c_spec` \< 30 & != 8) + +These intermediate results are combined for further processing, and `include_diabetes_diagnoses()` outputs a single `data.frame` with the following variables (one row for each individual): @@ -179,6 +201,10 @@ following variables (one row for each individual): - number of type 2 diabetes-specific diagnosis codes from medical departments (`n_t2d_medical`) +The output is passed to the `get_diagnosis_date()` function for the +final step of the inclusion process and is subsequently used to classify +diabetes type. + #### Diabetes-specific podiatrist services The function `include_podiatrist_services()` uses `sysi` or `sssy` as From f9d7661da21cdb99772946d5dc06e85e95025636 Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Thu, 19 Sep 2024 13:53:09 +0200 Subject: [PATCH 07/28] fixed a few things to describe LPR3 processing --- vignettes/function-flow.Rmd | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index b819067..52f79da 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -178,12 +178,19 @@ internal variables: - `is_primary`: Define whether the diagnosis was a primary diagnosis (`diagnosetype` == "A") - `is_t1d`: Define whether the diagnosis was T1D-specific - (`c_diag` starts with "DE10") + (`diagnosekode` starts with "DE10") - `is_t2d`: Define whether the diagnosis was T2D-specific - (`c_diag` starts with "DE11") + (`diagnosekode` starts with "DE11") - `department`: Define whether the diagnosis was made made by an - endocrinological (`c_spec` == 8) or other medical department - (`c_spec` \< 30 & != 8) + endocrinological (`hovedspeciale_ans` == "medicinsk + endokrinologi") or other medical department (`hovedspeciale_ans` + either "Blandet medicin og kirurgi", "Intern medicin", + "Geriatri", "Hepatologi", "Hæmatologi", 
"Infektionsmedicin", + "Kardiologi", "Medicinsk allergologi", "Medicinsk + gastroenterologi", "Medicinsk lungesygdomme", "Nefrologi", + "Reumatologi", "Palliativ medicin", "Akut medicin", + "Dermato-venerologi", "Neurologi", "Onkologi", "Fysiurgi", or + "Tropemedicin") These intermediate results are combined for further processing, and `include_diabetes_diagnoses()` outputs a single `data.frame` with the From 9a05d81815ab60889220564e4416f917b7226c15 Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Thu, 19 Sep 2024 14:01:27 +0200 Subject: [PATCH 08/28] specified that only primary diagnoses go into type classification --- vignettes/function-flow.Rmd | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index 52f79da..44f7c81 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -199,14 +199,14 @@ following variables (one row for each individual): - identifier variable (`pnr`) - date of the first diabetes diagnosis (`do_diagnosis_1`) - date of the second diabetes diagnosis (`do_diagnosis_2`) -- number of type 1 diabetes-specific diagnosis codes from +- number of type 1 diabetes-specific primary diagnosis codes from endocrinological departments (`n_t1d_endo`) -- number of type 2 diabetes-specific diagnosis codes from +- number of type 2 diabetes-specific primary diagnosis codes from endocrinological departments (`n_t2d_endo`) -- number of type 1 diabetes-specific diagnosis codes from medical - departments (`n_t1d_medical`) -- number of type 2 diabetes-specific diagnosis codes from medical - departments (`n_t2d_medical`) +- number of type 1 diabetes-specific primary diagnosis codes from + medical departments (`n_t1d_medical`) +- number of type 2 diabetes-specific primary diagnosis codes from + medical departments (`n_t2d_medical`) The output is passed to the `get_diagnosis_date()` function for the final step of the inclusion process and is subsequently 
used to classify @@ -423,4 +423,3 @@ is within a time-period of insufficient data coverage, contains the inclusion date of this individual. - From f03a4dac4ba55213856e93c4d665966b072bdb10 Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen <67263135+Aastedet@users.noreply.github.com> Date: Thu, 19 Sep 2024 14:44:51 +0200 Subject: [PATCH 09/28] Update vignettes/function-flow.Rmd MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Signe Kirk Brødbæk <40836345+signekb@users.noreply.github.com> --- vignettes/function-flow.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index 44f7c81..5722926 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -245,7 +245,7 @@ final step of the inclusion process. The function `include_gld_purchases()` uses `lmdb` to extract the dates of all GLD purchases. -These dates are extracted by filtering values beginning with "A10" in +These dates are extracted by including all values beginning with "A10" in the `atc` variable of the `lmdb` register. 
Since the diagnosis code data on pregnancies (see below) is insufficient to perform censoring prior to 1997, `include_gld_purchases()` only extracts dates from 1997 onward by From bc889d4e829098ae84f1e4ffbc57f9d560f997c9 Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Thu, 19 Sep 2024 15:17:56 +0200 Subject: [PATCH 10/28] switched the order of inclusion sections and mentioned that some of the inputs go to exclusion functions --- vignettes/function-flow.Rmd | 45 +++++++++++++++++++------------------ 1 file changed, 23 insertions(+), 22 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index 44f7c81..7cf35b9 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -92,25 +92,7 @@ outputs.](images/function-flow-population.png) ### Inclusion events -#### HbA1c tests above 48 mmol/mol - -The function `include_hba1c()` uses `lab_forsker` as the input data to -extract the dates of all elevated HbA1c test results: $\geq$ 48 mmol/mol -(IFCC units, `analysiscode` NPU27300) or $\geq$ 6.5% (DCCT units, -`analysiscode` NPU03835). Multiple elevated results on the same day -within each individual are deduplicated, to account for the same test -result being reported as two rows (one for IFCC, one for DCCT units). - -`include_hba1c()` outputs a 2-column data frame containing the following -variables: - -- identifier variable (`pnr`) -- the dates of all elevated HbA1c test results (`do_pos_hba1c`). - -The output is passed to the `exclude_pregnancy()` function for censoring -of elevated results due to potential gestational diabetes (see below). - -#### Hospital diagnosis of diabetes +#### Hospital diagnoses **Joining LPR2 and LPR3 data** @@ -139,10 +121,11 @@ required information by record number (`dw_ek_kontakt`), and outputs a - diagnosis type (`diagnosetype`) - diagnosis retracted (`senere_afkraeftet`) -These outputs are passed to `include_diabetes_diagnoses()` for further -processing, see below. 
+These outputs are passed to `include_diabetes_diagnoses()` (and to +`get_pregnancy_dates()`, see exclusion events) for further processing +below. -**Processing of diagnoses** +**Processing of diabetes diagnoses** The function `include_diabetes_diagnoses()` uses the hospital contacts from LPR2 and LPR3 to include all dates of diabetes diagnoses to use for @@ -240,6 +223,24 @@ row for each individual, containing the following variables: The output is passed to the `get_diagnosis_date()` function for the final step of the inclusion process. +#### HbA1c tests above 48 mmol/mol + +The function `include_hba1c()` uses `lab_forsker` as the input data to +extract the dates of all elevated HbA1c test results: $\geq$ 48 mmol/mol +(IFCC units, `analysiscode` NPU27300) or $\geq$ 6.5% (DCCT units, +`analysiscode` NPU03835). Multiple elevated results on the same day +within each individual are deduplicated, to account for the same test +result being reported as two rows (one for IFCC, one for DCCT units). + +`include_hba1c()` outputs a 2-column data frame containing the following +variables: + +- identifier variable (`pnr`) +- the dates of all elevated HbA1c test results (`do_pos_hba1c`). + +The output is passed to the `exclude_pregnancy()` function for censoring +of elevated results due to potential gestational diabetes (see below). 
+ #### GLD purchases The function `include_gld_purchases()` uses `lmdb` to extract the dates From 7525b606cb6e39bcf959be35eb15ff7dbef1a336 Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Fri, 20 Sep 2024 13:09:55 +0200 Subject: [PATCH 11/28] fixed spec to speciale variable name --- vignettes/function-flow.Rmd | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index ccaabbe..d107b7e 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -201,7 +201,7 @@ The function `include_podiatrist_services()` uses `sysi` or `sssy` as input to extract the dates of all diabetes-specific podiatrist services. These dates are extracted by filtering values beginning with "54" in the -`spec` variable of the `sssy` and `sysi` registers by default +`speciale` variable of the `sssy` and `sysi` registers by default (alternatively, the function can take the `spec2` variable as input instead, if that is the data available to the user). In addition, services provided to a child of the individual (`barnmak` != 0) are @@ -246,12 +246,12 @@ of elevated results due to potential gestational diabetes (see below). The function `include_gld_purchases()` uses `lmdb` to extract the dates of all GLD purchases. -These dates are extracted by including all values beginning with "A10" in -the `atc` variable of the `lmdb` register. Since the diagnosis code data -on pregnancies (see below) is insufficient to perform censoring prior to -1997, `include_gld_purchases()` only extracts dates from 1997 onward by -default (if Medical Birth Register data is available to use for -censoring, the extraction window can be extended). +These dates are extracted by including all values beginning with "A10" +in the `atc` variable of the `lmdb` register. 
Since the diagnosis code +data on pregnancies (see below) is insufficient to perform censoring +prior to 1997, `include_gld_purchases()` only extracts dates from 1997 +onward by default (if Medical Birth Register data is available to use +for censoring, the extraction window can be extended). This function outputs a `data.frame` with the following variables needed later in the classification part of the function flow: From 092824e540cc2e13ad8ed81ebd44e7dfb14d0c13 Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Fri, 20 Sep 2024 13:26:09 +0200 Subject: [PATCH 12/28] Removed "name" or "vnr" variables from GLD function flow. Autoformatting made a few changes after resolving previous merge conflict. --- vignettes/function-flow.Rmd | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index 134e577..11bd3df 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -91,10 +91,12 @@ that get or extract a condition or joins data or function outputs.](images/function-flow-population.png) ### Inclusion events + ```{r, include=FALSE} library(dplyr) library(osdc) ``` + #### Hospital diagnoses **Joining LPR2 and LPR3 data** @@ -180,11 +182,11 @@ internal variables: These intermediate results are combined for further processing, and `include_diabetes_diagnoses()` outputs a single `data.frame` with the -following variables (one row for each individual): +following variables (up to two rows per individual): - identifier variable (`pnr`) -- date of the first diabetes diagnosis (`do_diagnosis_1`) -- date of the second diabetes diagnosis (`do_diagnosis_2`) +- dates of the first and second hospital diabetes diagnosis + (`diagnosis_dates`) - number of type 1 diabetes-specific primary diagnosis codes from endocrinological departments (`n_t1d_endo`) - number of type 2 diabetes-specific primary diagnosis codes from @@ -229,10 +231,11 @@ final step of the inclusion 
process. #### HbA1c tests above the diagnosis cut-off value (48 mmol/mol or 6.5%) The function `include_hba1c()` uses `lab_forsker` as the input data to -extract the dates of all elevated HbA1c test results, using the appropriate cut-offs: +extract the dates of all elevated HbA1c test results, using the +appropriate cut-offs: -- IFCC units: `analysiscode` NPU27300, any `value` $\geq$ 48 mmol/mol -- DCCT units: `analysiscode` NPU03835: any `value` $\geq$ 6.5% . +- IFCC units: `analysiscode` NPU27300, any `value` $\geq$ 48 mmol/mol +- DCCT units: `analysiscode` NPU03835: any `value` $\geq$ 6.5% . ```{r, echo=FALSE} algorithm |> @@ -240,9 +243,9 @@ algorithm |> knitr::kable(caption = "Algorithm used in the implementation for including HbA1c.") ``` -Multiple elevated results on the same day -within each individual are deduplicated, to account for the same test -result often being reported twice (one for IFCC, one for DCCT units). +Multiple elevated results on the same day within each individual are +deduplicated, to account for the same test result often being reported +twice (one for IFCC, one for DCCT units). `include_hba1c()` outputs a 2-column data frame containing the following variables: @@ -273,7 +276,6 @@ later in the classification part of the function flow: - type of drug (`atc`) - amount purchased (`volume` and `apk`) - indication code (`indo`) -- brand name or vnr-number (`name` or `vnr`) These events are then passed to a chain of exclusion functions: `exclude_wld_purchases()`, `exclude_potential_pcos()`, @@ -436,3 +438,4 @@ is within a time-period of insufficient data coverage, contains the inclusion date of this individual. + From 20f58862389081b8f5e45291eb3a1f26955a3e6f Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Fri, 20 Sep 2024 13:37:39 +0200 Subject: [PATCH 13/28] Updates join_lpr function description to filter to necessary diagnoses. 
--- vignettes/function-flow.Rmd | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index 11bd3df..127f6db 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -105,9 +105,10 @@ The helper functions `join_lpr2()` and `join_lpr3()` join records of diagnoses to administrative information in LPR2-formatted and LPR3-formatted data, respectively. -`join_lpr2()` takes `lpr_diag` and `lpr_adm` as inputs, joins the -required information by record number (`recnum`), and outputs a -`data.frame` with the following variables: +`join_lpr2()` takes `lpr_diag` and `lpr_adm` as inputs, filters to the +necessary diagnoses (`c_diag` starting with "DO", "DZ3", "DE1[0-4]", +"249", or "250"), joins the required information by record number +(`recnum`), and outputs a `data.frame` with the following variables: - identifier variable (`pnr`) - date (`d_inddto`) @@ -115,9 +116,11 @@ required information by record number (`recnum`), and outputs a - diagnosis code (`c_diag`) - diagnosis type (`c_diagtype`) -`join_lpr3()` takes `diagnoser` and `kontakter` as inputs, joins the -required information by record number (`dw_ek_kontakt`), and outputs a -`data.frame` with the following variables: +`join_lpr3()` takes `diagnoser` and `kontakter` as inputs, filters to +the necessary diagnoses (`diagnosekode` starting with "DO", "DZ3", or +"DE1[0-4]"), joins the required information by record number +(`dw_ek_kontakt`), and outputs a `data.frame` with the following +variables: - identifier variable (`cpr`) - date (`dato_start`) @@ -438,4 +441,3 @@ is within a time-period of insufficient data coverage, contains the inclusion date of this individual. 
- From 61b5d27f69ff0ed68a871909fce0801e5e4bfaf1 Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Fri, 20 Sep 2024 13:42:29 +0200 Subject: [PATCH 14/28] Removed section on weightloss drugs, since we're no longer including drugs with a dual-use for weightloss. --- vignettes/function-flow.Rmd | 7 ------- 1 file changed, 7 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index 127f6db..5f2fdcf 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -312,13 +312,6 @@ in LPR3 data). -#### Glucose-lowering brand drugs for weight loss - -The function `exclude_wld_purchases()` uses lmdb as input and excludes -the brand drugs Saxenda and Wegovy. - - - #### Metformin purchases for women below age 40 The function `exclude_potential_pcos()` as input to exclude all From 7b9738d7443a85f5e7efce80b5af46e37e740ed9 Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen <67263135+Aastedet@users.noreply.github.com> Date: Fri, 20 Sep 2024 13:58:14 +0200 Subject: [PATCH 15/28] Update vignettes/function-flow.Rmd Co-authored-by: Luke W. 
Johnston --- vignettes/function-flow.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index 5f2fdcf..f50c4c8 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -111,7 +111,7 @@ necessary diagnoses (`c_diag` starting with "DO", "DZ3", "DE1[0-4]", (`recnum`), and outputs a `data.frame` with the following variables: - identifier variable (`pnr`) -- date (`d_inddto`) +- date (originally `d_inddto`, renamed to `date`) - department specialty (`c_spec`) - diagnosis code (`c_diag`) - diagnosis type (`c_diagtype`) From 3a95d4f2d61db75de1da62a19e10113916641d87 Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Fri, 20 Sep 2024 13:58:49 +0200 Subject: [PATCH 16/28] Added description of exclude_potential_pcos() --- vignettes/function-flow.Rmd | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index 5f2fdcf..550561c 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -296,6 +296,20 @@ inputs to two sets of functions: ### Exclusion events +#### Metformin purchases potentially for the treatment of polycystic ovary syndrome + +The function `exclude_potential_pcos()` takes the output from +`include_gld_purchases()` and `bef` (information on sex and date of +birth) as inputs and censors (filters out) all purchases of metformin in +women below age 40 at the date of purchase (`atc` = "A10BA02" & `sex` = +"woman" & date at purchase (`date`-`date_of_birth`) \< 40 years) or an +indication code suggesting treatment of polycystic ovary syndrome (`atc` += "A10BA02" & `sex` = "woman" & `indication_code` either "0000092", +"0000276", "0000781"). 
+ +After these exclusions are made, the output is passed to +`exclude_pregnancy()` for further censoring, described below: + #### HbA1c tests and GLD purchases during pregnancy The function `exclude_pregnancy()` uses diagnoses from LPR2 or LPR3 as @@ -312,19 +326,6 @@ in LPR3 data). -#### Metformin purchases for women below age 40 - -The function `exclude_potential_pcos()` as input to exclude all -purchases of metformin by women below age 40 (i.e., \<= 39 years old) at -the date of purchase. It relies on `bef` as input. - -This function contains two helper functions: - -- `keep_women()` -- `drop_age_40_below()` - - - ### Get diagnosis date The function `get_diagnosis_date()` combines the outputs from the From fe257a64d79068d906262d0b1d9ac9ae1fa49c6e Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Fri, 20 Sep 2024 14:17:27 +0200 Subject: [PATCH 17/28] Renamed some variables. --- vignettes/function-flow.Rmd | 42 ++++++++++++++++++------------------- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index e6bd2c4..0eb7731 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -122,8 +122,8 @@ the necessary diagnoses (`diagnosekode` starting with "DO", "DZ3", or (`dw_ek_kontakt`), and outputs a `data.frame` with the following variables: -- identifier variable (`cpr`) -- date (`dato_start`) +- identifier variable (originally `cpr`, renamed to `pnr`) +- date (originally `dato_start`, renamed to `date`) - department specialty (`hovedspeciale_ans`) - diagnosis code (`diagnosekode`) - diagnosis type (`diagnosetype`) @@ -146,10 +146,10 @@ internal variables: - LPR2-data: - `pnr`: identifier variable - - `do_diagnosis`: include all diabetes diagnoses, registered as - primary (A) or secondary (B) diagnoses, regardless of type or - department: `c_diag` starts with "DE1[0-4]", "249", or "250" and - `c_diagtype` is either "A" or "B" + - `dates`: dates of all included diabetes 
diagnoses: + - registered as primary (A) or secondary (B) diagnoses, regardless + of type or department: - `c_diag` starts with "DE1[0-4]", "249", + or "250" and `c_diagtype` is either "A" or "B" - `is_primary`: Define whether the diagnosis was a primary diagnosis (`c_diagtype` == "A") - `is_t1d`: Define whether the diagnosis was T1D-specific @@ -161,11 +161,11 @@ internal variables: (`c_spec` \< 8 or 9-30) - LPR3: - `pnr`: identifier variable - - `do_diagnosis`: include all diabetes diagnoses, registered as - primary (A) or secondary (B) diagnoses, regardless of type or - department: `diagnosekode` starts with "DE1[0-4]" and - `diagnosetype` is either "A" or "B", but exclude retracted - diagnoses (`senere_afkraeftet` == "Ja") + - `dates`: dates of all included diabetes diagnoses: + - Registered as primary (A) or secondary (B) diagnoses, regardless + of type or department, but exclude retracted diagnoses: - + `diagnosekode` starts with "DE1[0-4]", `diagnosetype` is either + "A" or "B" and `senere_afkraeftet` == "Nej") - `is_primary`: Define whether the diagnosis was a primary diagnosis (`diagnosetype` == "A") - `is_t1d`: Define whether the diagnosis was T1D-specific @@ -189,7 +189,7 @@ following variables (up to two rows per individual): - identifier variable (`pnr`) - dates of the first and second hospital diabetes diagnosis - (`diagnosis_dates`) + (`diagnosis_date`) - number of type 1 diabetes-specific primary diagnosis codes from endocrinological departments (`n_t1d_endo`) - number of type 2 diabetes-specific primary diagnosis codes from @@ -219,14 +219,12 @@ variable based on the year-week (wwyy-formatted) variable (`honuge`) found in the raw data, and de-duplicates multiple services registered on the same date. 
-`include_podiatrist_services()` outputs a 3-column data frame with one -row for each individual, containing the following variables: +`include_podiatrist_services()` outputs a 2-column data frame with up to +two rows for each individual, containing the following variables: - identifier variable (`pnr`) -- the date of the first diabetes-specific podiatrist record - (`do_podiatrist_1`) -- the date of the second diabetes-specific podiatrist record - (`do_podiatrist_2`) +- the dates of the first and second diabetes-specific podiatrist + record (`dates`) The output is passed to the `get_diagnosis_date()` function for the final step of the inclusion process. @@ -275,10 +273,11 @@ This function outputs a `data.frame` with the following variables needed later in the classification part of the function flow: - identifier variable (`pnr`) -- date (`eksd`) +- date (originally `eksd`, renamed to `date`) - type of drug (`atc`) -- amount purchased (`volume` and `apk`) -- indication code (`indo`) +- amount purchased (`volume` and `number_of_packages` (originally + named `apk`)) +- indication code (originally `indo`, renamed to `indication_code`) These events are then passed to a chain of exclusion functions: `exclude_wld_purchases()`, `exclude_potential_pcos()`, @@ -435,3 +434,4 @@ is within a time-period of insufficient data coverage, contains the inclusion date of this individual. 
+ From 4bba18e11fbaa77c22cc21253478a7d094498614 Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Fri, 20 Sep 2024 14:35:14 +0200 Subject: [PATCH 18/28] Added censoring/exclusion function description --- vignettes/function-flow.Rmd | 37 +++++++++++++++++++------------------ 1 file changed, 19 insertions(+), 18 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index 0eb7731..dfd3394 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -283,16 +283,6 @@ These events are then passed to a chain of exclusion functions: `exclude_wld_purchases()`, `exclude_potential_pcos()`, `exclude_pregnancy()` described in the sections below. -After these exclusion functions have been applied, the output serves as -inputs to two sets of functions: - -1. the `get_diagnosis_date()` function for the final step of the - inclusion process. -2. the `get_only_insulin_purchases()`, - `get_insulin_purchases_within_180_days()`, and - `get_insulin_is_two_thirds_of_gld_doses()` helper functions for the - classification of diabetes type. - ### Exclusion events #### Metformin purchases potentially for the treatment of polycystic ovary syndrome @@ -311,19 +301,30 @@ After these exclusions are made, the output is passed to #### HbA1c tests and GLD purchases during pregnancy -The function `exclude_pregnancy()` uses diagnoses from LPR2 or LPR3 as -input and is used to exclude both HbA1c tests and GLD purchases during -pregnancy, as these may be due to gestational diabetes, rather than type -1 or type 2 diabetes. +The function `exclude_pregnancy()` takes the combined outputs from +`join_lpr2()`, `join_lpr3()`, `include_hba1c()`, and +`exclude_potential_pcos()` and uses diagnoses from LPR2 or LPR3 to +exclude both elevated HbA1c tests and GLD purchases during pregnancy, as +these may be due to gestational diabetes, rather than type 1 or type 2 +diabetes. 
Internally, this relies on the function `get_pregnancy_dates()` that uses diagnoses registered in the National Patient Register to extract the dates of all pregnancy ending (live births or miscarriages). These -are identified by filtering values beginning with "DO0[0-6]", "DO8[0-4]" -or "DZ3[37]" in the `c_diag` variable in the LPR2 data (`diagnosekode` -in LPR3 data). +are identified by filtering +`values beginning with "DO0[0-6]", "DO8[0-4]" or "DZ3[37]" in the`c_diag`variable in the LPR2 data (`diagnosekode`in LPR3 data). The dates output by`get_pregnancy_dates()\` +are used to exclude all inclusion events registered between 40 weeks +before and 12 weeks after a pregnancy ending. - +After these exclusion functions have been applied, the output serves as +inputs to two sets of functions: + +1. the `get_diagnosis_date()` function for the final step of the + inclusion process. +2. the `get_only_insulin_purchases()`, + `get_insulin_purchases_within_180_days()`, and + `get_insulin_is_two_thirds_of_gld_doses()` helper functions for the + classification of diabetes type. ### Get diagnosis date From 7cca920bf2f2b3eddb4a8c600e62582c007c045f Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Fri, 27 Sep 2024 12:05:15 +0200 Subject: [PATCH 19/28] Added correct diagnoses to filter to in lpr_join() functions. Reformatted output variable lists to start with variable names, followed by a short description. Also added info on renamed variables. --- vignettes/function-flow.Rmd | 189 +++++++++++++++++++----------------- 1 file changed, 99 insertions(+), 90 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index dfd3394..0c4d923 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -106,28 +106,30 @@ diagnoses to administrative information in LPR2-formatted and LPR3-formatted data, respectively. 
`join_lpr2()` takes `lpr_diag` and `lpr_adm` as inputs, filters to the -necessary diagnoses (`c_diag` starting with "DO", "DZ3", "DE1[0-4]", -"249", or "250"), joins the required information by record number -(`recnum`), and outputs a `data.frame` with the following variables: +necessary diagnoses (`c_diag` starting with "DO0[0-6]", "DO8[0-4]", +"DZ3[37]", "DE1[0-4]", "249", or "250"), joins the required information +by record number (`recnum`), and outputs a `data.frame` with the +following variables: -- identifier variable (`pnr`) -- date (originally `d_inddto`, renamed to `date`) -- department specialty (`c_spec`) -- diagnosis code (`c_diag`) -- diagnosis type (`c_diagtype`) +- `pnr`: identifier variable +- `date`: date of the recorded diagnosis (renamed from `d_inddto`) +- `specialty`: department specialty (renamed from `c_spec`) +- `diagnosis`: diagnosis code (renamed from `c_diag`) +- `diagnosis_type`: diagnosis type (renamed from `c_diagtype`) `join_lpr3()` takes `diagnoser` and `kontakter` as inputs, filters to -the necessary diagnoses (`diagnosekode` starting with "DO", "DZ3", or -"DE1[0-4]"), joins the required information by record number -(`dw_ek_kontakt`), and outputs a `data.frame` with the following -variables: - -- identifier variable (originally `cpr`, renamed to `pnr`) -- date (originally `dato_start`, renamed to `date`) -- department specialty (`hovedspeciale_ans`) -- diagnosis code (`diagnosekode`) -- diagnosis type (`diagnosetype`) -- diagnosis retracted (`senere_afkraeftet`) +the necessary diagnoses (`diagnosekode` starting with "DO0[0-6]", +"DO8[0-4]", "DZ3[37]" or "DE1[0-4]"), joins the required information by +record number (`dw_ek_kontakt`), and outputs a `data.frame` with the +following variables: + +- `pnr`: identifier variable (renamed from `cpr`) +- `date`: date of the recorded diagnosis (renamed from `dato_start`) +- `specialty`: department specialty (renamed from `hovedspeciale_ans`) +- `diagnosis`: diagnosis code (renamed from 
`diagnosekode`) +- `diagnosis_type`: diagnosis type (renamed from `diagnosetype`) +- `diagnosis_retracted`: if the diagnosis was later retracted (renamed + from `senere_afkraeftet`) These outputs are passed to `include_diabetes_diagnoses()` (and to `get_pregnancy_dates()`, see exclusion events) for further processing @@ -144,64 +146,68 @@ The function takes the outputs of `join_lpr2()` and `join_lpr3()` as inputs and processes each input separately to generate the following internal variables: -- LPR2-data: +- From `join_lpr2`: - `pnr`: identifier variable - - `dates`: dates of all included diabetes diagnoses: + - `date`: dates of all included diabetes diagnoses: - registered as primary (A) or secondary (B) diagnoses, regardless - of type or department: - `c_diag` starts with "DE1[0-4]", "249", - or "250" and `c_diagtype` is either "A" or "B" + of type or department: + - `diagnosis` starts with "DE1[0-4]", "249" or "250", and + `diagnosis_type` is either "A" or "B" - `is_primary`: Define whether the diagnosis was a primary - diagnosis (`c_diagtype` == "A") + diagnosis (`diagnosis_type` == "A") - `is_t1d`: Define whether the diagnosis was T1D-specific - (`c_diag` starts with "DE10" or "249") + (`diagnosis` starts with "DE10" or "249") - `is_t2d`: Define whether the diagnosis was T2D-specific - (`c_diag` starts with "DE11" or "250") + (`diagnosis` starts with "DE11" or "250") - `department`: Define whether the diagnosis was made made by an - endocrinological (`c_spec` == 8) or other medical department - (`c_spec` \< 8 or 9-30) -- LPR3: + endocrinological (`specialty` == 8) or other medical department + (`specialty` \< 8 or 9-30) +- From `join_lpr3()`: - `pnr`: identifier variable - - `dates`: dates of all included diabetes diagnoses: - - Registered as primary (A) or secondary (B) diagnoses, regardless - of type or department, but exclude retracted diagnoses: - - `diagnosekode` starts with "DE1[0-4]", `diagnosetype` is either - "A" or "B" and `senere_afkraeftet` == 
"Nej") + - `date`: dates of all included diabetes diagnoses: + - registered as primary (A) or secondary (B) diagnoses, regardless + of type or department, but exclude retracted diagnoses: + - `diagnosis` starts with "DE1[0-4]", `diagnosis_type` is + either "A" or "B" and `diagnosis_retracted` == "Nej" - `is_primary`: Define whether the diagnosis was a primary - diagnosis (`diagnosetype` == "A") + diagnosis (`diagnosis_type` == "A") - `is_t1d`: Define whether the diagnosis was T1D-specific - (`diagnosekode` starts with "DE10") + (`diagnosis` starts with "DE10") - `is_t2d`: Define whether the diagnosis was T2D-specific - (`diagnosekode` starts with "DE11") + (`diagnosis` starts with "DE11") - `department`: Define whether the diagnosis was made made by an - endocrinological (`hovedspeciale_ans` == "medicinsk - endokrinologi") or other medical department (`hovedspeciale_ans` - either "Blandet medicin og kirurgi", "Intern medicin", - "Geriatri", "Hepatologi", "Hæmatologi", "Infektionsmedicin", - "Kardiologi", "Medicinsk allergologi", "Medicinsk - gastroenterologi", "Medicinsk lungesygdomme", "Nefrologi", - "Reumatologi", "Palliativ medicin", "Akut medicin", - "Dermato-venerologi", "Neurologi", "Onkologi", "Fysiurgi", or - "Tropemedicin") - -These intermediate results are combined for further processing, and -`include_diabetes_diagnoses()` outputs a single `data.frame` with the -following variables (up to two rows per individual): - -- identifier variable (`pnr`) -- dates of the first and second hospital diabetes diagnosis - (`diagnosis_date`) -- number of type 1 diabetes-specific primary diagnosis codes from - endocrinological departments (`n_t1d_endo`) -- number of type 2 diabetes-specific primary diagnosis codes from - endocrinological departments (`n_t2d_endo`) -- number of type 1 diabetes-specific primary diagnosis codes from - medical departments (`n_t1d_medical`) -- number of type 2 diabetes-specific primary diagnosis codes from - medical departments 
(`n_t2d_medical`) - -The output is passed to the `get_diagnosis_date()` function for the -final step of the inclusion process and is subsequently used to classify -diabetes type. + endocrinological department (`specialty` == "medicinsk + endokrinologi" -\> `department` == "endocrinological") or other + medical department (`specialty` either "Blandet medicin og + kirurgi", "Intern medicin", "Geriatri", "Hepatologi", + "Hæmatologi", "Infektionsmedicin", "Kardiologi", "Medicinsk + allergologi", "Medicinsk gastroenterologi", "Medicinsk + lungesygdomme", "Nefrologi", "Reumatologi", "Palliativ medicin", + "Akut medicin", "Dermato-venerologi", "Neurologi", "Onkologi", + "Fysiurgi", or "Tropemedicin" -\> `department` == "medical") + +Internally, these intermediate results are combined and processed +together. And ultimately, `include_diabetes_diagnoses()` outputs a +single `data.frame` with the following variables (up to two rows per +individual): + +- `pnr`: identifier variable +- `dates`: dates of the first and second hospital diabetes diagnosis +- `n_t1d_endocrinology`: number of type 1 diabetes-specific primary + diagnosis codes from endocrinological departments +- `n_t2d_endocrinology`: number of type 2 diabetes-specific primary + diagnosis codes from endocrinological departments +- `n_t1d_medical`: number of type 1 diabetes-specific primary + diagnosis codes from medical departments +- `n_t2d_medical`: number of type 2 diabetes-specific primary + diagnosis codes from medical departments + +This output is passed to the `join_inclusions()` function, where the +`dates` variable is used for the final step of the inclusion process. +The variables of counts of diabetes type-specific primary diagnoses are +carried over for the subsequent classification of diabetes type, +initially as inputs to the `get_t1d_primary_diagnosis()` and +`get_majority_of_t1d_diagnoses()` functions. #### Diabetes-specific podiatrist services @@ -222,12 +228,12 @@ the same date. 
`include_podiatrist_services()` outputs a 2-column data frame with up to two rows for each individual, containing the following variables: -- identifier variable (`pnr`) -- the dates of the first and second diabetes-specific podiatrist - record (`dates`) +- `pnr`: identifier variable +- `dates`: the dates of the first and second diabetes-specific + podiatrist record -The output is passed to the `get_diagnosis_date()` function for the -final step of the inclusion process. +The output is passed to the `join_inclusions()` function for the final +step of the inclusion process. #### HbA1c tests above the diagnosis cut-off value (48 mmol/mol or 6.5%) @@ -251,8 +257,8 @@ twice (one for IFCC, one for DCCT units). `include_hba1c()` outputs a 2-column data frame containing the following variables: -- identifier variable (`pnr`) -- the dates of all elevated HbA1c test results (`dates`). +- `pnr`: identifier variable +- `dates`: the dates of all elevated HbA1c test results The output is passed to the `exclude_pregnancy()` function for censoring of elevated results due to potential gestational diabetes (see below). @@ -272,12 +278,14 @@ for censoring, the extraction window can be extended). This function outputs a `data.frame` with the following variables needed later in the classification part of the function flow: -- identifier variable (`pnr`) -- date (originally `eksd`, renamed to `date`) -- type of drug (`atc`) -- amount purchased (`volume` and `number_of_packages` (originally - named `apk`)) -- indication code (originally `indo`, renamed to `indication_code`) +- `pnr`: identifier variable +- `date`: dates of all purchases of GLD (renamed from `eksd`) +- `atc`: type of drug +- `contained_doses`: amount purchased, in number of defined daily + doses (DDD). 
Calculated as `volume` (doses contained in the + purchased package) times `apk` (number of packages purchased) +- `indication_code`: indication code of the prescription (renamed from + `indo`) These events are then passed to a chain of exclusion functions: `exclude_wld_purchases()`, `exclude_potential_pcos()`, @@ -294,7 +302,7 @@ women below age 40 at the date of purchase (`atc` = "A10BA02" & `sex` = "woman" & date at purchase (`date`-`date_of_birth`) \< 40 years) or an indication code suggesting treatment of polycystic ovary syndrome (`atc` = "A10BA02" & `sex` = "woman" & `indication_code` either "0000092", -"0000276", "0000781"). +"0000276" or "0000781"). After these exclusions are made, the output is passed to `exclude_pregnancy()` for further censoring, described below: @@ -311,17 +319,19 @@ diabetes. Internally, this relies on the function `get_pregnancy_dates()` that uses diagnoses registered in the National Patient Register to extract the dates of all pregnancy ending (live births or miscarriages). These -are identified by filtering -`values beginning with "DO0[0-6]", "DO8[0-4]" or "DZ3[37]" in the`c_diag`variable in the LPR2 data (`diagnosekode`in LPR3 data). The dates output by`get_pregnancy_dates()\` -are used to exclude all inclusion events registered between 40 weeks -before and 12 weeks after a pregnancy ending. +are identified by `diagnosis` values beginning with "DO0[0-6]", +"DO8[0-4]" or "DZ3[37]". The dates output by`get_pregnancy_dates()\` are +used to exclude all inclusion events registered between 40 weeks before +and 12 weeks after a pregnancy ending. After these exclusion functions have been applied, the output serves as inputs to two sets of functions: -1. the `get_diagnosis_date()` function for the final step of the - inclusion process. -2. the `get_only_insulin_purchases()`, +1. the censored HbA1c and GLD data are passed to the + `join_inclusions()` function for the final step of the inclusion + process. +2. 
the censored GLD data is passed to the + `get_only_insulin_purchases()`, `get_insulin_purchases_within_180_days()`, and `get_insulin_is_two_thirds_of_gld_doses()` helper functions for the classification of diabetes type. @@ -369,8 +379,8 @@ OSDC algorithm includes the following criteria: diagnoses extracted from `lpr_diag` (LPR2) and `diagnoser` (LPR3) in the previous steps. 2. `get_only_insulin_purchases()` which relies on the GLD purchases - from Lægemiddelsdatabasen to get patients where all GLD purchases - are insulin only. + from Lægemiddeldatabasen to get patients where all GLD purchases are + insulin only. 3. `get_majority_of_t1d_diagnoses()` (as compared to T2D diagnoses) which again relies on primary hospital diagnoses from LPR. 4. `get_insulin_purchase_within_180_days()` which relies on both @@ -435,4 +445,3 @@ is within a time-period of insufficient data coverage, contains the inclusion date of this individual. - From b40412c4b85f560d3ada63bed8ac23da575dd35f Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Fri, 27 Sep 2024 12:34:56 +0200 Subject: [PATCH 20/28] changed specialty values to align with the PR with a refactored create_lpr2() function --- vignettes/function-flow.Rmd | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index 0c4d923..2360270 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -160,8 +160,9 @@ internal variables: - `is_t2d`: Define whether the diagnosis was T2D-specific (`diagnosis` starts with "DE11" or "250") - `department`: Define whether the diagnosis was made made by an - endocrinological (`specialty` == 8) or other medical department - (`specialty` \< 8 or 9-30) + endocrinological (`specialty` == 8 -\> `department` == + "endocrinology") or other medical department (`specialty` \< 8 + or 9-30 -\> `department` == "other medical") - From `join_lpr3()`: - `pnr`: identifier variable - `date`: dates of all 
included diabetes diagnoses: @@ -177,14 +178,15 @@ internal variables: (`diagnosis` starts with "DE11") - `department`: Define whether the diagnosis was made made by an endocrinological department (`specialty` == "medicinsk - endokrinologi" -\> `department` == "endocrinological") or other + endokrinologi" -\> `department` == "endocrinology") or other medical department (`specialty` either "Blandet medicin og kirurgi", "Intern medicin", "Geriatri", "Hepatologi", "Hæmatologi", "Infektionsmedicin", "Kardiologi", "Medicinsk allergologi", "Medicinsk gastroenterologi", "Medicinsk lungesygdomme", "Nefrologi", "Reumatologi", "Palliativ medicin", "Akut medicin", "Dermato-venerologi", "Neurologi", "Onkologi", - "Fysiurgi", or "Tropemedicin" -\> `department` == "medical") + "Fysiurgi", or "Tropemedicin" -\> `department` == "other + medical") Internally, these intermediate results are combined and processed together. And ultimately, `include_diabetes_diagnoses()` outputs a From 35118e82e93762bb23ee72893a6f03fd6cdd0ae8 Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Tue, 17 Dec 2024 00:42:11 +0100 Subject: [PATCH 21/28] Joining inclusions and definition. Looking to add type classification. 
--- vignettes/function-flow.Rmd | 165 ++++++++++++++++++++++++++---------- 1 file changed, 121 insertions(+), 44 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index 2360270..1d8ceec 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -151,8 +151,8 @@ internal variables: - `date`: dates of all included diabetes diagnoses: - registered as primary (A) or secondary (B) diagnoses, regardless of type or department: - - `diagnosis` starts with "DE1[0-4]", "249" or "250", and - `diagnosis_type` is either "A" or "B" + - Keep rows where `diagnosis` starts with "DE1[0-4]", "249" or + "250", and `diagnosis_type` is either "A" or "B" - `is_primary`: Define whether the diagnosis was a primary diagnosis (`diagnosis_type` == "A") - `is_t1d`: Define whether the diagnosis was T1D-specific @@ -160,16 +160,17 @@ internal variables: - `is_t2d`: Define whether the diagnosis was T2D-specific (`diagnosis` starts with "DE11" or "250") - `department`: Define whether the diagnosis was made made by an - endocrinological (`specialty` == 8 -\> `department` == - "endocrinology") or other medical department (`specialty` \< 8 - or 9-30 -\> `department` == "other medical") + endocrinological (if `specialty` == 8 then `department` == + "endocrinology") or other medical department (if `specialty` \< + 8 or 9-30 then `department` == "other medical") - From `join_lpr3()`: - `pnr`: identifier variable - `date`: dates of all included diabetes diagnoses: - registered as primary (A) or secondary (B) diagnoses, regardless of type or department, but exclude retracted diagnoses: - - `diagnosis` starts with "DE1[0-4]", `diagnosis_type` is - either "A" or "B" and `diagnosis_retracted` == "Nej" + - Keep rows where `diagnosis` starts with "DE1[0-4]", + `diagnosis_type` is either "A" or "B" and + `diagnosis_retracted` == "Nej" - `is_primary`: Define whether the diagnosis was a primary diagnosis (`diagnosis_type` == "A") - `is_t1d`: Define whether the 
diagnosis was T1D-specific @@ -177,15 +178,15 @@ internal variables: - `is_t2d`: Define whether the diagnosis was T2D-specific (`diagnosis` starts with "DE11") - `department`: Define whether the diagnosis was made made by an - endocrinological department (`specialty` == "medicinsk - endokrinologi" -\> `department` == "endocrinology") or other - medical department (`specialty` either "Blandet medicin og + endocrinological department (if `specialty` == "medicinsk + endokrinologi" then `department` == "endocrinology") or other + medical department (if `specialty` is any of "Blandet medicin og kirurgi", "Intern medicin", "Geriatri", "Hepatologi", "Hæmatologi", "Infektionsmedicin", "Kardiologi", "Medicinsk allergologi", "Medicinsk gastroenterologi", "Medicinsk lungesygdomme", "Nefrologi", "Reumatologi", "Palliativ medicin", "Akut medicin", "Dermato-venerologi", "Neurologi", "Onkologi", - "Fysiurgi", or "Tropemedicin" -\> `department` == "other + "Fysiurgi", or "Tropemedicin" then `department` == "other medical") Internally, these intermediate results are combined and processed @@ -271,14 +272,21 @@ The function `include_gld_purchases()` uses `lmdb` to extract the dates of all GLD purchases. These dates are extracted by including all values beginning with "A10" -in the `atc` variable of the `lmdb` register. Since the diagnosis code -data on pregnancies (see below) is insufficient to perform censoring -prior to 1997, `include_gld_purchases()` only extracts dates from 1997 -onward by default (if Medical Birth Register data is available to use -for censoring, the extraction window can be extended). - -This function outputs a `data.frame` with the following variables needed -later in the classification part of the function flow: +in the `atc` variable of the `lmdb` register, except for +glucose-lowering drugs that may be used for other conditions than +diabetes: GLP-RAs (`atc` start with "A10BJ") or +dapagliflozin/empagliflozin (`atc` = "A10BK01" or "A10BK03"). 
+ +Since the diagnosis code data on pregnancies (see below) is insufficient +to perform censoring prior to 1997, `include_gld_purchases()` only +extracts dates from 1997 onward by default (if Medical Birth Register +data is available to use for censoring, the extraction window can be +extended). + +This function outputs a long `data.frame` (since all dates of purchases +must be kept for later use in classifyin diabetes type) with the +following variables needed later in the classification part of the +function flow: - `pnr`: identifier variable - `date`: dates of all purchases of GLD (renamed from `eksd`) @@ -290,8 +298,8 @@ later in the classification part of the function flow: `indo`) These events are then passed to a chain of exclusion functions: -`exclude_wld_purchases()`, `exclude_potential_pcos()`, -`exclude_pregnancy()` described in the sections below. +`exclude_potential_pcos()` and `exclude_pregnancy()` described in the +sections below. ### Exclusion events @@ -301,13 +309,16 @@ The function `exclude_potential_pcos()` takes the output from `include_gld_purchases()` and `bef` (information on sex and date of birth) as inputs and censors (filters out) all purchases of metformin in women below age 40 at the date of purchase (`atc` = "A10BA02" & `sex` = -"woman" & date at purchase (`date`-`date_of_birth`) \< 40 years) or an -indication code suggesting treatment of polycystic ovary syndrome (`atc` -= "A10BA02" & `sex` = "woman" & `indication_code` either "0000092", -"0000276" or "0000781"). +"woman" & age at purchase (`date`-`date_of_birth`) \< 40 years) or an +indication code suggesting the prescription was made for treatment of +polycystic ovary syndrome (`atc` = "A10BA02" & `sex` = "woman" & +`indication_code` either of "0000092", "0000276" or "0000781"). 
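The PCOS censoring rule described above can be sketched in R roughly as follows. This is only an illustration under stated assumptions, not the package's actual `exclude_potential_pcos()` implementation: the toy data, the helper columns `age_at_purchase` and `potential_pcos`, and the 365.25-day year used to compute age are all hypothetical.

```r
library(dplyr)

# Toy inputs with made-up values; column names follow the description above.
gld_purchases <- tibble(
  pnr = c("1", "2", "3", "4"),
  date = as.Date(rep("2010-01-01", 4)),
  atc = c("A10BA02", "A10BA02", "A10BA02", "A10AB01"),
  indication_code = c("0000001", "0000276", "0000001", "0000001")
)
bef <- tibble(
  pnr = c("1", "2", "3", "4"),
  sex = c("woman", "woman", "man", "woman"),
  date_of_birth = as.Date(
    c("1980-01-01", "1950-01-01", "1980-01-01", "1980-01-01")
  )
)

# Indication codes suggesting treatment of polycystic ovary syndrome.
pcos_indication_codes <- c("0000092", "0000276", "0000781")

kept_purchases <- gld_purchases |>
  left_join(bef, by = "pnr") |>
  mutate(
    # Assumption: age in years approximated as days / 365.25.
    age_at_purchase = as.numeric(date - date_of_birth) / 365.25,
    # Metformin purchased by a woman who is either below 40 years of age
    # or has a PCOS-related indication code.
    potential_pcos = atc == "A10BA02" & sex == "woman" &
      (age_at_purchase < 40 | indication_code %in% pcos_indication_codes)
  ) |>
  filter(!potential_pcos) |>
  # Pure filtering step: keep the same variables as the input.
  select(all_of(names(gld_purchases)))
```

In this toy example, only the purchases of `pnr` 3 (a man) and `pnr` 4 (an insulin purchase) remain after censoring.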
-After these exclusions are made, the output is passed to -`exclude_pregnancy()` for further censoring, described below: +This function only performs a filtering operation, and output retains +the same structure and variables as the input passed from +`include_gld_purchases()`. After these exclusions are made, the output +is passed to `exclude_pregnancy()` for further censoring, described +below: #### HbA1c tests and GLD purchases during pregnancy @@ -320,16 +331,17 @@ diabetes. Internally, this relies on the function `get_pregnancy_dates()` that uses diagnoses registered in the National Patient Register to extract -the dates of all pregnancy ending (live births or miscarriages). These -are identified by `diagnosis` values beginning with "DO0[0-6]", -"DO8[0-4]" or "DZ3[37]". The dates output by`get_pregnancy_dates()\` are -used to exclude all inclusion events registered between 40 weeks before -and 12 weeks after a pregnancy ending. +the dates of all recorded pregnancy endings (live births or +miscarriages). These are identified by `diagnosis` values beginning with +"DO0[0-6]", "DO8[0-4]" or "DZ3[37]". The dates output by +`get_pregnancy_dates()` are used to exclude all inclusion events +registered between 40 weeks before and 12 weeks after a pregnancy +ending. After these exclusion functions have been applied, the output serves as inputs to two sets of functions: -1. the censored HbA1c and GLD data are passed to the +1. The censored HbA1c and GLD data are passed to the `join_inclusions()` function for the final step of the inclusion process. 2. the censored GLD data is passed to the @@ -338,20 +350,61 @@ inputs to two sets of functions: `get_insulin_is_two_thirds_of_gld_doses()` helper functions for the classification of diabetes type. +### Join inclusion events + +The function `join_inclusions()` appends/row-binds the dates output from +functions the process the four types of inclusion events by `pnr`. 
Thus, +it takes as input the following variables output from the following +functions: + +- From `include_diabetes_diagnoses()`: + - `pnr`: identifier variable + - `dates`: dates of the first and second hospital diabetes + diagnosis +- From `include_podiatrist_services()` + - `pnr`: identifier variable + - `dates`: the dates of the first and second diabetes-specific + podiatrist record +- From `exclude_pregnancy()`: + - `pnr`: identifier variable + - `dates`: the dates of the first and second elevated HbA1c test + results (after censoring) +- From `exclude_pregnancy()`: + - `pnr`: identifier variable + - `date`: dates of all purchases of GLD + - The dates of the first and second purchase of GLD of each + individual are extracted from these and appended as two rows + to the ´dates´ variable. + +The output from the function is a `data.frame` containing two variables +(`pnr` and `dates`) and 1 to 8 rows per ´pnr´. This outputn is passed to +`get_diagnosis_date()`. + ### Get diagnosis date -The function `get_diagnosis_date()` combines the outputs from the -inclusion and exclusion functions to get the final diagnosis date. -Initially, it drops the first inclusion and exclusion events from the -function outputs with the helper `drop_first_event()`, so that only -those with two or more events are kept. This is then used to assign an -initial diagnosis according to OSDC. Then, all the outputs are joined -together with `join_diagnosis_dates()`. +The function `get_diagnosis_date()` takes the output from +`join_inclusions()` and defines the final diagnosis date based on all +the inclusion event types. + +First, the inputs are sorted by `dates` within each level of `pnr`, then +the earliest value of `dates` is dropped function outputs with the +helper `drop_first_event()`, so that only those with two or more events +are included. 
The date of inclusion, `raw_inclusion_date`, is then +defined as the earliest value of `dates`in the remaining rows for each +individual (effectively the date of the second recorded inclusion +event). A third variable, `stable_inclusion_date`, is defined based on +`raw_inclusion_date` (if `raw_inclusion_date` \< stable inclusion +threshold (default "31-12-1997"), then `stable_inclusion_date` is set to +`NA`, else it is set to`raw_inclusion_date`). This variable serves to +limit the included cohort to only individuals with valid date of +inclusion (and thereby valid age at inclusion & duration of diabetes). + +`get_diagnosis_date()` outputs a `data.frame` with the following +variables needed later in the classification part of the function flow: -Finally, the dates outside of the data coverage period are dropped with -`drop_diagnosis_dates_outside_coverage()` to end with a final diagnosis -date. For details on this censoring based on periods with insufficient -data coverage, see the `vignette("design")`. +- `pnr`: identifier variable +- `raw_inclusion_date`: date of inclusion +- `stable_inclusion_date`: date of inclusion of valid incident cases ### Classifying the diabetes type @@ -360,6 +413,30 @@ extracted diabetes population as having either T1D or T2D. As described in the `vignette("design")`, individuals not classified as T1D cases are classified as T2D cases. +As the diabetes type classification incorporates an evaluation of the +time from diagnosis/inclusion to first subsequent purchase of insulin, +the `get_diabetes_type` function has to take inputs on the date of +diagnosis and all purchases of GLD drugs. 
In addition, several helper +functions are applied to extract additional information from the +censored GLD data to use for classification of diabetes type: + +``` +`get_only_insulin_purchases()`, +`get_insulin_purchases_within_180_days()`, and +`get_insulin_is_two_thirds_of_gld_doses()` +``` + +Thus, the function takes the following inputs/variables: + +- From `get_diagnosis_date()` + - `pnr` + - `raw_inclusion_date` +- From `exclude_pregnancy()`: Information on historic GLD data: + - `pnr`: identifier variable + - `date`: dates of all purchases of GLD. + + + The output is a `data.frame` that includes one row per individual in the diabetes population: one column with their PNR, two columns with inclusion dates (one "stable" date and one "raw" date - see the From cbee21f035c8b511a58ba568c5ba15b8a320008e Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Tue, 17 Dec 2024 20:56:30 +0100 Subject: [PATCH 22/28] Removed helper function for dropping first event as it seemed a bit excessive. Added a bit more detail to the raw_inclusion_date vs stable_inclusion_date. --- vignettes/function-flow.Rmd | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index 1d8ceec..5d26e6e 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -387,17 +387,18 @@ The function `get_diagnosis_date()` takes the output from the inclusion event types. First, the inputs are sorted by `dates` within each level of `pnr`, then -the earliest value of `dates` is dropped function outputs with the -helper `drop_first_event()`, so that only those with two or more events -are included. The date of inclusion, `raw_inclusion_date`, is then -defined as the earliest value of `dates`in the remaining rows for each -individual (effectively the date of the second recorded inclusion -event). 
A third variable, `stable_inclusion_date`, is defined based on -`raw_inclusion_date` (if `raw_inclusion_date` \< stable inclusion -threshold (default "31-12-1997"), then `stable_inclusion_date` is set to -`NA`, else it is set to`raw_inclusion_date`). This variable serves to -limit the included cohort to only individuals with valid date of -inclusion (and thereby valid age at inclusion & duration of diabetes). +the earliest value of `dates` is dropped, so that only those with two or +more events are included. The date of inclusion, `raw_inclusion_date`, +is then defined as the earliest value of `dates`in the remaining rows +for each individual (effectively the date of the second recorded +inclusion event). A third variable, `stable_inclusion_date`, is defined +based on `raw_inclusion_date` (if `raw_inclusion_date` \< stable +inclusion threshold (one year after medication data starts to contribute +to inclusions. Default "31-12-1997"), then `stable_inclusion_date` is +set to `NA`, else it is set to`raw_inclusion_date`). This variable +serves to limit the included cohort to only individuals with valid date +of inclusion (and thereby valid age at inclusion & duration of +diabetes). `get_diagnosis_date()` outputs a `data.frame` with the following variables needed later in the classification part of the function flow: From 3820459d82e41cc656ed1471df964644364bf0ff Mon Sep 17 00:00:00 2001 From: Anders Aasted Isaksen Date: Tue, 17 Dec 2024 23:56:57 +0100 Subject: [PATCH 23/28] Added function flow description of get_diabetes_type() and its helper functions. --- vignettes/function-flow.Rmd | 126 ++++++++++++++++++++++++++++-------- 1 file changed, 98 insertions(+), 28 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index 5d26e6e..c2d1780 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -377,7 +377,7 @@ functions: to the ´dates´ variable. 
The output from the function is a `data.frame` containing two variables -(`pnr` and `dates`) and 1 to 8 rows per ´pnr´. This outputn is passed to +(`pnr` and `dates`) and 1 to 8 rows per ´pnr´. This output is passed to `get_diagnosis_date()`. ### Get diagnosis date @@ -401,12 +401,15 @@ of inclusion (and thereby valid age at inclusion & duration of diabetes). `get_diagnosis_date()` outputs a `data.frame` with the following -variables needed later in the classification part of the function flow: +variables: - `pnr`: identifier variable - `raw_inclusion_date`: date of inclusion - `stable_inclusion_date`: date of inclusion of valid incident cases +This output is passed to the `get_diabetes_type()` function and used to +classify the diabetes type as described below. + ### Classifying the diabetes type The next step of the OSDC algorithm classifies individuals from the @@ -416,38 +419,104 @@ classified as T2D cases. As the diabetes type classification incorporates an evaluation of the time from diagnosis/inclusion to first subsequent purchase of insulin, -the `get_diabetes_type` function has to take inputs on the date of -diagnosis and all purchases of GLD drugs. In addition, several helper -functions are applied to extract additional information from the -censored GLD data to use for classification of diabetes type: - -``` -`get_only_insulin_purchases()`, -`get_insulin_purchases_within_180_days()`, and -`get_insulin_is_two_thirds_of_gld_doses()` -``` +the `get_diabetes_type()` function has to take the date of diagnosis and +all purchases of GLD drugs (after censoring) as inputs. In addition, +information on diabetes type-specific primary diagnoses from hospitals +is also a requirement. 
-Thus, the function takes the following inputs/variables: +Thus, the function takes the following inputs from +`get_diagnosis_date()`, `exclude_pregnancy()`, and +`include_diabetes_diagnoses()`: -- From `get_diagnosis_date()` +- From `get_diagnosis_date()`: Information on date of diagnosis of + diabetes - `pnr` - `raw_inclusion_date` -- From `exclude_pregnancy()`: Information on historic GLD data: + - `stable_inclusion_date` +- From `exclude_pregnancy()`: Information on historic GLD purchases: - `pnr`: identifier variable - `date`: dates of all purchases of GLD. - - - -The output is a `data.frame` that includes one row per individual in the -diabetes population: one column with their PNR, two columns with -inclusion dates (one "stable" date and one "raw" date - see the -`vignette("design")` for an elaboration on what that entails), and one -column with the diabetes type. - - - -![Flow of functions for classifying diabetes status using the `osdc` -package.](images/function-flow-classification.png) + - `atc`: type of drug + - `contained_doses`: defined daily doses of drug contained in + purchase +- From `include_diabetes_diagnoses()`: Information on diabetes + type-specific primary diagnoses from hospitals: + - `pnr`: identifier variable + - `n_t1d_endocrinology`: number of type 1 diabetes-specific + primary diagnosis codes from endocrinological departments + - `n_t2d_endocrinology`: number of type 2 diabetes-specific + primary diagnosis codes from endocrinological departments + - `n_t1d_medical`: number of type 1 diabetes-specific primary + diagnosis codes from medical departments + - `n_t2d_medical`: number of type 2 diabetes-specific primary + diagnosis codes from medical departments + +For each `pnr` number, several helper functions are applied to these +inputs to extract additional information from the censored GLD data and +diagnoses to use for classification of diabetes type. 
All of these +return a single value (`TRUE`, otherwise `FALSE`) for each individual: + +- `get_only_insulin_purchases()`: + - Inputs passed from `exclude_pregnancy()`: + - `atc` + - Outputs: + - only_insulin_purchases = `TRUE` if no purchases with `atc` + starting with "A10A" are present +- `get_insulin_purchases_within_180_days()` + - Inputs passed from `exclude_pregnancy()`: + - `date` & `atc` + - Inputs passed from `get_diagnosis_date()`: + - `raw_inclusion_date` + - Outputs: `TRUE` If any purchases with `atc` starting with "A10A" + have a `date` between 0 and 180 days higher than + `raw_inclusion_date` +- `get_insulin_is_two_thirds_of_gld_doses()` + - Inputs passed from `exclude_pregnancy()`: + - `contained_doses` & `atc` + - Outputs: `TRUE` If the sum of `contained_doses` of rows of `atc` + starting with "A10A" (except "A10AE5") is at least twice the sum + of `contained_doses` of rows of `atc` starting with "A10B" or + "A10AE5" +- `get_any_t1d_primary_diagnoses()`: + - Inputs passed from `include_diabetes_diagnoses()`: + - `n_t1d_endocrinology` & `n_t1d_medical` + - Outputs: `TRUE` if the combined sum of the inputs is 1 or above. +- `get_type_diagnoses_from_endocrinology()`: + - Inputs passed from `include_diabetes_diagnoses()`: + - `n_t1d_endocrinology`, `n_t2d_endocrinology` + - Outputs: `type_diagnoses_from_endocrinology` = `TRUE` if the + combined sum of the inputs is 1 or above +- `get_type_diagnosis_majority()`: + - Inputs passed from `include_diabetes_diagnoses()`: + - `n_t1d_endocrinology`, `n_t2d_endocrinology`, + `n_t1d_medical` & `n_t2d_medical` + - Inputs passed from `get_type_diagnoses_from_endocrinology()`: + - `type_diagnoses_from_endocrinology` + - Outputs: `TRUE` if `type_diagnoses_from_endocrinology` == `TRUE` + and `n_t1d_endocrinology` is above `n_t2d_endocrinology`. 
Also + `TRUE` if `type_diagnoses_from_endocrinology` = `FALSE` and + `n_t1d_medical` is above `n_t2d_medical` + +`get_diabetes_type()` evaluates all the outputs from the helper +functions to define diabetes type for each individual. Diabetes type is +classified as "T1D" if: + +- `only_insulin_purchases` == `TRUE` & `any_t1d_primary_diagnoses` == + `TRUE` +- Or `only_insulin_purchases` == `FALSE` & `any_t1d_primary_diagnoses` + == `TRUE` & `type_diagnosis_majority` == `TRUE` & + `insulin_is_two_thirds_of_gld_doses` == `TRUE` & + `insulin_purchases_within_180_days` == `TRUE` + +`get_diabetes_type()` returns a `data.frame` with one row per `pnr` +number and four columns: `pnr`, `stable_inclusion_date`, +`raw_inclusion_date` & `diabetes_type`. This is the final product of the +OSDC algorithm. See the `vignette("design")` for an more detail on the +two inclusion dates and their intended use-cases. + + + + #### Type 1 classification @@ -525,3 +594,4 @@ is within a time-period of insufficient data coverage, contains the inclusion date of this individual. + From d294071ba24f2c3a92c2348c8942327cdfb9a73c Mon Sep 17 00:00:00 2001 From: "Luke W. 
Johnston" Date: Wed, 18 Dec 2024 17:50:20 +0100 Subject: [PATCH 24/28] docs: :pencil2: small edits from review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Signe Kirk Brødbæk <40836345+signekb@users.noreply.github.com> --- vignettes/function-flow.Rmd | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index c2d1780..4240f1d 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -99,7 +99,7 @@ library(osdc) #### Hospital diagnoses -**Joining LPR2 and LPR3 data** +#### Joining LPR2 and LPR3 data The helper functions `join_lpr2()` and `join_lpr3()` join records of diagnoses to administrative information in LPR2-formatted and @@ -114,7 +114,7 @@ following variables: - `pnr`: identifier variable - `date`: date of the recorded diagnosis (renamed from `d_inddto`) - `specialty`: department specialty (renamed from `c_spec`) -- `diagnosis`: diagnosis code (renamed from `c_diag`) +- `diagnosis_code`: diagnosis code (renamed from `c_diag`) - `diagnosis_type`: diagnosis type (renamed from `c_diagtype`) `join_lpr3()` takes `diagnoser` and `kontakter` as inputs, filters to @@ -126,7 +126,7 @@ following variables: - `pnr`: identifier variable (renamed from `cpr`) - `date`: date of the recorded diagnosis (renamed from `dato_start`) - `specialty`: department specialty (renamed from `hovedspeciale_ans`) -- `diagnosis`: diagnosis code (renamed from `diagnosekode`) +- `diagnosis_code`: diagnosis code (renamed from `diagnosekode`) - `diagnosis_type`: diagnosis type (renamed from `diagnosetype`) - `diagnosis_retracted`: if the diagnosis was later retracted (renamed from `senere_afkraeftet`) @@ -135,7 +135,7 @@ These outputs are passed to `include_diabetes_diagnoses()` (and to `get_pregnancy_dates()`, see exclusion events) for further processing below. 
-**Processing of diabetes diagnoses** +#### Processing of diabetes diagnoses The function `include_diabetes_diagnoses()` uses the hospital contacts from LPR2 and LPR3 to include all dates of diabetes diagnoses to use for @@ -207,7 +207,7 @@ individual): This output is passed to the `join_inclusions()` function, where the `dates` variable is used for the final step of the inclusion process. -The variables of counts of diabetes type-specific primary diagnoses are +The variables of counts of diabetes type-specific primary diagnoses (the four columns prefixed `n_` above) are carried over for the subsequent classification of diabetes type, initially as inputs to the `get_t1d_primary_diagnosis()` and `get_majority_of_t1d_diagnoses()` functions. @@ -232,7 +232,7 @@ the same date. two rows for each individual, containing the following variables: - `pnr`: identifier variable -- `dates`: the dates of the first and second diabetes-specific +- `date`: the dates of the first and second diabetes-specific podiatrist record The output is passed to the `join_inclusions()` function for the final @@ -284,7 +284,7 @@ data is available to use for censoring, the extraction window can be extended). This function outputs a long `data.frame` (since all dates of purchases -must be kept for later use in classifyin diabetes type) with the +must be kept for later use in classifying diabetes type) with the following variables needed later in the classification part of the function flow: @@ -318,7 +318,7 @@ This function only performs a filtering operation, and output retains the same structure and variables as the input passed from `include_gld_purchases()`. After these exclusions are made, the output is passed to `exclude_pregnancy()` for further censoring, described -below: +below. #### HbA1c tests and GLD purchases during pregnancy @@ -330,8 +330,8 @@ these may be due to gestational diabetes, rather than type 1 or type 2 diabetes. 
Internally, this relies on the function `get_pregnancy_dates()` that -uses diagnoses registered in the National Patient Register to extract -the dates of all recorded pregnancy endings (live births or +uses diagnoses registered in LPR2 and LPR3 to extract +the dates of all recorded pregnancy endings (live births and miscarriages). These are identified by `diagnosis` values beginning with "DO0[0-6]", "DO8[0-4]" or "DZ3[37]". The dates output by `get_pregnancy_dates()` are used to exclude all inclusion events From 7bd69cbd449b6c88a7b1786cf3c14f3a4b86651e Mon Sep 17 00:00:00 2001 From: "Luke W. Johnston" Date: Wed, 18 Dec 2024 22:41:47 +0100 Subject: [PATCH 25/28] docs: added lpr_diag algorithm logic to csv --- data-raw/algorithm.csv | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/data-raw/algorithm.csv b/data-raw/algorithm.csv index dd9060e..28425e9 100644 --- a/data-raw/algorithm.csv +++ b/data-raw/algorithm.csv @@ -1,4 +1,6 @@ -name,logic -hba1c,(analysiscode == 'NPU27300' AND value >= 48) OR (analysiscode == 'NPU03835' AND value >= 6.5) -gld,atc =~ '^A10' - +register,name,title,logic,comments +lab_forsker,hba1c,HbA1c inclusion,(analysiscode == 'NPU27300' AND value >= 48) OR (analysiscode == 'NPU03835' AND value >= 6.5),Is the IFCC units for NPU27300 and DCCT units for NPU03835 +lmdb,gld,Glucose-lowering drug inclusion,atc =~ '^A10' & !(atc =~ '^(A10BJ|A10BK01|A10BK03)'),Do not keep GLP-RAs or dapagliflozin/empagliflozin drugs +lpr_diag,lpr2,LPR2 diabetes diagnoses codes,c_diag =~ '^(DO0[0-6]|DO8[0-4]|DZ3[37]|DE1[0-4]|249|250)' AND (c_diagtype == 'A' OR c_diagtype == 'B'),'A' c_diagtype means primary diagnosis. +lpr_diag,lpr2_is_t1d,LPR2 diagnoses codes for T1D,c_diag =~ '^(DE10|249)', +lpr_diag,lpr2_is_t2d,LPR2 diagnoses codes for T2D,c_diag =~ '^(DE11|250)', From 73980ec6de645ab4e7dfb9638340b96986b2fe28 Mon Sep 17 00:00:00 2001 From: "Luke W. 
Johnston" Date: Wed, 18 Dec 2024 22:42:21 +0100 Subject: [PATCH 26/28] docs: :memo: updated roxygen docs based on text from Anders --- R/include-gld-purchases.R | 35 ++++++++++++++++++++++++++++------- R/include-hba1c.R | 25 +++++++++++++++++++------ 2 files changed, 47 insertions(+), 13 deletions(-) diff --git a/R/include-gld-purchases.R b/R/include-gld-purchases.R index e37b19d..9516884 100644 --- a/R/include-gld-purchases.R +++ b/R/include-gld-purchases.R @@ -1,9 +1,30 @@ #' Include only those who have a purchase of a glucose lowering drug (GLD). #' -#' See [algorithm] for the logic used to filter these patients. +#' But don't include glucose-lowering drugs that may be used for other +#' conditions than diabetes like GLP-RAs or dapagliflozin/empagliflozin drugs. +#' Since the diagnosis code data on pregnancies (see below) is insufficient to +#' perform censoring prior to 1997, `include_gld_purchases()` only extracts +#' dates from 1997 onward by default (if Medical Birth Register data is +#' available to use for censoring, the extraction window can be extended). +#' +#' @param lmdb The `lmdb` register. +#' +#' @return The same type as the input data, default as a [tibble::tibble()], in +#' a long format with all dates of purchases kept and the following variables: +#' +#' - `pnr`: Personal identification variable. +#' - `date`: The dates of all purchases of GLD. +#' - `atc`: The ATC code for the type of drug. +#' - `contained_doses`: The amount of doses purchased, in number of defined daily +#' doses (DDD). +#' - `indication_code`: The indication code of the prescription (renamed from +#' `indo`). +#' +#' These events are then passed to a chain of exclusion functions: +#' `exclude_potential_pcos()` and `exclude_pregnancy()`. #' -#' @return The same type as the input data, default as a [tibble::tibble()]. 
#' @keywords internal +#' @inherit algorithm seealso #' #' @examples #' \dontrun{ @@ -18,16 +39,16 @@ include_gld_purchases <- function(lmdb) { column_names_to_lower() |> # Use !! to inject the expression into filter. dplyr::filter(!!criteria) |> + # `volume` is the doses contained in the purchased package and `apk` is the + # number of packages purchased + dplyr::mutate(contained_doses = .data$volume * .data$apk) |> # Keep only the columns we need. dplyr::select( "pnr", # Change to date to work with later functions. date = "eksd", "atc", - "volume", - "apk", - "indo", - "name", - "vnr" + "contained_doses", + "indication_code" ) } diff --git a/R/include-hba1c.R b/R/include-hba1c.R index 802acc1..0ebfb64 100644 --- a/R/include-hba1c.R +++ b/R/include-hba1c.R @@ -1,24 +1,37 @@ #' Include only those with HbA1c in the required range. #' #' In the `lab_forsker` register, NPU27300 is HbA1c in the modern units (IFCC) -#' while NPU03835 is HbA1c in old units (DCCT). +#' while NPU03835 is HbA1c in old units (DCCT). Multiple elevated results on the +#' same day within each individual are deduplicated, to account for the same +#' test result often being reported twice (one for IFCC, one for DCCT units). #' -#' @param data The `lab_forsker` register. +#' The output is passed to the `exclude_pregnancy()` function for +#' filtering of elevated results due to potential gestational diabetes (see +#' below). +#' +#' @param lab_forsker The `lab_forsker` register. #' #' @return An object of the same input type, default as a [tibble::tibble()], -#' with two columns: `pnr` and `included_hba1c`. +#' with three columns: +#' +#' - `pnr`: Personal identification variable. +#' - `dates`: The dates of all elevated HbA1c test results. +#' - `included_hba1c`: A logical variable indicating that the HbA1c test +#' was included. Used as an indicator and reminder in other internal +#' functions. 
+#' @keywords internal #' #' @examples #' \dontrun{ #' register_data$lab_forsker |> include_hba1c() #' } -include_hba1c <- function(data) { - verify_required_variables(data, "lab_forsker") +include_hba1c <- function(lab_forsker) { + verify_required_variables(lab_forsker, "lab_forsker") criteria <- get_algorithm_logic("hba1c") |> # To convert the string into an R expression. rlang::parse_expr() - data |> + lab_forsker |> column_names_to_lower() |> # Use !! to inject the expression into filter. dplyr::filter(!!criteria) |> From 94f1f1f17f85c6be13c5b6b49deed5b62e93712e Mon Sep 17 00:00:00 2001 From: "Luke W. Johnston" Date: Wed, 18 Dec 2024 22:43:16 +0100 Subject: [PATCH 27/28] docs: :memo: add roxygen docs to algorithm data object --- R/osdc-package.R | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/R/osdc-package.R b/R/osdc-package.R index 6efccea..96a8e76 100644 --- a/R/osdc-package.R +++ b/R/osdc-package.R @@ -29,7 +29,12 @@ utils::globalVariables(".data") #' Is a [tibble::tibble()] with two columns: #' #' \describe{ +#' \item{register}{Optional. The register used for this criteria.} #' \item{name}{The inclusion or exclusion criteria name.} +#' \item{title}{The title to use when displaying the algorithmic logic in tables.} #' \item{logic}{The logic for the criteria.} +#' \item{comments}{Some additional comments on the criteria.} #' } +#' @seealso See the `vignette("algorithm")` and [algorithm] for the logic used +#' to filter these patients. "algorithm" From cdfcea765d0cf5f6c4a50d2788a820853321bc0d Mon Sep 17 00:00:00 2001 From: "Luke W. 
Johnston" Date: Wed, 18 Dec 2024 22:43:59 +0100 Subject: [PATCH 28/28] docs: :construction: began moving algorithm logic into separate file and created pseudocode --- vignettes/algorithm.Rmd | 88 +++++++ vignettes/function-flow.Rmd | 469 +++++++++++++++++++----------------- 2 files changed, 333 insertions(+), 224 deletions(-) create mode 100644 vignettes/algorithm.Rmd diff --git a/vignettes/algorithm.Rmd b/vignettes/algorithm.Rmd new file mode 100644 index 0000000..92bb2d4 --- /dev/null +++ b/vignettes/algorithm.Rmd @@ -0,0 +1,88 @@ +--- +title: "Algorithm" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Algorithm} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) +``` + +```{r setup} +library(osdc) +library(tidyverse) +``` + +## `lpr_diag` + +```{r, echo=FALSE} +algorithm |> + filter(str_detect(register, "lpr_diag")) |> + knitr::kable() +``` + +## `lpr_adm` + +- `c_spec` (hospital department) is categorized as "endocrinology" if it + equals 8 or as "other medical" if it is \< 8 or between 9 and + 30. + +## `diagnoser` + +- `diagnosekode` starts with "DO0[0-6]", "DO8[0-4]", "DZ3[37]" or + "DE1[0-4]". + - Is T1D if `diagnosekode` starts with "DE10". + - Is T2D if `diagnosekode` starts with "DE11". +- `diagnosetype` is equal to either "A" or "B". + - Is a primary diagnosis if it equals "A". +- `senere_afkraeftet` (if the diagnosis was later retracted) is equal + to "Nej". 
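The `diagnoser` criteria listed above could be expressed in R roughly like this. It is a minimal sketch under stated assumptions rather than the package's own code: the toy `diagnoser` data is made up, and the derived columns `is_t1d`, `is_t2d` and `is_primary` are named here only for illustration.

```r
library(dplyr)

# Toy LPR3-style diagnosis records with made-up values.
diagnoser <- tibble(
  diagnosekode = c("DE101", "DE119", "DO800", "DI109", "DE110", "DE105"),
  diagnosetype = c("A", "B", "A", "A", "H", "A"),
  senere_afkraeftet = c("Nej", "Nej", "Nej", "Nej", "Nej", "Ja")
)

kept_diagnoses <- diagnoser |>
  filter(
    # Pregnancy-related or diabetes diagnosis codes.
    grepl("^(DO0[0-6]|DO8[0-4]|DZ3[37]|DE1[0-4])", diagnosekode),
    # Primary (A) or secondary (B) diagnoses only.
    diagnosetype %in% c("A", "B"),
    # Drop diagnoses that were later retracted.
    senere_afkraeftet == "Nej"
  ) |>
  mutate(
    is_t1d = grepl("^DE10", diagnosekode),
    is_t2d = grepl("^DE11", diagnosekode),
    is_primary = diagnosetype == "A"
  )
```

Here the rows with codes "DE101", "DE119" and "DO800" are kept; the others are dropped because of an unrelated code, a diagnosis type other than "A" or "B", or a retracted diagnosis, respectively.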
+ +## `kontakter` + +- `hovedspeciale_ans` (hospital department) is categorized as + "endocrinology" if it equals "medicinsk endokrinologi" or as "other + medical" if it equals any of "Blandet medicin og kirurgi", "Intern + medicin", "Geriatri", "Hepatologi", "Hæmatologi", + "Infektionsmedicin", "Kardiologi", "Medicinsk allergologi", + "Medicinsk gastroenterologi", "Medicinsk lungesygdomme", + "Nefrologi", "Reumatologi", "Palliativ medicin", "Akut medicin", + "Dermato-venerologi", "Neurologi", "Onkologi", "Fysiurgi", or + "Tropemedicin". + +## `lab_forsker` + +```{r, echo=FALSE} +algorithm |> + filter(name == "hba1c") |> + knitr::kable() +``` + +## `sysi` and `sssy` + +- `speciale` starts with "54". + - Alternatively, use `spec2` if available. +- `barnmak` is equal to 0 (other values mark services provided to a + child of the individual, which are excluded). + +## `lmdb` + +```{r, echo=FALSE} +algorithm |> + filter(name == "gld") |> + knitr::kable() +``` + +## `bef` and only GLD (via `lmdb`) + +To remove those with potential polycystic ovary syndrome: + +- `atc` starts with "A10BA02" and `koen` is equal to 2 (woman) and + (`date` minus `foed_dato` (birth date) is less than 40 years or + `indication_code` equals one of "0000092", "0000276" or "0000781"). diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd index c9d895f..b22de3d 100644 --- a/vignettes/function-flow.Rmd +++ b/vignettes/function-flow.Rmd @@ -90,235 +90,257 @@ exclusion events, respectively). Uncoloured boxes are helper functions that get or extract a condition or joins data or function outputs.](images/function-flow-population.svg) -### Inclusion events +## Inclusion events ```{r, include=FALSE} library(dplyr) library(osdc) ``` -#### Hospital diagnoses - -#### Joining LPR2 and LPR3 data - -The helper functions `join_lpr2()` and `join_lpr3()` join records of -diagnoses to administrative information in LPR2-formatted and -LPR3-formatted data, respectively.
- -`join_lpr2()` takes `lpr_diag` and `lpr_adm` as inputs, filters to the -necessary diagnoses (`c_diag` starting with "DO0[0-6]", "DO8[0-4]", -"DZ3[37]", "DE1[0-4]", "249", or "250"), joins the required information -by record number (`recnum`), and outputs a `data.frame` with the -following variables: - -- `pnr`: identifier variable -- `date`: date of the recorded diagnosis (renamed from `d_inddto`) -- `specialty`: department specialty (renamed from `c_spec`) -- `diagnosis_code`: diagnosis code (renamed from `c_diag`) -- `diagnosis_type`: diagnosis type (renamed from `c_diagtype`) - -`join_lpr3()` takes `diagnoser` and `kontakter` as inputs, filters to -the necessary diagnoses (`diagnosekode` starting with "DO0[0-6]", -"DO8[0-4]", "DZ3[37]" or "DE1[0-4]"), joins the required information by -record number (`dw_ek_kontakt`), and outputs a `data.frame` with the -following variables: - -- `pnr`: identifier variable (renamed from `cpr`) -- `date`: date of the recorded diagnosis (renamed from `dato_start`) -- `specialty`: department specialty (renamed from `hovedspeciale_ans`) -- `diagnosis_code`: diagnosis code (renamed from `diagnosekode`) -- `diagnosis_type`: diagnosis type (renamed from `diagnosetype`) -- `diagnosis_retracted`: if the diagnosis was later retracted (renamed - from `senere_afkraeftet`) - -These outputs are passed to `include_diabetes_diagnoses()` (and to -`get_pregnancy_dates()`, see exclusion events) for further processing -below. - -#### Processing of diabetes diagnoses - -The function `include_diabetes_diagnoses()` uses the hospital contacts -from LPR2 and LPR3 to include all dates of diabetes diagnoses to use for -inclusion, as well as additional information needed to classify diabetes -type. Diabetes diagnoses from both ICD-8 and ICD-10 are included. 
- -The function takes the outputs of `join_lpr2()` and `join_lpr3()` as -inputs and processes each input separately to generate the following -internal variables: - -- From `join_lpr2`: - - `pnr`: identifier variable - - `date`: dates of all included diabetes diagnoses: - - registered as primary (A) or secondary (B) diagnoses, regardless - of type or department: - - Keep rows where `diagnosis` starts with "DE1[0-4]", "249" or - "250", and `diagnosis_type` is either "A" or "B" - - `is_primary`: Define whether the diagnosis was a primary - diagnosis (`diagnosis_type` == "A") - - `is_t1d`: Define whether the diagnosis was T1D-specific - (`diagnosis` starts with "DE10" or "249") - - `is_t2d`: Define whether the diagnosis was T2D-specific - (`diagnosis` starts with "DE11" or "250") - - `department`: Define whether the diagnosis was made made by an - endocrinological (if `specialty` == 8 then `department` == - "endocrinology") or other medical department (if `specialty` \< - 8 or 9-30 then `department` == "other medical") -- From `join_lpr3()`: - - `pnr`: identifier variable - - `date`: dates of all included diabetes diagnoses: - - registered as primary (A) or secondary (B) diagnoses, regardless - of type or department, but exclude retracted diagnoses: - - Keep rows where `diagnosis` starts with "DE1[0-4]", - `diagnosis_type` is either "A" or "B" and - `diagnosis_retracted` == "Nej" - - `is_primary`: Define whether the diagnosis was a primary - diagnosis (`diagnosis_type` == "A") - - `is_t1d`: Define whether the diagnosis was T1D-specific - (`diagnosis` starts with "DE10") - - `is_t2d`: Define whether the diagnosis was T2D-specific - (`diagnosis` starts with "DE11") - - `department`: Define whether the diagnosis was made made by an - endocrinological department (if `specialty` == "medicinsk - endokrinologi" then `department` == "endocrinology") or other - medical department (if `specialty` is any of "Blandet medicin og - kirurgi", "Intern medicin", "Geriatri", 
"Hepatologi", - "Hæmatologi", "Infektionsmedicin", "Kardiologi", "Medicinsk - allergologi", "Medicinsk gastroenterologi", "Medicinsk - lungesygdomme", "Nefrologi", "Reumatologi", "Palliativ medicin", - "Akut medicin", "Dermato-venerologi", "Neurologi", "Onkologi", - "Fysiurgi", or "Tropemedicin" then `department` == "other - medical") - -Internally, these intermediate results are combined and processed -together. And ultimately, `include_diabetes_diagnoses()` outputs a -single `data.frame` with the following variables (up to two rows per -individual): - -- `pnr`: identifier variable -- `dates`: dates of the first and second hospital diabetes diagnosis -- `n_t1d_endocrinology`: number of type 1 diabetes-specific primary - diagnosis codes from endocrinological departments -- `n_t2d_endocrinology`: number of type 2 diabetes-specific primary - diagnosis codes from endocrinological departments -- `n_t1d_medical`: number of type 1 diabetes-specific primary - diagnosis codes from medical departments -- `n_t2d_medical`: number of type 2 diabetes-specific primary - diagnosis codes from medical departments - -This output is passed to the `join_inclusions()` function, where the -`dates` variable is used for the final step of the inclusion process. -The variables of counts of diabetes type-specific primary diagnoses (the four columns prefixed `n_` above) are -carried over for the subsequent classification of diabetes type, -initially as inputs to the `get_t1d_primary_diagnosis()` and -`get_majority_of_t1d_diagnoses()` functions. - -#### Diabetes-specific podiatrist services - -The function `include_podiatrist_services()` uses `sysi` or `sssy` as -input to extract the dates of all diabetes-specific podiatrist services. 
- -These dates are extracted by filtering values beginning with "54" in the -`speciale` variable of the `sssy` and `sysi` registers by default -(alternatively, the function can take the `spec2` variable as input -instead, if that is the data available to the user). In addition, -services provided to a child of the individual (`barnmak` != 0) are -excluded using the `barnmak` variable. An internal helper function -`get_unique_honuge_dates()` is applied to generate a proper date -variable based on the year-week (wwyy-formatted) variable (`honuge`) -found in the raw data, and de-duplicates multiple services registered on -the same date. - -`include_podiatrist_services()` outputs a 2-column data frame with up to -two rows for each individual, containing the following variables: - -- `pnr`: identifier variable -- `date`: the dates of the first and second diabetes-specific - podiatrist record - -The output is passed to the `join_inclusions()` function for the final -step of the inclusion process. - -#### HbA1c tests above the diagnosis cut-off value (48 mmol/mol or 6.5%) - -The function `include_hba1c()` uses `lab_forsker` as the input data to -extract the dates of all elevated HbA1c test results, using the -appropriate cut-offs: - -- IFCC units: `analysiscode` NPU27300, any `value` $\geq$ 48 mmol/mol -- DCCT units: `analysiscode` NPU03835: any `value` $\geq$ 6.5% . - -```{r, echo=FALSE} -algorithm |> - filter(name == "hba1c") |> - knitr::kable(caption = "Algorithm used in the implementation for including HbA1c.") +### `join_lpr2()` + +```{r} +#' Process and join the two LPR2 registers to extract diabetes diagnoses data. +#' +#' The output is used as inputs to `include_diabetes_diagnoses()` (and to +#' `get_pregnancy_dates()`, see exclusion events). +#' +#' @param lpr_diag The LPR2 register containing diabetes diagnoses. +#' @param lpr_adm The LPR2 register containing hospital admissions. 
+#' +#' @return The same type as the input data, default as a [tibble::tibble()], +#' with the following columns: +#' +#' - `pnr`: The personal identification variable. +#' - `date`: The date of the recorded diagnosis (renamed from `d_inddto`). +#' - `is_primary_diagnosis`: Whether the diagnosis was a primary diagnosis. +#' - `is_t1d`: Whether the diagnosis was T1D-specific. +#' - `is_t2d`: Whether the diagnosis was T2D-specific. +#' - `department`: Whether the diagnosis was made by an +#' endocrinology or other medical department. +#' +#' @keywords internal +#' @inherit algorithm seealso +#' +#' @examples +#' join_lpr2( +#' lpr_diag = register_data$lpr_diag, +#' lpr_adm = register_data$lpr_adm +#' ) +join_lpr2 <- function(lpr_diag, lpr_adm) { + # Filter using the algorithm for LPR2 + lpr_diag |> + # join(lpr_adm, by = "recnum") |> + dplyr::select( + pnr, + date = d_inddto + # is_primary_diagnosis = + # is_t1d = + # is_t2d = + # department = + ) +} ``` -Multiple elevated results on the same day within each individual are -deduplicated, to account for the same test result often being reported -twice (one for IFCC, one for DCCT units). - -`include_hba1c()` outputs a 2-column data frame containing the following -variables: - -- `pnr`: identifier variable -- `dates`: the dates of all elevated HbA1c test results - -The output is passed to the `exclude_pregnancy()` function for censoring -of elevated results due to potential gestational diabetes (see below). - -#### GLD purchases - -The function `include_gld_purchases()` uses `lmdb` to extract the dates -of all GLD purchases. +### `join_lpr3()` + +```{r} +#' Process and join the two LPR3 registers to extract diabetes diagnoses data. +#' +#' The output is used as inputs to `include_diabetes_diagnoses()` (and to +#' `get_pregnancy_dates()`, see exclusion events). +#' +#' @param diagnoser The LPR3 register containing diabetes diagnoses. +#' @param kontakter The LPR3 register containing hospital contacts/admissions.
+#' +#' @return The same type as the input data, default as a [tibble::tibble()], +#' with the following columns: +#' +#' - `pnr`: The personal identification variable. +#' - `date`: The date of the recorded diagnosis (renamed from `dato_start`). +#' - `is_primary_diagnosis`: Whether the diagnosis was a primary +#' diagnosis. +#' - `is_t1d`: Whether the diagnosis was T1D-specific. +#' - `is_t2d`: Whether the diagnosis was T2D-specific. +#' - `department`: Whether the diagnosis was made by an +#' endocrinology or other medical department. +#' +#' @keywords internal +#' @inherit algorithm seealso +#' +#' @examples +#' join_lpr3( +#' diagnoser = register_data$diagnoser, +#' kontakter = register_data$kontakter +#' ) +join_lpr3 <- function(diagnoser, kontakter) { + # Filter using the algorithm for LPR3 + diagnoser |> + # join(kontakter, by = "dw_ek_kontakt") |> + dplyr::select( + "pnr" = "cpr", + "date" = "dato_start" + # is_primary_diagnosis = + # is_t1d = + # is_t2d = + # department = + ) +} +``` -These dates are extracted by including all values beginning with "A10" -in the `atc` variable of the `lmdb` register, except for -glucose-lowering drugs that may be used for other conditions than -diabetes: GLP-RAs (`atc` start with "A10BJ") or -dapagliflozin/empagliflozin (`atc` = "A10BK01" or "A10BK03"). +### `include_diabetes_diagnosis()` + +```{r} +#' Include diabetes diagnoses from LPR2 and LPR3. +#' +#' Uses the hospital contacts from LPR2 and LPR3 to include all dates of diabetes +#' diagnoses to use for inclusion, as well as additional information needed to classify diabetes +#' type. Diabetes diagnoses from both ICD-8 and ICD-10 are included. +#' +#' The output is passed to the `join_inclusions()` function, where the +#' `dates` variable is used for the final step of the inclusion +#' process.
+#' The counts of diabetes type-specific primary diagnoses (the +#' four columns prefixed `n_` below) are carried over for the subsequent +#' classification of diabetes type, initially as inputs to the +#' `get_t1d_primary_diagnosis()` and `get_majority_of_t1d_diagnoses()` +#' functions. +#' +#' @param lpr2 The output from `join_lpr2()`. +#' @param lpr3 The output from `join_lpr3()`. +#' +#' @return The same type as the input data, default as a [tibble::tibble()], +#' with the following columns and up to two rows per individual: +#' +#' - `pnr`: The personal identification variable. +#' - `dates`: The dates of the first and second hospital diabetes diagnosis. +#' - `n_t1d_endocrinology`: The number of type 1 diabetes-specific primary +#' diagnosis codes from endocrinology departments. +#' - `n_t2d_endocrinology`: The number of type 2 diabetes-specific primary +#' diagnosis codes from endocrinology departments. +#' - `n_t1d_medical`: The number of type 1 diabetes-specific primary +#' diagnosis codes from medical departments. +#' - `n_t2d_medical`: The number of type 2 diabetes-specific primary +#' diagnosis codes from medical departments.
+#' +#' @keywords internal +#' @inherit algorithm seealso +#' +#' @examples +#' include_diabetes_diagnosis( +#' lpr2 = join_lpr2(register_data$lpr_diag, register_data$lpr_adm), +#' lpr3 = join_lpr3(register_data$diagnoser, register_data$kontakter) +#' ) +include_diabetes_diagnosis <- function(lpr2, lpr3) { + # Combine and process the two inputs + lpr2 |> + dplyr::full_join(lpr3, by = "pnr") |> + dplyr::select( + "pnr", + "dates" = "date" + # n_t1d_endocrinology = + # n_t2d_endocrinology = + # n_t1d_medical = + # n_t2d_medical = + ) +} +``` -Since the diagnosis code data on pregnancies (see below) is insufficient -to perform censoring prior to 1997, `include_gld_purchases()` only -extracts dates from 1997 onward by default (if Medical Birth Register -data is available to use for censoring, the extraction window can be -extended). +### `include_podiatrist_services()` + +```{r} +#' Include diabetes-specific podiatrist services. +#' +#' Uses the `sysi` or `sssy` registers as input to extract the dates of all +#' diabetes-specific podiatrist services. Removes duplicate services on the +#' same date. +#' +#' The output is passed to the `join_inclusions()` function for the final +#' step of the inclusion process. +#' +#' @param sssy The `sssy` register. +#' @param sysi The `sysi` register. +#' +#' @return The same type as the input data, default as a [tibble::tibble()], +#' with two columns and up to two rows for each individual: +#' +#' - `pnr`: identifier variable +#' - `date`: the dates of the first and second diabetes-specific +#' podiatrist record +#' +#' @keywords internal +#' @inherit algorithm seealso +#' +#' @examples +#' include_podiatrist_services(register_data$sssy, register_data$sysi) +include_podiatrist_services <- function(sssy, sysi) { + # Filter using the algorithm for podiatrist services + sssy |> + dplyr::full_join(sysi, by = dplyr::join_by(pnr, barnmak, speciale, honuge)) |> + # Filtering...
+ dplyr::transmute( + pnr, + date = wwyy_to_yyyymmdd(honuge) + ) |> + # Remove duplicate services on the same date + dplyr::distinct() +} +``` -This function outputs a long `data.frame` (since all dates of purchases -must be kept for later use in classifying diabetes type) with the -following variables needed later in the classification part of the -function flow: +```{r} +#' Converts the "WWYY" date format to the ISO 8601 standard date format. +#' +#' Since the original date format ("WWYY") doesn't include a day, we assume it +#' would be the first day of that week. +#' +#' @param date The date variable in the format "WWYY". +#' +#' @returns A character vector of unique dates in the format "YYYY-MM-DD". +#' @keywords internal +#' @inherit algorithm seealso +#' +#' @examples +#' wwyy_to_yyyymmdd(c("0452", "5302", "3232")) +wwyy_to_yyyymmdd <- function(date) { + # Process the honuge variable to get a proper date variable + date +} +``` -- `pnr`: identifier variable -- `date`: dates of all purchases of GLD (renamed from `eksd`) -- `atc`: type of drug -- `contained_doses`: amount purchased, in number of defined daily - doses (DDD). Calculated as `volume` (doses contained in the - purchased package) times `apk` (number of packages purchased) -- `indication_code`: indication code of the prescription (renamed from - `indo`) - -These events are then passed to a chain of exclusion functions: -`exclude_potential_pcos()` and `exclude_pregnancy()` described in the -sections below.
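The GLD purchase extraction described in the removed prose above can be sketched in base R as follows (again, not part of the patch). The toy `lmdb` rows and the fixed 1997 cut-off date are assumptions for illustration; the output columns mirror the variable list above (`contained_doses` as `volume` times `apk`, `date` from `eksd`, `indication_code` from `indo`).

```r
# Hypothetical toy purchases; column names follow the `lmdb` variables above.
lmdb <- data.frame(
  pnr = c("1", "1", "2", "3", "3"),
  eksd = as.Date(c(
    "1996-05-01", "2001-03-15", "2010-07-01", "2015-02-10", "2016-09-30"
  )),
  atc = c("A10BA02", "A10BA02", "A10BJ02", "A10AB01", "C10AA01"),
  volume = c(100, 100, 30, 50, 28),
  apk = c(1, 2, 1, 3, 1),
  indo = c("0000092", "0000276", "0000000", "0000000", "0000000"),
  stringsAsFactors = FALSE
)

# Keep ATC codes starting with "A10", except the "A10BJ" group and
# dapagliflozin/empagliflozin ("A10BK01", "A10BK03").
is_gld <- startsWith(lmdb$atc, "A10") &
  !startsWith(lmdb$atc, "A10BJ") &
  !(lmdb$atc %in% c("A10BK01", "A10BK03"))

# Only keep purchases from 1997 onward (the default window described above).
from_1997 <- lmdb$eksd >= as.Date("1997-01-01")

gld <- lmdb[is_gld & from_1997, ]
gld_purchases <- data.frame(
  pnr = gld$pnr,
  date = gld$eksd,
  atc = gld$atc,
  contained_doses = gld$volume * gld$apk,
  indication_code = gld$indo,
  stringsAsFactors = FALSE
)
```

All purchase rows that pass the filters are kept (a long format), since later classification steps need every purchase date rather than only the first two.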
- -### Exclusion events - -#### Metformin purchases potentially for the treatment of polycystic ovary syndrome - -The function `exclude_potential_pcos()` takes the output from -`include_gld_purchases()` and `bef` (information on sex and date of -birth) as inputs and censors (filters out) all purchases of metformin in -women below age 40 at the date of purchase (`atc` = "A10BA02" & `sex` = -"woman" & age at purchase (`date`-`date_of_birth`) \< 40 years) or an -indication code suggesting the prescription was made for treatment of -polycystic ovary syndrome (`atc` = "A10BA02" & `sex` = "woman" & -`indication_code` either of "0000092", "0000276" or "0000781"). - -This function only performs a filtering operation, and output retains -the same structure and variables as the input passed from -`include_gld_purchases()`. After these exclusions are made, the output -is passed to `exclude_pregnancy()` for further censoring, described -below. +### `include_hba1c()` + +See `?include_hba1c` for more information. + +### `include_gld_purchases()` + +See `?include_gld_purchases` for more information. + +## Exclusion events + +### `exclude_potential_pcos()` + +```{r} +#' Exclude metformin purchases potentially for the treatment of polycystic ovary syndrome. +#' +#' Takes the output from `include_gld_purchases()` and `bef` (information on sex and date of birth) to do the exclusions. +#' This function only performs a filtering operation so outputs the same structure and variables as the input from `include_gld_purchases()`. +#' After these exclusions are made, the output is used by `exclude_pregnancy()`. +#' +#' @param gld_purchases The output from `include_gld_purchases()`. +#' @param bef The `bef` register. +#' +#' @return The same type as the input data, default as a [tibble::tibble()]. Also has the same columns as `include_gld_purchases()`. 
+#' @keywords internal +#' @inherit algorithm seealso +#' +#' @examples +#' exclude_potential_pcos( +#' gld_purchases = include_gld_purchases(register_data$lmdb), +#' bef = register_data$bef +#' ) +exclude_potential_pcos <- function(gld_purchases, bef) { + # Filter using the algorithm for potential PCOS + gld_purchases |> + dplyr::full_join(bef, by = dplyr::join_by(pnr)) +} +``` #### HbA1c tests and GLD purchases during pregnancy @@ -330,13 +352,12 @@ these may be due to gestational diabetes, rather than type 1 or type 2 diabetes. Internally, this relies on the function `get_pregnancy_dates()` that -uses diagnoses registered in LPR2 and LPR3 to extract -the dates of all recorded pregnancy endings (live births and -miscarriages). These are identified by `diagnosis` values beginning with -"DO0[0-6]", "DO8[0-4]" or "DZ3[37]". The dates output by -`get_pregnancy_dates()` are used to exclude all inclusion events -registered between 40 weeks before and 12 weeks after a pregnancy -ending. +uses diagnoses registered in LPR2 and LPR3 to extract the dates of all +recorded pregnancy endings (live births and miscarriages). These are +identified by `diagnosis` values beginning with "DO0[0-6]", "DO8[0-4]" +or "DZ3[37]". The dates output by `get_pregnancy_dates()` are used to +exclude all inclusion events registered between 40 weeks before and 12 +weeks after a pregnancy ending. After these exclusion functions have been applied, the output serves as inputs to two sets of functions: