From 90c74924923c11c733d1bcc0c793d1bc03ced2c5 Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Wed, 18 Sep 2024 13:16:46 +0200
Subject: [PATCH 01/28] Fleshed out and updated include_gld_purchases() flow
 documentation

---
 vignettes/function-flow.Rmd | 49 +++++++++++++++++++++++--------------
 1 file changed, 31 insertions(+), 18 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index bbc6139..73d3800 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -121,14 +121,37 @@ input to extract the dates of all diabetes-specific podiatrist services.
 
 <!-- TODO: Add details on how this filtering should be done -->
 
+AAI: By date
+
 #### GLD purchases
 
 The function `include_gld_purchases()` uses `lmdb` to extract the dates
-of all GLD purchases (from 1997 onwards).
-
-<!-- TODO: Add details on how this filtering should be done -->
-
-<!-- TODO: Add this + link to resource "For details about this, see [link]." -->
+of all GLD purchases.
+
+These dates are extracted by filtering values beginning with "A10" in
+the `atc` column of the `lmdb` register. In addition to the identifier
+variable (`pnr`) and date (`eksd`), additional information needed for
+censoring or for classification of diabetes type are also extracted: the
+type of drug (`atc`), the amount purchased (`volume` and `apk`), the
+indication code (`indo`), and its brand name or vnr-number (`name` or
+`vnr`). These events are then passed to a chain of exclusion functions:
+`exclude_wld_purchases()`, `exclude_potential_pcos()`,
+`exclude_pregnancy()` described in the sections below.
+
+After these exclusion functions have been applied, the output serves as
+inputs to two sets of functions:
+
+1.  the `get_diagnosis_date()` function for the final step of the
+    inclusion process.
+2.  the `get_only_insulin_purchases()`,
+    `get_insulin_purchases_within_180_days()`, and
+    `get_insulin_is_two_thirds_of_gld_doses()` helper functions for the
+    classification of diabetes type.
+
+Since the diagnosis code data on pregnancies is insufficient to perform
+censoring prior to 1997, `include_gld_purchases()` only extracts dates
+from 1997 onward by default (if Medical Birth Register data is available
+to use for censoring, the extraction window can be extended).
 
 ### Exclusion events
 
@@ -136,23 +159,12 @@ of all GLD purchases (from 1997 onwards).
 
 The function `exclude_pregnancy()` uses diagnoses from LPR2 or LPR3 as
 input and is used to exclude both HbA1c tests and GLD purchases during
-pregnancy.
+pregnancy, as these may be due to gestational diabetes, rather than type
+1 or type 2 diabetes.
 
 Internally, this relies on the function `get_pregnancy_dates()` that
 contains the following three helper functions:
 
--   `calculate_pregnancy_index_date_for_mc_visits_wo_end_date()` (this
-    might be removed with the inclusion of the birth register)
--   `get_pregnancy_end_dates()`: Keep maternal care visits with an end
-    date and drop visits between 40 weeks before end date and 12 weeks
-    after end date.
--   `get_maternal_care_visit_dates_without_end_date()`: Uses the output
-    from `get_pregnancy_end_dates()` which identifies maternal care
-    visits *with* end dates to derive maternal care visits *without* end
-    dates. below.
-
-<!-- TODO: What is done with the mc visits without end dates then? -->
-
 <!-- TODO: Add details on how this filtering should be done -->
 
 #### Glucose-lowering brand drugs for weight loss
@@ -284,3 +296,4 @@ is within a time-period of insufficient data coverage,
 contains the inclusion date of this individual.
 
 <!-- TODO: Specify the "stable" time-period: e.g., later than 1997 -->
+
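The GLD extraction described in this patch can be illustrated with a minimal base-R sketch. It assumes `lmdb` is a plain `data.frame` whose `eksd` column is already a `Date`. The function name `sketch_gld_purchases()` and its `from` argument are hypothetical, and this is not the osdc implementation of `include_gld_purchases()`.

```r
# Hypothetical sketch of the GLD extraction; not the osdc implementation.
# Assumes `lmdb` is a data.frame and `eksd` is already a Date.
sketch_gld_purchases <- function(lmdb, from = as.Date("1997-01-01")) {
  # Glucose-lowering drugs have ATC codes starting with "A10"; purchases
  # before `from` are dropped because pregnancy censoring is unreliable.
  is_gld <- startsWith(as.character(lmdb$atc), "A10") & lmdb$eksd >= from
  keep_cols <- c("pnr", "eksd", "atc", "volume", "apk", "indo", "name", "vnr")
  # which() drops rows where the condition is NA (e.g. a missing atc code)
  lmdb[which(is_gld), intersect(keep_cols, names(lmdb)), drop = FALSE]
}

# Toy input with made-up values, for illustration only
lmdb <- data.frame(
  pnr = c(1, 1, 2, 3),
  eksd = as.Date(c("1996-05-01", "2005-03-10", "2010-07-01", "2012-01-15")),
  atc = c("A10BA02", "A10AB05", "C09AA05", "A10BJ02"),
  volume = c(100, 3, 28, 2),
  apk = c(1, 2, 1, 1),
  indo = NA,
  name = c("a", "b", "c", "d"),
  vnr = c(1, 2, 3, 4)
)
gld <- sketch_gld_purchases(lmdb)
gld$pnr  # 1 3
```

The 1996 purchase is dropped by the default date window and the non-A10 purchase by the ATC filter. Extending the window (e.g. when birth register data is available) would only mean passing an earlier `from`.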

From 0504532e9464c931070306f4b15b1515bad7619d Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Wed, 18 Sep 2024 14:00:31 +0200
Subject: [PATCH 02/28] Added description of podiatrist services function flow

---
 vignettes/function-flow.Rmd | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index 73d3800..e471fd6 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -119,9 +119,19 @@ This function contains two helper functions:
 The function `include_podiatrist_services()` uses `sysi` or `sssy` as
 input to extract the dates of all diabetes-specific podiatrist services.
 
-<!-- TODO: Add details on how this filtering should be done -->
-
-AAI: By date
+These dates are extracted by filtering values beginning with "54" in the
+`spec` variable of the `sssy` and `sysi` registers by default
+(alternatively, the function can take the `spec2` variable as input
+instead, if that is the data available to the user). In addition,
+services provided to a child of the individual (`barnmak` != 0) are
+excluded using the `barnmak` variable. An internal helper function
+`get_unique_honuge_dates()` is applied to generate a date variable
+(`regdate`) based on the year-week (wwyy-formatted) variable (`honuge`)
+in the raw data, and de-duplicates multiple services registered on the
+same date. Ultimately, `include_podiatrist_services()` outputs only the
+identifier variable (`pnr`) and date of the service (`regdate`) to the
+`get_diagnosis_date()` function for the final step of the inclusion
+process.
 
 #### GLD purchases
 
@@ -129,7 +139,7 @@ The function `include_gld_purchases()` uses `lmdb` to extract the dates
 of all GLD purchases.
 
 These dates are extracted by filtering values beginning with "A10" in
-the `atc` column of the `lmdb` register. In addition to the identifier
+the `atc` variable of the `lmdb` register. In addition to the identifier
 variable (`pnr`) and date (`eksd`), additional information needed for
 censoring or for classification of diabetes type are also extracted: the
 type of drug (`atc`), the amount purchased (`volume` and `apk`), the
@@ -296,4 +306,3 @@ is within a time-period of insufficient data coverage,
 contains the inclusion date of this individual.
 
 <!-- TODO: Specify the "stable" time-period: e.g., later than 1997 -->
-
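A minimal base-R sketch of the podiatrist-services filtering described in this patch. It assumes that `honuge` is a four-character "wwyy" string, that two-digit years below 50 belong to the 2000s, and that a week can be approximated by whole weeks counted from 1 January. None of these conventions are confirmed by the text. `honuge_to_date()` and `sketch_podiatrist_services()` are hypothetical stand-ins, not osdc's `get_unique_honuge_dates()` or `include_podiatrist_services()`. The specialty column is passed as `spec_col` because the text allows either the default specialty variable or `spec2`. It defaults to `speciale`, the name used after the later rename.

```r
# Hypothetical sketch; not osdc's include_podiatrist_services().
# Assumption: `honuge` is "wwyy" (2-digit week, then 2-digit year).
honuge_to_date <- function(honuge) {
  if (length(honuge) == 0) return(as.Date(character(0)))
  honuge <- sprintf("%04d", as.integer(honuge))
  week <- as.integer(substr(honuge, 1, 2))
  yy <- as.integer(substr(honuge, 3, 4))
  # Assumption: 2-digit years below 50 belong to the 2000s
  year <- ifelse(yy < 50, 2000L + yy, 1900L + yy)
  # Approximation: whole weeks counted from 1 January (not ISO 8601 weeks)
  as.Date(paste0(year, "-01-01")) + 7L * (week - 1L)
}

sketch_podiatrist_services <- function(sysi, spec_col = "speciale") {
  spec <- as.character(sysi[[spec_col]])
  # Diabetes-specific podiatry (specialty code starting with "54"),
  # excluding services provided to the individual's child (barnmak != 0)
  keep <- which(startsWith(spec, "54") & sysi$barnmak == 0)
  out <- data.frame(
    pnr = sysi$pnr[keep],
    date = honuge_to_date(sysi$honuge[keep])
  )
  # Several services registered on the same date count once
  unique(out)
}

# Toy input with made-up values
sysi <- data.frame(
  pnr = c(1, 1, 1, 2),
  speciale = c("54001", "54002", "80001", "54001"),
  barnmak = c(0, 0, 0, 1),
  honuge = c("0105", "0105", "0205", "0310")
)
pod <- sketch_podiatrist_services(sysi)
```

Only one row survives in the example. The two week-1 services for `pnr` 1 collapse to one date, the non-podiatry row is filtered out, and the child-related service for `pnr` 2 is excluded.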

From 7777f367bc53d521282777e0bec41e4e8c6a0817 Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Wed, 18 Sep 2024 15:32:26 +0200
Subject: [PATCH 03/28] Reformatted some GLD text, added HbA1c and started on
 pregnancy dates

---
 vignettes/function-flow.Rmd | 62 ++++++++++++++++++++++++-------------
 1 file changed, 41 insertions(+), 21 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index e471fd6..98fb6d1 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -95,9 +95,16 @@ outputs.](images/function-flow-population.png)
 #### HbA1c tests above 48 mmol/mol
 
 The function `include_hba1c()` uses `lab_forsker` as the input data to
-extract all events of tests above 48 mmol/mol.
+extract the dates of all elevated HbA1c test results: $\geq$ 48 mmol/mol
+(or $\geq$ 6.5% in DCCT units). To support DCCT units, the function
+converts the value of these to IFCC units internally before including
+all rows with `value` $\geq$ 48 and deduplicating multiple elevated
+results on the same day within each individual.
 
-<!-- TODO: Add details on how this filtering should be done -->
+`include_hba1c()` passes a 3-column data frame containing the identifier
+variable (`pnr`) and the dates of all elevated HbA1c test results. This
+is passed to the `exclude_pregnancy()` function for censoring of
+elevated results due to potential gestational diabetes (see below).
 
 #### Hospital diagnosis of diabetes
 
@@ -125,13 +132,16 @@ These dates are extracted by filtering values beginning with "54" in the
 instead, if that is the data available to the user). In addition,
 services provided to a child of the individual (`barnmak` != 0) are
 excluded using the `barnmak` variable. An internal helper function
-`get_unique_honuge_dates()` is applied to generate a date variable
-(`regdate`) based on the year-week (wwyy-formatted) variable (`honuge`)
-in the raw data, and de-duplicates multiple services registered on the
-same date. Ultimately, `include_podiatrist_services()` outputs only the
-identifier variable (`pnr`) and date of the service (`regdate`) to the
-`get_diagnosis_date()` function for the final step of the inclusion
-process.
+`get_unique_honuge_dates()` is applied to generate a proper date
+variable based on the year-week (wwyy-formatted) variable (`honuge`)
+found in the raw data, and de-duplicates multiple services registered on
+the same date.
+
+`include_podiatrist_services()` outputs a 3-column data frame containing
+the identifier variable (`pnr`) and the date of the two earliest records
+of diabetes-specific podiatrist services for each individual. This is
+passed to the `get_diagnosis_date()` function for the final step of the
+inclusion process.
 
 #### GLD purchases
 
@@ -139,12 +149,23 @@ The function `include_gld_purchases()` uses `lmdb` to extract the dates
 of all GLD purchases.
 
 These dates are extracted by filtering values beginning with "A10" in
-the `atc` variable of the `lmdb` register. In addition to the identifier
-variable (`pnr`) and date (`eksd`), additional information needed for
-censoring or for classification of diabetes type are also extracted: the
-type of drug (`atc`), the amount purchased (`volume` and `apk`), the
-indication code (`indo`), and its brand name or vnr-number (`name` or
-`vnr`). These events are then passed to a chain of exclusion functions:
+the `atc` variable of the `lmdb` register. Since the diagnosis code data
+on pregnancies (see below) is insufficient to perform censoring prior to
+1997, `include_gld_purchases()` only extracts dates from 1997 onward by
+default (if Medical Birth Register data is available to use for
+censoring, the extraction window can be extended).
+
+This function outputs a `data.frame` with the following variables needed
+later in the classification part of the function flow:
+
+-   identifier variable (`pnr`)
+-   date (`eksd`)
+-   type of drug (`atc`)
+-   amount purchased (`volume` and `apk`)
+-   indication code (`indo`)
+-   brand name or vnr-number (`name` or `vnr`)
+
+These events are then passed to a chain of exclusion functions:
 `exclude_wld_purchases()`, `exclude_potential_pcos()`,
 `exclude_pregnancy()` described in the sections below.
 
@@ -158,11 +179,6 @@ inputs to two sets of functions:
     `get_insulin_is_two_thirds_of_gld_doses()` helper functions for the
     classification of diabetes type.
 
-Since the diagnosis code data on pregnancies is insufficient to perform
-censoring prior to 1997, `include_gld_purchases()` only extracts dates
-from 1997 onward by default (if Medical Birth Register data is available
-to use for censoring, the extraction window can be extended).
-
 ### Exclusion events
 
 #### HbA1c tests and GLD purchases during pregnancy
@@ -173,7 +189,11 @@ pregnancy, as these may be due to gestational diabetes, rather than type
 1 or type 2 diabetes.
 
 Internally, this relies on the function `get_pregnancy_dates()` that
-contains the following three helper functions:
+uses diagnoses registered in the National Patient Register to extract
+the dates of all pregnancy endings (live births or miscarriages). These
+are identified by filtering values beginning with "DO0[0-6]", "DO8[0-4]"
+or "DZ3[37]" in the `c_diag` variable in the LPR2 data (`diagnosekode`
+in LPR3 data).
 
 <!-- TODO: Add details on how this filtering should be done -->
 
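The pregnancy-ending code filter introduced in this patch can be written as a single anchored regular expression. This is a sketch assuming diagnosis codes are stored as character strings with the "D" prefix. The same pattern would then apply to LPR2 `c_diag` and LPR3 `diagnosekode`. The names `pregnancy_end_pattern` and `is_pregnancy_end()` are hypothetical, not part of osdc.

```r
# Hypothetical sketch of the code filter used by get_pregnancy_dates();
# not the osdc implementation. Codes are assumed to carry the "D" prefix.
pregnancy_end_pattern <- "^(DO0[0-6]|DO8[0-4]|DZ3[37])"

is_pregnancy_end <- function(diag_code) {
  # grepl() returns FALSE (not NA) for missing codes
  grepl(pregnancy_end_pattern, diag_code)
}

codes <- c("DO060", "DO800", "DZ370", "DZ330", "DO070", "DE110", "DZ340")
is_pregnancy_end(codes)  # TRUE TRUE TRUE TRUE FALSE FALSE FALSE
```

The `^` anchor matters because the text describes codes *beginning with* these prefixes. Without the anchor, a code would also match if the prefix appeared anywhere inside it.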

From 6382624f9702d54711d51744d5502d6a2cd0d8ce Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Thu, 19 Sep 2024 10:40:37 +0200
Subject: [PATCH 04/28] Reworded include_hba1c section

---
 vignettes/function-flow.Rmd | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index 98fb6d1..500b4c8 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -96,10 +96,10 @@ outputs.](images/function-flow-population.png)
 
 The function `include_hba1c()` uses `lab_forsker` as the input data to
 extract the dates of all elevated HbA1c test results: $\geq$ 48 mmol/mol
-(or $\geq$ 6.5% in DCCT units). To support DCCT units, the function
-converts the value of these to IFCC units internally before including
-all rows with `value` $\geq$ 48 and deduplicating multiple elevated
-results on the same day within each individual.
+(IFCC units, `analysiscode` NPU27300) or $\geq$ 6.5% (DCCT units,
+`analysiscode` NPU03835). Multiple elevated results on the same day
+within each individual are deduplicated, to account for the same test
+result being reported as two rows (one for IFCC, one for DCCT units).
 
 `include_hba1c()` passes a 3-column data frame containing the identifier
 variable (`pnr`) and the dates of all elevated HbA1c test results. This
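The reworded HbA1c rule compares each result against the threshold for its own unit, selected by `analysiscode`. It then keeps one row per individual per day. Below is a minimal base-R sketch of that rule. The date column name `samplingdate` and the function name `sketch_hba1c()` are assumptions for illustration, not confirmed osdc names.

```r
# Hypothetical sketch of include_hba1c(); not the osdc implementation.
# Assumes lab_forsker has pnr, samplingdate (Date), analysiscode and value.
sketch_hba1c <- function(lab_forsker) {
  value <- as.numeric(lab_forsker$value)
  code <- lab_forsker$analysiscode
  # Elevated: >= 48 mmol/mol (IFCC, NPU27300) or >= 6.5 % (DCCT, NPU03835)
  elevated <- (code == "NPU27300" & value >= 48) |
    (code == "NPU03835" & value >= 6.5)
  keep <- which(elevated)
  out <- data.frame(
    pnr = lab_forsker$pnr[keep],
    do_pos_hba1c = lab_forsker$samplingdate[keep]
  )
  # The same test reported in both units collapses to one row per day
  unique(out)
}

# Toy input with made-up values
lab_forsker <- data.frame(
  pnr = c(1, 1, 2, 2, 3),
  samplingdate = as.Date(c("2020-01-01", "2020-01-01", "2021-06-01",
                           "2021-06-01", "2022-02-02")),
  analysiscode = c("NPU27300", "NPU03835", "NPU27300", "NPU03835", "NPU27300"),
  value = c(50, 6.7, 47, 6.4, 48)
)
hba1c <- sketch_hba1c(lab_forsker)
```

Checking each unit against its own threshold avoids converting between IFCC and DCCT values. A converted 6.5 % would land slightly below 48 mmol/mol.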

From 4fd5903e201ef11650bfb735ab47008eaaefea2b Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Thu, 19 Sep 2024 12:46:40 +0200
Subject: [PATCH 05/28] Added lpr-joins, started on describing lpr processing

---
 vignettes/function-flow.Rmd | 99 ++++++++++++++++++++++++++++++-------
 1 file changed, 82 insertions(+), 17 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index 500b4c8..e685680 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -101,25 +101,83 @@ extract the dates of all elevated HbA1c test results: $\geq$ 48 mmol/mol
 within each individual are deduplicated, to account for the same test
 result being reported as two rows (one for IFCC, one for DCCT units).
 
-`include_hba1c()` passes a 3-column data frame containing the identifier
-variable (`pnr`) and the dates of all elevated HbA1c test results. This
-is passed to the `exclude_pregnancy()` function for censoring of
-elevated results due to potential gestational diabetes (see below).
+`include_hba1c()` outputs a 2-column data frame containing the following
+variables:
+
+-   identifier variable (`pnr`)
+-   the dates of all elevated HbA1c test results (`do_pos_hba1c`).
+
+The output is passed to the `exclude_pregnancy()` function for censoring
+of elevated results due to potential gestational diabetes (see below).
 
 #### Hospital diagnosis of diabetes
 
-The function `include_diabetes_diagnoses()` uses the hospital contacts
-from LPR2 and 3 to include all dates of diabetes diagnoses. Diabetes
-diagnoses from both ICD 8 and ICD 10 are included.
+**Joining LPR2 and LPR3 data**
 
-This function contains two helper functions:
+The helper functions `join_lpr2()` and `join_lpr3()` join records of
+diagnoses to administrative information in LPR2-formatted and
+LPR3-formatted data, respectively.
+
+`join_lpr2()` takes `lpr_diag` and `lpr_adm` as inputs, joins the
+required information by record number (`recnum`), and outputs a
+`data.frame` with the following variables:
+
+-   identifier variable (`pnr`)
+-   date (`d_inddto`)
+-   department specialty (`c_spec`)
+-   diagnosis code (`c_diag`)
+-   diagnosis type (`c_diagtype`)
+
+`join_lpr3()` takes `diagnoser` and `kontakter` as inputs, joins the
+required information by record number (`dw_ek_kontakt`), and outputs a
+`data.frame` with the following variables:
 
--   `keep_diabetes_icd10()`
--   `keep_diabetes_icd8()`
+-   identifier variable (`cpr`)
+-   date (`dato_start`)
+-   department specialty (`hovedspeciale_ans`)
+-   diagnosis code (`diagnosekode`)
+-   diagnosis type (`diagnosetype`)
+-   diagnosis retracted (`senere_afkraeftet`)
 
-<!-- TODO: Add details on how this filtering should be done, e.g., diagnosis codes -->
+These outputs are passed to `include_diabetes_diagnoses()` for further
+processing, see below.
 
-<!-- TODO: Which specific ICD 8 and 10 codes are included? -->
+**Processing of diagnoses**
+
+The function `include_diabetes_diagnoses()` uses the hospital contacts
+from LPR2 and LPR3 to include all dates of diabetes diagnoses to use for
+inclusion, as well as additional information needed to classify diabetes
+type. Diabetes diagnoses from both ICD-8 and ICD-10 are included.
+
+The function takes the outputs of `join_lpr2()` and `join_lpr3()` as
+inputs and processes each input separately:
+
+-   LPR2-data:
+    -   Include all diabetes diagnoses, registered as primary (A) or
+        secondary (B) diagnoses, regardless of type or department:
+        `c_diag` starts with "DE1[0-4]", "249", or "250" and
+        `c_diagtype` either "A" or "B"
+    -   Define whether the diagnosis was made made by an
+        endocrinological (`c_spec` = 8) or other medical department
+        (`c_spec` \< 8 or 9-30)
+-   LPR3:
+    -   remove retracted diagnoses (LPR3)
+
+Internally, these intermediate results are joined, so
+`include_diabetes_diagnoses()` outputs a single `data.frame` with the
+following variables (one row for each individual):
+
+-   identifier variable (`pnr`)
+-   date of the first diabetes diagnosis (`do_diagnosis_1`)
+-   date of the second diabetes diagnosis (`do_diagnosis_2`)
+-   number of type 1 diabetes-specific diagnosis codes from
+    endocrinological departments (`n_t1d_endo`)
+-   number of type 2 diabetes-specific diagnosis codes from
+    endocrinological departments (`n_t2d_endo`)
+-   number of type 1 diabetes-specific diagnosis codes from medical
+    departments (`n_t1d_medical`)
+-   number of type 2 diabetes-specific diagnosis codes from medical
+    departments (`n_t2d_medical`)
 
 #### Diabetes-specific podiatrist services
 
@@ -137,11 +195,17 @@ variable based on the year-week (wwyy-formatted) variable (`honuge`)
 found in the raw data, and de-duplicates multiple services registered on
 the same date.
 
-`include_podiatrist_services()` outputs a 3-column data frame containing
-the identifier variable (`pnr`) and the date of the two earliest records
-of diabetes-specific podiatrist services for each individual. This is
-passed to the `get_diagnosis_date()` function for the final step of the
-inclusion process.
+`include_podiatrist_services()` outputs a 3-column data frame with one
+row for each individual, containing the following variables:
+
+-   identifier variable (`pnr`)
+-   the date of the first diabetes-specific podiatrist record
+    (`do_podiatrist_1`)
+-   the date of the second diabetes-specific podiatrist record
+    (`do_podiatrist_2`)
+
+The output is passed to the `get_diagnosis_date()` function for the
+final step of the inclusion process.
 
 #### GLD purchases
 
@@ -326,3 +390,4 @@ is within a time-period of insufficient data coverage,
 contains the inclusion date of this individual.
 
 <!-- TODO: Specify the "stable" time-period: e.g., later than 1997 -->
+
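The LPR2 join and diabetes filter described in this patch can be sketched in base R. It assumes `lpr_adm` and `lpr_diag` are plain `data.frame`s sharing the `recnum` key. `sketch_join_lpr2()` and `keep_lpr2_diabetes()` are hypothetical names; osdc's `join_lpr2()` and `include_diabetes_diagnoses()` may be structured differently.

```r
# Hypothetical sketch of join_lpr2() and the LPR2 diabetes filter; not osdc.
sketch_join_lpr2 <- function(lpr_adm, lpr_diag) {
  # Inner join: one row per diagnosis, with the admission's pnr/date/specialty
  merge(
    lpr_adm[, c("recnum", "pnr", "d_inddto", "c_spec")],
    lpr_diag[, c("recnum", "c_diag", "c_diagtype")],
    by = "recnum"
  )
}

keep_lpr2_diabetes <- function(lpr2) {
  # ICD-10 DE10-DE14 or ICD-8 249/250, as primary (A) or secondary (B)
  is_diabetes <- grepl("^(DE1[0-4]|249|250)", lpr2$c_diag)
  is_a_or_b <- lpr2$c_diagtype %in% c("A", "B")
  lpr2[is_diabetes & is_a_or_b, , drop = FALSE]
}

# Toy input with made-up values
lpr_adm <- data.frame(
  recnum = c("r1", "r2", "r3"),
  pnr = c(1, 2, 3),
  d_inddto = as.Date(c("1995-03-01", "2001-08-15", "2010-11-20")),
  c_spec = c(8, 20, 8)
)
lpr_diag <- data.frame(
  recnum = c("r1", "r1", "r2", "r3"),
  c_diag = c("DE110", "DI10", "25000", "DE109"),
  c_diagtype = c("A", "B", "A", "H")
)
joined <- sketch_join_lpr2(lpr_adm, lpr_diag)
diabetes <- keep_lpr2_diabetes(joined)
```

Two rows are kept in the example. The hypertension code is not a diabetes code, and the `r3` diabetes code is dropped because its diagnosis type is neither "A" nor "B".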

From 29fea86708df5597fd24cd54e32cd68cd7e87943 Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Thu, 19 Sep 2024 13:39:01 +0200
Subject: [PATCH 06/28] Finished LPR/diagnosis part of function flow

---
 vignettes/function-flow.Rmd | 46 +++++++++++++++++++++++++++++--------
 1 file changed, 36 insertions(+), 10 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index e685680..b819067 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -150,20 +150,42 @@ inclusion, as well as additional information needed to classify diabetes
 type. Diabetes diagnoses from both ICD-8 and ICD-10 are included.
 
 The function takes the outputs of `join_lpr2()` and `join_lpr3()` as
-inputs and processes each input separately:
+inputs and processes each input separately to generate the following
+internal variables:
 
 -   LPR2-data:
-    -   Include all diabetes diagnoses, registered as primary (A) or
-        secondary (B) diagnoses, regardless of type or department:
-        `c_diag` starts with "DE1[0-4]", "249", or "250" and
-        `c_diagtype` either "A" or "B"
-    -   Define whether the diagnosis was made made by an
-        endocrinological (`c_spec` = 8) or other medical department
+    -   `pnr`: identifier variable
+    -   `do_diagnosis`: include all diabetes diagnoses, registered as
+        primary (A) or secondary (B) diagnoses, regardless of type or
+        department: `c_diag` starts with "DE1[0-4]", "249", or "250" and
+        `c_diagtype` is either "A" or "B"
+    -   `is_primary`: Define whether the diagnosis was a primary
+        diagnosis (`c_diagtype` == "A")
+    -   `is_t1d`: Define whether the diagnosis was T1D-specific
+        (`c_diag` starts with "DE10" or "249")
+    -   `is_t2d`: Define whether the diagnosis was T2D-specific
+        (`c_diag` starts with "DE11" or "250")
+    -   `department`: Define whether the diagnosis was made by an
+        endocrinological (`c_spec` == 8) or other medical department
         (`c_spec` \< 8 or 9-30)
 -   LPR3:
-    -   remove retracted diagnoses (LPR3)
-
-Internally, these intermediate results are joined, so
+    -   `pnr`: identifier variable
+    -   `do_diagnosis`: include all diabetes diagnoses, registered as
+        primary (A) or secondary (B) diagnoses, regardless of type or
+        department: `diagnosekode` starts with "DE1[0-4]" and
+        `diagnosetype` is either "A" or "B", but exclude retracted
+        diagnoses (`senere_afkraeftet` == "Ja")
+    -   `is_primary`: Define whether the diagnosis was a primary
+        diagnosis (`diagnosetype` == "A")
+    -   `is_t1d`: Define whether the diagnosis was T1D-specific
+        (`c_diag` starts with "DE10")
+    -   `is_t2d`: Define whether the diagnosis was T2D-specific
+        (`c_diag` starts with "DE11")
+    -   `department`: Define whether the diagnosis was made made by an
+        endocrinological (`c_spec` == 8) or other medical department
+        (`c_spec` \< 30 & != 8)
+
+These intermediate results are combined for further processing, and
 `include_diabetes_diagnoses()` outputs a single `data.frame` with the
 following variables (one row for each individual):
 
@@ -179,6 +201,10 @@ following variables (one row for each individual):
 -   number of type 2 diabetes-specific diagnosis codes from medical
     departments (`n_t2d_medical`)
 
+The output is passed to the `get_diagnosis_date()` function for the
+final step of the inclusion process and is subsequently used to classify
+diabetes type.
+
 #### Diabetes-specific podiatrist services
 
 The function `include_podiatrist_services()` uses `sysi` or `sssy` as
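The LPR2 internal variables listed in this patch map onto simple vectorised flags. The sketch below assumes its input is the filtered LPR2 diabetes rows. The function name `add_lpr2_flags()` and the department labels "endocrinological", "medical" and "other" are hypothetical, not osdc's.

```r
# Hypothetical sketch of the LPR2 flags in include_diabetes_diagnoses().
add_lpr2_flags <- function(lpr2) {
  lpr2$is_primary <- lpr2$c_diagtype == "A"
  lpr2$is_t1d <- grepl("^(DE10|249)", lpr2$c_diag)
  lpr2$is_t2d <- grepl("^(DE11|250)", lpr2$c_diag)
  spec <- as.integer(lpr2$c_spec)
  # Endocrinology is c_spec 8; other medical departments are < 8 or 9-30
  lpr2$department <- ifelse(
    spec == 8, "endocrinological",
    ifelse(spec < 8 | (spec >= 9 & spec <= 30), "medical", "other")
  )
  lpr2
}

# Toy input with made-up values
lpr2 <- data.frame(
  c_diag = c("DE101", "25009", "DE110", "24900"),
  c_diagtype = c("A", "B", "A", "A"),
  c_spec = c(8, 20, 50, 3)
)
flagged <- add_lpr2_flags(lpr2)
```

A missing `c_spec` yields an `NA` department rather than a silent "other". This keeps unexpected data visible.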

From f9d7661da21cdb99772946d5dc06e85e95025636 Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Thu, 19 Sep 2024 13:53:09 +0200
Subject: [PATCH 07/28] Fixed a few things to describe LPR3 processing

---
 vignettes/function-flow.Rmd | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index b819067..52f79da 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -178,12 +178,19 @@ internal variables:
     -   `is_primary`: Define whether the diagnosis was a primary
         diagnosis (`diagnosetype` == "A")
     -   `is_t1d`: Define whether the diagnosis was T1D-specific
-        (`c_diag` starts with "DE10")
+        (`diagnosekode` starts with "DE10")
     -   `is_t2d`: Define whether the diagnosis was T2D-specific
-        (`c_diag` starts with "DE11")
+        (`diagnosekode` starts with "DE11")
     -   `department`: Define whether the diagnosis was made made by an
-        endocrinological (`c_spec` == 8) or other medical department
-        (`c_spec` \< 30 & != 8)
+        endocrinological (`hovedspeciale_ans` == "medicinsk
+        endokrinologi") or other medical department (`hovedspeciale_ans`
+        either "Blandet medicin og kirurgi", "Intern medicin",
+        "Geriatri", "Hepatologi", "Hæmatologi", "Infektionsmedicin",
+        "Kardiologi", "Medicinsk allergologi", "Medicinsk
+        gastroenterologi", "Medicinsk lungesygdomme", "Nefrologi",
+        "Reumatologi", "Palliativ medicin", "Akut medicin",
+        "Dermato-venerologi", "Neurologi", "Onkologi", "Fysiurgi", or
+        "Tropemedicin")
 
 These intermediate results are combined for further processing, and
 `include_diabetes_diagnoses()` outputs a single `data.frame` with the
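For LPR3, the department is derived from the specialty name rather than a numeric code. A lookup vector keeps the long specialty list in one place. The sketch below copies the specialty names from the text. The function name `lpr3_department()` and its labels are hypothetical. Case-insensitive matching is also an assumption, because the text writes "medicinsk endokrinologi" in lower case and the other specialties capitalised.

```r
# Hypothetical sketch of the LPR3 department rule; not the osdc code.
# Specialty names are copied from the vignette text.
medical_specialties <- c(
  "Blandet medicin og kirurgi", "Intern medicin", "Geriatri", "Hepatologi",
  "Hæmatologi", "Infektionsmedicin", "Kardiologi", "Medicinsk allergologi",
  "Medicinsk gastroenterologi", "Medicinsk lungesygdomme", "Nefrologi",
  "Reumatologi", "Palliativ medicin", "Akut medicin", "Dermato-venerologi",
  "Neurologi", "Onkologi", "Fysiurgi", "Tropemedicin"
)

lpr3_department <- function(hovedspeciale_ans) {
  # Assumption: specialty names are compared case-insensitively
  spec <- tolower(hovedspeciale_ans)
  ifelse(
    spec == "medicinsk endokrinologi", "endocrinological",
    ifelse(spec %in% tolower(medical_specialties), "medical", "other")
  )
}

lpr3_department(c("Medicinsk endokrinologi", "Kardiologi", "Urologi"))
```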

From 9a05d81815ab60889220564e4416f917b7226c15 Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Thu, 19 Sep 2024 14:01:27 +0200
Subject: [PATCH 08/28] specified that only primary diagnoses go into type
 classification

---
 vignettes/function-flow.Rmd | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index 52f79da..44f7c81 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -199,14 +199,14 @@ following variables (one row for each individual):
 -   identifier variable (`pnr`)
 -   date of the first diabetes diagnosis (`do_diagnosis_1`)
 -   date of the second diabetes diagnosis (`do_diagnosis_2`)
--   number of type 1 diabetes-specific diagnosis codes from
+-   number of type 1 diabetes-specific primary diagnosis codes from
     endocrinological departments (`n_t1d_endo`)
--   number of type 2 diabetes-specific diagnosis codes from
+-   number of type 2 diabetes-specific primary diagnosis codes from
     endocrinological departments (`n_t2d_endo`)
--   number of type 1 diabetes-specific diagnosis codes from medical
-    departments (`n_t1d_medical`)
--   number of type 2 diabetes-specific diagnosis codes from medical
-    departments (`n_t2d_medical`)
+-   number of type 1 diabetes-specific primary diagnosis codes from
+    medical departments (`n_t1d_medical`)
+-   number of type 2 diabetes-specific primary diagnosis codes from
+    medical departments (`n_t2d_medical`)
 
 The output is passed to the `get_diagnosis_date()` function for the
 final step of the inclusion process and is subsequently used to classify
@@ -423,4 +423,3 @@ is within a time-period of insufficient data coverage,
 contains the inclusion date of this individual.
 
 <!-- TODO: Specify the "stable" time-period: e.g., later than 1997 -->
-
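Restricting the counts to primary diagnoses, as specified in this patch, means secondary diagnoses are dropped before counting. The sketch below assumes the input already holds flagged diagnoses with `pnr`, `is_primary`, `is_t1d`, `is_t2d` and a `department` column labelled "endocrinological" or "medical". The function name `count_primary_type_codes()` and those labels are hypothetical.

```r
# Hypothetical sketch of the per-individual counts; not the osdc code.
count_primary_type_codes <- function(dx) {
  # Only primary diagnoses contribute to the type-specific counts
  primary <- dx[which(dx$is_primary), , drop = FALSE]
  pnrs <- sort(unique(dx$pnr))
  count_by <- function(type_flag, dept) {
    hits <- primary$pnr[which(primary[[type_flag]] & primary$department == dept)]
    # factor() with fixed levels gives a zero count for individuals without hits
    as.integer(table(factor(hits, levels = pnrs)))
  }
  data.frame(
    pnr = pnrs,
    n_t1d_endo = count_by("is_t1d", "endocrinological"),
    n_t2d_endo = count_by("is_t2d", "endocrinological"),
    n_t1d_medical = count_by("is_t1d", "medical"),
    n_t2d_medical = count_by("is_t2d", "medical")
  )
}

# Toy input with made-up values; row 3 is a secondary diagnosis
dx <- data.frame(
  pnr = c(1, 1, 1, 2, 2),
  is_primary = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  is_t1d = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  is_t2d = c(FALSE, FALSE, FALSE, TRUE, TRUE),
  department = c("endocrinological", "endocrinological", "endocrinological",
                 "medical", "endocrinological")
)
counts <- count_primary_type_codes(dx)
```

Individual 1 gets `n_t1d_endo` = 2 rather than 3 because the secondary T1D diagnosis is ignored.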

From f03a4dac4ba55213856e93c4d665966b072bdb10 Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <67263135+Aastedet@users.noreply.github.com>
Date: Thu, 19 Sep 2024 14:44:51 +0200
Subject: [PATCH 09/28] Update vignettes/function-flow.Rmd
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Signe Kirk Brødbæk <40836345+signekb@users.noreply.github.com>
---
 vignettes/function-flow.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index 44f7c81..5722926 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -245,7 +245,7 @@ final step of the inclusion process.
 The function `include_gld_purchases()` uses `lmdb` to extract the dates
 of all GLD purchases.
 
-These dates are extracted by filtering values beginning with "A10" in
+These dates are extracted by including all values beginning with "A10" in
 the `atc` variable of the `lmdb` register. Since the diagnosis code data
 on pregnancies (see below) is insufficient to perform censoring prior to
 1997, `include_gld_purchases()` only extracts dates from 1997 onward by

From bc889d4e829098ae84f1e4ffbc57f9d560f997c9 Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Thu, 19 Sep 2024 15:17:56 +0200
Subject: [PATCH 10/28] switched the order of inclusion sections and mentioned
 that some of the inputs go to exclusion functions

---
 vignettes/function-flow.Rmd | 45 +++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index 44f7c81..7cf35b9 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -92,25 +92,7 @@ outputs.](images/function-flow-population.png)
 
 ### Inclusion events
 
-#### HbA1c tests above 48 mmol/mol
-
-The function `include_hba1c()` uses `lab_forsker` as the input data to
-extract the dates of all elevated HbA1c test results: $\geq$ 48 mmol/mol
-(IFCC units, `analysiscode` NPU27300) or $\geq$ 6.5% (DCCT units,
-`analysiscode` NPU03835). Multiple elevated results on the same day
-within each individual are deduplicated, to account for the same test
-result being reported as two rows (one for IFCC, one for DCCT units).
-
-`include_hba1c()` outputs a 2-column data frame containing the following
-variables:
-
--   identifier variable (`pnr`)
--   the dates of all elevated HbA1c test results (`do_pos_hba1c`).
-
-The output is passed to the `exclude_pregnancy()` function for censoring
-of elevated results due to potential gestational diabetes (see below).
-
-#### Hospital diagnosis of diabetes
+#### Hospital diagnoses
 
 **Joining LPR2 and LPR3 data**
 
@@ -139,10 +121,11 @@ required information by record number (`dw_ek_kontakt`), and outputs a
 -   diagnosis type (`diagnosetype`)
 -   diagnosis retracted (`senere_afkraeftet`)
 
-These outputs are passed to `include_diabetes_diagnoses()` for further
-processing, see below.
+These outputs are passed to `include_diabetes_diagnoses()` (and to
+`get_pregnancy_dates()`, see exclusion events) for further processing
+below.
 
-**Processing of diagnoses**
+**Processing of diabetes diagnoses**
 
 The function `include_diabetes_diagnoses()` uses the hospital contacts
 from LPR2 and LPR3 to include all dates of diabetes diagnoses to use for
@@ -240,6 +223,24 @@ row for each individual, containing the following variables:
 The output is passed to the `get_diagnosis_date()` function for the
 final step of the inclusion process.
 
+#### HbA1c tests above 48 mmol/mol
+
+The function `include_hba1c()` uses `lab_forsker` as the input data to
+extract the dates of all elevated HbA1c test results: $\geq$ 48 mmol/mol
+(IFCC units, `analysiscode` NPU27300) or $\geq$ 6.5% (DCCT units,
+`analysiscode` NPU03835). Multiple elevated results on the same day
+within each individual are deduplicated, to account for the same test
+result being reported as two rows (one for IFCC, one for DCCT units).
+
+`include_hba1c()` outputs a 2-column data frame containing the following
+variables:
+
+-   identifier variable (`pnr`)
+-   the dates of all elevated HbA1c test results (`do_pos_hba1c`).
+
+The output is passed to the `exclude_pregnancy()` function for censoring
+of elevated results due to potential gestational diabetes (see below).
+
 #### GLD purchases
 
 The function `include_gld_purchases()` uses `lmdb` to extract the dates

From 7525b606cb6e39bcf959be35eb15ff7dbef1a336 Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Fri, 20 Sep 2024 13:09:55 +0200
Subject: [PATCH 11/28] fixed spec to speciale variable name

---
 vignettes/function-flow.Rmd | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index ccaabbe..d107b7e 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -201,7 +201,7 @@ The function `include_podiatrist_services()` uses `sysi` or `sssy` as
 input to extract the dates of all diabetes-specific podiatrist services.
 
 These dates are extracted by filtering values beginning with "54" in the
-`spec` variable of the `sssy` and `sysi` registers by default
+`speciale` variable of the `sssy` and `sysi` registers by default
 (alternatively, the function can take the `spec2` variable as input
 instead, if that is the data available to the user). In addition,
 services provided to a child of the individual (`barnmak` != 0) are
@@ -246,12 +246,12 @@ of elevated results due to potential gestational diabetes (see below).
 The function `include_gld_purchases()` uses `lmdb` to extract the dates
 of all GLD purchases.
 
-These dates are extracted by including all values beginning with "A10" in
-the `atc` variable of the `lmdb` register. Since the diagnosis code data
-on pregnancies (see below) is insufficient to perform censoring prior to
-1997, `include_gld_purchases()` only extracts dates from 1997 onward by
-default (if Medical Birth Register data is available to use for
-censoring, the extraction window can be extended).
+These dates are extracted by including all values beginning with "A10"
+in the `atc` variable of the `lmdb` register. Since the diagnosis code
+data on pregnancies (see below) is insufficient to perform censoring
+prior to 1997, `include_gld_purchases()` only extracts dates from 1997
+onward by default (if Medical Birth Register data is available to use
+for censoring, the extraction window can be extended).
 
 This function outputs a `data.frame` with the following variables needed
 later in the classification part of the function flow:

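The extraction rule above can be sketched in base R: ATC codes starting with "A10", and purchases from 1997 onward by default. This is an illustrative sketch, not the package's interface. The function name, the `from` argument, and 1 January 1997 as the default start date are all assumptions.

```r
# Hypothetical toy `lmdb` extract with the columns named in the text.
lmdb <- data.frame(
  pnr = c("1", "1", "2", "3"),
  eksd = as.Date(c("1996-06-01", "1998-02-10", "2005-03-15", "2010-01-01")),
  atc = c("A10BA02", "A10AB01", "A10BA02", "C10AA01"),
  volume = c(100, 300, 100, 30),
  apk = c(1, 2, 1, 1),
  indo = c("0000092", "", "0000000", "")
)

# Keep glucose-lowering drugs (ATC prefix "A10") purchased on or after `from`.
# `from` is an assumed argument: an earlier date could be used when Medical
# Birth Register data are available for pregnancy censoring.
include_gld_purchases_sketch <- function(lmdb, from = as.Date("1997-01-01")) {
  keep <- grepl("^A10", lmdb$atc) & lmdb$eksd >= from
  gld <- lmdb[which(keep), c("pnr", "eksd", "atc", "volume", "apk", "indo")]
  rownames(gld) <- NULL
  gld
}

include_gld_purchases_sketch(lmdb)
include_gld_purchases_sketch(lmdb, from = as.Date("1995-01-01"))
```

With the default start date, both the 1996 purchase and the non-GLD (C10) purchase are dropped. Moving `from` back to 1995 also keeps the 1996 purchase.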
From 092824e540cc2e13ad8ed81ebd44e7dfb14d0c13 Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Fri, 20 Sep 2024 13:26:09 +0200
Subject: [PATCH 12/28] Removed "name" or "vnr" variables from GLD function
 flow. Autoformatting made a few changes after resolving previous merge
 conflict.

---
 vignettes/function-flow.Rmd | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index 134e577..11bd3df 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -91,10 +91,12 @@ that get or extract a condition or joins data or function
 outputs.](images/function-flow-population.png)
 
 ### Inclusion events
+
 ```{r, include=FALSE}
 library(dplyr)
 library(osdc)
 ```
+
 #### Hospital diagnoses
 
 **Joining LPR2 and LPR3 data**
@@ -180,11 +182,11 @@ internal variables:
 
 These intermediate results are combined for further processing, and
 `include_diabetes_diagnoses()` outputs a single `data.frame` with the
-following variables (one row for each individual):
+following variables (up to two rows per individual):
 
 -   identifier variable (`pnr`)
--   date of the first diabetes diagnosis (`do_diagnosis_1`)
--   date of the second diabetes diagnosis (`do_diagnosis_2`)
+-   dates of the first and second hospital diabetes diagnosis
+    (`diagnosis_dates`)
 -   number of type 1 diabetes-specific primary diagnosis codes from
     endocrinological departments (`n_t1d_endo`)
 -   number of type 2 diabetes-specific primary diagnosis codes from
@@ -229,10 +231,11 @@ final step of the inclusion process.
 #### HbA1c tests above the diagnosis cut-off value (48 mmol/mol or 6.5%)
 
 The function `include_hba1c()` uses `lab_forsker` as the input data to
-extract the dates of all elevated HbA1c test results, using the appropriate cut-offs:
+extract the dates of all elevated HbA1c test results, using the
+appropriate cut-offs:
 
-- IFCC units: `analysiscode` NPU27300, any `value` $\geq$ 48 mmol/mol
-- DCCT units: `analysiscode` NPU03835: any `value` $\geq$ 6.5% .
+-   IFCC units: `analysiscode` NPU27300, any `value` $\geq$ 48 mmol/mol
+-   DCCT units: `analysiscode` NPU03835, any `value` $\geq$ 6.5%
 
 ```{r, echo=FALSE}
 algorithm |> 
@@ -240,9 +243,9 @@ algorithm |>
 	knitr::kable(caption = "Algorithm used in the implementation for including HbA1c.")
 ```
 
-Multiple elevated results on the same day
-within each individual are deduplicated, to account for the same test
-result often being reported twice (one for IFCC, one for DCCT units).
+Multiple elevated results on the same day within each individual are
+deduplicated, to account for the same test result often being reported
+twice (one for IFCC, one for DCCT units).
 
 `include_hba1c()` outputs a 2-column data frame containing the following
 variables:
@@ -273,7 +276,6 @@ later in the classification part of the function flow:
 -   type of drug (`atc`)
 -   amount purchased (`volume` and `apk`)
 -   indication code (`indo`)
--   brand name or vnr-number (`name` or `vnr`)
 
 These events are then passed to a chain of exclusion functions:
 `exclude_wld_purchases()`, `exclude_potential_pcos()`,
@@ -436,3 +438,4 @@ is within a time-period of insufficient data coverage,
 contains the inclusion date of this individual.
 
 <!-- TODO: Specify the "stable" time-period: e.g., later than 1997 -->
+

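With this patch, `include_diabetes_diagnoses()` returns up to two rows per individual instead of one wide row. One way to express "keep the first and second date per individual" in base R is sketched below. The helper name `keep_first_two_dates()` and the toy data are hypothetical.

```r
# Hypothetical long-format diagnosis dates: one row per individual and date.
diagnoses <- data.frame(
  pnr = c("1", "1", "1", "1", "2"),
  date = as.Date(c("2012-03-01", "2010-05-20", "2015-01-01", "2010-05-20",
                   "2018-09-09"))
)

# Drop duplicate dates, sort within individual, and keep the two earliest.
keep_first_two_dates <- function(data) {
  data <- unique(data[, c("pnr", "date")])
  data <- data[order(data$pnr, data$date), ]
  # Running index (1, 2, 3, ...) of each date within an individual.
  data$index <- ave(seq_along(data$pnr), data$pnr, FUN = seq_along)
  out <- data[data$index <= 2, c("pnr", "date")]
  rownames(out) <- NULL
  out
}

keep_first_two_dates(diagnoses)
```

Individual 1 keeps the 2010 and 2012 dates: the repeated 2010 date is removed first, and the 2015 date is the third. Individual 2 keeps a single row.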
From 20f58862389081b8f5e45291eb3a1f26955a3e6f Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Fri, 20 Sep 2024 13:37:39 +0200
Subject: [PATCH 13/28] Updated join_lpr2() and join_lpr3() descriptions to
 filter to necessary diagnoses.

---
 vignettes/function-flow.Rmd | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index 11bd3df..127f6db 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -105,9 +105,10 @@ The helper functions `join_lpr2()` and `join_lpr3()` join records of
 diagnoses to administrative information in LPR2-formatted and
 LPR3-formatted data, respectively.
 
-`join_lpr2()` takes `lpr_diag` and `lpr_adm` as inputs, joins the
-required information by record number (`recnum`), and outputs a
-`data.frame` with the following variables:
+`join_lpr2()` takes `lpr_diag` and `lpr_adm` as inputs, filters to the
+necessary diagnoses (`c_diag` starting with "DO", "DZ3", "DE1[0-4]",
+"249", or "250"), joins the required information by record number
+(`recnum`), and outputs a `data.frame` with the following variables:
 
 -   identifier variable (`pnr`)
 -   date (`d_inddto`)
@@ -115,9 +116,11 @@ required information by record number (`recnum`), and outputs a
 -   diagnosis code (`c_diag`)
 -   diagnosis type (`c_diagtype`)
 
-`join_lpr3()` takes `diagnoser` and `kontakter` as inputs, joins the
-required information by record number (`dw_ek_kontakt`), and outputs a
-`data.frame` with the following variables:
+`join_lpr3()` takes `diagnoser` and `kontakter` as inputs, filters to
+the necessary diagnoses (`diagnosekode` starting with "DO", "DZ3", or
+"DE1[0-4]"), joins the required information by record number
+(`dw_ek_kontakt`), and outputs a `data.frame` with the following
+variables:
 
 -   identifier variable (`cpr`)
 -   date (`dato_start`)
@@ -438,4 +441,3 @@ is within a time-period of insufficient data coverage,
 contains the inclusion date of this individual.
 
 <!-- TODO: Specify the "stable" time-period: e.g., later than 1997 -->
-

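The filter-then-join step this patch describes for `join_lpr2()` can be sketched in base R. The function name, the toy tables, and the use of one combined regular expression are illustrative assumptions. The code prefixes are the ones listed in this patch.

```r
# Hypothetical toy LPR2 tables with only the columns named in the text.
lpr_adm <- data.frame(
  recnum = c(1, 2, 3),
  pnr = c("1", "1", "2"),
  d_inddto = as.Date(c("2001-01-10", "2003-04-02", "2005-06-30")),
  c_spec = c(8, 11, 8)
)
lpr_diag <- data.frame(
  recnum = c(1, 2, 2, 3),
  c_diag = c("DE109", "DI10", "DE119", "DO240"),
  c_diagtype = c("A", "A", "B", "A")
)

join_lpr2_sketch <- function(lpr_diag, lpr_adm) {
  # Filter first, so the join only carries the diagnoses needed later.
  pattern <- "^(DO|DZ3|DE1[0-4]|249|250)"
  diag <- lpr_diag[grepl(pattern, lpr_diag$c_diag), ]
  # Inner join on the record number shared by both tables.
  joined <- merge(diag, lpr_adm, by = "recnum")
  out <- joined[, c("pnr", "d_inddto", "c_spec", "c_diag", "c_diagtype")]
  rownames(out) <- NULL
  out
}

join_lpr2_sketch(lpr_diag, lpr_adm)
```

The `DI10` row is removed before the join, so only the three diabetes or pregnancy-related diagnoses reach the output.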
From 61b5d27f69ff0ed68a871909fce0801e5e4bfaf1 Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Fri, 20 Sep 2024 13:42:29 +0200
Subject: [PATCH 14/28] Removed section on weight-loss drugs, since we're no
 longer including drugs with a dual use for weight loss.

---
 vignettes/function-flow.Rmd | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index 127f6db..5f2fdcf 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -312,13 +312,6 @@ in LPR3 data).
 
 <!-- TODO: Add details on how this filtering should be done -->
 
-#### Glucose-lowering brand drugs for weight loss
-
-The function `exclude_wld_purchases()` uses lmdb as input and excludes
-the brand drugs Saxenda and Wegovy.
-
-<!-- TODO: Add details on how this filtering should be done -->
-
 #### Metformin purchases for women below age 40
 
 The function `exclude_potential_pcos()` as input to exclude all

From 7b9738d7443a85f5e7efce80b5af46e37e740ed9 Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <67263135+Aastedet@users.noreply.github.com>
Date: Fri, 20 Sep 2024 13:58:14 +0200
Subject: [PATCH 15/28] Update vignettes/function-flow.Rmd

Co-authored-by: Luke W. Johnston <lwjohnst86@users.noreply.github.com>
---
 vignettes/function-flow.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index 5f2fdcf..f50c4c8 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -111,7 +111,7 @@ necessary diagnoses (`c_diag` starting with "DO", "DZ3", "DE1[0-4]",
 (`recnum`), and outputs a `data.frame` with the following variables:
 
 -   identifier variable (`pnr`)
--   date (`d_inddto`)
+-   date (originally `d_inddto`, renamed to `date`)
 -   department specialty (`c_spec`)
 -   diagnosis code (`c_diag`)
 -   diagnosis type (`c_diagtype`)

From 3a95d4f2d61db75de1da62a19e10113916641d87 Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Fri, 20 Sep 2024 13:58:49 +0200
Subject: [PATCH 16/28] Added description of exclude_potential_pcos()

---
 vignettes/function-flow.Rmd | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index 5f2fdcf..550561c 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -296,6 +296,20 @@ inputs to two sets of functions:
 
 ### Exclusion events
 
+#### Metformin purchases potentially for the treatment of polycystic ovary syndrome
+
+The function `exclude_potential_pcos()` takes the output from
+`include_gld_purchases()` and `bef` (information on sex and date of
+birth) as inputs and censors (filters out) all purchases of metformin in
+women below age 40 at the date of purchase (`atc` = "A10BA02" & `sex` =
+"woman" & date at purchase (`date`-`date_of_birth`) \< 40 years) or an
+indication code suggesting treatment of polycystic ovary syndrome (`atc`
+= "A10BA02" & `sex` = "woman" & `indication_code` either "0000092",
+"0000276", "0000781").
+
+After these exclusions are made, the output is passed to
+`exclude_pregnancy()` for further censoring, described below:
+
 #### HbA1c tests and GLD purchases during pregnancy
 
 The function `exclude_pregnancy()` uses diagnoses from LPR2 or LPR3 as
@@ -312,19 +326,6 @@ in LPR3 data).
 
 <!-- TODO: Add details on how this filtering should be done -->
 
-#### Metformin purchases for women below age 40
-
-The function `exclude_potential_pcos()` as input to exclude all
-purchases of metformin by women below age 40 (i.e., \<= 39 years old) at
-the date of purchase. It relies on `bef` as input.
-
-This function contains two helper functions:
-
--   `keep_women()`
--   `drop_age_40_below()`
-
-<!-- TODO: Add details on how this filtering should be done -->
-
 ### Get diagnosis date
 
 The function `get_diagnosis_date()` combines the outputs from the

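The censoring rule for `exclude_potential_pcos()` can be sketched in base R. The helper `age_in_years()`, the function name, the inner join on `pnr`, and the toy data are assumptions. The ATC code, the age limit, the `sex` value "woman", and the indication codes are taken from the text above.

```r
# Hypothetical toy inputs: GLD purchases and `bef` (sex, date of birth).
gld <- data.frame(
  pnr = c("1", "1", "2", "3", "4"),
  date = as.Date(c("2010-01-01", "2012-06-01", "2011-03-03", "2015-05-05",
                   "2016-01-01")),
  atc = c("A10BA02", "A10BA02", "A10BA02", "A10AB01", "A10BA02"),
  indication_code = c("", "", "0000092", "", "")
)
bef <- data.frame(
  pnr = c("1", "2", "3", "4"),
  sex = c("woman", "woman", "woman", "man"),
  date_of_birth = as.Date(c("1971-01-01", "1960-01-01", "1990-01-01",
                            "1986-01-01"))
)

# Completed years of age at `date` (birthday counted on the day itself).
age_in_years <- function(dob, date) {
  b <- as.POSIXlt(dob)
  d <- as.POSIXlt(date)
  had_birthday <- d$mon > b$mon | (d$mon == b$mon & d$mday >= b$mday)
  (d$year - b$year) - !had_birthday
}

exclude_potential_pcos_sketch <- function(gld, bef) {
  # Inner join: purchases without a `bef` record are dropped here.
  data <- merge(gld, bef, by = "pnr")
  is_metformin_woman <- data$atc == "A10BA02" & data$sex == "woman"
  is_below_40 <- age_in_years(data$date_of_birth, data$date) < 40
  has_pcos_code <- data$indication_code %in% c("0000092", "0000276", "0000781")
  drop <- is_metformin_woman & (is_below_40 | has_pcos_code)
  # `%in% TRUE` treats NA (e.g., unknown sex) as "do not drop" (assumption).
  out <- data[!(drop %in% TRUE), names(gld)]
  rownames(out) <- NULL
  out
}

exclude_potential_pcos_sketch(gld, bef)
```

In the toy data, the two censored purchases are individual 1's metformin purchase at age 39 and individual 2's purchase with a PCOS indication code. Insulin purchases and purchases by men are kept.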
From fe257a64d79068d906262d0b1d9ac9ae1fa49c6e Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Fri, 20 Sep 2024 14:17:27 +0200
Subject: [PATCH 17/28] Renamed some variables.

---
 vignettes/function-flow.Rmd | 42 ++++++++++++++++++-------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index e6bd2c4..0eb7731 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -122,8 +122,8 @@ the necessary diagnoses (`diagnosekode` starting with "DO", "DZ3", or
 (`dw_ek_kontakt`), and outputs a `data.frame` with the following
 variables:
 
--   identifier variable (`cpr`)
--   date (`dato_start`)
+-   identifier variable (originally `cpr`, renamed to `pnr`)
+-   date (originally `dato_start`, renamed to `date`)
 -   department specialty (`hovedspeciale_ans`)
 -   diagnosis code (`diagnosekode`)
 -   diagnosis type (`diagnosetype`)
@@ -146,10 +146,10 @@ internal variables:
 
 -   LPR2-data:
     -   `pnr`: identifier variable
-    -   `do_diagnosis`: include all diabetes diagnoses, registered as
-        primary (A) or secondary (B) diagnoses, regardless of type or
-        department: `c_diag` starts with "DE1[0-4]", "249", or "250" and
-        `c_diagtype` is either "A" or "B"
+    -   `dates`: dates of all included diabetes diagnoses:
+    -   registered as primary (A) or secondary (B) diagnoses, regardless
+        of type or department: - `c_diag` starts with "DE1[0-4]", "249",
+        or "250" and `c_diagtype` is either "A" or "B"
     -   `is_primary`: Define whether the diagnosis was a primary
         diagnosis (`c_diagtype` == "A")
     -   `is_t1d`: Define whether the diagnosis was T1D-specific
@@ -161,11 +161,11 @@ internal variables:
         (`c_spec` \< 8 or 9-30)
 -   LPR3:
     -   `pnr`: identifier variable
-    -   `do_diagnosis`: include all diabetes diagnoses, registered as
-        primary (A) or secondary (B) diagnoses, regardless of type or
-        department: `diagnosekode` starts with "DE1[0-4]" and
-        `diagnosetype` is either "A" or "B", but exclude retracted
-        diagnoses (`senere_afkraeftet` == "Ja")
+    -   `dates`: dates of all included diabetes diagnoses:
+    -   Registered as primary (A) or secondary (B) diagnoses, regardless
+        of type or department, but exclude retracted diagnoses: -
+        `diagnosekode` starts with "DE1[0-4]", `diagnosetype` is either
+        "A" or "B" and `senere_afkraeftet` == "Nej")
     -   `is_primary`: Define whether the diagnosis was a primary
         diagnosis (`diagnosetype` == "A")
     -   `is_t1d`: Define whether the diagnosis was T1D-specific
@@ -189,7 +189,7 @@ following variables (up to two rows per individual):
 
 -   identifier variable (`pnr`)
 -   dates of the first and second hospital diabetes diagnosis
-    (`diagnosis_dates`)
+    (`diagnosis_date`)
 -   number of type 1 diabetes-specific primary diagnosis codes from
     endocrinological departments (`n_t1d_endo`)
 -   number of type 2 diabetes-specific primary diagnosis codes from
@@ -219,14 +219,12 @@ variable based on the year-week (wwyy-formatted) variable (`honuge`)
 found in the raw data, and de-duplicates multiple services registered on
 the same date.
 
-`include_podiatrist_services()` outputs a 3-column data frame with one
-row for each individual, containing the following variables:
+`include_podiatrist_services()` outputs a 2-column data frame with up to
+two rows for each individual, containing the following variables:
 
 -   identifier variable (`pnr`)
--   the date of the first diabetes-specific podiatrist record
-    (`do_podiatrist_1`)
--   the date of the second diabetes-specific podiatrist record
-    (`do_podiatrist_2`)
+-   the dates of the first and second diabetes-specific podiatrist
+    record (`dates`)
 
 The output is passed to the `get_diagnosis_date()` function for the
 final step of the inclusion process.
@@ -275,10 +273,11 @@ This function outputs a `data.frame` with the following variables needed
 later in the classification part of the function flow:
 
 -   identifier variable (`pnr`)
--   date (`eksd`)
+-   date (originally `eksd`, renamed to `date`)
 -   type of drug (`atc`)
--   amount purchased (`volume` and `apk`)
--   indication code (`indo`)
+-   amount purchased (`volume` and `number_of_packages` (originally
+    named `apk`))
+-   indication code (originally `indo`, renamed to `indication_code`)
 
 These events are then passed to a chain of exclusion functions:
 `exclude_wld_purchases()`, `exclude_potential_pcos()`,
@@ -435,3 +434,4 @@ is within a time-period of insufficient data coverage,
 contains the inclusion date of this individual.
 
 <!-- TODO: Specify the "stable" time-period: e.g., later than 1997 -->
+

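This patch words the LPR3 inclusion rule as: codes starting with "DE1[0-4]", diagnosis type "A" or "B", and not retracted. That rule can be sketched in base R. The function name and toy rows are hypothetical. The flags `is_primary`, `is_t1d`, and `is_t2d` follow the internal variables listed in the document.

```r
# Hypothetical LPR3 rows after joining, with the columns named in the text.
lpr3 <- data.frame(
  pnr = c("1", "1", "1", "2"),
  date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01", "2021-01-01")),
  diagnosekode = c("DE105", "DE119", "DE110", "DI10"),
  diagnosetype = c("A", "B", "A", "A"),
  senere_afkraeftet = c("Nej", "Nej", "Ja", "Nej")
)

lpr3_diabetes_sketch <- function(lpr3) {
  # Diabetes codes, primary or secondary, and not later retracted.
  keep <- grepl("^DE1[0-4]", lpr3$diagnosekode) &
    lpr3$diagnosetype %in% c("A", "B") &
    lpr3$senere_afkraeftet == "Nej"
  out <- lpr3[which(keep), ]
  out$is_primary <- out$diagnosetype == "A"
  out$is_t1d <- grepl("^DE10", out$diagnosekode)
  out$is_t2d <- grepl("^DE11", out$diagnosekode)
  rownames(out) <- NULL
  out[, c("pnr", "date", "is_primary", "is_t1d", "is_t2d")]
}

lpr3_diabetes_sketch(lpr3)
```

The retracted `DE110` diagnosis and the non-diabetes `DI10` row are removed. The remaining two rows carry the type and primary-diagnosis flags.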
From 4bba18e11fbaa77c22cc21253478a7d094498614 Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Fri, 20 Sep 2024 14:35:14 +0200
Subject: [PATCH 18/28] Added censoring/exclusion function description

---
 vignettes/function-flow.Rmd | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index 0eb7731..dfd3394 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -283,16 +283,6 @@ These events are then passed to a chain of exclusion functions:
 `exclude_wld_purchases()`, `exclude_potential_pcos()`,
 `exclude_pregnancy()` described in the sections below.
 
-After these exclusion functions have been applied, the output serves as
-inputs to two sets of functions:
-
-1.  the `get_diagnosis_date()` function for the final step of the
-    inclusion process.
-2.  the `get_only_insulin_purchases()`,
-    `get_insulin_purchases_within_180_days()`, and
-    `get_insulin_is_two_thirds_of_gld_doses()` helper functions for the
-    classification of diabetes type.
-
 ### Exclusion events
 
 #### Metformin purchases potentially for the treatment of polycystic ovary syndrome
@@ -311,19 +301,30 @@ After these exclusions are made, the output is passed to
 
 #### HbA1c tests and GLD purchases during pregnancy
 
-The function `exclude_pregnancy()` uses diagnoses from LPR2 or LPR3 as
-input and is used to exclude both HbA1c tests and GLD purchases during
-pregnancy, as these may be due to gestational diabetes, rather than type
-1 or type 2 diabetes.
+The function `exclude_pregnancy()` takes the combined outputs from
+`join_lpr2()`, `join_lpr3()`, `include_hba1c()`, and
+`exclude_potential_pcos()` and uses diagnoses from LPR2 or LPR3 to
+exclude both elevated HbA1c tests and GLD purchases during pregnancy, as
+these may be due to gestational diabetes, rather than type 1 or type 2
+diabetes.
 
 Internally, this relies on the function `get_pregnancy_dates()` that
 uses diagnoses registered in the National Patient Register to extract
 the dates of all pregnancy ending (live births or miscarriages). These
-are identified by filtering values beginning with "DO0[0-6]", "DO8[0-4]"
-or "DZ3[37]" in the `c_diag` variable in the LPR2 data (`diagnosekode`
-in LPR3 data).
+are identified by filtering
+`values beginning with "DO0[0-6]", "DO8[0-4]" or "DZ3[37]" in the`c_diag`variable in the LPR2 data (`diagnosekode`in LPR3 data). The dates output by`get_pregnancy_dates()\`
+are used to exclude all inclusion events registered between 40 weeks
+before and 12 weeks after a pregnancy ending.
 
-<!-- TODO: Add details on how this filtering should be done -->
+After these exclusion functions have been applied, the output serves as
+inputs to two sets of functions:
+
+1.  the `get_diagnosis_date()` function for the final step of the
+    inclusion process.
+2.  the `get_only_insulin_purchases()`,
+    `get_insulin_purchases_within_180_days()`, and
+    `get_insulin_is_two_thirds_of_gld_doses()` helper functions for the
+    classification of diabetes type.
 
 ### Get diagnosis date
 

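The window rule at the end of this patch excludes events from 40 weeks before to 12 weeks after a pregnancy ending. It can be sketched in base R. The function name, the column names, the toy data, and the inclusive window boundaries are assumptions. The `pregnancy_ends` table only stands in for the dates that `get_pregnancy_dates()` is described as returning; its real output format is not specified here.

```r
# Hypothetical inclusion events (elevated HbA1c or GLD purchases) and
# pregnancy-ending dates.
events <- data.frame(
  pnr = c("1", "1", "2"),
  date = as.Date(c("2020-01-01", "2021-01-01", "2020-03-01"))
)
pregnancy_ends <- data.frame(
  pnr = "1",
  date = as.Date("2020-06-01")
)

exclude_pregnancy_sketch <- function(events, pregnancy_ends) {
  in_window <- vapply(seq_len(nrow(events)), function(i) {
    ends <- pregnancy_ends$date[pregnancy_ends$pnr == events$pnr[i]]
    # Window: 40 weeks (280 days) before to 12 weeks (84 days) after an ending.
    isTRUE(any(events$date[i] >= ends - 40 * 7 & events$date[i] <= ends + 12 * 7))
  }, logical(1))
  out <- events[!in_window, ]
  rownames(out) <- NULL
  out
}

exclude_pregnancy_sketch(events, pregnancy_ends)
```

The per-row loop keeps the sketch short. For register-sized data, a join on `pnr` followed by a date-range filter would be more practical.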
From 7cca920bf2f2b3eddb4a8c600e62582c007c045f Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Fri, 27 Sep 2024 12:05:15 +0200
Subject: [PATCH 19/28] Corrected the diagnoses that join_lpr2() and join_lpr3()
 filter to. Reformatted output variable lists to start with variable names,
 followed by a short description. Also added info on renamed variables.

---
 vignettes/function-flow.Rmd | 189 +++++++++++++++++++-----------------
 1 file changed, 99 insertions(+), 90 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index dfd3394..0c4d923 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -106,28 +106,30 @@ diagnoses to administrative information in LPR2-formatted and
 LPR3-formatted data, respectively.
 
 `join_lpr2()` takes `lpr_diag` and `lpr_adm` as inputs, filters to the
-necessary diagnoses (`c_diag` starting with "DO", "DZ3", "DE1[0-4]",
-"249", or "250"), joins the required information by record number
-(`recnum`), and outputs a `data.frame` with the following variables:
+necessary diagnoses (`c_diag` starting with "DO0[0-6]", "DO8[0-4]",
+"DZ3[37]", "DE1[0-4]", "249", or "250"), joins the required information
+by record number (`recnum`), and outputs a `data.frame` with the
+following variables:
 
--   identifier variable (`pnr`)
--   date (originally `d_inddto`, renamed to `date`)
--   department specialty (`c_spec`)
--   diagnosis code (`c_diag`)
--   diagnosis type (`c_diagtype`)
+-   `pnr`: identifier variable
+-   `date`: date of the recorded diagnosis (renamed from `d_inddto`)
+-   `specialty`: department specialty (renamed from `c_spec`)
+-   `diagnosis`: diagnosis code (renamed from `c_diag`)
+-   `diagnosis_type`: diagnosis type (renamed from `c_diagtype`)
 
 `join_lpr3()` takes `diagnoser` and `kontakter` as inputs, filters to
-the necessary diagnoses (`diagnosekode` starting with "DO", "DZ3", or
-"DE1[0-4]"), joins the required information by record number
-(`dw_ek_kontakt`), and outputs a `data.frame` with the following
-variables:
-
--   identifier variable (originally `cpr`, renamed to `pnr`)
--   date (originally `dato_start`, renamed to `date`)
--   department specialty (`hovedspeciale_ans`)
--   diagnosis code (`diagnosekode`)
--   diagnosis type (`diagnosetype`)
--   diagnosis retracted (`senere_afkraeftet`)
+the necessary diagnoses (`diagnosekode` starting with "DO0[0-6]",
+"DO8[0-4]", "DZ3[37]" or "DE1[0-4]"), joins the required information by
+record number (`dw_ek_kontakt`), and outputs a `data.frame` with the
+following variables:
+
+-   `pnr`: identifier variable (renamed from `cpr`)
+-   `date`: date of the recorded diagnosis (renamed from `dato_start`)
+-   `specialty`: department specialty (renamed from `hovedspeciale_ans`)
+-   `diagnosis`: diagnosis code (renamed from `diagnosekode`)
+-   `diagnosis_type`: diagnosis type (renamed from `diagnosetype`)
+-   `diagnosis_retracted`: if the diagnosis was later retracted (renamed
+    from `senere_afkraeftet`)
 
 These outputs are passed to `include_diabetes_diagnoses()` (and to
 `get_pregnancy_dates()`, see exclusion events) for further processing
@@ -144,64 +146,68 @@ The function takes the outputs of `join_lpr2()` and `join_lpr3()` as
 inputs and processes each input separately to generate the following
 internal variables:
 
--   LPR2-data:
+-   From `join_lpr2`:
     -   `pnr`: identifier variable
-    -   `dates`: dates of all included diabetes diagnoses:
+    -   `date`: dates of all included diabetes diagnoses:
     -   registered as primary (A) or secondary (B) diagnoses, regardless
-        of type or department: - `c_diag` starts with "DE1[0-4]", "249",
-        or "250" and `c_diagtype` is either "A" or "B"
+        of type or department:
+        -   `diagnosis` starts with "DE1[0-4]", "249" or "250", and
+            `diagnosis_type` is either "A" or "B"
     -   `is_primary`: Define whether the diagnosis was a primary
-        diagnosis (`c_diagtype` == "A")
+        diagnosis (`diagnosis_type` == "A")
     -   `is_t1d`: Define whether the diagnosis was T1D-specific
-        (`c_diag` starts with "DE10" or "249")
+        (`diagnosis` starts with "DE10" or "249")
     -   `is_t2d`: Define whether the diagnosis was T2D-specific
-        (`c_diag` starts with "DE11" or "250")
+        (`diagnosis` starts with "DE11" or "250")
     -   `department`: Define whether the diagnosis was made made by an
-        endocrinological (`c_spec` == 8) or other medical department
-        (`c_spec` \< 8 or 9-30)
--   LPR3:
+        endocrinological (`specialty` == 8) or other medical department
+        (`specialty` \< 8 or 9-30)
+-   From `join_lpr3()`:
     -   `pnr`: identifier variable
-    -   `dates`: dates of all included diabetes diagnoses:
-    -   Registered as primary (A) or secondary (B) diagnoses, regardless
-        of type or department, but exclude retracted diagnoses: -
-        `diagnosekode` starts with "DE1[0-4]", `diagnosetype` is either
-        "A" or "B" and `senere_afkraeftet` == "Nej")
+    -   `date`: dates of all included diabetes diagnoses:
+    -   registered as primary (A) or secondary (B) diagnoses, regardless
+        of type or department, but exclude retracted diagnoses:
+        -   `diagnosis` starts with "DE1[0-4]", `diagnosis_type` is
+            either "A" or "B" and `diagnosis_retracted` == "Nej"
     -   `is_primary`: Define whether the diagnosis was a primary
-        diagnosis (`diagnosetype` == "A")
+        diagnosis (`diagnosis_type` == "A")
     -   `is_t1d`: Define whether the diagnosis was T1D-specific
-        (`diagnosekode` starts with "DE10")
+        (`diagnosis` starts with "DE10")
     -   `is_t2d`: Define whether the diagnosis was T2D-specific
-        (`diagnosekode` starts with "DE11")
+        (`diagnosis` starts with "DE11")
     -   `department`: Define whether the diagnosis was made made by an
-        endocrinological (`hovedspeciale_ans` == "medicinsk
-        endokrinologi") or other medical department (`hovedspeciale_ans`
-        either "Blandet medicin og kirurgi", "Intern medicin",
-        "Geriatri", "Hepatologi", "Hæmatologi", "Infektionsmedicin",
-        "Kardiologi", "Medicinsk allergologi", "Medicinsk
-        gastroenterologi", "Medicinsk lungesygdomme", "Nefrologi",
-        "Reumatologi", "Palliativ medicin", "Akut medicin",
-        "Dermato-venerologi", "Neurologi", "Onkologi", "Fysiurgi", or
-        "Tropemedicin")
-
-These intermediate results are combined for further processing, and
-`include_diabetes_diagnoses()` outputs a single `data.frame` with the
-following variables (up to two rows per individual):
-
--   identifier variable (`pnr`)
--   dates of the first and second hospital diabetes diagnosis
-    (`diagnosis_date`)
--   number of type 1 diabetes-specific primary diagnosis codes from
-    endocrinological departments (`n_t1d_endo`)
--   number of type 2 diabetes-specific primary diagnosis codes from
-    endocrinological departments (`n_t2d_endo`)
--   number of type 1 diabetes-specific primary diagnosis codes from
-    medical departments (`n_t1d_medical`)
--   number of type 2 diabetes-specific primary diagnosis codes from
-    medical departments (`n_t2d_medical`)
-
-The output is passed to the `get_diagnosis_date()` function for the
-final step of the inclusion process and is subsequently used to classify
-diabetes type.
+        endocrinological department (`specialty` == "medicinsk
+        endokrinologi" -\> `department` == "endocrinological") or other
+        medical department (`specialty` either "Blandet medicin og
+        kirurgi", "Intern medicin", "Geriatri", "Hepatologi",
+        "Hæmatologi", "Infektionsmedicin", "Kardiologi", "Medicinsk
+        allergologi", "Medicinsk gastroenterologi", "Medicinsk
+        lungesygdomme", "Nefrologi", "Reumatologi", "Palliativ medicin",
+        "Akut medicin", "Dermato-venerologi", "Neurologi", "Onkologi",
+        "Fysiurgi", or "Tropemedicin" -\> `department` == "medical")
+
+Internally, these intermediate results are combined and processed
+together. Ultimately, `include_diabetes_diagnoses()` outputs a single
+`data.frame` with the following variables (up to two rows per
+individual):
+
+-   `pnr`: identifier variable
+-   `dates`: dates of the first and second hospital diabetes diagnosis
+-   `n_t1d_endocrinology`: number of type 1 diabetes-specific primary
+    diagnosis codes from endocrinological departments
+-   `n_t2d_endocrinology`: number of type 2 diabetes-specific primary
+    diagnosis codes from endocrinological departments
+-   `n_t1d_medical`: number of type 1 diabetes-specific primary
+    diagnosis codes from medical departments
+-   `n_t2d_medical`: number of type 2 diabetes-specific primary
+    diagnosis codes from medical departments
+
+This output is passed to the `join_inclusions()` function, where the
+`dates` variable is used for the final step of the inclusion process.
+The variables of counts of diabetes type-specific primary diagnoses are
+carried over for the subsequent classification of diabetes type,
+initially as inputs to the `get_t1d_primary_diagnosis()` and
+`get_majority_of_t1d_diagnoses()` functions.
 
 #### Diabetes-specific podiatrist services
 
@@ -222,12 +228,12 @@ the same date.
 `include_podiatrist_services()` outputs a 2-column data frame with up to
 two rows for each individual, containing the following variables:
 
--   identifier variable (`pnr`)
--   the dates of the first and second diabetes-specific podiatrist
-    record (`dates`)
+-   `pnr`: identifier variable
+-   `dates`: the dates of the first and second diabetes-specific
+    podiatrist record
 
-The output is passed to the `get_diagnosis_date()` function for the
-final step of the inclusion process.
+The output is passed to the `join_inclusions()` function for the final
+step of the inclusion process.
 
 #### HbA1c tests above the diagnosis cut-off value (48 mmol/mol or 6.5%)
 
@@ -251,8 +257,8 @@ twice (one for IFCC, one for DCCT units).
 `include_hba1c()` outputs a 2-column data frame containing the following
 variables:
 
--   identifier variable (`pnr`)
--   the dates of all elevated HbA1c test results (`dates`).
+-   `pnr`: identifier variable
+-   `dates`: the dates of all elevated HbA1c test results
 
 The output is passed to the `exclude_pregnancy()` function for censoring
 of elevated results due to potential gestational diabetes (see below).
@@ -272,12 +278,14 @@ for censoring, the extraction window can be extended).
 This function outputs a `data.frame` with the following variables needed
 later in the classification part of the function flow:
 
--   identifier variable (`pnr`)
--   date (originally `eksd`, renamed to `date`)
--   type of drug (`atc`)
--   amount purchased (`volume` and `number_of_packages` (originally
-    named `apk`))
--   indication code (originally `indo`, renamed to `indication_code`)
+-   `pnr`: identifier variable
+-   `date`: dates of all purchases of GLD (renamed from `eksd`)
+-   `atc`: type of drug
+-   `contained_doses`: amount purchased, in number of defined daily
+    doses (DDD). Calculated as `volume` (doses contained in the
+    purchased package) times `apk` (number of packages purchased)
+-   `indication_code`: indication code of the prescription (renamed from
+    `indo`)
 
 These events are then passed to a chain of exclusion functions:
 `exclude_wld_purchases()`, `exclude_potential_pcos()`,
@@ -294,7 +302,7 @@ women below age 40 at the date of purchase (`atc` = "A10BA02" & `sex` =
 "woman" & date at purchase (`date`-`date_of_birth`) \< 40 years) or an
 indication code suggesting treatment of polycystic ovary syndrome (`atc`
 = "A10BA02" & `sex` = "woman" & `indication_code` either "0000092",
-"0000276", "0000781").
+"0000276" or "0000781").
 
 After these exclusions are made, the output is passed to
 `exclude_pregnancy()` for further censoring, described below:
@@ -311,17 +319,19 @@ diabetes.
 Internally, this relies on the function `get_pregnancy_dates()` that
 uses diagnoses registered in the National Patient Register to extract
 the dates of all pregnancy ending (live births or miscarriages). These
-are identified by filtering
-`values beginning with "DO0[0-6]", "DO8[0-4]" or "DZ3[37]" in the`c_diag`variable in the LPR2 data (`diagnosekode`in LPR3 data). The dates output by`get_pregnancy_dates()\`
-are used to exclude all inclusion events registered between 40 weeks
-before and 12 weeks after a pregnancy ending.
+are identified by `diagnosis` values beginning with "DO0[0-6]",
+"DO8[0-4]" or "DZ3[37]". The dates output by `get_pregnancy_dates()` are
+used to exclude all inclusion events registered between 40 weeks before
+and 12 weeks after a pregnancy ending.
 
 After these exclusion functions have been applied, the output serves as
 inputs to two sets of functions:
 
-1.  the `get_diagnosis_date()` function for the final step of the
-    inclusion process.
-2.  the `get_only_insulin_purchases()`,
+1.  the censored HbA1c and GLD data are passed to the
+    `join_inclusions()` function for the final step of the inclusion
+    process.
+2.  the censored GLD data is passed to the
+    `get_only_insulin_purchases()`,
     `get_insulin_purchases_within_180_days()`, and
     `get_insulin_is_two_thirds_of_gld_doses()` helper functions for the
     classification of diabetes type.
@@ -369,8 +379,8 @@ OSDC algorithm includes the following criteria:
     diagnoses extracted from `lpr_diag` (LPR2) and `diagnoser` (LPR3) in
     the previous steps.
 2.  `get_only_insulin_purchases()` which relies on the GLD purchases
-    from Lægemiddelsdatabasen to get patients where all GLD purchases
-    are insulin only.
+    from Lægemiddeldatabasen to get patients where all GLD purchases are
+    insulin only.
 3.  `get_majority_of_t1d_diagnoses()` (as compared to T2D diagnoses)
     which again relies on primary hospital diagnoses from LPR.
 4.  `get_insulin_purchase_within_180_days()` which relies on both
@@ -435,4 +445,3 @@ is within a time-period of insufficient data coverage,
 contains the inclusion date of this individual.
 
 <!-- TODO: Specify the "stable" time-period: e.g., later than 1997 -->
-
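The PCOS censoring rule described in this patch can be illustrated with a minimal base-R sketch. This is not the osdc implementation: the function name `exclude_potential_pcos_sketch()` is hypothetical, and the sketch assumes that `gld` has the columns `pnr`, `date`, `atc` and `indication_code`, and that `bef` provides `pnr`, `sex` and `date_of_birth`. Age is approximated as days divided by 365.25.

```r
# Hypothetical sketch of the PCOS censoring rule (not the osdc implementation).
# Assumes `gld` has pnr, date (Date), atc and indication_code columns and
# `bef` has pnr, sex ("woman"/"man") and date_of_birth (Date) columns.
exclude_potential_pcos_sketch <- function(gld, bef) {
  pcos_indication_codes <- c("0000092", "0000276", "0000781")
  # Inner join: purchases without a matching `bef` row are dropped.
  x <- merge(gld, bef, by = "pnr")
  # Approximate age in years at the date of purchase.
  age_at_purchase <- as.numeric(x$date - x$date_of_birth) / 365.25
  is_metformin_in_woman <- x$atc == "A10BA02" & x$sex == "woman"
  is_potential_pcos <- is_metformin_in_woman &
    (age_at_purchase < 40 | x$indication_code %in% pcos_indication_codes)
  # Keep rows not flagged as potential PCOS treatment (NA flags are kept).
  x[!(is_potential_pcos %in% TRUE), names(gld)]
}

gld <- data.frame(
  pnr = c("1", "1", "2", "3"),
  date = as.Date(rep("2010-01-01", 4)),
  atc = c("A10BA02", "A10AB01", "A10BA02", "A10BA02"),
  indication_code = c("0000000", "0000000", "0000276", "0000000"),
  stringsAsFactors = FALSE
)
bef <- data.frame(
  pnr = c("1", "2", "3"),
  sex = c("woman", "woman", "man"),
  date_of_birth = as.Date(c("1990-01-01", "1950-01-01", "1990-01-01")),
  stringsAsFactors = FALSE
)
censored <- exclude_potential_pcos_sketch(gld, bef)
```

In this example the sketch removes two metformin purchases: one by the 20-year-old woman (`pnr` "1") and one carrying a PCOS indication code (`pnr` "2"). The insulin purchase and the man's metformin purchase remain.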

From b40412c4b85f560d3ada63bed8ac23da575dd35f Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Fri, 27 Sep 2024 12:34:56 +0200
Subject: [PATCH 20/28] changed specialty values to align with the PR with a
 refactored create_lpr2() function

---
 vignettes/function-flow.Rmd | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index 0c4d923..2360270 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -160,8 +160,9 @@ internal variables:
     -   `is_t2d`: Define whether the diagnosis was T2D-specific
         (`diagnosis` starts with "DE11" or "250")
     -   `department`: Define whether the diagnosis was made made by an
-        endocrinological (`specialty` == 8) or other medical department
-        (`specialty` \< 8 or 9-30)
+        endocrinological (`specialty` == 8 -\> `department` ==
+        "endocrinology") or other medical department (`specialty` \< 8
+        or 9-30 -\> `department` == "other medical")
 -   From `join_lpr3()`:
     -   `pnr`: identifier variable
     -   `date`: dates of all included diabetes diagnoses:
@@ -177,14 +178,15 @@ internal variables:
         (`diagnosis` starts with "DE11")
     -   `department`: Define whether the diagnosis was made made by an
         endocrinological department (`specialty` == "medicinsk
-        endokrinologi" -\> `department` == "endocrinological") or other
+        endokrinologi" -\> `department` == "endocrinology") or other
         medical department (`specialty` either "Blandet medicin og
         kirurgi", "Intern medicin", "Geriatri", "Hepatologi",
         "Hæmatologi", "Infektionsmedicin", "Kardiologi", "Medicinsk
         allergologi", "Medicinsk gastroenterologi", "Medicinsk
         lungesygdomme", "Nefrologi", "Reumatologi", "Palliativ medicin",
         "Akut medicin", "Dermato-venerologi", "Neurologi", "Onkologi",
-        "Fysiurgi", or "Tropemedicin" -\> `department` == "medical")
+        "Fysiurgi", or "Tropemedicin" -\> `department` == "other
+        medical")
 
 Internally, these intermediate results are combined and processed
 together. And ultimately, `include_diabetes_diagnoses()` outputs a
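The LPR2 mapping from `specialty` to `department` described in this patch can be sketched in base R as follows. The function name is hypothetical. The text does not say how specialties outside the listed ranges are handled, so setting them to `NA` (e.g. code 31) is an assumption.

```r
# Hypothetical helper (not part of osdc): maps LPR2 specialty codes to the
# `department` categories. Assumes `specialty` holds integer codes (or
# strings such as "08" that convert cleanly to integers).
lpr2_department_sketch <- function(specialty) {
  specialty <- as.integer(specialty)
  is_endocrinology <- specialty == 8
  is_other_medical <- specialty < 8 | (specialty >= 9 & specialty <= 30)
  ifelse(
    is_endocrinology,
    "endocrinology",
    # Assumption: codes outside 1-30 are not classified.
    ifelse(is_other_medical, "other medical", NA_character_)
  )
}

departments <- lpr2_department_sketch(c("08", "1", "30", "31"))
```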

From 35118e82e93762bb23ee72893a6f03fd6cdd0ae8 Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Tue, 17 Dec 2024 00:42:11 +0100
Subject: [PATCH 21/28] Joining inclusions and definition. Looking to add type
 classification.

---
 vignettes/function-flow.Rmd | 165 ++++++++++++++++++++++++++----------
 1 file changed, 121 insertions(+), 44 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index 2360270..1d8ceec 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -151,8 +151,8 @@ internal variables:
     -   `date`: dates of all included diabetes diagnoses:
     -   registered as primary (A) or secondary (B) diagnoses, regardless
         of type or department:
-        -   `diagnosis` starts with "DE1[0-4]", "249" or "250", and
-            `diagnosis_type` is either "A" or "B"
+        -   Keep rows where `diagnosis` starts with "DE1[0-4]", "249" or
+            "250", and `diagnosis_type` is either "A" or "B"
     -   `is_primary`: Define whether the diagnosis was a primary
         diagnosis (`diagnosis_type` == "A")
     -   `is_t1d`: Define whether the diagnosis was T1D-specific
@@ -160,16 +160,17 @@ internal variables:
     -   `is_t2d`: Define whether the diagnosis was T2D-specific
         (`diagnosis` starts with "DE11" or "250")
     -   `department`: Define whether the diagnosis was made made by an
-        endocrinological (`specialty` == 8 -\> `department` ==
-        "endocrinology") or other medical department (`specialty` \< 8
-        or 9-30 -\> `department` == "other medical")
+        endocrinological (if `specialty` == 8 then `department` ==
+        "endocrinology") or other medical department (if `specialty` \<
+        8 or 9-30 then `department` == "other medical")
 -   From `join_lpr3()`:
     -   `pnr`: identifier variable
     -   `date`: dates of all included diabetes diagnoses:
     -   registered as primary (A) or secondary (B) diagnoses, regardless
         of type or department, but exclude retracted diagnoses:
-        -   `diagnosis` starts with "DE1[0-4]", `diagnosis_type` is
-            either "A" or "B" and `diagnosis_retracted` == "Nej"
+        -   Keep rows where `diagnosis` starts with "DE1[0-4]",
+            `diagnosis_type` is either "A" or "B" and
+            `diagnosis_retracted` == "Nej"
     -   `is_primary`: Define whether the diagnosis was a primary
         diagnosis (`diagnosis_type` == "A")
     -   `is_t1d`: Define whether the diagnosis was T1D-specific
@@ -177,15 +178,15 @@ internal variables:
     -   `is_t2d`: Define whether the diagnosis was T2D-specific
         (`diagnosis` starts with "DE11")
     -   `department`: Define whether the diagnosis was made made by an
-        endocrinological department (`specialty` == "medicinsk
-        endokrinologi" -\> `department` == "endocrinology") or other
-        medical department (`specialty` either "Blandet medicin og
+        endocrinological department (if `specialty` == "medicinsk
+        endokrinologi" then `department` == "endocrinology") or other
+        medical department (if `specialty` is any of "Blandet medicin og
         kirurgi", "Intern medicin", "Geriatri", "Hepatologi",
         "Hæmatologi", "Infektionsmedicin", "Kardiologi", "Medicinsk
         allergologi", "Medicinsk gastroenterologi", "Medicinsk
         lungesygdomme", "Nefrologi", "Reumatologi", "Palliativ medicin",
         "Akut medicin", "Dermato-venerologi", "Neurologi", "Onkologi",
-        "Fysiurgi", or "Tropemedicin" -\> `department` == "other
+        "Fysiurgi", or "Tropemedicin" then `department` == "other
         medical")
 
 Internally, these intermediate results are combined and processed
@@ -271,14 +272,21 @@ The function `include_gld_purchases()` uses `lmdb` to extract the dates
 of all GLD purchases.
 
 These dates are extracted by including all values beginning with "A10"
-in the `atc` variable of the `lmdb` register. Since the diagnosis code
-data on pregnancies (see below) is insufficient to perform censoring
-prior to 1997, `include_gld_purchases()` only extracts dates from 1997
-onward by default (if Medical Birth Register data is available to use
-for censoring, the extraction window can be extended).
-
-This function outputs a `data.frame` with the following variables needed
-later in the classification part of the function flow:
+in the `atc` variable of the `lmdb` register, except for
+glucose-lowering drugs that may be used for conditions other than
+diabetes: GLP-RAs (`atc` starts with "A10BJ") and
+dapagliflozin/empagliflozin (`atc` is "A10BK01" or "A10BK03").
+
+Since the diagnosis code data on pregnancies (see below) is insufficient
+to perform censoring prior to 1997, `include_gld_purchases()` only
+extracts dates from 1997 onward by default (if Medical Birth Register
+data is available to use for censoring, the extraction window can be
+extended).
+
+This function outputs a long `data.frame` (since all dates of purchases
+must be kept for later use in classifyin diabetes type) with the
+following variables needed later in the classification part of the
+function flow:
 
 -   `pnr`: identifier variable
 -   `date`: dates of all purchases of GLD (renamed from `eksd`)
@@ -290,8 +298,8 @@ later in the classification part of the function flow:
     `indo`)
 
 These events are then passed to a chain of exclusion functions:
-`exclude_wld_purchases()`, `exclude_potential_pcos()`,
-`exclude_pregnancy()` described in the sections below.
+`exclude_potential_pcos()` and `exclude_pregnancy()` described in the
+sections below.
 
 ### Exclusion events
 
@@ -301,13 +309,16 @@ The function `exclude_potential_pcos()` takes the output from
 `include_gld_purchases()` and `bef` (information on sex and date of
 birth) as inputs and censors (filters out) all purchases of metformin in
 women below age 40 at the date of purchase (`atc` = "A10BA02" & `sex` =
-"woman" & date at purchase (`date`-`date_of_birth`) \< 40 years) or an
-indication code suggesting treatment of polycystic ovary syndrome (`atc`
-= "A10BA02" & `sex` = "woman" & `indication_code` either "0000092",
-"0000276" or "0000781").
+"woman" & age at purchase (`date`-`date_of_birth`) \< 40 years) or with an
+indication code suggesting the prescription was made for treatment of
+polycystic ovary syndrome (`atc` = "A10BA02" & `sex` = "woman" &
+`indication_code` is any of "0000092", "0000276" or "0000781").
 
-After these exclusions are made, the output is passed to
-`exclude_pregnancy()` for further censoring, described below:
+This function only performs a filtering operation, so its output retains
+the same structure and variables as the input passed from
+`include_gld_purchases()`. After these exclusions are made, the output
+is passed to `exclude_pregnancy()` for further censoring, described
+below:
 
 #### HbA1c tests and GLD purchases during pregnancy
 
@@ -320,16 +331,17 @@ diabetes.
 
 Internally, this relies on the function `get_pregnancy_dates()` that
 uses diagnoses registered in the National Patient Register to extract
-the dates of all pregnancy ending (live births or miscarriages). These
-are identified by `diagnosis` values beginning with "DO0[0-6]",
-"DO8[0-4]" or "DZ3[37]". The dates output by`get_pregnancy_dates()\` are
-used to exclude all inclusion events registered between 40 weeks before
-and 12 weeks after a pregnancy ending.
+the dates of all recorded pregnancy endings (live births or
+miscarriages). These are identified by `diagnosis` values beginning with
+"DO0[0-6]", "DO8[0-4]" or "DZ3[37]". The dates output by
+`get_pregnancy_dates()` are used to exclude all inclusion events
+registered between 40 weeks before and 12 weeks after a pregnancy
+ending.
 
 After these exclusion functions have been applied, the output serves as
 inputs to two sets of functions:
 
-1.  the censored HbA1c and GLD data are passed to the
+1.  The censored HbA1c and GLD data are passed to the
     `join_inclusions()` function for the final step of the inclusion
     process.
 2.  the censored GLD data is passed to the
@@ -338,20 +350,61 @@ inputs to two sets of functions:
     `get_insulin_is_two_thirds_of_gld_doses()` helper functions for the
     classification of diabetes type.
 
+### Join inclusion events
+
+The function `join_inclusions()` appends (row-binds), by `pnr`, the
+dates output by the functions that process the four types of inclusion
+events. It therefore takes as input the following variables, output
+from the following functions:
+
+-   From `include_diabetes_diagnoses()`:
+    -   `pnr`: identifier variable
+    -   `dates`: dates of the first and second hospital diabetes
+        diagnosis
+-   From `include_podiatrist_services()`:
+    -   `pnr`: identifier variable
+    -   `dates`: the dates of the first and second diabetes-specific
+        podiatrist record
+-   From `exclude_pregnancy()` (HbA1c tests):
+    -   `pnr`: identifier variable
+    -   `dates`: the dates of the first and second elevated HbA1c test
+        results (after censoring)
+-   From `exclude_pregnancy()` (GLD purchases):
+    -   `pnr`: identifier variable
+    -   `date`: dates of all purchases of GLD
+        -   The dates of the first and second purchase of GLD of each
+            individual are extracted from these and appended as two rows
+            to the ´dates´ variable.
+
+The output from the function is a `data.frame` containing two variables
+(`pnr` and `dates`) and 1 to 8 rows per ´pnr´. This outputn is passed to
+`get_diagnosis_date()`.
+
 ### Get diagnosis date
 
-The function `get_diagnosis_date()` combines the outputs from the
-inclusion and exclusion functions to get the final diagnosis date.
-Initially, it drops the first inclusion and exclusion events from the
-function outputs with the helper `drop_first_event()`, so that only
-those with two or more events are kept. This is then used to assign an
-initial diagnosis according to OSDC. Then, all the outputs are joined
-together with `join_diagnosis_dates()`.
+The function `get_diagnosis_date()` takes the output from
+`join_inclusions()` and defines the final diagnosis date based on all
+the inclusion event types.
+
+First, the inputs are sorted by `dates` within each level of `pnr`, then
+the earliest value of `dates` is dropped function outputs with the
+helper `drop_first_event()`, so that only those with two or more events
+are included. The date of inclusion, `raw_inclusion_date`, is then
+defined as the earliest value of `dates`in the remaining rows for each
+individual (effectively the date of the second recorded inclusion
+event). A third variable, `stable_inclusion_date`, is defined based on
+`raw_inclusion_date` (if `raw_inclusion_date` \< stable inclusion
+threshold (default "31-12-1997"), then `stable_inclusion_date` is set to
+`NA`, else it is set to`raw_inclusion_date`). This variable serves to
+limit the included cohort to only individuals with valid date of
+inclusion (and thereby valid age at inclusion & duration of diabetes).
+
+`get_diagnosis_date()` outputs a `data.frame` with the following
+variables needed later in the classification part of the function flow:
 
-Finally, the dates outside of the data coverage period are dropped with
-`drop_diagnosis_dates_outside_coverage()` to end with a final diagnosis
-date. For details on this censoring based on periods with insufficient
-data coverage, see the `vignette("design")`.
+-   `pnr`: identifier variable
+-   `raw_inclusion_date`: date of inclusion
+-   `stable_inclusion_date`: date of inclusion of valid incident cases
 
 ### Classifying the diabetes type
 
@@ -360,6 +413,30 @@ extracted diabetes population as having either T1D or T2D. As described
 in the `vignette("design")`, individuals not classified as T1D cases are
 classified as T2D cases.
 
+As the diabetes type classification incorporates an evaluation of the
+time from diagnosis/inclusion to first subsequent purchase of insulin,
+the `get_diabetes_type` function has to take inputs on the date of
+diagnosis and all purchases of GLD drugs. In addition, several helper
+functions are applied to extract additional information from the
+censored GLD data to use for classification of diabetes type:
+
+```         
+`get_only_insulin_purchases()`,
+`get_insulin_purchases_within_180_days()`, and
+`get_insulin_is_two_thirds_of_gld_doses()`
+```
+
+Thus, the function takes the following inputs/variables:
+
+-   From `get_diagnosis_date()`
+    -   `pnr`
+    -   `raw_inclusion_date`
+-   From `exclude_pregnancy()`: Information on historic GLD data:
+    -   `pnr`: identifier variable
+    -   `date`: dates of all purchases of GLD.
+
+<!-- TODO: Finish this section! Everything below here is  not reliable.-->
+
 The output is a `data.frame` that includes one row per individual in the
 diabetes population: one column with their PNR, two columns with
 inclusion dates (one "stable" date and one "raw" date - see the
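The GLD inclusion filter described in this patch can be sketched as follows: keep `atc` codes starting with "A10" except GLP-RAs and dapagliflozin/empagliflozin, and keep only purchases from 1997 onward. This is a hypothetical base-R illustration, not the package code. The function name is made up, and reading "from 1997 onward" as "on or after 1 January 1997" is an assumption.

```r
# Hypothetical sketch of the GLD inclusion filter (not the osdc implementation).
# Assumes `lmdb` has pnr, eksd (Date) and atc columns, and that "from 1997
# onward" means purchases on or after 1 January 1997.
include_gld_purchases_sketch <- function(lmdb, start_date = as.Date("1997-01-01")) {
  atc <- as.character(lmdb$atc)
  is_gld <- startsWith(atc, "A10")
  # GLP-RAs and dapagliflozin/empagliflozin may be used for other conditions.
  is_non_diabetes_gld <- startsWith(atc, "A10BJ") | atc %in% c("A10BK01", "A10BK03")
  keep <- is_gld & !is_non_diabetes_gld & lmdb$eksd >= start_date
  out <- lmdb[keep %in% TRUE, , drop = FALSE]
  # Rename the purchase date column as described in the text.
  names(out)[names(out) == "eksd"] <- "date"
  out
}

lmdb <- data.frame(
  pnr = c("1", "1", "2", "2", "3"),
  eksd = as.Date(c("1996-05-01", "2001-02-01", "2005-03-01", "2005-04-01", "2010-01-01")),
  atc = c("A10BA02", "A10BA02", "A10BJ02", "A10BK01", "C10AA01"),
  stringsAsFactors = FALSE
)
gld_purchases <- include_gld_purchases_sketch(lmdb)
```

Only the 2001 metformin purchase survives. The other rows are dropped because they predate 1997, are a GLP-RA, are dapagliflozin, or are not a GLD at all.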

From cbee21f035c8b511a58ba568c5ba15b8a320008e Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Tue, 17 Dec 2024 20:56:30 +0100
Subject: [PATCH 22/28] Removed helper function for dropping first event as it
 seemed a bit excessive.

Added a bit more detail to the raw_inclusion_date vs stable_inclusion_date.
---
 vignettes/function-flow.Rmd | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index 1d8ceec..5d26e6e 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -387,17 +387,18 @@ The function `get_diagnosis_date()` takes the output from
 the inclusion event types.
 
 First, the inputs are sorted by `dates` within each level of `pnr`, then
-the earliest value of `dates` is dropped function outputs with the
-helper `drop_first_event()`, so that only those with two or more events
-are included. The date of inclusion, `raw_inclusion_date`, is then
-defined as the earliest value of `dates`in the remaining rows for each
-individual (effectively the date of the second recorded inclusion
-event). A third variable, `stable_inclusion_date`, is defined based on
-`raw_inclusion_date` (if `raw_inclusion_date` \< stable inclusion
-threshold (default "31-12-1997"), then `stable_inclusion_date` is set to
-`NA`, else it is set to`raw_inclusion_date`). This variable serves to
-limit the included cohort to only individuals with valid date of
-inclusion (and thereby valid age at inclusion & duration of diabetes).
+the earliest value of `dates` is dropped, so that only individuals with
+two or more events are included. The date of inclusion,
+`raw_inclusion_date`, is then defined as the earliest value of `dates`
+in the remaining rows for each individual (effectively the date of the
+second recorded inclusion event). A third variable,
+`stable_inclusion_date`, is derived from `raw_inclusion_date`: if
+`raw_inclusion_date` is earlier than the stable inclusion threshold
+(default "31-12-1997", one year after medication data start to
+contribute to inclusions), `stable_inclusion_date` is set to `NA`;
+otherwise it is set to `raw_inclusion_date`. This variable serves to limit the
+included cohort to individuals with a valid date of inclusion (and thereby a valid age at inclusion and duration of
+diabetes).
 
 `get_diagnosis_date()` outputs a `data.frame` with the following
 variables needed later in the classification part of the function flow:
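The steps described for `get_diagnosis_date()` can be sketched in base R as follows. This is a hypothetical illustration, not the package code. It assumes the joined inclusion events arrive as a `data.frame` with `pnr` and `dates` (class `Date`) columns, and it uses the default stable inclusion threshold of 31 December 1997 given in the text.

```r
# Hypothetical sketch of get_diagnosis_date() (not the osdc implementation).
get_diagnosis_date_sketch <- function(inclusions,
                                      stable_threshold = as.Date("1997-12-31")) {
  # Sort events by date within each individual.
  inclusions <- inclusions[order(inclusions$pnr, inclusions$dates), , drop = FALSE]
  # Drop each individual's earliest event; individuals with a single event
  # have no rows left and are therefore not included.
  remaining <- inclusions[duplicated(inclusions$pnr), , drop = FALSE]
  # The first remaining row per pnr is the second recorded inclusion event.
  second <- remaining[!duplicated(remaining$pnr), , drop = FALSE]
  out <- data.frame(
    pnr = second$pnr,
    raw_inclusion_date = second$dates,
    stringsAsFactors = FALSE
  )
  # Inclusions before the stable threshold get no stable inclusion date.
  out$stable_inclusion_date <- out$raw_inclusion_date
  out$stable_inclusion_date[out$raw_inclusion_date < stable_threshold] <- NA
  out
}

inclusions <- data.frame(
  pnr = c("a", "a", "a", "b", "c", "c"),
  dates = as.Date(c("2005-03-01", "2001-01-01", "2003-06-01",
                    "2010-01-01", "1996-01-01", "1997-05-01")),
  stringsAsFactors = FALSE
)
diagnosis_dates <- get_diagnosis_date_sketch(inclusions)
```

The outcomes for the three example individuals are:

- Individual "b" has only one event and is not included.
- Individual "a" is included on the date of their second event.
- Individual "c" is also included, but their second event falls before the threshold, so `stable_inclusion_date` is `NA`.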

From 3820459d82e41cc656ed1471df964644364bf0ff Mon Sep 17 00:00:00 2001
From: Anders Aasted Isaksen <ANDAAS@onerm.dk>
Date: Tue, 17 Dec 2024 23:56:57 +0100
Subject: [PATCH 23/28] Added function flow description of get_diabetes_type()
 and its helper functions.

---
 vignettes/function-flow.Rmd | 126 ++++++++++++++++++++++++++++--------
 1 file changed, 98 insertions(+), 28 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index 5d26e6e..c2d1780 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -377,7 +377,7 @@ functions:
             to the ´dates´ variable.
 
 The output from the function is a `data.frame` containing two variables
-(`pnr` and `dates`) and 1 to 8 rows per ´pnr´. This outputn is passed to
+(`pnr` and `dates`) and 1 to 8 rows per `pnr`. This output is passed to
 `get_diagnosis_date()`.
 
 ### Get diagnosis date
@@ -401,12 +401,15 @@ of inclusion (and thereby valid age at inclusion & duration of
 diabetes).
 
 `get_diagnosis_date()` outputs a `data.frame` with the following
-variables needed later in the classification part of the function flow:
+variables:
 
 -   `pnr`: identifier variable
 -   `raw_inclusion_date`: date of inclusion
 -   `stable_inclusion_date`: date of inclusion of valid incident cases
 
+This output is passed to the `get_diabetes_type()` function and used to
+classify the diabetes type as described below.
+
 ### Classifying the diabetes type
 
 The next step of the OSDC algorithm classifies individuals from the
@@ -416,38 +419,104 @@ classified as T2D cases.
 
 As the diabetes type classification incorporates an evaluation of the
 time from diagnosis/inclusion to first subsequent purchase of insulin,
-the `get_diabetes_type` function has to take inputs on the date of
-diagnosis and all purchases of GLD drugs. In addition, several helper
-functions are applied to extract additional information from the
-censored GLD data to use for classification of diabetes type:
-
-```         
-`get_only_insulin_purchases()`,
-`get_insulin_purchases_within_180_days()`, and
-`get_insulin_is_two_thirds_of_gld_doses()`
-```
+the `get_diabetes_type()` function has to take the date of diagnosis and
+all GLD purchases (after censoring) as inputs. In addition, it requires
+information on diabetes type-specific primary diagnoses from
+hospitals.
 
-Thus, the function takes the following inputs/variables:
+Thus, the function takes the following inputs from
+`get_diagnosis_date()`, `exclude_pregnancy()`, and
+`include_diabetes_diagnoses()`:
 
--   From `get_diagnosis_date()`
+-   From `get_diagnosis_date()`: Information on date of diagnosis of
+    diabetes
     -   `pnr`
     -   `raw_inclusion_date`
--   From `exclude_pregnancy()`: Information on historic GLD data:
+    -   `stable_inclusion_date`
+-   From `exclude_pregnancy()`: Information on historic GLD purchases:
     -   `pnr`: identifier variable
     -   `date`: dates of all purchases of GLD.
-
-<!-- TODO: Finish this section! Everything below here is  not reliable.-->
-
-The output is a `data.frame` that includes one row per individual in the
-diabetes population: one column with their PNR, two columns with
-inclusion dates (one "stable" date and one "raw" date - see the
-`vignette("design")` for an elaboration on what that entails), and one
-column with the diabetes type.
-
-<!-- TODO: add a link to the specific section where this is described -->
-
-![Flow of functions for classifying diabetes status using the `osdc`
-package.](images/function-flow-classification.png)
+    -   `atc`: type of drug
+    -   `contained_doses`: defined daily doses of drug contained in
+        purchase
+-   From `include_diabetes_diagnoses()`: Information on diabetes
+    type-specific primary diagnoses from hospitals:
+    -   `pnr`: identifier variable
+    -   `n_t1d_endocrinology`: number of type 1 diabetes-specific
+        primary diagnosis codes from endocrinological departments
+    -   `n_t2d_endocrinology`: number of type 2 diabetes-specific
+        primary diagnosis codes from endocrinological departments
+    -   `n_t1d_medical`: number of type 1 diabetes-specific primary
+        diagnosis codes from other medical departments
+    -   `n_t2d_medical`: number of type 2 diabetes-specific primary
+        diagnosis codes from other medical departments
+
+For each individual (`pnr`), several helper functions are applied to
+these inputs to extract additional information from the censored GLD
+data and diagnoses for use in classifying diabetes type. Each of these
+returns a single logical value (`TRUE` or `FALSE`) per individual:
+
+-   `get_only_insulin_purchases()`:
+    -   Inputs passed from `exclude_pregnancy()`:
+        -   `atc`
+    -   Outputs:
+        -   `only_insulin_purchases` = `TRUE` if all purchases have an
+            `atc` starting with "A10A" (i.e., all GLD purchases are insulin)
+-   `get_insulin_purchases_within_180_days()`
+    -   Inputs passed from `exclude_pregnancy()`:
+        -   `date` & `atc`
+    -   Inputs passed from `get_diagnosis_date()`:
+        -   `raw_inclusion_date`
+    -   Outputs: `TRUE` if any purchases with `atc` starting with "A10A"
+        have a `date` between 0 and 180 days after
+        `raw_inclusion_date`
+-   `get_insulin_is_two_thirds_of_gld_doses()`
+    -   Inputs passed from `exclude_pregnancy()`:
+        -   `contained_doses` & `atc`
+    -   Outputs: `TRUE` if the sum of `contained_doses` over rows with `atc`
+        starting with "A10A" (except "A10AE5") is at least twice the sum
+        of `contained_doses` over rows with `atc` starting with "A10B" or
+        "A10AE5" (i.e., insulin makes up at least two-thirds of GLD doses)
+-   `get_any_t1d_primary_diagnoses()`:
+    -   Inputs passed from `include_diabetes_diagnoses()`:
+        -   `n_t1d_endocrinology` & `n_t1d_medical`
+    -   Outputs: `TRUE` if the combined sum of the inputs is 1 or above.
+-   `get_type_diagnoses_from_endocrinology()`:
+    -   Inputs passed from `include_diabetes_diagnoses()`:
+        -   `n_t1d_endocrinology`, `n_t2d_endocrinology`
+    -   Outputs: `type_diagnoses_from_endocrinology` = `TRUE` if the
+        combined sum of the inputs is 1 or above
+-   `get_type_diagnosis_majority()`:
+    -   Inputs passed from `include_diabetes_diagnoses()`:
+        -   `n_t1d_endocrinology`, `n_t2d_endocrinology`,
+            `n_t1d_medical` & `n_t2d_medical`
+    -   Inputs passed from `get_type_diagnoses_from_endocrinology()`:
+        -   `type_diagnoses_from_endocrinology`
+    -   Outputs: `TRUE` if `type_diagnoses_from_endocrinology` == `TRUE`
+        and `n_t1d_endocrinology` is greater than `n_t2d_endocrinology`.
+        Also `TRUE` if `type_diagnoses_from_endocrinology` == `FALSE` and
+        `n_t1d_medical` is greater than `n_t2d_medical`
+
+`get_diabetes_type()` evaluates all the outputs from the helper
+functions to define diabetes type for each individual. Diabetes type is
+classified as "T1D" if either of the following holds (all other individuals are classified as "T2D"):
+
+-   `only_insulin_purchases` == `TRUE` & `any_t1d_primary_diagnoses` ==
+    `TRUE`
+-   Or `only_insulin_purchases` == `FALSE` & `any_t1d_primary_diagnoses`
+    == `TRUE` & `type_diagnosis_majority` == `TRUE` &
+    `insulin_is_two_thirds_of_gld_doses` == `TRUE` &
+    `insulin_purchases_within_180_days` == `TRUE`
+
+`get_diabetes_type()` returns a `data.frame` with one row per individual
+and four columns: `pnr`, `stable_inclusion_date`,
+`raw_inclusion_date` and `diabetes_type`. This is the final product of the
+OSDC algorithm. See `vignette("design")` for more detail on the
+two inclusion dates and their intended use cases.
+
+<!-- TODO: Create updated image similar to https://aastedet.github.io/dissertation/4-results.html#fig-osdc-type-flow to reflect the new diabetes type logic and embed image here for reference-->
+
+<!-- TODO:  The following explanatory sections on T1D and T2D classification need to be aligned with the technical sections above, and possibly moved up to them-->
 
 #### Type 1 classification
 
@@ -525,3 +594,4 @@ is within a time-period of insufficient data coverage,
 contains the inclusion date of this individual.
 
 <!-- TODO: Specify the "stable" time-period: e.g., later than 1997 -->
+
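The classification rules described in this patch can be combined into a single per-individual function, sketched below in base R. This is a hypothetical illustration, not the osdc implementation; the function and argument names are made up. It reads "only insulin purchases" as "all purchases have an `atc` starting with "A10A"". Two further assumptions are flagged in code comments: an individual without any GLD purchases does not count as "only insulin", and there are no missing values.

```r
# Hypothetical sketch (not the osdc implementation) of the type classification
# for a single individual. Assumes `gld` holds that individual's censored GLD
# purchases with date (Date), atc and contained_doses columns, and no NAs.
classify_diabetes_type_sketch <- function(gld, raw_inclusion_date,
                                          n_t1d_endocrinology, n_t2d_endocrinology,
                                          n_t1d_medical, n_t2d_medical) {
  atc <- as.character(gld$atc)
  is_insulin <- startsWith(atc, "A10A")
  # Assumption: an individual without any GLD purchases is not "only insulin".
  only_insulin_purchases <- length(atc) > 0 && all(is_insulin)
  days_after_inclusion <- as.numeric(gld$date - raw_inclusion_date)
  insulin_purchases_within_180_days <- any(
    is_insulin & days_after_inclusion >= 0 & days_after_inclusion <= 180
  )
  # Rows starting with "A10AE5" count as non-insulin doses, as specified above.
  is_non_insulin_dose <- startsWith(atc, "A10B") | startsWith(atc, "A10AE5")
  is_insulin_dose <- is_insulin & !startsWith(atc, "A10AE5")
  insulin_is_two_thirds_of_gld_doses <-
    sum(gld$contained_doses[is_insulin_dose]) >=
    2 * sum(gld$contained_doses[is_non_insulin_dose])
  any_t1d_primary_diagnoses <- (n_t1d_endocrinology + n_t1d_medical) >= 1
  type_diagnoses_from_endocrinology <-
    (n_t1d_endocrinology + n_t2d_endocrinology) >= 1
  type_diagnosis_majority <- if (type_diagnoses_from_endocrinology) {
    n_t1d_endocrinology > n_t2d_endocrinology
  } else {
    n_t1d_medical > n_t2d_medical
  }
  is_t1d <- (only_insulin_purchases && any_t1d_primary_diagnoses) ||
    (!only_insulin_purchases && any_t1d_primary_diagnoses &&
       type_diagnosis_majority && insulin_is_two_thirds_of_gld_doses &&
       insulin_purchases_within_180_days)
  if (is_t1d) "T1D" else "T2D"
}

gld_a <- data.frame(
  date = as.Date(c("2010-01-10", "2010-03-01")),
  atc = c("A10AB01", "A10AE04"),
  contained_doses = c(100, 100),
  stringsAsFactors = FALSE
)
type_a <- classify_diabetes_type_sketch(gld_a, as.Date("2010-01-01"), 1, 0, 0, 0)

gld_b <- data.frame(
  date = as.Date(c("2010-01-10", "2011-01-01")),
  atc = c("A10BA02", "A10AB01"),
  contained_doses = c(300, 100),
  stringsAsFactors = FALSE
)
type_b <- classify_diabetes_type_sketch(gld_b, as.Date("2010-01-01"), 1, 0, 0, 0)

gld_c <- data.frame(
  date = as.Date(c("2010-01-05", "2010-02-01")),
  atc = c("A10BA02", "A10AB01"),
  contained_doses = c(50, 200),
  stringsAsFactors = FALSE
)
type_c <- classify_diabetes_type_sketch(gld_c, as.Date("2010-01-01"), 3, 1, 0, 0)
```

The three examples come out as follows:

- Individual "a" buys only insulin and has a T1D diagnosis, so they are classified as T1D.
- Individual "b" is classified as T2D because insulin is far less than two-thirds of their GLD doses.
- Individual "c" meets every condition of the second T1D rule, so they are classified as T1D.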

From d294071ba24f2c3a92c2348c8942327cdfb9a73c Mon Sep 17 00:00:00 2001
From: "Luke W. Johnston" <lwjohnst86@users.noreply.github.com>
Date: Wed, 18 Dec 2024 17:50:20 +0100
Subject: [PATCH 24/28] docs: :pencil2: small edits from review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Signe Kirk Brødbæk <40836345+signekb@users.noreply.github.com>
---
 vignettes/function-flow.Rmd | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index c2d1780..4240f1d 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -99,7 +99,7 @@ library(osdc)
 
 #### Hospital diagnoses
 
-**Joining LPR2 and LPR3 data**
+#### Joining LPR2 and LPR3 data
 
 The helper functions `join_lpr2()` and `join_lpr3()` join records of
 diagnoses to administrative information in LPR2-formatted and
@@ -114,7 +114,7 @@ following variables:
 -   `pnr`: identifier variable
 -   `date`: date of the recorded diagnosis (renamed from `d_inddto`)
 -   `specialty`: department specialty (renamed from `c_spec`)
--   `diagnosis`: diagnosis code (renamed from `c_diag`)
+-   `diagnosis_code`: diagnosis code (renamed from `c_diag`)
 -   `diagnosis_type`: diagnosis type (renamed from `c_diagtype`)
 
 `join_lpr3()` takes `diagnoser` and `kontakter` as inputs, filters to
@@ -126,7 +126,7 @@ following variables:
 -   `pnr`: identifier variable (renamed from `cpr`)
 -   `date`: date of the recorded diagnosis (renamed from `dato_start`)
 -   `specialty`: department specialty (renamed from `hovedspeciale_ans`)
--   `diagnosis`: diagnosis code (renamed from `diagnosekode`)
+-   `diagnosis_code`: diagnosis code (renamed from `diagnosekode`)
 -   `diagnosis_type`: diagnosis type (renamed from `diagnosetype`)
 -   `diagnosis_retracted`: if the diagnosis was later retracted (renamed
     from `senere_afkraeftet`)
@@ -135,7 +135,7 @@ These outputs are passed to `include_diabetes_diagnoses()` (and to
 `get_pregnancy_dates()`, see exclusion events) for further processing
 below.
 
-**Processing of diabetes diagnoses**
+#### Processing of diabetes diagnoses
 
 The function `include_diabetes_diagnoses()` uses the hospital contacts
 from LPR2 and LPR3 to include all dates of diabetes diagnoses to use for
@@ -207,7 +207,7 @@ individual):
 
 This output is passed to the `join_inclusions()` function, where the
 `dates` variable is used for the final step of the inclusion process.
-The variables of counts of diabetes type-specific primary diagnoses are
+The counts of diabetes type-specific primary diagnoses (the four columns prefixed `n_` above) are
 carried over for the subsequent classification of diabetes type,
 initially as inputs to the `get_t1d_primary_diagnosis()` and
 `get_majority_of_t1d_diagnoses()` functions.
@@ -232,7 +232,7 @@ the same date.
 two rows for each individual, containing the following variables:
 
 -   `pnr`: identifier variable
--   `dates`: the dates of the first and second diabetes-specific
+-   `date`: the dates of the first and second diabetes-specific
     podiatrist record
 
 The output is passed to the `join_inclusions()` function for the final
@@ -284,7 +284,7 @@ data is available to use for censoring, the extraction window can be
 extended).
 
 This function outputs a long `data.frame` (since all dates of purchases
-must be kept for later use in classifyin diabetes type) with the
+must be kept for later use in classifying diabetes type) with the
 following variables needed later in the classification part of the
 function flow:
 
@@ -318,7 +318,7 @@ This function only performs a filtering operation, and output retains
 the same structure and variables as the input passed from
 `include_gld_purchases()`. After these exclusions are made, the output
 is passed to `exclude_pregnancy()` for further censoring, described
-below:
+below.
 
 #### HbA1c tests and GLD purchases during pregnancy
 
@@ -330,8 +330,8 @@ these may be due to gestational diabetes, rather than type 1 or type 2
 diabetes.
 
 Internally, this relies on the function `get_pregnancy_dates()` that
-uses diagnoses registered in the National Patient Register to extract
-the dates of all recorded pregnancy endings (live births or
+uses diagnoses registered in LPR2 and LPR3 to extract
+the dates of all recorded pregnancy endings (live births and
 miscarriages). These are identified by `diagnosis` values beginning with
 "DO0[0-6]", "DO8[0-4]" or "DZ3[37]". The dates output by
 `get_pregnancy_dates()` are used to exclude all inclusion events
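The pregnancy censoring window excludes inclusion events from 40 weeks before to 12 weeks after each recorded pregnancy ending. It can be sketched in base R as follows. The function name and the `pregnancy_end_date` column are hypothetical, and treating the window boundaries as inclusive is an assumption.

```r
# Hypothetical sketch (not the osdc implementation) of censoring inclusion
# events around pregnancy endings. Assumes `events` has pnr and date (Date)
# columns and `pregnancy_dates` has pnr and pregnancy_end_date (Date) columns.
# Window boundaries are treated as inclusive, which is an assumption.
exclude_pregnancy_sketch <- function(events, pregnancy_dates) {
  in_pregnancy_window <- vapply(seq_len(nrow(events)), function(i) {
    ends <- pregnancy_dates$pregnancy_end_date[pregnancy_dates$pnr == events$pnr[i]]
    # 40 weeks before to 12 weeks after each pregnancy ending, in days.
    any(events$date[i] >= ends - 40 * 7 & events$date[i] <= ends + 12 * 7)
  }, logical(1))
  events[!in_pregnancy_window, , drop = FALSE]
}

events <- data.frame(
  pnr = c("1", "1", "2"),
  date = as.Date(c("2010-01-01", "2012-01-01", "2010-01-01")),
  stringsAsFactors = FALSE
)
pregnancy_dates <- data.frame(
  pnr = "1",
  pregnancy_end_date = as.Date("2010-06-01"),
  stringsAsFactors = FALSE
)
censored_events <- exclude_pregnancy_sketch(events, pregnancy_dates)
```

The January 2010 event for `pnr` "1" lies within 40 weeks before that person's pregnancy ending, so it is removed. Their 2012 event and the event for `pnr` "2" (no recorded pregnancy) are kept.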

From 7bd69cbd449b6c88a7b1786cf3c14f3a4b86651e Mon Sep 17 00:00:00 2001
From: "Luke W. Johnston" <lwjohnst@gmail.com>
Date: Wed, 18 Dec 2024 22:41:47 +0100
Subject: [PATCH 25/28] docs: added lpr_diag algorithm logic to csv

---
 data-raw/algorithm.csv | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/data-raw/algorithm.csv b/data-raw/algorithm.csv
index dd9060e..28425e9 100644
--- a/data-raw/algorithm.csv
+++ b/data-raw/algorithm.csv
@@ -1,4 +1,6 @@
-name,logic
-hba1c,(analysiscode == 'NPU27300' AND value >= 48) OR (analysiscode == 'NPU03835' AND value >= 6.5)
-gld,atc =~ '^A10'
-
+register,name,title,logic,comments
+lab_forsker,hba1c,HbA1c inclusion,(analysiscode == 'NPU27300' AND value >= 48) OR (analysiscode == 'NPU03835' AND value >= 6.5),Uses IFCC units (mmol/mol) for NPU27300 and DCCT units (%) for NPU03835
+lmdb,gld,Glucose-lowering drug inclusion,atc =~ '^A10' & !(atc =~ '^(A10BJ|A10BK01|A10BK03)'),Do not keep GLP-RAs or dapagliflozin/empagliflozin drugs
+lpr_diag,lpr2,LPR2 diabetes and pregnancy diagnosis codes,c_diag =~ '^(DO0[0-6]|DO8[0-4]|DZ3[37]|DE1[0-4]|249|250)' AND (c_diagtype == 'A' OR c_diagtype == 'B'),A c_diagtype of 'A' means a primary diagnosis; 'B' means a secondary diagnosis.
+lpr_diag,lpr2_is_t1d,LPR2 diagnosis codes for T1D,c_diag =~ '^(DE10|249)',
+lpr_diag,lpr2_is_t2d,LPR2 diagnosis codes for T2D,c_diag =~ '^(DE11|250)',
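
The `logic` column uses `AND`, `OR` and `=~` (regular-expression match). `AND` and `OR` are not R operators, and R would parse `=~` as `=` followed by `~`. The strings therefore presumably need translating before `rlang::parse_expr()` can turn them into a usable filter. The base-R sketch below shows one possible translation. `logic_to_expr()` and `eval_logic()` are hypothetical helpers, and the package's `get_algorithm_logic()` may do this differently.

```r
# Hypothetical helper: rewrite an algorithm logic string into an R call.
# Assumes `=~` always has a bare column name on its left and a single-quoted
# pattern on its right, and that AND/OR never occur inside quoted strings.
logic_to_expr <- function(logic) {
  # `column =~ 'pattern'` becomes `grepl('pattern', column)`.
  logic <- gsub(
    "([A-Za-z_][A-Za-z0-9_]*) =~ ('[^']*')", "grepl(\\2, \\1)",
    logic, perl = TRUE
  )
  # Word operators become R's vectorised logical operators.
  logic <- gsub("\\bAND\\b", "&", logic, perl = TRUE)
  logic <- gsub("\\bOR\\b", "|", logic, perl = TRUE)
  str2lang(logic)
}

# Hypothetical helper: evaluate the logic with the data's columns in scope.
eval_logic <- function(data, logic) {
  eval(logic_to_expr(logic), envir = data, enclos = baseenv())
}

gld_logic <- "atc =~ '^A10' & !(atc =~ '^(A10BJ|A10BK01|A10BK03)')"
lmdb <- data.frame(atc = c("A10BA02", "A10BJ02", "A10BK01", "A10BK02", "C09AA01"))
gld_keep <- eval_logic(lmdb, gld_logic)
```

The rewritten string could equally be passed to `rlang::parse_expr()` and injected into `dplyr::filter()` with `!!`, as the package functions already do.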

From 73980ec6de645ab4e7dfb9638340b96986b2fe28 Mon Sep 17 00:00:00 2001
From: "Luke W. Johnston" <lwjohnst@gmail.com>
Date: Wed, 18 Dec 2024 22:42:21 +0100
Subject: [PATCH 26/28] docs: :memo: updated roxygen docs based on text from
 Anders

---
 R/include-gld-purchases.R | 35 ++++++++++++++++++++++++++++-------
 R/include-hba1c.R         | 25 +++++++++++++++++++------
 2 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/R/include-gld-purchases.R b/R/include-gld-purchases.R
index e37b19d..9516884 100644
--- a/R/include-gld-purchases.R
+++ b/R/include-gld-purchases.R
@@ -1,9 +1,30 @@
 #' Include only those who have a purchase of a glucose lowering drug (GLD).
 #'
-#' See [algorithm] for the logic used to filter these patients.
+#' Glucose-lowering drugs that may be used for conditions other than
+#' diabetes, such as GLP-RAs or dapagliflozin/empagliflozin, are not included.
+#' Since the diagnosis code data on pregnancies (see `exclude_pregnancy()`)
+#' is insufficient to perform censoring prior to 1997, `include_gld_purchases()`
+#' only extracts dates from 1997 onward by default (if Medical Birth Register
+#' data is available to use for censoring, the extraction window can be extended).
+#'
+#' @param lmdb The `lmdb` register.
+#'
+#' @return The same type as the input data, default as a [tibble::tibble()], in
+#'   a long format with all dates of purchases kept and the following variables:
+#'
+#'   -   `pnr`: Personal identification variable.
+#'   -   `date`: The dates of all purchases of GLD.
+#'   -   `atc`: The ATC code for the type of drug.
+#'   -   `contained_doses`: The number of doses purchased, in defined daily
+#'       doses (DDD).
+#'   -   `indication_code`: The indication code of the prescription (renamed from
+#'       `indo`).
+#'
+#'   These events are then passed to a chain of exclusion functions:
+#'   `exclude_potential_pcos()` and `exclude_pregnancy()`.
 #'
-#' @return The same type as the input data, default as a [tibble::tibble()].
 #' @keywords internal
+#' @inherit algorithm seealso
 #'
 #' @examples
 #' \dontrun{
@@ -18,16 +39,16 @@ include_gld_purchases <- function(lmdb) {
     column_names_to_lower() |>
     # Use !! to inject the expression into filter.
     dplyr::filter(!!criteria) |>
+    # `volume` is the number of doses contained in the purchased package and
+    # `apk` is the number of packages purchased.
+    dplyr::mutate(contained_doses = .data$volume * .data$apk) |>
     # Keep only the columns we need.
     dplyr::select(
       "pnr",
       # Change to date to work with later functions.
       date = "eksd",
       "atc",
-      "volume",
-      "apk",
-      "indo",
-      "name",
-      "vnr"
+      "contained_doses",
+      indication_code = "indo"
     )
 }
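
The documented behaviour covers ATC filtering, a default 1997 start date, `contained_doses` computed as `volume * apk`, and renaming `eksd` and `indo`. The base-R sketch below illustrates these steps on a toy `lmdb`. The name `sketch_gld_purchases()` and its `start_date` argument are hypothetical, and the diff above does not show where the 1997 cut-off is applied in the package.

```r
# Hypothetical base-R sketch of the documented include_gld_purchases() steps.
# `start_date` is an assumed argument, not part of the package.
sketch_gld_purchases <- function(lmdb, start_date = as.Date("1997-01-01")) {
  keep <- grepl("^A10", lmdb$atc) &
    # Drop GLP-RAs and dapagliflozin/empagliflozin.
    !grepl("^(A10BJ|A10BK01|A10BK03)", lmdb$atc) &
    lmdb$eksd >= start_date
  lmdb <- lmdb[keep, , drop = FALSE]
  data.frame(
    pnr = lmdb$pnr,
    date = lmdb$eksd,
    atc = lmdb$atc,
    # Doses per package (`volume`) times number of packages (`apk`).
    contained_doses = lmdb$volume * lmdb$apk,
    indication_code = lmdb$indo
  )
}

lmdb <- data.frame(
  pnr = c("1", "1", "2", "3"),
  eksd = as.Date(c("1996-12-31", "2000-05-01", "2000-05-01", "2001-01-01")),
  atc = c("A10BA02", "A10BA02", "A10BJ02", "A10AB01"),
  volume = c(100, 100, 10, 50),
  apk = c(2, 2, 1, 3),
  indo = c("0000092", "0000092", "", "")
)
gld <- sketch_gld_purchases(lmdb)
```

The 1996 purchase falls before the default start date and the GLP-RA purchase is excluded by ATC code, so two purchases remain.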
diff --git a/R/include-hba1c.R b/R/include-hba1c.R
index 802acc1..0ebfb64 100644
--- a/R/include-hba1c.R
+++ b/R/include-hba1c.R
@@ -1,24 +1,37 @@
 #' Include only those with HbA1c in the required range.
 #'
 #' In the `lab_forsker` register, NPU27300 is HbA1c in the modern units (IFCC)
-#' while NPU03835 is HbA1c in old units (DCCT).
+#' while NPU03835 is HbA1c in old units (DCCT). Multiple elevated results on the
+#' same day within each individual are deduplicated, to account for the same
+#' test result often being reported twice (once in IFCC and once in DCCT units).
 #'
-#' @param data The `lab_forsker` register.
+#' The output is passed to the `exclude_pregnancy()` function, which
+#' filters out elevated results that may be due to gestational
+#' diabetes.
+#'
+#' @param lab_forsker The `lab_forsker` register.
 #'
 #' @return An object of the same input type, default as a [tibble::tibble()],
-#'   with two columns: `pnr` and `included_hba1c`.
+#'   with three columns:
+#'
+#'   - `pnr`: Personal identification variable.
+#'   - `dates`: The dates of all elevated HbA1c test results.
+#'   - `included_hba1c`: A logical variable indicating that the HbA1c test
+#'     was included. Used as an indicator and reminder in other internal
+#'     functions.
+#'
 #' @keywords internal
 #'
 #' @examples
 #' \dontrun{
 #' register_data$lab_forsker |> include_hba1c()
 #' }
-include_hba1c <- function(data) {
-  verify_required_variables(data, "lab_forsker")
+include_hba1c <- function(lab_forsker) {
+  verify_required_variables(lab_forsker, "lab_forsker")
   criteria <- get_algorithm_logic("hba1c") |>
     # To convert the string into an R expression.
     rlang::parse_expr()
-  data |>
+  lab_forsker |>
     column_names_to_lower() |>
     # Use !! to inject the expression into filter.
     dplyr::filter(!!criteria) |>
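
The base-R sketch below illustrates the cut-offs and same-day deduplication documented for `include_hba1c()`. The function name is hypothetical. The input is simplified to `pnr`, `date`, `analysiscode` and `value` columns, which may not match the raw `lab_forsker` column names.

```r
# Hypothetical sketch: keep elevated HbA1c results (>= 48 mmol/mol for
# NPU27300, >= 6.5% for NPU03835) and keep one row per person per day.
sketch_hba1c <- function(lab) {
  elevated <- (lab$analysiscode == "NPU27300" & lab$value >= 48) |
    (lab$analysiscode == "NPU03835" & lab$value >= 6.5)
  # which() drops rows where the comparison is NA (e.g. missing values).
  out <- unique(lab[which(elevated), c("pnr", "date"), drop = FALSE])
  out$included_hba1c <- rep(TRUE, nrow(out))
  rownames(out) <- NULL
  out
}

lab <- data.frame(
  pnr = c("1", "1", "1", "2"),
  date = as.Date(c("2020-01-01", "2020-01-01", "2020-02-01", "2020-01-01")),
  analysiscode = c("NPU27300", "NPU03835", "NPU27300", "NPU27300"),
  value = c(50, 6.7, 40, 48)
)
hba1c <- sketch_hba1c(lab)
```

The first two rows are the same test reported in both units, so they collapse into a single included date.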

From 94f1f1f17f85c6be13c5b6b49deed5b62e93712e Mon Sep 17 00:00:00 2001
From: "Luke W. Johnston" <lwjohnst@gmail.com>
Date: Wed, 18 Dec 2024 22:43:16 +0100
Subject: [PATCH 27/28] docs: :memo: add roxygen docs to algorithm data object

---
 R/osdc-package.R | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/R/osdc-package.R b/R/osdc-package.R
index 6efccea..96a8e76 100644
--- a/R/osdc-package.R
+++ b/R/osdc-package.R
@@ -29,7 +29,12 @@ utils::globalVariables(".data")
-#' Is a [tibble::tibble()] with two columns:
+#' Is a [tibble::tibble()] with five columns:
 #'
 #' \describe{
+#'  \item{register}{Optional. The register used for this criterion.}
 #'  \item{name}{The inclusion or exclusion criteria name.}
+#'  \item{title}{The title to use when displaying the algorithmic logic in tables.}
 #'  \item{logic}{The logic for the criteria.}
+#'  \item{comments}{Additional comments on the criterion.}
 #' }
+#' @seealso See `vignette("algorithm")` and [algorithm] for the logic used
+#'   to filter these patients.
 "algorithm"
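
The algorithm rows `lpr2_is_t1d` and `lpr2_is_t2d` could fill the placeholder columns of `join_lpr2()` (`is_primary_diagnosis`, `is_t1d`, `is_t2d`, `department`). So could the primary-diagnosis rule and the `c_spec` department rule from the algorithm vignette. The base-R sketch below shows one way to do this. It assumes `lpr_diag` and `lpr_adm` have already been joined by `recnum` and that `c_spec` is stored as a number. `classify_lpr2()` is a hypothetical name.

```r
# Hypothetical sketch: derive classification columns from joined LPR2 records.
classify_lpr2 <- function(lpr2) {
  spec <- lpr2$c_spec
  # 8 is endocrinology; below 8 or 9-30 is "other medical"; anything else NA.
  department <- ifelse(
    spec == 8, "endocrinology",
    ifelse(spec < 8 | (spec >= 9 & spec <= 30), "other medical", NA_character_)
  )
  data.frame(
    pnr = lpr2$pnr,
    date = lpr2$d_inddto,
    is_primary_diagnosis = lpr2$c_diagtype == "A",
    is_t1d = grepl("^(DE10|249)", lpr2$c_diag),
    is_t2d = grepl("^(DE11|250)", lpr2$c_diag),
    department = department
  )
}

lpr2 <- data.frame(
  pnr = c("1", "2", "3"),
  d_inddto = as.Date(c("2010-01-01", "2011-01-01", "2012-01-01")),
  c_diag = c("DE109", "25009", "DE14"),
  c_diagtype = c("A", "B", "A"),
  c_spec = c(8, 20, 50)
)
classified <- classify_lpr2(lpr2)
```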

From cdfcea765d0cf5f6c4a50d2788a820853321bc0d Mon Sep 17 00:00:00 2001
From: "Luke W. Johnston" <lwjohnst@gmail.com>
Date: Wed, 18 Dec 2024 22:43:59 +0100
Subject: [PATCH 28/28] docs: :construction: began moving algorithm logic into
 separate file and created pseudocode

---
 vignettes/algorithm.Rmd     |  88 +++++++
 vignettes/function-flow.Rmd | 469 +++++++++++++++++++-----------------
 2 files changed, 333 insertions(+), 224 deletions(-)
 create mode 100644 vignettes/algorithm.Rmd

diff --git a/vignettes/algorithm.Rmd b/vignettes/algorithm.Rmd
new file mode 100644
index 0000000..92bb2d4
--- /dev/null
+++ b/vignettes/algorithm.Rmd
@@ -0,0 +1,88 @@
+---
+title: "Algorithm"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Algorithm}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+```{r setup}
+library(osdc)
+library(tidyverse)
+```
+
+## `lpr_diag`
+
+```{r, echo=FALSE}
+algorithm |>
+  filter(str_detect(register, "lpr_diag")) |>
+  knitr::kable()
+```
+
+## `lpr_adm`
+
+-   `c_spec` (hospital department) is categorized as "endocrinology" if
+    it equals 8, or as "other medical" if it is less than 8 or between 9
+    and 30.
+
+## `diagnoser`
+
+-   `diagnosekode` starts with "DO0[0-6]", "DO8[0-4]", "DZ3[37]" or
+    "DE1[0-4]".
+    -   Is T1D if `diagnosekode` starts with "DE10".
+    -   Is T2D if `diagnosekode` starts with "DE11".
+-   `diagnosetype` is equal to either "A" or "B".
+    -   Is a primary diagnosis if it equals "A".
+-   `senere_afkraeftet` (if the diagnosis was later retracted) is equal
+    to "Nej".
+
+## `kontakter`
+
+-   `hovedspeciale_ans` (hospital department) is categorized as
+    "endocrinology" if it equals "medicinsk endokrinologi" or as "other
+    medical" if it equals any of "Blandet medicin og kirurgi", "Intern
+    medicin", "Geriatri", "Hepatologi", "Hæmatologi",
+    "Infektionsmedicin", "Kardiologi", "Medicinsk allergologi",
+    "Medicinsk gastroenterologi", "Medicinsk lungesygdomme",
+    "Nefrologi", "Reumatologi", "Palliativ medicin", "Akut medicin",
+    "Dermato-venerologi", "Neurologi", "Onkologi", "Fysiurgi", or
+    "Tropemedicin".
+
+## `lab_forsker`
+
+```{r, echo=FALSE}
+algorithm |>
+  filter(name == "hba1c") |>
+  knitr::kable()
+```
+
+## `sysi` and `sssy`
+
+-   `speciale` starts with "54".
+    -   Alternatively, use `spec2` if available.
+-   `barnmak` is equal to 0 (a non-zero value marks services provided
+    to a child of the individual).
+
+## `lmdb`
+
+```{r, echo=FALSE}
+algorithm |>
+  filter(name == "gld") |>
+  knitr::kable()
+```
+
+## `bef` and only GLD (via `lmdb`)
+
+To remove those with potential polycystic ovary syndrome:
+
+-   `atc` equals "A10BA02" (metformin), `koen` equals 2 (woman), and
+    either the age at purchase (`date` minus `foed_dato`, the birth date)
+    is below 40 years or `indication_code` is "0000092", "0000276" or "0000781".
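
The base-R sketch below implements the PCOS rule above. It joins purchases to `bef`, approximates age as days divided by 365.25, and keeps rows whose sex or birth date is missing. `sketch_exclude_pcos()` is a hypothetical name. The `exclude_potential_pcos()` stub added in this patch does not yet implement the filter.

```r
# Hypothetical sketch of the PCOS exclusion rule.
sketch_exclude_pcos <- function(gld, bef) {
  merged <- merge(gld, bef, by = "pnr", all.x = TRUE)
  # Approximate age in years at the purchase date.
  age <- as.numeric(merged$date - merged$foed_dato) / 365.25
  metformin_woman <- merged$atc == "A10BA02" & merged$koen %in% 2
  pcos_code <- merged$indication_code %in% c("0000092", "0000276", "0000781")
  to_drop <- metformin_woman & (age < 40 | pcos_code)
  # Rows where the rule evaluates to NA (missing data) are kept.
  merged[!(to_drop %in% TRUE), names(gld), drop = FALSE]
}

gld <- data.frame(
  pnr = c("1", "1", "2", "3", "4"),
  date = as.Date(rep("2010-01-01", 5)),
  atc = c("A10BA02", "A10AB01", "A10BA02", "A10BA02", "A10BA02"),
  indication_code = c("", "", "0000092", "", "")
)
bef <- data.frame(
  pnr = c("1", "2", "3", "4"),
  koen = c(2, 2, 2, 1),
  foed_dato = as.Date(c("1980-01-01", "1950-01-01", "1950-01-01", "1990-01-01"))
)
kept <- sketch_exclude_pcos(gld, bef)
```

Here the metformin purchases by the 30-year-old woman (pnr 1) and by the woman with a PCOS indication code (pnr 2) are removed. Her insulin purchase and the other purchases remain.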
diff --git a/vignettes/function-flow.Rmd b/vignettes/function-flow.Rmd
index c9d895f..b22de3d 100644
--- a/vignettes/function-flow.Rmd
+++ b/vignettes/function-flow.Rmd
@@ -90,235 +90,257 @@ exclusion events, respectively). Uncoloured boxes are helper functions
 that get or extract a condition or joins data or function
 outputs.](images/function-flow-population.svg)
 
-### Inclusion events
+## Inclusion events
 
 ```{r, include=FALSE}
 library(dplyr)
 library(osdc)
 ```
 
-#### Hospital diagnoses
-
-#### Joining LPR2 and LPR3 data
-
-The helper functions `join_lpr2()` and `join_lpr3()` join records of
-diagnoses to administrative information in LPR2-formatted and
-LPR3-formatted data, respectively.
-
-`join_lpr2()` takes `lpr_diag` and `lpr_adm` as inputs, filters to the
-necessary diagnoses (`c_diag` starting with "DO0[0-6]", "DO8[0-4]",
-"DZ3[37]", "DE1[0-4]", "249", or "250"), joins the required information
-by record number (`recnum`), and outputs a `data.frame` with the
-following variables:
-
--   `pnr`: identifier variable
--   `date`: date of the recorded diagnosis (renamed from `d_inddto`)
--   `specialty`: department specialty (renamed from `c_spec`)
--   `diagnosis_code`: diagnosis code (renamed from `c_diag`)
--   `diagnosis_type`: diagnosis type (renamed from `c_diagtype`)
-
-`join_lpr3()` takes `diagnoser` and `kontakter` as inputs, filters to
-the necessary diagnoses (`diagnosekode` starting with "DO0[0-6]",
-"DO8[0-4]", "DZ3[37]" or "DE1[0-4]"), joins the required information by
-record number (`dw_ek_kontakt`), and outputs a `data.frame` with the
-following variables:
-
--   `pnr`: identifier variable (renamed from `cpr`)
--   `date`: date of the recorded diagnosis (renamed from `dato_start`)
--   `specialty`: department specialty (renamed from `hovedspeciale_ans`)
--   `diagnosis_code`: diagnosis code (renamed from `diagnosekode`)
--   `diagnosis_type`: diagnosis type (renamed from `diagnosetype`)
--   `diagnosis_retracted`: if the diagnosis was later retracted (renamed
-    from `senere_afkraeftet`)
-
-These outputs are passed to `include_diabetes_diagnoses()` (and to
-`get_pregnancy_dates()`, see exclusion events) for further processing
-below.
-
-#### Processing of diabetes diagnoses
-
-The function `include_diabetes_diagnoses()` uses the hospital contacts
-from LPR2 and LPR3 to include all dates of diabetes diagnoses to use for
-inclusion, as well as additional information needed to classify diabetes
-type. Diabetes diagnoses from both ICD-8 and ICD-10 are included.
-
-The function takes the outputs of `join_lpr2()` and `join_lpr3()` as
-inputs and processes each input separately to generate the following
-internal variables:
-
--   From `join_lpr2`:
-    -   `pnr`: identifier variable
-    -   `date`: dates of all included diabetes diagnoses:
-    -   registered as primary (A) or secondary (B) diagnoses, regardless
-        of type or department:
-        -   Keep rows where `diagnosis` starts with "DE1[0-4]", "249" or
-            "250", and `diagnosis_type` is either "A" or "B"
-    -   `is_primary`: Define whether the diagnosis was a primary
-        diagnosis (`diagnosis_type` == "A")
-    -   `is_t1d`: Define whether the diagnosis was T1D-specific
-        (`diagnosis` starts with "DE10" or "249")
-    -   `is_t2d`: Define whether the diagnosis was T2D-specific
-        (`diagnosis` starts with "DE11" or "250")
-    -   `department`: Define whether the diagnosis was made made by an
-        endocrinological (if `specialty` == 8 then `department` ==
-        "endocrinology") or other medical department (if `specialty` \<
-        8 or 9-30 then `department` == "other medical")
--   From `join_lpr3()`:
-    -   `pnr`: identifier variable
-    -   `date`: dates of all included diabetes diagnoses:
-    -   registered as primary (A) or secondary (B) diagnoses, regardless
-        of type or department, but exclude retracted diagnoses:
-        -   Keep rows where `diagnosis` starts with "DE1[0-4]",
-            `diagnosis_type` is either "A" or "B" and
-            `diagnosis_retracted` == "Nej"
-    -   `is_primary`: Define whether the diagnosis was a primary
-        diagnosis (`diagnosis_type` == "A")
-    -   `is_t1d`: Define whether the diagnosis was T1D-specific
-        (`diagnosis` starts with "DE10")
-    -   `is_t2d`: Define whether the diagnosis was T2D-specific
-        (`diagnosis` starts with "DE11")
-    -   `department`: Define whether the diagnosis was made made by an
-        endocrinological department (if `specialty` == "medicinsk
-        endokrinologi" then `department` == "endocrinology") or other
-        medical department (if `specialty` is any of "Blandet medicin og
-        kirurgi", "Intern medicin", "Geriatri", "Hepatologi",
-        "Hæmatologi", "Infektionsmedicin", "Kardiologi", "Medicinsk
-        allergologi", "Medicinsk gastroenterologi", "Medicinsk
-        lungesygdomme", "Nefrologi", "Reumatologi", "Palliativ medicin",
-        "Akut medicin", "Dermato-venerologi", "Neurologi", "Onkologi",
-        "Fysiurgi", or "Tropemedicin" then `department` == "other
-        medical")
-
-Internally, these intermediate results are combined and processed
-together. And ultimately, `include_diabetes_diagnoses()` outputs a
-single `data.frame` with the following variables (up to two rows per
-individual):
-
--   `pnr`: identifier variable
--   `dates`: dates of the first and second hospital diabetes diagnosis
--   `n_t1d_endocrinology`: number of type 1 diabetes-specific primary
-    diagnosis codes from endocrinological departments
--   `n_t2d_endocrinology`: number of type 2 diabetes-specific primary
-    diagnosis codes from endocrinological departments
--   `n_t1d_medical`: number of type 1 diabetes-specific primary
-    diagnosis codes from medical departments
--   `n_t2d_medical`: number of type 2 diabetes-specific primary
-    diagnosis codes from medical departments
-
-This output is passed to the `join_inclusions()` function, where the
-`dates` variable is used for the final step of the inclusion process.
-The variables of counts of diabetes type-specific primary diagnoses (the four columns prefixed `n_` above) are
-carried over for the subsequent classification of diabetes type,
-initially as inputs to the `get_t1d_primary_diagnosis()` and
-`get_majority_of_t1d_diagnoses()` functions.
-
-#### Diabetes-specific podiatrist services
-
-The function `include_podiatrist_services()` uses `sysi` or `sssy` as
-input to extract the dates of all diabetes-specific podiatrist services.
-
-These dates are extracted by filtering values beginning with "54" in the
-`speciale` variable of the `sssy` and `sysi` registers by default
-(alternatively, the function can take the `spec2` variable as input
-instead, if that is the data available to the user). In addition,
-services provided to a child of the individual (`barnmak` != 0) are
-excluded using the `barnmak` variable. An internal helper function
-`get_unique_honuge_dates()` is applied to generate a proper date
-variable based on the year-week (wwyy-formatted) variable (`honuge`)
-found in the raw data, and de-duplicates multiple services registered on
-the same date.
-
-`include_podiatrist_services()` outputs a 2-column data frame with up to
-two rows for each individual, containing the following variables:
-
--   `pnr`: identifier variable
--   `date`: the dates of the first and second diabetes-specific
-    podiatrist record
-
-The output is passed to the `join_inclusions()` function for the final
-step of the inclusion process.
-
-#### HbA1c tests above the diagnosis cut-off value (48 mmol/mol or 6.5%)
-
-The function `include_hba1c()` uses `lab_forsker` as the input data to
-extract the dates of all elevated HbA1c test results, using the
-appropriate cut-offs:
-
--   IFCC units: `analysiscode` NPU27300, any `value` $\geq$ 48 mmol/mol
--   DCCT units: `analysiscode` NPU03835: any `value` $\geq$ 6.5% .
-
-```{r, echo=FALSE}
-algorithm |>
-  filter(name == "hba1c") |>
-  knitr::kable(caption = "Algorithm used in the implementation for including HbA1c.")
+### `join_lpr2()`
+
+```{r}
+#' Process and join the two LPR2 registers to extract diabetes diagnoses data.
+#'
+#' The output is used as input to `include_diabetes_diagnoses()` (and to
+#' `get_pregnancy_dates()`, see exclusion events).
+#'
+#' @param lpr_diag The LPR2 register containing diabetes diagnoses.
+#' @param lpr_adm The LPR2 register containing hospital admissions.
+#'
+#' @return The same type as the input data, default as a [tibble::tibble()],
+#'  with the following columns:
+#'
+#'  -   `pnr`: The personal identification variable.
+#'  -   `date`: The dates of all recorded diagnoses (renamed from `d_inddto`).
+#'  -   `is_primary_diagnosis`: Whether the diagnosis was a primary diagnosis.
+#'  -   `is_t1d`: Whether the diagnosis was T1D-specific.
+#'  -   `is_t2d`: Whether the diagnosis was T2D-specific.
+#'  -   `department`: Whether the diagnosis was made by an endocrinology
+#'      or other medical department.
+#'
+#' @keywords internal
+#' @inherit algorithm seealso
+#'
+#' @examples
+#' join_lpr2(
+#'   lpr_diag = register_data$lpr_diag,
+#'   lpr_adm = register_data$lpr_adm
+#' )
+join_lpr2 <- function(lpr_diag, lpr_adm) {
+  # Filter using the algorithm for LPR2
+  lpr_diag |>
+    dplyr::inner_join(lpr_adm, by = "recnum") |>
+    dplyr::select(
+      "pnr",
+      date = "d_inddto"
+      # is_primary_diagnosis =
+      # is_t1d =
+      # is_t2d =
+      # department =
+    )
+}
 ```
 
-Multiple elevated results on the same day within each individual are
-deduplicated, to account for the same test result often being reported
-twice (one for IFCC, one for DCCT units).
-
-`include_hba1c()` outputs a 2-column data frame containing the following
-variables:
-
--   `pnr`: identifier variable
--   `dates`: the dates of all elevated HbA1c test results
-
-The output is passed to the `exclude_pregnancy()` function for censoring
-of elevated results due to potential gestational diabetes (see below).
-
-#### GLD purchases
-
-The function `include_gld_purchases()` uses `lmdb` to extract the dates
-of all GLD purchases.
+### `join_lpr3()`
+
+```{r}
+#' Process and join the two LPR3 registers to extract diabetes diagnoses data.
+#'
+#' The output is used as input to `include_diabetes_diagnoses()` (and to
+#' `get_pregnancy_dates()`, see exclusion events).
+#'
+#' @param diagnoser The LPR3 register containing diabetes diagnoses.
+#' @param kontakter The LPR3 register containing hospital contacts/admissions.
+#'
+#' @return The same type as the input data, default as a [tibble::tibble()],
+#'  with the following columns:
+#'
+#'  -   `pnr`: The personal identification variable.
+#'  -   `date`: The dates of all recorded diagnoses (renamed from `dato_start`).
+#'  -   `is_primary_diagnosis`: Whether the diagnosis was a primary
+#'      diagnosis.
+#'  -   `is_t1d`: Whether the diagnosis was T1D-specific.
+#'  -   `is_t2d`: Whether the diagnosis was T2D-specific.
+#'  -   `department`: Whether the diagnosis was made by an
+#'      endocrinology or other medical department.
+#'
+#' @keywords internal
+#' @inherit algorithm seealso
+#'
+#' @examples
+#' join_lpr3(
+#'   diagnoser = register_data$diagnoser,
+#'   kontakter = register_data$kontakter
+#' )
+join_lpr3 <- function(diagnoser, kontakter) {
+  # Filter using the algorithm for LPR3
+  diagnoser |>
+    dplyr::inner_join(kontakter, by = "dw_ek_kontakt") |>
+    dplyr::select(
+      "pnr" = "cpr",
+      "date" = "dato_start"
+      # is_primary_diagnosis =
+      # is_t1d =
+      # is_t2d =
+      # department =
+    )
+}
+```
 
-These dates are extracted by including all values beginning with "A10"
-in the `atc` variable of the `lmdb` register, except for
-glucose-lowering drugs that may be used for other conditions than
-diabetes: GLP-RAs (`atc` start with "A10BJ") or
-dapagliflozin/empagliflozin (`atc` = "A10BK01" or "A10BK03").
+### `include_diabetes_diagnosis()`
+
+```{r}
+#' Include diabetes diagnoses from LPR2 and LPR3.
+#'
+#' Uses the hospital contacts from LPR2 and LPR3 to include all dates of diabetes
+#' diagnoses for inclusion, plus the additional information needed to classify
+#' diabetes type. Diabetes diagnoses from both ICD-8 and ICD-10 are included.
+#'
+#' The output is passed to the `join_inclusions()` function, where the
+#' `dates` variable is used for the final step of the inclusion process.
+#'
+#' The counts of diabetes type-specific primary diagnoses (the four
+#' columns prefixed `n_`, described under the return value) are carried
+#' over for the subsequent classification of diabetes type, initially as
+#' inputs to the `get_t1d_primary_diagnosis()` and
+#' `get_majority_of_t1d_diagnoses()` functions.
+#'
+#' @param lpr2 The output from `join_lpr2()`.
+#' @param lpr3 The output from `join_lpr3()`.
+#'
+#' @return The same type as the input data, default as a [tibble::tibble()],
+#'  with the following columns and up to two rows per individual:
+#'
+#'  -   `pnr`: The personal identification variable.
+#'  -   `dates`: The dates of the first and second hospital diabetes diagnosis.
+#'  -   `n_t1d_endocrinology`: The number of type 1 diabetes-specific primary
+#'      diagnosis codes from endocrinology departments.
+#'  -   `n_t2d_endocrinology`: The number of type 2 diabetes-specific primary
+#'      diagnosis codes from endocrinology departments.
+#'  -   `n_t1d_medical`: The number of type 1 diabetes-specific primary
+#'      diagnosis codes from medical departments.
+#'  -   `n_t2d_medical`: The number of type 2 diabetes-specific primary
+#'      diagnosis codes from medical departments.
+#'
+#' @keywords internal
+#' @inherit algorithm seealso
+#'
+#' @examples
+#' include_diabetes_diagnosis(
+#'   lpr2 = join_lpr2(register_data$lpr_diag, register_data$lpr_adm),
+#'   lpr3 = join_lpr3(register_data$diagnoser, register_data$kontakter)
+#' )
+include_diabetes_diagnosis <- function(lpr2, lpr3) {
+  # Combine and process the two inputs
+  lpr2 |>
+    dplyr::bind_rows(lpr3) |>
+    dplyr::select(
+      "pnr",
+      "dates" = "date"
+      # n_t1d_endocrinology =
+      # n_t2d_endocrinology =
+      # n_t1d_medical =
+      # n_t2d_medical =
+    )
+}
+```
 
-Since the diagnosis code data on pregnancies (see below) is insufficient
-to perform censoring prior to 1997, `include_gld_purchases()` only
-extracts dates from 1997 onward by default (if Medical Birth Register
-data is available to use for censoring, the extraction window can be
-extended).
+### `include_podiatrist_services()`
+
+```{r}
+#' Include diabetes-specific podiatrist services.
+#'
+#' Uses the `sysi` and `sssy` registers as input to extract the dates of all
+#' diabetes-specific podiatrist services, removing duplicate services on the
+#' same date. The output is passed to the `join_inclusions()` function for
+#' the final step of the inclusion process.
+#'
+#' @param sssy,sysi The `sssy` and `sysi` registers.
+#'
+#' @return The same type as the input data, default as a [tibble::tibble()],
+#'   with two columns and up to two rows for each individual:
+#'
+#'   -   `pnr`: The personal identification variable.
+#'   -   `date`: The dates of the first and second diabetes-specific
+#'       podiatrist records.
+#'
+#' @keywords internal
+#' @inherit algorithm seealso
+#'
+#' @examples
+#' include_podiatrist_services(register_data$sssy, register_data$sysi)
+include_podiatrist_services <- function(sssy, sysi) {
+  # Filter using the algorithm for podiatrist services
+  sssy |>
+    dplyr::bind_rows(sysi) |>
+    # Filtering...
+    dplyr::mutate(
+      date = wwyy_to_yyyymmdd(.data$honuge)
+    ) |>
+    dplyr::select("pnr", "date") |>
+    # Remove multiple services on the same date
+    dplyr::distinct()
+}
+```
 
-This function outputs a long `data.frame` (since all dates of purchases
-must be kept for later use in classifying diabetes type) with the
-following variables needed later in the classification part of the
-function flow:
+```{r}
+#' Converts the "WWYY" date format to the ISO 8601 standard date format.
+#'
+#' Since the original date format ("WWYY") doesn't include a day, we assume it
+#' would be the first day of that week.
+#'
+#' @param date The date variable in the format "WWYY".
+#'
+#' @returns A character vector of dates in the format "YYYY-MM-DD".
+#' @keywords internal
+#' @inherit algorithm seealso
+#'
+#' @examples
+#' wwyy_to_yyyymmdd(c("0452", "5302", "3232"))
+wwyy_to_yyyymmdd <- function(date) {
+  # Process the honuge variable to get a proper date variable
+  date
+}
+```
 
--   `pnr`: identifier variable
--   `date`: dates of all purchases of GLD (renamed from `eksd`)
--   `atc`: type of drug
--   `contained_doses`: amount purchased, in number of defined daily
-    doses (DDD). Calculated as `volume` (doses contained in the
-    purchased package) times `apk` (number of packages purchased)
--   `indication_code`: indication code of the prescription (renamed from
-    `indo`)
-
-These events are then passed to a chain of exclusion functions:
-`exclude_potential_pcos()` and `exclude_pregnancy()` described in the
-sections below.
-
-### Exclusion events
-
-#### Metformin purchases potentially for the treatment of polycystic ovary syndrome
-
-The function `exclude_potential_pcos()` takes the output from
-`include_gld_purchases()` and `bef` (information on sex and date of
-birth) as inputs and censors (filters out) all purchases of metformin in
-women below age 40 at the date of purchase (`atc` = "A10BA02" & `sex` =
-"woman" & age at purchase (`date`-`date_of_birth`) \< 40 years) or an
-indication code suggesting the prescription was made for treatment of
-polycystic ovary syndrome (`atc` = "A10BA02" & `sex` = "woman" &
-`indication_code` either of "0000092", "0000276" or "0000781").
-
-This function only performs a filtering operation, and output retains
-the same structure and variables as the input passed from
-`include_gld_purchases()`. After these exclusions are made, the output
-is passed to `exclude_pregnancy()` for further censoring, described
-below.
+### `include_hba1c()`
+
+See `?include_hba1c` for more information.
+
+### `include_gld_purchases()`
+
+See `?include_gld_purchases` for more information.
+
+## Exclusion events
+
+### `exclude_potential_pcos()`
+
+```{r}
+#' Exclude metformin purchases potentially for the treatment of polycystic ovary syndrome.
+#'
+#' Takes the output from `include_gld_purchases()` and `bef` (information on sex and date of birth) to perform the exclusions.
+#' This function only performs a filtering operation, so it outputs the same structure and variables as the input from `include_gld_purchases()`.
+#' After these exclusions are made, the output is used by `exclude_pregnancy()`.
+#'
+#' @param gld_purchases The output from `include_gld_purchases()`.
+#' @param bef The `bef` register.
+#'
+#' @return The same type as the input data, default as a [tibble::tibble()]. Also has the same columns as `include_gld_purchases()`.
+#' @keywords internal
+#' @inherit algorithm seealso
+#'
+#' @examples
+#' exclude_potential_pcos(
+#'   gld_purchases = include_gld_purchases(register_data$lmdb),
+#'   bef = register_data$bef
+#' )
+exclude_potential_pcos <- function(gld_purchases, bef) {
+  # Filter using the algorithm for potential PCOS
+  gld_purchases |>
+    dplyr::left_join(bef, by = "pnr")
+}
+```
 
-#### HbA1c tests and GLD purchases during pregnancy
+### HbA1c tests and GLD purchases during pregnancy
 
@@ -330,13 +352,12 @@ these may be due to gestational diabetes, rather than type 1 or type 2
 diabetes.
 
 Internally, this relies on the function `get_pregnancy_dates()` that
-uses diagnoses registered in LPR2 and LPR3 to extract
-the dates of all recorded pregnancy endings (live births and
-miscarriages). These are identified by `diagnosis` values beginning with
-"DO0[0-6]", "DO8[0-4]" or "DZ3[37]". The dates output by
-`get_pregnancy_dates()` are used to exclude all inclusion events
-registered between 40 weeks before and 12 weeks after a pregnancy
-ending.
+uses diagnoses registered in LPR2 and LPR3 to extract the dates of all
+recorded pregnancy endings (live births and miscarriages). These are
+identified by `diagnosis` values beginning with "DO0[0-6]", "DO8[0-4]"
+or "DZ3[37]". The dates output by `get_pregnancy_dates()` are used to
+exclude all inclusion events registered between 40 weeks before and 12
+weeks after a pregnancy ending.
 
 After these exclusion functions have been applied, the output serves as
 inputs to two sets of functions: