Separating individual functions in cohort diagnostics #1043

mdlavallee92 · 2023-04-04T21:29:10Z

A use case I have for cohort diagnostics is running one of the low-level methods of cohort diagnostics such as getIncidenceRate or getVisitContext for a single cohort instead of executing the entire executeDiagnostics or even for a set of cohorts. Is it within the development scope of Cohort Diagnostics to expose these individual functions?

for example:

#CDM variables
connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "postgresql", 
  server = keyring::key_get("cdm_server"), 
  user = keyring::key_get("cdm_user"), 
  password = keyring::key_get("cdm_password"), 
  port = keyring::key_get("cdm_port"), 
)
cdmDatabaseSchema <- keyring::key_get("cdmSchema")
vocabularyDatabaseSchema <- keyring::key_get("cdmSchema")
cohortDatabaseSchema <- keyring::key_get("writeSchema")
cohortTable <- "cohort"

connectionTest <- DatabaseConnector::connect(connectionDetails)


#create cohort tables 
cohortTableNames <- CohortGenerator::getCohortTableNames(cohortTable = cohortTable)
# Next create the tables on the database
CohortGenerator::createCohortTables(
  connection = connectionTest,
  cohortTableNames = cohortTableNames,
  cohortDatabaseSchema = cohortDatabaseSchema,
  incremental = FALSE
)
# Import cohort csv (id | json | sql)
cohortDefinitionSet <- readr::read_csv("cohortsToCreate.csv")

# Generate the cohort set
CohortGenerator::generateCohortSet(
  connection = connectionTest,
  cdmDatabaseSchema = cdmDatabaseSchema,
  cohortDatabaseSchema = cohortDatabaseSchema,
  cohortTableNames = cohortTableNames,
  cohortDefinitionSet = cohortDefinitionSet,
  incremental = FALSE
)

# get incidence rate
ir <- getIncidenceRate(
  connection = connectionTest,
  cohortDatabaseSchema = cohortDatabaseSchema,
  cohortTable = cohortTableNames$cohort,
  cdmDatabaseSchema = cdmDatabaseSchema,
  vocabularyDatabaseSchema = vocabularyDatabaseSchema,
  tempEmulationSchema = NULL,
  cohortId = cohortDefinitionSet$cohortId
)

plotIr(ir) # my example plotting function for report

The text was updated successfully, but these errors were encountered:

mdlavallee92 · 2023-04-04T21:35:34Z

@azimov we talked about this idea back at the symposium and I have revisited it now. In a fork I have individualized some of these functions. I have tried to add unit tests to the proposed exposed functions. However I am encountering an issue in the testing environment with SqlRender::loadRenderTranslateSql.

Take this code-block. In the testing environment it returns an error even though I know this file exists.

Cannot find 'GetCalendarYearRange.sql' in the 'sql' or 'sql/sql_server' folder of the 'CohortDiagnostics' package.

To avoid this issue in run tests, I unpacked the SqlRender::loadRenderTranslateSql function example below:

  sql <- fs::path_package("CohortDiagnostics", "sql/sql_server/GetCalendarYearRange.sql") %>%
    readr::read_file() %>%
    SqlRender::render(cdm_database_schema = cdmDatabaseSchema) %>%
    SqlRender::translate(targetDialect = connection@dbms)

but that would require editing a lot of code-blocks in CohortDiagnostics. Have you encountered this issue in tests?

azimov · 2023-04-04T23:20:28Z

@mdlavallee92 Is this using devtools::test? devtools::load_all will break the SqlRender loader for some reason related to how it finds the files.

An easy workaround on linux/mac is to create the symbolic link ln -s inst/sql sql or a shortcut folder from inst/sql to the root folder of the project tree.

Doing this should also work with devtools::load_all()

If you're using RCmd check in R studio (a setting you must adjust, it will try devtools by default I think) the tests should then normally find the SQL files.

I was unaware of fs::path_package adding this into SqlRender directly may fix this issue in place of system.file.

azimov · 2023-04-04T23:27:59Z

@mdlavallee92 - To the main point of this question. I do intend to expose the functions for use outside of the package, however I can't give a clear timeline. I would also vastly prefer to write unit tests for all the functions directly because the current test suite takes a long time to run and errors are often not especially meaningful.

So the steps I would like to see are:

Decide on which functions should be exposed
Add checkmate parameter checking for all functions to expose
Add all required docstrings
Add good unit tests for these functions (ideally ones that actually run quickly)

Until then It should be possible for you to call these functions with triple colon calls,::: e.g. CohortDiagnostics:::getIncidenceRate(...) until this happens. But doing this it will not be possible to guarantee any kind of API stability.

mdlavallee92 · 2023-04-05T13:18:09Z

@mdlavallee92 Is this using devtools::test? devtools::load_all will break the SqlRender loader for some reason related to how it finds the files.

An easy workaround on linux/mac is to create the symbolic link ln -s inst/sql sql or a shortcut folder from inst/sql to the root folder of the project tree.

Doing this should also work with devtools::load_all()

If you're using RCmd check in R studio (a setting you must adjust, it will try devtools by default I think) the tests should then normally find the SQL files.

I was unaware of fs::path_package adding this into SqlRender directly may fix this issue in place of system.file.

Yeah that is what I am talking about the devtools::test. I tried to swap out system.file with fs::path_package in a SqlRender fork and it still leads to the same issue, not being able to find the file. Probably an issue with devtools and not SqlRender. It seems either function (system.file or fs::path_package ) will work if placed directly in the test but not if sourcing from a function from the SqlRender package, weird. I am working on a windows computer but I will try the shortcut folder see if that works within the tests. Thanks! Back to the primary issue I can help you with this checklist.

mdlavallee92 · 2023-04-05T14:34:07Z

Context
When I have used CohortDiagnostics (CD), I think of it as a pipeline rather than an R package because it a) takes in settings, b) runs the options based on the input settings, c) is "chatty" meaning it tells the user what happened and d) dumps output to a folder to lend itself to a post-processing/visualization step (i.e. your cool RMM pkg). The reason I bring this up is that I think it would be helpful to provide users access to the "worker" methods so they can layer their diagnostics pipeline. For example, I want to run incidence, concept set diagnostics and cohort relationship on a few cohorts. Maybe I want to add my custom diagnostic step to this. I can access the suggested "workers" and build my own pipeline script. By saying this, I don't intend to suggest scrapping the global execute executeDiagnostics. This function is very useful and I use it a lot already since it does everything for me. It is also a target output for your very cool RMM package making it easy to build a dashboad off a few lines of code. Going back to the pipeline mindset, I think this would help unit testing, debugging and also usership. Thinking of the pipeline also gives me ideas of what to target in terms of key functions to expose.

Pipeline

From the executeDiagnostics wrapper, these are the steps I see:

Setup - accumulate settings, checkmate, start logger, prep cohortDefinitionSet
Cdm Info
ObservationPeriod Range
Create concept table - note this is needed to run the concept set diagnostics and visit context
Get Cohort Counts - single function comes from CohortGenerator
Get Inclusion Stats - single function comes from CohortGenerator::getCohortStats
Run Concept Set Diagnostics
- runIncludedSourceConcept
- runOrphanConcepts
- runBreakdownEvents
Get time series
Get Visit Context - note relies on concept set diagnostics run and concept table temp
Get Incidence Rate
Get Cohort Relationship (overlap in the shiny)
Cohort Characterization
- run baseline
- run temporal (your chronograph looking tables, -999 to 999 time windows)
Clean-up - drop temp concept table, write tables as csv and zip

In steps 5-12 there are other wrappers to functionalize the "worker" method on multiple cohort definitions, prep them for saving, provide logging information and export. There are a few exceptions such as
cohort relationship which requires the enitre set to work.

Suggestion: Expose the singular diagnostic methods in the pipeline

Some of these methods are already exposed (runCohortTimeSeriesDiagnostics), others have a dedictated function but are not exposed or documented (i.e.getIncidenceRate) and others are nested within other functions (i.e. concept set diagnostic functions). This would be my list of functions to expose:

get ObservationPeriod range (nested in executeDiagnostics)
prep Concept table (nested in executeDiagnostics and runConceptSetDiagnostics) - note this is a dependency for the concept set diagnostics and visit context steps, should be handled as a special case
getIncidenceRate (function exists, not exposed)
getVisitContext (function exists, not exposed)
runCohortTimeSeriesDiagnostics (already done)
runCohortRelationshipDiagnostics (already done)
runIncludedSourceConcepts (nested in runConceptSetDiagnostics)
runBreakdownIndexEvents (nested in runConceptSetDiagnostics)
runOrphanConcepts (nested in runConceptSetDiagnostics)

Other steps in the pipeline are already functions from other packages....cohorts counts and inclusion stats from Cohort Generator and Cohort Characterization from FeatureExtraction

What do you think @azimov?

ablack3 · 2023-04-05T15:09:14Z

I was unaware of fs::path_package adding this into SqlRender directly may fix this issue in place of system.file.

You could also try pkgload:::shim_system.file() in loadRenderTranslateSql which should work with devtools::load_all.

mdlavallee92 · 2023-04-05T20:14:21Z

Here is the start of the idea I have, needs some further polishing (checkmates etc) but lot of the code stays the same. Just a matter of disentangling the dependencies, particularly with the concept set diagnostics.

In the process of building the unit tests to make sure these work in isolation. Started testing in Eunomia on cohort 17492 from the cohorts to create tests you already have in there.

Take a look and see what you think! No worries if this is in the wrong direction.

azimov · 2023-04-05T23:29:05Z

@mdlavallee92 I completley agree with the principle of separating this out into distinct parts of a pipeline. Part of the reasoning behind RMM was to allow people to begin extending Ohdsi result sets in a consistent manner.

I can imagine someone extending the pipeline of CohortDiagnostics pretty trivially by adding their own diagnostics functions that they maybe aren't ready (or willing) to share with the wider world but want to include in their results.

Just adding a function and exporting the results according to their model specification and uploaded. With the changes to OhdsiShinyModules and Jenna's ShinyAppBuilder package you can customize which shiny views are added and add your own custom modules that should slot in. Indeed, this is part of the thinking behind the Strageus package, though I expect that the implementation there will not currently meet these needs.

mdlavallee92 added the question Further information is requested label Apr 4, 2023

azimov added the v 4.0.0 label Oct 1, 2024

azimov added this to the Release version 4.0.0 milestone Oct 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Separating individual functions in cohort diagnostics #1043

Separating individual functions in cohort diagnostics #1043

mdlavallee92 commented Apr 4, 2023 •

edited

Loading

mdlavallee92 commented Apr 4, 2023

azimov commented Apr 4, 2023

azimov commented Apr 4, 2023

mdlavallee92 commented Apr 5, 2023

mdlavallee92 commented Apr 5, 2023

ablack3 commented Apr 5, 2023 •

edited

Loading

mdlavallee92 commented Apr 5, 2023

azimov commented Apr 5, 2023

Separating individual functions in cohort diagnostics #1043

Separating individual functions in cohort diagnostics #1043

Comments

mdlavallee92 commented Apr 4, 2023 • edited Loading

mdlavallee92 commented Apr 4, 2023

azimov commented Apr 4, 2023

azimov commented Apr 4, 2023

mdlavallee92 commented Apr 5, 2023

mdlavallee92 commented Apr 5, 2023

ablack3 commented Apr 5, 2023 • edited Loading

mdlavallee92 commented Apr 5, 2023

azimov commented Apr 5, 2023

mdlavallee92 commented Apr 4, 2023 •

edited

Loading

ablack3 commented Apr 5, 2023 •

edited

Loading