Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separating individual functions in cohort diagnostics #1043

Open
mdlavallee92 opened this issue Apr 4, 2023 · 8 comments
Open

Separating individual functions in cohort diagnostics #1043

mdlavallee92 opened this issue Apr 4, 2023 · 8 comments
Labels
question Further information is requested v 4.0.0

Comments

@mdlavallee92
Copy link
Collaborator

mdlavallee92 commented Apr 4, 2023

A use case I have for cohort diagnostics is running one of the low-level methods of cohort diagnostics such as getIncidenceRate or getVisitContext for a single cohort instead of executing the entire executeDiagnostics or even for a set of cohorts. Is it within the development scope of Cohort Diagnostics to expose these individual functions?

for example:

#CDM variables
connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "postgresql", 
  server = keyring::key_get("cdm_server"), 
  user = keyring::key_get("cdm_user"), 
  password = keyring::key_get("cdm_password"), 
  port = keyring::key_get("cdm_port"), 
)
cdmDatabaseSchema <- keyring::key_get("cdmSchema")
vocabularyDatabaseSchema <- keyring::key_get("cdmSchema")
cohortDatabaseSchema <- keyring::key_get("writeSchema")
cohortTable <- "cohort"

connectionTest <- DatabaseConnector::connect(connectionDetails)


#create cohort tables 
cohortTableNames <- CohortGenerator::getCohortTableNames(cohortTable = cohortTable)
# Next create the tables on the database
CohortGenerator::createCohortTables(
  connection = connectionTest,
  cohortTableNames = cohortTableNames,
  cohortDatabaseSchema = cohortDatabaseSchema,
  incremental = FALSE
)
# Import cohort csv (id | json | sql)
cohortDefinitionSet <- readr::read_csv("cohortsToCreate.csv")

# Generate the cohort set
CohortGenerator::generateCohortSet(
  connection = connectionTest,
  cdmDatabaseSchema = cdmDatabaseSchema,
  cohortDatabaseSchema = cohortDatabaseSchema,
  cohortTableNames = cohortTableNames,
  cohortDefinitionSet = cohortDefinitionSet,
  incremental = FALSE
)

# get incidence rate
ir <- getIncidenceRate(
  connection = connectionTest,
  cohortDatabaseSchema = cohortDatabaseSchema,
  cohortTable = cohortTableNames$cohort,
  cdmDatabaseSchema = cdmDatabaseSchema,
  vocabularyDatabaseSchema = vocabularyDatabaseSchema,
  tempEmulationSchema = NULL,
  cohortId = cohortDefinitionSet$cohortId
)

plotIr(ir) # my example plotting function for report

@mdlavallee92 mdlavallee92 added the question Further information is requested label Apr 4, 2023
@mdlavallee92
Copy link
Collaborator Author

@azimov we talked about this idea back at the symposium and I have revisited it now. In a fork I have individualized some of these functions. I have tried to add unit tests to the proposed exposed functions. However I am encountering an issue in the testing environment with SqlRender::loadRenderTranslateSql.

Take this code-block. In the testing environment it returns an error even though I know this file exists.

Cannot find 'GetCalendarYearRange.sql' in the 'sql' or 'sql/sql_server' folder of the 'CohortDiagnostics' package.

To avoid this issue in run tests, I unpacked the SqlRender::loadRenderTranslateSql function example below:

  sql <- fs::path_package("CohortDiagnostics", "sql/sql_server/GetCalendarYearRange.sql") %>%
    readr::read_file() %>%
    SqlRender::render(cdm_database_schema = cdmDatabaseSchema) %>%
    SqlRender::translate(targetDialect = connection@dbms)

but that would require editing a lot of code-blocks in CohortDiagnostics. Have you encountered this issue in tests?

@azimov
Copy link
Collaborator

azimov commented Apr 4, 2023

@mdlavallee92 Is this using devtools::test? devtools::load_all will break the SqlRender loader for some reason related to how it finds the files.

An easy workaround on linux/mac is to create the symbolic link ln -s inst/sql sql or a shortcut folder from inst/sql to the root folder of the project tree.

Doing this should also work with devtools::load_all()

If you're using RCmd check in R studio (a setting you must adjust, it will try devtools by default I think) the tests should then normally find the SQL files.

I was unaware of fs::path_package adding this into SqlRender directly may fix this issue in place of system.file.

@azimov
Copy link
Collaborator

azimov commented Apr 4, 2023

@mdlavallee92 - To the main point of this question. I do intend to expose the functions for use outside of the package, however I can't give a clear timeline. I would also vastly prefer to write unit tests for all the functions directly because the current test suite takes a long time to run and errors are often not especially meaningful.

So the steps I would like to see are:

  • Decide on which functions should be exposed
  • Add checkmate parameter checking for all functions to expose
  • Add all required docstrings
  • Add good unit tests for these functions (ideally ones that actually run quickly)

Until then It should be possible for you to call these functions with triple colon calls,::: e.g. CohortDiagnostics:::getIncidenceRate(...) until this happens. But doing this it will not be possible to guarantee any kind of API stability.

@mdlavallee92
Copy link
Collaborator Author

@mdlavallee92 Is this using devtools::test? devtools::load_all will break the SqlRender loader for some reason related to how it finds the files.

An easy workaround on linux/mac is to create the symbolic link ln -s inst/sql sql or a shortcut folder from inst/sql to the root folder of the project tree.

Doing this should also work with devtools::load_all()

If you're using RCmd check in R studio (a setting you must adjust, it will try devtools by default I think) the tests should then normally find the SQL files.

I was unaware of fs::path_package adding this into SqlRender directly may fix this issue in place of system.file.

Yeah that is what I am talking about the devtools::test. I tried to swap out system.file with fs::path_package in a SqlRender fork and it still leads to the same issue, not being able to find the file. Probably an issue with devtools and not SqlRender. It seems either function (system.file or fs::path_package ) will work if placed directly in the test but not if sourcing from a function from the SqlRender package, weird. I am working on a windows computer but I will try the shortcut folder see if that works within the tests. Thanks! Back to the primary issue I can help you with this checklist.

@mdlavallee92
Copy link
Collaborator Author

Context
When I have used CohortDiagnostics (CD), I think of it as a pipeline rather than an R package because it a) takes in settings, b) runs the options based on the input settings, c) is "chatty" meaning it tells the user what happened and d) dumps output to a folder to lend itself to a post-processing/visualization step (i.e. your cool RMM pkg). The reason I bring this up is that I think it would be helpful to provide users access to the "worker" methods so they can layer their diagnostics pipeline. For example, I want to run incidence, concept set diagnostics and cohort relationship on a few cohorts. Maybe I want to add my custom diagnostic step to this. I can access the suggested "workers" and build my own pipeline script. By saying this, I don't intend to suggest scrapping the global execute executeDiagnostics. This function is very useful and I use it a lot already since it does everything for me. It is also a target output for your very cool RMM package making it easy to build a dashboad off a few lines of code. Going back to the pipeline mindset, I think this would help unit testing, debugging and also usership. Thinking of the pipeline also gives me ideas of what to target in terms of key functions to expose.

Pipeline

From the executeDiagnostics wrapper, these are the steps I see:

  1. Setup - accumulate settings, checkmate, start logger, prep cohortDefinitionSet
  2. Cdm Info
  3. ObservationPeriod Range
  4. Create concept table - note this is needed to run the concept set diagnostics and visit context
  5. Get Cohort Counts - single function comes from CohortGenerator
  6. Get Inclusion Stats - single function comes from CohortGenerator::getCohortStats
  7. Run Concept Set Diagnostics
    - runIncludedSourceConcept
    - runOrphanConcepts
    - runBreakdownEvents
  8. Get time series
  9. Get Visit Context - note relies on concept set diagnostics run and concept table temp
  10. Get Incidence Rate
  11. Get Cohort Relationship (overlap in the shiny)
  12. Cohort Characterization
    - run baseline
    - run temporal (your chronograph looking tables, -999 to 999 time windows)
  13. Clean-up - drop temp concept table, write tables as csv and zip

In steps 5-12 there are other wrappers to functionalize the "worker" method on multiple cohort definitions, prep them for saving, provide logging information and export. There are a few exceptions such as
cohort relationship which requires the enitre set to work.

Suggestion: Expose the singular diagnostic methods in the pipeline

Some of these methods are already exposed (runCohortTimeSeriesDiagnostics), others have a dedictated function but are not exposed or documented (i.e.getIncidenceRate) and others are nested within other functions (i.e. concept set diagnostic functions). This would be my list of functions to expose:

  • get ObservationPeriod range (nested in executeDiagnostics)
  • prep Concept table (nested in executeDiagnostics and runConceptSetDiagnostics) - note this is a dependency for the concept set diagnostics and visit context steps, should be handled as a special case
  • getIncidenceRate (function exists, not exposed)
  • getVisitContext (function exists, not exposed)
  • runCohortTimeSeriesDiagnostics (already done)
  • runCohortRelationshipDiagnostics (already done)
  • runIncludedSourceConcepts (nested in runConceptSetDiagnostics)
  • runBreakdownIndexEvents (nested in runConceptSetDiagnostics)
  • runOrphanConcepts (nested in runConceptSetDiagnostics)

Other steps in the pipeline are already functions from other packages....cohorts counts and inclusion stats from Cohort Generator and Cohort Characterization from FeatureExtraction

What do you think @azimov?

@ablack3
Copy link
Collaborator

ablack3 commented Apr 5, 2023

I was unaware of fs::path_package adding this into SqlRender directly may fix this issue in place of system.file.

You could also try pkgload:::shim_system.file() in loadRenderTranslateSql which should work with devtools::load_all.

@mdlavallee92
Copy link
Collaborator Author

Here is the start of the idea I have, needs some further polishing (checkmates etc) but lot of the code stays the same. Just a matter of disentangling the dependencies, particularly with the concept set diagnostics.

In the process of building the unit tests to make sure these work in isolation. Started testing in Eunomia on cohort 17492 from the cohorts to create tests you already have in there.

Take a look and see what you think! No worries if this is in the wrong direction.

@azimov
Copy link
Collaborator

azimov commented Apr 5, 2023

@mdlavallee92 I completley agree with the principle of separating this out into distinct parts of a pipeline. Part of the reasoning behind RMM was to allow people to begin extending Ohdsi result sets in a consistent manner.

I can imagine someone extending the pipeline of CohortDiagnostics pretty trivially by adding their own diagnostics functions that they maybe aren't ready (or willing) to share with the wider world but want to include in their results.

Just adding a function and exporting the results according to their model specification and uploaded. With the changes to OhdsiShinyModules and Jenna's ShinyAppBuilder package you can customize which shiny views are added and add your own custom modules that should slot in. Indeed, this is part of the thinking behind the Strageus package, though I expect that the implementation there will not currently meet these needs.

@azimov azimov added the v 4.0.0 label Oct 1, 2024
@azimov azimov added this to the Release version 4.0.0 milestone Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested v 4.0.0
Projects
None yet
Development

No branches or pull requests

3 participants