Discuss: package cohort validation strategy #83
Comments
Assuming the Pipeline Steps
This will mean that within a single snapshot of the repo, the package and all the dependencies used to produce risk metrics would have existed simultaneously at that time.
Alternatives
In place of step 2, there are more comprehensive, but more computationally intensive alternatives:
This seems very feasible @dgkf. I think the current validation pipeline could readily be adapted into such an algorithm. A not-too-greedy implementation of the first alternative should not be too difficult either.
@dgkf this process seems straightforward to me, at least if we restrict ourselves, for this MVP, to the metrics that can be produced when the
Sounds good @yannfeat - Initially, I'd do whatever is most actionable for getting the overall process in place. Even if the metrics aren't 100% accurate, just getting it hooked up to the
Longer-term, thinking more about the details of the process, I think we'll need to invest some time in ensuring we're grabbing the packages defined in the
@dgkf I am implementing the comparison of the packages: pharmaR/repos@1618191d8ce878fc9c894ecbf29e86a0458355c4. Do we actually want to see if a new package version has been published, or if there is a more recent release?
Ah, I see. It looks like
For simplicity, I would suggest picking one specific architecture/R version for development. I would probably start with Ubuntu + R 4.4. So to answer your question, we should only care about new releases for the particular version of R that the repo corresponds to.
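To make the comparison being discussed concrete, here is a minimal sketch of diffing two package snapshots for a single platform/R version. It is written in Python purely for illustration (the linked proof of concept is an R script), and the function name `changed_packages`, the package names, and the version strings are all made up for the example, not taken from the repo.

```python
def changed_packages(previous, current):
    """Return packages that are new or whose version differs between snapshots.

    Both arguments are hypothetical {package_name: version_string} mappings
    for one platform/R version. The result maps each changed package to an
    (old_version, new_version) tuple; old_version is None for new packages.
    """
    changed = {}
    for name, version in current.items():
        old = previous.get(name)
        if old != version:
            changed[name] = (old, version)
    return changed


# Illustrative data only: placeholder package names and versions.
previous_snapshot = {"pkgA": "1.0.0", "pkgB": "2.0.0"}
current_snapshot = {"pkgA": "1.0.1", "pkgB": "2.0.0", "pkgC": "0.1.0"}

print(changed_packages(previous_snapshot, current_snapshot))
```

Only the packages returned here would need their risk metrics recalculated in this simplified picture; packages removed from the current snapshot are deliberately ignored.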
Ok, I have calculated the risk metrics on the set of packages with differences: https://github.com/pharmaR/repos/blob/feature/riskscore/dev/poc_cohort_validation.R This approach is not very solid, as I am comparing the versions of releases on GitHub but calculating the metrics from CRAN. Unfortunately,
That's okay! The GitHub repos that
Certainly for a first proof of concept, I think we can safely assume that the source code in github.com/cran is an accurate reflection of the source code tarball that you would get from CRAN.
As @mmengelbier brought up at today's meeting, we face the challenge of producing validation documents that are interdependent.
A package's validation results are dependent on the broader set of available packages, and we should be intentional with how we manage that relationship.
In this issue we hope to settle on a strategy for managing inter-package relationships and their effects on metrics, with the goal of aligning on which steps should be taken by a validation pipeline.
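One way to picture the inter-package relationship is as a reverse-dependency walk: when a package changes, everything that depends on it, directly or transitively, may need its validation results refreshed. The sketch below is in Python for illustration only. It assumes that a dependency change invalidates its dependents' results, which is a simplification rather than a strategy agreed in this issue, and the name `packages_to_revalidate` and the `depends_on` mapping are hypothetical.

```python
from collections import deque


def packages_to_revalidate(changed, depends_on):
    """Return the changed packages plus all of their transitive dependents.

    changed    -- iterable of package names whose version changed
    depends_on -- hypothetical {package: [dependencies]} mapping
    """
    # Invert the dependency map: dependency -> set of direct dependents.
    dependents = {}
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    # Breadth-first walk outward from the changed packages.
    affected = set(changed)
    queue = deque(affected)
    while queue:
        pkg = queue.popleft()
        for dependent in dependents.get(pkg, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Illustrative data only: pkgC depends on pkgB, which depends on pkgA.
depends_on = {"pkgA": [], "pkgB": ["pkgA"], "pkgC": ["pkgB"], "pkgD": []}
print(sorted(packages_to_revalidate({"pkgA"}, depends_on)))
```

In this toy graph a change to `pkgA` pulls in `pkgB` and `pkgC` but leaves `pkgD` alone, which is the kind of cohort boundary the pipeline would need to decide on.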