Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add butcher package #128

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Add butcher package #128

wants to merge 2 commits into from

Conversation

inglesp
Copy link
Contributor

@inglesp inglesp commented Feb 23, 2023

See #127 for details.

@inglesp inglesp changed the title Add butcher Add butcher package Feb 23, 2023
Copy link
Member

@bloodearnest bloodearnest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure about supporting and using different repos to get packages - it's a policy change for the R image. We've previously rejected requests for packages because they're not on CRAN.

It may be a reasonable policy change to make, but if we do, then what repos are allowed? Is github allowed as a source?

@bloodearnest
Copy link
Member

I think the posit repo is maybe just a frozen cran mirror, so maybe its ok?

I am wary of date based pinning of older versions. I actually think the first suggestion of adding dependencies manually is cleaner from a maintenance pov

@remlapmot
Copy link
Contributor

Using snapshot repos is the easiest way to add a new package without updating its dependencies that I know of. If a new package had a lot of dependencies, working out the order and version in which to include those could take some figuring out.

So for me this looks good.

That's right, the RStudio/Posit CRAN snapshots are CRAN at a certain date, like the Microsoft CRAN snapshot repos we used to add dtwclust and randomForest.

r-docker/packages.txt

Lines 106 to 107 in 2f2498a

dtwclust # repo https://cran.microsoft.com/snapshot/2022-09-23/
randomForest # repo: https://cran.microsoft.com/snapshot/2022-02-03

Sadly, soon Microsoft are shutting down their CRAN snapshotting service, the RStudio/Posit snapshot repos are the only public large scale snapshotting service left that I know of. I gave the URL to the Posit source package snapshot repo rather than the URL to their Focal binary package snapshot repo because there's a chance they might have used different system libraries.

As we chatted about before the most recent syntax additions to R came in R 4.1.0, so if you were to do a new image, say with R 4.1.3 that could include updated versions of the packages. My guess is that for probably all the packages their current CRAN version will install into R 4.1.3. If there are any that require say R 4.2.0 then you'd have to go back a version or two for those packages.


export PACKAGE="$1"
if [ "$2" = "NULL" ]; then
Copy link
Member

@bloodearnest bloodearnest Feb 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just specify the default value of repos as NULL

e.g. remove line 4, and replace 7-9 with:

REPOS=${2:-NULL}

And rather than add escape quotes in bash, I would just add them in the R fragment, as per suggested change. Then we're consisently handling package and repos quoting in the same way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great to add a default but I suspect that the original non-escaping of the quotes around $REPOS is the better way to go because in R NULL is null, whereas "NULL" is not null, i.e.

is.null(NULL)
#> [1] TRUE
is.null("NULL")
#> [1] FALSE

# install the package using the cache
RUN --mount=type=cache,target=/cache,id=/cache-2004 bash -c "R -e 'renv::activate(); renv::install(\"$PACKAGE\"); renv::snapshot(type=\"all\")'"
RUN --mount=type=cache,target=/cache,id=/cache-2004 bash -c "R -e 'renv::activate(); renv::install(\"$PACKAGE\", repos = $REPOS); renv::snapshot(type=\"all\")'"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RUN --mount=type=cache,target=/cache,id=/cache-2004 bash -c "R -e 'renv::activate(); renv::install(\"$PACKAGE\", repos = $REPOS); renv::snapshot(type=\"all\")'"
RUN --mount=type=cache,target=/cache,id=/cache-2004 bash -c "R -e 'renv::activate(); renv::install(\"$PACKAGE\", repos = \"$REPOS\"); renv::snapshot(type=\"all\")'"

"Package": "lobstr",
"Version": "1.1.1",
"Source": "Repository",
"Repository": "RSPM",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This confuses me a bit. Presumably you passing in https://packagemanager.posit.co/cran/2021-06-29 as repo when adding butcher, as per Tom's issue. But what we end up with is just "RSPM". We seem to have lost the "https://packagemanager.posit.co/cran/2021-06-29" url somehow. So I suspect this won't build on another system that doesn't have the build cached.

Naively, I would have expected a new "RSPM" entry in the "Repositories" section of renv.lock, but this doesn't seem to be in this commit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great spot that there isn't an entry for RSPM in "Repositories" in renv.lock (RSPM = RStudio Package Manager a.k.a. Posit Package Manager). This turns out to be ok because CRAN is listed in renv.lock and renv::restore() falls back to that, so it will restore when lobstr not in the cache.

Here's a screenshot of the output from my test example, which is here
2023-02-26 08_55_21-AsPowerBar

@bloodearnest
Copy link
Member

Using snapshot repos is the easiest way to add a new package without updating its dependencies that I know of. If a new package had a lot of dependencies, working out the order and version in which to include those could take some figuring out.

So for me this looks good.

That's right, the RStudio/Posit CRAN snapshots are CRAN at a certain date, like the Microsoft CRAN snapshot repos we used to add dtwclust and randomForest.

r-docker/packages.txt

Lines 106 to 107 in 2f2498a

dtwclust # repo https://cran.microsoft.com/snapshot/2022-09-23/
randomForest # repo: https://cran.microsoft.com/snapshot/2022-02-03

Sadly, soon Microsoft are shutting down their CRAN snapshotting service, the RStudio/Posit snapshot repos are the only public large scale snapshotting service left that I know of. I gave the URL to the Posit source package snapshot repo rather than the URL to their Focal binary package snapshot repo because there's a chance they might have used different system libraries.

Ok, I yield to your in depth R ecosystem knowledge :)

As we chatted about before the most recent syntax additions to R came in R 4.1.0, so if you were to do a new image, say with R 4.1.3 that could include updated versions of the packages. My guess is that for probably all the packages their current CRAN version will install into R 4.1.3. If there are any that require say R 4.2.0 then you'd have to go back a version or two for those packages.

Yep, my vague plan had been to install all the same packages as this image, but at whatever version is current when we initially publish the image. And then sunset this image, freezing the package additions, and only supporting new packages on the newer image.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants