Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bootstrap helper from renv.lock #5

Open
philipp-baumann opened this issue Aug 1, 2023 · 26 comments
Open

bootstrap helper from renv.lock #5

philipp-baumann opened this issue Aug 1, 2023 · 26 comments
Assignees
Labels
enhancement New feature or request

Comments

@philipp-baumann
Copy link
Collaborator

philipp-baumann commented Aug 1, 2023

Hey @b-rodrigues , I thought it would be nice for people to be able to transition from a {renv} env to a nix setup in individual projects. For this, a helper that simply parses JSON in renv.lock and grabs the packages for the rix::rix() r_pkgs arg would be straight-forward to implement. If useful, I am happy to draft a PR addressing it.

@b-rodrigues
Copy link
Contributor

that would be an amazing addition please feel free to code it up 😁

@philipp-baumann
Copy link
Collaborator Author

maybe I will wait until you commit your recent drafts into new remote branch :-)

@b-rodrigues
Copy link
Contributor

I think a challenge here is to first list only the required packages that are actually called by the user using library(xxx) and not all of xxx dependencies, because these would need to got into propagatedBuildInputs. Packages would need to get downloaded from the CRAN archives. We can use fetchzip() to do so:
https://github.com/b-rodrigues/rix/blob/d505382ad8ef759a8257227a19043e228a6d70dd/R/find_rev.R#L142

If we have the list of packages that are actually called by the user, and pass that to fetchzip() (or rather, fetchzips() which handles multiple packages see: https://github.com/b-rodrigues/rix/blob/d505382ad8ef759a8257227a19043e228a6d70dd/R/find_rev.R#L203

then that should be simple enough I think.

@philipp-baumann
Copy link
Collaborator Author

I'll also have a look at how {renv} implements the snapshotting.

@philipp-baumann philipp-baumann self-assigned this Aug 12, 2023
@philipp-baumann philipp-baumann added the enhancement New feature or request label Aug 13, 2023
@philipp-baumann
Copy link
Collaborator Author

directionality: renv => nix

@philipp-baumann
Copy link
Collaborator Author

way of interfacing renv: via requireNamespace(), eventually have it in Suggests field.

@RichardJActon
Copy link
Contributor

I implemented the 'dumbest' version of this in my fork for personal use generating default.nix files from projects with renv.lock files. For the couple of projects with big renv.lock files I've tried it out on I've had to delete any packages installed from github from the list as these, expectedly, cause build errors. Also the gt package which seems to be failing because of it's dependency of V8.

RichardJActon@7061274

This may be something for it's own issue but here are the logs for the V8 build issue:

building '/nix/store/w4n5szzkapj921gfgf5yynijdvc2r15r-r-V8-5.0.0.drv'...
Running phase: unpackPhase
unpacking source archive /nix/store/92nvrcn6w8k2ylbrjzmd3lalwri0c4mv-V8_5.0.0.tar.gz
source root is V8
setting SOURCE_DATE_EPOCH to timestamp 1723808402 of file V8/MD5
Running phase: patchPhase
Running phase: updateAutotoolsGnuConfigScriptsPhase
Running phase: configurePhase
patching script interpreter paths in configure
Running phase: buildPhase
Running phase: checkPhase
Running phase: installPhase
* installing *source* package 'V8' ...
file 'configure' has the wrong MD5 checksum
** using staged installation
Found C++20 compiler: /nix/store/zznja5f8v3jafffyah1rk46vpfcn38dv-gcc-wrapper-13.3.0/bin/c++
Found INCLUDE_DIR and/or LIB_DIR!
Using CXXCPP=/nix/store/zznja5f8v3jafffyah1rk46vpfcn38dv-gcc-wrapper-13.3.0/bin/c++ -std=gnu++20 -E
Using PKG_CFLAGS=-I/nix/store/bsqha32xvv1j20bwkhr2wsnclk4qsi3a-nodejs-20.17.0-libv8/include -I/usr/include/v8
Using PKG_LIBS=-L/nix/store/bsqha32xvv1j20bwkhr2wsnclk4qsi3a-nodejs-20.17.0-libv8/lib -lv8
Running feature test for pointer compression...
/nix/store/81xsp348yfgmaan9r5055mcdjfw7a8wc-binutils-2.42/bin/ld: /build/ccDtqrCX.o: in function `main':
/build/V8/tools/test.cpp:7:(.text.startup+0x36): undefined reference to `v8::platform::NewDefaultPlatform(int, v8::platform::IdleTaskSupport, v8::platform::InProcessStackDumping, std::unique_ptr<v8::TracingController, std::default_delete<v8::TracingController> >)'
/nix/store/81xsp348yfgmaan9r5055mcdjfw7a8wc-binutils-2.42/bin/ld: /build/V8/tools/test.cpp:8:(.text.startup+0x48): undefined reference to `v8::V8::InitializePlatform(v8::Platform*)'
/nix/store/81xsp348yfgmaan9r5055mcdjfw7a8wc-binutils-2.42/bin/ld: /build/ccDtqrCX.o: in function `v8::V8::Initialize()':
/nix/store/bsqha32xvv1j20bwkhr2wsnclk4qsi3a-nodejs-20.17.0-libv8/include/v8-initialization.h:104:(.text.startup+0x4f): undefined reference to `v8::V8::Initialize(int)'
collect2: error: ld returned 1 exit status
Pointer compression not needed
/nix/store/81xsp348yfgmaan9r5055mcdjfw7a8wc-binutils-2.42/bin/ld: /build/ccHf8mQK.o: in function `main':
/build/V8/tools/version.cpp:5:(.text.startup+0x2): undefined reference to `v8::V8::GetVersion()'
collect2: error: ld returned 1 exit status
./configure: line 154: ./v8version: not found
** libs
using C++ compiler: 'g++ (GCC) 13.3.0'
using C++20
rm -f V8.so RcppExports.o bindings.o
/nix/store/zznja5f8v3jafffyah1rk46vpfcn38dv-gcc-wrapper-13.3.0/bin/c++ -std=gnu++20 -I"/nix/store/z70v7siag32vkpwapfzmj84nw6y43gh9-R-4.4.1/lib/R/include" -DNDEBUG -I/nix/store/bsqha32xvv1j20bwkhr2wsnclk4qsi3a-nodejs-20.17.0-libv8/include -I/usr/include/v8 -DV8_ENABLE_CHECKS -I'/nix/store/6a7vz17pibzimdjy4js648r5jvc7v5jl-r-Rcpp-1.0.13/library/Rcpp/include'    -fvisibility=hidden -fpic  -g -O2   -c RcppExports.cpp -o RcppExports.o
/nix/store/zznja5f8v3jafffyah1rk46vpfcn38dv-gcc-wrapper-13.3.0/bin/c++ -std=gnu++20 -I"/nix/store/z70v7siag32vkpwapfzmj84nw6y43gh9-R-4.4.1/lib/R/include" -DNDEBUG -I/nix/store/bsqha32xvv1j20bwkhr2wsnclk4qsi3a-nodejs-20.17.0-libv8/include -I/usr/include/v8 -DV8_ENABLE_CHECKS -I'/nix/store/6a7vz17pibzimdjy4js648r5jvc7v5jl-r-Rcpp-1.0.13/library/Rcpp/include'    -fvisibility=hidden -fpic  -g -O2   -c bindings.cpp -o bindings.o
/nix/store/zznja5f8v3jafffyah1rk46vpfcn38dv-gcc-wrapper-13.3.0/bin/c++ -std=gnu++20 -shared -L/nix/store/z70v7siag32vkpwapfzmj84nw6y43gh9-R-4.4.1/lib/R/lib -o V8.so RcppExports.o bindings.o -L/nix/store/bsqha32xvv1j20bwkhr2wsnclk4qsi3a-nodejs-20.17.0-libv8/lib -lv8 -L/nix/store/z70v7siag32vkpwapfzmj84nw6y43gh9-R-4.4.1/lib/R/lib -lR
installing to /nix/store/m5l59cnkq6042d6jf7bnph0jwmq5g068-r-V8-5.0.0/library/00LOCK-V8/00new/V8/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for 'V8' in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/nix/store/m5l59cnkq6042d6jf7bnph0jwmq5g068-r-V8-5.0.0/library/00LOCK-V8/00new/V8/libs/V8.so':
  /nix/store/m5l59cnkq6042d6jf7bnph0jwmq5g068-r-V8-5.0.0/library/00LOCK-V8/00new/V8/libs/V8.so: undefined symbol: _ZN2v815ArrayBufferView9CheckCastEPNS_5ValueE
Error: loading failed
Execution halted
ERROR: loading failed
* removing '/nix/store/m5l59cnkq6042d6jf7bnph0jwmq5g068-r-V8-5.0.0/library/V8'
error: builder for '/nix/store/w4n5szzkapj921gfgf5yynijdvc2r15r-r-V8-5.0.0.drv' failed with exit code 1;
       last 10 log lines:
       > ** building package indices
       > ** installing vignettes
       > ** testing if installed package can be loaded from temporary location
       > Error: package or namespace load failed for 'V8' in dyn.load(file, DLLpath = DLLpath, ...):
       >  unable to load shared object '/nix/store/m5l59cnkq6042d6jf7bnph0jwmq5g068-r-V8-5.0.0/library/00LOCK-V8/00new/V8/libs/V8.so':
       >   /nix/store/m5l59cnkq6042d6jf7bnph0jwmq5g068-r-V8-5.0.0/library/00LOCK-V8/00new/V8/libs/V8.so: undefined symbol: _ZN2v815ArrayBufferView9CheckCastEPNS_5ValueE
       > Error: loading failed
       > Execution halted
       > ERROR: loading failed
       > * removing '/nix/store/m5l59cnkq6042d6jf7bnph0jwmq5g068-r-V8-5.0.0/library/V8'
       For full logs, run 'nix log /nix/store/w4n5szzkapj921gfgf5yynijdvc2r15r-r-V8-5.0.0.drv'.

@b-rodrigues
Copy link
Contributor

Hey that's pretty neat, I actually wanted to implement exactly this, just listing the packages from an renv.lock file without handling any versions, and call this the "fast" method (while handling github packages though). And then have an "accurate" method that would, ideally, deal with versions.

As for V8, thanks for letting us know, I confirm it's currently broken, and you don't seem to be the only one bitten by it: NixOS/nixpkgs#339861

the fix will soon be merged to master, in the meantime you should be able to use the recommend commit from that thread as the r_ver from the call to rix().

@RichardJActon
Copy link
Contributor

RichardJActon commented Oct 11, 2024

Thanks for the info on V8 @b-rodrigues. Happy clean it up a little and open a PR if you want the "fast" approach in upstream while people figure out the details of a more complete solution.

should be pretty easy to add a check for github remotes that removes them by default and throws a warning.

@b-rodrigues
Copy link
Contributor

I’d be more than happy! But I don’t think we should remove the packages, but handle them instead. For example, here is the latest echarts4r from Github:

    "echarts4r": {
      "Package": "echarts4r",
      "Version": "0.4.5.9000",
      "Source": "GitHub",
      "RemoteType": "github",
      "RemoteHost": "api.github.com",
      "RemoteRepo": "echarts4r",
      "RemoteUsername": "JohnCoene",
      "RemoteRef": "HEAD",
      "RemoteSha": "063883bbb71584bdb651f20be6f2bb34d55c9f6a",
      "Requirements": [
        "R",
        "broom",
        "corrplot",
        "countrycode",
        "dplyr",
        "htmltools",
        "htmlwidgets",
        "jsonlite",
        "purrr",
        "rstudioapi",
        "scales",
        "shiny"
      ],
      "Hash": "7062f4dffb9a9b144c7f690cf2d214be"
    },

We need to provide "package_name", "repo_url" and "commit" to git_pkgs. Using "RemoteType": "github", "RemoteRepo": "echarts4r" and "RemoteUsername": "JohnCoene" we can build the repo_url, we can use the "RemoteSha" as the commit and "Package" as the package_name. We could then provide this as well.

@RichardJActon
Copy link
Contributor

OK interesting I'm still a nix packaging noob - easy enough to extract that repo info from the renv.lock file but I'm not certain what to do with it after that...

I added a function to get the R version from the lock file and, for now, an option to filter out packages the source of which is not "Repository" to my fork.

@b-rodrigues
Copy link
Contributor

b-rodrigues commented Oct 11, 2024

You can pass github packages to rix() like so:

rix(
  r_ver = "4.2.1",
  r_pkgs = c("dplyr", "janitor"),
  git_pkgs = list(
    list(
      package_name = "housing",
      repo_url = "https://github.com/rap4all/housing/",
      commit = "1c860959310b80e67c41f7bbdc3e84cef00df18e"
    ),
    list(
      package_name = "fusen",
      repo_url = "https://github.com/ThinkR-open/fusen",
      commit = "d617172447d2947efb20ad6a4463742b8a5d79dc"
    )
  ),
  ide = "other",
  project_path = path_default_nix,
  overwrite = TRUE
)

rix() takes care of generating the correct Nix expression. So we need to extract this info about the Github packages from the renv.lock files to pass it down to rix().

@RichardJActon
Copy link
Contributor

Ah cool I missed that.

The question is then what is the best interface for this?
Processing the renv.lock file produces values that can be passed to 4 different arguments of rix(): r_ver, r_pkgs, git_pkgs, and local_r_pkgs. The latter 3 are all from processing the package information.

Following on with my current approach I would make 4 functions 1 for each argument, this seems cumbersome, but doesn't involve changing the behavior of rix() at all, so is nice for backwards compatibility.

Alternately an argument could be added to rix() to supply a path to an renv.lock file which automatically sets all these values based on the file, This seems nicer and more efficient.

Then what would be the desired behavior of this argument if it is set and one of the existing ones is also set? Should the existing arguments supersede the values supplied by the renv.lock file? and/or should some kind of exception/message be thrown if both are supplied?.

Maybe add 3 arguments to rix():

  • use_renv_lock = FALSE
  • renv_lock_path = "renv.lock"
  • allow_renv_overrides = FALSE

Adding use_renv_lock allows the default path to be set without defaulting to using renv.lock and preserving the current behavior. Alternatively to keep the number of arguments down we could go with leaving renv_lock_path unset by default and only use renv.lock if the user supplies the path, or go with something like renv_lock which will accept TRUE or a path and default to the current directory if TRUE is passed.

If allow_renv_overrides is TRUE r_ver, r_pkgs, git_pkgs, and local_r_pkgs can override values inferred from the renv.lock else setting use_renv_lock and any of these others throws an error saying to pick one or the other or allow overrides.

@b-rodrigues
Copy link
Contributor

I think this should be a completely separate function called renv2nix() which uses rix() under the hood to generate the Nix expression. Packages should only be passed to r_pkgs unless they come from github, or are local (but does renv.lock list local packages?)

@RichardJActon
Copy link
Contributor

OK sounds good - what do you think of that function having arguments which allow you to override things from the renv file or do you think it best to discourage this by design?

I believe renv can capture local packages - I tried it out a while ago when I was using a locally developed package in another project I think it just stored the path somewhere I'll check how it did it exactly.

@b-rodrigues
Copy link
Contributor

I don't think we should override the renv.lock file, but perhaps we should generate the actual R call to rix() as well, so people could change that if they want

@RichardJActon
Copy link
Contributor

As an option to facilitate debugging or as the default behavior? - I'd lean towards the former.

@b-rodrigues
Copy link
Contributor

Agreed

@RichardJActon
Copy link
Contributor

I've got a version of renv2nix() working in my fork, including an option to return the rix call it generates, it can handle github/gitlab remotes but I've not played with any of the other remote types.

When renv installs a local package it's expecting a path to a local source folder not a compressed built package like local_r_pkg seems to so not quite sure how to make those two play nice.

I need to write some tests to check it handles things OK when the inputs stray from the 'happy path' but the basics are there.

@b-rodrigues
Copy link
Contributor

Looks very nice, that's amazing!

If you allow me a little pre-review:

  • it would be great if you could add an argument to renv2nix() called method with a default value set to "fast"', and then an if statement testing if method == "fast"in which case your logic applies, and if not, an error gets raised with a message such as "Not implement yet". It would be importart to also usematch.args(), the other allowed value being "accurate"` (or something)
  • regarding the code dealing with github and gitlab packages, I think this should be in another helper function, that would take the remote as an input ("github", "gitlab" and such) and return the list git_pkgs.

Thank you once again, this looks really promising!

@mihem
Copy link
Contributor

mihem commented Oct 12, 2024

Sorry to interrupt your discussion. I just wanted to say that i think that this conversion from renv to rix is super important for many users I think. I am reading your book Bruno about reproducible pipelines in R (which is great and very much needed for reproducible data science in R ) and came across this issue.
I tried rix on a small project and it worked great. But I have an existing project (where I used renv to increase reproducibility) and which has > 200 libraries. From my understanding I would need to manually write down all packages (minus dependencies)? Letting a function handle that sounds much better or is there an easier already existing way that I am missing? Using renv with docker this is easy, because you can just run renv::restore in the Dockerfile. But then building this image takes some time (and usually in the process of analysis I need to add several libraries sequentially and would need to rebuilt it several times then).

@RichardJActon
Copy link
Contributor

I've added the method argument and factored out most of git_pkgs processing to another function. Because the remote type is listed in the renv entry for the package I factored out a function where you can assert the type of remote you are expecting but which defaults to using the one documented for a given package. It returns not the list you would give to git_pkgs but an item in the list that you would give to git_pkgs. I use a loop in renv2nix() to iterate over the packages from non-primary repository sources with this function and assemble the list to hand to git_pkgs.

@mihem all I'm really doing here with this "fast" renv2nix() is getting the list of all the r packages in an renv.lock file and passing this to the r_pkgs argument of rix(). It's only marginally more sophisticated than doing this:

rix(r_pkgs = names(jsonlite::read_json("renv.lock")$Packages))

because I've added the ability to detect and attempt to build packages recorded as coming from github and gitlab remotes. Its not checking that the versions of the packages installed from nix match those in the renv.lock file - I think that will take quite a lot more work.

This explicitly installs all of the transitive dependencies of the primary dependencies of your project. These would be installed anyway but the transitive dependencies could be implicit as they are represented in the nix derivations of the primary dependencies. So whilst you don't need to explicitly list them it doesn't really do any harm to do so. At least a I understand it.

@mihem
Copy link
Contributor

mihem commented Oct 20, 2024

@RichardJActon Well it's still very useful, evenif it's not much more than this one liner. So great work :)
But I agree, also extracting the version would be what it would take to make this truly useful. Otherwise there will be probably some version disagreement or how would you envision reproducing a renv frozen project in this way @b-rodrigues ?

@b-rodrigues
Copy link
Contributor

b-rodrigues commented Oct 20, 2024

I'm currently thinking about having exact versions of packages and am experimenting a bit but there are some issues:

  • not all versions of all packages are on nixpkgs, so we would need to somehow write the expressions for the missing versions
  • some packages were broken a long time in nixpkgs an got fixed recently, and some others were working and got broken because of newer dependencies deprecating features. For example in July libxml2 deprecated a feature which impacts the libspatialite library, in turn impacting R packages like icosa. These are rather rare situations, but they cause quite some headaches when they happen

I'm trying to see if I can fork nixpkgs and have all CRAN packages versions in there, and backport fixes as well. But it's not that easy because there are a lot of moving parts. Another issue is that I would like to guarantee that it works for Linux and macos and that adds macos specific headaches as well

@RichardJActon
Copy link
Contributor

The 'not all versions of all R packages being in nixpkgs' thing was why I really wasn't sure where to start when trying to do an 'accurate' version of this, I've not checked if all versions of R are available.

A hack for when a specific version is needed that's not in nixpkgs could be to get that version from a remote but this wouldn't necessarily solve problems where the package won't build because of some system dependency issue. This doesn't generalize well so I'd say it's better addressed by some like what @b-rodrigues suggests but may help people fix individual environments.

@b-rodrigues
Copy link
Contributor

Coming back to this, I have been busy these past months with generating dated CRAN snapshots for Nix. It's all explained here https://github.com/rstats-on-nix/daily_cran

rix will be using these dated snapshots if everything goes right. I have this implemented in a separate branch already and believe we could use these dated snapshots for the accurate method. It still doesn't cover every package version from CRAN, far from it, (already just because it starts from 2019), but it gets us closer. My idea would be to have a lookup table linking package versions and dates, and then picking the most common date as the snapshot to use. It is also possible to use several sources actually as described here (for flakes but same logic applies) https://mplanchard.com/posts/installing-a-specific-version-of-a-package-with-nix.html#option-one:-a-different-nixpkgs-version

so maybe using two or three dates will bring us very close to an accurate method

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants