Bioconductor or CRAN #191

Open
rcannood opened this issue Nov 6, 2024 · 7 comments
Comments

@rcannood
Collaborator

rcannood commented Nov 6, 2024

Related to #183 (comment)

Are we aiming for a CRAN or a Bioconductor release?

What are the pros for both platforms?

@lazappi
Collaborator

lazappi commented Nov 7, 2024

I'll probably think of some more stuff and update this but here is what comes to mind at the moment.

Bioconductor

Pros

  • Community. Most of the R single-cell developers are there. Helps build a connection between Bioconductor and scverse.
  • Packages. Most of the packages likely to depend on {anndataR} are there so integrating with them might be easier.
  • H5ADAnnData code was originally written using {rhdf5} (I didn't realise the PR changing this had been merged already...)
  • There is a lot of support for delayed on-disk matrices etc. which we can/should use
  • Thorough package and code review
  • Release cycle
  • When we started the project at the hackathon we decided to aim for Bioconductor

Cons

  • Release cycle
  • More opinionated package guidelines
  • Slightly more difficult to integrate with CRAN packages

CRAN

Pros

  • Current H5ADAnnData implementation using {hdf5r}
  • Quicker package review process
  • No release cycle
  • Fewer guidelines/restrictions to follow
  • Slightly easier for both Bioconductor and CRAN packages to use {anndataR}

Cons

  • Smaller community, not many single-cell packages on CRAN
  • Less predictable package review process

Open questions

  • I'm not sure where R {SpatialData} is planning to go and if we should coordinate with them?
  • Where is there better support for Zarr/file formats we might want to use?

For me, the community is a big deal and we should try to be a part of that, even if it means some restrictions on the development process. I think there is a lot to gain by being on Bioconductor (both socially and technically) and all you lose is some flexibility. That said, I'm open to aiming for CRAN if there are good (probably technical) arguments for why that would be better.

@rcannood
Collaborator Author

rcannood commented Nov 8, 2024

Thanks for the information!

From the perspective of a user, I feel like the main drawback of Bioconductor compared to CRAN is that Bioc packages are typically a lot slower and more cumbersome to install. I suppose that we can mitigate this somewhat by making sure that we don't add too many required dependencies.

Do you think that by releasing on Bioconductor, we'll need to have more required packages than if we install on CRAN (other than BiocManager)?

@Artur-man

FYI, R SpatialData will start happening next week, and @LouiseDck will be there too. We will have people from both scverse and BioC so I think this topic will come up!

@lazappi
Collaborator

lazappi commented Nov 11, 2024

Do you think that by releasing on Bioconductor, we'll need to have more required packages than if we install on CRAN (other than BiocManager)?

It shouldn't be any different (at least for required dependencies). {SingleCellExperiment} can stay as a suggested dependency (or at least it should be able to) and things like {BiocStyle} also go in Suggests. I guess it's possible that things like {rhdf5} have more dependencies than {hdf5r} but you would have to check. If we started using more of the Bioconductor infrastructure that would introduce dependencies, but presumably we would only do that if it had other benefits.

@LouiseDck
Collaborator

I think the most important point is the first one made by @lazappi:

Community. Most of the R single-cell developers are there. Helps build a connection between Bioconductor and scverse.

For that reason alone, I think it makes most sense to release on Bioconductor.
I do agree with @rcannood that it is more cumbersome as a user, and I don't love the release cycle, but I don't think these are big enough concerns. I don't think there are users who just wouldn't use a package because it was on Bioconductor instead of CRAN?

(on the other hand, I do not really care that much about where the package ends up, as long as it gets released at some point 😅)

I think this is a different discussion than whether or not we use {rhdf5} or {hdf5r}, IIRC it would be possible to submit to Bioconductor regardless.

@lazappi
Collaborator

lazappi commented Nov 12, 2024

I think this is a different discussion than whether or not we use {rhdf5} or {hdf5r}, IIRC it would be possible to submit to Bioconductor regardless.

I think a Bioconductor reviewer could push us to use {rhdf5} unless there is a technical reason we can't so it's a bit related. I think I got everything working with the latest {rhdf5} at some stage though.

@Artur-man

Artur-man commented Nov 15, 2024

We discussed the matter this week together with @LouiseDck, @keller-mark and @vjcitn (founding contributor to BioC).

Vince kindly told us that {anndataR} is really welcome in BioC, and once the review for submission begins we can talk about dependency bureaucracy ({pizzarr}, {rhdf5}, {hdf5r} etc.). Although the BioC core team (as Vince also mentioned) may prefer dependencies from the BioC collection, they are pretty open to CRAN packages as dependencies if there is a good reason :D, and they are not as strict as one may think.

Hope this helps ...
