[CodeReviewRequest] #8

Open
melina-leite opened this issue Nov 7, 2021 · 2 comments
Labels
Code_Review Request a code review

Comments

@melina-leite

Author: Melina de Souza Leite

Repo: https://github.com/melina-leite/analysis_matrixBirds

Aim: The repository contains the data and code for the analysis in my preprint "Matrix quality determines the strength of habitat loss filtering on bird communities at the landscape scale". It documents all the steps I took for the analysis and results of the study, including data wrangling, descriptive statistics, modeling, and model diagnostics.
Nobody has reviewed (or even seen) this code, so I'm pretty sure there are many mistakes and ways to improve it in terms of readability and reproducibility, at least.

File Info: The *.Rmd files in the main folder are numbered in the order of the data analysis (0 to 5) and should run without any problem. The reader/user can also follow the whole analysis through the *.pdf or *.html outputs of the R Markdown files.

@melina-leite
Author

melina-leite commented Dec 1, 2021

Thanks, @DrMattG for the review!

The changes I've made in the repo since then:

Cleanup:

  • I created a scripts folder for the .Rmd scripts and kept the .html outputs in the main directory.
  • I created a references folder to accommodate the references .bib and .csl files.
  • I created an appendices_suppinfo folder for the .pdf appendix files and the main suppl. info file (a combination of the 3 appendix files).

Then I renamed some of the .Rmd files, because the file 1_APPENDIX_1.pdf actually comes from a .docx file and not from an .Rmd: it contains the supplementary text for the study, and I took just 1 figure and 2 tables from the 0_data_preparation.Rmd script.

I was unsure how to separate the files I produced for the supplementary info, which are .pdf outputs from 3 of the .Rmd files (all chunks echo=F), from the .html outputs (all chunks echo=T), which I think may help the reader follow the analysis or just look up one or two steps or pieces of code of interest (e.g. the model code). That's why I separated the appendix .pdf outputs from the .html ones.

renv

This was totally new to me! Thanks for letting me know! I used the .Rproj options in RStudio to set it up. I think it works, but I still have to read and learn more about it!

Btw, do you guys think I should commit the .Rproj folder and files?

working directory (?!)

I found something weird and I want to share it with you. It's about setting a different working directory for the .Rmd file while rendering it and, at the same time, a different folder for saving the output, given that I was using the Rproj working directory in RStudio. I can talk more about it in our meeting!

@melina-leite
Author

Points discussed and learned in our Zoom meeting today:

  • The package renv, suggested by Matt: we didn't have time to explore it, but it can be set up automatically in RStudio under Project Options > Environments.

Confusing working directories:

  • By default, Rmd files evaluate their chunks in the directory where the file is, but this can be changed in RStudio under Global Options > R Markdown > Evaluate chunks in directory, for example to the Rproj working directory. Btw, do not (.git)ignore the Rproj files in the commit! It will be easier for the user.

  • By default, Rmd files knit the output into the same directory as the file. To change it to the main repo directory, I used this code in the YAML header:

knit: (function(inputFile, encoding) {
  rmarkdown::render(inputFile, encoding = encoding, output_dir = "../") })
  • Use the package here to locate files in the repo correctly, independently of the Rproj working directory or the WD you set.
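
A minimal sketch of how here could be used inside one of the scripts. The folder and file names (`data`, `birds.csv`) are hypothetical placeholders, not taken from the repository, and the sketch assumes the here package is installed.

```r
# Assumes the 'here' package is installed and that the session starts
# somewhere inside the project (here() looks for markers such as an
# .Rproj file or a .git folder to find the project root).
library(here)

# Hypothetical path: <project root>/data/birds.csv
data_path <- here("data", "birds.csv")

# The path points to the same file whether the chunk is evaluated in
# scripts/ or in the project root, so no setwd() is needed.
if (file.exists(data_path)) {
  birds <- read.csv(data_path)
}
```

Because the path is built from the project root, it should not matter which "Evaluate chunks in directory" option is selected.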

CODE cleaning:

  • Use the p_load() function from the package pacman to load many packages at once.

  • Trick: the magrittr pipe "%<>%" modifies an object in place while applying tidyverse functions. But magrittr has to be loaded separately, because this pipe is not available when loading only the tidyverse packages.

  • Suggestion: use the function convert from the package hablar to convert the classes of groups of columns in objects, instead of apply/sapply functions.

  • I should document the code chunks more (# comments) and, for unusual functions, write the package name in front (e.g. tidyr::).

  • Move the load("model.Rdata") call for the models to just before the model chunk (eval=F), and explain why I'm saving the models in an Rdata file (they take time to run).

  • For the very long code chunks that repeat the same commands for all 4 datasets (assemblages), I should write functions and then just change the dataset (placing the function in a separate code chunk just before the main chunk). It saves a lot of space!

  • What about creating a GitHub Pages site for the HTML files that I already have?

  • Ugly boxplots, please no! What about violin plots, or just dotplots (ggbeeswarm is an option)?
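
The suggestion above about replacing repeated chunks with a function could look roughly like the following sketch. The four data frames are simulated stand-ins, and the names (`assemblages`, `summarise_assemblage`, `richness`, `forest_cover`) are hypothetical, not the ones used in the repository.

```r
# Hypothetical stand-ins for the four assemblage datasets; the column
# names are placeholders, not the real variables of the study.
set.seed(1)
make_fake <- function(n = 20) {
  data.frame(forest_cover = runif(n), richness = rpois(n, lambda = 10))
}
assemblages <- list(
  a = make_fake(), b = make_fake(), c = make_fake(), d = make_fake()
)

# One function holds the step that would otherwise be copied four times.
summarise_assemblage <- function(dat) {
  data.frame(
    n_sites       = nrow(dat),
    mean_richness = mean(dat$richness),
    mean_cover    = mean(dat$forest_cover)
  )
}

# Apply the function to every dataset and stack the results in one table.
summaries <- do.call(rbind, lapply(assemblages, summarise_assemblage))
summaries
```

Keeping the function in its own chunk means a change to the summary step is made once and applies to all datasets.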

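For the "%<>%" pipe mentioned in the list above, a small illustration with a made-up vector, assuming the magrittr package is installed:

```r
# %<>% is exported by magrittr and is not attached by library(tidyverse),
# so magrittr is loaded explicitly here.
library(magrittr)

x <- c(4, 9, 16)

# Equivalent to: x <- x %>% sqrt()
x %<>% sqrt()
x
```

The same pattern works with a data frame on the left-hand side and a chain of dplyr verbs on the right.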