
Reproducible research

To keep research reproducible, organize the code as an R package. Here is a brief explanation of why this is a good idea. More comprehensive explanations can be found by searching for the keyword "research compendium" (rrrpkg has a good description).

Good practices

The very least one can do is keep a proper folder structure for data and results. The article "Stop the working directory insanity" explains the problem in detail. To solve it without any overhead, I propose the dataorganizer package. Here is a brief guide on how to develop packages in RStudio.
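The idea behind avoiding working-directory problems is to build every path from a fixed project root instead of calling setwd(). The sketch below shows that idea in base R. findProjectRoot() and projectPath() are hypothetical helpers written only for illustration (they are not part of dataorganizer's API), and they assume the project root is the folder that contains the package DESCRIPTION file.

```r
# Hypothetical helper (not part of dataorganizer): starting from `start.dir`,
# walk up the folder tree until a folder containing a DESCRIPTION file
# (assumed to be the package root) is found.
findProjectRoot <- function(start.dir=getwd()) {
  cur.dir <- normalizePath(start.dir, mustWork=TRUE)
  repeat {
    if (file.exists(file.path(cur.dir, "DESCRIPTION")))
      return(cur.dir)
    parent.dir <- dirname(cur.dir)
    if (parent.dir == cur.dir) # reached the filesystem root without a match
      stop("No DESCRIPTION file found above ", start.dir)
    cur.dir <- parent.dir
  }
}

# Hypothetical helper: build a path relative to the project root, so the result
# is the same no matter which subfolder a notebook is knitted from
projectPath <- function(..., start.dir=getwd()) {
  file.path(findProjectRoot(start.dir), ...)
}
```

With this, a call such as projectPath("data", "counts.csv") (a hypothetical file) resolves to the same location from the package root and from any notebook inside "analysis/".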

Package structure

  • All functions are stored in .R files inside the "R/" folder, with the rare exception of single-use functions
  • All analysis is performed in .Rmd files in either the "vignettes" or the "analysis" folder, grouped into subfolders with meaningful names
  • If you re-use (e.g. copy-paste) the same code more than twice, move it to a function
  • If a chunk is longer than 10 lines, move its logic into a function with a descriptive name so it is clear what's going on (chunks that load data are an exception)
  • After you get an important result, commit all changes in "R/" and in your current .Rmd notebook right away
  • Avoid functions longer than 50 lines or with a nesting level above 3. If a function exceeds these limits, consider extracting sub-functions
  • Try to organize your code and data hierarchically: use subfolders for both notebooks and data files
  • Though clean code is nice, don't be ashamed to store dirty code in your private repository: storing it is much better than not storing it at all
  • Please document the functions in your package, both for your future self and for the people you share the work with. This is done via roxygen2. Trust me, you'll thank yourself later.
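A minimal sketch of what a function in "R/" might look like under these rules: short, single-purpose, and documented with roxygen2 comments (the #' lines with @param, @return, @examples and @export tags). The function normalizeCounts() and its arguments are hypothetical, chosen only to show the layout.

```r
#' Normalize a count matrix by library size
#'
#' Hypothetical example function: scales every column (sample) of a count
#' matrix so that it sums to `scale.factor`.
#'
#' @param count.matrix numeric matrix with features as rows and samples as columns
#' @param scale.factor target total count for every column
#' @return numeric matrix with the same dimensions as `count.matrix`
#' @examples
#' normalizeCounts(matrix(1:4, nrow=2))
#' @export
normalizeCounts <- function(count.matrix, scale.factor=1e4) {
  lib.sizes <- colSums(count.matrix)
  if (any(lib.sizes == 0))
    stop("All columns must have non-zero total counts")

  # sweep() divides each column by its own library size
  sweep(count.matrix, 2, lib.sizes, "/") * scale.factor
}
```

Running roxygen2 (e.g. devtools::document()) turns these comments into the package's help pages and NAMESPACE entries.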

Code style

For our packages I suggest the following code style (a different style is fine in your analysis, as long as it is consistent across the whole project):

  1. All function names are lowerCamelCase. See here why naming functions with dots is a bad idea (tl;dr: R uses dots for S3 methods)
  2. All variable names are lower case with a dot as the separator (e.g. "n.pcs" or "count.matrix")
  3. All files are named in snake case with a capital R as the extension (e.g. utility_functions.R)
  4. Spaces around arithmetic operators, but not around "=" in function arguments (e.g. "x + 2 / 3" or "f(x=1, y=(2 / 3))")
  5. Curly braces are required around every block, except for single-line "return" or "stop" statements or short messages (i.e. "message" or "cat")
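The first rule is easiest to see in action: R resolves a call like summary(x) by looking for a function named summary.<class of x>, so any function whose name happens to match that pattern is silently picked up as an S3 method. The sketch below demonstrates this, followed by a small hypothetical function, scaleByMax(), written in the style listed above; both are illustrations, not code from our packages.

```r
# A helper with a dotted name: R treats summary.counts() as the S3 method of
# summary() for objects of class "counts", whether or not that was intended
summary.counts <- function(object, ...) "dispatched to summary.counts()"

count.obj <- structure(list(), class="counts")
summary(count.obj) # calls summary.counts(), not summary.default()

# Hypothetical function following the rules: lowerCamelCase name, dotted
# lower-case variables, spaces around arithmetic operators but not around `=`
# in arguments, braces around the multi-line `if` body
scaleByMax <- function(count.matrix, max.value=1) {
  max.count <- max(count.matrix)
  if (max.count == 0) {
    warning("All counts are zero; returning the input unchanged")
    return(count.matrix)
  }
  count.matrix / max.count * max.value
}
```

Naming the helper summaryCounts() instead would keep it out of S3 dispatch entirely.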

There are, of course, many style guides for R; one of the most popular is Hadley Wickham's. These guides keep changing, though, so there is no single "best" option.

Creating a research package

To create a good research compendium structure, you can use the workflowr and dataorganizer packages on top of a normal R package. Examples: Epilepsy, cacoa.

  1. Create a new R package project using RStudio
  2. Pick a license (e.g., usethis::use_gpl3_license())
  3. Initialize this project as a workflowr repo: workflowr::wflow_start("./", existing=TRUE, git=FALSE) (see details)
  4. Initialize dataorganizer folders: dataorganizer::CreateFolders() (details)
  5. Add the packages as dependencies: usethis::use_package("workflowr"); usethis::use_dev_package("dataorganizer")
  6. Create a GitHub repository and add it as a remote to your local folder (follow the instructions GitHub shows you)
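After these steps, a quick sanity check can confirm that the basic layout described in this guide is in place. The helper below is a hypothetical base-R sketch, not part of workflowr or dataorganizer; its default list of required entries ("DESCRIPTION", "R", "analysis") is an assumption based on the package structure above, and you would extend it with whatever folders your own setup creates.

```r
# Hypothetical helper: report which expected compendium entries are missing.
# The default `required` list is an assumption, not a workflowr/dataorganizer rule.
checkCompendium <- function(root.dir=".", required=c("DESCRIPTION", "R", "analysis")) {
  missing.entries <- required[!file.exists(file.path(root.dir, required))]
  if (length(missing.entries) > 0) {
    warning("Missing compendium entries: ", paste(missing.entries, collapse=", "))
  }
  invisible(missing.entries)
}
```

Called from the package root as checkCompendium("."), it returns an empty character vector (invisibly) when everything is in place and warns otherwise.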

Resources