diff --git a/index.qmd b/index.qmd
index 6324904..73b1c2c 100644
--- a/index.qmd
+++ b/index.qmd
@@ -14,20 +14,20 @@ knitr::write_bib(
 Reproducibility and open scientific practices are increasingly demanded
 of, and needed by, scientists and researchers in our modern research
-environments. As we our tools for generating data become more
-sophisticated and powerful, we also need to start using more
-sophisticated and powerful tools for processing it. Training on how to
-use these tools and build modern data analysis skills is lacking for
-researchers, even though this work is highly time-consuming and
-technical. As a consequence of this unawareness of the need for these
-skills, how *exactly* data is processed is poorly, if at all, described
-in scientific studies. This hidden aspect of research could have major
+environments. As our tools for generating data become more sophisticated
+and powerful, we also need to start using more sophisticated and
+powerful tools for processing it. Training on how to use these tools and
+how to build modern data analysis skills is lacking for researchers,
+even though this work is highly time-consuming and technical. As a
+consequence of an unawareness of the need for these skills, how
+*exactly* data is processed is poorly, if at all, described in
+scientific studies. This hidden aspect of research could have major
 impacts on the reproducibility of studies. Therefore, this course was
 created specifically to start addressing these types of problems.
 
 The course is designed as a series of participatory live-coding lessons,
-where the instructor and learner code together, and is interspersed with
-hands-on exercises and group work using real-world datasets. This
+where the instructor and learners code together, and is interspersed
+with hands-on exercises and group work using real-world datasets. This
 website contains all of the material for the course, from reading
 material to exercises to images.
 It is structured as a book, with "chapters" as lessons, given in order
 of appearance. We make heavy use
@@ -49,9 +49,9 @@ Want to contribute to this course? Check out the
 [README](https://github.com/rostools/r-cubed-intermediate/blob/main/README.md)
 file as well as the
 [CONTRIBUTING](https://github.com/rostools/r-cubed-intermediate/blob/main/CONTRIBUTING.md)
-file on the GitLab repository for more details. The main way to
+file on the GitHub repository for more details. The main way to
 contribute is by using [GitHub](https://github.com/) and creating a [new
-Issue](https://github.com/rostools/r-cubed-intermediate/issues/new) to
+issue](https://github.com/rostools/r-cubed-intermediate/issues/new) to
 make comments and give feedback for the material.
 
 ## Target audiences
diff --git a/preamble/pre-course.qmd b/preamble/pre-course.qmd
index bdce378..61313b2 100644
--- a/preamble/pre-course.qmd
+++ b/preamble/pre-course.qmd
@@ -156,7 +156,7 @@ Checking Git config settings:
 survey questions**. *Note* that while GitHub is a natural connection to
 using Git, given the limited time available, we will not be going over
 how to use GitHub. If you want to learn about using GitHub, check out
-the
+this
 [session](https://r-cubed-intro.rostools.org/sessions/version-control.html)
 on it in the introduction course.
 
@@ -264,6 +264,9 @@ usethis::with_project(
 )
 ```
 
+Throughout the course, we will use this document as a sandbox to test
+code out and then move the finished code to other files.
+
 ## Download the course data {#sec-download-data}
 
 To best demonstrate the concepts in the course, we ideally should work
@@ -290,8 +293,9 @@ website](https://physionet.org/content/mmash/1.0.0/):
 use personal data, but it *does not prohibit sharing it or making it
 public*! GDPR and Open Data are not in conflict.
 
-> *Note*: Sometimes the PhysioNet website is slow. If that's the case,
-> use [**this alternative link**](resources/mmash-page.html) instead.
+> *Note*: Sometimes the PhysioNet website, where the MMASH data is
+> described, is slow. If that's the case, use [**this alternative
+> link**](resources/mmash-page.html) instead.
 
 After looking over the MMASH website, you need to setup where to store
 the dataset to prepare it for later processing. While in your `LearnR3`
@@ -672,6 +676,16 @@ fs::file_copy(
 )
 ```
 
+Notice that in the file above, we have added comments to help segment
+sections in the code and explain what is happening in the script. In
+general, adding comments to your code helps not only when others read
+the script, but also you in the future, if/when you forget what was done
+or why it was done. It also creates sections in your code that makes it
+easier to get an overview of the code. However, there is a balance here.
+Too many comments can negatively impact readability, so as much as
+possible, write code in a way that explains what the code is doing,
+rather than rely on comments.
+
 You now have the data ready for the course! At this point, please run
 this function in the Console:
 
@@ -767,3 +781,4 @@ withr::with_dir(
   }
 )
 ```
+
diff --git a/preamble/syllabus.qmd b/preamble/syllabus.qmd
index 78f27c3..4bb3d72 100644
--- a/preamble/syllabus.qmd
+++ b/preamble/syllabus.qmd
@@ -19,22 +19,23 @@ irreproducible results. With this course, we aim to begin addressing
 this gap.
 
 Using a highly practical approach that revolves around code-along
 sessions (instructor and learner coding together), hands-on exercises,
 and group work,
-participants of the course will be able to:
+participants of the course will know:
 
-1. Learn and demonstrate what an open and reproducible data processing
-   and analysis workflow looks like.
-2. Learn and apply some fundamental concepts, techniques, and skills
+1. How to demonstrate what an open and reproducible data processing and
+   analysis workflow looks like.
+2. How to apply some fundamental concepts, techniques, and skills
    needed for processing and managing data in a reproducible and
   well-documented way.
-3. Learn where to go to get help and to continue learning modern data
-   science and analysis skills.
+3. Where to go to get help and to continue learning modern data science
+   and analysis skills.
 
-By the end of the course, participants will: have improved their
-competency in processing and wrangling datasets; have improved their
-proficiency in using the [R](https://www.r-project.org/) statistical
-computing language; know how to write re-usable and well-documented
-code; and know how to make modern and reproducible data analysis
-projects.
+By the end of the course, participants will:
+
+1. Have improved their competency in processing and wrangling datasets;
+2. Have improved their proficiency in using the
+   [R](https://www.r-project.org/) statistical computing language;
+3. Know how to write re-usable and well-documented code;
+4. Know how to make modern and reproducible data analysis projects.
 
 ## Is this course for you? {#sec-is-it-for-you}
 
@@ -43,7 +44,7 @@ This course is designed in a specific way and is ideal for you if:
 
 - You are a researcher, preferably working in the biomedical field
   (ranging from experimental to epidemiological). Specifically, this
   course targets those working on topics in diabetes and metabolism.
-- You currently or will soon do quantitative data analysis.
+- You currently do or will soon do quantitative data analysis.
 - You either:
   - have taken the [introduction to Reproducible Research in R
     course](https://r-cubed-intro.rostools.org/), since this course
@@ -57,10 +58,10 @@ This course is designed in a specific way and is ideal for you if:
 Considering that this is a natural extension of the [introductory
 r-cubed course](https://r-cubed-intro.rostools.org/), this course
 incorporates tools learned during that course, including basic Git usage
-as well as use of RStudio R Projects. If you *do not* have familiarity
-with these tools, you will need to go over the material from the
-introduction course beforehand (more details about pre-course tasks will
-be sent out a couple of weeks before the course).
+as well as the use of RStudio R Projects. If you *do not* have
+familiarity with these tools, you will need to go over the material from
+the introduction course beforehand (more details about pre-course tasks
+will be sent out a couple of weeks before the course).
 
 While having these assumptions help to focus the content of the course,
 if you have an interest in learning R but don't fit any of the above
diff --git a/sessions/dplyr-joins.qmd b/sessions/dplyr-joins.qmd
index f029193..5c3352f 100644
--- a/sessions/dplyr-joins.qmd
+++ b/sessions/dplyr-joins.qmd
@@ -19,6 +19,8 @@ covering.](/images/overview-create-project-data.svg){#fig-overview-create-projec
 
 ## Learning objectives
 
+The learning objectives of this session are:
+
 1. Learn what regular expressions are and how to use them on character
    data.
 2. Learn about and apply the various ways data can be joined.
diff --git a/sessions/functionals.qmd b/sessions/functionals.qmd
index 277fbe3..22174fa 100644
--- a/sessions/functionals.qmd
+++ b/sessions/functionals.qmd
@@ -53,8 +53,8 @@ functionals are better coding patterns to use compared to loops. Doing
 the code-along should also help reinforce this concept.
 
 Also highlight that the resources appendix has some links for continued
-learning for this and that the RStudio `{purrr}` cheatsheet is an
-amazing resource to use.
+learning for this and that the Posit `{purrr}` cheatsheet is an amazing
+resource to use.
 :::
 
 ::: callout-note
@@ -85,7 +85,7 @@ entire vector (e.g. `c(1, 2, 3, 4)`) and R will know what to do with it.
 
 ![A function using vectorization. Notice how a set of items is included
 *all at once* in the `func()` function and outputs a single item on the
-right. Modified from the [RStudio purrr
+right. Modified from the [Posit purrr
 cheatsheet](https://raw.githubusercontent.com/rstudio/cheatsheets/master/purrr.pdf).](/images/vectorization.png){#fig-vectorization
 width="45%"}
 
@@ -123,7 +123,7 @@ difficult to easily explain. Because of this and because there are
 better and easier ways of writing R code to replace for loops, we will
 **not** be covering loops in this course.
 
-A functional on the other hand is a function that can also use a
+A **functional** on the other hand is a function that can also use a
 function as one of its arguments. @fig-functionals shows how the
 functional `map()` from the `{purrr}` package works by taking a vector
 (or list), applying a function to each of those items, and outputting
@@ -131,10 +131,10 @@ the results from each function. The name `map()` doesn't mean a
 geographic map, it is the mathematical meaning of map: To use a function
 on each item in a set of items.
 
-![A functional that uses a function to apply it to each item in a
-vector. Notice how each of the green coloured boxes are placed into the
-`func()` function and outputs the same number of blue boxes as there are
-green boxes. Modified from the [RStudio purrr
+![A functional, in this case `map()`, applies a function to each item in
+a vector. Notice how each of the green coloured boxes are placed into
+the `func()` function and outputs the same number of blue boxes as there
+are green boxes. Modified from the [Posit purrr
 cheatsheet](https://raw.githubusercontent.com/rstudio/cheatsheets/master/purrr.pdf).](/images/functionals.png){#fig-functionals
 width="90%"}
 
@@ -244,20 +244,21 @@ usethis::use_package("purrr")
 ```
 
 Since `{purrr}` is part of the `{tidyverse}`, we don't need to load it
-with `library()`. The next step for using the `map()` functional is to
+with `library()`. Before we'll use the `map()` functional, we need to
 get a vector or list of all the dataset files available to us.
 We will return to using the `{fs}` package, which has a function called
-`dir_ls()` that finds files of a certain pattern. In our case, the
-pattern is `user_info.csv`. So, let's add `library(fs)` to the `setup`
-code chunk. Then, go to the bottom of the `doc/learning.qmd` document,
-create a new header called `## Using map`, and create a code chunk below
-that with {{< var keybind.chunk >}}
+`dir_ls()` that finds files of a certain pattern. So, let's add
+`library(fs)` to the `setup` code chunk. Then, go to the bottom of the
+`doc/learning.qmd` document, create a new header called `## Using map`,
+and create a code chunk below that with {{< var keybind.chunk >}}
 
 The `dir_ls()` function takes the path that we want to search
 (`data-raw/mmash/`), uses the argument `regexp` (short for [regular
 expression](https://r4ds.had.co.nz/strings.html#matching-patterns-with-regular-expressions)
 or also `regex`) to find the pattern, and `recurse` to look in all
-subfolders. We'll cover regular expressions more in the next session.
+subfolders. We'll cover regular expressions more in the next session. In
+our case, the pattern is `user_info.csv`, so the code should look like
+this:
 
 ```{r list-user-info-files}
 #| filename: "doc/learning.qmd"
@@ -300,7 +301,7 @@ user_info_list[[1]]
 
 This is great because with one line of code we imported all these
 datasets! But we're missing an important bit of information: The user
 ID. A powerful feature of the `{purrr}` package is that it has other
-functions to make working with functionals easier. We know `map()`
+functions to make it easier to work with functionals. We know `map()`
 always outputs a list. But what we want is a single data frame at the
 end that also contains the user ID information.
 
@@ -336,9 +337,9 @@ user_info_df |>
 
 We're using the base R `|>` pipe rather than the `{magrittr}` pipe `%>%`
 as more documentation and packages are using or relying on it.
 In terms of functionality, they are nearly the same, with some small
 differences.
-It ultimately doesn't matter which one you use, but we're using to be
-consistent with other documentation and with the general trend to
-recommend it over the `{magrittr}` pipe.
+It ultimately doesn't matter which one you use, but we're using the base
+R `|>` pipe to be consistent with other documentation and with the
+general trend to recommend it over the `{magrittr}` pipe.
 :::
 
 Now that we have this working, let's **add and commit** the changes to
@@ -378,12 +379,15 @@ works to import the other three datasets.
     -   Within `function()`, set two new arguments called `file_pattern`
         and `import_function`.
    -   Within the code, replace and re-write `"user_info.csv"` with
-        `file_pattern` (this is *without* quotes around it) and
+        `file_pattern` (this is *without* quotes around it, otherwise R
+        will interpret it as the pattern to look for in the `regexp`
+        argument, with the value `"file_pattern"` and not as the value
+        from the `file_pattern` argument we created for our function) and
         `import_user_info` with `import_function` (also *without*
         quotes).
    -   Create generic intermediate objects (instead of
        `user_info_files` and `user_info_df`). So, replace and re-write
-        `user_info_file` with `data_files` and `user_info_df` with
+        `user_info_files` with `data_files` and `user_info_df` with
        `combined_data`.
    -   Use `return(combined_data)` at the end of the function to output
        the imported data frame.
@@ -446,9 +450,9 @@ import_multiple_files("saliva.csv", import_saliva)
 
 ## Adding to the processing script and clean up Quarto document
 
-We've now made a function that imports multiple data files based on the
-type of data file, we can start using this function directly, like we
-did in the exercise above for the saliva data. We've already imported
+Now that we've made a function that imports multiple data files based on
+the type of data file, we can start using this function directly, like
+we did in the exercise above for the saliva data. We've already imported
 the `user_info_df` previously, but now we should do some tidying up of
 our Quarto file and to start updating the `data-raw/mmash.R` script.
 Why are we doing that? Because the Quarto file is only a sandbox to test
@@ -474,8 +478,11 @@ To test that things work, we'll create an HTML document from our Quarto
 document by using the "Render" / "Knit" button at the top of the pane or
 with {{< var keybind.render >}}. Once it creates the file, it should
 either pop up or open in the Viewer pane on the side. If it works, then
-we can move on and open up the `data-raw/mmash.R` script. Before
-continuing, we'll move the `library(fs)` line to right below the
+we can move on and open up the `data-raw/mmash.R` script. If not, it
+means that there is an issue in your code and that it won't be
+reproducible.
+
+Before continuing, we'll move the `library(fs)` line to right below the
 `library(here)`. Then, inside `data-raw/mmash.R`, copy and paste the two
 lines of code in the code chunk above to the bottom of the script.
 Afterwards, go the top of the script and right below the `library(fs)`
@@ -503,7 +510,7 @@ them know they can read more about this in this section.
 
 We're taking a quick detour to briefly talk about a concept that
 perfectly illustrates how vectorization and functionals fit into doing
 data analysis. The concept is called the
-[split-apply-combine](https://r-cubed-intro.rostools.org/session/wrangling.html#split-apply-combine-summarizing-data)
+[split-apply-combine](https://r-cubed-intro.rostools.org/sessions/data-management.html#split-apply-combine-summarizing-data)
 technique, which we covered in the beginner R course. The method is:
 
 1. Split the data into groups (e.g. diabetes status).
@@ -513,7 +520,7 @@ technique, which we covered in the beginner R course. The method is:
    that you can use to make a plot or table).
 
 So when you split data into multiple groups, you make a *vector* that
-you can than apply (i.e. the *map* functional) some statistical
+you can then apply (i.e. using the *map* functional) some statistical
 technique to each group through *vectorization*. This technique works
 really well for a range of tasks, including for our task of summarizing
 some of the MMASH data so we can merge it all into one dataset.
@@ -530,8 +537,8 @@ they will see how it works then. We covered this in the introduction
 course, so we should not cover it again here.
 :::
 
-Functionals and vectorization are an integral component of how R works
-and they appear throughout many of R's functions and packages. They are
+Functionals and vectorization are integral components of how R works and
+they appear throughout many of R's functions and packages. They are
 particularly used throughout the `{tidyverse}` packages like `{dplyr}`.
 Let's get into some more advanced features of `{dplyr}` functions that
 work as functionals. Before we continue, re-run the code for getting
@@ -564,11 +571,11 @@ it](https://r-cubed-intro.rostools.org/sessions/data-management.html#chaining-fu
 from the beginner course.
 :::
 
-But many `{dplyr}` verbs can also take functions as input. When you
-combine `select()` with the `where()` function, you can select different
-variables. The `where()` function is a `tidyselect` helper, a set of
-functions that make it easier to select variables. Some additional
-helper functions are listed in @tbl-tidyselect-helpers.
+But many `{dplyr}` verbs can also take functions as input. The
+`{tidyselect}` package provides many such helper functions that make
+it easier to select variables. For instance, when you combine `select()`
+with the `where()` function, you can easily select different variables.
+Some additional helper functions are listed in @tbl-tidyselect-helpers.
 
 ```{r tbl-tidyselect-helpers}
 #| echo: false
@@ -741,8 +748,8 @@ saliva_df |>
   summarise(across(cortisol_norm, list(mean = mean)))
 ```
 
-If we wanted to do that for all numeric columns and also calculate
-`sd()`:
+Now, let's collect some of the concepts from above to calculate the mean
+and standard deviation for all numeric columns in the `saliva_df`:
 
 ```{r}
 #| filename: "doc/learning.qmd"
@@ -762,9 +769,10 @@ With the RR dataset, each participant had almost 100,000 data points
 recorded over two days of collection. So if we want to join with the
 other datasets, we need to calculate summary measures by at least
 `file_path_id` and also preferably by `day` as well. In this case, we
-need to `group_by()` these two variables before summarising that lets us
-use the split-apply-combine technique. Let's first summarise by taking
-the mean of `ibi_s` (which is the inter-beat interval in seconds):
+need to `group_by()` these two variables before summarising. In this
+way, we use the split-apply-combine technique. Let's first summarise by
+taking the mean of `ibi_s` (which is the inter-beat interval in
+seconds):
 
 ```{r}
 #| filename: "doc/learning.qmd"
@@ -909,6 +917,14 @@ Like with the `RR.csv` dataset, let's process the `Actigraph.csv`
 dataset so that it makes it easier to join with the other datasets
 later. Make sure to read the warning block below.
 
+::: {.callout-warning appearance="default"}
+Since the `actigraph_df` dataset is quite large, we **strongly**
+recommend not using `View()` or selecting the dataframe in the
+Environments pane to view it. For many computers, your R session will
+**crash**! Instead type out `glimpse(actigraph_df)` or simply
+`actigraph_df` in the Console.
+:::
+
 1. Like usual, create a new Markdown header called e.g.
    `## Exercise: Summarise Actigraph` and insert a new code chunk below
    that with {{< var keybind.chunk >}}.
@@ -936,14 +952,6 @@ later.
 10. **Add and commit** the changes you've made into the Git history with
     {{< var keybind.git >}}.
 
-::: {.callout-warning appearance="default"}
-Since the `actigraph_df` dataset is quite large, we **strongly**
-recommend not using `View()` or selecting the dataframe in the
-Environments pane to view it. For many computers, your R session will
-**crash**! Instead type out `glimpse(actigraph_df)` or simply
-`actigraph_df` in the Console.
-:::
-
 ```{r solution-summarise-actigraph}
 #| eval: true
 #| output: false
@@ -1029,4 +1037,3 @@ before we move on to the next exercise.
 rm(actigraph_df, rr_df)
 save.image(here::here("_temp/functionals.RData"))
 ```
-
diff --git a/sessions/functions.qmd b/sessions/functions.qmd
index b54805a..a42b564 100644
--- a/sessions/functions.qmd
+++ b/sessions/functions.qmd
@@ -60,7 +60,12 @@ bundled sequence of steps that achieve a specific action. For instance,
 the `+` (to add) is a function, `mean()` is a function, `[]` (to subset
 or extract) is a function, and so on. In simple terms, functions are
 made of a function call, its arguments, and the function body:
-`function(argument1, argument2) { ...body with R code... }`.
+
+``` {.r filename="Console"}
+function(argument1, argument2) {
+  # body of function with R code
+}
+```
 
 Because R is open source, anyone can see how things work underneath. So,
 if we want to see what a function does underneath, we type out the
@@ -115,19 +120,19 @@ name <- function(argument1, argument2) {
 ```
 
 Writing your own functions can be absolutely amazing and fun and
-powerful... but you also often want to pull your hair out with
-frustration at errors that are difficult to understand and fix. The best
-way to deal with this is by debugging. Due to time and to the challenge
-of making meaningful debugging exercises (solutions to problems are very
-dependent on the project), read @sec-extra-material in your own time for
-some instructions on debugging and dealing with another common problem
-you might encounter with R.
+powerful, but you also often want to pull your hair out with frustration
+at errors that are difficult to understand and fix. The best way to deal
+with this is by debugging. Due to time and to the challenge of making
+meaningful debugging exercises (solutions to problems are very dependent
+on the project), read @sec-extra-material in your own time for some
+instructions on debugging and dealing with another common problem you
+might encounter with R.
 :::
 
 Let's write a simple example. First, create a new Markdown header called
 `## Making a function` and create a code chunk below that with
-{{< var keybind.chunk >}} . Then, inside the function, we'll write this
-code out:
+{{< var keybind.chunk >}} . Then, inside the code chunk, we'll write
+this code out:
 
 ```{r create-add-function}
 #| filename: "doc/learning.qmd"
@@ -145,7 +150,7 @@ your new function, with arguments to give it.
 add_numbers(1, 2)
 ```
 
-The function name is fairly good... `add_numbers` is read as "add
+The function name is fairly good; `add_numbers` is read as "add
 numbers". While we generally want to write code that describes what it
 does by reading it, it's also good practice to add some formal
 documentation to the function. Use the "Insert Roxygen Skeleton" in the
@@ -233,6 +238,10 @@ the time throughout course and that this workflow is also what you'd use
 in your daily work.
 :::
 
+In `doc/learning.qmd`, create a new Markdown header called
+`## Import the user data with a function` and create a code chunk below
+that with {{< var keybind.chunk >}} .
+
 So, step one. Let's take the code we wrote for importing the `user_info`
 data and convert that as a function:
 
@@ -435,7 +444,7 @@ ___(here("data-raw/mmash/user_1/saliva.csv"))
 #| output: false
 #| code-fold: true
 #| code-summary: "**Click for the solution**. Only click if you are struggling or are out of time."
-#' Import the MMASH saliva dataset.
+#' Import the MMASH saliva file.
 #'
 #' @param file_path Path to the user saliva data file.
 #'
@@ -668,9 +677,9 @@ add_numbers <- function(num1, num2) {
 
 This is *very* **bad practice** and can have some unintended and serious
 consequences that you might not notice or that won't give any warning or
 error. The correct way of indicating which package a function comes from
-is instead by using `packagename::`, which you've seen and used many
-times in this course. We won't get into the reasons why this is
-incorrect because it can quickly get quite technical.
+is instead by using `packagename::functionname`, which you've seen and
+used many times in this course. We won't get into the reasons why this
+is incorrect because it can quickly get quite technical.
 
 ::: {.callout-note appearance="minimal" collapse="true"}
 ## Instructor note
@@ -683,14 +692,14 @@ hand, `library()` will throw an error if it can't find the package,
 which is what you expect if your code depends on a package.
 :::
 
-Another reason to use `packagename::` for each function from an R
-package you use in your own function is that it explicitly tells R (and
-us the readers) where the function comes from. Because the same function
-name can be used by multiple packages, if you don't explicitly state
-which package the function is from, R will use the function that it
-finds first... which isn't always the function you meant to use. We also
-do this step at the end of making the function because doing it while we
-create it can be quite tedious.
+Another reason to use `packagename::functionname` for each function from
+an R package you use in your own function is that it explicitly tells R
+(and us the readers) where the function comes from. Because the same
+function name can be used by multiple packages, if you don't explicitly
+state which package the function is from, R will use the function that
+it finds first - which isn't always the function you meant to use. We
+also do this step at the end of making the function because doing it
+while we create it can be quite tedious.
 :::
 
 Alright, let's go into `R/functions.R` and add `readr::` to each of the
@@ -787,7 +796,7 @@ ___ <- function(___) {
 #| output: false
 #| code-fold: true
 #| code-summary: "**Click for the solution**. Only click if you are struggling or are out of time."
-#' Import the MMASH saliva dataset.
+#' Import the MMASH saliva file.
 #'
 #' @param file_path Path to the user saliva data file.
 #'
@@ -808,7 +817,7 @@ import_saliva <- function(file_path) {
   return(saliva_data)
 }
 
-#' Import the MMASH RR dataset (heart beat-to-beat interval).
+#' Import the MMASH RR file (heart beat-to-beat interval).
 #'
 #' @param file_path Path to the user RR data file.
 #'
@@ -830,7 +839,7 @@ import_rr <- function(file_path) {
   return(rr_data)
 }
 
-#' Import the MMASH Actigraph dataset (accelerometer).
+#' Import the MMASH Actigraph file (accelerometer).
 #'
 #' @param file_path Path to the user Actigraph data file.
 #'
@@ -899,4 +908,3 @@ next session.
 #| include: false
 save.image(here::here("_temp/functions.RData"))
 ```
-
diff --git a/sessions/importing.qmd b/sessions/importing.qmd
index 3232daf..d648cc7 100644
--- a/sessions/importing.qmd
+++ b/sessions/importing.qmd
@@ -61,13 +61,13 @@ Specific objectives are to:
 The ultimate goal for the beginning phases of a data analysis project is
 to eventually save a version of the raw data that is specific to your
-research questions. The first step to processing data is to import it
-into R so we can work on it. So for now, we'll open up the
-`doc/learning.qmd` file so we can start building and testing out the
-code. There should be a `setup` code chunk already be in the file, where
-we will put the `library()` code for loading the `{tidyverse}` package,
-which has `{readr}` bundled with it, as well as `library(here)`. It
-should look like this:
+research questions and enables you to conduct your analyses. The first
+step to processing data is to import it into R so we can work on it. So
+for now, we'll open up the `doc/learning.qmd` file so we can start
+building and testing out the code. There should be a `setup` code chunk
+already in the file, where we will put the `library()` code for
+loading the `{tidyverse}` package, which has `{readr}` bundled with it,
+as well as `library(here)`. It should look like this:
 
 ```{{r setup}}
 library(tidyverse)
@@ -78,9 +78,9 @@ This `setup` code chunk is a special, named code chunk that tells R to
 run this code chunk first whenever you open this Quarto file and run
 code inside of the file. It's in this `setup` code chunk that we will
 add `library()` functions when we want to load other packages. After
-adding this code chunk, create a new header by typing out
-`## Importing raw data`, followed by creating a new code chunk right
-below it using {{< var keybind.chunk >}}.
+adding this code chunk, create a new header below the `setup` code chunk
+by typing out `## Importing raw data`, followed by creating a new code
+chunk right below it using {{< var keybind.chunk >}}.
 
 ::: callout-note
 ## Reading task: \~5 minutes
@@ -141,10 +141,10 @@ user_1_info_data <- read_csv(user_1_info_file)
 
 You'll see the output mention using `spec()` to use in the argument
 `col_types`. And that it has 5 columns, one called `...1`. If we look at
-the CSV file though, we see that there are only four columns with
-names... but that technically there is a first empty column without a
-column header. So, let's figure out what this message means. Let's go to
-the **Console** and type out:
+the CSV file though, we see that there are only four columns with names,
+but that technically there is a first empty column without a column
+header. So, let's figure out what this message means. Let's go to the
+**Console** and type out:
 
 ``` {.r filename="Console"}
 ?readr::spec
 ```
 
@@ -324,9 +324,9 @@ user_1_info_data
 ## Instructor note
 
 Verbally emphasize that when you read from a larger dataset (that isn't
-your project dataset), its better to *explicitly* select the columns you
-want. It's faster to import and you make it clear from the beginning
-which variables you want.
+your project-specific dataset), it's better to *explicitly* select the
+columns you want. It's faster to import and you make it clear from the
+beginning which variables you want.
 :::
 
 Why might we use `spec()` and `col_types`? It's good practice, at least
diff --git a/sessions/introduction.qmd b/sessions/introduction.qmd
index 0ed54ee..c37dce1 100644
--- a/sessions/introduction.qmd
+++ b/sessions/introduction.qmd
@@ -175,8 +175,8 @@ Our workflow and process will be something like:
        Because our *aim* for the R script is to produce a specific
        dataset output, while the *aim* of a Quarto file is to create an
        output *document* (like HTML) and to test reproducibility. We
-        use specific tools or file formats for specific purposes base on
-        their design.
+        use specific tools or file formats for specific purposes based
+        on their design.
 -   Remove the old code from the Quarto document (`doc/learning.qmd`).
    -   Once we've finished prototyping code and moved it over into a
        more "final" location, we remove left over code because it isn't
@@ -190,7 +190,8 @@ Our workflow and process will be something like:
        transparent and makes it easier to share your code by uploading
        to GitHub. Using version control should be a standard practice to
        doing better science since it fits with the philosophy of
-        doing science.
+        doing science (e.g., transparency, reproducibility, and
+        documentation).
    -   **Note**: While we covered GitHub in the introductory course, we
        can't assume everyone will have taken that course. Because of
        that, we won't be using GitHub in this course.
diff --git a/sessions/pivots.qmd b/sessions/pivots.qmd
index abc404e..5062d5e 100644
--- a/sessions/pivots.qmd
+++ b/sessions/pivots.qmd
@@ -27,6 +27,8 @@ and remind everyone the 'what' and 'why' of what we are doing.
 
 ## Learning objectives
 
+The learning objective for this session is:
+
 1. Using the concept of "pivoting" to arrange data from long to wide
    and vice versa.