tidyverse review and updates
Juan Caballero committed Mar 19, 2024
1 parent ebac556 commit c5ca9f9
Showing 8 changed files with 30 additions and 63 deletions.
33 changes: 13 additions & 20 deletions qmd/10_Data_Import.qmd → qmd/07_DataImport.qmd
@@ -1,22 +1,14 @@
---
title: "10: Data Import with Tidyverse"
author: "David Koppstein"
format:
revealjs:
toc: false
slide-number: true
chalkboard:
buttons: false
preview-links: auto
logo: images/maxplanck-ie.jpg
footer: <https://maxplanck-ie.github.io/Rintro>
css: styles.css
theme: night
title: "07: Data Import with Tidyverse"
author: "Juan Caballero"
---

```{r, child="_setup.qmd"}
```

## Before we begin

- If you haven't already, please execute the following code:
- If you are not using Workbench, please install the following packages:

```{r}
#| eval: false
@@ -26,15 +18,16 @@ install.packages("dslabs")
```

- The first command will install a large set of packages that are designed to work together, including ggplot2.
- The second command installs some datasets we will use from Rafael Irizarry's book, "Introduction to Data Science."
- The second command installs some datasets we will use from Rafael Irizarry's book, "*Introduction to Data Science*"

## Credit

- This presentation is heavily influenced by Rafael Irizarry's book [Introduction to Data Science](https://rafalab.dfci.harvard.edu/dsbook) and his corresponding course on EdX.
- It also draws on material from previous MPI R courses given by Devon Ryan.
- It also draws on material from previous MPI R courses given by Devon Ryan and David Koppstein.

![](images/irizarry.jpg){.absolute bottom="25" left="100" width="250"}
![](images/devon.jpg){.absolute bottom="25" left="800" width="250"}
![](images/Rafael_Irizarry.jpg){.absolute bottom="25" left="100" width="250"}
![](images/Devon_Ryan.jpeg){.absolute bottom="25" left="450" width="250"}
![](images/David_Koppstein.jpg){.absolute bottom="25" left="800" width="250"}

::: footer
<https://www.edx.org/bio/rafael-irizarry>
@@ -50,15 +43,15 @@ install.packages("dslabs")

## Data Import in the Tidyverse

The first step for any analysis is importing the data into a machine-readable format. The tidyverse offers the `readr::` and `readxl::` packages, as we have just recently seen.
The first step for any analysis is importing the data into a machine-readable format. The tidyverse offers the `readr::` and `readxl::` packages.

## Data Import in the Tidyverse

```{r}
#| echo: true
# set path to the location for raw data files in the dslabs package and list files
library(dslabs)
path <- system.file("extdata", package="dslabs")
path = system.file("extdata", package="dslabs")
list.files(path)
```
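As a possible next step after listing the files, one of them can be read with `readr`. This is a minimal sketch, not part of the original slides; it assumes that a file named `murders.csv` is among the files returned by `list.files(path)`, and the names `murders_file` and `murders_raw` are hypothetical.

```r
library(dslabs)
library(readr)

# Directory holding the raw example files shipped with dslabs
path <- system.file("extdata", package = "dslabs")

# Assumption: "murders.csv" is one of the files listed by list.files(path)
murders_file <- file.path(path, "murders.csv")
stopifnot(file.exists(murders_file))

# read_csv() returns a tibble and guesses the column types from the data
murders_raw <- read_csv(murders_file, show_col_types = FALSE)
head(murders_raw)
```

Building the full path with `file.path()` avoids hard-coding a path separator, so the same code should work across operating systems.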

18 changes: 5 additions & 13 deletions qmd/11_Data_Wrangling.qmd → qmd/08_DataWrangling.qmd
@@ -1,18 +1,11 @@
---
title: "11: Intro to Data Wrangling with Dplyr"
author: "David Koppstein"
format:
revealjs:
toc: false
slide-number: true
chalkboard:
buttons: false
preview-links: auto
logo: images/maxplanck-ie.jpg
css: styles.css
theme: night
title: "08: Intro to Data Wrangling with Dplyr"
author: "Juan Caballero"
---

```{r, child="_setup.qmd"}
```

## Introduction to Dplyr {auto-animate="true"}

Dplyr provides functions that act as verbs, transforming data frames in ways that facilitate data analysis.
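
To make the idea of verbs concrete, here is a minimal sketch using three common dplyr verbs. The data frame `toy` and its values are invented for illustration and are not taken from the course material.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  state = c("A", "B", "C"),
  population = c(1000000, 2000000, 500000),
  total = c(10, 50, 2)
)

result <- toy %>%
  mutate(rate = total / population * 100000) %>%  # add a derived column
  filter(rate > 0.5) %>%                          # keep rows matching a condition
  select(state, rate)                             # keep only the chosen columns
result
```

Each verb takes a data frame and returns a new one, which is what allows the steps to be chained with `%>%`.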
@@ -356,7 +349,6 @@ murders %>% arrange(desc(rate)) %>% head()
```

Side note 1: You can also use the `slice_max` function
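
A minimal sketch of how `slice_max` compares with the `arrange(desc(...))` and `head()` pattern, using a hypothetical data frame `rates` with invented values rather than the `murders` data:

```r
library(dplyr)

# Hypothetical rates, invented for illustration only
rates <- data.frame(
  state = c("A", "B", "C", "D"),
  rate = c(1.0, 2.5, 0.4, 3.1)
)

# Sort in descending order, then keep the first two rows
top_arrange <- rates %>% arrange(desc(rate)) %>% head(2)

# Ask directly for the two rows with the largest rate
top_slice <- rates %>% slice_max(rate, n = 2)
top_slice
```

Note that `slice_max` keeps ties by default (`with_ties = TRUE`), so it can return more than `n` rows when values are tied, whereas `head(2)` always returns at most two rows.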
Side note 2: Why would Washington DC have a much higher murder rate than all the other states?

## Exercises with gapminder

19 changes: 6 additions & 13 deletions qmd/12_Data_Visualization.qmd → qmd/09_DataVisualization.qmd
@@ -1,18 +1,11 @@
---
title: "12: Data visualization with ggplot2"
author: "David Koppstein"
format:
revealjs:
toc: false
slide-number: true
chalkboard:
buttons: false
preview-links: auto
logo: images/maxplanck-ie.jpg
css: styles.css
theme: night
title: "09: Data visualization with ggplot2"
author: "Juan Caballero"
---

```{r, child="_setup.qmd"}
```

## What is ggplot2?

- ggplot2 is part of the *tidyverse*, a set of packages created by Hadley Wickham.
@@ -30,7 +23,7 @@ format:
- Plots in **ggplot2** consist of 3 main components:
- Data: The dataset being summarized
- Geometry: The type of plot (scatterplot, boxplot, barplot, histogram, qqplot, smooth density, etc.)
- Aesthetic mapping: Variables mapped to visual cues, such as x-axis and y-axis values and color
- Aesthetic mapping: Variables mapped to visual cues, such as x-axis and y-axis values and colors
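
The three components listed above can be sketched in a single call. This is an illustrative example only; the data frame `toy` and its values are invented and are not part of the course data.

```r
library(ggplot2)

# Data: a hypothetical toy dataset, invented for illustration only
toy <- data.frame(
  population = c(1, 2, 0.5, 4),
  total = c(10, 50, 2, 120),
  region = c("North", "South", "North", "West")
)

# Geometry: geom_point() draws a scatterplot
# Aesthetic mapping: aes() maps variables to the x-axis, y-axis, and color
p <- ggplot(data = toy) +
  geom_point(aes(x = population, y = total, color = region))
p
```

Swapping `geom_point()` for another geometry changes the type of plot while the data and, where compatible, the aesthetic mapping stay the same.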

## Graph components {auto-animate="true"}

23 changes: 6 additions & 17 deletions qmd/13_Tidy_Data.qmd → qmd/10_TidyData.qmd
@@ -1,18 +1,11 @@
---
title: "13: Tidy Data"
author: "David Koppstein"
format:
revealjs:
toc: false
slide-number: true
chalkboard:
buttons: false
preview-links: auto
logo: images/maxplanck-ie.jpg
css: styles.css
theme: night
title: "10: Tidy Data"
author: "Juan Caballero"
---

```{r, child="_setup.qmd"}
```

## What is Tidy Data?

- In tidy data, each row is an observation and each column is a different variable.
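
As a rough illustration of this definition, the sketch below reshapes a wide table, where years are spread across columns, into tidy form with `tidyr::pivot_longer`. The data frame `wide`, its values, and the column names `year` and `fertility` are hypothetical.

```r
library(tidyr)

# Hypothetical wide data: one column per year, invented for illustration only
wide <- data.frame(
  country = c("X", "Y"),
  "1990" = c(2.1, 3.4),
  "2000" = c(1.9, 3.0),
  check.names = FALSE  # keep the year column names as "1990" and "2000"
)

# Tidy form: one row per country-year observation, one column per variable
tidy <- pivot_longer(
  wide,
  cols = -country,
  names_to = "year",
  values_to = "fertility"
)
tidy
```

In the tidy result the year is stored as a character column, because it comes from the original column names; it can be converted with `as.integer()` if numeric years are needed.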
@@ -478,15 +471,11 @@ co2_stats %>%

## Any questions?

![](images/tidyr_cheat_sheet.png)
![](images/complaining-data-science-data-science.jpg)

::: footer
<https://github.com/rstudio/cheatsheets/blob/main/tidyr.pdf>
:::

## Thank you!

- The Koppstein lab will be starting at Uniklinik Düsseldorf in a few months
- There is a position for a Ph.D student interested in genomic and bioinformatic methods applied to blood and brain cancers
- E-mail [[email protected]]([email protected]) if you or anyone else are interested!

Binary file added qmd/images/David_Koppstein.jpg
Binary file added qmd/images/Devon_Ryan.jpeg
Binary file added qmd/images/Rafael_Irizarry.jpg
