HW 5, Project 2

wilkelab · Feb 26, 2024 · ffacd0f
1 parent cd15352
commit ffacd0f
Showing 15 changed files with 3,601 additions and 5 deletions.
diff --git a/assignments/HW5.Rmd b/assignments/HW5.Rmd
@@ -0,0 +1,82 @@
+---
+title: "Homework 5"
+output:
+  html_document:
+    theme:
+      version: 4
+---
+
+```{r global_options, include=FALSE}
+library(knitr)
+library(tidyverse)
+library(colorspace)
+opts_chunk$set(fig.align="center", fig.height=4, fig.width=5.5)
+
+# data prep:
+olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')
+olympics_2002 <- olympics %>%
+  filter(year == 2002, season == "Winter") %>%
+  select(sex) %>%
+  count(sex) %>%
+  pivot_wider(names_from = sex, values_from = n)
+
+#data prep:
+midwest2 <- midwest %>%
+  filter(state != "IN")
+```
+
+**This homework is due on Mar. 7, 2024 at 11:00pm. Please submit as a pdf file on Canvas.**
+
+**Problem 1:  (9 pts)** We will work with the dataset `olympics_2002` that contains the count of all athletes by sex for the 2002 Winter Olympics in Salt Lake City. It has been derived from the `olympics` dataset, which is described here: https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md
+
+```{r}
+olympics_2002
+```
+Follow these steps and display the modified data frame after each step:
+
+1. Rearrange the data frame into long form. The resulting data frame will have two columns, which you should call `sex` and `count`, respectively. There will be two rows of data, one for female and one for male athletes.
+2. Create a new column in which you calculate the percent of male and female athletes.
+3. Rename the values in the column `sex` to "female" and "male". 
+
+```{r}
+# your code here
+```
+
+```{r}
+# your code here
+```
+
+```{r}
+# your code here
+```
+
+**Problem 2: (5 pts)** 
+
+Use the color picker app from the **colorspace** package (`colorspace::choose_color()`) to create a qualitative color scale containing four colors. One of the four colors should be `#A23C42`, so you need to find three additional colors that go with this one. Use the function `swatchplot()` to plot your colors; it takes a vector of color codes as its argument.
+
+```{r}
+# complete and uncomment
+#my_colors <- c('#A23C42', ...)
+#swatchplot(my_colors)
+```
+
+**Problem 3: (6 pts)** 
+
+For this problem, we will work with the `midwest2` dataset (derived from `midwest`). In the following plot, you may notice that the axis tick labels are smaller than the axis titles, and also in a different color (gray instead of black). 
+
+1. Use the colors you chose in Problem 2 to color the points.
+2. Make the axis tick labels the same size as the axis titles (`size = 12`) and give them the color black (`color = "black"`).
+3. Set the entire plot background to the color `"#FEF8F0"`. Make sure there are no white areas remaining, such as behind the plot panel or under the legend.
+
+```{r}
+ggplot(midwest2, aes(popdensity, percollege, fill = state)) +
+  geom_point(shape = 21, size = 3, color = "white", stroke = 0.2) +
+  scale_x_log10(name = "population density") +
+  scale_y_continuous(name = "percent college educated") +
+  # your color choices go here in a scale function. 
+  theme_classic(12) +
+  theme(
+    # your theme customization code goes here
+  )
+```
+
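Note on Problem 1 in `assignments/HW5.Rmd`: the three steps (reshape to long form, compute percentages, relabel the `sex` values) can be sketched on a toy table. This is only an illustration under stated assumptions: `toy_wide` and its counts are invented, and the single-letter codes `F` and `M` are assumed to match the column names of `olympics_2002`. It is one possible approach, not the expected solution.

```r
library(tidyverse)

# Toy stand-in for a one-row wide table. The counts are invented for
# illustration and are NOT the real values in `olympics_2002`.
toy_wide <- tibble(F = 40, M = 60)

toy_long <- toy_wide %>%
  # Step 1: wide -> long, one row per former column
  pivot_longer(everything(), names_to = "sex", values_to = "count") %>%
  # Step 2: share of each group, in percent
  mutate(percent = 100 * count / sum(count)) %>%
  # Step 3: replace the single-letter codes with readable labels
  mutate(sex = recode(sex, F = "female", M = "male"))

toy_long
```

In the homework each step is displayed separately, so the pipeline would be split into three chunks rather than chained as it is here.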
diff --git a/assignments/HW5.html b/assignments/HW5.html
diff --git a/assignments/Project_2.Rmd b/assignments/Project_2.Rmd
@@ -0,0 +1,49 @@
+---
+title: "Project 2"
+output: html_document
+---
+
+```{r setup, include=FALSE}
+library(tidyverse)
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+
+In this project, you will be working with a dataset about the members of Himalayan expeditions:
+```{r message = FALSE}
+members <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/members.csv')
+
+members
+```
+
+More information about the dataset can be found at https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-09-22/readme.md and https://www.himalayandatabase.com/.
+
+**Hints:**
+
+- Make sure your two questions are actually questions, and not veiled instructions to perform a particular analysis.
+
+- Remember that your code needs to contain at least three data manipulation functions for data wrangling before you plot. You are allowed to put all the data wrangling into the answer for one of the two questions.
+
+- Adjust `fig.width` and `fig.height` in the chunk headers to customize figure sizing and figure aspect ratios.
+
+You can delete these instructions from your project. Please also delete text such as *Your approach here* or `# Q1: Your R code here`.
+
+**Question 1:** *Your question 1 here.*
+
+**Question 2:** *Your question 2 here.*
+
+**Introduction:** *Your introduction here.*
+
+**Approach:** *Your approach here.*
+
+**Analysis:**
+
+```{r fig.width = 5, fig.height = 5}
+# Q1: Your R code here
+```
+
+```{r fig.width = 5, fig.height = 5}
+# Q2: Your R code here
+```
+
+**Discussion:** *Your discussion of results here.*
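Note on the data-wrangling hint in `assignments/Project_2.Rmd`: a chain of several data manipulation functions before plotting might look like the sketch below. `toy_members` is a hypothetical stand-in with invented values, and its column names are only assumed to resemble those in `members`. Which functions count toward the requirement is decided by the rubric, not by this sketch.

```r
library(tidyverse)

# Toy stand-in for `members`; column names are assumptions and all values
# are invented for illustration.
toy_members <- tibble(
  season  = c("Spring", "Spring", "Autumn", "Autumn", "Autumn", "Winter"),
  age     = c(31, NA, 45, 28, 39, 52),
  success = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

season_summary <- toy_members %>%
  filter(!is.na(age)) %>%          # drop rows with missing age
  group_by(season) %>%             # one group per season
  summarize(
    n = n(),                       # number of members in the group
    success_rate = mean(success)   # fraction of successful members
  ) %>%
  arrange(desc(success_rate))      # order rows for plotting

season_summary
```

The resulting summary table can then be passed to `ggplot()` in a chunk whose header sets `fig.width` and `fig.height`, as the template chunks already do.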
diff --git a/assignments/Project_2.html b/assignments/Project_2.html
diff --git a/assignments/Project_2_instructions.html b/assignments/Project_2_instructions.html
diff --git a/assignments/Project_2_rubric.pdf b/assignments/Project_2_rubric.pdf
diff --git a/docs/assignments/HW5.Rmd b/docs/assignments/HW5.Rmd
@@ -0,0 +1,82 @@
+---
+title: "Homework 5"
+output:
+  html_document:
+    theme:
+      version: 4
+---
+
+```{r global_options, include=FALSE}
+library(knitr)
+library(tidyverse)
+library(colorspace)
+opts_chunk$set(fig.align="center", fig.height=4, fig.width=5.5)
+
+# data prep:
+olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')
+olympics_2002 <- olympics %>%
+  filter(year == 2002, season == "Winter") %>%
+  select(sex) %>%
+  count(sex) %>%
+  pivot_wider(names_from = sex, values_from = n)
+
+#data prep:
+midwest2 <- midwest %>%
+  filter(state != "IN")
+```
+
+**This homework is due on Mar. 7, 2024 at 11:00pm. Please submit as a pdf file on Canvas.**
+
+**Problem 1:  (9 pts)** We will work with the dataset `olympics_2002` that contains the count of all athletes by sex for the 2002 Winter Olympics in Salt Lake City. It has been derived from the `olympics` dataset, which is described here: https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md
+
+```{r}
+olympics_2002
+```
+Follow these steps and display the modified data frame after each step:
+
+1. Rearrange the data frame into long form. The resulting data frame will have two columns, which you should call `sex` and `count`, respectively. There will be two rows of data, one for female and one for male athletes.
+2. Create a new column in which you calculate the percent of male and female athletes.
+3. Rename the values in the column `sex` to "female" and "male". 
+
+```{r}
+# your code here
+```
+
+```{r}
+# your code here
+```
+
+```{r}
+# your code here
+```
+
+**Problem 2: (5 pts)** 
+
+Use the color picker app from the **colorspace** package (`colorspace::choose_color()`) to create a qualitative color scale containing four colors. One of the four colors should be `#A23C42`, so you need to find three additional colors that go with this one. Use the function `swatchplot()` to plot your colors; it takes a vector of color codes as its argument.
+
+```{r}
+# complete and uncomment
+#my_colors <- c('#A23C42', ...)
+#swatchplot(my_colors)
+```
+
+**Problem 3: (6 pts)** 
+
+For this problem, we will work with the `midwest2` dataset (derived from `midwest`). In the following plot, you may notice that the axis tick labels are smaller than the axis titles, and also in a different color (gray instead of black). 
+
+1. Use the colors you chose in Problem 2 to color the points.
+2. Make the axis tick labels the same size as the axis titles (`size = 12`) and give them the color black (`color = "black"`).
+3. Set the entire plot background to the color `"#FEF8F0"`. Make sure there are no white areas remaining, such as behind the plot panel or under the legend.
+
+```{r}
+ggplot(midwest2, aes(popdensity, percollege, fill = state)) +
+  geom_point(shape = 21, size = 3, color = "white", stroke = 0.2) +
+  scale_x_log10(name = "population density") +
+  scale_y_continuous(name = "percent college educated") +
+  # your color choices go here in a scale function. 
+  theme_classic(12) +
+  theme(
+    # your theme customization code goes here
+  )
+```
+
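Note on Problem 3 of the HW5 template: the two placeholders in the plot code (a manual fill scale and the `theme()` arguments) could be filled in roughly as follows. This is a sketch under assumptions: only `#A23C42` comes from the assignment, the other three colors in `my_colors` are arbitrary placeholders, and the set of theme elements shown is one way to remove the white areas rather than the required answer.

```r
library(ggplot2)

# Hypothetical palette: only the first color is given in the assignment;
# the remaining three are placeholders, not recommended choices.
my_colors <- c("#A23C42", "#3C6EA2", "#6B8E23", "#D9A441")

# Same filter as in the template's setup chunk (four states remain).
midwest2 <- subset(midwest, state != "IN")

p <- ggplot(midwest2, aes(popdensity, percollege, fill = state)) +
  geom_point(shape = 21, size = 3, color = "white", stroke = 0.2) +
  scale_x_log10(name = "population density") +
  scale_y_continuous(name = "percent college educated") +
  # map the four states to the four chosen colors
  scale_fill_manual(values = my_colors) +
  theme_classic(12) +
  theme(
    # tick labels: same size as the axis titles, in black
    axis.text = element_text(size = 12, color = "black"),
    # fill every background layer so that no white area remains
    plot.background = element_rect(fill = "#FEF8F0", color = NA),
    panel.background = element_rect(fill = "#FEF8F0", color = NA),
    legend.background = element_rect(fill = "#FEF8F0", color = NA)
  )

p
```

Setting `color = NA` on each `element_rect()` suppresses the outline that would otherwise be drawn around the filled area.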
diff --git a/docs/assignments/HW5.html b/docs/assignments/HW5.html
diff --git a/docs/assignments/Project_2.Rmd b/docs/assignments/Project_2.Rmd
@@ -0,0 +1,49 @@
+---
+title: "Project 2"
+output: html_document
+---
+
+```{r setup, include=FALSE}
+library(tidyverse)
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+
+In this project, you will be working with a dataset about the members of Himalayan expeditions:
+```{r message = FALSE}
+members <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/members.csv')
+
+members
+```
+
+More information about the dataset can be found at https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-09-22/readme.md and https://www.himalayandatabase.com/.
+
+**Hints:**
+
+- Make sure your two questions are actually questions, and not veiled instructions to perform a particular analysis.
+
+- Remember that your code needs to contain at least three data manipulation functions for data wrangling before you plot. You are allowed to put all the data wrangling into the answer for one of the two questions.
+
+- Adjust `fig.width` and `fig.height` in the chunk headers to customize figure sizing and figure aspect ratios.
+
+You can delete these instructions from your project. Please also delete text such as *Your approach here* or `# Q1: Your R code here`.
+
+**Question 1:** *Your question 1 here.*
+
+**Question 2:** *Your question 2 here.*
+
+**Introduction:** *Your introduction here.*
+
+**Approach:** *Your approach here.*
+
+**Analysis:**
+
+```{r fig.width = 5, fig.height = 5}
+# Q1: Your R code here
+```
+
+```{r fig.width = 5, fig.height = 5}
+# Q2: Your R code here
+```
+
+**Discussion:** *Your discussion of results here.*
diff --git a/docs/assignments/Project_2.html b/docs/assignments/Project_2.html
diff --git a/docs/assignments/Project_2_instructions.html b/docs/assignments/Project_2_instructions.html
diff --git a/docs/assignments/Project_2_rubric.pdf b/docs/assignments/Project_2_rubric.pdf
diff --git a/docs/schedule.html b/docs/schedule.html
@@ -2722,6 +2722,13 @@ <h3 id="homework-4-due-feb-29-2024">Homework 4 (due Feb 29, 2024)</h3>
 <li><a href="assignments/HW4.html">HTML</a></li>
 </ul>
 <h3 id="homework-5-due-mar-7-2024">Homework 5 (due Mar 7, 2024)</h3>
+<p class="nospace">
+Materials:
+</p>
+<ul>
+<li><a href="assignments/HW5.Rmd">R Markdown template</a></li>
+<li><a href="assignments/HW5.html">HTML</a></li>
+</ul>
 <h3 id="homework-6-due-apr-4-2024">Homework 6 (due Apr 4, 2024)</h3>
 <h3 id="homework-7-due-apr-11-2024">Homework 7 (due Apr 11, 2024)</h3>
 <h2 id="projects">Projects</h2>
@@ -2738,6 +2745,16 @@ <h3 id="project-1-due-feb-15-2024">Project 1 (due Feb 15, 2024)</h3>
 <li><a href="assignments/Project_1_example.html">Example project</a></li>
 </ul>
 <h3 id="project-2-due-mar-21-2024">Project 2 (due Mar 21, 2024)</h3>
+<p class="nospace">
+Materials:
+</p>
+<ul>
+<li><a href="assignments/Project_2_instructions.html">Instructions</a></li>
+<li><a href="assignments/Project_2.Rmd">Project Template (Rmd)</a></li>
+<li><a href="assignments/Project_2.html">Project Template (HTML)</a></li>
+<li><a href="assignments/Project_2_rubric.pdf">Grading rubric</a></li>
+</ul>
+<p>Please use the example project and the solutions from Project 1 as models for Project 2.</p>
 <h3 id="project-3-due-apr-18-2024">Project 3 (due Apr 18, 2024)</h3>
 <h2 class="appendix" id="reuse">Reuse</h2>
 <p>Text and figures are licensed under Creative Commons Attribution <a href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a>. Any computer code (R, HTML, CSS, etc.) in slides and worksheets, including in slide and worksheet sources, is also licensed under <a href="https://github.com/wilkelab/SDS375/LICENSE.md">MIT</a>. Note that figures in slides may be pulled in from external sources and may be licensed under different terms. For such images, image credits are available in the slide notes, accessible via pressing the letter ‘p’.</p>