-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathp8105_hw1_ys3766.Rmd
84 lines (64 loc) · 2.74 KB
/
p8105_hw1_ys3766.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
---
title: "p8105_hw1_ys3766"
author: "Yifan Shi"
date: "2024-09-21"
output: github_document
---
```{r setup, echo = FALSE, message = FALSE}
library(tidyverse)
library(ggplot2)
```
The purpose of this file is to present my work for Homework 1
## Problem 1
Load the penguins dataset
```{r}
data("penguins", package = "palmerpenguins")
```
Describe the dataset
```{r}
str(penguins)
ncol(penguins)
nrow(penguins)
summary(penguins)
mean(penguins$flipper_length_mm, na.rm = TRUE)
```
The 'penguins' is a dataset from the 'palmerpenguins' package. It includes `r nrow(penguins)` observations and `r ncol(penguins)` variables. It contains variable "species", "island", "bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g", "sex" and "year".
The dataset contains data for 344 penguins of 3 species, namely Adelie, Chinstrap and Gentoo. They are from 3 different islands, Biscoe, Dream and Torgersen. The mean flipper length of the penguins is 200.9mm.
Scatterplot of flipper_length_mm vs bill_length_mm:
```{r}
ggplot(penguins, aes(x = bill_length_mm, y = flipper_length_mm, color = species)) +
geom_point()+
labs(title = "Scatterplot of Penguin Flipper Length vs Bill Length",
x = "Bill Length (mm)",
y = "Flipper Length (mm)",
color = "Species")+
theme_minimal()
ggsave("scatterplot.png")
```
## Problem 2
Create a data frame
```{r}
df_hw1 = tibble(
norm = rnorm(10),
vec_logical = norm > 0,
vec_character = letters[1:10],
vec_factor = factor(rep(c("one", "two", "three"), length.out = 10))
)
```
Take the mean of each variable
```{r}
mean_norm <- mean(pull(df_hw1, norm), na.rm = TRUE)
mean_vec_logical <- mean(pull(df_hw1, vec_logical), na.rm = TRUE)
mean_vec_character <- mean(pull(df_hw1, vec_character), na.rm = TRUE)
mean_vec_factor <- mean(pull(df_hw1, vec_factor), na.rm = TRUE)
```
Attempting to take mean of the random sample and logical vector worked, while attempting to take mean of the character vector and factor vector failed. I received the following warning message: argument is not numeric or logical: returning NA
```{r}
numeric_logical <- as.numeric(pull(df_hw1, vec_logical))
numeric_character <- as.numeric(pull(df_hw1, vec_character))
numeric_factor <- as.numeric(pull(df_hw1, vec_factor))
```
Logical variables were converted to numeric, with FALSE = 0 and TRUE = 1.
Character variables were not converted to numeric, and NA was returned.
Factor variables were converted to numeric based on the levels assigned, with one integer representing each level. Although this would allow for mean calculation, taking the mean of them is not useful unless the factor levels have meaningful numeric relationship.
This explains why taking the mean of logical vector worked, while that for character and factor did not work.