---
title: "Peer Graded Assignment_1"
author: "Anna Litvinenko"
date: "8/23/2017"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
### 1. Reading data
Cleaning the workspace and loading the libraries
```{r preparing workspace, message=FALSE}
remove(list=ls())
library(ggplot2)
library(dplyr)
```
Loading the data, assuming the zipped file (activity.zip) was saved to the working directory
```{r reading data}
unzip("activity.zip", exdir = getwd())
activity <- read.csv("activity.csv", na.strings = "NA")
```
Converting the date column to the Date class
```{r converting dates}
activity$date <- as.Date(activity$date)
```
### 2. Histogram of the total number of steps taken each day
Aggregating the total number of steps by date and plotting the daily totals as a bar chart with ggplot2
```{r hist_total steps}
total.steps.day <- summarise(group_by(activity, date), steps=sum(steps))
ggplot(data=total.steps.day, aes(x=date, y=steps))+geom_bar(stat="identity")+
ggtitle("Total Number Of Steps Taken Each Day")+xlab("Date")+
ylab("Number of steps")+theme(axis.text.x=element_text(angle = 90, hjust = 0))+
scale_x_date(date_labels="%d %b %y",date_breaks ="5 days")
```
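The chart above draws one bar per date, so it shows how the daily totals change over time. A histogram in the strict sense bins the daily totals themselves and counts the days in each bin. A minimal sketch of that variant follows; `toy.totals` and its values are made up for illustration (in this report the input would be `total.steps.day`), and the 2500-step bin width is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical daily totals standing in for total.steps.day
toy.totals <- data.frame(
  steps = c(2000, 8500, 9800, 10400, 10900, 11200, 12700, 13100, 15300, 20100)
)

# Bin the daily totals and count the days per bin.
# binwidth = 2500 is an illustrative choice, not a recommendation.
p <- ggplot(toy.totals, aes(x = steps)) +
  geom_histogram(binwidth = 2500, boundary = 0) +
  ggtitle("Distribution Of Total Steps Per Day") +
  xlab("Total steps per day") +
  ylab("Number of days")
print(p)
```

Days whose total is NA would be dropped by ggplot2 with a warning, so the histogram describes only the days with recorded data.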
### 3. Mean and median number of steps taken each day
```{r mean and median steps}
mean(total.steps.day$steps, na.rm = TRUE)
median(total.steps.day$steps, na.rm = TRUE)
```
### 4. Time series plot of the average number of steps taken
Aggregating the average number of steps by interval and plotting with ggplot2
```{r timeser_average steps}
average.steps.interval <- summarise(group_by(activity, interval),
                                    steps = mean(steps, na.rm = TRUE))
ggplot(data=average.steps.interval, aes(x=interval, y=steps))+geom_line()+
ggtitle("Average Number Of Steps Taken Daily")+xlab("5-minute Interval")+
ylab("Number of steps")+theme(axis.text.x=element_text(angle = 90, hjust = 0))+
scale_x_continuous(breaks = pretty(average.steps.interval$interval, n=20))
```
### 5. The 5-minute interval that, on average, contains the maximum number of steps
```{r max steps interval}
# Row(s) of the interval with the highest average step count
dplyr::filter(average.steps.interval, steps == max(steps, na.rm = TRUE))
```
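The interval identifier returned above can be read as a clock time, assuming (this is not stated in the report) that identifiers encode the start of the interval as HHMM, so that 835 would mean 08:35. A short sketch of the conversion, with a hypothetical helper name `interval.to.time`:

```r
# Hypothetical helper: turn an HHMM-style interval identifier into "HH:MM".
# Assumes identifiers such as 0, 55, 100, 835, 2355 (hours * 100 + minutes).
interval.to.time <- function(interval) {
  interval <- as.integer(interval)
  sprintf("%02d:%02d", interval %/% 100L, interval %% 100L)
}

# Sample identifiers chosen only for illustration
example.times <- interval.to.time(c(0, 55, 100, 835, 2355))
print(example.times)
```

Under the same assumption, the identifiers are not evenly spaced as numbers (55 is followed by 100), which is worth keeping in mind when the interval is used as a continuous x axis.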
### 6. Code to describe and show a strategy for imputing missing data
Taking a look at missing values
```{r counting missing data}
# Total count of missing values across all columns
sum(is.na(activity))
# Rows with a missing step count, tallied per date
NAs <- subset(activity, is.na(steps))
table(NAs$date)
```
Eight days are missing entirely. Each NA will be replaced with the mean number of steps for the same 5-minute interval, averaged across all days
```{r imputing missing data}
activity.complete <- activity
# Replace each missing step count with the mean for the same 5-minute interval
missing <- is.na(activity.complete$steps)
activity.complete$steps[missing] <- average.steps.interval$steps[
  match(activity.complete$interval[missing], average.steps.interval$interval)]
```
Checking if the NAs are gone
```{r checking imputing}
sum(is.na(activity.complete))
```
Calculating the mean and median of the total daily steps for the imputed data
```{r mean and median steps_imputed}
total.steps.day.complete <- summarise(group_by(activity.complete, date), steps=sum(steps))
mean(total.steps.day.complete$steps)
median(total.steps.day.complete$steps)
```
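If the missing values fall only on whole days, as the table of NAs by date suggests, each imputed day receives the sum of the interval means as its total, and that sum equals the mean of the observed daily totals. The mean is therefore expected to stay the same after imputation, while the median can shift toward it. The sketch below checks this on a small made-up data set (`toy` and all of its values are hypothetical), using base R only:

```r
# Hypothetical data: 4 days x 3 intervals, with the third day missing entirely
toy <- data.frame(
  date = rep(as.Date("2012-10-01") + 0:3, each = 3),
  interval = rep(c(0, 5, 10), times = 4),
  steps = c(0, 10, 20,  5, 15, 40,  NA, NA, NA,  10, 20, 30)
)

# Mean steps per interval, ignoring missing values
interval.means <- tapply(toy$steps, toy$interval, mean, na.rm = TRUE)

# Fill each NA with the mean of its interval
missing <- is.na(toy$steps)
toy.complete <- toy
toy.complete$steps[missing] <- interval.means[as.character(toy$interval[missing])]

# Daily totals before and after imputation
totals.before <- tapply(toy$steps, toy$date, sum)
totals.after <- tapply(toy.complete$steps, toy.complete$date, sum)

mean.before <- mean(totals.before, na.rm = TRUE)
mean.after <- mean(totals.after)
median.before <- median(totals.before, na.rm = TRUE)
median.after <- median(totals.after)
print(c(mean.before = mean.before, mean.after = mean.after,
        median.before = median.before, median.after = median.after))
```

The equality of the two means depends on the observed days being complete. A day with only some intervals missing would break it.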
### 7. Histogram of the total number of steps taken each day after missing values are imputed
Using imputed data, aggregating the total number of steps by date and plotting with ggplot2
```{r hist_total steps_imputed}
total.steps.day.complete <- summarise(group_by(activity.complete, date), steps=sum(steps))
ggplot(data=total.steps.day.complete, aes(x=date, y=steps))+geom_bar(stat="identity")+
ggtitle("Total Number Of Steps Taken Each Day (Imputed Data)")+xlab("Date")+
ylab("Number of steps")+theme(axis.text.x=element_text(angle = 90, hjust = 0))+
scale_x_date(date_labels="%d %b %y",date_breaks ="5 days")
```
### 8. Panel plot comparing the average number of steps taken per 5-minute interval across weekdays and weekends
Recoding dates to "weekday" and "weekend"
```{r recoding dates to weekday(end)}
activity.complete$daytype <- weekdays(activity.complete$date)
activity.complete$day <- ifelse(activity.complete$daytype %in% c("Saturday", "Sunday"), "weekend", "weekday")
```
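`weekdays()` returns day names in the language of the session locale, so the comparison with "Saturday" and "Sunday" assumes an English locale. A locale-independent sketch uses the `wday` field of `POSIXlt`, which runs from 0 (Sunday) to 6 (Saturday). The sample dates below are chosen only for illustration:

```r
# Sample dates: Friday 5 October 2012 through Monday 8 October 2012
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# wday is numeric (0 = Sunday, ..., 6 = Saturday) and does not depend on locale
wday <- as.POSIXlt(dates)$wday
day <- ifelse(wday %in% c(0, 6), "weekend", "weekday")
print(data.frame(date = dates, day = day))
```

In this report the same expression could be applied to `activity.complete$date` to fill the `day` column.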
Aggregating the average number of steps by weekday(end) and plotting with ggplot2
```{r panelplot_average steps by weekday(end)}
average.steps.interval.day <- summarise(group_by(activity.complete, interval, day),
                                        steps = mean(steps, na.rm = TRUE))
ggplot(data=average.steps.interval.day, aes(x=interval, y=steps))+geom_line()+
ggtitle("Average Number Of Steps Taken Daily")+xlab("5-minute Interval")+
ylab("Number of steps")+theme(axis.text.x=element_text(angle = 90, hjust = 0))+
scale_x_continuous(breaks = pretty(average.steps.interval$interval, n=20))+
facet_grid(day~.)
```