forked from rdpeng/RepData_PeerAssessment1
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathPA1_template.Rmd
111 lines (102 loc) · 4.55 KB
/
PA1_template.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
---
title: "Peer Graded Assignment_1"
author: "Anna Litvinenko"
date: "8/23/2017"
output:
html_document:
keep_md: yes
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
### 1. Reading data
Cleaning the workspace and loading the libraries
```{r PreparingWorkspace, message=FALSE}
remove(list=ls())
library(ggplot2)
library(dplyr)
```
Loading data assuming the zipped data was saved to the working directory
```{r ReadingData}
unzip("activity.zip", exdir = getwd())
activity <- read.csv("activity.csv", na.strings = "NA")
```
Converting dates into date format
```{r converting dates}
activity$date <- as.Date(activity$date)
```
### 2. Histogram of the total number of steps taken each day
Aggregating the total number of steps by date and plotting with ggplot2
```{r hist_TotalSteps}
total.steps.day <- summarise(group_by(activity, date), steps=sum(steps))
qplot(total.steps.day$steps, geom = "histogram", main="Total number of steps taken", xlab = "Number of steps", breaks = seq(0, 24000, by = 3000)) + scale_x_continuous(breaks = seq(0, 24000, by = 3000))
```
### 3. Mean and median number of steps taken each day
```{r MeanAndMedian}
mean(total.steps.day$steps, na.rm = TRUE)
median(total.steps.day$steps, na.rm = TRUE)
```
### 4. Time series plot of the average number of steps taken
Aggregating the average number of steps by interval and plotting with ggplot2
```{r timeser_AverageSteps}
average.steps.interval <- summarise(group_by(activity, interval), steps=mean(
steps, na.rm = TRUE))
ggplot(data=average.steps.interval, aes(x=interval, y=steps))+geom_line()+
ggtitle("Average Number Of Steps Taken Daily")+xlab("5-minute Interval")+
ylab("Number of steps")+theme(axis.text.x=element_text(angle = 90, hjust = 0))+
scale_x_continuous(breaks = pretty(average.steps.interval$interval, n=20))
```
### 5. The 5-minute interval that, on average, contains the maximum number of steps
```{r MaxStepsInterval}
average.steps.interval[average.steps.interval[,"steps"]==max(average.steps.interval$steps, na.rm=TRUE),]
```
### 6. Code to describe and show a strategy for imputing missing data
Taking a look at missing vales
```{r CountingMissingData}
sum(is.na(activity))
NAs <- subset(activity, is.na(activity))
NAs <- transform(NAs, count=1)
table(NAs$date, NAs$count)
```
8 days are completely lost. These NAs will be substituted by the mean number of steps for a particular 5-minute interval across all days
```{r ImputingMissingData}
activity.complete <- activity
for(i in 1:nrow(activity.complete)){
if(is.na(activity.complete$steps[i])==TRUE){
activity.complete$steps[i] <- average.steps.interval$steps[match(activity.complete$interval[i], average.steps.interval$interval)]
}
}
```
Checking if the NAs are gone
```{r CheckingImputing}
sum(is.na(activity.complete))
```
Calculating new mean and median number for complete data
```{r MeanAndMedian_imputed}
total.steps.day.complete <- summarise(group_by(activity.complete, date), steps=sum(steps))
mean(total.steps.day.complete$steps)
median(total.steps.day.complete$steps)
```
### 7. Histogram of the total number of steps taken each day after missing values are imputed
Using imputed data, aggregating the total number of steps by date and plotting with ggplot2
```{r hist_TotalSteps_imputed}
total.steps.day.complete <- summarise(group_by(activity.complete, date), steps=sum(steps))
qplot(total.steps.day.complete$steps, geom = "histogram", main="Total number of steps taken_Complete Data",
xlab = "Number of steps", breaks = seq(0, 24000, by = 3000)) + scale_x_continuous(breaks = seq(0, 24000, by = 3000))
```
### 8. Panel plot comparing the average number of steps taken per 5-minute interval across weekdays and weekends
Recoding dates to "weekday" and "weekend"
```{r RecodingDatesToWeekday(end)}
activity.complete$daytype <- weekdays(activity.complete$date)
activity.complete$day <- ifelse(activity.complete$daytype %in% c("Saturday", "Sunday"), "weekend", "weekday")
```
Aggregating the average number of steps by weekday(end) and plotting with ggplot2
```{r panelplot_AverageStepsByWeekday(end)}
average.steps.interval.day <- summarise(group_by(activity.complete, interval, day), steps=mean(
steps, na.rm = TRUE))
ggplot(data=average.steps.interval.day, aes(x=interval, y=steps))+geom_line()+
ggtitle("Average Number Of Steps Taken Daily")+xlab("5-minute Interval")+
ylab("Number of steps")+theme(axis.text.x=element_text(angle = 90, hjust = 0))+
scale_x_continuous(breaks = pretty(average.steps.interval$interval, n=20))+
facet_grid(day~.)
```