forked from rdpeng/RepData_PeerAssessment1
---
title: "Reproducible Research: Peer Assessment 1"
output:
  html_document:
    keep_md: true
---
## Loading and preprocessing the data
```{r}
unzip('activity.zip')
activityData <- read.csv(file="activity.csv")
```
## What is mean total number of steps taken per day?
### Make a histogram of the total number of steps taken each day
```{r}
# Total steps per day; the formula interface drops rows where steps is NA
stepsData <- aggregate(steps ~ date, data = activityData, FUN = sum)
barplot(stepsData$steps, names.arg = stepsData$date, ylab = 'No. of Steps', las = 2)
```
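The bar plot above draws one bar per date. A histogram in the stricter sense would instead bin the daily totals and count how many days fall into each bin. The following is a minimal sketch on made-up numbers; the `toy` data frame and the 5000-step bin width are assumptions for illustration, not values taken from the assignment data.

```r
# Hypothetical daily totals standing in for stepsData
toy <- data.frame(
  date  = as.Date("2012-10-01") + 0:5,
  steps = c(126, 11352, 12116, 13294, 15420, 11015)
)

# Bin edges chosen for illustration only
binEdges <- seq(0, 20000, by = 5000)

# plot = FALSE returns the binning without drawing anything
h <- hist(toy$steps, breaks = binEdges, plot = FALSE)
h$counts   # number of days in each bin

# Draw the histogram itself
hist(toy$steps, breaks = binEdges,
     main = "Total steps per day", xlab = "No. of Steps")
```

Applied to the real data, the same call would take `stepsData$steps` in place of `toy$steps`.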
### Calculate and report the mean and median total number of steps taken per day
The mean total number of steps taken per day is:
```{r}
mean(stepsData$steps)
```
The median total number of steps taken per day is:
```{r}
median(stepsData$steps)
```
## What is the average daily activity pattern?
### Make a time series plot (i.e. type = "l") of the 5-minute interval (x-axis)
### and the average number of steps taken, averaged across all days (y-axis)
```{r}
stepsData.interval <- aggregate(steps ~ interval, data = activityData, FUN=mean)
plot(stepsData.interval,type="l")
```
### Which 5-minute interval, on average across all the days in the dataset,
### contains the maximum number of steps?
```{r}
stepsData.interval$interval[which.max(stepsData.interval$steps)]
```
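The value returned above is an interval label, not a clock time. Assuming the labels encode the time of day as HHMM (so that 1205 would mean 12:05), which is how the `interval` column appears to be built, a small helper can make the result easier to read. `intervalToClock` is a hypothetical name introduced only for this sketch.

```r
# Hypothetical helper: turn an interval label such as 1205 into "12:05".
# Assumes labels are HHMM integers between 0 and 2355.
intervalToClock <- function(interval) {
  padded <- sprintf("%04d", as.integer(interval))  # e.g. 55 -> "0055"
  paste0(substr(padded, 1, 2), ":", substr(padded, 3, 4))
}

intervalToClock(c(0, 55, 1205, 2355))
```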
## Imputing missing values
### Calculate and report the total number of missing values in the dataset
```{r}
sum(is.na(activityData))
```
### Devise a strategy for filling in all of the missing values in the dataset. The
### strategy does not need to be sophisticated.
Each missing value is replaced by the mean number of steps for the same 5-minute interval, averaged across all days.
```{r}
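# Attach each interval's mean to every row as steps.temp
imputedActivityData <- merge(activityData, stepsData.interval, by = "interval", suffixes = c("", ".temp"))
# merge() reorders rows, so the NA mask must come from the merged data, not from activityData
missingSteps <- is.na(imputedActivityData$steps)
imputedActivityData$steps[missingSteps] <- imputedActivityData$steps.temp[missingSteps]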
```
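`merge()` returns its result sorted by the key column, so the merged rows are no longer in the order of `activityData`, and any logical index used for filling has to be computed on the merged data. The sketch below shows the reordering on toy data and, as an alternative, fills the gaps in place with `ave()`, which keeps the original row order. The names `toy`, `merged` and `filled` are illustrative assumptions.

```r
# Toy data: two days, three intervals, the first day entirely missing
toy <- data.frame(
  steps    = c(NA, NA, NA, 10, 20, 30),
  date     = rep(c("2012-10-01", "2012-10-02"), each = 3),
  interval = rep(c(0, 5, 10), times = 2)
)

# Per-interval means; the formula interface ignores rows with NA steps
intervalMeans <- aggregate(steps ~ interval, data = toy, FUN = mean)

# merge() sorts by interval, so rows come back in a different order than toy
merged <- merge(toy, intervalMeans, by = "interval", suffixes = c("", ".temp"))
merged$interval

# Alternative: ave() applies the function per interval and keeps row order
filled <- toy
filled$steps <- ave(toy$steps, toy$interval, FUN = function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)  # replace NAs with the interval mean
  x
})
filled$steps
```

With `ave()` there is no helper column to drop afterwards, at the cost of recomputing the interval means inside the call.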
### Create a new dataset that is equal to the original dataset but with the
### missing data filled in.
```{r}
# Drop the helper column and restore the original column order
imputedActivityData <- imputedActivityData[, c("steps", "date", "interval")]
```
### Make a histogram of the total number of steps taken each day
```{r}
stepsData.date <- aggregate(steps ~ date, data = imputedActivityData, FUN = sum)
barplot(stepsData.date$steps, names.arg=stepsData.date$date, ylab='No. of Steps',las=2)
```
### Calculate and report the mean and median total number of steps taken per day.
The mean total number of steps taken per day is:
```{r}
mean(stepsData.date$steps)
```
The median total number of steps taken per day is:
```{r}
median(stepsData.date$steps)
```
### Do these values differ from the estimates from the first part of the assignment?
### What is the impact of imputing missing data on the estimates of the total
### daily number of steps?
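Yes, the estimates change. Days whose values were missing were left out of the daily totals in the first part; after imputation they contribute a total as well, so the estimates are no longer distorted by the missing data. The size and direction of the shift can be read off by comparing the mean and median reported above with those from the first part of the assignment.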
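One reason the two sets of estimates can differ lies in how the formula interface of `aggregate()` handles missing values: its default `na.action` is `na.omit`, so a day whose values are all `NA` disappears from the per-day totals instead of counting as zero. A minimal sketch on toy data (the `toy`, `perDay` and `perDayZero` names are assumptions for illustration):

```r
# Toy data: the first day is entirely missing
toy <- data.frame(
  steps = c(NA, NA, 10, 20),
  date  = rep(c("2012-10-01", "2012-10-02"), each = 2)
)

# Default behaviour: rows with NA are removed first, so 2012-10-01 vanishes
perDay <- aggregate(steps ~ date, data = toy, FUN = sum)
nrow(perDay)

# Keeping NA rows and summing with na.rm = TRUE turns the missing day into 0
perDayZero <- aggregate(steps ~ date, data = toy, FUN = sum,
                        na.rm = TRUE, na.action = na.pass)
perDayZero$steps
```

Which of the two treatments is appropriate depends on the question being asked; counting a missing day as zero steps would pull the mean and median down.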
## Are there differences in activity patterns between weekdays and weekends?
### Create a new factor variable in the dataset with two levels – “weekday”
### and “weekend” indicating whether a given date is a weekday or weekend day.
```{r}
# Classify a single date as "Weekday" or "Weekend" (relies on English day names)
typeOfDay <- function(date) {
  if (weekdays(as.Date(date)) %in% c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"))
    return("Weekday")
  else
    return("Weekend")
}
imputedActivityData$typeOfDay <- as.factor(sapply(imputedActivityData$date, typeOfDay))
```
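`weekdays()` returns day names in the current locale, so the comparison against English names above may misclassify every date on a system set to another language. A locale-independent sketch can use the numeric weekday from `as.POSIXlt()`, where `wday` runs from 0 (Sunday) to 6 (Saturday). `dayType` is a hypothetical name, and the function is vectorised so no `sapply()` is needed.

```r
# Hypothetical vectorised helper, independent of the system language.
# as.POSIXlt()$wday is 0 for Sunday and 6 for Saturday.
dayType <- function(date) {
  wday <- as.POSIXlt(as.Date(date))$wday
  factor(ifelse(wday %in% c(0, 6), "Weekend", "Weekday"),
         levels = c("Weekday", "Weekend"))
}

# 2012-10-05 was a Friday, 2012-10-06 a Saturday, 2012-10-07 a Sunday
dayType(c("2012-10-05", "2012-10-06", "2012-10-07"))
```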
### Make a panel plot containing a time series plot (i.e. type = "l") of the
### 5-minute interval (x-axis) and the average number of steps taken, averaged
### across all weekday days or weekend days (y-axis).
```{r}
stepsData.Weekday <- aggregate(steps ~ interval, data = imputedActivityData,
                               subset = imputedActivityData$typeOfDay == "Weekday", FUN = mean)
stepsData.Weekend <- aggregate(steps ~ interval, data = imputedActivityData,
                               subset = imputedActivityData$typeOfDay == "Weekend", FUN = mean)
par(mfrow=c(2,1))
plot(stepsData.Weekday, type = "l", main = "Weekday")
plot(stepsData.Weekend, type = "l", main = "Weekend")
```
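The two panels above each pick their own y-axis range, which can make weekday and weekend activity look more alike than they are. One possible variation, sketched here on toy data, averages both day types in a single `aggregate()` call and draws the panels with a shared `ylim`. The `toy` data frame and the `stepsByType` name are assumptions for illustration.

```r
# Toy stand-in for imputedActivityData with the typeOfDay factor already added
toy <- data.frame(
  steps     = c(0, 40, 10, 0, 20, 30, 5, 15),
  interval  = rep(c(0, 5), times = 4),
  typeOfDay = factor(rep(c("Weekday", "Weekend"), each = 4))
)

# One call averages by interval within each day type
stepsByType <- aggregate(steps ~ interval + typeOfDay, data = toy, FUN = mean)

# Shared y-axis range so the two panels are directly comparable
yRange <- range(stepsByType$steps)

oldPar <- par(mfrow = c(2, 1))
for (type in levels(stepsByType$typeOfDay)) {
  plot(steps ~ interval,
       data = stepsByType[stepsByType$typeOfDay == type, ],
       type = "l", ylim = yRange, main = type)
}
par(oldPar)  # restore the previous plotting layout
```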