This is my solution to the course project for Coursera's Getting and Cleaning Data.
It uses the raw data described here: Human Activity Recognition Using Smartphones and processes it into a clean and aggregated data set.
###Collection of the raw data
The data was downloaded from the download link on the course project page and then unzipped at the project root.
See directory "UCI HAR Dataset".
See the file README.txt in that directory for more info on the raw data.
##Creating the tidy datafile
Check out the GitHub repository.
Run the file "run_analysis.R" to recreate the tidy data "tidy-data.txt".
Alternatively, just download "run_analysis.R" and run it with the raw data present in the working directory.
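The steps above can be sketched as a small guard around the script. This is only an illustration: the helper `recreate_tidy_data` is hypothetical and not part of the repository, and it assumes the script and the "UCI HAR Dataset" directory both sit in the working directory.

```r
# Hypothetical helper (not part of the repository): run the analysis script
# only if the raw data and the script are present in the working directory.
recreate_tidy_data <- function(script = "run_analysis.R",
                               raw_dir = "UCI HAR Dataset") {
  if (!dir.exists(raw_dir)) {
    stop("Raw data directory not found: ", raw_dir)
  }
  if (!file.exists(script)) {
    stop("Analysis script not found: ", script)
  }
  source(script)   # expected to write "tidy-data.txt"
  invisible(TRUE)
}

# Usage, from the project root:
# recreate_tidy_data()
```

Failing early with a clear message is meant to avoid a less obvious file-not-found error from inside the script when the raw data has not been unzipped yet.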
##Description of the variables in the tidy-data.txt file
The tidy data contains 180 observations (30 subjects × 6 activities) of 68 variables; counting the header row, the file has 181 lines.
The first two columns hold the subject index (identifying the person who took part in the experiment) and the activity that the subject was performing while the data was measured.
The remaining columns are mean values of the measured sensor data. All of these measurement columns are numeric, with values between -1 and 1.
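As an illustration of what "mean values" means here, the sketch below averages one measurement column per subject and activity using base R's `aggregate`. The data frame `toy` and its values are made up for the example, and this is not necessarily how "run_analysis.R" performs the aggregation.

```r
# Toy input: several measurements per subject and activity (made-up values).
toy <- data.frame(
  subject  = c(1, 1, 1, 2, 2),
  activity = c("WALKING", "WALKING", "SITTING", "WALKING", "WALKING"),
  timeBodyAcc_mean_X = c(0.2, 0.4, -0.1, 0.3, 0.5)
)

# One row per (subject, activity) pair; every other column is averaged.
tidy_toy <- aggregate(. ~ subject + activity, data = toy, FUN = mean)
print(tidy_toy)
```

Applied to the full data set, the same grouping gives one row for each combination of subject and activity.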
To avoid very long column names, some abbreviations are used. The column names are made up of these parts:
- time: time domain signal, sampled at a constant rate of 50 Hz
- frequency: values in the frequency domain, obtained by applying a Fast Fourier Transform (FFT) to the time domain data
- Acc: Acceleration obtained from the device's built-in accelerometer
- Body: The body acceleration part of the total acceleration
- Gravity: The gravity acceleration part of the total acceleration
- Gyro: Angular velocity obtained from the device's built-in gyroscope
- Mag: Magnitude of the three-dimensional signal (Euclidean norm)
- Jerk: Time derivative of the body linear acceleration or angular velocity (jerk signal)
- _mean: Mean value
- _std: Standard Deviation
- _X, _Y, _Z: Signal component along one specific axis
For better readability of these long, multi-word column names, camelCase is used.
See the file from the raw data set for more specific information: "UCI HAR Dataset/features_info.txt".
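To make the naming scheme concrete, here is a minimal sketch that maps raw feature names such as `tBodyAcc-mean()-X` onto the column names used in the tidy data. The function `tidy_name` is hypothetical and the regular expressions are an assumption about one possible way to do the renaming, not necessarily what "run_analysis.R" does.

```r
# Hypothetical helper: turn a raw feature name into a tidy column name.
tidy_name <- function(raw) {
  name <- sub("^t", "time", raw)                    # t... -> time...
  name <- sub("^f", "frequency", name)              # f... -> frequency...
  name <- gsub("BodyBody", "Body", name, fixed = TRUE)  # raw names repeat "Body" in places
  name <- gsub("-(mean|std)\\(\\)", "_\\1", name)   # -mean() -> _mean, -std() -> _std
  name <- gsub("-([XYZ])$", "_\\1", name)           # -X -> _X
  name
}

raw_names <- c("tBodyAcc-mean()-X", "tGravityAcc-std()-Y", "fBodyBodyGyroJerkMag-std()")
print(tidy_name(raw_names))
```

Because `sub` and `gsub` are vectorised, the same function can be applied to the whole vector of selected feature names at once.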
All columns in the tidy data set:
- subject
- activity
- timeBodyAcc_mean_X
- timeBodyAcc_mean_Y
- timeBodyAcc_mean_Z
- timeBodyAcc_std_X
- timeBodyAcc_std_Y
- timeBodyAcc_std_Z
- timeGravityAcc_mean_X
- timeGravityAcc_mean_Y
- timeGravityAcc_mean_Z
- timeGravityAcc_std_X
- timeGravityAcc_std_Y
- timeGravityAcc_std_Z
- timeBodyAccJerk_mean_X
- timeBodyAccJerk_mean_Y
- timeBodyAccJerk_mean_Z
- timeBodyAccJerk_std_X
- timeBodyAccJerk_std_Y
- timeBodyAccJerk_std_Z
- timeBodyGyro_mean_X
- timeBodyGyro_mean_Y
- timeBodyGyro_mean_Z
- timeBodyGyro_std_X
- timeBodyGyro_std_Y
- timeBodyGyro_std_Z
- timeBodyGyroJerk_mean_X
- timeBodyGyroJerk_mean_Y
- timeBodyGyroJerk_mean_Z
- timeBodyGyroJerk_std_X
- timeBodyGyroJerk_std_Y
- timeBodyGyroJerk_std_Z
- timeBodyAccMag_mean
- timeBodyAccMag_std
- timeGravityAccMag_mean
- timeGravityAccMag_std
- timeBodyAccJerkMag_mean
- timeBodyAccJerkMag_std
- timeBodyGyroMag_mean
- timeBodyGyroMag_std
- timeBodyGyroJerkMag_mean
- timeBodyGyroJerkMag_std
- frequencyBodyAcc_mean_X
- frequencyBodyAcc_mean_Y
- frequencyBodyAcc_mean_Z
- frequencyBodyAcc_std_X
- frequencyBodyAcc_std_Y
- frequencyBodyAcc_std_Z
- frequencyBodyAccJerk_mean_X
- frequencyBodyAccJerk_mean_Y
- frequencyBodyAccJerk_mean_Z
- frequencyBodyAccJerk_std_X
- frequencyBodyAccJerk_std_Y
- frequencyBodyAccJerk_std_Z
- frequencyBodyGyro_mean_X
- frequencyBodyGyro_mean_Y
- frequencyBodyGyro_mean_Z
- frequencyBodyGyro_std_X
- frequencyBodyGyro_std_Y
- frequencyBodyGyro_std_Z
- frequencyBodyAccMag_mean
- frequencyBodyAccMag_std
- frequencyBodyAccJerkMag_mean
- frequencyBodyAccJerkMag_std
- frequencyBodyGyroMag_mean
- frequencyBodyGyroMag_std
- frequencyBodyGyroJerkMag_mean
- frequencyBodyGyroJerkMag_std
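The list above follows a regular pattern, so it can be rebuilt from its parts, which is a handy way to check the column count or to pick out a group of columns. The sketch below is an assumption-based illustration using base R only; the variable names are made up and the generated order differs from the order in the file.

```r
# Signals that come with X, Y and Z components.
axis_signals <- c("timeBodyAcc", "timeGravityAcc", "timeBodyAccJerk",
                  "timeBodyGyro", "timeBodyGyroJerk",
                  "frequencyBodyAcc", "frequencyBodyAccJerk", "frequencyBodyGyro")
# Magnitude signals (no axis suffix).
magnitude_signals <- c("timeBodyAccMag", "timeGravityAccMag", "timeBodyAccJerkMag",
                       "timeBodyGyroMag", "timeBodyGyroJerkMag",
                       "frequencyBodyAccMag", "frequencyBodyAccJerkMag",
                       "frequencyBodyGyroMag", "frequencyBodyGyroJerkMag")

# Every combination of signal, statistic and axis.
axis_grid <- expand.grid(axis = c("X", "Y", "Z"), stat = c("mean", "std"),
                         signal = axis_signals, stringsAsFactors = FALSE)
axis_columns <- paste0(axis_grid$signal, "_", axis_grid$stat, "_", axis_grid$axis)

# Every combination of magnitude signal and statistic.
mag_grid <- expand.grid(stat = c("mean", "std"), signal = magnitude_signals,
                        stringsAsFactors = FALSE)
magnitude_columns <- paste0(mag_grid$signal, "_", mag_grid$stat)

all_columns <- c("subject", "activity", axis_columns, magnitude_columns)
print(length(all_columns))

# Example selection: all standard deviation columns in the frequency domain.
std_frequency <- grep("^frequency.*_std", all_columns, value = TRUE)
print(std_frequency)
```

The same `grep` pattern can be used on `names()` of the data frame read from "tidy-data.txt" to select the matching columns.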