title	author	date	output
GnCD_Course_Project	Sergey Bushmanov	07/24/2014	html_document

README.md

This README.md is intended to:

- briefly explain the purpose and background of this data cleaning project
- describe the file structure of the working directory
- state the naming conventions
- describe how the run_analysis.R script works
- credit the source of the original data

Project purpose and background

The original data were collected by recording accelerometer and gyroscope measurements from 30 individuals while they performed 6 types of physical activity. The goal of this project is to combine and summarize the data in a format suitable for further analysis. More specifically, the project should generate a tidy data set, in .txt format, holding the averages of the mean and standard deviation measurements.

File structure

In order to execute run_analysis.R script, zipped raw data should be downloaded from

https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

After the file has been downloaded to the R working directory and unzipped, the folder "UCI HAR Dataset" should be renamed to "data" to eliminate spaces in the file path.
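
The preparation steps above could be scripted roughly as follows. This is only a sketch and not part of run_analysis.R: the helper name `prepareRawData` and the local zip file name `dataset.zip` are assumptions made for illustration.

```r
# Hypothetical helper (not taken from run_analysis.R):
# download, unzip and rename the raw data folder.
prepareRawData <- function(destDir = ".",
                           url = "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip") {
  zipFile  <- file.path(destDir, "dataset.zip")       # assumed local file name
  unzipped <- file.path(destDir, "UCI HAR Dataset")   # folder name inside the zip
  dataDir  <- file.path(destDir, "data")              # name expected by the script

  # Nothing to do if the renamed folder is already in place
  if (dir.exists(dataDir)) return(invisible(dataDir))

  if (!dir.exists(unzipped)) {
    if (!file.exists(zipFile)) {
      download.file(url, destfile = zipFile, mode = "wb")
    }
    unzip(zipFile, exdir = destDir)
  }

  # Rename to eliminate spaces in the file path
  if (!file.rename(unzipped, dataDir)) {
    stop("could not rename ", unzipped, " to ", dataDir)
  }
  invisible(dataDir)
}

# Usage (commented out because it downloads the full data set):
# prepareRawData(".")
```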

After these preparation steps, the working directory must contain the following files for run_analysis.R to run successfully:

- "./run_analysis.R" - R script to perform the analysis
- "./data/train/X_train.txt" and "./data/test/X_test.txt" - train and test measurement data to be combined
- "./data/train/subject_train.txt" and "./data/test/subject_test.txt" - subject ids for the train and test data, respectively
- "./data/train/y_train.txt" and "./data/test/y_test.txt" - activity ids for the train and test data, respectively
- "./data/activity_labels.txt" - file matching activity labels to activity ids
- "./data/features.txt" - file containing measurement names

No other files are necessary for executing run_analysis.R script.
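
Before running the script, it may help to verify that the files listed above are in place. The check below is a sketch only; `requiredFiles` and `missingInputs` are hypothetical names that do not appear in run_analysis.R.

```r
# Files the README lists as required, relative to the working directory
requiredFiles <- c(
  "run_analysis.R",
  "data/train/X_train.txt",       "data/test/X_test.txt",
  "data/train/subject_train.txt", "data/test/subject_test.txt",
  "data/train/y_train.txt",       "data/test/y_test.txt",
  "data/activity_labels.txt",     "data/features.txt"
)

# Hypothetical helper: return the required files that are absent under `root`
missingInputs <- function(root = ".") {
  requiredFiles[!file.exists(file.path(root, requiredFiles))]
}

# Usage: an empty character vector means everything is in place
# missingInputs(".")
```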

Files that are not necessary for executing script but critical to understanding the nature of resulting variables:

"./CodeBook.md" - definitions of variables in resulting meanTidyData.txt, data formats, ranges, as well as summary choices made during data processing.

Optional files to peruse:

"./data/README.txt" - detailed explanation, supplied with the raw data, of how the data were recorded, processed, and packaged into the downloaded zip file.

What you get after executing the script:

"./meanTidyData.txt" - resulting tidy data set

Naming conventions

Variables used in the analysis are named according to the camelCase convention, i.e. each word after the first in a variable name starts with a capital letter.

How the run_analysis.R script works

The run_analysis.R script performs five steps in sequence to get from the input files to the resulting meanTidyData.txt.

Step 1. Merge the training and the test sets to create one data set:
- Read measurement train data
- Read train subject ids and train activity ids and prepend them as the first two columns
- Read measurement test data
- Read test subject ids and test activity ids and prepend them as the first two columns
- Combine the resulting train and test data to obtain the mergedData R object
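
A minimal sketch of Step 1, assuming the file layout described under "File structure". The helper names `readSet` and `mergeTrainAndTest` are hypothetical, and naming the id columns at read time is a simplification made here (the script itself assigns names in Step 4).

```r
# Hypothetical helper: read one set ("train" or "test") and put the
# subject and activity ids to the left of the measurements.
readSet <- function(dataDir, set) {
  measurements <- read.table(file.path(dataDir, set, paste0("X_", set, ".txt")))
  subject  <- read.table(file.path(dataDir, set, paste0("subject_", set, ".txt")),
                         col.names = "subject")
  activity <- read.table(file.path(dataDir, set, paste0("y_", set, ".txt")),
                         col.names = "activity")
  cbind(subject, activity, measurements)
}

# Stack the train rows on top of the test rows
mergeTrainAndTest <- function(dataDir = "data") {
  rbind(readSet(dataDir, "train"), readSet(dataDir, "test"))
}

# Usage, once the data folder is prepared:
# mergedData <- mergeTrainAndTest("data")
```
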
Step 2. Extract only the measurements on the mean and standard deviation for each measurement
- Read measure names
- Find positions (via grep) of only those names containing "mean()" or "std()". Disregard names containing "mean" in other forms, e.g. "meanFreq", as not being "true" means.
- Extract the columns at the positions found, plus the first two columns holding subject and activity ids.
- The result of Step 2 is the extractedData R object, containing only the ids for individuals and activities, and the means and standard deviations.
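
The selection in Step 2 can be illustrated as below. The exact pattern used in run_analysis.R is not shown in this README, so the regular expression is an assumption; the feature names are a small illustrative sample and the toy data frame stands in for mergedData.

```r
# Hypothetical helper: positions of names containing a literal "mean()" or "std()".
# The parentheses are escaped so that "meanFreq()" is not matched.
meanStdPositions <- function(featureNames) {
  grep("mean\\(\\)|std\\(\\)", featureNames)
}

# Illustrative sample of measurement names
featureNames <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                  "fBodyAcc-meanFreq()-X", "tBodyAcc-mad()-X",
                  "angle(tBodyAccMean,gravity)")
positions <- meanStdPositions(featureNames)

# Toy stand-in for mergedData: two id columns followed by five measurements
toyMerged <- data.frame(subject = 1:2, activity = c(5, 1),
                        matrix((1:10) / 10, nrow = 2))

# Shift positions by 2 because the id columns come first
extractedData <- toyMerged[, c(1, 2, positions + 2)]
```
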
Step 3. Use descriptive activity names to name the activities in the data set.
- Read descriptive activity names into an object of class data.frame
- Use the ids from this file as levels and the activity labels as labels to convert extractedData[, 2], which holds the activity ids, into a factor.
- The resulting extractedDataDescriptive presents activities with a descriptive label.
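
A small sketch of the factor conversion in Step 3. The label table is typed in by hand here to keep the example self-contained and should be treated as illustrative; in the script it is read from "./data/activity_labels.txt". The column names `id` and `label` are assumptions.

```r
# Illustrative stand-in for the contents of activity_labels.txt
activityLabels <- data.frame(
  id    = 1:6,
  label = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
            "SITTING", "STANDING", "LAYING"),
  stringsAsFactors = FALSE
)

# Toy activity id column, standing in for extractedData[, 2]
activityIds <- c(5, 1, 6, 1)

# ids become the factor levels, descriptive names become the labels
activity <- factor(activityIds,
                   levels = activityLabels$id,
                   labels = activityLabels$label)
```
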
Step 4. Label the data set with descriptive variable names. In the context of this project, a name is considered descriptive if it provides some insight into what the variable stands for, as opposed to names such as V1 or V2. As such, fBodyGyroJerkMag.mean. is considered descriptive (see CodeBook.md for its meaning). Fully spelled-out names would be too lengthy.
- Make vector namesExtracted of all names of means and standard deviations extracted
- Make vector of valid R names with the help of make.names() R function
- Clean resulting vector of artifacts like "BodyBody", "..." and ".."
- Append c("subject", "activity") to the left and name extractedDataDescriptive with resulting vector of names.
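
The name cleaning in Step 4 might look like the sketch below. The README only states that "BodyBody", "..." and ".." are cleaned; collapsing runs of dots into a single dot is an assumption inferred from the example name fBodyGyroJerkMag.mean., and `cleanNames` is a hypothetical helper name.

```r
# Hypothetical helper: turn raw measurement names into valid, cleaned R names
cleanNames <- function(rawNames) {
  valid <- make.names(rawNames)                         # "-", "(" and ")" become "."
  valid <- gsub("BodyBody", "Body", valid, fixed = TRUE) # remove the duplicated word
  gsub("\\.{2,}", ".", valid)                            # collapse ".." and "..." to "."
}

namesExtracted <- c("fBodyBodyGyroJerkMag-mean()", "tBodyAcc-std()-X")

# Prepend the id column names, as in the last bullet of Step 4
allNames <- c("subject", "activity", cleanNames(namesExtracted))
```
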
Step 5. Average each variable for each activity and each subject.
- Aggregate the data by subject and activity and calculate means. The resulting meanTidyData is considered tidy because
  - the whole table contains data of a single type (averages)
  - every row holds one observation
  - every column holds one variable
- Write meanTidyData to meanTidyData.txt
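
Step 5 can be sketched with `aggregate()` on a toy data frame. Whether run_analysis.R uses `aggregate()` or another function is not stated here, so treat the call as one possible approach. The output goes to a temporary file in this example so that an existing meanTidyData.txt is not overwritten.

```r
# Toy stand-in for extractedDataDescriptive with cleaned column names
toyData <- data.frame(
  subject         = c(1, 1, 1, 2),
  activity        = factor(c("WALKING", "WALKING", "SITTING", "WALKING")),
  tBodyAcc.mean.X = c(0.2, 0.4, 0.1, 0.3),
  tBodyAcc.std.X  = c(-0.9, -0.7, -0.5, -0.8)
)

# One row per subject/activity pair, each measurement replaced by its mean
meanTidyData <- aggregate(. ~ subject + activity, data = toyData, FUN = mean)

# Write the result as plain text (temporary path used for this example)
outFile <- file.path(tempdir(), "meanTidyData.txt")
write.table(meanTidyData, outFile, row.names = FALSE)
```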

Credit for original data:

Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine. International Workshop of Ambient Assisted Living (IWAAL 2012). Vitoria-Gasteiz, Spain. Dec 2012
