Are you grading this submission...please read...

Exploratory Data (exdata-008) Course Project #1

Running The Code

If you are using the src files without cloning, make sure:
- You create a plt folder one level above the code: mkdir ../plt
  - A future feature could be to include folder creation
- You NEED the src/helper.r file in addition to the four src/plot[1-4].r files
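
The folder creation mentioned above could be handled from R itself. A minimal sketch, assuming the scripts are run with src as the working directory; the helper name ensure_plot_dir is hypothetical and not part of this repo:

```r
# Hypothetical helper (not in the repo): create the output folder if it is missing.
# The default path assumes the working directory is src/, so ../plt sits beside it.
ensure_plot_dir <- function(path = file.path("..", "plt")) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  invisible(path)
}
```

Calling ensure_plot_dir() at the top of each plot script would stand in for the manual mkdir step.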
Otherwise, clone the repo and run src/plot[1-4].r
- These files rely on the src/helper.r file, which does all the heavy lifting to get and clean the data
The entire project relies on just two add-on packages, which are auto-installed if missing:
- sqldf - for efficient retrieval of data from large files
- lubridate - for date/time manipulations
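
One common way to get install-if-missing behavior is sketched below. This is an illustration only, not necessarily what src/helper.r does; the function name load_or_install and the CRAN mirror URL are assumptions:

```r
# Hypothetical helper (not the repo's code): attach a package,
# installing it from CRAN first if it is not already available.
load_or_install <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  library(pkg, character.only = TRUE)
}

# Usage: invisible(lapply(c("sqldf", "lubridate"), load_or_install))
```

Checking with requireNamespace first avoids reinstalling a package that is already present.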

Project Details

Replicate four plots using publicly available data from the UC Irvine Machine Learning Repository, specifically the "Individual household electric power consumption Data Set", which is available on the Coursera course web site.

The following is heavily borrowed/modified from the original assignment source:

Dataset: Electric power consumption [20Mb]
Description: Measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years. Different electrical quantities and some sub-metering values are available.
Variables: The descriptions of the 9 variables in the dataset can be located at the UCI web site

Loading the data

The assignment description warns:

The dataset has 2,075,259 rows and 9 columns. First calculate a rough estimate of how much memory the dataset will require in memory before reading into R.
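
A back-of-the-envelope version of that estimate, assuming every value is held as an 8-byte double (the Date and Time columns are really text, so this is only a ballpark):

```r
# Rough memory estimate: rows x columns x bytes per value.
# Assumption: all 9 columns are stored as 8-byte doubles.
n_rows <- 2075259
n_cols <- 9
bytes_per_value <- 8

est_bytes <- n_rows * n_cols * bytes_per_value  # 149,418,648 bytes
est_mib <- est_bytes / 2^20                     # bytes -> MiB
print(round(est_mib, 1))                        # about 142.5
```

That is roughly 142.5 MiB for the raw values alone, before any overhead from reading and converting the file.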

So, I decided to read only the rows I needed into a data frame using the sqldf package.

To filter for the targeted data, the where clause in the SQL-like statement focuses on the "Date" variable ("d/m/yyyy" format, NOT "dd/mm/yyyy" as claimed). Specifically, we're given two dates to select, and rather than doing extra transformations we can treat them as strings and check for equality (faster than testing whether each row falls within a range).

So, assuming we have opened a file handle on the needed data set, a SQL query is generated with the two dates and a data frame is the result of the query, with a few extra parameters passed in to handle the header and separator:

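    # fhandle is the open file connection; sDate and eDate are the two date strings
    sqlstatement <- sprintf("select * from fhandle where Date = '%s' or Date = '%s'", sDate, eDate)
    # read only the matching rows; the file has a header row and ';' as separator
    tdf <- sqldf(sqlstatement, file.format = list(header = TRUE, sep = ";"))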
