erebuslabs/ExData_Plotting1
Are you grading this submission...please read...

Exploratory Data (exdata-008) Course Project #1

Running The Code

  • If you are using the src files without cloning, make sure that:
    • You create a plt folder one level above the code: mkdir ../plt
    • You have the src/helper.r file in addition to the four src/plot[1-4].r files
      • A future feature could be to create the folder automatically
  • Clone the repo and run src/plot[1-4].r
    • These files rely on src/helper.r, which does all the heavy lifting to get and clean the data
  • The entire project relies on just two add-on packages, which are auto-installed if missing
    • sqldf - for efficient retrieval of data from large files
    • lubridate - for date/time manipulations
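The setup steps above (creating the output folder and installing missing packages) could be sketched as follows. This is a minimal illustration, not the code in src/helper.r; the function names ensure_dir and ensure_packages are hypothetical.

```r
# Hypothetical helpers; the real src/helper.r may do this differently.

# Create the output folder if it does not already exist.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  invisible(path)
}

# Install any package from pkgs that is not yet available.
ensure_packages <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
  }
  invisible(pkgs)
}
```

Calling ensure_dir("../plt") and ensure_packages(c("sqldf", "lubridate")) at the top of a script would then cover both prerequisites.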

Project Details

Replicate four plots using publicly available data from the UC Irvine Machine Learning Repository, specifically the "Individual household electric power consumption Data Set", which is available on the Coursera course web site.

The following is heavily borrowed/modified from the original assignment source:

  • Dataset: Electric power consumption [20Mb]

  • Description: Measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years. Different electrical quantities and some sub-metering values are available.

  • Variables: The descriptions of the 9 variables in the dataset can be located at the UCI web site

Loading the data

The assignment description warns:

  • The dataset has 2,075,259 rows and 9 columns. First calculate a rough estimate of how much memory the dataset will require in memory before reading into R.
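A rough version of that estimate can be worked out in a few lines. The sketch assumes every value is stored as an 8-byte numeric, which is a simplification: the Date and Time columns would be read as character data, so the real footprint will differ.

```r
# Back-of-the-envelope memory estimate for the full data set.
n_rows <- 2075259
n_cols <- 9
bytes_per_value <- 8  # assumption: every column held as a double

est_bytes <- n_rows * n_cols * bytes_per_value
est_mib <- est_bytes / 2^20

cat(sprintf("Estimated size: %.1f MiB\n", est_mib))
```

Under that assumption the full table needs roughly 142.5 MiB.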

So I decided to read only the rows I needed into a data frame using the sqldf package.

To filter for the targeted data, the where clause in the SQL-like statement focuses on the "Date" variable (which uses the "d/m/yyyy" format without zero padding, NOT "dd/mm/yyyy" as claimed). Specifically, we're given two dates to select, and rather than doing extra transformations we can treat them as strings and check for equality (faster than testing whether each date falls within a range).
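Because the comparison is on strings, the target dates have to be written exactly as they appear in the raw file. The sketch below assumes day-first dates without zero padding and assumes the two target dates are 1 and 2 February 2007; to_raw_date is a hypothetical helper, not part of the repository.

```r
# Hypothetical helper: format a date as day/month/year without zero padding.
to_raw_date <- function(d) {
  d <- as.Date(d)
  sprintf(
    "%d/%d/%d",
    as.integer(format(d, "%d")),
    as.integer(format(d, "%m")),
    as.integer(format(d, "%Y"))
  )
}

# Assumed target dates for the assignment.
sDate <- to_raw_date("2007-02-01")
eDate <- to_raw_date("2007-02-02")

# The zero-padded form would never match an unpadded string in the file.
padded <- format(as.Date("2007-02-01"), "%d/%m/%Y")
print(c(sDate, eDate, padded))
```

The padded variant is included only to show why a plain format() call is not enough for an exact string match.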

So, assuming we open a file handle (fhandle) to the data set, an SQL query is generated from the two dates, and the result of the query is a data frame. A few extra parameters are passed in to handle the header and separator:

    sqlstatement <- sprintf("select * from fhandle where Date = '%s' or Date = '%s'", sDate, eDate)
    tdf <- sqldf(sqlstatement, file.format = list(header=TRUE, sep = ";"))
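To see the query in context, here is a self-contained sketch that runs it against a tiny stand-in file. The sample rows, the reduced column set, and the two target dates are assumptions made for illustration; the query is only executed when sqldf is installed.

```r
# Hypothetical miniature of the raw file: ';'-separated with a header row.
# Only three of the nine columns are shown, and the values are made up.
raw_lines <- c(
  "Date;Time;Global_active_power",
  "31/1/2007;23:59:00;0.4",
  "1/2/2007;00:00:00;0.3",
  "2/2/2007;00:00:00;1.3",
  "3/2/2007;00:00:00;3.6"
)
sample_path <- tempfile(fileext = ".txt")
writeLines(raw_lines, sample_path)

# Assumed target dates, written exactly as they appear in the raw file.
sDate <- "1/2/2007"
eDate <- "2/2/2007"
sqlstatement <- sprintf(
  "select * from fhandle where Date = '%s' or Date = '%s'", sDate, eDate
)

# The SQL refers to the file connection by its R variable name (fhandle).
if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  fhandle <- file(sample_path)
  tdf <- sqldf(sqlstatement, file.format = list(header = TRUE, sep = ";"))
  close(fhandle)
  print(tdf)
}
```

Only the rows whose Date string equals one of the two targets are loaded into R, which is what keeps memory use low on the full file.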

The Plots

  • In each section the requested plot is followed by the one created
  • The plots generated in this assignment are in the "plt" directory

Plot 1

[Figure: requested plot (chunk unnamed-chunk-2), followed by the generated plot1]

Plot 2

[Figure: requested plot (chunk unnamed-chunk-3), followed by the generated plot2]

Plot 3

[Figure: requested plot (chunk unnamed-chunk-4), followed by the generated plot3]

Plot 4

[Figure: requested plot (chunk unnamed-chunk-5), followed by the generated plot4]
