CfA Fellowship 2016 - Technical Question

This is the repository and source code for the technical component of the Code for America Fellowship application.

Language: R (well suited to loading a dataset and producing quick summaries)

Presentation format: Markdown table (can also easily output HTML, LaTeX)

The Prompt

Write a program to summarize a simple dataset (a comma-delimited CSV data file). Calculate the number of violations in each category, and the earliest and latest violation date for each category.
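As a rough illustration of what the prompt asks for, here is a minimal base R sketch. It is not the repository's summary.R: the column names `violation_category` and `violation_date`, the inline sample rows, and the helper `summarize_violations` are all hypothetical, and the dates are assumed to be in ISO `YYYY-MM-DD` form.

```r
# Hypothetical sample data; the real CSV's column names and values may differ.
csv_text <- "violation_category,violation_date
Garbage and Refuse,2012-01-03
Animals and Pests,2012-01-05
Garbage and Refuse,2012-03-10
Animals and Pests,2012-02-01
Garbage and Refuse,2012-02-14"

# Read the CSV text and convert the date column to Date objects.
violations <- read.csv(text = csv_text, stringsAsFactors = FALSE)
violations$violation_date <- as.Date(violations$violation_date)

# Hypothetical helper: one row per category with a count and the date range.
summarize_violations <- function(df) {
  # Group the dates by category (groups come back in alphabetical order).
  groups <- split(df$violation_date, df$violation_category)
  data.frame(
    category = names(groups),
    count    = vapply(groups, length, integer(1)),
    earliest = as.Date(vapply(groups, function(d) as.character(min(d)), character(1))),
    latest   = as.Date(vapply(groups, function(d) as.character(max(d)), character(1))),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

summary_table <- summarize_violations(violations)
print(summary_table)
```

With a real file, `read.csv(text = csv_text, ...)` would be replaced by `read.csv("path/to/file.csv", ...)`, adjusting the column names to match the data.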

Why R? Why Markdown?

R is widely used in statistical computing and data science. While I could have written this in another language (likely using a CSV parser or implementing my own), I knew R could quickly read the data and analyze subsets of it.

Because the question asked only for a summary of the dataset, R seemed the most efficient choice. Had the question also asked for a web interface or other features, other tools might have been more suitable.

I chose a Markdown table as the presentation format because I wanted the data to be an easily legible table on GitHub, where I am posting the source code. The knitr reporting package supports multiple output formats, including HTML tables and LaTeX.
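To show how knitr can turn a data frame into a Markdown table, here is a small sketch using `knitr::kable`. The data frame contents and the output path are made up for illustration, and the repository's script may produce its table differently.

```r
library(knitr)

# Hypothetical summary values, only to demonstrate the table output.
summary_table <- data.frame(
  category = c("Animals and Pests", "Garbage and Refuse"),
  count    = c(2L, 3L),
  earliest = as.Date(c("2012-01-05", "2012-01-03")),
  latest   = as.Date(c("2012-02-01", "2012-03-10")),
  stringsAsFactors = FALSE
)

# kable() returns the table as lines of text in the requested format.
md_table <- kable(summary_table, format = "markdown")

# Write to a temporary location here; a real script could write to "data.md".
out_file <- file.path(tempdir(), "data.md")
writeLines(md_table, out_file)

# The same data frame can be rendered in other formats by changing `format`.
html_table  <- kable(summary_table, format = "html")
latex_table <- kable(summary_table, format = "latex")
```

Because only the `format` argument changes, switching the presentation from Markdown to HTML or LaTeX does not require touching the code that computes the summary.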

Requirements

Running the script

  1. Install R
  2. Install the knitr package from the R console (this also installs its dependencies): install.packages("knitr")
  3. Clone/download this entire repo (including CSV data file)
  4. Navigate to its directory in the command line
  5. Run Rscript summary.R
  6. Open "data.md" to view results! :)