This is the repository and source code for the technical component of the Code for America Fellowship application.
Language: R (well suited to loading a dataset and summarizing it quickly)
Presentation format: Markdown table (knitr can also output HTML or LaTeX)
#### The Prompt
Write a program to summarize a simple dataset (a comma-delimited CSV data file). Calculate the number of violations in each category, and the earliest and latest violation date for each category.
R is used widely in statistical computing and data science. While I could have written this in other languages (likely utilizing a CSV parser or implementing my own), I knew R would be capable of quickly reading the data and analyzing subsets of the data.
Because the question asked only for a summary of the dataset, R seemed the most efficient choice. Had it also asked for a web interface or other features, other tools might have been more suitable.
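As an illustration of the kind of summary the prompt asks for, here is a minimal base R sketch. It is not the contents of summary.R: the column names `violation_category` and `violation_date` and the inline sample rows are assumptions made up for the example, and the real CSV may be laid out differently.

```r
# Hypothetical sample data; column names and rows are assumptions,
# not taken from the repository's CSV file.
csv_text <- "violation_category,violation_date
Garbage and Refuse,2012-01-03
Animals and Pests,2012-02-10
Garbage and Refuse,2012-05-21
Animals and Pests,2012-01-15
Garbage and Refuse,2012-03-08"

violations <- read.csv(text = csv_text, stringsAsFactors = FALSE)
violations$violation_date <- as.Date(violations$violation_date)

# Summarize one category's dates: how many, the earliest, and the latest.
summarize_category <- function(dates) {
  data.frame(count = length(dates), earliest = min(dates), latest = max(dates))
}

# Group the dates by category, summarize each group, then stack the results.
pieces <- lapply(
  split(violations$violation_date, violations$violation_category),
  summarize_category
)
summary_table <- cbind(category = names(pieces), do.call(rbind, pieces))
rownames(summary_table) <- NULL

print(summary_table)
```

With a real file, `read.csv(text = csv_text, ...)` would be replaced by `read.csv("path/to/file.csv", ...)`, and the date parsing may need a `format` argument depending on how dates are written in the file.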
I chose a Markdown table as the presentation format because I wanted the data to appear as an easily legible table on GitHub, where I am posting the source code. The knitr reporting package supports multiple output formats, including HTML and LaTeX tables.
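To show how a data frame can become a Markdown table, here is a small sketch using `knitr::kable`. The data frame and the temporary output file are placeholders for the example; the repository's script may produce data.md in a different way.

```r
library(knitr)

# Placeholder summary; the values are illustrative only.
summary_table <- data.frame(
  category = c("Animals and Pests", "Garbage and Refuse"),
  count    = c(2L, 3L),
  earliest = as.Date(c("2012-01-15", "2012-01-03")),
  latest   = as.Date(c("2012-02-10", "2012-05-21"))
)

# kable() returns the table as lines of Markdown text.
md_lines <- kable(summary_table, format = "markdown")

# Write to a temporary file here so nothing in the repo is overwritten.
out_file <- tempfile(fileext = ".md")
writeLines(md_lines, out_file)

# Other formats are available by changing the format argument.
html_table <- kable(summary_table, format = "html")
```

Passing `format = "latex"` works the same way, which is what makes it easy to switch presentation formats later.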
- R - https://www.r-project.org/
- knitr package + dependencies - http://yihui.name/knitr/
- Install R
- Install the knitr package from the R console (this also installs its dependencies):
  `install.packages("knitr")`
- Clone/download this entire repo (including CSV data file)
- Navigate to its directory on the command line
- Run `Rscript summary.R`
- Open "data.md" to view results! :)