Merge branch 'master' of github.com:globeandmail/startr
tomcardoso committed Apr 17, 2019
2 parents aa7deec + 3b7628a commit 9c9dd82
Showing 1 changed file with 13 additions and 26 deletions.
39 changes: 13 additions & 26 deletions README.md
@@ -18,10 +18,10 @@ Broadly, the goals of `startr` are to:
* Capture reporting questions and answers from the data
* Generate updatable reports, graphics, and datasets throughout the analysis

Input files are treated as raw and read-only while outputs, including data, plots and reports, are treated as a disposible product. No variables should ever be overwritten or reassigned to prevent order-of-execution accidents.
Input files are treated as raw and read-only while outputs, including data, plots and reports, are treated as a disposable product. No variables should ever be overwritten or reassigned to prevent order-of-execution accidents.


### How do I use this?
## How do I use this?

This template works with R and RStudio, so you'll need both of those installed. Then, just clone down this project, or, better yet, use our scaffolding tool, [`startr-cli`](https://www.github.com/globeandmail/startr-cli).

@@ -30,7 +30,7 @@ Once the project's cloned, double-click on the `.Rproj` file to start a scoped R
You can then start adding your data and writing your analysis. At The Globe, we like to work in a code editor like Atom or Sublime Text, and use something like [`r-exec`](https://atom.io/packages/r-exec) to send code chunks to RStudio.


### Example workflow using `startr`
## Example workflow using `startr`

Here's how we use `startr` for our own analysis workflow right now. The heart of the project lies in these three files:

@@ -40,13 +40,13 @@ Here's how we use `startr` for our own analysis workflow right now. The heart of

* **`visualize.R`**: Generates plots.

##### Step 1: Set up your project
#### Step 1: Set up your project

Packages are managed through the `packages` list in the `config.R` file. `devtools` and `here` are loaded by default. The `load_requirements()` function loads, and optionally installs, required packages.

The bulk of the analysis is based on a set of files within the `R` directory which are sourced and run in order by `run.R` at the project root.

Before starting an analysis, you'll want to point to your data files in `config.R` and make sure it's loading all the packages you'll need. For instance, you might want to add the [`cancensus`](https://github.com/mountainMath/cancensus) package. To do that, just add `'cancensus'` to the `packages` vector.
Before starting an analysis, you'll want to point to your data files in `config.R` and make sure it's loading all the packages you'll need. For instance, you might want to add the [`cancensus`](https://github.com/mountainMath/cancensus) package. To do that, just add `'cancensus'` to the `packages` vector. Package suggestions for GIS work, scraping, dataset summaries, etc. are included and commented out to avoid bloat.
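
As a rough illustration of the `packages` vector described above, the sketch below adds `'cancensus'` and checks which entries are installed. The helper `missing_packages()` is hypothetical and is not part of `startr`; the real `load_requirements()` function may well work differently.

```r
# Hypothetical excerpt from config.R: packages the project needs.
packages <- c(
  'devtools',
  'here',
  'cancensus' # added for census data
)

# Hypothetical helper (not part of startr): returns the names of
# packages that are not installed, without attaching anything.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(packages)
if (length(to_install) > 0) {
  message('Not installed: ', paste(to_install, collapse = ', '))
}
```

A check like this only reports what is missing; whether to install automatically is a separate choice left to the project.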

Once that's done, you'll want to reference your raw data filenames. For instance, if you're adding pizza delivery data, you'd add this line to the "Files" block in `config.R`:

@@ -56,7 +56,7 @@ pizza.raw.file <- 'Citywide Pizza Deliveries 1998-2016.xlsx'

Our naming convention is to append `.raw` to variables that reference raw data, and `.file` to variables that are just filename strings.
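
To make the suffixes concrete, here is a minimal sketch using base R only. The CSV filename, the temporary directory, and the sample rows are all made up for illustration; a real project would point at its own raw data folder and files.

```r
# Stand-in for the raw data directory; a temp folder keeps the sketch self-contained.
raw_dir <- tempdir()

# `.file` suffix: just a filename string.
pizza.raw.file <- 'pizza-deliveries-sample.csv'

# Write a tiny made-up file so the example runs end to end.
write.csv(
  data.frame(date = c('2016-01-01', '2016-01-02'), deliveries = c(12, 15)),
  file.path(raw_dir, pizza.raw.file),
  row.names = FALSE
)

# `.raw` suffix: the untouched data as read from disk.
pizza.raw <- read.csv(file.path(raw_dir, pizza.raw.file), stringsAsFactors = FALSE)

# Cleaned data gets a new name instead of overwriting `pizza.raw`.
pizza <- transform(pizza.raw, date = as.Date(date))
```

Keeping `pizza.raw` and `pizza` as separate objects means either can be inspected later without re-running earlier steps.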

##### Step 2: Import and process your data
#### Step 2: Import and process your data

In `process.R`, you'll consume the variables you created in `config.R`, clean them up, rename variables, deal with any errors, convert multiple data files to a common structure if necessary, and save out the result, plus some cleanup at the end so as to not pollute the environment. It might look something like this:

@@ -88,7 +88,7 @@ rm(pizza.raw)

The output files written to `dir_data_processed` (that is, `/data/processed`) aren't checked into Git by design — you should be able to reproduce the analysis-ready files from someone else's project by running `process.R`.
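
The idea that processed files are disposable can be sketched as follows. The directory names, the sample data, and the `process_pizza()` function are assumptions made for this example and do not come from `startr` itself.

```r
# Stand-ins for the raw and processed directories (assumptions for this sketch).
dir_raw <- file.path(tempdir(), 'raw')
dir_processed <- file.path(tempdir(), 'processed')
dir.create(dir_raw, showWarnings = FALSE)
dir.create(dir_processed, showWarnings = FALSE)

# Made-up raw input; in a real project this file already exists and is never edited.
raw_path <- file.path(dir_raw, 'pizza-sample.csv')
write.csv(
  data.frame(Date = c('2016-01-01', '2016-01-02'), Deliveries = c(12, 15)),
  raw_path,
  row.names = FALSE
)

# Hypothetical processing step: read raw, tidy column names, write the processed copy.
process_pizza <- function(raw_path, out_dir) {
  pizza.raw <- read.csv(raw_path, stringsAsFactors = FALSE)
  pizza <- setNames(pizza.raw, tolower(names(pizza.raw)))
  out_path <- file.path(out_dir, 'pizza.csv')
  write.csv(pizza, out_path, row.names = FALSE)
  out_path
}

first_run <- readLines(process_pizza(raw_path, dir_processed))

# Deleting the processed file loses nothing: re-running rebuilds it from the raw input.
unlink(file.path(dir_processed, 'pizza.csv'))
second_run <- readLines(process_pizza(raw_path, dir_processed))
identical(first_run, second_run)
```

Because the processing step depends only on the raw file, the processed output can stay out of version control and be rebuilt on demand.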

##### Step 2: Do your analysis
#### Step 2: Do your analysis

This part's as simple as consuming that file in `analyze.R` and running with it. It might look something like this:

@@ -108,7 +108,7 @@ deliveries_monthly <- pizza %>%
)
```

##### Step 3: Visualize your analysis
#### Step 3: Visualize your analysis

You can use `visualize.R` to consume the variables created in `analyze.R`. For instance:

@@ -127,19 +127,7 @@ plot(plot_deliveries_monthly)
ggsave(plot_deliveries_monthly, file = here::here(dir_plots, 'plot_deliveries_monthly.png'), width = 6.5, height = 6.5)
```


### Notebooks

Notebooks are used to document the data analysis and capture specific queries and response as text, tables, output files and plots.

Notebooks are well-suited to both quick experimentation and larger evolving projects.

Each question or query should be included as a clear heading with notes about relevant fields and details about criteria to include or exclude data. Responses to each query can include tables, graphic plots, summary text with dynamic inline placeholders for values and CSV snapshots for external review.

The notebook should include a link to the primary dataset and additional snapshots as required.


### Directory structure
## Directory structure

```bash
├── data/
@@ -161,23 +149,22 @@ The notebook should include a link to the primary dataset and additional snapsho
├── run.R # Wrapper file to run the analysis steps, either inline or sourced from component R files.
├── notebook.Rmd # Standard notebook to render reports.
└── startr.Rproj # Rproj file for RStudio

```

An `.nvmrc` is included at the project root for scraping with Node. A `venv` and `requirements.txt` file should be included within the scraper directory if Python is used for scraping.

### Version
## Version

1.0.0

### License
## License

startr © 2019 The Globe and Mail. It is free software, and may be redistributed under the terms specified in our MIT license.

### Get in touch
## Get in touch

If you've got any questions, feel free to send us an email, or give us a shout on Twitter:

[![Michael Pereira](https://avatars0.githubusercontent.com/u/212666?v=3&s=200)](https://github.com/monkeycycle)| [![Tom Cardoso](https://avatars0.githubusercontent.com/u/2408118?v=3&s=200)](https://github.com/tomcardoso)
---|---
[Michael Pereira](mailto:mpereira@globeandmail.com) <br> [@monkeycycle_org](https://www.twitter.com/monkeycycle_org) | [Tom Cardoso](mailto:tcardoso@globeandmail.com) <br> [@tom_cardoso](https://www.twitter.com/tom_cardoso)
[Michael Pereira](mailto:mpereira@globeandmail.com) <br> [@__m_pereira](https://www.twitter.com/__m_pereira) | [Tom Cardoso](mailto:tcardoso@globeandmail.com) <br> [@tom_cardoso](https://www.twitter.com/tom_cardoso)
