Script templates #1

Merged
merged 11 commits into from
Feb 9, 2023
24 changes: 12 additions & 12 deletions README.md
@@ -9,7 +9,7 @@ pip install cookiecutter
cookiecutter https://github.com/NICD-UK/project-template
```

-You will be prompted for nine inputs:
+You will be prompted for eleven inputs:
Review comment (Contributor): consider changing "eleven" to "the following".


1. Project Name
2. Project Directory Name
@@ -19,8 +19,9 @@ You will be prompted for nine inputs:
6. Project Sponsor Email
7. Project Summary
8. Raw Data Directory
-9. `venv` Project (No / Yes)
-10. `git` Project (No / Yes)
+9. Language (Python / R)
+10. `venv` Project (No / Yes)
+11. `git` Project (No / Yes)

## Organization

@@ -32,7 +33,6 @@ data/
├─ model/
├─ raw/
├─ wrangle/
Review comment on lines 33 to 35 (Contributor): would it be useful to reorder these to follow the actual order of the workflow, i.e. raw -> wrangle -> model?

notebooks/
reports/
├─ clean/
├─ final/
@@ -50,7 +50,7 @@ src/
- **Determine Objectives:**
- **Determine Deliverables:**
- **Determine Resources:**
-- **Plan Project:**
+- **Plan Project:**

### 2. Data Preparation and Understanding

@@ -60,16 +60,16 @@ src/

### 3. Prototyping

-- **Develop Data Product**
-- **Evaluate Data Product**
-- **Approve Data Product**
+- **Develop Data Product:**
+- **Evaluate Data Product:**
+- **Approve Data Product:**

### 4. Production

-- **Deploy Data Product**
-- **Monitor Data Product**
-- **Maintain Data Product**
-- **Close Project**
+- **Deploy Data Product:**
+- **Monitor Data Product:**
+- **Maintain Data Product:**
+- **Close Project:**

## Guide

1 change: 1 addition & 0 deletions cookiecutter.json
@@ -7,6 +7,7 @@
"project_sponsor_email": "Project Sponsor Email",
"project_summary": "Project Summary",
"raw_data_directory": "data/raw",
+"language": ["Python", "R"],
"venv_project": ["No", "Yes"],
"git_project": ["No", "Yes"]
}
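In cookiecutter, a list value in `cookiecutter.json` is a choice variable whose first entry is the default, so accepting every prompt (or running `cookiecutter --no-input`) should yield a Python project with no `venv` and no `git` setup. The sketch below mimics that default resolution with the standard library only; `resolve_defaults` is a hypothetical helper, not part of cookiecutter's API, and it does not model Jinja rendering of string defaults.

```python
import json

# Subset of the template's cookiecutter.json, as shown in this diff.
COOKIECUTTER_JSON = """
{
    "raw_data_directory": "data/raw",
    "language": ["Python", "R"],
    "venv_project": ["No", "Yes"],
    "git_project": ["No", "Yes"]
}
"""


def resolve_defaults(context: dict) -> dict:
    """Return the value used for each variable if every prompt is accepted.

    Hypothetical helper: a list is treated as a choice variable whose first
    item is the default; any other value is used as-is.
    """
    return {
        key: value[0] if isinstance(value, list) else value
        for key, value in context.items()
    }


defaults = resolve_defaults(json.loads(COOKIECUTTER_JSON))
print(defaults["language"], defaults["venv_project"], defaults["git_project"])
# prints: Python No No
```

Ordering the choices as `["No", "Yes"]` therefore makes the opt-in behaviour (creating a virtual environment or repository) something the user has to select explicitly.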
14 changes: 14 additions & 0 deletions hooks/post_gen_project.py
@@ -1,7 +1,21 @@
import subprocess
import glob
import os

venv_project = "{{cookiecutter.venv_project}}"
git_project = "{{cookiecutter.git_project}}"
language = "{{cookiecutter.language}}"

# create Python project
if language == "Python":
    os.remove("{{cookiecutter.project_directory_name}}.Rproj")
    for file in glob.glob("**/*.Rmd", recursive=True):
        os.remove(file)

# create R project
if language == "R":
    for file in glob.glob("**/*.py", recursive=True):
        os.remove(file)

# create venv project
if venv_project == "Yes":
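The language branch of the hook can be exercised outside cookiecutter by pointing the same rules at a directory. The sketch below restates them with `pathlib`; `prune_language_files` and its `root` and `project_name` parameters are illustrative assumptions, not part of the template. Unlike the hook, it tolerates a missing `.Rproj` file instead of raising `FileNotFoundError`.

```python
from pathlib import Path


def prune_language_files(root: Path, project_name: str, language: str) -> None:
    """Delete files that belong to the language that was not chosen.

    Mirrors the post_gen_project.py rules: a Python project drops the
    .Rproj file and every .Rmd file; an R project drops every .py file.
    """
    if language == "Python":
        # missing_ok avoids an error if the .Rproj file is already gone
        (root / f"{project_name}.Rproj").unlink(missing_ok=True)
        # materialise the matches before deleting anything
        for path in list(root.rglob("*.Rmd")):
            path.unlink()
    elif language == "R":
        for path in list(root.rglob("*.py")):
            path.unlink()
    else:
        raise ValueError(f"unsupported language: {language!r}")
```

One behavioural difference to keep in mind: `Path.rglob` also descends into hidden directories, which `glob.glob("**/...", recursive=True)` skips by default. In the hook this does not matter, because pruning runs before any virtual environment is created.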
Empty file.
16 changes: 16 additions & 0 deletions {{cookiecutter.project_directory_name}}/reports/clean/clean.Rmd
@@ -0,0 +1,16 @@
# Load Libraries
```{r message=FALSE}
library(glue)
library(here)
library(tidyverse)
```

# Setup
```{r}
data_name <- "<data-name>"
```

# Read Data
```{r}
clean_data <- read_rds(here("data", "clean", glue("{data_name}.rds")))
```
10 changes: 10 additions & 0 deletions {{cookiecutter.project_directory_name}}/reports/clean/clean.py
@@ -0,0 +1,10 @@
#%% Load Libraries
import pandas
from pyprojroot import here
import os

#%% Setup
data_name = "<data-name>"

#%% Read Data
clean_data = pandas.read_pickle(os.path.join(here(), "data", "clean", f"{data_name}.pkl"))
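These scripts rely on `pyprojroot.here()` to locate the project root by searching upward from the working directory for marker files such as `.git` or `.here`, which is what lets them run from any subfolder. The standard-library stand-in below illustrates that idea only; `find_project_root`, `clean_data_path` and the marker set (`.git`, `.here`, `*.Rproj`) are assumptions for illustration and do not reproduce pyprojroot's exact criteria or their precedence.

```python
from pathlib import Path

# Hypothetical marker set; pyprojroot has its own list of root criteria.
MARKERS = (".git", ".here")


def find_project_root(start: Path) -> Path:
    """Walk up from `start` until a directory containing a marker is found."""
    start = start.resolve()
    for directory in (start, *start.parents):
        if any((directory / marker).exists() for marker in MARKERS):
            return directory
        # the template ships an RStudio project file at the root
        if any(directory.glob("*.Rproj")):
            return directory
    raise FileNotFoundError(f"no project root found above {start}")


def clean_data_path(data_name: str, start: Path) -> Path:
    """Roughly os.path.join(here(), "data", "clean", f"{data_name}.pkl")."""
    return find_project_root(start) / "data" / "clean" / f"{data_name}.pkl"
```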
@@ -0,0 +1,16 @@
# Load Libraries
```{r message=FALSE}
library(glue)
library(here)
library(tidyverse)
```

# Setup
```{r}
data_name <- "<data-name>"
```

# Read Data
```{r}
wrangle_data <- read_rds(here("data", "wrangle", glue("{data_name}.rds")))
```
10 changes: 10 additions & 0 deletions {{cookiecutter.project_directory_name}}/reports/wrangle/wrangle.py
@@ -0,0 +1,10 @@
#%% Load Libraries
import pandas
from pyprojroot import here
import os

#%% Setup
data_name = "<data-name>"

#%% Read Data
wrangle_data = pandas.read_pickle(os.path.join(here(), "data", "wrangle", f"{data_name}.pkl"))
28 changes: 28 additions & 0 deletions {{cookiecutter.project_directory_name}}/src/README.md
@@ -0,0 +1,28 @@
# Transformation Checklist

## Motivation

## Cleaning Checklist

For each data source:

- [ ] read data from `/data/raw/`
- [ ] ...
- [ ] write data to `/data/clean/`

## Wrangling Checklist

For each data product:

- [ ] read data from `/data/clean/`
- [ ] ...
- [ ] write data to `/data/wrangle/`

## Processing

For models:





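The checklists describe a fixed hand-off between folders: cleaning reads from `data/raw/` and writes to `data/clean/`, and wrangling reads from `data/clean/` and writes to `data/wrangle/`. The Python templates also fix the file formats (CSV for raw data, pickle for later stages). A small lookup table such as the hypothetical `STAGES` below is one way to keep those conventions in one place; the names are illustrative and not part of the template.

```python
from pathlib import Path

# Hypothetical table: stage -> (input folder, input suffix, output folder, output suffix),
# following the conventions in src/clean/clean.py and src/wrangle/wrangle.py.
STAGES = {
    "clean": ("raw", ".csv", "clean", ".pkl"),
    "wrangle": ("clean", ".pkl", "wrangle", ".pkl"),
}


def stage_paths(root: Path, stage: str, data_name: str) -> tuple[Path, Path]:
    """Return the (input, output) file paths for one data source or product."""
    in_dir, in_suffix, out_dir, out_suffix = STAGES[stage]
    data = root / "data"
    return (
        data / in_dir / f"{data_name}{in_suffix}",
        data / out_dir / f"{data_name}{out_suffix}",
    )
```

For example, `stage_paths(Path("proj"), "clean", "survey")` gives `proj/data/raw/survey.csv` as input and `proj/data/clean/survey.pkl` as output, so each "read data from" and "write data to" checklist item maps to one side of the tuple.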
26 changes: 26 additions & 0 deletions {{cookiecutter.project_directory_name}}/src/clean/clean.Rmd
@@ -0,0 +1,26 @@
# Load Libraries
```{r message=FALSE}
library(glue)
library(here)
library(tidyverse)
```

# Setup
```{r}
data_name <- "<data-name>"
```

# Read Data
```{r}
raw_data <- read_csv(here("data", "raw", glue("{data_name}.csv")))
```

# Clean Data
```{r}
clean_data <- raw_data
```

# Write Data
```{r}
write_rds(clean_data, here("data", "clean", glue("{data_name}.rds")))
```
16 changes: 16 additions & 0 deletions {{cookiecutter.project_directory_name}}/src/clean/clean.py
@@ -0,0 +1,16 @@
#%% Load Libraries
import pandas
from pyprojroot import here
import os

#%% Setup
data_name = "<data-name>"

#%% Read Data
raw_data = pandas.read_csv(os.path.join(here(), "data", "raw", f"{data_name}.csv"))

#%% Clean Data
clean_data = raw_data

#%% Write Data
clean_data.to_pickle(os.path.join(here(), "data", "clean", f"{data_name}.pkl"))
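Because the clean script resolves its paths through `here()` at import time, it is awkward to test in isolation. One option is to wrap the read, clean and write steps in a function that takes the project root as a parameter. The sketch below does this with the standard library's `csv` and `pickle` modules standing in for pandas. `run_clean` is a hypothetical name, and the cleaning step stays the identity, as in the template.

```python
import csv
import pickle
from pathlib import Path


def run_clean(root: Path, data_name: str) -> Path:
    """Read data/raw/<name>.csv, apply cleaning, write data/clean/<name>.pkl."""
    raw_path = root / "data" / "raw" / f"{data_name}.csv"
    clean_path = root / "data" / "clean" / f"{data_name}.pkl"

    # Read Data: one dict per row, keyed by the CSV header
    with raw_path.open(newline="") as handle:
        raw_data = list(csv.DictReader(handle))

    # Clean Data: identity placeholder, matching the template
    clean_data = raw_data

    # Write Data: create the output folder in case it is missing
    clean_path.parent.mkdir(parents=True, exist_ok=True)
    with clean_path.open("wb") as handle:
        pickle.dump(clean_data, handle)
    return clean_path
```

Passing `root` explicitly lets a test point the function at a temporary directory, while the script itself could still call it with the result of `here()`.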
16 changes: 16 additions & 0 deletions {{cookiecutter.project_directory_name}}/src/model/model.Rmd
@@ -0,0 +1,16 @@
# Load Libraries
```{r message=FALSE}
library(glue)
library(here)
library(tidyverse)
```

# Setup
```{r}
data_name <- "<data-name>"
```

# Read Data
```{r}
wrangle_data <- read_rds(here("data", "wrangle", glue("{data_name}.rds")))
```
10 changes: 10 additions & 0 deletions {{cookiecutter.project_directory_name}}/src/model/model.py
@@ -0,0 +1,10 @@
#%% Load Libraries
import pandas
from pyprojroot import here
import os

#%% Setup
data_name = "<data-name>"

#%% Read Data
wrangle_data = pandas.read_pickle(os.path.join(here(), "data", "wrangle", f"{data_name}.pkl"))
26 changes: 26 additions & 0 deletions {{cookiecutter.project_directory_name}}/src/wrangle/wrangle.Rmd
@@ -0,0 +1,26 @@
# Load Libraries
```{r message=FALSE}
library(glue)
library(here)
library(tidyverse)
```

# Setup
```{r}
data_name <- "<data-name>"
```

# Read Data
```{r}
clean_data <- read_rds(here("data", "clean", glue("{data_name}.rds")))
```

# Wrangle Data
```{r}
wrangle_data <- clean_data
```

# Write Data
```{r}
write_rds(wrangle_data, here("data", "wrangle", glue("{data_name}.rds")))
```
16 changes: 16 additions & 0 deletions {{cookiecutter.project_directory_name}}/src/wrangle/wrangle.py
@@ -0,0 +1,16 @@
#%% Load Libraries
import pandas
from pyprojroot import here
import os

#%% Setup
data_name = "<data-name>"

#%% Read Data
clean_data = pandas.read_pickle(os.path.join(here(), "data", "clean", f"{data_name}.pkl"))

#%% Wrangle Data
wrangle_data = clean_data

#%% Write Data
wrangle_data.to_pickle(os.path.join(here(), "data", "wrangle", f"{data_name}.pkl"))
@@ -0,0 +1,13 @@
Version: 1.0

RestoreWorkspace: No
SaveWorkspace: No
AlwaysSaveHistory: No

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX