Merge pull request #18 from NICD-UK/script-templates

Script templates
NICD-UK · Mar 2, 2023 · 31b4998
2 parents ea7f7f4 + 4ba1aba
commit 31b4998
Showing 32 changed files with 264 additions and 179 deletions.
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # Project Template
 
-## Usage
+## Setup
 
 To use the project template:
 
@@ -9,7 +9,7 @@ pip install cookiecutter
 cookiecutter https://github.com/NICD-UK/project-template
 ```
 
-You will be prompted for eleven inputs:
+You will be prompted for the following answers:
 
 1. Project Name
 2. Project Directory Name
@@ -18,75 +18,93 @@ You will be prompted for eleven inputs:
 5. Project Sponsor Name
 6. Project Sponsor Email
 7. Project Summary
-8. Raw Data Directory
-9. Language (Python / R)
-10. `venv` Project (No / Yes)
-11. `git` Project (No / Yes)
+8. <a name="language">Language</a>: **Python** or **R**
 
-## Organization
+Then run:
+
+```
+make
+```
+
+This command will:
+
+1. Initialise a virtual environment
+    - `venv` for Python
+    - `renv` for R
+2. Install the packages required for the template scripts
+3. Save the packages to a dependencies file
+    - `requirements.txt` for Python
+    - `renv.lock` for R
+4. Initialise a git repository
+
+## Package Management
+
+To install a package in Python run:
+
+```
+venv/bin/pip install <package>
+```
+
+To install a package in R use the Packages tab in RStudio.
+
+To save the installed packages to the dependencies file run:
+
+```
+make save
+```
+
+To load the packages from the dependencies file run:
+
+```
+make load
+```
+
+## Project Structure
+
+The project has the following structure:
 
 ```
 README.md
-config.yml
 data/
 ├─ clean/
-├─ model/
 ├─ raw/
 ├─ wrangle/
+models/
+presentations/
 reports/
 ├─ clean/
-├─ final/
 ├─ wrangle/
 src/
 ├─ clean/
 ├─ model/
 ├─ wrangle/
 ```
 
-## Data Science Workflow
-
-### 1. Business Understanding
-
-- **Determine Objectives:**
-- **Determine Deliverables:**
-- **Determine Resources:**
-- **Plan Project:** 
-
-### 2. Data Preparation and Understanding
-
-- **Import Data:** 
-- **Clean Data:**
-- **Wrangle Data:**
-
-### 3. Prototyping
-
-- **Develop Data Product:**
-- **Evaluate Data Product:**
-- **Approve Data Product:**
+## Project Charter
 
-### 4. Production
+The `README.md` file is the [Project Charter](https://en.wikipedia.org/wiki/Project_charter). The head of the project charter includes: the project name; the name and email of the project manager; and the name and email of the project sponsor. This is filled out with the answers to the corresponding prompts during setup. The body of the project charter includes:
 
-- **Deploy Data Product:**
-- **Monitor Data Product:**
-- **Maintain Data Product:**
-- **Close Project:**
+- Summary
+- Objectives
+- Deliverables
+- Resources
+- Scope
+- Costs and Benefits
+- Risks and Contingencies
 
-## Guide
+The body of the project charter is filled out during the project scoping phase.
 
-### Clean Data
+## Script Templates
 
-![](figures/clean.drawio.svg)
+There are template scripts for:
 
-1. Create a cleaning script in the `src/clean` directory that imports and cleans the raw data from the `data/raw` directory and writes to the `data/clean/` directory.
-2. The cleaned data is stored in the `data/clean/` directory.
-3. Create a cleaning report in the `report/clean/` directory that reads the cleaned data from the `data/clean/` directory.
-4. The cleaning report in the `report/clean/` directory is used to update the cleaning script in the `src/clean/` directory.
+1. cleaning data in `src/clean/`,
+2. describing data in `reports/clean/`,
+3. wrangling data in `src/wrangle/`,
+4. exploring data in `reports/wrangle/`
 
-### Wrangle Data
+available in [Python](https://www.python.org) or [R](https://www.r-project.org). Answer **Python** or **R** to the [Language](#language) prompt during setup for the relevant template scripts. All template scripts include code to read from and write to the appropriate data directories. The template scripts for describing and exploring data generate reports for the cleaned and wrangled data, respectively. There is also a template script for presenting data in `presentations/` available in [Quarto](https://quarto.org).
 
-![](figures/wrangle.drawio.svg)
+## Recommendations
 
-1. Create a wrangling script in the `src/wrangle` directory that reads and wrangles the clean data from the `data/clean/` directory and writes to the `data/wrangle/` directory.
-2. The wrangled data is stored in the `data/wrangle/` directory.
-3. Create a wrangling report in the `report/wrangle/` directory that reads the wrangled data from the `data/wrangle/` directory.
-4. The wrangling report in the `report/wrangle/` directory is used to update the wrangling script in the `src/wrangle/` directory.
+For the best experience, use the project template with [Visual Studio Code](https://code.visualstudio.com) for Python projects and [RStudio](https://posit.co/products/open-source/rstudio/) for R projects.
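The updated project structure in the README diff above can be checked mechanically after generating a project. The following is a minimal sketch, not part of the template: `EXPECTED_DIRS`, `missing_dirs`, and `demo` are hypothetical names, and the directory list is copied from the new tree in the README.

```python
import tempfile
from pathlib import Path

# Directories copied from the updated tree in the README diff.
EXPECTED_DIRS = [
    "data/clean", "data/raw", "data/wrangle",
    "models", "presentations",
    "reports/clean", "reports/wrangle",
    "src/clean", "src/model", "src/wrangle",
]


def missing_dirs(project_root):
    """Return the expected directories that do not exist under project_root."""
    root = Path(project_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


def demo():
    """Build a throwaway project that lacks models/ and report what is missing."""
    with tempfile.TemporaryDirectory() as tmp:
        for d in EXPECTED_DIRS:
            if d != "models":
                Path(tmp, d).mkdir(parents=True)
        return missing_dirs(tmp)
```

Calling `missing_dirs(".")` from the root of a generated project should return an empty list if the structure matches the README.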
diff --git a/cookiecutter.json b/cookiecutter.json
@@ -6,8 +6,5 @@
     "project_sponsor_name": "Project Sponsor Name",
     "project_sponsor_email": "Project Sponsor Email",
     "project_summary": "Project Summary",
-    "raw_data_directory": "data/raw",
-    "language": ["Python", "R"],
-    "venv_project": ["No", "Yes"],
-    "git_project": ["No", "Yes"]
+    "language": ["Python", "R"]
 }
diff --git a/hooks/post_gen_project.py b/hooks/post_gen_project.py
@@ -2,33 +2,19 @@
 import glob
 import os
 
-venv_project = "{{cookiecutter.venv_project}}"
-git_project = "{{cookiecutter.git_project}}"
 language = "{{cookiecutter.language}}"
 
 # create Python project
 if language == "Python":
+    os.remove("MakefileR")
+    os.rename("MakefilePython", "Makefile")
     os.remove("{{cookiecutter.project_directory_name}}.Rproj")
     for file in glob.glob("**/*.Rmd", recursive=True):
         os.remove(file)
 
 # create R project
 if language == "R":
+    os.remove("MakefilePython")
+    os.rename("MakefileR", "Makefile")
     for file in glob.glob("**/*.py", recursive=True):
         os.remove(file)
-
-# create venv project
-if venv_project == "Yes":   
-    subprocess.run(["python3", "-m", "venv", ".venv"], stdout=subprocess.DEVNULL)
-    subprocess.run([".venv/bin/python", "-m", "pip", "install", "--upgrade", "pip"], stdout=subprocess.DEVNULL)
-
-# gitignore config.yml
-with open(".gitignore", "a") as f:
-    lines = ["\n", "# configuration file\n", "config.yml\n"]
-    f.writelines(lines)
-
-# create git project
-if git_project == "Yes":
-    subprocess.run(["git", "init"], stdout=subprocess.DEVNULL)
-    subprocess.run(["git", "add", "--all"], stdout=subprocess.DEVNULL)
-    subprocess.run(["git", "commit", "-m", "'initial commit'"], stdout=subprocess.DEVNULL)
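The hook now only selects the language-specific Makefile and leaves environment and git setup to `make`. The selection step can be sketched as a standalone function and exercised in a temporary directory. This is an illustrative rewrite under stated assumptions, not the hook itself: `select_makefile` and `demo` are hypothetical names, and the error for an unknown language is an addition the hook does not have.

```python
import os
import tempfile
from pathlib import Path


# Hypothetical helper mirroring the Makefile selection in the hook above.
def select_makefile(project_dir, language):
    """Rename the Makefile for `language` to Makefile and delete the other."""
    makefiles = {"Python": "MakefilePython", "R": "MakefileR"}
    if language not in makefiles:
        # Assumption: the real hook silently does nothing in this case.
        raise ValueError(f"unsupported language: {language!r}")
    project_dir = Path(project_dir)
    for lang, name in makefiles.items():
        if lang == language:
            os.rename(project_dir / name, project_dir / "Makefile")
        else:
            os.remove(project_dir / name)
    return sorted(p.name for p in project_dir.iterdir())


def demo():
    """Create both Makefiles in a temporary directory and select Python."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("MakefilePython", "MakefileR"):
            Path(tmp, name).write_text(f"# {name}\n")
        return select_makefile(tmp, "Python")
```

Keeping the two branches in one mapping makes it harder for the Python and R cases to drift apart if a third language is ever added.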
diff --git a/{{cookiecutter.project_directory_name}}/.gitignore b/{{cookiecutter.project_directory_name}}/.gitignore
@@ -1,14 +1,19 @@
-# .venv directory
-/.venv/
+# venv directory
+/venv/*
+/renv/*
+!renv/activate.R
 
 # data directory
 /data/clean/*
-/data/model/*
 /data/raw/*
 /data/wrangle/*
 
-# notebooks directory
-/notebooks/*
+# models directory
+/models/*
 
 # directory structure
 !.gitkeep
+
+# presentation files
+/presentations/*.html
+/presentations/*_files/
diff --git a/{{cookiecutter.project_directory_name}}/MakefilePython b/{{cookiecutter.project_directory_name}}/MakefilePython
@@ -0,0 +1,26 @@
+.PHONY: all venv save load git
+
+#################################################################################
+# COMMANDS                                                                      #
+#################################################################################
+
+all: venv save git
+
+venv:
+	python3 -m venv venv
+	venv/bin/pip install --upgrade pip
+	venv/bin/pip install ipykernel 
+	venv/bin/pip install pandas 
+	venv/bin/pip install pathlib 
+	venv/bin/pip install ydata-profiling
+
+save:
+	venv/bin/pip freeze > requirements.txt
+
+load:
+	venv/bin/pip install -r requirements.txt
+
+git:
+	git init
+	git add --all
+	git commit -m "initial commit"
diff --git a/{{cookiecutter.project_directory_name}}/MakefileR b/{{cookiecutter.project_directory_name}}/MakefileR
@@ -0,0 +1,26 @@
+.PHONY: all venv save load git
+
+#################################################################################
+# COMMANDS                                                                      #
+#################################################################################
+
+all: venv save git
+
+venv:
+	Rscript -e 'install.packages("renv", repos = "https://cloud.r-project.org/")'
+	Rscript -e 'renv::init(bare = TRUE)'
+	Rscript -e 'renv::install("dlookr")'
+	Rscript -e 'renv::install("glue")'
+	Rscript -e 'renv::install("here")'
+	Rscript -e 'renv::install("readr")'
+
+save:
+	Rscript -e 'renv::snapshot()'
+
+load:
+	Rscript -e 'renv::restore()'
+
+git:
+	git init
+	git add --all
+	git commit -m "initial commit"
diff --git a/{{cookiecutter.project_directory_name}}/config.yml b/{{cookiecutter.project_directory_name}}/config.yml
diff --git a/...ject_directory_name}}/data/model/.gitkeep → ....project_directory_name}}/models/.gitkeep b/...ject_directory_name}}/data/model/.gitkeep → ....project_directory_name}}/models/.gitkeep
diff --git a/{{cookiecutter.project_directory_name}}/presentations/01-presentation.qmd b/{{cookiecutter.project_directory_name}}/presentations/01-presentation.qmd
@@ -0,0 +1,7 @@
+---
+title: "Presentation"
+author: "{{cookiecutter.project_manager_name}}"
+format: revealjs
+---
+
+## Introduction
diff --git a/{{cookiecutter.project_directory_name}}/reports/clean/.gitkeep b/{{cookiecutter.project_directory_name}}/reports/clean/.gitkeep
diff --git a/{{cookiecutter.project_directory_name}}/reports/clean/01-describe.Rmd b/{{cookiecutter.project_directory_name}}/reports/clean/01-describe.Rmd
@@ -0,0 +1,22 @@
+# Load Libraries
+```{r message=FALSE}
+library(dlookr)
+library(glue)
+library(here)
+library(readr)
+```
+
+# Setup
+```{r}
+data_name <- "<data-name>"
+```
+
+# Read Data
+```{r}
+clean_data <- read_rds(here(glue("data/clean/{data_name}.rds")))
+```
+
+# Describe Data
+```{r}
+diagnose_web_report(clean_data)
+```
diff --git a/{{cookiecutter.project_directory_name}}/reports/clean/01-describe.py b/{{cookiecutter.project_directory_name}}/reports/clean/01-describe.py
@@ -0,0 +1,15 @@
+#%% Load Libraries
+import pandas
+from pathlib import Path
+from ydata_profiling import ProfileReport
+
+#%% Setup
+root_path = Path(__file__).parent.parent.parent
+data_name = "<data-name>"
+
+#%% Read Data
+clean_data = pandas.read_pickle(root_path / f"data/clean/{data_name}.pkl")
+
+#%% Describe Data
+profile = ProfileReport(clean_data, title="Description Report")
+profile.to_notebook_iframe()
diff --git a/{{cookiecutter.project_directory_name}}/reports/clean/clean.py b/{{cookiecutter.project_directory_name}}/reports/clean/clean.py
diff --git a/{{cookiecutter.project_directory_name}}/reports/final/.gitkeep b/{{cookiecutter.project_directory_name}}/reports/final/.gitkeep
diff --git a/{{cookiecutter.project_directory_name}}/reports/wrangle/.gitkeep b/{{cookiecutter.project_directory_name}}/reports/wrangle/.gitkeep
diff --git a/{{cookiecutter.project_directory_name}}/reports/wrangle/01-explore.Rmd b/{{cookiecutter.project_directory_name}}/reports/wrangle/01-explore.Rmd
@@ -0,0 +1,22 @@
+# Load Libraries
+```{r message=FALSE}
+library(dlookr)
+library(glue)
+library(here)
+library(readr)
+```
+
+# Setup
+```{r}
+data_name <- "<data-name>"
+```
+
+# Read Data
+```{r}
+wrangle_data <- read_rds(here(glue("data/wrangle/{data_name}.rds")))
+```
+
+# Explore Data
+```{r}
+eda_web_report(wrangle_data)
+```
diff --git a/{{cookiecutter.project_directory_name}}/reports/wrangle/01-explore.py b/{{cookiecutter.project_directory_name}}/reports/wrangle/01-explore.py
@@ -0,0 +1,15 @@
+#%% Load Libraries
+import pandas
+from pathlib import Path
+from ydata_profiling import ProfileReport
+
+#%% Setup
+root_path = Path(__file__).parent.parent.parent
+data_name = "<data-name>"
+
+#%% Read Data
+wrangle_data = pandas.read_pickle(root_path / f"data/wrangle/{data_name}.pkl")
+
+#%% Explore Data
+profile = ProfileReport(wrangle_data, title="Exploration Report")
+profile.to_notebook_iframe()
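Both Python report templates locate the project root by climbing three levels from the script file. The sketch below shows how that resolution works, using a hypothetical script location and data name, with `PurePosixPath` so the result does not depend on the operating system. The `parents[2]` form is an equivalent alternative, not what the templates use.

```python
from pathlib import PurePosixPath

# Hypothetical location of a report script inside a generated project.
script = PurePosixPath("/project/reports/wrangle/01-explore.py")

# Chained form used by the templates: wrangle/ -> reports/ -> project root.
root_chained = script.parent.parent.parent

# Equivalent indexed form: parents[0] is the script's own directory.
root_indexed = script.parents[2]

# Hypothetical data name, standing in for the "<data-name>" placeholder.
data_name = "example"
data_path = root_indexed / f"data/wrangle/{data_name}.pkl"
```

Because the root is derived from the script's own path, the templates read the same data file regardless of the working directory they are run from.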