Commit

feat!: transform into cookiecutter template
pppmlt committed Aug 29, 2022
1 parent 8bf2ec3 commit 46b315e
Showing 21 changed files with 1,014 additions and 89 deletions.
2 changes: 0 additions & 2 deletions .gitattributes
@@ -1,5 +1,3 @@
data/** filter=lfs diff=lfs merge=lfs -text

# let git decide if a file is a text file or a binary files
# for text files, convert all CRLF to LF on check-in (not in working tree)
* text=auto
2 changes: 1 addition & 1 deletion .gitlab-ci-test.yaml
@@ -2,7 +2,7 @@
before_script:
- eval "$(micromamba shell hook --shell=bash)"
- micromamba activate
- micromamba install -n base --yes --file environment.yml
- micromamba install -n base --yes --file environment.yaml
- micromamba install -n base -c conda-forge --yes git git-lfs
- git config --global --add safe.directory $(pwd)
- pre-commit install
6 changes: 6 additions & 0 deletions .gitlab-ci.yml
@@ -0,0 +1,6 @@
# the ci configuration is split into multiple files
# this allows the semantic versioning (versioning the template itself) to be easily removed
include:
- .gitlab-ci-stages.yaml
- .gitlab-ci-release.yaml
- .gitlab-ci-test.yaml
13 changes: 0 additions & 13 deletions .pre-commit-config.yaml
@@ -38,24 +38,11 @@ repos:
entry: black
language: system
types: [python]
- id: nbqa-black
name: nbqa-black
entry: nbqa black
language: system
files: \.ipynb
- id: pylint
name: pylint
entry: pylint
language: system
types: [python]
- id: nbqa-pylint
name: nbqa-pylint
entry: nbqa pylint
language: system
files: \.ipynb
args:
- --disable=pointless-statement,duplicate-code,expression-not-assigned
- --const-rgx=(([A-Z_][A-Z0-9_]*)|(__.*__)|([a-z_][a-z0-9_]{0,50}))$
- repo: https://github.com/prettier/pre-commit
rev: v2.1.2
hooks:
93 changes: 25 additions & 68 deletions README.md
@@ -1,89 +1,46 @@
# Data Science Template Repository
# HSG Data Science Template

Basic setup for a data science project.
## Prerequisites

## Requirements
The template depends on the following software:

`Mamba`
- `mamba` (e.g., `miniforge` with `mamba` a.k.a. `mambaforge` [link](https://github.com/conda-forge/miniforge))
- `python` (>= 3.7)
- `cookiecutter` ([link](https://pypi.org/project/cookiecutter/))
- `git`

---
## Usage Instructions

## Setup Instructions (Unix)

<ol>
<li>Create and enter directory for your new repository</li>
<li>Clone template repository</li>
To setup a new project with the HSG data science template create the project repository in gitlab, run

```
git clone git@ac1-git1.umlaut.com:hsg/hsg/data-science/data-science-project-template.git .
cookiecutter https://ac1-git1.umlaut.com/hsg/hsg/data-science/data-science-templates/data-science-project-template.git
```

<li>Read template version and remove its git history</li>
and fill out the needed information.

```
version_tag=`git describe --tags --abbrev=0`
rm -rf .git
rm .releaserc.json README.md
sed -i -e '/release/d' .gitlab-ci-stages.yml
```
## Development Instructions

<li>Initialize a new repository and push it</li>
Checkout the repository, run

```
git init
git add .
git commit -m "Initialize from data science template version ${version_tag}"
git remote add origin git@ac1-git1.umlaut.com:hsg/YOUR/NEW/PROJECT.git
```

<li>Configure tools</li>

```
env_name=YOUR_NEW_MAMBA_ENV_NAME
mamba env create --name=${env_name} -f environment.yml
mamba activate $env_name
mamba env create -f environment.yaml
mamba activate data-science-project-template
pre-commit install
```

</ol>

---

## Setup Instructions (Windows)

<ol>
<li>Create and enter directory for your new repository</li>
<li>Clone template repository</li>

```
git clone git@ac1-git1.umlaut.com:hsg/hsg/data-science/data-science-project-template.git .
```

<li>Read template version and remove its git history</li>
and start developing.

```
@for /f "delims=" %i in ('git describe --tags --abbrev^=0') do @set version_tag=%i
rmdir /S /Q .git
del .gitlab-ci.yml .releaserc.json README.md
Set-Content -Path ".gitlab-ci-stages.yml" -Value (get-content -Path ".gitlab-ci-stages.yml" | Select-String -Pattern 'release' -NotMatch)
```
## Setup CI/CD pipelines in gitlab

<li>Initialize a new repository and push it</li>
The project is set up to run:

```
git init
git add .
git commit -m "Initialize from data science template version %version_tag%"
git remote add origin git@ac1-git1.umlaut.com:hsg/YOUR/NEW/PROJECT.git
```

<li>Configure tools</li>
- `pre-commit` checks on every new commit pushed to gitlab
- `semantic-release` on every MR to main

```
set env_name=YOUR_NEW_MAMBA_ENV_NAME
mamba env create --name=%env_name% -f environment.yml
mamba activate %env_name%
pre-commit install
```
To enable the CI/CD setup in gitlab please use the following steps:

</ol>
- enable `CI/CD` pipelines in `Settings -> General -> Visibility, project features, permissions`.
- create a project access token named `GITLAB_TOKEN` in `Settings -> Access Tokens` with `Maintainer` role and `api` + `write_repository` scopes.
- copy the token value appearing at the top of the page after the token creation
- create a `masked` and `protected` variable called `GITLAB_TOKEN` in `CI/CD -> Variables` using the previously created token value
7 changes: 7 additions & 0 deletions cookiecutter.json
@@ -0,0 +1,7 @@
{
"project_name": "Data Science Project",
"project_description": "A short description of the project.",
"repo_name": "{{ cookiecutter.project_name.lower().replace(' ', '-') }}",
"repo_url": "URL to repository",
"env_name": "{{ cookiecutter.project_name.lower().replace(' ', '-') + '-env' }}"
}
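
The `repo_name` and `env_name` defaults above are Jinja expressions that call ordinary Python string methods on `project_name`. A minimal plain-Python sketch of the same derivation follows; the helper name `derive_defaults` is hypothetical and only illustrates what cookiecutter would propose when the user accepts the defaults.

```python
def derive_defaults(project_name: str) -> dict:
    """Mirror the templated defaults from cookiecutter.json (hypothetical helper)."""
    # Same transformation as `cookiecutter.project_name.lower().replace(' ', '-')`
    slug = project_name.lower().replace(" ", "-")
    return {
        "repo_name": slug,
        # env_name appends the '-env' suffix to the same slug
        "env_name": slug + "-env",
    }


print(derive_defaults("Data Science Project"))
# → {'repo_name': 'data-science-project', 'env_name': 'data-science-project-env'}
```

Note that only spaces are replaced, so other characters in `project_name` (for example underscores or punctuation) would pass through unchanged.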
3 changes: 1 addition & 2 deletions environment.yaml
@@ -1,9 +1,8 @@
name: data-science-project-template
channels:
- conda-forge
dependencies:
- black=22.3.0
- nbqa=1.2.3
- ipykernel=6.9.1
- python=3.9.7
- pre-commit=2.18.1
- pylint=2.12.2
17 changes: 17 additions & 0 deletions hooks/post_gen_project.py
@@ -0,0 +1,17 @@
import subprocess

# initialize new project repository
subprocess.run(["git", "init"], check=True)
subprocess.run(["git", "branch", "-m", "main"], check=True)
subprocess.run(["git", "add", "."], check=True)
subprocess.run(
["git", "commit", "-m", "chore: initialize repo from data science template"], check=True
)
subprocess.run(["git", "remote", "add", "origin", "{{cookiecutter.repo_url}}"], check=True)

# setup environment
subprocess.run(["mamba", "env", "create", "-f", "environment.yaml", "--quiet"], check=True)
subprocess.run(
["mamba", "run", "--no-banner", "-n", "{{cookiecutter.env_name}}", "pre-commit", "install"],
check=True,
)
27 changes: 27 additions & 0 deletions {{cookiecutter.repo_name}}/.commitlintrc.yaml
@@ -0,0 +1,27 @@
extends:
- '@commitlint/config-conventional'
rules:
# Scope enum should be customized to match project needs
scope-enum:
- 2
- always
- - scope-1
- scope-2
# Type enum should be consistent across data science projects
type-enum:
- 2
- always
- - build
- chore
- ci
- docs
- feat
- fix
- perf
- refactor
- revert
- style
- test
- data
- explore
- result
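
As a rough illustration of what the `type-enum` and `scope-enum` rules above enforce, the sketch below checks a commit header against the two lists. It is an approximation, not commitlint's actual parser: the regular expression, the helper name `check_header`, and the assumption that a missing scope is accepted are all simplifications made for this example.

```python
import re

# Values copied from the .commitlintrc.yaml above
ALLOWED_TYPES = {
    "build", "chore", "ci", "docs", "feat", "fix", "perf",
    "refactor", "revert", "style", "test", "data", "explore", "result",
}
ALLOWED_SCOPES = {"scope-1", "scope-2"}

# Simplified conventional-commit header: type(scope)!: subject
HEADER_RE = re.compile(
    r"^(?P<type>\w+)(?:\((?P<scope>[^()]+)\))?(?P<breaking>!)?: (?P<subject>.+)$"
)


def check_header(header: str) -> list:
    """Return a list of problems found in a commit header (hypothetical helper)."""
    match = HEADER_RE.match(header)
    if match is None:
        return ["header does not match 'type(scope): subject'"]
    problems = []
    if match["type"] not in ALLOWED_TYPES:
        problems.append(f"type '{match['type']}' is not in type-enum")
    scope = match["scope"]
    # Assumption: an absent scope is accepted; only a present scope is checked
    if scope is not None and scope not in ALLOWED_SCOPES:
        problems.append(f"scope '{scope}' is not in scope-enum")
    return problems


print(check_header("chore: initialize repo from data science template"))  # → []
print(check_header("wip(scope-1): try something"))
# → ["type 'wip' is not in type-enum"]
```

The placeholder scopes `scope-1` and `scope-2` are meant to be replaced per project, so a real check would use whatever list ends up in the generated repository.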
5 changes: 5 additions & 0 deletions {{cookiecutter.repo_name}}/.gitattributes
@@ -0,0 +1,5 @@
data/** filter=lfs diff=lfs merge=lfs -text

# let git decide if a file is a text file or a binary files
# for text files, convert all CRLF to LF on check-in (not in working tree)
* text=auto
85 changes: 85 additions & 0 deletions {{cookiecutter.repo_name}}/.gitignore
@@ -0,0 +1,85 @@
# based on: https://github.com/github/gitignore/blob/master/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# IDEs
.idea
.vscode

# Data
data/*
!data/raw/
2 changes: 2 additions & 0 deletions {{cookiecutter.repo_name}}/.gitlab-ci-stages.yaml
@@ -0,0 +1,2 @@
stages:
- test
24 changes: 24 additions & 0 deletions {{cookiecutter.repo_name}}/.gitlab-ci-test.yaml
@@ -0,0 +1,24 @@
.setup_python_env: &setup_python_env
before_script:
- apt-get update && apt-get install -y --no-install-recommends git-core ca-certificates git-lfs
- conda env create -f environment.yaml --name=conda_env
- conda init bash
- source ~/.bashrc
- conda activate conda_env
- pre-commit install

format_and_lint:
stage: test
image: continuumio/miniconda3
<<: *setup_python_env
script:
- pre-commit run --hook-stage manual --all-files
- git fetch
- chmod a+x check_commit_msgs.sh
- ./check_commit_msgs.sh -c "remotes/origin/${CI_COMMIT_BRANCH}" -m "remotes/origin/${CI_DEFAULT_BRANCH}"
rules:
- if: $CI_COMMIT_TAG
when: never
- if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
when: never
- when: always
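
GitLab evaluates `rules` entries in order and applies the first one that matches, so the block above skips the job for tag pipelines and merge request pipelines and runs it otherwise. A small sketch of that decision, assuming the CI variables are passed in as a plain dictionary (the function name `format_and_lint_runs` is hypothetical):

```python
def format_and_lint_runs(ci_vars: dict) -> bool:
    """Approximate the first-match evaluation of the job's rules (illustrative only)."""
    # Rule 1: `if: $CI_COMMIT_TAG` matches when the variable is set and non-empty
    if ci_vars.get("CI_COMMIT_TAG"):
        return False  # when: never
    # Rule 2: merge request pipelines are excluded
    if ci_vars.get("CI_PIPELINE_SOURCE") == "merge_request_event":
        return False  # when: never
    # Rule 3: fallback entry without a condition
    return True  # when: always


print(format_and_lint_runs({"CI_PIPELINE_SOURCE": "push"}))  # → True
print(format_and_lint_runs({"CI_PIPELINE_SOURCE": "push", "CI_COMMIT_TAG": "v1.0.0"}))  # → False
```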
5 changes: 2 additions & 3 deletions .gitlab-ci.yaml → {{cookiecutter.repo_name}}/.gitlab-ci.yml
@@ -1,6 +1,5 @@
# the ci configuration is split into multiple files
# this allows the semantic versioning (versioning the template itself) to be easily removed
include:
- .gitlab-ci-stages.yml
- .gitlab-ci-release.yml
- .gitlab-ci-test.yml
- .gitlab-ci-stages.yaml
- .gitlab-ci-test.yaml