The Velexi Research Project Cookiecutter is intended to streamline the process of setting up a Jupyter-based research project involving computational work (but that is not necessarily centered around data science and/or machine learning models). The structure of this research project template is inspired by Cookiecutter Data Science, Kuygen Tran's Data Science Template, and the blog article "Jupyter Notebook Best Practices for Data Science" by Jonathan Whitmore.
- Support for common research workflows (for both individuals and teams)
- A directory structure that organizes and separates different components and stages of research: data, exploration/experimentation (e.g., Jupyter notebooks), documentation (e.g., reports, references), and software (e.g., custom functions and test code)
- Integration with tools that encourage code, data, and scientific quality while promoting research efficiency
  - Code version control: Git
  - Data version control: DVC, FastDS
  - Experiment tracking: MLflow Tracking
  - Automated testing and coverage reporting: pytest, coverage
  - Code quality: pre-commit, black, flake8, radon
- Quick references for software tools (e.g., FastDS, MLflow, Poetry)
- Support for the Julia programming language
- Python package and dependency management using Poetry
- Directory-based development environment isolation with direnv
- 1.2. Setting Up a New Research Project
- 1.3. Publishing Project Documentation to GitHub Pages
- 1.4. Known Issues
- 2.1. License
- 2.2. Repository Contents
- 2.4. Setting Up to Develop the Cookiecutter
- 2.5. Additional Notes
- `project_name`: project name
- `author`: project's primary author
- `email`: primary author's email
- `license`: type of license to use for the project
- `python_version`: Python versions compatible with the project. See the "Dependency specification" section of the Poetry documentation for version specifier semantics.
- `enable_julia`: flag indicating whether Julia should be enabled for the project
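As a rough illustration of how these parameters might be supplied without answering the interactive prompts, the sketch below assembles a `cookiecutter` command line that uses the `--no-input` option and trailing `key=value` arguments. The parameter values are hypothetical placeholders, and `enable_julia` is left out because its accepted values are not listed here, so it would fall back to the template default.

```python
import shlex

TEMPLATE_URL = "https://github.com/velexi-research/VLXI-Cookiecutter-Research.git"


def build_cookiecutter_command(template, parameters):
    """Return the argument list for a non-interactive cookiecutter run."""
    command = ["cookiecutter", "--no-input", template]
    # cookiecutter accepts extra context as trailing key=value arguments
    command.extend(f"{key}={value}" for key, value in parameters.items())
    return command


# Hypothetical values, for illustration only
parameters = {
    "project_name": "My Research Project",
    "author": "Jane Doe",
    "email": "jane.doe@example.com",
    "license": "Apache License 2.0",
    "python_version": "^3.9",
}

# Print a shell-quoted command that can be reviewed before running it
print(shlex.join(build_cookiecutter_command(TEMPLATE_URL, parameters)))
```

Printing the command instead of executing it keeps the sketch free of side effects; the same list could be passed to `subprocess.run` once the values have been checked.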
- Prerequisites

  - Install Git.
  - Install Python 3.9 (or greater).
  - Install Poetry 1.2 (or greater).

    Note. The project template uses `poetry` instead of `pip` for management of Python package dependencies.

  - Install the Cookiecutter Python package.
  - Optional. Install direnv.
- Use `cookiecutter` to create a new research project.

      $ cookiecutter https://github.com/velexi-research/VLXI-Cookiecutter-Research.git
- Set up a dedicated virtual environment for the project. Any of the common virtual environment options (e.g., `venv`, `direnv`, `conda`) should work. Below are instructions for setting up a `direnv` or `poetry` environment.

  Note: to avoid conflicts between virtual environments, only one method should be used to manage the virtual environment.

  - `direnv` Environment. Note: `direnv` manages the environment for both Python and the shell.

    - Prerequisite. Install `direnv`.

    - Copy `extras/dot-envrc` to the project root directory, and rename it to `.envrc`.

          $ cd $PROJECT_ROOT_DIR
          $ cp extras/dot-envrc .envrc

    - Grant permission to `direnv` to execute the `.envrc` file.

          $ direnv allow

  - `poetry` Environment. Note: `poetry` only manages the Python environment (it does not manage the shell environment).

    - Create a `poetry` environment that uses a specific Python executable. For instance, if `python3` is on your `PATH`, the following command creates (or activates if it already exists) a Python virtual environment that uses `python3` for the project.

          $ poetry env use python3

      For commands to use other Python executables for the virtual environment, see the Poetry Quick Reference.
- Install the base Python package dependencies.

      $ poetry install
- Configure Git.

  - Install the Git pre-commit hooks.

        $ pre-commit install

  - Optional. Set up a remote Git repository (e.g., GitHub repository).

    - Create a remote Git repository.

    - Configure the remote Git repository.

          $ git remote add origin GIT_REMOTE

      where `GIT_REMOTE` is the URL of the remote Git repository.

    - Push the `main` branch to the remote Git repository.

          $ git checkout main
          $ git push -u origin main
- Configure DVC.

  - Initialize DVC (Data Version Control). In the following commands, `PROJECT_DIR` should be replaced by the path to the newly created research project.

    - Using `fds`.

          $ cd PROJECT_DIR
          $ fds init
          $ fds commit -m "Initialize DVC"

    - Using `dvc` + `git`.

          $ cd PROJECT_DIR
          $ dvc init
          $ git commit -m "Initialize DVC"

  - Add a remote DVC repository.

    - Set up a remote DVC repository (e.g., S3 bucket).

    - Configure the remote DVC repository.

          $ dvc remote add -d storage DVC_REMOTE

      where `storage` is the name for the remote repository and `DVC_REMOTE` is the URL to the remote DVC repository. Note: the `-d` option indicates that `storage` should be used as the default remote DVC repository.

  - Configure DVC to automatically stage changes to `*.dvc` files with Git.

        $ dvc config core.autostage true
- Finish setting up the new research project.

  - Verify the copyright year and owner in the copyright notice. If the project is licensed under Apache License 2.0, the copyright notice is located in the `NOTICE` file. Otherwise, the copyright notice is located in the `LICENSE` file.

  - Update the base Python package dependencies to the latest available versions.

        $ poetry update

  - Review the Python package dependencies for the project, and modify them as needed using the `poetry` CLI tool. For a quick reference of `poetry` commands, see the Poetry Quick Reference.

    Packages that may be useful (but are not included by default):

    - numpy
    - numba
    - scipy
    - pandas
    - scikit-learn
    - matplotlib
    - seaborn

    For instance, to add numpy to the project dependencies, use the command:

        $ poetry add numpy

  - Fill in any empty fields in `pyproject.toml`.

  - Customize the `README.md` file to reflect the specifics of the project.

  - If the project was created with Julia support enabled, configure the Julia package dependencies for the project.

        julia> ]

        (...) pkg> instantiate

    - Review the Julia package dependencies for the project, and modify them as needed using the Julia package manager. For a quick reference of Julia package manager REPL commands, see the Julia Quick Reference.

  - Commit all updated files (e.g., `poetry.lock`, `Project.toml`) to the project Git repository.
- From the project GitHub repository, navigate to "Settings" > "Pages" (in the "Code and automation" section of the side menu) and configure GitHub Pages to deploy from the `main` branch.

  - Source: Deploy from a branch
  - Branch: main
  - Folder: /(root)

- In the "About" section of the project GitHub repository, set "Website" to the URL for the project GitHub Pages.

- That's it! Every time the `main` branch is updated, GitHub will automatically build project documentation from the `README.md` file (and any linked Markdown files) and publish it to the project GitHub Pages.
- When including `numba` as a project dependency, the Python version constraint in `pyproject.toml` needs to be more restrictive than the default `^3.9`. For numba 0.55, the Python version constraint in the `[tool.poetry.dependencies]` section of `pyproject.toml` should be set to:

      python = ">=3.9,<3.11"
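To see why the default constraint is too permissive for numba 0.55, it can help to compare the two ranges directly. Poetry's caret constraint `^3.9` is equivalent to `>=3.9,<4.0`, so it admits Python 3.11 and later, which the narrower range excludes. The following is a minimal sketch that uses plain tuple comparison; the helper names are illustrative only, and it does not handle pre-release or other non-numeric version strings.

```python
def parse_version(text):
    """Convert a version string such as "3.10" into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, lower, upper):
    """Return True if lower <= version < upper."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Poetry's caret constraint ^3.9 is equivalent to >=3.9,<4.0
DEFAULT_RANGE = ("3.9", "4.0")
# Constraint given above for numba 0.55: >=3.9,<3.11
NUMBA_RANGE = ("3.9", "3.11")

# Show, for each interpreter version, whether each constraint admits it
for version in ["3.8", "3.9", "3.10", "3.11", "3.12"]:
    in_default = satisfies(version, *DEFAULT_RANGE)
    in_numba = satisfies(version, *NUMBA_RANGE)
    print(f"Python {version}: default={in_default}, numba 0.55={in_numba}")
```

Comparing integer tuples rather than strings matters here, because a string comparison would order "3.10" before "3.9".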
The contents of this cookiecutter are covered under the Apache License 2.0 (included in the `LICENSE` file). The copyright for this cookiecutter is contained in the `NOTICE` file.
    ├── README.md                          <- this file
    ├── RELEASE-NOTES.md                   <- cookiecutter release notes
    ├── LICENSE                            <- cookiecutter license
    ├── NOTICE                             <- cookiecutter copyright notice
    ├── cookiecutter.json                  <- cookiecutter configuration file
    ├── pyproject.toml                     <- Python project metadata file for
    │                                         cookiecutter development
    ├── poetry.lock                        <- Poetry lockfile for cookiecutter
    │                                         development
    ├── docs/                              <- cookiecutter documentation
    ├── extras/                            <- additional files that may be useful for
    │                                         cookiecutter development
    ├── hooks/                             <- cookiecutter scripts that run before
    │                                         and/or after project generation
    ├── spikes/                            <- experimental code
    └── {{cookiecutter.__project_name}}/   <- cookiecutter template
See the `[tool.poetry.dependencies]` section of `pyproject.toml`.
- Set up a dedicated virtual environment for cookiecutter development. See Step 3 from Section 1.2 for instructions on how to set up `direnv` and `poetry` environments.

- Install the Python packages required for development.

      $ poetry install

- Install the Git pre-commit hooks.

      $ pre-commit install

- Make the cookiecutter better!
To update the Python dependencies for the template (contained in the `{{cookiecutter.__project_name}}` directory), use the following procedure to ensure that Python package dependencies for developing the non-template components of the cookiecutter (e.g., `hooks/pre_gen_project.py`) do not interfere with Python package dependencies for the template.
- Create a local clone of the cookiecutter Git repository to use for cookiecutter development.

      $ git clone git@github.com:velexi-research/VLXI-Cookiecutter-Research.git

- Use `cookiecutter` from the local cookiecutter Git repository to create an instance of the template to use for updating Python package dependencies.

      $ cookiecutter PATH/TO/LOCAL/REPO

- In the instance of the template, perform the following steps to update the template's Python package dependencies.

  - Set up a virtual environment for developing the template (e.g., a direnv environment).

  - Use `poetry` or manually edit `pyproject.toml` to (1) make changes to the Python package dependency list and (2) update the versions of Python package dependencies.

  - Use `poetry` to update the Python package dependencies and versions recorded in the `poetry.lock` file.
- Update `{{cookiecutter.__project_name}}/pyproject.toml`.

  - Copy `pyproject.toml` from the instance of the template to `{{cookiecutter.__project_name}}/pyproject.toml`.

  - Restore the templated values in the `[tool.poetry]` section to the following:

        [tool.poetry]
        name = "{{ cookiecutter.__project_name }}"
        version = "0.0.0"
        description = ""
        license = "{% if cookiecutter.license == 'Apache License 2.0' %}Apache-2.0{% elif cookiecutter.license == 'BSD-3-Clause License' %}BSD-3-Clause{% elif cookiecutter.license == 'MIT License' %}MIT{% endif %}"
        readme = "README.md"
        authors = ["{{ cookiecutter.author }} <{{ cookiecutter.email }}>"]
- Update `{{cookiecutter.__project_name}}/poetry.lock`.

  - Copy `poetry.lock` from the instance of the template to `{{cookiecutter.__project_name}}/poetry.lock`.
- Commit the updated `pyproject.toml` and `poetry.lock` files to the Git repository.
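When restoring the templated `license` value in the `[tool.poetry]` section, it may help to spell out what the Jinja expression produces for each license choice. The sketch below restates that `if`/`elif` chain as a plain Python lookup; the function name is illustrative and is not part of the cookiecutter. Because the chain has no `else` branch, any unmatched choice renders as an empty string.

```python
# License choices and the identifiers that the templated expression renders
LICENSE_IDENTIFIERS = {
    "Apache License 2.0": "Apache-2.0",
    "BSD-3-Clause License": "BSD-3-Clause",
    "MIT License": "MIT",
}


def render_license(license_choice):
    """Return the string the templated `license` field would contain."""
    # An if/elif chain without an else branch renders nothing when no
    # condition matches, which corresponds to an empty string here.
    return LICENSE_IDENTIFIERS.get(license_choice, "")


for choice in ["Apache License 2.0", "BSD-3-Clause License", "MIT License"]:
    print(f"{choice!r} -> {render_license(choice)!r}")
```

A quick comparison of this kind can catch a mismatch between the license choices offered by the cookiecutter and the strings tested in the template expression after `pyproject.toml` has been copied over.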