.. contents::
   :local:
   :depth: 2

Each subsection here contains a brief description of the suggested usage of these folders.
Any data that needs to be stored locally should be saved in this location. This folder and all its sub-folders are not version-controlled.
The sub-folders should be used as follows:

``external``
   Any data that will not be processed at all, such as reference data;
``raw``
   Any raw data before any processing;
``interim``
   Any raw data that has been partially processed and, for whatever reason, needs to be stored before further
   processing is completed; and
``processed``
   Any raw or interim data that has been fully processed into its final state.
The paths to these folders are loaded as environment variables by the ``.envrc`` file; to load them in Python, use
any or all of the following code:

.. code-block:: python

   import os

   DIR_DATA = os.environ.get("DIR_DATA")
   DIR_DATA_EXTERNAL = os.environ.get("DIR_DATA_EXTERNAL")
   DIR_DATA_RAW = os.environ.get("DIR_DATA_RAW")
   DIR_DATA_INTERIM = os.environ.get("DIR_DATA_INTERIM")
   DIR_DATA_PROCESSED = os.environ.get("DIR_DATA_PROCESSED")
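
Note that ``os.environ.get`` returns ``None`` when a variable is unset, for example if ``direnv`` has not been allowed
in this repository. A minimal sketch of a stricter alternative, assuming the same variable names, converts the value
into a ``pathlib.Path`` and fails early; the helper name ``get_data_dir`` is hypothetical.

.. code-block:: python

   import os
   from pathlib import Path


   def get_data_dir(variable_name: str) -> Path:
       """Return the folder stored in the environment variable ``variable_name``.

       Hypothetical helper: unlike ``os.environ.get``, it raises a KeyError
       instead of returning None when the variable has not been loaded.
       """
       value = os.environ.get(variable_name)
       if value is None:
           raise KeyError(f"{variable_name} is not set; has .envrc been loaded?")
       return Path(value)

For example, ``get_data_dir("DIR_DATA_RAW") / "example.csv"`` builds a path to a hypothetical ``example.csv`` file in
the raw data folder.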
All documentation for the project should be included in this folder as either Markdown or reStructuredText files,
formatted so that Sphinx can build them.
To build the documentation, run the ``docs`` command from the ``Makefile`` using the ``make`` utility at the top level
of this Git repository.

.. code-block:: shell

   make docs
For further information about writing Sphinx documentation, see the ``README.md`` file in the ``docs`` folder.
All analytical quality assurance (AQA) documents can be found in the ``docs/aqa`` folder. These files document how
this project meets HM Government guidance on producing quality analysis, as described in the Aqua Book.
All Jupyter notebooks should be stored in this folder. The ``.envrc`` file should automatically add the entire project
path to the ``PYTHONPATH`` environment variable; this should allow you to import ``src`` directly in your notebook.
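
If a notebook kernel is started without ``direnv`` having loaded ``.envrc`` (for example, from a Jupyter server
launched elsewhere), ``PYTHONPATH`` will not contain the project path. A minimal fallback sketch, using a hypothetical
helper named ``add_project_root``, adds a given folder to ``sys.path`` only if it is not already present.

.. code-block:: python

   import sys
   from pathlib import Path


   def add_project_root(project_root: Path) -> None:
       """Prepend ``project_root`` to ``sys.path`` if it is not already there.

       Hypothetical fallback for when .envrc has not set PYTHONPATH.
       """
       root = str(project_root.resolve())
       if root not in sys.path:
           sys.path.insert(0, root)

Assuming the notebook's working directory is this folder, one level below the project root, calling
``add_project_root(Path.cwd().parent)`` before ``import src`` has a similar effect to the ``PYTHONPATH`` entry set by
``.envrc``.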
All outputs from the project should be stored here. The path to this folder is loaded as an environment variable by
the ``.envrc`` file; to load it in Python, use the following code:

.. code-block:: python

   import os

   DIR_OUTPUT = os.environ.get("DIR_OUTPUT")
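
As a minimal sketch, assuming ``DIR_OUTPUT`` has been loaded, an output file can be written into this folder with
``pathlib``; the helper name ``save_output`` is hypothetical.

.. code-block:: python

   import os
   from pathlib import Path


   def save_output(file_name: str, contents: str) -> Path:
       """Write ``contents`` to ``file_name`` inside the DIR_OUTPUT folder.

       Hypothetical helper; raises KeyError if DIR_OUTPUT has not been loaded.
       """
       output_dir = Path(os.environ["DIR_OUTPUT"])
       output_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
       output_path = output_dir / file_name
       output_path.write_text(contents, encoding="utf-8")
       return output_path

For example, ``save_output("summary.txt", "done")`` would write a hypothetical ``summary.txt`` file and return its
full path.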
All functions for this project should be stored in this folder. All tests should be stored in the ``tests`` folder,
which sits in the main project directory, one level above this folder.
The sub-folders should be used as follows:

``make_data``
   Data processing-related functions;
``make_features``
   Feature-related functions, for example, functions to create features from processed data;
``make_models``
   Model-related functions;
``make_visualisations``
   Functions to produce visualisations; and
``utils``
   Utility functions that are helpful in the project.
Feel free to create, rename, or delete these folders as required; not every project will need all of them.
It is strongly suggested that you import functions in the ``src/__init__.py`` script. You should also try to use
absolute imports in this script whenever possible; relative imports are not discouraged, but can cause problems in
projects whose directory structure is likely to change. See PEP 328 for further information.
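
To illustrate an absolute import in ``src/__init__.py``, the minimal sketch below writes a hypothetical ``src``
package (with a hypothetical ``make_data/cleaning.py`` module defining ``drop_empty``) to a temporary folder and then
imports it. In a real project the files would already exist, and ``PYTHONPATH`` would be set by ``.envrc`` rather
than by editing ``sys.path``.

.. code-block:: python

   import importlib
   import sys
   import tempfile
   from pathlib import Path

   # Hypothetical package layout, written to a temporary folder for illustration.
   FILES = {
       "src/make_data/__init__.py": "",
       "src/make_data/cleaning.py": (
           "def drop_empty(values):\n"
           "    # Keep only the truthy items of values.\n"
           "    return [value for value in values if value]\n"
       ),
       # Absolute import, resolved from the project root on sys.path.
       "src/__init__.py": (
           "from src.make_data.cleaning import drop_empty  # noqa: F401\n"
       ),
   }

   project_root = Path(tempfile.mkdtemp())
   for relative_path, contents in FILES.items():
       file_path = project_root / relative_path
       file_path.parent.mkdir(parents=True, exist_ok=True)
       file_path.write_text(contents)

   # Stand-in for PYTHONPATH containing the project root (normally set by .envrc).
   sys.path.insert(0, str(project_root))
   importlib.invalidate_caches()
   src = importlib.import_module("src")

   print(src.drop_empty(["a", "", "b", None]))  # prints ['a', 'b']

The equivalent relative import inside ``src/__init__.py`` would be ``from .make_data.cleaning import drop_empty``.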
All tests for the functions defined in the ``src`` package should be stored here.
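
A minimal sketch of a test module, assuming ``pytest`` is used as the test runner (this section does not name one).
The function under test is defined inline so that the example is self-contained; in practice it would be imported
from ``src``, for example with the hypothetical ``from src.make_data.cleaning import drop_empty``.

.. code-block:: python

   # Hypothetical file: tests/test_cleaning.py


   def drop_empty(values):
       """Stand-in for a function that would normally live in ``src``."""
       return [value for value in values if value]


   def test_drop_empty_removes_falsy_items():
       assert drop_empty(["a", "", "b", None]) == ["a", "b"]


   def test_drop_empty_keeps_empty_input_empty():
       assert drop_empty([]) == []

``pytest`` collects functions whose names start with ``test`` from files named ``test_*.py``, so both functions above
would run as separate tests.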
Each subsection here contains a brief description of the files at the top level of this Git repository.
A file containing environment variables for the Git repository that can be selectively loaded. This uses the
``direnv`` shell extension; see its documentation for further information.
A configuration file for the ``flake8`` Python package, which provides linting. This file is based on the common
configuration described on The GDS Way.
A ``.gitignore`` file that tells Git to ignore certain files and folders in this repository. See the GitHub Help
pages for further information.
A pre-commit hook configuration file. See the ``pre-commit`` package documentation for further details.
A file to store all secrets and credentials as environment variables. It is read in by ``.envrc`` using the
``direnv`` shell extension, but is not tracked by Git.
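
Once ``direnv`` has loaded the secrets, code can read them like any other environment variable. A minimal sketch,
assuming a hypothetical variable named ``API_KEY`` and a hypothetical helper named ``get_secret``, fails with a clear
message instead of passing ``None`` on; the error names the variable but never echoes a value.

.. code-block:: python

   import os


   def get_secret(variable_name: str) -> str:
       """Return the secret stored in ``variable_name``.

       Hypothetical helper: raises if the variable is missing or empty, and
       never includes the secret's value in the error message.
       """
       value = os.environ.get(variable_name, "")
       if not value:
           raise RuntimeError(
               f"{variable_name} is not set; check the secrets file and .envrc"
           )
       return value

For example, ``get_secret("API_KEY")`` returns the value of the hypothetical ``API_KEY`` variable if it has been
loaded.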
An optional baseline file for the ``detect-secrets`` package; this package detects secrets and, in conjunction with
``pre-commit``, prevents them from being committed to the repository. The baseline file flags secret-like data that
the user deliberately wishes to commit to the repository.
The ``Makefile`` contains a set of commands for the ``make`` utility. For further information, run the ``help``
command at the top level of the Git repository.

.. code-block:: shell

   make help
An overview of the Git repository, including all necessary instructions to execute the code.
A list of Python package requirements for this Git repository, which can be installed using the ``pip install``
command.

.. code-block:: shell

   pip install --requirement requirements.txt