beeblebrox-project-structure

Default project structure for data science projects

├── LICENSE
├── README.md          <- The top-level README for developers using this project.
│
├── data
│   │
│   ├── processed      <- The final, canonical data sets for modelling.
│   │
│   └── raw            <- The original, immutable data dump.
│
├── notebooks          <- Jupyter notebooks.
│   │
│   ├── poc            <- Proof of concept notebooks.
│   │
│   ├── eda            <- Exploratory data analysis.
│   │
│   ├── modelling      <- Used for modelling exploration and evaluation.
│   │
│   └── evaluation     <- Evaluation of model results.
│
├── models             <- Storage of model weights, nested by model architecture.
│   │
│   └── ARCH_X         <- Directory for architecture X.
│       ├── ARCH_X_001.h5
│       ├── ARCH_X_001.json
│       │
│       └── ARCH_X_002
│
├── src                <- Source code to use in this project.
│   │
│   ├── __init__.py    <- Makes src a Python package.
│   │
│   ├── utils.py       <- General utility functions.
│   │
│   ├── main.py        <- Main code for running an API/application.
│   │
│   ├── app            <- Code used in deployment.
│   │
│   ├── preparation    <- Data preparation. Retrieves data and puts it in data/raw/, then transforms it to data/processed/ using src/processing/.
│   │
│   ├── processing     <- Data transformation and pre-processing.
│   │
│   ├── training       <- Code for training and evaluating models.
│   │
│   └── modelling      <- Model classes.
│
├── test               <- Code used for testing src/.
│
├── requirements.txt   <- The requirements file for reproducing the environment.
│
└── pyproject.toml     <- Makes project pip installable so src can be imported.

How to work - phases

POC phase

In this phase, use a Kaggle dataset or a small collected dataset. The focus should be on showing potential, not on building a complete, sophisticated system.

Post-POC phase

Now it's time to explore the real data. Start by working in notebooks/eda to really understand the data you're working with.

When the code you write in the notebooks becomes robust, turn it into functions in src/. If the notebook is still useful for visualization, experimentation or demonstration purposes, import these functions from src/; otherwise, remove the notebook.

Under src/, make sure that all files follow Python coding best practices, such as type hints, docstrings and naming conventions. All executable scripts should take their parameters through argparse.
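As an illustration of these practices, here is a minimal sketch of an executable script with type hints, docstrings and argparse. The script purpose, the option names (--raw-dir, --processed-dir, --limit) and the function names are hypothetical and not part of this template.

```python
"""Hypothetical script sketch: prepare data from data/raw/ into data/processed/."""
import argparse
from pathlib import Path
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse command-line arguments.

    Args:
        argv: Arguments to parse. When None, argparse reads sys.argv[1:].

    Returns:
        The parsed arguments.
    """
    parser = argparse.ArgumentParser(description="Prepare raw data for modelling.")
    # All option names below are assumptions made for this example.
    parser.add_argument("--raw-dir", type=Path, default=Path("data/raw"),
                        help="Directory holding the raw data.")
    parser.add_argument("--processed-dir", type=Path, default=Path("data/processed"),
                        help="Directory to write processed data to.")
    parser.add_argument("--limit", type=int, default=None,
                        help="Optional maximum number of files to process.")
    return parser.parse_args(argv)


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Run the script and return an exit code."""
    args = parse_args(argv)
    print(f"Reading from {args.raw_dir}, writing to {args.processed_dir}, limit={args.limit}")
    return 0


# In a real script this call would sit under `if __name__ == "__main__":` and
# use main() without arguments. An explicit list is passed here so the sketch
# runs the same way in a notebook or a test.
main(["--limit", "10"])
```

Accepting an optional argv in parse_args and main keeps the script easy to call from tests without touching sys.argv.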

Deployment

A deployed Python API/application should be started from src/main.py. It should ignore everything in this repo except:

src/app/
src/processing/
src/modelling/

and relevant files in

root/
models/
src/

Naming conventions

Notebooks and executable scripts

To make it easy to run scripts and notebooks in the right order, we prefix each file name with a number, followed by a short description.

e.g.

01_data_analysis.ipynb
02_model_training.ipynb
03_export_results.py
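One benefit of the numeric prefix is that the intended run order can be recovered programmatically. The helper below is a small sketch of that idea; the function name sort_by_prefix and the regular expression are assumptions for illustration, not part of this template.

```python
import re
from typing import List

# One or more digits, an underscore, then the description.
PREFIX_PATTERN = re.compile(r"^(\d+)_(.+)$")


def sort_by_prefix(names: List[str]) -> List[str]:
    """Return the names that carry a numeric prefix, ordered by that prefix.

    Names without a numeric prefix (for example README.md) are skipped.
    """
    numbered = []
    for name in names:
        match = PREFIX_PATTERN.match(name)
        if match:
            # Compare prefixes as integers so that 2 sorts before 10.
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


files = [
    "03_export_results.py",
    "01_data_analysis.ipynb",
    "README.md",
    "02_model_training.ipynb",
]
print(sort_by_prefix(files))
# ['01_data_analysis.ipynb', '02_model_training.ipynb', '03_export_results.py']
```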

Model weight directories

Nested by ARCHITECTURE/ARCHITECTURE_NBR, where NBR is a zero-padded running number.

e.g.

models/efficientnet_b0/efficientnet_b0_001
models/resnet50/resnet50_003
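Following the pattern in the examples above, the next free directory name for an architecture can be derived from what already exists under models/. This is a sketch under the assumption that the number is a three-digit, zero-padded counter; the function name next_model_dir is hypothetical.

```python
import re
import tempfile
from pathlib import Path


def next_model_dir(models_root: Path, architecture: str) -> Path:
    """Return the path of the next numbered entry for an architecture.

    Existing entries may be directories (resnet50_003) or files
    (resnet50_003.h5); both are matched on their stem. Nothing is created.
    """
    arch_dir = models_root / architecture
    pattern = re.compile(rf"^{re.escape(architecture)}_(\d+)$")
    numbers = []
    if arch_dir.is_dir():
        for entry in arch_dir.iterdir():
            match = pattern.match(entry.stem)
            if match:
                numbers.append(int(match.group(1)))
    next_number = max(numbers, default=0) + 1
    # Assumption: three-digit zero padding, as in the examples.
    return arch_dir / f"{architecture}_{next_number:03d}"


# Demonstration in a temporary directory so no real models/ folder is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "resnet50" / "resnet50_003").mkdir(parents=True)
    print(next_model_dir(root, "resnet50").name)         # resnet50_004
    print(next_model_dir(root, "efficientnet_b0").name)  # efficientnet_b0_001
```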

Requirements and installs

Always try to pin your requirements to exact versions using pip freeze > requirements.txt.
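A quick way to verify that a requirements file is pinned is to look for lines without an exact == version. The check below is a deliberately simple sketch: the function name unpinned_requirements is hypothetical, it treats anything without == as unpinned, and it skips pip option lines that start with a dash. The package names and versions in the example are illustrative only.

```python
from typing import List


def unpinned_requirements(requirements_text: str) -> List[str]:
    """Return the requirement lines that do not pin an exact version."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt" or "-e .".
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


example = """numpy==1.26.4
pandas>=2.0
# a comment
requests
"""
print(unpinned_requirements(example))
# ['pandas>=2.0', 'requests']
```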

Install the repo

in development: pip install -e .
in production: pip install .

Optional files and their locations

Dockerfile: root
.dockerignore: root
.env: root
requirements_dev.txt: root

NordAxon/beeblebrox-project-structure
