
Update docs #57

Merged: 3 commits, Mar 1, 2022
89 changes: 19 additions & 70 deletions README.md
@@ -31,22 +31,35 @@
</p>


-Generic cookiecutter template to bootstrap your [PyTorch](https://pytorch.org/get-started/locally/) project,
+Generic template to bootstrap your [PyTorch](https://pytorch.org/get-started/locally/) project,
read more in the [documentation](https://grok-ai.github.io/nn-template).

## Get started

-Generate your project with cookiecutter:
+If you already know [cookiecutter](https://github.com/cookiecutter/cookiecutter), just generate your project with:

```bash
cookiecutter https://github.com/grok-ai/nn-template
```

-> This is a *parametrized* template that uses [cookiecutter](https://github.com/cookiecutter/cookiecutter).
-> Install cookiecutter with:
->
-> ```pip install cookiecutter```
+<details>
+<summary>Otherwise</summary>
+Cookiecutter manages the setup stages and delivers a personalized, ready-to-run project.
+
+Install it with:
+<pre><code>pip install cookiecutter
+</code></pre>
+</details>

More details in the [documentation](https://grok-ai.github.io/nn-template/getting-started/generation/).

## Strengths
- **Actually works for [research](https://grok-ai.github.io/nn-template/papers/)**!
- Guided setup to customize project bootstrapping;
- Fast prototyping of new ideas, no need to build a new codebase from scratch;
- Less boilerplate with no impact on the learning curve (as long as you know the integrated tools);
- Automation via GitHub Actions: testing, stylish documentation deployment, PyPI upload;
- Enforces Python [best practices](https://grok-ai.github.io/nn-template/features/bestpractices/).

## Integrations

@@ -61,67 +61,3 @@
Avoid writing boilerplate code to integrate:
- [GitHub Actions](https://github.com/features/actions), to run the tests and to publish the documentation and packages to PyPI automatically.
- Python best practices for developing and publishing research projects.

## Structure

The generated projects will contain the following files:

```bash
.
├── conf
│   ├── default.yaml
│   ├── hydra
│   │   └── default.yaml
│   ├── nn
│   │   └── default.yaml
│   └── train
│       └── default.yaml
├── data
│   └── .gitignore
├── docs
│   ├── index.md
│   └── overrides
│       └── main.html
├── .editorconfig
├── .env
├── .env.template
├── env.yaml
├── .flake8
├── .github
│   └── workflows
│       ├── publish.yml
│       └── test_suite.yml
├── .gitignore
├── LICENSE
├── mkdocs.yml
├── .pre-commit-config.yaml
├── pyproject.toml
├── README.md
├── setup.cfg
├── setup.py
├── src
│   └── awesome_project
│       ├── data
│       │   ├── datamodule.py
│       │   ├── dataset.py
│       │   └── __init__.py
│       ├── __init__.py
│       ├── modules
│       │   ├── __init__.py
│       │   └── module.py
│       ├── pl_modules
│       │   ├── __init__.py
│       │   └── pl_module.py
│       ├── run.py
│       └── ui
│           ├── __init__.py
│           └── run.py
└── tests
    ├── conftest.py
    ├── __init__.py
    ├── test_checkpoint.py
    ├── test_configuration.py
    ├── test_nn_core_integration.py
    ├── test_resume.py
    ├── test_storage.py
    └── test_training.py
```
2 changes: 2 additions & 0 deletions docs/changelog.md → docs/changelog/index.md
@@ -1 +1,3 @@
# Changelog

See the changelog in the [releases](https://github.com/grok-ai/nn-template/releases) page.
File renamed without changes.
2 changes: 1 addition & 1 deletion docs/features/bestpractices.md
@@ -1,6 +1,6 @@
# Tooling

-The template configures are the tooling necessary for a modern python project.
+The template configures all the tooling necessary for a modern Python project.

These include:

6 changes: 5 additions & 1 deletion docs/features/cicd.md
@@ -1,6 +1,6 @@
# CI/CD

-The generated project contains two GiHub Actions workflow to run the Test Suite and to publish you project.
+The generated project contains two [GitHub Actions](https://github.com/features/actions) workflows to run the Test Suite and to publish your project.

!!! note
You need to enable the GitHub Actions from the settings in your repository.
@@ -35,6 +35,10 @@
```bash
mike deploy 0.1 latest --push
mike set-default latest
```

!!! warning

You do not need to execute these commands if you accepted the optional cookiecutter setup step.

!!! info

Remember to enable the GitHub Pages from the repository settings.
6 changes: 3 additions & 3 deletions docs/features/determinism.md
@@ -1,11 +1,11 @@
# Determinism

-The template always logs the seed utilized in order to guarantee reproducibility.
+The template always logs the seed utilized in order to guarantee **reproducibility**.

The user specifies a `seed_index` value in the configuration `train/default.yaml`:

```yaml
-seed_index: 0
+seed_index: 1
deterministic: False
```

@@ -25,7 +25,7 @@
Setting seed 1273642419 from seeds[1]
in the logger dashboard:

```bash
-python src/project/run.py -m train.seed_index=0,1,2,3,4
+python src/project/run.py -m train.seed_index=1,2,3,4
```
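The `seed_index` indirection can be sketched as follows. This is an illustrative sketch, not the template's actual API: the seed list values (apart from `seeds[1]`, which appears in the log line above) and the function name are assumptions.

```python
import random

# Hypothetical fixed seed list shared across runs; the template logs e.g.
# "Setting seed 1273642419 from seeds[1]".
SEEDS = [3538412057, 1273642419, 2360067478]

def seed_everything_from_index(seed_index: int) -> int:
    """Map a small, human-friendly seed_index onto a concrete RNG seed."""
    seed = SEEDS[seed_index]
    random.seed(seed)  # in practice this would also seed numpy/torch
    print(f"Setting seed {seed} from seeds[{seed_index}]")
    return seed
```

Logging the resolved seed (rather than only the index) is what makes a run reproducible even if the seed list changes later.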

!!! info
18 changes: 18 additions & 0 deletions docs/features/metadata.md
@@ -0,0 +1,18 @@
# MetaData

The *bridge* between the Lightning DataModule and the Lightning Module.

It is responsible for collecting data information to be fed to the module.
The Lightning Module will receive an instance of MetaData when instantiated,
both in the train loop and when restored from a checkpoint.

!!! warning

    MetaData exposes `save` and `load`: two user-defined methods that specify how to serialize and de-serialize the information contained in its attributes.
    This is needed for checkpoint restoring to work properly and **must always
    be implemented** wherever the metadata is needed.

This decoupling allows the architecture to be parametric (e.g. in the number of classes) and
DataModule/Trainer independent (useful in prediction scenarios).
Examples are the class names in a classification task or the vocabulary in NLP tasks.
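A minimal sketch of such a MetaData object follows; the template's real class differs, and the `class_vocab` attribute and JSON file name here are illustrative assumptions. What matters is the `save`/`load` contract.

```python
import json
from pathlib import Path

class MetaData:
    """Bridge between the DataModule and the Module: carries everything the
    model needs to be instantiated, decoupled from the Trainer."""

    def __init__(self, class_vocab: dict):
        # Hypothetical attribute, e.g. {"cat": 0, "dog": 1} in classification.
        self.class_vocab = class_vocab

    def save(self, dst_dir: Path) -> None:
        # Serialize every attribute needed at restore time.
        (dst_dir / "class_vocab.json").write_text(json.dumps(self.class_vocab))

    @staticmethod
    def load(src_dir: Path) -> "MetaData":
        # De-serialize; called when restoring from a checkpoint.
        data = json.loads((src_dir / "class_vocab.json").read_text())
        return MetaData(class_vocab=data)
```

A save/load round trip must reconstruct an equivalent object, otherwise a restored module would see different metadata than the one it was trained with.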

6 changes: 3 additions & 3 deletions docs/features/nncore.md
@@ -1,6 +1,6 @@
# NN Template core

-Most of the logic is abstracted from the template into an accompanying library: [`nn-tempalte-core`](https://pypi.org/project/nn-template-core/).
+Most of the logic is abstracted from the template into an accompanying library: [`nn-template-core`](https://pypi.org/project/nn-template-core/).

This library contains the logic necessary for the restore, logging, and many other functionalities implemented in the template.

@@ -11,11 +11,11 @@
- `template`: easy to use and customize, hard to update
- `library`: hard to customize, easy to update

-With our approach updating most of the functions is extremely easy, it is just a python
+With our approach, updating most of the functions is extremely easy: it is just a Python
dependency, while maintaining the flexibility of a template.


!!! warning

-    It is important to not remove the `NNTemplateCore` callback from the instantiated callbacks
+    It is important to **not** remove the `NNTemplateCore` callback from the instantiated callbacks
in the template. It is used to inject personalized behaviour in the training loop.
68 changes: 56 additions & 12 deletions docs/features/restore.md
@@ -6,13 +6,13 @@
The relevant configuration block is in `conf/train/default.yml`:
```yaml
restore:
  ckpt_or_run_path: null
-  mode: null # null, continue, hotstart
+  mode: null # null, finetune, hotstart, continue
```

## ckpt_or_run_path

-The `ckpt_or_run_path` can be a path towards a Lightning Checkpoint or the run identifies.
-In case of W&B it is called `run_path` and are in the form of `entity/project/run_id`.
+The `ckpt_or_run_path` can be a path towards a Lightning Checkpoint or the run identifier w.r.t. the logger.
+In case of W&B as a logger, it is called `run_path` and has the form `entity/project/run_id`.

!!! warning

@@ -22,30 +22,74 @@

## mode

-We support three different modes for restoring an experiment:
+We support four different modes for restoring an experiment:

-=== "continue"
+=== "null"

    ```yaml
    restore:
-      mode: continue
+      mode: null
    ```
-    In this `mode` the training continues from the checkpoint **and** the logging continues
-    in the previous run. No new run is created on the logger dashboard.
+    In this `mode` no restore happens, and `ckpt_or_run_path` is ignored.
+
+    !!! example "Use Case"
+
+        This is the default option and allows the user to train the model from
+        scratch, logging into a new run.
+
+=== "finetune"
+
+    ```yaml
+    restore:
+      mode: finetune
+    ```
+    In this `mode` only the model weights are restored; both the `Trainer` state and the logger run
+    are *not restored*.
+
+    !!! example "Use Case"
+
+        As the name suggests, one of the most common use cases is fine-tuning
+        a trained model, logging into a new run with a novel training regimen.

=== "hotstart"

    ```yaml
    restore:
      mode: hotstart
    ```
-    In this `mode` the training continues from the checkpoint **but** the logging does not.
+    In this `mode` the training continues from the checkpoint, restoring the `Trainer` state, **but** the logging does not.
    A new run is created on the logger dashboard.
+
+    !!! example "Use Case"
+
+        Perform different tests in separate logging runs branching from the same trained
+        model.

-=== "null"
+=== "continue"

    ```yaml
    restore:
-      mode: null
+      mode: continue
    ```
-    In this `mode` no restore happens, and `ckpt_or_run_path` is ignored.
+    In this `mode` the training continues from the checkpoint **and** the logging continues
+    in the previous run. No new run is created on the logger dashboard.
+
+    !!! example "Use Case"
+
+        The training execution was interrupted and the user wants to continue it.

!!! tldr "Restore summary"

    |                   | null | finetune           | hotstart           | continue           |
    |-------------------|------|--------------------|--------------------|--------------------|
    | **Model weights** | :x:  | :white_check_mark: | :white_check_mark: | :white_check_mark: |
    | **Trainer state** | :x:  | :x:                | :white_check_mark: | :white_check_mark: |
    | **Logging run**   | :x:  | :x:                | :x:                | :white_check_mark: |
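The restore summary can be read as a small decision function. This sketch is illustrative, not the template's actual implementation; the function name and the boolean-triple representation are assumptions.

```python
def restore_flags(mode):
    """Map a restore mode onto (model_weights, trainer_state, logging_run):
    which pieces of state are restored from the checkpoint/run."""
    assert mode in (None, "finetune", "hotstart", "continue")
    model_weights = mode is not None            # every non-null mode restores weights
    trainer_state = mode in ("hotstart", "continue")
    logging_run = mode == "continue"            # only continue reuses the logger run
    return model_weights, trainer_state, logging_run
```

Each mode strictly extends the previous one: finetune restores weights only, hotstart adds the `Trainer` state, and continue also resumes the logging run.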
3 changes: 2 additions & 1 deletion docs/features/storage.md
Expand Up @@ -10,7 +10,7 @@ storage
└── <project_name>
└── <run_id>
├── checkpoints
│ └── <checkpoint_name>.ckpt
│ └── <checkpoint_name>.ckpt.zip
└── config.yaml
```

@@ -21,4 +21,5 @@
stored inside the `storage_dir` should be uploaded to the cloud:

```yaml
logging:
  upload:
    run_files: true
+    source: true
```