htahir1 committed Nov 23, 2023
1 parent 19b1f09 commit f07c9ca
Showing 2 changed files with 248 additions and 0 deletions.
@@ -0,0 +1,102 @@
---
description: Recommended repository structure and best practices.
---

# Follow best practices

Until now, you have probably kept all your code in a single file. In production, we recommend splitting your steps and pipelines into separate files.

```markdown
.
├── .dockerignore
├── Dockerfile
├── steps
│   ├── loader_step
│   │   ├── .dockerignore (optional)
│   │   ├── Dockerfile (optional)
│   │   ├── loader_step.py
│   │   └── requirements.txt (optional)
│   └── training_step
│       └── ...
├── pipelines
│   ├── training_pipeline
│   │   ├── .dockerignore (optional)
│   │   ├── config.yaml (optional)
│   │   ├── Dockerfile (optional)
│   │   ├── training_pipeline.py
│   │   └── requirements.txt (optional)
│   └── deployment_pipeline
│       └── ...
├── notebooks
│   └── *.ipynb
├── requirements.txt
├── .zen
└── run.py
```

Check out how to initialize your project from a template following best practices in the [Project templates](./using-project-templates.md#generating-project-from-a-project-template) section.

#### Steps

Keep your steps in separate Python files. This allows you to optionally keep their utils, dependencies, and Dockerfiles separate.


#### Logging

ZenML records the output of the root Python logging handler into the artifact store as a side effect of running a step. Therefore, when writing steps, use the `logging` module to record logs so that they show up in the ZenML dashboard.

```python
from zenml import step

# Use the ZenML logger
from zenml.logger import get_logger

logger = get_logger(__name__)
...

@step
def training_data_loader() -> None:
    # This will show up in the dashboard
    logger.info("My logs")
```
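
To see why going through the `logging` module matters, the following minimal sketch uses only the standard library. It attaches a handler to the root logger and shows that a record emitted by a module-level logger reaches it. The logger name `steps.loader_step` and the helper `capture_root_logs` are hypothetical, and the in-memory buffer is only a stand-in for whatever ZenML actually attaches to the root logger.

```python
import io
import logging


def capture_root_logs() -> str:
    """Emit one record from a named logger and return what the root handler saw."""
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)  # stand-in for a capturing handler
    root = logging.getLogger()
    previous_level = root.level
    root.addHandler(handler)
    root.setLevel(logging.INFO)
    try:
        # Hypothetical module-level logger, as a step file would create it.
        logging.getLogger("steps.loader_step").info("Loaded 100 rows")
    finally:
        # Restore the root logger so the sketch has no lasting side effects.
        root.removeHandler(handler)
        root.setLevel(previous_level)
    return buffer.getvalue()


if __name__ == "__main__":
    print(capture_root_logs(), end="")
```

Output written with `print` never passes through a logging handler, so a handler-based capture like this one would not see it.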

#### Pipelines

Just like steps, keep your pipelines in separate Python files. This allows you to optionally keep their utils, dependencies, and Dockerfiles separate.

It is recommended that you separate the pipeline execution from the pipeline definition so that importing the pipeline does not immediately run it. See [run.py](follow-best-practices.md) for more details.
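
As a rough sketch of this separation, the snippet below keeps the definition free of side effects and puts the call behind a `__main__` guard. To stay runnable without ZenML installed, `training_pipeline` is a plain function standing in for one decorated with ZenML's `@pipeline`; the `epochs` parameter and the return value are made up for illustration.

```python
# --- pipelines/training_pipeline/training_pipeline.py (definition only) ---
def training_pipeline(epochs: int = 1) -> str:
    """Stand-in for a function decorated with ZenML's `@pipeline`."""
    return f"trained for {epochs} epoch(s)"


# --- run.py (execution only) ---
def main() -> str:
    # In a real project, run.py would first import the definition, e.g.:
    # from pipelines.training_pipeline.training_pipeline import training_pipeline
    return training_pipeline(epochs=3)


if __name__ == "__main__":
    # Runs only when run.py is executed directly, never on import.
    print(main())
```

With this layout, importing the pipeline module from a notebook or a test does not trigger a run.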

{% hint style="warning" %}
Do not give pipelines or pipeline instances the name "pipeline". Doing so overwrites the imported `pipeline` decorator and leads to failures at later stages if more pipelines are decorated there.
{% endhint %}
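
The failure mode in the warning above is ordinary Python name shadowing. The sketch below reproduces it with a stand-in decorator so that it runs without ZenML; the local `pipeline` function and both pipeline names are assumptions for illustration, not ZenML's implementation.

```python
def pipeline(func):
    """Stand-in for the imported `pipeline` decorator (assumption for this sketch)."""
    func.is_pipeline = True
    return func


@pipeline
def training_pipeline() -> str:
    return "trained"


# Anti-pattern: rebinding the name `pipeline` hides the decorator.
pipeline = training_pipeline()  # `pipeline` is now a plain string


def decorate_second_pipeline():
    # `pipeline` no longer refers to the decorator, so applying it
    # raises TypeError because a str is not callable.
    @pipeline
    def deployment_pipeline() -> str:
        return "deployed"

    return deployment_pipeline
```

Calling `decorate_second_pipeline()` raises a `TypeError`, and in a real module the error surfaces at import time on the second decorated pipeline.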

{% hint style="info" %}
Pipeline names are their unique identifiers, so using the same name for different pipelines creates a mixed history in which two versions of one pipeline are in fact two very different entities.
{% endhint %}

#### .dockerignore

Containerized orchestrators and step operators load your complete project files into a Docker image for execution. To speed up the process and reduce Docker image sizes, exclude all unnecessary files (such as data, virtual environments, and git repositories) in the `.dockerignore` file.

#### Dockerfile (optional)

By default, ZenML uses the official [ZenML Docker image](https://hub.docker.com/r/zenmldocker/zenml) as a base for all pipeline and step builds. You can use your own Dockerfile to override this behavior. Learn more [here](../advanced-guide/environment-management/containerize-your-pipeline.md).

#### Notebooks

Collect all your notebooks in one place.

#### .zen

By running `zenml init` at the root of your project, you define the project scope for ZenML. In ZenML terms, this will be called your "source's root". This will be used to resolve import paths and store configurations.

Although this is optional, it is recommended that you do this for all of your projects.

{% hint style="warning" %}
All of your import paths should be relative to the source's root.
{% endhint %}
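
As an illustration of root-relative imports, the sketch below builds a throwaway copy of the layout shown at the top of this page in a temporary directory and imports a step by its full dotted path from the root. The helpers `build_demo_project` and `import_from_root` are hypothetical, and adding the root to `sys.path` is only a stand-in here, not a description of how ZenML resolves sources.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_project(root: Path) -> None:
    """Create <root>/steps/loader_step/loader_step.py with one function."""
    step_dir = root / "steps" / "loader_step"
    step_dir.mkdir(parents=True)
    (step_dir / "loader_step.py").write_text(
        "def training_data_loader():\n    return [1, 2, 3]\n"
    )


def import_from_root(root: Path):
    """Import the step module using a path relative to the project root."""
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        # Dotted path starts at the root: steps/loader_step/loader_step.py
        return importlib.import_module("steps.loader_step.loader_step")
    finally:
        sys.path.remove(str(root))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_project(Path(tmp))
        module = import_from_root(Path(tmp))
        print(module.training_data_loader())
```

The import mirrors the directory path from the root, which is why moving a file within the project changes every import of it.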

#### run.py

Putting your pipeline runners in the root of the repository ensures that all imports defined relative to the project root resolve for the pipeline runner. If no `.zen` directory is defined, this location also defines the implicit source's root.

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>
@@ -0,0 +1,146 @@
---
description: Rocketstart your ZenML journey!
---

# Project templates

What do you need to quickly understand the ZenML framework and start building your ML pipelines? The answer is one of the ZenML project templates. Each template covers a major ZenML use case with a collection of steps and pipelines and, to top it all off, a simple but useful CLI. This is exactly what the ZenML templates are all about!

## List of available project templates

<table data-full-width="true">
<thead>
<tr>
<th width="281.33333333333337">Project Template [Short name]</th>
<th width="200">Tags</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<a href="https://github.com/zenml-io/template-starter">Starter template</a> [<code>starter</code>]
</td>
<td>
<code>basic</code> <code>scikit-learn</code>
</td>
<td>
All the basic ML ingredients you need to get you started with ZenML: parameterized steps, a model training pipeline, a flexible configuration and a simple CLI. All created around a representative and versatile model training use-case implemented with the scikit-learn library.
</td>
</tr>
<tr>
<td>
<a href="https://github.com/zenml-io/template-e2e-batch">E2E Training with Batch Predictions</a> [<code>e2e_batch</code>]
</td>
<td>
<code>etl</code> <code>hp-tuning</code> <code>model-promotion</code> <code>drift-detection</code> <code>batch-prediction</code> <code>scikit-learn</code>
</td>
<td>
This project template is a good starting point for anyone starting with ZenML. It consists of two pipelines with the following high-level steps: load, split, and preprocess data; run HP tuning; train and evaluate model performance; promote model to production; detect data drift; run batch inference.
</td>
</tr>
<tr>
<td>
<a href="https://github.com/zenml-io/template-nlp">NLP Training Pipeline</a> [<code>nlp</code>]
</td>
<td>
<code>nlp</code> <code>hp-tuning</code> <code>model-promotion</code> <code>training</code> <code>pytorch</code> <code>gradio</code> <code>huggingface</code>
</td>
<td>
This project template is a simple NLP training pipeline that walks through tokenization, training, HP tuning, evaluation, and deployment for a BERT or GPT-2 based model, and testing it locally with Gradio.
</td>
</tr>
</tbody>
</table>

{% hint style="info" %}
Do you have a personal project powered by ZenML that you would like to see here? At ZenML, we are looking for design partnerships and collaboration to help us better understand the real-world scenarios in which MLOps is being used and to build the best possible experience for our users. If you are interested in sharing all or parts of your project with us in the form of a ZenML project template, please [join our Slack](https://zenml.io/slack-invite/) and leave us a message!
{% endhint %}

## Generating project from a project template

First, to use the templates, you need to have ZenML and its `templates` extras installed:

```bash
pip install "zenml[templates]"
```

Now, you can generate a project from one of the existing templates by using the `--template` flag with the `zenml init` command:

```bash
zenml init --template <short_name_of_template>
# example: zenml init --template e2e_batch
```

Running the command above will prompt you for input. If you would like to rely on the default values of the ZenML project template, you can add `--template-with-defaults` to the same command, like this:

```bash
zenml init --template <short_name_of_template> --template-with-defaults
# example: zenml init --template e2e_batch --template-with-defaults
```

## Creating your own ZenML template

Creating your own ZenML template is a great way to standardize and share your ML workflows across different projects or teams. ZenML uses [Copier](https://copier.readthedocs.io/en/stable/) to manage its project templates. Copier is a library that allows you to generate projects from templates. It's simple, versatile, and powerful.

Here's a step-by-step guide on how to create your own ZenML template:

1. **Create a new repository for your template.** This will be the place where you store all the code and configuration files for your template.

2. **Define your ML workflows as ZenML steps and pipelines.** You can start by copying the code from one of the existing ZenML templates (like the [starter template](https://github.com/zenml-io/template-starter)) and modifying it to fit your needs.

3. **Create a `copier.yml` file.** This file is used by Copier to define the template's parameters and their default values. You can learn more about this config file [in the copier docs](https://copier.readthedocs.io/en/stable/creating/).

4. **Test your template.** You can use the `copier` command-line tool to generate a new project from your template and check if everything works as expected:

```bash
copier copy https://github.com/your-username/your-template.git your-project
```

Replace `https://github.com/your-username/your-template.git` with the URL of your template repository, and `your-project` with the name of the new project you want to create.

5. **Use your template with ZenML.** Once your template is ready, you can use it with the `zenml init` command:

```bash
zenml init --template https://github.com/your-username/your-template.git
```

Replace `https://github.com/your-username/your-template.git` with the URL of your template repository.

If you want to use a specific version of your template, you can use the `--template-tag` option to specify the git tag of the version you want to use:

```bash
zenml init --template https://github.com/your-username/your-template.git --template-tag v1.0.0
```

Replace `v1.0.0` with the git tag of the version you want to use.

That's it! Now you have your own ZenML project template that you can use to quickly set up new ML projects. Remember to keep your template up-to-date with the latest best practices and changes in your ML workflows.
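
If you generate projects from scripts or CI jobs, the commands above can be assembled programmatically. The helper below is a hypothetical convenience and not part of ZenML. It only builds the argument list from the three flags shown on this page (`--template`, `--template-tag`, and `--template-with-defaults`) and does not run anything.

```python
from typing import List, Optional


def build_init_command(
    template: str,
    tag: Optional[str] = None,
    with_defaults: bool = False,
) -> List[str]:
    """Build the argument list for `zenml init` with a template."""
    command = ["zenml", "init", "--template", template]
    if tag is not None:
        # Pin the template to a specific git tag.
        command += ["--template-tag", tag]
    if with_defaults:
        # Skip the interactive prompts and accept default values.
        command.append("--template-with-defaults")
    return command


if __name__ == "__main__":
    print(" ".join(build_init_command("e2e_batch", with_defaults=True)))
```

The resulting list can be passed to `subprocess.run(..., check=True)` from the directory where the project should be created.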

## Preparing for the Advanced Guide

<!-- ### Starter Guide
Our Starter Guide documentation is built around the `Starter` project template codes.
Most examples will be based on it, so we highly recommend you to install the `starter` template
with `--template-with-defaults` flag before diving deeper into this documentation section,
so you can follow this guide along using your own local environment.
```bash
mkdir starter
cd starter
zenml init --template starter --template-with-defaults
``` -->

Our Advanced Guide documentation is built around the code of the `E2E Batch` project template.
Most examples are based on it, so we highly recommend that you install the `e2e_batch` template
with the `--template-with-defaults` flag before diving deeper into this documentation section,
so that you can follow the guide in your own local environment.

```bash
mkdir e2e_batch
cd e2e_batch
zenml init --template e2e_batch --template-with-defaults
```

<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure>
