Add usage instructions to readme
Signed-off-by: Laura Couto <laurarccouto@gmail.com>
lrcouto committed Sep 23, 2024
1 parent 6c5ac73 commit 60f06ad
Showing 2 changed files with 8 additions and 107 deletions.
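`performance-test/README.md`: 95 changes (8 additions, 87 deletions). The commit replaces the text generated by the Kedro project template (*Rules and guidelines*, *How to install dependencies*, *How to run your Kedro pipeline*, *How to test your Kedro project*, *Project dependencies*, *How to work with Kedro and notebooks*, *Package your Kedro project*) with usage instructions for the performance test project.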

## Overview

This is a test project that simulates delays in specific parts of a Kedro pipeline. It is meant as a tool for gauging pipeline performance and for comparing in-development changes to Kedro against a stable release.

## Usage

There are three delay parameters that can be set in this project:

**hook_delay** - Simulates slow hooks, for example a hook that performs complex operations or calls external services subject to latency.
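
A minimal sketch of how a hook could turn `hook_delay` into a real wait, written in plain Python so it runs without Kedro installed. The `SlowHook` class is hypothetical, not this project's code. It assumes the delay is read from the context's `params` (where `--params` overrides end up) inside an `after_context_created` hook; in an actual Kedro project the method would also carry the `@hook_impl` decorator from `kedro.framework.hooks` and the hook would be registered in `settings.py`.

```python
import time
from types import SimpleNamespace


class SlowHook:
    """Hypothetical hook that sleeps for `hook_delay` seconds.

    In a Kedro project this method would be decorated with `@hook_impl`
    (from `kedro.framework.hooks`); the decorator is omitted here so the
    sketch runs standalone.
    """

    def after_context_created(self, context) -> None:
        # Assumption: the delay comes from the context's parameters,
        # which include values passed with `kedro run --params=...`.
        delay = float(context.params.get("hook_delay", 0))
        # Stand-in for expensive setup work or a slow external service.
        time.sleep(delay)


if __name__ == "__main__":
    # SimpleNamespace stands in for a Kedro context with a `params` dict.
    fake_context = SimpleNamespace(params={"hook_delay": 0.2})
    start = time.perf_counter()
    SlowHook().after_context_created(fake_context)
    print(f"hook took {time.perf_counter() - start:.2f} s")
```

When `hook_delay` is absent the hook returns immediately, so the same hook can stay registered for baseline runs.
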

**dataset_load_delay** - Simulates a delay in loading a dataset, for example because the dataset is large or the connection is slow.
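
A sketch of the idea behind a delayed load, assuming the delay is applied just before the data is returned. `SlowLoadDataset` is a hypothetical, dependency-free stand-in; a real Kedro custom dataset would subclass `kedro.io.AbstractDataset` and put this logic in its `_load` method.

```python
import time
from typing import Any


class SlowLoadDataset:
    """Hypothetical in-memory dataset whose load() waits `load_delay` seconds.

    A real Kedro dataset would subclass `kedro.io.AbstractDataset` and do
    this in `_load`; plain Python keeps the sketch runnable anywhere.
    """

    def __init__(self, data: Any, load_delay: float = 0.0) -> None:
        self._data = data
        self._load_delay = load_delay

    def load(self) -> Any:
        # Simulate reading a large file or waiting on a slow connection.
        time.sleep(self._load_delay)
        return self._data


if __name__ == "__main__":
    ds = SlowLoadDataset({"rows": 3}, load_delay=0.2)
    start = time.perf_counter()
    data = ds.load()
    print(data, f"loaded in {time.perf_counter() - start:.2f} s")
```
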

**file_save_delay** - Simulates a delay in saving an output file, for example because of connection latency when writing to remote storage.
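
The save side can be sketched the same way, assuming the wait happens before the file is written. `save_with_delay` is a hypothetical helper, not part of Kedro or this project; in a custom Kedro dataset the equivalent logic would live in `_save`.

```python
import tempfile
import time
from pathlib import Path


def save_with_delay(path: Path, text: str, save_delay: float = 0.0) -> None:
    """Hypothetical helper: wait `save_delay` seconds, then write `text` to `path`.

    The sleep stands in for latency when writing to remote storage.
    """
    time.sleep(save_delay)
    path.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    path.write_text(text, encoding="utf-8")


if __name__ == "__main__":
    # Write into a throwaway directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "output.csv"
        save_with_delay(out, "a,b\n1,2\n", save_delay=0.2)
        print(out.read_text(encoding="utf-8"))
```
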

When invoking the `kedro run` command, you can pass the desired value in seconds for each delay as a parameter using the `--params` flag. For example:

`kedro run --params=hook_delay=5,dataset_load_delay=5,file_save_delay=5`
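
Since the project is meant for comparing an in-development Kedro with a stable release, one possible workflow is to time the same `kedro run --params=...` command in an environment with each version installed. The helpers `build_params_arg` and `time_command` below are hypothetical, not part of Kedro; the sketch assumes `kedro` is on the `PATH` of whichever environment runs it, and its demo times a trivial Python command so it still runs where Kedro is absent.

```python
import subprocess
import sys
import time
from collections.abc import Mapping, Sequence
from statistics import median


def build_params_arg(delays: Mapping[str, float]) -> str:
    """Join delays into the `--params=key=value,...` form used by `kedro run`."""
    pairs = ",".join(f"{name}={value}" for name, value in delays.items())
    return f"--params={pairs}"


def time_command(cmd: Sequence[str], repeats: int = 3) -> float:
    """Run `cmd` `repeats` times and return the median wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        timings.append(time.perf_counter() - start)
    return median(timings)


if __name__ == "__main__":
    delays = {"hook_delay": 5, "dataset_load_delay": 5, "file_save_delay": 5}
    kedro_cmd = ["kedro", "run", build_params_arg(delays)]
    print("command to time:", " ".join(kedro_cmd))
    # Replace the trivial command with `kedro_cmd` inside each environment.
    print(f"median: {time_command([sys.executable, '-c', 'pass']):.3f} s")
```

Taking the median of several runs per version dampens one-off noise such as a cold file cache; both runs should use the same delay values so the difference reflects Kedro itself.
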
`performance-test/conf/README.md`: 20 changes (0 additions, 20 deletions). This file was deleted.
