Update the dependencies page in the docs (#3772)
* Update the dependencies page

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Update docs/source/kedro_project_setup/dependencies.md

Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>

* Fix lint

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Move the last line to notes

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

---------

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
ankatiyar and stichbury authored Apr 3, 2024
1 parent caf745b commit 00789fa
Showing 1 changed file with 33 additions and 29 deletions.
62 changes: 33 additions & 29 deletions docs/source/kedro_project_setup/dependencies.md
@@ -4,45 +4,22 @@ Both `pip install kedro` and `conda install -c conda-forge kedro` install the co

When you create a project, you then introduce additional dependencies for the tasks it performs.

## Project-specific dependencies
## Declare project-specific dependencies

You can specify a project's exact dependencies in the `requirements.txt` file to make it easier for you and others to run your project in the future,
and to avoid version conflicts downstream. This can be achieved with the help of [`pip-tools`](https://pypi.org/project/pip-tools/).
To install `pip-tools` in your virtual environment, run the following command:

```bash
pip install pip-tools
```

To add or remove project dependencies, edit the `requirements.txt` file, then run the following:

```bash
pip-compile <project_root>/requirements.txt --output-file <project_root>/requirements.lock
```

This will [pip compile](https://github.com/jazzband/pip-tools#example-usage-for-pip-compile) the requirements listed in
the `requirements.txt` file into a `requirements.lock` that specifies a list of pinned project dependencies
(those with a strict version). You can also use this command with additional CLI arguments such as `--generate-hashes`
to use `pip`'s Hash Checking Mode or `--upgrade-package` to update specific packages to the latest or specific versions.
[Check out the `pip-tools` documentation](https://pypi.org/project/pip-tools/) for more information.

```{note}
The `requirements.txt` file contains "source" requirements, while `src/requirements.lock` contains the compiled version of those and requires no manual updates.
```
When you create a new Kedro project, Kedro generates a `requirements.txt` file in the root directory of the project. The file contains the core dependencies and those related to the tools you choose to include in the project. Specifying the project's exact dependencies in a `requirements.txt` file makes it easier to run the project in the future, and avoids version conflicts downstream.

To further update the project requirements, modify the `requirements.txt` file (not `src/requirements.lock`) and re-run the `pip-compile` command above.

## Install project-specific dependencies

To install the project-specific dependencies, navigate to the root directory of the project and run:
When someone clones your project, they can install the project-specific dependencies by navigating to the root directory of the project and running the following command:

```bash
pip install -r requirements.txt
```

### Install dependencies related to the Data Catalog

The [Data Catalog](../data/data_catalog.md) is your way of interacting with different data types in Kedro. The modular dependencies in this category include `pandas`, `numpy`, `pyspark`, `matplotlib`, `pillow`, `dask`, and more.
The [Data Catalog](../data/data_catalog.md) is your way of interacting with different data types in Kedro. You can use [`kedro-datasets`](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets) to interact with the data used in your projects. Depending on the datasets that you use in your Data Catalog, you might need to include additional dependencies in your `requirements.txt`. The modular dependencies in this category include `pandas`, `numpy`, `pyspark`, `matplotlib`, `pillow`, `dask`, and more.

#### Install dependencies at a group-level

@@ -59,7 +36,34 @@ This installs Kedro and dependencies related to the data type group. An example
To limit installation to dependencies specific to a data type:

```bash
pip install "kedro-datasets[<group>.<dataset>]"
pip install "kedro-datasets[<group>-<dataset>]"
```

For example, your workflow might require use of the `pandas.ExcelDataset`, so to install its dependencies, run `pip install "kedro-datasets[pandas.ExcelDataset]"`.
For example, your workflow might require the `pandas.ExcelDataset`, so to install its dependencies, run `pip install "kedro-datasets[pandas-exceldataset]"`.
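If you script these installs, a small shell helper can derive the dataset-level extra name from the `<group>.<dataset>` form. This is a minimal sketch, assuming the PEP 685 normalisation rule (lowercase the name, then collapse any run of `-`, `_` or `.` into a single `-`); `normalize_extra` is a hypothetical function, not part of Kedro or `kedro-datasets`.

```bash
# Hypothetical helper (not part of Kedro): normalise an extra name the way
# PEP 685 does, e.g. "pandas.ExcelDataset" -> "pandas-exceldataset".
# Lowercases the input, then replaces each run of "-", "_" or "." with "-".
normalize_extra() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed -E 's/[-_.]+/-/g'
}

# Usage: build the pip command for the pandas ExcelDataset dependencies.
extra="$(normalize_extra pandas.ExcelDataset)"
echo "pip install \"kedro-datasets[${extra}]\""
```

Running this prints `pip install "kedro-datasets[pandas-exceldataset]"`, the same command as in the example above.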

```{note}
From `kedro-datasets` version 3.0.0 onwards, the names of the optional dataset-level dependencies have been normalised to follow [PEP 685](https://peps.python.org/pep-0685/). The '.' character has been replaced with a '-' character and the names are in lowercase. For example, if your requirements file contains `kedro-datasets[pandas.ExcelDataset]`, change it to `kedro-datasets[pandas-exceldataset]`.
```
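To update an existing requirements file in one pass, you could apply the same rule to whatever sits inside `kedro-datasets[...]`. The sketch below is illustrative only: `migrate_extras` is a hypothetical helper (not a Kedro tool) that reads a requirements file on standard input, writes the result to standard output, and rewrites only the first `kedro-datasets[...]` on each line.

```bash
# Hypothetical helper: rewrite pre-3.0.0 extras such as
# kedro-datasets[pandas.ExcelDataset] to kedro-datasets[pandas-exceldataset].
# Reads requirements on stdin and writes the migrated lines to stdout.
migrate_extras() {
  awk '{
    prefix = "kedro-datasets["
    i = index($0, prefix)
    if (i > 0) {
      rest = substr($0, i + length(prefix))       # text after the opening bracket
      j = index(rest, "]")
      if (j > 0) {
        extras = tolower(substr(rest, 1, j - 1))  # e.g. "pandas.exceldataset"
        gsub(/[-_.]+/, "-", extras)               # runs of - _ . become a single -
        $0 = substr($0, 1, i + length(prefix) - 1) extras substr(rest, j)
      }
    }
    print
  }'
}

# Usage: write the migrated file alongside the original for review.
# migrate_extras < requirements.txt > requirements.migrated.txt
```

Lines without `kedro-datasets[...]` pass through unchanged, and comma-separated extras are each normalised. Review the output (for example with `diff`) before replacing your `requirements.txt`.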

## Reproducible environments

To ensure that the project dependencies and their transitive dependencies are pinned to specific versions, use [`pip-tools`](https://pypi.org/project/pip-tools/) to compile the `requirements.txt` file into a `requirements.lock` file.
To install `pip-tools` in your virtual environment, run the following command:

```bash
pip install pip-tools
```

To add or remove project dependencies, edit the `requirements.txt` file, then run the following:

```bash
pip-compile <project_root>/requirements.txt --output-file <project_root>/requirements.lock
```

This will [compile](https://github.com/jazzband/pip-tools#example-usage-for-pip-compile) the requirements listed in the `requirements.txt` file into a `requirements.lock` file that lists the pinned project dependencies (those with an exact version). You can also pass additional CLI arguments to this command, such as `--generate-hashes` to use `pip`'s hash-checking mode, or `--upgrade-package` to upgrade specific packages to the latest or a specific version.
[Check out the `pip-tools` documentation](https://pypi.org/project/pip-tools/) for more information.
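After compiling, you might want a quick check that every requirement in `requirements.lock` really is pinned to an exact version. This is a minimal sketch, assuming the usual `pip-compile` output layout (header comments starting with `#`, indented `# via` notes, and indented `--hash=` continuation lines when `--generate-hashes` is used); `check_pinned` is a hypothetical helper, not a `pip-tools` command.

```bash
# Hypothetical helper (not part of pip-tools): report requirement lines in a
# compiled lock file that are not pinned with "==".
# Assumes pip-compile's usual layout: "#" header comments, indented "# via"
# notes and indented "--hash=" continuation lines, which are all skipped.
# Exits with status 1 if any unpinned requirement is found, 0 otherwise.
check_pinned() {
  awk '
    BEGIN { bad = 0 }
    /^[ \t]*$/ { next }   # blank lines
    /^#/ { next }         # header comments
    /^[ \t]/ { next }     # indented "# via" notes and "--hash=" lines
    /^-/ { next }         # pip options such as --index-url
    !/==/ { print "unpinned: " $0; bad = 1 }
    END { exit bad }
  ' "$1"
}

# Usage: fail a CI step if the lock file contains an unpinned requirement.
# check_pinned requirements.lock || exit 1
```

Requirements given as direct URLs (`name @ https://...`) do not contain `==`, so this sketch reports them as unpinned; adjust the patterns if your project relies on them.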

```{note}
The `requirements.txt` file contains "source" requirements, while `requirements.lock` contains the compiled version of those and requires no manual updates. If you need to update the dependencies, update the `requirements.txt` file and re-run the `pip-compile` command.
```
