From 00789fa4d5f1ed8734d6e2561db4fd52c3feddc8 Mon Sep 17 00:00:00 2001
From: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Date: Wed, 3 Apr 2024 16:42:09 +0100
Subject: [PATCH] Update the dependencies page in the docs (#3772)

* Update the dependencies page

Signed-off-by: Ankita Katiyar

* Update docs/source/kedro_project_setup/dependencies.md

Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jo Stichbury
Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>

* Fix lint

Signed-off-by: Ankita Katiyar

* Move the last line to notes

Signed-off-by: Ankita Katiyar

---------

Signed-off-by: Ankita Katiyar
Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Co-authored-by: Jo Stichbury
---
 .../kedro_project_setup/dependencies.md | 62 ++++++++++---------
 1 file changed, 33 insertions(+), 29 deletions(-)

diff --git a/docs/source/kedro_project_setup/dependencies.md b/docs/source/kedro_project_setup/dependencies.md
index 4af705b961..576ab266da 100644
--- a/docs/source/kedro_project_setup/dependencies.md
+++ b/docs/source/kedro_project_setup/dependencies.md
@@ -4,37 +4,14 @@ Both `pip install kedro` and `conda install -c conda-forge kedro` install the co
 
 When you create a project, you then introduce additional dependencies for the tasks it performs.
 
-## Project-specific dependencies
+## Declare project-specific dependencies
 
-You can specify a project's exact dependencies in the `requirements.txt` file to make it easier for you and others to run your project in the future,
-and to avoid version conflicts downstream. This can be achieved with the help of [`pip-tools`](https://pypi.org/project/pip-tools/).
-To install `pip-tools` in your virtual environment, run the following command:
-
-```bash
-pip install pip-tools
-```
-
-To add or remove dependencies to a project, edit the `requirements.txt` file, then run the following:
-
-```bash
-pip-compile /requirements.txt --output-file /requirements.lock
-```
-
-This will [pip compile](https://github.com/jazzband/pip-tools#example-usage-for-pip-compile) the requirements listed in
-the `requirements.txt` file into a `requirements.lock` that specifies a list of pinned project dependencies
-(those with a strict version). You can also use this command with additional CLI arguments such as `--generate-hashes`
-to use `pip`'s Hash Checking Mode or `--upgrade-package` to update specific packages to the latest or specific versions.
-[Check out the `pip-tools` documentation](https://pypi.org/project/pip-tools/) for more information.
-
-```{note}
-The `requirements.txt` file contains "source" requirements, while `src/requirements.lock` contains the compiled version of those and requires no manual updates.
-```
+When you create a new Kedro project, Kedro generates a `requirements.txt` file in the root directory of the project. The file contains the core dependencies and those related to the tools you choose to include in the project. Specifying the project's exact dependencies in a `requirements.txt` file makes it easier to run the project in the future, and avoids version conflicts downstream.
 
-To further update the project requirements, modify the `requirements.txt` file (not `src/requirements.lock`) and re-run the `pip-compile` command above.
 
 ## Install project-specific dependencies
 
-To install the project-specific dependencies, navigate to the root directory of the project and run:
+When someone clones your project, they can install the project-specific dependencies by navigating to the root directory of the project and running the following command:
 
 ```bash
 pip install -r requirements.txt
@@ -42,7 +19,7 @@ pip install -r requirements.txt
 
 ### Install dependencies related to the Data Catalog
 
-The [Data Catalog](../data/data_catalog.md) is your way of interacting with different data types in Kedro. The modular dependencies in this category include `pandas`, `numpy`, `pyspark`, `matplotlib`, `pillow`, `dask`, and more.
+The [Data Catalog](../data/data_catalog.md) is your way of interacting with different data types in Kedro. You can use [`kedro-datasets`](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets) to interact with the data used in your projects. Depending on the datasets that you use in your Data Catalog, you might need to include additional dependencies in your `requirements.txt`. The modular dependencies in this category include `pandas`, `numpy`, `pyspark`, `matplotlib`, `pillow`, `dask`, and more.
 
 #### Install dependencies at a group-level
 
@@ -59,7 +36,34 @@ This installs Kedro and dependencies related to the data type group. An example
 To limit installation to dependencies specific to a data type:
 
 ```bash
-pip install "kedro-datasets[.]"
+pip install "kedro-datasets[-]"
 ```
 
-For example, your workflow might require use of the `pandas.ExcelDataset`, so to install its dependencies, run `pip install "kedro-datasets[pandas.ExcelDataset]"`.
+For example, your workflow might require the `pandas.ExcelDataset`, so to install its dependencies, run `pip install "kedro-datasets[pandas-exceldataset]"`.
+
+```{note}
+From `kedro-datasets` version 3.0.0 onwards, the names of the optional dataset-level dependencies have been normalised to follow [PEP 685](https://peps.python.org/pep-0685/). The '.' character has been replaced with a '-' character and the names are in lowercase. For example, if you had `kedro-datasets[pandas.ExcelDataset]` in your requirements file, it would have to be changed to `kedro-datasets[pandas-exceldataset]`.
+```
+
+
+## Reproducible environments
+To ensure that the project dependencies and the transitive dependencies are pinned to specific versions, use [`pip-tools`](https://pypi.org/project/pip-tools/) to compile the `requirements.txt` file into a `requirements.lock` file.
+To install `pip-tools` in your virtual environment, run the following command:
+
+```bash
+pip install pip-tools
+```
+
+To add or remove dependencies to a project, edit the `requirements.txt` file, then run the following:
+
+```bash
+pip-compile /requirements.txt --output-file /requirements.lock
+```
+
+This will [pip compile](https://github.com/jazzband/pip-tools#example-usage-for-pip-compile) the requirements listed in the `requirements.txt` file into a `requirements.lock` that specifies a list of pinned project dependencies (those with a strict version). You can also use this command with additional CLI arguments such as `--generate-hashes`
+to use `pip`'s Hash Checking Mode or `--upgrade-package` to update specific packages to the latest or specific versions.
+[Check out the `pip-tools` documentation](https://pypi.org/project/pip-tools/) for more information.
+
+```{note}
+The `requirements.txt` file contains "source" requirements, while `requirements.lock` contains the compiled version of those and requires no manual updates. If you need to update the dependencies, update the `requirements.txt` file and re-run the `pip-compile` command.
+```
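
Reviewer note: the note added by this patch says the dataset-level extras are normalised following PEP 685 (runs of `.`, `_` and `-` collapse to a single `-`, and the name is lowercased). A minimal Python sketch of that rule may help anyone migrating an existing requirements file. The helper names `normalize_extra` and `normalize_requirement_extras` are hypothetical and are not part of `kedro-datasets`, `pip`, or `pip-tools`. The sketch assumes a requirement line with at most one `[...]` extras group.

```python
import re


def normalize_extra(name: str) -> str:
    """Normalise a single extra name using the PEP 685 rule.

    Runs of '-', '_' and '.' become a single '-', then the name is lowercased.
    """
    return re.sub(r"[-_.]+", "-", name).lower()


def normalize_requirement_extras(requirement: str) -> str:
    """Rewrite the extras in the first [...] group of a requirement line.

    Hypothetical helper for illustration only; lines without extras
    are returned unchanged.
    """

    def _rewrite(match: re.Match) -> str:
        extras = [normalize_extra(part.strip()) for part in match.group(1).split(",")]
        return "[" + ",".join(extras) + "]"

    return re.sub(r"\[([^\]]*)\]", _rewrite, requirement, count=1)


if __name__ == "__main__":
    # prints: kedro-datasets[pandas-exceldataset]
    print(normalize_requirement_extras("kedro-datasets[pandas.ExcelDataset]"))
```

Running each line of an older `requirements.txt` through a helper like this would give the spelling that the note describes, after which the lock file can be recompiled with `pip-compile`.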