From 2e64459a021bd22d79bd322d9bb87ea22f30c5f2 Mon Sep 17 00:00:00 2001
From: Jo Stichbury
Date: Fri, 22 Dec 2023 19:03:58 +0000
Subject: [PATCH] Tweak new tools/example workflow description (#3454)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Update new project docs

Signed-off-by: Jo Stichbury

* Fix formatting

Signed-off-by: Jo Stichbury

* Fix repeating starter name

Signed-off-by: Jo Stichbury

* Adjust docs in starters section, fix a few issues, lint

Signed-off-by: Jo Stichbury

* Update docs/source/get_started/new_project.md

Co-authored-by: Juan Luis Cano Rodríguez
Signed-off-by: Jo Stichbury

* Update docs/source/get_started/new_project.md

Co-authored-by: Juan Luis Cano Rodríguez
Signed-off-by: Jo Stichbury

* Update docs/source/get_started/new_project.md

Co-authored-by: Juan Luis Cano Rodríguez
Signed-off-by: Jo Stichbury

* Update docs/source/get_started/new_project.md

Co-authored-by: Juan Luis Cano Rodríguez
Signed-off-by: Jo Stichbury

* Update docs/source/get_started/new_project.md

Co-authored-by: Juan Luis Cano Rodríguez
Signed-off-by: Jo Stichbury

* Update docs/source/get_started/new_project.md

Co-authored-by: Juan Luis Cano Rodríguez
Signed-off-by: Jo Stichbury

* relint the docs

Signed-off-by: Jo Stichbury

---------

Signed-off-by: Jo Stichbury
Co-authored-by: Juan Luis Cano Rodríguez
---
 docs/source/get_started/new_project.md    | 102 +++++++++++++++++-----
 docs/source/starters/new_project_tools.md |  58 ++++++------
 2 files changed, 115 insertions(+), 45 deletions(-)

diff --git a/docs/source/get_started/new_project.md b/docs/source/get_started/new_project.md
index f64a93eb61..70b780e322 100644
--- a/docs/source/get_started/new_project.md
+++ b/docs/source/get_started/new_project.md
@@ -2,11 +2,11 @@ There are several ways to create a new Kedro project.
This page explains the flow to create a basic project using `kedro new` to output a project directory containing the basic files and subdirectories that make up a Kedro project. -You can also create a new Kedro project with a starter that adds a set of code for a common project use case. [Starters are explained separately](../starters/starters.md) later in the documentation set and illustrated with the [spaceflights tutorial](../tutorial/tutorial_template.md). +You can also create a new Kedro project with a starter that adds code for a common project use case. [Starters are explained separately](../starters/starters.md) and the [spaceflights tutorial](../tutorial/tutorial_template.md) illustrates their use. ## Introducing `kedro new` -You can create a basic Kedro project containing the default code needed to set up your own nodes and pipelines. Navigate to your preferred directory and type: +To create a basic Kedro project containing the default code needed to set up your own nodes and pipelines, navigate to your preferred directory and type: ```bash kedro new @@ -14,11 +14,10 @@ kedro new ### Project name -The command line interface then asks you to enter a name for the project. This is the human-readable name, and it may contain alphanumeric symbols, spaces, underscores, and hyphens. It must be at least two characters long. +The command line interface (CLI) first asks you to name the project. This is the human-readable name, and it may contain alphanumeric characters, spaces, underscores, and hyphens. It must be at least two characters long. -It's best to keep the name simple because the choice is set as the value of `project_name` and is also used to generate the folder and package names for the project automatically. +It's best to keep the name simple because the choice is set as the value of `project_name` and is also used to generate the folder and package names for the project automatically. 
For example, if you enter "Get Started", the folder for the project (`repo_name`) is automatically set to be `get-started`, and the Python package name (`python_package`) for the project is set to be `get_started`. -So, if you enter "Get Started", the folder for the project (`repo_name`) is automatically set to be `get-started`, and the Python package name (`python_package`) for the project is set to be `get_started`. | Description | Setting | Example | | --------------------------------------------------------------- | ---------------- | ------------- | @@ -28,29 +27,93 @@ So, if you enter "Get Started", the folder for the project (`repo_name`) is auto ### Project tools -The command line interface then asks which tools you'd like to include in the project. The options are as follows and described in more detail above in the [documentation about the new project tools](../starters/new_project_tools.md). +Next, the CLI asks which tools you'd like to include in the project: -You can add one or more of the options, or follow the default and add none at all: +```text +Tools +1) Lint: Basic linting with Black and Ruff +2) Test: Basic testing with pytest +3) Log: Additional, environment-specific logging options +4) Docs: A Sphinx documentation setup +5) Data Folder: A folder structure for data management +6) PySpark: Configuration for working with PySpark +7) Kedro-Viz: Kedro's native visualisation tool + +Which tools would you like to include in your project? [1-7/1,3/all/none]: + (none): +``` + +The options are described in more detail in the [documentation about the new project tools](../starters/new_project_tools.md). + +Select tools by number, enter `all` to include every tool, or accept the default of `none`. 
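The selection syntax accepted at the tools prompt (single numbers, inclusive ranges such as `3-5`, comma-separated combinations of both, or the keywords `all` and `none`) can be sketched in Python as follows. This is a hypothetical illustration, not Kedro's own parser: the function name `parse_tools_selection` and the `TOOLS` tuple are assumptions, with the menu order taken from the prompt above and the short names taken from the values that the `--tools` flag accepts.

```python
# Hypothetical sketch of the tools-prompt syntax; not Kedro's implementation.
from __future__ import annotations

# Assumed mapping: index 0 is menu option 1, index 6 is menu option 7.
TOOLS = ("lint", "test", "log", "docs", "data", "pyspark", "viz")


def parse_tools_selection(raw: str) -> list[str]:
    """Convert prompt input such as "1,3-5,7", "all" or "none" into tool names."""
    text = raw.strip().lower()
    if text in ("", "none"):  # an empty answer falls back to the default, none
        return []
    if text == "all":
        return list(TOOLS)

    selected: set[int] = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:  # a range such as "3-5" includes both ends
            start_str, end_str = part.split("-", 1)
            start, end = int(start_str), int(end_str)
            if start > end:
                raise ValueError(f"Invalid range: {part!r}")
            selected.update(range(start, end + 1))
        else:
            selected.add(int(part))

    invalid = sorted(n for n in selected if not 1 <= n <= len(TOOLS))
    if invalid:
        raise ValueError(f"Tool numbers must be between 1 and {len(TOOLS)}: {invalid}")
    # Return names in menu order, whatever order the numbers were typed in
    return [TOOLS[n - 1] for n in sorted(selected)]


print(parse_tools_selection("1,4,6"))  # ['lint', 'docs', 'pyspark']
print(parse_tools_selection("1,3-5,7"))  # ['lint', 'log', 'docs', 'data', 'viz']
print(parse_tools_selection(""))  # []
```

With this sketch, `1,4,6` maps to `lint`, `docs`, and `pyspark`, and an empty answer gives no tools, matching the default of `none`. Returning names in menu order keeps the result independent of the order in which the numbers are typed.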
-* Linting: A basic linting setup with Black and ruff -* Testing: A basic testing setup with pytest -* Custom Logging: Additional logging options -* Documentation: Configuration for basic documentation built with Sphinx -* Data Structure: The [directory structure](../faq/faq.md#what-is-data-engineering-convention) for storing data locally -* PySpark: Setup and configuration for working with PySpark -* Kedro Viz: Kedro's native visualisation tool. ### Project examples -The CLI offers the option to include example pipelines. Your choice of tools determines which spaceflights starter example is provided. Here's a guide to understanding which starter examples are used based on your selections: +Finally, the CLI offers the option to include starter example code in the project: -* [Default Starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas) (`spaceflights-pandas`): Used when you select any combination of Linting, Testing, Custom Logging, Documentation, and Data Structure, excluding PySpark and Kedro Viz. -* [PySpark Starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark) (`spaceflights-pyspark`): Chosen when PySpark is selected with any other tools, except Kedro Viz. -* [Kedro Viz Starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas-viz) (`spaceflights-pandas-viz`): Applicable when Kedro Viz is part of your selection, with any other tools, excluding PySpark. -* [Full Feature Starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark-viz) (`spaceflights-pyspark-viz`): This example is used when you select all available tools, including PySpark and Kedro Viz. +```text +Would you like to include an example pipeline? 
: + (no): +``` + +If you say `yes`, the example code included depends on your previous choice of tools, as follows: + +* [Default spaceflights starter (`spaceflights-pandas`)](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas): Added if you selected any combination of linting, testing, custom logging, documentation, and data structure, unless you also selected PySpark or Kedro Viz. +* [PySpark spaceflights starter (`spaceflights-pyspark`)](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark): Added if you selected PySpark, with or without other tools, unless you also selected Kedro Viz. +* [Kedro Viz spaceflights starter (`spaceflights-pandas-viz`)](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas-viz): Added if Kedro Viz was one of your tool choices, unless you also selected PySpark. +* [Full feature spaceflights starter (`spaceflights-pyspark-viz`)](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark-viz): Added if you selected all available tools, including PySpark and Kedro Viz. Each starter example is tailored to demonstrate the capabilities and integrations of the selected tools, offering a practical insight into how they can be utilised in your project. +### Quickstart examples + +1. To create a default Kedro project called `My-Project` with no tools and no example code: + +```text +kedro new ⮐ +My-Project ⮐ +none ⮐ +no ⮐ +``` + +You can also enter this in a single line as follows: + +```bash +kedro new --name=My-Project --tools=none --example=n +``` + +2. To create a spaceflights project called `spaceflights` with Kedro Viz features and example code: + +```text +kedro new ⮐ +spaceflights ⮐ +7 ⮐ +yes ⮐ +``` + +You can also enter this in a single line as follows: + +```bash +kedro new --name=spaceflights --tools=viz --example=y +``` + +3. 
To create a project called `testproject` containing linting, documentation, and PySpark, but no example code: + +```text +kedro new ⮐ +testproject ⮐ +1,4,6 ⮐ +no ⮐ +``` + +You can also enter this in a single line as follows: + +```bash +kedro new --name=testproject --tools=lint,docs,pyspark --example=n +``` + + ## Run the new project Whichever options you selected for tools and example code, once `kedro new` has completed, the next step is to navigate to the project folder (`cd `) and install dependencies with `pip` as follows: @@ -107,7 +170,6 @@ Here is a flowchart to help guide your choice of tools and examples you can sele ```{figure} ../meta/images/new-project-tools.png :alt: mermaid-General overview diagram for setting up a new Kedro project with tools ``` - % Mermaid code, see https://github.com/kedro-org/kedro/wiki/Render-Mermaid-diagrams % flowchart TD % A[Start] --> B[Enter Project Name]; diff --git a/docs/source/starters/new_project_tools.md b/docs/source/starters/new_project_tools.md index a911bf2de3..8927d2a6db 100644 --- a/docs/source/starters/new_project_tools.md +++ b/docs/source/starters/new_project_tools.md @@ -1,8 +1,13 @@ # Tools to customise a new Kedro project -There are several ways to customise your new project with the tools and example code. +There are several ways to customise your new project with tools and example code: -## Specify tools configuration using `kedro new` +* [Specify tools as inputs to `kedro new`](#specify-tools-as-inputs-to-kedro-new) +* [Specify tools using YAML configuration](#specify-tools-using-yaml-configuration) + +There is a [flowchart to illustrate the choices available](#flowchart-illustration) at the bottom of the page. 
+ +## Specify tools as inputs to `kedro new` Navigate to the directory in which you would like to create your new Kedro project, and run the following command: @@ -19,14 +24,16 @@ You can also add flags to `kedro new` to skip some or all of the steps in the pr ### Project name The first prompt asks you to input a project name. -You can skip the step to name the project by adding it to your command. For example: +To skip this step, pass the project name to `kedro new` with the `--name` flag: ```bash kedro new --name=spaceflights ``` ### Tools -You are then asked to select which tools to include. Choose from the list using comma separated values `(1,2,4)`, ranges of values `(1-3,5-7)`, a combination of the two `(1,3-5,7)`, or the key words `all` or `none`. Skipping the prompt by entering no value will result in the default selection of `none`. Further information about each of the tools is described below in [Kedro tools](#kedro-tools). +You are then asked to select which tools to include. Choose from the list using comma-separated values (`1,2,4`), ranges of values (`1-3,5-7`), a combination of the two (`1,3-5,7`), or the keywords `all` or `none`. If you press Enter without typing a value, the default selection of `none` is used. + +[Each tool is described in more detail below](#kedro-tools). ``` @@ -49,15 +56,6 @@ Which tools would you like to include in your project? [1-7/1,3/all/none]: [none]: ``` - -You may also specify your tools selection directly from the command line by using the flag `--tools`: - -```bash -kedro new --tools= -``` - -To specify your desired tools you must provide them by name as a comma separated list, for example `--tools=lint,test,viz`. The following tools are available for selection: `lint`, `test`, `log`, `docs`, `data`, `pyspark`, and `viz`. 
- A list of available tools can also be accessed by running `kedro new --help` ``` @@ -94,23 +92,35 @@ A list of available tools can also be accessed by running `kedro new --help` kedro new --tools=none ... ``` +### Shortcut + +To skip this step, pass your tools selection to `kedro new` with the `--tools` flag: + +```bash +kedro new --tools= +``` +Provide the tools by name as a comma-separated list, for example `--tools=lint,test,viz`. The following tools are available for selection: `lint`, `test`, `log`, `docs`, `data`, `pyspark`, and `viz`. ### Example code -In the final step you are asked whether you want to populate the project with an example spaceflights starter pipeline. Here’s a brief overview: +In the final step you are asked whether you want to populate the project with an example spaceflights starter pipeline. If you select `yes`, the example code included depends on your previous choice of tools, as follows: + +* [Default spaceflights starter (`spaceflights-pandas`)](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas): Added if you selected any combination of linting, testing, custom logging, documentation, and data structure, unless you also selected PySpark or Kedro Viz. +* [PySpark spaceflights starter (`spaceflights-pyspark`)](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark): Added if you selected PySpark, with or without other tools, unless you also selected Kedro Viz. +* [Kedro Viz spaceflights starter (`spaceflights-pandas-viz`)](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas-viz): Added if Kedro Viz was one of your tool choices, unless you also selected PySpark. +* [Full feature spaceflights starter (`spaceflights-pyspark-viz`)](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark-viz): Added if you selected all available tools, including PySpark and Kedro Viz. 
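The starter rules above can be read as a small decision function over the `--tools` names. The Python sketch below is a hypothetical illustration, not Kedro's implementation: the name `choose_starter` is an assumption. The rules do not say what happens when PySpark and Kedro Viz are both selected without every other tool, so the sketch assumes that combination also maps to the full-feature starter. It also assumes that asking for example code with no tools selected gives the default starter.

```python
# Hypothetical sketch of the starter rules described above; not Kedro's code.
from __future__ import annotations


def choose_starter(tools: set[str], example: bool) -> str | None:
    """Return the spaceflights starter for a set of --tools names, or None."""
    if not example:
        return None  # no example pipeline requested

    has_pyspark = "pyspark" in tools
    has_viz = "viz" in tools

    if has_pyspark and has_viz:
        # Assumption: any selection with both pyspark and viz is treated as
        # the full-feature case, not only the "all tools" selection.
        return "spaceflights-pyspark-viz"
    if has_pyspark:
        return "spaceflights-pyspark"
    if has_viz:
        return "spaceflights-pandas-viz"
    # lint, test, log, docs and data only (or no tools, by assumption)
    return "spaceflights-pandas"


print(choose_starter({"lint", "test"}, example=True))  # spaceflights-pandas
print(choose_starter({"viz"}, example=True))  # spaceflights-pandas-viz
print(choose_starter({"lint", "docs", "pyspark"}, example=False))  # None
```

Checking PySpark and Kedro Viz first keeps the sketch short: the remaining tools never change which starter is chosen, only whether a pandas or PySpark variant is used and whether the Kedro-Viz features are included.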
-* [Default Starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas) (`spaceflights-pandas`): For combinations of Linting, Testing, Custom Logging, Documentation, and Data Structure, without PySpark and Kedro Viz. -* [PySpark Starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark) (`spaceflights-pyspark`): Selected with PySpark, excluding Kedro Viz. -* [Kedro Viz Starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas-viz) (`spaceflights-pandas-viz`): For choices including Kedro Viz, without PySpark. -* [Full Feature Starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark-viz) (`spaceflights-pyspark-viz`): When all tools are selected, including PySpark and Kedro Viz. +Each starter example is tailored to demonstrate the capabilities and integrations of the selected tools, offering practical insight into how you can use them in your project. -You can add the example pipeline to your new project as follows: +### Shortcut + +To skip this step, pass your choice of example code to `kedro new` with the `--example` flag: ```bash kedro new --example=y ``` -## Specify tools configuration using `kedro new --config=` +## Specify tools using YAML configuration As an alternative to the interactive project creation workflow, you can also supply values to `kedro new` by providing a YML configuration file to your `kedro new` command. Consider the following file: @@ -140,11 +150,9 @@ kedro new --config= ``` ``` {note} -Note: When using a configuration file to create a new project, you must provide values for the project name, repository name, and package names. +Note: When using a configuration file to create a new project, you must provide values for the project name, repository name, and package names. Specifying your tools selection is optional; if you omit it, the default selection of `none` is used. 
``` -Specifying your tools selection is optional, omitting them results in the default selection of `none`. - ## Kedro tools Tools in Kedro serve as modular functionalities that enhance a foundational project template. They provide a means to tailor your Kedro project to meet your unique requirements. When creating a new project, you may select one or more of the available tools, or none at all. @@ -235,7 +243,7 @@ The aim of this tool reflects Kedro's commitment to best practices in understand If you did not initially select `docs` and want to implement it later you can do so by following the [official documentation](https://docs.kedro.org/en/stable/tutorial/package_a_project.html#add-documentation-to-a-kedro-project-if-you-have-not-selected-docs-tool) for guidance on adding documentation to a Kedro project. -### Data Structure +### Data structure The Data Structure tool provides a local standardised folder hierarchy for your project data, which includes predefined folders such as raw, intermediate, and processed data, as determined by [data engineering convention](https://docs.kedro.org/en/stable/faq/faq.html#what-is-data-engineering-convention). This is crucial if you want to include example pipelines during the creation of your project as it can not be omitted from the tool selections. @@ -255,7 +263,7 @@ The `viz` tool will add visualisation to your project by including Kedro-Viz, wh In addition, `viz` will also add setup for experiment tracking and plotting datasets. See the [Kedro-Viz documentation](https://docs.kedro.org/projects/kedro-viz/en/stable/index.html) for more information on using this tool. -## Flowchart of example choice of tools and example selections +## Flowchart illustration Here is a flowchart to help illustrate some example choice of tools you can select: