Tweak new tools/example workflow description (#3454)
* Update new project docs

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Fix formatting

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Fix repeating starter name

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Adjust docs in starters section, fix a few issues, lint

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update docs/source/get_started/new_project.md

Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update docs/source/get_started/new_project.md

Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update docs/source/get_started/new_project.md

Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update docs/source/get_started/new_project.md

Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update docs/source/get_started/new_project.md

Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* Update docs/source/get_started/new_project.md

Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

* relint the docs

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>

---------

Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
stichbury and astrojuanlu committed Dec 22, 2023
1 parent 81ab60a commit 2e64459
Showing 2 changed files with 115 additions and 45 deletions.
102 changes: 82 additions & 20 deletions docs/source/get_started/new_project.md
@@ -2,23 +2,22 @@

There are several ways to create a new Kedro project. This page explains the flow to create a basic project using `kedro new` to output a project directory containing the basic files and subdirectories that make up a Kedro project.

You can also create a new Kedro project with a starter that adds a set of code for a common project use case. [Starters are explained separately](../starters/starters.md) later in the documentation set and illustrated with the [spaceflights tutorial](../tutorial/tutorial_template.md).
You can also create a new Kedro project with a starter that adds code for a common project use case. [Starters are explained separately](../starters/starters.md) and the [spaceflights tutorial](../tutorial/tutorial_template.md) illustrates their use.

## Introducing `kedro new`

You can create a basic Kedro project containing the default code needed to set up your own nodes and pipelines. Navigate to your preferred directory and type:
To create a basic Kedro project containing the default code needed to set up your own nodes and pipelines, navigate to your preferred directory and type:

```bash
kedro new
```

### Project name

The command line interface then asks you to enter a name for the project. This is the human-readable name, and it may contain alphanumeric symbols, spaces, underscores, and hyphens. It must be at least two characters long.
The command line interface (CLI) first asks for a name for the project. This is the human-readable name, and it may contain alphanumeric symbols, spaces, underscores, and hyphens. It must be at least two characters long.

It's best to keep the name simple because the choice is set as the value of `project_name` and is also used to generate the folder and package names for the project automatically.
It's best to keep the name simple because the choice is set as the value of `project_name` and is also used to generate the folder and package names for the project automatically. For example, if you enter "Get Started", the folder for the project (`repo_name`) is automatically set to be `get-started`, and the Python package name (`python_package`) for the project is set to be `get_started`.

So, if you enter "Get Started", the folder for the project (`repo_name`) is automatically set to be `get-started`, and the Python package name (`python_package`) for the project is set to be `get_started`.
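The naming rule described above can be sketched in a few lines of Python. This is an illustration only, not Kedro's own implementation: the function name `derive_names` and the validation pattern are assumptions based on the description in this section.

```python
import re

# Assumed validation rule, paraphrasing the text above: letters, digits,
# spaces, underscores, and hyphens only, at least two characters long.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9 _-]{2,}$")


def derive_names(project_name: str) -> dict:
    """Return the settings derived from a human-readable project name.

    Hypothetical helper for illustration; not part of the Kedro API.
    """
    if not NAME_PATTERN.match(project_name):
        raise ValueError(f"Invalid project name: {project_name!r}")
    normalised = project_name.strip().lower()
    return {
        "project_name": project_name,
        # Folder name: spaces and underscores become hyphens.
        "repo_name": normalised.replace(" ", "-").replace("_", "-"),
        # Package name: spaces and hyphens become underscores.
        "python_package": normalised.replace(" ", "_").replace("-", "_"),
    }


print(derive_names("Get Started"))
# {'project_name': 'Get Started', 'repo_name': 'get-started', 'python_package': 'get_started'}
```

A short, simple project name keeps all three derived values easy to predict.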

| Description | Setting | Example |
| --------------------------------------------------------------- | ---------------- | ------------- |
@@ -28,29 +27,93 @@ So, if you enter "Get Started", the folder for the project (`repo_name`) is auto

### Project tools

The command line interface then asks which tools you'd like to include in the project. The options are as follows and described in more detail above in the [documentation about the new project tools](../starters/new_project_tools.md).
Next, the CLI asks which tools you'd like to include in the project:

You can add one or more of the options, or follow the default and add none at all:
```text
Tools
1) Lint: Basic linting with Black and Ruff
2) Test: Basic testing with pytest
3) Log: Additional, environment-specific logging options
4) Docs: A Sphinx documentation setup
5) Data Folder: A folder structure for data management
6) PySpark: Configuration for working with PySpark
7) Kedro-Viz: Kedro's native visualisation tool
Which tools would you like to include in your project? [1-7/1,3/all/none]:
(none):
```

The options are described in more detail in the [documentation about the new project tools](../starters/new_project_tools.md).

Select the tools by number, enter `all`, or accept the default to add `none`.

* Linting: A basic linting setup with Black and ruff
* Testing: A basic testing setup with pytest
* Custom Logging: Additional logging options
* Documentation: Configuration for basic documentation built with Sphinx
* Data Structure: The [directory structure](../faq/faq.md#what-is-data-engineering-convention) for storing data locally
* PySpark: Setup and configuration for working with PySpark
* Kedro Viz: Kedro's native visualisation tool.

### Project examples

The CLI offers the option to include example pipelines. Your choice of tools determines which spaceflights starter example is provided. Here's a guide to understanding which starter examples are used based on your selections:
Finally, the CLI offers the option to include starter example code in the project:

* [Default Starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas) (`spaceflights-pandas`): Used when you select any combination of Linting, Testing, Custom Logging, Documentation, and Data Structure, excluding PySpark and Kedro Viz.
* [PySpark Starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark) (`spaceflights-pyspark`): Chosen when PySpark is selected with any other tools, except Kedro Viz.
* [Kedro Viz Starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas-viz) (`spaceflights-pandas-viz`): Applicable when Kedro Viz is part of your selection, with any other tools, excluding PySpark.
* [Full Feature Starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark-viz) (`spaceflights-pyspark-viz`): This example is used when you select all available tools, including PySpark and Kedro Viz.
```text
Would you like to include an example pipeline? :
(no):
```

If you say `yes`, the example code included depends upon your previous choice of tools, as follows:

* [Default spaceflights starter (`spaceflights-pandas`)](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas): Added if you selected any combination of linting, testing, custom logging, documentation, and data structure, unless you also selected PySpark or Kedro Viz.
* [PySpark spaceflights starter (`spaceflights-pyspark`)](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark): Added if you selected PySpark with any other tools, unless you also selected Kedro Viz.
* [Kedro Viz spaceflights starter (`spaceflights-pandas-viz`)](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas-viz): Added if Kedro Viz was one of your tools choices, unless you also selected PySpark.
* [Full feature spaceflights starter (`spaceflights-pyspark-viz`)](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark-viz): Added if you selected all available tools, including PySpark and Kedro Viz.

Each starter example is tailored to demonstrate the capabilities and integrations of the selected tools, offering a practical insight into how they can be utilised in your project.
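The selection rules in the list above can be read as a small decision function. The sketch below is an interpretation of this page, not Kedro's source code: the function name `choose_example_starter` is hypothetical, the tool names `pyspark` and `viz` are those accepted by the `--tools` flag, and it assumes that selecting both PySpark and Kedro Viz is what triggers the full feature starter.

```python
def choose_example_starter(tools: set) -> str:
    """Pick the spaceflights starter implied by a set of tool names.

    Hypothetical helper for illustration; not part of the Kedro API.
    """
    has_pyspark = "pyspark" in tools
    has_viz = "viz" in tools
    if has_pyspark and has_viz:
        return "spaceflights-pyspark-viz"
    if has_pyspark:
        return "spaceflights-pyspark"
    if has_viz:
        return "spaceflights-pandas-viz"
    # Any other combination (including no tools) falls back to the default.
    return "spaceflights-pandas"


print(choose_example_starter({"lint", "docs"}))     # spaceflights-pandas
print(choose_example_starter({"lint", "pyspark"}))  # spaceflights-pyspark
print(choose_example_starter({"viz"}))              # spaceflights-pandas-viz
```

Only the PySpark and Kedro Viz choices change which example is used; the other tools do not affect it.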

### Quickstart examples

1. To create a default Kedro project called `My-Project` with no tools and no example code:

```text
kedro new ⮐
My-Project ⮐
none ⮐
no ⮐
```

You can also enter this in a single line as follows:

```bash
kedro new --name=My-Project --tools=none --example=n
```

2. To create a spaceflights project called `spaceflights` with Kedro Viz features and example code:

```text
kedro new ⮐
spaceflights ⮐
7 ⮐
yes ⮐
```

You can also enter this in a single line as follows:

```bash
kedro new --name=spaceflights --tools=viz --example=y
```

3. To create a project called `testproject` containing linting, documentation, and PySpark, but no example code:

```text
kedro new ⮐
testproject ⮐
1,4,6 ⮐
no ⮐
```

You can also enter this in a single line as follows:

```bash
kedro new --name=testproject --tools=lint,docs,pyspark --example=n
```
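The quickstart examples pair each numbered menu choice with a named value for `--tools`. The following Python sketch shows that correspondence by building the single-line command from the interactive answers. The helper `build_kedro_new_command` is hypothetical, and the number-to-name mapping is inferred from the menu order and the examples above rather than taken from Kedro's code.

```python
import shlex

# Menu number -> name accepted by --tools (inferred from the examples above).
TOOL_NAMES = {
    1: "lint",
    2: "test",
    3: "log",
    4: "docs",
    5: "data",
    6: "pyspark",
    7: "viz",
}


def build_kedro_new_command(name: str, tool_numbers: list, example: bool) -> list:
    """Build the argument list for a non-interactive `kedro new` call.

    Hypothetical helper for illustration; not part of the Kedro API.
    """
    tools = ",".join(TOOL_NAMES[n] for n in sorted(tool_numbers)) or "none"
    example_flag = "y" if example else "n"
    return [
        "kedro",
        "new",
        f"--name={name}",
        f"--tools={tools}",
        f"--example={example_flag}",
    ]


print(shlex.join(build_kedro_new_command("testproject", [1, 4, 6], False)))
# kedro new --name=testproject --tools=lint,docs,pyspark --example=n
```

The returned list can be passed to `subprocess.run` if you want to script project creation.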


## Run the new project

Whichever options you selected for tools and example code, once `kedro new` has completed, the next step is to navigate to the project folder (`cd <project-name>`) and install dependencies with `pip` as follows:
@@ -107,7 +170,6 @@ Here is a flowchart to help guide your choice of tools and examples you can sele
```{figure} ../meta/images/new-project-tools.png
:alt: mermaid-General overview diagram for setting up a new Kedro project with tools
```

% Mermaid code, see https://github.com/kedro-org/kedro/wiki/Render-Mermaid-diagrams
% flowchart TD
% A[Start] --> B[Enter Project Name];
58 changes: 33 additions & 25 deletions docs/source/starters/new_project_tools.md
@@ -1,8 +1,13 @@
# Tools to customise a new Kedro project

There are several ways to customise your new project with the tools and example code.
There are several ways to customise your new project with the tools and example code:

## Specify tools configuration using `kedro new`
* [Specify tools using inputs to `kedro new`](#specify-tools-as-inputs-to-kedro-new)
* [Specify tools using YAML configuration](#specify-tools-using-yaml-configuration)

There is a [flowchart to illustrate the choices available](#flowchart-illustration) at the bottom of the page.

## Specify tools as inputs to `kedro new`

Navigate to the directory in which you would like to create your new Kedro project, and run the following command:

@@ -19,14 +24,16 @@ You can also add flags to `kedro new` to skip some or all of the steps in the pr
### Project name
The first prompt asks you to input a project name.

You can skip the step to name the project by adding it to your command. For example:
To skip this step and name the project directly, add it to `kedro new` as follows:

```bash
kedro new --name=spaceflights
```

### Tools
You are then asked to select which tools to include. Choose from the list using comma separated values `(1,2,4)`, ranges of values `(1-3,5-7)`, a combination of the two `(1,3-5,7)`, or the key words `all` or `none`. Skipping the prompt by entering no value will result in the default selection of `none`. Further information about each of the tools is described below in [Kedro tools](#kedro-tools).
You are then asked to select which tools to include. Choose from the list using comma separated values `(1,2,4)`, ranges of values `(1-3,5-7)`, a combination of the two `(1,3-5,7)`, or the key words `all` or `none`. Skipping the prompt by entering no value will result in the default selection of `none`.

[Further information about each of the tools is described below](#kedro-tools).
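To make the accepted input formats concrete, here is a minimal Python sketch of how a selection string could be expanded into tool numbers. It is written from the description above and is not Kedro's own parser; the function name `parse_tool_selection` and its error messages are hypothetical.

```python
def parse_tool_selection(text: str, n_tools: int = 7) -> list:
    """Expand a selection such as "1,3-5,7" into a sorted list of numbers.

    Hypothetical helper for illustration; not part of the Kedro API.
    """
    text = text.strip().lower()
    if text in ("", "none"):
        # Entering no value gives the default selection of none.
        return []
    if text == "all":
        return list(range(1, n_tools + 1))

    selected = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as "3-5" covers both end points.
            start, end = (int(value) for value in part.split("-", 1))
            if start > end:
                raise ValueError(f"Invalid range: {part!r}")
            selected.update(range(start, end + 1))
        else:
            selected.add(int(part))

    out_of_range = sorted(n for n in selected if not 1 <= n <= n_tools)
    if out_of_range:
        raise ValueError(f"Unknown tool numbers: {out_of_range}")
    return sorted(selected)


print(parse_tool_selection("1,3-5,7"))  # [1, 3, 4, 5, 7]
print(parse_tool_selection(""))         # []
```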


```
@@ -49,15 +56,6 @@ Which tools would you like to include in your project? [1-7/1,3/all/none]:
[none]:
```


You may also specify your tools selection directly from the command line by using the flag `--tools`:

```bash
kedro new --tools=<your tool selection>
```

To specify your desired tools you must provide them by name as a comma separated list, for example `--tools=lint,test,viz`. The following tools are available for selection: `lint`, `test`, `log`, `docs`, `data`, `pyspark`, and `viz`.

A list of available tools can also be accessed by running `kedro new --help`

```
@@ -94,23 +92,35 @@ A list of available tools can also be accessed by running `kedro new --help`
kedro new --tools=none
...
```
### Shortcut

To skip this step and select tools directly, add the tools selection to `kedro new` as follows:

```bash
kedro new --tools=<your tool selection>
```

To specify your desired tools you must provide them by name as a comma separated list, for example `--tools=lint,test,viz`. The following tools are available for selection: `lint`, `test`, `log`, `docs`, `data`, `pyspark`, and `viz`.

### Example code
In the final step you are asked whether you want to populate the project with an example spaceflights starter pipeline. Here’s a brief overview:
In the final step you are asked whether you want to populate the project with an example spaceflights starter pipeline. If you select `yes`, the example code included depends upon your previous choice of tools, as follows:

* [Default spaceflights starter (`spaceflights-pandas`)](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas): Added if you selected any combination of linting, testing, custom logging, documentation, and data structure, unless you also selected PySpark or Kedro Viz.
* [PySpark spaceflights starter (`spaceflights-pyspark`)](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark): Added if you selected PySpark with any other tools, unless you also selected Kedro Viz.
* [Kedro Viz spaceflights starter (`spaceflights-pandas-viz`)](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas-viz): Added if Kedro Viz was one of your tools choices, unless you also selected PySpark.
* [Full feature spaceflights starter (`spaceflights-pyspark-viz`)](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark-viz): Added if you selected all available tools, including PySpark and Kedro Viz.

* [Default Starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas) (`spaceflights-pandas`): For combinations of Linting, Testing, Custom Logging, Documentation, and Data Structure, without PySpark and Kedro Viz.
* [PySpark Starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark) (`spaceflights-pyspark`): Selected with PySpark, excluding Kedro Viz.
* [Kedro Viz Starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas-viz) (`spaceflights-pandas-viz`): For choices including Kedro Viz, without PySpark.
* [Full Feature Starter](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pyspark-viz) (`spaceflights-pyspark-viz`): When all tools are selected, including PySpark and Kedro Viz.
Each starter example is tailored to demonstrate the capabilities and integrations of the selected tools, offering a practical insight into how they can be utilised in your project.

You can add the example pipeline to your new project as follows:
### Shortcut

To skip this step and make a choice of example code directly, add your preference to `kedro new` as follows:

```bash
kedro new --example=y
```

## Specify tools configuration using `kedro new --config=`
## Specify tools using YAML configuration

As an alternative to the interactive project creation workflow, you can supply values to `kedro new` in a YAML configuration file. Consider the following file:

@@ -140,11 +150,9 @@ kedro new --config=<path/to/config.yml>
```

``` {note}
Note: When using a configuration file to create a new project, you must provide values for the project name, repository name, and package names.
Note: When using a configuration file to create a new project, you must provide values for the project name, repository name, and package names. Specifying your tools selection is optional; omitting it results in the default selection of `none`.
```
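The rule in the note can be expressed as a small check on the loaded configuration. The Python sketch below works on a plain dictionary, as you might get after loading the YAML file. It is an illustration only: `normalise_config` is a hypothetical helper, the required key names follow the `project_name`, `repo_name`, and `python_package` settings used elsewhere in these docs, and the `tools` key name is an assumption.

```python
REQUIRED_KEYS = ("project_name", "repo_name", "python_package")


def normalise_config(config: dict) -> dict:
    """Check required values and apply the default tools selection.

    Hypothetical helper for illustration; not part of the Kedro API.
    """
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"Missing required config values: {missing}")
    result = dict(config)
    # Assumed key name: omitting the tools selection defaults to "none".
    result.setdefault("tools", "none")
    return result


config = {
    "project_name": "Get Started",
    "repo_name": "get-started",
    "python_package": "get_started",
}
print(normalise_config(config)["tools"])  # none
```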

Specifying your tools selection is optional, omitting them results in the default selection of `none`.

## Kedro tools

Tools in Kedro serve as modular functionalities that enhance a foundational project template. They provide a means to tailor your Kedro project to meet your unique requirements. When creating a new project, you may select one or more of the available tools, or none at all.
@@ -235,7 +243,7 @@ The aim of this tool reflects Kedro's commitment to best practices in understand

If you did not initially select `docs` and want to implement it later you can do so by following the [official documentation](https://docs.kedro.org/en/stable/tutorial/package_a_project.html#add-documentation-to-a-kedro-project-if-you-have-not-selected-docs-tool) for guidance on adding documentation to a Kedro project.

### Data Structure
### Data structure

The Data Structure tool provides a local standardised folder hierarchy for your project data, which includes predefined folders such as raw, intermediate, and processed data, as determined by [data engineering convention](https://docs.kedro.org/en/stable/faq/faq.html#what-is-data-engineering-convention).
This is crucial if you want to include example pipelines during the creation of your project, as it cannot be omitted from the tool selections.
@@ -255,7 +263,7 @@ The `viz` tool will add visualisation to your project by including Kedro-Viz, wh
In addition, `viz` will also add setup for experiment tracking and plotting datasets.
See the [Kedro-Viz documentation](https://docs.kedro.org/projects/kedro-viz/en/stable/index.html) for more information on using this tool.

## Flowchart of example choice of tools and example selections
## Flowchart illustration

Here is a flowchart to help illustrate some example choices of tools you can select:
