Added support of toml config file (#759)
andrii-ivaniuk authored Aug 20, 2020
1 parent ee0ea85 commit 69c1d1c
Showing 10 changed files with 201 additions and 61 deletions.
1 change: 1 addition & 0 deletions RELEASE.md
@@ -12,6 +12,7 @@

## Major features and improvements
* Added `register_pipelines()`, a new hook to register a project's pipelines. The order of execution is: plugin hooks, `.kedro.yml` hooks, hooks in `ProjectContext.hooks`.
* Added support for configuring Kedro through `pyproject.toml`. `pyproject.toml` is used if `.kedro.yml` doesn't exist (Kedro configuration should be under the `[tool.kedro]` section).

## Bug fixes and other changes
* `project_name`, `project_version` and `package_name` now have to be defined in `.kedro.yml` for the projects generated using Kedro 0.16.5+.
16 changes: 11 additions & 5 deletions docs/source/07_extend_kedro/04_hooks.md
@@ -13,7 +13,7 @@ Hooks are a mechanism to add extra behaviour to Kedro's main execution in an eas
A Hook is comprised of a Hook specification and Hook implementation. To add Hooks to your project you will need to:

* Provide a Hook implementation for an existing Kedro-defined Hook specification
* Register your Hook implementation in your `ProjectContext` or `.kedro.yml`
* Register your Hook implementation in the `ProjectContext`, in `.kedro.yml`, or, if `.kedro.yml` doesn't exist, in `pyproject.toml` under the `[tool.kedro]` section.


### Hook specification
@@ -95,7 +95,7 @@ We recommend that you group related Hook implementations under a namespace, pref

#### Registering your Hook implementations with Kedro

Hook implementations should be registered with Kedro either through code, in `ProjectContext`, or using static configuration in `.kedro.yml`.
Hook implementations should be registered with Kedro either through code, in `ProjectContext`, or using static configuration: in `.kedro.yml` if it exists, otherwise in `pyproject.toml` under the `[tool.kedro]` section.

You can register more than one implementation for the same specification. They will be called in LIFO (last-in, first-out) order.

@@ -107,8 +107,6 @@ from your_project.hooks import TransformerHooks


class ProjectContext(KedroContext):
project_name = "kedro-tutorial"
project_version = "0.16.4"

hooks = (
# register the collection of your Hook implementations here.
@@ -127,10 +125,18 @@ hooks:
- your_project.hooks.transformer_hooks
```
If `.kedro.yml` doesn't exist, you can use `pyproject.toml` instead, as follows:

```toml
# <your_project>/pyproject.toml
[tool.kedro]
hooks=["your_project.hooks.transformer_hooks"]
```
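
The dotted paths listed under `hooks` have to be turned into Python objects before they can be registered. The following is a minimal sketch of that resolution step, assuming each entry has the form `<module path>.<attribute>`. `resolve_dotted_path` and `resolve_hooks` are hypothetical helpers written for illustration, not Kedro's own loader.

```python
import importlib
from typing import Any, Iterable, Tuple


def resolve_dotted_path(path: str) -> Any:
    """Import and return the object named by a dotted path (hypothetical helper)."""
    # Split "package.module.attribute" into the module part and the attribute name.
    module_path, _, attribute = path.rpartition(".")
    if not module_path:
        raise ValueError(f"'{path}' is not of the form '<module>.<attribute>'")
    module = importlib.import_module(module_path)
    return getattr(module, attribute)


def resolve_hooks(hook_paths: Iterable[str]) -> Tuple[Any, ...]:
    """Resolve every configured path, preserving the order given in the config."""
    return tuple(resolve_dotted_path(path) for path in hook_paths)


# Usage: standard-library names stand in for real Hook implementations.
hooks = resolve_hooks(["collections.OrderedDict", "os.path.join"])
```

Keeping the resolution in one small function means a misspelled path fails at start-up with an import or attribute error, rather than later during a run.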


Kedro also has auto-discovery enabled by default. This means that any installed plugins that declare a Hooks entry-point will be registered. To learn more about how to enable this for your custom plugin, see our [plugin development guide](../07_extend_kedro/05_plugins.md#hooks).

>Note: Auto-discovered Hooks will run *first*, followed by the ones specified in `.kedro.yml`, and finally `ProjectContext.hooks`.
>Note: Auto-discovered Hooks will run *first*, followed by the ones specified in `.kedro.yml` or `pyproject.toml` (if `.kedro.yml` doesn't exist), and finally `ProjectContext.hooks`.

## Under the hood

2 changes: 1 addition & 1 deletion docs/source/07_extend_kedro/05_plugins.md
@@ -60,7 +60,7 @@ While running, plugins may request information about the current project by call
This function provides access to the verbose flag via the key `verbose` and to anything returned by the project's `KedroContext`. The returned instance of `ProjectContext(KedroContext)` class must contain at least the following properties and methods:

* `project_version`: the version of Kedro the project was created with, or `None` if the project was not created with `kedro new`.
* `project_path`: the path to the directory where `.kedro.yml` is located.
* `project_path`: the path to the directory where either `.kedro.yml` or `pyproject.toml` is located.
* `config_loader`: an instance of `kedro.config.ConfigLoader`.
* `catalog`: an instance of `kedro.io.DataCatalog`.
* `pipeline`: an instance of `kedro.pipeline.Pipeline`.
3 changes: 0 additions & 3 deletions docs/source/10_tools_integration/01_pyspark.md
@@ -30,9 +30,6 @@ from pyspark.sql import SparkSession
class ProjectContext(KedroContext):
project_name = "kedro"
project_version = "0.16.4"
def __init__(
self,
project_path: Union[Path, str],
4 changes: 3 additions & 1 deletion docs/source/11_faq/02_architecture_overview.md
@@ -36,6 +36,8 @@ A Python file located in `src/<python_package>/run.py`, which by default contain

`.kedro.yml` must be located at the root of the project.

> *Note:* Since Kedro 0.16.5, the `.kedro.yml` file is optional. A `pyproject.toml` file can be used instead, with the same content under the `[tool.kedro]` section.

#### `00-kedro-init.py`

This script is automatically invoked at IPython kernel startup when calling `kedro jupyter notebook`, `kedro jupyter lab` and `kedro ipython` CLI commands. `00-kedro-init.py` creates an instance of `ProjectContext` object, which can be used to interact with the current project right away.
@@ -68,7 +70,7 @@ A python function that instantiates the project context by calling `load_context
#### `load_context()`

A python function that locates Kedro project based on `.kedro.yml` and instantiates the project context.
A Python function that locates the Kedro project based on `.kedro.yml`, or on `pyproject.toml` if `.kedro.yml` doesn't exist, and instantiates the project context.

#### `KedroContext`

7 changes: 4 additions & 3 deletions kedro/config/templated_config.py
@@ -60,9 +60,10 @@ class TemplatedConfigLoader(ConfigLoader):
the config_loader method, making it return a ``TemplatedConfigLoader``
object instead of a ``ConfigLoader`` object.
For this method to work, the context_path variable in `.kedro.yml` needs
to be pointing at this newly created class. The `run.py` script has an
extension of the ``KedroContext`` by default, called the ``ProjectContext``.
For this method to work, the context_path variable in `.kedro.yml` (if it exists) or
in `pyproject.toml` under the `[tool.kedro]` section needs to point at this newly
created class. The `run.py` script has an extension of the ``KedroContext`` by default,
called the ``ProjectContext``.
Example:
::
10 changes: 3 additions & 7 deletions kedro/framework/cli/utils.py
@@ -42,7 +42,8 @@
from typing import Iterable, List, Sequence, Tuple, Union

import click
import yaml

from kedro.framework.context import get_static_project_data

CONTEXT_SETTINGS = dict(help_option_names=["-h", "--help"])
MAX_SUGGESTIONS = 3
@@ -255,12 +256,7 @@ def get_source_dir(project_path: Path) -> Path:
DeprecationWarning,
)

with (project_path / ".kedro.yml").open("r") as kedro_yml:
kedro_yaml = yaml.safe_load(kedro_yml)

source_dir = Path(kedro_yaml.get("source_dir", "src")).expanduser()
source_path = (project_path / source_dir).resolve()
return source_path
return get_static_project_data(project_path)["source_dir"]
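
The removed lines and the new `get_static_project_data` both resolve the source directory the same way: take `source_dir` from the configuration, default to `src`, expand `~`, and resolve it against the project root. A minimal standalone sketch of that step follows. `resolve_source_dir` is a hypothetical name used only for illustration.

```python
from pathlib import Path
from typing import Any, Dict


def resolve_source_dir(project_path: Path, static_data: Dict[str, Any]) -> Path:
    """Return the absolute source directory for a project (hypothetical helper)."""
    # Fall back to "src" when the configuration does not set `source_dir`.
    source_dir = Path(static_data.get("source_dir", "src")).expanduser()
    # A relative value is interpreted relative to the project root.
    return (project_path / source_dir).resolve()
```

Because joining onto an absolute path discards the left-hand side, an absolute `source_dir` in the configuration is used as-is rather than being nested under the project root.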


def _check_module_importable(module_name: str) -> None:
78 changes: 56 additions & 22 deletions kedro/framework/context/context.py
@@ -38,7 +38,7 @@
from urllib.parse import urlparse
from warnings import warn

import yaml
import anyconfig

from kedro import __version__
from kedro.config import ConfigLoader, MissingConfigException
@@ -54,6 +54,9 @@

_PLUGIN_HOOKS = "kedro.hooks" # entry-point to load hooks from for installed plugins

# Kedro configuration files in the precedence order
KEDRO_CONFIGS = (".kedro.yml", "pyproject.toml")


def _version_mismatch_error(context_version) -> str:
return (
@@ -342,7 +345,7 @@ def _register_hooks(self, auto: bool = False) -> None:
"""
hook_manager = get_hook_manager()

# enrich with hooks specified in .kedro.yml
# enrich with hooks specified in .kedro.yml or pyproject.toml if .kedro.yml doesn't exist
hooks_locations = self.static_data.get("hooks", [])
configured_hooks = tuple(load_obj(hook) for hook in hooks_locations)

@@ -765,37 +768,61 @@ def _get_save_version(


def get_static_project_data(project_path: Union[str, Path]) -> Dict[str, Any]:
"""Read static project data from `<project_path>/.kedro.yml` config file.
"""Read static project data from the `<project_path>/.kedro.yml` config file if it
exists, otherwise from `<project_path>/pyproject.toml` (under the `[tool.kedro]` section).
Args:
project_path: Local path to project root directory to look up `.kedro.yml` in.
project_path: Local path to project root directory to look up `.kedro.yml` or
`pyproject.toml` in.
Raises:
KedroContextError: `.kedro.yml` was not found or cannot be parsed.
KedroContextError: Neither `.kedro.yml` nor `pyproject.toml` was found,
the `[tool.kedro]` section is missing in `pyproject.toml`, or the config file
cannot be parsed.
Returns:
A mapping that contains static project data.
"""
project_path = Path(project_path).expanduser().resolve()
kedro_yml = project_path / ".kedro.yml"

try:
with kedro_yml.open("r") as _f:
static_data = yaml.safe_load(_f)
except FileNotFoundError:
config_paths = [
project_path / conf_file
for conf_file in KEDRO_CONFIGS
if (project_path / conf_file).is_file()
]

if not config_paths:
configs = ", ".join(KEDRO_CONFIGS)
raise KedroContextError(
f"Could not find '.kedro.yml' in {project_path}. If you have "
f"created your project with Kedro version <0.15.0, make sure to "
f"update your project template. See "
f"https://github.com/quantumblacklabs/kedro/blob/master/RELEASE.md "
f"Could not find any of configuration files '{configs}' in {project_path}. "
f"If you have created your project with Kedro "
f"version <0.15.0, make sure to update your project template. "
f"See https://github.com/quantumblacklabs/kedro/blob/master/RELEASE.md"
f"#migration-guide-from-kedro-014-to-kedro-0150 "
f"for how to migrate your Kedro project."
)

# First found wins
kedro_config = config_paths[0]
try:
static_data = anyconfig.load(kedro_config)
except Exception:
raise KedroContextError("Failed to parse '.kedro.yml' file")
raise KedroContextError(f"Failed to parse '{kedro_config.name}' file.")

if kedro_config.suffix == ".toml":
try:
static_data = static_data["tool"]["kedro"]
except KeyError:
raise KedroContextError(
f"There's no '[tool.kedro]' section in the '{kedro_config.name}'. "
f"Please add '[tool.kedro]' section to the file with appropriate "
f"configuration parameters."
)

source_dir = Path(static_data.get("source_dir", "src")).expanduser()
source_dir = (project_path / source_dir).resolve()
static_data["source_dir"] = source_dir
static_data["config_file"] = kedro_config

return static_data
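
The lookup in `get_static_project_data` comes down to two rules: the first existing file in `KEDRO_CONFIGS` wins, and a TOML file must nest its settings under `[tool.kedro]`. The sketch below isolates those two rules using only the standard library. It assumes the file has already been parsed into a dictionary by a parser of your choice. `find_config_file` and `extract_kedro_settings` are hypothetical names, and the sketch raises `ValueError` where the real code raises `KedroContextError`.

```python
from pathlib import Path
from typing import Any, Dict, Optional

# Same precedence order as KEDRO_CONFIGS in the diff above.
CONFIG_FILES = (".kedro.yml", "pyproject.toml")


def find_config_file(project_path: Path) -> Optional[Path]:
    """Return the first existing config file in precedence order, or None."""
    for name in CONFIG_FILES:
        candidate = project_path / name
        if candidate.is_file():
            return candidate
    return None


def extract_kedro_settings(parsed: Dict[str, Any], config_file: Path) -> Dict[str, Any]:
    """Return the Kedro settings from already-parsed config data."""
    if config_file.suffix != ".toml":
        # `.kedro.yml` keeps its settings at the top level.
        return dict(parsed)
    try:
        # `pyproject.toml` keeps them under the [tool.kedro] table.
        return dict(parsed["tool"]["kedro"])
    except KeyError:
        raise ValueError(
            f"There's no '[tool.kedro]' section in '{config_file.name}'."
        ) from None
```

Separating the file lookup from the section extraction lets each rule be exercised with real temporary files or plain dictionaries, without patching `Path.is_file`.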

@@ -841,8 +868,9 @@ def load_package_context(
Instance of ``KedroContext`` class defined in Kedro project.
Raises:
KedroContextError: Either '.kedro.yml' was not found
or loaded context has package conflict.
KedroContextError: Neither `.kedro.yml` nor `pyproject.toml` was found,
the `[tool.kedro]` section is missing in `pyproject.toml`, or the loaded context
has a package conflict.
"""
context_path = f"{package_name}.run.ProjectContext"
try:
@@ -871,8 +899,12 @@ def load_context(project_path: Union[str, Path], **kwargs) -> KedroContext:
|__ <src_dir>
|__ .kedro.yml
|__ kedro_cli.py
|__ pyproject.toml
The name of the <src_dir> is `src` by default and configurable in `.kedro.yml`.
The name of the <src_dir> is `src` by default. Either `.kedro.yml` or `pyproject.toml` can
be used for configuration. If `.kedro.yml` exists, it will be used; otherwise, `pyproject.toml`
will be treated as the configuration file (Kedro configuration should be under the
`[tool.kedro]` section).
Args:
project_path: Path to the Kedro project.
Expand All @@ -882,8 +914,9 @@ def load_context(project_path: Union[str, Path], **kwargs) -> KedroContext:
Instance of ``KedroContext`` class defined in Kedro project.
Raises:
KedroContextError: Either '.kedro.yml' was not found
or loaded context has package conflict.
KedroContextError: Neither `.kedro.yml` nor `pyproject.toml` was found,
the `[tool.kedro]` section is missing in `pyproject.toml`, or the loaded context
has a package conflict.
"""
project_path = Path(project_path).expanduser().resolve()
@@ -893,9 +926,10 @@ def load_context(project_path: Union[str, Path], **kwargs) -> KedroContext:
validate_source_path(source_dir, project_path)

if "context_path" not in static_data:
conf_file = static_data["config_file"].name
raise KedroContextError(
"'.kedro.yml' doesn't have a required `context_path` field. "
"Please refer to the documentation."
f"'{conf_file}' doesn't have a required `context_path` field. "
f"Please refer to the documentation."
)

if str(source_dir) not in sys.path:
87 changes: 87 additions & 0 deletions tests/framework/context/test_context.py
@@ -51,6 +51,7 @@
_convert_paths_to_absolute_posix,
_is_relative_path,
_validate_layers_for_transcoding,
get_static_project_data,
)
from kedro.io.core import Version, generate_timestamp
from kedro.pipeline import Pipeline, node
@@ -857,3 +858,89 @@ def test_non_existent_source_path(self, tmp_path):
        pattern = re.escape(f"Source path '{source_path}' cannot be found.")
        with pytest.raises(KedroContextError, match=pattern):
            validate_source_path(source_path, tmp_path.resolve())


class TestGetStaticProjectData:
    project_path = Path.cwd()

    def test_no_config_files(self, mocker):
        mocker.patch.object(Path, "is_file", return_value=False)

        pattern = (
            f"Could not find any of configuration files '.kedro.yml, pyproject.toml' "
            f"in {self.project_path}"
        )
        with pytest.raises(KedroContextError, match=re.escape(pattern)):
            get_static_project_data(self.project_path)

    def test_kedro_yml_invalid_format(self, tmp_path):
        """Test that an invalid `.kedro.yml` raises a parsing error."""
        kedro_yml_path = tmp_path / ".kedro.yml"
        kedro_yml_path.write_text("!!")  # Invalid YAML
        pattern = "Failed to parse '.kedro.yml' file"
        with pytest.raises(KedroContextError, match=re.escape(pattern)):
            get_static_project_data(str(tmp_path))

    def test_toml_invalid_format(self, tmp_path):
        """Test that an invalid `pyproject.toml` raises a parsing error."""
        toml_path = tmp_path / "pyproject.toml"
        toml_path.write_text("!!")  # Invalid TOML
        pattern = "Failed to parse 'pyproject.toml' file"
        with pytest.raises(KedroContextError, match=re.escape(pattern)):
            get_static_project_data(str(tmp_path))

    def test_valid_yml_file_exists(self, mocker):
        # Both yml and toml files exist
        mocker.patch.object(Path, "is_file", return_value=True)
        mocker.patch("anyconfig.load", return_value={})

        static_data = get_static_project_data(self.project_path)

        # Using default source directory
        assert static_data == {
            "source_dir": self.project_path / "src",
            "config_file": self.project_path / ".kedro.yml",
        }

    def test_valid_toml_file(self, mocker):
        # .kedro.yml doesn't exist
        mocker.patch.object(Path, "is_file", side_effect=[False, True])
        mocker.patch("anyconfig.load", return_value={"tool": {"kedro": {}}})

        static_data = get_static_project_data(self.project_path)

        # Using default source directory
        assert static_data == {
            "source_dir": self.project_path / "src",
            "config_file": self.project_path / "pyproject.toml",
        }

    def test_toml_file_without_kedro_section(self, mocker):
        mocker.patch.object(Path, "is_file", side_effect=[False, True])
        mocker.patch("anyconfig.load", return_value={})

        pattern = "There's no '[tool.kedro]' section in the 'pyproject.toml'."

        with pytest.raises(KedroContextError, match=re.escape(pattern)):
            get_static_project_data(self.project_path)

    def test_source_dir_specified_in_yml(self, mocker):
        mocker.patch.object(Path, "is_file", side_effect=[True, False])
        source_dir = "test_dir"
        mocker.patch("anyconfig.load", return_value={"source_dir": source_dir})

        static_data = get_static_project_data(self.project_path)

        assert static_data["source_dir"] == self.project_path / source_dir

    def test_source_dir_specified_in_toml(self, mocker):
        mocker.patch.object(Path, "is_file", side_effect=[False, True])
        source_dir = "test_dir"
        mocker.patch(
            "anyconfig.load",
            return_value={"tool": {"kedro": {"source_dir": source_dir}}},
        )

        static_data = get_static_project_data(self.project_path)

        assert static_data["source_dir"] == self.project_path / source_dir