diff --git a/.github/styles/Kedro/ignore.txt b/.github/styles/Kedro/ignore.txt
index 805a77b1db..004a050817 100644
--- a/.github/styles/Kedro/ignore.txt
+++ b/.github/styles/Kedro/ignore.txt
@@ -35,3 +35,5 @@ VSCode
 Astro
 Xebia
 pytest
+transcoding
+transcode
diff --git a/docs/source/integrations/pyspark_integration.md b/docs/source/integrations/pyspark_integration.md
index c0e5cec08b..392f88cc3b 100644
--- a/docs/source/integrations/pyspark_integration.md
+++ b/docs/source/integrations/pyspark_integration.md
@@ -110,7 +110,7 @@ assert isinstance(df, pyspark.sql.DataFrame)
 
 [Delta Lake](https://delta.io/) is an open-source project that enables building a Lakehouse architecture on top of data lakes. It provides ACID transactions and unifies streaming and batch data processing on top of existing data lakes, such as S3, ADLS, GCS, and HDFS. To setup PySpark with Delta Lake, have a look at [the recommendations in Delta Lake's documentation](https://docs.delta.io/latest/quick-start.html#python).
 
-We recommend the following workflow, which makes use of the [transcoding feature in Kedro](../data/data_catalog.md):
+We recommend the following workflow, which makes use of the [transcoding feature in Kedro](../data/data_catalog_yaml_examples.md#read-the-same-file-using-two-different-datasets):
 
 * To create a Delta table, use a `SparkDataSet` with `file_format="delta"`. You can also use this type of dataset to read from a Delta table and/or overwrite it.
 * To perform [Delta table deletes, updates, and merges](https://docs.delta.io/latest/delta-update.html#language-python), load the data using a `DeltaTableDataSet` and perform the write operations within the node function.
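For reference, a minimal sketch of the transcoded catalog entries this workflow implies. The table name `my_table`, the filepath, and the save arguments are hypothetical; `spark.SparkDataSet` and `spark.DeltaTableDataSet` are the dataset types named in the bullets above:

```yaml
# Hypothetical catalog.yml entries: both names point at the same path, so
# Kedro's transcoding treats `my_table@spark` and `my_table@delta` as one
# dataset in the pipeline while loading it with two different classes.
my_table@spark:
  type: spark.SparkDataSet
  filepath: data/02_intermediate/my_table  # assumed path
  file_format: delta
  save_args:
    mode: overwrite

my_table@delta:
  type: spark.DeltaTableDataSet
  filepath: data/02_intermediate/my_table  # same path as above
```

A node taking `my_table@delta` as input then receives a `delta.tables.DeltaTable` and performs its writes in place, as the second bullet describes. A sketch, with a made-up node name and column:

```python
from delta.tables import DeltaTable

def deduplicate_table(my_table: DeltaTable) -> None:
    # With DeltaTableDataSet the write happens inside the node itself;
    # the `is_duplicate` column is invented for this example.
    my_table.delete("is_duplicate = true")
```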