From bfb29c9bf13f5f2e0cce4c0ab4bc9c5f273ff070 Mon Sep 17 00:00:00 2001
From: Enrico Minack
Date: Fri, 26 Apr 2024 20:20:26 +0200
Subject: [PATCH] Releasing 2.12.0

---
 CHANGELOG.md     |  2 +-
 README.md        | 20 ++++++++++----------
 pom.xml          |  2 +-
 python/README.md | 20 ++++++++++----------
 python/setup.py  |  2 +-
 5 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 2346917d..8da9d481 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,7 +3,7 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
-## [UNRELEASED] - YYYY-MM-DD
+## [2.12.0] - 2024-04-26
 
 ## Fixes
 
diff --git a/README.md b/README.md
index 9dd94a73..27f24aef 100644
--- a/README.md
+++ b/README.md
@@ -198,7 +198,7 @@ The package version has the following semantics: `spark-extension_{SCALA_COMPAT_
 Add this line to your `build.sbt` file:
 
 ```sbt
-libraryDependencies += "uk.co.gresearch.spark" %% "spark-extension" % "2.11.0-3.5"
+libraryDependencies += "uk.co.gresearch.spark" %% "spark-extension" % "2.12.0-3.5"
 ```
 
 ### Maven
@@ -209,7 +209,7 @@ Add this dependency to your `pom.xml` file:
 <dependency>
   <groupId>uk.co.gresearch.spark</groupId>
   <artifactId>spark-extension_2.12</artifactId>
-  <version>2.11.0-3.5</version>
+  <version>2.12.0-3.5</version>
 </dependency>
 ```
 
@@ -219,7 +219,7 @@ Add this dependency to your `build.gradle` file:
 
 ```groovy
 dependencies {
-    implementation "uk.co.gresearch.spark:spark-extension_2.12:2.11.0-3.5"
+    implementation "uk.co.gresearch.spark:spark-extension_2.12:2.12.0-3.5"
 }
 ```
 
@@ -228,7 +228,7 @@ dependencies {
 Submit your Spark app with the Spark Extension dependency (version ≥1.1.0) as follows:
 
 ```shell script
-spark-submit --packages uk.co.gresearch.spark:spark-extension_2.12:2.11.0-3.5 [jar]
+spark-submit --packages uk.co.gresearch.spark:spark-extension_2.12:2.12.0-3.5 [jar]
 ```
 
 Note: Pick the right Scala version (here 2.12) and Spark version (here 3.5) depending on your Spark version.
@@ -238,7 +238,7 @@ Note: Pick the right Scala version (here 2.12) and Spark version (here 3.5) depe
 Launch a Spark Shell with the Spark Extension dependency (version ≥1.1.0) as follows:
 
 ```shell script
-spark-shell --packages uk.co.gresearch.spark:spark-extension_2.12:2.11.0-3.5
+spark-shell --packages uk.co.gresearch.spark:spark-extension_2.12:2.12.0-3.5
 ```
 
 Note: Pick the right Scala version (here 2.12) and Spark version (here 3.5) depending on your Spark Shell version.
@@ -254,7 +254,7 @@ from pyspark.sql import SparkSession
 
 spark = SparkSession \
     .builder \
-    .config("spark.jars.packages", "uk.co.gresearch.spark:spark-extension_2.12:2.11.0-3.5") \
+    .config("spark.jars.packages", "uk.co.gresearch.spark:spark-extension_2.12:2.12.0-3.5") \
     .getOrCreate()
 ```
 
@@ -265,7 +265,7 @@ Note: Pick the right Scala version (here 2.12) and Spark version (here 3.5) depe
 Launch the Python Spark REPL with the Spark Extension dependency (version ≥1.1.0) as follows:
 
 ```shell script
-pyspark --packages uk.co.gresearch.spark:spark-extension_2.12:2.11.0-3.5
+pyspark --packages uk.co.gresearch.spark:spark-extension_2.12:2.12.0-3.5
 ```
 
 Note: Pick the right Scala version (here 2.12) and Spark version (here 3.5) depending on your PySpark version.
@@ -275,7 +275,7 @@ Note: Pick the right Scala version (here 2.12) and Spark version (here 3.5) depe
 Run your Python scripts that use PySpark via `spark-submit`:
 
 ```shell script
-spark-submit --packages uk.co.gresearch.spark:spark-extension_2.12:2.11.0-3.5 [script.py]
+spark-submit --packages uk.co.gresearch.spark:spark-extension_2.12:2.12.0-3.5 [script.py]
 ```
 
 Note: Pick the right Scala version (here 2.12) and Spark version (here 3.5) depending on your Spark version.
@@ -289,7 +289,7 @@ Running your Python application on a Spark cluster will still require one of the
 to add the Scala package to the Spark environment.
 
 ```shell script
-pip install pyspark-extension==2.11.0.3.5
+pip install pyspark-extension==2.12.0.3.5
 ```
 
 Note: Pick the right Spark version (here 3.5) depending on your PySpark version.
@@ -299,7 +299,7 @@ Note: Pick the right Spark version (here 3.5) depending on your PySpark version.
 There are plenty of [Data Science notebooks](https://datasciencenotebook.org/) around. To use this library,
 add **a jar dependency** to your notebook using these **Maven coordinates**:
 
-    uk.co.gresearch.spark:spark-extension_2.12:2.11.0-3.5
+    uk.co.gresearch.spark:spark-extension_2.12:2.12.0-3.5
 
 Or [download the jar](https://mvnrepository.com/artifact/uk.co.gresearch.spark/spark-extension) and
 place it on a filesystem where it is accessible by the notebook, and reference that jar file directly.
diff --git a/pom.xml b/pom.xml
index 6f0a4bbe..19a85db0 100644
--- a/pom.xml
+++ b/pom.xml
@@ -2,7 +2,7 @@
   <modelVersion>4.0.0</modelVersion>
   <groupId>uk.co.gresearch.spark</groupId>
   <artifactId>spark-extension_2.13</artifactId>
-  <version>2.12.0-3.5-SNAPSHOT</version>
+  <version>2.12.0-3.5</version>
   <name>Spark Extension</name>
   <description>A library that provides useful extensions to Apache Spark.</description>
   <inceptionYear>2020</inceptionYear>
diff --git a/python/README.md b/python/README.md
index fc331ef5..c0812e7a 100644
--- a/python/README.md
+++ b/python/README.md
@@ -2,20 +2,20 @@
 
 This project provides extensions to the [Apache Spark project](https://spark.apache.org/) in Scala and Python:
 
-**[Diff](https://github.com/G-Research/spark-extension/blob/v2.11.0/DIFF.md):** A `diff` transformation and application for `Dataset`s that computes the differences between
+**[Diff](https://github.com/G-Research/spark-extension/blob/v2.12.0/DIFF.md):** A `diff` transformation and application for `Dataset`s that computes the differences between
 two datasets, i.e. which rows to _add_, _delete_ or _change_ to get from one dataset to the other.
 
-**[Histogram](https://github.com/G-Research/spark-extension/blob/v2.11.0/HISTOGRAM.md):** A `histogram` transformation that computes the histogram DataFrame for a value column.
+**[Histogram](https://github.com/G-Research/spark-extension/blob/v2.12.0/HISTOGRAM.md):** A `histogram` transformation that computes the histogram DataFrame for a value column.
 
-**[Global Row Number](https://github.com/G-Research/spark-extension/blob/v2.11.0/ROW_NUMBER.md):** A `withRowNumbers` transformation that provides the global row number w.r.t.
+**[Global Row Number](https://github.com/G-Research/spark-extension/blob/v2.12.0/ROW_NUMBER.md):** A `withRowNumbers` transformation that provides the global row number w.r.t.
 the current order of the Dataset, or any given order. In contrast to the existing SQL function `row_number`, which
 requires a window spec, this transformation provides the row number across the entire Dataset without scaling problems.
 
-**[Inspect Parquet files](https://github.com/G-Research/spark-extension/blob/v2.11.0/PARQUET.md):** The structure of Parquet files (the metadata, not the data stored in Parquet) can be inspected similar to [parquet-tools](https://pypi.org/project/parquet-tools/)
+**[Inspect Parquet files](https://github.com/G-Research/spark-extension/blob/v2.12.0/PARQUET.md):** The structure of Parquet files (the metadata, not the data stored in Parquet) can be inspected similar to [parquet-tools](https://pypi.org/project/parquet-tools/)
 or [parquet-cli](https://pypi.org/project/parquet-cli/) by reading from a simple Spark data source.
 This simplifies identifying why some Parquet files cannot be split by Spark into scalable partitions.
 
-**[Install Python packages into PySpark job](https://github.com/G-Research/spark-extension/blob/v2.11.0/PYSPARK-DEPS.md):** Install Python dependencies via PIP or Poetry programatically into your running PySpark job (PySpark ≥ 3.1.0):
+**[Install Python packages into PySpark job](https://github.com/G-Research/spark-extension/blob/v2.12.0/PYSPARK-DEPS.md):** Install Python dependencies via PIP or Poetry programatically into your running PySpark job (PySpark ≥ 3.1.0):
 
 ```python
 # noinspection PyUnresolvedReferences
@@ -94,7 +94,7 @@ Running your Python application on a Spark cluster will still require one of the
 to add the Scala package to the Spark environment.
 
 ```shell script
-pip install pyspark-extension==2.11.0.3.4
+pip install pyspark-extension==2.12.0.3.4
 ```
 
 Note: Pick the right Spark version (here 3.4) depending on your PySpark version.
@@ -108,7 +108,7 @@ from pyspark.sql import SparkSession
 
 spark = SparkSession \
     .builder \
-    .config("spark.jars.packages", "uk.co.gresearch.spark:spark-extension_2.12:2.11.0-3.4") \
+    .config("spark.jars.packages", "uk.co.gresearch.spark:spark-extension_2.12:2.12.0-3.4") \
     .getOrCreate()
 ```
 
@@ -119,7 +119,7 @@ Note: Pick the right Scala version (here 2.12) and Spark version (here 3.4) depe
 Launch the Python Spark REPL with the Spark Extension dependency (version ≥1.1.0) as follows:
 
 ```shell script
-pyspark --packages uk.co.gresearch.spark:spark-extension_2.12:2.11.0-3.4
+pyspark --packages uk.co.gresearch.spark:spark-extension_2.12:2.12.0-3.4
 ```
 
 Note: Pick the right Scala version (here 2.12) and Spark version (here 3.4) depending on your PySpark version.
@@ -129,7 +129,7 @@ Note: Pick the right Scala version (here 2.12) and Spark version (here 3.4) depe
 Run your Python scripts that use PySpark via `spark-submit`:
 
 ```shell script
-spark-submit --packages uk.co.gresearch.spark:spark-extension_2.12:2.11.0-3.4 [script.py]
+spark-submit --packages uk.co.gresearch.spark:spark-extension_2.12:2.12.0-3.4 [script.py]
 ```
 
 Note: Pick the right Scala version (here 2.12) and Spark version (here 3.4) depending on your Spark version.
@@ -139,7 +139,7 @@ Note: Pick the right Scala version (here 2.12) and Spark version (here 3.4) depe
 There are plenty of [Data Science notebooks](https://datasciencenotebook.org/) around. To use this library,
 add **a jar dependency** to your notebook using these **Maven coordinates**:
 
-    uk.co.gresearch.spark:spark-extension_2.12:2.11.0-3.4
+    uk.co.gresearch.spark:spark-extension_2.12:2.12.0-3.4
 
 Or [download the jar](https://mvnrepository.com/artifact/uk.co.gresearch.spark/spark-extension) and
 place it on a filesystem where it is accessible by the notebook, and reference that jar file directly.
diff --git a/python/setup.py b/python/setup.py
index df535713..a5843f43 100755
--- a/python/setup.py
+++ b/python/setup.py
@@ -17,7 +17,7 @@
 from pathlib import Path
 from setuptools import setup
 
-jar_version = '2.12.0-3.5-SNAPSHOT'
+jar_version = '2.12.0-3.5'
 scala_version = '2.13.8'
 scala_compat_version = '.'.join(scala_version.split('.')[:2])
 spark_compat_version = jar_version.split('-')[1]
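
The `python/setup.py` hunk above derives the Scala and Spark compatibility versions from the version strings it sets. A minimal sketch, simply repeating the two expressions from that hunk, of the values they yield for this release:

```python
# Sketch only -- not part of the patch. It re-runs the two expressions from the
# python/setup.py hunk above to show what they evaluate to for release 2.12.0.
jar_version = '2.12.0-3.5'    # package version plus Spark compatibility version
scala_version = '2.13.8'

# Keep only major.minor of the Scala version: '2.13.8' -> '2.13'
scala_compat_version = '.'.join(scala_version.split('.')[:2])
# The Spark compatibility version is the part after the dash: '2.12.0-3.5' -> '3.5'
spark_compat_version = jar_version.split('-')[1]

print(scala_compat_version)   # 2.13
print(spark_compat_version)   # 3.5
```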