
Releases: delta-io/delta-sharing

Delta Sharing 0.3.0

01 Dec 18:19

We are excited to announce the release of Delta Sharing 0.3.0, which introduces the following improvements and bug fixes:

Improvements:

  • Support Azure Blob Storage and Azure Data Lake Gen2 in Delta Sharing Server (#56, #59)
  • Apache Spark Connector can now send the limitHint parameter when a user query uses limit (#55)
  • load_as_pandas in Python Connector now accepts a limit parameter so that users can fetch only a few rows to explore a table (#76)
  • Apache Spark Connector will re-fetch pre-signed URLs before they expire to support long-running queries (#69)
  • Add a new API to list all tables in a share to save network round trips (#63, #66, #67, #88)
  • Add a User-Agent header to requests sent from Apache Spark Connector and Python Connector (#75)
  • Add an optional expirationTime field to Delta Sharing Profile File Format to provide the token expiration time (#77)
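As an illustration of the optional expirationTime field, the following sketch parses a profile and checks whether its token has expired. It is not taken from either connector: the profile values are made up, token_expired is a hypothetical helper, and the timestamp is assumed to be an ISO 8601 value in UTC. The profile file format specification remains the authoritative definition.

```python
import json
from datetime import datetime, timezone

# Made-up profile contents; real profiles are issued by the data provider.
PROFILE_JSON = """
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.com/delta-sharing/",
  "bearerToken": "<token>",
  "expirationTime": "2021-11-12T00:12:29Z"
}
"""


def token_expired(profile, now=None):
    """Return True if the profile has an expirationTime that is in the past.

    A profile without the optional field is treated as not expired.
    """
    raw = profile.get("expirationTime")
    if raw is None:
        return False
    # Assumption: ISO 8601 timestamp in UTC with a trailing "Z".
    expires = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return now >= expires


profile = json.loads(PROFILE_JSON)
print(token_expired(profile, now=datetime(2021, 11, 1, tzinfo=timezone.utc)))
print(token_expired(profile, now=datetime(2021, 12, 1, tzinfo=timezone.utc)))
```

Passing now explicitly keeps the check deterministic; in normal use it can be omitted so that the current UTC time is used.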

Bug fixes:

  • Fix a corner case in which list_all_tables may not return correct results in the Python Connector (#84)

Credits: Denny Lee, Felix Cheung, Lin Zhou, Matei Zaharia, Shixiong Zhu, Will Girten, Xiaotong Sun, Yuhong Chen, kohei-tosshy, William Chau

Delta Sharing 0.2.0

11 Aug 05:31

We are excited to announce the release of Delta Sharing 0.2.0, which introduces the following improvements and fixes multiple issues:

Improvements:

  • Added official Docker images for Delta Sharing Server
  • Added an examples project to show how to try the open Delta Sharing Server (#26)
  • Added the conf directory to the Delta Sharing Server classpath to allow users to add their Hadoop configuration files in the directory (#45)
  • Added retry with exponential backoff for REST requests in the Python connector (#49)
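Retry with exponential backoff can be sketched generically as follows. This is not the connector's actual code: the name retry_with_backoff, its parameters, and the delay values are assumptions chosen for illustration.

```python
import time


def retry_with_backoff(func, max_retries=5, base_delay=0.1, sleep=time.sleep):
    """Call func(), retrying on exception with exponentially growing delays.

    Hypothetical helper: the delay before retry n is base_delay * 2**n, and
    the last exception is re-raised once max_retries retries have been used.
    """
    attempt = 0
    while True:
        try:
            return func()
        except Exception:
            if attempt >= max_retries:
                raise
            sleep(base_delay * (2 ** attempt))
            attempt += 1


# Usage: a fake request that fails twice before succeeding.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []  # record the requested delays instead of actually sleeping
result = retry_with_backoff(flaky_request, sleep=delays.append)
print(result, delays)
```

Injecting the sleep function keeps the example fast to run; a real client would typically also restrict retries to transient errors rather than catching every exception.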

Bug fixes:

  • Added the minimum fsspec requirement in the Python connector (#23)
  • Fixed an issue when files in a table have no stats in the Python connector (#30)
  • Improved error handling in Delta Sharing Server to report 400 Bad Request properly (#32)
  • Fixed the table schema when a table is empty in the Python connector (#37)
  • Fixed KeyError when there are no shared tables in the Python connector (#50)

Credits: Denny Lee, Matei Zaharia, Shixiong Zhu, Yaohua, Yuhong Chen, dobachi

Delta Sharing 0.1.0

26 May 04:59

We are excited to announce the release of Delta Sharing 0.1.0.

Delta Sharing is an open protocol for secure real-time exchange of large datasets, which enables organizations to share data in real time regardless of which computing platforms they use. It is a simple REST protocol that securely shares access to part of a cloud dataset and leverages modern cloud storage systems, such as S3, ADLS, or GCS, to reliably transfer data.
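Because the protocol is plain REST, a client call is an ordinary HTTP request carrying a bearer token. The sketch below builds, but does not send, a request to list shares. The endpoint and token are placeholders, build_list_shares_request is a hypothetical helper, and the /shares path and Authorization header should be checked against the protocol specification.

```python
import urllib.request


def build_list_shares_request(endpoint, bearer_token):
    """Build a GET request for the list-shares call without sending it."""
    url = endpoint.rstrip("/") + "/shares"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {bearer_token}"},
        method="GET",
    )


# Placeholder endpoint and token; real values come from a profile file.
request = build_list_shares_request(
    "https://sharing.example.com/delta-sharing/", "<token>"
)
print(request.get_method(), request.full_url)
```

Against a real server, the request could be sent with urllib.request.urlopen(request) and the JSON response decoded from the body.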

With Delta Sharing, a user accessing shared data can directly connect to it through pandas, Tableau, Apache Spark, Rust, Python, or dozens of other systems that support the open protocol, without having to deploy a specific compute platform first. This makes life simpler for both data providers and consumers. Data providers can share a dataset once to reach a broad range of consumers on any platform, and data consumers can get started using the data in minutes on their existing computing tools.

This repo includes the following components:

  • Delta Sharing protocol specification.
  • Python Connector: A Python library that implements the Delta Sharing Protocol to read shared tables as pandas DataFrames or Apache Spark DataFrames.
  • Apache Spark Connector: An Apache Spark connector that implements the Delta Sharing Protocol to read shared tables from a Delta Sharing Server. The tables can then be accessed in SQL, Python, Java, Scala, or R.
  • Delta Sharing Server: A reference implementation server for the Delta Sharing Protocol for development purposes. Users can deploy this server to share existing tables in Delta Lake and Apache Parquet format on modern cloud storage systems.
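As a rough illustration of the Python Connector, the sketch below builds the table URL that the connector expects (a profile file path, then "#", then share.schema.table) and loads the table with load_as_pandas. The profile path and table coordinates are made up, and table_url and preview_table are hypothetical helpers. The limit argument is only passed when given, since it requires release 0.3.0 or later.

```python
def table_url(profile_path, share, schema, table):
    """Build the '<profile>#<share>.<schema>.<table>' URL for a shared table."""
    return f"{profile_path}#{share}.{schema}.{table}"


def preview_table(profile_path, share, schema, table, limit=None):
    """Load a shared table as a pandas DataFrame, optionally capped at `limit` rows."""
    # Imported lazily so the URL helper works without the package installed.
    import delta_sharing  # pip install delta-sharing

    url = table_url(profile_path, share, schema, table)
    if limit is None:
        return delta_sharing.load_as_pandas(url)
    # The limit parameter is available from Delta Sharing 0.3.0 onwards.
    return delta_sharing.load_as_pandas(url, limit=limit)


# Hypothetical profile path and table coordinates.
print(table_url("config.share", "my_share", "default", "my_table"))
```

With a valid profile file, preview_table("config.share", "my_share", "default", "my_table", limit=10) would return up to ten rows for a quick look at the data.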

See the documentation for more details.