- adding SQL doc #2
Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>
kristoffSC authored and scottsand-db committed Jun 26, 2023
1 parent f382819 commit 9c7fde5
Showing 1 changed file with 20 additions and 12 deletions: connectors/flink/README.md
@@ -331,25 +331,33 @@ using Flink API without Delta Catalog configured will cause SQL job to fail.
| [SQL INSERT](#insert-query) | Supports both streaming and batch modes. |

### Delta Catalog
The delta log is the source of truth for Delta tables, and the Delta Catalog is the only
Flink catalog implementation that enforces this.
It is required for every interaction with Delta tables via the Flink SQL API. If you attempt to use
any catalog other than the Delta Catalog, your SQL query will fail.

At the same time, any other Flink connector (Kafka, Filesystem etc.) can be used with Delta Catalog
(so long as it doesn't have any restrictions of its own). This is achieved by Delta Catalog acting
as a proxy for non-Delta tables.

For Delta tables, however, the Delta Catalog ensures that any DDL operation is reflected in the
underlying Delta table log. In other words, the Delta Catalog ensures that only valid Delta tables
can be created and used by a Flink job.
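
For illustration, a minimal sketch of such a DDL statement is shown below; the table name, schema and
path are hypothetical, and the `'connector' = 'delta'` / `'table-path'` options are assumed to match
the CREATE TABLE options used elsewhere in this connector's SQL documentation.

```sql
-- Hypothetical Delta table. The Delta Catalog records this definition (schema and
-- table properties) in the table's _delta_log rather than in the metastore alone.
CREATE TABLE example_delta_table (
    id BIGINT,
    data STRING
) WITH (
    'connector'  = 'delta',
    'table-path' = '/tmp/example-delta-table'
);
```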

#### Decorated catalog
Delta Catalog is implemented using a decorator pattern: it acts as a wrapper around another
[Catalog](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/catalogs/) implementation.
Currently, we support `in-memory` and `hive` decorated catalogs.
The `in-memory` type is ephemeral and does not persist any data in an external metastore. This means
it is bound to a single session.

The `hive` type is based on Flink's Hive catalog, where metadata is persisted in an external Hive metastore.
In this case, tables defined by user A can be used by user B.

For Delta tables, only minimal information such as the database/table name, connector type
and Delta table file path will be stored in the metastore.
No information about table properties or schema will be stored in the metastore;
the Delta Catalog stores those in `_delta_log`.

For non-Delta tables, Delta Catalog acts as a simple proxy and fully redirects every method call to the decorated catalog.
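
For example, a table backed by a standard Flink connector is simply passed through; a minimal sketch,
with a hypothetical table name and Flink's built-in `datagen` connector:

```sql
-- Not a Delta table: the Delta Catalog forwards this DDL unchanged to the
-- decorated catalog (in-memory or Hive), which stores the full definition.
CREATE TABLE sample_source (
    id BIGINT,
    payload STRING
) WITH (
    'connector' = 'datagen'
);
```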

#### Delta Catalog Configuration
A catalog is created and named by executing the following query:
```sql
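-- A minimal sketch; <catalog_name> and <decorated-catalog> are placeholders, and the
-- 'type'/'catalog-type' options are assumed to follow this connector's catalog DDL.
CREATE CATALOG <catalog_name> WITH (
    'type' = 'delta-catalog',
    'catalog-type' = '<decorated-catalog>'  -- e.g. 'in-memory' or 'hive'
);
USE CATALOG <catalog_name>;
```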