[SPARK-49152][SQL] V2SessionCatalog should use V2Command #47724

Closed
wants to merge 3 commits into from

@amaliujia amaliujia commented Aug 12, 2024

What changes were proposed in this pull request?

V2SessionCatalog should use V2Command when possible.

Why are the changes needed?

The session catalog can be overridden with a custom implementation, so that custom catalog should receive V2 commands. Otherwise, V1 commands still go to the Hive metastore or the built-in session catalog and bypass the custom implementation.
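The routing rule described here can be sketched as a small predicate. This is a minimal, self-contained Scala model, not Spark's source: the classes are hypothetical stand-ins, `customImplConfigured` stands in for whether `V2_SESSION_CATALOG_IMPLEMENTATION` is set, and the `DelegatingCatalogExtension` exemption reflects the review discussion further down this thread.

```scala
// Hypothetical, simplified stand-ins for Spark's catalog types; not the real API.
trait CatalogPlugin { def name: String }

// Stand-in for the built-in session catalog.
class BuiltInSessionCatalog extends CatalogPlugin { def name: String = "spark_catalog" }

// Stand-in for a custom session catalog extending DelegatingCatalogExtension.
class DelegatingCatalogExtension extends CatalogPlugin { def name: String = "spark_catalog" }

// A custom session catalog that does NOT extend DelegatingCatalogExtension.
class CustomSessionCatalog extends CatalogPlugin { def name: String = "spark_catalog" }

// Any catalog other than the session catalog.
class OtherCatalog extends CatalogPlugin { def name: String = "other" }

object Routing {
  val SessionCatalogName = "spark_catalog"

  // customImplConfigured models whether V2_SESSION_CATALOG_IMPLEMENTATION is set.
  // V1 commands are kept only for the session catalog, and only when no custom
  // implementation is configured or the configured one is a delegating extension.
  def supportsV1Command(catalog: CatalogPlugin, customImplConfigured: Boolean): Boolean =
    catalog.name == SessionCatalogName &&
      (!customImplConfigured || catalog.isInstanceOf[DelegatingCatalogExtension])
}
```

With this model, `Routing.supportsV1Command(new CustomSessionCatalog, customImplConfigured = true)` is `false`, so such a catalog would be handed V2 commands instead of having V1 commands go around it.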

Does this PR introduce any user-facing change?

NO

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

NO

Closes apache#47660 from amaliujia/create_table_v2.

Authored-by: Rui Wang <rui.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@github-actions github-actions bot added the SQL label Aug 12, 2024
@amaliujia

@cloud-fan

```scala
ShowTablesCommand(Some(db), pattern, output)

case ShowTableExtended(
DatabaseInSessionCatalog(db),
ResolvedV1Database(db),
```

Suggested change
ResolvedV1Database(db),
ResolvedV1Database(db),

@cloud-fan

thanks, merging to 3.5!

cloud-fan added a commit that referenced this pull request Aug 13, 2024
### What changes were proposed in this pull request?

V2SessionCatalog should use V2Command when possible.

### Why are the changes needed?

The session catalog can be overridden with a custom implementation, so that custom catalog should receive V2 commands. Otherwise, V1 commands still go to the Hive metastore or the built-in session catalog and bypass the custom implementation.

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

Existing tests.

### Was this patch authored or co-authored using generative AI tooling?

NO

Closes #47724 from amaliujia/branch-3.5.

Lead-authored-by: Rui Wang <rui.wang@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan cloud-fan closed this Aug 13, 2024
```diff
@@ -579,6 +581,18 @@ class ResolveSessionCatalog(val catalogManager: CatalogManager)
  }

  object ResolvedV1Identifier {
    def unapply(resolved: LogicalPlan): Option[TableIdentifier] = resolved match {
      case ResolvedIdentifier(catalog, ident) if supportsV1Command(catalog) =>
```

@amaliujia @cloud-fan
This change appears to have broken creating V1 tables through a custom session catalog configured via V2_SESSION_CATALOG_IMPLEMENTATION, such as Iceberg's SparkSessionCatalog:

```scala
// For CREATE TABLE [AS SELECT], we should use the v1 command if the catalog is resolved to the
// session catalog and the table provider is not v2.
case c @ CreateTable(ResolvedV1Identifier(ident), _, _, tableSpec: TableSpec, _) =>
  val (storageFormat, provider) = getStorageFormatAndProvider(
    c.tableSpec.provider, tableSpec.options, c.tableSpec.location, c.tableSpec.serde,
    ctas = false)
  if (!isV2Provider(provider)) {
    constructV1TableCmd(None, c.tableSpec, ident, c.tableSchema, c.partitioning,
      c.ignoreIfExists, storageFormat, provider)
  } else {
    c
  }
```
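The quoted rule reduces to a two-input decision. The Scala sketch below is a toy model, not Spark's code; the names, including the hypothetical `v2Providers` set, are assumptions. Once a custom session catalog that does not extend DelegatingCatalogExtension is configured, `ResolvedV1Identifier` no longer matches, so even a `USING parquet` table is planned as a v2 `CreateTable` and handed to that catalog, which is consistent with the breakage reported above.

```scala
// Hypothetical model of the CREATE TABLE planning decision; not Spark's source.
sealed trait PlannedCommand
case object V1CreateTableCommand extends PlannedCommand
case object V2CreateTable extends PlannedCommand

object CreateTablePlanning {
  // Hypothetical: the providers this toy model treats as DataSource V2.
  val v2Providers: Set[String] = Set("iceberg")

  def isV2Provider(provider: String): Boolean = v2Providers.contains(provider.toLowerCase)

  // identifierIsV1 models whether ResolvedV1Identifier matches, i.e. whether the
  // identifier resolved to a session catalog that still supports V1 commands.
  def plan(identifierIsV1: Boolean, provider: String): PlannedCommand =
    if (identifierIsV1 && !isV2Provider(provider)) V1CreateTableCommand
    else V2CreateTable
}
```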

Does Iceberg catalog extend DelegatingCatalogExtension?

We do want to use v2 commands for custom catalogs that do not extend DelegatingCatalogExtension

@manuzhang manuzhang Sep 19, 2024

> Does Iceberg catalog extend DelegatingCatalogExtension?

Nope.

> We do want to use v2 commands for custom catalogs that do not extend DelegatingCatalogExtension

Even so, is it the right time to introduce such a behavior change in a bug fix release?

We can consider it a bug. People implementing the DS V2 catalog APIs expect to receive v2 commands so they can customize table behavior. And there is a backdoor: DelegatingCatalogExtension.

For Iceberg, shouldn't it be easy to work around this by extending DelegatingCatalogExtension? The Iceberg catalog can keep all its methods unchanged and simply not use the delegate.
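The suggested workaround can be sketched in Scala as follows. The types are hypothetical, simplified stand-ins (not Spark's or Iceberg's real classes), and `IcebergLikeSessionCatalog` is an illustrative name: the catalog extends the delegating base class only so that Spark treats it as delegating, while overriding every method and never reading the delegate.

```scala
// Hypothetical, simplified stand-ins; not Spark's or Iceberg's real classes.
trait TableCatalog {
  def loadTable(ident: String): String
}

// Simplified model of DelegatingCatalogExtension: by default every call is
// forwarded to the delegate (the built-in session catalog).
abstract class DelegatingCatalogExtension extends TableCatalog {
  private var delegate: TableCatalog = null
  def setDelegateCatalog(d: TableCatalog): Unit = delegate = d
  def loadTable(ident: String): String = delegate.loadTable(ident)
}

// The workaround: extend the delegating base class, but override every method
// with the catalog's own logic so the delegate is never used.
class IcebergLikeSessionCatalog extends DelegatingCatalogExtension {
  override def loadTable(ident: String): String = s"custom:$ident"
}
```

Because the overridden methods never touch the delegate, the catalog works even if no delegate is set; the catch, raised in the next comment, is that Scala and Java allow only one base class.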

Iceberg's SparkSessionCatalog already extends a base class. There's no easy way to extend DelegatingCatalogExtension without a major refactoring.

We need to make either the Iceberg BaseCatalog or the Spark DelegatingCatalogExtension an interface. Would it be easier to make BaseCatalog an interface?
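Why turning one side into an interface helps: a class can extend only one base class but can implement (or, in Scala, mix in) any number of interfaces. A minimal Scala sketch, assuming hypothetical stand-ins for both projects' types; in Java the trait would be an interface with default methods.

```scala
// Hypothetical stand-ins; not the real Spark or Iceberg classes.
trait CatalogPlugin { def name: String }

// Spark side: a class, so a catalog can extend at most one such base class.
abstract class DelegatingCatalogExtension extends CatalogPlugin

// Iceberg side, if the shared base were a trait (an interface in Java) instead of a class.
trait BaseCatalog extends CatalogPlugin {
  // Hypothetical shared behavior previously inherited from a base class.
  def describe: String = s"catalog $name"
}

// The session catalog can then extend Spark's class and mix in the shared base.
class SparkSessionCatalogSketch extends DelegatingCatalogExtension with BaseCatalog {
  def name: String = "spark_catalog"
}
```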

dongjoon-hyun pushed a commit that referenced this pull request Sep 26, 2024
### What changes were proposed in this pull request?

This PR updates `DelegatingCatalogExtension` so that it is easier to extend:
- `initialize` is no longer final, so subclasses can override it
- `delegate` becomes `protected`, so subclasses can access it
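The two extension points above can be illustrated with a small Scala model. This is a hedged sketch, not Spark's source: the classes are hypothetical stand-ins, options are modeled as a plain `Map` rather than Spark's option map, and `FilteringCatalog` and `FixedCatalog` are illustrative names.

```scala
// Hypothetical, simplified model of the extension points; not Spark's source.
trait TableCatalog {
  def initialize(name: String, options: Map[String, String]): Unit
  def listTables(namespace: String): Seq[String]
}

abstract class DelegatingCatalogExtension extends TableCatalog {
  // After the change: protected, so subclasses can use it directly.
  protected var delegate: TableCatalog = null
  def setDelegateCatalog(d: TableCatalog): Unit = delegate = d
  // After the change: no longer final, so subclasses can override it.
  def initialize(name: String, options: Map[String, String]): Unit = ()
  def listTables(namespace: String): Seq[String] = delegate.listTables(namespace)
}

// Hypothetical subclass using both extension points.
class FilteringCatalog extends DelegatingCatalogExtension {
  private var catalogName: String = ""
  override def initialize(name: String, options: Map[String, String]): Unit = {
    super.initialize(name, options)
    catalogName = name
  }
  // Calls the protected delegate and hides tables with a "tmp_" prefix.
  override def listTables(namespace: String): Seq[String] =
    delegate.listTables(namespace).filterNot(_.startsWith("tmp_"))
  def configuredName: String = catalogName
}

// Trivial in-memory delegate for the usage example.
class FixedCatalog(tables: Seq[String]) extends TableCatalog {
  def initialize(name: String, options: Map[String, String]): Unit = ()
  def listTables(namespace: String): Seq[String] = tables
}
```

Usage: after `initialize("spark_catalog", Map.empty)` and `setDelegateCatalog(new FixedCatalog(Seq("a", "tmp_b", "c")))`, `listTables("db")` on a `FilteringCatalog` yields `Seq("a", "c")`.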

In addition, this PR fixes a mistake: `DelegatingCatalogExtension` is just a convenient default implementation; it is actually the `CatalogExtension` interface that indicates a catalog implementation delegates requests to the Spark session catalog. #47724 should have used `CatalogExtension` instead.
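The difference between the two checks can be modeled in a few lines of Scala. The types are hypothetical stand-ins mirroring the interface/class relationship described above, not Spark's actual hierarchy: a catalog that implements `CatalogExtension` directly (for example, because it already extends another base class) fails a `DelegatingCatalogExtension` check but passes a `CatalogExtension` check.

```scala
// Hypothetical stand-ins mirroring the relationship described above; not Spark's source.
trait CatalogPlugin
// The interface that signals "this catalog delegates to the Spark session catalog".
trait CatalogExtension extends CatalogPlugin
// A convenient default implementation of that interface.
abstract class DelegatingCatalogExtension extends CatalogExtension

// Implements the interface directly, e.g. because it already extends another base class.
class InterfaceOnlyCatalog extends CatalogExtension
// Extends the convenience class.
class ConvenienceCatalog extends DelegatingCatalogExtension
// A custom catalog that does not delegate at all.
class PlainCustomCatalog extends CatalogPlugin

object DelegationCheck {
  // The check as described for #47724: only the convenience class qualifies.
  def before(c: CatalogPlugin): Boolean = c.isInstanceOf[DelegatingCatalogExtension]
  // The check after this fix: any CatalogExtension implementation qualifies.
  def after(c: CatalogPlugin): Boolean = c.isInstanceOf[CatalogExtension]
}
```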

### Why are the changes needed?

Unblock the Iceberg extension.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #48257 from cloud-fan/catalog.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Sep 26, 2024
(cherry picked from commit 339dd5b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>