[SPARK-49152][SQL] V2SessionCatalog should use V2Command #47724
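### What changes were proposed in this pull request?
V2SessionCatalog should use V2Command when possible.

### Why are the changes needed?
This is because the session catalog can be overwritten thus the overwritten's catalog should use v2 commands, otherwise the V1Command will still call hive metastore or the built-in session catalog.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
NO

Closes apache#47660 from amaliujia/create_table_v2.

Authored-by: Rui Wang <rui.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>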
ShowTablesCommand(Some(db), pattern, output)

case ShowTableExtended(
DatabaseInSessionCatalog(db),
ResolvedV1Database(db),
ResolvedV1Database(db),
ResolvedV1Database(db),
…/ResolveSessionCatalog.scala
thanks, merging to 3.5!
### What changes were proposed in this pull request?
V2SessionCatalog should use V2Command when possible.

### Why are the changes needed?
This is because the session catalog can be overwritten thus the overwritten's catalog should use v2 commands, otherwise the V1Command will still call hive metastore or the built-in session catalog.

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
NO

Closes #47724 from amaliujia/branch-3.5.

Lead-authored-by: Rui Wang <rui.wang@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@@ -579,6 +581,18 @@ class ResolveSessionCatalog(val catalogManager: CatalogManager)
  }

  object ResolvedV1Identifier {
    def unapply(resolved: LogicalPlan): Option[TableIdentifier] = resolved match {
      case ResolvedIdentifier(catalog, ident) if supportsV1Command(catalog) =>
@amaliujia @cloud-fan
This change appears to have broken creating a V1 table from a `V2_SESSION_CATALOG_IMPLEMENTATION` such as Iceberg's SparkSessionCatalog:
spark/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
Lines 159 to 170 in e693e18
    // For CREATE TABLE [AS SELECT], we should use the v1 command if the catalog is resolved to the
    // session catalog and the table provider is not v2.
    case c @ CreateTable(ResolvedV1Identifier(ident), _, _, tableSpec: TableSpec, _) =>
      val (storageFormat, provider) = getStorageFormatAndProvider(
        c.tableSpec.provider, tableSpec.options, c.tableSpec.location, c.tableSpec.serde,
        ctas = false)
      if (!isV2Provider(provider)) {
        constructV1TableCmd(None, c.tableSpec, ident, c.tableSchema, c.partitioning,
          c.ignoreIfExists, storageFormat, provider)
      } else {
        c
      }
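For readers following the thread, the rule quoted above can be read as a two-condition check: the plan is rewritten to a v1 command only when the catalog is eligible for v1 commands and the provider is not a v2 source. The sketch below models that shape with toy types. `CreateTableRouting`, `CreateTableV2`, `CreateTableV1`, and the `v2Providers` set are hypothetical stand-ins, not Spark classes, and the eligibility check is passed in as a plain function rather than taken from Spark.

```scala
object CreateTableRouting {
  // Hypothetical stand-ins for plan nodes; these are NOT Spark's classes.
  sealed trait Plan
  final case class CreateTableV2(catalog: String, table: String, provider: String) extends Plan
  final case class CreateTableV1(table: String, provider: String) extends Plan

  // Assumption for this toy model: which providers count as v2 sources.
  val v2Providers: Set[String] = Set("iceberg")

  def isV2Provider(provider: String): Boolean =
    v2Providers.contains(provider.toLowerCase)

  // Rewrite to the v1 command only if the catalog is eligible for v1 commands
  // AND the provider is not a v2 source; otherwise leave the v2 plan untouched.
  def resolve(plan: Plan, supportsV1Command: String => Boolean): Plan = plan match {
    case CreateTableV2(catalog, table, provider)
        if supportsV1Command(catalog) && !isV2Provider(provider) =>
      CreateTableV1(table, provider)
    case other => other
  }
}
```

In this model, tightening `supportsV1Command` is enough to stop a catalog from ever reaching the v1 branch, which is the kind of behavior change the comment above is reporting.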
Does the Iceberg catalog extend `DelegatingCatalogExtension`?
We do want to use v2 commands for custom catalogs that do not extend DelegatingCatalogExtension
> Does Iceberg catalog extend DelegatingCatalogExtension?
Nope.
> We do want to use v2 commands for custom catalogs that do not extend DelegatingCatalogExtension
Even so, is it the right time to introduce such a behavior change in a bug fix release?
We can consider it a bug. People implementing the DS V2 catalog APIs expect to see v2 commands so that they can customize table behavior. And there is a backdoor: `DelegatingCatalogExtension`.
For Iceberg, it should be easy to work around this by extending `DelegatingCatalogExtension`? The Iceberg catalog can still keep all its methods unchanged and simply not use the `delegate`.
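A minimal sketch of the eligibility check being discussed, under stated assumptions: the types below are toy stand-ins, the `sessionCatalogOverridden` flag is a hypothetical simplification of reading the session catalog configuration, and the real `supportsV1Command` in Spark may be written differently. It only illustrates the behavior described in this thread, where an overriding catalog keeps v1 commands only if it extends `DelegatingCatalogExtension`.

```scala
object V1CommandEligibility {
  // Toy stand-ins; the real Spark types carry more methods.
  trait CatalogPlugin { def name: String }
  abstract class DelegatingCatalogExtension extends CatalogPlugin

  // Hypothetical override that opts in to the "backdoor".
  final class DelegatingCustomCatalog extends DelegatingCatalogExtension {
    override val name: String = "spark_catalog"
  }

  // Hypothetical override that already has its own base class, so it
  // cannot also extend the DelegatingCatalogExtension class.
  abstract class ThirdPartyBase
  final class StandaloneCustomCatalog extends ThirdPartyBase with CatalogPlugin {
    override val name: String = "spark_catalog"
  }

  // Assumed shape of the check: v1 commands are allowed for the session
  // catalog only when it is not overridden, or when the override extends
  // DelegatingCatalogExtension.
  def supportsV1Command(catalog: CatalogPlugin, sessionCatalogOverridden: Boolean): Boolean =
    catalog.name == "spark_catalog" &&
      (!sessionCatalogOverridden || catalog.isInstanceOf[DelegatingCatalogExtension])
}
```

Under these assumptions, `StandaloneCustomCatalog` is always routed to v2 commands once it replaces the session catalog, which matches the behavior change raised in the comments above.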
Iceberg's SparkSessionCatalog already extends a base class. There's no easy way to extend `DelegatingCatalogExtension` without a major refactoring.
We need to make either the Iceberg `BaseCatalog` or the Spark `DelegatingCatalogExtension` an interface. It looks easier to make `BaseCatalog` an interface?
### What changes were proposed in this pull request?
This PR updates `DelegatingCatalogExtension` so that it's more extendable
- `initialize` becomes not final, so that sub-classes can overwrite it
- `delegate` becomes `protected`, so that sub-classes can access it

In addition, this PR fixes a mistake that `DelegatingCatalogExtension` is just a convenient default implementation, it's actually the `CatalogExtension` interface that indicates this catalog implementation will delegate requests to the Spark session catalog. #47724 should use `CatalogExtension` instead.

### Why are the changes needed?
Unblock the Iceberg extension.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
existing tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #48257 from cloud-fan/catalog.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 339dd5b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
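The two extension paths described in the follow-up commit message can be sketched with toy types. Everything below is a simplified model: the trait and class names mirror the Spark names for readability, but the signatures are assumptions, and `PrefixingCatalog`, `ExistingBase`, `CustomSessionCatalog`, and `delegatesToSessionCatalog` are hypothetical names introduced only for this illustration.

```scala
object ExtendableCatalogSketch {
  // Toy stand-ins; the real Spark interfaces have more methods and other signatures.
  trait CatalogPlugin {
    def initialize(name: String, options: Map[String, String]): Unit
    def listTables(): Seq[String]
  }

  // The interface is what signals "this catalog delegates to the session catalog".
  trait CatalogExtension extends CatalogPlugin {
    def setDelegateCatalog(catalog: CatalogPlugin): Unit
  }

  // Convenience default: `initialize` is overridable and `delegate` is protected.
  abstract class DelegatingCatalogExtension extends CatalogExtension {
    protected var delegate: CatalogPlugin = _
    override def setDelegateCatalog(catalog: CatalogPlugin): Unit = { delegate = catalog }
    override def initialize(name: String, options: Map[String, String]): Unit = ()
    override def listTables(): Seq[String] = delegate.listTables()
  }

  final class BuiltInCatalog extends CatalogPlugin {
    override def initialize(name: String, options: Map[String, String]): Unit = ()
    override def listTables(): Seq[String] = Seq("builtin_t")
  }

  // Path 1: subclass the convenience class, override `initialize`,
  // and read the protected `delegate` directly.
  final class PrefixingCatalog extends DelegatingCatalogExtension {
    private var prefix: String = ""
    override def initialize(name: String, options: Map[String, String]): Unit = {
      prefix = options.getOrElse("prefix", "")
    }
    override def listTables(): Seq[String] = delegate.listTables().map(prefix + _)
  }

  // Path 2: a catalog that already has a base class implements the interface instead.
  abstract class ExistingBase
  final class CustomSessionCatalog extends ExistingBase with CatalogExtension {
    private var sessionCatalog: CatalogPlugin = _
    override def setDelegateCatalog(catalog: CatalogPlugin): Unit = { sessionCatalog = catalog }
    override def initialize(name: String, options: Map[String, String]): Unit = ()
    override def listTables(): Seq[String] = sessionCatalog.listTables()
  }

  // Keying the check on the interface accepts both paths.
  def delegatesToSessionCatalog(catalog: CatalogPlugin): Boolean =
    catalog.isInstanceOf[CatalogExtension]
}
```

Checking for the interface rather than the convenience class is what lets a catalog with a fixed superclass still be recognized as delegating, without the refactoring discussed earlier in the thread.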
### What changes were proposed in this pull request?
V2SessionCatalog should use V2Command when possible.

### Why are the changes needed?
The session catalog can be overridden. When it is, the overriding catalog should use v2 commands; otherwise the V1Command will still call the Hive metastore or the built-in session catalog.

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
NO