Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-49183][SQL] V2SessionCatalog.createTable should respect PROP_IS_MANAGED_LOCATION #47684

Closed
wants to merge 2 commits into from

Conversation

cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

Even if the table definition has a location, the table should be a managed table if PROP_IS_MANAGED_LOCATION is specified, in V2SessionCatalog.createTable.

Why are the changes needed?

It's a bug fix. A custom spark_catalog may generate custom location for managed table and delegate the actual table creation to V2SessionCatalog. The table should still be a managed table if PROP_IS_MANAGED_LOCATION is specified.

Does this PR introduce any user-facing change?

Yes, now users who use custom spark_catalog that generates custom location for managed table, can correctly create managed tables.

How was this patch tested?

a new test

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions github-actions bot added the SQL label Aug 9, 2024
@cloud-fan
Copy link
Contributor Author

@yaooqinn @amaliujia

@yaooqinn
Copy link
Member

yaooqinn commented Aug 9, 2024

Do we need to update the API doc to make it more precise?

 /**
   * A reserved property to indicate that the table location is managed, not user-specified.
   * If this property is "true", SHOW CREATE TABLE will not generate the LOCATION clause.
   */

@@ -170,7 +170,8 @@ class V2SessionCatalog(catalog: SessionCatalog)
val storage = DataSource.buildStorageFormatFromOptions(toOptions(tableProperties))
.copy(locationUri = location.map(CatalogUtils.stringToURI))
val isExternal = properties.containsKey(TableCatalog.PROP_EXTERNAL)
val tableType = if (isExternal || location.isDefined) {
val isManagedLocation = properties.containsKey(TableCatalog.PROP_IS_MANAGED_LOCATION)
Copy link
Member

@yaooqinn yaooqinn Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to test if the value is true or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed how PROP_EXTERNAL is handled.

@@ -120,7 +120,7 @@ case class ShowCreateTableExec(
private def showTableLocation(table: Table, builder: StringBuilder): Unit = {
val isManagedOption = Option(table.properties.get(TableCatalog.PROP_IS_MANAGED_LOCATION))
// Only generate LOCATION clause if it's not managed.
if (isManagedOption.forall(_.equalsIgnoreCase("false"))) {
if (isManagedOption.isEmpty || !isManagedOption.get.equalsIgnoreCase("true")) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small fix to match the document. true means effective, non true means not effective.

@@ -52,7 +52,8 @@ public interface TableCatalog extends CatalogPlugin {

/**
* A reserved property to indicate that the table location is managed, not user-specified.
* If this property is "true", SHOW CREATE TABLE will not generate the LOCATION clause.
* If this property is "true", it means it's a managed table even if it has a location. As an
* example, SHOW CREATE TABLE will not generate the LOCATION clause.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to also document non true case?

Is non true value invalid, or it is valid but ignored, etc.?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The non-true case is hard to document, as treating tables with location as external tables is a Hive-specific behavior.

sql(s"CREATE TABLE t (i INT) USING $v2Format")
val table = catalog(SESSION_CATALOG_NAME).asTableCatalog
.loadTable(Identifier.of(Array("default"), "t"))
assert(!table.properties().containsKey(TableCatalog.PROP_EXTERNAL))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to also test SHOW CREATE TABLE or DESC TABLE that this is a managed table without a location?

Copy link
Contributor Author

@cloud-fan cloud-fan Aug 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v2 table property is already the source of truth.

Copy link
Contributor

@amaliujia amaliujia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cloud-fan
Copy link
Contributor Author

thanks for the review, merging to master/3.5!

@cloud-fan cloud-fan closed this in ed04e16 Aug 12, 2024
cloud-fan added a commit that referenced this pull request Aug 12, 2024
…S_MANAGED_LOCATION

Even if the table definition has a location, the table should be a managed table if `PROP_IS_MANAGED_LOCATION` is specified, in `V2SessionCatalog.createTable`.

It's a bug fix. A custom `spark_catalog` may generate custom location for managed table and delegate the actual table creation to `V2SessionCatalog`. The table should still be a managed table if `PROP_IS_MANAGED_LOCATION` is specified.

Yes, now users who use custom `spark_catalog` that generates custom location for managed table, can correctly create managed tables.

a new test

no

Closes #47684 from cloud-fan/table.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit ed04e16)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@ulysses-you
Copy link
Contributor

it seems the branch-3.5 CI failed..

[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala:3401:24: type Map is not a member of package org.apache.spark.sql.connector.util
[error]       properties: util.Map[String, String]): Table = {
[error]                        ^
[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala:3402:29: type HashMap is not a member of package org.apache.spark.sql.connector.util
[error]     val newProps = new util.HashMap[String, String]
[error]                             ^

@cloud-fan
Copy link
Contributor Author

fixing at #47722

dongjoon-hyun added a commit that referenced this pull request Aug 12, 2024
### What changes were proposed in this pull request?

fix compilation error caused by #47684 . Seems scala 2.12 has some subtle differences in the import priority.

### Why are the changes needed?

N/A

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

pass compilation

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #47722 from cloud-fan/fix.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024
…S_MANAGED_LOCATION

### What changes were proposed in this pull request?

Even if the table definition has a location, the table should be a managed table if `PROP_IS_MANAGED_LOCATION` is specified, in `V2SessionCatalog.createTable`.

### Why are the changes needed?

It's a bug fix. A custom `spark_catalog` may generate custom location for managed table and delegate the actual table creation to `V2SessionCatalog`. The table should still be a managed table if `PROP_IS_MANAGED_LOCATION` is specified.

### Does this PR introduce _any_ user-facing change?

Yes, now users who use custom `spark_catalog` that generates custom location for managed table, can correctly create managed tables.

### How was this patch tested?

a new test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes apache#47684 from cloud-fan/table.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…S_MANAGED_LOCATION

### What changes were proposed in this pull request?

Even if the table definition has a location, the table should be a managed table if `PROP_IS_MANAGED_LOCATION` is specified, in `V2SessionCatalog.createTable`.

### Why are the changes needed?

It's a bug fix. A custom `spark_catalog` may generate custom location for managed table and delegate the actual table creation to `V2SessionCatalog`. The table should still be a managed table if `PROP_IS_MANAGED_LOCATION` is specified.

### Does this PR introduce _any_ user-facing change?

Yes, now users who use custom `spark_catalog` that generates custom location for managed table, can correctly create managed tables.

### How was this patch tested?

a new test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes apache#47684 from cloud-fan/table.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
…S_MANAGED_LOCATION

### What changes were proposed in this pull request?

Even if the table definition has a location, the table should be a managed table if `PROP_IS_MANAGED_LOCATION` is specified, in `V2SessionCatalog.createTable`.

### Why are the changes needed?

It's a bug fix. A custom `spark_catalog` may generate custom location for managed table and delegate the actual table creation to `V2SessionCatalog`. The table should still be a managed table if `PROP_IS_MANAGED_LOCATION` is specified.

### Does this PR introduce _any_ user-facing change?

Yes, now users who use custom `spark_catalog` that generates custom location for managed table, can correctly create managed tables.

### How was this patch tested?

a new test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes apache#47684 from cloud-fan/table.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants