
Mixed usage of snapshotCreationTs, metadataCommitTs & tableAccessTs when using REST Catalog #11103

Closed
haizhou-zhao opened this issue Sep 10, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@haizhou-zhao
Contributor

haizhou-zhao commented Sep 10, 2024

Apache Iceberg version

1.6.1 (latest release)

Query engine

Spark

Please describe the bug 🐞

Background

When using Spark (or similar execution engines) on top of a REST Catalog to commit a new snapshot, there are 3 critical timestamps:

  1. When Spark invokes SnapshotProducer to produce a new snapshot - referred to as snapshotCreationTs: ref
  2. When the REST catalog receives the update request from Spark and commits the new metadata - referred to as metadataCommitTs
  3. When the commit is finished and Spark refreshes the table state - referred to as tableAccessTs: ref
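For orientation, here is a rough timeline of one commit through a REST catalog as a runnable Java sketch (the variable names are descriptive placeholders, not Iceberg identifiers):

```java
public class CommitTimeline {
  public static void main(String[] args) {
    // 1. Client side: SnapshotProducer stamps the new snapshot.
    long snapshotCreationTs = System.currentTimeMillis();

    // ... the client sends the table update to the REST catalog server ...
    // 2. Server side: the REST catalog commits the new metadata file and stamps
    //    last-updated-ms (metadataCommitTs); the client does not control this value.

    // 3. Client side: after the commit succeeds, Spark refreshes the table state.
    long tableAccessTs = System.currentTimeMillis();

    // With Hadoop/Hive catalogs, snapshotCreationTs and metadataCommitTs are the
    // same value by construction; with a REST catalog, all three can differ.
    System.out.println(snapshotCreationTs + " <= " + tableAccessTs);
  }
}
```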

Desired behavior

spark.sql("SELECT * from ${db}.${table}.metadata_log_entries") means the user is looking for metadataCommitTs;

while spark.sql("SELECT * from ${db}.${table}.snapshots") means the user is looking for snapshotCreationTs.
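Concretely, a user might run something like this (a minimal sketch; `spark` is an existing SparkSession, `db.tbl` is a placeholder table, and the column names follow the documented Iceberg metadata tables):

```java
import org.apache.spark.sql.SparkSession;

public class MetadataTableQueries {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().getOrCreate();

    // "timestamp" in metadata_log_entries should reflect metadataCommitTs.
    spark.sql("SELECT timestamp, file, latest_snapshot_id FROM db.tbl.metadata_log_entries").show();

    // "committed_at" in snapshots should reflect snapshotCreationTs.
    spark.sql("SELECT committed_at, snapshot_id, operation FROM db.tbl.snapshots").show();
  }
}
```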

Issue 1

Traditionally, with the Hadoop and Hive Catalogs, Spark (using the Iceberg client) is responsible for both generating new snapshots and committing the new metadata files, and those catalogs can enforce that snapshotCreationTs and metadataCommitTs take on the exact same value for every new snapshot committed. However, with the REST Catalog, Spark only controls new snapshot generation while the REST Catalog server controls the metadata commit, so depending on the REST Catalog implementation (which this repo does not control), snapshotCreationTs and metadataCommitTs may or may not be the same.

The current implementation in some cases assumes that the two timestamps take on the exact same value. For example, this integration test: ref.

It would be best to clarify whether a valid REST Catalog implementation should always enforce snapshotCreationTs and metadataCommitTs to take on the same value. For the current reference implementation (RESTCatalogAdapter on top of JdbcCatalog), the answer is positive (they take on the same value). However, it is unclear whether this should be (or can feasibly be) enforced for all REST Catalog implementations.

Issue 2

Currently, when loading a table from the REST Catalog using LoadTableResponse, the lastUpdatedMillis attribute of the metadata (which, depending on implementation details, may carry the value of metadataCommitTs or snapshotCreationTs on the REST Catalog side) is incorrectly replaced by tableAccessTs (ref1, ref2). Because Spark depends on lastUpdatedMillis to produce the latest metadataCommitTs in metadata_log_entries (ref), there will always be a wrong timestamp in metadata_log_entries when the REST Catalog is used.
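A simplified Java sketch of the problematic path (paraphrasing ref1/ref2 above, not the exact source):

```java
import org.apache.iceberg.TableMetadata;

class LoadTablePath {
  // Sketch: attach the metadata file location to metadata parsed from the response.
  static TableMetadata withLocation(TableMetadata fromServer, String metadataLocation) {
    // Rebuilding the metadata registers a change on the builder, and building
    // with pending changes re-stamps last-updated-ms via System.currentTimeMillis()
    // on the client, so the server-provided value is replaced by tableAccessTs.
    return TableMetadata.buildFrom(fromServer)
        .withMetadataLocation(metadataLocation)
        .build();
  }
}
```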

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
@haizhou-zhao haizhou-zhao added the bug Something isn't working label Sep 10, 2024
@haizhou-zhao haizhou-zhao changed the title RESTSessionCatalog loadTable incorrect set last access time as metadata last update time Mixed usage of snapshotCreationTs, metadataCommitTs & tableAccessTs when using REST Catalog Sep 10, 2024
@haizhou-zhao
Contributor Author

cc: @rdblue (since you authored the original LoadTableResponse class)

This message is centered on Issue 2. Summarizing the problem: when obtaining TableMetadata from LoadTableResponse, the TableMetadata is reconstructed, during which the last-update-ms information is lost (and replaced with System.currentTimeMillis() on the client side). This has an impact because last-update-ms is returned as part of the response for spark.sql("SELECT * from ${db}.${table}.metadata_log_entries").

Code ref:
https://github.com/apache/iceberg/blob/afda8be/core/src/main/java/org/apache/iceberg/rest/responses/LoadTableResponse.java#L64
https://github.com/apache/iceberg/blob/113c6e7/core/src/main/java/org/apache/iceberg/TableMetadata.java#L938

There are 2 ways forward that I could think of:

  1. When obtaining TableMetadata from LoadTableResponse, don't re-build metadata, just return the underlying metadata as-is
  2. When rebuilding TableMetadata, preserve last-update-ms field.

Let me know if the issue and the two proposed options make sense, or if there's a better way forward (rough sketches of both options below). Thanks!
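For concreteness, a rough Java sketch of both options; note that option 2 needs a way to carry the timestamp over, and `preservingLastUpdatedMillis` below is a hypothetical method name, not existing API:

```java
import org.apache.iceberg.TableMetadata;

class ProposedFixes {
  // Option 1 (sketch): return the parsed metadata as-is, without rebuilding.
  static TableMetadata option1(TableMetadata fromServer) {
    return fromServer;
  }

  // Option 2 (sketch): rebuild to attach the metadata location, but preserve
  // the server's last-update-ms.
  static TableMetadata option2(TableMetadata fromServer, String metadataLocation) {
    return TableMetadata.buildFrom(fromServer)
        .withMetadataLocation(metadataLocation)
        // .preservingLastUpdatedMillis(fromServer.lastUpdatedMillis()) // hypothetical API
        .build();
  }
}
```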

@rdblue
Contributor

rdblue commented Sep 10, 2024

Why are these timestamps important? They are primarily for debugging and should not affect any correctness.

@haizhou-zhao
Contributor Author

Why are these timestamps important? They are primarily for debugging and should not affect any correctness.

Correct, these timestamps won't impact critical operations like commits. Functionally, the only effect is that Spark users might not get the correct last-modified time for the latest metadata commit.

The issue pops up when running this Spark integration test on the REST Catalog (RESTCatalogAdapter on top of JdbcCatalog): https://github.com/apache/iceberg/blob/2240154/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMetadataTables.java#L446. This test validates that the latest metadata commit time equals the latest snapshot commit time (if the latest change is a new snapshot commit). The Hadoop and Hive Catalogs enforce this equality, but the REST Catalog does not (because whatever last-updated-ms value the REST server returns, the client replaces it with a client-side timestamp immediately after the HTTPS call, before returning to the Spark code stack).

The integration test above implies that there is a correctness constraint on last-updated-ms (that it has to equal the timestamp-ms of the corresponding snapshot commit). However, if no correctness rule should be enforced on last-updated-ms, then we might need to consider changing the integration test (i.e., dropping or loosening the equality check, as sketched below).
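One way to loosen the check would be to assert ordering rather than exact equality; a hypothetical standalone test (the timestamp values are stand-ins, not from a real run):

```java
import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.api.Test;

class LoosenedTimestampCheck {
  @Test
  void metadataCommitNotBeforeSnapshotCreation() {
    long snapshotCreationTs = 1_700_000_000_000L; // stand-in for snapshots.committed_at
    long metadataCommitTs = 1_700_000_000_250L; // stand-in for metadata_log_entries.timestamp

    // With a REST catalog, the metadata commit may land at or after snapshot
    // creation, so assert ordering instead of exact equality.
    assertThat(metadataCommitTs).isGreaterThanOrEqualTo(snapshotCreationTs);
  }
}
```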

@haizhou-zhao
Contributor Author

haizhou-zhao commented Sep 11, 2024

I realize I've been adding many thought threads that don't directly relate to this problem, making it hard for people to read.

The problem really is: the current RESTCatalog client implementation will NOT honor the last-updated-ms field passed back by the REST server, and instead overrides it with a client-side timestamp before passing it back to the Spark optimizer/executor. This doesn't sound like the right behavior to me, but I want to check whether it's actually intentional. The problem is less about what makes a "right" timestamp in that sense.

The issue was discovered when running RCK REST Catalog test server on existing Spark 3.5 integration tests: #11079

@nastra
Contributor

nastra commented Sep 14, 2024

I agree with the observation you mentioned in your last comment, @haizhou-zhao. https://github.com/apache/iceberg/blob/afda8be/core/src/main/java/org/apache/iceberg/rest/responses/LoadTableResponse.java#L64 writes a new metadata log entry and also updates lastUpdatedMillis when it only intends to configure the metadata file location on the original table metadata. I just ran into this myself while working on a different feature. I'll take a closer look at this next week.

@rdblue
Contributor

rdblue commented Sep 16, 2024

The problem really is: the current RESTCatalog client implementation will NOT honor the last-updated-ms field passed back by the REST server, and instead overrides it with a client-side timestamp before passing it back to the Spark optimizer/executor. This doesn't sound like the right behavior to me, but I want to check whether it's actually intentional. The problem is less about what makes a "right" timestamp in that sense.

I've been talking with @haizhou-zhao on Slack and I think this is the correct issue to look into. The REST service should update the timestamp when it writes and commits metadata using the changes sent from the client. But the REST client should not modify the timestamp that it receives from the service. I suspect this is happening when the metadata is rebuilt to add the metadata location.

@nastra
Contributor

nastra commented Sep 17, 2024

I've opened #11151 to fix this issue.

@nastra
Contributor

nastra commented Sep 18, 2024

Fixed by #11151.

@nastra nastra closed this as completed Sep 18, 2024