Rename numBytesAdded/Removed metrics and add deletion vector metrics in Databricks 12.2 shims [databricks] #8624

Merged (19 commits) on Jul 3, 2023


@andygrove (Contributor) commented Jun 28, 2023

Closes #8423 (this PR fixes the final issue and removes the last references to this issue)

Databricks 12.2 renames the numBytes[Added|Removed] metrics in the Delete command to num[Added|Removed]Bytes for consistency with OSS Delta Lake. It also adds new numDeletionVectors[Added|Removed] metrics.

Changes in this PR:

  • Rename the metrics
  • Enable some tests that were previously skipped
  • Add deletion vector metrics (set to zero because we do not support deletion vectors yet)
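
As a rough illustration of the first and third bullets, the mapping from the old metric names to the Databricks 12.2 names could be sketched as below. This is not the code from this PR (the actual change lives in the plugin's Databricks 12.2 shims); the function name `to_dbr_122_delete_metrics`, the constants, and the sample values are all hypothetical.

```python
# Hypothetical sketch, not the plugin's implementation.
# Old Delete-command metric names mapped to their Databricks 12.2 names.
RENAMES = {
    "numBytesAdded": "numAddedBytes",
    "numBytesRemoved": "numRemovedBytes",
}

# Metrics that are reported but always zero, because deletion vectors
# are not supported yet.
ZERO_METRICS = ("numDeletionVectorsAdded", "numDeletionVectorsRemoved")


def to_dbr_122_delete_metrics(metrics):
    """Return a copy of `metrics` that uses the Databricks 12.2 names."""
    converted = {RENAMES.get(name, name): value for name, value in metrics.items()}
    for name in ZERO_METRICS:
        # Only fill in the metric if it is not already present.
        converted.setdefault(name, 0)
    return converted


# Sample values are made up for illustration.
old_style = {"numBytesAdded": 1024, "numBytesRemoved": 2048, "numDeletedRows": 10}
new_style = to_dbr_122_delete_metrics(old_style)
```

Metrics that are not renamed pass through unchanged, and the two deletion vector metrics are always present in the result, set to zero.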

metrics_to_remove = ["executionTimeMs", "numOutputBytes", "rewriteTimeMs", "scanTimeMs",
"numRemovedBytes", "numAddedBytes", "numTargetBytesAdded", "numTargetBytesInserted",
"numTargetBytesUpdated", "numTargetBytesRemoved",
"numDeletionVectorsAdded", "numDeletionVectorsRemoved"]
Contributor Author

It is not ideal to just ignore the deletion vector metrics. I am still trying to find a better solution.
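
For context, here is a minimal sketch of how a list like `metrics_to_remove` might be applied when comparing CPU and GPU commit metrics. The helper `strip_metrics` and the sample metric values are assumptions made for illustration; they are not taken from the test suite.

```python
# Hypothetical sketch; only the metrics_to_remove list comes from the diff above.
metrics_to_remove = ["executionTimeMs", "numOutputBytes", "rewriteTimeMs", "scanTimeMs",
                     "numRemovedBytes", "numAddedBytes", "numTargetBytesAdded",
                     "numTargetBytesInserted", "numTargetBytesUpdated",
                     "numTargetBytesRemoved",
                     "numDeletionVectorsAdded", "numDeletionVectorsRemoved"]


def strip_metrics(operation_metrics, to_remove):
    """Return a copy of `operation_metrics` without the keys in `to_remove`."""
    skip = set(to_remove)
    return {k: v for k, v in operation_metrics.items() if k not in skip}


# Made-up metrics: the GPU run lacks the deletion vector metric entirely.
cpu = {"numDeletedRows": "5", "numRemovedBytes": "2048",
       "numDeletionVectorsAdded": "0", "executionTimeMs": "120"}
gpu = {"numDeletedRows": "5", "numRemovedBytes": "1990", "executionTimeMs": "45"}

cpu_stable = strip_metrics(cpu, metrics_to_remove)
gpu_stable = strip_metrics(gpu, metrics_to_remove)
```

In this sketch the two filtered dictionaries compare equal even though the GPU side never reported `numDeletionVectorsAdded`, which shows why filtering these metrics out can hide a missing metric.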

@andygrove (Contributor Author) commented Jun 29, 2023

As @jlowe noted in #8628 (comment):

"Seems like there's two phases to supporting deletion vectors. There's full support which is tracked by #8554, but before that is completed we should be reporting the metrics (always 0) in the delta log stats just like the CPU does when not using deletion vectors. I could see shipping without full deletion support, but we should be adding these deletion metrics even when we don't support them being non-zero. I'm OK with that being a followup, but IMO having the metric is a P0 for the release while the full support issue is not."

@andygrove changed the title from "WIP: Rename numBytesAdded/Removed metrics in Databricks 12.2 shims [databricks]" to "Rename numBytesAdded/Removed metrics and add deletion vector metrics in Databricks 12.2 shims [databricks]" on Jun 30, 2023
@andygrove marked this pull request as ready for review on June 30, 2023 17:51
@andygrove (Contributor Author) commented:

build

@andygrove (Contributor Author) commented:

build

@andygrove self-assigned this on Jun 30, 2023
@andygrove (Contributor Author) commented:

build

@jlowe merged commit f78b1d6 into NVIDIA:branch-23.08 on Jul 3, 2023
@sameerz added the "task" label (Work required that improves the product but is not user facing) on Jul 3, 2023
@andygrove deleted the dbr-rename-metrics branch on July 28, 2023 20:16
Labels: task (Work required that improves the product but is not user facing)

Successfully merging this pull request may close these issues:

[FEA] [Databricks 12.2] Get Delta Lake integration tests passing

3 participants