Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add typed table configuration #309

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

roeap
Copy link
Collaborator

@roeap roeap commented Aug 8, 2024

No description provided.

Copy link

codecov bot commented Aug 8, 2024

Codecov Report

Attention: Patch coverage is 88.52868% with 46 lines in your changes missing coverage. Please review.

Project coverage is 74.06%. Comparing base (886ac9c) to head (e5500be).
Report is 3 commits behind head on main.

Files Patch % Lines
kernel/src/config.rs 89.06% 40 Missing and 2 partials ⚠️
kernel/src/error.rs 0.00% 3 Missing ⚠️
ffi/src/lib.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #309      +/-   ##
==========================================
+ Coverage   72.51%   74.06%   +1.54%     
==========================================
  Files          43       45       +2     
  Lines        7783     8217     +434     
  Branches     7783     8217     +434     
==========================================
+ Hits         5644     6086     +442     
+ Misses       1768     1754      -14     
- Partials      371      377       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@zachschuermann zachschuermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey robert thanks so much I'm flushing some comments now and will have another pass soon (after i do some more thinking)! major items: (1) consider naming for the configuration - since these are effectively specifying table properties i'm advocating for (strawman): TableProperties. (2) We have an interesting pattern here of typed table properties stored in a string/string map and I'm wondering if we can keep the best of both worlds (keep strongly-typed keys but allow for arbitrary key/value pairs) if we do something like introduce a Custom type in the table property enum

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe rename to table_properties.rs? thoughts on clarity?

use std::collections::HashMap;
use std::time::Duration;

use lazy_static::lazy_static;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: using LazyLock now instead. (should be a simple change)

/// true for Delta Lake to automatically optimize the layout of the files for this Delta table.
#[strum(serialize = "delta.autoOptimize.autoCompact")]
#[serde(rename = "delta.autoOptimize.autoCompact")]
AutoOptimizeAutoCompact,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
AutoOptimizeAutoCompact,
AutoCompact,

/// true for Delta Lake to automatically optimize the layout of the files for this Delta table during writes.
#[strum(serialize = "delta.autoOptimize.optimizeWrite")]
#[serde(rename = "delta.autoOptimize.optimizeWrite")]
AutoOptimizeOptimizeWrite,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
AutoOptimizeOptimizeWrite,
OptimizeWrite,

Hash,
)]
#[non_exhaustive]
pub enum DeltaConfigKey {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about we call this DeltaTableProperties?

Suggested change
pub enum DeltaConfigKey {
pub enum DeltaTableProperties {

Comment on lines +326 to +331
pub fn isolation_level(&self) -> IsolationLevel {
self.0
.get(DeltaConfigKey::IsolationLevel.as_ref())
.and_then(|v| v.parse().ok())
.unwrap_or_default()
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can these use the macro above?


impl Default for CheckpointPolicy {
fn default() -> Self {
Self::Classic
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default to V2 since classic is being deprecated

Suggested change
Self::Classic
Self::V2

/// The full path to the field.
pub name: String,
/// The SQL string that must always evaluate to true.
pub expr: String,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm I need to think about how we will integrate with this (perhaps parsing to Expression here but that sounds terrible or passing off opaque to engine?)

Comment on lines +252 to +253
pub fn config(&self) -> TableConfig<'_> {
self.metadata.config()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flag for rename

@@ -0,0 +1,802 @@
//! Delta Table configuration
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//! Delta Table configuration
//! Delta Table properties. Note this module implements per-table
//! configuration which governs how table-level capabilities/properties
//! are configured (turned on/off etc.). This is orthogonal to protocol-level
//! 'table features' which enable or disable reader/writer features (which then
//! usually must be enabled/configured by table properties)
//! For example (from delta's protocol.md):
//! A feature being supported does not imply that it is active. For example,
//! a table may have the `appendOnly` feature listed in writerFeatures, but
//! it does not have a table property delta.appendOnly that is set to `true`.
//! In such a case the table is not append-only, and writers are allowed to
//! change, remove, and rearrange data. However, writers must know that
//! the table property delta.appendOnly should be checked before writing
//! the table.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question: are all of these table properties being used in the code today? I doubt that delta.autoOptimize.autoCompact is, for example.

IMO - we should only add table properties as they come and as they are used. I'm open and curious what the argument would be to include non-used table properties?

/// Interval (number of commits) after which a new checkpoint should be created
#[strum(serialize = "delta.checkpointInterval")]
#[serde(rename = "delta.checkpointInterval")]
CheckpointInterval,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a rust guy, so apologies if this is a naive or unapplicable question: is there a way to define the validation here, too? e.g. for this table property, delta-kernel-java will require and validate that the specified value is greater than 0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants