Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DTS-3385: Switch getMetadata to use delta-kernel #505

Merged
merged 8 commits into from
Jun 17, 2024

Conversation

pranavsuku-db
Copy link
Collaborator

getMetadata now uses delta-kernel

@pranavsuku-db pranavsuku-db requested review from linzhou-db and removed request for linzhou-db June 15, 2024 04:59
case e: io.delta.kernel.exceptions.TableNotFoundException =>
throw new FileNotFoundException(e.getMessage)
case e: Throwable =>
throw new FileNotFoundException(e.getMessage)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@linzhou-db I needed to catch all throwabales here to fix the issue in this test case: table_changes_exception: cdf_table_missing_log from DeltaSharingSuite.scala. I get a runtime error for this test case on the table.getLatestSnapshot(engine).asInstanceOf[SnapshotImpl] line and it needs to be handled as a 400 error and not a 500 error so I need to throw a FileNotFound, but I could not figure out what I need to catch, and so I caught all throwables.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the error and stacktrace?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doing string matching on error message now since fix is not in delta-kernel yet

Copy link
Collaborator

@linzhou-db linzhou-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think we should double check and add only what's available in DeltaSharedTable.scala


case class PropertyAllowedValues(property: String, allowedValues: Seq[String])

/**
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove these for now and add them until they are needed.
until L77

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

}
}

/**
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L84 to L125 can be removed too?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need DeletionVectorsTableFeature and ColumnMappingTableFeature but removed the rest

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's see if these two can be removed once the usage of clientReaderFeatures can be removed.

case e: io.delta.kernel.exceptions.TableNotFoundException =>
throw new FileNotFoundException(e.getMessage)
case e: Throwable =>
throw new FileNotFoundException(e.getMessage)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the error and stacktrace?


// Validate snapshot version, throw error if it's before the valid start version.
val snapshotVersion = snapshot.getVersion(engine)
if (snapshotVersion < validStartVersion) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the logic around validStartVersion in the oss code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

)
}

val refreshTokenOpt = refreshToken.map(decodeAndValidateRefreshToken)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be removed and added back later as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its needed to get snapshot, dont think I can delete this yet:

val snapshot = getSharedTableSnapshot(
table,
engine,
version.orElse(refreshTokenOpt.map(_.getVersion)),
timestamp,
validStartVersion = tableConfig.startVersion,
clientReaderFeatures =
allowedClientReaderFeatures(clientReaderFeatures = clientReaderFeaturesSet)
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me know if I missed anything, I remember we don't have any test cases setting client features in the request, in that case, we can always just go with the condition on L175, always assuming clientReaderFeature is empty.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And add this back when we actually add those tests, to make this code well tested.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For refreshTokenOpt, it's only used when queryTable(includeFiles = true).

So you can just use version in getSharedTableSnapshot without the orElse.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, fixed

Copy link
Collaborator

@linzhou-db linzhou-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is we always add code with its tests together.
The the changes that's no needed by getTableMetadata, we can leave them out and add them back along with queryTable PR and have them tested.

)
}

val refreshTokenOpt = refreshToken.map(decodeAndValidateRefreshToken)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me know if I missed anything, I remember we don't have any test cases setting client features in the request, in that case, we can always just go with the condition on L175, always assuming clientReaderFeature is empty.

)
}

val refreshTokenOpt = refreshToken.map(decodeAndValidateRefreshToken)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And add this back when we actually add those tests, to make this code well tested.

)
}

val refreshTokenOpt = refreshToken.map(decodeAndValidateRefreshToken)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For refreshTokenOpt, it's only used when queryTable(includeFiles = true).

So you can just use version in getSharedTableSnapshot without the orElse.

}
}

/**
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's see if these two can be removed once the usage of clientReaderFeatures can be removed.

@pranavsuku-db pranavsuku-db merged commit 9a6fd9f into main Jun 17, 2024
5 of 8 checks passed
@pranavsuku-db pranavsuku-db deleted the pranavsuku-db-metadatakernel branch July 31, 2024 04:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants