-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add schema to SpannerIO read #32008
Add schema to SpannerIO read #32008
Conversation
Signed-off-by: Jeffrey Kinard <jeff@thekinards.com>
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
Signed-off-by: Jeffrey Kinard <jeff@thekinards.com>
Format check error message
have you been able to find SpannerIO reviewer? |
If this is meant for DataflowTemplates, we should push for GoogleCloudPlatform/DataflowTemplates#1429 again... |
This is meant as a prerequisite to #31987 which adds ReadFromSpanner to Beam YAML. It allows schema reads from Spanner without needing to specify a schema file which makes using in Beam YAML much more user-friendly and similar to how ReadFromBigQuery works.
I can fix format error
No, I was waiting for checks to pass before finding one |
Signed-off-by: Jeffrey Kinard <jeff@thekinards.com>
Assigning reviewers. If you would like to opt out of this review, comment R: @kennknowles for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
@Polber what are next steps here? |
Looks like these folks are out - @manitgupta could you help with a review here (or recommend someone to help)? |
@bharadwaj-aditya @darshan-sj Can you have a look please? |
I am ooo until September but can review it then. |
...e-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerQuerySourceDef.java
Show resolved
Hide resolved
case FLOAT64: | ||
return Schema.FieldType.DOUBLE; | ||
case NUMERIC: | ||
return Schema.FieldType.DECIMAL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it expected that the datatype of the corresponding values in the rowStruct to be of certain datatype? Like what is the datatype expected for DECIMAL? Does Spanner client library give the data in the expected datatype for all these types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since data types are not 1:1 for many different sources in Beam (BigQuery has lots of data types that are exclusive to BigQuery and do not exist as Java types) we map to the closest Java type.
For example, FLOAT64
is not a Java type, but Double
is implemented as a 64-bit float, so the cast will work when the data is deserialized.
As I said in my other comment, all the data types that were added in this PR were tested and all the data conversions worked successfully
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had 2 comments, (one just a nit), otherwise LGTM
...ava/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
Outdated
Show resolved
Hide resolved
...a/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/StructUtils.java
Show resolved
Hide resolved
Signed-off-by: Jeffrey Kinard <jeff@thekinards.com>
private SpannerSourceDef createSourceDef() { | ||
SpannerSourceDef sourceDef; | ||
if (getReadOperation().getQuery() != null) { | ||
return SpannerQuerySourceDef.create(getSpannerConfig(), getReadOperation().getQuery()); | ||
} | ||
|
||
return SpannerTableSourceDef.create( | ||
getSpannerConfig(), getReadOperation().getTable(), getReadOperation().getColumns()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My crappy markdown code didn't work
private SpannerSourceDef createSourceDef() { | |
SpannerSourceDef sourceDef; | |
if (getReadOperation().getQuery() != null) { | |
return SpannerQuerySourceDef.create(getSpannerConfig(), getReadOperation().getQuery()); | |
} | |
return SpannerTableSourceDef.create( | |
getSpannerConfig(), getReadOperation().getTable(), getReadOperation().getColumns()); | |
} | |
private SpannerSourceDef createSourceDef() { | |
if (getReadOperation().getQuery() != null) { | |
return SpannerQuerySourceDef.create(getSpannerConfig(), getReadOperation().getQuery()); | |
} | |
return SpannerTableSourceDef.create( | |
getSpannerConfig(), getReadOperation().getTable(), getReadOperation().getColumns()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth doing this locally and running ./gradlew :sdks:java:io:google-cloud-platform:spotlessJava'
as well to avoid more precommit failures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Following change adds
SpannerIO.readWithSchema()
method for reading Spanner table schema. This performs a call to the source table to get schema metadata for constructing aRowCoder
for the outputPCollection
without having to manually provide a schema.Logic follows
BigQuerySourceDef
very closely.Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.