-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-42406][SQL] Fix check for missing required fields of to_protobuf #40080
Conversation
Add a proto2 file to verify required field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added inline comment to help with review.
|
||
/** Return true if `fieldDescriptor` is optional. */ | ||
private[protobuf] def isNullable(fieldDescriptor: FieldDescriptor): Boolean = | ||
!fieldDescriptor.isOptional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the bug. extra !
returns false for optional
field. But optional fields are nullable. This is fixed at line #103 above.
@@ -0,0 +1,8 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This a descriptor file for proto2_messages.proto
* limitations under the License. | ||
*/ | ||
|
||
syntax = "proto2"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Protobuf v2 file is added to test required fields.
@@ -59,17 +59,6 @@ message TypeMiss { | |||
int64 bar = 1; | |||
} | |||
|
|||
/* Field boo missing from SQL root, but available in Protobuf root*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two are moved to proto2_messages.proto
(and renamed)
@rangadi with this change, we can conclude that spark-protobuf supports proto2 and proto3 versions right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @rangadi . Please remove [3.4]
from the PR title. Apache Spark community uses a branch tag like [3.4]
only when the PR is open for the release branch, branch-3.4
. This PR is mis-using it as a target version.
@dongjoon-hyun, done. Remove 3.4 in the summary.
Yes. And we always supported Proto2. |
Thank you for updates, @rangadi . cc @hvanhovell |
@gengliangwang could you take a quick look and merge in to master and 3.4. |
@@ -144,44 +146,52 @@ class ProtobufSerdeSuite extends SharedSparkSession with ProtobufTestBase { | |||
test("Fail to convert with missing Catalyst fields") { | |||
val protoFile = ProtobufUtils.buildDescriptor(testFileDesc, "FieldMissingInSQLRoot") | |||
|
|||
// serializing with extra fails if extra field is missing in SQL Schema | |||
val foobarSQLType = DataType.fromDDL( | |||
"struct<foo string>" // "bar" is missing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we test catalyst schema which contains bar
only and there is a result returned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you describe a bit more?
Do we want to test not having 'foo' field which is optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a new test at line #69. PTAL.
@rangadi can you update the PR description to follow the PR template? |
@gengliangwang update the summary to follow the template. Sorry about that. |
Thanks, merging to master/3.4 |
### What changes were proposed in this pull request? Protobuf serializer (used in `to_protobuf()`) should error if non-nullable fields (i.e. protobuf `required` fields) are present in the schema of the catalyst record being converted to a protobuf. But `isNullable()` method used for this check returns opposite (see PR comment in the diff). As a result, Serializer incorrectly requires the fields that are optional. This PR fixes this check (see PR comment in the diff). This also requires corresponding fix for couple of unit tests. In order use a Protobuf message with a `required` field, Protobuf version 2 file `proto2_messages.proto` is added. Two tests are updated to verify missing required fields results in an error. ### Why are the changes needed? This is need to fix a bug where we were incorrectly enforcing a schema check on optional fields rather than on required fields. ### Does this PR introduce _any_ user-facing change? It fixes a bug, and gives more flexibility for user queries. ### How was this patch tested? - Updated unit tests Closes #40080 from rangadi/fix-required-field-check. Authored-by: Raghu Angadi <raghu.angadi@databricks.com> Signed-off-by: Gengliang Wang <gengliang@apache.org> (cherry picked from commit fb56477) Signed-off-by: Gengliang Wang <gengliang@apache.org>
### What changes were proposed in this pull request? Protobuf serializer (used in `to_protobuf()`) should error if non-nullable fields (i.e. protobuf `required` fields) are present in the schema of the catalyst record being converted to a protobuf. But `isNullable()` method used for this check returns opposite (see PR comment in the diff). As a result, Serializer incorrectly requires the fields that are optional. This PR fixes this check (see PR comment in the diff). This also requires corresponding fix for couple of unit tests. In order use a Protobuf message with a `required` field, Protobuf version 2 file `proto2_messages.proto` is added. Two tests are updated to verify missing required fields results in an error. ### Why are the changes needed? This is need to fix a bug where we were incorrectly enforcing a schema check on optional fields rather than on required fields. ### Does this PR introduce _any_ user-facing change? It fixes a bug, and gives more flexibility for user queries. ### How was this patch tested? - Updated unit tests Closes apache#40080 from rangadi/fix-required-field-check. Authored-by: Raghu Angadi <raghu.angadi@databricks.com> Signed-off-by: Gengliang Wang <gengliang@apache.org> (cherry picked from commit fb56477) Signed-off-by: Gengliang Wang <gengliang@apache.org>
What changes were proposed in this pull request?
Protobuf serializer (used in
to_protobuf()
) should error if non-nullable fields (i.e. protobufrequired
fields) are present in the schema of the catalyst record being converted to a protobuf.But
isNullable()
method used for this check returns opposite (see PR comment in the diff). As a result, Serializer incorrectly requires the fields that are optional. This PR fixes this check (see PR comment in the diff).This also requires corresponding fix for couple of unit tests. In order use a Protobuf message with a
required
field, Protobuf version 2 fileproto2_messages.proto
is added.Two tests are updated to verify missing required fields results in an error.
Why are the changes needed?
This is need to fix a bug where we were incorrectly enforcing a schema check on optional fields rather than on required fields.
Does this PR introduce any user-facing change?
It fixes a bug, and gives more flexibility for user queries.
How was this patch tested?