Add Protobuf support for trino-kafka #14734
Conversation
Code, tests & docs are ready. Still missing product tests; will add later.
Force-pushed fb36d7f to eb77237
Force-pushed da33362 to 424ef36
It's still failing.
Force-pushed 3595513 to 6f85b81
One thing I couldn't spot looking at this: where is the logic to auto-deduce whether to use Protobuf or Avro when using the Schema Registry client? A broker may have topics with both. Schema Registry now holds the schema type and supports both Avro and Protobuf, so from the schema id (or from the subject's latest schema) you can auto-deduce whether it's Avro or Protobuf. That makes it easy to switch the deserialization method (Avro vs. Protobuf) per topic, so a broker can have both formats in use and the right one is deduced automatically.
I could be blind and have missed it, in which case please point me to it.
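A minimal sketch of the deduction being described, assuming the Confluent kafka-schema-registry-client is on the classpath; the FormatSniffer class and the topic-to-subject mapping are illustrative assumptions, not code from this PR:

```java
import io.confluent.kafka.schemaregistry.client.SchemaMetadata;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;

public class FormatSniffer
{
    // Deduce the serialization format of a topic's value schema
    public static String formatFor(SchemaRegistryClient client, String topic)
            throws Exception
    {
        // Assumes the default TopicNameStrategy, where value subjects are named "<topic>-value"
        SchemaMetadata metadata = client.getLatestSchemaMetadata(topic + "-value");
        // schemaType is "AVRO", "PROTOBUF", or "JSON"; registries that predate
        // schema types return null, which implies Avro
        String schemaType = metadata.getSchemaType();
        return schemaType == null ? "AVRO" : schemaType;
    }
}
```

A decoder dispatcher could then pick the Protobuf or Avro deserializer per topic based on that value.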
Force-pushed 1e828f6 to 2649add
Tests finally green. @Praveen2112 PTAL when you get a chance? 🙏
plugin/trino-kafka/src/main/java/io/trino/plugin/kafka/KafkaConnectorModule.java
protoType.getValueType(),
definedMessages))
.build());
// Handle for underscores and name |
Can we add a test for this case?
What do we want to test here, that the descriptor is constructed correctly? That's covered by TestProtobuf{Encoder,Decoder} already. Commenting out anything except .setName(field.getName()) fails the tests.
EDIT: actually setName(field.getName()) is already called before the if branch and is redundant.
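A hypothetical reconstruction of the builder chain under discussion, using only names visible in the quoted snippet (not verified against the PR's source):

```java
import com.google.protobuf.DescriptorProtos.FieldDescriptorProto;

// setName is applied once, before any branching
FieldDescriptorProto.Builder fieldBuilder = FieldDescriptorProto.newBuilder()
        .setName(field.getName()); // "field" is a stand-in for the source field being mapped
if (field.getName().contains("_")) {
    // underscore handling goes here; calling setName(field.getName()) again
    // at this point would only overwrite the name with the same value
}
```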
Force-pushed d662a51 to 0914de4
I think we could have a single commit that implements the Protobuf support in the decoder module and uses it in the Kafka connector for now.
lib/trino-record-decoder/src/main/java/io/trino/decoder/protobuf/ProtobufColumnDecoder.java
Force-pushed 0914de4 to 9d54d97
@Praveen2112 rebased into 2 commits: decoder module & Kafka support.
LGTM. Minor comments.
<groupId>com.squareup.wire</groupId>
<artifactId>wire-schema</artifactId>
</dependency>

<dependency>
Should we exclude a few resources to keep the duplicate classfinder happy?
Looks like Guava is the only one we can exclude? Kotlin is already pinned to make the enforcer happy.
https://mvnrepository.com/artifact/com.squareup.wire/wire-schema/3.2.2
Speaking of, do we want a more recent version of this? They removed the Guava dependency but bumped Kotlin, which should be backwards compatible.
https://mvnrepository.com/artifact/com.squareup.wire/wire-schema/4.4.3
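For illustration, the kind of pom exclusion being discussed, assuming Guava is the offending transitive dependency of wire-schema 3.x (coordinates are the usual ones, but unverified here):

```xml
<dependency>
    <groupId>com.squareup.wire</groupId>
    <artifactId>wire-schema</artifactId>
    <exclusions>
        <exclusion>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```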
We could try updating to the latest version. We need to confirm whether the schema-registry libraries have any conflict with the latest version.
...ecord-decoder/src/main/java/io/trino/decoder/protobuf/FixedSchemaDynamicMessageProvider.java
lib/trino-record-decoder/src/main/java/io/trino/decoder/protobuf/ProtobufColumnDecoder.java
lib/trino-record-decoder/src/main/java/io/trino/decoder/protobuf/ProtobufColumnDecoder.java
return Optional.of(columnDecoders.entrySet().stream()
        .collect(toImmutableMap(
                Map.Entry::getKey,
                entry -> entry.getValue().decodeField(dynamicMessageProvider.parseDynamicMessage(data)))));
Can we parse it once and use the DynamicMessage for all the required columns?
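A minimal sketch of the suggested fix, reusing the names from the quoted snippet: parse the payload into a DynamicMessage once, then decode every column from it.

```java
// Parse once...
DynamicMessage message = dynamicMessageProvider.parseDynamicMessage(data);
// ...then reuse the parsed message for every requested column
return Optional.of(columnDecoders.entrySet().stream()
        .collect(toImmutableMap(
                Map.Entry::getKey,
                entry -> entry.getValue().decodeField(message))));
```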
Good catch!
Did we fix this?
Pretty sure I did but must've lost it during rebase. Just fixed.
lib/trino-record-decoder/src/main/java/io/trino/decoder/protobuf/ProtobufValueProvider.java
plugin/trino-kafka/src/main/java/io/trino/plugin/kafka/encoder/protobuf/ProtobufRowEncoder.java
plugin/trino-kafka/src/main/java/io/trino/plugin/kafka/schema/confluent/ConfluentModule.java
...va/io/trino/plugin/kafka/schema/confluent/ConfluentSchemaRegistryDynamicMessageProvider.java
Force-pushed 182cf78 to 1f92c36
Let me know once the changes are applied.
@Praveen2112 hey, yeah, I addressed most of them except the question regarding exclusions #14734 (comment)
Some of the changes were not applied.
<groupId>com.squareup.wire</groupId>
<artifactId>wire-schema</artifactId>
</dependency>

<dependency>
We could try updating to the latest version. We need to confirm whether the schema-registry libraries have any conflict with the latest version.
...ecord-decoder/src/main/java/io/trino/decoder/protobuf/FixedSchemaDynamicMessageProvider.java
    return round(micros, MAX_SHORT_PRECISION - precision);
}

private static Descriptors.FieldDescriptor getFieldDescriptor(DynamicMessage message, String name)
Ditto
lib/trino-record-decoder/src/main/java/io/trino/decoder/protobuf/ProtobufValueProvider.java
Force-pushed 8e825c2 to 3db448d
@Praveen2112 localized
Force-pushed 18729a3 to c46bb5a
Force-pushed c46bb5a to 5c40f44
Thanks for working on this.
Holy shit guys. That's awesome. 🎉
Hello! I'm on version 403 and can't find a way to handle protobuf messages (as far as I've tried), so this protobuf support is nice to have! I'd like to test it. Have you documented anywhere how to use this?
if (fieldDescriptor.getMessageType().getFullName().equals(TIMESTAMP_TYPE_NAME)) {
    return createTimestampType(6);
> A Timestamp represents a point in time independent of any time zone or calendar, represented as seconds and fractions of seconds at nanosecond resolution in UTC Epoch time

I.e. it's an instant.
We don't have an instant type (#2273), so we represent point-in-time data as "timestamp with time zone".
Would you consider changing the mapping to reveal the point-in-time semantics of the data?
Also, the type has nanosecond precision. Why do we map it to microsecond precision?
cc @mx123
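A hedged sketch of the mapping change being suggested, assuming Trino's createTimestampWithTimeZoneType factory (this is the reviewer's proposal, not what the PR currently does):

```java
import static io.trino.spi.type.TimestampWithTimeZoneType.createTimestampWithTimeZoneType;

// google.protobuf.Timestamp is an instant at nanosecond resolution, so map it
// to "timestamp(9) with time zone" instead of the local timestamp(6) above
if (fieldDescriptor.getMessageType().getFullName().equals(TIMESTAMP_TYPE_NAME)) {
    return createTimestampWithTimeZoneType(9);
}
```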
These are direct ports from SEP; maybe @Praveen2112 remembers why?
Changing the semantics or precision might break backwards compatibility; we might have to support both local/instant × all precisions combinations?
> Changing the semantic or precision might break back-compat,

Of course. Every change, including bug fixes, can break something.

> we might have to support both local/instant x all precisions combo?

Not sure why we would want that.
Description
Add Protobuf support for Kafka.
This is a contribution from the Starburst Kafka Connector.
Non-technical explanation
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: