Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Decaton can consume any topic with deserializer #241

Merged
merged 5 commits into from
Aug 16, 2024

Conversation

ocadaruma
Copy link
Member

@ocadaruma ocadaruma commented Aug 15, 2024

Motivation

  • As of Decaton 8.0.1, the most common way to use Decaton was to use DecatonClient as the producer and DecatonProcessor as the consumer, with implementing value Serializer/Deserializer respectively.
  • After TaskMetadata as header #238, DecatonClient no longer wraps tasks with DecatonTaskRequest protobuf, which means records produced by DecatonClient and ones produced by non-DecatonClient producers (serialized by Kafka's serializer) have identical serialized bytes.
    • Then why not making Deserializer support consuming non-DecatonClient topics, without forcing users to implement TaskExtractor (which is bit bothersome than just a deserializer) ?

Summary of changes

  • Update documentations and examples
  • ProcessorsBuilder#consuming(String topic, Deserializer<T> deserializer) now deserializes messages bytes directly even when task-metadata header is missing. This is a breaking change
    • Previous behavior when task-metadata header is missing was to unwrap as DecatonTaskRequest pb first, then deserialize serialized task in it. This was for <= 8.0.0 to 9.0.0 migration where old DecatonClient might produce records during the upgrade.
      • Since "records are produced by old client / or records are produced by non-Decaton client" is not distinguishable from Decaton, we add new decaton.parse.as.legacy.format.when.header.missing property to control the behavior for graceful migration.
    • With regards to this, when task-metadata header is missing, we need to fill reasonable default of task metadata.
      • The only field must be filled might be TaskMetadata#timestampMillis.
        • e.g. to calculate delivery latency correctly
      • To fill timestampMillis, we add ConsumedRecord#recordTimestampMillis, which can be retrieved from record itself

Expected upgrade procedure from <= 8.0.1 to 9.0.0

  • Enable both decaton.retry.task.as.legacy.format and decaton.parse.as.legacy.format.when.header.missing
  • Upgrade all processor instances
  • Upgrade all decaton client
  • Disable decaton.retry.task.as.legacy.format
  • Disable decaton.parse.as.legacy.format.when.header.missing

I'll describe the detail in 9.0.0 release note

@ocadaruma ocadaruma added the breaking change Breaking change for a public API label Aug 15, 2024
@ocadaruma ocadaruma requested a review from kawamuray August 15, 2024 02:39
public static final PropertyDefinition<Boolean> CONFIG_TASK_METADATA_AS_HEADER =
PropertyDefinition.define("decaton.task.metadata.as.header", Boolean.class, true,
public static final PropertyDefinition<Boolean> CONFIG_RETRY_TASK_AS_LEGACY_FORMAT =
PropertyDefinition.define("decaton.retry.task.as.legacy.format", Boolean.class, false,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the property and flipped the meaning, because

  • For decaton.parse.as.legacy.format.when.header.missing, I couldn't figure out good "enabled-by-default" naming.
  • Then, for decaton.task.metadata.as.header, during migration, "Disable one and enable one" might be too confusing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably task.in.legacy.format sounds more correct?

Copy link
Member

@kawamuray kawamuray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returning once

public static final PropertyDefinition<Boolean> CONFIG_TASK_METADATA_AS_HEADER =
PropertyDefinition.define("decaton.task.metadata.as.header", Boolean.class, true,
public static final PropertyDefinition<Boolean> CONFIG_RETRY_TASK_AS_LEGACY_FORMAT =
PropertyDefinition.define("decaton.retry.task.as.legacy.format", Boolean.class, false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably task.in.legacy.format sounds more correct?

* Reloadable: yes
*/
public static final PropertyDefinition<Boolean> CONFIG_PARSE_AS_LEGACY_FORMAT_WHEN_HEADER_MISSING =
PropertyDefinition.define("decaton.parse.as.legacy.format.when.header.missing", Boolean.class, false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about decaton.legacy.parse.fallback.enabled? (just to make it bit shorter...)

@@ -31,8 +31,13 @@

@RequiredArgsConstructor
public class DefaultTaskExtractor<T> implements TaskExtractor<T> {
private static final ThreadLocal<Boolean> parseAsLegacyWhenHeaderMissing = ThreadLocal.withInitial(() -> false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h-m this might be a bit too hacky, given that we would have to maintain this compatibility mode for certain period of time isn't it?

Can we do it like;

  1. delay DefaultTaskExctractor instantiation to the time calling ProcessorsBuilder#build
  2. so that we can instantiate it in SubscriptionBuilder#build
  3. at that point we already have access to properties, so we can simply pass a boolean flag to the DefaultTaskExtractor constructor?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's way better. Let me fix

@ocadaruma ocadaruma requested a review from kawamuray August 15, 2024 07:52
@@ -116,15 +127,24 @@ public ProcessorsBuilder<T> thenProcess(DecatonProcessor<T> processor) {
return thenProcess(new DecatonProcessorSupplierImpl<>(() -> processor, ProcessorScope.PROVIDED));
}

Processors<T> build(DecatonProcessorSupplier<byte[]> retryProcessorSupplier) {
return new Processors<>(suppliers, retryProcessorSupplier, taskExtractor, retryTaskExtractor);
Processors<T> build(DecatonProcessorSupplier<byte[]> retryProcessorSupplier, ProcessorProperties properties) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that just taking an instance of taskExtractor as an argument of this method was ok, better to do it with "constructor" thing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To instantiate TaskExtractor on build's caller (i.e. SubscriptionBuilder), ProcessorsBuilder needs to store Deserializer or TaskExtractor passed from user through consuming and SubscriptionBuilder need to instantiate appropriate resulting TaskExtractor accordingly, which is fairly complicated.

Maybe just deferring the instantiation by constructor is straightforward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but storing constructor might be unnecessary actually.
Let me fix

//
// From Decaton perspective, there is no way to distinguish between these two cases,
// so we need to rely on a configuration to determine how to deserialize the task.
if (legacyFallbackEnabledProperty.value()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h-m should this be dynamically reloadable? i just sense that it might be better fixed for the whole lifetime of this subscription, but idk

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should make property reloadable unless it's difficult or there is big downside, because it can reduce the number of rolling-restart which is a certain burden and enables quick rollback.

Since upgrade-procedure might be tough already, making reloadable might be better.

The downside I can imagine is the blast radius when property change caused a problem (because property change will happen on all instances at the same time), but quick-rollback by dynamic-reloading might be more preferred than rolling restart the application I guess.

@ocadaruma ocadaruma requested a review from kawamuray August 15, 2024 10:39
Copy link
Member

@kawamuray kawamuray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kawamuray kawamuray merged commit d5dae11 into line:master Aug 16, 2024
5 checks passed
@ocadaruma ocadaruma deleted the task-metadata-follow-up branch August 16, 2024 08:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Breaking change for a public API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants