KAFKA-12268: Implement task idling semantics via currentLag API #10137
Conversation
This reverts commit fdcf8fb.
Implements KIP-695. Reverts a previous behavior change to Consumer.poll and replaces it with a new Consumer.currentLag API, which returns the client's currently cached lag. Uses this new API to implement the desired task idling semantics improvement from KIP-695.
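For context, here is a minimal usage sketch of the new API as described above. The bootstrap server, group id, topic, and class name are placeholders, not part of the PR.

import java.time.Duration;
import java.util.Collections;
import java.util.OptionalLong;
import java.util.Properties;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class CurrentLagSketch {
    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "current-lag-sketch");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (Consumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            final TopicPartition tp = new TopicPartition("input-topic", 0);
            consumer.assign(Collections.singleton(tp));
            consumer.poll(Duration.ofMillis(100)); // populates the cached position and end offset

            // Purely local lookup of the cached lag; no broker round trip.
            final OptionalLong lag = consumer.currentLag(tp);
            if (lag.isPresent() && lag.getAsLong() == 0L) {
                // caught up: the cached position equals the cached end offset
            } else if (!lag.isPresent()) {
                // lag not known yet (nothing cached for this partition so far)
            }
        }
    }
}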
/**
 * @see KafkaConsumer#currentLag(TopicPartition)
 */
OptionalLong currentLag(TopicPartition topicPartition);
Pardon me, KIP-695 does not include this change. It seems KIP-695 is still based on metadata? Please correct me if I misunderstand anything :)
Woah, you are fast, @chia7712 !
I just sent a message to the vote thread. I wanted to submit this PR first so that the vote thread message can have the full context available.
Do you mind reading over what I said there? If it sounds good to you, then I'll update the KIP, and we can maybe put this whole mess to bed.
Quick question: should the API take a Collection<TopicPartition> like other APIs?
Thanks @ijuma, I considered it, but decided on the current API because:
- This is a very quick, local in-memory lookup, so there's no reason to batch multiple requests into one.
- It complicates the return type. We'd have to return either a Map<TP, Long>, with mappings missing for unknown lags (which creates unfortunate null semantics for users), or a Map<TP, OptionalLong>, which creates a complex-to-understand two-hop lookup (lag := result.get(tp).get()); see the sketch below. Or else, we could return a more complex domain object like @chia7712 proposed in the mailing list. All these complications seem like unnecessary complexity in the case of this particular API, given the first point.
WDYT?
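For illustration only, here is roughly what the two shapes would look like to a caller. The batched method name currentLags is hypothetical; only the single-partition currentLag exists in this PR.

// Single-partition API from this PR: one call, no null handling.
final OptionalLong lag = consumer.currentLag(tp);
final long known = lag.orElse(-1L);

// Hypothetical batched variant returning Map<TopicPartition, OptionalLong>:
// the caller needs the "two hop" lookup mentioned above.
final Map<TopicPartition, OptionalLong> lags = consumer.currentLags(partitions); // hypothetical method
final OptionalLong forTp = lags.get(tp);                       // hop 1: may be null if tp was not in the request
final long value = forTp == null ? -1L : forTp.orElse(-1L);    // hop 2: unwrap the OptionalLong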
If there is no batching benefit, then the simpler API makes sense. @hachikuji Any reason why batching could be useful here?
For API calls that may incur a broker round trip, having batching of partitions makes sense. For this API I think single-partition lookup is good enough.
If we are concerned that users may call this function too frequently while looping over a large number of partitions, and each call synchronizes on the subscription state, then maybe we can make it a batching mode.
That's a good point, @guozhangwang. Entering the synchronized block will have some overhead each time it's called.
I think we can just reason about the use cases here. My guess is that people would either tend to spot-check specific lags, as we are doing here, or they would tend to periodically check all lags. In the former case, I'd hazard that the current API is fine. In the latter case, we'd face more overhead. I'm sure this is motivated reasoning, but perhaps we can lump the latter case in with @chia7712 's suggestion to expose more metadata and defer it to the future.
 */
@Override
public OptionalLong currentLag(TopicPartition topicPartition) {
    final Long lag = subscriptions.partitionLag(topicPartition, isolationLevel);
Other methods call acquireAndEnsureOpen() first and then call release() in the finally block. Should this new method follow the same pattern?
Thanks, I overlooked that, and it's a good idea.
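A sketch of the method following that pattern, based on the snippet above; the null-to-empty mapping is an assumption from the OptionalLong return type, not necessarily the final code.

@Override
public OptionalLong currentLag(final TopicPartition topicPartition) {
    acquireAndEnsureOpen();    // guard against concurrent use and a closed consumer, like the other methods
    try {
        final Long lag = subscriptions.partitionLag(topicPartition, isolationLevel);
        return lag == null ? OptionalLong.empty() : OptionalLong.of(lag);
    } finally {
        release();             // always release in the finally block
    }
}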
Made a pass over the PR, overall LGTM! Just one comment about the lag provider call frequency.
@@ -156,24 +134,24 @@ public boolean readyToProcess(final long wallClockTime) {
     final TopicPartition partition = entry.getKey();
     final RecordQueue queue = entry.getValue();

-    final Long nullableFetchedLag = fetchedLags.get(partition);
+    final OptionalLong fetchedLag = lagProvider.apply(partition);
Wearing my paranoid hat here: readyToProcess is on the critical path, called per record, while we would only update the underlying lag at most as frequently as the consumer poll rate. And in practice we would fall into the first condition, !queue.isEmpty(), most of the time. On the other hand, the partitionLag call on SubscriptionState is synchronized and could slow down the fetching thread (well, maybe just a bit). So could we call the provider only when necessary, i.e. when the queue is empty and the lag is either == 0 or not present?
Thanks, this is a good suggestion.
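A rough sketch of that guard, reusing the names from the diff above; the surrounding loop and the exact caught-up condition are assumptions, not the final code.

// Only consult the (synchronized) lag provider when the queue is empty,
// so the common buffered-data case skips the lookup entirely.
if (!queue.isEmpty()) {
    continue; // data is buffered locally; this partition does not block processing
}
final OptionalLong fetchedLag = lagProvider.apply(partition);
if (!fetchedLag.isPresent()) {
    return false; // no cached lag yet, so we cannot tell whether the partition is caught up
}
if (fetchedLag.getAsLong() > 0L) {
    return false; // the broker still has records for this partition; wait instead of enforcing
}
// lag == 0 and the queue is empty: the partition is caught up and does not block processing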
Just one nit on log4j entry, otherwise LGTM!
@@ -211,7 +192,7 @@ public boolean readyToProcess(final long wallClockTime) {
         return false;
     } else {
         enforcedProcessingSensor.record(1.0d, wallClockTime);
-        logger.info("Continuing to process although some partition timestamps were not buffered locally." +
+        logger.trace("Continuing to process although some partitions are empty on the broker." +
I thought that if the user did not set this to MAX_TASK_IDLE_MS_DISABLED, they may care about when enforced processing starts, so would this be better kept at INFO?
Thanks @guozhangwang, I thought so as well, but @ableegoldman pointed out to me that it actually prints out on every invocation while we are enforcing processing, which turns out to flood the logs.
Maybe we could leave this detailed logging at TRACE, and just print a single message at WARN the first time this enforced processing occurs?
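One possible shape for that, assuming a new boolean field on the task; the field name, method name, and messages are placeholders.

private boolean enforcedProcessingLogged = false;

private void logEnforcedProcessing() {
    if (!enforcedProcessingLogged) {
        enforcedProcessingLogged = true;
        // loud, one-time notice when enforced processing starts
        logger.warn("Begin enforcing processing although some partitions are empty on the broker.");
    } else {
        // per-invocation detail stays quiet
        logger.trace("Continuing to process although some partitions are empty on the broker.");
    }
}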
LGTM overall
idlePartitionDeadlines.putIfAbsent(partition, wallClockTime + maxTaskIdleMs);
final long deadline = idlePartitionDeadlines.get(partition);
if (wallClockTime < deadline) {
    final OptionalLong fetchedLag = lagProvider.apply(partition);
It was nullable before (when the partition has not been fetched). For this new API, getting the lag for such a partition can produce an exception. Is this a potential bug?
Thanks, @chia7712. Yes, it can produce an exception now, but only if the partition is not assigned. If it simply has not been fetched, then the OptionalLong will be empty.
I think this exception is fine to raise, since the partition being unassigned at this point seems like an illegal state.
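To make the two cases concrete, a sketch of the expected behavior; the variable names are placeholders and the exception type mirrors how SubscriptionState treats unassigned partitions, which is worth double-checking.

// Case 1: assigned but nothing fetched yet -> empty result, no exception.
final OptionalLong lag = consumer.currentLag(assignedButUnfetchedPartition);
assert !lag.isPresent();

// Case 2: not assigned at all -> expected to fail fast,
// e.g. IllegalStateException("No current assignment for partition ..."),
// which the comment above treats as an illegal state rather than a bug.
try {
    consumer.currentLag(unassignedPartition);
} catch (final IllegalStateException expected) {
    // an unassigned partition at this point indicates a programming error in the caller
}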
One unrelated test failure:
* apache-github/trunk: (37 commits)
  KAFKA-10357: Extract setup of changelog from Streams partition assignor (apache#10163)
  KAFKA-10251: increase timeout for consuming records (apache#10228)
  KAFKA-12394; Return `TOPIC_AUTHORIZATION_FAILED` in delete topic response if no describe permission (apache#10223)
  MINOR: Disable transactional/idempotent system tests for Raft quorums (apache#10224)
  KAFKA-10766: Unit test cases for RocksDBRangeIterator (apache#9717)
  KAFKA-12289: Adding test cases for prefix scan in InMemoryKeyValueStore (apache#10052)
  KAFKA-12268: Implement task idling semantics via currentLag API (apache#10137)
  MINOR: Time and log producer state recovery phases (apache#10241)
  MINOR: correct the error message of validating uint32 (apache#10193)
  MINOR: Format the revoking active log output in `StreamsPartitionAssignor` (apache#10242)
  KAFKA-12323 Follow-up: Refactor the unit test a bit (apache#10205)
  MINOR: Remove stack trace of the lock exception in a debug log4j (apache#10231)
  MINOR: Word count should account for extra whitespaces between words (apache#10229)
  MINOR; Small refactor in `GroupMetadata` (apache#10236)
  KAFKA-10340: Proactively close producer when cancelling source tasks (apache#10016)
  KAFKA-12329; kafka-reassign-partitions command should give a better error message when a topic does not exist (apache#10141)
  KAFKA-12254: Ensure MM2 creates topics with source topic configs (apache#10217)
  MINOR: fix kafka-metadata-shell.sh (apache#10226)
  KAFKA-12374: Add missing config sasl.mechanism.controller.protocol (apache#10199)
  KAFKA-10101: Fix edge cases in Log.recoverLog and LogManager.loadLogs (apache#8812)
  ...
KAFKA-12268: Implement task idling semantics via currentLag API (apache#10137)

Implements KIP-695. Reverts a previous behavior change to Consumer.poll and replaces it with a new Consumer.currentLag API, which returns the client's currently cached lag. Uses this new API to implement the desired task idling semantics improvement from KIP-695. Reverts fdcf8fb / KAFKA-10866: Add metadata to ConsumerRecords (apache#9836).

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Guozhang Wang <guozhang@apache.org>
Replaces the poll-time behavior change with a new Consumer#currentLag() API, which returns the client's currently cached lag. The reverted PR is KAFKA-10866: Add metadata to ConsumerRecords (#9836), and the reverted Jira is KAFKA-10866. This work also relates to KAFKA-10867, but this PR needed to revert KAFKA-10866 and adapt KAFKA-10867.