
KAFKA-12554: Refactor Log layer #10280

Merged: 7 commits into apache:trunk from the KIP-405_Log_refactoring branch on Jul 14, 2021

Conversation

@kowshik (Contributor) commented Mar 8, 2021

TL;DR:

This PR implements the details of the Log layer refactor, as outlined in this document: https://docs.google.com/document/d/1dQJL4MCwqQJSPmZkVmVzshFZKuFy_bCPtubav4wBfHQ/edit. A few details may differ from the doc, but it is more or less the same.

STRATEGY:

In this PR, I've extracted a new class called LocalLog out of Log. Currently, LocalLog is purely an implementation detail that is not exposed outside the Log class (except for tests). The object encapsulation is that each Log instance wraps around a LocalLog instance.

This new LocalLog class attempts to encompass most of the local-log responsibilities surrounding the segments map, which were previously present in Log. Note that not all local log responsibilities have been moved over to this new class (yet). The criterion I used was to preserve (for now) in the existing Log class any logic that is mingled in a complex manner with the logStartOffset, the LeaderEpochCache, or the ProducerStateManager.
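
To make the wrapping concrete, here is a minimal sketch using simplified, hypothetical stand-in types (the real classes carry many more collaborators such as the scheduler, config, and time; this is not the actual class shape):

    // LocalLog: owns the segments and the local file operations; it knows nothing
    // about logStartOffset, the leader epoch cache, or producer state.
    final case class SegmentStub(baseOffset: Long, sizeInBytes: Int)

    class LocalLogSketch(@volatile private var segments: Vector[SegmentStub]) {
      def append(segment: SegmentStub): Unit = segments = segments :+ segment
      def logEndOffset: Long =
        segments.lastOption.map(s => s.baseOffset + s.sizeInBytes).getOrElse(0L)
    }

    // Log (the outer layer): wraps a LocalLog instance and layers on the global
    // concerns, such as the log start offset.
    class LogSketch(private val localLog: LocalLogSketch,
                    @volatile var logStartOffset: Long) {
      def logEndOffset: Long = localLog.logEndOffset // local reads are delegated
    }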

WINS:

The main win is that the new LocalLog class is now agnostic of the logStartOffset, which continues to be managed mainly by the Log class. Below is the local log functionality that has successfully moved over from Log to LocalLog (a rough outline of this surface follows the list):

  1. Access to the LogSegments instance containing the local LogSegment objects.
  2. Read path logic to read records from the log.
  3. Segment file deletion logic.
  4. Segment truncation logic.
  5. Segment roll logic.
  6. Segment split logic.
  7. Segment replacement logic.
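
As referenced above, a hypothetical outline of the surface implied by this list (names and signatures are illustrative only; the real methods take Kafka-specific types such as LogSegment, FetchDataInfo, and LogOffsetMetadata):

    trait LocalLogSurfaceSketch {
      def segments: Iterable[Long]                                  // 1. local segments, by base offset here
      def read(startOffset: Long, maxBytes: Int): Array[Byte]       // 2. read path
      def deleteSegments(baseOffsets: Seq[Long]): Int               // 3. segment file deletion
      def truncateTo(targetOffset: Long): Seq[Long]                 // 4. truncation (returns deleted base offsets)
      def roll(expectedNextOffset: Option[Long]): Long              // 5. roll to a new active segment
      def splitOverflowedSegment(baseOffset: Long): Seq[Long]       // 6. segment split
      def replaceSegments(newSegments: Seq[Long], oldSegments: Seq[Long]): Unit // 7. segment replacement
    }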

Below is the main local log functionality that continues to remain in Log:

  1. Segment append logic. The reason is that this logic is mingled with one or more of the following: logStartOffset, LeaderEpochCache, or ProducerStateManager. This makes it hard to separate just the local logic out of it.
  2. Last stable offset and the logic surrounding it.
  3. High watermark and the logic surrounding it.
  4. The fetchOffsetByTimestamp and legacyFetchOffsetsBefore logic.
  5. Some of the retention logic that is related to the global view of the log.
  6. All other logic related to handling the LeaderEpochCache and ProducerStateManager.

PAIN POINTS:

  1. Log locking semantics had to be changed in a handful of areas, with the lock taken at a coarser level.
  2. A few API implementations needed re-ordering of logic in the Log class to make the migration feasible.
  3. Certain APIs added to LocalLog are crude in nature or signature, for example: def checkIfMemoryMappedBufferClosed, def markFlushed, def updateRecoveryPoint, def replaceSegments, etc.
  4. Certain important APIs (such as the def append logic) were hard to migrate because the logic is mingled with the leader epoch cache, producer state manager, and log start offset.

TESTS:

  • A new unit test suite, LocalLogTest.scala, has been added containing tests specific to the LocalLog class.
    All other existing tests are expected to pass.
  • 6/10/2021: System test run on top of 28bf22af168ca0db76796b5d3cd67a38ed8ed1c2: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/4550/
  • 6/12/2021:
    • System test runs 4560 and 4562 on top of 008b701386ce5a4d892d6ac5b90798b981c4fba0.
    • System test runs 4561 and 4563 on top of trunk/6de37e536ac76ef13530d49dc7320110332cd1ee.
    • kafkatest.tests.client.consumer_test rerun:
      • 4564 against trunk/6de37e536ac76ef13530d49dc7320110332cd1ee.
      • 4566 on top of 008b701386ce5a4d892d6ac5b90798b981c4fba0.

@ijuma (Contributor) commented Mar 9, 2021

Can we do one PR for renaming Log to LocalLog and then separate ones for the rest? It seems like git rename detection failed here, and it will make the diffs harder to review.

@kowshik (Contributor, Author) commented Mar 9, 2021

@ijuma In this PR the intention was not to rename Log to LocalLog, but rather to extract a LocalLog class out of Log. My current plan is the alternative of what you suggested above, i.e. extract LocalLog from Log in this PR and then rename Log to GlobalLog in a subsequent PR. The reason is to focus the first PR on the more important/major piece (the Log layer separation); renaming the abstractions then becomes a relatively minor activity in a future PR. Either way we choose, we will eventually introduce a new abstraction, viz. LocalLog or GlobalLog.

Thoughts?

@ijuma (Contributor) commented Mar 9, 2021

OK, thanks for the explanation. Btw, why do we call one of the logs GlobalLog? In what sense is it Global?

@kowshik (Contributor, Author) commented Mar 9, 2021

@ijuma The purpose of the GlobalLog class is to serve as a higher layer, stitching together a unified view of both the local and remote portions of the log. Importantly, this class is aware of the global log start offset. It is external facing and acts as the outer shell, meaning that its public API will be used by other components such as LogManager, LogCleaner, etc. and by components outside the kafka.log package. It could just be called Log too, but I thought that calling it GlobalLog makes the intention/differentiation clearer.

The above and a few more things are explained in the doc attached in the description. I'd suggest having a look at it.
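
As a rough illustration of that layering (simplified, hypothetical types; not the actual API), the outer class owns the global log start offset and delegates the local work to the inner LocalLog:

    // Stand-in for the inner local layer; it performs the actual segment read.
    class LocalLogReaderSketch {
      def read(startOffset: Long, maxBytes: Int): Array[Byte] = Array.emptyByteArray
    }

    // The outer "global/unified" layer enforces the global log start offset,
    // which the inner layer knows nothing about, before delegating the read.
    class GlobalLogSketch(private val localLog: LocalLogReaderSketch,
                          @volatile var logStartOffset: Long) {
      def read(startOffset: Long, maxBytes: Int): Array[Byte] = {
        if (startOffset < logStartOffset)
          throw new IllegalArgumentException(
            s"Offset $startOffset is below the log start offset $logStartOffset")
        localLog.read(startOffset, maxBytes)
      }
    }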

@ijuma (Contributor) commented Mar 9, 2021

@kowshik I can't comment on the doc, that's why I commented here. :) I didn't see any reason there for calling it Global btw. Global tends to imply something more than what this is doing IMO.

@kowshik (Contributor, Author) commented Mar 10, 2021

@ijuma I've opened up the doc for comments. I've also updated it to use the name UnifiedLog instead of GlobalLog. Hopefully the intent is better communicated now in the naming.

@kowshik force-pushed the KIP-405_Log_refactoring branch 2 times, most recently from ee9d77f to de98edf on March 12, 2021 22:47
@junrao (Contributor) left a review comment:
@kowshik : Thanks for the PR. A few comments below.

@@ -1816,8 +1292,12 @@ class Log(@volatile private var _dir: File,
*/
private def deleteOldSegments(predicate: (LogSegment, Option[LogSegment]) => Boolean,
reason: SegmentDeletionReason): Int = {
def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment], logEndOffset: Long): Boolean = {
highWatermark >= nextSegmentOpt.map(_.baseOffset).getOrElse(logEndOffset) &&
@junrao (Contributor):
Hmm, why do we need to wrap predicate with an additional condition?

@kowshik (Contributor, Author) replied on Mar 26, 2021:
This is to accommodate the high watermark (hwm) check that was previously happening in Log#deletableSegments in this line. The deletableSegments method has now moved to LocalLog, but we can't do the hwm check inside LocalLog since the hwm is still owned by Log. So we piggyback on the predicate here to additionally attach the hwm check.
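
A minimal sketch of this piggybacking, using a simplified stand-in segment type (not the real LogSegment):

    final case class SegStub(baseOffset: Long)

    // The hwm check stays in Log (which owns the high watermark) and is attached
    // to the caller-supplied retention predicate before it is handed to LocalLog.
    def wrapWithHwmCheck(highWatermark: Long,
                         logEndOffset: Long,
                         predicate: (SegStub, Option[SegStub]) => Boolean)
                        : (SegStub, Option[SegStub]) => Boolean = {
      (segment, nextSegmentOpt) =>
        // Only segments entirely below the high watermark are eligible for deletion...
        highWatermark >= nextSegmentOpt.map(_.baseOffset).getOrElse(logEndOffset) &&
          predicate(segment, nextSegmentOpt) // ...and they must also satisfy the original predicate.
    }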

@kowshik force-pushed the KIP-405_Log_refactoring branch 2 times, most recently from f5ae1d5 to 63be325 on March 26, 2021 07:47
@kowshik (Contributor, Author) commented Mar 26, 2021

@junrao Thanks a lot for the review! I've addressed your comments in 63be325.

@kowshik changed the title from "KIP-405: Log layer refactor" to "KAFKA-12551: Log layer refactor" on Mar 26, 2021
@kowshik changed the title from "KAFKA-12551: Log layer refactor" to "KAFKA-12554: Refactor Log layer" on Mar 26, 2021
@kowshik (Contributor, Author) commented Mar 26, 2021

@junrao Just a heads up on the following: I'm working on changes for these in separate PRs; they are related to refactoring the recovery logic (KAFKA-12553):

It seems better if we merge those into trunk ahead of the current PR.

@kowshik force-pushed the KIP-405_Log_refactoring branch from 63be325 to c263438 on May 3, 2021 00:53
@kowshik force-pushed the KIP-405_Log_refactoring branch from 3375149 to a541efe on May 10, 2021 01:30
@kowshik (Contributor, Author) commented May 10, 2021

@junrao This PR is ready for another round of review. I've rebased the PR onto the latest AK trunk, iterated a bit more on the implementation, and added new unit tests for the LocalLog class under LocalLogTest.scala.

cc @dhruvilshah3

@kowshik force-pushed the KIP-405_Log_refactoring branch 2 times, most recently from 262ddab to 1ae93dd on May 13, 2021 07:26
@kowshik force-pushed the KIP-405_Log_refactoring branch 2 times, most recently from 64100f9 to c419c35 on May 24, 2021 19:33
@junrao (Contributor) left a review comment:
@kowshik : Thanks for the updated PR. A few more comments below.

producerStateManager.takeSnapshot()
},
postRollAction = (newSegment: LogSegment, deletedSegment: Option[LogSegment]) => {
deletedSegment.foreach(segment => deleteProducerSnapshotAsync(Seq(segment)))
@junrao (Contributor):
This seems to have exposed an existing bug. During roll, deletedSegment will be non-empty if there is an existing segment of size 0 at newOffsetToRoll. However, since we take a producer snapshot at newOffsetToRoll before calling postRollAction, we would be deleting the same snapshot we just created.

In this case, I think we don't need to delete producerSnapshot for deletedSegment.

@kowshik (Contributor, Author):
This is a great catch. I agree with you. While I can address it in this PR, should we create a separate JIRA for it?

@junrao (Contributor):
We could fix this in a separate jira too.

@kowshik (Contributor, Author):
I've created a JIRA to track this: https://issues.apache.org/jira/browse/KAFKA-12876.

@kowshik (Contributor, Author):
Done.
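
For readers following along, here is a minimal sketch of the ordering issue discussed in this thread, using hypothetical simplified types (a set of snapshot offsets stands in for the producer snapshot files; this is not the real roll implementation):

    import scala.collection.mutable

    // Simplified model of roll: a producer snapshot is taken at newOffsetToRoll
    // *before* postRollAction runs.
    def rollSketch(newOffsetToRoll: Long,
                   emptySegmentExistsAtRollOffset: Boolean,
                   snapshots: mutable.Set[Long]): Unit = {
      snapshots += newOffsetToRoll // snapshot taken before postRollAction

      // If an existing 0-size segment at newOffsetToRoll was replaced, it shows up
      // as deletedSegment in postRollAction, and its offset equals newOffsetToRoll.
      val deletedSegmentOffset: Option[Long] =
        if (emptySegmentExistsAtRollOffset) Some(newOffsetToRoll) else None

      // The behaviour described as buggy: deleting the snapshot for deletedSegment
      // removes the snapshot that was just created above. The suggested fix is to
      // skip this deletion entirely.
      deletedSegmentOffset.foreach(offset => snapshots -= offset)
    }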

rebuildProducerState(endOffset, producerStateManager)
lock synchronized {
rebuildProducerState(endOffset, producerStateManager)
}
@junrao (Contributor):
This change has a couple of issues.
(1) updateHighWatermark() now only updates the offset, but not the corresponding offset metadata. The offset metadata is needed when serving fetch requests. Recomputing it requires an index lookup and a log scan, which can be expensive. So, we need to preserve the offset metadata during truncate() and truncateFully().
(2) I think updateHighWatermark() needs to be called within the lock. updateHighWatermark() reads the local log's logEndOffset, so we don't want the logEndOffset to change while updateHighWatermark() is being called.

@kowshik (Contributor, Author):
Sounds good. I'll fix this.

@kowshik (Contributor, Author):
Done.
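
A minimal sketch of the two points above, using hypothetical stand-in types rather than the real Log/LocalLog classes:

    // Stand-in for LogOffsetMetadata: the offset plus the segment position needed
    // to serve fetches without an index lookup or log scan.
    final case class OffsetMetadataStub(messageOffset: Long,
                                        segmentBaseOffset: Long,
                                        relativePositionInSegment: Int)

    class TruncateSketch(lock: Object,
                         // stand-in for reading the local log's end-offset metadata
                         logEndOffsetMetadata: () => OffsetMetadataStub) {
      @volatile private var highWatermarkMetadata = OffsetMetadataStub(0L, 0L, 0)

      def truncateFullySketch(): Unit = lock.synchronized {
        // (2) Runs inside the lock so the log end offset cannot move underneath us.
        // (1) Assigns the full offset metadata, not just the bare offset.
        highWatermarkMetadata = logEndOffsetMetadata()
      }
    }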

params.logIdentifier)
deleteProducerSnapshotsAsync(result.deletedSegments, params)
@junrao (Contributor):
This is unnecessary since during splitting, the old segment is replaced with a new segment with the same base offset. So, result.deletedSegments is always empty.

@kowshik (Contributor, Author) replied on Jun 1, 2021:
Sounds good. Great catch. It appears straightforward to just skip deleting the snapshot here; I can leave a comment explaining why.

@kowshik (Contributor, Author):
@junrao I thought about this again. Correct me if I'm wrong, but it appears we may be altering existing behavior if we go down this route. Should we do it in a separate PR to isolate the change?

@junrao (Contributor):
Yes, that's fine.

@kowshik (Contributor, Author):
I have created a JIRA to track this improvement: https://issues.apache.org/jira/browse/KAFKA-12923

@kowshik force-pushed the KIP-405_Log_refactoring branch 4 times, most recently from 45a55e2 to e201295 on June 4, 2021 10:55
@kowshik (Contributor, Author) commented Jun 4, 2021

@junrao Thanks for the review! I've addressed your comments in e201295e03e0ea8a7102983888d1a7afc66d384a and have also rebased this PR onto the most recent commit in trunk. This comment is pending and needs discussion. The PR is ready for review again.

@junrao (Contributor) left a review comment:
@kowshik : Thanks for the updated PR. A few more comments. Also, could you run all system tests for the PR?

} else {
segment
}
}

/**
* Roll the log over to a new active segment starting with the current logEndOffset.
* Roll the local log over to a new active segment starting with the current logEndOffset.
@junrao (Contributor):
This comment is not very accurate since we roll to expectedNextOffset or logEndOffset.

@kowshik (Contributor, Author):
Sure, I'll fix it. Good catch.

@kowshik (Contributor, Author) replied on Jun 9, 2021:
Done in 8f14879.
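
For context, a hedged sketch of the roll target that the corrected comment describes (simplified; the real Log.roll returns the new segment rather than an offset):

    // The new active segment starts at the maximum of the requested offset and
    // the current log end offset, not always at logEndOffset.
    def rollOffsetSketch(expectedNextOffset: Option[Long], logEndOffset: Long): Long =
      math.max(expectedNextOffset.getOrElse(0L), logEndOffset)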

}
Utils.delete(dir)
// File handlers will be closed if this log is deleted
isMemoryMappedBufferClosed = true
@junrao (Contributor):
It seems that we should set isMemoryMappedBufferClosed in deleteAllSegments()?

@kowshik (Contributor, Author):
That's a good point. I'll move it there.

@kowshik (Contributor, Author) replied on Jun 9, 2021:
Done in 8f14879.
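
A minimal sketch of the placement suggested above (hypothetical, with simplified method bodies):

    class DeleteSketch {
      @volatile private var isMemoryMappedBufferClosed = false

      private def deleteAllSegments(): Unit = {
        // ... close and delete every local segment ...
        // Setting the flag here keeps it next to the operation that actually
        // invalidates the memory-mapped index buffers.
        isMemoryMappedBufferClosed = true
      }

      def delete(dir: java.io.File): Unit = {
        deleteAllSegments()
        // Utils.delete(dir) would follow here, as in the snippet above.
      }
    }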

@@ -1812,37 +1577,36 @@ class Log(@volatile private var _dir: File,
endOffset: Long
): Unit = {
logStartOffset = startOffset
nextOffsetMetadata = LogOffsetMetadata(endOffset, activeSegment.baseOffset, activeSegment.size)
recoveryPoint = math.min(recoveryPoint, endOffset)
localLog.updateLogEndOffset(endOffset)
@junrao (Contributor):
We need to preserve the LogOffsetMetadata for endOffset and use it to call updateHighWatermark.

@kowshik (Contributor, Author):
Sounds good. This can be updated to updateHighWatermark(localLog.logEndOffsetMetadata).

@kowshik (Contributor, Author) replied on Jun 9, 2021:
Done in 8f14879.
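
A short sketch of the suggestion, with stand-in types (the real code uses LogOffsetMetadata and the actual Log/LocalLog classes):

    final case class EndOffsetMeta(messageOffset: Long, segmentBaseOffset: Long, positionInSegment: Int)

    class LocalLogEndOffsetStub(@volatile var logEndOffsetMetadata: EndOffsetMeta)

    class LogTruncationSketch(localLog: LocalLogEndOffsetStub) {
      @volatile private var highWatermarkMetadata = EndOffsetMeta(0L, 0L, 0)

      def updateHighWatermark(meta: EndOffsetMeta): Unit =
        highWatermarkMetadata = meta

      def completeTruncation(): Unit = {
        // Update from the local log's full end-offset metadata, not a bare offset,
        // so the segment position is preserved for serving fetches.
        updateHighWatermark(localLog.logEndOffsetMetadata)
      }
    }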

initFileSize = config.initFileSize,
preallocate = config.preallocate))
val deletedSegments = localLog.truncateFullyAndStartAt(newOffset)
deleteProducerSnapshots(deletedSegments, asyncDelete = true)
@junrao (Contributor):
producerStateManager.truncateFullyAndStartAt() removes all producer snapshots. So, this is unnecessary.

@kowshik (Contributor, Author):
Sounds good. I'll fix this.

@kowshik (Contributor, Author) replied on Jun 9, 2021:
Done in 8f14879.


@kowshik (Contributor, Author) commented Jun 21, 2021

@junrao @dhruvilshah3 I ran a perf test against broker builds with and without this PR. The test involved the following:

  1. Created a test topic with 1 partition and replication factor 1 using the command: $> ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic kowshik-test-1 --partitions 1 --replication-factor 1.
  2. Ran kafka-producer-perf-test.sh to produce 10M messages each of size 1KB and with max producer throughput 100K to the above topic. Command: $> ./bin/kafka-producer-perf-test.sh --num-records 10000000 --print-metrics --producer-props bootstrap.servers=localhost:9092 --record-size 1024 --throughput 100000 --topic kowshik-test-1.
  3. In parallel, ran kafka-consumer-perf-test.sh to consume the 10M messages that were produced in (2) using the command: $> bin/kafka-consumer-perf-test.sh --topic kowshik-test-1 --bootstrap-server localhost:9092 --messages 10000000 --print-metrics --show-detailed-stats

The tests have similar results, meaning that the performance with and without this PR looks similar. Results were reported for the following configurations (the detailed numbers are not reproduced here):

  • log.segment.bytes=10MB
  • log.segment.bytes=100MB
  • log.segment.bytes=1GB

@kowshik force-pushed the KIP-405_Log_refactoring branch 7 times, most recently from e34c0b5 to 20ccbc4 on June 30, 2021 09:46
@kowshik force-pushed the KIP-405_Log_refactoring branch 4 times, most recently from 52d53ad to 4d207a7 on July 9, 2021 08:44
@junrao (Contributor) left a review comment:
@kowshik : Thanks for the updated PR. LGTM. Are there any more tests that you plan to do?

@kowshik force-pushed the KIP-405_Log_refactoring branch from 4d207a7 to a0d94b3 on July 11, 2021 20:20
@kowshik (Contributor, Author) commented Jul 12, 2021

@junrao Thanks for the review. I ran load tests on the changes from this PR; there weren't any new regressions (i.e. latency regressions or errors) that I noticed, except for one issue that looks unrelated to this PR. It's described in this JIRA: https://issues.apache.org/jira/browse/KAFKA-13070.

The load test was run on a 6-broker cluster with 250GB SSD disks:

  • Produce/consume on a test topic with 2000 partitions (~1000+ replicas per broker).
  • Per topic # of producers = 6.
  • Produce ingress per broker = ~20.5MBps.
  • Per topic # of consumers = 6.
  • # of consumer groups = 3.
  • Test duration: ~1h.

Mid-way through the test, I rolled the cluster under load to check how the cluster behaved. Overall things looked OK.

There weren't any additional tests that I was planning to do.

@junrao junrao merged commit b80ff18 into apache:trunk Jul 14, 2021
@satishd (Member) commented Jul 14, 2021

Thanks @junrao for merging into trunk. Can we also push this to the 3.0 branch as we discussed earlier?
cc @kowshik

@ijuma (Contributor) commented Jul 14, 2021

What is the reason for including a refactoring in 3.0 after the feature freeze?

@kowshik (Contributor, Author) commented Jul 14, 2021

@ijuma Discussed with @satishd. We are not planning to include this in 3.0.

junrao pushed a commit that referenced this pull request Aug 12, 2021
In this PR, I've renamed kafka.log.Log to kafka.log.UnifiedLog. With the advent of KIP-405, going forward the existing Log class will present a unified view of local and tiered log segments, so we rename it to UnifiedLog. The motivation for this PR is also the same as outlined in this design document: https://docs.google.com/document/d/1dQJL4MCwqQJSPmZkVmVzshFZKuFy_bCPtubav4wBfHQ/edit.
This PR is a follow-up to #10280 where we had refactored the Log layer introducing a new kafka.log.LocalLog class.

Note: the Log class name had to be hardcoded to ensure metrics are defined under the Log class (for backwards compatibility). Please refer to the newly introduced UnifiedLog.metricName() method.

Reviewers: Cong Ding <cong@ccding.com>, Satish Duggana <satishd@apache.org>, Jun Rao <junrao@gmail.com>
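
To illustrate the backwards-compatibility point, here is a hypothetical sketch of MBean-style metric naming (not the actual UnifiedLog.metricName() implementation): the metric "type" stays hardcoded to "Log" even though the class is now UnifiedLog, so existing dashboards and alerts keep matching.

    def metricNameSketch(name: String, tags: Map[String, String]): String = {
      val tagSuffix = tags.map { case (k, v) => s"$k=$v" }.mkString(",")
      // Hardcode "Log" (not "UnifiedLog") as the metric type for compatibility.
      s"kafka.log:type=Log,name=$name" + (if (tagSuffix.nonEmpty) s",$tagSuffix" else "")
    }

For example, a per-partition size metric under this sketch would render as something like kafka.log:type=Log,name=Size,topic=foo,partition=0.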
xdgrulez pushed a commit to xdgrulez/kafka that referenced this pull request Dec 22, 2021
Reviewers: Ismael Juma <ismael@juma.me.uk>, Satish Duggana <satishd@apache.org>, Jun Rao <junrao@gmail.com>
xdgrulez pushed a commit to xdgrulez/kafka that referenced this pull request Dec 22, 2021