
KAFKA-12554: Refactor Log layer #10280

Merged: 7 commits into apache:trunk from the KIP-405_Log_refactoring branch on Jul 14, 2021

Conversation

@kowshik (Contributor) commented Mar 8, 2021

TL;DR:

This PR implements the details of the Log layer refactor, as outlined in this document: https://docs.google.com/document/d/1dQJL4MCwqQJSPmZkVmVzshFZKuFy_bCPtubav4wBfHQ/edit. A few details may differ from the doc, but it is more or less the same.

STRATEGY:

In this PR, I've extracted a new class called LocalLog out of Log. Currently, LocalLog is purely an implementation detail that is not exposed outside the Log class (except for tests). The object encapsulation is that each Log instance wraps around a LocalLog instance.

This new LocalLog class attempts to encompass most of the local-log responsibilities surrounding the segments map, which were previously present in Log. Note that not all local log responsibilities have been moved over to this new class (yet). The criterion I used was to preserve (for now) in the existing Log class any logic that is mingled in a complex manner with the logStartOffset, the LeaderEpochCache, or the ProducerStateManager.
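
To make the wrapping concrete, here is a minimal sketch using simplified, hypothetical stand-in types (the real classes carry many more collaborators such as the scheduler, config, and time; this is not the actual class shape):

    // LocalLog: owns the segments and the local file operations; it knows nothing
    // about logStartOffset, the leader epoch cache, or producer state.
    final case class SegmentStub(baseOffset: Long, sizeInBytes: Int)

    class LocalLogSketch(@volatile private var segments: Vector[SegmentStub]) {
      def append(segment: SegmentStub): Unit = segments = segments :+ segment
      def logEndOffset: Long =
        segments.lastOption.map(s => s.baseOffset + s.sizeInBytes).getOrElse(0L)
    }

    // Log (the outer layer): wraps a LocalLog instance and layers on the global
    // concerns, such as the log start offset.
    class LogSketch(private val localLog: LocalLogSketch,
                    @volatile var logStartOffset: Long) {
      def logEndOffset: Long = localLog.logEndOffset // local reads are delegated
    }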

WINS:

The main win is that the new LocalLog class is now agnostic of the logStartOffset, which continues to be managed mainly by the Log class. Below is the local log functionality that has successfully moved over from Log to LocalLog (a rough outline of this surface follows the list):

  1. Access to the LogSegments instance containing the local LogSegment objects.
  2. Read path logic to read records from the log.
  3. Segment file deletion logic.
  4. Segment truncation logic.
  5. Segment roll logic.
  6. Segment split logic.
  7. Segment replacement logic.
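
As referenced above, a hypothetical outline of the surface implied by this list (names and signatures are illustrative only; the real methods take Kafka-specific types such as LogSegment, FetchDataInfo, and LogOffsetMetadata):

    trait LocalLogSurfaceSketch {
      def segments: Iterable[Long]                                  // 1. local segments, by base offset here
      def read(startOffset: Long, maxBytes: Int): Array[Byte]       // 2. read path
      def deleteSegments(baseOffsets: Seq[Long]): Int               // 3. segment file deletion
      def truncateTo(targetOffset: Long): Seq[Long]                 // 4. truncation (returns deleted base offsets)
      def roll(expectedNextOffset: Option[Long]): Long              // 5. roll to a new active segment
      def splitOverflowedSegment(baseOffset: Long): Seq[Long]       // 6. segment split
      def replaceSegments(newSegments: Seq[Long], oldSegments: Seq[Long]): Unit // 7. segment replacement
    }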

Below is the main local log functionality that continues to remain in Log:

  1. Segment append logic. The reason is that this logic is mingled with one or more of the following: logStartOffset, LeaderEpochCache, or ProducerStateManager. This makes it hard to separate just the local logic out of it.
  2. Last stable offset and the logic surrounding it.
  3. High watermark and the logic surrounding it.
  4. The fetchOffsetByTimestamp and legacyFetchOffsetsBefore logic.
  5. Some of the retention logic that is related to the global view of the log.
  6. All other logic related to handling the LeaderEpochCache and ProducerStateManager.

PAIN POINTS:

  1. Log locking semantics had to be changed in a handful of areas, with the lock taken at a coarser level.
  2. A few API implementations needed re-ordering of logic in the Log class to make the migration feasible.
  3. Certain APIs added to LocalLog are crude in nature or signature, for example: def checkIfMemoryMappedBufferClosed, def markFlushed, def updateRecoveryPoint, def replaceSegments, etc.
  4. Certain important APIs (such as the def append logic) were hard to migrate because the logic is mingled with the leader epoch cache, producer state manager, and log start offset.

TESTS:

  • A new unit test suite, LocalLogTest.scala, has been added containing tests specific to the LocalLog class.
    All other existing tests are expected to pass.
  • 6/10/2021: System test run on top of 28bf22af168ca0db76796b5d3cd67a38ed8ed1c2: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/4550/
  • 6/12/2021:
    • System test runs 4560 and 4562 on top of 008b701386ce5a4d892d6ac5b90798b981c4fba0.
    • System test runs 4561 and 4563 on top of trunk/6de37e536ac76ef13530d49dc7320110332cd1ee.
    • kafkatest.tests.client.consumer_test rerun:
      • 4564 against trunk/6de37e536ac76ef13530d49dc7320110332cd1ee.
      • 4566 on top of 008b701386ce5a4d892d6ac5b90798b981c4fba0.

@ijuma (Contributor) commented Mar 9, 2021

Can we do one PR for renaming Log to LocalLog and then separate ones for the rest? It seems like git rename detection failed here, and it will make the diffs harder to review.

@kowshik (Contributor, Author) commented Mar 9, 2021

@ijuma In this PR the intention was not to rename Log to LocalLog, but rather to extract a LocalLog class out of Log. My current plan is the alternative of what you suggested above, i.e. extract LocalLog from Log in this PR and then rename Log to GlobalLog in a subsequent PR. The reason is to focus the first PR on the more important/major piece (the Log layer separation); renaming the abstractions then becomes a relatively minor activity in a future PR. Either way we choose, we will eventually introduce a new abstraction, viz. LocalLog or GlobalLog.

Thoughts?

@ijuma (Contributor) commented Mar 9, 2021

OK, thanks for the explanation. Btw, why do we call one of the logs GlobalLog? In what sense is it Global?

@kowshik (Contributor, Author) commented Mar 9, 2021

@ijuma The purpose of the GlobalLog class is to serve as a higher layer, stitching together a unified view of both the local and remote portions of the log. Importantly, this class is aware of the global log start offset. It is external facing and acts as the outer shell, meaning that its public API will be used by other components such as LogManager, LogCleaner, etc. and by components outside the kafka.log package. It could just be called Log too, but I thought that calling it GlobalLog makes the intention/differentiation clearer.

The above and a few more things are explained in the doc attached in the description. I'd suggest having a look at it.
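
As a rough illustration of that layering (simplified, hypothetical types; not the actual API), the outer class owns the global log start offset and delegates the local work to the inner LocalLog:

    // Stand-in for the inner local layer; it performs the actual segment read.
    class LocalLogReaderSketch {
      def read(startOffset: Long, maxBytes: Int): Array[Byte] = Array.emptyByteArray
    }

    // The outer "global/unified" layer enforces the global log start offset,
    // which the inner layer knows nothing about, before delegating the read.
    class GlobalLogSketch(private val localLog: LocalLogReaderSketch,
                          @volatile var logStartOffset: Long) {
      def read(startOffset: Long, maxBytes: Int): Array[Byte] = {
        if (startOffset < logStartOffset)
          throw new IllegalArgumentException(
            s"Offset $startOffset is below the log start offset $logStartOffset")
        localLog.read(startOffset, maxBytes)
      }
    }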

@ijuma (Contributor) commented Mar 9, 2021

@kowshik I can't comment on the doc, that's why I commented here. :) I didn't see any reason there for calling it Global btw. Global tends to imply something more than what this is doing IMO.

@kowshik (Contributor, Author) commented Mar 10, 2021

@ijuma I've opened up the doc for comments. I've also updated it to use the name UnifiedLog instead of GlobalLog. Hopefully the intent is better communicated now in the naming.

@kowshik force-pushed the KIP-405_Log_refactoring branch 2 times, most recently from ee9d77f to de98edf on March 12, 2021 22:47
@junrao (Contributor) left a review comment:
@kowshik : Thanks for the PR. A few comments below.

@@ -1816,8 +1292,12 @@ class Log(@volatile private var _dir: File,
*/
private def deleteOldSegments(predicate: (LogSegment, Option[LogSegment]) => Boolean,
reason: SegmentDeletionReason): Int = {
def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment], logEndOffset: Long): Boolean = {
highWatermark >= nextSegmentOpt.map(_.baseOffset).getOrElse(logEndOffset) &&
@junrao (Contributor):
Hmm, why do we need to wrap predicate with an additional condition?

@kowshik (Contributor, Author) replied on Mar 26, 2021:
This is to accommodate the high watermark (hwm) check that was previously happening in Log#deletableSegments in this line. The deletableSegments method has now moved to LocalLog, but we can't do the hwm check inside LocalLog since the hwm is still owned by Log. So we piggyback on the predicate here to additionally attach the hwm check.
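
A minimal sketch of this piggybacking, using a simplified stand-in segment type (not the real LogSegment):

    final case class SegStub(baseOffset: Long)

    // The hwm check stays in Log (which owns the high watermark) and is attached
    // to the caller-supplied retention predicate before it is handed to LocalLog.
    def wrapWithHwmCheck(highWatermark: Long,
                         logEndOffset: Long,
                         predicate: (SegStub, Option[SegStub]) => Boolean)
                        : (SegStub, Option[SegStub]) => Boolean = {
      (segment, nextSegmentOpt) =>
        // Only segments entirely below the high watermark are eligible for deletion...
        highWatermark >= nextSegmentOpt.map(_.baseOffset).getOrElse(logEndOffset) &&
          predicate(segment, nextSegmentOpt) // ...and they must also satisfy the original predicate.
    }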

@kowshik force-pushed the KIP-405_Log_refactoring branch 2 times, most recently from f5ae1d5 to 63be325 on March 26, 2021 07:47
@kowshik (Contributor, Author) commented Mar 26, 2021

@junrao Thanks a lot for the review! I've addressed your comments in 63be325.

@kowshik changed the title from "KIP-405: Log layer refactor" to "KAFKA-12551: Log layer refactor" on Mar 26, 2021
@kowshik changed the title from "KAFKA-12551: Log layer refactor" to "KAFKA-12554: Refactor Log layer" on Mar 26, 2021
@kowshik (Contributor, Author) commented Mar 26, 2021

@junrao Just a heads up on the following: I'm working on changes for these in separate PRs; they are related to refactoring the recovery logic (KAFKA-12553):

It seems better if we merge those into trunk ahead of the current PR.

@kowshik force-pushed the KIP-405_Log_refactoring branch from 63be325 to c263438 on May 3, 2021 00:53
@kowshik force-pushed the KIP-405_Log_refactoring branch from 3375149 to a541efe on May 10, 2021 01:30
@kowshik (Contributor, Author) commented May 10, 2021

@junrao This PR is ready for another round of review. I've rebased the PR onto the latest AK trunk, iterated a bit more on the implementation, and added new unit tests for the LocalLog class under LocalLogTest.scala.

cc @dhruvilshah3

@kowshik force-pushed the KIP-405_Log_refactoring branch 2 times, most recently from 262ddab to 1ae93dd on May 13, 2021 07:26
@kowshik force-pushed the KIP-405_Log_refactoring branch 2 times, most recently from 64100f9 to c419c35 on May 24, 2021 19:33
@junrao (Contributor) left a review comment:
@kowshik : Thanks for the updated PR. A few more comments below.

producerStateManager.takeSnapshot()
},
postRollAction = (newSegment: LogSegment, deletedSegment: Option[LogSegment]) => {
deletedSegment.foreach(segment => deleteProducerSnapshotAsync(Seq(segment)))
@junrao (Contributor):
This seems to have exposed an existing bug. During roll, deletedSegment will be non-empty if there is an existing segment of size 0 at newOffsetToRoll. However, since we take a producer snapshot at newOffsetToRoll before calling postRollAction, we would be deleting the same snapshot we just created.

In this case, I think we don't need to delete producerSnapshot for deletedSegment.

@kowshik (Contributor, Author):
This is a great catch. I agree with you. While I can address it in this PR, should we create a separate JIRA for it?

@junrao (Contributor):
We could fix this in a separate jira too.

@kowshik (Contributor, Author):
I've created a JIRA to track this: https://issues.apache.org/jira/browse/KAFKA-12876.

@kowshik (Contributor, Author):
Done.
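
For readers following along, here is a minimal sketch of the ordering issue discussed in this thread, using hypothetical simplified types (a set of snapshot offsets stands in for the producer snapshot files; this is not the real roll implementation):

    import scala.collection.mutable

    // Simplified model of roll: a producer snapshot is taken at newOffsetToRoll
    // *before* postRollAction runs.
    def rollSketch(newOffsetToRoll: Long,
                   emptySegmentExistsAtRollOffset: Boolean,
                   snapshots: mutable.Set[Long]): Unit = {
      snapshots += newOffsetToRoll // snapshot taken before postRollAction

      // If an existing 0-size segment at newOffsetToRoll was replaced, it shows up
      // as deletedSegment in postRollAction, and its offset equals newOffsetToRoll.
      val deletedSegmentOffset: Option[Long] =
        if (emptySegmentExistsAtRollOffset) Some(newOffsetToRoll) else None

      // The behaviour described as buggy: deleting the snapshot for deletedSegment
      // removes the snapshot that was just created above. The suggested fix is to
      // skip this deletion entirely.
      deletedSegmentOffset.foreach(offset => snapshots -= offset)
    }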

rebuildProducerState(endOffset, producerStateManager)
lock synchronized {
rebuildProducerState(endOffset, producerStateManager)
}
@junrao (Contributor):
This change has a couple of issues.
(1) updateHighWatermark() now only updates the offset, but not the corresponding offset metadata. The offset metadata is needed when serving fetch requests. Recomputing it requires an index lookup and a log scan, which can be expensive. So, we need to preserve the offset metadata during truncate() and truncateFully().
(2) I think updateHighWatermark() needs to be called within the lock. updateHighWatermark() reads the local log's logEndOffset, so we don't want the logEndOffset to change while updateHighWatermark() is being called.

@kowshik (Contributor, Author):
Sounds good. I'll fix this.

@kowshik (Contributor, Author):
Done.
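
A minimal sketch of the two points above, using hypothetical stand-in types rather than the real Log/LocalLog classes:

    // Stand-in for LogOffsetMetadata: the offset plus the segment position needed
    // to serve fetches without an index lookup or log scan.
    final case class OffsetMetadataStub(messageOffset: Long,
                                        segmentBaseOffset: Long,
                                        relativePositionInSegment: Int)

    class TruncateSketch(lock: Object,
                         // stand-in for reading the local log's end-offset metadata
                         logEndOffsetMetadata: () => OffsetMetadataStub) {
      @volatile private var highWatermarkMetadata = OffsetMetadataStub(0L, 0L, 0)

      def truncateFullySketch(): Unit = lock.synchronized {
        // (2) Runs inside the lock so the log end offset cannot move underneath us.
        // (1) Assigns the full offset metadata, not just the bare offset.
        highWatermarkMetadata = logEndOffsetMetadata()
      }
    }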

params.logIdentifier)
deleteProducerSnapshotsAsync(result.deletedSegments, params)
@junrao (Contributor):
This is unnecessary since during splitting, the old segment is replaced with a new segment with the same base offset. So, result.deletedSegments is always empty.

@kowshik (Contributor, Author) replied on Jun 1, 2021:
Sounds good. Great catch. It appears straightforward to just skip deleting the snapshot here; I can leave a comment explaining why.

@kowshik (Contributor, Author):
@junrao I thought about this again. Correct me if I'm wrong, but it appears we may be altering existing behavior if we go down this route. Should we do it in a separate PR to isolate the change?

@junrao (Contributor):
Yes, that's fine.

@kowshik (Contributor, Author):
I have created a JIRA to track this improvement: https://issues.apache.org/jira/browse/KAFKA-12923

@kowshik force-pushed the KIP-405_Log_refactoring branch 4 times, most recently from 45a55e2 to e201295 on June 4, 2021 10:55
@kowshik (Contributor, Author) commented Jun 4, 2021

@junrao Thanks for the review! I've addressed your comments in e201295e03e0ea8a7102983888d1a7afc66d384a and have also rebased this PR onto the most recent commit in trunk. This comment is pending and needs discussion. The PR is ready for review again.

@junrao (Contributor) left a review comment:
@kowshik : Thanks for the updated PR. A few more comments. Also, could you run all system tests for the PR?

} else {
segment
}
}

/**
* Roll the log over to a new active segment starting with the current logEndOffset.
* Roll the local log over to a new active segment starting with the current logEndOffset.
@junrao (Contributor):
This comment is not very accurate since we roll to expectedNextOffset or logEndOffset.

@kowshik (Contributor, Author):
Sure, I'll fix it. Good catch.

@kowshik (Contributor, Author) replied on Jun 9, 2021:
Done in 8f14879.
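
For context, a hedged sketch of the roll target that the corrected comment describes (simplified; the real Log.roll returns the new segment rather than an offset):

    // The new active segment starts at the maximum of the requested offset and
    // the current log end offset, not always at logEndOffset.
    def rollOffsetSketch(expectedNextOffset: Option[Long], logEndOffset: Long): Long =
      math.max(expectedNextOffset.getOrElse(0L), logEndOffset)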

}
Utils.delete(dir)
// File handlers will be closed if this log is deleted
isMemoryMappedBufferClosed = true
@junrao (Contributor):
It seems that we should set isMemoryMappedBufferClosed in deleteAllSegments()?

@kowshik (Contributor, Author):
That's a good point. I'll move it there.

@kowshik (Contributor, Author) replied on Jun 9, 2021:
Done in 8f14879.
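
A minimal sketch of the placement suggested above (hypothetical, with simplified method bodies):

    class DeleteSketch {
      @volatile private var isMemoryMappedBufferClosed = false

      private def deleteAllSegments(): Unit = {
        // ... close and delete every local segment ...
        // Setting the flag here keeps it next to the operation that actually
        // invalidates the memory-mapped index buffers.
        isMemoryMappedBufferClosed = true
      }

      def delete(dir: java.io.File): Unit = {
        deleteAllSegments()
        // Utils.delete(dir) would follow here, as in the snippet above.
      }
    }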

@@ -1812,37 +1577,36 @@ class Log(@volatile private var _dir: File,
endOffset: Long
): Unit = {
logStartOffset = startOffset
nextOffsetMetadata = LogOffsetMetadata(endOffset, activeSegment.baseOffset, activeSegment.size)
recoveryPoint = math.min(recoveryPoint, endOffset)
localLog.updateLogEndOffset(endOffset)
@junrao (Contributor):
We need to preserve the LogOffsetMetadata for endOffset and use it to call updateHighWatermark.

@kowshik (Contributor, Author):
Sounds good. This can be updated to updateHighWatermark(localLog.logEndOffsetMetadata).

@kowshik (Contributor, Author) replied on Jun 9, 2021:
Done in 8f14879.
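
A short sketch of the suggestion, with stand-in types (the real code uses LogOffsetMetadata and the actual Log/LocalLog classes):

    final case class EndOffsetMeta(messageOffset: Long, segmentBaseOffset: Long, positionInSegment: Int)

    class LocalLogEndOffsetStub(@volatile var logEndOffsetMetadata: EndOffsetMeta)

    class LogTruncationSketch(localLog: LocalLogEndOffsetStub) {
      @volatile private var highWatermarkMetadata = EndOffsetMeta(0L, 0L, 0)

      def updateHighWatermark(meta: EndOffsetMeta): Unit =
        highWatermarkMetadata = meta

      def completeTruncation(): Unit = {
        // Update from the local log's full end-offset metadata, not a bare offset,
        // so the segment position is preserved for serving fetches.
        updateHighWatermark(localLog.logEndOffsetMetadata)
      }
    }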

initFileSize = config.initFileSize,
preallocate = config.preallocate))
val deletedSegments = localLog.truncateFullyAndStartAt(newOffset)
deleteProducerSnapshots(deletedSegments, asyncDelete = true)
@junrao (Contributor):
producerStateManager.truncateFullyAndStartAt() removes all producer snapshots. So, this is unnecessary.

@kowshik (Contributor, Author):
Sounds good. I'll fix this.

@kowshik (Contributor, Author) replied on Jun 9, 2021:
Done in 8f14879.


@kowshik (Contributor, Author) commented Jun 21, 2021

@junrao @dhruvilshah3 I ran a perf test against broker builds with and without this PR. The test involved the following:

  1. Created a test topic with 1 partition and replication factor 1 using the command: $> ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic kowshik-test-1 --partitions 1 --replication-factor 1.
  2. Ran kafka-producer-perf-test.sh to produce 10M messages each of size 1KB and with max producer throughput 100K to the above topic. Command: $> ./bin/kafka-producer-perf-test.sh --num-records 10000000 --print-metrics --producer-props bootstrap.servers=localhost:9092 --record-size 1024 --throughput 100000 --topic kowshik-test-1.
  3. In parallel, ran kafka-consumer-perf-test.sh to consume the 10M messages that were produced in (2) using the command: $> bin/kafka-consumer-perf-test.sh --topic kowshik-test-1 --bootstrap-server localhost:9092 --messages 10000000 --print-metrics --show-detailed-stats

The tests have similar results, meaning that the performance with and without this PR looks similar. Results were reported for the following configurations (the detailed numbers are not reproduced here):

  • log.segment.bytes=10MB
  • log.segment.bytes=100MB
  • log.segment.bytes=1GB

@kowshik force-pushed the KIP-405_Log_refactoring branch 7 times, most recently from e34c0b5 to 20ccbc4 on June 30, 2021 09:46
@kowshik force-pushed the KIP-405_Log_refactoring branch 4 times, most recently from 52d53ad to 4d207a7 on July 9, 2021 08:44
@junrao (Contributor) left a review comment:
@kowshik : Thanks for the updated PR. LGTM. Are there any more tests that you plan to do?

@kowshik force-pushed the KIP-405_Log_refactoring branch from 4d207a7 to a0d94b3 on July 11, 2021 20:20
@kowshik (Contributor, Author) commented Jul 12, 2021

@junrao Thanks for the review. I ran load tests on the changes from this PR; there weren't any new regressions (i.e. latency regressions or errors) that I noticed, except for one issue that looks unrelated to this PR. It's described in this JIRA: https://issues.apache.org/jira/browse/KAFKA-13070.

The load test was run on a 6-broker cluster with 250GB SSD disks:

  • Produce/consume on a test topic with 2000 partitions (~1000+ replicas per broker).
  • Per topic # of producers = 6.
  • Produce ingress per broker = ~20.5MBps.
  • Per topic # of consumers = 6.
  • # of consumer groups = 3.
  • Test duration: ~1h.

Mid-way through the test, I rolled the cluster under load to check how the cluster behaved. Overall things looked OK.

There weren't any additional tests that I was planning to do.

@junrao junrao merged commit b80ff18 into apache:trunk Jul 14, 2021
@satishd (Member) commented Jul 14, 2021

Thanks @junrao for merging into trunk. Can we also push this to the 3.0 branch as we discussed earlier?
cc @kowshik

@ijuma (Contributor) commented Jul 14, 2021

What is the reason for including a refactoring in 3.0 after the feature freeze?

@kowshik (Contributor, Author) commented Jul 14, 2021

@ijuma Discussed with @satishd. We are not planning to include this in 3.0.

junrao pushed a commit that referenced this pull request Aug 12, 2021
In this PR, I've renamed kafka.log.Log to kafka.log.UnifiedLog. With the advent of KIP-405, going forward the existing Log class will present a unified view of local and tiered log segments, so we rename it to UnifiedLog. The motivation for this PR is also the same as outlined in this design document: https://docs.google.com/document/d/1dQJL4MCwqQJSPmZkVmVzshFZKuFy_bCPtubav4wBfHQ/edit.
This PR is a follow-up to #10280 where we had refactored the Log layer introducing a new kafka.log.LocalLog class.

Note: the Log class name had to be hardcoded to ensure metrics are defined under the Log class (for backwards compatibility). Please refer to the newly introduced UnifiedLog.metricName() method.

Reviewers: Cong Ding <cong@ccding.com>, Satish Duggana <satishd@apache.org>, Jun Rao <junrao@gmail.com>
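
To illustrate the backwards-compatibility point, here is a hypothetical sketch of MBean-style metric naming (not the actual UnifiedLog.metricName() implementation): the metric "type" stays hardcoded to "Log" even though the class is now UnifiedLog, so existing dashboards and alerts keep matching.

    def metricNameSketch(name: String, tags: Map[String, String]): String = {
      val tagSuffix = tags.map { case (k, v) => s"$k=$v" }.mkString(",")
      // Hardcode "Log" (not "UnifiedLog") as the metric type for compatibility.
      s"kafka.log:type=Log,name=$name" + (if (tagSuffix.nonEmpty) s",$tagSuffix" else "")
    }

For example, a per-partition size metric under this sketch would render as something like kafka.log:type=Log,name=Size,topic=foo,partition=0.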
xdgrulez pushed a commit to xdgrulez/kafka that referenced this pull request Dec 22, 2021
Reviewers: Ismael Juma <ismael@juma.me.uk>, Satish Duggana <satishd@apache.org>, Jun Rao <junrao@gmail.com>
xdgrulez pushed a commit to xdgrulez/kafka that referenced this pull request Dec 22, 2021