Too slow recovering log streams while startup storage node #125

Closed
ijsong opened this issue Sep 8, 2022 · 1 comment · Fixed by #160, #181 or #204

ijsong commented Sep 8, 2022

A storage node reloads its log streams when it restarts, and this is sometimes very slow. There are two possible optimizations:

  • Reload log streams concurrently.
  • Speed up the recovery of the commit context.

I think the first is quite easy. The second, however, needs careful consideration because it requires an understanding of the communication between the storage node and the metadata repository.
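
For illustration, concurrent reloading could look roughly like the sketch below. It is not taken from the varlog code base: `logStream`, `reloadAll`, and the `load` callback are hypothetical names, and the real recovery work per log stream is reduced to a function call.

```go
package main

import (
	"fmt"
	"sync"
)

// logStream is a hypothetical stand-in for a log stream replica on disk.
type logStream struct {
	ID     int
	Loaded bool
}

// reloadAll runs load for every log stream in its own goroutine and waits
// until all of them have finished. load is a hypothetical per-stream
// recovery function. The returned slice holds one error slot per stream.
func reloadAll(streams []*logStream, load func(*logStream) error) []error {
	errs := make([]error, len(streams))
	var wg sync.WaitGroup
	for i, ls := range streams {
		wg.Add(1)
		go func(i int, ls *logStream) {
			defer wg.Done()
			errs[i] = load(ls) // each goroutine writes only its own slot
		}(i, ls)
	}
	wg.Wait()
	return errs
}

func main() {
	streams := []*logStream{{ID: 1}, {ID: 2}, {ID: 3}}
	errs := reloadAll(streams, func(ls *logStream) error {
		ls.Loaded = true // placeholder for the real recovery work
		return nil
	})
	for i, ls := range streams {
		fmt.Println(ls.ID, ls.Loaded, errs[i])
	}
}
```

With this shape the restart time is bounded by the slowest log stream rather than by the sum of all of them, assuming the log streams do not contend for the same resources.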

ijsong added a commit to ijsong/varlog that referenced this issue Sep 15, 2022
ijsong added a commit to ijsong/varlog that referenced this issue Sep 19, 2022
ijsong added a commit to ijsong/varlog that referenced this issue Sep 20, 2022
@ijsong ijsong self-assigned this Sep 21, 2022
ijsong added a commit to ijsong/varlog that referenced this issue Sep 21, 2022
… metadata repository

Previously, a log stream replica in the learning state sent reports to the metadata repository, and
the log stream replica had to use commit contexts sent from the source replica.
However, the source will not be able to do that after a future patch, because replicas in a log
stream will store only the latest commit context rather than all commit contexts.
This patch slightly changes the reporting behavior of a log stream replica: it no longer sends a
report while it is in the learning state, that is, while it copies data from the source replica.

Updates kakao#125
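
A minimal sketch of the behavior described in this commit message, assuming a replica that tracks its own state. The type and field names (`replicaState`, `report`, `buildReport`) are hypothetical and do not come from the varlog source.

```go
package main

import "fmt"

// replicaState is a hypothetical enumeration of replica states.
type replicaState int

const (
	stateRunning replicaState = iota
	stateSealing
	stateLearning
)

// report is a hypothetical, heavily reduced report message.
type report struct {
	UncommittedLLSNBegin uint64
}

type replica struct {
	state                replicaState
	uncommittedLLSNBegin uint64
}

// buildReport returns the report to send to the metadata repository.
// The second result is false while the replica is learning, which means
// that nothing should be sent.
func (r *replica) buildReport() (report, bool) {
	if r.state == stateLearning {
		return report{}, false
	}
	return report{UncommittedLLSNBegin: r.uncommittedLLSNBegin}, true
}

func main() {
	r := &replica{state: stateLearning, uncommittedLLSNBegin: 11}
	_, ok := r.buildReport()
	fmt.Println("learning, report sent:", ok)

	r.state = stateSealing
	rpt, ok := r.buildReport()
	fmt.Println("sealing, report sent:", ok, rpt.UncommittedLLSNBegin)
}
```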

ijsong commented Sep 22, 2022

I plan to resolve this issue with several PRs.
Briefly, replicas running a new version should be able to handle previous data either loaded from local storage or received from other replicas. However, they do not have to mimic old behaviors to send data to other replicas for synchronization; we recommend that the new replica not send any data to the old replica for synchronization while updating a cluster.
Note that these changes might lose backward compatibility.

The plan for the PRs is as follows:

  • Implement a new way to store the commit context. Previously, the replica kept all commit contexts, but it will now hold only the latest one. This change should keep the previous approach, which stores a commit context for every commit, so that tests keep passing.
  • Change the method internal/storage.(*Storage).Trim to support the above change. It will not support the previous approach; if users want to wipe out the unnecessary series of commit contexts, we recommend that they update the log stream replica and remove the old ones.
  • Implement a new recovery method for the log stream replica. The previous method also has to keep working.
  • Change the RPCs related to synchronization. This PR will be very tricky. A replica running the new version has to handle the previous synchronization RPCs: the previous SyncReplicate carried interleaved commit contexts and log entries, whereas the new SyncReplicate will carry a sequence of log entries followed by a single commit context. However, a replica running the new version will not send old-style synchronization.
  • Remove the old-style function that stores all commit contexts. As noted above, it is left in place only for testing.

After these PRs, we can deploy the new version of the varlog cluster alongside the old one. Once we are sure that all nodes work with the new functionality, we can remove the old code that was left for testing and interoperability.

ijsong added a commit to ijsong/varlog that referenced this issue Sep 23, 2022
This patch adds a new approach to storing the commit context: it keeps only the latest commit
context. It does not remove the previous method, so that tests keep passing; we will remove it later.

Updates kakao#125
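
One way to picture "only the latest commit context" is a single fixed key that every commit overwrites. The sketch below uses an in-memory map in place of the real storage engine; `store`, `latestKey`, and the `commitContext` fields are assumptions made for illustration, not the actual varlog types.

```go
package main

import "fmt"

// commitContext is a hypothetical, reduced commit context.
type commitContext struct {
	Version            uint64
	CommittedGLSNBegin uint64
	CommittedGLSNEnd   uint64
}

// latestKey is the single key under which the commit context lives.
// (Assumption: the old approach used one key per commit instead.)
const latestKey = "commit-context"

// store is an in-memory stand-in for the storage engine.
type store struct {
	kv map[string]commitContext
}

func newStore() *store {
	return &store{kv: make(map[string]commitContext)}
}

// storeCommitContext overwrites the previous commit context, so the
// store never holds more than one.
func (s *store) storeCommitContext(cc commitContext) {
	s.kv[latestKey] = cc
}

// readCommitContext returns the latest commit context, if any.
func (s *store) readCommitContext() (commitContext, bool) {
	cc, ok := s.kv[latestKey]
	return cc, ok
}

func main() {
	s := newStore()
	s.storeCommitContext(commitContext{Version: 1, CommittedGLSNBegin: 1, CommittedGLSNEnd: 4})
	s.storeCommitContext(commitContext{Version: 2, CommittedGLSNBegin: 4, CommittedGLSNEnd: 9})
	cc, _ := s.readCommitContext()
	fmt.Println("latest version:", cc.Version, "stored contexts:", len(s.kv))
}
```

Recovery then needs a single point lookup instead of a scan over every commit context, which is the part of the restart that this issue targets.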
ijsong added a commit to ijsong/varlog that referenced this issue Sep 29, 2022
This patch changes the method `internal/storage.(*Storage).Trim` not to remove the commit context.
According to the RFC (i.e., [20220915_commit_context.md](https://github.com/kakao/varlog/blob/main/docs/RFCs/20220915_commit_context.md)), the commit context is a pointer to the last commit message to help recover the log stream.
Thus, the trim that removes the log entries' prefix does not have to remove the commit context.
Moreover, removing a commit context while appending log entries is not trivial since Commit RPC can
change the commit context continuously.

Updates kakao#125
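
A small sketch of a trim that removes a prefix of log entries and leaves the commit context alone. The `storage` type and the `trim` signature here are hypothetical simplifications (an in-memory map keyed by LLSN), not the real `internal/storage.(*Storage).Trim`.

```go
package main

import "fmt"

// commitContext is a hypothetical, reduced commit context.
type commitContext struct {
	Version uint64
}

// storage is an in-memory stand-in: log entry data keyed by LLSN plus
// the latest commit context.
type storage struct {
	entries map[uint64]string
	cc      *commitContext
}

// trim removes every log entry whose LLSN is less than or equal to
// lastLLSN and returns how many entries it removed. It deliberately
// does not touch the commit context.
func (s *storage) trim(lastLLSN uint64) int {
	removed := 0
	for llsn := range s.entries {
		if llsn <= lastLLSN {
			delete(s.entries, llsn) // deleting during range is safe in Go
			removed++
		}
	}
	return removed
}

func main() {
	s := &storage{
		entries: map[uint64]string{1: "a", 2: "b", 3: "c"},
		cc:      &commitContext{Version: 5},
	}
	n := s.trim(2)
	fmt.Println("removed:", n, "left:", len(s.entries), "commit context version:", s.cc.Version)
}
```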
ijsong added a commit to ijsong/varlog that referenced this issue Oct 5, 2022
ijsong added a commit to ijsong/varlog that referenced this issue Oct 5, 2022
…g the latest commit context

This patch changes how to restore the status of a log stream replica after kakao#162. Previously, the log
stream replica read a sequence of all commit contexts to decide whether it could work well after
restoring. However, since the replica will maintain only the latest commit context, we should change
the previous approach to reloading a replica.

The log stream replica now reads the latest commit context and the boundary of the log entries and
checks whether it can recover its status. The replica runs if the recovery completes. However, if
something is wrong, its reportCommitBase will be invalid. In that case, the metadata repository will
ignore its report, and the log entries will be cloned from another replica.

Updates kakao#125
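
The check described above might be sketched as follows. This is only an illustration under stated assumptions: the field names, the half-open GLSN range, and the exact consistency rule (the last stored LLSN must equal the last LLSN covered by the latest commit context) are guesses made for this example and may differ from the actual implementation.

```go
package main

import "fmt"

// commitContext is a hypothetical, reduced commit context. The committed
// GLSN range is assumed to be half-open: [CommittedGLSNBegin, CommittedGLSNEnd).
type commitContext struct {
	Version            uint64
	CommittedGLSNBegin uint64
	CommittedGLSNEnd   uint64
	CommittedLLSNBegin uint64
}

// reportCommitBase is a hypothetical, reduced form of the value that the
// replica reports from after recovery.
type reportCommitBase struct {
	Version              uint64
	UncommittedLLSNBegin uint64
	Valid                bool
}

// restoreCommitBase compares the latest commit context with the last stored
// LLSN (0 means that no log entry is stored). If they disagree, the returned
// base is invalid and the replica has to be repaired by synchronization.
func restoreCommitBase(cc *commitContext, lastLLSN uint64) reportCommitBase {
	if cc == nil {
		// Without a commit context, only an empty replica is consistent.
		return reportCommitBase{UncommittedLLSNBegin: 1, Valid: lastLLSN == 0}
	}
	committed := cc.CommittedGLSNEnd - cc.CommittedGLSNBegin
	expectedLast := cc.CommittedLLSNBegin + committed - 1
	if lastLLSN != expectedLast {
		return reportCommitBase{Valid: false}
	}
	return reportCommitBase{
		Version:              cc.Version,
		UncommittedLLSNBegin: expectedLast + 1,
		Valid:                true,
	}
}

func main() {
	cc := &commitContext{Version: 3, CommittedGLSNBegin: 10, CommittedGLSNEnd: 15, CommittedLLSNBegin: 6}
	fmt.Printf("%+v\n", restoreCommitBase(cc, 10)) // entries match the commit context
	fmt.Printf("%+v\n", restoreCommitBase(cc, 9))  // one entry is missing
}
```

The point of the sketch is that recovery reads a constant amount of metadata (one commit context and the entry boundary) regardless of how many commits the log stream has seen.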
ijsong added a commit to ijsong/varlog that referenced this issue Oct 6, 2022
ijsong added a commit to ijsong/varlog that referenced this issue Oct 13, 2022
…Base is invalid

When reportCommitBase is invalid, which means that the commit context and log entries are
inconsistent, the log stream replica enters the sealing state. Synchronization between replicas can
resolve the inconsistency.

Updates kakao#125
ijsong added a commit to ijsong/varlog that referenced this issue Oct 13, 2022
There were already two kinds of batches in storage:

- The write batch saves only data keyed by the LLSN.
- The commit batch saves only the commit context and GLSN, and the commit context must exist.

This patch introduces a new batch type - append batch. The append batch saves both log entries and
a commit context. However, it is not necessary to keep the commit context. During synchronization,
the append batch must store log entries from a destination replica.

Updates kakao#125
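
To make the difference between the batch types concrete, here is a rough sketch of an append batch that writes log entries together with an optional commit context. All names (`storage`, `appendBatch`, `putLogEntry`, `setCommitContext`, `apply`) are hypothetical, and the maps stand in for the real storage engine, which would presumably apply the batch atomically.

```go
package main

import "fmt"

// commitContext is a hypothetical, reduced commit context.
type commitContext struct {
	Version uint64
}

type logEntry struct {
	LLSN uint64
	GLSN uint64
	Data string
}

// storage is an in-memory stand-in for the storage engine.
type storage struct {
	data    map[uint64]string // what a write batch saves: data keyed by LLSN
	commits map[uint64]uint64 // what a commit batch saves: GLSN -> LLSN
	cc      *commitContext    // latest commit context
}

// appendBatch collects log entries and, optionally, a commit context.
type appendBatch struct {
	s       *storage
	entries []logEntry
	cc      *commitContext // nil when the batch carries no commit context
}

func (s *storage) newAppendBatch() *appendBatch { return &appendBatch{s: s} }

func (b *appendBatch) putLogEntry(e logEntry) { b.entries = append(b.entries, e) }

func (b *appendBatch) setCommitContext(cc commitContext) { b.cc = &cc }

// apply writes both the data and the commit information of every entry and
// updates the commit context only if the batch contains one.
func (b *appendBatch) apply() {
	for _, e := range b.entries {
		b.s.data[e.LLSN] = e.Data
		b.s.commits[e.GLSN] = e.LLSN
	}
	if b.cc != nil {
		b.s.cc = b.cc
	}
}

func main() {
	s := &storage{data: map[uint64]string{}, commits: map[uint64]uint64{}}

	b := s.newAppendBatch()
	b.putLogEntry(logEntry{LLSN: 1, GLSN: 10, Data: "a"})
	b.putLogEntry(logEntry{LLSN: 2, GLSN: 11, Data: "b"})
	b.apply() // entries only, no commit context yet
	fmt.Println(len(s.data), len(s.commits), s.cc == nil)

	b = s.newAppendBatch()
	b.setCommitContext(commitContext{Version: 4})
	b.apply() // commit context only
	fmt.Println(s.cc.Version)
}
```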
ijsong added a commit to ijsong/varlog that referenced this issue Oct 25, 2022
…he last commit context

This patch changes the synchronization process to accept only the last commit context applied by kakao#162.
The synchronization copies log entries and the latest commit context from the source replica to the
destination replica. The destination's reportCommitBase should be valid, and its log entries should
be the same as the source's.

SyncInit can now delete log entries that the source replica had already removed. It deletes all log
entries simultaneously to avoid a hole in the middle of log entries. It also makes the replica's
reportCommitBase invalid because commit context and log entry are inconsistent during the
synchronization.

SyncReplicate no longer uses the SyncReplicateBuffer since there is no reason to store a commit
context and its corresponding log entries in the buffer and write them to disk at once. It now saves
the log entries or the commit context on every call. We can optimize SyncReplicate to transfer a
batch of log entries, which we can discuss later.

Updates kakao#125
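
The new synchronization flow can be pictured with the sketch below, seen from the destination replica. It is one reading of the description above, not the actual RPC handlers: `destination`, `syncPayload`, and the `syncInit` and `syncReplicate` signatures are hypothetical, and marking the base valid as soon as the commit context arrives is a simplification.

```go
package main

import "fmt"

type logEntry struct {
	LLSN uint64
	Data string
}

// commitContext is a hypothetical, reduced commit context.
type commitContext struct {
	Version uint64
}

// syncPayload carries either one log entry or the final commit context.
type syncPayload struct {
	Entry         *logEntry
	CommitContext *commitContext
}

// destination is an in-memory stand-in for the destination replica.
type destination struct {
	entries   []logEntry
	cc        *commitContext
	baseValid bool // stands in for the validity of reportCommitBase
}

// syncInit drops, in one step, every entry that the source has already
// removed (LLSN below srcFirstLLSN) and invalidates the commit base,
// because entries and commit context are inconsistent during the sync.
func (d *destination) syncInit(srcFirstLLSN uint64) {
	kept := d.entries[:0]
	for _, e := range d.entries {
		if e.LLSN >= srcFirstLLSN {
			kept = append(kept, e)
		}
	}
	d.entries = kept
	d.baseValid = false
}

// syncReplicate saves the payload immediately instead of buffering it.
// The single commit context arrives last and makes the base valid again.
func (d *destination) syncReplicate(p syncPayload) {
	if p.Entry != nil {
		d.entries = append(d.entries, *p.Entry)
	}
	if p.CommitContext != nil {
		d.cc = p.CommitContext
		d.baseValid = true
	}
}

func main() {
	d := &destination{entries: []logEntry{{LLSN: 1, Data: "a"}, {LLSN: 2, Data: "b"}}, baseValid: true}
	d.syncInit(2)
	d.syncReplicate(syncPayload{Entry: &logEntry{LLSN: 3, Data: "c"}})
	d.syncReplicate(syncPayload{Entry: &logEntry{LLSN: 4, Data: "d"}})
	d.syncReplicate(syncPayload{CommitContext: &commitContext{Version: 7}})
	fmt.Println(len(d.entries), d.cc.Version, d.baseValid)
}
```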
ijsong added a commit to ijsong/varlog that referenced this issue Oct 28, 2022
ijsong added a commit to ijsong/varlog that referenced this issue Oct 28, 2022
Previously, the log stream wrote a commit context for each commit. However, to fix kakao#125, we no
longer need to persist a sequence of commit contexts. So this patch removes the lines that keep all
commit contexts.

Updates kakao#125
ijsong added a commit to ijsong/varlog that referenced this issue Oct 31, 2022
ijsong added a commit to ijsong/varlog that referenced this issue Nov 3, 2022
ijsong added a commit to ijsong/varlog that referenced this issue Nov 4, 2022
ijsong added a commit to ijsong/varlog that referenced this issue Nov 7, 2022
ijsong added a commit to ijsong/varlog that referenced this issue Nov 8, 2022
ijsong added a commit to ijsong/varlog that referenced this issue Nov 22, 2022