storagenode: can't commit written logs after restarting #490

Closed
ijsong opened this issue Jun 22, 2023 · 0 comments · Fixed by #492 or #525

ijsong commented Jun 22, 2023

Current Behavior

If all log stream replicas in a log stream are restarted simultaneously, none of the replicas on the restarted storage nodes can commit the logs that were written before the restart.

Expected Behavior

If the log stream replicas have uncommitted logs and the metadata repository sends a Commit, the replicas should commit those logs.

Steps To Reproduce

  1. Append logs to the target log stream.
  2. Kill the storage nodes that host the log stream replicas.
  3. At this point, there must be some log entries that the metadata repository has committed but none of the log stream replicas have.
  4. Restart the storage nodes.
  5. The target log stream remains SEALING because the Seal RPC can't make the replicas SEALED.

The Seal RPC can't make the replicas SEALED because they have not committed all the logs that the metadata repository has committed. To commit those logs, the log stream replicas have to restore their uncommitted logs after restarting.
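
The failure can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Varlog's implementation: the `replica` type, its fields, and the `restart`, `commit`, `seal`, and `scenario` functions are hypothetical names invented for this example, and real replicas track far more state.

```go
package main

import "fmt"

// replica is a hypothetical, simplified model of a log stream replica.
// It is NOT Varlog's actual type; all names here are assumptions.
type replica struct {
	written   uint64 // highest log sequence number persisted in storage
	committed uint64 // highest log sequence number the replica has committed
	pending   uint64 // in-memory count of written logs awaiting a Commit
}

// restart models a process restart: in-memory state is lost. When
// restoreUncommitted is true, the pending range is rebuilt from storage.
func (r *replica) restart(restoreUncommitted bool) {
	r.pending = 0
	if restoreUncommitted {
		r.pending = r.written - r.committed
	}
}

// commit applies a Commit from the metadata repository up to upTo.
// Only logs that the replica tracks as pending can be committed.
func (r *replica) commit(upTo uint64) bool {
	if upTo <= r.committed {
		return true // nothing new to commit
	}
	need := upTo - r.committed
	if need > r.pending {
		return false // the replica does not know about these logs
	}
	r.committed = upTo
	r.pending -= need
	return true
}

// seal reports SEALED only if the replica has committed everything
// that the metadata repository has committed.
func (r *replica) seal(lastCommitted uint64) string {
	if r.committed == lastCommitted {
		return "SEALED"
	}
	return "SEALING"
}

// scenario replays the reproduction steps: logs 8..10 are written and
// committed by the metadata repository, but not yet by the replica.
func scenario(restoreUncommitted bool) string {
	const mrCommitted = 10
	r := &replica{written: 10, committed: 7, pending: 3}
	r.restart(restoreUncommitted)
	r.commit(mrCommitted)
	return r.seal(mrCommitted)
}

func main() {
	fmt.Println("without restore:", scenario(false)) // without restore: SEALING
	fmt.Println("with restore:", scenario(true))     // with restore: SEALED
}
```

In this model the replica stays SEALING after a restart unless it rebuilds its pending range from storage, which mirrors the behavior described in this issue.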

Environment

  • Varlog version: v0.14.1
ijsong added the bug label Jun 22, 2023
ijsong self-assigned this Jun 22, 2023
ijsong added a commit that referenced this issue Jun 22, 2023
Currently, if all log stream replicas in a log stream are restarted simultaneously, the replicas on
the restarted storage nodes can't commit logs written before the restart. They still have
uncommitted logs in their storage, but they can't process the Commit RPC sent from the metadata
repository.

This PR fixes the above issue. While recovering the log stream context after restarting the storage
nodes, it restores uncommitted logs.

Resolves #490
ijsong linked a pull request Jun 22, 2023 that will close this issue
ijsong added six more commits that referenced this issue, each with the same commit message as above, on Jun 22, Jun 27 (two commits), Jul 4, Jul 17, and Jul 28, 2023.
ijsong added a commit that referenced this issue Jul 28, 2023
### What this PR does

Currently, if all log stream replicas in a log stream are restarted
simultaneously, the replicas on the restarted storage nodes can't commit logs
written before the restart. They still have uncommitted logs in their storage,
but they can't process the Commit RPC sent from the metadata repository.

This PR fixes the above issue. While recovering the log stream context after
restarting the storage nodes, it restores uncommitted logs.

### Which issue(s) this PR resolves

Resolves #490
ijsong added a commit that referenced this issue Aug 7, 2023
🤖 I have created a release *beep* *boop*
---


## [0.15.0](v0.14.1...v0.15.0) (2023-07-31)


### Features

* **admin:** add otelgrpc metric interceptor ([d9ca9aa](d9ca9aa))
* **admin:** add otelgrpc metric interceptor ([#509](#509)) ([db7a1a2](db7a1a2))
* **admin:** speed up fetching cluster metadata ([3e46f62](3e46f62))
* **admin:** speed up fetching cluster metadata ([#480](#480)) ([53a8f19](53a8f19))
* **all:** add common flags for telemetry ([fcacd1a](fcacd1a))
* **all:** add common flags for telemetry ([#494](#494)) ([63355e9](63355e9))
* **benchmark:** share a connection between appenders in a target ([7dc53e9](7dc53e9))
* **benchmark:** share a connection between appenders in a target ([#524](#524)) ([2cd9196](2cd9196))
* **client:** add Clear to the log stream appender manager ([9a89065](9a89065))
* **client:** add Clear to the log stream appender manager ([#514](#514)) ([e5b6a2e](e5b6a2e))
* **storagenode:** add --storage-trim-delay to set a delay before the deletion of log entries ([db39713](db39713))
* **storagenode:** add --storage-trim-delay to set a delay before the deletion of log entries ([#529](#529)) ([015bfa4](015bfa4))
* **storagenode:** add --storage-trim-rate to set throttling rate of Trim ([83b7496](83b7496))
* **storagenode:** add --storage-trim-rate to set throttling rate of Trim ([#530](#530)) ([6e69306](6e69306))
* **telemetry:** customize bucket size of process.runtime.go.gc.pause_ns ([b181132](b181132))
* **telemetry:** customize bucket size of process.runtime.go.gc.pause_ns ([#510](#510)) ([9d99520](9d99520))
* **telemetry:** customize bucket size of rpc.server.duration ([a0e5973](a0e5973))
* **telemetry:** customize bucket size of rpc.server.duration ([#511](#511)) ([e41fe1c](e41fe1c))


### Bug Fixes

* **benchmark:** make append duration's precision high ([e3a091d](e3a091d))
* **benchmark:** make append duration's precision high ([#522](#522)) ([815af53](815af53))
* **benchmark:** support graceful stop ([8616d55](8616d55))
* **benchmark:** support graceful stop ([#527](#527)) ([fc4ed81](fc4ed81))
* **metarepos:** add TestMRIgnoreDirtyReport ([fe2a550](fe2a550))
* **metarepos:** allow set commitTick ([bdca20a](bdca20a))
* **metarepos:** ignore invalid report ([e8620de](e8620de))
* **storagenode:** ignore context error while checking to interleave of Append RPC errors ([04d1052](04d1052))
* **storagenode:** ignore context error while checking to interleave of Append RPC errors ([#504](#504)) ([5a7a3b0](5a7a3b0))
* **storagenode:** restore uncommitted logs ([267cccc](267cccc)), closes [#490](#490)
* **storagenode:** restore uncommitted logs ([#492](#492)) ([a9832ee](a9832ee)), closes [#490](#490)


### Performance Improvements

* **admin:** use singleflight to handle Admin's RPCs ([c231888](c231888))
* **admin:** use singleflight to handle Admin's RPCs ([#482](#482)) ([1a6a96d](1a6a96d))
* **metarepos:** add a pool for []*mrpb.Report ([fa8c89d](fa8c89d))
* **metarepos:** add a pool for []*mrpb.Report ([#534](#534)) ([16b2181](16b2181))
* **metarepos:** add a pool for *mrpb.RaftEntry ([be9f121](be9f121))
* **metarepos:** add a pool for *mrpb.RaftEntry ([#536](#536)) ([96ab5e2](96ab5e2))
* **metarepos:** add a pool for mrpb.Reports ([59a6a5a](59a6a5a))
* **metarepos:** add a pool for mrpb.Reports ([#533](#533)) ([b227c75](b227c75))
* **metarepos:** avoid copy overhead by removing unnecessary converting from byte slice to string ([a775628](a775628))
* **metarepos:** avoid copy overhead by removing unnecessary converting from byte slice to string ([#532](#532)) ([1702769](1702769))
* **metarepos:** reuse mrpb.StorageNodeUncommitReport while changed ([57d8039](57d8039))
* **metarepos:** reuse mrpb.StorageNodeUncommitReport while changed ([#537](#537)) ([8f6e097](8f6e097))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).