RocksDB corruption leads to endless failure #1383
Labels: type/bug (This issue reports a bug.)
Comments
empiredan pushed a commit that referenced this issue on Mar 8, 2023:
#1383 This is a refactor patch before fixing #1383. This patch has no functionality changes; it only includes refactors: 1. Moves the functions `load()`, `newr()` and `clear_on_failure()` from class replica to class replica_stub; the first two have been renamed to `load_replica()` and `new_replica()`. 2. Encapsulates a new function `move_to_err_path`. 3. Some minor refactors such as typo fixes.
acelyc111 added a commit that referenced this issue on Mar 27, 2023:
#1383 To handle the return codes of read and write requests, it is better to refactor the return codes of the related functions first. This patch changes them to use `rocksdb::Status::code` instead of a meaningless integer, and leaves some TODOs to be dealt with in follow-up patches.
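As a rough illustration of that refactor (a minimal sketch, not the actual Pegasus code; `get_value` is a hypothetical helper), returning RocksDB's own status code lets callers react to the real error class:

```cpp
// Sketch: propagate RocksDB's own status code instead of a bare integer.
#include <rocksdb/db.h>
#include <string>

rocksdb::Status::Code get_value(rocksdb::DB *db, const std::string &key,
                                std::string *value)
{
    rocksdb::Status s = db->Get(rocksdb::ReadOptions(), key, value);
    // Callers can now switch on kOk / kNotFound / kCorruption / kIOError
    // rather than decoding an opaque int.
    return s.code();
}
```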
acelyc111 added a commit that referenced this issue on Mar 31, 2023:
#1383 This patch fixes some minor issues, including: - return early if `FLAGS_fd_disabled` is true in `remove_replica_on_meta_server`, to avoid running meaningless logic - encapsulates a new function `wait_closing_replicas_finished()` in `replica_stub` - marks some functions as `const` or `override` - marks some parameters or variables as `const` - adds a missing lock - fixes some typos - uses the short-circuit return style
acelyc111 added a commit that referenced this issue on Apr 4, 2023:
#1383 The replica instance path is moved to a trash path, a.k.a. `<table_id>.<pid>.<timestamp>.err`, but the move may not complete if the replica server crashes; the path is then left behind while some files in it (e.g. `.init-info`) have already been moved. When the server is restarted after that, it crashes because of a check on the existence of those files. The check is unnecessary: the server is able to trash the corrupt path and start normally, and the missing replica can be recovered from other servers automatically. This patch removes the check.
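A minimal sketch, under assumptions, of the trash-path idea (the thread mentions a `move_to_err_path` helper; this illustrative version is not the real implementation):

```cpp
// Sketch: rename a corrupt replica dir <table_id>.<pid> to
// <table_id>.<pid>.<timestamp>.err so a fresh copy can be re-replicated.
#include <chrono>
#include <filesystem>
#include <string>

std::filesystem::path make_err_path(const std::filesystem::path &replica_dir)
{
    using namespace std::chrono;
    const auto ts =
        duration_cast<seconds>(system_clock::now().time_since_epoch()).count();
    // e.g. "2.7" becomes "2.7.1680000000.err"
    return replica_dir.parent_path() /
           (replica_dir.filename().string() + "." + std::to_string(ts) + ".err");
}

void move_to_err_path(const std::filesystem::path &replica_dir)
{
    // Per the patch above: if a crash leaves a partially trashed path,
    // the restart logic should trash it again rather than fail an
    // existence check on files inside it.
    std::filesystem::rename(replica_dir, make_err_path(replica_dir));
}
```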
empiredan pushed a commit that referenced this issue on Apr 11, 2023:
#1383 This patch deals with the `kCorruption` error returned from the storage engine for write requests. After the replica server gets such an error, it trashes the replica to a trash path `<app_id>.<pid>.pegasus.<timestamp>.err`. Note that the replica server may still crash because the corrupted replica has been trashed and closed; that is left to be completed by later patches.
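A hedged sketch of that write-path handling (placeholder hooks, not the real Pegasus interfaces):

```cpp
// Sketch: on kCorruption from a write, close the replica and trash its dir.
#include <rocksdb/status.h>
#include <iostream>

// Hypothetical hooks standing in for the real replica-server logic.
void close_replica() { /* stop serving and release the storage handle */ }
void move_replica_dir_to_err_path() { /* <app_id>.<pid>.pegasus.<ts>.err */ }

void on_write_completed(const rocksdb::Status &s)
{
    if (s.code() != rocksdb::Status::kCorruption) {
        return; // only corruption is trashed here; other errors elsewhere
    }
    std::cerr << "write hit storage corruption: " << s.ToString() << '\n';
    close_replica();
    move_replica_dir_to_err_path();
    // With replication factor > 1, a healthy copy can then be
    // re-replicated automatically from other servers.
}
```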
acelyc111 added a commit that referenced this issue on Apr 17, 2023.
empiredan pushed a commit that referenced this issue on Apr 24, 2023:
#1383 The ReplicaServer doesn't handle errors returned from the storage engine, so even if the storage engine is corrupted, the server doesn't recognize the situation and keeps running happily, while the client always gets an error status. This situation will not recover automatically unless the server is stopped and the corrupted RocksDB directories are moved away manually. This patch handles the kCorruption error returned from the storage engine, then closes the replica and moves the directory to the ".err" trash path. The replica is then able to recover automatically (if RF > 1).
empiredan pushed a commit that referenced this issue on May 17, 2023:
#1383 This is a minor refactoring of class fs_manager, including: - use `uint64_t` instead of `unsigned` in the fs_manager module - remove useless "test" parameters
acelyc111 added a commit that referenced this issue on May 25, 2023:
#1383 This patch moves some functions to fs_manager, since they are more reasonably responsibilities of class fs_manager than of class replica_stub.
empiredan pushed a commit that referenced this issue on May 26, 2023:
#1383 In the prior implementation, every replica had its own "dir_node status"; if a dir_node entered an abnormal status (e.g. insufficient space), all replicas' referenced "dir_node status" had to be updated, as implemented in `replica_stub::update_disks_status`. This made the "dir_node status" update path too long and somewhat duplicated. A new implementation is completed in #1473: every replica holds a reference to its dir_node directly, so a replica's "dir_node status" can be updated simply by updating the referenced dir_node's status once. Ahead of that, this patch submits a minor refactor to remove `replica_stub::update_disks_status` and related functions and variables. Some unit tests have also been updated.
acelyc111 added a commit that referenced this issue on May 31, 2023:
#1383 This patch removes a replica's duplicated _disk_tag and _disk_status of the dir_node it is placed on, and instead introduces a dir_node pointer in replica. Once the status of the dir_node is updated, the replica's status can be judged more conveniently. Some unit tests have been updated as well, including: - change the test directory from `./` to `test_dir` - simplify the logic of the replica_disk_test related tests
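A minimal sketch of that idea (illustrative types, not the actual Pegasus classes):

```cpp
// Sketch: share one dir_node as the single source of truth for disk status.
#include <string>

enum class disk_status { NORMAL, SPACE_INSUFFICIENT, IO_ERROR };

struct dir_node
{
    std::string tag;
    disk_status status = disk_status::NORMAL;
};

struct replica
{
    // Before: duplicated _disk_tag / _disk_status fields that every status
    // change had to touch. After: one pointer to the shared dir_node.
    dir_node *node = nullptr;

    bool disk_writable() const { return node->status == disk_status::NORMAL; }
};
```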
acelyc111 added a commit that referenced this issue on Jun 8, 2023:
#1383 A disk (a.k.a. dir_node in Pegasus) may go from NORMAL to SPACE_INSUFFICIENT or IO_ERROR; meanwhile, it may recover from SPACE_INSUFFICIENT back to NORMAL. So we can keep all dir_nodes in the system, and only refuse to assign replicas to abnormal dir_nodes and refuse to perform write-type operations on them. This patch also updates some unit tests.
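A sketch, under assumptions, of that placement policy (reusing the illustrative dir_node shape from the sketch above; the real selection logic lives in fs_manager):

```cpp
// Sketch: keep abnormal dirs registered (they may recover), but never pick
// them when placing new replicas.
#include <vector>

enum class disk_status { NORMAL, SPACE_INSUFFICIENT, IO_ERROR };

struct dir_node
{
    disk_status status = disk_status::NORMAL;
    int replica_count = 0;
};

dir_node *select_dir_for_new_replica(std::vector<dir_node> &dirs)
{
    dir_node *best = nullptr;
    for (auto &d : dirs) {
        if (d.status != disk_status::NORMAL) {
            continue; // reject placement, but keep the dir in the system
        }
        if (best == nullptr || d.replica_count < best->replica_count) {
            best = &d; // simple least-loaded choice for illustration
        }
    }
    return best; // nullptr means no healthy dir is available
}
```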
acelyc111 added a commit that referenced this issue on Jun 8, 2023:
…#1522) #1383 This patch moves some functions to fs_manager, since they are more reasonably responsibilities of class fs_manager than of other classes, including: - remove `fs_manager::for_each_dir_node` - minimize some locks - rename `fs_manager::is_dir_node_available` to `fs_manager::is_dir_node_exist` - move the `get_disk_infos` code to class `fs_manager` and encapsulate it as a function - move the `validate_migrate_op` code to class `fs_manager` and encapsulate it as a function - move `disk_status_to_error_code` from replica_2pc.cpp to class `fs_manager`
acelyc111 added a commit that referenced this issue on Jun 14, 2023:
…en encounter read/write IO error (#1473) #1383 This patch deals with IO errors propagated from the storage engine during read and write operations: the replica is closed and its dir_node is marked as disk_status::IO_ERROR. A dir_node marked as IO_ERROR will not be selected when new replicas are created, as implemented in patch 4dcbb1e. This patch also adds/updates some unit tests.
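A hedged sketch of that dispatch (assumed names, not the real interfaces): corruption trashes only the replica, while an IO error additionally quarantines the whole dir_node.

```cpp
// Sketch: map storage-engine errors to replica/disk actions.
#include <rocksdb/status.h>

enum class disk_status { NORMAL, SPACE_INSUFFICIENT, IO_ERROR };

struct dir_node { disk_status status = disk_status::NORMAL; };

// close_replica is a placeholder for the real close-and-trash logic.
void handle_storage_error(const rocksdb::Status &s, dir_node *node,
                          void (*close_replica)())
{
    switch (s.code()) {
    case rocksdb::Status::kCorruption:
        close_replica(); // this replica's data is bad; re-replicate it
        break;
    case rocksdb::Status::kIOError:
        close_replica();
        node->status = disk_status::IO_ERROR; // the whole disk is suspect
        break;
    default:
        break; // other codes are handled elsewhere
    }
}
```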
GehaFearless pushed a commit to GehaFearless/incubator-pegasus that referenced this issue on Feb 28, 2024:
…ub' (apache#1384) Corresponding community commit: https://github.com/apache/incubator-pegasus/pull/1384/files apache#1383 This is a refactor patch before fixing apache#1383. This patch has no functionality changes; it only includes refactors: 1. Moves the functions `load()`, `newr()` and `clear_on_failure()` from class replica to class replica_stub; the first two have been renamed to `load_replica()` and `new_replica()`. 2. Encapsulates a new function `move_to_err_path`. 3. Some minor refactors such as typo fixes.
GehaFearless pushed a commit to GehaFearless/incubator-pegasus that referenced this issue on Feb 28, 2024:
…apache#1422) Corresponding community commit: https://github.com/apache/incubator-pegasus/pull/1422/files Note: the unit test integration_test.cpp is not added here, because the changes to the whole function test are too large to include conveniently; it will be added separately after everything else is merged. apache#1383 This patch deals with the `kCorruption` error returned from the storage engine for write requests. After the replica server gets such an error, it trashes the replica to a trash path `<app_id>.<pid>.pegasus.<timestamp>.err`. Note that the replica server may still crash because the corrupted replica has been trashed and closed; that is left to be completed by later patches.
GehaFearless pushed a commit to GehaFearless/incubator-pegasus that referenced this issue on Feb 28, 2024:
…pache#1456) Corresponding community commit: https://github.com/apache/incubator-pegasus/pull/1456/files apache#1383 This is a refactor patch before fixing apache#1383. This patch has no functionality changes; it only includes refactors: 1. Moves the functions `load()`, `newr()` and `clear_on_failure()` from class replica to class replica_stub; the first two have been renamed to `load_replica()` and `new_replica()`. 2. Encapsulates a new function `move_to_err_path`. 3. Some minor refactors such as typo fixes.
GehaFearless pushed a commit to GehaFearless/incubator-pegasus that referenced this issue on Feb 28, 2024:
…ns (apache#1447) Corresponding community commit: https://github.com/apache/incubator-pegasus/pull/1447/files Note: the unit test changes are relatively large and are not merged this time. apache#1383 The ReplicaServer doesn't handle errors returned from the storage engine, so even if the storage engine is corrupted, the server doesn't recognize the situation and keeps running happily, while the client always gets an error status. This situation will not recover automatically unless the server is stopped and the corrupted RocksDB directories are moved away manually. This patch handles the kCorruption error returned from the storage engine, then closes the replica and moves the directory to the ".err" trash path. The replica is then able to recover automatically (if RF > 1).
Bug Report
Please answer these questions before submitting your issue. Thanks!
What did you do?
Some RocksDB instances were corrupted for some reason, maybe a disk drive IO error, data corruption, etc.
What did you expect to see?
The cluster can recover from the error automatically.
What did you see instead?
The replica server closes the replica when it sees write errors, but then starts the replica again in the same place, where the data is still corrupted. So the error occurs again and again.
For read requests, the replica does not handle the error at all, so read requests fail again and again.
Write: (error output not preserved)
Read: (error output not preserved)
What version of Pegasus are you using?
1.12, 2.0