'full scan' can't scan data completely when one replicaserver restart which causing some primary partition transferring #824

Closed
cauchy1988 opened this issue Oct 13, 2021 · 5 comments
Labels
type/bug This issue reports a bug.

@cauchy1988
Contributor

After writes to the storage have stopped, scanning the data with get_unordered_scanners and restarting one replica server during the scan leaves the data of some shards incompletely scanned: the corresponding scanners exit early.

Reading the code suggests the following cause.
During a scan, the client's requests to the server fall into two kinds, on_get_scanner and on_scan. on_get_scanner mainly establishes a context with the server; on_scan is then called repeatedly by the client to read data from the server within that context.
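The two-step exchange described above can be pictured with a small toy model. This is only a sketch under assumptions, not the Pegasus implementation: the class ToyReplica, its fields, the batch size, and the idea that a context is just a row offset are all hypothetical; only the names on_get_scanner and on_scan come from this issue.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class ToyReplica:
    """Hypothetical stand-in for one replica server holding one shard."""

    def __init__(self, rows: List[str], batch_size: int = 2) -> None:
        self.rows = sorted(rows)
        self.batch_size = batch_size
        # context_id -> offset of the next row to return (assumed layout)
        self.contexts: Dict[int, int] = {}
        self._ids = itertools.count(1)

    def on_get_scanner(self) -> Tuple[int, List[str]]:
        """Open a scan: create a context, return its id and the first batch."""
        context_id = next(self._ids)
        self.contexts[context_id] = self.batch_size
        return context_id, self.rows[: self.batch_size]

    def on_scan(self, context_id: int) -> Optional[List[str]]:
        """Continue a scan; None means this server does not know the context."""
        if context_id not in self.contexts:
            return None
        offset = self.contexts[context_id]
        self.contexts[context_id] = offset + self.batch_size
        return self.rows[offset : offset + self.batch_size]

    def restart(self) -> None:
        """Contexts are assumed to live in memory only, so a restart drops them."""
        self.contexts.clear()


replica = ToyReplica(["k1", "k2", "k3", "k4", "k5"])
ctx, first_batch = replica.on_get_scanner()  # opens the context
next_batch = replica.on_scan(ctx)            # continues within the context
replica.restart()
after_restart = replica.on_scan(ctx)         # context is gone, client must retry
```

In this model the client has no way to continue after the restart except to call on_get_scanner again, which is where the retry path discussed in this issue comes in.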

When a replica server is restarted, some primary shards are transferred to a new replica server, so the context that the client and server established through on_get_scanner is lost. The client's own retry mechanism then calls on_get_scanner against the new replica server to establish a new context, but this time it passes the last scanned key as start_key. Because start_key is now a non-empty string, on_get_scanner mistakenly treats the request as a scan over one fixed hashkey instead of a full scan.
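The effect of that misclassification can be sketched with a toy model. Everything here is an assumption made for illustration: keys are modelled as (hash_key, sort_key) tuples rather than Pegasus's encoded keys, and toy_on_get_scanner and scan_with_restart are hypothetical helpers that only mimic the behaviour described above (a non-empty start key is taken to mean a fixed-hashkey scan).

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (hash_key, sort_key), a toy stand-in for an encoded key


def toy_on_get_scanner(
    shard: List[Record], start_hash_key: str, start_sort_key: str = ""
) -> List[Record]:
    """Toy server logic: an empty start key means a full scan of the shard."""
    if start_hash_key == "":
        return list(shard)
    # Assumed buggy branch: a non-empty start key is read as "scan only
    # this hashkey", so records under other hashkeys are never returned.
    return [
        (h, s) for (h, s) in shard if h == start_hash_key and s > start_sort_key
    ]


def scan_with_restart(shard: List[Record], restart_after: int) -> List[Record]:
    """Toy client: full scan, context lost after `restart_after` records, retry."""
    seen = toy_on_get_scanner(shard, "")[:restart_after]
    last_hash, last_sort = seen[-1]
    # Retry: the last scanned key is passed as the start key.
    seen += toy_on_get_scanner(shard, last_hash, last_sort)
    return seen


shard = [("a", "1"), ("a", "2"), ("b", "1"), ("c", "1")]
result = scan_with_restart(shard, restart_after=1)
missing = [record for record in shard if record not in result]
```

With this input the retried scan returns only the remaining record under hashkey "a", so the records under "b" and "c" end up in `missing`, which matches the early exit reported above.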

cauchy1988 added the type/bug label Oct 13, 2021
@foreverneverer
Contributor

The latest versions of the Java and Go clients have fixed the bug:
XiaoMi/pegasus-java-client#156
XiaoMi/pegasus-go-client#86

@cauchy1988
Contributor Author

I think the best place to fix this is on the server side: that way, the client scanner can resume from its breakpoint.

@Smityz
Contributor

Smityz commented Oct 13, 2021

> I think the best place to fix this is on the server side: that way, the client scanner can resume from its breakpoint.

Yes, but it's difficult to achieve. You could open an issue describing your design.

@cauchy1988
Contributor Author

cauchy1988 commented Oct 13, 2021

@Smityz @shuo-jia This is not the problem fixed in XiaoMi/pegasus-java-client#156 and XiaoMi/pegasus-go-client#86; it is a new problem.
[image]
In this problem, the server side loses its context with the client, and the Java client's logic reaches the block shown in the image above.
In this case, the Java client calls 'on_get_scanner' again and restarts the scan process, but this time the "start_key" field in the request struct is filled with the hashkey, which causes the bug I described above.

@Smityz
Contributor

Smityz commented Oct 13, 2021

Okay, I see.

hycdong added this to the v2.3.0 milestone Oct 22, 2021
hycdong closed this as completed Nov 8, 2021
acelyc111 pushed a commit that referenced this issue Jun 23, 2022