'full scan' can't scan data completely when a replicaserver restart causes some primary partitions to be transferred #824

cauchy1988 · 2021-10-13T03:53:13Z

After writes to the storage have stopped, scanning the database with get_unordered_scanners can miss data if a replicaserver is restarted during the scan: the data of some shards is not scanned completely and the corresponding scanner exits early.

From reading the code, the cause appears to be as follows.
On the server, a client scan is handled in two parts, on_get_scanner and on_scan. on_get_scanner establishes a scan context with the server; on_scan is then called repeatedly by the client to read data from the server.

When a replicaserver is restarted, some primary shards are transferred to the new replicaserver, so the context that the client and server established through on_get_scanner is lost. The client's retry mechanism then calls on_get_scanner against the new replicaserver to establish a new context, but it passes the last scanned key as start_key. Because start_key is now a non-empty string, on_get_scanner mistakenly treats the request as a fixed-hashkey scan instead of a full scan.
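The failure described above can be reproduced in a small toy model. This is only a sketch under stated assumptions, not the Pegasus implementation: keys are simplified to `(hash_key, sort_key)` tuples, `PARTITION` and `full_scan_with_restart` are hypothetical names, and the `on_get_scanner` here only imitates the reported behaviour of treating any non-empty start_key as a fixed-hashkey scan.

```python
from typing import List, Optional, Tuple

# Simplified stand-in for an encoded key: (hash_key, sort_key).
Key = Tuple[str, str]

# Data of one partition, sorted by (hash_key, sort_key). Hypothetical sample data.
PARTITION: List[Key] = [
    ("h1", "a"), ("h1", "b"),
    ("h2", "a"), ("h2", "b"),
    ("h3", "a"),
]


def on_get_scanner(start_key: Optional[Key]) -> List[Key]:
    """Toy server handler: return the keys a new scan context would cover."""
    if start_key is None:
        # Empty start_key: interpreted as a full scan of the partition.
        return list(PARTITION)
    # Non-empty start_key: interpreted as a scan limited to one hashkey.
    hash_key = start_key[0]
    return [k for k in PARTITION if k[0] == hash_key and k > start_key]


def full_scan_with_restart(fail_after: int) -> List[Key]:
    """Toy client: the context is lost after `fail_after` keys, then the
    client retries on_get_scanner with the last scanned key as start_key."""
    seen = on_get_scanner(None)[:fail_after]
    last_key = seen[-1]
    # The retry is meant to continue the full scan, but the toy server
    # now confines it to the hashkey of last_key.
    seen += on_get_scanner(last_key)
    return seen


if __name__ == "__main__":
    scanned = full_scan_with_restart(fail_after=1)
    missing = [k for k in PARTITION if k not in scanned]
    print("scanned:", scanned)
    print("missing:", missing)
```

In this model the resumed scanner finishes hashkey `h1` and then exits, so every key under `h2` and `h3` is never returned, which matches the "scanner exits early" symptom.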

foreverneverer · 2021-10-13T03:56:46Z

The latest versions of the Java and Go clients have fixed this bug:
XiaoMi/pegasus-java-client#156
XiaoMi/pegasus-go-client#86

cauchy1988 · 2021-10-13T04:12:58Z

I think the best way to fix this is on the server side; that way, the client scanner can resume from the breakpoint.
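One way to picture a server-side approach is to let the request state its intent explicitly instead of having the server infer it from start_key. The sketch below is an illustration only and is not taken from any actual patch: the `full_scan` parameter, the tuple-based keys, and the helper `resume_full_scan` are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Simplified stand-in for an encoded key: (hash_key, sort_key).
Key = Tuple[str, str]

# Data of one partition, sorted by (hash_key, sort_key). Hypothetical sample data.
PARTITION: List[Key] = [
    ("h1", "a"), ("h1", "b"),
    ("h2", "a"), ("h2", "b"),
    ("h3", "a"),
]


def on_get_scanner(start_key: Optional[Key], full_scan: bool) -> List[Key]:
    """Toy server handler with an explicit (hypothetical) full_scan flag."""
    if full_scan:
        if start_key is None:
            return list(PARTITION)
        # Resumed full scan: continue after start_key across all hashkeys.
        return [k for k in PARTITION if k > start_key]
    if start_key is None:
        raise ValueError("a fixed-hashkey scan needs a start_key")
    # Fixed-hashkey scan: stay within the hashkey of start_key.
    return [k for k in PARTITION if k[0] == start_key[0] and k > start_key]


def resume_full_scan(fail_after: int) -> List[Key]:
    """Toy client: lose the context after `fail_after` keys, then resume
    from the last scanned key while still declaring a full scan."""
    seen = on_get_scanner(None, full_scan=True)[:fail_after]
    seen += on_get_scanner(seen[-1], full_scan=True)
    return seen


if __name__ == "__main__":
    print(resume_full_scan(fail_after=1) == PARTITION)
```

With the intent carried in the request, a non-empty start_key no longer changes the kind of scan, so the resumed scanner continues from the breakpoint instead of stopping at the end of one hashkey.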

Smityz · 2021-10-13T04:17:56Z

> I think the best way to fix this is on the server side; that way, the client scanner can resume from the breakpoint.

Yes, but it's difficult to achieve. You could open an issue describing your design.

cauchy1988 · 2021-10-13T04:43:25Z

@Smityz @shuo-jia This is not the problem fixed in XiaoMi/pegasus-java-client#156 and XiaoMi/pegasus-go-client#86; it's a new problem.

In this problem, the server side loses its context with the client, and the Java client's logic hits the code block shown in the picture above. In this case, the Java client calls 'on_get_scanners' again and restarts the scan process, but this time the "start_key" field in the request struct is filled with the hashkey, which causes the bug I described above.

Smityz · 2021-10-13T07:39:35Z

Okay, I see.

cauchy1988 added the type/bug This issue reports a bug. label Oct 13, 2021

cauchy1988 mentioned this issue Oct 13, 2021

fix: full_scan can't scan data completely in some occassions #825

Merged

hycdong added this to the v2.3.0 milestone Oct 22, 2021

hycdong closed this as completed Nov 8, 2021

acelyc111 pushed a commit that referenced this issue Jun 23, 2022

fix: add /version http api (#824)

ad9d61d
