Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(integration): add integration test for master lost during syncing sst files. #2691

Merged
merged 2 commits into from
Dec 12, 2024

Conversation

LindaSummer
Copy link
Contributor

Issue

Close #2671

Proposed Changes

Add integration test for losing master during SST file syncing.

Here is the steps.

  1. Create master and slave instances.
  2. Apply master and slave setnodeid.
  3. Create a slow tcp proxy for master.
  4. Setnodes to master and write data to master.
  5. Setnodes to slave.
  6. Wait for one second and close tcp proxy.
  7. Try setnodes in slave and get duration.

@LindaSummer
Copy link
Contributor Author

Hi,

I'm not sure my case is correctly covered the scenario in related issue.

I only got about 1 second delay in my own machine.

Please let me know if there is any problem about the test case. 😊

Best Regards,
Edward

Copy link
Member

@PragmaTwice PragmaTwice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! See if some others have comments.

start := time.Now()
require.NoError(t, replicaClient.Do(ctx, "clusterx", "SETNODES", unexistNodesInfo, "2").Err())
duration := time.Since(start)
require.Less(t, duration, time.Second*6)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the time out is 5000/5100 but the duration be 6 here 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mapleFU ,

The time duration may not be a precise value due to network.

So I reserved 1 second to reduce flaky failure. 😊

Best Regards,
Edward

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would a duration really looks like? Would it:

  1. Force greater than 5?
  2. Always less than 6? (I guess this is not guranteed ...)

Copy link
Contributor Author

@LindaSummer LindaSummer Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mapleFU ,

Yes, I thought it should be in a [5, 6 or longer] range.

But I only got about one second in my own environment and make me confused and not sure whether the case full covers the issue. 🤔

Maybe we can set the upper bound to a conservative value just for integration test. 😊

Best Regards,
Edward

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I only got about one second in my own environment and make me confused and not sure whether the case full covers the issue. 🤔

Thanks! I'll dive in my env later, we can first move forward

Copy link
Member

@git-hulk git-hulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great work.

@mapleFU mapleFU merged commit 7cda420 into apache:unstable Dec 12, 2024
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add more test cases about replication and cluster
4 participants