Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: replace restore2TB and restoretpccInc tests #98072

Merged
merged 1 commit into from
Mar 7, 2023

Conversation

msbutler
Copy link
Collaborator

@msbutler msbutler commented Mar 6, 2023

This patch removes the restore2TB* roachtests which ran a 2TB bank restore to
benchmark restore performance across a few hardware configurations. This patch
also replaces the restoreTPCCInc/nodes=10 test which tested our ability to
handle a backup with a long chain.

This patch also adds:

  1. restore/tpce/400GB/aws/nodes=4/cpus=16 to measure how per-node throughput
    scales when the per node vcpu count doubles relative to default.
  2. restore/tpce/400GB/aws/nodes=8/cpus=8 to measure how per-node throughput
    scales when the number of nodes doubles relative to default.
  3. restore/tpce/400GB/aws/backupsIncluded=48/nodes=4/cpus=8 to measure
    restore reliability and performance on 48 length long backup chain relative to
    default.

A future patch will update the fixtures used in the restore node shutdown
scripts, and add more perf based tests.

Fixes #92699

Release note: None

@msbutler msbutler requested review from dt and lidorcarmel March 6, 2023 18:07
@msbutler msbutler requested a review from a team as a code owner March 6, 2023 18:07
@msbutler msbutler self-assigned this Mar 6, 2023
@msbutler msbutler requested review from srosenberg and renatolabs and removed request for a team March 6, 2023 18:07
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@lidorcarmel lidorcarmel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall makes sense, a few things:

  • you're also dropping a tpcc roachperf, right? probably intentional, mention it in the pr/commit description?
  • in roachperf we have 2 bank tests with 6 nodes, one has no data, I'm guessing it was removed from restore.go but not from roachperf? or something like that.. which brings me to the question - do you need to remove those from roachperf also?
  • the 32 nodes bank test [1] shows a regression since last Dec, we're going to lose this info now. The newly added tests here don't backfill, so maybe we should keep those tests for a few more months to have an overlap with the new tests? WDYT?
    [1] https://roachperf.crdb.dev/?filter=restore&view=restore2TB%2Fnodes%3D32&tab=gce

@msbutler
Copy link
Collaborator Author

msbutler commented Mar 6, 2023

you're also dropping a tpcc roachperf, right?

do you need to remove those from roachperf also?

  • i'm assuming that the DR team will stop using roachperf in a month or so once we can observe test perf in prom/grafana. So I dont think we need to do any extra clean up in roachperf. I'd rather let it rot >:)

to paraphrase your last question: " should we keep them around for a couple months for perf regression detection?"

  • the roachperf graphs for the old tests aren't getting deleted, nor do I think anyone is actively looking into that regression. - Further these bank2tb tests aren't a very good signal of restore perf as the fixtures were created in 2019. IMHO I don't think we can learn much wrt restore perf from these old tests. I also think the new tests provide much more general test coverage.
  • We also need to delete them in the next few weeks anyway to let backupccl: disallow restore of backups older than the minimum supported binary version #93804 pass.

@lidorcarmel
Copy link
Contributor

the roachperf graphs for the old tests aren't getting deleted, nor do I think anyone is actively looking into that regression. - Further these bank2tb tests aren't a very good signal of restore perf as the fixtures were created in 2019

I think it is helpful to have it, it helps us understand that things regressed slowly in that time range. I know it's not the best test but it's info I'd like to have available for debugging. anyway, sounds like we'll have the history so we're good, thank you!

This patch removes the restore2TB* roachtests which ran a 2TB bank restore to
benchmark restore performance across a few hardware configurations. This patch
also replaces the `restoreTPCCInc/nodes=10` test which tested our ability to
handle a backup with a long chain.

This patch also adds:
1. `restore/tpce/400GB/aws/nodes=4/cpus=16` to measure how per-node throughput
scales when the per node vcpu count doubles relative to default.
2. `restore/tpce/400GB/aws/nodes=8/cpus=8` to measure how per-node throughput
scales when the number of nodes doubles relative to default.
3. `restore/tpce/400GB/aws/backupsIncluded=48/nodes=4/cpus=8` to measure
restore reliability and performance on 48 length long backup chain relative to
default.

A future patch will update the fixtures used in the restore node shutdown
scripts, and add more perf based tests.

Fixes cockroachdb#92699

Release note: None
@msbutler msbutler changed the title backupccl: replace restore2TB perf tests backupccl: replace restore2TB and restoretpccInc tests Mar 7, 2023
@msbutler
Copy link
Collaborator Author

msbutler commented Mar 7, 2023

TFTR!

bors r=lidorcarmel

@craig
Copy link
Contributor

craig bot commented Mar 7, 2023

Build succeeded:

@craig craig bot merged commit a95ffcd into cockroachdb:master Mar 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

backupccl: change default restore roachtest configuration
3 participants