demo: Add option to automatically apply the Geo-Partitioned Replicas topology to Movr #40355
Conversation
(force-pushed from 07f6c02 to 0ad176f)
Reviewed 5 of 5 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @rohany)
pkg/cli/demo.go, line 446 at r2 (raw file):
```go
// Make sure that the user didn't request to have a topology and an empty database.
if demoCtx.geoPartitionedReplicas && demoCtx.useEmptyDatabase {
	return errors.New("cannot setup geo-partitioned replicas topology on an empty database")
```
This is guaranteed to fail if an enterprise license could not be acquired.
I would recommend adding a test in a way that guarantees the enterprise license is not available, and verifying that demo still works.
pkg/workload/movr/movr.go, line 116 at r2 (raw file):
```go
{city: "los angeles", locality: "us_west"},
// The demo setup we have is a 9 node, 3 region cluster.
//{city: "chicago", locality: "us_central"},
```
Don't comment out things - simply delete them. We have version control so if we want them again we can restore them.
(Also I think this change is not desirable - are you sure about it?)
Sorry, I should have marked this WIP; I'll update the PR based on your comments.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @rohany)
pkg/cli/cliflags/flags.go, line 924 at r2 (raw file):
```go
Name: "geo-partitioned-replicas",
Description: `
Create a 9 node cluster preloaded with the Movr dataset, and automatically apply
```
This should probably say something about "when used with the movr dataset (which is the default)" and then error if another dataset is used
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @knz)
pkg/cli/demo.go, line 446 at r2 (raw file):
Previously, knz (kena) wrote…
This is guaranteed to fail if an enterprise license could not be acquired.
I would recommend adding a test in a way that guarantees the enterprise license is not available, and verifying that demo still works.
The test I added to the demo (disabling telemetry and expecting success) should test this. However, if there is an error with partitioning, I think that demo should error out.
pkg/workload/movr/movr.go, line 116 at r2 (raw file):
Previously, knz (kena) wrote…
Don't comment out things - simply delete them. We have version control so if we want them again we can restore them.
(Also I think this change is not desirable - are you sure about it?)
I'm not sure what to do here then -- we could increase the number of nodes in this setup to 12 to have a 4 region cluster using all of the movr cities, or just delete the us-central region from the demo dataset. I'm open to either, but I'm leaning towards just removing some cities?
(force-pushed from 0ad176f to 31b9132)
Friendly ping
Oh, sorry! I would guess people have been staying away from this because it still has "wip" in the title. For the future, our emergent convention here seems to be removing the "wip" and leaving a comment to the effect of "this should be good to go" or "this is ready for review" :-) I took another look and the approach seems fine to me, but there were a bunch of conversations about how the license stuff should work that I wasn't in, so I'll leave the review to someone who was.
Sorry -- that's my bad.
Reviewed 7 of 7 files at r3, 6 of 6 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @rohany)
pkg/cli/demo.go, line 446 at r2 (raw file):
Previously, rohany (Rohan Yadav) wrote…
The test I added to the demo (disabling telemetry and expecting success) should test this. However, if there is an error with partitioning, I think that demo should error out.
I suppose that can be asserted in a test too, can't it?
pkg/cli/demo.go, line 129 at r4 (raw file):
```go
// This demo only works on a 9 node cluster, so set the node count as such.
// Ignore input user localities.
if demoCtx.geoPartitionedReplicas {
```
I think it's important to inform the user when they use contradictory settings. Otherwise they will be under the mistaken assumption that, if no error is produced, their desired number of nodes was respected.
You can test whether a flag was set using `pflag.Lookup(...).Changed`.
pkg/cli/demo.go, line 274 at r4 (raw file):
```go
// If we are requested to prepartition our data, and the
// specified dataset is Movr, spawn a goroutine to do the partitioning.
if demoCtx.geoPartitionedReplicas && gen.Meta().Name == "movr" {
```
Also please reject a combination of flags when a user specifies another dataset and also geo-partitioned replicas, as we are unable to honor their request in that case.
pkg/cli/demo.go, line 294 at r4 (raw file):
```go
}

// TODO (rohany): Should we wait until partitioning is done
```
yes that seems like a good plan.
Also: is the workload meant to be run synchronously like this? As-is it will prevent the interactive session from starting. Isn't it better to have the workload run in the background (in a goroutine) while the user has access to the interactive shell?
pkg/cli/demo.go, line 306 at r4 (raw file):
```go
}

func setupGeoPartitionedReplicas(db *gosql.DB) error {
```
This looks like a possible API extension to the workload generator interface.
It seems weird to me to have code specific to the `movr` workload hanging in the `cli` package. Instead I'd say this code should be in the `movr` workload (in a place where folk who later change the workload will think about it!), and reachable via a new method on the interface, perhaps `PreparePartitioning()`. That method would (presumably) be a no-op for the other workloads, until/unless we find interest in partitioning other workloads.
(Actually tpc-c has partitioning code defined already, but I'd recommend against trying to look into this in the same PR, so presumably you'd leave a TODO in the tpcc implementation of `PreparePartitioning()`.)
pkg/cli/cliflags/flags.go, line 925 at r4 (raw file):
```go
Description: `
When used with the Movr dataset, create a 9 node cluster and automatically apply
the geo-partitioned replicas topology across the regions us-east1, us-west1, and
```
"across 3 virtual regions named ..." (otherwise you may misguide users into thinking that `demo` deploys to GCE)
(force-pushed from 96f4165 to b030c35)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, @knz, and @rohany)
pkg/cli/demo.go, line 274 at r4 (raw file):
Previously, knz (kena) wrote…
Also please reject a combination of flags when a user specifies another dataset and also geo-partitioned replicas, as we are unable to honor their request in that case.
This is done in runDemo already.
pkg/cli/demo.go, line 294 at r4 (raw file):
Previously, knz (kena) wrote…
yes that seems like a good plan.
Also: is the workload meant to be run synchronously like this? As-is it will prevent the interactive session from starting. Isn't it better to have the workload run in the background (in a goroutine) while the user has access to the interactive shell?
The workload is run in the background -- this function just launches the goroutines.
pkg/cli/demo.go, line 306 at r4 (raw file):
Previously, knz (kena) wrote…
This looks like a possible API extension to the workload generator interface.
It seems weird to me to have code specific to the `movr` workload hanging in the `cli` package. Instead I'd say this code should be in the `movr` workload (in a place where folk who later change the workload will think about it!), and reachable via a new method on the interface, perhaps `PreparePartitioning()`. That method would (presumably) be a no-op for the other workloads, until/unless we find interest in partitioning other workloads.
(Actually tpc-c has partitioning code defined already, but I'd recommend against trying to look into this in the same PR, so presumably you'd leave a TODO in the tpcc implementation of `PreparePartitioning()`.)
Just to clarify, I should look into adding an interface method to workload in a future PR? I can drop a comment on the TPCC partitioning and here, and open an issue as well.
pkg/cli/cliflags/flags.go, line 925 at r4 (raw file):
Previously, knz (kena) wrote…
"across 3 virtual regions named ..." (otherwise you may misguide users into thinking that `demo` deploys to GCE)
Done.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @knz)
pkg/cli/demo.go, line 129 at r4 (raw file):
Previously, knz (kena) wrote…
I think it's important to inform the user when they use contradictory settings. Otherwise they will be under the mistaken assumption that, if no error is produced, their desired number of nodes was respected.
You can test whether a flag was set using `pflag.Lookup(...).Changed`.
Done.
RFAL
(force-pushed from b030c35 to 3a71fb9)
There is a wider discussion/design going on to serve licenses in a more general way in the future, so this commit is not aiming for a future-proof design, but instead an MVP to allow users to demo enterprise features within `cockroach demo`. We are not concerned about offline usage of enterprise features as users can obtain a license and enable features manually using SET.

Fixes cockroachdb#40222.

Release note (cli change): cockroach demo attempts to contact a license server to obtain a temporary license. cockroach demo now enables telemetry for the demo cluster. This feature can be opted out of by setting the `COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING` environment variable (https://www.cockroachlabs.com/docs/stable/diagnostics-reporting.html).
Ok I am starting to like this. Thanks for all the good work.
Reviewed 2 of 6 files at r5, 7 of 8 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, @knz, and @rohany)
pkg/cli/demo.go, line 306 at r4 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Just to clarify, I should look into adding an interface method to workload in a future PR? I can drop a comment on the TPCC partitioning and here, and open an issue as well.
I am saying that your PR as-is is creating technical debt and therefore I am not exactly happy about it.
The bare minimum would be to move this particular function to the same package where `movr` is defined (so that folk who work on future changes to movr are forced to also think about updating this function), then inject it as a dependency into `cli` upon initialization (so that you don't depend on a CCL package from here).
Then a step beyond the bare minimum would be to extend the workload interface, to give any workload (not just movr) a chance to set up partitioning. For that extra step you can either do it here or file an issue and leave a TODO with the issue number.
pkg/cli/demo.go, line 128 at r6 (raw file):
```go
// This demo only works on a 9 node cluster, so set the node count as such.
// Ignore input user localities.
```
Move this override below in `runDemo` so that it is clearly linked to the command line parsing.
pkg/cli/demo.go, line 489 at r6 (raw file):
```go
	return errors.New("--nodes cannot be used with --geo-partitioned-replicas")
}
```
Also add either:
- "if --geo-partitioned-replicas is set and --nodes is Changed and set to a different value than 9, print an error" (we don't want to silently dismiss a setting that was purposefully entered by the user)
- "if --geo-partitioned-replicas is set and --nodes is not Changed, then override it to 9 and print an info message to the user (on stdout) to inform them that the demo cluster size is grown from the default" (we don't want to let a difference from the standard behavior remain silent)
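The two branches above can be sketched as a pure validation function. This is a hedged illustration (the function name and return shape are invented for the example, not taken from demo.go): it errors when the user explicitly asked for a node count other than 9 alongside the topology flag, and otherwise overrides the default while returning an informational message to print.

```go
package main

import (
	"errors"
	"fmt"
)

// validateGeoFlags sketches the two branches described above. nodesChanged
// stands in for pflag's Lookup("nodes").Changed. It returns the effective
// node count, an optional info message for the user, and an error for
// contradictory settings. Illustrative only, not the actual demo.go code.
func validateGeoFlags(geoPartitioned, nodesChanged bool, nodes int) (int, string, error) {
	if !geoPartitioned {
		return nodes, "", nil
	}
	if nodesChanged && nodes != 9 {
		// Don't silently dismiss a setting the user purposefully entered.
		return 0, "", errors.New("--nodes cannot be used with --geo-partitioned-replicas")
	}
	if !nodesChanged {
		// Don't let a difference from the standard behavior remain silent.
		return 9, "setting --nodes to 9 for the geo-partitioned demo cluster", nil
	}
	return nodes, "", nil
}

func main() {
	n, msg, err := validateGeoFlags(true, false, 1)
	fmt.Println(n, msg, err)
}
```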
(force-pushed from 54b530a to 1c17b91)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @knz)
pkg/cli/demo.go, line 306 at r4 (raw file):
Previously, knz (kena) wrote…
I am saying that your PR as-is is creating technical debt and therefore I am not exactly happy about it.
The bare minimum would be to move this particular function to the same package where `movr` is defined (so that folk who work on future changes to movr are forced to also think about updating this function), then inject it as a dependency into `cli` upon initialization (so that you don't depend on a CCL package from here).
Then a step beyond the bare minimum would be to extend the workload interface, to give any workload (not just movr) a chance to set up partitioning. For that extra step you can either do it here or file an issue and leave a TODO with the issue number.
Done. This was actually an easy change to add a partition step to the Hooks interface.
pkg/cli/demo.go, line 128 at r6 (raw file):
Previously, knz (kena) wrote…
Move this override below in `runDemo` so that it is clearly linked to the command line parsing.
Done.
pkg/cli/demo.go, line 489 at r6 (raw file):
Previously, knz (kena) wrote…
Also add either:
- "if --geo-partitioned-replicas is set and --nodes is Changed and set to a different value than 9, print an error" (we don't want to silently dismiss a setting that was purposefully entered by the user)
- "if --geo-partitioned-replicas is set and --nodes is not Changed, then override it to 9 and print an info message to the user (on stdout) to inform them that the demo cluster size is grown from the default" (we don't want to let a difference from the standard behavior remain silent)
The first should already be handled by the above -- if geoPartitionedReplicas && nodes is changed we throw an error. I'll add the second.
LGTM modulo adding a test for the new feature.
(and a nit)
You also have a linter error to address.
Reviewed 3 of 3 files at r7.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @rohany)
pkg/cli/demo.go, line 284 at r7 (raw file):
```go
}

partitionFunc := hookser.Hooks().Partition
if partitionFunc == nil {
```
I think you need this instead:

```go
if partitionFunc != nil {
	if err := partitionFunc(db); err != nil {
		...
	}
}
```

otherwise the other workloads could break.
pkg/cli/interactive_tests/test_demo_partitioning.tcl, line 11 at r7 (raw file):
```tcl
spawn $argv demo --geo-partitioned-replicas
# expect a failure
eexpect "Error: license acquisition was unsuccessful. Enterprise features are needed to partition data"
```
You probably need another test that asserts that partitioning is successful when the env var is not set.
(I did not find a test for the new feature elsewhere)
(force-pushed from 1c17b91 to 2404e3f)
Yeah, I'll add a .tcl test for this. However the testing kind of relates to my questions in #40613 -- the partitioning is happening asynchronously, so we want to repeatedly send a …
it's very crude but you could do something like this:

```tcl
report "waiting for partitioning to become available..."
system "for i in `seq 1 5`; do
  $argv sql -e 'select ...' >result.txt
  if grep ... result.txt; then
    found=1
    break
  fi
  sleep 1
done
if test x$found = x; then
  exit 1
fi"
report "done"
```
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, @knz, and @rohany)
pkg/cli/demo.go, line 284 at r7 (raw file):
Previously, knz (kena) wrote…
I think you need this instead:

```go
if partitionFunc != nil {
	if err := partitionFunc(db); err != nil {
		...
	}
}
```

otherwise the other workloads could break.
I don't see what the difference is here -- if partitionFunc is nil when we requested geo-partitioned-replicas on a dataset, then we should throw an error because it is not defined.
(force-pushed from 2404e3f to 81e3d14)
Tests added.
(force-pushed from 81e3d14 to c1098cf)
Nice tests :)
Reviewed 2 of 2 files at r8.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @rohany)
pkg/cli/demo.go, line 278 at r8 (raw file):
```go
}
defer db.Close()

configErr := errors.New(fmt.Sprintf("workload %s is not configured to have a partitioning step", gen.Meta().Name))
```
I have thought about this more and I am going to change my advice:
These are config checks. Having these checks so late in the initialization process, more so in an async goroutine, is a design mistake.
When the code arrives at this point the error path should be limited to unexpected errors unrelated to user input.
I think the checks should be present alongside the command line parsing. It would also clarify the relationship between the command-line arguments and the prerequisites on the generators.
pkg/cli/interactive_tests/test_demo_partitioning.tcl, line 20 at r8 (raw file):
```tcl
start_test "Expect partitioning succeeds"
# test that partitioning works if a license could be acquired
unset env(COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING)
```
Move the "positive" tests to the beginning, this way you don't need to tweak the env var twice.
(Also it would be the more "natural" order for reading/understanding tests: first positive tests, then negative)
(force-pushed from c1098cf to 6e193b6)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @knz)
pkg/cli/demo.go, line 278 at r8 (raw file):
Previously, knz (kena) wrote…
I have thought about this more and I am going to change my advice:
These are config checks. Having these checks so late in the initialization process, moreso in an async goroutine, is a design mistake.
When the code arrives at this point the error path should be limited to unexpected errors unrelated to user input.
I think the checks should be present alongside the command line parsing. It would also clarify the relationship between the command-line arguments and the prerequisites on the generators.
Done.
pkg/cli/interactive_tests/test_demo_partitioning.tcl, line 20 at r8 (raw file):
Previously, knz (kena) wrote…
Move the "positive" tests to the beginning, this way you don't need to tweak the env var twice.
(Also it would be the more "natural" order for reading/understanding tests: first positive tests, then negative)
Done.
with a small suggestion for improvement
Reviewed 3 of 3 files at r9.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @rohany)
pkg/cli/demo.go, line 383 at r9 (raw file):
```go
// If geo-partition-replicas is requested, make sure the workload is MovR and has a Partitioning step.
if demoCtx.geoPartitionedReplicas {
	if gen.Meta().Name != "movr" {
```
If you would remove this specific check, this would enable additional workloads to "opt in" to the flag smoothly by just implementing the `Partition` hook, without having to modify `demo.go`.
Can you also spell out the name of the flag in the release note? thx
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @rohany)
As usual, thanks for the very detailed review @knz!
(force-pushed from 6e193b6 to a8912c5)
Work for cockroachdb#39945.

This PR adds a Partition function to the workload.Hooks struct so that workloads that have a partitioning step can just implement that function within the workload package.

Release note (cli change): Add an option for cockroach demo to automatically apply the geo-partitioned replicas topology to the movr dataset using the --geo-partitioned-replicas flag.
(force-pushed from a8912c5 to 10345ee)
bors r+
Build failed
bors r+
Test flake, I believe.
40355: demo: Add option to automatically apply the Geo-Partitioned Replicas topology to Movr r=rohany a=rohany

Work for #39945.

This PR adds a Partition function to the workload.Hooks struct so that workloads that have a partitioning step can just implement that function within the workload package.

Release note (cli change): Add an option for cockroach demo to automatically apply the geo-partitioned replicas topology to the movr dataset using the --geo-partitioned-replicas flag.

Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
Build succeeded
Work for #39945.
This PR adds a Partition function to the workload.Hooks struct so that
workloads that have a partitioning step can just implement that function
within the workload package.
Release note (cli change): Add an option for cockroach demo to
automatically apply the geo-partitioned replicas topology to the movr
dataset using the --geo-partitioned-replicas flag.