Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

demo: Add option to automatically apply the Geo-Partitioned Replicas topology to Movr #40355

Merged
merged 2 commits into from
Sep 12, 2019

Conversation

rohany
Copy link
Contributor

@rohany rohany commented Aug 29, 2019

Work for #39945.

This PR adds a Partition function to the workload.Hooks struct so that
workloads that have a partitioning step can just implement that function
within the workload package.

Release note (cli change): Add an option for cockroach demo to
automatically apply the geo-partitioned replicas topology to the movr
dataset using the --geo-partitioned-replicas flag.

@rohany rohany requested review from jordanlewis, awoods187 and a team August 29, 2019 23:43
@rohany rohany requested a review from a team as a code owner August 29, 2019 23:43
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rohany rohany changed the title Geo partitioned replicas demo: Add option to automatically apply the Geo-Partitioned Replicas topology to Movr Aug 29, 2019
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @rohany)


pkg/cli/demo.go, line 446 at r2 (raw file):

	// Make sure that the user didn't request to have a topology and an empty database.
	if demoCtx.geoPartitionedReplicas && demoCtx.useEmptyDatabase {
		return errors.New("cannot setup geo-partitioned replicas topology on an empty database")

This is guaranteed to fail if an enterprise license could not be acquired.

I would recommend to make a test in a a way that guarantees the enterprise license is not available, and verify that demo still work.


pkg/workload/movr/movr.go, line 116 at r2 (raw file):

	{city: "los angeles", locality: "us_west"},
	// The demo setup we have is a 9 node, 3 region cluster.
	//{city: "chicago", locality: "us_central"},

Don't comment out things - simply delete them. We have version control so if we want them again we can restore them.

(Also I think this change is not desirable - are you sure about it?)

@rohany
Copy link
Contributor Author

rohany commented Aug 30, 2019

Sorrry, I should have marked this WIP — I’ll update the pr based on your comments.

@rohany rohany changed the title demo: Add option to automatically apply the Geo-Partitioned Replicas topology to Movr [wip] demo: Add option to automatically apply the Geo-Partitioned Replicas topology to Movr Aug 30, 2019
Copy link
Contributor

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @rohany)


pkg/cli/cliflags/flags.go, line 924 at r2 (raw file):

		Name: "geo-partitioned-replicas",
		Description: `
Create a 9 node cluster preloaded with the Movr dataset, and automatically apply

This should probably say something about "when used with the movr dataset (which is the default)" and then error if another dataset is used

Copy link
Contributor Author

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @knz)


pkg/cli/demo.go, line 446 at r2 (raw file):

Previously, knz (kena) wrote…

This is guaranteed to fail if an enterprise license could not be acquired.

I would recommend to make a test in a a way that guarantees the enterprise license is not available, and verify that demo still work.

The test I added to the demo (disabling telemetry and expecting success) should test this. However, if there is an error with partitioning, I think that demo should error out.


pkg/workload/movr/movr.go, line 116 at r2 (raw file):

Previously, knz (kena) wrote…

Don't comment out things - simply delete them. We have version control so if we want them again we can restore them.

(Also I think this change is not desirable - are you sure about it?)

I'm not sure what to do here then -- we could increase the number of nodes in this setup to 12 to have a 4 region cluster using all of the movr cities, or just delete the us-central region from the demo dataset. I'm open to either, but I'm leaning towards just removing some cities?

@rohany
Copy link
Contributor Author

rohany commented Sep 5, 2019

Friendly ping

@danhhz
Copy link
Contributor

danhhz commented Sep 5, 2019

Oh, sorry! I would guess people have been staying away from this because it still has "wip" in the title. For the future, our emergent convention here seems to be removing the "wip" and leaving a comment to the effect of "this should be good to go" or "this is ready for review" : - )

I took a look again and the approach seems fine to me, but there were a bunch of conversations about how the license stuff should work that I wasn't in, so I'll leave the review to someone that was.

@rohany rohany changed the title [wip] demo: Add option to automatically apply the Geo-Partitioned Replicas topology to Movr demo: Add option to automatically apply the Geo-Partitioned Replicas topology to Movr Sep 5, 2019
@rohany
Copy link
Contributor Author

rohany commented Sep 5, 2019

Sorry -- thats my bad.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r3, 6 of 6 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @rohany)


pkg/cli/demo.go, line 446 at r2 (raw file):

Previously, rohany (Rohan Yadav) wrote…

The test I added to the demo (disabling telemetry and expecting success) should test this. However, if there is an error with partitioning, I think that demo should error out.

I suppose that can be asserted in a test too, can't it?


pkg/cli/demo.go, line 129 at r4 (raw file):

	// This demo only works on a 9 node cluster, so set the node count as such.
	// Ignore input user localities.
	if demoCtx.geoPartitionedReplicas {

I think it's important to inform the user when they use contradictory settings. Otherwise they will be under the mistaken assumption that, if no error is produced, their desired number of nodes was respected.

You can test whether a flag was set using pflag.Lookup(...).Changed.


pkg/cli/demo.go, line 274 at r4 (raw file):

		// If we are requested to prepartition our data, and the
		// specified dataset is Movr, spawn a goroutine to do the partitioning.
		if demoCtx.geoPartitionedReplicas && gen.Meta().Name == "movr" {

Also please reject a combination of flags when a user specifies another dataset and also geo-partitioned replicas, as we are unable to honor their request in that case.


pkg/cli/demo.go, line 294 at r4 (raw file):

		}

		// TODO (rohany): Should we wait until partitioning is done

yes that seems like a good plan.

Also: is the workload meant to be ran synchronously like this? As-is it will prevent the interactive session from startinig. Isn't it better to have the workload run in the background (in a goroutine) while the user has access to the itneractive shell?


pkg/cli/demo.go, line 306 at r4 (raw file):

}

func setupGeoPartitionedReplicas(db *gosql.DB) error {

This looks like a possible API extension to the workload generator interface.
It seems weird to me to have code specific to the movr workload hanging in the cli package. Instead I'd say this code should be in the movr workload (in a place where folk who later change the workload will think about!), and reachable via a new method on the interface, perhaps PreparePartitioning(). That method would (presumably) be a no-op for the other workloads, until/unless we find interest into partitioning other workloads.

(Actually tpc-c has partitioning code defined already, but I'd recommend against you trying to look into this in the same PR, so presumably you'd leave a TODO in the tpc-implemenation of PreparePartitioning())


pkg/cli/cliflags/flags.go, line 925 at r4 (raw file):

		Description: `
When used with the Movr dataset, reate a 9 node cluster and automatically apply
the geo-partitioned replicas topology across the regions us-east1, us-west1, and

"across 3 virtual regions named ..."

(otherwise you may misguide users into thinking that demo deploys to GCE)

@rohany rohany force-pushed the geo-partitioned-replicas branch 5 times, most recently from 96f4165 to b030c35 Compare September 5, 2019 19:31
Copy link
Contributor Author

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, @knz, and @rohany)


pkg/cli/demo.go, line 274 at r4 (raw file):

Previously, knz (kena) wrote…

Also please reject a combination of flags when a user specifies another dataset and also geo-partitioned replicas, as we are unable to honor their request in that case.

This is done in runDemo already.


pkg/cli/demo.go, line 294 at r4 (raw file):

Previously, knz (kena) wrote…

yes that seems like a good plan.

Also: is the workload meant to be ran synchronously like this? As-is it will prevent the interactive session from startinig. Isn't it better to have the workload run in the background (in a goroutine) while the user has access to the itneractive shell?

The workload is run in the background -- this function just launches the goroutines.


pkg/cli/demo.go, line 306 at r4 (raw file):

Previously, knz (kena) wrote…

This looks like a possible API extension to the workload generator interface.
It seems weird to me to have code specific to the movr workload hanging in the cli package. Instead I'd say this code should be in the movr workload (in a place where folk who later change the workload will think about!), and reachable via a new method on the interface, perhaps PreparePartitioning(). That method would (presumably) be a no-op for the other workloads, until/unless we find interest into partitioning other workloads.

(Actually tpc-c has partitioning code defined already, but I'd recommend against you trying to look into this in the same PR, so presumably you'd leave a TODO in the tpc-implemenation of PreparePartitioning())

Just to clarify, I should look into adding an interface method to workload in a future PR? I can drop a comment on the TPCC partitioning and here, and open an issue as well.


pkg/cli/cliflags/flags.go, line 925 at r4 (raw file):

Previously, knz (kena) wrote…

"across 3 virtual regions named ..."

(otherwise you may misguide users into thinking that demo deploys to GCE)

Done.

Copy link
Contributor Author

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @knz)


pkg/cli/demo.go, line 129 at r4 (raw file):

Previously, knz (kena) wrote…

I think it's important to inform the user when they use contradictory settings. Otherwise they will be under the mistaken assumption that, if no error is produced, their desired number of nodes was respected.

You can test whether a flag was set using pflag.Lookup(...).Changed.

done

@rohany
Copy link
Contributor Author

rohany commented Sep 5, 2019

RFAL

There is a wider discussion/design going on to serve licenses in
a more general way in the future, so this commit is not aiming
for a future-proof design, but instead an MVP to allow users to
demo enterprise features within `cockroach demo`.

We are not concerned about offline usage of enterprise features
as users can obtain a license and enable features manually using SET.

Fixes cockroachdb#40222.

Release note (cli change): cockroach demo attempts to contact a license
server to obtain a temporary license. cockroach demo now enables
telemetry for the demo cluster. This feature can be opted out of by
setting the `COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING` environment variable
(https://www.cockroachlabs.com/docs/stable/diagnostics-reporting.html).
@rohany rohany requested a review from knz September 10, 2019 19:08
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I am starting to like this. Thanks for all the good work.

Reviewed 2 of 6 files at r5, 7 of 8 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, @knz, and @rohany)


pkg/cli/demo.go, line 306 at r4 (raw file):

Previously, rohany (Rohan Yadav) wrote…

Just to clarify, I should look into adding an interface method to workload in a future PR? I can drop a comment on the TPCC partitioning and here, and open an issue as well.

I am saying that your PR as is is creating technical debt and therefore I am not exactly happy about it.

The bare minimum would be to move this particular function to the same package where movr is defined (so that folk who work on future changes to movr are forced to also think about updating this function), then inject it as a dependency into cli upon initialization (so that you don't depend on a CCL package from here)

Then a step beyond the bare minimum would be to extend the workload interface, to give any workload (not just movr) a chance to set up partitioning. For that extra step you can either do it here or file an issue and leave a TODO with the issue number.


pkg/cli/demo.go, line 128 at r6 (raw file):

	// This demo only works on a 9 node cluster, so set the node count as such.
	// Ignore input user localities.

Move this override below in runDemo so that it is clearly linked to the command line parsing.


pkg/cli/demo.go, line 489 at r6 (raw file):

		return errors.New("--nodes cannot be used with --geo-partitioned-replicas")
	}

Also add either:

  • "if --geo-partitioned-replicas is set and --nodes is Changed and set to a different value than 9, print an error" (we don't want to silently dismiss a setting that was purposefully entered by the user)
  • "if --geo-partitioned-replicas is set and --nodes is not Changed, then override it to 9 and print an info message to the user (on stdout) to inform them that the demo cluster size is grown from the default" (we don't want to let a difference from the standard behavior remain silent)

Copy link
Contributor Author

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @knz)


pkg/cli/demo.go, line 306 at r4 (raw file):

Previously, knz (kena) wrote…

I am saying that your PR as is is creating technical debt and therefore I am not exactly happy about it.

The bare minimum would be to move this particular function to the same package where movr is defined (so that folk who work on future changes to movr are forced to also think about updating this function), then inject it as a dependency into cli upon initialization (so that you don't depend on a CCL package from here)

Then a step beyond the bare minimum would be to extend the workload interface, to give any workload (not just movr) a chance to set up partitioning. For that extra step you can either do it here or file an issue and leave a TODO with the issue number.

Done. This was actually an easy change to add a partition step to the Hooks interface.


pkg/cli/demo.go, line 128 at r6 (raw file):

Previously, knz (kena) wrote…

Move this override below in runDemo so that it is clearly linked to the command line parsing.

Done.


pkg/cli/demo.go, line 489 at r6 (raw file):

Previously, knz (kena) wrote…

Also add either:

  • "if --geo-partitioned-replicas is set and --nodes is Changed and set to a different value than 9, print an error" (we don't want to silently dismiss a setting that was purposefully entered by the user)
  • "if --geo-partitioned-replicas is set and --nodes is not Changed, then override it to 9 and print an info message to the user (on stdout) to inform them that the demo cluster size is grown from the default" (we don't want to let a difference from the standard behavior remain silent)

The first should already be handled by the above -- if geoPartitionedReplicas && nodes is changed we throw an error. I'll add the second.

@rohany rohany requested a review from knz September 11, 2019 13:51
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo adding a test for the new feature.
(and a nit)
You also have a linter error to address.

Reviewed 3 of 3 files at r7.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @rohany)


pkg/cli/demo.go, line 284 at r7 (raw file):

					}
					partitionFunc := hookser.Hooks().Partition
					if partitionFunc == nil {

I think you need this instead:

if partitionFunc != nil {
  if err := partitionFunc(db); err != nil {
    ...
  }
}

otherwise the other workloads could break.


pkg/cli/interactive_tests/test_demo_partitioning.tcl, line 11 at r7 (raw file):

spawn $argv demo --geo-partitioned-replicas
# expect a failure
eexpect "Error: license acquisition was unsuccessful. Enterprise features are needed to partition data"

You probably need another test that asserts that partitioning is successful when the env var is not set.

(I did not find a test for the new feature elsewhere)

@rohany
Copy link
Contributor Author

rohany commented Sep 11, 2019

Yeah, I'll add an .tcl test for this. However the testing kind of relates to my questions in #40613 -- The partitioning is happening asynchronously, so we want to repeatedly send a SHOW PARTITIONS command to the demo until we get all the results we want. I don't know how to do this in tcl/expect

@knz
Copy link
Contributor

knz commented Sep 11, 2019

it's very crude but you could do something like this:

report "waiting for partitioning to become available..."
system "for i in `seq 1 5`; do
   $argv sql -e 'select ...' >result.txt
   if grep ... result.txt; then 
      found=1
      break
   fi
   sleep 1
done
if test x$found = x; then
   exit 1
fi"
report "done"

Copy link
Contributor Author

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, @knz, and @rohany)


pkg/cli/demo.go, line 284 at r7 (raw file):

Previously, knz (kena) wrote…

I think you need this instead:

if partitionFunc != nil {
  if err := partitionFunc(db); err != nil {
    ...
  }
}

otherwise the other workloads could break.

I dont see the difference is here -- if partitionFunc is nil when we requested geo-partitioned-replicas on a dataset, then we should throw an error if it is not defined.

@rohany
Copy link
Contributor Author

rohany commented Sep 11, 2019

Tests added.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice tests :)

Reviewed 2 of 2 files at r8.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @rohany)


pkg/cli/demo.go, line 278 at r8 (raw file):

					}
					defer db.Close()
					configErr := errors.New(fmt.Sprintf("workload %s is not configured to have a partitioning step", gen.Meta().Name))

I have thought about this more and I am going to change my advice:
These are config checks. Having these checks so late in the initialization process, moreso in an async goroutine, is a design mistake.
When the code arrives at this point the error path should be limited to unexpected errors unrelated to user input.

I think the checks should be present alongside the command line parsing. It would also clarify the relationship between the command-line arguments and the prerequisites on the generators.


pkg/cli/interactive_tests/test_demo_partitioning.tcl, line 20 at r8 (raw file):

start_test "Expect partitioning succeeds"
# test that partitioning works if a license could be acquired
unset env(COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING)

Move the "positive" tests to the beginning, this way you don't need to tweak the env var twice.
(Also it would be the more "natural" order for reading/understanding tests: first positive tests, then negative)

Copy link
Contributor Author

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @knz)


pkg/cli/demo.go, line 278 at r8 (raw file):

Previously, knz (kena) wrote…

I have thought about this more and I am going to change my advice:
These are config checks. Having these checks so late in the initialization process, moreso in an async goroutine, is a design mistake.
When the code arrives at this point the error path should be limited to unexpected errors unrelated to user input.

I think the checks should be present alongside the command line parsing. It would also clarify the relationship between the command-line arguments and the prerequisites on the generators.

Done.


pkg/cli/interactive_tests/test_demo_partitioning.tcl, line 20 at r8 (raw file):

Previously, knz (kena) wrote…

Move the "positive" tests to the beginning, this way you don't need to tweak the env var twice.
(Also it would be the more "natural" order for reading/understanding tests: first positive tests, then negative)

Done.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: with a small suggestion for improvement

Reviewed 3 of 3 files at r9.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @rohany)


pkg/cli/demo.go, line 383 at r9 (raw file):

	// If geo-partition-replicas is requested, make sure the workload is MovR and has a Partitioning step.
	if demoCtx.geoPartitionedReplicas {
		if gen.Meta().Name != "movr" {

If you would remove this specific check, this would enable additional workloads to "opt in" the flag smoothly by just implementing the Partition hook, without having to modify demo.go.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also spell out the name of the flag in the release note? thx

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @rohany)

@rohany
Copy link
Contributor Author

rohany commented Sep 12, 2019

As usual, thanks for the very detailed review @knz !

Work for cockroachdb#39945.

This PR adds a Partition function to the workload.Hooks struct so that
workloads that have a partitioning step can just implement that function
within the workload package.

Release note (cli change): Add an option for cockroach demo to
automatically apply the geo-partitioned replicas topology to the movr
dataset using the --geo-partitioned-replicas flag.
@rohany
Copy link
Contributor Author

rohany commented Sep 12, 2019

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 12, 2019

Build failed

@rohany
Copy link
Contributor Author

rohany commented Sep 12, 2019

bors r+

Test flake i believe

craig bot pushed a commit that referenced this pull request Sep 12, 2019
40355: demo: Add option to automatically apply the Geo-Partitioned Replicas topology to Movr r=rohany a=rohany

Work for #39945.

This PR adds a Partition function to the workload.Hooks struct so that
workloads that have a partitioning step can just implement that function
within the workload package.

Release note (cli change): Add an option for cockroach demo to
automatically apply the geo-partitioned replicas topology to the movr
dataset using the --geo-partitioned-replicas flag.


Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
@craig
Copy link
Contributor

craig bot commented Sep 12, 2019

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants